Customized Policies Using Zero-Shot Classification

Overview

We are excited to announce that Blueteam's latest release of dreamcatcher (version 0.2.1) includes a new "DLP (Beta)" policy type that leverages zero-shot classification. This powerful feature lets administrators create and enforce policies on arbitrary categories, not just the fixed set typically detected by out-of-the-box solutions. You can now define and manage categories such as medication names, SKUs, and part numbers, even if the model has not encountered them during training.

Background

A significant barrier to the widespread adoption of Generative AI (GenAI) is the risk of misuse. We've identified two primary concerns:

  1. Leaking of protected data.
  2. Ad-hoc usage that may be illegal (e.g., making hiring decisions, generating fake news) or undesirable (e.g., discussing inappropriate topics).

Enforcing policies that prevent data leakage and inappropriate use cases helps mitigate the risks of GenAI misuse.

To ensure responsible AI usage, organizations need a robust engine to enforce customized policies which protect data (Data Leak Protection, DLP) and govern acceptable use (Content Moderation).

Pain Point

Existing technologies such as Presidio and Llama Guard perform well for common categories like credit card numbers and SSNs (DLP), as well as explicit language or hate speech (Content Moderation). However, they often fall short for specific use cases such as redacting drug names or keeping conversations away from the topic of firearms. These out-of-the-box solutions may not have predefined categories for such specialized needs.

Supported entities detected by Microsoft's Presidio DLP as of July 2024.

The image above shows the entities supported by Microsoft's Presidio DLP as of July 2024. Common entities such as credit card numbers and email addresses work out of the box, but support is lacking for more specialized use cases involving other data categories (e.g., medications, firearms). As a security or compliance practitioner, I need a way to define and enforce policies on entities beyond the limited set available out of the box.

Solution

To address these challenges, we are introducing a new policy type, tentatively named "DLP (Beta)", that uses zero-shot classification. Zero-shot classification can assign text to novel categories that the model may not have seen during training. This simplifies the user experience: you specify custom categories by name, and they are detected and enforced.
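For readers curious how a model can handle labels it never saw in training, one widely used formulation is entailment-based zero-shot classification. This is, for example, what the Hugging Face transformers "zero-shot-classification" pipeline implements. It is shown here only as a general illustration: this post does not say which model or formulation dreamcatcher uses. Each candidate label y_k (e.g., "medications") is turned into a hypothesis h(y_k) using a template such as "This example is {y_k}." A model trained on natural language inference then scores whether the input text x (the whole message, or a candidate span) entails that hypothesis. With s_ent and s_con denoting the model's entailment and contradiction logits, the first line applies when exactly one of K labels should win. The second line scores each label independently, which suits labels that are not mutually exclusive (the pipeline's multi_label=True mode).

```latex
p(y_k \mid x) = \frac{\exp s_{\mathrm{ent}}\big(x, h(y_k)\big)}{\sum_{j=1}^{K} \exp s_{\mathrm{ent}}\big(x, h(y_j)\big)}
\qquad\text{(single label)}

p(y_k \mid x) = \frac{\exp s_{\mathrm{ent}}\big(x, h(y_k)\big)}{\exp s_{\mathrm{ent}}\big(x, h(y_k)\big) + \exp s_{\mathrm{con}}\big(x, h(y_k)\big)}
\qquad\text{(independent labels)}
```

Because a label enters only through the text of its hypothesis, adding a new category means writing a new label, with no retraining required.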

Users of Blueteam dreamcatcher can access zero-shot classification policies as "DLP (Beta)" when creating a new policy:

Zero-shot classification policies are available under "DLP (Beta)" when creating a new policy

Real-world Application

Imagine Alice, who works for a healthcare organization and needs to anonymize medication names when summarizing patient notes with AI. Current solutions like Presidio do not offer a built-in category for medications. With the new "DLP (Beta)" policy, Alice can use zero-shot classification to create a custom policy, specifying "medications" as a category for detection and redaction.

Alice's policy, which uses zero-shot classification to detect and redact medication data elements

She then uses the policy playground to sanity-check that her new policy works as expected:

Sanity-checking that the policy correctly redacts the medication names "ibuprofin" and "acetaminophin" [sic]

Confident that her policy behaves as expected on ad-hoc queries, she activates it on her Blueteam dreamcatcher endpoint.
Consequently, when a user submits content containing medication names to Alice's endpoint:

An example message containing medication names

Alice can rest assured, knowing that Blueteam's policy engine is redacting medication names as requested:

Backend logs showing that medication names were successfully redacted and no data leakage occurred

Here, "sudafed" was redacted to "<medications 0>" and "nyquil" to "<medications 1>".
No data leakage occurs, and Alice can responsibly use GenAI to summarize protected health information (PHI) while remaining compliant.
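The placeholder scheme visible in the log ("<medications 0>", "<medications 1>") can be illustrated with a minimal Python sketch. Several assumptions apply. Detection is done naively at the word level. The classifier is passed in as a plug-in function, and the keyword set used below is a hypothetical stand-in for a zero-shot model, not how dreamcatcher detects entities. The function name `redact` is illustrative only and is not part of dreamcatcher's API.

```python
import re
from typing import Callable, Dict, Tuple


def redact(text: str, category: str,
           is_member: Callable[[str], bool]) -> Tuple[str, Dict[str, str]]:
    """Replace each word judged to belong to `category` with '<category i>'.

    Placeholders are numbered in order of first appearance; repeated
    occurrences of the same word (case-insensitive) reuse its placeholder.
    """
    mapping: Dict[str, str] = {}  # lowercased word -> placeholder

    def substitute(match: "re.Match[str]") -> str:
        word = match.group(0)
        if not is_member(word):
            return word  # leave words outside the category untouched
        key = word.lower()
        if key not in mapping:
            mapping[key] = f"<{category} {len(mapping)}>"
        return mapping[key]

    # Naive word-level tokenization: a letter followed by letters or hyphens.
    redacted = re.sub(r"[A-Za-z][A-Za-z-]*", substitute, text)
    return redacted, mapping


if __name__ == "__main__":
    # Hypothetical stand-in for a zero-shot classifier: a fixed keyword set.
    known_medications = {"sudafed", "nyquil"}
    out, table = redact("I took sudafed in the morning and nyquil at night.",
                        "medications", lambda w: w.lower() in known_medications)
    print(out)    # I took <medications 0> in the morning and <medications 1> at night.
    print(table)  # {'sudafed': '<medications 0>', 'nyquil': '<medications 1>'}
```

Numbering placeholders per category, rather than replacing every match with the same token, keeps distinct values distinguishable in the redacted text. The returned mapping is what would let a caller trace each placeholder back to the original value it replaced.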

Call to Action

Zero-shot classification policies are available in the latest release (version 0.2.2) of dreamcatcher. If you need to define custom policies and find existing solutions inadequate, we invite you to try this new feature and share your feedback with us!

Stay tuned for more updates, and thank you for being part of the Blueteam community.