Anthropic Prompt Sanitization Tutorial
Scrub Your Claude Chatbot Prompts to Prevent Sensitive Data Disclosure (OWASP LLM06)
Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:
Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property
Real-world scenarios highlight the urgency of this issue:
Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.
Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.
Standard Pattern for Using Anthropic Claude APIs
A typical pattern for leveraging Claude is as follows:
Get an API key and set environment variables
Initialize the Anthropic SDK client (e.g. Anthropic Python client), or use the API directly to construct a request
Construct your prompt and decide which endpoint and model is most applicable.
Send the request to Anthropic
Let's look at a simple example in Python. We’ll ask a Claude model for an auto-generated response we can send to a customer who is asking our customer support team about an issue with their payment method. Note how easy it is to send sensitive data, in this case, a credit card number, to Claude.
This is a risky practice because now we are sending sensitive customer information to Anthropic. Next, let’s explore how we can prevent this while still benefitting from Claude.
Adding Content Filtering to the Pattern
Updating this pattern by using Nightfall is straightforward to check for sensitive findings and ensure sensitive data isn’t sent out. Here’s how:
Step 1: Setup Nightfall
Get an API key for Nightfall and set environment variables. Learn more about creating a Nightfall API key here. In this example, we’ll use the Nightfall Python SDK.
Step 2: Configure Detection
Create a pre-configured detection rule in the Nightfall dashboard or an inline detection rule with the Nightfall API or SDK client.
Consider using Redaction
Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.
Step 3: Classify, Redact, Filter
Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.
For example, let’s say we send Nightfall the following:
We get back the following redacted text:
Send Redacted Prompt to Anthropic
Review the response to see if Nightfall has returned sensitive findings:
If there are sensitive findings:
You can specify a redaction config in your request so that sensitive findings are redacted automatically.
Without a redaction config, you can break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
Initialize the Anthropic SDK client (e.g., Anthropic Python client), or use the API directly to construct a request.
Construct your outgoing prompt.
If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
Use the Anthropic API or SDK client to send the prompt to the AI model.
Python Example
Let's look at a Python example using Anthropic Claude and Nightfall's Python SDK. You can download this sample code here.
Step 1: Setup Nightfall
Get an API key for Nightfall and set environment variables. Learn more about creating an API key here.
Step 2: Configure Detection
Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.
If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.
Step 3: Classify, Redact, Filter Your User Input
Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.
For example, let’s say we send Nightfall the following:
We get back the following redacted text:
Step 4: Send Redacted Prompt to Anthropic
Review the response to see if Nightfall has returned sensitive findings:
If there are sensitive findings:
You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.
Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
Construct your outgoing prompt.
If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
Use the Claude API or SDK client to send the prompt to the AI model.
Safely Leveraging Generative AI
You'll see that the message we originally intended to send had sensitive data:
And the message we ultimately sent was redacted, and that’s what we sent to Anthropic:
Anthropic sends us the same response either way because it doesn’t need to receive sensitive data to generate a cogent response. This means we were able to leverage Claude just as easily but we didn’t risk sending Anthropic any unnecessary sensitive data. Now, you are one step closer to leveraging generative AI safely in an enterprise setting.
Last updated