OpenAI Prompt Sanitization Tutorial

Scrub Your OpenAI Chatbot Prompts to Prevent Sensitive Data Disclosure

Generative AI systems like OpenAI's ChatGPT can inadvertently receive sensitive information from user inputs, posing significant privacy concerns (OWASP LLM06). Without content filtering, these AI platforms can process and retain confidential data such as health records, financial details, and personal identifying information.

Consider the following real-world scenarios:

  • Support Chatbots: You are using OpenAI to power a level-1 support chatbot to help users resolve issues. Users will likely overshare sensitive information like credit card and Social Security numbers. Without content filtering, this information would be transmitted to OpenAI and added to your support ticketing system.

  • Healthcare Apps: You are using OpenAI to moderate content sent by patients or doctors in the health app you are developing. These queries may contain protected health information (PHI), which could be unnecessarily transmitted to OpenAI.

Content filtering can remove sensitive data before it is processed by the AI system, ensuring that only the necessary information is used to generate content. This prevents sensitive data from spreading to AI systems.
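To make the idea concrete, here is a deliberately naive sketch of content filtering: a regular expression masks anything that looks like a card number before the text leaves your system. This is a hypothetical illustration only; pattern matching alone misses context and produces false positives, which is why the rest of this tutorial uses a purpose-built detection engine.

```python
import re

# Naive illustration (hypothetical, not production-grade): mask runs of
# 13-16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def naive_mask(text: str) -> str:
    # Replace any matching digit run with a fixed placeholder
    return CARD_PATTERN.sub("[REDACTED CARD]", text)

print(naive_mask("My credit card number is 4916-6734-7572-5015, and the card is getting declined."))
```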

Standard Pattern for Using OpenAI Model APIs

A typical pattern for leveraging GPT is as follows:

  1. Get an API key and set environment variables

  2. Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request

  3. Construct your prompt and decide which endpoint and model are most applicable

  4. Send the request to OpenAI

Let's look at a simple example in Python. We’ll ask a GPT model for an auto-generated response we can send to a customer who is asking our customer support team about an issue with their payment method. Note how easy it is to send sensitive data, in this case, a credit card number, to ChatGPT. DON'T DO THIS :D

import os
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

user_input = "My credit card number is 4916-6734-7572-5015, and the card is getting declined."

# Send prompt to OpenAI model for AI-generated response
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input},
    ],
    max_tokens=1024,
)

print("\nHere's a generated response you can send the customer:\n", completion.choices[0].message.content)

This is a risky practice because now we are sending sensitive customer information to OpenAI. Next, let’s explore how we can prevent this while benefiting from ChatGPT.

Adding Content Filtering to the Pattern

You can update this pattern with Nightfall to check for sensitive findings and ensure sensitive data isn't sent out. Here's how:

Step 1: Set Up Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating a Nightfall API key here. In this example, we’ll use the Nightfall Python SDK.
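For example, the setup might look like the following. The variable names here are the conventional defaults: the OpenAI Python client reads OPENAI_API_KEY, and the Nightfall Python client reads NIGHTFALL_API_KEY; the key values shown are placeholders.

```shell
# Set both API keys as environment variables (placeholder values)
export OPENAI_API_KEY="sk-your-openai-key"
export NIGHTFALL_API_KEY="your-nightfall-key"

# Install the two SDKs used in the examples (uncomment to run):
# pip install openai nightfall
```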

Step 2: Configure Detection

Create a pre-configured detection rule in the Nightfall dashboard or an inline detection rule with the Nightfall API or SDK client.

Consider using Redaction

Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Step 3: Classify, Redact, Filter

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let’s say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

Send Redacted Prompt to OpenAI

Review the response to see if Nightfall has returned sensitive findings:

  • If there are sensitive findings:

    • You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.

    • Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.

  • If no sensitive findings or you chose to redact findings with a redaction config:

    • Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request.

    • Construct your outgoing prompt.

    • If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.

    • Use the OpenAI API or SDK client to send the prompt to the AI model.
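The decision flow above can be sketched as a small guard function. This is a hypothetical helper, not part of either SDK; it assumes the two values returned by the Nightfall SDK's scan_text, where `findings` is a list of findings per payload item and `redacted_payload` is a parallel list of redacted copies (empty when no redaction config was set):

```python
class SensitiveDataError(Exception):
    """Raised when a prompt has sensitive findings and no redacted copy is available."""

def guard_prompt(findings, redacted_payload, original):
    # No sensitive findings: safe to send the original prompt as-is
    if not findings or not findings[0]:
        return original
    # Sensitive findings, but a redaction config produced a redacted copy: send that
    if redacted_payload and redacted_payload[0]:
        return redacted_payload[0]
    # Sensitive findings and no redacted copy: refuse to send the prompt
    raise SensitiveDataError("Prompt contains sensitive findings; not sending to OpenAI")
```

A caller would wrap the send in a try/except and surface an error to the user when SensitiveDataError is raised.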

Python Example

Let's take a look at what this would look like in a Python example using the OpenAI and Nightfall Python SDKs:

import os
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

nightfall = Nightfall() # By default Nightfall will read the NIGHTFALL_API_KEY environment variable

# The message you intend to send
user_input = "My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015."
payload = [user_input]

print("\nHere's the user's question before sanitization:\n", user_input)

# Define an inline detection rule that looks for credit card numbers at Very Likely confidence and masks them
detection_rule = [DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)]

# Send the message to Nightfall to scan it for sensitive data
# Nightfall returns the sensitive findings and a copy of your input payload with sensitive data redacted
findings, redacted_payload = nightfall.scan_text(
    payload,
    detection_rules=detection_rule
)

# If the message has sensitive data, use the redacted version, otherwise use the original message
if redacted_payload[0]:
    user_input_sanitized = redacted_payload[0]
else:
    user_input_sanitized = payload[0]

print("\nHere's the user's question after sanitization:\n", user_input_sanitized)

# Send prompt to OpenAI model for AI-generated response
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input_sanitized},
    ],
    max_tokens=1024,
)

print("\nHere's a generated response you can send the customer:\n", completion.choices[0].message.content)

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?

And here is the redacted message we ultimately sent to OpenAI:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?

Notice that the transaction number was left intact even though its digits match the card number: the surrounding context lowers the detector's confidence below the Very Likely threshold we set, so only the true credit card number was masked. OpenAI sends us the same response either way because it doesn't need to receive sensitive data to generate a cogent response. This means we were able to leverage ChatGPT just as easily, but we didn't risk sending OpenAI any unnecessary sensitive data. Now you are one step closer to leveraging generative AI safely in an enterprise setting.
