OpenAI Prompt Sanitization Tutorial


Protecting Sensitive Information in AI Interactions: The Critical Role of Content Filtering

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

  • Personally Identifiable Information (PII)

  • Protected Health Information (PHI)

  • Financial details (e.g., credit card numbers, bank account information)

  • Intellectual property

Real-world scenarios highlight the urgency of this issue:

  1. Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.

  2. Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.

Steps to Identify and Sanitize ChatGPT Prompts

Let's look at a Python example using OpenAI and Nightfall's Python SDK. You can download this sample code here.

import os
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Example user input with sensitive information
user_input = "My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015."
payload = [user_input]

# 1) Get the Nightfall API key
nightfall = Nightfall() # By default Nightfall will read the NIGHTFALL_API_KEY environment variable

print("\nHere's the user's question before sanitization:\n", user_input)

# 2) Configure Nightfall detection and redaction
detection_rule = [DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)]

# 3) Classify, Redact, Filter Your User Input
# Send the message to Nightfall to scan it for sensitive data
# Nightfall returns the sensitive findings and a copy of your input payload with sensitive data redacted
findings, redacted_payload = nightfall.scan_text(
    payload,
    detection_rules=detection_rule
)

# If the message has sensitive data, use the redacted version; otherwise, use the original message
if redacted_payload[0]:
    user_input_sanitized = redacted_payload[0]
else:
    user_input_sanitized = payload[0]

print("\nHere's the user's question after sanitization:\n", user_input_sanitized)

# 4) Send prompt to OpenAI model for AI-generated response
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input_sanitized}
    ],
    max_tokens=1024
)

print("\nHere's a generated response you can send the customer:\n", completion.choices[0].message.content)

Step 1: Setup Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating an API key here.
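
Both the Nightfall SDK and the OpenAI client in this example read their keys from environment variables. A quick sanity check like the following (a convenience sketch, not part of either SDK) can catch a missing key before any requests are made:

import os

# Fail fast if a key this tutorial relies on is missing from the environment.
for var in ("NIGHTFALL_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(var):
        raise RuntimeError(f"Missing required environment variable: {var}")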

Step 2: Configure Detection

Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.
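
Alternatively, if you prefer to manage the rule in the Nightfall console, you can reference a pre-configured detection rule by its UUID rather than defining it inline. The sketch below assumes the Python SDK's scan_text accepts a detection_rule_uuids argument (verify against the SDK reference); the UUID shown is a placeholder for one copied from your Nightfall account:

# Alternative to the inline rule: reference a pre-configured detection rule by UUID.
# The UUID below is a placeholder; copy the real one from your Nightfall console.
findings, redacted_payload = nightfall.scan_text(
    payload,
    detection_rule_uuids=["00000000-0000-0000-0000-000000000000"],
)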

Step 3: Classify, Redact, Filter Your User Input

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.
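
If you are not using one of the SDKs, you can call the text scan endpoint directly over HTTPS. The sketch below uses the requests library; the endpoint path and field names (payload, policy, detectionRules, and the detector fields) mirror the SDK example above but are assumptions on our part, so confirm the exact request schema against the Nightfall API reference before relying on it.

import os
import requests

# Minimal sketch of a direct call to the Nightfall text scan endpoint.
# Field names below are assumptions -- verify them against the API reference.
response = requests.post(
    "https://api.nightfall.ai/v3/scan",
    headers={
        "Authorization": f"Bearer {os.environ['NIGHTFALL_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "payload": ["The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?"],
        "policy": {
            "detectionRules": [{
                "name": "Credit card rule",
                "logicalOp": "ANY",
                "detectors": [{
                    "detectorType": "NIGHTFALL_DETECTOR",
                    "nightfallDetector": "CREDIT_CARD_NUMBER",
                    "displayName": "Credit Card Number",
                    "minConfidence": "VERY_LIKELY",
                    "minNumFindings": 1,
                }],
            }],
        },
    },
    timeout=30,
)
response.raise_for_status()
# The response is expected to include per-item findings and, when a redaction
# config is supplied, a redacted copy of the payload (names are assumptions).
print(response.json())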

For example, let’s say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.
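
To make the masking behavior concrete, here is a small standalone illustration (not the SDK's internal implementation) of how the MaskConfig used above (masking_char="X", num_chars_to_leave_unmasked=4, mask_right_to_left=True, chars_to_ignore=["-"]) maps the detected card number to the masked string shown above:

def mask_finding(finding, masking_char="X", num_chars_to_leave_unmasked=4, chars_to_ignore=("-",)):
    """Toy right-to-left masking, for illustration only; the SDK may differ in edge cases."""
    unmasked_left = num_chars_to_leave_unmasked
    out = []
    for ch in reversed(finding):
        if ch in chars_to_ignore:
            out.append(ch)            # separators are never masked or counted
        elif unmasked_left > 0:
            out.append(ch)            # keep the rightmost N characters visible
            unmasked_left -= 1
        else:
            out.append(masking_char)  # mask everything else
    return "".join(reversed(out))

print(mask_finding("4916-6734-7572-5015"))  # XXXX-XXXX-XXXX-5015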

Step 4: Send Redacted Prompt to OpenAI

Review the response to see if Nightfall has returned sensitive findings (a short code sketch of this flow follows the list):

  • If there are sensitive findings:

    • You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.

    • Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.

  • If there are no sensitive findings, or if you chose to redact findings with a redaction config:

    • Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request.

    • Construct your outgoing prompt.

    • If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.

    • Use the OpenAI API or SDK client to send the prompt to the AI model.
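
Here is one way that branching can look in code. It reuses findings, redacted_payload, payload, and client from the full example above, and simply rejects the prompt when sensitive data is found but no redacted copy is available; treat it as a sketch of one option rather than the only approach:

# Sketch of the decision flow described above, reusing objects from the full example.
if findings[0] and not redacted_payload[0]:
    # Sensitive data was found but there is no redacted copy to fall back on
    # (for example, no redaction config was supplied): refuse to forward the prompt.
    raise ValueError("Prompt contains sensitive data and was not sent to OpenAI.")

# Either nothing sensitive was found, or we have a redacted copy to use.
prompt = redacted_payload[0] or payload[0]
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ],
    max_tokens=1024,
)
print(completion.choices[0].message.content)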

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?

And the message we ultimately sent to OpenAI was redacted:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?

OpenAI sends us the same response either way because it doesn't need to receive sensitive data to generate a cogent response. This means we were able to leverage ChatGPT just as easily, but we didn't risk sending OpenAI any unnecessary sensitive data. Now, you are one step closer to leveraging generative AI safely in an enterprise setting.
