1 of 4

GenAI Protection

This section consists of various documents that assist you in scanning various popular SaaS GenAI services and frameworks using Nightfall APIs.

OpenAI Prompt Sanitization Tutorial
Anthropic Prompt Sanitization Tutorial
LangChain Prompt Sanitization Tutorial

OpenAI Prompt Sanitization Tutorial

Protecting Sensitive Information in AI Interactions: The Critical Role of Content Filtering

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property

Real-world scenarios highlight the urgency of this issue:

Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.

Steps to Identify and Sanitize ChatGPT Prompts

Let's look at a Python example using OpenAI and Nightfall's Python SDK. You can download this sample code .

Step 1: Setup Nightfall

Step 2: Configure Detection

Step 3: Classify, Redact, Filter Your User Input

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let’s say we send Nightfall the following:

We get back the following redacted text:

Step 4: Send Redacted Prompt to OpenAI

Review the response to see if Nightfall has returned sensitive findings:

If there are sensitive findings:
- You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.
- Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
- Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request.
- Construct your outgoing prompt.
- If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
- Use the OpenAI API or SDK client to send the prompt to the AI model.

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

And the message we ultimately sent was redacted, and that’s what we sent to OpenAI:

OpenAI sends us the same response either way because it doesn’t need to receive sensitive data to generate a cogent response. This means we were able to leverage ChatGPT just as easily but we didn’t risk sending OpenAI any unnecessary sensitive data. Now, you are one step closer to leveraging generative AI safely in an enterprise setting.

Anthropic Prompt Sanitization Tutorial

Scrub Your Claude Chatbot Prompts to Prevent Sensitive Data Disclosure (OWASP LLM06)

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property

Real-world scenarios highlight the urgency of this issue:

Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Standard Pattern for Using Anthropic Claude APIs

A typical pattern for leveraging Claude is as follows:

Get an API key and set environment variables
Initialize the Anthropic SDK client (e.g. Anthropic Python client), or use the API directly to construct a request
Construct your prompt and decide which endpoint and model is most applicable.
Send the request to Anthropic

Let's look at a simple example in Python. We’ll ask a Claude model for an auto-generated response we can send to a customer who is asking our customer support team about an issue with their payment method. Note how easy it is to send sensitive data, in this case, a credit card number, to Claude.

import os
from anthropic import Anthropic

# Initialize the Anthropic client with your API key
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

# The user input you intend to send. Notice the credit card number in the message. Don't do this!!
user_input = "My credit card number is 4916-6734-7572-5015, and the card is getting declined."
  
# Define your prompt, ensuring it starts with "\n\nHuman:" and ending with "\n\nAssistant:"
prompt = "\nYou are a level 1 support bot. Your role is to assist users with common issues and provide helpful information. \n\nHuman: " + user_input + "\n\nAssistant:"

response = client.completions.create(
    model="claude-2.1",
    prompt=prompt,
    max_tokens_to_sample=1024,
    temperature=0.7,
    top_p=1.0
)

print("\nHere's a generated response you can send the customer:\n", response.completion)

This is a risky practice because now we are sending sensitive customer information to Anthropic. Next, let’s explore how we can prevent this while still benefitting from Claude.

Adding Content Filtering to the Pattern

Updating this pattern by using Nightfall is straightforward to check for sensitive findings and ensure sensitive data isn’t sent out. Here’s how:

Step 1: Setup Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating a Nightfall API key here. In this example, we’ll use the Nightfall Python SDK.

Step 2: Configure Detection

Create a pre-configured detection rule in the Nightfall dashboard or an inline detection rule with the Nightfall API or SDK client.

Consider using Redaction

Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Step 3: Classify, Redact, Filter

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let’s say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

Send Redacted Prompt to Anthropic

Review the response to see if Nightfall has returned sensitive findings:

If there are sensitive findings:
- You can specify a redaction config in your request so that sensitive findings are redacted automatically.
- Without a redaction config, you can break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
- Initialize the Anthropic SDK client (e.g., Anthropic Python client), or use the API directly to construct a request.
- Construct your outgoing prompt.
- If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
- Use the Anthropic API or SDK client to send the prompt to the AI model.

Python Example

Let's look at a Python example using Anthropic Claude and Nightfall's Python SDK. You can download this sample code here.

import os
from dotenv import load_dotenv
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from anthropic import Anthropic

# Load environment variables
load_dotenv()

# Initialize clients
try:
    # By default Nightfall will read the NIGHTFALL_API_KEY environment variable
    nightfall = Nightfall()  

    # Initialize the Anthropic client with your API key
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

except Exception as e:
    print(f"Error initializing clients: {e}")
    exit(1)

# Example user input with sensitive information
user_input = "The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?"
payload = [user_input]

print("\nHere's the user's question before sanitization:\n", user_input)

# 2) Configure Nightfall detection and redaction
detection_rule = [DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)]

try:
    # 3) Classify, Redact, Filter Your User Input
    findings, redacted_payload = nightfall.scan_text(
        payload,
        detection_rules=detection_rule
    )

    # If the message has sensitive data, use the redacted version, otherwise use the original message
    user_input_sanitized = redacted_payload[0] if redacted_payload[0] else payload[0]

    print("\nHere's the user's question after sanitization:\n", user_input_sanitized)

    # Define your prompt, ensuring it starts with "\n\nHuman:" and ending with "\n\nAssistant:"
    prompt = "\nYou are a level 1 support bot. Your role is to assist users with common issues and provide helpful information. \n\nHuman: " + user_input_sanitized + "\n\nAssistant:"

    # 4) Send prompt to Anthropic model for AI-generated response
    response = client.completions.create(
        model="claude-2.1",
        prompt=prompt,
        max_tokens_to_sample=1024,
        temperature=0.7,
        top_p=1.0
    )

    print("\nHere's a generated response you can send the customer:\n", response.completion)

except Exception as e:
    print(f"An error occurred: {e}")

Step 1: Setup Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating an API key here.

Step 2: Configure Detection

Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Step 3: Classify, Redact, Filter Your User Input

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let’s say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

Step 4: Send Redacted Prompt to Anthropic

Review the response to see if Nightfall has returned sensitive findings:

If there are sensitive findings:
- You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.
- Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
- Construct your outgoing prompt.
- If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
- Use the Claude API or SDK client to send the prompt to the AI model.

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

The customer said: '4916-6734-7572-5015 is my credit card number and the card is getting declined.' How should I respond to the customer?

And the message we ultimately sent was redacted, and that’s what we sent to Anthropic:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

Anthropic sends us the same response either way because it doesn’t need to receive sensitive data to generate a cogent response. This means we were able to leverage Claude just as easily but we didn’t risk sending Anthropic any unnecessary sensitive data. Now, you are one step closer to leveraging generative AI safely in an enterprise setting.

LangChain Prompt Sanitization Tutorial

LangChain Tutorial: Integrating Nightfall for Secure Prompt Sanitization

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property

Real-world scenarios highlight the urgency of this issue:

Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Python Example

Let's examine this in a Python example using the LangChain, Anthropic, and Nightfall Python SDKs. You can download this sample code here.


import os
from dotenv import load_dotenv
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from typing import Dict, List
from langchain.chains.base import Chain
from langchain.schema.language_model import BaseLanguageModel
from langchain.schema.prompt_template import BasePromptTemplate
from langchain.prompts import PromptTemplate
from langchain_anthropic import ChatAnthropic
from langchain.schema.runnable import RunnableSequence, RunnablePassthrough
from pydantic import Field

# Load environment variables
load_dotenv()

# 1) Setup Nightfall
# By default Nightfall will read the NIGHTFALL_API_KEY environment variable
nightfall = Nightfall()

# 2) Define a Nightfall detection rule
detection_rule = [DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)]

# 3) Classify, Redact, Filter Your User Input

# Setup Nightfall Chain element
class NightfallSanitizationChain(Chain):
    input_key: str = "input"
    output_key: str = "sanitized_input"

    @property
    def input_keys(self) -> List[str]:
        return [self.input_key]

    @property
    def output_keys(self) -> List[str]:
        return [self.output_key]

    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
        text = inputs[self.input_key]
        payload = [text]
        try:
            findings, redacted_payload = nightfall.scan_text(
                payload,
                detection_rules=detection_rule
            )
            sanitized_text = redacted_payload[0] if redacted_payload[0] else text
            print(f"\nsanitized input:\n {sanitized_text}")
        except Exception as e:
            print(f"Error in sanitizing input: {e}")
            sanitized_text = text
        return {self.output_key: sanitized_text}

# Initialize the Anthropic LLM
llm = ChatAnthropic(model="claude-2.1")

# Create a prompt template
template = "The customer said: '{customer_input}' How should I respond to the customer?"
prompt = PromptTemplate(template=template, input_variables=["customer_input"])

# Create the sanitization chain
sanitization_chain = NightfallSanitizationChain()

# Create the full chain using RunnableSequence
full_chain = (
    RunnablePassthrough() |
    sanitization_chain |
    (lambda x: {"customer_input": x["sanitized_input"]}) |
    prompt |
    llm
)

# Use the combined chain
customer_input = "My credit card number is 4916-6734-7572-5015, and the card is getting declined."
print(f"\ncustomer input:\n {customer_input}")
try:
    response = full_chain.invoke({"input": customer_input})
    print("\model reponse:\n", response.content)
except Exception as e:
    print("An error occurred:", e)

Step 1: Setup Nightfall

If you don't yet have a Nightfall account, sign up here.
Create a Nightfall key. Here are the instructions.

Install the necessary packages using the command line:

pip install langchain anthropic nightfall python-dotenv

Set up environment variables. Create a .env file in your project directory:

ANTHROPIC_API_KEY=your_anthropic_api_key
NIGHTFALL_API_KEY=your_nightfall_api_key

Step 2: Configure Detection

Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Step 3: Classify, Redact, Filter Your User Input

to integrate content filtering into our LangChain pipeline seamlessly. We'll create a custom LangChain component for Nightfall sanitization. This allows us to seamlessly integrate content filtering into our LangChain pipeline.

Explanation

We start by importing necessary modules and loading environment variables.
We initialize the Nightfall client and define detection rules for credit card numbers.
The NightfallSanitizationChain class is a custom LangChain component that handles content sanitization using Nightfall.
We set up the Anthropic LLM and create a prompt template for customer service responses.
We create separate chains for sanitization and response generation, then combine them using SimpleSequentialChain.
The process_customer_input function provides an easy-to-use interface for our chain.

Error Handling and Logging

In a production environment, you might want to add more robust error handling and logging. For example:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def sanitize_input(text):
    payload = [text]
    try:
        findings, redacted_payload = nightfall.scan_text(
            payload,
            detection_rules=[detection_rule]
        )
        if findings:
            logger.info(f"Sensitive information detected and redacted")
        return redacted_payload[0] if redacted_payload[0] else text
    except Exception as e:
        logger.error(f"Error in sanitizing input: {e}")
        # Depending on your use case, you might want to return the original text or an error message
        return text

Usage

To use this script, you can either run it directly or import the process_customer_input function in another script.

Running the Script Directly

Simply run the script:

python secure_langchain.py

This will process the example customer input and print the sanitized input and final response.

Using in Another Script

You can import the process_customer_input function in another script:

from secure_langchain import process_customer_input

customer_input = "My credit card 4916-6734-7572-5015 isn't working. Contact me at alice@example.com."
response = process_customer_input(customer_input)
print(response)

Expected Output

What does success look like?

If the example runs properly, you should expect to see an output demonstrating the sanitization process and the final response from Claude. Here's what the output might look like:

> Entering new SimpleSequentialChain chain...

> Finished chain.

Sanitized input: The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015, and the card is getting declined.' How should I respond to the customer?

Final Response: I understand you're having trouble with your credit card (XXXX-XXXX-XXXX-5015) being declined. I apologize for the inconvenience. To assist you better, I'll need some additional information...

Anthropic Prompt Sanitization Tutorial

Scrub Your Claude Chatbot Prompts to Prevent Sensitive Data Disclosure (OWASP LLM06)

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property

Real-world scenarios highlight the urgency of this issue:

Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Standard Pattern for Using Anthropic Claude APIs

A typical pattern for leveraging Claude is as follows:

Get an API key and set environment variables
Initialize the Anthropic SDK client (e.g. Anthropic Python client), or use the API directly to construct a request
Construct your prompt and decide which endpoint and model is most applicable.
Send the request to Anthropic

import os
from anthropic import Anthropic

# Initialize the Anthropic client with your API key
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

# The user input you intend to send. Notice the credit card number in the message. Don't do this!!
user_input = "My credit card number is 4916-6734-7572-5015, and the card is getting declined."
  
# Define your prompt, ensuring it starts with "\n\nHuman:" and ending with "\n\nAssistant:"
prompt = "\nYou are a level 1 support bot. Your role is to assist users with common issues and provide helpful information. \n\nHuman: " + user_input + "\n\nAssistant:"

response = client.completions.create(
    model="claude-2.1",
    prompt=prompt,
    max_tokens_to_sample=1024,
    temperature=0.7,
    top_p=1.0
)

print("\nHere's a generated response you can send the customer:\n", response.completion)

This is a risky practice because now we are sending sensitive customer information to Anthropic. Next, let’s explore how we can prevent this while still benefitting from Claude.

Adding Content Filtering to the Pattern

Updating this pattern by using Nightfall is straightforward to check for sensitive findings and ensure sensitive data isn’t sent out. Here’s how:

Step 1: Setup Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating a Nightfall API key here. In this example, we’ll use the Nightfall Python SDK.

Step 2: Configure Detection

Create a pre-configured detection rule in the Nightfall dashboard or an inline detection rule with the Nightfall API or SDK client.

Consider using Redaction

Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Step 3: Classify, Redact, Filter

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let’s say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

Send Redacted Prompt to Anthropic

Review the response to see if Nightfall has returned sensitive findings:

If there are sensitive findings:
- You can specify a redaction config in your request so that sensitive findings are redacted automatically.
- Without a redaction config, you can break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
- Initialize the Anthropic SDK client (e.g., Anthropic Python client), or use the API directly to construct a request.
- Construct your outgoing prompt.
- If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
- Use the Anthropic API or SDK client to send the prompt to the AI model.

Python Example

Let's look at a Python example using Anthropic Claude and Nightfall's Python SDK. You can download this sample code here.

import os
from dotenv import load_dotenv
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from anthropic import Anthropic

# Load environment variables
load_dotenv()

# Initialize clients
try:
    # By default Nightfall will read the NIGHTFALL_API_KEY environment variable
    nightfall = Nightfall()  

    # Initialize the Anthropic client with your API key
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

except Exception as e:
    print(f"Error initializing clients: {e}")
    exit(1)

# Example user input with sensitive information
user_input = "The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?"
payload = [user_input]

print("\nHere's the user's question before sanitization:\n", user_input)

# 2) Configure Nightfall detection and redaction
detection_rule = [DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)]

try:
    # 3) Classify, Redact, Filter Your User Input
    findings, redacted_payload = nightfall.scan_text(
        payload,
        detection_rules=detection_rule
    )

    # If the message has sensitive data, use the redacted version, otherwise use the original message
    user_input_sanitized = redacted_payload[0] if redacted_payload[0] else payload[0]

    print("\nHere's the user's question after sanitization:\n", user_input_sanitized)

    # Define your prompt, ensuring it starts with "\n\nHuman:" and ending with "\n\nAssistant:"
    prompt = "\nYou are a level 1 support bot. Your role is to assist users with common issues and provide helpful information. \n\nHuman: " + user_input_sanitized + "\n\nAssistant:"

    # 4) Send prompt to Anthropic model for AI-generated response
    response = client.completions.create(
        model="claude-2.1",
        prompt=prompt,
        max_tokens_to_sample=1024,
        temperature=0.7,
        top_p=1.0
    )

    print("\nHere's a generated response you can send the customer:\n", response.completion)

except Exception as e:
    print(f"An error occurred: {e}")

Step 1: Setup Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating an API key here.

Step 2: Configure Detection

Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Step 3: Classify, Redact, Filter Your User Input

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let’s say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

Step 4: Send Redacted Prompt to Anthropic

Review the response to see if Nightfall has returned sensitive findings:

If there are sensitive findings:
- You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.
- Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
- Construct your outgoing prompt.
- If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
- Use the Claude API or SDK client to send the prompt to the AI model.

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

The customer said: '4916-6734-7572-5015 is my credit card number and the card is getting declined.' How should I respond to the customer?

And the message we ultimately sent was redacted, and that’s what we sent to Anthropic:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

LangChain Prompt Sanitization Tutorial

LangChain Tutorial: Integrating Nightfall for Secure Prompt Sanitization

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property

Real-world scenarios highlight the urgency of this issue:

Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Python Example

Let's examine this in a Python example using the LangChain, Anthropic, and Nightfall Python SDKs. You can download this sample code here.


import os
from dotenv import load_dotenv
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from typing import Dict, List
from langchain.chains.base import Chain
from langchain.schema.language_model import BaseLanguageModel
from langchain.schema.prompt_template import BasePromptTemplate
from langchain.prompts import PromptTemplate
from langchain_anthropic import ChatAnthropic
from langchain.schema.runnable import RunnableSequence, RunnablePassthrough
from pydantic import Field

# Load environment variables
load_dotenv()

# 1) Setup Nightfall
# By default Nightfall will read the NIGHTFALL_API_KEY environment variable
nightfall = Nightfall()

# 2) Define a Nightfall detection rule
detection_rule = [DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)]

# 3) Classify, Redact, Filter Your User Input

# Setup Nightfall Chain element
class NightfallSanitizationChain(Chain):
    input_key: str = "input"
    output_key: str = "sanitized_input"

    @property
    def input_keys(self) -> List[str]:
        return [self.input_key]

    @property
    def output_keys(self) -> List[str]:
        return [self.output_key]

    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
        text = inputs[self.input_key]
        payload = [text]
        try:
            findings, redacted_payload = nightfall.scan_text(
                payload,
                detection_rules=detection_rule
            )
            sanitized_text = redacted_payload[0] if redacted_payload[0] else text
            print(f"\nsanitized input:\n {sanitized_text}")
        except Exception as e:
            print(f"Error in sanitizing input: {e}")
            sanitized_text = text
        return {self.output_key: sanitized_text}

# Initialize the Anthropic LLM
llm = ChatAnthropic(model="claude-2.1")

# Create a prompt template
template = "The customer said: '{customer_input}' How should I respond to the customer?"
prompt = PromptTemplate(template=template, input_variables=["customer_input"])

# Create the sanitization chain
sanitization_chain = NightfallSanitizationChain()

# Create the full chain using RunnableSequence
full_chain = (
    RunnablePassthrough() |
    sanitization_chain |
    (lambda x: {"customer_input": x["sanitized_input"]}) |
    prompt |
    llm
)

# Use the combined chain
customer_input = "My credit card number is 4916-6734-7572-5015, and the card is getting declined."
print(f"\ncustomer input:\n {customer_input}")
try:
    response = full_chain.invoke({"input": customer_input})
    print("\model reponse:\n", response.content)
except Exception as e:
    print("An error occurred:", e)

Step 1: Setup Nightfall

If you don't yet have a Nightfall account, sign up here.
Create a Nightfall key. Here are the instructions.

Install the necessary packages using the command line:

pip install langchain anthropic nightfall python-dotenv

Set up environment variables. Create a .env file in your project directory:

ANTHROPIC_API_KEY=your_anthropic_api_key
NIGHTFALL_API_KEY=your_nightfall_api_key

Step 2: Configure Detection

Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Step 3: Classify, Redact, Filter Your User Input

Explanation

We start by importing necessary modules and loading environment variables.
We initialize the Nightfall client and define detection rules for credit card numbers.
The NightfallSanitizationChain class is a custom LangChain component that handles content sanitization using Nightfall.
We set up the Anthropic LLM and create a prompt template for customer service responses.
We create separate chains for sanitization and response generation, then combine them using SimpleSequentialChain.
The process_customer_input function provides an easy-to-use interface for our chain.

Error Handling and Logging

In a production environment, you might want to add more robust error handling and logging. For example:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def sanitize_input(text):
    payload = [text]
    try:
        findings, redacted_payload = nightfall.scan_text(
            payload,
            detection_rules=[detection_rule]
        )
        if findings:
            logger.info(f"Sensitive information detected and redacted")
        return redacted_payload[0] if redacted_payload[0] else text
    except Exception as e:
        logger.error(f"Error in sanitizing input: {e}")
        # Depending on your use case, you might want to return the original text or an error message
        return text

Usage

To use this script, you can either run it directly or import the process_customer_input function in another script.

Running the Script Directly

Simply run the script:

python secure_langchain.py

This will process the example customer input and print the sanitized input and final response.

Using in Another Script

You can import the process_customer_input function in another script:

from secure_langchain import process_customer_input

customer_input = "My credit card 4916-6734-7572-5015 isn't working. Contact me at alice@example.com."
response = process_customer_input(customer_input)
print(response)

Expected Output

What does success look like?

If the example runs properly, you should expect to see an output demonstrating the sanitization process and the final response from Claude. Here's what the output might look like:

> Entering new SimpleSequentialChain chain...

> Finished chain.

Sanitized input: The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015, and the card is getting declined.' How should I respond to the customer?

Final Response: I understand you're having trouble with your credit card (XXXX-XXXX-XXXX-5015) being declined. I apologize for the inconvenience. To assist you better, I'll need some additional information...

OpenAI Prompt Sanitization Tutorial

Protecting Sensitive Information in AI Interactions: The Critical Role of Content Filtering

Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property

Real-world scenarios highlight the urgency of this issue:

Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Steps to Identify and Sanitize ChatGPT Prompts

Let's look at a Python example using OpenAI and Nightfall's Python SDK. You can download this sample code .

Step 1: Setup Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating an API key .

Step 2: Configure Detection

Create an with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction .

Step 3: Classify, Redact, Filter Your User Input

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let’s say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

Step 4: Send Redacted Prompt to OpenAI

Review the response to see if Nightfall has returned sensitive findings:

If there are sensitive findings:
- You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.
- Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
- Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request.
- Construct your outgoing prompt.
- If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
- Use the OpenAI API or SDK client to send the prompt to the AI model.

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?

And the message we ultimately sent was redacted, and that’s what we sent to OpenAI:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?