GenAI Content Filtering: How to Prevent Exposure of Sensitive Data
LangChain/OpenAI Tutorial: Integrating Nightfall for Secure Prompt Sanitization
LLMs like ChatGPT and Claude can inadvertently receive sensitive information from user inputs, posing significant privacy concerns (OWASP LLM06). Without content filtering, these AI platforms can process and retain confidential data such as health records, financial details, and personally identifiable information (PII).
Consider the following real-world scenarios:
Support Chatbots: You use LangChain/Claude to power a level-1 support chatbot to help users resolve issues. Users will likely overshare sensitive information like credit card and Social Security numbers. Without content filtering, this information would be transmitted to Anthropic and added to your support ticketing system.
Healthcare Apps: You are using LangChain/Claude to moderate content sent by patients or doctors in the health app you are developing. These queries may contain protected health information (PHI), which could be unnecessarily transmitted to Anthropic.
Implementing robust content filtering mechanisms is crucial to protect sensitive data and comply with data protection regulations. In this guide, we will explore how to sanitize prompts using Nightfall before sending them to Claude.
LangChain/Claude Example
If you're not using LangChain, check our OpenAI and Claude tutorials.
Let's take a look at what this would look like in a Python example using the LangChain, Anthropic, and Nightfall Python SDKs:
Implementing Nightfall Sanitization as a LangChain Component
To integrate content filtering into our LangChain pipeline seamlessly, we'll create a custom LangChain component for Nightfall sanitization.
from typing import Dict, List

from dotenv import load_dotenv
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.chains.base import Chain
from langchain.llms import Anthropic
from langchain.prompts import PromptTemplate
from nightfall import (
    Confidence,
    DetectionRule,
    Detector,
    MaskConfig,
    Nightfall,
    RedactionConfig,
)

# Load environment variables (NIGHTFALL_API_KEY, ANTHROPIC_API_KEY) from .env
load_dotenv()

# Initialize Nightfall client; it reads NIGHTFALL_API_KEY from the environment
nightfall = Nightfall()

# Define Nightfall detection rule: mask credit card numbers, leaving 4 characters visible
detection_rule = DetectionRule(
    [
        Detector(
            min_confidence=Confidence.VERY_LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                mask_config=MaskConfig(
                    masking_char="X",
                    num_chars_to_leave_unmasked=4,
                    mask_right_to_left=True,
                    chars_to_ignore=["-"],
                ),
            ),
        )
    ]
)


class NightfallSanitizationChain(Chain):
    """Custom chain that redacts sensitive data before it reaches the LLM."""

    input_key: str = "input"
    output_key: str = "sanitized_input"

    @property
    def input_keys(self) -> List[str]:
        return [self.input_key]

    @property
    def output_keys(self) -> List[str]:
        return [self.output_key]

    def _call(self, inputs: Dict[str, str], run_manager=None) -> Dict[str, str]:
        text = inputs[self.input_key]
        payload = [text]
        try:
            # scan_text takes a list of strings and a list of detection rules
            findings, redacted_payload = nightfall.scan_text(
                payload,
                detection_rules=[detection_rule],
            )
            # An empty redacted entry means nothing was redacted for that input
            sanitized_text = redacted_payload[0] if redacted_payload[0] else text
        except Exception as e:
            # Note: this falls back to the unsanitized text if the scan fails
            print(f"Error in sanitizing input: {e}")
            sanitized_text = text
        return {self.output_key: sanitized_text}


# Initialize the Anthropic LLM
llm = Anthropic(model="claude-v1")

# Create a prompt template
template = "The customer said: '{customer_input}' How should I respond to the customer?"
prompt = PromptTemplate(template=template, input_variables=["customer_input"])

# Create chains
sanitization_chain = NightfallSanitizationChain()
response_chain = LLMChain(llm=llm, prompt=prompt)

# Combine chains: the sanitized output feeds the response chain
full_chain = SimpleSequentialChain(
    chains=[sanitization_chain, response_chain],
    verbose=True,
)


def process_customer_input(customer_input: str) -> str:
    """Sanitize the input with Nightfall, then ask Claude for a response."""
    return full_chain.run(customer_input)


# Use the combined chain
if __name__ == "__main__":
    customer_input = "My credit card number is 4916-6734-7572-5015, and the card is getting declined."
    response = process_customer_input(customer_input)
    print("\nFinal Response:", response)
Explanation
We start by importing necessary modules and loading environment variables.
We initialize the Nightfall client and define detection rules for credit card numbers.
The NightfallSanitizationChain class is a custom LangChain component that handles content sanitization using Nightfall.
We set up the Anthropic LLM and create a prompt template for customer service responses.
We create separate chains for sanitization and response generation, then combine them using SimpleSequentialChain.
The process_customer_input function provides an easy-to-use interface for our chain.
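The flow described in these steps can be illustrated without any network calls. The sketch below is a stand-in, not the Nightfall or LangChain API: `fake_sanitize`, `fake_respond`, and `run_pipeline` are hypothetical names, and the regex is only a rough approximation of a card-number pattern. Its purpose is to show that the response step only ever sees the output of the sanitization step.

```python
import re
from typing import List

# Rough 16-digit card pattern (four groups of four, optional "-" or space).
# Illustrative only; a real detector such as Nightfall's is far more thorough.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def fake_sanitize(text: str) -> str:
    """Stand-in for the sanitization chain: mask all but the last four digits."""

    def mask(match: re.Match) -> str:
        total_digits = sum(ch.isdigit() for ch in match.group())
        digits_seen = 0
        out = []
        for ch in match.group():
            if ch.isdigit():
                digits_seen += 1
                # Keep only the final four digits visible.
                out.append(ch if digits_seen > total_digits - 4 else "X")
            else:
                out.append(ch)  # keep separators such as "-"
        return "".join(out)

    return CARD_PATTERN.sub(mask, text)


def fake_respond(prompt: str, seen: List[str]) -> str:
    """Stand-in for the LLM chain: records the prompt it was given."""
    seen.append(prompt)
    return "I'm sorry your card was declined."


def run_pipeline(text: str, seen: List[str]) -> str:
    """Run the steps in order: each step's output feeds the next."""
    return fake_respond(fake_sanitize(text), seen)


seen_by_llm: List[str] = []
reply = run_pipeline("My card is 4916-6734-7572-5015.", seen_by_llm)
print(seen_by_llm[0])  # My card is XXXX-XXXX-XXXX-5015.
```

A stub like this can also serve as a test double when unit-testing the surrounding application code, so tests do not depend on live API keys.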
Error Handling and Logging
In a production environment, you might want to add more robust error handling and logging. For example:
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def sanitize_input(text):
    payload = [text]
    try:
        findings, redacted_payload = nightfall.scan_text(
            payload,
            detection_rules=[detection_rule]
        )
        # findings holds one list per input string, so check the first entry
        if findings and findings[0]:
            logger.info("Sensitive information detected and redacted")
        return redacted_payload[0] if redacted_payload[0] else text
    except Exception as e:
        logger.error(f"Error in sanitizing input: {e}")
        # Depending on your use case, you might want to return the original text or an error message
        return text
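Returning the original text when the scan fails means unsanitized input can still reach the LLM. If that is not acceptable for your use case, one option is to fail closed instead. The sketch below assumes a hypothetical `scan` callable that returns the same `(findings, redacted_payload)` tuple the tutorial unpacks from `nightfall.scan_text`; `SanitizationError` and `sanitize_or_block` are illustrative names, not part of the Nightfall SDK.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger(__name__)

# Same shape the tutorial unpacks from nightfall.scan_text:
# (findings per input, redacted text per input).
ScanFn = Callable[[List[str]], Tuple[List[list], List[str]]]


class SanitizationError(RuntimeError):
    """Raised when input could not be scanned, so it must not be forwarded."""


def sanitize_or_block(text: str, scan: ScanFn) -> str:
    """Return sanitized text, or raise instead of passing raw text through."""
    try:
        findings, redacted_payload = scan([text])
    except Exception as exc:
        # Log the failure but never the raw text, which may be sensitive.
        logger.error("Sanitization failed: %s", exc)
        raise SanitizationError("input could not be scanned") from exc
    if findings and findings[0]:
        logger.info("Sensitive information detected and redacted")
    # An empty redacted entry is treated as "nothing was redacted".
    return redacted_payload[0] if redacted_payload[0] else text
```

With the tutorial's objects, the `scan` argument could be a small wrapper such as `lambda payload: nightfall.scan_text(payload, detection_rules=[detection_rule])`. The caller then decides how to handle `SanitizationError`, for example by asking the user to retry rather than forwarding the message.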
Usage
To use this script, you can either run it directly or import the process_customer_input function in another script.
Running the Script Directly
Simply run the script:
python secure_langchain.py
This will process the example customer input and print the sanitized input and final response.
Using in Another Script
You can import the process_customer_input function in another script:
from secure_langchain import process_customer_input

customer_input = "My credit card 4111-1111-1111-1111 isn't working. Contact me at alice@example.com."
response = process_customer_input(customer_input)
print(response)
Expected Output
What does success look like?
If the example runs properly, you should expect to see an output demonstrating the sanitization process and the final response from Claude. Here's what the output might look like:
> Entering new SimpleSequentialChain chain...
> Finished chain.
Sanitized input: The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015, and the card is getting declined.' How should I respond to the customer?
Final Response: I understand you're having trouble with your credit card being declined. I apologize for the inconvenience. To assist you better, I'll need some additional information...
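Beyond reading the output by eye, one way to check the result programmatically is to assert that no valid card number survives in the sanitized text. The sketch below is an independent sanity check, not part of Nightfall: `luhn_valid` and `contains_card_number` are hypothetical helpers, and the regex only covers runs of 13 to 19 digits separated by optional spaces or hyphens.

```python
import re

# Candidate runs of 13-19 digits, allowing a single space or hyphen between digits.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """Return True if any digit run in the text passes the Luhn check."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False
```

In a test suite, a line such as `assert not contains_card_number(sanitized_text)` would flag a regression if a raw card number ever passed through the sanitization step. It only covers card numbers, so other data types would need their own checks.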