Redacting Sensitive Data in 4 Lines of Code


Last updated 10 months ago


In this tutorial, we'll show how easy it is to redact sensitive data with Nightfall. We'll take a closer look at several redaction techniques, explain how Nightfall works, and touch on the use cases for each technique.

Before we get started, let's set our Nightfall API key as an environment variable and install the dependencies for our Python code samples. If you don't have a Nightfall API key, generate one on your Nightfall Dashboard. If you don't have a Nightfall account, sign up for a free Nightfall Developer Platform account.
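The SDK examples in this tutorial construct Nightfall() without arguments, which reads the key from the NIGHTFALL_API_KEY environment variable. A minimal sketch of a pre-flight check using only the standard library follows; the helper name api_key_is_set is hypothetical and not part of the SDK.

```python
import os


def api_key_is_set(env=os.environ):
    """Return True if NIGHTFALL_API_KEY is present and non-empty.

    Hypothetical helper for illustration; it is not part of the Nightfall SDK.
    """
    return bool(env.get("NIGHTFALL_API_KEY", "").strip())


if __name__ == "__main__":
    if not api_key_is_set():
        print("Set NIGHTFALL_API_KEY before running the examples.")
```

Running a check like this first gives a clearer message than a failure deep inside the first scan call.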

Masking

Masking replaces the characters of a finding with a configurable masking character. You can leave some characters unmasked and exclude certain characters (such as hyphens) from masking.

Examples

| Case | Additional config | Before | After |
| --- | --- | --- | --- |
| Default | None | my ssn is 518-45-7708 | my ssn is *********** |
| Mask with custom character | masking_char="X" | my ssn is 518-45-7708 | my ssn is XXXXXXXXXXX |
| Leave first four characters unmasked | num_chars_to_leave_unmasked=4 | my ssn is 518-45-7708 | my ssn is 518-******* |
| Leave last four characters unmasked | num_chars_to_leave_unmasked=4, mask_right_to_left=True | my ssn is 518-45-7708 | my ssn is *******7708 |
| Don't mask - characters | chars_to_ignore=["-"] | my ssn is 518-45-7708 | my ssn is ***-**-**** |
| All of the above! | masking_char="X", num_chars_to_leave_unmasked=4, mask_right_to_left=True, chars_to_ignore=["-"] | my ssn is 518-45-7708 | my ssn is XXX-XX-7708 |
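To make the options concrete, here is a local sketch that reproduces the example outputs above in plain Python. It is an illustration of the semantics, not Nightfall's implementation: the function name mask_finding is hypothetical, and it assumes that ignored characters do not count toward the unmasked quota (the examples above do not settle that point).

```python
def mask_finding(finding, masking_char="*", num_chars_to_leave_unmasked=0,
                 mask_right_to_left=False, chars_to_ignore=()):
    """Mask a finding string (hypothetical local sketch, not the Nightfall service).

    When mask_right_to_left is True, the unmasked characters are counted
    from the right end, matching the example outputs in this tutorial.
    """
    chars = list(finding)
    if mask_right_to_left:
        order = range(len(chars) - 1, -1, -1)
    else:
        order = range(len(chars))

    remaining_unmasked = num_chars_to_leave_unmasked
    for i in order:
        if chars[i] in chars_to_ignore:
            continue  # assumption: ignored characters are skipped entirely
        if remaining_unmasked > 0:
            remaining_unmasked -= 1  # leave this character visible
            continue
        chars[i] = masking_char
    return "".join(chars)


print(mask_finding("518-45-7708", masking_char="X", num_chars_to_leave_unmasked=4,
                   mask_right_to_left=True, chars_to_ignore=["-"]))
```

Only the finding itself is passed in, which is why the surrounding text ("my ssn is") stays untouched in the examples.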

Let's put this together in Python with the Nightfall SDK. In our example, the input string contains a credit card number (4916-6734-7572-5015 is my credit card number). We want to mask it with the character X, leave the last 4 digits unmasked, and ignore hyphens.

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

nightfall = Nightfall()  # reads API key from NIGHTFALL_API_KEY environment variable by default
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule(
        [Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                mask_config=MaskConfig(
                    masking_char="X",
                    num_chars_to_leave_unmasked=4,
                    mask_right_to_left=True,
                    chars_to_ignore=["-"]))
        )]
    )]
)
print(findings)
print(redacted_payload)

We'll see our findings look like this (with line formatting added for clarity):

[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='XXXX-XXXX-XXXX-5015', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]

Also, we have received the input payload back as a redacted string in our redacted_payload object:

['XXXX-XXXX-XXXX-5015 is my credit card number']

When to use Masking

Masking is especially useful in scenarios where you want to retain some of the original format of the data or a certain amount of non-sensitive information as context. For example, it's common to refer to credit card numbers by their last 4 digits, so masking everything but the last 4 digits would ensure that the output is still useful to the viewer.

Substitution

Substitute sensitive data findings with the InfoType, custom word, or an empty string.

Examples

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, Nightfall

nightfall = Nightfall()
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                substitution_phrase="SubMeIn")
        )]
    )]
)
print(findings)
print(redacted_payload)

Our findings object looks like this (with line formatting added for clarity):

[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='SubMeIn', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]

And our redacted input payload in our redacted_payload object:

['SubMeIn is my credit card number']

Instead of using a custom string as the substitution (SubMeIn), we may want to use the name of the detector for additional context. We can make a one-line change to the example above, replacing substitution_phrase="SubMeIn" with infotype_substitution=True.

This yields:

['[CREDIT_CARD_NUMBER] is my credit card number']
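As a rough illustration of how a finding relates to the redacted payload, the sketch below splices a replacement into the original string at the finding's codepoint_range. It assumes the range is start-inclusive and end-exclusive, which is consistent with the sample output (start=0, end=19 for a 19-character finding). The function name splice_redaction is hypothetical; the SDK already returns the redacted payload for you.

```python
def splice_redaction(text, start, end, replacement):
    """Replace text[start:end] (end exclusive) with replacement.

    Hypothetical helper; indexes by code point, as Python strings do,
    so it pairs with codepoint_range rather than byte_range.
    """
    return text[:start] + replacement + text[end:]


payload = "4916-6734-7572-5015 is my credit card number"
print(splice_redaction(payload, 0, 19, "SubMeIn"))
print(splice_redaction(payload, 0, 19, "[CREDIT_CARD_NUMBER]"))
```

A helper like this is only needed if you store findings and apply redactions later; for non-ASCII text, byte offsets and code point offsets differ, so use the range that matches how you index the data.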

When to use Substitution

Substitution is effective in scenarios where you intend to replace sensitive data with a contextual label, for example, you wish to replace a literal credit card number with the label "Credit Card Number". This provides context to the reader of the data that the data is a credit card number, without exposing them to the actual token itself.

Encryption

Encrypt sensitive data findings with a public encryption key that is passed via the API. Make the encryption algorithm configurable.

Encryption is a complex topic so we'll go into a more in-depth tutorial on encrypting and decrypting sensitive data with Nightfall in a separate post, but let's run through the basics below.

Nightfall uses RSA encryption which is asymmetric, meaning it works with two different keys: a public one and a private one. Anyone with your public key can encrypt data. Encrypted data can only be decrypted with the private key. So, you'll pass Nightfall your public key to encrypt with, and only you will have your private key to decrypt the encrypted data.
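The public/private split can be seen in a toy, textbook RSA example with tiny primes. This is a sketch for intuition only: it is not secure, it omits padding, and it is not how Nightfall or any production library performs encryption. All names here are illustrative.

```python
# Toy RSA with tiny primes: shows that the public key encrypts and only the
# private key decrypts. Real keys are 2048 bits or more and use padding.
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def encrypt(message, key):
    """Encrypt an integer message smaller than the modulus with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Decrypt an integer ciphertext with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


ciphertext = encrypt(65, public_key)
print(ciphertext, decrypt(ciphertext, private_key))
```

Anyone holding (e, n) can produce the ciphertext, but recovering the message requires d, which never leaves your hands.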

Example

  • Default case with public_key="MIG...AQAB": "my ssn is 518-45-7708" → "my ssn is EhOp/DphEIA0LQd4q1BUq8FtuxKj66VA381Z9DtbiQaaHvy5Wlvtxg0je91DFXEJncOWbhgPbt7EvBl36k5MFlFdPbc5+bg40FxP676SnllEClEO+DDsuiRCk9VC4noAd0zLxgvV8qD/NPE/XhTfOpscqlKhllfTg7G5jZYYSG8="

For our example, we'll use the cryptography package in Python, so let's install it first: pip3 install cryptography

Let's first generate a public/private RSA key pair in PEM format on the command line. We'll cover how to generate keys programmatically in Python in our encryption-specific tutorial.

First, we'll generate our private key and write it to a file called example_private.pem:

openssl genrsa -out example_private.pem 2048

Next, we'll generate our public key in PEM format from this private key:

openssl rsa -in example_private.pem -outform PEM -pubout -out example_public.pem

Let's take a look at our public key with cat example_public.pem:

-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAnszkbHNclOhYgEc1lMPn
6KLm3cXS+w2CRBSEC5HFlqOUmdcXWnBFa9tlJYvXhQYuMFhXBcjUYgVUSAftK703
oTFMwRGZNnBjcUnNSK+pD4iaCEmdskkSA85GFCPsO1yrcfJp4965c43FrgWqyo7A
Aka5sGW9gX2wibQpQhil9TS0vtWHvEOq1TZnFAJD/DEJFN7zIQhglA/53Vd5PEL9
8fSfXxzbtu68wwhRtRqTaVRjzslx6i2Xs/QWcS/sWnKhnuF/enjlcll+SLyDEoPO
6iGp8MpHkZzJHmjATQJBA1vyu+mqo+G3wWm7WPME6V83VBNfG4wdkZCx/n9N5KzH
yQIDAQAB
-----END PUBLIC KEY-----

Remember to keep your private key safe. Anyone with this key can decrypt your encrypted data.

Now we can use our public key to encrypt any content with Nightfall! To do so, we'll first read the public key into a string.

from cryptography.hazmat.primitives import serialization
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, Nightfall

# Load the PEM file; parsing it also checks that it holds a valid public key.
with open("example_public.pem", "rb") as key_file:
    public_key = serialization.load_pem_public_key(key_file.read())

# Serialize the key back to PEM (SubjectPublicKeyInfo) and decode it to a string.
pem = public_key.public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
pem_str = pem.decode("utf-8")
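If you build the PEM string some other way (for example, by reading it from configuration), a quick structural check can catch copy-and-paste mistakes before the string is sent anywhere. The sketch below uses only the standard library; the function name pem_public_key_body is hypothetical. It checks the header, footer, and Base64 body only, and does not verify that the bytes form a valid RSA key, which load_pem_public_key does.

```python
import base64

PEM_HEADER = "-----BEGIN PUBLIC KEY-----"
PEM_FOOTER = "-----END PUBLIC KEY-----"


def pem_public_key_body(pem_str):
    """Return the decoded body of a PEM public key string, or raise ValueError.

    Hypothetical helper: a structural check only, not a cryptographic one.
    """
    lines = [line.strip() for line in pem_str.strip().splitlines()]
    if len(lines) < 3 or lines[0] != PEM_HEADER or lines[-1] != PEM_FOOTER:
        raise ValueError("not a PEM-encoded public key")
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode("".join(lines[1:-1]), validate=True)
```

Passing a private key file by mistake fails the header check, which is a cheap guard against sending the wrong file.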

Now, we'll pass the public key into our redaction configuration, similar to the above examples, so Nightfall can use it to encrypt your sensitive data.

nightfall = Nightfall()
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                public_key=pem_str)
        )]
    )]
)
print(findings)
print(redacted_payload)

We'll see our findings look like this (with line formatting added for clarity):

[[
  Finding(
    finding='4916-6734-7572-5015', 
    redacted_finding='ar4PGD1T3yCBjBdgJ+iX2Ak3hZYIyaaKcRY+AcNS3RjsGnss9hUA9Q0ycLtBOaMjFMeTdCupCEPNUFVYyzeWhHmL009DwWshV47Vkm84zB5O6HroJHAG0JpKHb6bLL58hAb9FHZ73usU4bI67ZEtJhX41HovlOfSCaeUnH4y3pPqRnh7d5roX7EIYQ39wzPGGo2TNbeyqm2pluC1G4Mqt9hLqy0tCwfbmKPXro41i9i1xED9GkVcnxTu0gS8bCMFkvAK4S+Hw0K/gqPq0hu2JGoryKo335IYBCit6S39JESJdNh7IafuE6mrmvYMlR9l4c60VkowEMZAPkUjOelPDw==', 
    before_context=None, 
    after_context=None, 
    detector_name='Credit Card Number', 
    detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
    confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
    byte_range=Range(start=0, end=19), 
    codepoint_range=Range(start=0, end=19), 
    matched_detection_rule_uuids=[], 
    matched_detection_rules=['Inline Detection Rule #1'])
]]

And our redacted input payload in our redacted_payload object (truncated for clarity):

['GpcjUg74...BpQHw== is my credit card number']

When to use Encryption

Third-party encryption is well suited to use cases where you want to preserve the original sensitive data but ensure that it is visible only to sanctioned parties that hold your private key. For example, if you are storing the data or passing it to a sanctioned third party for processing, encrypting the sensitive tokens adds a layer of security while still allowing a downstream processor with the key to access the raw data as required.

Congrats! You've now learned about and implemented multiple redaction techniques in just a few lines of code. You're ready to start adding redaction to your apps.

Substitution examples:

  • Default case: "my email is [email protected]." → "my email is ."
  • Custom word "[REDACTED BY NIGHTFALL]": "my email is [email protected]." → "my email is [REDACTED BY NIGHTFALL]."
  • Substitute with InfoType: "my email is [email protected]." → "my email is [EMAIL]."
