# Redacting Sensitive Data in 4 Lines of Code

In this tutorial, we'll show how easy it is to redact sensitive data, take a more in-depth look at several redaction techniques and how Nightfall applies them, and touch on use cases for each technique.

Before we get started, let's set our Nightfall API key as the `NIGHTFALL_API_KEY` environment variable and install the dependencies for our Python code samples (the Nightfall SDK installs with `pip3 install nightfall`). If you don't have a Nightfall API key, generate one on your Nightfall [**Dashboard**](https://app.nightfall.ai/developer-platform). If you don't have a Nightfall account, [**sign up**](https://app.nightfall.ai/sign-up) for a free Nightfall Developer Platform account.
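The `Nightfall()` client used in the samples reads the key from the `NIGHTFALL_API_KEY` environment variable by default. If you'd like a script to fail fast with a clear message when that variable is missing, a small pre-flight check like the one below works. `require_nightfall_api_key` is a hypothetical helper for illustration, not part of the Nightfall SDK.

```python
import os


def require_nightfall_api_key() -> str:
    """Hypothetical helper: return NIGHTFALL_API_KEY, or raise a clear error if it is unset."""
    key = os.environ.get("NIGHTFALL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NIGHTFALL_API_KEY is not set; export it before running the samples")
    return key
```

Call `require_nightfall_api_key()` at the top of a script, before constructing `Nightfall()`, so a missing key surfaces as a readable error rather than a failed API call.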

## Masking

Masking replaces sensitive data with a configurable character. You can optionally leave some characters unmasked and exclude certain characters from masking.

### Examples

| Cases                                | Additional Config                                                                                 | Before                  | After                   |
| ------------------------------------ | ------------------------------------------------------------------------------------------------- | ----------------------- | ----------------------- |
| Default                              | None                                                                                              | `my ssn is 518-45-7708` | `my ssn is ***********` |
| Mask with custom character           | `masking_char="X"`                                                                                | `my ssn is 518-45-7708` | `my ssn is XXXXXXXXXXX` |
| Leave first four characters unmasked | `num_chars_to_leave_unmasked=4`                                                                   | `my ssn is 518-45-7708` | `my ssn is 518-*******` |
| Leave last four characters unmasked  | `num_chars_to_leave_unmasked=4, mask_right_to_left=True`                                         | `my ssn is 518-45-7708` | `my ssn is *******7708` |
| Don't mask `-` characters            | `chars_to_ignore=["-"]`                                                                           | `my ssn is 518-45-7708` | `my ssn is ***-**-****` |
| All of the above!                    | `masking_char="X", num_chars_to_leave_unmasked=4, mask_right_to_left=True, chars_to_ignore=["-"]` | `my ssn is 518-45-7708` | `my ssn is XXX-XX-7708` |
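To predict what a given mask configuration will produce before calling the API, you can emulate the rules from the table offline. The following is a minimal sketch that assumes the rules behave exactly as the examples show, in particular that `mask_right_to_left=True` keeps the *last* `num_chars_to_leave_unmasked` characters visible. `mask_locally` is a hypothetical helper, not part of the Nightfall SDK; Nightfall performs the real masking server-side.

```python
from typing import Sequence


def mask_locally(
    finding: str,
    masking_char: str = "*",
    num_chars_to_leave_unmasked: int = 0,
    mask_right_to_left: bool = False,
    chars_to_ignore: Sequence[str] = (),
) -> str:
    """Hypothetical offline approximation of the masking examples in the table.

    Not Nightfall's implementation: the API performs the real masking server-side.
    """
    n = len(finding)
    keep = min(num_chars_to_leave_unmasked, n)
    # Following the table: mask_right_to_left=True leaves the last `keep`
    # characters visible; otherwise the first `keep` characters stay visible.
    visible = range(n - keep, n) if mask_right_to_left else range(keep)
    # Characters in chars_to_ignore are never replaced by masking_char.
    return "".join(
        ch if i in visible or ch in chars_to_ignore else masking_char
        for i, ch in enumerate(finding)
    )


print(mask_locally("518-45-7708", "X", 4, True, ["-"]))          # XXX-XX-7708
print(mask_locally("4916-6734-7572-5015", "X", 4, True, ["-"]))  # XXXX-XXXX-XXXX-5015
```

Because ignored characters are skipped rather than masked, the hyphens survive in both outputs, matching the "All of the above!" row.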

Let's put this together in Python with the Nightfall SDK. In our example, the input string contains a credit card number (`4916-6734-7572-5015 is my credit card number`), and we want to mask it with the character `X`, leave the last 4 digits unmasked, and ignore hyphens.

```python
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

nightfall = Nightfall()  # reads API key from NIGHTFALL_API_KEY environment variable by default
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule(
        [Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                mask_config=MaskConfig(
                    masking_char="X",
                    num_chars_to_leave_unmasked=4,
                    mask_right_to_left=True,
                    chars_to_ignore=["-"]))
        )]
    )]
)
print(findings)
print(redacted_payload)
```

We'll see our `findings` look like this (with line formatting added for clarity):

```python
[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='XXXX-XXXX-XXXX-5015', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]
```

We also receive the input payload back as a redacted string in our `redacted_payload` object:

```python
['XXXX-XXXX-XXXX-5015 is my credit card number']
```
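Each `Finding` carries a `codepoint_range` and a `redacted_finding`, and the outputs above are consistent with the redacted payload being the original text with each finding's range replaced by its redacted form. The sketch below reproduces that relationship locally under that assumption. `Span` and `apply_redactions` are hypothetical stand-ins, not SDK types; Python indexes `str` by code point, which is why the sketch uses the code point range rather than `byte_range`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in for the Finding fields used here."""
    start: int             # Finding.codepoint_range.start
    end: int               # Finding.codepoint_range.end (exclusive)
    redacted_finding: str  # Finding.redacted_finding


def apply_redactions(text: str, spans: List[Span]) -> str:
    """Replace each span of `text` with its redacted form."""
    # Work from the last span to the first so earlier offsets stay valid
    # even when a replacement is longer or shorter than the text it replaces.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + span.redacted_finding + text[span.end :]
    return text


original = "4916-6734-7572-5015 is my credit card number"
print(apply_redactions(original, [Span(0, 19, "XXXX-XXXX-XXXX-5015")]))
# XXXX-XXXX-XXXX-5015 is my credit card number
```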

### When to use Masking

Masking is especially useful in scenarios where you want to retain some of the original format of the data or a certain amount of non-sensitive information as context. For example, it's common to refer to credit card numbers by their last 4 digits, so masking everything but the last 4 digits would ensure that the output is still useful to the viewer.

## Substitution

Substitution replaces sensitive data findings with the finding's `InfoType`, a custom word, or an empty string.

### Examples

* Default case (`my email is sam@nightfall.ai.` → `my email is .`)
* Custom word, `substitution_phrase="[REDACTED BY NIGHTFALL]"` (`my email is sam@nightfall.ai.` → `my email is [REDACTED BY NIGHTFALL].`)
* InfoType, `infotype_substitution=True` (`my email is sam@nightfall.ai.` → `my email is [EMAIL].`)

Let's try this with the Nightfall SDK, substituting a custom phrase (`SubMeIn`) for a credit card number:

```python
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, Nightfall

nightfall = Nightfall()
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                substitution_phrase="SubMeIn")
        )]
    )]
)
print(findings)
print(redacted_payload)
```

The returned `findings` object looks like this (with line formatting added for clarity):

```python
[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='SubMeIn', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]
```

And our redacted input payload in our `redacted_payload` object:

```python
['SubMeIn is my credit card number']
```

Instead of a custom string (`SubMeIn`), we may want to substitute the detector's InfoType for additional context. This is a one-line change to the example above: replace `substitution_phrase="SubMeIn"` with `infotype_substitution=True`.

This yields:

```python
['[CREDIT_CARD_NUMBER] is my credit card number']
```
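To make the three substitution cases from the examples list concrete, here is an offline sketch of how each one rewrites a finding. It assumes the finding's position is already known and that InfoType labels use the bracketed form shown above (`[EMAIL]`, `[CREDIT_CARD_NUMBER]`). `substitute_finding` and the sample address are hypothetical; Nightfall locates findings and performs the real substitution server-side.

```python
from typing import Optional


def substitute_finding(
    text: str,
    start: int,
    end: int,
    substitution_phrase: Optional[str] = None,
    infotype: Optional[str] = None,
) -> str:
    """Hypothetical offline illustration of the three substitution cases."""
    if substitution_phrase is not None and infotype is not None:
        raise ValueError("choose a substitution phrase or an InfoType, not both")
    if infotype is not None:
        replacement = f"[{infotype}]"       # InfoType label, e.g. [EMAIL]
    elif substitution_phrase is not None:
        replacement = substitution_phrase   # custom word
    else:
        replacement = ""                    # default case: drop the finding
    return text[:start] + replacement + text[end:]


email = "jane.doe@example.com"  # hypothetical sample address
text = f"my email is {email}."
start = text.index(email)
end = start + len(email)
print(substitute_finding(text, start, end))                             # my email is .
print(substitute_finding(text, start, end, "[REDACTED BY NIGHTFALL]"))  # my email is [REDACTED BY NIGHTFALL].
print(substitute_finding(text, start, end, infotype="EMAIL"))           # my email is [EMAIL].
```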

### When to use Substitution

Substitution is effective when you want to replace sensitive data with a contextual label. For example, replacing a literal credit card number with the label `[CREDIT_CARD_NUMBER]` tells the reader that the data was a credit card number without exposing the actual number.

## Encryption

Encryption replaces sensitive data findings with ciphertext produced using a public encryption key that you pass via the API.

Encryption is a complex topic, so we'll cover encrypting and decrypting sensitive data with Nightfall in more depth in a separate tutorial. Here, let's run through the basics.

Nightfall uses RSA encryption, which is asymmetric: it works with two different keys, a public one and a private one. Anyone with your public key can encrypt data, but encrypted data can only be decrypted with the private key. So you'll pass Nightfall your public key to encrypt with, and only you will hold the private key needed to decrypt the encrypted data.
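To make the public/private split concrete, here is textbook RSA with tiny primes in plain Python. This is a toy for intuition only: it uses a 12-bit modulus and no padding, whereas real-world RSA (including the 2048-bit key generated in this tutorial) uses far larger keys and a padding scheme, so never use this for real secrets.

```python
# Textbook RSA with tiny primes: INSECURE, for illustrating the key split only.
p, q = 61, 53
n = p * q                # modulus, shared by both keys
phi = (p - 1) * (q - 1)  # Euler's totient of n
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent: modular inverse of e (Python 3.8+)

public_key = (n, e)   # safe to hand out, e.g. to an encryption service
private_key = (n, d)  # keep secret; only this key can decrypt


def encrypt(m: int, key: tuple) -> int:
    modulus, exponent = key
    return pow(m, exponent, modulus)


def decrypt(c: int, key: tuple) -> int:
    modulus, exponent = key
    return pow(c, exponent, modulus)


c = encrypt(65, public_key)
print(c)                        # 2790
print(decrypt(c, private_key))  # 65
```

Anyone holding `public_key` can produce `c`, but recovering `65` requires `d`, which is derived from the secret primes `p` and `q`.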

### Example

* Default case, `public_key="MIG...AQAB"` (`my ssn is 518-45-7708` → `my ssn is EhOp/DphEIA0LQd4q1BUq8FtuxKj66VA381Z9DtbiQaaHvy5Wlvtxg0je91DFXEJncOWbhgPbt7EvBl36k5MFlFdPbc5+bg40FxP676SnllEClEO+DDsuiRCk9VC4noAd0zLxgvV8qD/NPE/XhTfOpscqlKhllfTg7G5jZYYSG8=`)

For our example, we'll use the `cryptography` package in Python, so let's install it first with `pip3 install cryptography`.

Let's first generate a public/private RSA key pair in PEM format on the command line. We'll cover how to generate keys programmatically in Python in our encryption-specific tutorial.

First, we'll generate our private key and write it to a file called `example_private.pem`:

```bash
openssl genrsa -out example_private.pem 2048
```

Next, we'll generate our public key in PEM format from this private key:

```bash
openssl rsa -in example_private.pem -outform PEM -pubout -out example_public.pem
```

Let's take a look at our public key with `cat example_public.pem`:

```text
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAnszkbHNclOhYgEc1lMPn
6KLm3cXS+w2CRBSEC5HFlqOUmdcXWnBFa9tlJYvXhQYuMFhXBcjUYgVUSAftK703
oTFMwRGZNnBjcUnNSK+pD4iaCEmdskkSA85GFCPsO1yrcfJp4965c43FrgWqyo7A
Aka5sGW9gX2wibQpQhil9TS0vtWHvEOq1TZnFAJD/DEJFN7zIQhglA/53Vd5PEL9
8fSfXxzbtu68wwhRtRqTaVRjzslx6i2Xs/QWcS/sWnKhnuF/enjlcll+SLyDEoPO
6iGp8MpHkZzJHmjATQJBA1vyu+mqo+G3wWm7WPME6V83VBNfG4wdkZCx/n9N5KzH
yQIDAQAB
-----END PUBLIC KEY-----
```

**Remember to keep your private key safe.** Anyone with this key can decrypt your encrypted data.

Now we can use our public key to encrypt any content with Nightfall! To do so, we'll first read the public key into a string.

```python
from cryptography.hazmat.primitives import serialization
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, Nightfall

# Load the PEM-encoded public key; this also fails early if the file is not a valid public key
with open("example_public.pem", "rb") as key_file:
    public_key = serialization.load_pem_public_key(key_file.read())

# Serialize the key back to PEM (SubjectPublicKeyInfo) and decode it to a str for the API
pem = public_key.public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
pem_str = pem.decode("utf-8")
```

Now, we'll pass the public key into our redaction configuration, similar to the above examples, so Nightfall can use it to encrypt your sensitive data.

```python
nightfall = Nightfall()
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                public_key=pem_str)
        )]
    )]
)
print(findings)
print(redacted_payload)
```

We'll see our `findings` look like this (with line formatting added for clarity):

```python
[[
  Finding(
    finding='4916-6734-7572-5015', 
    redacted_finding='ar4PGD1T3yCBjBdgJ+iX2Ak3hZYIyaaKcRY+AcNS3RjsGnss9hUA9Q0ycLtBOaMjFMeTdCupCEPNUFVYyzeWhHmL009DwWshV47Vkm84zB5O6HroJHAG0JpKHb6bLL58hAb9FHZ73usU4bI67ZEtJhX41HovlOfSCaeUnH4y3pPqRnh7d5roX7EIYQ39wzPGGo2TNbeyqm2pluC1G4Mqt9hLqy0tCwfbmKPXro41i9i1xED9GkVcnxTu0gS8bCMFkvAK4S+Hw0K/gqPq0hu2JGoryKo335IYBCit6S39JESJdNh7IafuE6mrmvYMlR9l4c60VkowEMZAPkUjOelPDw==', 
    before_context=None, 
    after_context=None, 
    detector_name='Credit Card Number', 
    detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
    confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
    byte_range=Range(start=0, end=19), 
    codepoint_range=Range(start=0, end=19), 
    matched_detection_rule_uuids=[], 
    matched_detection_rules=['Inline Detection Rule #1'])
]]
```

And our redacted input payload in our `redacted_payload` object (truncated for clarity):

```python
['GpcjUg74...BpQHw== is my credit card number']
```

### When to use Encryption

Encrypting findings with your own public key is well suited to use cases where you want to preserve the original sensitive data but ensure that it is only visible to sanctioned parties that hold your private key. For example, if you are storing the data or passing it to a sanctioned third party for processing, encrypting the sensitive tokens adds a layer of security while still allowing a downstream processor with the key to access the raw data as required.

Congrats! You've now learned about and implemented multiple redaction techniques in just a few lines of code. You're ready to start adding redaction to your apps.


