Redacting Sensitive Data in 4 Lines of Code
In this tutorial, we'll demonstrate how easy it is to redact sensitive data. We'll take a closer look at several redaction techniques, explain how Nightfall works, and touch on use cases for each technique.
Before we get started, let's set our Nightfall API key as an environment variable and install our dependencies for our code samples in Python. If you don't have a Nightfall API key, generate one on your Nightfall Dashboard. If you don't have a Nightfall account, sign up for a free Nightfall Developer Platform account.
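A small pre-flight check can catch a missing key before any scan is attempted. The sketch below assumes the key is stored in an environment variable named NIGHTFALL_API_KEY and that the SDK was installed from PyPI as the nightfall package (for example with pip3 install nightfall). Both names are assumptions, so confirm them against the SDK version you use. require_api_key is a hypothetical helper and not part of the SDK.

```python
import os


def require_api_key(env=None, var_name="NIGHTFALL_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    var_name is an assumption about which variable holds the key.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; generate a key on your Nightfall "
            "Dashboard and export it before running the samples."
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```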
Masking
Mask sensitive data with a configurable character. You can optionally leave some characters unmasked and specify characters that should never be masked.
Examples
Default, no configuration (“my ssn is 518-45-7708” → “my ssn is ***********”)
Mask with custom character, masking_char="X" (“my ssn is 518-45-7708” → “my ssn is XXXXXXXXXXX”)
Leave first four characters unmasked, num_chars_to_leave_unmasked=4 (“my ssn is 518-45-7708” → “my ssn is 518-*******”)
Leave last four characters unmasked, num_chars_to_leave_unmasked=4, mask_right_to_left=True (“my ssn is 518-45-7708” → “my ssn is *******7708”)
Don't mask - characters, chars_to_ignore=["-"] (“my ssn is 518-45-7708” → “my ssn is ***-**-****”)
All of the above, masking_char="X", num_chars_to_leave_unmasked=4, mask_right_to_left=True, chars_to_ignore=["-"] (“my ssn is 518-45-7708” → “my ssn is XXX-XX-7708”)
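To see how these options combine, here is a rough local sketch in plain Python. It is not Nightfall's implementation (Nightfall performs the actual redaction for you through the API). It is only an illustration that reproduces the example outputs for the finding 518-45-7708. The function name mask_finding is hypothetical, and the sketch assumes the unmasked window is counted by position, including any ignored characters.

```python
def mask_finding(finding, masking_char="*", num_chars_to_leave_unmasked=0,
                 mask_right_to_left=False, chars_to_ignore=()):
    """Illustrative masking of a single finding (not the Nightfall implementation).

    With mask_right_to_left=False the first N characters stay in the clear;
    with mask_right_to_left=True the last N characters stay in the clear.
    Characters listed in chars_to_ignore are never masked.
    """
    length = len(finding)
    keep = max(0, min(num_chars_to_leave_unmasked, length))
    masked = []
    for index, char in enumerate(finding):
        if mask_right_to_left:
            in_clear_window = index >= length - keep
        else:
            in_clear_window = index < keep
        if in_clear_window or char in chars_to_ignore:
            masked.append(char)
        else:
            masked.append(masking_char)
    return "".join(masked)


if __name__ == "__main__":
    ssn = "518-45-7708"
    print(mask_finding(ssn))
    print(mask_finding(ssn, masking_char="X", num_chars_to_leave_unmasked=4,
                       mask_right_to_left=True, chars_to_ignore=["-"]))
```

Only the finding itself is masked. The surrounding text ("my ssn is ") is left untouched.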
Let's put this together in Python with the Nightfall SDK. In our example, we have an input string with a credit card number (4916-6734-7572-5015 is my credit card number) and we wish to mask with an asterisk, unmask the last 4 digits, and ignore hyphens.
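A minimal sketch of that scan might look like the following. The mask option names come from the examples in this tutorial. The class and method names (Nightfall, Detector, DetectionRule, RedactionConfig, MaskConfig, Confidence, scan_text), the remove_finding argument, and the CREDIT_CARD_NUMBER detector name reflect my understanding of the Python SDK and should be treated as assumptions to verify against the SDK reference. scan_with_masking is a hypothetical wrapper.

```python
import os

# Mask settings from the walkthrough: asterisks, keep the last 4 digits, skip hyphens.
MASK_OPTIONS = {
    "masking_char": "*",
    "num_chars_to_leave_unmasked": 4,
    "mask_right_to_left": True,
    "chars_to_ignore": ["-"],
}


def scan_with_masking(text):
    """Scan one string and mask detected credit card numbers.

    Needs the Nightfall SDK and an API key. SDK names are assumptions.
    """
    # Imported lazily so this file can be loaded without the SDK installed.
    from nightfall import (Confidence, DetectionRule, Detector, MaskConfig,
                           Nightfall, RedactionConfig)

    nightfall = Nightfall()  # assumed to read NIGHTFALL_API_KEY from the environment
    detector = Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(**MASK_OPTIONS),
        ),
    )
    findings, redacted_payload = nightfall.scan_text(
        [text], detection_rules=[DetectionRule([detector])]
    )
    return findings, redacted_payload


# Only call the API when a key is configured, so the file is safe to import.
if __name__ == "__main__" and os.environ.get("NIGHTFALL_API_KEY"):
    findings, redacted_payload = scan_with_masking(
        "4916-6734-7572-5015 is my credit card number"
    )
    print(findings)
    print(redacted_payload)
```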
We'll see our findings look like this (with line formatting added for clarity):
Also, we have received the input payload back as a redacted string in our redacted_payload object:
When to use Masking
Masking is especially useful in scenarios where you want to retain some of the original format of the data or a certain amount of non-sensitive information as context. For example, it's common to refer to credit card numbers by their last 4 digits, so masking everything but the last 4 digits would ensure that the output is still useful to the viewer.
Substitution
Substitute sensitive data findings with the InfoType, a custom word, or an empty string.
Examples
Default case (“my email is [email protected].” → “my email is .”)
Case with custom word “[REDACTED BY NIGHTFALL]” (“my email is [email protected].” → “my email is [REDACTED BY NIGHTFALL].”)
Substitute with InfoType (“my email is [email protected].” → “my email is [EMAIL].”)
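In Python, a substitution scan could be sketched as follows, reusing the credit card input from the masking walkthrough and the custom string SubMeIn. The substitution_phrase option is the one this tutorial names. The SDK class names, the remove_finding argument, and the CREDIT_CARD_NUMBER detector name are assumptions based on my understanding of the Python SDK, and scan_with_substitution is a hypothetical wrapper.

```python
import os

# Replace each finding with a fixed custom string.
REDACTION_OPTIONS = {
    "remove_finding": False,
    "substitution_phrase": "SubMeIn",
}


def scan_with_substitution(text):
    """Scan one string and substitute detected credit card numbers.

    Needs the Nightfall SDK and an API key. SDK names are assumptions.
    """
    # Imported lazily so this file can be loaded without the SDK installed.
    from nightfall import (Confidence, DetectionRule, Detector, Nightfall,
                           RedactionConfig)

    nightfall = Nightfall()  # assumed to read NIGHTFALL_API_KEY from the environment
    detector = Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(**REDACTION_OPTIONS),
    )
    findings, redacted_payload = nightfall.scan_text(
        [text], detection_rules=[DetectionRule([detector])]
    )
    return findings, redacted_payload


# Only call the API when a key is configured, so the file is safe to import.
if __name__ == "__main__" and os.environ.get("NIGHTFALL_API_KEY"):
    findings, redacted_payload = scan_with_substitution(
        "4916-6734-7572-5015 is my credit card number"
    )
    print(findings)
    print(redacted_payload)
```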
The findings object returned to us looks like this (with line formatting added for clarity):
And our redacted input payload in our redacted_payload object:
Instead of using a custom string as the substitution (SubMeIn), we may want to use the name of the detector for additional context. We can make a one-line change to the example above, replacing substitution_phrase="SubMeIn" with infotype_substitution=True.
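One way to make the difference between the two modes explicit is to build the redaction options from a flag, as in the sketch below. The two option names come from this tutorial. The helper names, the remove_finding argument, the SDK class names, and the CREDIT_CARD_NUMBER detector name are assumptions to check against the SDK reference.

```python
import os


def redaction_options(use_infotype, phrase="SubMeIn"):
    """Return keyword arguments for the redaction config (hypothetical helper)."""
    if use_infotype:
        # Substitute the detector name instead of a custom string.
        return {"remove_finding": False, "infotype_substitution": True}
    return {"remove_finding": False, "substitution_phrase": phrase}


def scan_with_infotype(text):
    """Scan one string and replace findings with the detector name.

    Needs the Nightfall SDK and an API key. SDK names are assumptions.
    """
    # Imported lazily so this file can be loaded without the SDK installed.
    from nightfall import (Confidence, DetectionRule, Detector, Nightfall,
                           RedactionConfig)

    nightfall = Nightfall()  # assumed to read NIGHTFALL_API_KEY from the environment
    detector = Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(**redaction_options(use_infotype=True)),
    )
    return nightfall.scan_text(
        [text], detection_rules=[DetectionRule([detector])]
    )


# Only call the API when a key is configured, so the file is safe to import.
if __name__ == "__main__" and os.environ.get("NIGHTFALL_API_KEY"):
    findings, redacted_payload = scan_with_infotype(
        "4916-6734-7572-5015 is my credit card number"
    )
    print(redacted_payload)
```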
This yields:
When to use Substitution
Substitution is effective when you want to replace sensitive data with a contextual label. For example, you may wish to replace a literal credit card number with the label "Credit Card Number". This tells the reader that the data is a credit card number without exposing the actual token.
Encryption
Encrypt sensitive data findings with a public encryption key that is passed via the API. Make the encryption algorithm configurable.
Encryption is a complex topic so we'll go into a more in-depth tutorial on encrypting and decrypting sensitive data with Nightfall in a separate post, but let's run through the basics below.
Nightfall uses RSA encryption, which is asymmetric: it works with two different keys, a public one and a private one. Anyone with your public key can encrypt data, but encrypted data can only be decrypted with the private key. So you'll pass Nightfall your public key to encrypt with, and only you will hold the private key needed to decrypt the encrypted data.
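To make the public/private split concrete, here is a toy, textbook-style RSA sketch in plain Python (3.8 or later, for the modular inverse). It uses tiny primes and no padding, so it is insecure and is not how Nightfall or the cryptography package performs encryption. It only illustrates that the public pair encrypts and the private pair decrypts. All function names are hypothetical.

```python
def make_toy_keypair(p=61, q=53, e=17):
    """Build a tiny RSA key pair from two small primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: modular inverse of e
    return (n, e), (n, d)


def toy_encrypt(public_key, message):
    """Encrypt an integer smaller than n with the public key."""
    n, e = public_key
    return pow(message, e, n)


def toy_decrypt(private_key, ciphertext):
    """Recover the integer with the private key."""
    n, d = private_key
    return pow(ciphertext, d, n)


if __name__ == "__main__":
    public_key, private_key = make_toy_keypair()
    ciphertext = toy_encrypt(public_key, 65)
    print("ciphertext:", ciphertext)
    print("decrypted:", toy_decrypt(private_key, ciphertext))
```

Real keys are thousands of bits long and use padding schemes, which is why the rest of this section relies on generated PEM keys rather than hand-rolled math.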
Example
Default case, public_key=“MIG...AQAB” (“my ssn is 518-45-7708” → “my ssn is EhOp/DphEIA0LQd4q1BUq8FtuxKj66VA381Z9DtbiQaaHvy5Wlvtxg0je91DFXEJncOWbhgPbt7EvBl36k5MFlFdPbc5+bg40FxP676SnllEClEO+DDsuiRCk9VC4noAd0zLxgvV8qD/NPE/XhTfOpscqlKhllfTg7G5jZYYSG8=”)
For our example, we'll use the cryptography package in Python, so let's install it first:
pip3 install cryptography
Let's first generate a public/private RSA key pair in PEM format on the command line. We'll cover how to generate keys programmatically in Python in our encryption-specific tutorial.
First, we'll generate our private key and write it to a file called example_private.pem:
Next, we'll generate our public key in PEM format from this private key:
Let's take a look at our public key with cat example_public.pem:
Remember to keep your private key safe. Anyone with this key can decrypt your encrypted data.
Now we can use our public key to encrypt any content with Nightfall! To do so, we'll first read the public key into a string.
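Reading the key can be done with the standard library alone. The example value shown earlier (MIG...AQAB) suggests that the key is passed as the Base64 body without the PEM BEGIN/END marker lines. That is an assumption on my part, so check what your SDK version expects. The helper names below are hypothetical.

```python
from pathlib import Path


def pem_body(pem_text):
    """Drop the BEGIN/END marker lines from PEM text and join the Base64 body."""
    body_lines = [
        line.strip()
        for line in pem_text.splitlines()
        if line.strip() and not line.strip().startswith("-----")
    ]
    return "".join(body_lines)


def read_public_key(path="example_public.pem"):
    """Read a PEM public key file into a single-line string."""
    return pem_body(Path(path).read_text())


# Print the key only if the file generated earlier is present.
if __name__ == "__main__" and Path("example_public.pem").exists():
    print(read_public_key())
```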
Now, we'll pass the public key into our redaction configuration, similar to the above examples, so Nightfall can use it to encrypt your sensitive data.
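A sketch of that configuration is below. The public_key option name appears in the example earlier in this section. The SDK class names, the remove_finding argument, the CREDIT_CARD_NUMBER detector name, and the choice to strip the PEM marker lines are assumptions, and scan_with_encryption is a hypothetical wrapper.

```python
import os


def encryption_options(public_key):
    """Return keyword arguments for the redaction config (hypothetical helper)."""
    return {"remove_finding": False, "public_key": public_key}


def scan_with_encryption(text, public_key):
    """Scan one string and encrypt detected credit card numbers.

    Needs the Nightfall SDK and an API key. SDK names are assumptions.
    """
    # Imported lazily so this file can be loaded without the SDK installed.
    from nightfall import (Confidence, DetectionRule, Detector, Nightfall,
                           RedactionConfig)

    nightfall = Nightfall()  # assumed to read NIGHTFALL_API_KEY from the environment
    detector = Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(**encryption_options(public_key)),
    )
    return nightfall.scan_text(
        [text], detection_rules=[DetectionRule([detector])]
    )


# Only call the API when both a key and the PEM file are available.
if (__name__ == "__main__" and os.environ.get("NIGHTFALL_API_KEY")
        and os.path.exists("example_public.pem")):
    with open("example_public.pem") as pem_file:
        pem_lines = pem_file.read().splitlines()
    # Assumption: pass only the Base64 body, without the marker lines.
    key = "".join(line for line in pem_lines if not line.startswith("-----"))
    findings, redacted_payload = scan_with_encryption(
        "4916-6734-7572-5015 is my credit card number", key
    )
    print(findings)
    print(redacted_payload)
```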
We'll see our findings look like this (with line formatting added for clarity):
And our redacted input payload in our redacted_payload object (truncated for clarity):
When to use Encryption
Third-party encryption is well-suited for use cases where you want to preserve the original sensitive data but ensure that it is only visible to sanctioned parties that have your private key. For example, if you are storing the data or passing it to a sanctioned third party for processing, encrypting the sensitive tokens adds a layer of security while still allowing a downstream processor that holds the key to access the raw data as required.
Congrats! You've now learned about and implemented multiple redaction techniques in just a few lines of code. You're ready to start adding redaction to your apps.