Amazon Kinesis DLP Tutorial
Amazon Kinesis allows you to collect, process, and analyze real-time streaming data. In this tutorial, we will set up Nightfall DLP to scan Kinesis streams for sensitive data. An overview of what we are going to build is shown in the diagram below.
We will send data to Kinesis using a simple producer written in Python. Next, we will use an AWS Lambda function to send data from Kinesis to Nightfall. Nightfall will scan the data for sensitive information. If there are any findings returned by Nightfall, the Lambda function will write the findings to a DynamoDB table.
To complete this tutorial you will need the following:
An AWS Account with access to Kinesis, Lambda, and DynamoDB
The AWS CLI installed and configured on your local machine.
A Nightfall API Key
An existing Nightfall Detection Rule which contains at least one detector for email addresses.
A local copy of the companion repository for this tutorial.
Before continuing, you should clone the companion repository locally.
First, we will configure all of the required AWS services.
Open the IAM roles page in the AWS console.
Choose Create role.
Create a role with the following properties:
Lambda as the trusted entity
Permissions
AWSLambdaKinesisExecutionRole
AmazonDynamoDBFullAccess
Role name: nightfall-kinesis-role
Open the Kinesis page and select Create Data Stream
Enter nightfall-demo as the Data stream name
Enter 1 as the Number of open shards
Select Create data stream
Open the Lambda page and select Create function
Choose Author from scratch and add the following Basic information:
nightfall-lambda as the Function name
Python 3.8 as the Runtime
Select Change default execution role, Use an existing role, and select the previously created nightfall-kinesis-role
Once the function has been created, in the Code tab of the Lambda function select Upload from and choose .zip file. Select the local nightfall-lambda-package.zip file that you cloned earlier from the companion repository and upload it to AWS Lambda.
You should now see the previous sample code replaced with our Nightfall-specific Lambda function.
Next, we need to configure environment variables for the Lambda function.
Within the same Lambda view, select the Configuration tab and then select Environment variables.
Add the following environment variables that will be used during the Lambda function invocation.
NIGHTFALL_API_KEY: your Nightfall API Key
DETECTION_RULE_UUID: your Nightfall Detection Rule UUID
Detection Rule Requirements: This tutorial uses a data set that contains a name, email, and random text. In order to see results, please make sure that the Nightfall Detection Rule you choose contains at least one detector for email addresses.
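Inside the function, these two values can be read from the environment at invocation time. The sketch below is an assumption about how that might look, not the code from the companion repository; load_config is a hypothetical helper name.

```python
import os


def load_config(env=os.environ):
    """Read the two settings configured above and fail early if one is missing."""
    required = ("NIGHTFALL_API_KEY", "DETECTION_RULE_UUID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_key": env["NIGHTFALL_API_KEY"],
        "detection_rule_uuid": env["DETECTION_RULE_UUID"],
    }
```

Failing early with a clear message tends to be easier to debug than an authentication error surfacing later in the invocation.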
Lastly, we need to create a trigger that connects our Lambda function to our Kinesis stream.
In the function overview screen on the top of the page, select Add trigger.
Choose Kinesis as the trigger.
Select the previously created nightfall-demo Kinesis stream.
Select Add
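For readers who prefer scripting over the console, the same trigger can be sketched with Boto3's create_event_source_mapping call. This is an optional alternative, not part of the original walkthrough; the helper name add_kinesis_trigger, the starting position, and the batch size are assumptions, and the stream ARN must be supplied by you.

```python
def add_kinesis_trigger(lambda_client, stream_arn, function_name="nightfall-lambda"):
    """Connect a Kinesis stream to the Lambda function.

    lambda_client is expected to be boto3.client("lambda").
    """
    return lambda_client.create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName=function_name,
        StartingPosition="LATEST",  # assumption: only read records sent after the trigger exists
        BatchSize=100,              # assumption: tune to your record volume
    )
```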
The last step in creating our demo environment is to create a DynamoDB table.
Open the DynamoDB page and select Create table
Enter nightfall-findings as the Table Name
Enter KinesisEventID as the Primary Key
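The table definition above can also be expressed as arguments to Boto3's create_table call. The sketch below assumes the key is a string (Kinesis event IDs are strings) and assumes on-demand billing; both helper names are hypothetical.

```python
def findings_table_spec(table_name="nightfall-findings", key_name="KinesisEventID"):
    """Keyword arguments for the DynamoDB create_table call."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": key_name, "AttributeType": "S"}],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def create_findings_table(dynamodb_client):
    """dynamodb_client is expected to be boto3.client("dynamodb")."""
    return dynamodb_client.create_table(**findings_table_spec())
```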
Be sure to also run the following before the Lambda function is created:
This is to ensure that the required version of the Python SDK for Nightfall has been installed. We also need to install boto3.
Before we start processing the Kinesis stream data with Nightfall, we will provide a brief overview of how the Lambda function code works. The entire function is shown below:
This is a relatively simple function that does four things.
Create a DynamoDB client using the boto3 library.
Extract and decode data from the Kinesis stream and add it to a single list of strings.
Create a Nightfall client using the nightfall library and scan the records that were extracted in the previous step.
Iterate through the response from Nightfall. If there are findings for a record, we copy the record and findings metadata into a DynamoDB table. We need to process the list of Finding objects into a list of dicts before passing them to DynamoDB.
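The four steps can be sketched roughly as follows. This is not the function from the companion repository: the Nightfall scan and the DynamoDB write are passed in as callables (scan_texts, put_item, and finding_to_dict are hypothetical names), and the Record and Findings attribute names are assumptions. Only the shape of the Kinesis event (Records, eventID, base64-encoded kinesis.data) comes from how Lambda delivers Kinesis batches.

```python
import base64


def decode_kinesis_records(event):
    """Return (event_ids, texts) for the records in a Kinesis-triggered Lambda event."""
    event_ids, texts = [], []
    for record in event["Records"]:
        event_ids.append(record["eventID"])
        # Kinesis payloads arrive base64-encoded.
        texts.append(base64.b64decode(record["kinesis"]["data"]).decode("utf-8"))
    return event_ids, texts


def build_items(event_ids, texts, findings_per_text, finding_to_dict):
    """Build one table item for each record that has at least one finding."""
    items = []
    for event_id, text, findings in zip(event_ids, texts, findings_per_text):
        if not findings:
            continue  # nothing sensitive detected in this record
        items.append({
            "KinesisEventID": event_id,
            "Record": text,  # assumed attribute name
            "Findings": [finding_to_dict(f) for f in findings],  # assumed attribute name
        })
    return items


def process_event(event, scan_texts, put_item, finding_to_dict):
    """Decode, scan, and store findings; returns the number of items written.

    scan_texts(texts) must return one list of findings per input text.
    put_item(item) must write a single item to the findings table.
    """
    event_ids, texts = decode_kinesis_records(event)
    findings_per_text = scan_texts(texts)
    items = build_items(event_ids, texts, findings_per_text, finding_to_dict)
    for item in items:
        put_item(item)
    return len(items)
```

In the deployed function, scan_texts would wrap the Nightfall client's scan call and put_item would wrap the DynamoDB table write; consult the Nightfall SDK and Boto3 documentation for the exact signatures. Injecting them keeps the decoding and filtering logic easy to test locally.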
Now that you've configured all of the required AWS services, and understand how the Lambda function works, you're ready to start sending data to Kinesis and scanning it with Nightfall.
We've included a sample script in the companion repository that allows you to send fake data to Kinesis. The data that we are going to be sending looks like this:
The script will send one record with the data shown above every 10 seconds.
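The companion script is not reproduced here. A minimal sketch of a producer with the same behavior might look like the following; the field names, the placeholder values, and the helper names make_record and send_records are assumptions, while put_record with StreamName, Data, and PartitionKey is the standard Boto3 Kinesis call.

```python
import json
import random
import time

NAMES = ["Ada Example", "Grace Sample"]  # made-up placeholder values
WORDS = ["alpha", "beta", "gamma", "delta"]


def make_record():
    """Build one fake record with a name, an email address, and random text."""
    name = random.choice(NAMES)
    email = name.lower().replace(" ", ".") + "@example.com"
    text = " ".join(random.choice(WORDS) for _ in range(5))
    return {"name": name, "email": email, "text": text}  # field names are assumptions


def send_records(kinesis_client, stream_name="nightfall-demo", count=3,
                 interval=10, sleep=time.sleep):
    """Send `count` records, pausing `interval` seconds between sends.

    kinesis_client is expected to be boto3.client("kinesis").
    """
    for i in range(count):
        record = make_record()
        kinesis_client.put_record(
            StreamName=stream_name,
            Data=json.dumps(record).encode("utf-8"),
            PartitionKey=record["email"],
        )
        if i < count - 1:
            sleep(interval)
```

With a single shard the partition key has no effect on routing, so any stable string would do.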
Before running the script, make sure that you have the AWS CLI installed and configured locally. The user that you are logged in with should have the appropriate permissions to add records to the Kinesis stream. This script uses the Boto3 library, which handles authentication based on the credentials file that is created with the AWS CLI.
You can start sending data with the following steps:
Open the companion repo that you cloned earlier in a terminal.
Create and activate a new Python virtualenv
Install Dependencies
Start sending data
If everything worked, you should see output similar to this in your terminal:
As the data starts to get sent to Kinesis, the Lambda function that we created earlier will begin to process each record and check for sensitive data using the Nightfall Detection Rule that we specified in the configuration.
If Nightfall detects a record with sensitive data, the Lambda function will copy that record and additional metadata from Nightfall to the DynamoDB table that we created previously.
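To confirm that findings are arriving, you can read the table back. The sketch below assumes a Boto3 Table resource for nightfall-findings and follows scan pagination; list_findings is a hypothetical helper name.

```python
def list_findings(table):
    """Return every item in the findings table, following scan pagination.

    table is expected to be boto3.resource("dynamodb").Table("nightfall-findings").
    """
    items = []
    response = table.scan()
    items.extend(response.get("Items", []))
    # DynamoDB returns LastEvaluatedKey while more pages remain.
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response.get("Items", []))
    return items
```

A full scan is fine for a small demo table; for larger tables a query on the key is usually preferable.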
If you'd like to clean up the created resources in AWS after completing this tutorial you should remove the following resources:
nightfall-kinesis-role IAM Role
nightfall-demo Kinesis data stream
nightfall-lambda Lambda Function
nightfall-findings DynamoDB Table
With the Nightfall API, you can also redact and mask your Kinesis findings. To do so, add a Redaction Config as part of your Detection Rule within the Lambda function. For more information on how to use redaction with the Nightfall API, and its specific options, please refer to the guide here.
Congrats! You've successfully integrated Nightfall with Amazon Kinesis, Lambda, and DynamoDB. If you have an existing Kinesis stream, you should be able to take the same Lambda function that we used in this tutorial and start scanning that data without any additional changes.