Deploy a File Scanner for Sensitive Data in 40 Lines of Code
The service ingests a local file, scans it for sensitive data with Nightfall, and displays the results in a simple table UI.
We'll deploy the server on Render (a PaaS Heroku alternative) so that you can serve your application publicly in production instead of running it off your local machine. You'll build familiarity with the following tools and frameworks: Python, Flask, Nightfall, Ngrok, Jinja, Render.
Key Concepts
Before we begin the implementation, familiarize yourself with how file scanning works with Nightfall, so you're acquainted with the flow we are implementing.
In a nutshell, file scanning is done asynchronously by Nightfall; after you upload a file to Nightfall and trigger the scan, we perform the scan in the background. When the scan completes, Nightfall delivers the results to you by making a request to your webhook server. This asynchronous behavior allows Nightfall to scan files of varying sizes and complexities without requiring you to hold open a long synchronous request, or continuously poll for updates. The impact of this pattern is that you need a webhook endpoint that can receive inbound notifications from Nightfall when scans are completed - that's what we are building in this tutorial.
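To make the callback pattern concrete, here is a toy simulation of it in plain Python. Nothing here talks to Nightfall: start_scan is a hypothetical stand-in for the scanning service, the background thread plays the role of the asynchronous scan, and the on_complete callback plays the role of the webhook request.

```python
import threading
from typing import Callable


def start_scan(text: str, on_complete: Callable[[dict], None]) -> threading.Thread:
    """Toy stand-in for a scan service: returns immediately, reports later."""

    def worker() -> None:
        # Pretend that "scanning" means looking for 16-digit tokens.
        hits = [tok for tok in text.split() if tok.isdigit() and len(tok) == 16]
        # The service calls back when it is done, like a webhook POST.
        on_complete({"findingsPresent": bool(hits), "count": len(hits)})

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


received = []
job = start_scan("card 4242424242424242 on file", received.append)
# The caller is free to do other work here; no polling loop is needed.
job.join()  # only so this demo exits after the callback has fired
print(received)
```

The caller never asks "is it done yet?"; it only supplies a place for the result to be delivered, which is the role our webhook endpoint will play.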
Getting Started
You can fork the sample repo and view the complete code here, or follow along below. If you're starting from scratch, create a new GitHub repository.
Setting Up Dependencies
First, let's start by installing our dependencies. We'll be using Nightfall for data classification, the Flask web framework in Python, and Gunicorn as our web server. Create requirements.txt and add the following to the file:
Then run pip install -r requirements.txt to do the installation.
Configuring Detection with Nightfall
Next, we'll need our Nightfall API Key and Webhook Signing Secret; the former authenticates us to the Nightfall API, while the latter authenticates that incoming webhooks are originating from Nightfall. You can retrieve your API Key and Webhook Signing Secret from the Nightfall Dashboard. Complete the Nightfall Quickstart for a more detailed walk-through. Sign up for a free Nightfall account if you don't have one.
These values are unique to your account and should be kept safe. This means that we will store them as environment variables and should not store them directly in code or commit them into version control. If these values are ever leaked, be sure to visit the Nightfall Dashboard to re-generate new values for these secrets.
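One way to keep secrets out of the code is a small helper that reads them from the environment and fails loudly when one is missing. This is a minimal sketch: require_env is a hypothetical helper, and the variable names in the comments (NIGHTFALL_API_KEY, NIGHTFALL_SIGNING_SECRET) are assumptions, so use whatever names you exported.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example use in app.py (variable names are assumptions):
#   api_key = require_env("NIGHTFALL_API_KEY")
#   signing_secret = require_env("NIGHTFALL_SIGNING_SECRET")
```

Failing at startup is usually easier to debug than an authentication error that only appears on the first request.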
Setting Up Our Server
Let's start writing our Flask server. Create a file called app.py. We'll start by importing our dependencies and initializing the Flask and Nightfall clients:
Next, we'll add our first route, which will display "Hello World" when the client navigates to /ping, simply as a way to validate things are working:
Run gunicorn app:app on the command line to fire up your server, and navigate to your local server in your web browser. The Gunicorn logs show the address the server is listening on, typically 127.0.0.1:8000 (aka localhost:8000).
To expose our local webhook server via a public tunnel that Nightfall can send requests to, we'll use ngrok. Download and install ngrok via their quickstart documentation here. We'll create an ngrok tunnel as follows:
After running this command, ngrok will create a tunnel on the public internet that redirects traffic from their site to your local machine. Copy the HTTPS tunnel endpoint that ngrok has created: we can use this as the webhook URL when we trigger a file scan.
Let's set this HTTPS endpoint as a local environment variable so we can reference it later:
Tip: With a Pro ngrok account, you can create a subdomain so that your tunnel URL is consistent, instead of randomly generated each time you start the tunnel.
Handling an Inbound Webhook
Before you send a file scan request to Nightfall, let's add logic for our incoming webhook endpoint, so that when Nightfall finishes scanning a file, it can successfully send the sensitive findings to us.
First, what does it mean to have findings? If a file has findings, this means that Nightfall identified sensitive data in the file that matched the detection rules you configured. For example, if you told Nightfall to look for credit card numbers, any substring from the request payload that matched our credit card detector would constitute sensitive findings.
We'll host our incoming webhook at /ingest with a POST method.
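Because /ingest is reachable from the public internet, the handler should confirm that a request really came from Nightfall before trusting it. That is what the Webhook Signing Secret is for. The sketch below shows the general shape of such a check using only the standard library. The exact scheme is an assumption on our part (an HMAC-SHA256 hex digest over "timestamp:body" with a five-minute freshness window), and is_valid_signature is a hypothetical helper; in the real app, prefer the webhook validation the Nightfall SDK provides (as we understand it, a validate_webhook method on the client).

```python
import hashlib
import hmac
import time
from typing import Optional


def is_valid_signature(
    secret: str,
    signature: Optional[str],
    timestamp: Optional[str],
    body: str,
    now: Optional[float] = None,
    tolerance_seconds: int = 300,
) -> bool:
    """Check a webhook signature (assumed scheme: HMAC-SHA256 over 'timestamp:body')."""
    if not signature or not timestamp:
        return False
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    # Reject stale (or far-future) requests to limit replay attacks.
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_seconds:
        return False
    expected = hmac.new(
        secret.encode(), f"{timestamp}:{body}".encode(), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The signature and timestamp would come from request headers, and body must be the raw request body exactly as received, since re-serializing the JSON can change the bytes and break the comparison.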
Nightfall will POST to the webhook endpoint, and in the inbound payload, Nightfall will indicate if there are sensitive findings in the file, and provide a link where we can access the sensitive findings as JSON.
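The decision logic of the handler can be sketched as a plain function, independent of Flask. Treat the details as assumptions rather than the Nightfall API reference: the payload field names (challenge, findingsPresent, findingsURL), the initial challenge request that gets echoed back, and the helper names handle_ingest and build_view_url are all ours. In app.py, the Flask route would parse the JSON body, call something like this, and return the resulting body and status code.

```python
from typing import Tuple
from urllib.parse import urlencode


def build_view_url(server_url: str, findings_url: str) -> str:
    """Build a /view link that carries the findings URL as an escaped query parameter."""
    query = urlencode({"findings_url": findings_url})
    return f"{server_url.rstrip('/')}/view?{query}"


def handle_ingest(payload: dict, server_url: str) -> Tuple[str, int]:
    """Decide how to respond to an inbound webhook payload (field names assumed)."""
    # Assumed handshake: echo the challenge value back to prove we own the URL.
    challenge = payload.get("challenge")
    if challenge:
        return challenge, 200

    if not payload.get("findingsPresent"):
        print("No sensitive data present")
        return "", 200

    findings_url = payload.get("findingsURL")
    if not findings_url:
        return "missing findingsURL", 400

    print(f"Findings URL: {findings_url}")
    print(f"View: {build_view_url(server_url, findings_url)}")
    return "", 200
```

Using urlencode matters here: the signed findings URL contains its own query string, so it has to be escaped before it can travel inside another URL.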
Restart your server so the changes propagate. We'll take a look at the console output of our webhook endpoint and explain what it means in the next section.
Scan a File
Now, we want to trigger a file scan request, so that Nightfall will scan the file and send a POST request to our /ingest webhook endpoint when the scan is complete. We'll write a simple script that sends a file to Nightfall to scan it for credit card numbers. Create a new file called scan.py.
First, we'll establish our dependencies, initialize the Nightfall client, and specify the filepath to the file we wish to scan as well as the webhook endpoint we created above. The filepath is a relative path to any file; in this case we are scanning the sample-pci-xs.csv file, which is in the same directory as scan.py. This is a sample CSV file with 10 credit card numbers in it - you can download it in the tutorial's GitHub repo.
Next, we will initiate the scan request to Nightfall, by specifying our filepath, webhook URL where the scan results should be posted, and our Detection Rule that specifies what sensitive data we are looking for.
In this simple example, we have specified an inline Detection Rule that detects Likely Credit Card Numbers. This Detection Rule is a simple starting point that just scratches the surface of the types of detection you can build with Nightfall. Learn more about building inline detection rules here or how to configure them in the Nightfall Dashboard.
The scan_id is useful for identifying your scan results later.
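Putting the steps above together, scan.py might look roughly like the sketch below. The Nightfall calls reflect our understanding of the Python SDK (Nightfall, Detector, DetectionRule, Confidence, and scan_file returning an id and a message), so check them against the SDK documentation for your installed version. The helper names build_webhook_url and start_scan are hypothetical.

```python
FILEPATH = "sample-pci-xs.csv"  # sample file, relative to scan.py


def build_webhook_url(server_url: str) -> str:
    """Append the /ingest route to the public base URL of our server."""
    return f"{server_url.rstrip('/')}/ingest"


def start_scan(filepath: str, webhook_url: str):
    """Upload a file and ask Nightfall to scan it for likely credit card numbers."""
    # Imported here so the sketch can be loaded without the SDK installed;
    # in scan.py this import would normally sit at the top of the file.
    from nightfall import Confidence, DetectionRule, Detector, Nightfall

    # Assumption: with no arguments, the client reads the API key
    # from the NIGHTFALL_API_KEY environment variable.
    client = Nightfall()

    # Inline detection rule: a single detector for likely credit card numbers.
    rule = DetectionRule(
        [
            Detector(
                min_confidence=Confidence.LIKELY,
                nightfall_detector="CREDIT_CARD_NUMBER",
            )
        ]
    )

    scan_id, message = client.scan_file(
        filepath, webhook_url=webhook_url, detection_rules=[rule]
    )
    return scan_id, message
```

In scan.py you would then call start_scan(FILEPATH, build_webhook_url(os.environ["NIGHTFALL_SERVER_URL"])) and print the returned scan_id so you can match it against the webhook output later.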
View Sensitive Findings
Let's run scan.py to trigger our file scan job.
Once Nightfall has finished scanning the file, we'll see our Flask server receive the request at our webhook endpoint (/ingest). In our code above, we parse the webhook payload, and print the following when there are sensitive findings:
In our output, we are printing two URLs.
The first URL is provided to us by Nightfall. It is the temporary signed S3 URL that we can access to fetch the sensitive findings that Nightfall detected.
The second URL won't work yet; we'll implement it next. This is a URL we constructed in our ingest() method above: it calls /view and passes the Findings URL above as a URL-escaped query parameter.
Let's add a method to our Flask server that opens this URL and displays the findings in a formatted table so that the results are easier to view than downloading them as JSON.
We'll do this by adding a view method that responds to GET requests to the /view route. The /view route will read the S3 Findings URL from a query parameter. It will then open the findings URL, parse the response as JSON, pass the results to an HTML template, and display the results in a simple HTML table using Jinja. Jinja is a simple templating engine in Python.
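The first steps of that flow (read the query parameter, fetch the URL, parse the JSON) can be sketched with the standard library alone. This is not the Flask route itself: findings_url_from_query and load_findings are hypothetical helpers, and the top-level "findings" key is our assumption about the shape of the findings document.

```python
import json
from typing import List, Optional
from urllib.parse import parse_qs
from urllib.request import urlopen


def findings_url_from_query(query_string: str) -> Optional[str]:
    """Pull the URL-escaped findings_url parameter out of a raw query string."""
    values = parse_qs(query_string).get("findings_url")
    return values[0] if values else None


def load_findings(findings_url: str) -> List[dict]:
    """Download the findings document and return its list of findings."""
    with urlopen(findings_url) as response:
        data = json.loads(response.read().decode("utf-8"))
    # Assumed shape: {"findings": [...]}; fall back to an empty list.
    return data.get("findings", [])
```

Inside a Flask route, request.args.get("findings_url") does the job of the first helper. Since the URL comes from the query string, a production version would want to check that it points at an expected host before fetching it.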
Add the following to our Flask server in app.py:
Create the Table View
To display the findings in an HTML table, we'll create a new Flask template. Create a folder in your project directory called templates and add a new file within it called view.html.
Our template uses Jinja to iterate through our findings, and create a table row for each sensitive finding.
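To show what that loop produces without involving Jinja, here is the same idea in plain Python: one table row per finding, with every value HTML-escaped. This swaps string formatting in for the template engine purely for illustration. findings_to_table is a hypothetical helper, and the fields it reads (finding, detector.name, confidence) are assumptions about the findings JSON.

```python
from html import escape
from typing import List


def findings_to_table(findings: List[dict]) -> str:
    """Render findings as an HTML table, one row per finding (field names assumed)."""
    rows = ["<tr><th>Finding</th><th>Detector</th><th>Confidence</th></tr>"]
    for item in findings:
        detector_name = (item.get("detector") or {}).get("name", "")
        cells = [item.get("finding", ""), detector_name, item.get("confidence", "")]
        # Escape each value so file contents cannot inject markup into the page.
        rows.append(
            "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in cells) + "</tr>"
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

Escaping is the important part: findings are substrings of whatever file was scanned, so they should never be written into the page unescaped.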
Now, if we restart our Flask server, trigger a file scan request, and navigate to the "View" URL printed in the server logs, we should see a formatted table with our results! In fact, we can input any Nightfall-provided signed S3 URL (after URL-escaping it) in the findings_url parameter of the /view route to view it.
Deploy on Render
As a longtime Heroku user, I was initially inclined to write this tutorial with instructions to deploy our app on Heroku. However, new PaaS vendors have been emerging and I was curious to try them out and see how they compare to Heroku. One such vendor is Render, which is where we'll deploy our app.
Deploying our service on Render is straightforward. If you're familiar with Heroku, the process is quite similar. Once you've signed up or logged into Render (free), we'll do the following:
Create a new Web Service on Render, and permit Render to access your new repo. Use the following values during creation:
Environment: Python
Build Command: pip install -r requirements.txt
Start Command: gunicorn app:app
Let's also set our environment variables during creation. These are the same values we set locally.
Scan a file (in production)
Once Render has finished deploying, you'll get the base URL of your application. Set this as your NIGHTFALL_SERVER_URL locally and re-run scan.py - this time, the file scan request is served by your production Flask server running on Render!
To confirm this, navigate to the Logs tab in your Render app console. You'll see the webhook's output of your file scan results:
Navigate to the View link above in your browser to verify that you can see the results formatted in a table on your production site.
Congrats, you've successfully created a file scanning server and deployed it in production! You're now ready to build more advanced business logic around your file scanner. Here are some ideas on how to extend this tutorial:
Use WebSockets to send a notification back from the webhook to the client that initiated the file scan request
Build a more advanced detection rule using pre-built or custom detectors
Add a user interface to add more interactive capabilities, for example allowing users to upload files or read files from URLs