How to scan for sensitive data in Airtable
Airtable is a popular cloud collaboration tool that lands somewhere between a spreadsheet and a database. As such, it can house all sorts of sensitive data that you may not want to surface in a shared environment.
By utilizing Airtable's API in conjunction with Nightfall AI’s scan API, you can discover, classify, and remediate sensitive data within your Airtable bases.
You will need a few things to follow along with this tutorial:
An Airtable account and API key
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment (version 3.7 or later)
The most recent version of the Nightfall Python SDK
Install the Nightfall SDK and the requests library using pip.
To start, import all the libraries we will be using.
The json, os, and csv modules are part of the Python standard library, so we don't need to install them.
We've configured the Airtable and Nightfall API keys as environment variables so they are not written directly into the code.
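A minimal sketch of reading both keys from the environment is shown here. The variable names AIRTABLE_API_KEY and NIGHTFALL_API_KEY are assumptions; use whatever names you exported in your shell. The helper fails early with a readable message if either key is missing.

```python
import os


def load_api_keys(env=os.environ):
    """Return (airtable_key, nightfall_key), failing early if either is unset."""
    # Assumed variable names; adjust them to match your own environment.
    names = ("AIRTABLE_API_KEY", "NIGHTFALL_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["AIRTABLE_API_KEY"], env["NIGHTFALL_API_KEY"]
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dictionary.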
Next, we define the Detection Rule with which we wish to scan our data.
The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.
We also instantiate a Nightfall client object from the SDK, which authenticates with our API key.
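The setup described above might look roughly like the following. The environment variable NIGHTFALL_DETECTION_RULE_UUID and the helper make_nightfall_client are hypothetical names. The sketch also assumes that the SDK's Nightfall class, when constructed without arguments, reads the key from NIGHTFALL_API_KEY; check the SDK README for your installed version.

```python
import os

# Hypothetical variable name; the UUID itself is copied from the Nightfall web app.
DETECTION_RULE_UUID = os.environ.get("NIGHTFALL_DETECTION_RULE_UUID", "")


def make_nightfall_client():
    """Create the SDK client used for the scan calls."""
    # Imported lazily so this module still loads where the SDK is not installed.
    from nightfall import Nightfall

    # Assumption: with no arguments, the client reads NIGHTFALL_API_KEY itself.
    return Nightfall()
```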
The Airtable API doesn't list all bases in a workspace or all tables in a base; instead, you must specifically call each table to get its contents.
In this example, we have set up a config.json file to store that information for the bases in the Airtable workspace My First Workspace. You may also wish to consider setting up a separate Base and Table that stores your schema, and retrieving that information with a call to the Airtable API.
As an extension of this exercise, you could write Nightfall findings back to another table within that Base.
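One possible layout for config.json is sketched here. The key names (base_name, base_id, tables) and the sample bases are hypothetical, chosen only to illustrate the idea of listing every table explicitly; the helper iter_tables is likewise illustrative.

```python
import json

# Hypothetical config.json contents: one entry per base, listing its tables.
EXAMPLE_CONFIG = """
[
  {"base_name": "Product Planning", "base_id": "appXXXXXXXXXXXXXX",
   "tables": ["Stories", "Epics"]},
  {"base_name": "Sales CRM", "base_id": "appYYYYYYYYYYYYYY",
   "tables": ["Contacts"]}
]
"""


def iter_tables(config):
    """Yield (base_name, base_id, table_name) for every table in the config."""
    for base in config:
        for table in base["tables"]:
            yield base["base_name"], base["base_id"], table


# In practice the text would come from the file: json.load(open("config.json")).
config = json.loads(EXAMPLE_CONFIG)
```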
Now we set up the parameters we will need to call the Airtable API using the previously referenced API key and config file.
We will now call the Airtable API to retrieve the contents of our Airtable workspace. The data hierarchy in Airtable goes Workspace > Base > Table. We will need to perform a GET request on each table in turn.
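A sketch of the per-table GET request follows. It uses urllib from the standard library instead of requests, purely to keep the example dependency-free. The helper names are hypothetical. Airtable returns records in pages and includes an offset value in the response while more pages remain, so the loop keeps requesting until no offset comes back.

```python
import json
import urllib.parse
import urllib.request

AIRTABLE_URL = "https://api.airtable.com/v0"


def get_page(api_key, base_id, table, offset=None):
    """Fetch one page of records from a table (performs a network call)."""
    url = f"{AIRTABLE_URL}/{base_id}/{urllib.parse.quote(table, safe='')}"
    if offset:
        url += "?" + urllib.parse.urlencode({"offset": offset})
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_records(fetch_page):
    """Collect records across pages; fetch_page(offset) returns one decoded page."""
    records, offset = [], None
    while True:
        page = fetch_page(offset)
        records.extend(page.get("records", []))
        offset = page.get("offset")
        if not offset:
            return records
```

A call such as fetch_all_records(lambda off: get_page(key, base_id, table, off)) would then return every record in one table.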
As we go along, we will convert each field value into a string enriched with identifying metadata, so that we can locate and remediate the data later if sensitive findings occur.
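One way to attach that metadata is sketched here. The format is an assumption made for illustration: a single-line JSON header (base, table, record, field) followed by a newline and the field value. The function name is hypothetical. Because json.dumps escapes newlines, the header always occupies exactly the first line.

```python
import json


def to_scannable_strings(base_id, table, records):
    """Turn Airtable records into strings of the form '<json header>\\n<value>'."""
    strings = []
    for record in records:
        for field, value in record.get("fields", {}).items():
            header = json.dumps(
                {"base": base_id, "table": table,
                 "record": record["id"], "field": field}
            )
            strings.append(f"{header}\n{value}")
    return strings
```

Note that the header is scanned along with the value, so any byte offsets reported for a finding are relative to the whole string rather than to the field value alone.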
Warning: If you are sending more than 50,000 items or more than 500KB, consider using the file API. You can learn more about how to use the file API in the Using the File Scanning Endpoint with Airtable section below.
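Whether a payload crosses those thresholds can be checked before sending it. The sketch below treats 500KB as 500,000 bytes of UTF-8, which is an assumption, and the function name is hypothetical.

```python
def exceeds_text_limits(strings, max_items=50_000, max_bytes=500_000):
    """Return True if the payload is too large for a single text scan request."""
    # Size is measured in encoded bytes, not characters.
    total_bytes = sum(len(s.encode("utf-8")) for s in strings)
    return len(strings) > max_items or total_bytes > max_bytes
```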
Before moving on we will define a helper function to use later so that we can unpack the metadata from the strings we send to the Nightfall API.
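A minimal version of such a helper is sketched here, assuming each sent string consists of a single-line JSON metadata header, a newline, and then the original field value. That format and the function name are illustrative assumptions; adapt the parsing to whatever format you used when building the strings.

```python
import json


def unpack_metadata(sent_string):
    """Split a sent string into (metadata dict, original field value)."""
    # Only the first newline separates header and value; later ones belong
    # to the value itself.
    header, _, value = sent_string.partition("\n")
    return json.loads(header), value
```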
We will begin constructing an all_findings object to collect our results. The first row of our all_findings object will constitute our headers, since we will dump this object to a CSV file later.
The example below stores the full finding. Because a finding may itself be a piece of sensitive data, we recommend using the Redaction feature of the Nightfall API to mask your data.
Now we call the Nightfall API on content retrieved from Airtable. For every sensitive data finding we receive, we strip out the identifying metadata from the sent string and store it with the finding in all_findings so we can analyze it later.
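The collection step might be sketched as follows. Several things here are assumptions: each sent string begins with a one-line JSON header holding base, table, record, and field; the SDK's scan_text call returns one list of findings per input string; and each finding exposes finding, detector_name, and confidence attributes. Verify these against the SDK version you have installed. The column names are hypothetical.

```python
import json

# Header row, so the object can be dumped straight to a CSV file later.
HEADERS = ["base", "table", "record", "field", "detector", "confidence", "finding"]


def collect_findings(sent_strings, findings_per_string):
    """Pair each finding with the metadata of the string it was found in."""
    all_findings = [HEADERS]
    for sent, findings in zip(sent_strings, findings_per_string):
        header, _, _ = sent.partition("\n")
        meta = json.loads(header)
        for f in findings:
            # Confidence may be an enum in the SDK; fall back to the raw value.
            confidence = getattr(f.confidence, "value", f.confidence)
            all_findings.append(
                [meta["base"], meta["table"], meta["record"], meta["field"],
                 f.detector_name, confidence, f.finding]
            )
    return all_findings


# Assumed SDK usage (not executed here):
#   findings, _ = nightfall.scan_text(sent_strings,
#                                     detection_rule_uuids=[rule_uuid])
#   all_findings = collect_findings(sent_strings, findings)
```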
Finally, we export our results to a CSV so they can be easily reviewed.
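The export can be as small as the sketch below, which assumes all_findings is a list of rows whose first row holds the headers. The function name and output path are hypothetical.

```python
import csv


def write_findings_csv(all_findings, path):
    """Write the header row and all finding rows to a CSV file."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(all_findings)
```

Since the resulting file may contain raw findings, it is worth storing it with the same care as the source data.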
That's it! You now have insight into all of the sensitive data stored within your Airtable workspace!
As a next step, you could write your findings to a separate 'Nightfall Findings' Airtable base for review, or you could update and redact confirmed findings in situ using the Airtable API.
The example above is specific to the Nightfall Text Scanning API. Files can be scanned with a similar process, but because file scanning involves more steps, it is broken down into the sections below.
To utilize the File Scanning API you need the following:
An active API key authorized for file scanning, passed via the Authorization: Bearer header (see Authentication and Security)
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
As in the text scanning walkthrough at the beginning of this tutorial, we first initialize our configuration and retrieve the data we want to scan from Airtable.
Next, we write the retrieved data to a .csv file.
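A sketch of this step is shown here, flattening records into one row per field so that each value stays traceable to its record. The column names and the function name are hypothetical, and the records are assumed to have the id and fields keys that the Airtable API returns.

```python
import csv


def write_records_csv(base_id, table, records, path):
    """Write one CSV row per field value and return the file path."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["base_id", "table", "record_id", "field", "value"])
        for record in records:
            for field, value in record.get("fields", {}).items():
                writer.writerow([base_id, table, record["id"], field, value])
    return path
```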
Using the above .csv file, begin the Scan API file upload process.
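Uploads are sent in chunks, so a small generator that walks the file and tracks byte offsets is a useful building block. The sketch below covers only that chunking logic. The endpoint sequence and header named in the comments are recalled from the Nightfall API reference and should be treated as assumptions to verify there; the function name is hypothetical.

```python
def iter_chunks(path, chunk_size):
    """Yield (offset, chunk_bytes) pairs covering the whole file in order."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield offset, chunk
            offset += len(chunk)


# Assumed upload sequence (verify against the API reference):
#   1. POST  /v3/upload              with the file size; returns an id and chunk size
#   2. PATCH /v3/upload/{id}         once per chunk, sending its offset in the
#                                    X-Upload-Offset header
#   3. POST  /v3/upload/{id}/finish  to mark the upload complete
```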
Once the files have been uploaded, use the scan endpoint.
A webhook server is required for the scan endpoint to submit its results. See our example webhook server.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
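The core of a webhook handler can be sketched independently of any web framework. This example assumes that Nightfall verifies a webhook URL by sending a payload containing a challenge value that must be echoed back, and that result payloads carry findingsPresent, uploadID, and findingsURL fields. These field names are recalled from the Nightfall documentation and should be confirmed there. The function name is hypothetical.

```python
def handle_webhook_payload(payload):
    """Return (status_code, response_body) for a decoded webhook JSON payload."""
    # Assumed verification handshake: echo the challenge value back as text.
    if "challenge" in payload:
        return 200, payload["challenge"]
    # Assumed result payload: log where the findings can be downloaded.
    if payload.get("findingsPresent"):
        print(
            f"Findings for upload {payload.get('uploadID')}: "
            f"{payload.get('findingsURL')}"
        )
    return 200, ""
```

A production server would also verify the request signature before trusting the payload.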