The file scan API has first-class support for text extraction and scanning on all MIME types enumerated below.
Certain file types, such as tabular data and archives of Git repositories, receive special handling that results in more precise information about the location of findings within the source file.
application/json
application/x-ndjson
application/x-php
text/calendar
text/css
text/csv (treated as tabular data and may be redacted)
text/html
text/javascript
text/plain
text/tab-separated-values (treated as tabular data)
text/tsv (treated as tabular data)
text/x-php
application/pdf
application/vnd.openxmlformats-officedocument.presentationml.presentation
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (treated as tabular data)
application/vnd.openxmlformats-officedocument.wordprocessingml.document
application/vnd.ms-excel (treated as tabular data)
application/bzip2
application/ear
application/gzip
application/jar
application/java-archive
application/tar+gzip
application/vnd.android.package-archive
application/war
application/x-bzip2
application/x-gzip
application/x-rar-compressed
application/x-tar
application/x-webarchive
application/x-zip-compressed
application/x-zip
application/zip
image/apng
image/avif
image/gif
image/jpeg
image/jpg
image/png
image/svg+xml
image/tiff
image/webp
The file scan API explicitly rejects requests with MIME types that are not conducive to extracting or scanning text. Sample rejected MIME types include:
application/photoshop
audio/midi
audio/wav
video/mp4
video/quicktime
File scans of Microsoft Office, Apache Parquet, CSV, and tab-separated files will provide additional properties to locate findings within the document beyond the standard byteRange, codepointRange, and lineRange properties.
Findings will contain a columnRange and a rowRange that allow you to identify the specific row and column within the tabular data where the finding is present.
This functionality applies to the following MIME types:
text/csv
text/tab-separated-values
text/tsv
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.ms-excel
Apache parquet data files are also accepted.
Below is a sample match of a spreadsheet containing dummy PII where an SSN was detected in the 2nd column and 55th row.
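As a rough illustration of how these properties might be consumed, the sketch below pulls the row and column span out of a finding. It assumes each range is an object with start and end keys nested under a location key; that shape, and the helper name cell_span, are assumptions to check against an actual findings payload.

```python
from typing import Any, Dict, Tuple


def cell_span(finding: Dict[str, Any]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((first_row, last_row), (first_col, last_col)) for a tabular finding."""
    location = finding["location"]   # assumed container key
    rows = location["rowRange"]      # assumed shape: {"start": int, "end": int}
    cols = location["columnRange"]   # assumed shape: {"start": int, "end": int}
    return (rows["start"], rows["end"]), (cols["start"], cols["end"])


# Hypothetical finding for an SSN in the 2nd column of the 55th row.
example = {
    "location": {
        "rowRange": {"start": 55, "end": 55},
        "columnRange": {"start": 2, "end": 2},
    }
}
print(cell_span(example))  # ((55, 55), (2, 2))
```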
Findings within csv files may be redacted.
To enable redaction in files, set the enableFileRedaction flag of your policy to "true".
The csv file will be redacted based on the configuration of the defaultRedactionConfig of the policy.
Below is an example curl request for a csv file that has already been uploaded.
When results are sent to the location specified in the alertConfig (in this case an email address), a redactedFile property will be set with a fileURL in addition to the findingsURL.
This redacted file will be a modified version of the original csv file.
Below is an example of a redacted csv file.
Nightfall provides special handling for archives of Git repositories.
Nightfall will scan the repository history to discover findings in particular commits, returning the hash of the commit in which each finding was made.
In order to scan the repository, you will need to create a clone, i.e.
git clone https://github.com/nightfallai/nightfall-go-sdk.git
This creates a clone of the Nightfall Go SDK.
You will then need to create an archive that can be uploaded using Nightfall's file scanning sequence.
zip -r directory.zip directory
Note that for this to work, the hidden directory .git (where Git stores the repository history) must be included in the archive.
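The archiving step can also be scripted. The sketch below is one possible approach using Python's standard library: it zips a cloned repository with its top-level folder preserved, then checks that the hidden Git metadata directory made it into the archive. The function name archive_repo and the default directory name .git (where Git keeps history) are assumptions made for illustration.

```python
import os
import shutil
import zipfile


def archive_repo(repo_dir: str, archive_base: str, metadata_dir: str = ".git") -> str:
    """Zip repo_dir (hidden files included) and confirm metadata_dir is inside."""
    repo_dir = os.path.abspath(repo_dir)
    parent, name = os.path.split(repo_dir)
    # root_dir/base_dir keep the top-level folder name inside the archive,
    # similar to running `zip -r directory.zip directory` from the parent folder.
    archive_path = shutil.make_archive(
        os.path.abspath(archive_base), "zip", root_dir=parent, base_dir=name
    )
    prefix = f"{name}/{metadata_dir}/"
    with zipfile.ZipFile(archive_path) as zf:
        if not any(entry.startswith(prefix) for entry in zf.namelist()):
            raise ValueError(f"{metadata_dir} directory missing from {archive_path}")
    return archive_path
```

Calling archive_repo("nightfall-go-sdk", "nightfall-go-sdk") from the directory that contains the clone would produce nightfall-go-sdk.zip, or raise if the metadata directory was left out.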
When you initiate the file upload sequence with this file, you will receive scan results that contain the commitHash property filled in.
Using the Nightfall Go SDK archive created above, a simple example would be to scan for URLs (i.e. strings starting with http:// or https://), which will send results such as the following:
Large repositories result in a large volume of data sent at once. We are working on changes to allow these and other large surges of data to be processed in a more controlled manner, and will increase the limit or remove it altogether once those changes are complete.
To retrieve the specific checkout, you will need to clone the repository, i.e.
git clone https://github.com/nightfallai/nightfall-go-sdk.git
You can then check out the specific commit using the commit hash returned by Nightfall.
Note that you are in a 'detached HEAD' state when working with this sort of checkout of a repository.
Sensitive Data in GitHub Repositories
If a finding in a GitHub repository is sensitive, it should be considered compromised and appropriate mitigation steps should be taken (e.g. secrets should be rotated).
Nightfall’s file scan API allows a user to upload a file in chunks, then to scan it with Detection Rules once the upload is complete.
The scan will then be processed asynchronously before sending the results to the webhook URL that is provided along with your Detection Rules.
The following sequence diagram illustrates the full process for scanning a binary file with Nightfall.
In order to utilize the File Scanning API you need the following:
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (detailed information to follow)
As part of submitting a file scan request, the request payload must contain a reference to a URL defined as part of a policy defined inline.
When Nightfall prepares a file scan operation, it will issue a challenge to the webhook server to verify its legitimacy.
After the file scan has been processed asynchronously, the results will be delivered to the webhook.
For a file scan, your webhook will receive a request body that will be a JSON payload containing:
the upload UUID (uploadID)
a boolean indicating whether or not any data in the file matched the provided detection rules (findingsPresent)
a pre-signed S3 URL where the caller may fetch the findings for the scan (findingsURL). If there are no findings in the file, this field will be empty.
the date until which the findingsURL is valid (validUntil) formatted to . Results are valid for 24 hours after scan completion. The time will be in UTC.
the value you supplied for requestMetadata. Callers may opt to use this to help identify their input file upon receiving a webhook response. Maximum length 10 KB.
Below is an example of a payload sent to the webhook URL.
In this example, we have uploaded a zip file with a Python script (upload.py) and a README.md file. A Detector in our DetectionRule checks for the presence of the string http://localhost.
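A webhook handler typically only needs a few of these fields. The following is a minimal sketch, not Nightfall's own code, that parses a result payload and returns the upload ID together with the findings URL when findings are present. The field names come from the list above; the sample values and the function name scan_result are hypothetical.

```python
import json
from typing import Optional, Tuple


def scan_result(raw_body: bytes) -> Tuple[str, Optional[str]]:
    """Return (uploadID, findingsURL or None) from a file scan webhook payload."""
    payload = json.loads(raw_body)
    upload_id = payload["uploadID"]
    if not payload.get("findingsPresent"):
        return upload_id, None
    # findingsURL is described as empty when there are no findings.
    return upload_id, payload.get("findingsURL") or None


# Hypothetical payload; every value here is a placeholder.
sample = json.dumps({
    "uploadID": "00000000-0000-0000-0000-000000000000",
    "findingsPresent": True,
    "findingsURL": "https://example.com/findings.json",
    "requestMetadata": "upload.zip",
}).encode("utf-8")
print(scan_result(sample))
```

The findings URL should be fetched before the time given in validUntil, since results are only available for 24 hours after the scan completes.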
Nightfall's upload process is built to accommodate files of any size. Once files are uploaded, they may be scanned to detect potential violations.
Many users will find it more convenient to use our Client SDK libraries to complete the upload process.
Uploading files using Client SDK libraries requires fewer steps, as all the required API operations are wrapped in a single function call. Furthermore, these SDKs handle all the programmatic logic necessary to send files in smaller chunks to Nightfall.
For users who want to understand the entire end-to-end process, that is also outlined in this document. We will walk you through the order of operations necessary to upload the file.
Rather than requiring you to implement the upload functionality yourself, Nightfall's SDKs provide a single method that wraps the steps required to upload your file.
Below is an example of uploading a file from our and our .
To run the Node sample script you must compile it as TypeScript. Save it as a .ts file and run:
tsc <yourfilename>.ts -lib ES2015,DOM
You can then run the resulting JavaScript file:
NIGHTFALL_API_KEY=<YourApiKey> node yourscriptname.js
Note that these examples use an email address to receive the results for simplicity.
The upload process consists of 3 stages:
POST /v3/upload
As part of the initialization you must provide the total byte size of the file being uploaded.
You may also provide the mime-type, otherwise the system will attempt to determine it once the upload is complete.
The id of the returned JSON object will be used as the fileId in subsequent requests.
The chunkSize is the maximum number of bytes to upload in a single request during the uploading phase.
The size of these chunks is determined by the chunkSize value returned by the POST /v3/upload endpoint used in the previous step.
Below is a simple example where the file is smaller than the chunkSize, so it may safely be uploaded with one call to the upload endpoint.
If your file's size exceeds the chunkSize, you will need to send iterative requests as you read portions of the file's contents in order to upload the complete file. This means you will send multiple requests to the upload endpoint as shown above. As you do so, you will update the value of the X-Upload-Offset header based on the portion of the file being sent.
Each request should send a chunk of the file exactly chunkSize bytes long, except for the final uploaded chunk. The final uploaded chunk is allowed to contain fewer bytes, as the remainder of the file may be less than the chunkSize returned by the initialization step.
The request body should be the contents of the chunk being uploaded.
The value of the X-Upload-Offset header should be the byte offset, as an integer, specifying where to insert the data into the file. This byte offset is zero-indexed.
Successful calls to this endpoint return an empty response with an HTTP status code of 204.
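The chunking rule above can be sketched as a small helper that yields the zero-indexed offset and length of every chunk for a given file size. This is only an illustration of the arithmetic; the name chunk_plan is hypothetical, and the HTTP requests themselves are left out.

```python
from typing import Iterator, Tuple


def chunk_plan(file_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs; offset is the X-Upload-Offset for that chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < file_size:
        # Every chunk is chunk_size bytes except possibly the last one.
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


print(list(chunk_plan(25, 10)))  # [(0, 10), (10, 10), (20, 5)]
```

For each pair, you would seek to offset in the file, read length bytes, and send them as the request body with X-Upload-Offset set to offset.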
POST /v3/upload/<uploadUUID>/finish
When an upload completes successfully, the returned payload will indicate the mimeType the system determined the file to be, if it was not provided during upload initialization.
Once a file has been marked as completed, you may initiate a scan of the uploaded file.
After an upload is finalized, it can be scanned against a Detection Policy. A Detection Policy represents a pairing of:
a webhook URL
a set of detection rules to scan data against
You may also supply a value to the requestMetadata field to help identify the input file upon receiving a response to your webhook. This field has a maximum length of 10 KB.
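As a sketch of how a scan request body might be assembled, the helper below builds a JSON body with a single policy and optional requestMetadata, and enforces the two constraints stated in this guide: the webhook URL must be HTTPS and the metadata must not exceed 10 KB. The policy and webhookURL names come from this guide; the detectionRuleUUIDs field name, the interpretation of 10 KB as 10 * 1024 bytes, and the function name are assumptions to verify against the API reference.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlparse

MAX_METADATA_BYTES = 10 * 1024  # assumes "10 KB" means 10 * 1024 bytes


def build_scan_body(webhook_url: str, rule_uuids: List[str],
                    request_metadata: str = "") -> str:
    """Return a JSON request body containing exactly one policy."""
    if urlparse(webhook_url).scheme != "https":
        raise ValueError("webhookURL must be an HTTPS URL")
    if len(request_metadata.encode("utf-8")) > MAX_METADATA_BYTES:
        raise ValueError("requestMetadata exceeds 10 KB")
    body: Dict[str, Any] = {
        "policy": {
            "webhookURL": webhook_url,
            "detectionRuleUUIDs": rule_uuids,  # assumed field name
        }
    }
    if request_metadata:
        body["requestMetadata"] = request_metadata
    return json.dumps(body)
```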
Below is a sample Python script that handles the complete sequence of API calls to upload a file using a path specified as an argument.
For a detailed walkthrough of the API calls necessary to upload and scan a file and full script that shows the entire process, see
An active API Key authorized for file scanning, passed via the header Authorization: Bearer <key>
File scanning also support Nightfall's functionality for and as part of your scan requests.
If you follow the URL (before it expires) it will return a JSON representation of the findings similar to those returned by the endpoint.
You may also want to use a webhook. See for additional information on how to set up Webhook server to receive these results.
Once the upload is complete, you may initiate the
After we discuss each API call in the sequence, you will find a script that walks through the at the end of this guide.
The first step in the process of scanning a binary file is to initiate an upload in order to get a fileId
through the Initiate a .
Use the endpoint to upload the file contents in chunks.
See the below for an illustration as to how this upload process can be done programmatically.
Once all chunks are uploaded, mark the upload as completed using the .
The scanning process is asynchronous, with results being delivered to the webhook URL configured on the detection policy. See for more information about creating a Webhook server.
Exactly one policy
should be provided in the request body, which includes a webhookURL
to which the callback will be made once the file scan has been completed (this must be an HTTPS URL) as well as a Detection Rule as either an a or as a rule that has been .
The Nightfall API supports the ability to send asynchronous notifications when findings are detected as part of a scan request.
The supported destinations for these notifications include external platforms, such as Slack, email, or a URL to a SIEM log collector, as well as a webhook server.
Nightfall issues notifications under the following scenarios:
to notify a client about the results of a file scan request. File scans themselves are always performed asynchronously because of complexity relating to text extraction and data volume.
To create a webhook you will need to access your webhook Signing Key and then set up a webhook server.
For more information on how webhooks and asynchronous notifications are used please see our guides on:
Learn how to set up a server to handle results of file scans and alerts sent based on policy alert configurations.
Nightfall will send a POST request with a JSON payload with a single field challenge containing randomly-generated bytes when it sends a message to a user-provided webhook address. This is to ensure that the caller owns the server.
In order to authenticate your webhook server to Nightfall, you must reply with (1) a 200 HTTP Status Code, and (2) a plaintext response body containing only the value of the challenge key.
If Nightfall receives the expected value back, then the file scan operation will proceed; otherwise it will be aborted.
When a server responds successfully to a challenge request, the validity of that URL will be cached for up to 24 hours, after which it will need to be validated again.
If the webhook cannot be reached, you will receive an error with the code "40012" and the description "Webhook URL validation failed" when you initiate the scan.
If the webhook challenge fails, you will receive an error with the code "42201" and the description "Webhook returned incorrect challenge response" when you initiate the scan.
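The challenge exchange described above could be handled by a small, framework-independent function like the one sketched here. It returns the status code, content type, and body to send back when the request is a challenge, and None otherwise so the caller can treat the request as a normal notification. The function name and the exact content type string are assumptions.

```python
import json
from typing import Optional, Tuple


def answer_challenge(raw_body: bytes) -> Optional[Tuple[int, str, str]]:
    """Return (status, content_type, body) for a challenge request, else None."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return None  # not JSON, so not a challenge
    # A challenge is described as a payload with the single field "challenge".
    if isinstance(payload, dict) and set(payload) == {"challenge"}:
        return 200, "text/plain", str(payload["challenge"])
    return None


print(answer_challenge(b'{"challenge": "abc123"}'))  # (200, 'text/plain', 'abc123')
```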
When a customer signs up for the developer platform, Nightfall automatically generates a unique signing secret for them.
This secret is used to sign requests to the customer's configured webhook URL.
If you have any concerns that your signing secret may have leaked, you can request rotation at any time by reaching out to Nightfall Customer Success.
For security purposes, the webhook includes a signature header containing an HMAC-SHA256 digital signature that customers may use to authenticate the client.
In order to authenticate requests to the webhook URL, customers may use the following algorithm:
Check for the presence of the headers X-Nightfall-Signature and X-Nightfall-Timestamp. If these headers are not both present, discard the request.
Read the entire request body into a string body.
Verify that the value in the X-Nightfall-Timestamp header (the POSIX time in seconds) occurred recently. This is to protect against replay attacks, so a threshold on the order of magnitude of minutes should be reasonable. If a request occurred too far in the past, it should be discarded.
Concatenate the timestamp and body with a colon delimiter, i.e. timestamp:body.
Compute the HMAC SHA-256 hash of the payload from the previous step, using your unique signing secret as the key. Encode this computed value in hex.
Compare the value of the X-Nightfall-Signature header to the value computed in the previous step. If the values match, authentication is successful, and processing should proceed. Otherwise, the request must be discarded.
The snippet below shows how you might implement this authentication validation in Python:
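One way the steps above might look in Python, using only the standard library, is sketched here. The five-minute replay window, the function name is_authentic, and the now parameter (included to make the check easy to test) are assumptions; the header names and the timestamp:body construction follow the algorithm as described. Note that header lookup on a plain dict is case-sensitive, so pass headers in the casing shown or use your framework's case-insensitive header object.

```python
import hashlib
import hmac
import time
from typing import Mapping, Optional

MAX_AGE_SECONDS = 5 * 60  # assumed replay window; the guide only says "minutes"


def is_authentic(headers: Mapping[str, str], body: bytes, signing_secret: str,
                 now: Optional[float] = None) -> bool:
    """Validate the signature and timestamp headers against the raw request body."""
    signature = headers.get("X-Nightfall-Signature")
    timestamp = headers.get("X-Nightfall-Timestamp")
    if not signature or not timestamp:
        return False
    try:
        sent_at = int(timestamp)  # POSIX time in seconds
    except ValueError:
        return False
    current = time.time() if now is None else now
    if current - sent_at > MAX_AGE_SECONDS:
        return False  # too old, possible replay
    payload = timestamp.encode("utf-8") + b":" + body
    expected = hmac.new(signing_secret.encode("utf-8"), payload,
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```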
An example implementation of a simple webhook server is below.
You can test your webhook with a tool such as ngrok, which allows you to expose a web server running on your local machine to the internet.
In the above example, the webhook server is running on port 8075. To route ngrok requests to this server, once you run the Python script (having installed the necessary dependencies such as getenv and Flask), you would run ngrok as follows:
./ngrok http 8075
See the section on Alerting for details about the json payloads for the different messages sent to webhook servers.
Nightfall supports Detectors that will scan for file names, file types, and file fingerprints.
In addition to scanning the content of files, you may configure the Detectors to scan file names as well.
This is done through the “scope” attribute of a Detector.
The scope attribute allows you to scan either within file contents, the file name, or both the file contents and file name.
File extensions can be scanned for by creating a Regular Expression type custom Detector with a scope to scan only file names ("File") or both the content and file name ("ContentAndFile"), as shown in the example request below.
In addition to scanning based on file name, you may also use a File Type Detector which allows you to scan for files based on their mime-type.
Note that confidence sensitivity does not apply to file names. Sensitive findings will always be reported on.
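Before creating a file name Detector, a candidate pattern for file extensions can be tried locally. The sketch below uses Python's re module with a hypothetical pattern and hypothetical file names; Nightfall's regular expression engine may differ in syntax details, so treat this as illustrative only.

```python
import re
from typing import Iterable, List

# Hypothetical pattern: file names ending in .pem, .key, or .p12.
EXTENSION_PATTERN = re.compile(r"\.(pem|key|p12)$", re.IGNORECASE)


def flagged(file_names: Iterable[str]) -> List[str]:
    """Return the file names whose extension matches the pattern."""
    return [name for name in file_names if EXTENSION_PATTERN.search(name)]


print(flagged(["server.pem", "notes.txt", "backup.KEY", "key.md"]))
# ['server.pem', 'backup.KEY']
```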
Nightfall’s File Type detection allows you to implement compliance policies that detect and alert you when particular file types that are not allowed in a given location are discovered.
This functionality is implemented by creating a specific Detector called a “File Type Detector”.
To create a File Type Detector, select “Detectors” from the left hand navigation and click the button labeled “+New Detector” in the upper right hand corner. From there a drop down list of Detector types will be displayed which will include the “File Type” Detector type.
You will then select one or more file types for which to scan by selecting from a list of mime-types.
You can either scroll through the list of mime-types in the select box or you may type in a portion of the mime-type and the contents of the select box will be filtered to match your input.
Nightfall supports detection for a wide variety of mime-types. See the Internet Assigned Numbers Authority’s (IANA) website for a definitive list of mime-types. Note however that Nightfall does not support the detection of audio and video related mime-types.
Detection of file types is done based on the file contents, not its extension. However, you can create Detectors that scan file names by setting the scope attribute.
File Type Detectors differ from other Nightfall Detectors in that the scope and confidence attributes are not relevant to them.
Once you have added all the mime-types you wish to scan for, save your new Detector. You may then add your new Detector to Detection Rules and Policies.
Nightfall allows you to discover the location of specific files that you have deemed sensitive and want to avoid sharing.
This discovery is done through document fingerprinting. Fingerprinting is the process of algorithmically creating a unique identifier for a file by mapping the data of the document to a signature that can be recalled quickly. This allows the file to be identified in a manner akin to how human fingerprints uniquely identify individual people.
This functionality is achieved in Nightfall by creating a specific Detector type called a File Fingerprint Detector.
The Fingerprint Detector allows you to create a fingerprint for one or more files (a sort of “handful” of fingerprints, if you will).
To create a Fingerprint Detector, select “Detectors” from the left hand navigation and click the button labeled “+New Detector” in the upper right hand corner. From there a drop down list of Detector types will be displayed which will include the “Fingerprint” Detector type.
When you create a File Fingerprint Detector you can upload up to 50 files that need to be fingerprinted. The file size limit is 25MB.
Once the fingerprint is generated, the actual content of the file is discarded so no sensitive content is stored on Nightfall’s system.
These Detectors may only be created through the console.
You may then treat the Fingerprint detector like any other Detector and incorporate it into a Detection Rule using its unique Detector identifier.
You may incorporate these Detectors into Policies that will alert you whenever files that match the fingerprint are detected.
In order to accept requests from Nightfall, a Webhook server must use a signing key to verify requests.
To access or generate your Webhook signing key, start by logging in to the Nightfall dashboard.
Select the Developer Platform > Manage API Keys using the navigation bar on the left side of the page. You will see the Webhook signing section:
Unlike the API Key, it is possible to reveal the signing key via the "eye" icon furthest to the left of the three icons displayed.
You may copy the current value to your clipboard with the "copy" icon in the center of the three icons displayed.
You may also regenerate the key with the circular arrow icon furthest to the right.
Use this value as shown in the code examples that are used in the following sections.