Scanning Files

Nightfall’s file scan API allows a user to upload a file in chunks, then to scan it with Detection Rules once the upload is complete.

The scan will then be processed asynchronously before sending the results to the webhook URL that is provided along with your Detection Rules.

The following sequence diagram illustrates the full process for scanning a binary file with Nightfall.

For a detailed walkthrough of the API calls necessary to upload and scan a file and full script that shows the entire process, see Uploading and Scanning Files.

Prerequisites

In order to utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning passed via the header Authorization: Bearer <key> — see Authentication and Security

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (detailed information to follow)

File scanning also supports Nightfall's functionality for Using Exclusion Rules and Using Context Rules as part of your scan requests.

Special File Types

Spreadsheets and Tabular Data

File scans of Microsoft Office, Apache Parquet, CSV, and tab-separated files will provide additional properties to locate findings within the document beyond the standard byteRange, codepointRange, and lineRange properties.

Findings will contain a columnRange and a rowRange that will allow you to identify the specific row and column within the tabular data wherein the finding is present.

This functionality is applicable to the following mime types:

  • text/csv

  • text/tab-separated-values

  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

  • application/vnd.ms-excel

Apache parquet data files are also accepted.

Below is a sample match of a spreadsheet containing dummy PII where a SSN was detected in the 2nd column and 55th row.

{
   "findings":[
      {
         "path":"Sheet1 (5)",
         "detector":{
            "id":"e30d9a87-f6c7-46b9-a8f4-16547901e069",
            "name":"US social security number (SSN)",
            "version":1
         },
         "finding":"624-84-9182",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":2505,
               "end":2516
            },
            "codepointRange":{
               "start":2452,
               "end":2463
            },
            "lineRange":{
               "start":55,
               "end":55
            },
            "rowRange":{
               "start":55,
               "end":55
            },
            "columnRange":{
               "start":2,
               "end":2
            },
            "commitHash":""
         },
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
...
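As a rough sketch of how the rowRange and columnRange properties might be consumed, the Python below walks a findings payload shaped like the sample above and reports the cell for each finding. The helper name tabular_locations and the trimmed sample payload are illustrative assumptions, not part of the Nightfall API.

```python
def tabular_locations(payload):
    """Return (path, detector name, row, column) for findings with tabular ranges."""
    cells = []
    for finding in payload.get("findings", []):
        location = finding.get("location", {})
        row = location.get("rowRange")
        column = location.get("columnRange")
        if not row or not column:
            continue  # no tabular position reported for this finding
        cells.append(
            (finding["path"], finding["detector"]["name"], row["start"], column["start"])
        )
    return cells


# Trimmed copy of the sample finding shown above.
sample = {
    "findings": [
        {
            "path": "Sheet1 (5)",
            "detector": {"name": "US social security number (SSN)"},
            "location": {
                "rowRange": {"start": 55, "end": 55},
                "columnRange": {"start": 2, "end": 2},
            },
        }
    ]
}

for path, name, row, column in tabular_locations(sample):
    print(f"{name} found in {path} at row {row}, column {column}")
```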

Git Repositories

Nightfall provides special handling for archives of GitHub repositories.

Nightfall will scan the repository history to discover findings in particular commits, returning the commit hash for each finding.

In order to scan the repository, you will need to create a clone, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

This creates a clone of the Nightfall Go SDK.

You will then need to create an archive that can be uploaded using Nightfall's file scanning sequence.

zip -r directory.zip directory

Note that in order to work, the hidden directory .git, which holds the commit history, must be included in the archive.

When you initiate the file upload sequence with this file, you will receive scan results that contain the commitHash property filled in.

Using the Nightfall Go SDK archive created above, a simple example would be to scan for URLs (i.e. strings starting with http:// or https://), which will send results such as the following:

{
   "findings":[
      {
         "path":"f607a067..53e59684/nightfall.go",
         "detector":{
            "id":"6123060e-2d9f-4f35-a7a1-743379ea5616",
            "name":"URL"
         },
         "finding":"https://api.nightfall.ai/\"",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":142,
               "end":168
            },
            "codepointRange":{
               "start":142,
               "end":168
            },
            "lineRange":{
               "start":16,
               "end":16
            },
            "rowRange":{
               "start":0,
               "end":0
            },
            "columnRange":{
               "start":0,
               "end":0
            },
            "commitHash":"53e59684d9778ceb0f0ed6a4b949c464c24d35ce"
         },
         "beforeContext":"tp\"\n\t\"os\"\n\t\"time\"\n)\n\nconst (\n\tAPIURL = \"",
         "afterContext":"\n\n\tDefaultFileUploadConcurrency = 1\n\tDef",
         "matchedDetectionRuleUUIDs":[
            "cda0367f-aa75-4d6a-904f-0311209b3383"
         ],
         "matchedDetectionRules":[
            
         ]
      },
 ...
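When a repository scan returns many findings, it can help to group them by commit before investigating. The following is a minimal sketch, assuming a findings payload shaped like the one above; the function name findings_by_commit and the trimmed sample are hypothetical.

```python
from collections import defaultdict


def findings_by_commit(payload):
    """Map each non-empty commitHash to the list of paths with findings in that commit."""
    grouped = defaultdict(list)
    for finding in payload.get("findings", []):
        commit = finding.get("location", {}).get("commitHash", "")
        if commit:  # non-repository findings carry an empty commitHash
            grouped[commit].append(finding["path"])
    return dict(grouped)


# Trimmed copy of the sample finding shown above.
sample = {
    "findings": [
        {
            "path": "f607a067..53e59684/nightfall.go",
            "location": {"commitHash": "53e59684d9778ceb0f0ed6a4b949c464c24d35ce"},
        }
    ]
}

for commit, paths in findings_by_commit(sample).items():
    print(commit, paths)
```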

Sensitive Data in GitHub Repositories

If a finding in a GitHub repository is sensitive, it should be considered compromised and appropriate mitigation steps should be taken (e.g. secrets should be rotated).

To inspect the specific commit, you will need to clone the repository, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

You can then check out the specific commit using the commit hash returned by Nightfall.

cd nightfall-go-sdk
git checkout 53e59684d9778ceb0f0ed6a4b949c464c24d35ce

Note that checking out a commit by its hash leaves the repository in a 'detached HEAD' state.

Creating a Webhook Server

Learn how to set up a server to handle results of file scans and alerts sent based on policy alert configurations.

Webhook Challenges

Nightfall will send a POST request with a JSON payload with a single field challenge containing randomly-generated bytes when it sends a message to a user-provided webhook address. This is to ensure that the caller owns the server.

{"challenge": "z78woE1uDFu7tPrPvEBV"}

In order to authenticate your webhook server to Nightfall, you must reply with (1) a 200 HTTP Status Code, and (2) a plaintext response body containing only the value of the challenge key.

If Nightfall receives the expected value back, then the file scan operation will proceed; otherwise it will be aborted.

When a server responds successfully to a challenge request, the validity of that URL will be cached for up to 24 hours, after which it will need to be validated again.

If the webhook cannot be reached, you will receive an error with the code "40012" and the description "Webhook URL validation failed" when you initiate the scan.

If the webhook challenge fails, you will receive an error with the code "42201" and the description "Webhook returned incorrect challenge response" when you initiate the scan.
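The challenge handshake can be sketched independently of any web framework. The helper below is a hypothetical illustration (the name challenge_response is not part of any Nightfall SDK): it returns the text to send back with a 200 status when the body is a challenge, and None when the body should be handled as a regular notification.

```python
import json


def challenge_response(raw_body):
    """Return the plaintext reply for a challenge request, or None if not a challenge."""
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError):
        return None  # not JSON, so it cannot be a challenge
    if isinstance(payload, dict) and isinstance(payload.get("challenge"), str):
        return payload["challenge"]
    return None


print(challenge_response('{"challenge": "z78woE1uDFu7tPrPvEBV"}'))
```

Keeping this check in a small function makes it easy to unit test before exposing the server to the internet.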

Webhook Signature Verification

When a customer signs up for the developer platform, Nightfall automatically generates a unique signing secret for them.

This secret is used to sign requests to the customer's configured webhook URL.

Signing Secret Security

The signing secret should never be stored in plaintext, as a leak compromises the authenticity of webhook requests.

If you have any concerns that your signing secret may have leaked, you can request rotation at any time by reaching out to Nightfall Customer Success.

For security purposes, the webhook includes a signature header containing an HMAC-SHA256 digital signature that customers may use to authenticate the client.

In order to authenticate requests to the webhook URL, customers may use the following algorithm:

  1. Check for the presence of the headers X-Nightfall-Signature and X-Nightfall-Timestamp. If these headers are not both present, discard the request.

  2. Read the entire request body into a string body.

  3. Verify that the value in the X-Nightfall-Timestamp header (the POSIX time in seconds) occurred recently. This is to protect against replay attacks, so a threshold on the order of magnitude of minutes should be reasonable. If a request occurred too far in the past, it should be discarded.

  4. Concatenate the timestamp and body with a colon delimiter, i.e. timestamp:body.

  5. Compute the HMAC SHA-256 hash of the payload from the previous step, using your unique signing secret as the key. Encode this computed value in hex.

  6. Compare the value of the X-Nightfall-Signature header to the value computed in the previous step. If the values match, authentication is successful, and processing should proceed. Otherwise, the request must be discarded.

The snippet below shows how you might implement this authentication validation in Python:

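from datetime import datetime, timedelta
import hmac
import hashlib

from flask import request

SIGNING_SECRET = "super-secret"  # placeholder; load the real secret from secure storage

# The checks below are meant to run inside a Flask request handler.
given_signature = request.headers.get('X-Nightfall-Signature')
req_timestamp = request.headers.get('X-Nightfall-Timestamp')
# Step 1: both headers must be present.
if given_signature is None or req_timestamp is None:
    raise Exception("missing signature or timestamp header")
# Step 3: reject requests whose timestamp is not within the last five minutes.
now = datetime.now()
if not (now - timedelta(minutes=5) <= datetime.fromtimestamp(int(req_timestamp)) <= now):
    raise Exception("could not validate timestamp is within the last few minutes")
# Steps 4 and 5: sign "timestamp:body" with the signing secret and hex-encode it.
computed_signature = hmac.new(
    SIGNING_SECRET.encode(),
    msg=F"{req_timestamp}:{request.get_data(as_text=True)}".encode(),
    digestmod=hashlib.sha256
).hexdigest().lower()
# Step 6: compare in constant time to avoid leaking information through timing.
if not hmac.compare_digest(computed_signature, given_signature):
    raise Exception("could not validate signature of inbound request!")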
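To exercise signature verification locally without waiting for a real notification, you can sign a test payload with the same timestamp:body scheme. This is a minimal sketch using only the standard library; the function names sign_payload and is_valid and the five-minute window are assumptions chosen for illustration.

```python
import hashlib
import hmac
import time


def sign_payload(signing_secret, body, timestamp=None):
    """Return (timestamp, hex signature) for body using the timestamp:body scheme."""
    if timestamp is None:
        timestamp = int(time.time())
    message = f"{timestamp}:{body}".encode()
    signature = hmac.new(
        signing_secret.encode(), msg=message, digestmod=hashlib.sha256
    ).hexdigest()
    return str(timestamp), signature


def is_valid(signing_secret, body, timestamp, signature, max_age_seconds=300):
    """Check that the timestamp is recent and that the signature matches."""
    age = time.time() - int(timestamp)
    if age < 0 or age > max_age_seconds:
        return False
    _, expected = sign_payload(signing_secret, body, int(timestamp))
    return hmac.compare_digest(expected, signature)


body = '{"findingsPresent": false}'
timestamp, signature = sign_payload("super-secret", body)
print(is_valid("super-secret", body, timestamp, signature))
```

The two returned values correspond to the X-Nightfall-Timestamp and X-Nightfall-Signature headers, so they can be attached to a test POST sent to your own server.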

Example Webhook Server

An example implementation of a simple webhook server is below.

import hmac
import hashlib
from os import getenv, path, mkdir

from flask import Flask, request
import requests

app = Flask(__name__)

output_dir = "findings"

SIGNING_SECRET = getenv("NF_SIGNING_SECRET")


@app.route("/", methods=['POST'])
def hello():
    content = request.get_json(silent=True)
    challenge = content.get("challenge")
    if challenge:
        return challenge
    else:
        verify_signature()

        print(F"Received request metadata: {content['requestMetadata']}")
        print(F"Received errors: {content['errors']}")

        if not content["findingsPresent"]:
            print(F"No findings for {content['uploadID']}")
            return "", 200
        print(F"S3 findings valid until {content['validUntil']}")
        response = requests.get(content["findingsURL"])
        save_findings(content["uploadID"], response.text)
        return "", 200


def verify_signature():
    if SIGNING_SECRET is None:
        return
    given_signature = request.headers.get('X-Nightfall-Signature')
    nonce = request.headers.get('X-Nightfall-Timestamp')
    computed_signature = hmac.new(
        SIGNING_SECRET.encode(),
        msg=F"{nonce}:{request.get_data(as_text=True)}".encode(),
        digestmod=hashlib.sha256
    ).hexdigest().lower()
    if computed_signature != given_signature:
        raise Exception("could not validate signature of inbound request!")


def save_findings(scan_id, finding_json):
    if not path.isdir(output_dir):
        mkdir(output_dir)
    output_path = path.join(output_dir, f"{scan_id}.json")
    with open(output_path, "w+") as out_file:
        out_file.write(finding_json)
    print(F"Findings for {scan_id} written to {output_path}")


if __name__ == "__main__":
    app.run(port=8075)

You can test your webhook with a tool such as ngrok, which allows you to expose a web server running on your local machine to the internet.

In the above example, the webhook server is running on port 8075. To route ngrok requests to this server, once you run the Python script (having installed the necessary dependencies such as Flask and requests), you would run ngrok as follows:

./ngrok http 8075

See the section on Alerting for details about the JSON payloads for the different messages sent to webhook servers.

Accessing Your Webhook Signing Key

In order to accept requests from Nightfall, a Webhook server must use a signing key to verify requests.

To access or generate your Webhook signing key, start by logging in to the Nightfall dashboard.

Select Developer Platform > Manage API Keys using the navigation bar on the left side of the page. You will see the Webhook signing section:

Unlike the API Key, it is possible to reveal the signing key via the "eye" icon furthest to the left of the three icons displayed.

You may copy the current value to your clipboard with the "copy" icon in the center of the three icons displayed.

You may also regenerate the key with the circular arrow icon furthest to the right.

Use this value as shown in the code examples that are used in the following sections.

File Scanning and Webhooks

When you submit a file scan request, the request payload must include a webhook server URL as part of the policy defined inline in that request.

When Nightfall prepares a file scan operation, it will issue a challenge to the webhook server to verify its legitimacy.

After the file scan has been processed asynchronously, the results will be delivered to the webhook.

Webhook Payload and Findings for File Scans

For a file scan, your webhook will receive a request body that will be a JSON payload containing:

  • the upload UUID (uploadID)

  • a boolean indicating whether or not any data in the file matched the provided detection rules (findingsPresent)

  • a pre-signed S3 URL where the caller may fetch the findings for the scan (findingsURL). If there are no findings in the file, this field will be empty.

  • the date until which the findingsURL is valid (validUntil) formatted to RFC 3339. Results are valid for 24 hours after scan completion. The time will be in UTC.

  • the value you supplied for requestMetadata. Callers may opt to use this to help identify their input file upon receiving a webhook response. Maximum length 10 KB.

Below is an example of a payload sent to the webhook URL.

{
    "findingsURL": "https://files.nightfall.ai/asdfasdf-asdf-asdf-asdf-asdfasdfasdf.json?Expires=1635135397&Signature=asdfasdfQ2qTmPFnS9uD5I3QGEqHY2KlsYv4S-WOeEEROj~~x6W2slP2GvPPgPlYs~lwdr-mtJjVFu4LtyDhdfYezC7B0ysfJytyMIyAFriVMqOGsRJXqoQfsg8Ckd2b6kRcyDZXJE25cW8zBS08lyVwMBCsGS0BKSin8uSuD7pQu3QAubT7p~MPkfc6PSXYIJREBr3q4-8c7UnrYOAiXfSW1AmFE47rr3Wxh2TpU3E-Fxu-6e3DKN4q6meACdgZb2KHZo3e-NK7ug9f8sxBp1YT0n5oiVuW4KXguIyXWN~aKEHMa6DzZ4cUJ61LmnMzGndc2sVKhii39FHwTsYog__&Key-Pair-Id=asdfOPZ1EKX0YC",
    "validUntil": "2021-10-25T04:16:37.734633129Z",
    "uploadID": "152848af-2ac9-4e0a-8563-2b82343d964a",
    "findingsPresent": true,
    "requestMetadata": "",
    "errors": []
}
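Before fetching findings, a webhook handler may want to confirm that findings exist and that the pre-signed URL has not expired. The sketch below assumes validUntil always ends in Z (UTC), as in the example payload, and trims the nanosecond fraction to microseconds because Python's datetime has microsecond resolution. The function names and the shortened findingsURL are hypothetical.

```python
import re
from datetime import datetime, timezone


def parse_valid_until(value):
    """Parse an RFC 3339 UTC timestamp, trimming fractional seconds to microseconds."""
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", value)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    base, fraction = match.groups()
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def findings_url_if_usable(payload, now=None):
    """Return findingsURL when findings exist and the link has not expired, else None."""
    now = now or datetime.now(timezone.utc)
    if not payload.get("findingsPresent") or not payload.get("findingsURL"):
        return None
    if parse_valid_until(payload["validUntil"]) <= now:
        return None
    return payload["findingsURL"]


example = {
    "findingsURL": "https://files.nightfall.ai/example.json",  # shortened placeholder
    "validUntil": "2021-10-25T04:16:37.734633129Z",
    "uploadID": "152848af-2ac9-4e0a-8563-2b82343d964a",
    "findingsPresent": True,
}
print(parse_valid_until(example["validUntil"]).isoformat())
```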

If you follow the URL (before it expires) it will return a JSON representation of the findings similar to those returned by the Scan Plain Text endpoint.

In this example, we have uploaded a zip file with a Python script (upload.py) and a README.md file. A Detector in our DetectionRule checks for the presence of the string http://localhost:

{
   "findings":[
      {
         "path":"fileupload/upload.py",
         "detector":{
            "id":"58861dee-b213-4dbc-97fa-a148acb8bd1a",
            "name":"localhost url"
         },
         "finding":"http://localhost",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":105,
               "end":121
            },
            "codepointRange":{
               "start":105,
               "end":121
            },
            "lineRange":{
               "start":7,
               "end":7
            }
         },
         "beforeContext":"PLOAD_URL = getenv(\"FILE_UPLOAD_HOST\", \"",
         "afterContext":":8080/v3\")\nNF_API_KEY = getenv(\"NF_API_K",
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
      {
         "path":"fileupload/README.md",
         "detector":{
            "id":"58861dee-b213-4dbc-97fa-a148acb8bd1a",
            "name":"localhost url"
         },
         "finding":"http://localhost",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":570,
               "end":586
            },
            "codepointRange":{
               "start":570,
               "end":586
            },
            "lineRange":{
               "start":22,
               "end":22
            }
         },
         "beforeContext":"t the script will send the requests to `",
         "afterContext":":8080`, but this can be overridden using",
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
      {
         "path":"fileupload/README.md",
         "detector":{
            "id":"58861dee-b213-4dbc-97fa-a148acb8bd1a",
            "name":"localhost url"
         },
         "finding":"http://localhost",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":965,
               "end":981
            },
            "codepointRange":{
               "start":965,
               "end":981
            },
            "lineRange":{
               "start":26,
               "end":26
            }
         },
         "beforeContext":"ice deployment you want to connect to | ",
         "afterContext":":8080 |\n| `NF_API_KEY`      | the API Ke",
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      }
   ]
}

Webhooks and Asynchronous Notifications

The Nightfall API supports the ability to send asynchronous notifications when findings are detected as part of a scan request.

The supported destinations for these notifications include external platforms, such as Slack, email, or a URL for a SIEM log collector, as well as a webhook server.

Nightfall issues notifications under the following scenarios:

  • to notify a client about the results of a file scan request. File scans themselves are always performed asynchronously because of complexity relating to text extraction and data volume.

  • to notify a client about results from a text scan request. Although results are already delivered synchronously in the response object, clients may configure the request to forward results to other platforms such as a webhook, SIEM endpoint, or email through a policy.

To create a webhook you will need to access your webhook Signing Key and then set up a webhook server.

For more information on how webhooks and asynchronous notifications are used please see our guides on:

  • Alerting

  • Using Policies

  • File Scanning and Webhooks

Specialized File Detectors

Nightfall supports Detectors that will scan for file names, file types, and file fingerprints.

Detecting File Names

In addition to scanning the content of files, you may configure the Detectors to scan file names as well.

This is done through the “scope” attribute of a Detector.

The scope attribute allows you to scan either within file contents, the file name, or both the file contents and file name.

File extensions can be scanned for by creating a Regular Expression type custom Detector with a scope to scan only file names ("File") or both the content and file name ("ContentAndFile"), as shown in the example request below.

curl --request POST \
     --url https://api.nightfall.ai/v3/upload/<fileid>/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-<yourNightfallKey>' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "regex": {
                                   "pattern": ".*\\.txt",
                                   "isCaseSensitive": false
                              },
                              "detectorType": "REGEX",
                              "scope": "ContentAndFile"
                         }
                    ],
                    "name": "File Name Detector",
                    "logicalOp": "ANY"
               }
          ]
     }
}
'

In addition to scanning based on file name, you may also use a File Type Detector which allows you to scan for files based on their mime-type.

Note that confidence sensitivity does not apply to file names. Sensitive findings will always be reported on.
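A request body like the one above can also be built in Python, where json.dumps takes care of escaping the backslash in the regular expression. This is a minimal sketch: the pattern .*\.txt is an assumed example for matching .txt file names, and the local check uses Python's re module, whose matching rules may differ from those Nightfall applies.

```python
import json
import re

FILE_NAME_PATTERN = r".*\.txt"  # assumed example pattern for .txt file names


def file_name_policy(pattern):
    """Build a scan request body with a single regex Detector scoped to content and file name."""
    return {
        "policy": {
            "detectionRules": [
                {
                    "detectors": [
                        {
                            "regex": {"pattern": pattern, "isCaseSensitive": False},
                            "detectorType": "REGEX",
                            "scope": "ContentAndFile",
                        }
                    ],
                    "name": "File Name Detector",
                    "logicalOp": "ANY",
                }
            ]
        }
    }


def matches_file_name(name, pattern=FILE_NAME_PATTERN):
    """Local sanity check of the pattern against a file name, ignoring case."""
    return re.fullmatch(pattern, name, flags=re.IGNORECASE) is not None


body = json.dumps(file_name_policy(FILE_NAME_PATTERN))
print(body)
print(matches_file_name("notes.TXT"), matches_file_name("notes.txt.bak"))
```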

Detecting File Types

Nightfall’s File Type detection allows you to implement compliance policies that detect and alert you when particular file types that are not allowed in a given location are discovered.

This functionality is implemented by creating a specific Detector called a “File Type Detector”.

To create a File Type Detector, select “Detectors” from the left hand navigation and click the button labeled “+New Detector” in the upper right hand corner. From there a drop down list of Detector types will be displayed which will include the “File Type” Detector type.

You will then select one or more file types for which to scan by selecting from a list of mime-types.

You can either scroll through the list of mime-types in the select box or you may type in a portion of the mime-type and the contents of the select box will be filtered to match your input.

Nightfall supports detection for a wide variety of mime-types. See the Internet Assigned Numbers Authority’s (IANA) website for a definitive list of mime-types. Note however that Nightfall does not support the detection of audio and video related mime-types.

Detection of file types is done based on the file contents, not its extension. However, you can create Detectors that scan file names by setting the scope attribute.
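To illustrate what content-based detection means in general, the sketch below identifies a few formats from their leading bytes ("magic numbers") rather than from the file name. It is not Nightfall's implementation, which is not described here; the table covers only three well-known signatures and the function name sniff_mime_type is hypothetical.

```python
# Well-known leading byte signatures for a few common formats.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime_type(data):
    """Return a mime-type guessed from the first bytes of data, or None if unknown."""
    for signature, mime_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return mime_type
    return None


# A PDF saved with a misleading ".txt" extension is still recognised by its content.
print(sniff_mime_type(b"%PDF-1.7\n..."))
```

Note that container formats complicate this picture: an .xlsx file, for example, also begins with the zip signature, so real detectors need to look further into the content.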

File Type Detectors differ from other Nightfall Detectors in that the attributes of scope and confidence are not relevant to File Type Detectors.

Once you have added all the mime-types you wish to scan for, save your new Detector. You may then add your new Detector to Detection Rules and Policies.

Detecting Files Through Fingerprinting

Nightfall allows you to discover the location of specific files that you have deemed sensitive and want to avoid sharing.

This discovery is done through document fingerprinting. Fingerprinting is the process of algorithmically creating a unique identifier for a file by mapping the data of the document to a signature that can be recalled quickly. This allows the file to be identified in a manner akin to how human fingerprints uniquely identify individual people.
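The general idea of mapping a file's data to a compact signature can be illustrated with a cryptographic hash. The sketch below uses SHA-256, which only matches byte-identical files; it is an assumption chosen for illustration and not a description of the algorithm Nightfall uses. The function names are hypothetical.

```python
import hashlib


def fingerprint(data):
    """Return a hex SHA-256 digest that stands in for a file's content."""
    return hashlib.sha256(data).hexdigest()


def matches_known_file(data, known_fingerprints):
    """Check whether data has the same fingerprint as any previously registered file."""
    return fingerprint(data) in known_fingerprints


# Only fingerprints are retained; the original bytes are not needed for matching.
known = {fingerprint(b"quarterly forecast, internal only")}

print(matches_known_file(b"quarterly forecast, internal only", known))
print(matches_known_file(b"quarterly forecast, public draft", known))
```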

This functionality is achieved in Nightfall by creating a specific Detector type called a File Fingerprint Detector.

The Fingerprint Detector allows you to create a fingerprint for one or more files (a sort of “handful” of fingerprints, if you will).

To create a Fingerprint Detector, select “Detectors” from the left hand navigation and click the button labeled “+New Detector” in the upper right hand corner. From there a drop down list of Detector types will be displayed which will include the “Fingerprint” Detector type.

When you create a File Fingerprint Detector you can upload up to 50 files that need to be fingerprinted. The file size limit is 25MB.

Once the fingerprint is generated, the actual content of the file is discarded so no sensitive content is stored on Nightfall’s system.

These Detectors may only be created through the console.

Updates to Fingerprinted Files

You cannot update Fingerprint Detectors, so any modification to an original file requires that you create a brand new Fingerprint Detector.

You may then treat the Fingerprint detector like any other Detector and incorporate it into a Detection Rule using its unique Detector identifier.

You may incorporate these Detectors into Policies that will alert you whenever files that match the fingerprint are detected.

Uploading and Scanning API Calls

Nightfall's upload process is built to accommodate files of any size. Once files are uploaded, they may be scanned with Detection Rules and Policies to detect potential violations.

Many users will find it more convenient to use our native language SDKs to complete the upload process.

Uploading files using Client SDK libraries requires fewer steps as all the required API operations are wrapped in a single function call. Furthermore these SDKs handle all the programmatic logic necessary to send files in smaller chunks to Nightfall.

For users that are looking to understand the entire upload process end-to-end, that is also outlined in this document. We will walk you through the order of operations necessary to upload the file.

Using Nightfall's SDKs to Upload Files

Rather than implementing the full sequence of API calls for the upload functionality yourself, you can use Nightfall’s native language SDKs, which provide a single method that wraps the steps required to upload your file.

Below is an example of uploading a file from our Python SDK and our Node SDK.

>>> from nightfall import Confidence, DetectionRule, Detector, Nightfall, EmailAlert, AlertConfig
>>> import os

>>> # use your API Key here
>>> nightfall = Nightfall("NF-y0uRaPiK3yG03sH3r3")

>>> # A rule contains a set of detectors to scan with
>>> cc = Detector(min_confidence=Confidence.LIKELY, nightfall_detector="CREDIT_CARD_NUMBER")
>>> ssn = Detector(min_confidence=Confidence.POSSIBLE, nightfall_detector="US_SOCIAL_SECURITY_NUMBER")
>>> detection_rule = DetectionRule([cc, ssn])
>>> # The scanning is done asynchronously, so provide a valid email address as the simplest way of getting results
>>> alertconfig = alert_config=AlertConfig(email=EmailAlert("[email protected]"))
    

>>> # Upload the file and start the scan.
>>> id, message = nightfall.scan_file( "./README.md", detection_rules=[detection_rule], alert_config=alertconfig)
>>> print("started scan", id, message)

//this script assumes the node sdk has been installed locally with `npm install` and `npm run build`
import { Nightfall } from "./nightfall-nodejs-sdk/dist/nightfall.js";
import { Detector } from "./nightfall-nodejs-sdk/dist/types/detectors.js";


// By default, the client reads your API key from the environment variable NIGHTFALL_API_KEY
const uploadit = async() => {
    var data = null;
    
    const nfClient = new Nightfall();
    	
    try{
   
		const response = await nfClient.scanFile('./README.md', {
		  detectionRules: [
			{
			  name: 'Secrets Scanner',
			  logicalOp: 'ANY',
			  detectors: [
				{
				  minNumFindings: 1,
				  minConfidence: Detector.Confidence.Possible,
				  displayName: 'Credit Card Number',
				  detectorType: Detector.Type.Nightfall,
				  nightfallDetector: 'CREDIT_CARD_NUMBER',
				},
			  ],
			},
		  ],
		  alertConfig: {
				email: {
						address: "[email protected]"
					}
		   }
		});

		if (response.isError) {
		  data = response.getError();
		}
		else{ 
			data = (response.data.id);
		}
	 
    }
	catch(e){
		console.log(e);
	}


	return data;

}

uploadit().then(data => console.log(data));

To run the Node sample script, you must first compile it as TypeScript. Save it as a .ts file and run:

tsc <yourfilename>.ts -lib ES2015,DOM

You can then run the resulting JavaScript file:

NIGHTFALL_API_KEY=<YourApiKey> node yourscriptname.js

Note that these examples use an email address to receive the results for simplicity.

You may also want to use a webhook. See Webhooks and Asynchronous Notifications for additional information on how to set up a webhook server to receive these results.

The Upload Process

The upload process consists of 3 stages:

  • Initializing

  • Uploading

  • Completing

Once the upload is complete, you may initiate the file scan.

After we discuss each API call in the sequence, you will find a script that walks through the full sequence at the end of this guide.

Initializing Phase

POST /v3/upload

The first step in the process of scanning a binary file is to initiate an upload in order to get a fileId through the Initiate a File Upload endpoint.

As part of the initialization you must provide the total byte size of the file being uploaded.

You may also provide the mime-type; otherwise, the system will attempt to determine it once the upload is complete.

curl --location --request POST 'https://api.nightfall.ai/v3/upload' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--data-raw '{
    "fileSizeBytes": 73891,
    "mimeType" : "image/png"
}'

The id of the returned JSON object will be used as the fileId in subsequent requests.

The chunkSize is the maximum number of bytes to upload during the uploading phase.

{
    "id": "f9dbdb15-c9fa-46ff-86ec-cd5c09aa550d",
    "fileSizeBytes": 73891,
    "chunkSize": 10485760,
    "mimeType": "image/png"
}
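The initialization request body can be derived from the file on disk. The following is a minimal sketch, assuming the mime-type is guessed from the file extension with Python's mimetypes module and simply omitted when no guess is available; the function name build_init_body and the demo file are hypothetical.

```python
import mimetypes
import os
import tempfile


def build_init_body(file_path):
    """Build the JSON body for POST /v3/upload from a local file."""
    body = {"fileSizeBytes": os.path.getsize(file_path)}
    mime_type, _ = mimetypes.guess_type(file_path)  # guess based on the extension only
    if mime_type:
        body["mimeType"] = mime_type
    return body


# Demonstrate with a throwaway file of the same size as the example above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "diagram.png")
    with open(demo_path, "wb") as handle:
        handle.write(b"\x00" * 73891)
    demo_body = build_init_body(demo_path)

print(demo_body)
```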

Uploading Phase

PATCH /v3/upload/<uploadUUID>

Use the Upload a Chunk of a File endpoint to upload the file contents in chunks.

The size of these chunks is determined by the chunkSize value returned by the POST /v3/upload endpoint used in the previous step.

Below is a simple example where the file is smaller than the chunkSize, so it may safely be uploaded with one call to the upload endpoint.

curl --location --request PATCH 'https://api.nightfall.ai/v3/upload/f9dbdb15-c9fa-46ff-86ec-cd5c09aa550d' \
--header 'X-Upload-Offset: 0' \
--header 'Content-Type: application/octet-stream' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--data-binary '@/Users/myname/Documents/work/Nightfall/Nightfall Upload Sequence.png'

If your file's size exceeds the chunkSize, to upload the complete file you will need to send iterative requests as you read portions of the file's contents. This means you will send multiple requests to the upload endpoint as shown above. As you do so, you will be updating the value of the X-Upload-Offset header based on the portion of the file being sent.

Each request should send a chunk of the file exactly chunkSize bytes long except for the final uploaded chunk. The final uploaded chunk is allowed to contain fewer bytes as the remainder of the file may be less than the chunkSize returned by the initialization step.

The request body should be the contents of the chunk being uploaded.

The value of the X-Upload-Offset header should be an integer byte offset specifying where to insert the data into the file. This byte offset is zero-indexed.

Successful calls to this endpoint return an empty response with an HTTP status code of 204.
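The offset bookkeeping described above can be sketched as a small generator that yields each chunk together with the value to send in the X-Upload-Offset header. It reads from any binary file-like object; the name iter_chunks and the tiny chunk size in the demo are illustrative assumptions.

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield (offset, chunk) pairs; every chunk is chunk_size bytes except possibly the last."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield offset, chunk
        offset += len(chunk)


# A 25-byte "file" uploaded with a chunk size of 10 bytes needs three requests.
demo = io.BytesIO(b"x" * 25)
for offset, chunk in iter_chunks(demo, 10):
    print(f"X-Upload-Offset: {offset} ({len(chunk)} bytes)")
```

In a real upload, each pair would become one PATCH request with the chunk as the request body.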

See the full example script below for an illustration as to how this upload process can be done programmatically.

Completion Phase

POST /v3/upload/<uploadUUID>/finish

Once all chunks are uploaded, mark the upload as completed using the Complete a File Upload endpoint.

curl --location --request POST 'https://api.nightfall.ai/v3/upload/f9dbdb15-c9fa-46ff-86ec-cd5c09aa550d/finish' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--data-raw '""'

When an upload completes successfully, the returned payload will indicate the mimeType the system determined the file to be if it was not provided during upload initialization.

{
    "id": "152848af-2ac9-4e0a-8563-2b82343d964a",
    "fileSizeBytes": 2349,
    "chunkSize": 10485760,
    "mimeType": "application/zip"
}

Once a file has been marked as completed, you may initiate a scan of the uploaded file.

Scanning Uploaded Files

After an upload is finalized, it can be scanned against a Detection Policy. A Detection Policy represents a pairing of:

  • a webhook URL

  • a set of detection rules to scan data against

The scanning process is asynchronous, with results being delivered to the webhook URL configured on the detection policy. See Webhooks and Asynchronous Notifications for more information about creating a Webhook server.

Exactly one policy should be provided in the request body. The policy includes a webhookURL, to which the callback is made once the file scan has completed (this must be an HTTPS URL), and a Detection Rule, given either as a list of UUIDs or as a rule defined inline.

You may also supply a value in the requestMetadata field to help identify the input file when the response arrives at your webhook. This field has a maximum length of 10 KB.

curl --request POST \
     --url https://api.nightfall.ai/v3/upload/f9dbdb15-c9fa-46ff-86ec-cd5c09aa550d/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRuleUUIDs": [
               "950833c9-8608-4c66-8a3a-0734eac11157"
          ],
          "webhookURL": "https://mycompany.org/webhookservice"
     },
     "requestMetadata": "your file metadata"
}
'
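The same request body can be assembled and checked in code before it is sent. The sketch below is an assumption-laden illustration: build_scan_request is a hypothetical helper, it covers only the UUID form of the policy (not inline rules), and it reads "10 KB" as 10 * 1024 bytes of UTF-8, which the text does not specify.

```python
import json
from urllib.parse import urlparse

# Assumption: the 10 KB requestMetadata limit is interpreted as 10 * 1024 bytes.
MAX_METADATA_BYTES = 10 * 1024


def build_scan_request(rule_uuids, webhook_url, request_metadata=""):
    """Return a scan request body containing exactly one policy."""
    if urlparse(webhook_url).scheme != "https":
        raise ValueError("webhookURL must be an HTTPS URL")
    if not rule_uuids:
        raise ValueError("this helper needs at least one detection rule UUID")
    if len(request_metadata.encode("utf-8")) > MAX_METADATA_BYTES:
        raise ValueError("requestMetadata exceeds the assumed 10 KB limit")
    return {
        "policy": {
            "detectionRuleUUIDs": list(rule_uuids),
            "webhookURL": webhook_url,
        },
        "requestMetadata": request_metadata,
    }


body = build_scan_request(
    ["950833c9-8608-4c66-8a3a-0734eac11157"],
    "https://mycompany.org/webhookservice",
    "your file metadata",
)
print(json.dumps(body, indent=2))
```

Validating locally gives a clearer error than a rejected API call, but the API remains the authority on what it accepts.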

Webhook Verification

Nightfall will verify that the webhook URL is valid before launching its asynchronous scan by issuing a challenge.
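This section does not describe the challenge format. The sketch below assumes the challenge arrives as a JSON body with a "challenge" field whose value must be echoed back as plain text; treat that shape, and the helper name challenge_reply, as assumptions to confirm against Webhooks and Asynchronous Notifications.

```python
def challenge_reply(payload: dict):
    """Return the text to echo back for a challenge request, or None.

    Assumption: a verification request carries a non-empty string under the
    key "challenge"; any other payload is treated as a regular notification.
    """
    challenge = payload.get("challenge")
    if isinstance(challenge, str) and challenge:
        return challenge
    return None


# A web handler would call this first and fall through to result handling.
print(challenge_reply({"challenge": "abc123"}))
print(challenge_reply({"findingsPresent": True}))
```

Keeping the check in a pure function makes it easy to reuse in whichever web framework hosts the webhook.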

Full Upload Process Example Script

Below is a sample Python script that handles the complete sequence of API calls to upload a file using a path specified as an argument.

from os import getenv, path

import fire
import requests


BASE_UPLOAD_URL = getenv("FILE_UPLOAD_HOST", "https://api.nightfall.ai/v3")
NF_API_KEY = getenv("NF_API_KEY")


def upload(filepath, mimetype, policy_uuid):
    """Upload the given file using the provided MIMEType and PolicyUUID.

    Arguments:
        filepath -- an absolute or relative path to the file that will be
            uploaded to the API.
        mimetype -- (optional) The mimetype of the file being uploaded.
        policy_uuid -- The UUID corresponding to an existing policy. This
            policy must be active and have a webhook URL associated with it.
    """
    default_headers = {
        "Authorization": F"Bearer {NF_API_KEY}",
    }

    # =*=*=*=*=* Initiate Upload =*=*=*=*=*=*
    file_size = path.getsize(filepath)
    upload_request_body = {"fileSizeBytes": file_size, "mimeType": mimetype}
    r = requests.post(F"{BASE_UPLOAD_URL}/upload",
                      headers=default_headers,
                      json=upload_request_body)
    # Check the status first so a non-JSON error body is reported cleanly.
    if not r.ok:
        raise Exception(F"Unexpected error initializing upload - {r.text}")
    upload = r.json()

    # =*=*=*=*=*=* Upload Chunks =*=*=*=*=*=*
    chunk_size = upload["chunkSize"]
    i = 0
    with open(filepath, "rb") as file:
        while file.tell() < file_size:
            upload_chunk_headers = {
                **default_headers,
                "X-UPLOAD-OFFSET": str(file.tell())
            }
            r = requests.patch(F"{BASE_UPLOAD_URL}/upload/{upload['id']}",
                               headers=upload_chunk_headers,
                               data=file.read(chunk_size))
            if not r.ok:
                raise Exception(F"Unexpected error uploading chunk - {r.text}")
            i += 1

    # =*=*=*=*=*=* Finish Upload =*=*=*=*=*=*
    r = requests.post(F"{BASE_UPLOAD_URL}/upload/{upload['id']}/finish",
                      headers=default_headers)
    if not r.ok:
        raise Exception(F"Unexpected error finalizing upload - {r.text}")

    # =*=*=*=*=* Scan Uploaded File =*=*=*=*=*
    r = requests.post(F"{BASE_UPLOAD_URL}/upload/{upload['id']}/scan",
                      json={"policyUUID": policy_uuid},
                      headers=default_headers)
    if not r.ok:
        raise Exception(F"Unexpected error initiating scan - {r.text}")

    print("Scan Initiated Successfully - await response on configured webhook")
    quota_remaining = r.headers.get('X-Quota-Remaining')
    if quota_remaining is not None and int(quota_remaining) <= 0:
        print(F"Scan quota exhausted - Quota will reset on {r.headers['X-Quota-Period-End']}")


if __name__ == "__main__":
    fire.Fire(upload)

Supported File Types

The file scan API has first-class support for text extraction and scanning on all MIME types enumerated below.

Certain file types, such as tabular data and archives of Git repositories, receive special handling that results in more precise information about the location of findings within the source file.

Handling of MIME Types Not Listed

Files with a MIME type not listed below are processed using an unoptimized text extractor. As a result, the quality of the text extraction for unrecognized types may vary.

Accepted Text and Derivatives

  • application/json

  • application/x-ndjson

  • application/x-php

  • text/calendar

  • text/css

  • text/csv (treated as tabular data and may be redacted)

  • text/html

  • text/javascript

  • text/plain

  • text/tab-separated-values (treated as tabular data)

  • text/tsv (treated as tabular data)

  • text/x-php

Accepted Office Formats

  • application/pdf

  • application/vnd.openxmlformats-officedocument.presentationml.presentation

  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (treated as tabular data)

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document

  • application/vnd.ms-excel (treated as tabular data)

Accepted Archive and Compressed File Types

  • application/bzip2

  • application/ear

  • application/gzip

  • application/jar

  • application/java-archive

  • application/tar+gzip

  • application/vnd.android.package-archive

  • application/war

  • application/x-bzip2

  • application/x-gzip

  • application/x-rar-compressed

  • application/x-tar

  • application/x-webarchive

  • application/x-zip-compressed

  • application/x-zip

  • application/zip

Accepted Image File Types

  • image/apng

  • image/avif

  • image/gif

  • image/jpeg

  • image/jpg

  • image/png

  • image/svg+xml

  • image/tiff

  • image/webp

Rejected MIME Types

The file scan API explicitly rejects requests with MIME types that are not conducive to extracting or scanning text. Sample rejected MIME types include:

  • application/photoshop

  • audio/midi

  • audio/wav

  • video/mp4

  • video/quicktime
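A client can pre-sort files by MIME type before uploading. The sketch below is a partial illustration: classify_mime is a hypothetical helper, the rejected set holds only the sample types listed above (the API may reject others), and the tabular set holds the spreadsheet and delimited-text types that this document marks as tabular data.

```python
# Sample rejected types from the list above; not an exhaustive set.
SAMPLE_REJECTED = {
    "application/photoshop",
    "audio/midi",
    "audio/wav",
    "video/mp4",
    "video/quicktime",
}

# Types this document describes as treated as tabular data.
TABULAR = {
    "text/csv",
    "text/tab-separated-values",
    "text/tsv",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel",
}


def classify_mime(mime_type: str) -> str:
    """Return 'rejected', 'tabular', or 'other' for a MIME type string."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base in SAMPLE_REJECTED:
        return "rejected"
    if base in TABULAR:
        return "tabular"
    return "other"


for mime in ("text/csv; charset=utf-8", "video/mp4", "application/pdf"):
    print(mime, "->", classify_mime(mime))
```

The "other" bucket covers both the remaining accepted types and unlisted types that fall back to the unoptimized text extractor.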

Spreadsheets and Tabular Data

File scans of Microsoft Office, Apache Parquet, CSV, and tab-separated files provide additional properties for locating findings within the document, beyond the standard byteRange, codepointRange, and lineRange properties.

Findings will contain a columnRange and a rowRange that will allow you to identify the specific row and column within the tabular data wherein the finding is present.

This functionality is applicable to the following mime types:

  • text/csv

  • text/tab-separated-values

  • text/tsv

  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

  • application/vnd.ms-excel

Apache parquet data files are also accepted.
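The rowRange and columnRange values can be turned into a familiar spreadsheet cell reference. The sketch below assumes both ranges are 1-based (consistent with a finding in the 2nd column being reported as column 2) and that a value of 0 means the finding is not in tabular data; the helper names are hypothetical.

```python
def column_letters(index: int) -> str:
    """Convert a 1-based column index to letters (1 -> A, 26 -> Z, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell_reference(location: dict):
    """Return a reference such as 'B55' for a finding location, or None.

    Assumption: a row or column start of 0 marks a non-tabular finding.
    """
    row = location.get("rowRange", {}).get("start", 0)
    col = location.get("columnRange", {}).get("start", 0)
    if row < 1 or col < 1:
        return None
    return f"{column_letters(col)}{row}"


location = {
    "rowRange": {"start": 55, "end": 55},
    "columnRange": {"start": 2, "end": 2},
}
print(cell_reference(location))  # B55
```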

Below is a sample match of a spreadsheet containing dummy PII where a SSN was detected in the 2nd column and 55th row.

Redacting CSV Files

Findings within csv files may be redacted.

To enable redaction in files, set the enableFileRedaction flag of your policy to true.

The csv file will be redacted based on the defaultRedactionConfig of the policy.
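To build intuition for a mask-style redaction config, the following local sketch replaces every character of a finding with a masking character except those in an ignore list. It mirrors the maskingChar and charsToIgnore options of a maskConfig as an assumption about their meaning; it is not Nightfall's implementation, and mask_finding is a hypothetical name.

```python
def mask_finding(text: str, masking_char: str = "*", chars_to_ignore=()) -> str:
    """Mask every character of text except those listed in chars_to_ignore."""
    ignore = set(chars_to_ignore)
    return "".join(ch if ch in ignore else masking_char for ch in text)


# Dummy values only; "-" and "@" are left visible, everything else is masked.
print(mask_finding("624-84-9182", "*", ["-", "@"]))       # ***-**-****
print(mask_finding("user@example.com", "*", ["-", "@"]))  # ****@***********
```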

Below is an example curl request for a csv file that has already been uploaded.

When results are sent to the location specified in the alertConfig (in this case an email address), a redactedFile property will be set with a fileURL in addition to the findingsURL.

This redacted file will be a modified version of the original csv file.

Below is an example of a redacted csv file.

Git Repositories

Nightfall provides special handling for archives of Git repositories.

Nightfall will scan the repository history to discover findings in particular commits, returning the hash of the commit for each finding.

In order to scan the repository, you will need to create a clone, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

This creates a clone of the Nightfall Go SDK.

You will then need to create an archive that can be uploaded using Nightfall's file scanning sequence.

zip -r directory.zip directory

Note that in order for this to work, the hidden directory .git, which holds the repository history, must be included in the archive.
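The archive step can also be scripted. The sketch below uses Python's standard zipfile module and assumes the repository history lives in the clone's hidden .git directory, as is standard for Git; archive_repository is a hypothetical helper, and the archive should be written outside the repository directory so it does not include itself.

```python
import os
import zipfile


def archive_repository(repo_dir: str, archive_path: str) -> str:
    """Zip repo_dir (hidden files included) and return archive_path."""
    if not os.path.isdir(os.path.join(repo_dir, ".git")):
        raise ValueError(f"{repo_dir} has no .git directory; history would be missing")
    # Store paths relative to the parent so entries start with the repo
    # directory name, as `zip -r directory.zip directory` would.
    parent = os.path.dirname(os.path.abspath(repo_dir))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(repo_dir):
            for name in dirs + files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(os.path.abspath(full), parent))
    return archive_path


# Example (paths are placeholders):
# archive_repository("nightfall-go-sdk", "nightfall-go-sdk.zip")
```

Failing early when .git is absent avoids uploading an archive that cannot yield commit hashes.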

When you initiate the file upload sequence with this file, you will receive scan results that contain the commitHash property filled in.

Using the Nightfall Go SDK archive created above, a simple example would be to scan for URLs (i.e. strings starting with http:// or https://), which will send results such as the following:

Support for Large Repositories

Currently, processing is limited to repositories with fewer than 5,000 commits in total.

Large repositories result in a large volume of data sent at once. We are working on changes to allow these and other large surges of data to be processed in a more controlled manner, and will increase the limit or remove it altogether once those changes are complete.
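Before archiving a repository, it can be worth checking its size against this limit. The sketch below counts commits reachable from all refs with git rev-list; whether Nightfall counts commits the same way is an assumption, and both helper names are hypothetical.

```python
import subprocess

MAX_COMMITS = 5000  # limit stated above; may change over time


def count_commits(repo_dir: str) -> int:
    """Count commits reachable from any ref in the repository at repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--all", "--count"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def within_commit_limit(commit_count: int, limit: int = MAX_COMMITS) -> bool:
    """Return True when the commit count is strictly below the limit."""
    return commit_count < limit


# Example (requires git and a local clone):
# print(within_commit_limit(count_commits("nightfall-go-sdk")))
```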

Sensitive Data in GitHub Repositories

If a finding in a GitHub repository is sensitive, it should be considered compromised and appropriate mitigation steps should be taken (e.g. secrets should be rotated).

To inspect the specific commit, you will need to clone the repository, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

You can then check out the specific commit using the commit hash returned by Nightfall.

Note that you are in a 'detached HEAD' state when working with this sort of checkout of a repository.

File Scanning Limitations

  • CSV Files: Only the first 250,000 rows will be scanned.

  • Spreadsheet Files: Up to 100,000 rows per sheet will be scanned, with a maximum of 1 million rows across all tabs in multi-sheet spreadsheets.

  • PDF Files: Scanning is limited to the first 100 pages, including a maximum of 50 images within those pages.

  • Images: Images smaller than 5KB or larger than 50MB will be excluded from scanning.

  • Archive Files: A maximum of 1,000 files will be extracted and scanned. Files larger than 100MB requiring extraction will not be scanned.

{
   "findings":[
      {
         "path":"Sheet1 (5)",
         "detector":{
            "id":"e30d9a87-f6c7-46b9-a8f4-16547901e069",
            "name":"US social security number (SSN)",
            "version":1
         },
         "finding":"624-84-9182",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":2505,
               "end":2516
            },
            "codepointRange":{
               "start":2452,
               "end":2463
            },
            "lineRange":{
               "start":55,
               "end":55
            },
            "rowRange":{
               "start":55,
               "end":55
            },
            "columnRange":{
               "start":2,
               "end":2
            },
            "commitHash":""
         },
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
...

curl --request POST \
     --url https://api.nightfall.ai/v3/upload/02a0c5e1-c950-4e28-a988-f6fffefc4205/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-<Your API Key>' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRuleUUIDs": [
               "950833c9-8608-4c66-8a3a-0734eac11157"
          ],
          "alertConfig": {
               "email": {
                    "address": "<your email address>"
               }
          },
          "defaultRedactionConfig": {
               "maskConfig": {
                    "charsToIgnore": [
                         "-",
                         "@"
                    ],
                    "maskingChar": "*"
               }
          },
          "enableFileRedaction": true

     },
     "requestMetadata": "csv redaction test"
}
'

{
   "errors":null,
   "findingsPresent":true,
   "findingsURL":"https://files.nightfall.ai/asdfc5e1-c950-4e28-a988-f6fffefc4205.json?Expires=1655324479&Signature=zjo1nT-PECHC-fiTvAgdA8aDnceoY~6iGfzOBCcBjscKqOHnIar8hoH4gGufffiulBw5BpfJuvWwBW~lXO~ZNhN139LDwoTsfLJswJiQCB2Hj-Az0Em6go~1j8WBqCS8G0Gk17M-zcPedHGX3z~1pw8nm5sh6Pa-jJwfw9NIEiqmBb3Vdcj3J-~Wzag~ENV4499rnG299ee-ig5Ms1oVlzycb4YxzgTMrTL5Q07ozNenwFZcGDNQre1inLXmV-m8teLX-K3boklenp9KXiNDDV0wi74ADN-QfIR1q1oU7mEI1f3aVC3kju0QRErp2lsfs08EtZKLE3C4N17jDJdYcw__&Key-Pair-Id=K24YOPZ1EKX0YC",
   "redactedFile":{
      "fileURL":"https://files.nightfall.ai/asdfc5e1-c950-4e28-a988-f6fffefc4205-redacted.csv?Expires=1655324479&Signature=Hx8kRh88maLeStysy3fsLbFVG9VELEtfemtQe2lWUnFjAMd9HqlEksTmirqAWFWV4zPVUB73izlMj5cSer8v2N5ZCcnD3dz~nnwR4P5LewGJ2CQzGnDnXgh70HW5qp04gnUD-pYWp~bGPVspkJKCkl1zH-EoGonvcNVq3SNsVzOlsVIjep7Y7otQKEEyAZ7JmHiVfuBxrvn8pleuC5lEJ3f9miPyoRqH9DyPlNTJTIuijqe9q32Qcui2RsDR6IT-foFX52dy6rRa01ZV0gZMDWJokMlCr8Iu5An~qnhxC49bqTtI82oz9FcBaP-Yea8cq1TiAfGxX7CJ0~JeTLvr6g__&Key-Pair-Id=K24YOPZ1EKX0YC",
      "validUntil":"2022-06-15T20:21:19.750990823Z"
   },
   "requestMetadata":"csv redaction test",
   "uploadID":"02a0c5e1-c950-4e28-a988-f6fffefc4205",
   "validUntil":"2022-06-15T20:21:19.723045787Z"
}

name,email,phone,alphanumeric
Ulric Burton,*****@*************,*-***-***-****,TEL82EBM1GQ
Wade Jones,******************@***********,(********-****,VVF64PJV2EF
Molly Mccullough,*****************@**********,(********-****,OHO41SFZ2BR
Raja Riggs,************@**********,(********-****,UVD51JTE5NZ
Colin Carter,**********************@*********,(********-****,LNI34LLC5WV

{
   "findings":[
      {
         "path":"f607a067..53e59684/nightfall.go",
         "detector":{
            "id":"6123060e-2d9f-4f35-a7a1-743379ea5616",
            "name":"URL"
         },
         "finding":"https://api.nightfall.ai/\"",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":142,
               "end":168
            },
            "codepointRange":{
               "start":142,
               "end":168
            },
            "lineRange":{
               "start":16,
               "end":16
            },
            "rowRange":{
               "start":0,
               "end":0
            },
            "columnRange":{
               "start":0,
               "end":0
            },
            "commitHash":"53e59684d9778ceb0f0ed6a4b949c464c24d35ce"
         },
         "beforeContext":"tp\"\n\t\"os\"\n\t\"time\"\n)\n\nconst (\n\tAPIURL = \"",
         "afterContext":"\n\n\tDefaultFileUploadConcurrency = 1\n\tDef",
         "matchedDetectionRuleUUIDs":[
            "cda0367f-aa75-4d6a-904f-0311209b3383"
         ],
         "matchedDetectionRules":[
            
         ]
      },
 ...

cd nightfall-go-sdk
git checkout 53e59684d9778ceb0f0ed6a4b949c464c24d35ce