
Developer APIs

Welcome to Developer APIs Documentation

Welcome to the amazing world of the Nightfall Developer APIs (formerly known as Firewall for AI). Here you can find information about Nightfall's APIs and SDKs, along with usage examples for both.

Introduction to Developer APIs

Overview

Welcome to the documentation for Nightfall's Firewall for AI Scan and Workflow APIs. This documentation helps developers leverage Nightfall AI's industry-leading detection engine to identify and protect sensitive customer and corporate data anywhere, preventing unauthorized access and data breaches so that you can focus on innovation.

Scan APIs

Scan prompts, text, documents, spreadsheets, logs, zips, JSON, images, etc., for PII, PHI, PCI, banking information, API keys, passwords, and network information with the highest accuracy and lightning-fast response times. Redact sensitive findings with customizable formatting.

Workflow APIs

Leverage the full potential of the Nightfall console application through our Workflow APIs. Customize your SIEM workflows and reporting, take actions, update support tickets, alert users, search violations, annotate findings, create reports, and more.

Key Features

AI-Powered Identification: Utilize advanced AI models to detect and prevent security threats in real-time.
Comprehensive Sensitive Data Detection: Identify PII, PHI, PCI, banking information, API keys, passwords, and network information across various formats including text, documents, spreadsheets, logs, zips, and images.
Customizable Redaction: Tailor data protection to your needs with fully customizable redaction for each sensitive entity type.
Flexible Detectors: Leverage Nightfall’s comprehensive list of machine learning-based detectors, customize them, or create your own with specialized logic.
High Accuracy and Performance: Achieve precision and recall rates of 95% or higher, handle over 1K requests per second, and experience latency of less than 100 ms.
Seamless Integration: Easily integrate with your existing AI development and data engineering tools for smooth and efficient operation.

Customizable and Built-in Machine Learning-based Detectors

You can leverage Nightfall’s machine learning-based detectors or create your own detectors with customized logic to scan third-party apps, internal services, and data silos to identify instances of potentially sensitive types of data such as:

Personally Identifiable Information (PII) including Social Security Numbers, passport numbers, email addresses, or date of birth
Protected Health Information (PHI) such as insurance claim numbers or ICD10 codes
Financial information like credit card numbers or bank routing numbers
Secrets such as API and cryptographic Keys, database connection strings, passwords, etc.
Network information such as IP Address or MAC Address

A Flexible Data Security Solution

Key features of Nightfall’s detection engine include:

Defining minimum confidence thresholds and minimum finding counts on detectors to reduce the chance of false positives.
Specifying exclusion rules and context rules on detectors to fine-tune their accuracy to better suit your use cases.
Choosing which detectors are triggered for each policy.

Using the API

The Nightfall API consumes arbitrary data as input, either as text strings or as files, and allows you to use any combination of detectors to return a collection of "findings" objects.

The detectors may be defined in the Nightfall Dashboard and referenced by UUID, or defined inline as part of the request payload.

Each finding identifies the relevant detector, the likelihood of a match, and the location within the given data where the matched token occurred. Locations are not limited to byte offsets: there is support for tabular and JSON data as well.

You can take protective action on sensitive text by masking, substituting, or encrypting it with the API. You may also set up alerts to receive asynchronous notifications when findings are detected.
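The API can redact findings for you. Purely to illustrate how a finding's location maps back onto the scanned text, the following minimal Python sketch masks findings client-side using each finding's codepointRange. It assumes the range is half-open (end exclusive), which matches the sample spreadsheet finding later in this guide, where the 11-character SSN spans codepoints 2452 to 2463. The function name and mask character are hypothetical, and this is not Nightfall's redaction implementation.

```python
from __future__ import annotations


def mask_findings(text: str, findings: list[dict], mask_char: str = "*") -> str:
    """Replace every character covered by a finding's codepointRange with mask_char.

    Hypothetical client-side helper; assumes codepointRange is half-open [start, end).
    Python str indices count code points, so they line up with codepoint offsets.
    """
    chars = list(text)
    for finding in findings:
        rng = finding["location"]["codepointRange"]
        for i in range(rng["start"], min(rng["end"], len(chars))):
            chars[i] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    sample = "SSN: 624-84-9182 on file"
    finding = {"location": {"codepointRange": {"start": 5, "end": 16}}}
    print(mask_findings(sample, [finding]))
```

Working on a list of characters keeps overlapping findings harmless: a character covered twice is simply masked twice.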

The Nightfall API is RESTful and uses JSON for its payloads. Our API is designed to have predictable, resource-oriented URLs for each endpoint and uses standard HTTP response codes to indicate API errors.

You may test out the API through the

Where to Go From Here

The following guide will walk you through getting started and describe the API functionality in more detail. If you want to execute an API call immediately, see our quickstart guide for how to obtain an API key and make a simple scan request.

After that, you can learn about Nightfall with our Key Concepts section, which will also help you get set up with Nightfall.

If you’re looking for more ideas about how best to leverage Nightfall’s functionality, see our Use Cases guide.

We have created numerous tutorials that demonstrate how to implement DLP for a variety of platforms (including OpenAI, LangChain, Amazon, Datadog, and Elasticsearch) and handle various scenarios (such as detecting sensitive data in GenAI prompts or detecting PII on your machine in real time).

We also have several language-specific SDKs to get you up and running in Java, Python, Go, Node.js, and Ruby.

You can also quickly test out Nightfall detectors or your custom Detection Rules in the Nightfall Playground. Please also consult our Detector Glossary to see the variety of built-in detectors that Nightfall offers.

The Nightfall Dashboard allows you to create API keys and manage Detectors and Detection Rules through a straightforward user interface. Log in here to access the Dashboard, or sign up to create a free account.

For frequently asked questions, feedback, and other help, please contact Nightfall support at . We also host on Wednesdays at 12pm PT to help answer questions, talk through any ideas, and chat about data security. We would love to see you there!

Use Cases

There are many use cases for a high-accuracy data classification and protection system like Nightfall. Here are some of the most popular to spark your imagination.

We can't wait to hear more about what you're planning to build: reach out to us anytime to discuss your use case.

Protect sensitive data from transferring to downstream 3rd party services like LLM APIs.

Motivation

Third-party APIs provide services that greatly augment the capabilities of your applications.
- For example, GenAI LLMs can automatically generate content. These LLMs can be accessed via APIs, such as OpenAI or Anthropic APIs.
- Another example is telecom/communications APIs like SendGrid and Twilio, which provide communications infrastructure.
The challenge is that these services may unnecessarily receive sensitive or confidential information from your application that is calling these APIs, which can pose data privacy risks because customer data is being shared outside the intended scope. For example, LLMs can handle very large inputs, or prompts, and these prompts may contain sensitive customer information.

Benefits

By filtering out customer data from API inputs, you will be able to leverage cutting-edge third-party services and APIs without introducing data privacy risks by oversharing sensitive or confidential information.

Sanitize user input to prevent unnecessary collection or proliferation of sensitive customer data.

Motivation

Applications collect and store sensitive information from consumers. Users may “overshare” or incorrectly input information, leading to sensitive data ending up in places it is not expected, or internal services may proliferate or handle this data in unexpected ways.
- Fintech applications that intake, store, and generate files with PII like W-2s and paystubs.
- Healthcare applications that handle protected health information or SSNs.
Marketplaces and social media applications allow for user-generated content that may contain sensitive or illicit information, such as profanity or toxicity.
Support channels receive any inbound information from consumers, and can include highly sensitive information or over-sharing that is then exposed to support agents.
This data can come in a variety of unstructured formats, whether screenshots, images, documents, plaintext, or compressed folders and archives, so inspecting this content requires high-quality text extraction.

Benefits

Reduce the possibility of users inputting sensitive data that should not be collected or retained within your application or service by scanning data upon submission. Warn or prevent users from inputting sensitive data into form fields or file uploads.
Diminish collection of sensitive data types that could result in regulatory fines or brand damage, if leaked or breached.
Limit exposure of sensitive data to internal personnel like support agents that could lead to accidental misuse or intentional theft.

Audit and remove sensitive data in data silos and processing workflows for compliance.

Motivation

Compliance regimes like FedRAMP, PCI, and HIPAA may require that sensitive data is not proliferating into unsanctioned data silos, like project management systems, data warehouses, and logging infrastructure.
Many different development teams may be writing data into these internal services like logging and data warehousing, so it is challenging to enforce data sanitization on data ingress.
CDP tools like Segment and Fivetran can further proliferate sensitive data into a broader set of data silos than its original location.
Data analytics and data science teams may replicate and transform data, leading to further copies and versions across internal systems.
Edge cases, unexpected errors, and stack traces can lead to sensitive data landing or replicating in application logs.

Benefits

Identify and remove sensitive data from places that it shouldn’t be.
Monitor data at rest in data silos instead of at points of ingress/egress that would be hard to monitor or track.
Scan extremely high volumes of unstructured data at scale.
Build workflows to delete data, redact data, or alert the right teams when sensitive data is found where it shouldn’t be.

Build data classification and DLP features directly into your SaaS application.

Motivation

Data classification and DLP capabilities are increasingly expected by regulated institutions such as big banks.
Building data classification and DLP from scratch is complex and has high opportunity costs in moving developers away from working on the core product offering. Building a half-baked solution erodes customer trust, especially when there is already a high degree of skepticism around the quality of traditional DLP solutions.
SaaS and security vendors can deliver additional customer value and drive additional revenue through premium enterprise feature tiers that include security features like DLP, SAML SSO, audit logging, and more.

Benefits

Reduce time-to-market by leveraging out-of-the-box components.
Reduce the overhead of an in-house data classification service that requires text extraction services, detector research and tuning, machine learning model development and deployment, maintenance & support.
Deliver best-in-class accuracy, reducing the risk of alert fatigue or of missing sensitive data, either of which erodes customer trust.

Centralize detection logic, custom detectors & regexes all in one place instead of embedded directly in code, and reduce the number of regexes required.

Motivation

Detecting a single type of sensitive data well (e.g. a credit card number) can be complex - requiring research and maintenance as the detector evolves over time. This becomes especially challenging for esoteric detectors, for example those that are region or industry-specific.
Managing regexes and input validation is complex and evolving. For example, a regex embedded in code to validate a Google Docs link may need to be updated over time as the format for Google Docs links changes, as false positives are identified and accounted for, and as performance implications are observed.
Many data types cannot be detected accurately with a regex because they require a certain level of validation, are heavily context dependent, or are highly variable or entropic in nature leading to a regex being overly sensitive or overly specific.

Benefits

Leverage out-of-the-box detectors so that no engineering time is spent on researching, training, or tuning detectors. No need to reinvent the wheel. These detectors span the categories of PII, PCI, PHI, credentials & secrets, ID numbers, and more.
Reduce time spent finding, tuning, and sharing regular expressions.
Build upon out-of-the-box detectors with custom logic, instead of having to start from scratch with a regex or custom validation logic.

Improve accuracy of existing content inspection systems.

Motivation

Existing content inspection systems may yield a high degree of false positives (i.e. noise), leading to alert fatigue and significant time wasted on inaccurate alerts.
Conversely, existing solutions may also be very limited in detection scope, leading to a high degree of false negatives (i.e. misses) and putting the business at risk when sensitive data is missed.

Benefits

Replace existing, brittle solutions with a highly accurate content inspection system.
Reduce engineering time spent analyzing false positives and attempting to tune them out.

Sanitize inputs to labeled data used to train machine learning models.

Motivation

In training complex machine learning models, data scientists must compile and use large corpora of data to improve the accuracy of the trained model. Unknowingly using sensitive data in this effort can lead to violations of compliance regimes like HIPAA, GDPR, or PCI.
Models that focus on health, finance, or public sector applications are particularly at risk of ingesting sensitive data that may violate industry-specific compliance mandates.
Labeled data is often ingested from unregulated sources like customer communications, emails, public repos, and more. Inspecting all of these input sources manually is untenable.
Additionally, the data being leveraged may be in a variety of unstructured formats like screenshots, images, documents, plaintext, or compressed folders and archives, so inspecting this content requires high-quality text extraction.

Benefits

Ensure the hygiene of the labeled data you are using to train your machine learning models.
Diminish collection of sensitive data types that could result in regulatory fines or brand damage, if leaked or breached.

Example use cases by team and industry.

Healthcare: Detect PHI to ensure HIPAA compliance in your apps
Financial services: Secure PII and PCI like bank account numbers, payment card details, and social security numbers
E-commerce: Prevent costly data breaches of PII and PCI that can damage brand reputation
Education: Protect student and faculty privacy within applications
Customer support: Redact sensitive data in customer support system, shielding agents from information they shouldn’t see
IT Operations: Search for API keys, credentials, and secrets across internal and external data silos
Product: Create custom solutions for data classification, DLP, content moderation and more within your applications
Compliance: Address PCI-DSS, HIPAA, FedRAMP, GDPR, CCPA, GLBA, FERPA, PHIPA, and more
People & Community: Content moderation to detect profanity and toxicity
Gaming: Detecting profanity, toxicity, or even personal or financial information being shared in community chat rooms

Authentication and Security

The Nightfall API uses API keys to authenticate requests. You can create and view your API keys in the Nightfall app on the Manage API Keys page.

Your API keys carry many privileges, so be sure to keep them secure. Do not share your secret API keys in publicly accessible areas such as GitHub, client-side code, or anywhere else that would compromise their secrecy. If you believe one of your API Keys has been compromised, you should delete it through the Dashboard.

All API requests must be made over HTTPS.

Calls made over plain HTTP will fail.

API requests without authentication will fail.
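A minimal Python sketch of these rules using only the standard library: it builds (but does not send) a request carrying the API key in an Authorization: Bearer header, the scheme this guide lists under the file scanning prerequisites, and refuses plain HTTP or a missing key. The endpoint URL and the NIGHTFALL_API_KEY environment variable name are assumptions for illustration; check the API reference for the actual endpoints.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Assumption: the text scan endpoint URL; confirm it in the API reference.
SCAN_URL = "https://api.nightfall.ai/v3/scan"


def build_authenticated_request(url: str, api_key: str, body: bytes) -> urllib.request.Request:
    """Build, but do not send, a JSON POST request authenticated with a Bearer key."""
    # Calls over plain HTTP fail, so refuse to build one.
    if urlparse(url).scheme != "https":
        raise ValueError("Nightfall API requests must be made over HTTPS")
    # Unauthenticated requests fail too.
    if not api_key:
        raise ValueError("an API key is required")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical variable name: read the key from the environment, never from source code.
    key = os.environ.get("NIGHTFALL_API_KEY")
    if key:
        request = build_authenticated_request(SCAN_URL, key, b"{}")
        print(request.get_method(), request.full_url)
    else:
        print("Set NIGHTFALL_API_KEY to build a request")
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate makes the HTTPS and key checks easy to test without network access.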

Key Concepts

Entities and Terms to Know

This section describes the terms you will need to know when using the API.

Detectors

Detectors provide the logic to find potentially sensitive pieces of data.

When this logic detects such data, the Detector is considered "triggered."

Nightfall has numerous pre-built Detectors that are trained via machine learning. Detectors may also be defined with regular expressions or dictionaries (word lists). Their accuracy may be further refined with exclusion rules and context rules. Whether a Detector is triggered may be controlled by a minimum confidence threshold and a minimum number of findings per Detector, both set on a Detection Rule.

The built-in set of Detectors covers a number of different categories of data, including:

Standard PII (e.g. social security number, driver's license number, ID card image)
PCI (e.g. credit card number, credit card image)
Healthcare (e.g. PHI, US Medicare Beneficiary Number)
Finance - Banking (e.g. SWIFT code, IBAN code, US bank routing number)
Network (e.g. an IP Address)

The full set is enumerated in the Detector Glossary.

Custom Detectors

Nightfall also supports RE2 regexes and word lists for any custom detectors that you may want to implement.

Over time, we've aggregated the following regex library, which you're welcome to select from to save you some time. Please note that a regular expression is an established yet limited method that searches for pre-defined patterns, so your mileage may vary.

You can test regular expressions here.

You can input custom detectors in two ways: directly in the Nightfall Dashboard by navigating to Detectors → New Detector → Regular expression, or by defining them inline in your API requests.

Exclusion Rules

An exclusion rule is a regular expression or word list that is applied after a Detector has been triggered by its primary expression or word list, in order to eliminate false positives.

For instance, you may have a Detector designed to detect phone numbers. However, you may have a particular set of phone numbers that you use for testing purposes that are known not to be valid (e.g. they start with the prefix 555) and this should be ignored. Adding an exclusion rule would allow you to prevent those matches from being returned by the API.

See: Using Exclusion Rules
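The phone number example can be sketched locally in Python to show the order of operations: the primary expression finds candidates first, and the exclusion rule then discards the matches it covers. Both patterns are hypothetical stand-ins, and Python's re module is used here for convenience even though Nightfall's custom detectors use RE2 syntax; this illustrates the concept, not Nightfall's implementation.

```python
from __future__ import annotations

import re

# Hypothetical primary expression for a phone number Detector (NNN-NNN-NNNN).
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
# Hypothetical exclusion rule: test numbers that start with the 555 prefix.
TEST_NUMBERS = re.compile(r"555-\d{3}-\d{4}")


def phone_findings(text: str) -> list[str]:
    """Return phone-number matches that survive the exclusion rule."""
    candidates = [m.group() for m in PHONE.finditer(text)]
    # The exclusion rule only runs on matches the primary expression produced,
    # and drops a match when the exclusion pattern covers it entirely.
    return [c for c in candidates if not TEST_NUMBERS.fullmatch(c)]


if __name__ == "__main__":
    print(phone_findings("Call 415-555-0100, or test with 555-123-4567."))
```

Note that 415-555-0100 is kept: the exclusion pattern must match the whole finding, so a 555 in the middle of a number does not exclude it.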

Context Rules

Context Rules are additional matching expressions for a Detector that may be used to adjust the confidence score of a match.

You may provide a regular expression and the number of leading or trailing characters within which a match of that expression must occur in order to adjust the confidence level to a particular level.

For instance, if you found a sequence that appeared to be a Social Security number based on its length or formatting, you might boost the confidence score if it was preceded by text like “SSN” or “Social Security Number.”
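Here is a minimal local sketch of that SSN example in Python. The SSN pattern, the context expression, the window sizes, and the two confidence levels are all hypothetical choices; the point is only the mechanism, in which a match found by the primary pattern gets a higher confidence when the context expression appears within the given number of leading or trailing characters.

```python
from __future__ import annotations

import re

# Hypothetical primary pattern: SSN-shaped sequences such as 624-84-9182.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical context expression that boosts confidence when it appears nearby.
SSN_CONTEXT = re.compile(r"SSN|Social Security Number", re.IGNORECASE)


def score_ssn_matches(
    text: str,
    window_before: int = 30,
    window_after: int = 30,
    base: str = "POSSIBLE",
    boosted: str = "VERY_LIKELY",
) -> list[tuple[str, str]]:
    """Return (match, confidence) pairs, boosting matches that have nearby context."""
    results = []
    for m in SSN.finditer(text):
        # Only the characters inside each window are searched for context.
        leading = text[max(0, m.start() - window_before):m.start()]
        trailing = text[m.end():m.end() + window_after]
        has_context = SSN_CONTEXT.search(leading) or SSN_CONTEXT.search(trailing)
        results.append((m.group(), boosted if has_context else base))
    return results


if __name__ == "__main__":
    print(score_ssn_matches("SSN: 624-84-9182"))
    print(score_ssn_matches("Order 624-84-9182 shipped"))
```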

Returning Surrounding Context

You may request that a sequence of bytes of a given length be provided from before and after the text that triggers a Detection Rule.

This information can help you better understand whether or not something is an actual violation by observing the circumstances within which the detected text was found.

You are limited to a maximum of 40 bytes of this context text preceding and trailing the match for a total of 80 bytes overall.

See: Using Context
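A minimal Python sketch of what the returned context represents: given the raw bytes and a finding's byteRange (assumed half-open), it slices up to the requested number of bytes on each side, clamped to the 40-byte-per-side limit stated above. The function is hypothetical and runs locally; with the API you request context as part of the scan request instead. Because the slices are byte-based, they can cut through a multi-byte UTF-8 character, so decode them with errors="replace".

```python
from __future__ import annotations

MAX_CONTEXT_BYTES = 40  # documented maximum on each side of the match


def surrounding_context(
    data: bytes, start: int, end: int, context_bytes: int
) -> tuple[bytes, bytes, bytes]:
    """Return (before, match, after) for a finding whose byteRange is [start, end)."""
    n = max(0, min(context_bytes, MAX_CONTEXT_BYTES))  # clamp to 0..40
    before = data[max(0, start - n):start]
    after = data[end:end + n]
    return before, data[start:end], after


if __name__ == "__main__":
    raw = b"Employee Jane Doe, SSN 624-84-9182, hired 2021"
    start = raw.index(b"624-84-9182")
    before, match, after = surrounding_context(raw, start, start + 11, 10)
    print(before.decode("utf-8", errors="replace"), "|", match.decode(), "|", after.decode())
```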

Detection Rules

Detection Rules are aggregations of Detectors that are assigned a minimum confidence level. The identifiers of Detection Rules are used as a parameter to the API.

You may create Detection Rules as described in the section Creating Detection Rules and use their identifier as part of API calls to scan content.

Alternatively you may specify Detection Rules programmatically in each API call, as described in the scan method documentation below.

A Detection Rule is composed of a list of Detectors with which you wish to scan each request payload; the rule can be configured to trigger when any one of its Detectors is satisfied or only when all of them are. You can add up to 50 Detectors in total, of which at most 30 may be regular expression custom detectors.

Additionally, each Detector in the Detection Rule is assigned a “minimum confidence” level (see Confidence Levels below) and a minimum number of findings, which together determine whether the Detection Rule should be considered triggered.
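To make the triggering logic concrete, here is a minimal local model in Python. It is a hypothetical illustration rather than Nightfall's evaluator: findings below a Detector's minimum confidence are ignored, a Detector is satisfied once it has at least its minimum number of qualifying findings, and the rule triggers when any (or, if so configured, all) of its Detectors are satisfied. The detector names and the "ANY"/"ALL" labels are assumptions.

```python
from __future__ import annotations

from collections import Counter

# Confidence levels from lowest to highest.
ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def rule_triggered(
    findings: list[tuple[str, str]],
    detectors: dict[str, tuple[str, int]],
    logical_op: str = "ANY",
) -> bool:
    """findings: (detector_name, confidence) pairs.
    detectors: detector_name -> (min_confidence, min_num_findings).
    """
    if logical_op not in ("ANY", "ALL"):
        raise ValueError("logical_op must be 'ANY' or 'ALL'")
    # Count only findings that meet their Detector's minimum confidence.
    counts = Counter(
        name
        for name, confidence in findings
        if name in detectors
        and ORDER.index(confidence) >= ORDER.index(detectors[name][0])
    )
    # A Detector is satisfied once it reaches its minimum number of findings.
    satisfied = [counts[name] >= min_n for name, (_, min_n) in detectors.items()]
    return any(satisfied) if logical_op == "ANY" else all(satisfied)


if __name__ == "__main__":
    rule = {"US_SSN": ("LIKELY", 2), "EMAIL": ("POSSIBLE", 1)}
    found = [("US_SSN", "LIKELY"), ("US_SSN", "POSSIBLE"), ("EMAIL", "POSSIBLE")]
    print(rule_triggered(found, rule, "ANY"), rule_triggered(found, rule, "ALL"))
```

In the usage example only one SSN finding meets the LIKELY minimum, so the SSN Detector is not satisfied; the rule still triggers under ANY because the email Detector is.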

Confidence Levels

Detection results will be returned with one of the following confidence values.

In practice, the API will only return detections assigned a POSSIBLE or higher confidence level.

VERY_LIKELY (recommended)
LIKELY
POSSIBLE
UNLIKELY
VERY_UNLIKELY

Learn more about what different confidence levels mean and how to choose the right minimum confidence level for your detection rule here.
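Because the levels are ordered, a minimum-confidence filter reduces to a comparison. A minimal Python sketch, assuming findings are dictionaries with a confidence string, as in the sample finding later in this guide; the enum values and the function name are hypothetical.

```python
from __future__ import annotations

from enum import IntEnum


class Confidence(IntEnum):
    """Confidence levels, ordered so that a higher value means a more likely match."""
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


def at_least(findings: list[dict], minimum: str = "POSSIBLE") -> list[dict]:
    """Keep findings whose confidence is at or above the given minimum level."""
    floor = Confidence[minimum]
    return [f for f in findings if Confidence[f["confidence"]] >= floor]


if __name__ == "__main__":
    sample = [
        {"finding": "624-84-9182", "confidence": "LIKELY"},
        {"finding": "123-45-678", "confidence": "POSSIBLE"},
    ]
    print(len(at_least(sample, "VERY_LIKELY")), len(at_least(sample, "LIKELY")))
```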

Policies

Policies allow you to create templates for the most common workflows by unifying a set of Detection Rules with the actions to be taken when those rules are triggered, including:

automated actions such as redaction of findings
alerting through webhooks

Once defined, a Policy may be used in requests to the Nightfall API, such as calls to scan file uploads, though automated redactions are not available for uploaded files at this time.

Setting Up Nightfall

Before you use the scan endpoint, there are a few steps to complete in the Nightfall dashboard to set up your environment properly.

See Creating an API Key to see how to create the necessary Authentication token for making API calls.
See Creating a Detector for how to define your own custom logic for detecting sensitive data.
See Creating Detection Rules for how to aggregate Detectors for use in the scan endpoint.
See Creating Policies for how to set up common workflows that combine your Detection Rules with remediation actions such as alerting.

Creating Detection Rules

You can define Detection Rules “inline” in the body of each request to the scan endpoint. See the example in the walkthrough of the scan endpoint, Creating an Inline Detection Rule.

You can also use the Nightfall UI to predefine your Detection Rules. Once you have created a Detection Rule, you will receive a UUID, which you can pass in as part of your API request payloads.

You may add up to 50 detectors to your detection rule.
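If you build inline Detection Rules in code, checking the documented limits (at most 50 Detectors, of which at most 30 may be regular expression custom detectors) before sending a request can save a round trip. The sketch below is hypothetical: the field names (detectors, detectorType, REGEX, nightfallDetector, regex.pattern, minConfidence, minNumFindings) and the detector identifier are assumptions modeled on the inline rule format and should be verified against the API reference before use.

```python
from __future__ import annotations

MAX_DETECTORS = 50        # total Detectors per Detection Rule
MAX_REGEX_DETECTORS = 30  # regular expression custom detectors per rule

# Hypothetical inline Detection Rule; every field name is an assumption to verify.
EXAMPLE_RULE = {
    "name": "Driver's licenses and ticket IDs",
    "logicalOp": "ANY",
    "detectors": [
        {
            "detectorType": "NIGHTFALL_DETECTOR",
            "nightfallDetector": "US_DRIVERS_LICENSE_NUMBER",
            "displayName": "US driver's license",
            "minConfidence": "LIKELY",
            "minNumFindings": 1,
        },
        {
            "detectorType": "REGEX",
            "regex": {"pattern": r"TICKET-\d{6}", "isCaseSensitive": False},
            "displayName": "Internal ticket ID",
            "minConfidence": "LIKELY",
            "minNumFindings": 1,
        },
    ],
}


def validate_detection_rule(rule: dict) -> None:
    """Raise ValueError if the rule exceeds the documented Detector limits."""
    detectors = rule.get("detectors", [])
    if len(detectors) > MAX_DETECTORS:
        raise ValueError(f"{len(detectors)} detectors exceeds the limit of {MAX_DETECTORS}")
    regex_count = sum(1 for d in detectors if d.get("detectorType") == "REGEX")
    if regex_count > MAX_REGEX_DETECTORS:
        raise ValueError(
            f"{regex_count} regex detectors exceeds the limit of {MAX_REGEX_DETECTORS}"
        )


if __name__ == "__main__":
    validate_detection_rule(EXAMPLE_RULE)
    print("rule is within limits")
```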

To create a Detection Rule in the Nightfall UI, select "Detection Rules" from the left-hand navigation.

Click the + New Detection Rule button in the upper right-hand corner.

First, enter a name for your Detection Rule as well as an optional description.

Then click the + Detectors button to add Detectors to your Detection Rule.

In this example, we have selected the US driver's license and Canada Government ID detectors.

Click the Add button in the lower right-hand corner, at the end of the detector list, when you are done adding detectors.

Now that your Detectors are set, choose a minimum confidence level and a minimum # of findings for each detector.

If these minimums for a Detector are not met, the Detection Rule will not be triggered.

Save your Detection Rule using the button in the lower left-hand corner once you are done.

Once the Detection Rule is saved, it is available for use in requests to the Nightfall API to scan your data for sensitive information. Pass in the UUID of the Detection Rule as the detectionRuleUUIDs field of your requests to the scan endpoints.

The UUID may be obtained by clicking the "copy" icon, the leftmost icon in the set of icons that appear next to the Detection Rule's name when you hover over a Detection Rule in the list of Detection Rules.

See Using Pre-Configured Detection Rules for an example of using a Detection Rule UUID.
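A minimal Python sketch of a text scan request body that references saved Detection Rules by UUID. The detectionRuleUUIDs field name comes from this guide; the surrounding structure (a payload list of strings and a policy object) is an assumption to check against the API reference. Validating each UUID locally catches copy-and-paste mistakes before the request is sent; the UUID in the usage example is the one from the sample finding later in this guide.

```python
from __future__ import annotations

import json
import uuid


def build_scan_body(texts: list[str], detection_rule_uuids: list[str]) -> str:
    """Return a JSON request body that applies saved Detection Rules to text strings."""
    # uuid.UUID raises ValueError on malformed input, such as a truncated copy.
    rule_ids = [str(uuid.UUID(u)) for u in detection_rule_uuids]
    body = {
        "payload": list(texts),  # assumption: field holding the strings to scan
        "policy": {"detectionRuleUUIDs": rule_ids},  # assumption: nesting under "policy"
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_scan_body(["My SSN is 624-84-9182"], ["950833c9-8608-4c66-8a3a-0734eac11157"]))
```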

Creating Policies

This document applies only to the Nightfall Firewall for AI customers. If you are a Nightfall SaaS application customer, refer to this document.

Policies allow customers to create templates for their most common workflows by unifying a set of Detection Rules with the actions to be taken when those rules are triggered, including:

automated actions such as redaction of findings
alerting through webhooks

Once defined, a Policy may be used in requests to the Nightfall API, such as calls to scan file uploads, though automated redactions are not available for uploaded files at this time.

To create a policy:

Log in to Nightfall.
Click Overview under the Firewall for AI section.

Click Create Policy.

The policy creation page is displayed as follows.

If you click the Policies button under the Setting Up section, you need to execute a couple of additional steps to reach the policy creation page, as displayed in the following image.

Enter a name for the policy.
(Optional) Enter a Description for the policy.
Click + Detection rule to add a Detection Rule to the policy.
Select the check box of each Detection Rule that you wish to add to the Policy.

Select the Redact Violations check box to mask sensitive information found in your transmitted data.
Select one of the available alerting methods.
- Click + Application Webhook to add the URL of a webhook that needs to be notified. See Configuring Webhook Alerts to learn more.
- Click + HTTP Alerts to configure a website as an alert notification channel.
- Click + Email to notify recipients through an Email.
- Click + Slack to select a Slack channel to which the alerts must be sent.
Click Save Policy.

Configuring Webhook Alerts

When you click + Application Webhook, the following window is displayed.

If you have custom headers you would like to add to requests sent to the Webhook URL, you can do this from the overlay that appears when you click the "+ Application Webhook" button on the policy creation and edit page. These headers may be used for authentication as well as for integrating with Security Information and Event Management (SIEM) systems or similar tools that aggregate content through HTTP event collection.

Click the "Add Header" button to add your custom headers.

Once your header key and value are entered, you may obfuscate the value by clicking the "lock" icon next to the value field for the header. Click the "Save" button to persist your changes to the headers.

When you have completed configuring your Webhook URL and Headers, click the "Save" button.

🚧Limits On Webhook Headers
It is currently not possible to configure headers for webhooks programmatically when defining policies through the API.

After you click the "Save Policy" button, your policy should be immediately available for use. You can refer to the API Docs for the comprehensive list of endpoints that support policy UUIDs.

Alerting

Nightfall has the ability to send alerts when a violation is detected.

Policies for alerting may be configured through the Nightfall app user interface or set up programmatically through the API. Policies that are configured under Developer Platform > Overview > Policies may be used in the API by referencing their Policy UUID.

The way that an alert notification presents itself depends on the platform in question.

For example, notifications sent to Slack will appear as formatted messages sent by the Nightfall Alerts Bot. Other destinations, such as email, SIEM URLs, and webhooks, will present the information as JSON objects.

In the case of webhooks, detailed information about the finding will be sent. For other destinations, sensitive information is redacted.

Supported Alert Platforms

Slack

In order to use asynchronous notifications with Slack, you must install the Nightfall Alerts plugin from the Slack Marketplace.

See our end user documentation on installing for more details.

Once you have authenticated Nightfall to your Slack workspace, you can provide any public channel name (e.g. #general) as part of a request to the Nightfall API.

To send notifications to a private channel, a member of the channel should invite the Nightfall bot to the specific private channel and allow channel access to the bot.

Follow the steps below to invite Nightfall Alerts bot to a private channel:

Go to the Slack channel in question
Type /invite @Nightfall Alerts as a message
Press 'Enter' (you should see a message that Nightfall Alerts has now joined the channel)

If any findings are detected as part of that request, then the Nightfall Alerts bot will send a message to the channel you configured. Conversely, if there are no findings in the request payload, then Nightfall will not send an alert message.

Teams

Documentation TBD

Email

Email is unauthenticated, so you can get started using Nightfall to send email alerts without any initial setup work.

Nightfall will send an email to the provided address only if findings were detected as part of the request. The findings themselves will be attached in a JSON file.

SIEM

You may send your alerts to a designated URL, such as an endpoint hosted by SIEM software for log collection.

In addition to the URL, you may provide headers, either for security or logging purposes.

See in our end user guide or our for more details.

Webhook

You may use a webhook server to programmatically handle a finding, allowing you to create your own custom workflows with your own or 3rd party systems.

Nightfall will always send an alert to the client's webhook server if it is provided as part of an API request, even if the scan request yielded no findings.

See for more details.
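A minimal sketch of a webhook receiver using Python's standard library http.server. It checks a shared secret sent as a custom header (configured on the webhook as described in Configuring Webhook Alerts), decodes the JSON body, and does not assume any findings are present, since alerts arrive even for scans with no findings. The header name and secret are hypothetical; in production, run the receiver behind HTTPS and keep the secret out of source code.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical custom header configured on the webhook, and its expected value.
AUTH_HEADER = "X-Alert-Token"
AUTH_VALUE = "replace-with-a-long-random-secret"


def handle_alert(headers, raw_body: bytes):
    """Check the custom header, then decode the JSON alert body.

    headers can be any mapping with .get(); http.server's headers object qualifies.
    """
    supplied = headers.get(AUTH_HEADER) or ""
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(supplied.encode(), AUTH_VALUE.encode()):
        raise PermissionError("missing or invalid custom header")
    return json.loads(raw_body) if raw_body else {}


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            alert = handle_alert(self.headers, raw)
        except PermissionError:
            self.send_response(401)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            self.send_response(400)
        else:
            # Alerts arrive even when a scan produced no findings, so inspect
            # the payload instead of assuming findings are present.
            print(f"received alert payload of type {type(alert).__name__}")
            self.send_response(200)
        self.end_headers()


# To run locally (blocks until interrupted):
# HTTPServer(("", 8000), AlertHandler).serve_forever()
```

Keeping the validation in handle_alert, separate from the HTTP handler, lets you test it directly and reuse it behind a different web framework.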

Alert Schemas

The request body sent by Nightfall is JSON and uses the schemas documented in the sections below.

File Scans

Since file scans can produce a large number of results, findings are not transmitted directly in the notification that Nightfall sends. The notification object looks like the following:

The requestMetadata field contains arbitrary contents provided by the client at request time, and can be used by the client to correlate this response to the original request.

The value of the findingsURL field is a pre-signed URL, which means anyone with the link can download the file. Therefore, this URL itself should be treated as sensitive and must not be leaked. The object stored at this URL is a JSON file containing a single key, findings, whose value is a list of all data detected from the request. The schema for the finding object inside the list is shared between the text-based and file-based API endpoints.
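A minimal Python sketch of consuming a file scan notification: it reads the findingsURL, downloads the JSON file, and returns the findings list along with requestMetadata for correlation with the original request. Only the field names findingsURL, requestMetadata, and findings come from this guide; the function name and timeout are hypothetical. Because the URL is pre-signed, the sketch never logs it.

```python
import json
import urllib.request


def fetch_findings(notification: dict, timeout: float = 30.0):
    """Return (requestMetadata, findings) for a file scan notification.

    The findingsURL is pre-signed, so anyone holding it can download the
    findings: pass it straight to the HTTP client and never log it.
    """
    url = notification["findingsURL"]
    with urllib.request.urlopen(url, timeout=timeout) as response:
        document = json.load(response)
    # The downloaded JSON has a single key, "findings", holding a list.
    return notification.get("requestMetadata"), document["findings"]
```

A webhook handler would typically call this after validating the request, then use the returned requestMetadata to look up which upload the findings belong to.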

Text Scans

The payload that is forwarded on behalf of text scanning requests is identical to the response body that is synchronously returned to the client. Refer to the for more details on this payload.

Scanning Text

The scan endpoint allows you to apply Policies and Detection Rules to a list of text strings provided as a payload.

You may use or

Text scanning supports the use of,, and as well as other .

For scanning files, see .

Note that you must generate an API key to send requests to the Nightfall API.

Scanning Files

Nightfall’s file scan API allows a user to upload a file in chunks, then to scan it with Detection Rules once the upload is complete.

The scan will then be processed asynchronously before sending the results to the webhook URL that is provided along with your Detection Rules.
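The chunked upload can be sketched as an iterator that reads a file in fixed-size pieces and reports each piece's byte offset. This is a hypothetical helper, not part of an SDK: the chunk size is a parameter here, and in practice it should match whatever the upload sequence specifies; the actual upload calls are covered in the detailed walkthrough.

```python
from __future__ import annotations

import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[tuple[int, bytes]]:
    """Yield (offset, chunk) pairs that cover the stream from start to end."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield offset, chunk
        offset += len(chunk)


if __name__ == "__main__":
    # In-memory stand-in for a file opened with open(path, "rb").
    data = io.BytesIO(b"example file contents")
    for offset, chunk in iter_chunks(data, 8):
        print(offset, len(chunk))
```

Reading lazily keeps memory use bounded by the chunk size, which matters for the large files this endpoint is meant for.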

The following sequence diagram illustrates the full process for scanning a binary file with Nightfall.

For a detailed walkthrough of the API calls necessary to upload and scan a file and full script that shows the entire process, see

Prerequisites

In order to utilize the File Scanning API you need the following:

An active API Key authorized for file scanning passed via the header Authorization: Bearer <key> — see
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (detailed information to follow)

File scanning also supports Nightfall's functionality for and as part of your scan requests.

Special File Types

Spreadsheets and Tabular Data

File scans of Microsoft Office, Apache Parquet, CSV, and tab-separated files will provide additional properties to locate findings within the document beyond the standard byteRange, codepointRange, and lineRange properties.

Findings will contain a columnRange and a rowRange that will allow you to identify the specific row and column within the tabular data wherein the finding is present.

This functionality is applicable to the following mime types:

text/csv
text/tab-separated-values
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.ms-excel

Apache Parquet data files are also accepted.

Below is a sample finding from a spreadsheet containing dummy PII, where an SSN was detected in the 2nd column and 55th row.

{
   "findings":[
      {
         "path":"Sheet1 (5)",
         "detector":{
            "id":"e30d9a87-f6c7-46b9-a8f4-16547901e069",
            "name":"US social security number (SSN)",
            "version":1
         },
         "finding":"624-84-9182",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":2505,
               "end":2516
            },
            "codepointRange":{
               "start":2452,
               "end":2463
            },
            "lineRange":{
               "start":55,
               "end":55
            },
            "rowRange":{
               "start":55,
               "end":55
            },
            "columnRange":{
               "start":2,
               "end":2
            },
            "commitHash":""
         },
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
...
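To make these properties easier for a reviewer to act on, you can map each finding to a spreadsheet-style cell reference. Based on the sample above, the sketch below assumes that `rowRange` and `columnRange` are 1-based. It also assumes that non-tabular findings report these ranges as `0` or `null`, as in the other examples on this page. Treating `path` as the sheet name, and both function names, are assumptions for illustration.

```python
from typing import Any, Dict, List


def column_letter(n: int) -> str:
    """Convert a 1-based column number to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def cell_references(scan_result: Dict[str, Any]) -> List[str]:
    """Return "path!B55"-style references for findings that carry row/column ranges."""
    refs = []
    for finding in scan_result.get("findings", []):
        location = finding.get("location") or {}
        rows = location.get("rowRange") or {}
        cols = location.get("columnRange") or {}
        row, col = rows.get("start", 0), cols.get("start", 0)
        if row and col:  # 0, null, or missing: the finding is not in tabular data
            refs.append(f"{finding.get('path', '')}!{column_letter(col)}{row}")
    return refs
```

For the SSN finding above, this yields `Sheet1 (5)!B55`.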

Git Repositories

Nightfall provides special handling for archives of GitHub repositories.

Nightfall will scan the repository history to discover findings in particular commits, returning the hash of the commit in which each finding occurs.

To scan the repository, you will first need to clone it, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

This creates a clone of the Nightfall Go SDK.

You will then need to create an archive that can be uploaded using Nightfall's file scanning sequence.

zip -r directory.zip directory

Note that for this to work, the hidden .git directory, which holds the repository's commit history, must be included in the archive.

When you initiate the file upload sequence with this file, you will receive scan results that contain the commitHash property filled in.

Using the Nightfall Go SDK archive created above, a simple example would be to scan for URLs (i.e., strings starting with http:// or https://), which returns results such as the following:

{
   "findings":[
      {
         "path":"f607a067..53e59684/nightfall.go",
         "detector":{
            "id":"6123060e-2d9f-4f35-a7a1-743379ea5616",
            "name":"URL"
         },
         "finding":"https://api.nightfall.ai/\"",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":142,
               "end":168
            },
            "codepointRange":{
               "start":142,
               "end":168
            },
            "lineRange":{
               "start":16,
               "end":16
            },
            "rowRange":{
               "start":0,
               "end":0
            },
            "columnRange":{
               "start":0,
               "end":0
            },
            "commitHash":"53e59684d9778ceb0f0ed6a4b949c464c24d35ce"
         },
         "beforeContext":"tp\"\n\t\"os\"\n\t\"time\"\n)\n\nconst (\n\tAPIURL = \"",
         "afterContext":"\n\n\tDefaultFileUploadConcurrency = 1\n\tDef",
         "matchedDetectionRuleUUIDs":[
            "cda0367f-aa75-4d6a-904f-0311209b3383"
         ],
         "matchedDetectionRules":[
            
         ]
      },
 ...

Sensitive Data in GitHub Repositories

If a finding in a GitHub repository is sensitive, it should be considered compromised and appropriate mitigation steps should be taken (e.g., secrets should be rotated).

To retrieve the specific commit, you will need to clone the repository, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

You can then check out the specific commit using the commit hash returned by Nightfall.

cd nightfall-go-sdk
git checkout 53e59684d9778ceb0f0ed6a4b949c464c24d35ce

Note that checking out a commit this way leaves you in a 'detached HEAD' state.

Specialized File Detectors

Nightfall supports Detectors that scan for file names, file types, and file fingerprints.

Detecting File Names

In addition to scanning the content of files, you may configure the Detectors to scan file names as well.

This is done through the “scope” attribute of a Detector.

The scope attribute allows you to scan either within file contents, the file name, or both the file contents and file name.

File extensions can be scanned for by creating a Regular Expression type custom Detector with a scope to scan only file names ("File") or both the content and file name ("ContentAndFile"), as shown in the example request below.

curl --request POST \
     --url https://api.nightfall.ai/v3/upload/<fileid>/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-<yourNightfallKey>' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "regex": {
                                   "pattern": ".*\\.txt",
                                   "isCaseSensitive": false
                              },
                              "detectorType": "REGEX",
                              "scope": "ContentAndFile"
                         }
                    ],
                    "name": "File Name Detector",
                    "logicalOp": "ANY"
               }
          ]
     }
}'
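Regex patterns are easy to break when they are hand-escaped inside a JSON string in a shell command, because a backslash must itself be escaped as `\\` in JSON. One way to avoid this is to build the same request body in Python and let `json.dumps` do the escaping, as sketched below. Python's `re` module is used only as a local sanity check of the pattern; Nightfall's regex engine may not behave identically.

```python
import json
import re

# Matches file names ending in ".txt"; a leading "*" on its own is not valid regex.
FILE_NAME_PATTERN = r".*\.txt"

scan_request = {
    "policy": {
        "detectionRules": [
            {
                "name": "File Name Detector",
                "logicalOp": "ANY",
                "detectors": [
                    {
                        "detectorType": "REGEX",
                        "regex": {"pattern": FILE_NAME_PATTERN, "isCaseSensitive": False},
                        "scope": "ContentAndFile",
                    }
                ],
            }
        ]
    }
}

# json.dumps escapes the backslash, producing ".*\\.txt" in the JSON text.
request_body = json.dumps(scan_request, indent=2)

# Local sanity check with Python's re module (Nightfall's engine may differ).
assert re.fullmatch(FILE_NAME_PATTERN, "NOTES.TXT", re.IGNORECASE)
assert not re.fullmatch(FILE_NAME_PATTERN, "notes.txt.zip")

print(request_body)
```

The printed JSON can be passed to `curl --data` as-is.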

In addition to scanning based on file name, you may also use a File Type Detector, which allows you to scan for files based on their MIME type.

Note that confidence sensitivity does not apply to file names: sensitive findings are always reported.

Detecting File Types

Nightfall’s File Type detection allows you to implement compliance policies that detect and alert you when particular file types that are not allowed in a given location are discovered.

This functionality is implemented by creating a specific Detector called a “File Type Detector.”

To create a File Type Detector, select “Detectors” from the left-hand navigation and click the button labeled “+New Detector” in the upper right-hand corner. A drop-down list of Detector types will be displayed, which includes the “File Type” Detector type.

You will then select one or more file types to scan for from a list of MIME types.

You can either scroll through the list of MIME types in the select box, or type a portion of a MIME type to filter the contents of the select box to match your input.

Nightfall supports detection for a wide variety of MIME types. See the Internet Assigned Numbers Authority’s (IANA) website for a definitive list of MIME types. Note, however, that Nightfall does not support the detection of audio and video MIME types.

Detection of file types is based on the file contents, not the file extension. However, you can create Detectors that scan file names by setting the scope attribute.

File Type Detectors differ from other Nightfall Detectors in that the scope and confidence attributes do not apply to them.

Once you have added all the mime-types you wish to scan for, save your new Detector. You may then add your new Detector to Detection Rules and Policies.

Detecting Files Through Fingerprinting

Nightfall allows you to discover the location of specific files that you have deemed sensitive and want to avoid sharing.

This discovery is done through document fingerprinting. Fingerprinting is the process of algorithmically creating a unique identifier for a file by mapping the data of the document to a signature that can be recalled quickly. This allows the file to be identified in a manner akin to how human fingerprints uniquely identify individual people.

This functionality is achieved in Nightfall by creating a specific Detector type called a File Fingerprint Detector.

The Fingerprint Detector allows you to create fingerprints for one or more files (a "handful" of fingerprints, so to speak).

To create a Fingerprint Detector, select “Detectors” from the left-hand navigation and click the button labeled “+New Detector” in the upper right-hand corner. A drop-down list of Detector types will be displayed, which includes the “Fingerprint” Detector type.

When you create a File Fingerprint Detector, you can upload up to 50 files to be fingerprinted. The file size limit is 25 MB.

Once the fingerprint is generated, the actual content of the file is discarded so no sensitive content is stored on Nightfall’s system.

These Detectors may only be created through the console.

Updates to Fingerprinted Files

You cannot update Fingerprint Detectors, so any modification to the original file or its underlying content requires that you create a brand new Fingerprint Detector.

You may then treat the Fingerprint detector like any other Detector and incorporate it into a Detection Rule using its unique Detector identifier.

You may incorporate these Detectors into Policies that will alert you whenever files that match the fingerprint are detected.

Webhooks and Asynchronous Notifications

The Nightfall API supports the ability to send asynchronous notifications when findings are detected as part of a scan request.

The supported destinations for these notifications include external platforms such as Slack, email, or a URL for a SIEM log collector, as well as a webhook server.

Nightfall issues notifications under the following scenarios:

to notify a client about the results of a file scan. File scans themselves are always performed asynchronously because of the complexity of text extraction and the volume of data involved.
to notify a client about results from a text scan request. Although results are already delivered synchronously in the response object, clients may configure the request to forward results to other platforms, such as a webhook, SIEM endpoint, or email, through a policy alert configuration.

To create a webhook, you will need to access your webhook signing key and then set up a webhook server.

For more information on how webhooks and asynchronous notifications are used please see our guides on:

Accessing Your Webhook Signing Key

In order to accept requests from Nightfall, a Webhook server must use a signing key to verify requests.

To access or generate your Webhook signing key, start by logging in to the Nightfall dashboard.

Select the Developer Platform > Manage API Keys using the navigation bar on the left side of the page. You will see the Webhook signing section:

Unlike the API key, the signing key can be revealed via the "eye" icon, the leftmost of the three icons displayed.

You may copy the current value to your clipboard with the "copy" icon in the center of the three icons displayed.

You may also regenerate the key with the circular arrow icon furthest to the right.

Use this value as shown in the code examples that are used in the following sections.

Creating a Webhook Server

Learn how to set up a server to handle results of file scans and alerts sent based on policy alert configurations.

Webhook Challenges

When Nightfall sends a message to a user-provided webhook address, it first sends a POST request whose JSON payload contains a single field, challenge, holding randomly generated bytes. This ensures that the caller owns the server.

In order to authenticate your webhook server to Nightfall, you must reply with (1) a 200 HTTP Status Code, and (2) a plaintext request body containing only the value of the challenge key.

If Nightfall receives the expected value back, then the file scan operation will proceed; otherwise it will be aborted.

When a server responds successfully to a challenge request, the validity of that URL will be cached for up to 24 hours, after which it will need to be validated again.

If the webhook cannot be reached, you will receive an error with the code "40012" and the description "Webhook URL validation failed" when you initiate the scan.

If the webhook challenge fails, you will receive an error with the code "42201" and the description "Webhook returned incorrect challenge response" when you initiate the scan.
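A challenge handler only needs to parse the JSON body and echo the `challenge` value back. The framework-agnostic sketch below returns the status code and plaintext body to send. The function name is hypothetical. Wiring it into your web framework, including setting a `text/plain` content type, is left to you.

```python
import json
from typing import Optional, Tuple


def challenge_response(raw_body: bytes) -> Optional[Tuple[int, str]]:
    """Return (200, challenge) if raw_body is a Nightfall challenge, else None.

    The caller should send the returned string back as a plaintext body.
    """
    try:
        data = json.loads(raw_body)
    except ValueError:  # not JSON (also covers undecodable bytes)
        return None
    if isinstance(data, dict) and "challenge" in data:
        return 200, str(data["challenge"])
    return None
```

When the function returns `None`, the request is not a challenge and should go through normal signature verification and processing.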

Webhook Signature Verification

When a customer signs up for the developer platform, Nightfall automatically generates a unique signing secret for them.

This secret is used to sign requests to the customer's configured webhook URL.

Signing Secret Security

The signing secret should never be stored in plaintext, as a leak compromises the authenticity of webhook requests.

If you have any concerns that your signing secret may have leaked, you can request rotation at any time by reaching out to Nightfall Customer Success.

For security purposes, the webhook includes a signature header containing an HMAC-SHA256 digital signature that customers may use to authenticate the client.

In order to authenticate requests to the webhook URL, customers may use the following algorithm:

Check for the presence of the headers X-Nightfall-Signature and X-Nightfall-Timestamp. If these headers are not both present, discard the request.
Read the entire request body into a string body.
Verify that the value in the X-Nightfall-Timestamp header (the POSIX time in seconds) occurred recently. This is to protect against replay attacks, so a threshold on the order of magnitude of minutes should be reasonable. If a request occurred too far in the past, it should be discarded.
Concatenate the timestamp and body with a colon delimiter, i.e. timestamp:body.
Compute the HMAC SHA-256 hash of the payload from the previous step, using your unique signing secret as the key. Encode this computed value in hex.
Compare the value of the X-Nightfall-Signature header to the value computed in the previous step. If the values match, authentication is successful, and processing should proceed. Otherwise, the request must be discarded.

The snippet below shows how you might implement this authentication validation in Python:
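Below is a minimal sketch of the six steps above, using only the standard library. The function name, the 300-second replay window (one reading of "on the order of magnitude of minutes"), and passing headers as any mapping with `.get()` are assumptions. It uses `hmac.compare_digest` rather than `==` so that the comparison runs in constant time.

```python
import hashlib
import hmac
import time

MAX_AGE_SECONDS = 300  # assumed replay window; the text suggests "minutes"


def verify_nightfall_signature(headers, body: str, signing_secret: str, now=None, max_age: int = MAX_AGE_SECONDS) -> bool:
    """Return True if a webhook request passes the signature checks described above.

    `headers` is any mapping-like object with .get(); `body` is the raw request body as a string.
    """
    # Step 1: both headers must be present.
    signature = headers.get("X-Nightfall-Signature")
    timestamp = headers.get("X-Nightfall-Timestamp")
    if not signature or not timestamp:
        return False
    # Step 3: reject requests whose timestamp is not recent (replay protection).
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > max_age:
        return False
    # Steps 4-5: HMAC-SHA256 over "timestamp:body", keyed with the signing secret, hex-encoded.
    message = f"{timestamp}:{body}".encode("utf-8")
    expected = hmac.new(signing_secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    # Step 6: constant-time comparison with the header value.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Header objects in Flask and `http.server` look up names case-insensitively, so either can be passed directly. With a plain `dict`, use the exact header spelling.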

Example Webhook Server

An example implementation of a simple webhook server is below.
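The surrounding text refers to a Flask-based server. As a dependency-free alternative, the sketch below uses Python's built-in `http.server` on port 8075. It answers challenge requests, verifies signatures on all other requests, and acknowledges authenticated results. Several details are assumptions: the environment variable name `NIGHTFALL_SIGNING_SECRET`, the 300-second replay window, and answering challenges before checking signatures. `http.server` is suitable for local testing, not production.

```python
import hashlib
import hmac
import json
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical variable name; keep your webhook signing key out of source code.
SIGNING_SECRET = os.getenv("NIGHTFALL_SIGNING_SECRET", "")
MAX_AGE_SECONDS = 300  # assumed replay window ("order of minutes")
PORT = 8075


def handle_webhook(headers, raw_body: bytes, secret: str, now=None):
    """Return (status_code, plaintext_response) for one webhook POST."""
    try:
        body = raw_body.decode("utf-8")
        data = json.loads(body)
    except ValueError:  # covers bad UTF-8 and bad JSON
        return 400, "invalid request body"
    # Challenge requests: echo the challenge value back as plain text.
    if isinstance(data, dict) and "challenge" in data:
        return 200, str(data["challenge"])
    # Everything else must carry a fresh, valid signature.
    signature = headers.get("X-Nightfall-Signature")
    timestamp = headers.get("X-Nightfall-Timestamp")
    if not signature or not timestamp or not timestamp.isdigit():
        return 401, "missing or malformed signature headers"
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > MAX_AGE_SECONDS:
        return 401, "stale request"
    message = f"{timestamp}:{body}".encode("utf-8")
    expected = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return 401, "invalid signature"
    # Authenticated results: replace this with your own processing.
    print("received webhook payload:", sorted(data) if isinstance(data, dict) else type(data).__name__)
    return 200, "ok"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, text = handle_webhook(self.headers, self.rfile.read(length), SIGNING_SECRET)
        payload = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    if not SIGNING_SECRET:
        print("Set NIGHTFALL_SIGNING_SECRET before starting the server.")
    else:
        HTTPServer(("", PORT), WebhookHandler).serve_forever()
```

The server refuses to start without a signing key, because it could not authenticate any results.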

You can test your webhook with a tool such as ngrok, which allows you to expose a web server running on your local machine to the internet.

In the example above, the webhook server runs on port 8075. To route ngrok requests to this server, first run the Python script (after installing the necessary dependencies, such as Flask), then run ngrok as follows:

./ngrok http 8075

See the section on for details about the json payloads for the different messages sent to webhook servers.

Scanning Features

Nightfall offers many useful features beyond its detectors, including:

The ability to use and to narrow the scope of matches.

The ability to create redaction configurations that are highly configurable, so that sensitive data is appropriately obfuscated.

The ability to create Policies that determine how leaks of sensitive information should be mitigated (e.g., through alerts sent to email or Slack).

Scanning Images for patterns using Custom Regex Detectors

Using regex to identify long patterns in images can be challenging because OCR systems do not always transcribe characters exactly. In such cases, even Nightfall may not achieve 100% character-by-character accuracy. To improve results, build more flexibility into your regex patterns to accommodate common OCR inconsistencies. Here are some typical OCR challenges to keep in mind:

Spell-check noise: Spell-checking tools can add artifacts like red underlines, which may interfere with text recognition.
Character ambiguity:
- The digit 0 may be misinterpreted as the letter O (or vice versa), depending on the font.
- The character l (lowercase L) may be read as the digit 1.
- The letter B may appear as the digit 8.
Underscore handling: An underscore (_) is sometimes interpreted as a space, particularly when spell-check artifacts are present.
Line wrapping: OCR may introduce unexpected newlines when text wraps across multiple lines.
Periods and punctuation: Spell-check artifacts or font issues may result in extraneous periods (.) or other punctuation being added to the output. En dash (–) and hyphens (-) may be interchanged.

For reference, OCR tools like Tesseract typically achieve 85-98% character accuracy for similar input, and our system operates within a similar range. Given this, tuning your regex to be more forgiving (e.g., allowing for optional characters or slight variations) can significantly improve detection rates.

Example Regex (original and loosened)

original: ATATT3xFfGF0[A-Za-z0-9=_\-]*[=A-Za-z0-9]{9}

loosened: ATATT[A-Za-z0-9_\-– @.\n=]*[A-Za-z0-9_\- @.\n]{7,11}

shortened the literal match prefix
excluded the literal zero (0) from the prefix
added period (.) and newline (\n) characters
relaxed the char length
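The sketch below runs both patterns against a made-up token and against an OCR-style corruption of that same token, showing the effect of loosening. The patterns are copied from above. Python's `re` module is used only for illustration, and Nightfall's regex engine may differ in details.

```python
import re

# Patterns copied from the example above.
ORIGINAL = re.compile(r"ATATT3xFfGF0[A-Za-z0-9=_\-]*[=A-Za-z0-9]{9}")
LOOSENED = re.compile(r"ATATT[A-Za-z0-9_\-– @.\n=]*[A-Za-z0-9_\- @.\n]{7,11}")

# A made-up token, and the same token as OCR might return it:
# "0" read as "O", the underscore read as a space, and a line wrap.
clean = "ATATT3xFfGF0abc_DEF123ABCDEFGHI"
ocr = "ATATT3xFfGFOabc DEF\n123ABCDEFGHI"

for label, text in (("clean", clean), ("ocr", ocr)):
    print(label, bool(ORIGINAL.search(text)), bool(LOOSENED.search(text)))
# clean True True
# ocr False True
```

The trade-off is precision. A looser pattern accepts more character variations, so it can also match more unrelated text. Pairing it with exclusion rules helps keep false positives in check.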

Using Exclusion Rules

An Exclusion Rule allows you to refine a Detector to make sure false positives are not surfaced by Nightfall.

For instance, you may want to detect whether credit card numbers are being shared inappropriately in your organization. However, there may be cases where members of your QA team are sharing test credit card numbers, which should not be considered a violation and should be ignored by Nightfall.

In the following example, we define a Detector with a regular expression to match credit cards.

We then add an exclusion for some known test credit cards.

curl --location --request POST 'https://api.nightfall.ai/v3/scan' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--header 'Content-Type: application/json' \
--data-raw '{
    "policy": {
        "detectionRules": [
            {
                "detectors": [
                    {
                        "regex": {
                            "pattern": "(?:(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(6(?:011|5[0-9]{2})[0-9]{12})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|((?:2131|1800|35[0-9]{3})[0-9]{11}))",
                            "isCaseSensitive": false
                        },
                        "exclusionRules": [
                            {
                                "wordList": {
                                    "values": [
                                        "4111111111111111",
                                        "5105105105105100"
                                    ]
                                },
                                "exclusionType": "WORD_LIST",
                                "matchType": "FULL"
                            }
                        ],
                        "minNumFindings": 1,
                        "minConfidence": "POSSIBLE",
                        "displayName": "Credit Card Reg Ex",
                        "detectorType": "REGEX"
                    }
                ],
                "name": "Credit Card Detection Rule",
                "logicalOp": "ALL"
            }
        ]
    },
    "payload": [
        "5105105105105100",
        "4111111111111111",
        "4012888888881881"
    ]
}'

As the response shows, only the third credit card number matches, because the first two items in the payload are in the exclusion rule's word list.

{
   "findings":[
      [
         
      ],
      [
         
      ],
      [
         {
            "finding":"4012888888881881",
            "detector":{
               "name":"Credit Card Reg Ex",
               "uuid":"93024e88-e6de-4c84-8295-75157cdd1b52"
            },
            "confidence":"LIKELY",
            "location":{
               "byteRange":{
                  "start":0,
                  "end":16
               },
               "codepointRange":{
                  "start":0,
                  "end":16
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "Credit Card Detection Rule"
            ]
         }
      ]
   ],
   "redactedPayload":[
      "",
      "",
      ""
   ]
}
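The sketch below reproduces the rule's logic locally with Python's `re` to show why only the third item is reported. It illustrates the semantics only and does not call Nightfall. It assumes that `matchType: FULL` means a finding is excluded only when it equals a word-list entry exactly.

```python
import re

# The credit card pattern and word list from the request above.
CARD_PATTERN = re.compile(
    r"(?:(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(6(?:011|5[0-9]{2})[0-9]{12})"
    r"|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|((?:2131|1800|35[0-9]{3})[0-9]{11}))"
)
EXCLUDED_WORDS = {"4111111111111111", "5105105105105100"}


def findings_per_item(payload):
    """Return, for each payload string, the regex matches that survive the exclusion rule."""
    results = []
    for item in payload:
        matches = [m.group(0) for m in CARD_PATTERN.finditer(item)]
        # FULL match type (as assumed here): drop a match only if it equals a word exactly.
        results.append([m for m in matches if m not in EXCLUDED_WORDS])
    return results


print(findings_per_item(["5105105105105100", "4111111111111111", "4012888888881881"]))
# [[], [], ['4012888888881881']]
```

The per-item list structure mirrors the `findings` array in the response above, which has one entry per payload string.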

Detecting Secrets

Leaked secrets, such as credentials needed to authenticate and authorize a cloud provider’s API request, expose company software, services, infrastructure, and data to hackers.

Nightfall has developed technology to detect secrets and label findings, keeping SecOps workflows from being clogged and eliminating false positive alerts.

Overall Coverage

Nightfall uses machine learning models trained on a large (millions of lines of code), diverse dataset (including all programming languages and application types) to ensure best-in-class secret detection accuracy and coverage.

Explicit Labeling and Endpoint Validation for Popular Services

For a growing set of the most popular services, Nightfall will:

label detected secrets by vendor and service type (returned in the kind field of the response)
label detected secrets as active risks by validating supported credential types with their associated service endpoints (returned in the status field)

Our current solution supports the following vendors covering a diverse set of use cases, including cloud storage/infrastructure, communication, social networks, software development, banking, observability, and payment processing.

This list is not static and will continue to grow as we add support for detecting API keys from additional services. If you want to detect API keys from a service not listed below, please .

Key Detection Example

Below is an example of how an AWS Key would be shown in a finding.

The following values are returned for the status field:

ACTIVE
EXPIRED
UNVERIFIED

This value is based on the information returned by the corresponding service when attempting to validate the key. If no data is returned from the service, the key is considered UNVERIFIED.

To use this functionality, use our existing built-in API_KEY detector to scan a data source such as . Below is an example using a detection rule defined inline for a text scan.
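As a rough illustration, the sketch below builds a text scan request body that uses the built-in `API_KEY` detector inline. It also includes a small helper that orders the documented `status` values so that live credentials are handled first. The detector field names are assumptions, as is the ordering (ranking `UNVERIFIED` as more urgent than `EXPIRED`). The sketch does not show where `status` appears in the response.

```python
import json

# Documented status values, ordered so the most urgent come first.
# Ranking UNVERIFIED ahead of EXPIRED is an assumption, not Nightfall guidance.
STATUS_PRIORITY = {"ACTIVE": 0, "UNVERIFIED": 1, "EXPIRED": 2}


def api_key_scan_body(payload):
    """Hypothetical /v3/scan body with an inline rule using the built-in API_KEY detector."""
    return {
        "policy": {
            "detectionRules": [
                {
                    "name": "Secrets Detection Rule",
                    "logicalOp": "ANY",
                    "detectors": [
                        {
                            "detectorType": "NIGHTFALL_DETECTOR",  # assumed field value
                            "nightfallDetector": "API_KEY",
                            "displayName": "API Key",
                            "minConfidence": "LIKELY",
                            "minNumFindings": 1,
                        }
                    ],
                }
            ]
        },
        "payload": payload,
    }


def triage(statuses):
    """Sort status values so ACTIVE keys (live credentials to rotate) come first.

    Unknown values are ranked like UNVERIFIED.
    """
    return sorted(statuses, key=lambda s: STATUS_PRIORITY.get(s, STATUS_PRIORITY["UNVERIFIED"]))


if __name__ == "__main__":
    print(json.dumps(api_key_scan_body(["<text that may contain credentials>"]), indent=2))
```

An ACTIVE key is a live credential that should be rotated right away. An UNVERIFIED key still deserves review, because the service simply did not confirm its state.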

PHI Detection Rules

Protected health information (PHI), also referred to as personal health information, describes a patient's medical history — including ailments, various treatments, and outcomes. PHI may include:

demographic information
test and laboratory results
mental health conditions
insurance information

The Health Insurance Portability and Accountability Act (HIPAA) of 1996 is the primary law that oversees the use of, access to, and disclosure of PHI in the United States. HIPAA lists 18 different personal information identifiers (PII) that, when paired with health information, become PHI. In order to more accurately detect potential PHI, Nightfall has introduced specific new detectors that allow for specialized combinations.

These HIPAA PII and PHI-specific detectors intelligently aggregate Nightfall's built-in detectors to ensure compliance with governing law. For example, finding a patient's name in a document or message is not considered HIPAA PII, because a name does not uniquely identify an individual; many people can share the same name. However, the information would be considered HIPAA PII if the patient's name and address appeared in the same message.

Specific PHI and HIPAA PII can be detected with greater confidence, especially as they relate to specific medical codes or terms in association with specific logical combinations of other PII. For instance, a patient's name together with a date of birth, a person's name together with a street address, or any of a particular set of PII (phone number, email address, SSN, etc.) would be considered HIPAA PII.

If the combined detectors all match with a confidence of "Very Likely" it would match our "HIPAA PII Very Likely" Detection Rule. Otherwise if these detectors match with a confidence of "Likely" it would match our "HIPAA PII Likely" Detection Rule.

Alternatively, when any of the above PII options are found in conjunction with a specific set of medical codes or terms (ICD codes, FDA drug names or codes, procedures), the finding could be flagged as PHI.

When all the detectors within these PHI Detection Rules make findings with a confidence of "Very Likely," the finding matches our "PHI Very Likely" Detection Rule; if some or all are met with a confidence of "Likely," it matches our "PHI Likely" Detection Rule.
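To make this combination logic concrete, here is a simplified model of it in Python. It is an interpretation of the prose above, not Nightfall's implementation, and the detector names are hypothetical labels. The model treats a phone number, email address, or SSN as identifying on its own. A name, by contrast, needs a date of birth or street address alongside it.

```python
from typing import Dict, Iterable, Optional

# Hypothetical detector labels standing in for Nightfall's built-in detectors.
NAME = "PERSON_NAME"
NAME_COMPANIONS = ("DATE_OF_BIRTH", "STREET_ADDRESS")
IDENTIFYING_ALONE = {"PHONE_NUMBER", "EMAIL_ADDRESS", "US_SOCIAL_SECURITY_NUMBER"}
MEDICAL = {"ICD_CODE", "FDA_DRUG", "MEDICAL_PROCEDURE"}


def _tier(confidences: Iterable[str]) -> str:
    # "Very Likely" only when every contributing finding is VERY_LIKELY.
    return "Very Likely" if all(c == "VERY_LIKELY" for c in confidences) else "Likely"


def classify(findings: Dict[str, str]) -> Optional[str]:
    """Map {detector: confidence} (LIKELY or VERY_LIKELY) to a rule name, or None."""
    present = set(findings)
    pii = present & IDENTIFYING_ALONE
    if NAME in present:
        for companion in NAME_COMPANIONS:
            if companion in present:
                pii |= {NAME, companion}
    if not pii:
        return None  # e.g. a name on its own is not HIPAA PII
    medical = present & MEDICAL
    if medical:
        return "PHI " + _tier(findings[d] for d in pii | medical)
    return "HIPAA PII " + _tier(findings[d] for d in pii)
```

For example, a name plus a street address, both found at VERY_LIKELY, maps to "HIPAA PII Very Likely". Adding an ICD code found at LIKELY turns the result into "PHI Likely".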

Our PHI Detectors may be used just like other Detectors with or .

Test Datasets

The following sample datasets can be used to test Nightfall's advanced AI-based detection capabilities.

This data has been fully de-identified and can be used to test any data loss prevention (DLP) platform.

Errors

While using Nightfall's Scan API, you may encounter some of the common errors outlined below. Try following the provided troubleshooting steps.

If problems persist, please contact Nightfall Support for further assistance.

HTTP Error Codes

The following error codes are returned as part of a standard HTTP response.

400 Bad Request: This error most often occurs when something in the body of your request is syntactically incorrect. Check your request format and try again. It can also occur if the request body is larger than 500 KB or if the payload contains more than 50,000 items to scan.

401 Unauthorized: You may be using an incorrect API key or calling the wrong endpoint.

422 Unprocessable Entity: You may be using an invalid or unrecognized detector set. You may also have exceeded the maximum allowable payload size; try spreading your payload across multiple requests.

429 Too Many Requests or Quota Exceeded: Either your monthly request limit has been exceeded, or you have exceeded the allowed rate limit. Consider upgrading to a higher-volume plan, or wait a few moments before retrying the request.

500 Internal Server Error: Wait a few moments and try again. If the problem persists, Nightfall may be experiencing an outage.
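If a request is rejected because of its size, the usual fix is to split the payload across several requests. The sketch below groups strings into batches that stay under 50,000 items and an estimated 500 KB body. Two values are assumptions: it reads "500 KB" conservatively as 500,000 bytes, and it reserves a hypothetical 10,000 bytes for the policy portion of the body.

```python
import json
from typing import Iterator, List

MAX_ITEMS = 50_000
MAX_BODY_BYTES = 500_000  # conservative reading of "500 KB"


def batch_payload(items: List[str], policy_overhead_bytes: int = 10_000) -> Iterator[List[str]]:
    """Yield lists of strings whose JSON encoding should fit within one scan request."""
    budget = MAX_BODY_BYTES - policy_overhead_bytes
    batch: List[str] = []
    size = 2  # the enclosing "[]" of the payload array
    for item in items:
        # Encoded size of the item plus the ", " separator json.dumps inserts.
        item_size = len(json.dumps(item).encode("utf-8")) + 2
        if item_size + 2 > budget:
            raise ValueError("item too large for a text scan; consider the file scanning API")
        if batch and (len(batch) >= MAX_ITEMS or size + item_size > budget):
            yield batch
            batch, size = [], 2
        batch.append(item)
        size += item_size
    if batch:
        yield batch
```

Each yielded batch can be sent as the `payload` of its own request. The size is an estimate, so keep some headroom if your policy section is large.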

Nightfall Playground

The Nightfall Developer Playground is a sample app that you can use to test API functionality before writing any code.

Our playground environment allows you to:

Test Detectors and Detection Rules. Here are some .
Generate sample data for DLP testing.
Explore a sample app built on our APIs

Nightfall APIs

DLP APIs - Firewall for AI Platform

Firewall for AI DLP APIs enable developers to write custom code to sanitize data anywhere–RAG data sets, analytics data stores, data pipelines, and unsupported SaaS applications.

Rate Limits for Firewall APIs

To prevent misuse and ensure the stability of our platform, we enforce a rate limit on an API Key and endpoint basis, similar to the way many other APIs enforce rate limits.

When operating under our Free plan, accounts and their corresponding API Keys have a rate limit of 5 requests per second on average, with support for bursts of 15 requests per second. If you upgrade to a paid plan – the Enterprise plan – this rate increases to a limit of 10 requests per second on average and bursts of 50 requests per second.

Plan: Requests Per Second (Avg) / Burst

Free: 5 / 15

Enterprise: 10 / 50

The Nightfall API follows standard practices and conventions to signal when these rate limits have been exceeded.

Successful requests return a header X-Rate-Limit-Remaining with the integer number of requests remaining before errors will be returned to the client.

When your application exceeds the rate limit for a given API endpoint, the Nightfall API returns an HTTP response code of 429 "Too Many Requests." If your use case requires a higher rate limit, please reach out to [email protected].

Additionally, these unsuccessful requests return the number of seconds to wait before retrying the request in a Retry-After Header.
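A client can honor these signals by waiting for the number of seconds given in `Retry-After` before retrying a 429 response. The standard-library sketch below does that. If the header is missing or not an integer, it falls back to exponential backoff, which is an assumption rather than documented behavior. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def retry_delay(headers, attempt: int, base_delay: float = 1.0) -> float:
    """Seconds to wait before retrying a 429: Retry-After if usable, else exponential backoff.

    `headers` is any object with .get(), such as a dict or HTTPError.headers.
    """
    retry_after = headers.get("Retry-After")
    if retry_after and retry_after.strip().isdigit():
        return float(retry_after.strip())
    return base_delay * 2 ** (attempt - 1)


def send_with_retry(request: urllib.request.Request, max_attempts: int = 5) -> bytes:
    """Send a request, sleeping and retrying when the API answers 429 Too Many Requests."""
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts:
                raise
            time.sleep(retry_delay(err.headers, attempt))
    raise ValueError("max_attempts must be at least 1")
```

Errors other than 429 are re-raised immediately, because retrying them unchanged is unlikely to help.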

Request Rate Limiting

Your Request Rate Limiting throttles how frequently you can make requests to the API. You can monitor your rate limit usage via the `X-Rate-Limit-Remaining` header, which tells you how many remaining requests you can make within the next second before being throttled.

Quotas

Your Quota limits how many bytes of data you're permitted to scan within a given period. Your current remaining quota and the end of your current quota period are denoted by the following response headers.

Response headers:

X-Quota-Remaining (string): The bytes remaining in your quota for this period. This value is reset to the amount specified in your billing plan at the end of your quota cycle.

X-Quota-Period-End (datetime): The date and time at which your quota will be reset, encoded as a string in RFC 3339 format.
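`X-Quota-Remaining` arrives as a string and `X-Quota-Period-End` as an RFC 3339 timestamp, so a client must parse both before deciding whether to pause scanning. Below is a minimal standard-library sketch. The function names are hypothetical, and it assumes the remaining value is an integer byte count.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_quota(headers) -> Tuple[Optional[int], Optional[datetime]]:
    """Read X-Quota-Remaining (bytes) and X-Quota-Period-End (RFC 3339) from a response.

    `headers` can be any object with .get(), such as a dict or urllib response headers.
    """
    remaining = headers.get("X-Quota-Remaining")
    period_end = headers.get("X-Quota-Period-End")
    remaining_bytes = int(remaining) if remaining else None
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    end = datetime.fromisoformat(period_end.replace("Z", "+00:00")) if period_end else None
    return remaining_bytes, end


def seconds_until_reset(period_end: datetime, now: Optional[datetime] = None) -> float:
    """Seconds from now until the quota resets (0 if the period already ended)."""
    current = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (period_end - current).total_seconds())
```

Before Python 3.11, `fromisoformat` accepts only a subset of RFC 3339. If the API returns unusual fractional-second precision, parse it with a more complete RFC 3339 parser.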

Policy User Scope Update API

Rate Limits for Native SaaS app APIs

To prevent misuse and ensure the stability of our platform, we enforce a rate limit on an API Key and endpoint basis, similar to the way many other APIs enforce rate limits.

Plan

Requests Per Second (Avg)

Burst

The Nightfall API follows standard practices and conventions to signal when these rate limits have been exceeded.

Successful requests return a header X-Rate-Limit-Remaining with the integer number of requests remaining before errors will be returned to the client.

When your application exceeds the rate limit for a given API endpoint, the Nightfall API returns an HTTP response code of 429 "Too Many Requests." If your use case requires a higher rate limit, please reach out to Nightfall Support.

Additionally, these unsuccessful requests return the number of seconds to wait before retrying the request in a Retry-After header.

Request Rate Limiting

Quotas

Your Quota limits how many requests you can make within a given period. Your current remaining quota and the end of your current quota period are denoted by the following response headers.

Response Headers

Type

Description

For the free plan, we allow 5 requests per second and 10000 requests in a day.

Exfiltration Prevention APIs

You can use the exfiltration APIs to search and fetch exfiltration events and their details. You can also view details of the user (actor) whose actions triggered an event, as well as details of the asset that triggered it.

Default

Models

Posture Management APIs

You can use the posture management APIs to search and fetch posture events and their details. You can also view details of the user (actor) whose actions triggered an event, as well as details of the asset that triggered it.

Default

Models

Nightfall Software Development Kit (SDK)

Overview

Leverage our software development kits (SDKs) to enable easier, faster, and more stable engagement with the Nightfall APIs. Nightfall has a growing library of language specific SDKs including for:

If there is a language-specific SDK that you would find valuable but is not listed here, please don't hesitate to reach out to Nightfall Support.

Language Specific Guides

Overview

Nightfall gives you the flexibility to integrate with applications written in several programming languages. The supported languages are as follows.

Python

This guide describes how to use Nightfall with the Python programming language.

The example below will demonstrate how to use Nightfall’s text scanning functionality to verify whether a string contains sensitive PII using the Nightfall Python SDK.

To request the Nightfall API you will need:

A Nightfall API key
An existing Nightfall Detection Rule
Data to scan. Note that the API interprets data as plaintext, so you may pass it in any structured or unstructured format.

You can read more about obtaining a Nightfall API key or about our available data detectors in the linked reference guides.

In this tutorial, we will be downloading, setting up, and using the Python SDK provided by Nightfall.

We recommend you first set up a virtual environment. You can learn more about that here.

You can install the Nightfall SDK from PyPI like this:

pip install nightfall

We will be using the built-in os library to help run this sample API script. This will be used to help extract the API Key from the OS as an environment variable.

import os

from nightfall import Confidence, DetectionRule, Detector, LogicalOp, Nightfall

Next, we read our API key and create a Nightfall client with it. In this example, the API key is set in an environment variable called NIGHTFALL_API_KEY. Your API key should never be hard-coded directly into your script.

nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])
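Indexing os.environ directly raises a bare KeyError when the variable is missing. If you prefer a clearer failure, a small helper like the hypothetical require_env below (not part of the Nightfall SDK) can wrap the lookup. This is a sketch of one option, not something the SDK requires.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a descriptive error.

    Hypothetical helper for this tutorial; not part of the Nightfall SDK.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before running this script."
        )
    return value


# Usage (assumes the Nightfall SDK is installed):
#   from nightfall import Nightfall
#   nightfall = Nightfall(require_env("NIGHTFALL_API_KEY"))
```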

Next, we define the Detection Rule with which we want to scan our data. The Detection Rule can be created in advance in the Nightfall web app and referenced by its UUID, which this example reads from the DETECTION_RULE_UUID environment variable.

detection_rule_uuid = os.environ.get('DETECTION_RULE_UUID')

In this example, we scan a few sample strings stored in the payload list.

🚧 Payload Limit
Payloads must be under 500 KB when using the Scan API. If your data is larger than the limit, consider using the file API, which is also available via the Python SDK.

scan_text returns two values. This example discards the second one (the redacted payload) by assigning it to _, because this request has no redaction configured.

With the Nightfall API, you can also redact and mask your findings by adding a Redaction Config as part of your Detection Rule. For more information on redaction and its options, please refer to the guide here.

payload = [
    "The customer social security number is 458-02-6124",
    "No PII in this string",
    "My credit card number is 4916-6734-7572-5015"
]

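result, _ = nightfall.scan_text(
    payload,
    detection_rule_uuids=[detection_rule_uuid]
)

Alternatively, you can define the Detection Rule inline in your code instead of referencing a pre-made rule by UUID:
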
payload = [
    "The customer social security number is 458-02-6124",
    "No PII in this string",
    "My credit card number is 4916-6734-7572-5015"
]

result, _ = nightfall.scan_text(
    payload,
    detection_rules=[
        DetectionRule(
            name="Sample_Detection_Rule",
            logical_op=LogicalOp.ANY,
            detectors=[
                Detector(
                    min_confidence=Confidence.VERY_LIKELY,
                    min_num_findings=1,
                    display_name="Credit Card",
                    nightfall_detector="CREDIT_CARD_NUMBER",
                ),
                Detector(
                    min_confidence=Confidence.VERY_LIKELY,
                    min_num_findings=1,
                    display_name="Social",
                    nightfall_detector="US_SOCIAL_SECURITY_NUMBER",
                )
            ]
        )
    ]
)

Reviewing Results

Now we are ready to review the results from the Nightfall SDK to check whether there is any sensitive data in our payload. Because the results are dataclasses, printing them (for example, with print(result)) uses their built-in __repr__ to produce readable output.

All data and findings shown below are non-sensitive sample data.

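Below are the results of the Detection Rule UUID example and the inline Detection Rule example, in that order. The final output, a list of empty lists, is the response when the payload contains no sensitive findings: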

[
    [Finding(finding='458-02-6124', redacted_finding=None, before_context=None, after_context=None, detector_name='US social security number (SSN)', detector_uuid='e30d9a87-f6c7-46b9-a8f4-16547901e069', confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, byte_range=Range(start=39, end=50), codepoint_range=Range(start=39, end=50), matched_detection_rule_uuids=['c67e3dd7-560e-438f-8c72-6ec54979396f'], matched_detection_rules=[])],
    [],
    [Finding(finding='4916-6734-7572-5015', redacted_finding=None, before_context=None, after_context=None, detector_name='Credit card number', detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, byte_range=Range(start=25, end=44), codepoint_range=Range(start=25, end=44), matched_detection_rule_uuids=['c67e3dd7-560e-438f-8c72-6ec54979396f'], matched_detection_rules=[])]
]

[
    [Finding(finding='458-02-6124', redacted_finding=None, before_context=None, after_context=None, detector_name='Social', detector_uuid='e30d9a87-f6c7-46b9-a8f4-16547901e069', confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, byte_range=Range(start=39, end=50), codepoint_range=Range(start=39, end=50), matched_detection_rule_uuids=[], matched_detection_rules=['Sample_Detection_Rule'])],
    [],
    [Finding(finding='4916-6734-7572-5015', redacted_finding=None, before_context=None, after_context=None, detector_name='Credit Card', detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, byte_range=Range(start=25, end=44), codepoint_range=Range(start=25, end=44), matched_detection_rule_uuids=[], matched_detection_rules=['Sample_Detection_Rule'])],
]

[[], [], []]
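Each inner list in result corresponds to the string at the same index in payload, and codepoint_range gives the finding's position in that string (39 to 50 for the SSN in the first sample string). The hypothetical summarize_findings helper below is a sketch, not part of the SDK: it walks both lists together and produces one line per finding, relying only on the finding attributes visible in the output above.

```python
from typing import Any, List, Sequence


def summarize_findings(payload: Sequence[str], result: Sequence[Sequence[Any]]) -> List[str]:
    """Return one readable line per finding, tagged with the payload index it came from."""
    lines = []
    for index, (text, findings) in enumerate(zip(payload, result)):
        for f in findings:
            start, end = f.codepoint_range.start, f.codepoint_range.end
            # Enum members expose .name (e.g. "VERY_LIKELY"); fall back to the raw value.
            confidence = getattr(f.confidence, "name", f.confidence)
            # codepoint_range indexes into the original string, so this slice
            # should reproduce the matched text.
            lines.append(f"payload[{index}] {f.detector_name} ({confidence}): {text[start:end]!r}")
    return lines


# Usage with the SDK result from above:
#   for line in summarize_findings(payload, result):
#       print(line)
```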

And that's it 🎉

You are now ready to use the Python SDK for other scenarios.

Java

This guide describes how to use Nightfall with the Java programming language.

The example below will demonstrate how to use Nightfall’s text scanning functionality to verify whether a string contains sensitive PII using the Nightfall Java SDK.

In this tutorial, we will be downloading, setting up, and using the Java SDK provided by Nightfall.

To make a request to the Nightfall API you will need:

A Nightfall API key
Plaintext data to scan.

You can read more about obtaining a Nightfall API key or about our available data detectors in the linked reference guides.

You can add the Nightfall package to your project by adding a dependency to your pom.xml:

First add the required imports to the top of the file.

These are the objects we will use from the Nightfall SDK, as well as some collection classes for data handling.

We can then declare some data to scan in a List:

Next, create a ScanTextRequest to scan the payload with. First, create a new instance of the credit card detector and set it to trigger on any finding with confidence LIKELY or above.

Add a second detector that looks for Social Security numbers, and set it to trigger on any finding with confidence POSSIBLE or above.

Combine these detectors into a detection rule, which will return findings if either of these detectors are triggered.

Finally, combine the payload and configuration together as a new ScanTextRequest, and return it.

Use the ScanTextRequest instance with a NightfallClient to send your request to Nightfall.

The resulting ScanTextResponse may be used to print out the results:

And that's it 🎉

You are now ready to use the Java SDK for other scenarios.

Tutorials

OpenAI Prompt Sanitization Tutorial

Protecting Sensitive Information in AI Interactions: The Critical Role of Content Filtering

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information. Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property

Real-world scenarios highlight the urgency of this issue:

Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.

Steps to Identify and Sanitize ChatGPT Prompts

Let's look at a Python example using OpenAI and Nightfall's Python SDK. You can download this sample code .

Step 1: Setup Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating an API key .

Step 2: Configure Detection

Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in your Nightfall account. In this example, we will do the former.

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction .

Step 3: Classify, Redact, Filter Your User Input

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let’s say we send Nightfall the following:

We get back the following redacted text:

Step 4: Send Redacted Prompt to OpenAI

Review the response to see if Nightfall has returned sensitive findings:

If there are sensitive findings:
  - You can specify a redaction config in your request so that sensitive findings are redacted automatically.
  - Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
If there are no sensitive findings, or you chose to redact findings with a redaction config:
  - Initialize the OpenAI SDK client (e.g., the OpenAI Python client), or use the API directly to construct a request.
  - Construct your outgoing prompt.
  - If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
  - Use the OpenAI API or SDK client to send the prompt to the AI model.
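The decision logic above can be sketched as a small function. It assumes you have already scanned the prompt and have its findings and, if you used a redaction config, its redacted text. prepare_prompt and SensitiveDataError are hypothetical names, and the OpenAI call itself is left out.

```python
from typing import Optional, Sequence


class SensitiveDataError(Exception):
    """Raised when a prompt has findings and no redacted version is available."""


def prepare_prompt(
    prompt: str,
    findings: Sequence[object],
    redacted_prompt: Optional[str] = None,
) -> str:
    """Decide which text is safe to forward to the AI model.

    - No findings: send the original prompt unchanged.
    - Findings plus a redacted copy: send the redacted copy.
    - Findings without redaction: refuse by raising SensitiveDataError.
    """
    if not findings:
        return prompt
    if redacted_prompt is not None:
        return redacted_prompt
    raise SensitiveDataError(f"Prompt contains {len(findings)} sensitive finding(s)")
```

The returned string is what you would pass to the OpenAI client as the message content; catching SensitiveDataError is where you would stop the request or notify the user.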

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

And the message we ultimately sent was redacted, and that’s what we sent to OpenAI:

OpenAI sends us the same response either way because it doesn’t need to receive sensitive data to generate a cogent response. This means we were able to leverage ChatGPT just as easily but we didn’t risk sending OpenAI any unnecessary sensitive data. Now, you are one step closer to leveraging generative AI safely in an enterprise setting.

SaaS Protection

This section contains tutorials that show how to scan popular SaaS applications using the Nightfall APIs.

HubSpot DLP Tutorial
Zendesk DLP Tutorial

Observability Protection

This section contains tutorials that show how to scan popular observability platforms using the Nightfall APIs.

Datastore Protection

This section contains tutorials that show how to scan popular data stores using the Nightfall APIs.

Airtable DLP Tutorial
Amazon Kinesis DLP Tutorial
Amazon RDS DLP Tutorial - Full Scan
Amazon RDS DLP Tutorial
Amazon S3 DLP Tutorial
Elasticsearch DLP Tutorial
Snowflake DLP Tutorial

Nightfall Use Cases

Overview

This section contains use case tutorials for various Firewall for AI scenarios. The tutorials in this section are as follows.

Using Scan API (with Python)

Suppose you have a number of files containing customer or patient data, and you are not sure which of them are safe to share in a less secure manner. By leveraging Nightfall's API, you can easily verify whether a file contains sensitive PII, PHI, or PCI.

To make a request to the Nightfall API, you will need:

A Nightfall API key
A list of data types you wish to scan for
Data to scan. Note that the API interprets data as plaintext, so you may pass it in any structured or unstructured format.

You can read more about obtaining a Nightfall API key or about our available data detectors in the linked reference guides.

To run the following API call, we will be using Python's standard json and os modules, as well as the third-party requests library.

First we define the endpoint we want to reach with our API call.

Next we define the headers of our API request. In this example, we have our API key set via an environment variable called "NIGHTFALL_API_KEY". Your API key should never be hard-coded directly into your script.

Next, we define the detectors with which we wish to scan our data. The detectors must be formatted as a list of key-value pairs of the format {"name": "DETECTOR_NAME"}.

Next, we build the request body, which contains the detectors from above, as well as the raw data that you wish to scan. In this example, we will read it from a file called sample_data.csv.

Here we assume that the file is under the 500 KB payload limit of the Scan API. If your file is larger than the limit, consider breaking it down into smaller pieces across multiple API requests.
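One way to break the data down is to group lines into chunks whose UTF-8 size stays under the limit, splitting any single oversized line on character boundaries so no multi-byte character is cut in half. The sketch below is hypothetical (chunk_lines is not part of any Nightfall library) and assumes 500 KB means 500,000 bytes; adjust MAX_PAYLOAD_BYTES if the documented limit is counted differently.

```python
from typing import Iterable, Iterator, List

# Assumption: 500 KB interpreted as 500,000 bytes; adjust to the documented value.
MAX_PAYLOAD_BYTES = 500_000


def _split_long_line(line: str, max_bytes: int) -> Iterator[str]:
    """Split one line into pieces whose UTF-8 encoding fits in max_bytes."""
    piece: List[str] = []
    size = 0
    for ch in line:
        n = len(ch.encode("utf-8"))
        if piece and size + n > max_bytes:
            yield "".join(piece)
            piece, size = [], 0
        piece.append(ch)
        size += n
    if piece:
        yield "".join(piece)


def chunk_lines(lines: Iterable[str], max_bytes: int = MAX_PAYLOAD_BYTES) -> List[str]:
    """Group lines into chunks whose UTF-8 size stays within max_bytes."""
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in lines:
        for part in _split_long_line(line, max_bytes):
            n = len(part.encode("utf-8"))
            if current and size + n > max_bytes:
                chunks.append("".join(current))
                current, size = [], 0
            current.append(part)
            size += n
    if current:
        chunks.append("".join(current))
    return chunks


# Usage: send each chunk in its own Scan API request.
#   with open("sample_data.csv", encoding="utf-8") as f:
#       for chunk in chunk_lines(f):
#           ...
```

Splitting on line boundaries keeps CSV rows intact, so a value such as a credit card number is not split across two requests unless a single row exceeds the limit.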

Now we are ready to call the Nightfall API to check if there is any sensitive data in our file. If there are no sensitive findings in our file, the response will be "[[]]".

[[]]
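The steps above can be sketched with only the standard library, using urllib.request in place of requests. The endpoint URL, the Bearer Authorization header, and the "detectors"/"payload" body keys are assumptions made for illustration, based on the {"name": "DETECTOR_NAME"} format described above; confirm the exact request schema in the Scan API reference before relying on it. build_scan_request and scan_csv_file are hypothetical names.

```python
import json
import os
import urllib.request
from typing import Dict, List, Tuple

# Assumption: confirm the scan endpoint in the API reference.
SCAN_URL = "https://api.nightfall.ai/v3/scan"


def build_scan_request(
    api_key: str, detector_names: List[str], text: str
) -> Tuple[str, Dict[str, str], bytes]:
    """Build (url, headers, body) for a text scan request.

    Assumption: header scheme and body keys follow this guide's description;
    verify them in the API reference.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = {
        # Detectors as a list of {"name": "DETECTOR_NAME"} pairs.
        "detectors": [{"name": name} for name in detector_names],
        # The API treats data as plaintext, so the file contents are sent as-is.
        "payload": [text],
    }
    return SCAN_URL, headers, json.dumps(body).encode("utf-8")


def scan_csv_file(path: str, detector_names: List[str]) -> object:
    """Read a file, send it to the scan endpoint, and return the decoded JSON response."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    url, headers, data = build_scan_request(os.environ["NIGHTFALL_API_KEY"], detector_names, text)
    request = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (makes a network call):
#   print(scan_csv_file("sample_data.csv", ["US_SOCIAL_SECURITY_NUMBER", "CREDIT_CARD_NUMBER"]))
```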

FAQs

What can I do with Firewall for AI?

Firewall for AI is a powerful API that acts as a middleware layer or client wrapper to protect your AI models from consuming sensitive data. By integrating Firewall for AI into your application via API calls, you can proactively prevent data leaks and maintain compliance without disrupting your existing workflows or model updates.

How quickly can I get started with Firewall for AI?

You can start scanning for sensitive data in just a few minutes. Our developer-friendly API and comprehensive documentation make it easy to integrate Firewall for AI into your application. Follow our Quickstart guide for step-by-step instructions on setting up the API, configuring detectors, and making your first API call.

What types of data can I scan with API?

Firewall for AI provides a flexible and extensible API that allows you to scan a wide variety of data types, including plain text, structured and unstructured files, and even images. Our API can handle data in various formats such as JSON, XML, CSV, and more. Visit our detector glossary at docs.nightfall.ai/docs/detector-glossary to explore the comprehensive list of supported data types and file formats.

What types of detectors are supported out of the box?

Firewall for AI offers a rich set of pre-built detectors that can identify many different types of sensitive data, including personally identifiable information (PII), payment card industry data (PCI), protected health information (PHI), secrets, and credentials. These detectors are powered by advanced machine learning models and can be easily integrated into your application with just a few lines of code. Refer to our detector glossary at docs.nightfall.ai/docs/detector-glossary for a complete list of available detectors.

Can I customize or bring my own detectors?

Absolutely! In addition to the pre-built detectors, Firewall for AI allows you to create custom detectors tailored to your specific requirements. You can either fine-tune one of our pre-configured detection rules or build your own detector from scratch using our intuitive API. Nightfall supports many traditional detector types such as regular expressions, exact data matching, and word list/dictionaries. Check out our dedicated guide on creating custom detectors for more information.

What is the pricing model?

We offer a free tier that allows you to sign up and start using Firewall for AI with zero upfront costs or commitments. This tier provides a generous data scanning capacity and access to all the core features.

We offer enterprise pricing plans for advanced requirements such as higher data volumes, custom rate limits, and dedicated support.

Contact our team at [email protected] or via the contact form on our website to discuss your specific needs and get a tailored pricing quote.

How do I get in touch with you?

Don't hesitate to get in touch with us directly via email (see Contact Us below) or through the contact form on our website.

We host on Wednesdays at 12 pm PT to help answer questions, talk through any ideas, and chat about data security. We would love to see you there!

Can I test out the detection and my own detection rules before writing any code?

Yes. In our Playground, you can test out the detection engine, including 70+ pre-built detectors, without writing any code or signing up.

How does Nightfall support custom data types?

In two ways:

Nightfall’s out of the box detectors can be modified with context rules and exclusion rules.
Nightfall also supports custom regular expressions (in the RE2 standard, as documented here) and word lists (i.e., dictionaries) as detectors.

How does Nightfall's Firewall for AI differ from other solutions?

The Firewall for AI Platform differs from other solutions like Google DLP and Amazon Macie, as well as open source solutions like truffleHog, on a number of dimensions summarized below.

Accuracy

While solutions like Google DLP have a broad set of detectors, many of them are rules or regex based, which means many of the detectors are not usable in practice. Likewise, detection has been found to be inconsistent in some cases, perhaps due to internal A/B testing.
Because OSS detection solutions rely on regex-based rules instead of machine learning-based detectors, they tend to have a much higher rate of false positives than Nightfall.
Detector configurability and ability to provide metrics at the token level makes Nightfall accurate and actionable to engineering & security teams.

Convenience

Want to leave the last 4 digits of a credit card number visible, securely encrypt emails, and completely remove SSNs from your data? The Nightfall platform allows you to redact/replace, substitute, and/or encrypt sensitive data findings in the same API call as your inspection request.
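For a sense of what leaving the last 4 digits visible looks like, the hypothetical mask_span helper below masks one finding locally, given its codepoint range (for the sample credit card string in the Python SDK guide, the output reports codepoint_range=Range(start=25, end=44)). It only illustrates the effect. Nightfall performs redaction server-side when you include a redaction config, and its exact output depends on that config.

```python
def mask_span(text: str, start: int, end: int, mask_char: str = "*", keep_last: int = 4) -> str:
    """Mask text[start:end], leaving the last `keep_last` characters of that span visible.

    Local illustration only; this is not how Nightfall implements redaction.
    """
    span = text[start:end]
    keep = max(0, min(keep_last, len(span)))
    masked = mask_char * (len(span) - keep) + span[len(span) - keep:]
    return text[:start] + masked + text[end:]


example = "My credit card number is 4916-6734-7572-5015"
print(mask_span(example, 25, 44))
```

This sketch also masks the hyphens; a real redaction config may let you choose which characters to leave untouched.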

Ease of use

All inspection configuration in Google DLP is done as code, which makes it challenging to update, visualize, and modify detection rules and configuration. Nightfall supports configuration as code as well as the Nightfall Dashboard for creating and updating detection rules, which makes it easier to collaborate.
OSS secret detection tools tend to rely heavily on manually created regex-based detection, whereas Nightfall lets you programmatically scan text and file inputs using 150+ detectors. For example, truffleHog only enables you to scan for secrets like passwords and private keys, whereas Nightfall scans not only for secrets and credentials but also lets you use our vast detector library to scan for PII, PCI, and PHI.

File parsing

Google DLP and Macie can only parse files stored in their respective cloud storage (Google Cloud Storage or S3, respectively). With the Nightfall Developer Platform, we take care of storage requirements for you. Uploaded assets are stored encrypted at rest with minimal access permissions, and are automatically deleted after 24 hours.
Amazon’s file parsers are limited to around 20 file types. Most notably, Macie does not support images. Text extraction via machine-learning based OCR for images is a core component of Nightfall’s file scanning endpoint.
Open source secrets detection solutions are limited in their detection capabilities. Namely, these projects do not support scanning binary files. Nightfall supports binary files and the ability to scan diff files.

Platform agnostic

Each cloud provider's DLP products are geared towards protecting their own cloud services. For example, Google DLP’s native integrations are limited to Google Cloud offerings such as BigQuery. Similarly, Macie is primarily designed around scanning AWS S3 buckets. The interface is largely geared towards exploring sensitive data across S3 buckets. To scan content outside of S3, Amazon’s recommendation is to move or replicate the data into S3 to scan, which is impractical.
OSS solutions are primarily designed around git repositories.
Nightfall has native integrations with many cloud applications like Slack, Atlassian, GitHub, and Google Drive, as well as a broad set of tutorials and open source code so you can build integrations into any data silo with ease. For example, this includes services like Snowflake, Airtable, and more.

Support and documentation

Google DLP and Macie are loosely supported products, and across so many cloud offerings, support is hard to come by. Nightfall is laser-focused on best-of-breed content inspection, and we are ready to address your questions and use cases.
Nightfall also has extensive documentation, including SDKs for multiple languages such as Python, Java, NodeJS, and Go, with more under active development.

Cost and scale

Costs can balloon quickly with commercial services. They also have rate limits that don’t suit high data volumes.
Open source solutions have high hidden costs in the form of TCO, maintenance, and opportunity cost.
Nightfall offers a custom enterprise tier that can help you scale pricing based on your anticipated usage as well as custom rate limits.

Contact Us

Schedule a Demo

You can schedule a demo or a meeting with our sales/solutions engineering team directly via Calendly here. If you don't see a suitable time, please email us at [email protected].

Email Us

For support inquiries, please email us at [email protected].

For sales inquiries, please email us at [email protected].

term	value
clause	user_email:"[email protected]"
field	user_email
operator	:
value	[email protected]

SYNTAX	USAGE	DESCRIPTION	EXAMPLES
`:`	field:value	Exact match operator (case insensitive)	`state:"pending"` returns records where the state is exactly `"PENDING"` in a case-insensitive comparison
(space)	field1:value1 field2:value2	The query returns only records that match both clauses	`state:active slack.channel_name:general`
`OR`	field:(value1 OR value2)	The query returns records that match either of the values (case insensitive)	`state:(active OR pending)`
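To build queries programmatically, the hypothetical build_query helper below (not part of any Nightfall SDK) applies the operators in this table: a single value becomes field:value, a list becomes field:(value1 OR value2), and clauses are joined with spaces so that records must match all of them. Quoting values that contain anything other than letters, digits, _, ., and - follows the quoted clause in the term table above; the backslash escaping of embedded quotes is an assumption.

```python
import re
from typing import Dict, List, Union

FilterValue = Union[str, List[str]]

# Values made only of these characters are left unquoted; anything else is quoted.
_BARE_VALUE = re.compile(r"[A-Za-z0-9_.\-]+")


def _quote(value: str) -> str:
    """Wrap a value in double quotes unless it is a simple token (assumed escaping)."""
    if _BARE_VALUE.fullmatch(value):
        return value
    escaped = value.replace('"', '\\"')
    return f'"{escaped}"'


def build_query(filters: Dict[str, FilterValue]) -> str:
    """Build a search query string; space-separated clauses must all match."""
    clauses = []
    for field, value in filters.items():
        if isinstance(value, list):
            # field:(v1 OR v2) matches records with either value.
            options = " OR ".join(_quote(v) for v in value)
            clauses.append(f"{field}:({options})")
        else:
            clauses.append(f"{field}:{_quote(value)}")
    return " ".join(clauses)


# Example with hypothetical values: active or pending events for one actor.
print(build_query({"state": ["active", "pending"], "actor_email": "user@example.com"}))
```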

param	description
event_id	the unique identifier of the exfiltration event to filter on
integration_name	the name of the integration to filter on
state	the state of the event to filter on (active, pending, resolved, expired)
event_type	the type of exfiltration event to filter on
actor_name	the name of the actor who performed the action to filter on
actor_email	the email of the actor who performed the action to filter on
user_name	the username of the user to filter on (backward compatibility)
user_email	the email of the user to filter on (backward compatibility)
notes	the comment or notes associated with the event to filter on
policy_id	the unique identifier of the policy to filter on
policy_name	the name of the policy to filter on
resource_id	the identifier of the resource to filter on
resource_name	the name of the resource to filter on
resource_owner_name	the name of the resource owner to filter on
resource_owner_email	the email of the resource owner to filter on
resource_content_type	the content type of the resource to filter on
endpoint.device_id	the device identifier for endpoint events to filter on
endpoint.machine_name	the machine name for endpoint events to filter on
gdrive.permission	the permission setting for Google Drive files to filter on
gdrive.shared_internal_email	the internal emails with which the file is shared to filter on
gdrive.shared_external_email	the external emails with which the file is shared to filter on
gdrive.drive	the Google Drive name to filter on
gdrive.file_owner	the owner of the Google Drive file to filter on
gdrive.label_name	the label name applied to Google Drive files to filter on
salesforce.report.scope	the scope of the Salesforce report to filter on
salesforce.report.event_source	the event source of the Salesforce report to filter on
salesforce.report.source_ip	the source IP address of the Salesforce report to filter on
salesforce.report.session_level	the session level of the Salesforce report to filter on
salesforce.report.operation	the operation type of the Salesforce report to filter on
salesforce.report.description	the description of the Salesforce report to filter on
salesforce.file.source_ip	the source IP address for Salesforce file events to filter on
salesforce.file.session_level	the session level for Salesforce file events to filter on

term	value
clause	user_email:"[email protected]"
field	user_email
operator	:
value	[email protected]

SYNTAX	USAGE	DESCRIPTION	EXAMPLES
`:`	field:value	Exact match operator (case insensitive)	`state:"pending"` returns records where the state is exactly `"PENDING"` in a case-insensitive comparison
(space)	field1:value1 field2:value2	The query returns only records that match both clauses	`state:active slack.channel_name:general`
`OR`	field:(value1 OR value2)	The query returns records that match either of the values (case insensitive)	`state:(active OR pending)`

param	description
event_id	the unique identifier of the posture event to filter on
integration_name	the name of the integration to filter on
state	the state of the event to filter on (active, pending, resolved, expired)
event_type	the type of posture event to filter on
actor_name	the name of the actor who performed the action to filter on
actor_email	the email of the actor who performed the action to filter on
user_name	the username of the user to filter on (backward compatibility)
user_email	the email of the user to filter on (backward compatibility)
notes	the comment or notes associated with the event to filter on
policy_id	the unique identifier of the policy to filter on
policy_name	the name of the policy to filter on
resource_id	the identifier of the resource to filter on
resource_name	the name of the resource to filter on
resource_owner_name	the name of the resource owner to filter on
resource_owner_email	the email of the resource owner to filter on
resource_content_type	the content type of the resource to filter on
endpoint.device_id	the device identifier for endpoint events to filter on
endpoint.machine_name	the machine name for endpoint events to filter on
gdrive.permission	the permission setting for Google Drive files to filter on
gdrive.shared_internal_email	the internal emails with which the file is shared to filter on
gdrive.shared_external_email	the external emails with which the file is shared to filter on
gdrive.drive	the Google Drive name to filter on
gdrive.file_owner	the owner of the Google Drive file to filter on
gdrive.label_name	the label name applied to Google Drive files to filter on
salesforce.report.scope	the scope of the Salesforce report to filter on
salesforce.report.event_source	the event source of the Salesforce report to filter on
salesforce.report.source_ip	the source IP address of the Salesforce report to filter on
salesforce.report.session_level	the session level of the Salesforce report to filter on
salesforce.report.operation	the operation type of the Salesforce report to filter on
salesforce.report.description	the description of the Salesforce report to filter on
salesforce.file.source_ip	the source IP address for Salesforce file events to filter on
salesforce.file.session_level	the session level for Salesforce file events to filter on

Building Endpoint DLP to Detect PII on Your Machine in Real-Time

Endpoint data loss prevention (DLP) discovers, classifies, and protects sensitive data, like PII, credit card numbers, and secrets, that proliferates onto endpoint devices such as your computer or EC2 machines. It helps keep data safe by letting you detect and stop occurrences of data exfiltration.

Our endpoint DLP application will be composed of two core services that run locally. The first service monitors for file system events using the Watchdog package in Python. When a file system event is triggered, such as when a file is created or modified, the service sends the file to Nightfall to be scanned for sensitive data. The second service is a webhook server that receives scan results from Nightfall, parses the sensitive findings, and writes them to a CSV file as output. You'll build familiarity with the following tools and frameworks:

Python
Flask
Nightfall
Ngrok
Watchdog

Key Concepts

Before we start on the implementation, familiarize yourself with how file scanning works with Nightfall, so you're acquainted with the flow we are implementing.

In a nutshell, file scanning is done asynchronously by Nightfall; after you upload a file to Nightfall and trigger the scan, we perform the scan in the background. When the scan completes, Nightfall delivers the results to you by sending a request to your webhook server. This asynchronous behavior allows Nightfall to scan files of varying sizes and complexities without requiring you to hold open a long synchronous request or continuously poll for updates. As a result, you need a webhook endpoint that can receive inbound notifications from Nightfall when scans are completed; that's one of the two services we are building in this tutorial.

Getting Started

You can fork the sample repo and view the complete code here, or follow along below. If you're starting from scratch, create a new GitHub repository. This tutorial was developed on a Mac and assumes that's the endpoint operating system you're running; however, it should work across operating systems with minor modifications. For example, you may wish to extend this tutorial by running endpoint DLP on an EC2 machine to monitor your production systems.

Setting Up Dependencies

First, let's start by installing our dependencies. We'll be using Nightfall for data classification, the Flask web framework in Python, watchdog for monitoring file system events, and Gunicorn as our web server. Create requirements.txt and add the following to the file:

nightfall
Flask
Gunicorn
watchdog

Then run pip install -r requirements.txt to do the installation.

Configuring Detection with Nightfall

Next, we'll need our Nightfall API Key and Webhook Signing Secret; the former authenticates us to the Nightfall API, while the latter authenticates that incoming webhooks are originating from Nightfall. You can retrieve your API Key and Webhook Signing Secret from the Nightfall Dashboard. Complete the Nightfall Quickstart for a more detailed walk-through. Sign up for a free Nightfall account if you don't have one.

These values are unique to your account and should be kept safe, so we will store them as environment variables rather than directly in code or in version control. If these values are ever leaked, visit the Nightfall Dashboard to regenerate them.

export NIGHTFALL_API_KEY=<your_key_here>
export NIGHTFALL_SIGNING_SECRET=<your_secret_here>
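Because the signing secret exists to prove that incoming webhooks come from Nightfall, the webhook server should check each request before trusting it. The sketch below shows one way to do that with HMAC-SHA256. The signing scheme it assumes (a hex HMAC-SHA256 of the timestamp, a colon, and the raw request body, keyed with the signing secret, with the signature and timestamp delivered in request headers) and the five-minute freshness window are assumptions to confirm against Nightfall's webhook documentation; verify_webhook is a hypothetical name.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption: maximum accepted age of a webhook timestamp, in seconds.
MAX_AGE_SECONDS = 300


def verify_webhook(
    signing_secret: str,
    signature: str,
    timestamp: str,
    body: str,
    now: Optional[float] = None,
) -> bool:
    """Check an incoming webhook's signature and freshness.

    Assumed scheme: hex HMAC-SHA256 over f"{timestamp}:{body}", keyed with the
    signing secret (for example, os.environ["NIGHTFALL_SIGNING_SECRET"]).
    """
    current = time.time() if now is None else now
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    if abs(current - ts) > MAX_AGE_SECONDS:
        return False  # stale or future-dated request; possible replay
    expected = hmac.new(
        signing_secret.encode("utf-8"),
        f"{timestamp}:{body}".encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature.lower())
```

In the Flask webhook server, you would pass the raw request body as text (not re-serialized JSON), since any change to the bytes changes the signature.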

Monitoring File System Events

Watchdog is a Python module that watches for file system events. Create a file called scanner.py. We'll start by importing our dependencies and setting up a basic event handler. This event handler responds to file change events for file paths that match a given set of regular expressions (regexes). In this case, the .* indicates we are matching on any file path - we'll customize this a bit later. When a file system event is triggered, we'll print a line to the console.

import os
import time
from watchdog.observers import Observer
from watchdog.events import RegexMatchingEventHandler
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

class MyHandler(RegexMatchingEventHandler):
    # event handler callback that is called when a file is modified (created or changed)
    def on_modified(self, event):
        print(f'Event type: {event.event_type} | Path: {event.src_path}')

if __name__ == "__main__":
    regexes = [ ".*" ]

    # register event handler to monitor file paths that match our regex
    event_handler = MyHandler(regexes)
    observer = Observer()
    observer.schedule(event_handler, path='', recursive=True)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

Run python scanner.py and you'll notice lots of lines getting printed to the console. These are all the files that are getting created and changed on your machine in real-time. You'll notice that your operating system and the apps you're running are constantly writing, modifying, and deleting files on disk!

Event type: modified | Path: /Users/myuser/Library/Caches
Event type: modified | Path: /Users/myuser/Library/Caches/com.apple.nsservicescache.plist
Event type: modified | Path: /Users/myuser/Library/Caches
Event type: modified | Path: /Users/myuser/Library/Caches/Google/Chrome/Default/Cache
Event type: modified | Path: /private/tmp
Event type: modified | Path: /Users/myuser/Library/Preferences/ContextStoreAgent.plist
Event type: modified | Path: /private/tmp
Event type: modified | Path: /Users/myuser/Library/Assistant
Event type: modified | Path: /Users/myuser/Library/Assistant/SyncSnapshot.plist
...

Next, we'll update our event handler so that instead of simply printing to the console, it sends the file to Nightfall to be scanned. We initiate the scan request by specifying the file path of the changed or created file, a webhook URL where the scan results should be sent, and a Detection Rule that specifies what sensitive data we are looking for. If the file scan is initiated successfully, we'll print the Upload ID that Nightfall returns to the console. This ID will be useful later when matching scan results to files.

Here's our complete scanner.py, explained further below:

import os
import time
from watchdog.observers import Observer
from watchdog.events import RegexMatchingEventHandler
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

class MyHandler(RegexMatchingEventHandler):
    def scan_file(self, filepath):
        nightfall = Nightfall() # reads API key from NIGHTFALL_API_KEY environment variable by default
        webhook_url = f"{os.getenv('NIGHTFALL_SERVER_URL')}/ingest" # webhook server we'll create

        try:
            scan_id, message = nightfall.scan_file(
                filepath, 
                webhook_url=webhook_url,
                # detection rule to detect credit card numbers, SSNs, and API keys
                detection_rules=[ DetectionRule([ 
                    Detector(
                        min_confidence=Confidence.LIKELY,
                        nightfall_detector="CREDIT_CARD_NUMBER",
                        display_name="Credit Card Number"),
                    Detector(
                        min_confidence=Confidence.LIKELY,
                        nightfall_detector="US_SOCIAL_SECURITY_NUMBER",
                        display_name="US Social Security Number"),
                    Detector(
                        min_confidence=Confidence.LIKELY,
                        nightfall_detector="API_KEY",
                        display_name="API Key")
                    ])
                ])
            return scan_id, message
        except Exception as err:
            print(f"Error processing {filepath} | {err}")
            return None, None

    def on_modified(self, event):
        print(f'Event type: {event.event_type} | Path: {event.src_path}')
        # directories also fire modified events; only files can be scanned
        if event.is_directory:
            return
        # scan file with Nightfall
        scan_id, message = self.scan_file(event.src_path)
        if scan_id:
            print(f"Scan initiated | Path {event.src_path} | UploadID {scan_id}")

if __name__ == "__main__":
    regexes = [ ".*/Downloads/.*", ".*/Desktop/.*", ".*/Documents/.*" ]

    # register event handler to monitor file paths that match our regexes
    event_handler = MyHandler(regexes=regexes)
    observer = Observer()
    observer.schedule(event_handler, path='', recursive=True)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

We can't run this yet: scan_file builds the webhook URL from the NIGHTFALL_SERVER_URL environment variable, which we haven't set. We'll create our webhook server and set the webhook URL in the next steps.

In this example, we have specified an inline Detection Rule that detects credit card numbers, Social Security numbers, and API keys with a minimum confidence of Likely. This Detection Rule is a simple starting point that just scratches the surface of the types of detection you can build with Nightfall. Learn more about building inline detection rules here, or about configuring them in the Nightfall Dashboard.

Also note that we've updated our regex from .* to a set of file paths on Macs that commonly contain user-generated files: the Desktop, Documents, and Downloads folders:

regexes = [ ".*/Downloads/.*", ".*/Desktop/.*", ".*/Documents/.*" ]

You can customize these regexes to whatever file paths are of interest to you. Another option is to write a catch-all regex that excludes system, application, config, and temp file paths:

regexes = [ "(?!/opt/|.*/Library/|.*/private/|/System/|/Applications/|/usr/).*" ]

Setting Up Webhook Server

Next, we'll set up our Flask webhook server, so we can receive file scanning results from Nightfall. Create a file called app.py. We'll start by importing our dependencies and initializing the Flask and Nightfall clients:

import os
from flask import Flask, request, render_template
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from datetime import datetime, timedelta
import urllib.request, urllib.parse, json
import csv

app = Flask(__name__)

nightfall = Nightfall(
    key=os.getenv('NIGHTFALL_API_KEY'),
    signing_secret=os.getenv('NIGHTFALL_SIGNING_SECRET')
)

Next, we'll add our first route, which will display "Hello World" when the client navigates to /ping simply as a way to validate things are working:

@app.route("/ping")
def ping():
    return "Hello World", 200

In a second command line window, make sure NIGHTFALL_API_KEY and NIGHTFALL_SIGNING_SECRET are exported (app.py reads them at startup), then run gunicorn app:app to start your server. The Gunicorn logs show the address the server is listening on, typically 127.0.0.1:8000 (localhost:8000). Navigate to /ping at that address in your web browser to confirm you see "Hello World".

[2021-11-26 14:22:53 -0800] [61196] [INFO] Starting gunicorn 20.1.0
[2021-11-26 14:22:53 -0800] [61196] [INFO] Listening at: http://127.0.0.1:8000 (61196)
[2021-11-26 14:22:53 -0800] [61196] [INFO] Using worker: sync
[2021-11-26 14:22:53 -0800] [61246] [INFO] Booting worker with pid: 61246

To expose our local webhook server via a public tunnel that Nightfall can send requests to, we'll use ngrok. Download and install ngrok via their quickstart documentation here. We'll create an ngrok tunnel as follows:

./ngrok http 8000

After running this command, ngrok will create a tunnel on the public internet that redirects traffic from their site to your local machine. Copy the HTTPS tunnel endpoint that ngrok has created: we can use this as the webhook URL when we trigger a file scan.

Account                       Nightfall Example
Version                       2.3.40
Region                        United States (us)
Web Interface                 http://127.0.0.1:4040
Forwarding                    http://3ecedafba368.ngrok.io -> http://localhost:8000
Forwarding                    https://3ecedafba368.ngrok.io -> http://localhost:8000

Let's set this HTTPS endpoint as a local environment variable so we can reference it later:

export NIGHTFALL_SERVER_URL=https://3ecedafba368.ngrok.io

With a Pro ngrok account, you can create a subdomain so that your tunnel URL is consistent, instead of randomly generated each time you start the tunnel.
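scanner.py builds its webhook URL by appending /ingest to NIGHTFALL_SERVER_URL, so an unset variable (which produces the string "None/ingest"), a trailing slash, or an accidentally copied http:// forwarding address all yield a bad URL. The hypothetical helper below, not part of the tutorial's code, normalizes the value and rejects anything that isn't an absolute HTTPS URL, on the assumption that you want Nightfall to deliver results over the HTTPS tunnel.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def build_webhook_url(server_url: Optional[str], route: str = "/ingest") -> str:
    """Join the public tunnel URL and the webhook route, validating the result.

    Hypothetical helper; the tutorial itself uses f"{os.getenv('NIGHTFALL_SERVER_URL')}/ingest".
    """
    if not server_url:
        raise ValueError("NIGHTFALL_SERVER_URL is not set")
    server_url = server_url.strip()
    parsed = urlparse(server_url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Expected an absolute https:// URL, got {server_url!r}")
    # Strip any trailing slash so we never produce '//ingest'
    return server_url.rstrip("/") + "/" + route.lstrip("/")


def webhook_url_from_env() -> str:
    """Read NIGHTFALL_SERVER_URL and return the full /ingest webhook URL."""
    return build_webhook_url(os.getenv("NIGHTFALL_SERVER_URL"))
```

In scan_file, webhook_url = webhook_url_from_env() could replace the f-string, so a missing export fails with a readable error before any file is sent.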

Handling Inbound Webhooks

Before we send a file scan request to Nightfall, let's implement our incoming webhook endpoint, so that when Nightfall finishes scanning a file, it can successfully send the sensitive findings to us.

First, what does it mean to have findings? If a file has findings, this means that Nightfall identified sensitive data in the file that matched the detection rules you configured. For example, if you told Nightfall to look for credit card numbers, any substring from the request payload that matched our credit card detector would constitute sensitive findings.

We'll host our incoming webhook at /ingest with a POST method.

Nightfall will POST to the webhook endpoint, and in the inbound payload, Nightfall will indicate if there are sensitive findings in the file, and provide a link where we can access the sensitive findings as JSON.

We'll validate the inbound webhook from Nightfall, retrieve the JSON findings from the link provided, and write the findings to a CSV file. First, let's initialize our CSV file where we will write results, and add our /ingest POST method.

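# create CSV where sensitive findings will be written; write the header only when
# the file is new, so restarting the server doesn't append a duplicate header row
headers = ["upload_id", "#", "datetime", "before_context", "finding", "after_context", "detector", "confidence", "loc", "detection_rules"]
if not os.path.exists("results.csv"):
    with open("results.csv", 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(headers)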

# respond to POST requests at /ingest
# Nightfall will send requests to this webhook endpoint with file scan results
@app.route("/ingest", methods=['POST'])
def ingest():
    data = request.get_json(silent=True)
    if data is None:
        return "Expected a JSON body", 400
    # respond to Nightfall's webhook URL verification with the challenge value
    challenge = data.get("challenge")
    if challenge:
        return challenge
    # not a challenge request, so validate the signature of the webhook payload
    request_signature = request.headers.get('X-Nightfall-Signature')
    request_timestamp = request.headers.get('X-Nightfall-Timestamp')
    request_data = request.get_data(as_text=True)

    if not (request_signature and request_timestamp and
            nightfall.validate_webhook(request_signature, request_timestamp, request_data)):
        return "Invalid webhook", 401

    # check if any sensitive findings were found in the file, return if not
    if not data["findingsPresent"]:
        print("No sensitive data present!")
        return "", 200

    # there are sensitive findings in the file
    output_results(data)
    return "", 200

You'll notice that when there are sensitive findings, we call the output_results() method. Let's write that next. In output_results(), we are going to parse the findings and write them as rows into our CSV file.

def output_results(data):
    findings_url = data['findingsURL']
    # open findings URL provided by Nightfall to access findings
    with urllib.request.urlopen(findings_url) as url:
        findings = json.loads(url.read().decode())['findings']

    print(f"Sensitive data found, outputting {len(findings)} finding(s) to CSV | UploadID {data['uploadID']}")
    # loop through findings JSON, get relevant finding metadata, write each finding as a row into output CSV
    with open("results.csv", 'a', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for i, finding in enumerate(findings):
            writer.writerow([
                data['uploadID'],
                i + 1,
                datetime.now(),
                repr(finding['beforeContext']),
                repr(finding['finding']),
                repr(finding['afterContext']),
                finding['detector']['name'],
                finding['confidence'],
                finding['location']['byteRange'],
                finding['matchedDetectionRules']
            ])

Restart your server so the changes propagate. We'll take a look at the console and CSV output of our webhook endpoint in the next section.
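In ingest, nightfall.validate_webhook does the signature check for us. To make the idea concrete, here is a minimal standard-library sketch of HMAC-based webhook verification. It assumes, and you should confirm this against Nightfall's webhook documentation before relying on it, that the signature is a hex-encoded HMAC-SHA256 of "<timestamp>:<raw request body>" keyed with your signing secret, with the timestamp in Unix seconds, and that requests older than a few minutes should be rejected to limit replay attacks. sign and verify_signature are hypothetical names and are not a replacement for the SDK method.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: str, timestamp: str, body: str) -> str:
    """Hex HMAC-SHA256 over '<timestamp>:<body>' (assumed message format)."""
    message = f"{timestamp}:{body}".encode()
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()


def verify_signature(secret: str, signature: Optional[str], timestamp: Optional[str], body: str,
                     max_age_seconds: int = 300, now: Optional[float] = None) -> bool:
    """Reject missing headers, stale timestamps, and signatures that don't match."""
    if not signature or not timestamp:
        return False
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if current - sent_at > max_age_seconds:
        return False
    expected = sign(secret, timestamp, body)
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(expected, signature.lower())
```

Whatever the exact scheme, the signature has to be computed over the raw body (request.get_data(as_text=True) in the Flask handler), not a re-serialized copy of the parsed JSON, because re-serializing can change whitespace or key order.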

Scan Changed Files in Real-Time

Back in the first command line window, we can turn our attention to scanner.py again. Now that we have our webhook URL, let's set it in this window as well and run the scanner:

export NIGHTFALL_SERVER_URL=https://3ecedafba368.ngrok.io
python scanner.py

To trigger a file scan event, download the following sample data file. Assuming it automatically downloads to your Downloads folder, this should immediately trigger a file change event and you'll see console log output! If not, you can also download the file with curl into a location that matches your event handler's regex we set earlier.

curl https://raw.githubusercontent.com/nightfallai/dlp-sample-data/main/sample-pci.csv > ~/Downloads/sample-pci.csv

You'll see the following console output from scanner.py:

Event type: modified | Path: /Users/myuser/Downloads/sample-pci.csv
Scan initiated | Path /Users/myuser/Downloads/sample-pci.csv | UploadID c23fdde2-5e98-4183-90b0-31e2cdd20ac0

And the following console output from our webhook server:

Sensitive data found, outputting 10 finding(s) to CSV | UploadID ac6a4a9d-a7b9-4a78-810d-8a66f7644704

And the following sensitive findings written to results.csv:

upload_id,#,datetime,before_context,finding,after_context,detector,confidence,loc,detection_rules
ac6a4a9d-a7b9-4a78-810d-8a66f7644704,1,2021-12-04 22:12:21.039602,'Name\tCredit Card\nRep. Viviana Hintz\t','5433-9502-3725-7862','\nEloisa Champlin\t3457-389808-83234\nOmega',Credit Card Number,VERY_LIKELY,"{'start': 36, 'end': 55}",[]
...

Each row in the output CSV corresponds to one sensitive finding and has the following fields, which you can customize in app.py: the upload ID provided by Nightfall, an incrementing index, a timestamp, the characters before the sensitive finding (for context), the sensitive finding itself, the characters after the sensitive finding (for context), the name of the detector that matched, the confidence level of the detection, the byte range (start and end offsets) of the sensitive finding in its parent file, and the detection rules that flagged the sensitive finding.
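Because output_results writes Nightfall's byteRange dict straight into the CSV, the loc column holds a Python dict literal such as {'start': 36, 'end': 55}. If you later want to locate a finding in its source file, a sketch like the one below parses that cell with ast.literal_eval and slices the file's bytes. It assumes the range is half-open (end exclusive), which is consistent with the sample row above, where the 19-character card number spans offsets 36 to 55. parse_loc and extract_finding are hypothetical helpers.

```python
import ast
from typing import Dict


def parse_loc(loc_cell: str) -> Dict[str, int]:
    """Turn the CSV's loc cell, e.g. "{'start': 36, 'end': 55}", back into a dict.

    ast.literal_eval only evaluates Python literals, so unlike eval it is
    safe to use on text read from a file.
    """
    loc = ast.literal_eval(loc_cell)
    if not isinstance(loc, dict) or "start" not in loc or "end" not in loc:
        raise ValueError(f"Unexpected loc value: {loc_cell!r}")
    return loc


def extract_finding(file_bytes: bytes, loc: Dict[str, int]) -> str:
    """Slice the original file bytes with an assumed half-open [start, end) byte range."""
    # Slice bytes before decoding: byte and character offsets diverge for non-ASCII text
    return file_bytes[loc["start"]:loc["end"]].decode("utf-8", errors="replace")
```

Reading the file in binary mode (open(path, "rb").read()) matters here, since the offsets count bytes rather than characters.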

Note that you may also see events for system files like .DS_Store or errors corresponding to failed attempts to scan temporary versions of files. This is because doing things like downloading a file can trigger multiple file modification events. As an extension to this tutorial, you could consider filtering those out further, though they shouldn't impact our ability to scan files of interest.
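One way to do that filtering is sketched below. EventFilter is a hypothetical helper, not part of watchdog or Nightfall: it skips directories and a few names and suffixes that are commonly system or temporary files (the lists are assumptions; extend them for your own browsers and editors), and it drops repeat events for the same path that arrive within a short window, so one download is less likely to trigger several scans. The clock is injectable so the logic can be tested without sleeping.

```python
import os
import time
from typing import Callable, Dict

# Assumed examples of names and suffixes worth ignoring; adjust for your environment
IGNORED_NAMES = {".DS_Store"}
IGNORED_SUFFIXES = (".crdownload", ".part", ".download", ".tmp", ".swp")


class EventFilter:
    """Decide whether a modified path should be sent for scanning."""

    def __init__(self, window_seconds: float = 2.0, clock: Callable[[], float] = time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self._last_seen: Dict[str, float] = {}

    def should_scan(self, path: str, is_directory: bool = False) -> bool:
        name = os.path.basename(path)
        if is_directory or name in IGNORED_NAMES or name.endswith(IGNORED_SUFFIXES):
            return False
        if name.startswith("~$"):  # e.g. Microsoft Office lock files
            return False
        now = self.clock()
        last = self._last_seen.get(path)
        self._last_seen[path] = now
        # Suppress events that arrive within window_seconds of the previous event for this path
        return last is None or now - last >= self.window_seconds
```

In MyHandler.on_modified you might call self.event_filter.should_scan(event.src_path, event.is_directory) before scan_file (event_filter being a hypothetical attribute you add). This version scans on the first event, which can catch a file that is still being written; waiting until a path has been quiet for the whole window is more robust for large downloads, at the cost of a timer.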

If we leave these services running, they will continue to monitor files for sensitive data and append to our results CSV when sensitive findings are discovered!

Running Endpoint DLP in the Background

We can run both of our services in the background with nohup so that we don't need to leave two command line tabs open indefinitely. We'll redirect console output (including errors) to log files so that we can always reference each application's output or determine whether a service crashed. The -u flag and PYTHONUNBUFFERED=1 stop Python from buffering print() output, so log lines appear as they happen.

nohup python -u scanner.py > scanner.log 2>&1 &
PYTHONUNBUFFERED=1 nohup gunicorn app:app > server.log 2>&1 &

This will return the corresponding process IDs - we can always check on these later with the ps command.

Next Steps

This post is simply a proof-of-concept version of endpoint DLP. A production-grade endpoint DLP application will have additional complexity and functionality. However, the detection engine is one of the biggest components of an endpoint DLP system, and this example should give you a sense of how easy it is to integrate with Nightfall's APIs and the power of Nightfall's detection engine.

Here are a few ideas on how you can extend this service further:

Run the scanner on EC2 machines to scan your production machines in real-time
Respond to more system events like I/O of USB drives and external ports
Implement remediation actions like end-user notifications or file deletion
Redact the sensitive findings prior to writing them to the results file
Store the results in the cloud for central reporting
Package in an executable so the application can be run easily
Scan all files on disk on the first boot of the application
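For the redaction idea, a simple local approach is to mask each finding before output_results writes it to results.csv, keeping only its last few characters for triage. The sketch below is a hypothetical helper, independent of Nightfall's own redaction options (the RedactionConfig and MaskConfig classes imported in the tutorial's code but not used). Note that the before and after context columns can themselves contain other sensitive values, as the sample row shows, so you may want to drop or truncate those columns too.

```python
def mask_finding(value: str, visible: int = 4, mask_char: str = "*", keep: str = "-") -> str:
    """Mask all but the last `visible` maskable characters; characters in `keep` stay as-is.

    Hypothetical local helper, applied before writing a row to results.csv.
    Values too short to reveal a suffix safely are masked entirely.
    """
    maskable = [i for i, ch in enumerate(value) if ch not in keep]
    # Reveal a suffix only when something is still hidden; otherwise mask everything
    reveal = visible if 0 < visible < len(maskable) else 0
    hidden = set(maskable[:len(maskable) - reveal])
    return "".join(mask_char if i in hidden else ch for i, ch in enumerate(value))
```

In output_results, repr(mask_finding(finding['finding'])) would take the place of repr(finding['finding']).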

param	description
state	the violation states to filter on
user_email	the emails of users whose updates to a resource resulted in the violation
user_name	the usernames of users whose updates to a resource resulted in the violation
integration_name	the integration to filter on
confidence	one or more likelihoods (confidence levels) to filter on
policy_id	one or more policy IDs to filter on
detection_rule_id	one or more detection rule IDs to filter on
detector_id	one or more detector IDs to filter on
risk_label	the risk label to filter on
risk_source	the risk determination source to filter on
slack.channel_name	the Slack channel names to filter on
slack.channel_id	the Slack channel IDs to filter on
slack.workspace	the Slack workspaces to filter on
confluence.parent_page_name	the names of the parent pages in Confluence to filter on
confluence.space_name	the names of the spaces in Confluence to filter on
gdrive.drive	the drive names in Google Drive to filter on
jira.project_name	the Jira project names to filter on
jira.ticket_number	the Jira ticket numbers to filter on
salesforce.org_name	the Salesforce organization names to filter on
salesforce.object	the Salesforce object names to filter on
salesforce.record_id	the Salesforce record IDs to filter on
github.author_email	the GitHub author emails to filter on
github.branch	the GitHub branches to filter on
github.commit	the GitHub commit IDs to filter on
github.org	the GitHub organizations to filter on
github.repository	the GitHub repositories to filter on
github.repository_owner	the GitHub repository owners to filter on
teams.team_name	the Microsoft 365 Teams team names to filter on
teams.channel_name	the Microsoft 365 Teams channel names to filter on
teams.channel_type	the Microsoft 365 Teams channel types to filter on
teams.team_sensitivity	the Microsoft 365 Teams team sensitivities to filter on
teams.sender	the Microsoft 365 Teams message senders to filter on
teams.msg_importance	the Microsoft 365 Teams message importance levels to filter on
teams.msg_attachment	the Microsoft 365 Teams message attachment names to filter on
teams.chat_id	the Microsoft 365 Teams chat ID to filter on
teams.chat_type	the Microsoft 365 Teams chat type to filter on
teams.chat_topic	the Microsoft 365 Teams chat topic to filter on
teams.chat_participant	the display name of a Microsoft 365 Teams chat participant to filter on
onedrive.drive_owner	the drive owner's display name to filter on
onedrive.drive_owner_email	the drive owner's email to filter on
onedrive.file_name	the file name to filter on
onedrive.created_by	the display name of the Microsoft 365 user who created the file in the drive to filter on
onedrive.created_by_email	the email of the Microsoft 365 user who created the file in the drive to filter on
onedrive.modified_by	the display name of the Microsoft 365 user who last modified the file in the drive to filter on
onedrive.modified_by_email	the email of the Microsoft 365 user who last modified the file in the drive to filter on
zendesk.ticket_status	the Zendesk ticket statuses to filter on
zendesk.ticket_title	the Zendesk ticket titles to filter on
zendesk.ticket_group_assignee	the Zendesk ticket assignee groups to filter on
zendesk.current_user_role	the roles of the Zendesk ticket's current assignee to filter on
notion.created_by	the names of the users who created a resource in Notion to filter on
notion.last_edited_by	the names of the users who last edited a resource in Notion to filter on
notion.page_title	the page titles in Notion to filter on
notion.workspace_name	the workspace names in Notion to filter on
gmail.user_name	the sender names to filter on
gmail.from	the sender email to filter on
gmail.to	the emails or names of recipients to filter on
gmail.cc	the emails or names of CC recipients to filter on
gmail.bcc	the emails or names of BCC recipients to filter on
gmail.thread_id	the email thread ID to filter on
gmail.subject	the email subject to filter on
gmail.attachment_name	the attachment name to filter on
gmail.attachment_type	the attachment type to filter on