Firewall for AI

Quickstart

This page walks you through making your first request to the Nightfall API so you can start scanning for sensitive data.

Obtain an API key

The Nightfall API requires a valid API key to authenticate your API requests.

Make an API Scan Request

Below is an example request to the scan endpoint.

To run this example yourself, replace the API key (NF-rEpLaCe...) with the one you created in the Dashboard, or set it as the environment variable NIGHTFALL_API_KEY when using an SDK.

curl --request POST \
     --url https://api.nightfall.ai/v3/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "minNumFindings": 1,
                              "minConfidence": "VERY_LIKELY",
                              "displayName": "US Social Security Number",
                              "detectorType": "NIGHTFALL_DETECTOR",
                              "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER"
                         },
                         {
                              "redactionConfig": {
                                   "maskConfig": {
                                        "charsToIgnore": [
                                             "-"
                                        ],
                                        "maskingChar": "X",
                                        "maskRightToLeft":true,
                                        "numCharsToLeaveUnMasked":4
                                   }
                              },
                              "minNumFindings": 1,
                              "minConfidence": "VERY_LIKELY",
                              "displayName": "Credit Card Number",
                              "detectorType": "NIGHTFALL_DETECTOR",
                              "nightfallDetector": "CREDIT_CARD_NUMBER"
                         }
                    ],
                    "name": "My Match Rule",
                    "logicalOp": "ANY"
               }
          ]
     },
     "payload": [
          "The customer social security number is 458-02-6124",
          "No PII in this string",
          "My credit card number is 5310-2768-6832-9293"
     ]
}
'

// Node.js SDK
import { Nightfall } from "nightfall-js";

// By default, the client reads your API key from the environment variable NIGHTFALL_API_KEY
const nfClient = new Nightfall();

// Three strings to scan; only the first and last contain sensitive data
const payload = [
  "The customer social security number is 458-02-6124",
  "No PII in this string",
  "My credit card number is 5310-2768-6832-9293"
];

const policy = {
  "detectionRules": [
    {
      "detectors": [
        {
          "minNumFindings": 1,
          "minConfidence": "LIKELY",
          "displayName": "US Social Security Number",
          "detectorType": "NIGHTFALL_DETECTOR",
          "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER"
        },
        {
          "redactionConfig": {
            "maskConfig": {
              "charsToIgnore": [
                "-"
              ],
              "maskingChar": "#"
            }
          },
          "minNumFindings": 1,
          "minConfidence": "LIKELY",
          "displayName": "Credit Card Number",
          "detectorType": "NIGHTFALL_DETECTOR",
          "nightfallDetector": "CREDIT_CARD_NUMBER"
        }
      ],
      "name": "My Match Rule",
      "logicalOp": "ANY"
    }
  ]
};

const response = await nfClient.scanText(payload, policy);

if (response.isError) {
  console.log(response.getError());
} else {
  // findings holds one array per payload item, in the same order as the payload
  response.data.findings.forEach((finding) => {
    if (finding.length > 0) {
      finding.forEach((result) => {
        console.log(`Finding: ${result.finding}, Confidence: ${result.confidence}`);
      });
    }
  });
}

>>> # Python SDK
>>> from nightfall import Confidence, DetectionRule, Detector, Nightfall

>>> # By default, the client reads the API key from the environment variable NIGHTFALL_API_KEY
>>> nightfall = Nightfall()

>>> # A rule contains a set of detectors to scan with
>>> cc = Detector(min_confidence=Confidence.LIKELY, nightfall_detector="CREDIT_CARD_NUMBER")
>>> ssn = Detector(min_confidence=Confidence.POSSIBLE, nightfall_detector="US_SOCIAL_SECURITY_NUMBER")
>>> detection_rule = DetectionRule([cc, ssn])
>>> payload = ["hello world", "my SSN is 678-99-8212", "4242-4242-4242-4242"]
>>> findings, _ = nightfall.scan_text(payload, detection_rules=[detection_rule])

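The Policy (policy) you define indicates what to scan for in your payload, using a set of Detection Rules (detectionRules). Within each Detection Rule, the Detectors are logically grouped with ANY or ALL (logicalOp).

Detection Rules can be defined in two ways, inline as code or in the Nightfall app; both are described later on this page.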

In the example above, two of Nightfall's native Detectors are being used: US_SOCIAL_SECURITY_NUMBER and CREDIT_CARD_NUMBER.

In the payload body, you can see that we are submitting a list of three different strings to scan (payload). The first will trigger the U.S. Social Security Detector. The last will trigger the credit card Detector. The middle example will trigger neither.

Example Nightfall API Scan Response

The Nightfall API returns a response with an array (findings) with a length that corresponds to the length of the payload array. In this example, only the first and last items in the request payload triggered the Detectors, so the second element of the array is empty.

In the first element of the array, you can see details about which Detection Rule was triggered and the data that was found (finding). The response also provides a confidence level (confidence), as well as the location within the original text where the data was found either in terms of bytes (byteRange) or characters (codepointRange).

{
  "findings": [
    [
      {
        "finding": "458-02-6124",
        "detector": {
          "name": "US Social Security Number",
          "uuid": "e30d9a87-f6c7-46b9-a8f4-16547901e069"
        },
        "confidence": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": 39,
            "end": 50
          },
          "codepointRange": {
            "start": 39,
            "end": 50
          },
          "rowRange": null,
          "columnRange": null,
          "commitHash": ""
        },
        "matchedDetectionRuleUUIDs": [],
        "matchedDetectionRules": [
          "My Match Rule"
        ]
      }
    ],
    [],
    [
      {
        "finding": "5310-2768-6832-9293",
        "redactedFinding": "XXXX-XXXX-XXXX-9293",
        "detector": {
          "name": "Credit Card Number",
          "uuid": "74c1815e-c0c3-4df5-8b1e-6cf98864a454"
        },
        "confidence": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": 25,
            "end": 44
          },
          "codepointRange": {
            "start": 25,
            "end": 44
          },
          "rowRange": null,
          "columnRange": null,
          "commitHash": ""
        },
        "redactedLocation": {
          "byteRange": {
            "start": 25,
            "end": 44
          },
          "codepointRange": {
            "start": 25,
            "end": 44
          },
          "rowRange": null,
          "columnRange": null,
          "commitHash": ""
        },
        "matchedDetectionRuleUUIDs": [],
        "matchedDetectionRules": [
          "My Match Rule"
        ]
      }
    ]
  ],
  "redactedPayload": [
    "",
    "",
    "My credit card number is XXXX-XXXX-XXXX-9293"
  ]
}
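As a minimal sketch of reading this response in Python, the snippet below walks the findings array alongside the original payload. It uses only the standard library and a trimmed, hand-copied version of the response above; the helper name summarize_findings is hypothetical and is not part of any Nightfall SDK.

```python
import json

# The payload from the example request above.
payload = [
    "The customer social security number is 458-02-6124",
    "No PII in this string",
    "My credit card number is 5310-2768-6832-9293",
]

# Trimmed copy of the example response; only the fields used below are kept.
response_text = """
{
  "findings": [
    [{"finding": "458-02-6124",
      "detector": {"name": "US Social Security Number"},
      "confidence": "VERY_LIKELY",
      "location": {"codepointRange": {"start": 39, "end": 50}}}],
    [],
    [{"finding": "5310-2768-6832-9293",
      "detector": {"name": "Credit Card Number"},
      "confidence": "VERY_LIKELY",
      "location": {"codepointRange": {"start": 25, "end": 44}}}]
  ]
}
"""


def summarize_findings(payload, response):
    """Hypothetical helper: flatten findings into one row per finding."""
    rows = []
    # findings[i] lists the findings for payload[i]; an empty list means no match.
    for index, item_findings in enumerate(response["findings"]):
        for finding in item_findings:
            span = finding["location"]["codepointRange"]
            rows.append({
                "payload_index": index,
                "detector": finding["detector"]["name"],
                "confidence": finding["confidence"],
                # codepointRange indexes characters, so it can slice a Python str.
                "text": payload[index][span["start"]:span["end"]],
            })
    return rows


rows = summarize_findings(payload, json.loads(response_text))
for row in rows:
    print(row["payload_index"], row["detector"], row["confidence"], row["text"])
```

Slicing with codepointRange rather than byteRange keeps the lookup correct when the payload contains multi-byte characters.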

Congratulations! You have successfully completed the Nightfall Quickstart.

You can modify the Detectors or payload in the example request to get more practice with the Nightfall API.

You can create API keys in the Dashboard.

Learn more about Authentication and Security.

The cURL example may be run from the command line without any additional installation. To run the Python example, you will need to download the corresponding SDK.

Detection Rules may be defined inline as code, as shown above, or in the Nightfall app, in which case you then reference them by UUID.

Learn more about setting up Nightfall in the Nightfall app to create your own Detectors, Detection Rules, and Policies. See Using Pre-Configured Detection Rules for an example of how to execute queries using an existing Detection Rule's UUID.

You can find a full list of native Detectors in the Detector Glossary.

If you would rather define your Detectors, Detection Rules, and Policies in code than in the Nightfall app, you can define Detectors inline with your own regular expressions or word lists, and extend our native Detectors with exclusion and context rules.

When defining a Detection Rule, you configure the minimum confidence level (minConfidence) and the minimum number of times the match must be found (minNumFindings) for the rule to be triggered.

Another feature Nightfall offers is the ability to redact sensitive findings. Detectors may be configured (via redactionConfig) to replace the text that triggered them with a variety of customizable masks, including an encrypted version of the text.
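To make the maskConfig fields from the Quickstart request more concrete, here is a local Python sketch of the masking they describe. It is not Nightfall's implementation: the function name apply_mask is hypothetical, and the treatment of maskRightToLeft (keeping the rightmost characters unmasked) is an assumption inferred only from the example response, where 5310-2768-6832-9293 becomes XXXX-XXXX-XXXX-9293.

```python
def apply_mask(text, masking_char="X", chars_to_ignore=("-",),
               num_unmasked=0, unmasked_from_right=True):
    """Hypothetical local model of maskConfig; the real redaction happens in the API."""
    # Positions that are eligible for masking (ignored characters pass through as-is).
    maskable = [i for i, ch in enumerate(text) if ch not in chars_to_ignore]

    # Assumption: with maskRightToLeft set, the unmasked characters are the rightmost ones.
    if num_unmasked > 0:
        keep = maskable[-num_unmasked:] if unmasked_from_right else maskable[:num_unmasked]
    else:
        keep = []
    keep = set(keep)

    return "".join(
        ch if (ch in chars_to_ignore or i in keep) else masking_char
        for i, ch in enumerate(text)
    )


masked = apply_mask("5310-2768-6832-9293", masking_char="X",
                    chars_to_ignore=("-",), num_unmasked=4)
print(masked)
```

A sketch like this can be handy for unit-testing code that consumes redactedFinding values without calling the API.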

Welcome

Welcome to the amazing world of the Nightfall Firewall for AI (formerly known as the Nightfall Developer Platform). Here you can find information about Nightfall's APIs and SDKs, along with usage examples for both.

Use Cases

There are many use cases for a high accuracy data classification and protection system like Nightfall. Here are some of the most popular to spark your imagination.

Protect sensitive data from transferring to downstream 3rd party services like LLM APIs.

Motivation

  • Third-party APIs provide services that greatly augment the capabilities of your applications.

    • For example, GenAI LLMs can automatically generate content. These LLMs can be accessed via APIs, such as OpenAI or Anthropic APIs.

    • Other examples are telecom/communications APIs like SendGrid and Twilio that provide communications infrastructure.

  • The challenge is that these services may unnecessarily receive sensitive or confidential information from your application that is calling these APIs, which can pose data privacy risks because customer data is being shared outside the intended scope. For example, LLMs can handle very large inputs, or prompts, and these prompts may contain sensitive customer information.

Benefits

  • By filtering out customer data from API inputs, you will be able to leverage cutting-edge third-party services and APIs without introducing data privacy risks by oversharing sensitive or confidential information.

Sanitize user input to prevent unnecessary collection or proliferation of sensitive customer data.

Motivation

  • Applications collect and store sensitive information from consumers. Users may “overshare” or incorrectly input information, leading to sensitive data ending up in places it is not expected, or internal services may proliferate or handle this data in unexpected ways.

    • Fintech applications that intake, store, and generate files with PII like W-2s and paystubs.

    • Healthcare applications that handle protected health information or SSNs.

  • Marketplaces and social media applications allow for user-generated content that may contain sensitive or illicit information, such as profanity or toxicity.

  • Support channels receive any inbound information from consumers, and can include highly sensitive information or over-sharing that is then exposed to support agents.

  • This data can come in a variety of unstructured formats, such as screenshots, images, documents, plaintext, and compressed folders or archives, so inspecting this content requires high-quality text extraction.

Benefits

  • Reduce the possibility of users inputting sensitive data that should not be collected or retained within your application or service by scanning data upon submission. Warn or prevent users from inputting sensitive data into form fields or file uploads.

  • Diminish collection of sensitive data types that could result in regulatory fines or brand damage, if leaked or breached.

  • Limit exposure of sensitive data to internal personnel like support agents that could lead to accidental misuse or intentional theft.

Audit and remove sensitive data in data silos and processing workflows for compliance.

Motivation

  • Compliance regimes like FedRAMP, PCI, and HIPAA may require that sensitive data is not proliferating into unsanctioned data silos, like project management systems, data warehouses, and logging infrastructure.

  • Many different development teams may be writing data into these internal services like logging and data warehousing, so it is challenging to enforce data sanitization on data ingress.

  • CDP tools like Segment and Fivetran can further proliferate sensitive data into a broader set of data silos than its original location.

  • Data analytics and data science teams may replicate and transform data, leading to further copies and versions across internal systems.

  • Edge cases, unexpected errors, and stack traces can lead to sensitive data landing or replicating in application logs.

Benefits

  • Identify and remove sensitive data from places that it shouldn’t be.

  • Monitor data at rest in data silos instead of at points of ingress/egress that would be hard to monitor or track.

  • Scan extremely high volumes of unstructured data at scale.

  • Build workflows to delete data, redact data, or alert the right teams when sensitive data is found where it shouldn’t be.

Build data classification and DLP features directly into your SaaS application.

Motivation

  • Data classification and DLP capabilities are increasingly expected by regulated institutions such as big banks.

  • Building data classification and DLP from scratch is complex and has high opportunity costs in moving developers away from working on the core product offering. Building a half-baked solution erodes customer trust, especially when there is already a high degree of skepticism around the quality of traditional DLP solutions.

  • SaaS and security vendors can deliver additional customer value and drive additional revenue through premium enterprise feature tiers that include security features like DLP, SAML SSO, audit logging, and more.

Benefits

  • Reduce time-to-market by leveraging out of the box components.

  • Reduce the overhead of an in-house data classification service that requires text extraction services, detector research and tuning, machine learning model development and deployment, maintenance & support.

  • Deliver best in class accuracy, reducing the risk of alert fatigue or missing sensitive data that erodes customer trust.

Centralize detection logic, custom detectors & regexes all in one place instead of embedded directly in code, and reduce the number of regexes required.

Motivation

  • Detecting a single type of sensitive data well (e.g. a credit card number) can be complex - requiring research and maintenance as the detector evolves over time. This becomes especially challenging for esoteric detectors, for example those that are region or industry-specific.

  • Managing regexes and input validation is complex and evolving. For example, a regex embedded in code to validate a Google Docs link may need to be updated over time as the format for Google Docs links changes, false positives are identified and accounted for, and performance implications are observed.

  • Many data types cannot be detected accurately with a regex because they require a certain level of validation, are heavily context dependent, or are highly variable or entropic in nature leading to a regex being overly sensitive or overly specific.

Benefits

  • Leverage out of the box detectors so no engineering time is spent on research, training, tuning detectors. No need to reinvent the wheel. These detectors span the categories of PII, PCI, PHI, credentials & secrets, ID numbers, and more.

  • Reduce time spent finding, tuning, and sharing regular expressions.

  • Build upon out of the box detectors with custom logic, instead of having to start from scratch with a regex or custom validation logic.

Improve accuracy of existing content inspection systems.

Motivation

  • Existing content inspection systems may yield a high degree of false positives (i.e. noise), leading to alert fatigue and significant time wasted on inaccurate alerts.

  • On the contrary, existing solutions may also be very limited in detection scope, leading to a high degree of false negatives (i.e. misses), putting the business at risk when sensitive data is missed.

Benefits

  • Replace existing, brittle solutions with a highly accurate content inspection system.

  • Reduce engineering time spent analyzing false positives and attempting to tune them out.

Sanitize inputs to labeled data used to train machine learning models.

Motivation

  • In training complex learning models, data scientists must compile and use large corpuses of data to improve the accuracy of the trained model. Unknowingly leveraging sensitive data in this effort can lead to violations of compliance regimes like HIPAA, GDPR, or PCI.

  • Models that focus on health, finance, public sector applications are particularly at risk for ingesting sensitive data that may violate industry specific compliance mandates.

  • Labeled data is often ingested from unregulated sources like customer communications, emails, public repos, and more. Inspecting all of these input sources manually is untenable.

  • Additionally, the data being leveraged may be in a variety of unstructured formats like screenshots, images, documents, plaintext, and compressed folders or archives, so inspecting this content requires high-quality text extraction.

Benefits

  • Ensure the hygiene of the labeled data you are using to train your machine learning models.

  • Diminish collection of sensitive data types that could result in regulatory fines or brand damage, if leaked or breached.

Example use cases by team and industry.

  • Healthcare: Detect PHI to ensure HIPAA compliance in your apps

  • Financial services: Secure PII and PCI like bank account numbers, payment card details, and social security numbers

  • E-commerce: Prevent costly data breaches of PII and PCI that can damage brand reputation

  • Education: Protect student and faculty privacy within applications

  • Customer support: Redact sensitive data in customer support systems, shielding agents from information they shouldn’t see

  • IT Operations: Search for API keys, credentials, and secrets across internal and external data silos

  • Product: Create custom solutions for data classification, DLP, content moderation and more within your applications

  • Compliance: Address PCI-DSS, HIPAA, FedRAMP, GDPR, CCPA, GLBA, FERPA, PHIPA, and more

  • People & Community: Content moderation to detect profanity, toxicity

  • Gaming: Detecting profanity, toxicity, or even personal or financial information being shared in community chat rooms

Entities and Terms to Know

This section describes the terms you will need to know when using the API.

Detectors

Detectors provide the logic to find potentially sensitive pieces of data.

When this logic detects such data, the Detector is considered "triggered."

Nightfall has numerous pre-built Detectors that are trained via machine learning. Detectors may also be defined with regular expressions or dictionaries. Their accuracy may be further refined with exclusion rules and context rules. Whether a Detector is triggered may be controlled by a minimum confidence threshold and a minimum number of findings per Detector, as set on a Detection Rule.

The built-in set of Detectors covers a number of different categories of data, including:

  • Standard PII (e.g. social security number, driver's license number, ID card image)

  • PCI (Credit Card Number, credit card image)

  • Healthcare (e.g. PHI, US Medicare Beneficiary Number)

  • Finance - Banking (e.g. SWIFT code, IBAN code, US bank routing number)

  • Network (e.g. an IP Address)

Custom Detectors

Exclusion Rules

An exclusion rule is a regular expression or word list that will be used once a Detector is triggered by its primary expression or word list to eliminate false positives.

For instance, you may have a Detector designed to detect phone numbers. However, you may have a particular set of phone numbers that you use for testing purposes that are known not to be valid (e.g. they start with the prefix 555) and this should be ignored. Adding an exclusion rule would allow you to prevent those matches from being returned by the API.
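The phone number scenario can be sketched locally in Python to show the idea of an exclusion rule. This is an illustration only: the patterns and the helper find_phone_numbers are assumptions made for the example, and in practice the exclusion rule is configured on the Detector and applied by the Nightfall API rather than by your own code.

```python
import re

# Hypothetical primary expression: a simple US-style phone number pattern.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Hypothetical exclusion rule: numbers with the 555 test prefix should be ignored.
TEST_PREFIX = re.compile(r"^555-")


def find_phone_numbers(text, exclusion=TEST_PREFIX):
    """Return primary matches, minus any that the exclusion rule also matches."""
    matches = [m.group(0) for m in PHONE.finditer(text)]
    return [match for match in matches if not exclusion.search(match)]


found = find_phone_numbers("Call 415-867-5309 or the test line 555-010-0000")
print(found)
```

The exclusion runs only on text that the primary expression already matched, which is why it removes false positives without affecting what else is detected.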

Context Rules

Context Rules are additional matching expressions for a Detector that may be used to adjust the confidence score of a match.

You may provide a regular expression and the number of leading or trailing characters within which a match of that expression must occur in order to adjust the confidence level to a particular level.

For instance, if you found a sequence that appeared to be a social security number based on its length or formatting, you might boost the confidence score if it was preceded by text like “SSN” or “Social Security Number.”
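The same idea can be sketched in a few lines of Python. The patterns, the 30-character window, and the helper score_matches are assumptions chosen for illustration; Nightfall evaluates context rules server-side from the Detector's configuration, and this is not its implementation.

```python
import re

# Hypothetical primary expression: something shaped like a social security number.
SSN_SHAPE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical context expression that should raise confidence when found nearby.
CONTEXT = re.compile(r"SSN|Social Security Number", re.IGNORECASE)


def score_matches(text, window_before=30, base="POSSIBLE", boosted="VERY_LIKELY"):
    """Assign a confidence to each match, boosted when context precedes it."""
    results = []
    for match in SSN_SHAPE.finditer(text):
        # Only look at the leading characters within the configured proximity window.
        leading = text[max(0, match.start() - window_before):match.start()]
        confidence = boosted if CONTEXT.search(leading) else base
        results.append((match.group(0), confidence))
    return results


with_context = score_matches("Customer SSN: 458-02-6124")
without_context = score_matches("Order reference 458-02-6124")
print(with_context, without_context)
```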

Returning Surrounding Context

You may request that a sequence of bytes of a given length be provided from before and after the text that triggers a Detection Rule.

This information can help you better understand whether or not something is an actual violation by observing the circumstances within which the detected text was found.

You are limited to a maximum of 40 bytes of this context text preceding and trailing the match for a total of 80 bytes overall.
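To picture what this context looks like, the following Python sketch cuts the leading and trailing bytes around a match using the byteRange of a finding. The helper surrounding_context is hypothetical and only mirrors the 40-byte limit described above; the API returns this context for you when you request it.

```python
MAX_CONTEXT_BYTES = 40  # limit stated above for each side of the match


def surrounding_context(text, byte_start, byte_end, context_bytes=MAX_CONTEXT_BYTES):
    """Hypothetical helper: return (before, after) text around a byte range."""
    context_bytes = min(context_bytes, MAX_CONTEXT_BYTES)
    raw = text.encode("utf-8")
    before = raw[max(0, byte_start - context_bytes):byte_start]
    after = raw[byte_end:byte_end + context_bytes]
    # A byte window can split a multi-byte character, so drop undecodable fragments.
    return (before.decode("utf-8", errors="ignore"),
            after.decode("utf-8", errors="ignore"))


text = "The customer social security number is 458-02-6124"
before, after = surrounding_context(text, 39, 50, context_bytes=10)
print(repr(before), repr(after))
```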

Detection Rules

Detection Rules are aggregations of Detectors that are assigned a minimum confidence level. The identifiers of Detection Rules are used as a parameter to the API.

A Detection Rule is composed of a list of Detectors with which you wish to scan each request payload, where any or all Detectors may be satisfied in order to trigger the rule. You can add up to 50 total Detectors with a limit of 30 regular expression type custom detectors.
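The any-or-all behavior can be modeled with a short Python sketch. The data shapes here (a flat list of findings tagged with a detector name) and the function names are simplifications assumed for the example; the actual evaluation is performed by the Nightfall API.

```python
def detector_satisfied(findings, detector_name, min_num_findings=1):
    """A detector is satisfied once it has produced enough findings."""
    count = sum(1 for finding in findings if finding["detector"] == detector_name)
    return count >= min_num_findings


def rule_triggered(findings, detectors, logical_op="ANY"):
    """Combine per-detector results with ANY or ALL, as in a Detection Rule."""
    results = [
        detector_satisfied(findings, d["name"], d.get("minNumFindings", 1))
        for d in detectors
    ]
    return any(results) if logical_op == "ANY" else all(results)


detectors = [
    {"name": "US Social Security Number", "minNumFindings": 1},
    {"name": "Credit Card Number", "minNumFindings": 1},
]
findings = [{"detector": "Credit Card Number", "finding": "5310-2768-6832-9293"}]

any_result = rule_triggered(findings, detectors, logical_op="ANY")
all_result = rule_triggered(findings, detectors, logical_op="ALL")
print(any_result, all_result)
```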

Confidence Levels

Detection results will be returned with one of the following confidence values.

In practice, the API will only return detections assigned a POSSIBLE or higher confidence level.

  • VERY_LIKELY (recommended)

  • LIKELY

  • POSSIBLE

  • UNLIKELY

  • VERY_UNLIKELY
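If you post-process findings yourself, the ordering of these levels can be expressed as a simple list. The sketch below is a client-side illustration under that assumption; the names CONFIDENCE_ORDER and filter_findings are hypothetical, and minConfidence on a Detector is the supported way to apply a threshold during a scan.

```python
# Confidence values from lowest to highest, as listed above.
CONFIDENCE_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def meets_min_confidence(confidence, min_confidence):
    """True when confidence is at or above the minimum level."""
    return CONFIDENCE_ORDER.index(confidence) >= CONFIDENCE_ORDER.index(min_confidence)


def filter_findings(findings, min_confidence="LIKELY"):
    """Keep only findings at or above the chosen confidence level."""
    return [f for f in findings if meets_min_confidence(f["confidence"], min_confidence)]


sample = [
    {"finding": "458-02-6124", "confidence": "VERY_LIKELY"},
    {"finding": "123-45-6789", "confidence": "POSSIBLE"},
]
kept = filter_findings(sample, min_confidence="LIKELY")
print(kept)
```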

Policies

Policies allow you to create templates for the most common workflows by unifying a set of Detection Rules with the actions to be taken when those rules are triggered, including:

  • automated actions such as redaction of findings

  • alerting through webhooks

Once defined, a Policy may be used in requests to the Nightfall API, such as calls to scan file uploads, though automated redactions are not available for uploaded files at this time.

Authentication and Security

Your API keys carry many privileges, so be sure to keep them secure. Do not share your secret API keys in publicly accessible areas such as GitHub, client-side code, or anywhere else that would compromise their secrecy. If you believe one of your API Keys has been compromised, you should delete it through the Dashboard.

All API requests must be made over HTTPS.

Calls made over plain HTTP will fail.

API requests without authentication will fail.
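As a hedged sketch of these requirements, the Python snippet below prepares (but does not send) a request to the scan endpoint shown in the Quickstart, reading the key from the NIGHTFALL_API_KEY environment variable so it never appears in source code. The helper build_scan_request is hypothetical and uses only the standard library.

```python
import os
import urllib.request

SCAN_URL = "https://api.nightfall.ai/v3/scan"  # endpoint from the Quickstart example


def build_scan_request(body, api_key=None, url=SCAN_URL):
    """Hypothetical helper: build an authenticated HTTPS request without sending it."""
    if not url.startswith("https://"):
        raise ValueError("Nightfall API requests must be made over HTTPS")
    # Prefer an environment variable over hard-coding the key in source code.
    api_key = api_key or os.environ["NIGHTFALL_API_KEY"]
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_scan_request(b"{}", api_key="NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123")
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it; that step is left out so the sketch runs without a real key.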

Overview

Welcome to Nightfall's Firewall for AI Developers Scan and Workflow APIs documentation. This documentation helps developers leverage Nightfall AI's industry-leading detection engine to identify and protect sensitive customer and corporate data anywhere. It prevents unauthorized access and data breaches and allows you to focus on innovation.

Scan APIs

Scan prompts, text, documents, spreadsheets, logs, zips, JSON, images, etc., for PII, PHI, PCI, banking information, API keys, passwords, and network information with the highest accuracy and lightning-fast response times. Redact sensitive findings with customizable formatting.

Workflow APIs

Leverage the full potential of the Nightfall console application through our Workflow APIs. Customize your SIEM workflows and reporting, take actions, update support tickets, alert users, search violations, annotate findings, create reports, and more.

Key Features

  • AI-Powered Identification: Utilize advanced AI models to detect and prevent security threats in real-time.

  • Comprehensive Sensitive Data Detection: Identify PII, PHI, PCI, banking information, API keys, passwords, and network information across various formats including text, documents, spreadsheets, logs, zips, and images.

  • Customizable Redaction: Tailor data protection to your needs with fully customizable redaction for each sensitive entity type.

  • Flexible Detectors: Leverage Nightfall’s comprehensive list of machine learning-based detectors, customize them, or create your own with specialized logic.

  • High Accuracy and Performance: Achieve precision and recall rates of 95% or higher, handle over 1K requests per second, and experience latency of less than 100 ms.

  • Seamless Integration: Easily integrate with your existing AI development and data engineering tools for smooth and efficient operation.

Customizable and Built-in Machine Learning-based Detectors

You can leverage Nightfall’s machine learning-based detectors or create your own detectors with customized logic to scan third-party apps, internal services, and data silos to identify instances of potentially sensitive types of data such as:

  • Personally Identifiable Information (PII) including Social Security Numbers, passport numbers, email addresses, or date of birth

  • Protected Health Information (PHI) such as insurance claim numbers or ICD10 codes

  • Financial information like credit card numbers or bank routing numbers

  • Secrets such as API and cryptographic Keys, database connection strings, passwords, etc.

  • Network information such as IP Address or MAC Address

A Flexible Data Security Solution

Key features of Nightfall’s detection engine include:

  • Defining minimum confidence thresholds and minimum finding counts on detectors to reduce the chance of false positives.

  • Choosing which detectors are triggered for each policy.

Using the API

The findings display the relevant detector, the likelihood of a match, and the location within the given data where the matched token occurred (not only in terms of bytes; tabular and JSON data are supported as well).

Where to Go From Here

After that, you can learn about Nightfall with our Key Concepts section, which will also help you get set up with Nightfall.

We can't wait to hear more about what you're planning to build: reach out to us anytime at to discuss your use case.

The full set is enumerated in the .

Nightfall also supports and word lists for any custom detectors that you may want to implement.

Over time, we've aggregated the following , which you're welcome to select from to save you some time. Please note that a regular expression is an established yet limited method that searches for pre-defined patterns, so your mileage may vary.

You can test regular expressions .

You can input custom detectors in two ways: directly in the Nightfall by navigating to Detectors โ†’ New Detector โ†’ Regular expression, or

See:

See:

You may create Detection Rules as described in the section and use their identifier as part of API calls to scan content.

Alternatively you may specify Detection Rules in each API call, as described in the scan method documentation below.

Additionally, each Detector in the Detection Rule is assigned a โ€œminimum confidenceโ€ level (see and a minimum number of findings to determine if the Detection Rule should be considered triggered.

Learn more about what different confidence levels mean and how to choose the right minimum confidence level for your detection rule .

The Nightfall API uses API keys to authenticate requests. You can create and view your API keys in the Nightfall app on the page.

Specifying and on detectors to fine-tune their accuracy to better suit your use cases.

The Nightfall API consumes arbitrary data as input either as or as and allows you to use any combination of detectors to return a collection of โ€œfindings" objects.

The detectors may be defined in our and or defined as part of the .

You can take protective action on sensitive text by , substituting, or encrypting it with the API. You may also set up to receive asynchronous notifications when findings are detected.

The Nightfall API is RESTful and uses JSON for its payloads. Our API is designed to have predictable, resource-oriented URLs for each endpoint and uses to indicate any API errors.

You may test out the API through the

The following guide will walk you through getting started and describe the API functionality in more detail. If you want to execute an API call immediately, see our guide to see how to obtain an API Key and make a simple scan request.

If youโ€™re looking for more ideas about best to leverage Nightfallโ€™s functionality, see our guide.

We have created numerous that demonstrate how to implement DLP for a variety of platforms (including OpenAI, LangChang, Amazon, Datadog, and Elasticsearch) and handle various scenarios (such as detecting sensitive data in GenAI prompts or detecting PII on your machine in real-time).

We also have several language-specific to get you up and running in Java, Python, Go, Node.js, and Ruby.

You can also quickly test out Nightfall detectors or your custom Detection Rules in the . Please also consult our Detector to see the variety of built-in detectors that Nightfall offers.

The page allows you to create API keys and manage Detectors and Detection Rules through a straightforward user interface. Log in here to access the Dashboard, or sign up to create a free account.

For frequently asked questions, feedback, and other help, please contact Nightfall support at . We also host on Wednesdays at 12pm PT to help answer questions, talk through any ideas, and chat about data security. We would love to see you there!


Creating Detection Rules

You can define Detection Rules "inline" in the body of each request to the scan endpoint. See the example in the walkthrough of the scan endpoint, Creating an Inline Detection Rule.

You may add up to 50 detectors to your detection rule.

To create a Detection Rule in the Nightfall UI, select "Detection Rules" from the left-hand navigation.

Click the + New Detection Rule button in the upper right hand corner.

First, enter a name for your Detection Rule as well as an optional description.

Then click the + Detectors button to add Detectors to your Detection Rule.

In this example, we have selected the US driver's license and Canada Government ID detectors.

Click the Add button in the lower right hand corner at the end of the detector list when you are done adding detectors.

Now that your Detectors are set, choose a minimum confidence level and a minimum # of findings for each detector.

If these minimums for a Detector are not met, the Detection Rule will not be triggered.
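The threshold behaviour described above can be illustrated locally. The sketch below is not Nightfall's implementation; it assumes a five-level confidence scale (only POSSIBLE, LIKELY, and VERY_LIKELY appear in this documentation, the two lower levels are an assumption) and a hypothetical helper named detector_triggers.

```python
# Confidence levels from weakest to strongest. The two lowest levels are an
# assumption; this documentation only shows POSSIBLE, LIKELY and VERY_LIKELY.
CONFIDENCE_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def detector_triggers(confidences, min_confidence, min_findings):
    """Return True when enough findings meet the minimum confidence level."""
    threshold = CONFIDENCE_ORDER.index(min_confidence)
    qualifying = [c for c in confidences if CONFIDENCE_ORDER.index(c) >= threshold]
    return len(qualifying) >= min_findings


# Two findings were produced, but only one of them is at least LIKELY.
found = ["POSSIBLE", "VERY_LIKELY"]
print(detector_triggers(found, "LIKELY", 1))
print(detector_triggers(found, "LIKELY", 2))
```

With a minimum of one finding the detector's minimums are met; raising the minimum to two means they are not, so the rule would not be triggered by this detector.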

Save your Detection Rule in the lower left hand corner once you are done.

Once the Detection Rule is saved, it is available for use in requests to the Nightfall API to scan your data for sensitive information. Pass in the UUID of the Detection Rule in the detectionRuleUUIDs field of your requests to the scan endpoints.

The UUID may be obtained by clicking the "copy" icon, the leftmost icon in the set of icons that appear next to the Detection Rule's name when your cursor highlights a Detection Rule in the list of Detection Rules.

See Using Pre-Configured Detection Rules for an example of using a Detection Rule UUID.
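As a rough sketch of what such a request body might look like, the snippet below builds the JSON for a text scan that references a saved Detection Rule. The detectionRuleUUIDs field nested under policy follows the file scan example elsewhere in this guide; the payload field name and the build_scan_body helper are assumptions made for illustration.

```python
import json


def build_scan_body(texts, detection_rule_uuids):
    """Build a JSON request body that references saved Detection Rules by UUID."""
    return json.dumps({
        "policy": {"detectionRuleUUIDs": list(detection_rule_uuids)},
        "payload": list(texts),  # assumed field name for the text items to scan
    })


body = build_scan_body(
    ["my card is 4242-4242-4242-4242"],
    ["950833c9-8608-4c66-8a3a-0734eac11157"],
)
print(body)
```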


Alerting

The way that an alert notification presents itself depends on the platform in question.

For example, notifications sent to Slack will appear as formatted messages sent by the Nightfall Alerts Bot. Other destinations, such as email, a SIEM URL, and webhooks, will present the information as JSON objects.

In the case of webhooks, detailed information about the finding will be sent. For other destinations, sensitive information is redacted.

Supported Alert Platforms

Slack

In order to use asynchronous notifications with Slack, you must install the Nightfall Alerts plugin from the Slack Marketplace.

Once you have authenticated Nightfall to your Slack workspace, you can provide any public channel name (e.g. #general) as part of a request to the Nightfall API.

To send notifications to a private channel, a member of the channel should invite the Nightfall bot to the specific private channel and allow channel access to the bot.

Follow the steps below to invite the Nightfall Alerts bot to a private channel:

  1. Go to the Slack channel in question

  2. Type /invite @Nightfall Alerts as a message

  3. Press 'Enter' (you should see a message that Nightfall Alerts has now joined the channel)

If any findings are detected as part of that request, then the Nightfall Alerts bot will send a message to the channel you configured. Conversely, if there are no findings in the request payload, then Nightfall will not send an alert message.

Teams

Documentation TBD

Email

Email is unauthenticated, so you can get started using Nightfall to send email alerts without any initial setup work.

Nightfall will send an email to the provided address only if findings were detected as part of the request. The findings themselves will be attached in a JSON file.

SIEM

You may send your alerts to a designated URL, such as an endpoint hosted by SIEM software for log collection.

In addition to the URL, you may provide headers, either for security or logging purposes.

Webhook

You may use a webhook server to programmatically handle a finding, allowing you to create your own custom workflows with your own or third-party systems.

Nightfall will always send an alert to the client's webhook server if it is provided as part of an API request, even if the scan request yielded no findings.

Alert Schemas

The request body sent by Nightfall is JSON and uses the schemas documented below.

File Scans

Since file scans can produce a large number of results, findings are not transmitted directly in the notification that Nightfall sends. The notification object looks like the following:

{
  "errors": [],
  "findingsPresent": true,
  "findingsURL": "https://files.nightfall.ai/877442c5-1573-4637-a223-595bf620e3e5.json?Expires=1645722381&Signature=C-kQbtonFAPXfooGcm0dYgbsn9jfGu~vGSv5yK5j1z2f7aAhk0WuaL4bISUwx5MZkQmPVFgeyMwemvEoI8aI11lPA-ORsX5LtRdGJBOma4sPVl~9f9qBPKE2VSrdGDmT4EpBLc8ewUtKrLm2xE-0BzW~5PdLSvZ~NQxtB7OMBaYm7h~y2NSUZfpqzdzENyKhyHx5QxH2PJvxeN5IvMXqNUrKyZsxviSYY6kDNAiGExS-u6PmKKS1GhXOaFLdJSRjgtFhUxDLyWl~xTYR-lJol5UTgtcuYU8AaJ3xVTF1-1JYRlioRlaf9shAvme4djFyg8k~zOB8bYgzBeaRqSjeWA__&Key-Pair-Id=K3RYMP51FKX5HX",
  "requestMetadata": "some data",
  "uploadID": "877442c5-1573-4637-a223-595bf620e3e5",
  "validUntil": "2022-02-24T17:06:21.412377682Z"
}

The requestMetadata field contains arbitrary contents provided by the client at request time, and can be used by the client to correlate this response to the original request.

The value of the findingsURL field is a pre-signed URL, which means anyone with the link can download the file. Therefore, this URL itself should be treated as sensitive and must not be leaked. The object stored at this URL is a JSON file containing a single key findings containing a list of all data detected from the request. The schema for the finding object inside the list is shared between the text-based and file-based API endpoints.

{
  "findings": [
    {
      "detector": {
        "id": "74d1315e-c0c3-4ef5-8b1e-6cf98664a854"
      },
      "finding": "4242-4242-4242-4242",
      "confidence": "VERY_LIKELY",
      "location": {
        "byteRange": {
          "start": 146,
          "end": 165
        },
        "codepointRange": {
          "start": 146,
          "end": 165
        },
        "lineRange": {
          "start": 3,
          "end": 3
        },
        "rowRange": null,
        "columnRange": null,
        "commitHash": ""
      },
      "beforeContext": "nd HIPAA Defined PII\nHIPAA HIPAA hooray\n",
      "afterContext": " is my credit card number\n\n",
      "matchedDetectionRuleUUIDs": ["7bd6166a-b9af-4069-847d-487a88788122"],
      "matchedDetectionRules": []
    } 
  ]    
}
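A receiver might decide whether the findings file is worth fetching before touching the pre-signed URL. The sketch below assumes that validUntil marks the moment the URL stops working, and it trims the nine-digit fractional seconds because Python's datetime keeps at most six. The helper names are illustrative, not part of any Nightfall SDK.

```python
import re
from datetime import datetime, timezone


def parse_valid_until(value):
    """Parse a timestamp such as 2022-02-24T17:06:21.412377682Z into UTC."""
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", value)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    seconds, fraction = match.groups()
    # Keep microsecond precision only; pad short fractions with zeros.
    micros = (fraction or "0")[:6].ljust(6, "0")
    parsed = datetime.strptime(f"{seconds}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def should_fetch_findings(notification, now=None):
    """Return True when findings exist and the URL has not yet expired."""
    now = now or datetime.now(timezone.utc)
    if not notification.get("findingsPresent") or not notification.get("findingsURL"):
        return False
    return now < parse_valid_until(notification["validUntil"])


notification = {
    "findingsPresent": True,
    "findingsURL": "https://files.nightfall.ai/example.json",  # placeholder URL
    "validUntil": "2022-02-24T17:06:21.412377682Z",
}
print(should_fetch_findings(notification, now=datetime(2022, 2, 24, 17, 0, tzinfo=timezone.utc)))
```

Because the URL grants access to anyone who holds it, avoid writing it to logs when adapting this sketch.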

Text Scans

{
  "findings": [
    [
      {
        "finding": "4242-4242-4242-4242",
        "beforeContext": "hello world cc ",
        "detector": {
          "name": "Credit card number",
          "uuid": "74c1815e-c0c3-4df5-8b1e-6cf98864a454"
        },
        "confidence": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": 15,
            "end": 34
          },
          "codepointRange": {
            "start": 15,
            "end": 34
          },
          "rowRange": null,
          "columnRange": null,
          "commitHash": ""
        },
        "matchedDetectionRuleUUIDs": [
          "42efe36c-6479-412a-9049-fd8cdf895ced"
        ],
        "matchedDetectionRules": []
      }
    ]
  ],
  "redactedPayload": [""]
}
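The byteRange in each finding can be used to act on the original text locally. The sketch below masks findings in place, assuming that the outer findings list holds one inner list per payload item and that byteRange is start-inclusive and end-exclusive (which is consistent with the example above, where the 19-byte card number spans bytes 15 to 34). The function names are illustrative.

```python
def mask_findings(text, findings, mask_char="*"):
    """Mask every finding in text using its byteRange (UTF-8 byte offsets)."""
    data = bytearray(text.encode("utf-8"))
    for finding in findings:
        byte_range = finding["location"]["byteRange"]
        start, end = byte_range["start"], byte_range["end"]
        # One mask character per byte keeps later offsets valid.
        data[start:end] = mask_char.encode("utf-8") * (end - start)
    return data.decode("utf-8")


def mask_payload(payload, response):
    """Apply mask_findings to each payload item and its list of findings."""
    return [mask_findings(text, found) for text, found in zip(payload, response["findings"])]


payload = ["hello world cc 4242-4242-4242-4242"]
response = {"findings": [[{"location": {"byteRange": {"start": 15, "end": 34}}}]]}
print(mask_payload(payload, response))
```

Offsets are in bytes, so a finding that contains multi-byte characters is replaced by one mask character per byte rather than per visible character.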

Setting Up Nightfall

Before you use the scan endpoint, there are several steps to complete in the Nightfall dashboard to set up your environment properly.

Creating Policies

Policies allow customers to create templates for their most common workflows by unifying a set of Detection Rules with the actions to be taken when those rules are triggered, including:

  • automated actions such as redaction of findings

  • alerting through webhooks

Once defined, a Policy may be used in requests to the Nightfall API, such as calls to scan file uploads, though automated redactions are not available for uploaded files at this time.

To create a policy:

  1. Log in to Nightfall.

  2. Click Overview under the Firewall for AI section.

  3. Click Create Policy.

The policy creation page is displayed as follows.

If you click the Policies button under the Setting Up section, you need to execute a couple of additional steps to reach the policy creation page, as displayed in the following image.

  4. Enter a name for the policy.

  5. (Optional) Enter a description for the policy.

  6. Click + Detection rule to add a Detection Rule to the policy.

  7. Select the check box of each Detection Rule that you wish to add to the policy.

  8. Select the Redact Violations check box to mask sensitive information found in your transmitted data.

  9. Select one of the available alerting methods.

  10. Click Save Policy.

Configuring Webhook Alerts

When you click + Application Webhook, the following window is displayed.

If you have custom headers you would like to add to requests sent to the Webhook URL, you can do this from the overlay that appears when you click the "+ Webhook" button on the policy creation and edit page. These headers may be used for authentication as well as for integrating with Security Information and Event Management (SIEM) systems or similar tools that aggregate content through HTTP event collection.

Click the "Add Header" button to add your custom headers.

Once your header key and value are entered, you may obfuscate the value by clicking the "lock" icon next to the value field for the header. Click the "Save" button to persist your changes to the headers.

When you have completed configuring your Webhook URL and Headers, click the "Save" button.

🚧 Limits On Webhook Headers

It is currently not possible to configure headers for webhooks programmatically when defining policies through the API.

After you click the "Save Policy" button, your policy should be immediately available for use. You can refer to the API Docs for the comprehensive list of endpoints that support policy UUIDs.


Creating API Key

The API expects an API Key to be passed via the Authorization: Bearer <key> HTTP header.

To create and manage API keys:

  1. Log in to Nightfall.

  2. Click Overview under the Firewall for AI section.

  3. Click Create key.

The Generate API Key window is displayed.

  4. Enter a name for the API key and click Create.

The API key is generated and displayed (blurred in the following image). Click the copy button to copy the API key and store it in a secure location. Once you click the Got it button, you cannot retrieve the API key again.

🚧 Be Sure to Record the API Key's Value

For security reasons, after closing the window, you will not be able to recover the key's value.

Once you close the window, the My API Keys page will display your newly generated key, with the majority of the Key redacted.

You can return to the Overview page at any time to create new keys (assuming your license allows you to generate additional keys) or delete old keys.


You can also use Nightfall UI > Detection Rules to predefine your Detection Rules. Once you have created a Detection Rule, you will receive a UUID, which you can pass in as part of your API request payloads.

The Nightfall Detection Rules page
Creating a New Detection Rule
Selecting Detectors for a Detection Rule
Setting confidence levels and minimum findings for a Detection Rule
Copying a UUID for a Detection Rule

Nightfall has the ability to send alerts when a violation is detected.

Policies for alerting may be configured through the Nightfall app user interface or they may be set up programmatically. Policies that are configured under Developer Platform > Overview > Policies may be used in the API by referencing their Policy UUID.

See our end user documentation on installing Nightfall for Slack for more details.

See integrating with SIEM in our end user guide or our API documentation on using policies for more details.

See Creating a Webhook Server for more details.

The payload that is forwarded on behalf of text scanning requests is identical to the response body that is synchronously returned to the client. Refer to the API docs for more details on this payload.

See Creating an API Key to learn how to create the necessary authentication token for making API calls.

See Creating a Detector for how to define your own custom logic for detecting sensitive data.

See Creating Detection Rules for how to aggregate Detectors for use in the scan endpoint.

See Creating Policies for how to set up common workflows that combine your Detection Rules with remediation actions such as alerting.

This document applies only to Nightfall Firewall for AI customers. If you are a Nightfall SaaS application customer, refer to this document.

Click + Application Webhook to add the URL of a webhook that needs to be notified. See Configuring Webhook Alerts to learn more.


Supported File Types

The file scan API has first-class support for text extraction and scanning on all MIME types enumerated below.

Handling of MIME Types Not Listed

Files with a MIME type not listed below are processed using an unoptimized text extractor. As a result, the quality of the text extraction for unrecognized types may vary.

Accepted Text and Derivatives

  • application/json

  • application/x-ndjson

  • application/x-php

  • text/calendar

  • text/css

  • text/html

  • text/javascript

  • text/plain

  • text/x-php

Accepted Office Formats

  • application/pdf

  • application/vnd.openxmlformats-officedocument.presentationml.presentation

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document

Accepted Archive and Compressed File Types

  • application/bzip2

  • application/ear

  • application/gzip

  • application/jar

  • application/java-archive

  • application/tar+gzip

  • application/vnd.android.package-archive

  • application/war

  • application/x-bzip2

  • application/x-gzip

  • application/x-rar-compressed

  • application/x-tar

  • application/x-webarchive

  • application/x-zip-compressed

  • application/x-zip

  • application/zip

Accepted Image File Types

  • image/apng

  • image/avif

  • image/gif

  • image/jpeg

  • image/jpg

  • image/png

  • image/svg+xml

  • image/tiff

  • image/webp

Rejected MIME Types

The file scan API explicitly rejects requests with MIME types that are not conducive to extracting or scanning text. Sample rejected MIME types include:

  • application/photoshop

  • audio/midi

  • audio/wav

  • video/mp4

  • video/quicktime
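A client can pre-check a file's MIME type before starting an upload. The sketch below uses only a handful of the types listed above, and the classify_mime_type helper is hypothetical. Because the rejected list in this guide is described as a sample, a type classified here as "fallback" could still be rejected by the API.

```python
# Small subsets of the lists above, for illustration only.
FIRST_CLASS = {
    "application/json",
    "text/plain",
    "application/pdf",
    "application/zip",
    "image/png",
}
REJECTED = {
    "application/photoshop",
    "audio/midi",
    "audio/wav",
    "video/mp4",
    "video/quicktime",
}


def classify_mime_type(mime_type):
    """Return 'rejected', 'first-class', or 'fallback' for a MIME type string."""
    normalized = mime_type.strip().lower()
    if normalized in REJECTED:
        return "rejected"
    if normalized in FIRST_CLASS:
        return "first-class"
    # Unlisted types go through the unoptimized text extractor.
    return "fallback"


for candidate in ("image/png", "video/mp4", "application/x-unknown"):
    print(candidate, classify_mime_type(candidate))
```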

Spreadsheets and Tabular Data

File scans of Microsoft Office, Apache Parquet, CSV, and tab-separated files will provide additional properties to locate findings within the document beyond the standard byteRange, codepointRange, and lineRange properties.

Findings will contain a columnRange and a rowRange that will allow you to identify the specific row and column within the tabular data wherein the finding is present.

This functionality is applicable to the following MIME types:

  • text/csv

  • text/tab-separated-values

  • text/tsv

  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

  • application/vnd.ms-excel

Below is a sample match from a spreadsheet containing dummy PII, where an SSN was detected in the 2nd column and 55th row.

{
   "findings":[
      {
         "path":"Sheet1 (5)",
         "detector":{
            "id":"e30d9a87-f6c7-46b9-a8f4-16547901e069",
            "name":"US social security number (SSN)",
            "version":1
         },
         "finding":"624-84-9182",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":2505,
               "end":2516
            },
            "codepointRange":{
               "start":2452,
               "end":2463
            },
            "lineRange":{
               "start":55,
               "end":55
            },
            "rowRange":{
               "start":55,
               "end":55
            },
            "columnRange":{
               "start":2,
               "end":2
            },
            "commitHash":""
         },
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
...
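To point a reviewer at the right cell, the rowRange and columnRange values can be turned into a familiar spreadsheet reference. The sketch below assumes both ranges are 1-based, as the "2nd column and 55th row" description suggests; whether a header row is counted is not stated here, so treat the result as approximate. The helper names are illustrative.

```python
def column_letter(index):
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell_reference(finding):
    """Return (sheet path, cell) for the first cell covered by a finding."""
    location = finding["location"]
    row = location["rowRange"]["start"]
    column = location["columnRange"]["start"]
    return finding.get("path", ""), f"{column_letter(column)}{row}"


finding = {
    "path": "Sheet1 (5)",
    "location": {
        "rowRange": {"start": 55, "end": 55},
        "columnRange": {"start": 2, "end": 2},
    },
}
print(cell_reference(finding))
```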

Redacting CSV Files

Findings within CSV files may be redacted.

To enable redaction in files, set the enableFileRedaction flag of your policy to true.

The CSV file will be redacted based on the defaultRedactionConfig of the policy.

curl --request POST \
     --url https://api.nightfall.ai/v3/upload/02a0c5e1-c950-4e28-a988-f6fffefc4205/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-<Your API Key>' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRuleUUIDs": [
               "950833c9-8608-4c66-8a3a-0734eac11157"
          ],
          "alertConfig": {
               "email": {
                    "address": "<your email address>"
               }
          },
          "defaultRedactionConfig": {
               "maskConfig": {
                    "charsToIgnore": [
                         "-",
                         "@"
                    ],
                    "maskingChar": "*"
               }
          },
          "enableFileRedaction": true

     },
     "requestMetadata": "csv redaction test"
}
'

When results are sent to the location specified in the alertConfig (in this case an email address), a redactedFile property will be set with a fileURL in addition to the findingsURL.

{
   "errors":null,
   "findingsPresent":true,
   "findingsURL":"https://files.nightfall.ai/asdfc5e1-c950-4e28-a988-f6fffefc4205.json?Expires=1655324479&Signature=zjo1nT-PECHC-fiTvAgdA8aDnceoY~6iGfzOBCcBjscKqOHnIar8hoH4gGufffiulBw5BpfJuvWwBW~lXO~ZNhN139LDwoTsfLJswJiQCB2Hj-Az0Em6go~1j8WBqCS8G0Gk17M-zcPedHGX3z~1pw8nm5sh6Pa-jJwfw9NIEiqmBb3Vdcj3J-~Wzag~ENV4499rnG299ee-ig5Ms1oVlzycb4YxzgTMrTL5Q07ozNenwFZcGDNQre1inLXmV-m8teLX-K3boklenp9KXiNDDV0wi74ADN-QfIR1q1oU7mEI1f3aVC3kju0QRErp2lsfs08EtZKLE3C4N17jDJdYcw__&Key-Pair-Id=K24YOPZ1EKX0YC",
   "redactedFile":{
      "fileURL":"https://files.nightfall.ai/asdfc5e1-c950-4e28-a988-f6fffefc4205-redacted.csv?Expires=1655324479&Signature=Hx8kRh88maLeStysy3fsLbFVG9VELEtfemtQe2lWUnFjAMd9HqlEksTmirqAWFWV4zPVUB73izlMj5cSer8v2N5ZCcnD3dz~nnwR4P5LewGJ2CQzGnDnXgh70HW5qp04gnUD-pYWp~bGPVspkJKCkl1zH-EoGonvcNVq3SNsVzOlsVIjep7Y7otQKEEyAZ7JmHiVfuBxrvn8pleuC5lEJ3f9miPyoRqH9DyPlNTJTIuijqe9q32Qcui2RsDR6IT-foFX52dy6rRa01ZV0gZMDWJokMlCr8Iu5An~qnhxC49bqTtI82oz9FcBaP-Yea8cq1TiAfGxX7CJ0~JeTLvr6g__&Key-Pair-Id=K24YOPZ1EKX0YC",
      "validUntil":"2022-06-15T20:21:19.750990823Z"
   },
   "requestMetadata":"csv redaction test",
   "uploadID":"02a0c5e1-c950-4e28-a988-f6fffefc4205",
   "validUntil":"2022-06-15T20:21:19.723045787Z"
}

This redacted file will be a modified version of the original CSV file.

Below is an example of a redacted CSV file.

name,email,phone,alphanumeric
Ulric Burton,*****@*************,*-***-***-****,TEL82EBM1GQ
Wade Jones,******************@***********,(********-****,VVF64PJV2EF
Molly Mccullough,*****************@**********,(********-****,OHO41SFZ2BR
Raja Riggs,************@**********,(********-****,UVD51JTE5NZ
Colin Carter,**********************@*********,(********-****,LNI34LLC5WV
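The effect of the maskConfig used in the request can be approximated locally. The sketch below is only an illustration of the apparent behaviour (every character of a finding is replaced by maskingChar unless it appears in charsToIgnore), not the service's actual implementation, and the mask function is a hypothetical helper.

```python
def mask(value, masking_char="*", chars_to_ignore=("-", "@")):
    """Replace every character except the ignored ones with masking_char."""
    return "".join(ch if ch in chars_to_ignore else masking_char for ch in value)


print(mask("624-84-9182"))        # ***-**-****
print(mask("ulric@example.com"))
```

Only the detected span is masked by the service, which would explain why characters outside a finding, such as a leading parenthesis in a phone number, can remain visible in the redacted file.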

Git Repositories

Nightfall provides special handling for archives of Git repositories.

Nightfall will scan the repository history to discover findings in particular commits, returning the commit hash for each finding.

In order to scan the repository, you will need to create a clone, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

This creates a clone of the Nightfall Go SDK.

You will then need to create an archive that can be uploaded using Nightfall's file scanning sequence.

zip -r directory.zip directory

Note that for this to work, the hidden .git directory, which holds the repository history, must be included in the archive.
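The archive step can also be scripted. The sketch below uses Python's standard shutil and zipfile modules to zip a cloned repository and then list which archive members live under dot-prefixed (hidden) directories, so you can confirm they were included. The function names and paths are illustrative.

```python
import shutil
import zipfile
from pathlib import Path


def archive_repository(repo_dir, archive_base):
    """Zip repo_dir, hidden directories included, and return the archive path."""
    repo = Path(repo_dir).resolve()
    return shutil.make_archive(
        str(archive_base), "zip", root_dir=str(repo.parent), base_dir=repo.name
    )


def hidden_entries(archive_path):
    """List archive members stored under a dot-prefixed directory."""
    with zipfile.ZipFile(archive_path) as archive:
        return [
            name
            for name in archive.namelist()
            if any(part.startswith(".") for part in Path(name).parts[:-1])
        ]
```

For example, archive_repository("nightfall-go-sdk", "nightfall-go-sdk") would write nightfall-go-sdk.zip to the current directory, and hidden_entries on that file should return a non-empty list for a normal clone.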

Using the Nightfall Go SDK archive created above, a simple example would be to scan for URLs (i.e. strings starting with http:// or https://), which will send results such as the following:

{
   "findings":[
      {
         "path":"f607a067..53e59684/nightfall.go",
         "detector":{
            "id":"6123060e-2d9f-4f35-a7a1-743379ea5616",
            "name":"URL"
         },
         "finding":"https://api.nightfall.ai/\"",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":142,
               "end":168
            },
            "codepointRange":{
               "start":142,
               "end":168
            },
            "lineRange":{
               "start":16,
               "end":16
            },
            "rowRange":{
               "start":0,
               "end":0
            },
            "columnRange":{
               "start":0,
               "end":0
            },
            "commitHash":"53e59684d9778ceb0f0ed6a4b949c464c24d35ce"
         },
         "beforeContext":"tp\"\n\t\"os\"\n\t\"time\"\n)\n\nconst (\n\tAPIURL = \"",
         "afterContext":"\n\n\tDefaultFileUploadConcurrency = 1\n\tDef",
         "matchedDetectionRuleUUIDs":[
            "cda0367f-aa75-4d6a-904f-0311209b3383"
         ],
         "matchedDetectionRules":[
            
         ]
      },
 ...

Support for Large Repositories

Currently, processing is limited to repositories with a total number of commits lower than 5000.

Large repositories result in a large volume of data sent at once. We are working on changes to allow these and other large surges of data to be processed in a more controlled manner, and will increase the limit or remove it altogether once those changes are complete.

Sensitive Data in GitHub Repositories

If a finding in a GitHub repository is sensitive, it should be considered compromised and appropriate mitigation steps should be taken (e.g. secrets should be rotated).

To retrieve the specific commit, you will need to clone the repository, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

You can then check out the specific commit using the commit hash returned by Nightfall.

cd nightfall-go-sdk
git checkout 53e59684d9778ceb0f0ed6a4b949c464c24d35ce

File Scanning Limitations

  • CSV Files: Only the first 250,000 rows will be scanned.

  • Spreadsheet Files: Up to 100,000 rows per sheet will be scanned, with a maximum of 1 million rows across all tabs in multi-sheet spreadsheets.

  • PDF Files: Scanning is limited to the first 100 pages, including a maximum of 50 images within those pages.

  • Images: Images smaller than 5KB or larger than 50MB will be excluded from scanning.

  • Archive Files: A maximum of 1,000 files will be extracted and scanned. Files larger than 100MB requiring extraction will not be scanned.
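Some of these limits can be checked on the client before uploading, which helps explain why parts of a file produced no findings. The sketch below encodes the CSV and image limits from the list above; it assumes binary units (1 KB = 1024 bytes), which this guide does not specify, and the helper names are hypothetical.

```python
KB = 1024          # assumption: binary units; the guide does not say which
MB = 1024 * KB

CSV_MAX_ROWS = 250_000
IMAGE_MIN_BYTES = 5 * KB
IMAGE_MAX_BYTES = 50 * MB


def csv_rows_scanned(total_rows):
    """Return how many rows of a CSV file fall within the scanning limit."""
    return min(total_rows, CSV_MAX_ROWS)


def image_is_scannable(size_bytes):
    """Return True when an image is neither too small nor too large to scan."""
    return IMAGE_MIN_BYTES <= size_bytes <= IMAGE_MAX_BYTES


print(csv_rows_scanned(300_000))
print(image_is_scannable(73_891))
```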

File Scanning and Webhooks

After the file scan has been processed asynchronously, the results will be delivered to the webhook.

Webhook Payload and Findings for File Scans

For a file scan, your webhook will receive a request body that will be a JSON payload containing:

  • the upload UUID (uploadID)

  • a boolean indicating whether or not any data in the file matched the provided detection rules (findingsPresent)

  • a pre-signed S3 URL where the caller may fetch the findings for the scan (findingsURL). If there are no findings in the file, this field will be empty.

  • the value you supplied for requestMetadata. Callers may opt to use this to help identify their input file upon receiving a webhook response. Maximum length 10 KB.

Below is an example of a payload sent to the webhook URL.

{
    "findingsURL": "https://files.nightfall.ai/asdfasdf-asdf-asdf-asdf-asdfasdfasdf.json?Expires=1635135397&Signature=asdfasdfQ2qTmPFnS9uD5I3QGEqHY2KlsYv4S-WOeEEROj~~x6W2slP2GvPPgPlYs~lwdr-mtJjVFu4LtyDhdfYezC7B0ysfJytyMIyAFriVMqOGsRJXqoQfsg8Ckd2b6kRcyDZXJE25cW8zBS08lyVwMBCsGS0BKSin8uSuD7pQu3QAubT7p~MPkfc6PSXYIJREBr3q4-8c7UnrYOAiXfSW1AmFE47rr3Wxh2TpU3E-Fxu-6e3DKN4q6meACdgZb2KHZo3e-NK7ug9f8sxBp1YT0n5oiVuW4KXguIyXWN~aKEHMa6DzZ4cUJ61LmnMzGndc2sVKhii39FHwTsYog__&Key-Pair-Id=asdfOPZ1EKX0YC",
    "validUntil": "2021-10-25T04:16:37.734633129Z",
    "uploadID": "152848af-2ac9-4e0a-8563-2b82343d964a",
    "findingsPresent": true,
    "requestMetadata": "",
    "errors": []
}

In this example, we have uploaded a zip file containing a Python script (upload.py) and a README.md file. A Detector in our Detection Rule checks for the presence of the string http://localhost. The findings file fetched from the findingsURL looks like the following:

{
   "findings":[
      {
         "path":"fileupload/upload.py",
         "detector":{
            "id":"58861dee-b213-4dbc-97fa-a148acb8bd1a",
            "name":"localhost url"
         },
         "finding":"http://localhost",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":105,
               "end":121
            },
            "codepointRange":{
               "start":105,
               "end":121
            },
            "lineRange":{
               "start":7,
               "end":7
            }
         },
         "beforeContext":"PLOAD_URL = getenv(\"FILE_UPLOAD_HOST\", \"",
         "afterContext":":8080/v3\")\nNF_API_KEY = getenv(\"NF_API_K",
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
      {
         "path":"fileupload/README.md",
         "detector":{
            "id":"58861dee-b213-4dbc-97fa-a148acb8bd1a",
            "name":"localhost url"
         },
         "finding":"http://localhost",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":570,
               "end":586
            },
            "codepointRange":{
               "start":570,
               "end":586
            },
            "lineRange":{
               "start":22,
               "end":22
            }
         },
         "beforeContext":"t the script will send the requests to `",
         "afterContext":":8080`, but this can be overridden using",
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
      {
         "path":"fileupload/README.md",
         "detector":{
            "id":"58861dee-b213-4dbc-97fa-a148acb8bd1a",
            "name":"localhost url"
         },
         "finding":"http://localhost",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":965,
               "end":981
            },
            "codepointRange":{
               "start":965,
               "end":981
            },
            "lineRange":{
               "start":26,
               "end":26
            }
         },
         "beforeContext":"ice deployment you want to connect to | ",
         "afterContext":":8080 |\n| `NF_API_KEY`      | the API Ke",
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      }
   ]
}
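When an archive yields many findings, a per-file summary is often the first thing a webhook consumer wants. The sketch below counts findings by their path field using the standard library; findings_per_path is an illustrative name, and the input is assumed to be the parsed JSON document fetched from the findingsURL.

```python
from collections import Counter


def findings_per_path(findings_document):
    """Count findings per file path in a parsed findings document."""
    return Counter(
        finding.get("path", "<unknown>")
        for finding in findings_document.get("findings", [])
    )


document = {
    "findings": [
        {"path": "fileupload/upload.py", "finding": "http://localhost"},
        {"path": "fileupload/README.md", "finding": "http://localhost"},
        {"path": "fileupload/README.md", "finding": "http://localhost"},
    ]
}
for path, count in findings_per_path(document).most_common():
    print(path, count)
```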

Creating Detectors

To create a Detector, select "Detectors" from the left-hand navigation and click the + New Detector button.

Custom detectors can add context and exclusion rules on top of pre-built Nightfall detectors, or can be built off your own custom regular expressions.

Be aware that you may not have two detectors based on the same Nightfall data type within the same detection rule.

A full glossary of Nightfall's prebuilt detectors can be found in the Detector Glossary.



Uploading and Scanning API Calls

Uploading files using the client SDK libraries requires fewer steps, as all the required API operations are wrapped in a single function call. Furthermore, these SDKs handle all the programmatic logic necessary to send files in smaller chunks to Nightfall.

Using Nightfall's SDKs to Upload Files

>>> from nightfall import Confidence, DetectionRule, Detector, Nightfall, EmailAlert, AlertConfig
>>> import os

>>> # use your API Key here
>>> nightfall = Nightfall("NF-y0uRaPiK3yG03sH3r3")

>>> # A rule contains a set of detectors to scan with
>>> cc = Detector(min_confidence=Confidence.LIKELY, nightfall_detector="CREDIT_CARD_NUMBER")
>>> ssn = Detector(min_confidence=Confidence.POSSIBLE, nightfall_detector="US_SOCIAL_SECURITY_NUMBER")
>>> detection_rule = DetectionRule([cc, ssn])
>>> # The scanning is done asynchronously, so provide a valid email address as the simplest way of getting results
>>> alertconfig = AlertConfig(email=EmailAlert("whatever@example.com"))

>>> # Upload the file and start the scan.
>>> id, message = nightfall.scan_file("./README.md", detection_rules=[detection_rule], alert_config=alertconfig)
>>> print("started scan", id, message)

// This script assumes the Node SDK has been installed locally with `npm install` and `npm run build`.
import { Nightfall } from "./nightfall-nodejs-sdk/dist/nightfall.js";
import { Detector } from "./nightfall-nodejs-sdk/dist/types/detectors.js";

// By default, the client reads your API key from the environment variable NIGHTFALL_API_KEY
const uploadit = async () => {
    let data = null;

    const nfClient = new Nightfall();

    try {
        const response = await nfClient.scanFile('./README.md', {
            detectionRules: [
                {
                    name: 'Secrets Scanner',
                    logicalOp: 'ANY',
                    detectors: [
                        {
                            minNumFindings: 1,
                            minConfidence: Detector.Confidence.Possible,
                            displayName: 'Credit Card Number',
                            detectorType: Detector.Type.Nightfall,
                            nightfallDetector: 'CREDIT_CARD_NUMBER',
                        },
                    ],
                },
            ],
            alertConfig: {
                email: {
                    address: "whatever@example.com"
                }
            }
        });

        // On failure return the error object, otherwise the ID of the started scan.
        if (response.isError) {
            data = response.getError();
        } else {
            data = response.data.id;
        }
    } catch (e) {
        console.log(e);
    }

    return data;
}

uploadit().then(data => console.log(data));

To run the Node sample script, you must compile it as TypeScript. Save it as a .ts file and run:

tsc <yourfilename>.ts -lib ES2015,DOM

You can then run the resulting JavaScript file:

NIGHTFALL_API_KEY=<YourApiKey> node <yourfilename>.js

Note that these examples use an email address to receive the results for simplicity.

The Upload Process

The upload process consists of 3 stages:

Initializing Phase

POST /v3/upload

As part of the initialization you must provide the total byte size of the file being uploaded.

You may also provide the MIME type; otherwise the system will attempt to determine it once the upload is complete.

curl --location --request POST 'https://api.nightfall.ai/v3/upload' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--data-raw '{
    "fileSizeBytes": 73891,
    "mimeType" : "image/png"
}'

The id of the returned JSON object will be used as the fileId in subsequent requests.

The chunkSize is the maximum number of bytes to upload in each request during the uploading phase.

{
    "id": "f9dbdb15-c9fa-46ff-86ec-cd5c09aa550d",
    "fileSizeBytes": 73891,
    "chunkSize": 10485760,
    "mimeType": "image/png"
}

Uploading Phase

PATCH /v3/upload/<uploadUUID>

In this phase you upload the file's contents in chunks. The size of each chunk is determined by the chunkSize value returned by the POST /v3/upload endpoint used in the previous step.

Below is a simple example where the file is smaller than the chunkSize, so it may safely be uploaded with one call to the upload endpoint.

curl --location --request PATCH 'https://api.nightfall.ai/v3/upload/f9dbdb15-c9fa-46ff-86ec-cd5c09aa550d' \
--header 'X-Upload-Offset: 0' \
--header 'Content-Type: application/octet-stream' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--data-binary '@/Users/myname/Documents/work/Nightfall/Nightfall Upload Sequence.png'

If your file's size exceeds the chunkSize, to upload the complete file you will need to send iterative requests as you read portions of the file's contents. This means you will send multiple requests to the upload endpoint as shown above. As you do so, you will be updating the value of the X-Upload-Offset header based on the portion of the file being sent.

Each request should send a chunk of the file exactly chunkSize bytes long except for the final uploaded chunk. The final uploaded chunk is allowed to contain fewer bytes as the remainder of the file may be less than the chunkSize returned by the initialization step.

The request body should be the contents of the chunk being uploaded.

The value of the X-Upload-Offset header should be the byte offset, as an integer, specifying where to insert the data into the file. This byte offset is zero-indexed.

Successful calls to this endpoint return an empty response with an HTTP status code of 204.
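The chunking rules above can be sketched as a small helper. This is a minimal illustration and not part of the Nightfall SDK: the name chunk_plan is hypothetical, and the function only computes the X-Upload-Offset value and the number of bytes for each request, leaving the HTTP calls to you.

```python
from typing import Iterator, Tuple


def chunk_plan(file_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs covering a file of file_size bytes.

    Every chunk is exactly chunk_size bytes long except possibly the last.
    The offset is zero-indexed and is the value to send in X-Upload-Offset.
    """
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


# A 25 MiB file with the 10485760-byte chunkSize from the example response
# needs three requests; the last one carries the 5242880-byte remainder.
for offset, length in chunk_plan(26214400, 10485760):
    print(f"X-Upload-Offset: {offset}  ({length} bytes)")
```

Computing the plan separately from the upload loop makes it easy to retry a single failed chunk with the same offset.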

Completion Phase

POST /v3/upload/<uploadUUID>/finish

curl --location --request POST 'https://api.nightfall.ai/v3/upload/f9dbdb15-c9fa-46ff-86ec-cd5c09aa550d/finish' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--data-raw '""'

When an upload completes successfully, the returned payload will indicate the mimeType the system determined the file to be if it was not provided during upload initialization.

{
    "id": "152848af-2ac9-4e0a-8563-2b82343d964a",
    "fileSizeBytes": 2349,
    "chunkSize": 10485760,
    "mimeType": "application/zip"
}

Once a file has been marked as completed, you may initiate a scan of the uploaded file.

Scanning Uploaded Files

After an upload is finalized, it can be scanned against a Detection Policy. A Detection Policy represents a pairing of:

  • a webhook URL

  • a set of detection rules to scan data against

You may also supply a value to the requestMetadata field to help identify the input file upon receiving a response to your webhook. This field has a maximum length of 10 KB.

curl --request POST \
     --url https://api.nightfall.ai/v3/upload/f9dbdb15-c9fa-46ff-86ec-cd5c09aa550d/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRuleUUIDs": [
               "950833c9-8608-4c66-8a3a-0734eac11157"
          ],
          "webhookURL": "https://mycompany.org/webhookservice"
     },
     "requestMetadata": "your file metadata"
}
'
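The same request body can be assembled programmatically. The sketch below is illustrative only: build_scan_body is a hypothetical helper, and the 10 KB limit is interpreted here as 10 * 1024 UTF-8 encoded bytes, which is an assumption rather than something this guide specifies.

```python
import json
from typing import List, Optional

# Assumption: "10 KB" is read as 10 * 1024 bytes of UTF-8 encoded text.
MAX_METADATA_BYTES = 10 * 1024


def build_scan_body(rule_uuids: List[str], webhook_url: str,
                    request_metadata: Optional[str] = None) -> str:
    """Return the JSON body for POST /v3/upload/<fileId>/scan."""
    if not webhook_url.startswith("https://"):
        raise ValueError("webhookURL must be an HTTPS URL")
    body = {
        "policy": {
            "detectionRuleUUIDs": rule_uuids,
            "webhookURL": webhook_url,
        }
    }
    if request_metadata is not None:
        if len(request_metadata.encode("utf-8")) > MAX_METADATA_BYTES:
            raise ValueError("requestMetadata exceeds the 10 KB limit")
        body["requestMetadata"] = request_metadata
    return json.dumps(body)


print(build_scan_body(
    ["950833c9-8608-4c66-8a3a-0734eac11157"],
    "https://mycompany.org/webhookservice",
    "your file metadata",
))
```

Validating the metadata length before sending avoids a round trip that would otherwise be rejected by the API.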

Webhook Verification

Nightfall will verify that the webhook URL is valid before launching its asynchronous scan by issuing a challenge.

Full Upload Process Example Script

Below is a sample Python script that handles the complete sequence of API calls to upload a file using a path specified as an argument.

from os import getenv, path

import fire
import requests


BASE_UPLOAD_URL = getenv("FILE_UPLOAD_HOST", "https://api.nightfall.ai/v3")
NF_API_KEY = getenv("NF_API_KEY")


def upload(filepath, mimetype, policy_uuid):
    """Upload the given file using the provided MIMEType and PolicyUUID.

    Arguments:
        filepath -- an absolute or relative path to the file that will be
            uploaded to the API.
        mimetype -- (optional) The mimetype of the file being uploaded.
        policy_uuid -- The UUID corresponding to an existing policy. This
            policy must be active and have a webhook URL associated with it.
    """
    default_headers = {
        "Authorization": F"Bearer {NF_API_KEY}",
    }

    # =*=*=*=*=* Initiate Upload =*=*=*=*=*=*
    file_size = path.getsize(filepath)
    upload_request_body = {"fileSizeBytes": file_size, "mimeType": mimetype}
    r = requests.post(F"{BASE_UPLOAD_URL}/upload",
                      headers=default_headers,
                      json=upload_request_body)
    upload = r.json()
    if not r.ok:
        raise Exception(F"Unexpected error initializing upload - {upload}")

    # =*=*=*=*=*=* Upload Chunks =*=*=*=*=*=*
    chunk_size = upload["chunkSize"]
    i = 0
    with open(filepath, "rb") as file:
        while file.tell() < file_size:
            upload_chunk_headers = {
                **default_headers,
                "X-UPLOAD-OFFSET": str(file.tell())
            }
            r = requests.patch(F"{BASE_UPLOAD_URL}/upload/{upload['id']}",
                               headers=upload_chunk_headers,
                               data=file.read(chunk_size))
            if not r.ok:
                raise Exception(F"Unexpected error uploading chunk - {r.text}")
            i += 1

    # =*=*=*=*=*=* Finish Upload =*=*=*=*=*=*
    r = requests.post(F"{BASE_UPLOAD_URL}/upload/{upload['id']}/finish",
                      headers=default_headers)
    if not r.ok:
        raise Exception(F"Unexpected error finalizing upload - {r.text}")

    # =*=*=*=*=* Scan Uploaded File =*=*=*=*=*
    r = requests.post(F"{BASE_UPLOAD_URL}/upload/{upload['id']}/scan",
                      json={"policyUUID": policy_uuid},
                      headers=default_headers)
    if not r.ok:
        raise Exception(F"Unexpected error initiating scan - {r.text}")

    print("Scan Initiated Successfully - await response on configured webhook")
    quota_remaining = r.headers.get('X-Quota-Remaining')
    if quota_remaining is not None and int(quota_remaining) <= 0:
        print(F"Scan quota exhausted - Quota will reset on {r.headers['X-Quota-Period-End']}")


if __name__ == "__main__":
    fire.Fire(upload)

Scanning Files

Nightfall's file scan API allows a user to upload a file in chunks, then to scan it with Detection Rules once the upload is complete.

The scan will then be processed asynchronously before sending the results to the webhook URL that is provided along with your Detection Rules.

The following sequence diagram illustrates the full process for scanning a binary file with Nightfall.

Prerequisites

In order to utilize the File Scanning API you need the following:

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (detailed information to follow)

Special File Types

Spreadsheets and Tabular Data

File scans of Microsoft Office, Apache parquet, csv, and tab separated files will provide additional properties to locate findings within the document beyond the standard byteRange, codepointRange, and lineRange properties.

Findings will contain a columnRange and a rowRange that will allow you to identify the specific row and column within the tabular data wherein the finding is present.

This functionality is applicable to the following mime types:

  • text/csv

  • text/tab-separated-values

  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

  • application/vnd.ms-excel

Below is a sample match of a spreadsheet containing dummy PII where a SSN was detected in the 2nd column and 55th row.

{
   "findings":[
      {
         "path":"Sheet1 (5)",
         "detector":{
            "id":"e30d9a87-f6c7-46b9-a8f4-16547901e069",
            "name":"US social security number (SSN)",
            "version":1
         },
         "finding":"624-84-9182",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":2505,
               "end":2516
            },
            "codepointRange":{
               "start":2452,
               "end":2463
            },
            "lineRange":{
               "start":55,
               "end":55
            },
            "rowRange":{
               "start":55,
               "end":55
            },
            "columnRange":{
               "start":2,
               "end":2
            },
            "commitHash":""
         },
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },
...
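A webhook consumer could pull the spreadsheet coordinates out of a payload like the one above as follows. This is a minimal sketch: tabular_locations is a hypothetical helper, and treating a rowRange or columnRange that starts at 0 as "no tabular location" is an assumption based on the sample payloads in this guide.

```python
from typing import Any, Dict, List, Tuple


def tabular_locations(results: Dict[str, Any]) -> List[Tuple[str, str, int, int]]:
    """Return (path, detector name, row, column) for tabular findings."""
    located = []
    for finding in results.get("findings", []):
        location = finding.get("location") or {}
        row = location.get("rowRange") or {}
        col = location.get("columnRange") or {}
        # Assumption: a start of 0 (or a missing range) means the finding
        # did not come from tabular data.
        if row.get("start", 0) > 0 and col.get("start", 0) > 0:
            located.append((
                finding.get("path", ""),
                finding["detector"]["name"],
                row["start"],
                col["start"],
            ))
    return located


# Trimmed copy of the sample finding shown above.
sample = {
    "findings": [{
        "path": "Sheet1 (5)",
        "detector": {"name": "US social security number (SSN)"},
        "finding": "624-84-9182",
        "location": {
            "rowRange": {"start": 55, "end": 55},
            "columnRange": {"start": 2, "end": 2},
        },
    }]
}

for path, detector, row, column in tabular_locations(sample):
    print(f"{detector} found in {path} at row {row}, column {column}")
```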

Git Repositories

Nightfall provides special handling for archives of GitHub repositories.

Nightfall will scan the repository history to discover findings in particular commits, returning the hash of the commit in which each finding was discovered.

In order to scan the repository, you will need to create a clone, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

This creates a clone of the Nightfall go SDK.

You will then need to create an archive that can be uploaded using Nightfall's file scanning sequence.

zip -r directory.zip directory

Note that in order to work, the hidden directory .git, which holds the repository history, must be included in the archive.

Using the Nightfall go SDK archive created above, a simple example would be to scan for URLs (i.e. strings starting with http:// or https://), which will send results such as the following:

{
   "findings":[
      {
         "path":"f607a067..53e59684/nightfall.go",
         "detector":{
            "id":"6123060e-2d9f-4f35-a7a1-743379ea5616",
            "name":"URL"
         },
         "finding":"https://api.nightfall.ai/\"",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":142,
               "end":168
            },
            "codepointRange":{
               "start":142,
               "end":168
            },
            "lineRange":{
               "start":16,
               "end":16
            },
            "rowRange":{
               "start":0,
               "end":0
            },
            "columnRange":{
               "start":0,
               "end":0
            },
            "commitHash":"53e59684d9778ceb0f0ed6a4b949c464c24d35ce"
         },
         "beforeContext":"tp\"\n\t\"os\"\n\t\"time\"\n)\n\nconst (\n\tAPIURL = \"",
         "afterContext":"\n\n\tDefaultFileUploadConcurrency = 1\n\tDef",
         "matchedDetectionRuleUUIDs":[
            "cda0367f-aa75-4d6a-904f-0311209b3383"
         ],
         "matchedDetectionRules":[
            
         ]
      },
 ...
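When many findings come back, it can help to group them by commit before investigating. The sketch below is illustrative and assumes only the fields visible in the sample above (path and location.commitHash); findings_by_commit is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Any, Dict, List


def findings_by_commit(results: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each non-empty commitHash to the paths of its findings."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for finding in results.get("findings", []):
        commit = (finding.get("location") or {}).get("commitHash", "")
        if commit:  # an empty string means the finding is not tied to a commit
            grouped[commit].append(finding.get("path", ""))
    return dict(grouped)


# Trimmed copy of the sample finding shown above.
sample = {
    "findings": [{
        "path": "f607a067..53e59684/nightfall.go",
        "location": {"commitHash": "53e59684d9778ceb0f0ed6a4b949c464c24d35ce"},
    }]
}

for commit, paths in findings_by_commit(sample).items():
    print(commit, paths)
```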

Sensitive Data in GitHub Repositories

If a finding in a GitHub repository is sensitive, it should be considered compromised and appropriate mitigation steps should be taken (for example, secrets should be rotated).

To retrieve the specific commit, you will need to clone the repository, for example:

git clone https://github.com/nightfallai/nightfall-go-sdk.git

You can then check out the specific commit using the commit hash returned by Nightfall.

cd nightfall-go-sdk
git checkout 53e59684d9778ceb0f0ed6a4b949c464c24d35ce

Webhooks and Asynchronous Notifications

The Nightfall API supports the ability to send asynchronous notifications when findings are detected as part of a scan request.

The supported destinations for these notifications include external platforms, such as Slack, email, or a URL to a SIEM log collector, as well as a webhook server.

Nightfall issues notifications under the following scenarios:

For more information on how webhooks and asynchronous notifications are used please see our guides on Alerting, Using Policies, and File Scanning and Webhooks.

Accessing Your Webhook Signing Key

In order to accept requests from Nightfall, a Webhook server must use a signing key to verify requests.

Select the Developer Platform > Manage API Keys using the navigation bar on the left side of the page. You will see the Webhook signing section:

Unlike the API Key, it is possible to reveal the signing key via the "eye" icon furthest to the left of the three icons displayed.

You may copy the current value to your clipboard with the "copy" icon in the center of the three icons displayed.

You may also regenerate the key with the circular arrow icon furthest to the right.

Use this value as shown in the code examples that are used in the following sections.
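As a rough illustration of how a webhook server might use the signing key, the sketch below verifies a request signature. The details are assumptions that this page does not confirm: it assumes the signature is a hex-encoded HMAC-SHA256 of the string "<timestamp>:<raw request body>", that the timestamp is Unix epoch seconds, and that requests older than five minutes should be rejected. Check the webhook server guide for the exact scheme and header names before relying on it.

```python
import hashlib
import hmac
import time
from typing import Optional


def is_valid_signature(signing_key: str, timestamp: str, body: str,
                       received_signature: str,
                       max_age_seconds: int = 300,
                       now: Optional[float] = None) -> bool:
    """Check a webhook signature under the assumptions stated above."""
    current = time.time() if now is None else now
    try:
        sent_at = int(timestamp)  # assumption: Unix epoch seconds
    except ValueError:
        return False
    if abs(current - sent_at) > max_age_seconds:
        return False  # too old (or too far in the future): possible replay
    expected = hmac.new(
        signing_key.encode("utf-8"),
        f"{timestamp}:{body}".encode("utf-8"),  # assumption: "timestamp:body"
        hashlib.sha256,
    ).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, received_signature.lower())
```

Whatever the exact scheme, compute the digest over the raw request body as received, since re-serializing parsed JSON can change the bytes and invalidate the signature.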

Specialized File Detectors

Nightfall supports Detectors that will scan for file names, file types, and file fingerprints.

Detecting File Names

In addition to scanning the content of files, you may configure the Detectors to scan file names as well.

This is done through the "scope" attribute of a Detector.

The scope attribute allows you to scan either within file contents, the file name, or both the file contents and file name.

File extensions can be scanned for by creating a Regular Expression type custom Detector with a scope to scan only file names ("File") or both the content and file name ("ContentAndFile"), as shown in the example request below.

Note that confidence sensitivity does not apply to file names. Sensitive findings will always be reported on.

Detecting File Types

Nightfall's File Type detection allows you to implement compliance policies that detect and alert you when particular file types that are not allowed in a given location are discovered.

This functionality is implemented by creating a specific Detector called a "File Type Detector".

To create a File Type Detector, select "Detectors" from the left hand navigation and click the button labeled "+New Detector" in the upper right hand corner. From there a drop down list of Detector types will be displayed which will include the "File Type" Detector type.

You can either scroll through the list of mime-types in the select box or you may type in a portion of the mime-type and the contents of the select box will be filtered to match your input.

File Type Detectors differ from other Nightfall Detectors in that the attributes of scope and confidence are not relevant to File Type Detectors.

Detecting Files Through Fingerprinting

Nightfall allows you to discover the location of specific files that you have deemed sensitive and want to avoid sharing.

This discovery is done through document fingerprinting. Fingerprinting is the process of algorithmically creating a unique identifier for a file by mapping the data of the document to a signature that can be recalled quickly. This allows the file to be identified in a manner akin to how human fingerprints uniquely identify individual people.

This functionality is achieved in Nightfall by creating a specific Detector type called a File Fingerprint Detector.

The Fingerprint Detector allows you to create a fingerprint for one or more files (a sort of "handful" of fingerprints, if you will).

To create a Fingerprint Detector, select "Detectors" from the left hand navigation and click the button labeled "+New Detector" in the upper right hand corner. From there a drop down list of Detector types will be displayed which will include the "Fingerprint" Detector type.

When you create a File Fingerprint Detector you can upload up to 50 files that need to be fingerprinted. The file size limit is 25MB.

Once the fingerprint is generated, the actual content of the file is discarded so no sensitive content is stored on Nightfall's system.

These Detectors may only be created through the console.

Updates to Fingerprinted Files

You cannot update Fingerprint Detectors, so any modification to the original file or its underlying content requires that you create a brand new Fingerprint Detector.

Scanning Features

Nightfall offers many useful features beyond its detectors, including:

Configuring a Webhook URL

Certain file types receive special handling, such as tabular data and archives of Git repositories, that results in more precise information about the location of findings within the source file.

text/csv (treated as tabular data and may be redacted)

text/tab-separated-values (treated as tabular data)

text/tsv (treated as tabular data)

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (treated as tabular data)

application/vnd.ms-excel (treated as tabular data)

Apache parquet data files are also accepted.

Below is an example curl request for a csv file that has already been uploaded.

When you initiate the file upload sequence with this file, you will receive scan results that contain the commitHash property filled in.

Note that you are in a 'detached HEAD' state when working with this sort of checkout of a repository.

As part of submitting a file scan request, the request payload must contain a reference to a webhook server URL defined as part of a policy defined inline.

When Nightfall prepares a file scan operation, it will issue a challenge to the webhook server to verify its legitimacy.

the date until which the findingsURL is valid (validUntil) formatted to RFC 3339. Results are valid for 24 hours after scan completion. The time will be in UTC.

If you follow the URL (before it expires) it will return a JSON representation of the findings similar to those returned by the Scan Plain Text endpoint.

You can customize your Detection Rules by creating custom detectors in the Nightfall dashboard.

Detectors in the Nightfall Dashboard

Nightfall's upload process is built to accommodate files of any size. Once files are uploaded, they may be scanned with Detection Rules and Policies to detect potential violations.

Many users will find it more convenient to use our native language SDKs to complete the upload process.

For users who want to understand the entire upload process end-to-end, that is also outlined in this document. We will walk you through the order of operations necessary to upload the file.

Rather than implementing the full sequence of API calls for the upload functionality yourself, Nightfall's native language SDKs provide a single method that wraps the steps required to upload your file.

Below is an example of uploading a file from our Python SDK and our Node SDK.

You may also want to use a webhook. See Webhooks and Asynchronous Notifications for additional information on how to set up a Webhook server to receive these results.

Once the upload is complete, you may initiate the file scan.

After we discuss each API call in the sequence, you will find a script that walks through the full sequence at the end of this guide.

The first step in the process of scanning a binary file is to initiate an upload in order to get a fileId through the Initiate a File Upload endpoint.

Use the Upload a Chunk of a File endpoint to upload the file contents in chunks.

See the full example script below for an illustration as to how this upload process can be done programmatically.

Once all chunks are uploaded, mark the upload as completed using the Complete a File Upload endpoint.

The scanning process is asynchronous, with results being delivered to the webhook URL configured on the detection policy. See Webhooks and Asynchronous Notifications for more information about creating a Webhook server.

Exactly one policy should be provided in the request body, which includes a webhookURL to which the callback will be made once the file scan has been completed (this must be an HTTPS URL) as well as a Detection Rule as either a list of UUIDs or as a rule that has been defined in-line.

For a detailed walkthrough of the API calls necessary to upload and scan a file and a full script that shows the entire process, see Uploading and Scanning Files.

An active API Key authorized for file scanning passed via the header Authorization: Bearer <key> (see Authentication and Security)

File scanning also supports Nightfall's functionality for Exclusion Rules and Context Rules as part of your scan requests.

to notify a client about the results of a file scan request. File scans themselves are always performed asynchronously because of complexity relating to text extraction and data volume.

to notify a client about results from a text scan request. Although results are already delivered synchronously in the response object, clients may configure the request to forward results to other platforms such as a webhook, SIEM endpoint, or email through a policy.

To create a webhook you will need to access your webhook Signing Key and then create a webhook server.

To access or generate your Webhook signing key, start by logging in to the Nightfall dashboard.

In addition to scanning based on file name, you may also use a File Type Detector, which allows you to scan for files based on their mime-type.

You will then select one or more file types for which to scan by selecting from a list of mime-types.

Nightfall supports detection for a wide variety of mime-types. See the Internet Assigned Numbers Authority's (IANA) website for a definitive list of mime-types. Note however that Nightfall does not support the detection of audio and video related mime-types.

Detection of file types is done based on the file contents, not its extension. However, you can create Detectors that scan file names by setting the scope attribute.

Once you have added all the mime-types you wish to scan for, save your new Detector. You may then add your new Detector to Detection Rules and Policies.

You may then treat the Fingerprint detector like any other Detector and incorporate it into a Detection Rule using its unique Detector identifier.

You may incorporate these Detectors into Policies that will alert you whenever files that match the fingerprint are detected.

The ability to use Context Rules and Exclusion Rules to narrow the scope of matches.

The ability to create Redactions in a way that is highly configurable so that sensitive data is appropriately obfuscated.

The ability to create Policies that determine how leaks of sensitive information should be mitigated (i.e. through alerts sent to email or Slack).

curl --request POST \
     --url https://api.nightfall.ai/v3/upload/<fileid>/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-<yourNightfallKey>' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "regex": {
                                   "pattern": ".*\\.txt",
                                   "isCaseSensitive": false
                              },
                              "detectorType": "REGEX",
                              "scope": "ContentAndFile"
                         }
                    ],
                    "name": "File Name Detector",
                    "logicalOp": "ANY"
               }
          ]
     }
}
'

Using Context Rules

You can use the surrounding context of a match to help determine how likely it is that your potential match should actually be considered as a match by adjusting its confidence rating.

You can also tell the Detection Rule to return a portion of the surrounding context for manual review.

In the following example, in addition to providing a regular expression to match Social Security Numbers, we also check whether someone has written the text "SSN" within the configured proximity window before or after the match, which might be a label indicating it is indeed a Social Security Number. If so, we change the confidence score to "VERY_LIKELY". We then provide two possible matches in our payload, the first of which contains the string "SSN".

curl --location --request POST 'https://api.nightfall.ai/v3/scan' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--header 'Content-Type: application/json' \
--data-raw '{
     "policy": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "regex": {
                                   "isCaseSensitive": false,
                                   "pattern": "\\d{3}-\\d{2}-\\d{4}"
                              },
                              "contextRules": [
                                   {
                                        "regex": {
                                             "pattern": "SSN",
                                             "isCaseSensitive": false
                                        },
                                        "proximity": {
                                             "windowBefore": 20,
                                             "windowAfter": 20
                                        },
                                        "confidenceAdjustment": {
                                             "fixedConfidence": "VERY_LIKELY"
                                        }
                                   }
                              ],
                              "minNumFindings": 1,
                              "minConfidence": "POSSIBLE",
                              "detectorType": "REGEX",
                              "displayName": "SSN Match Detector"
                         }
                    ],
                    "name": "SSN Match Detection Rule",
                    "logicalOp": "ALL"
               }
          ],
          "contextBytes": 20
     },
     "payload": [
          "My SSN is 555-55-5555",
          "Here it is : 555-55-5555"
     ]
}
'

In the results, you can see the confidence for the first finding in the payload has been set to VERY_LIKELY while the second item is only LIKELY.

{
   "findings":[
      [
         {
            "finding":"555-55-5555",
            "beforeContext":"My SSN is ",
            "detector":{
               "name":"SSN Match Detector",
               "uuid":"6131f41c-dbdd-47a9-8c6f-1819c9baf388"
            },
            "confidence":"VERY_LIKELY",
            "location":{
               "byteRange":{
                  "start":10,
                  "end":21
               },
               "codepointRange":{
                  "start":10,
                  "end":21
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "SSN Match Detection Rule"
            ]
         }
      ],
      [
         {
            "finding":"555-55-5555",
            "beforeContext":"Here it is : ",
            "detector":{
               "name":"SSN Match Detector",
               "uuid":"6131f41c-dbdd-47a9-8c6f-1819c9baf388"
            },
            "confidence":"LIKELY",
            "location":{
               "byteRange":{
                  "start":13,
                  "end":24
               },
               "codepointRange":{
                  "start":13,
                  "end":24
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "SSN Match Detection Rule"
            ]
         }
      ]
   ],
   "redactedPayload":[
      "",
      ""
   ]
}
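To make the proximity logic concrete, the following sketch reproduces the behavior locally. It is an approximation for illustration, not Nightfall's implementation: it measures the windows in characters (an assumption, since the units of windowBefore and windowAfter are not stated here) and takes "LIKELY" as the unadjusted confidence because that is what the sample response shows.

```python
import re
from typing import Dict, List

SSN = re.compile(r"\d{3}-\d{2}-\d{4}")
CONTEXT = re.compile(r"SSN", re.IGNORECASE)  # isCaseSensitive: false
WINDOW_BEFORE = 20  # assumption: measured in characters
WINDOW_AFTER = 20


def scan(text: str, base_confidence: str = "LIKELY") -> List[Dict[str, object]]:
    """Find SSN-like strings and raise confidence when "SSN" is nearby."""
    findings = []
    for match in SSN.finditer(text):
        before = text[max(0, match.start() - WINDOW_BEFORE):match.start()]
        after = text[match.end():match.end() + WINDOW_AFTER]
        has_context = bool(CONTEXT.search(before) or CONTEXT.search(after))
        findings.append({
            "finding": match.group(0),
            "start": match.start(),
            "end": match.end(),
            # fixedConfidence replaces the confidence when the context matches
            "confidence": "VERY_LIKELY" if has_context else base_confidence,
        })
    return findings


for item in ("My SSN is 555-55-5555", "Here it is : 555-55-5555"):
    print(scan(item))
```

The start and end offsets computed here line up with the byteRange values in the response above because both payload strings are plain ASCII.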

Scanning Images for patterns using Custom Regex Detectors

Using regex to identify long patterns in images can be challenging because OCR systems do not always recognize every character correctly. In such cases, even Nightfall may not achieve 100% character-by-character accuracy. To improve results, you must introduce higher levels of flexibility into your regex patterns to accommodate common OCR inconsistencies. Here are some typical OCR challenges to keep in mind:

  • Spell-check noise: Spell-checking tools can add artifacts like red underlines, which may interfere with text recognition.

  • Character ambiguity:

    • The digit 0 may be misinterpreted as the letter O (or vice versa), depending on the font.

    • The character l (lowercase L) may be read as the digit 1.

    • The letter B may appear as the digit 8.

  • Underscore handling: An underscore (_) is sometimes interpreted as a space, particularly when spell-check artifacts are present.

  • Line wrapping: OCR may introduce unexpected newlines when text wraps across multiple lines.

  • Periods and punctuation: Spell-check artifacts or font issues may result in extraneous periods (.) or other punctuation being added to the output. En dashes (–) and hyphens (-) may be interchanged.

For reference, OCR tools like Tesseract typically achieve 85-98% character accuracy for similar input, and our system operates within a similar range. Given this, tuning your regex to be more forgiving (e.g., allowing for optional characters or slight variations) can significantly improve detection rates.

Example Regex (original and loosened)

original: ATATT3xFfGF0[A-Za-z0-9=_\-]*[=A-Za-z0-9]{9}

loosened: ATATT[A-Za-z0-9_\-– @.\n=]*[A-Za-z0-9_\- @.\n]{7,11}

  • shortened the literal match prefix

  • excluded the literal zero (0) from the prefix

  • added period (.) and newline (\n) chars

  • relaxed the char length
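The effect of the loosening can be checked locally before you deploy a pattern. The sketch below uses Python's re module as a stand-in for the detector (an assumption about regex dialect compatibility), and both sample strings are made up for illustration: the second simulates OCR output in which 0 became O, an underscore became a space, a hyphen became an en dash, and a line wrap was introduced.

```python
import re

ORIGINAL = re.compile(r"ATATT3xFfGF0[A-Za-z0-9=_\-]*[=A-Za-z0-9]{9}")
LOOSENED = re.compile(r"ATATT[A-Za-z0-9_\-– @.\n=]*[A-Za-z0-9_\- @.\n]{7,11}")

# Made-up token as it would appear in clean text.
clean = "ATATT3xFfGF0abcDEF_123-xyz=A1B2C3D4E"
# The same token after simulated OCR errors.
ocr = "ATATT3xFfGFOabcDEF 123–xyz=A1B2\nC3D4E"

for label, text in (("clean", clean), ("ocr", ocr)):
    print(label, bool(ORIGINAL.search(text)), bool(LOOSENED.search(text)))
# prints:
# clean True True
# ocr False True
```

A looser pattern will also match more unrelated text, so it is worth running it against a sample of known-benign content as well.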

HTTP response codes

Posture Management APIs

You can use the posture management APIs to search and fetch posture events and their details. You can also view details of the user (actor) whose actions triggered an event, and details of the asset that triggered an event.

Exfiltration Prevention APIs

You can use the exfiltration APIs to search and fetch exfiltration events and their details. You can also view details of the user (actor) whose actions triggered an event, and details of the asset that triggered an event.

PHI Detection Rules

Protected health information (PHI), also referred to as personal health information, describes a patient's medical history, including ailments, various treatments, and outcomes. PHI may include:

  • demographic information

  • test and laboratory results

  • mental health conditions

  • insurance information

The Health Insurance Portability and Accountability Act (HIPAA) of 1996 is the primary law that oversees the use of, access to, and disclosure of PHI in the United States. HIPAA lists 18 different personal information identifiers (PII) that, when paired with health information, become PHI. In order to more accurately detect potential PHI, Nightfall has introduced specific new detectors that allow for specialized combinations.

These HIPAA PII and PHI-specific detectors intelligently aggregate Nightfall's built-in detectors to ensure compliance with governing law. For example, finding a patient's name in a document or message is not considered HIPAA PII because it does not uniquely identify an individual: many people can share the same name. However, the information would be considered HIPAA PII if the patient's name and address were in the same message.

Specific PHI and HIPAA PII can be detected with greater confidence, especially as they relate to specific medical codes or terms in association with specific logical combinations of other PII. For instance, when a patient's name appears together with a date of birth, a street address, or any of a set of particular PII (phone number, email, SSN, etc.), it would be considered HIPAA PII.

If the combined detectors all match with a confidence of "Very Likely" it would match our "HIPAA PII Very Likely" Detection Rule. Otherwise if these detectors match with a confidence of "Likely" it would match our "HIPAA PII Likely" Detection Rule.

Alternatively, when any of the above PII options are found in conjunction with a specific set of medical related codes or terms (ICD Codes, FDA Drug Names or Codes, Procedures), that finding could be flagged as PHI.

When all the detectors within these PHI Detection Rules make findings that have a confidence of "Very Likely," that would match our "PHI Very Likely" Detection Rule, while if some or all are met with a confidence of "Likely" that would match our "PHI Likely" Detection Rule.
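The combination logic described above can be sketched roughly as follows. This is only an interpretation for illustration and not the actual rule definitions: the detector names in the sets are hypothetical placeholders, and the sketch assumes that PHI requires a name, at least one other identifier, and at least one medical code or term.

```python
from typing import Dict, Optional

RANK = {"LIKELY": 1, "VERY_LIKELY": 2}
# Hypothetical detector names used only for this illustration.
IDENTIFIERS = {"DATE_OF_BIRTH", "STREET_ADDRESS", "PHONE_NUMBER",
               "EMAIL_ADDRESS", "US_SOCIAL_SECURITY_NUMBER"}
MEDICAL = {"ICD_CODE", "FDA_DRUG", "PROCEDURE"}


def classify(findings: Dict[str, str]) -> Optional[str]:
    """Map detector name -> confidence onto one of the four rule names."""
    usable = {d: RANK[c] for d, c in findings.items() if c in RANK}
    if "PERSON_NAME" not in usable:
        return None  # a name is required in every combination
    identifiers = [r for d, r in usable.items() if d in IDENTIFIERS]
    if not identifiers:
        return None  # a name alone is not HIPAA PII
    # The combination is only as confident as its weakest required part.
    level = min(usable["PERSON_NAME"], max(identifiers))
    label = "HIPAA PII"
    medical = [r for d, r in usable.items() if d in MEDICAL]
    if medical:
        level = min(level, max(medical))
        label = "PHI"
    return f"{label} {'Very Likely' if level == 2 else 'Likely'}"


print(classify({"PERSON_NAME": "VERY_LIKELY", "STREET_ADDRESS": "LIKELY"}))
```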

Errors

While using Nightfall's Scan API, you may encounter some of the common errors outlined below. Try following the provided troubleshooting steps.

HTTP Error Codes

The following error codes are returned as part of a standard HTTP response.

HTTP Error Code
Description
Troubleshooting

400

Bad Request

This error most often occurs when there is something syntactically incorrect in the body of your request. Check your request format and try again. For example, this error could occur if the request body size is greater than 500 KB, or if the number of items to scan in the payload exceeds 50,000.

401

Unauthorized

You may be using an incorrect API key or calling the wrong endpoint.

422

Unprocessable Entity

You may be using an invalid or unrecognized detector set. You may also have exceeded the maximum allowable payload size; try spreading your payload across multiple requests.

429

Too Many Requests or Quota Exceeded

Either your monthly request limit has been exceeded, or you have exceeded the allowed rate limit. Consider upgrading to a higher volume plan, or wait several moments to retry the requests.

500

Internal Server Error

Wait a few moments and try again. If the problem persists, Nightfall may be experiencing an outage.
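As a rough guide, the status codes above fall into two groups: those worth retrying after a pause, and those that need a change to the request. The sketch below shows one way a client might encode that. The function names and the backoff parameters are assumptions chosen for illustration, not values prescribed by Nightfall.

```python
# Map the HTTP status codes listed above to a client-side action.
RETRYABLE = {429, 500}          # transient: wait, then retry
FIX_REQUEST = {400, 401, 422}   # retrying the same request will not help


def next_action(status):
    """Return "ok", "retry", "fix_request", or "unknown" for a status code."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    if status in FIX_REQUEST:
        return "fix_request"
    return "unknown"


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

A caller would sleep for each delay in turn while `next_action` keeps returning "retry", and give up once the schedule is exhausted.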

Using Policies to Send Alerts

Policies allow customers to create templates for their most common workflows such as sending alerts when detection rules are triggered.

The alertConfig can be either:

  • an email address

  • a Slack channel

  • a webhook url

  • a url to a SIEM host as well as authentication and other headers

Below is a simple example of a payload with a policy that will send alerts to an email address. You would use it with our endpoint for scanning plain text.

{     
  "policy": {
    "detectionRules": [
               {
                    "detectors": [
                        {
                            "detectorType": "NIGHTFALL_DETECTOR",
                            "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER",
                            "minNumFindings": 1,
                            "minConfidence": "LIKELY",
                            "displayName": "US Social Security Number"
                        }
                    ],
                    "name": "SSN Match Detection Rule",
                    "logicalOp": "ALL"
               }
          ],
    "contextBytes": 5,
    "alertConfig": {
      "email": {
        "address": "youremail@nightfall.ai"
      }
    }
  },
  "payload": [
        "The customer's social security number is 555-55-5555",
        "No SSN in this string"
   ]
}

You will receive the following response:

{
    "findings": [
        [
            {
                "finding": "555-55-5555",
                "beforeContext": "r is ",
                "detector": {
                    "name": "US Social Security Number",
                    "uuid": "e30d9a87-f6c7-46b9-a8f4-16547901e069"
                },
                "confidence": "VERY_LIKELY",
                "location": {
                    "byteRange": {
                        "start": 41,
                        "end": 52
                    },
                    "codepointRange": {
                        "start": 41,
                        "end": 52
                    },
                    "rowRange": null,
                    "columnRange": null,
                    "commitHash": ""
                },
                "matchedDetectionRuleUUIDs": [],
                "matchedDetectionRules": [
                    "SSN Match Detection Rule"
                ]
            }
        ],
        []
    ],
    "redactedPayload": [
        "",
        ""
    ]
}

Note that you may also use a pre-defined policy defined under Developer Platform > Overview > Policies by copying the Policy UUID and sending a request as shown below.

curl --request POST \
     --url https://api.nightfall.ai/v3/scan \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <InsertYourApiKeyHere>' \
     --header 'content-type: application/json' \
     --data '
{
     "policyUUIDs": [
          "2b2ced32-80c3-4a89-8757-489743ec4640"
     ],
     "payload": [
          "My payload to scan"
     ]
}
'
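The same request can be assembled in Python with only the standard library. This is a minimal sketch: the function name `build_scan_request` is hypothetical, and the request is built but not sent, so it can be inspected first.

```python
import json
import urllib.request

SCAN_URL = "https://api.nightfall.ai/v3/scan"


def build_scan_request(api_key, policy_uuids, payload):
    """Build (but do not send) a POST request equivalent to the curl call above."""
    body = {"policyUUIDs": list(policy_uuids), "payload": list(payload)}
    return urllib.request.Request(
        SCAN_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Reading the key from the `NIGHTFALL_API_KEY` environment variable keeps it out of source code.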

policy vs. policyUUIDs vs. config

The policy object supersedes the config object. The config object will continue to be supported, but its use should be considered deprecated. If you specify a policy object, you cannot also specify a config object.

Also note that previous iterations of the API allowed a simple list of policyUUIDs to be specified instead of a policy object. This has been preserved for backwards compatibility, but it is recommended that you use the policy object, as it has a richer set of features. You may not use both a policyUUIDs list and a policy object.
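These exclusivity rules are easy to check on the client before a request is sent. The helper below is a sketch under that assumption. The name `check_scan_body` is hypothetical, and the API itself remains the authority on what it accepts.

```python
def check_scan_body(body):
    """Raise ValueError if a scan request body mixes mutually exclusive fields."""
    if "policy" in body and "config" in body:
        raise ValueError("specify either 'policy' or 'config', not both")
    if "policy" in body and "policyUUIDs" in body:
        raise ValueError("specify either 'policy' or 'policyUUIDs', not both")
    return body
```

Calling it just before serializing the body turns a rejected request into an immediate, descriptive local error.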

The following payload will be sent to the given email address with the subject "🚨 Findings Detected by Nightfall! 🚨" as an attachment named nightfall-findings.json:

{
  "redactedPayload": [
    "", 
    ""
  ], 
  "findings": [
    [
      {
        "confidence": "LIKELY", 
        "matchedDetectionRules": [
          "SSN Match Detection Rule"
        ], 
        "matchedDetectionRuleUUIDs": [], 
        "location": {
          "codepointRange": {
            "start": 41, 
            "end": 52
          }, 
          "rowRange": null, 
          "byteRange": {
            "start": 41, 
            "end": 52
          }, 
          "columnRange": null, 
          "commitHash": ""
        }, 
        "finding": "555-55-5555", 
        "detector": {
          "name": "SSN Match Detector", 
          "uuid": "7270ccd5-07c5-44e5-b280-c768e0028963"
        }, 
        "beforeContext": "r is "
      }
    ], 
    []
  ]
}

This attachment has the same content as the response payload to the initial request.

Note that the sender address will be no-reply@nightfall.ai.

This email address will not respond to messages sent to it.

Using Webhooks with Policies

Policies also allow you to send findings to a designated callback URL using the url property of the alertConfig object.

Below is what the Webhook URL should look like in your policy's alertConfig, in a payload sent to our endpoint used for scanning plain text.

{     
  "policy": {
    "detectionRuleUUIDs": [
      "c8d43147-0a63-4c01-8a57-83d8108422f5"
    ],
    "alertConfig": {
        "url": {
            "address": "https://mywebhookurl.com"
        }
    }
  },
  "payload": [
        "The customer's social security number is 555-55-5555"
   ]
}

Using Slack Channels With Policies

Another option supported by Policies is sending finding data to a designated Slack channel.

Below is a sample payload for scanning plain text.

{     
     "policy": {
          "detectionRules": [
               {
                    "detectors": [
                        {
                            "detectorType": "NIGHTFALL_DETECTOR",
                            "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER",
                            "minNumFindings": 1,
                            "minConfidence": "LIKELY",
                            "displayName": "US Social Security Number"
                        }
                    ],
                    "name": "Simple SSN Match Detection Rule",
                    "logicalOp": "ALL"
               }
          ],
        "alertConfig": {
            "slack": {
                "target": "#securityalert"
            }
        }
    },
     "payload": [
          "The customer's social security number is 555-55-5555",
          "No SSN in this string"
     ]
}

Below is an example as to how the violation will appear in Slack.

Sending Alerts to SIEMs and other HTTP Event Collectors

SIEM (pronounced “sim”) is a combination of security information management (SIM) and security event management (SEM) systems. SIEM technology collects event log data for analysis in order to provide visibility into network activity.

It is possible to send findings from a policy to a SIEM service such as LogRhythm, SumoLogic, or Splunk using the siem alertConfig.

This configuration will require a URL to a collector that uses an HTTPS endpoint.

Note that the URL for the siem alertConfig must:

  • use the HTTPS scheme

  • be able to accept requests made with the POST verb

  • respond with a 200 status code upon receipt of the event

See the documentation for your SIEM service for how to set up this URL.

Unlike the url alertConfig option, the siem alertConfig does not require that the endpoint for the service implement a custom challenge response. Events sent to the siem alertConfig endpoint contain a subset of what is sent to the url alertConfig. Furthermore the findings are sent in a redacted form similar to Slack or email alerts.

In addition to the URL, you may provide headers such as those that are used for authorization.

The headers in the SIEM alertConfig are divided into sensitiveHeaders and plainTextHeaders header mappings.

The sensitiveHeaders field is specifically for header values like authentication. Nightfall ensures that these header values are always hidden in our service. They are never logged or saved in analytic events.

You can use plainTextHeaders for all other types of information you would like passed along with Nightfall alerts to your HTTP endpoint. Nightfall assumes that the values stored in plainTextHeaders do not contain any sensitive information, so we do not take any action to hide or protect these values.

Below is an example of a payload using a siem alertConfig.

{
     "policy": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "nightfallDetector": "CREDIT_CARD_NUMBER",
                              "detectorType": "NIGHTFALL_DETECTOR",
                              "minConfidence": "POSSIBLE",
                              "minNumFindings": 1
                         }
                    ],
                    "logicalOp": "ALL"
               }
          ],
          "alertConfig": {
               "email": {
                    "address": "<your email>"
               },
               "siem": {
                    "sensitiveHeaders": {
                         "Authorization": "Splunk <your token value>"
                    },
                    "address": "https://http-inputs-<yourhost>.splunkcloud.com:8088/services/collector/event"
               }
          }
     },
     "payload": [
          "4916-6734-7572-5015 is my credit card number",
          "This string does not have any sensitive data",
          "my api key is yr+ZWwIZp6ifFgaHV8410b2BxbRt5QiAj1EZx1qj and my 💳 credit card number 💰 is 30204861594838"
     ]
}
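The siem alertConfig shown above can also be assembled programmatically, which gives a natural place to enforce the HTTPS requirement before a request is sent. The sketch below assumes a hypothetical helper name, `build_siem_alert_config`. Only the field names (address, sensitiveHeaders, plainTextHeaders) come from this page.

```python
from urllib.parse import urlparse


def build_siem_alert_config(address, sensitive_headers=None, plain_text_headers=None):
    """Return a dict shaped like the "siem" alertConfig in the example above."""
    if urlparse(address).scheme != "https":
        raise ValueError("the SIEM collector URL must use the HTTPS scheme")
    siem = {"address": address}
    if sensitive_headers:
        # Authentication values belong here so that they stay hidden.
        siem["sensitiveHeaders"] = dict(sensitive_headers)
    if plain_text_headers:
        # Non-sensitive values only; these are not protected.
        siem["plainTextHeaders"] = dict(plain_text_headers)
    return {"siem": siem}
```

The returned dictionary can be placed under the policy's alertConfig key, alongside other alert destinations such as email.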

Other Policy Features

Using Redaction Within a Policy

A policy may be configured with default redaction rules as a defaultRedactionConfig that will affect the content of the redactedPayload field of the content that is sent to the alert locations specified in the policy alertConfig. Note that this redaction does not affect the findings themselves.

These redaction rules will be applied to Detection Rules that do not have a specified redaction configuration.

The redactionConfig specified must be one and only one of the four available redaction types:

  • maskConfig

  • infoTypeSubstitutionConfig

  • substitutionConfig

  • cryptoConfig

{
  "policy": {
    "detectionRules": [
      {
        "detectors": [
          {
            "detectorType": "NIGHTFALL_DETECTOR",
            "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER",
            "minNumFindings": 1,
            "minConfidence": "LIKELY",
            "displayName": "US Social Security Number"
          }
        ],
        "name": "Simple SSN Match Detection Rule",
        "logicalOp": "ALL"
      }
    ],
    "defaultRedactionConfig": {
      "maskConfig": {
        "charsToIgnore": [
          "-"
        ],
        "maskingChar": "#",
        "numCharsToLeaveUnmasked": 4,
        "maskLeftToRight": true
      }
    },
    "contextBytes": 5,
    "alertConfig": {
      "email": {
        "address": "eric@nightfall.ai"
      }
    }
  },
  "payload": [
    "The customers social security number is 555-55-5555",
    "No SSN in this string"
  ]
}
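Because a redaction configuration must contain exactly one of the four redaction types, a small client-side check can catch mistakes early. This is a sketch only. The function name `check_redaction_config` is hypothetical, and it validates only which keys are present, not their contents.

```python
REDACTION_TYPES = (
    "maskConfig",
    "infoTypeSubstitutionConfig",
    "substitutionConfig",
    "cryptoConfig",
)


def check_redaction_config(redaction_config):
    """Return the single redaction type present, or raise ValueError."""
    present = [name for name in REDACTION_TYPES if name in redaction_config]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {REDACTION_TYPES}, found {present or 'none'}"
        )
    return present[0]
```

For the example policy above, the check would return "maskConfig".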

Using Context Bytes Within a Policy

In addition to a defaultRedactionConfig, it is possible to set the number of bytes to include before and after a given finding as the contextBytes. This context shows how the finding appears within the surrounding text, which helps human readers understand its meaning. The maximum value for contextBytes is 40.
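To make the idea concrete, the sketch below reproduces context extraction locally from a finding's byteRange. It is an illustration of the concept, not Nightfall's implementation. The function name is hypothetical, and how the service handles multi-byte characters at the context boundary is an assumption.

```python
MAX_CONTEXT_BYTES = 40


def context_around(text, start, end, context_bytes):
    """Return (before, after) context for the finding at byte offsets [start, end)."""
    if not 0 <= context_bytes <= MAX_CONTEXT_BYTES:
        raise ValueError(f"contextBytes must be between 0 and {MAX_CONTEXT_BYTES}")
    raw = text.encode("utf-8")
    before = raw[max(0, start - context_bytes):start]
    after = raw[end:end + context_bytes]
    # A cut in the middle of a multi-byte character is replaced rather than raising.
    return (
        before.decode("utf-8", errors="replace"),
        after.decode("utf-8", errors="replace"),
    )
```

With the SSN example used on this page (byteRange 41 to 52 and contextBytes set to 5), the text before the finding is "r is ", and nothing follows the finding because it ends the string.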

DLP APIs - Firewall for AI Platform

Firewall for AI DLP APIs enable developers to write custom code to sanitize data anywhere: RAG data sets, analytics data stores, data pipelines, and unsupported SaaS applications.

DLP APIs - Native SaaS Apps

If you are using Nightfall SaaS apps, you can use APIs to fetch violations, search through the violations, and fetch specific findings within the Violations. To scan data in any custom apps or cloud infrastructure services like AWS S3, you must use the APIs in the DLP APIs - Firewall for AI Platform section.

Default

Models

Rate Limits for Native SaaS app APIs

To prevent misuse and ensure the stability of our platform, we enforce a rate limit on an API Key and endpoint basis, similar to the way many other APIs enforce rate limits.

When operating under our Free plan, accounts and their corresponding API Keys have a rate limit of 5 requests per second on average, with support for bursts of 15 requests per second. If you upgrade to a paid plan (the Enterprise plan), this rate increases to a limit of 10 requests per second on average and bursts of 50 requests per second.

The Nightfall API follows standard practices and conventions to signal when these rate limits have been exceeded.

Successful requests return a header X-Rate-Limit-Remaining with the integer number of requests remaining before errors will be returned to the client.

Request Rate Limiting

Your Request Rate Limiting throttles how frequently you can make requests to the API. You can monitor your rate limit usage via the `X-Rate-Limit-Remaining` header, which tells you how many remaining requests you can make within the next second before being throttled.

Quotas

Your Quota limits how many requests you can make within a given period. Your current remaining quota and the end of your current quota period are denoted by the following response headers.

For the free plan, we allow 5 requests per second and 10000 requests in a day.
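A client can read the rate limit and quota headers described in this section to decide when to pause. The following is a minimal sketch: the function names and the one-second default pause are assumptions, the headers are passed as a plain dictionary with the exact header names, and Retry-After is assumed to hold a number of seconds.

```python
from datetime import datetime


def seconds_to_wait(status, headers, default=1.0):
    """Return how long to pause before the next request, in seconds."""
    if status == 429:
        # Throttled responses carry the wait time in the Retry-After header.
        return float(headers.get("Retry-After", default))
    remaining = headers.get("X-Rate-Limit-Remaining")
    if remaining is not None and int(remaining) == 0:
        return default
    return 0.0


def quota_status(headers):
    """Return (requests remaining, quota period end); either may be None."""
    remaining = headers.get("X-Quota-Remaining")
    period_end = headers.get("X-Quota-Period-End")
    return (
        int(remaining) if remaining is not None else None,
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        datetime.fromisoformat(period_end.replace("Z", "+00:00")) if period_end else None,
    )
```

Calling `seconds_to_wait` after every response and sleeping for the returned value keeps a simple sequential client within its limits.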

Using Exclusion Rules

An Exclusion Rule allows you to refine a Detector to make sure false positives are not surfaced by Nightfall.

For instance you may want to detect whether credit card numbers are being shared inappropriately in your organization. However, there may be cases where members of your QA are sharing test credit card numbers, which should not be considered a violation and should be ignored by Nightfall.

In the following example, we define a Detector with a regular expression to match credit cards.

We then add an exclusion for some known test credit cards.

As the resulting payload shows, only the 3rd provided Credit Card number matches because the first two items in the payload are included in our ExclusionRules word list.
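The effect of a word list exclusion can be reproduced locally to build intuition. The sketch below is not Nightfall's implementation. It uses a simplified card pattern covering only two card families, and it assumes that a FULL match type means the entire finding must equal a word in the list.

```python
import re

# Simplified pattern for illustration (16-digit cards starting with 4 or 51-55).
CARD_PATTERN = re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b")


def scan_with_exclusions(payload, pattern, excluded_words):
    """Return, per payload item, the regex matches that are not in the word list."""
    excluded = set(excluded_words)
    return [
        [m.group(0) for m in pattern.finditer(item) if m.group(0) not in excluded]
        for item in payload
    ]


TEST_CARDS = ["4111111111111111", "5105105105105100"]
findings = scan_with_exclusions(
    ["5105105105105100", "4111111111111111", "4012888888881881"],
    CARD_PATTERN,
    TEST_CARDS,
)
print(findings)
```

As in the API response, the result holds one list per payload item, and only the third item produces a finding.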

Nightfall Playground

Our playground environment allows you to:

  • Generate sample data for DLP testing.

  • Explore a sample app built on our APIs

Our PHI Detectors may be used just like other Detectors with Policies or Inline Detection Rules.

If problems persist, please contact Nightfall Support for further assistance.

These policies may be created manually through the dashboard or may be defined programmatically.

When defining a Policy inline, in addition to specifying the Detection Rules (either by referencing the UUID of an existing Detection Rule or defining a Detection Rule and its Detectors inline), you must define an alertConfig, which determines where findings are sent.

This mechanism allows you to programmatically consume findings. The data sent will contain sensitive information as well as additional metadata, such as the location of the findings in the payload. For this reason, the URL must be an HTTPS URL, and the service backing it must be implemented to properly respond with your Webhook signing key and act as a Webhook Server.

This feature requires that you have configured the Nightfall Slack integration.

See the section on Slack in the overview on Alerting for more details.

For more information on Redactions see: Using Redaction

Below is a simple example of a payload for using a policy set up to use a defaultRedactionConfig.

The native SaaS app APIs can be used by customers of Nightfall's natively supported SaaS apps to fetch violations, search violations by app metadata attributes, and fetch findings within violations. These DLP APIs do not provide access to violations for apps scanned via the developer platform. These APIs require you to create an API key as outlined in the Getting Started with the Developer Platform section. However, to use these APIs, you need not create any detectors, detection rules, or policies in the developer platform.

Plan
Requests Per Second (Avg)
Burst

Free

5

15

Enterprise

10

50

When your application exceeds the rate limit for a given API endpoint, the Nightfall API will return an HTTP response code of 429 "Too Many Requests." If your use case requires increased rate limiting, please reach out to support@nightfall.ai.

Additionally, these unsuccessful requests return the number of seconds to wait before retrying the request in a Retry-After Header.

Response Headers
Type
Description

X-Quota-Remaining

string

The requests remaining in your quota for this period. Will be reset to the amount specified in your billing plan at the end of your quota cycle.

X-Quota-Period-End

datetime

The date and time at which your quota will be reset, encoded as a string in the RFC 3339 format.

The Nightfall Developer Playground (playground.nightfall.ai) is a sample app that you may use to test out API functionality before writing any code.

Test Detectors and Detection Rules. Here are some sample datasets.

curl --location --request POST 'https://api.nightfall.ai/v3/scan' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--header 'Content-Type: application/json' \
--data-raw '{
    "policy": {
        "detectionRules": [
            {
                "detectors": [
                    {
                        "regex": {
                            "pattern": "(?:(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(6(?:011|5[0-9]{2})[0-9]{12})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|((?:2131|1800|35[0-9]{3})[0-9]{11}))",
                            "isCaseSensitive": false
                        },
                        "exclusionRules": [
                            {
                                "wordList": {
                                    "values": [
                                        "4111111111111111",
                                        "5105105105105100"
                                    ]
                                },
                                "exclusionType": "WORD_LIST",
                                "matchType": "FULL"
                            }
                        ],
                        "minNumFindings": 1,
                        "minConfidence": "POSSIBLE",
                        "displayName": "Credit Card Reg Ex",
                        "detectorType": "REGEX"
                    }
                ],
                "name": "Credit Card Detection Rule",
                "logicalOp": "ALL"
            }
        ]
    },
    "payload": [
        "5105105105105100",
        "4111111111111111",
        "4012888888881881"
    ]
}'
{
   "findings":[
      [
         
      ],
      [
         
      ],
      [
         {
            "finding":"4012888888881881",
            "detector":{
               "name":"Credit Card Reg Ex",
               "uuid":"93024e88-e6de-4c84-8295-75157cdd1b52"
            },
            "confidence":"LIKELY",
            "location":{
               "byteRange":{
                  "start":0,
                  "end":16
               },
               "codepointRange":{
                  "start":0,
                  "end":16
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "Credit Card Detection Rule"
            ]
         }
      ]
   ],
   "redactedPayload":[
      "",
      "",
      ""
   ]
}

Default

Ruby

This guide describes how to use Nightfall with the Ruby programming language.

The example below demonstrates how to use Nightfall's text scanning functionality to verify whether a string contains sensitive PII by calling the Nightfall API from Ruby.

To follow along, you will need:

  • A Nightfall API Key

  • An existing Detection Rule

  • Data to scan. Note that the API interprets data as plaintext, so you may pass it in any structured or unstructured format.

  • A local Ruby 2.6 or greater environment.

Start by creating a new file called nightfall_demo.rb

Now we will walk through the code step by step. If you'd like to skip ahead you can see the complete code sample at the bottom of this page.

We will be using a few built-in Ruby libraries to run this sample API script.

First, we will load some environment variables that will be used to interact with the Nightfall API. NIGHTFALL_API_KEY should be your Nightfall API Key, and NIGHTFALL_DETECTION_RULE_UUID should be the UUID of your existing Nightfall Detection Rule.

Next, we will construct our payload to scan as an array. You can replace this with any data you'd like, or read plaintext from a file.

Next, we build the HTTP request headers and body using the environment variables that we previously defined.

Next, we build the HTTP object and make a request to the Nightfall API.

Lastly, we make the API request and process the response from Nightfall. If there are sensitive findings in the response we pretty-print them to the console. If there are no findings, we print a message to the console. Otherwise, if there is a problem with the HTTP request we print the status code and message to the console.

Usage

Now we can run our script:

If there are sensitive findings based on your Nightfall detection rule, you should see output similar to this in your console, corresponding to each of the 3 items inputted to scan in the payload.

Complete Sample Code

For your convenience, the complete Ruby code sample is shown below.


Models

Congrats! You've successfully scanned text for sensitive data with Ruby using the Nightfall API.

# Load dependencies
require 'open-uri'
require 'net/http'
require 'json'
# Load environment variables for Nightfall API
nightfall_api_key = ENV['NIGHTFALL_API_KEY']
detection_rule_uuid = ENV['NIGHTFALL_DETECTION_RULE_UUID']
# Text data to scan
payload = [
    "The customer social security number is 458-02-6124",
    "No PII in this string",
    "My credit card number is 4916-6734-7572-5015"
]
# Configure detection settings
config = { 
	"config": {
		"detectionRuleUUIDs": [detection_rule_uuid]
	},
	"payload": payload
}
# Build API request
url = URI("https://api.nightfall.ai/v3/scan")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Post.new(url)
request["Accept"] = 'application/json'
request["Content-Type"] = 'application/json'
request["Authorization"] = "Bearer #{nightfall_api_key}"
request.body = config.to_json
# Make API request
response = http.request(request)

# Parse response
if response.code.to_i == 200
    result = JSON.parse(response.body)
    # "findings" holds one array per payload item; an empty array means no match
    if (result['findings'] || []).any? { |item| !item.empty? }
        puts "This text contains sensitive data.\n\n"
        puts JSON.pretty_generate(result)
    else
        puts "No sensitive data found. Hooray!"
    end
else
    puts "Something went wrong -- Response #{response.code}."
end
ruby nightfall_demo.rb
This text contains sensitive data.

{
  "findings": [
    [
      {
        "finding": "458-02-6124",
        "detector": {
          "name": "US social security number (SSN)",
          "uuid": "e30d9a87-f6c7-46b9-a8f4-16547901e069"
        },
        "confidence": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": 39,
            "end": 50
          },
          "codepointRange": {
            "start": 39,
            "end": 50
          }
        },
        "matchedDetectionRuleUUIDs": [
          "996a3c12-35d1-48cb-b858-5ee0841c652d"
        ],
        "matchedDetectionRules": [

        ]
      }
    ],
    [

    ],
    [
      {
        "finding": "4916-6734-7572-5015",
        "detector": {
          "name": "Credit card number",
          "uuid": "74c1815e-c0c3-4df5-8b1e-6cf98864a454"
        },
        "confidence": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": 25,
            "end": 44
          },
          "codepointRange": {
            "start": 25,
            "end": 44
          }
        },
        "matchedDetectionRuleUUIDs": [
          "996a3c12-35d1-48cb-b858-5ee0841c652d"
        ],
        "matchedDetectionRules": [

        ]
      }
    ]
  ],
  "redactedPayload": [
    "",
    "",
    ""
  ]
}

# nightfall_demo.rb

# Load dependencies
require 'open-uri'
require 'net/http'
require 'json'

# Load environment variables for Nightfall API
nightfall_api_key = ENV['NIGHTFALL_API_KEY']
detection_rule_uuid = ENV['NIGHTFALL_DETECTION_RULE_UUID']

# Text data to scan
payload = [
    "The customer social security number is 458-02-6124",
    "No PII in this string",
    "My credit card number is 4916-6734-7572-5015"
]

# Configure detection settings
config = { 
	"config": {
		"detectionRuleUUIDs": [detection_rule_uuid]
	},
	"payload": payload
}

# Build API request
url = URI("https://api.nightfall.ai/v3/scan")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Post.new(url)
request["Accept"] = 'application/json'
request["Content-Type"] = 'application/json'
request["Authorization"] = "Bearer #{nightfall_api_key}"
request.body = config.to_json

# Make API request
response = http.request(request)

# Parse response
if response.code.to_i == 200
    result = JSON.parse(response.body)
    # "findings" holds one array per payload item; an empty array means no match
    if (result['findings'] || []).any? { |item| !item.empty? }
        puts "This text contains sensitive data.\n\n"
        puts JSON.pretty_generate(result)
    else
        puts "No sensitive data found. Hooray!"
    end
else
    puts "Something went wrong -- Response #{response.code}."
end

GenAI Protection

This section consists of various documents that assist you in scanning various popular SaaS GenAI services and frameworks using Nightfall APIs.

  • OpenAI Prompt Sanitization Tutorial

  • Anthropic Prompt Sanitization Tutorial

  • LangChain Prompt Sanitization Tutorial

Search exfiltration events

get

Fetch a list of exfiltration events based on some filters

Authorizations
Query parameters
createdAfter integer Optional

Unix timestamp in seconds, filters records created ≥ the value, defaults to -180 days UTC

createdBefore integer Optional

Unix timestamp in seconds, filters records created < the value, defaults to end of the current day UTC

updatedAfter integer Optional

Unix timestamp in seconds, filters records updated > the value

limit integer · max: 100 Optional

The maximum number of records to be returned in the response

Default: 50

pageToken string Optional

Cursor for getting the next page of results

sort string · enum Optional

Sort key and direction, defaults to descending order by creation time

Default: TIME_DESC
Possible values:

query string Required

The query containing filter clauses

Search query language

Query structure and terminology

A query clause consists of a field followed by an operator followed by a value:

term value
clause user_email:"amy@rocketrides.io"
field user_email
operator :
value amy@rocketrides.io

You can combine multiple query clauses in a search by separating them with a space.

Field types, substring matching, and numeric comparators

Every search field supports exact matching with a :. Certain fields such as user_email and user_name support substring matching.

Quotes

You may use quotation marks around string values. Quotation marks are required in case the value contains spaces. For example:

  • user_email:john@example.com
  • user_name:"John Doe"

Special Characters

+ - && || ! ( ) { } [ ] ^ " ~ * ? : are special characters that need to be escaped using \. For example:

  • a value like (1+1):2 should be searched for using \(1\+1\)\:2
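Escaping by hand is error-prone, so a small helper is useful when values come from user input. The sketch below prefixes each special character listed above with a backslash. The function name is hypothetical, and escaping & and | one character at a time (rather than as the pairs && and ||) is an assumption.

```python
# Characters listed above as special; & and | are escaped individually (assumption).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:')


def escape_value(value):
    """Prefix every special character in a search value with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


print(escape_value("(1+1):2"))
```

Values without special characters, such as an email address, pass through unchanged.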

Search Syntax

The following table lists the syntax that you can use to construct a query.

SYNTAX USAGE DESCRIPTION EXAMPLES
: field:value Exact match operator (case insensitive) state:"pending" returns records where the state is exactly "PENDING" in a case-insensitive comparison
(space) field1:value1 field2:value2 The query returns only records that match both clauses state:active slack.channel_name:general
OR field:(value1 OR value2) The query returns records that match either of the values (case insensitive) state:(active OR pending)
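The three syntax forms in the table can be composed with a pair of small helpers. This is a sketch with hypothetical function names. It quotes values that contain spaces and does not escape special characters.

```python
def clause(field, *values):
    """Build field:value, or field:(v1 OR v2 ...) when several values are given."""
    if not values:
        raise ValueError("at least one value is required")

    def fmt(value):
        # Quotation marks are required when the value contains spaces.
        return f'"{value}"' if " " in value else value

    if len(values) == 1:
        return f"{field}:{fmt(values[0])}"
    return f"{field}:(" + " OR ".join(fmt(v) for v in values) + ")"


def build_query(*clauses):
    """Join clauses with spaces; records must match every clause."""
    return " ".join(clauses)


print(build_query(clause("state", "active", "pending"), clause("user_name", "John Doe")))
```

The result is a single string suitable for the query parameter of the search endpoint.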

Query Fields

param description
event_id the unique identifier of the exfiltration event to filter on
integration_name the name of the integration to filter on
state the state of the event to filter on (active, pending, resolved, expired)
event_type the type of exfiltration event to filter on
actor_name the name of the actor who performed the action to filter on
actor_email the email of the actor who performed the action to filter on
user_name the username of the user to filter on (backward compatibility)
user_email the email of the user to filter on (backward compatibility)
notes the comment or notes associated with the event to filter on
policy_id the unique identifier of the policy to filter on
policy_name the name of the policy to filter on
resource_id the identifier of the resource to filter on
resource_name the name of the resource to filter on
resource_owner_name the name of the resource owner to filter on
resource_owner_email the email of the resource owner to filter on
resource_content_type the content type of the resource to filter on
endpoint.device_id the device identifier for endpoint events to filter on
endpoint.machine_name the machine name for endpoint events to filter on
gdrive.permission the permission setting for Google Drive files to filter on
gdrive.shared_internal_email the internal emails with which the file is shared to filter on
gdrive.shared_external_email the external emails with which the file is shared to filter on
gdrive.drive the Google Drive name to filter on
gdrive.file_owner the owner of the Google Drive file to filter on
gdrive.label_name the label name applied to Google Drive files to filter on
salesforce.report.scope the scope of the Salesforce report to filter on
salesforce.report.event_source the event source of the Salesforce report to filter on
salesforce.report.source_ip the source IP address of the Salesforce report to filter on
salesforce.report.session_level the session level of the Salesforce report to filter on
salesforce.report.operation the operation type of the Salesforce report to filter on
salesforce.report.description the description of the Salesforce report to filter on
salesforce.file.source_ip the source IP address for Salesforce file events to filter on
salesforce.file.session_level the session level for Salesforce file events to filter on
Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
GET /exfiltration/v1/events/search HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "events": [
    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "integration": "text",
      "createdAt": 1,
      "state": "text",
      "eventType": "text",
      "policyUUIDs": [
        "123e4567-e89b-12d3-a456-426614174000"
      ],
      "assetsCount": 1,
      "userInfo": {
        "username": "text",
        "userEmail": "name@gmail.com",
        "userProfileLink": "https://example.com",
        "deviceId": "text",
        "machineName": "text",
        "isExternal": true
      },
      "appInfo": {
        "id": "text",
        "name": "text"
      }
    }
  ],
  "nextPageToken": "text"
}

Fetch exfiltration events

get

Fetch a list of exfiltration events for a period

Authorizations
Query parameters
createdAfter · integer · Optional

Unix timestamp in seconds; filters records created ≥ the value. Defaults to 90 days ago (UTC).

createdBefore · integer · Optional

Unix timestamp in seconds; filters records created < the value. Defaults to the end of the current day (UTC).

updatedAfter · integer · Optional

Unix timestamp in seconds; filters records updated > the value.

limit · integer · max: 100 · Optional

The maximum number of records to be returned in the response.

Default: 50

pageToken · string · Optional

Cursor for getting the next page of results.

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
GET /exfiltration/v1/events HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "events": [
    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "integration": "text",
      "createdAt": 1,
      "state": "text",
      "eventType": "text",
      "policyUUIDs": [
        "123e4567-e89b-12d3-a456-426614174000"
      ],
      "assetsCount": 1,
      "userInfo": {
        "username": "text",
        "userEmail": "name@gmail.com",
        "userProfileLink": "https://example.com",
        "deviceId": "text",
        "machineName": "text",
        "isExternal": true
      },
      "appInfo": {
        "id": "text",
        "name": "text"
      }
    }
  ],
  "nextPageToken": "text"
}
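
The `pageToken` and `nextPageToken` fields above suggest cursor-style pagination. The following is a minimal sketch of walking every page, assuming that a missing or empty `nextPageToken` marks the last page (the reference does not state this explicitly). The `fetch_page` callable is a hypothetical stand-in for whatever HTTP client you use; it receives the path and query parameters and returns the decoded JSON body.

```python
from typing import Callable, Dict, Iterator, Optional

EVENTS_PATH = "/exfiltration/v1/events"


def iter_events(
    fetch_page: Callable[[str, Dict[str, object]], dict],
    created_after: Optional[int] = None,
    limit: int = 50,
) -> Iterator[dict]:
    """Yield every event, following nextPageToken until it is absent."""
    # The documented maximum for `limit` is 100, so clamp larger values.
    params: Dict[str, object] = {"limit": min(limit, 100)}
    if created_after is not None:
        params["createdAfter"] = created_after
    while True:
        # Pass a copy so the caller cannot mutate our running parameters.
        page = fetch_page(EVENTS_PATH, dict(params))
        yield from page.get("events", [])
        token = page.get("nextPageToken")
        if not token:  # assumption: no token means no further pages
            break
        params["pageToken"] = token
```

Injecting `fetch_page` keeps the paging logic independent of any particular HTTP library and makes it easy to exercise with canned responses.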

Fetch exfiltration event details

get

Fetch an exfiltration event details by ID

Authorizations
Path parameters
eventId · string · uuid · Required

The UUID of the event to fetch

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
404
Event does not exist
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
GET /exfiltration/v1/events/{eventId} HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "assets": {
    "id": "text",
    "name": "text",
    "path": "text",
    "sizeBytes": 1,
    "mimetype": "text",
    "owner": {
      "id": "text",
      "email": "name@gmail.com",
      "comment": "text",
      "metadata": {
        "gdrive": {
          "userBelongsToGroups": [
            "text"
          ],
          "isAdmin": true,
          "isSuspended": true,
          "createdAt": 1
        },
        "salesforce": {},
        "endpointAgent": {
          "deviceID": "text",
          "machineName": "text"
        }
      }
    },
    "comment": "text",
    "ddrViolationIDs": [],
    "metadata": {
      "gdrive": {
        "fileID": "text",
        "fileName": "text",
        "fileSize": "text",
        "fileLink": "text",
        "permissionSetting": "text",
        "sharingExternalUsers": [
          "text"
        ],
        "sharingInternalUsers": [
          "text"
        ],
        "canViewersDownload": true,
        "fileOwner": "text",
        "isInTrash": true,
        "createdAt": 1,
        "updatedAt": 1,
        "drive": "text",
        "labels": [
          "text"
        ],
        "filePermissionType": "text"
      },
      "salesforce": {
        "resourceType": "text",
        "fileResourceMetadata": {
          "fileAction": "text",
          "sourceIP": "text",
          "sessionLevel": "text"
        },
        "reportResourceMetadata": {
          "description": "text",
          "displayEntityFields": [
            "text"
          ],
          "dashboardName": "text",
          "scope": "text",
          "operation": "text",
          "recordCount": 1,
          "queriedEntities": [
            "text"
          ],
          "groupedColumnHeaders": [
            "text"
          ],
          "columnCount": 1,
          "processedRowCount": 1,
          "sourceIP": "text",
          "eventSource": "text",
          "sessionLevel": "text"
        },
        "bulkApiResourceMetadata": {
          "query": "text",
          "eventIdentifier": "text",
          "sourceIP": "text",
          "sessionKey": "text",
          "sessionLevel": "text"
        }
      },
      "endpointAgent": {
        "medium": "EXFIL_MEDIUM_USB",
        "mediumName": "text",
        "user": "text"
      }
    }
  },
  "actor": {
    "id": "text",
    "email": "name@gmail.com",
    "comment": "text",
    "metadata": {
      "gdrive": {
        "userBelongsToGroups": [
          "text"
        ],
        "isAdmin": true,
        "isSuspended": true,
        "createdAt": 1
      },
      "salesforce": {},
      "endpointAgent": {
        "deviceID": "text",
        "machineName": "text"
      }
    }
  },
  "events": {
    "type": "DOWNLOAD",
    "timestamp": 1,
    "metadata": {
      "endpointAgent": {
        "endpointBrowserUploadMetadata": {
          "browserName": "text",
          "browserVersion": "text",
          "domain": "text",
          "browserTabURL": "text",
          "browserTabTitle": "text",
          "uploadStartTime": 1,
          "uploadEndTime": 1,
          "fileName": "text",
          "originMetadata": [
            {
              "timestamp": 1,
              "browserDownloadMetadata": {
                "browserName": "text",
                "browserVersion": "text",
                "domain": "text",
                "browserTabURL": "text",
                "browserTabTitle": "text",
                "downloadStartTime": 1,
                "downloadEndTime": 1
              },
              "clipboardCopyMetadata": {
                "contentType": "CCT_TEXT",
                "browserMetadata": {
                  "browserName": "text",
                  "browserVersion": "text",
                  "domain": "text",
                  "browserTabURL": "text",
                  "browserTabTitle": "text"
                }
              }
            }
          ]
        },
        "endpointCloudSyncMetadata": {
          "app": "text",
          "accountType": "text",
          "accountName": "text",
          "email": "text",
          "destinationFilePath": "text",
          "uploadStartTime": 1,
          "uploadEndTime": 1,
          "fileName": "text"
        },
        "endpointClipboardMetadata": {
          "contentType": "text",
          "originMetadata": [
            {
              "timestamp": 1,
              "browserDownloadMetadata": {
                "browserName": "text",
                "browserVersion": "text",
                "domain": "text",
                "browserTabURL": "text",
                "browserTabTitle": "text",
                "downloadStartTime": 1,
                "downloadEndTime": 1
              },
              "clipboardCopyMetadata": {
                "contentType": "CCT_TEXT",
                "browserMetadata": {
                  "browserName": "text",
                  "browserVersion": "text",
                  "domain": "text",
                  "browserTabURL": "text",
                  "browserTabTitle": "text"
                }
              }
            }
          ],
          "destinationMetadata": {
            "browserMetadata": {
              "browserName": "text",
              "browserVersion": "text",
              "domain": "text",
              "browserTabURL": "text",
              "browserTabTitle": "text"
            }
          }
        }
      },
      "gdrive": {
        "originatingAppId": "text",
        "originatingAppName": "text",
        "isClientSyncEvent": true
      },
      "salesforce": {
        "sourceIP": "text",
        "sessionLevel": "text",
        "sessionKey": "text",
        "sfUserId": "text"
      }
    },
    "assetIDs": []
  }
}
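
As an illustration of the request shown above, here is a small sketch that builds (but does not send) the event-details request with Python's standard library. The helper name `build_event_details_request` is hypothetical; the host, path, and bearer-token header come from the example request. Because the path parameter is documented as a UUID, the sketch validates it locally before building the URL.

```python
import urllib.request
import uuid

BASE_URL = "https://api.nightfall.ai"


def build_event_details_request(event_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for a single exfiltration event without sending it."""
    # uuid.UUID raises ValueError for malformed input and normalizes the casing.
    canonical_id = str(uuid.UUID(event_id))
    return urllib.request.Request(
        url=f"{BASE_URL}/exfiltration/v1/events/{canonical_id}",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

The returned object can be passed to `urllib.request.urlopen` when you are ready to send it.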

Fetch asset activity

get

Fetch the activity history for a specific asset

Authorizations
Query parameters
assetID · string · Required

The ID of the asset to fetch activities for

rangeStart · integer · Required

Unix timestamp in seconds, filters activities created ≥ the value

rangeEnd · integer · Required

Unix timestamp in seconds, filters activities created < the value

pageToken · string · Optional

Cursor for getting the next page of results

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
GET /exfiltration/v1/asset/activity HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "activities": [
    {
      "type": "DOWNLOAD",
      "userEmail": "name@gmail.com",
      "eventTime": 1,
      "assetNames": [
        "text"
      ],
      "metadata": {
        "downloadEventMetadata": {
          "source": "text",
          "fileName": "text"
        },
        "browserUploadMetadata": {
          "domain": "text",
          "fileName": "text"
        },
        "cloudSyncMetadata": {
          "cloudApp": "text",
          "fileName": "text"
        },
        "clipboardMetadata": {
          "browserMetadata": {
            "domain": "text"
          }
        }
      }
    }
  ],
  "nextPageToken": "text"
}
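
Since `rangeStart` and `rangeEnd` are both required Unix timestamps in seconds, a common need is to turn "the last N days" into a query string. The sketch below does that with the standard library; the helper name and the `days` and `now` arguments are illustrative assumptions, while the parameter names come from the reference above.

```python
import time
from typing import Dict, Optional
from urllib.parse import urlencode


def asset_activity_query(
    asset_id: str,
    days: int,
    now: Optional[int] = None,
    page_token: Optional[str] = None,
) -> str:
    """Return the query string for GET /exfiltration/v1/asset/activity."""
    # rangeEnd is exclusive (activities created < the value), so "now" is a safe upper bound.
    range_end = int(time.time()) if now is None else now
    range_start = range_end - days * 86400  # 86400 seconds per day
    params: Dict[str, object] = {
        "assetID": asset_id,
        "rangeStart": range_start,
        "rangeEnd": range_end,
    }
    if page_token:
        params["pageToken"] = page_token
    return urlencode(params)
```

Passing `now` explicitly keeps the window reproducible, which is useful when the same range must be reused across paginated requests.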

Fetch actor activity

get

Fetch the activity history for a specific actor

Authorizations
Query parameters
actorID · string · Required

The Nightfall ID of the actor to fetch activities for

rangeStart · integer · Required

Unix timestamp in seconds, filters activities created ≥ the value

rangeEnd · integer · Required

Unix timestamp in seconds, filters activities created < the value

pageToken · string · Optional

Cursor for getting the next page of results

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
GET /exfiltration/v1/actor/activity HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "activities": [
    {
      "type": "DOWNLOAD",
      "userEmail": "name@gmail.com",
      "eventTime": 1,
      "assetNames": [
        "text"
      ],
      "metadata": {
        "downloadEventMetadata": {
          "source": "text",
          "fileName": "text"
        },
        "browserUploadMetadata": {
          "domain": "text",
          "fileName": "text"
        },
        "cloudSyncMetadata": {
          "cloudApp": "text",
          "fileName": "text"
        },
        "clipboardMetadata": {
          "browserMetadata": {
            "domain": "text"
          }
        }
      }
    }
  ],
  "nextPageToken": "text"
}
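
Once an activity response like the one above has been decoded, a quick summary by activity type can help triage an actor's history. This is a minimal sketch that works on the decoded `activities` list; the function name and the `"UNKNOWN"` fallback label are illustrative assumptions, and only the `type` field from the sample response is relied upon.

```python
from collections import Counter
from typing import Dict, Iterable


def count_activity_types(activities: Iterable[dict]) -> Dict[str, int]:
    """Count activities per `type`, using "UNKNOWN" when the field is missing."""
    return dict(Counter(a.get("type", "UNKNOWN") for a in activities))
```

For example, calling it on `response["activities"]` from a decoded response returns a mapping such as `{"DOWNLOAD": 3}`.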

SaaS Protection

This section contains tutorials that show how to scan popular SaaS applications using the Nightfall APIs.

  • HubSpot DLP Tutorial

  • Zendesk DLP Tutorial

Java

This guide describes how to use Nightfall with the Java programming language.

The example below demonstrates how to use Nightfall's text scanning functionality to verify whether a string contains sensitive PII using the Nightfall Java SDK.

In this tutorial, we will be downloading, setting up, and using the Java SDK provided by Nightfall.

To make a request to the Nightfall API you will need:

  • A Nightfall API key

  • Plaintext data to scan.

You can add the Nightfall package to your project by adding a dependency to your pom.xml:

<!--pom.xml-->

<?xml version="1.0" encoding="UTF-8"?>
 <project xmlns="http://maven.apache.org/POM/4.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
     <modelVersion>4.0.0</modelVersion>

     <groupId>com.foo</groupId>
     <artifactId>my-artifact</artifactId>
     <version>1.0.0</version>

     <name>${project.groupId}:${project.artifactId}</name>
     <packaging>jar</packaging>

     <dependencies>
         <dependency>
             <groupId>ai.nightfall</groupId>
             <artifactId>scan-api</artifactId>
             <version>1.0.1</version>
         </dependency>
     </dependencies>
 </project>

First add the required imports to the top of the file.

These are the objects we will use from the Nightfall SDK, as well as some collection classes for data handling.

//List of imports

import ai.nightfall.scan.NightfallClient;
 import ai.nightfall.scan.model.Confidence;
 import ai.nightfall.scan.model.DetectionRule;
 import ai.nightfall.scan.model.Detector;
 import ai.nightfall.scan.model.LogicalOp;
 import ai.nightfall.scan.model.NightfallAPIException;
 import ai.nightfall.scan.model.ScanTextConfig;
 import ai.nightfall.scan.model.ScanTextRequest;
 import ai.nightfall.scan.model.ScanTextResponse;

 import java.util.Arrays;
 import java.util.List;

We can then declare some data to scan in a List:

//Sample Payload

List<String> payload = Arrays.asList(
  "hello",
  "world",
  "my data is 4242-4242-4242-4242 but shhhh 🙊 ",
  "my ssn is 678-99-8212"
);

Create a ScanTextRequest to scan the payload with. First create a new instance of the credit card detector, and set it to trigger on any finding with a confidence of LIKELY or above.

Add a second detector that looks for social security numbers. Set it to trigger if there is at least one finding with a confidence of POSSIBLE or above.

Combine these detectors into a detection rule, which will return findings if either detector is triggered.

Finally, combine the payload and configuration together as a new ScanTextRequest, and return it.

//Build the Scan Request

public static ScanTextRequest buildScanTextRequest() {
    // Define some detectors to use to scan your data
    Detector creditCard = new Detector("CREDIT_CARD_NUMBER");
    creditCard.setMinConfidence(Confidence.LIKELY);
    creditCard.setMinNumFindings(1);

    Detector ssn = new Detector("US_SOCIAL_SECURITY_NUMBER");
    ssn.setMinConfidence(Confidence.POSSIBLE);
    ssn.setMinNumFindings(1);

    // LogicalOp.ANY: the rule fires if either detector is triggered
    DetectionRule rule = new DetectionRule(Arrays.asList(creditCard, ssn), LogicalOp.ANY);
    ScanTextConfig config = ScanTextConfig.fromDetectionRules(Arrays.asList(rule), 20);

    return new ScanTextRequest(payload, config);
}

Use the ScanTextRequest instance with a NightfallClient to send your request to Nightfall.

The resulting ScanTextResponse may be used to print out the results:

//Run the Scan Request

public class Runner {
     public static void main(String[] args) {
         try (NightfallClient c = NightfallClient.Builder.defaultClient()) {
             try {
                 ScanTextResponse response = c.scanText(buildScanTextRequest());
                 System.out.println("response: " + response.getFindings());
             } catch (NightfallAPIException e) {
                 // not a checked exception, just for illustrative purposes
                 System.out.println("got error: " + e);
             }
         }
     }
 }

You are now ready to use the Java SDK for other scenarios.

Anthropic Prompt Sanitization Tutorial

  • Personally Identifiable Information (PII)

  • Protected Health Information (PHI)

  • Financial details (e.g., credit card numbers, bank account information)

  • Intellectual property

Real-world scenarios highlight the urgency of this issue:

  1. Support Chatbots: Imagine a customer service AI powered by Anthropic's Claude. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to Anthropic and logged in your support system.

  2. Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.

Standard Pattern for Using Anthropic Claude APIs

A typical pattern for leveraging Claude is as follows:

  1. Get an API key and set environment variables

  2. Initialize the Anthropic SDK client (e.g. Anthropic Python client), or use the API directly to construct a request

  3. Construct your prompt and decide which endpoint and model is most applicable.

  4. Send the request to Anthropic

Let's look at a simple example in Python. We'll ask a Claude model for an auto-generated response we can send to a customer who is asking our customer support team about an issue with their payment method. Note how easy it is to send sensitive data, in this case a credit card number, to Claude.

import os
from anthropic import Anthropic

# Initialize the Anthropic client with your API key
client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

# The user input you intend to send. Notice the credit card number in the message. Don't do this!!
user_input = "My credit card number is 4916-6734-7572-5015, and the card is getting declined."
  
# Define your prompt: system text first, then a "\n\nHuman:" turn, ending with "\n\nAssistant:"
prompt = "\nYou are a level 1 support bot. Your role is to assist users with common issues and provide helpful information. \n\nHuman: " + user_input + "\n\nAssistant:"

response = client.completions.create(
    model="claude-2.1",
    prompt=prompt,
    max_tokens_to_sample=1024,
    temperature=0.7,
    top_p=1.0
)

print("\nHere's a generated response you can send the customer:\n", response.completion)

This is a risky practice because we are now sending sensitive customer information to Anthropic. Next, let's explore how we can prevent this while still benefiting from Claude.

Adding Content Filtering to the Pattern

Updating this pattern to use Nightfall is straightforward: check the prompt for sensitive findings and make sure sensitive data isn't sent out. Here's how:

Step 1: Setup Nightfall

Get an API key for Nightfall and set environment variables. Learn more about creating a Nightfall API key here. In this example, we'll use the Nightfall Python SDK.

Step 2: Configure Detection

Create a pre-configured detection rule in the Nightfall dashboard or an inline detection rule with the Nightfall API or SDK client.

Consider using Redaction

Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Step 3: Classify, Redact, Filter

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let's say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?
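
To make the masking shown above concrete, here is a small local illustration that reproduces the same output format. It is not Nightfall's implementation and makes no claims about how the service applies its mask config; the function name and parameters are hypothetical, chosen only to show what "leave four characters unmasked and ignore dashes" looks like in practice.

```python
def mask_keep_last(value: str, keep: int = 4, mask_char: str = "X", ignore: str = "-") -> str:
    """Mask every character except the last `keep`, leaving `ignore` characters untouched."""
    # Count only the characters that are eligible for masking.
    maskable = sum(1 for ch in value if ch not in ignore)
    to_mask = max(maskable - keep, 0)
    out = []
    for ch in value:
        if ch in ignore or to_mask == 0:
            out.append(ch)  # separators and the trailing `keep` characters stay as-is
        else:
            out.append(mask_char)
            to_mask -= 1
    return "".join(out)


print(mask_keep_last("4916-6734-7572-5015"))  # prints XXXX-XXXX-XXXX-5015
```

In a real integration the redacted text should come from the Nightfall response rather than from local masking, since the service decides which spans are findings.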

Step 4: Send Redacted Prompt to Anthropic

Review the response to see if Nightfall has returned sensitive findings:

  • If there are sensitive findings:

    • You can specify a redaction config in your request so that sensitive findings are redacted automatically.

    • Without a redaction config, you can break out of the conditional statement, throw an exception, etc.

  • If no sensitive findings or you chose to redact findings with a redaction config:

    • Initialize the Anthropic SDK client (e.g., Anthropic Python client), or use the API directly to construct a request.

    • Construct your outgoing prompt.

    • If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.

    • Use the Anthropic API or SDK client to send the prompt to the AI model.
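
The branching described in the list above can be sketched as a small helper. This is one possible arrangement, not the only one: the exception class and function name are hypothetical, `findings` stands for whatever list of findings the scan returned for the prompt, and `redacted` is the redacted copy if a redaction config was used (otherwise `None`).

```python
from typing import Optional, Sequence


class SensitiveDataError(Exception):
    """Raised when a prompt has findings and no redacted copy is available."""


def choose_prompt(original: str, findings: Sequence[object], redacted: Optional[str]) -> str:
    """Return the text that is safe to send to the model, or raise if none is."""
    if not findings:
        return original  # nothing sensitive was detected
    if redacted:
        return redacted  # a redaction config produced a sanitized copy
    # Findings without a redacted copy: refuse to send rather than leak data.
    raise SensitiveDataError(f"{len(findings)} sensitive finding(s); prompt not sent")
```

Raising an exception in the no-redaction case forces the caller to handle the failure explicitly instead of silently forwarding the raw prompt.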

Python Example

import os
from dotenv import load_dotenv
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from anthropic import Anthropic

# Load environment variables
load_dotenv()

# Initialize clients
try:
    # By default Nightfall will read the NIGHTFALL_API_KEY environment variable
    nightfall = Nightfall()  

    # Initialize the Anthropic client with your API key
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

except Exception as e:
    print(f"Error initializing clients: {e}")
    exit(1)

# Example user input with sensitive information
user_input = "The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?"
payload = [user_input]

print("\nHere's the user's question before sanitization:\n", user_input)

# 2) Configure Nightfall detection and redaction
detection_rule = [DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)]

try:
    # 3) Classify, Redact, Filter Your User Input
    findings, redacted_payload = nightfall.scan_text(
        payload,
        detection_rules=detection_rule
    )

    # If the message has sensitive data, use the redacted version, otherwise use the original message
    user_input_sanitized = redacted_payload[0] if redacted_payload[0] else payload[0]

    print("\nHere's the user's question after sanitization:\n", user_input_sanitized)

    # Define your prompt: system text first, then a "\n\nHuman:" turn, ending with "\n\nAssistant:"
    prompt = "\nYou are a level 1 support bot. Your role is to assist users with common issues and provide helpful information. \n\nHuman: " + user_input_sanitized + "\n\nAssistant:"

    # 4) Send prompt to Anthropic model for AI-generated response
    response = client.completions.create(
        model="claude-2.1",
        prompt=prompt,
        max_tokens_to_sample=1024,
        temperature=0.7,
        top_p=1.0
    )

    print("\nHere's a generated response you can send the customer:\n", response.completion)

except Exception as e:
    print(f"An error occurred: {e}")

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

The message we ultimately sent to Anthropic was redacted:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

Claude can produce an equally useful response either way because it doesn't need the sensitive data to generate a cogent reply. This means we were able to leverage Claude just as easily, but we didn't risk sending Anthropic any unnecessary sensitive data. Now you are one step closer to leveraging generative AI safely in an enterprise setting.

OpenAI Prompt Sanitization Tutorial

Protecting Sensitive Information in AI Interactions: The Critical Role of Content Filtering

  • Personally Identifiable Information (PII)

  • Protected Health Information (PHI)

  • Financial details (e.g., credit card numbers, bank account information)

  • Intellectual property

Real-world scenarios highlight the urgency of this issue:

  1. Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.

  2. Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.

Steps to Identify and Sanitize ChatGPT Prompts

import os
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Example user input with sensitive information
user_input = "My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015."
payload = [user_input]

# 1) Get the Nightfall API key
nightfall = Nightfall() # By default Nightfall will read the NIGHTFALL_API_KEY environment variable

print("\nHere's the user's question before sanitization:\n", user_input)

# 2) Configure Nightfall detection and redaction
detection_rule = [DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)]

# 3) Classify, Redact, Filter Your User Input
# Send the message to Nightfall to scan it for sensitive data
# Nightfall returns the sensitive findings and a copy of your input payload with sensitive data redacted
findings, redacted_payload = nightfall.scan_text(
    payload,
    detection_rules=detection_rule
)

# If the message has sensitive data, use the redacted version; otherwise, use the original message
if redacted_payload[0]:
    user_input_sanitized = redacted_payload[0]
else:
    user_input_sanitized = payload[0]

print("\nHere's the user's question after sanitization:\n", user_input_sanitized)

# 4) Send prompt to OpenAI model for AI-generated response
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input_sanitized},
    ],
    max_tokens=1024,
)

print("\nHere's a generated response you can send the customer:\n", completion.choices[0].message.content)

Step 1: Setup Nightfall

Step 2: Configure Detection

Step 3: Classify, Redact, Filter Your User Input

Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.

For example, let's say we send Nightfall the following:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined.' How should I respond to the customer?

We get back the following redacted text:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined.' How should I respond to the customer?

Step 4: Send Redacted Prompt to OpenAI

Review the response to see if Nightfall has returned sensitive findings:

  • If there are sensitive findings:

    • You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.

    • Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.

  • If no sensitive findings or you chose to redact findings with a redaction config:

    • Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request.

    • Construct your outgoing prompt.

    • If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.

    • Use the OpenAI API or SDK client to send the prompt to the AI model.

Safely Leveraging Generative AI

You'll see that the message we originally intended to send had sensitive data:

The customer said: 'My credit card number is 4916-6734-7572-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?

The message we ultimately sent to OpenAI was redacted:

The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015 and the card is getting declined. My transaction number is 4916-6734-7572-5015.' How should I respond to the customer?

The model can produce an equally useful response either way because it doesn't need the sensitive data to generate a cogent reply. This means we were able to leverage ChatGPT just as easily, but we didn't risk sending OpenAI any unnecessary sensitive data. Now you are one step closer to leveraging generative AI safely in an enterprise setting.

HubSpot DLP Tutorial

Customer support tickets are a potential vector for leaking customer PII. By using HubSpot's CRM tickets API in conjunction with Nightfall AI's scan API, you can discover, classify, and remediate sensitive data within your customer support system.

You will need a few things to follow along with this tutorial:

  • A HubSpot account and API key

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment (version 3.6 or later)

  • Most recent version of Python Nightfall SDK

First, we install the required version of the Nightfall SDK:

We will be using Python and importing the following libraries:

We've configured the HubSpot and Nightfall API keys as environment variables so they don't need to be committed directly into our code.

We also instantiate a Nightfall client from the SDK using our API key.

Here we'll define the headers and other request parameters that we will be using later to call the Hubspot API.

Let's start by using the HubSpot API to retrieve all support tickets in our account. As the HubSpot API takes a "page limit" parameter, we will query the tickets over multiple requests to the HubSpot API, checking for list completion on each call. We'll compile the tickets into a list called all_tickets.

The first row of our all_findings object will constitute our headers since we will dump this object to a CSV file later. We won't include the sensitive fragments themselves to avoid replicating PII unnecessarily, but we'll include a redacted copy with 3 characters exposed to help identify it during the review process.

'Properties' -> 'Content' is the only field where users can supply their data, so it is the only field we need to pass to the Nightfall API. We store the ticket IDs in a matching list so that we can put a location to our findings later.

Now that we have a collection of all of our tickets, we will begin constructing an all_findings object to collect our results. The first row of our all_findings object will constitute our headers since we will dump this object to a CSV file later.

This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.

For each finding in each ticket, we collect the required information from the Nightfall API to identify and locate the sensitive data, pairing them with the HubSpot ticket IDs we set aside earlier.

Finally, we export our results to a CSV so they can be easily reviewed.

That's it! You now have insight into all of the sensitive data inside your customer support tickets. As a next step, we could utilize HubSpot's API to add a comment to tickets with sensitive findings, and then trigger an email alert for the offending ticket owner.

To scan your support tickets on an ongoing basis, you may consider persisting your last ticket query's paging value and/or checking the last modified date of your tickets.
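One way to persist the paging value between runs is a small JSON state file. The sketch below is an assumption-laden illustration: the file layout, the `paging_value` key, and both function names are hypothetical, and it stores whatever opaque paging value your last ticket query returned.

```python
import json
import os
import tempfile
from typing import Optional


def load_cursor(path: str) -> Optional[str]:
    """Return the paging value saved by the previous run, or None on first run."""
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle).get("paging_value")


def save_cursor(path: str, paging_value: str) -> None:
    """Persist the paging value so the next run can resume from it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then rename it over the old state
    # file so a crash mid-write never leaves a half-written cursor behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"paging_value": paging_value}, handle)
    os.replace(tmp_path, path)
```

A scheduled job would call `load_cursor` before querying tickets and `save_cursor` after each page has been scanned successfully.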

Using Redaction to Mask Findings

With the Nightfall API, you are also able to redact and mask your HubSpot findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.

Using the File Scanning Endpoint with Hubspot

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a process similar to the one we used with the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.

Prerequisites

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning passed via the header Authorization: Bearer (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)

Steps to use the Endpoint

  1. Retrieve ticket data from Hubspot

Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now set up our request parameters and retrieve the ticket data from HubSpot:

Now we write the ticket content to a .csv file.

  2. Begin the file upload process to the Scan API with the .csv file written above, as shown here.

  3. The scanning endpoint works asynchronously on the uploaded files, so you can monitor the webhook server to see the API responses and file scan findings as they come in.

Observability Protection

This section consists of various documents that assist you in scanning various popular observability platforms using Nightfall APIs.

Airtable DLP Tutorial

How to scan for sensitive data in Airtable

Airtable is a popular cloud collaboration tool that lands somewhere between a spreadsheet and a database. As such, it can house all sorts of sensitive data that you may not want to surface in a shared environment.

By utilizing Airtable's API in conjunction with Nightfall AI's scan API, you can discover, classify, and remediate sensitive data within your Airtable bases.

Prerequisites

You will need a few things to follow along with this tutorial:

  • An Airtable account and API key

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment (version 3.7 or later)

  • The most recent version of Python Nightfall SDK

Installation

Install the Nightfall SDK and the requests library using pip.

Creating the Example

To start, import all the libraries we will be using.

The json, os, and csv modules are part of the Python standard library, so we don't need to install them.

We've configured the Airtable and Nightfall API keys as environment variables so they are not written directly into the code.

Next, we define the Detection Rule with which we wish to scan our data.

We also instantiate a Nightfall client from the SDK using our API key.

The Airtable API doesn't list all bases in a workspace or all tables in a base; instead, you must specifically call each table to get its contents.

In this example, we have set up a config.json file to store that information for the Airtable My First Workspace bases. You may also wish to consider setting up a separate Base and Table that stores your schema and retrieves that information with a call to the Airtable API.

As an extension of this exercise, you could write Nightfall findings back to another table within that Base.

Now we set up the parameters we will need to call the Airtable API using the previously referenced API key and config file.

We will now call the Airtable API to retrieve the contents of our Airtable workspace. The data hierarchy in Airtable goes Workspace > Base > Table. We will need to perform a GET request on each table in turn.

As we go along, we will convert each data field into its own string, enriched with identifying metadata, so that we can locate and remediate the data later should sensitive findings occur.

🚧 Warning

If you are sending more than 50,000 items or more than 500KB, consider using the file API. You can learn more about how to use the file API in the Using the File Scanning Endpoint with Airtable section below.
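For payloads that only slightly exceed these thresholds, an alternative to switching endpoints is to split the list across several text scan requests. The following is a minimal sketch of such a splitter; `chunk_payload` is a hypothetical helper, and treating 500KB as 500,000 bytes is an assumption.

```python
from typing import Iterator, List

MAX_ITEMS = 50_000       # item threshold quoted in the warning above
MAX_BYTES = 500 * 1000   # 500KB threshold, interpreted here as 500,000 bytes


def chunk_payload(items: List[str],
                  max_items: int = MAX_ITEMS,
                  max_bytes: int = MAX_BYTES) -> Iterator[List[str]]:
    """Yield consecutive batches that stay under both limits.

    A single string larger than max_bytes is yielded alone; such an item
    is a candidate for the file API rather than the text endpoint.
    """
    batch: List[str] = []
    batch_bytes = 0
    for item in items:
        size = len(item.encode("utf-8"))
        # Close the current batch if adding this item would break a limit.
        if batch and (len(batch) >= max_items or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += size
    if batch:
        yield batch
```

Because batches preserve the input order, the index of an item within the original list can still be recovered by keeping a running offset across batches.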

Before moving on we will define a helper function to use later so that we can unpack the metadata from the strings we send to the Nightfall API.
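To illustrate the idea of packing and unpacking metadata, here is one possible encoding, offered as a sketch rather than the format this tutorial's own helper uses: the metadata travels as a single JSON line in front of the cell value. The function names and metadata keys are hypothetical.

```python
import json
from typing import Dict, Tuple


def pack(metadata: Dict[str, str], value: str) -> str:
    """Prefix a cell value with one line of JSON metadata."""
    # json.dumps escapes newlines, so the header always fits on one line.
    return json.dumps(metadata, sort_keys=True) + "\n" + value


def unpack(packed: str) -> Tuple[Dict[str, str], str]:
    """Split a packed string back into (metadata, original value)."""
    header, value = packed.split("\n", 1)
    return json.loads(header), value
```

With an encoding like this, any offsets reported for a finding are relative to the packed string, so the length of the header line plus one would need to be subtracted to locate the finding in the original cell.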

We will begin constructing an all_findings object to collect our results. The first row of our all_findings object will constitute our headers since we will dump this object to a CSV file later.

This example will include the full finding below. As the finding might be a piece of sensitive data, we recommend using the Redaction feature of the Nightfall API to mask your data.

Now we call the Nightfall API on content retrieved from Airtable. For every sensitive data finding we receive, we strip out the identifying metadata from the sent string and store it with the finding in all_findings so we can analyze it later.

Finally, we export our results to a CSV so they can be easily reviewed.

That's it! You now have insight into all of the sensitive data stored within your Airtable workspace!

As a next step, you could write your findings to a separate 'Nightfall Findings' Airtable base for review, or you could update and redact confirmed findings in situ using the Airtable API.

Using the File Scanning Endpoint with Airtable

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a process similar to the one we used with the text scanning endpoint. The process is broken down into the sections below, as the file scanning process is more intensive.

File Scan Prerequisites

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning passed via the header Authorization: Bearer (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)

File Scan Implementation

Retrieve Airtable Data

Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now set up our request parameters and retrieve the data we want from Airtable.

Now we write the data to a .csv file.

Upload to Scan API

Using the above .csv file, begin the Scan API file upload process.

Using the Scan Endpoint

Once the files have been uploaded, use the scan endpoint.

A webhook server is required for the scan endpoint to submit its results. See our example webhook server.

The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
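The core of a webhook receiver can be sketched as a pure function that maps a request body to a response. This is an illustration only: `handle_webhook` is a hypothetical name, and the `challenge` handshake (echoing a challenge value back as plain text when the webhook URL is verified) is an assumption that should be checked against the webhook documentation.

```python
import json
from typing import Tuple


def handle_webhook(raw_body: bytes) -> Tuple[int, str]:
    """Return (HTTP status, response body) for one incoming webhook POST."""
    try:
        message = json.loads(raw_body)
    except ValueError:
        return 400, "invalid JSON"
    if not isinstance(message, dict):
        return 400, "expected a JSON object"
    if "challenge" in message:
        # Assumed verification handshake: echo the challenge as plain text.
        return 200, str(message["challenge"])
    # Anything else is treated as a scan result notification. A real server
    # would verify the request signature and then store or fetch findings.
    return 200, "ok"
```

Keeping the logic separate from the HTTP framework makes it easy to wire into Flask, `http.server`, or a serverless function, and to test without opening a socket.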

Amazon Kinesis DLP Tutorial

Amazon Kinesis allows you to collect, process, and analyze real-time streaming data. In this tutorial, we will set up Nightfall DLP to scan Kinesis streams for sensitive data. An overview of what we are going to build is shown in the diagram below.

We will send data to Kinesis using a simple producer written in Python. Next, we will use an AWS Lambda function to send data from Kinesis to Nightfall. Nightfall will scan the data for sensitive information. If there are any findings returned by Nightfall, the Lambda function will write the findings to a DynamoDB table.

Prerequisites

To complete this tutorial you will need the following:

  • An AWS Account with access to Kinesis, Lambda, and DynamoDB

  • A Nightfall API Key

  • An existing Nightfall Detection Rule which contains at least one detector for email addresses.

Before continuing, you should clone the companion repository locally.

Configuring AWS Services

First, we will configure all of our required Services on AWS.

Create Execution Role

  1. Choose Create role.

  2. Create a role with the following properties:

    1. Lambda as the trusted entity

    2. Permissions

      • AWSLambdaKinesisExecutionRole

      • AmazonDynamoDBFullAccess

    3. Role name: nightfall-kinesis-role

Create Kinesis Data Stream

  1. Enter nightfall-demo as the Data stream name

  2. Enter 1 as the Number of open shards

  3. Select Create data stream

Create Lambda Function

  1. Choose Author from scratch and add the following Basic information:

    1. nightfall-lambda as the Function name

    2. Python 3.8 as the Runtime

    3. Select Change default execution role, Use an existing role, and select the previously created nightfall-kinesis-role

You should now see the previous sample code replaced with our Nightfall-specific Lambda function.

Next, we need to configure environment variables for the Lambda function.

Within the same Lambda view, select the Configuration tab and then select Environment variables.

Add the following environment variables that will be used during the Lambda function invocation.

  1. NIGHTFALL_API_KEY : your Nightfall API Key

  2. DETECTION_RULE_UUID : your Nightfall Detection Rule UUID.

🚧 Detection Rule Requirements

This tutorial uses a data set that contains a name, email, and random text. In order to see results, please make sure that the Nightfall Detection Rule you choose contains at least one detector for email addresses.

Lastly, we need to create a trigger that connects our Lambda function to our Kinesis stream.

  1. In the function overview screen on the top of the page, select Add trigger.

  2. Choose Kinesis as the trigger.

  3. Select the previously created nightfall-demo Kinesis stream.

  4. Select Add

Create DynamoDB Table

The last step in creating our demo environment is to create a DynamoDB table.

  1. Enter nightfall-findings as the Table Name

  2. Enter KinesisEventID as the Primary Key

Be sure to also run the following before the Lambda function is created:

This is to ensure that the required version of the Python SDK for Nightfall has been installed. We also need to install boto3.

Lambda Function Overview

Before we start processing the Kinesis stream data with Nightfall, we will provide a brief overview of how the Lambda function code works. The entire function is shown below:

This is a relatively simple function that does four things.

  1. Create a DynamoDB client using the boto3 library.

  2. Extract and decode data from the Kinesis stream and add it to a single list of strings.

  3. Create a Nightfall client using the nightfall library and scan the records that were extracted in the previous step.

  4. Iterate through the response from Nightfall. If there are findings for a record, we copy the record and findings metadata into a DynamoDB table. We need to process the list of Finding objects into a list of dicts before passing them to DynamoDB.
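The extract-and-decode step can be sketched on its own. Lambda delivers Kinesis records with the payload base64-encoded under `record["kinesis"]["data"]`; the helper name `extract_records` is hypothetical, and the sketch assumes the producer sent UTF-8 text.

```python
import base64
from typing import Any, Dict, List


def extract_records(event: Dict[str, Any]) -> List[str]:
    """Decode every Kinesis record in a Lambda event into a UTF-8 string."""
    return [
        base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        for record in event.get("Records", [])
    ]
```

The resulting list of strings has the same order as `event["Records"]`, which makes it possible to match each scan result back to the record's event ID.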

Sending Data to Kinesis

Now that you've configured all of the required AWS services, and understand how the Lambda function works, you're ready to start sending data to Kinesis and scanning it with Nightfall.

The script will send one record with the data shown above every 10 seconds.

Sample Data Script Usage Instructions

You can start sending data with the following steps:

  1. Open the companion repo that you cloned earlier in a terminal.

  2. Create and activate a new Python virtualenv

  3. Install dependencies

  4. Start sending data

If everything worked, you should see output similar to this in your terminal:

View Nightfall Findings in DynamoDB

As the data starts to get sent to Kinesis, the Lambda function that we created earlier will begin to process each record and check for sensitive data using the Nightfall Detection Rule that we specified in the configuration.

If Nightfall detects a record with sensitive data, the Lambda function will copy that record and additional metadata from Nightfall to the DynamoDB table that we created previously.

Conclusion

Clean Up

If you'd like to clean up the created resources in AWS after completing this tutorial you should remove the following resources:

  1. nightfall-kinesis-role IAM Role

  2. nightfall-demo Kinesis data stream

  3. nightfall-lambda Lambda Function

  4. nightfall-findings DynamoDB Table

Using Redaction to Mask Findings

With the Nightfall API, you are also able to redact and mask your Kinesis findings. You can add a Redaction Config, as part of your Detection Rule, as a section within the lambda function. For more information on how to use redaction with the Nightfall API, and its specific options, please refer to the guide here.

Rate Limits for Firewall APIs

To prevent misuse and ensure the stability of our platform, we enforce a rate limit on an API Key and endpoint basis, similar to the way many other APIs enforce rate limits.

When operating under our Free plan, accounts and their corresponding API Keys have a rate limit of 5 requests per second on average, with support for bursts of 15 requests per second. If you upgrade to a paid plan (the Enterprise plan), this rate increases to a limit of 10 requests per second on average and bursts of 50 requests per second.
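A client can stay under an average-plus-burst limit by throttling itself before sending. The token bucket below is one common way to do that; it is a client-side sketch, and whether the server enforces its limits with the same algorithm is not stated here. The class name and the injectable `clock` parameter are choices made for this example.

```python
import time


class TokenBucket:
    """Client-side limiter: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; False means the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For the Free plan figures above, `TokenBucket(rate=5, burst=15)` allows up to 15 requests immediately and then 5 more per second.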

The Nightfall API follows standard practices and conventions to signal when these rate limits have been exceeded.

Successful requests return a header X-Rate-Limit-Remaining with the integer number of requests remaining before errors will be returned to the client.

Request Rate Limiting

Your Request Rate Limiting throttles how frequently you can make requests to the API. You can monitor your rate limit usage via the `X-Rate-Limit-Remaining` header, which tells you how many remaining requests you can make within the next second before being throttled.
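A client can use this header to decide whether to pause before its next call. The helper below is a sketch: `seconds_to_wait` is a hypothetical name, and reading the retry delay of a throttled (HTTP 429) response from a standard `Retry-After` header is an assumption.

```python
from typing import Mapping


def seconds_to_wait(status_code: int, headers: Mapping[str, str]) -> float:
    """Suggest how long to pause before the next request.

    Header lookups assume a case-insensitive mapping (as HTTP client
    libraries typically provide) or keys spelled exactly as shown here.
    """
    if status_code == 429:
        # Assumption: the retry delay arrives in a Retry-After header.
        return float(headers.get("Retry-After", 1))
    remaining = headers.get("X-Rate-Limit-Remaining")
    if remaining is not None and int(remaining) <= 0:
        # No requests left in the current one-second window.
        return 1.0
    return 0.0
```

A request loop would call this after every response and pass the result to `time.sleep` before continuing.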

Quotas

Your Quota limits how many bytes of data you're permitted to scan within a given period. Your current remaining quota and the end of your current quota period are denoted by the following response headers.

Datadog DLP Tutorial

Datadog is a monitoring and analytics tool for information technology (IT) and DevOps teams that can be used to determine performance metrics as well as event monitoring for infrastructure and cloud services. This tutorial demonstrates how to use the Nightfall API for scanning your Datadog logs/metrics/events.

This tutorial allows you to scan your Datadog instance using the Nightfall API/SDK.

You will need a few things first to use this tutorial:

  • A Datadog account with an API key and Application key

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment (version 3.7 or later)

  • Python Nightfall SDK

We need to install the nightfall and requests library using pip. All the other libraries we will be using are built into Python.

We will be using Python and installing/importing the following libraries:

Note, we are setting the Datadog authentication information as the below environment variables, and referencing the values from there:

  • DD_API_KEY

  • DD_APPLICATION_KEY

Next, we instantiate a Nightfall client from the SDK using our API key.

First we will set up the connection with Datadog, and get the data to be scanned from there.

The three different code sample options below are for the three different available items from Datadog to scan:

  1. logs - Scans the 100 most recent logs from Datadog.

  2. metrics - Scans all active metric tags from the last 24 hours.

  3. events - Scans all events from the last 24 hours.

Each one of these options saves the data into a data_to_scan list of tuples where the first element in the tuple is the id of the data to scan and the second element is a string of data to scan.

Please follow that same option in the next few panes:

We then run a scan on the aggregated data using the Nightfall SDK. Since all of the examples create the same data_to_scan list, we can use the same code to scan them all.
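Because data_to_scan keeps each id next to its text, findings can be mapped back to their source by position. The sketch below assumes the scan returns one list of findings per submitted string, in the same order; `pair_findings` is a hypothetical helper and the findings are treated as opaque objects.

```python
from typing import Iterable, List, Sequence, Tuple


def pair_findings(data_to_scan: Sequence[Tuple[str, str]],
                  findings_per_item: Sequence[Iterable[object]]
                  ) -> List[Tuple[str, object]]:
    """Attach the source id to every finding.

    Assumes findings_per_item[i] holds the findings for data_to_scan[i].
    """
    if len(data_to_scan) != len(findings_per_item):
        raise ValueError("expected one findings list per scanned item")
    paired = []
    for (item_id, _text), findings in zip(data_to_scan, findings_per_item):
        for finding in findings:
            paired.append((item_id, finding))
    return paired
```

The strings to submit are simply `[text for _id, text in data_to_scan]`, so the ids never need to be sent to the API.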

To review the results, we will write the findings to an output csv file:

Note

The results of the scan will be outputted to a file named nf_datadog_output-TIMESTAMP.csv.
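Writing a file with that naming pattern can be sketched as follows. The column names and the `write_findings` helper are hypothetical; only the `nf_datadog_output-TIMESTAMP.csv` pattern comes from the note above.

```python
import csv
import os
import time
from typing import Iterable, Sequence


def write_findings(rows: Iterable[Sequence[object]], directory: str = ".") -> str:
    """Write a header plus one row per finding and return the file path."""
    filename = os.path.join(directory, f"nf_datadog_output-{int(time.time())}.csv")
    # newline="" stops the csv module from emitting blank lines on Windows.
    with open(filename, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "detector", "confidence", "finding"])
        writer.writerows(rows)
    return filename
```

If the findings themselves are sensitive, writing a redacted copy in the last column instead of the raw value keeps the CSV safe to share.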

This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.

Using Redaction to Mask Findings

With the Nightfall API, you are also able to redact and mask your Datadog findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.

Using the File Scanning Endpoint with Datadog

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a process similar to the one we used with the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.

Prerequisites

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning passed via the header Authorization: Bearer (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)

Steps to use the Endpoint

  1. Retrieve data from Datadog

Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now set up our connection and retrieve the data we want from Datadog. This can be logs, metrics, or events. The example below uses logs:

Now we write the logs to a .csv file.

  2. Begin the file upload process to the Scan API with the .csv file written above, as shown here.

  3. The scanning endpoint works asynchronously on the uploaded files, so you can monitor the webhook server to see the API responses and file scan findings as they come in.

Datastore Protection

This section consists of various documents that assist you in scanning various popular data stores using Nightfall APIs.

LangChain Prompt Sanitization Tutorial

LangChain Tutorial: Integrating Nightfall for Secure Prompt Sanitization

  • Personally Identifiable Information (PII)

  • Protected Health Information (PHI)

  • Financial details (e.g., credit card numbers, bank account information)

  • Intellectual property

Real-world scenarios highlight the urgency of this issue:

  1. Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.

  2. Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.

Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.

Python Example

Step 1: Setup Nightfall

Install the necessary packages using the command line:

Set up environment variables. Create a .env file in your project directory:

Step 2: Configure Detection

Step 3: Classify, Redact, Filter Your User Input

We'll create a custom LangChain component for Nightfall sanitization. This allows us to integrate content filtering into our LangChain pipeline seamlessly.

Explanation

  1. We start by importing necessary modules and loading environment variables.

  2. We initialize the Nightfall client and define detection rules for credit card numbers.

  3. The NightfallSanitizationChain class is a custom LangChain component that handles content sanitization using Nightfall.

  4. We set up the Anthropic LLM and create a prompt template for customer service responses.

  5. We create separate chains for sanitization and response generation, then combine them using SimpleSequentialChain.

  6. The process_customer_input function provides an easy-to-use interface for our chain.
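The overall shape of the pipeline, sanitize first and generate second, can be shown without any external services. In the sketch below a deliberately naive regular expression stands in for the Nightfall scan and a plain callable stands in for the LLM; `sanitize`, `run_pipeline`, and `CARD_PATTERN` are hypothetical names and not part of LangChain or the Nightfall SDK.

```python
import re
from typing import Callable

# Stand-in for the Nightfall scan: a naive pattern for 16-digit card
# numbers written in four groups separated by hyphens or spaces.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]){3}(\d{4})\b")


def sanitize(text: str) -> str:
    """Mask all but the last four digits of anything that looks like a card."""
    return CARD_PATTERN.sub(lambda m: "XXXX-XXXX-XXXX-" + m.group(1), text)


def run_pipeline(text: str, generate: Callable[[str], str]) -> str:
    """Run the sanitization step, then hand its output to the model step."""
    return generate(sanitize(text))
```

The key design point is that the model step only ever receives the output of the sanitization step, which is what chaining the two components sequentially guarantees.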

Error Handling and Logging

In a production environment, you might want to add more robust error handling and logging. For example:
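One possible shape for such a wrapper is sketched here, assuming the sanitization and generation steps are available as plain callables. `safe_process` and the logger name are hypothetical; the design choice illustrated is failing closed, so that a sanitization error never results in unscanned text being sent.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("prompt_sanitizer")


def safe_process(text: str,
                 sanitize: Callable[[str], str],
                 generate: Callable[[str], str]) -> Optional[str]:
    """Sanitize then generate, logging failures instead of crashing.

    Fails closed: if sanitization raises, the model is never called.
    """
    try:
        clean = sanitize(text)
    except Exception:
        # logger.exception records the traceback without logging the prompt.
        logger.exception("Sanitization failed; prompt was not sent")
        return None
    try:
        return generate(clean)
    except Exception:
        logger.exception("Model call failed")
        return None
```

Note that the log messages avoid including the prompt itself, since the prompt may contain the very data the sanitizer was meant to remove.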

Usage

To use this script, you can either run it directly or import the process_customer_input function in another script.

Running the Script Directly

Simply run the script:

This will process the example customer input and print the sanitized input and final response.

Using in Another Script

You can import the process_customer_input function in another script:

Expected Output

What does success look like?

If the example runs properly, you should expect to see an output demonstrating the sanitization process and the final response from Claude. Here's what the output might look like:

Scanning Text

The scan endpoint allows you to apply Policies and Detection Rules to a list of text strings provided as a payload.
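As an illustration of the request shape, the sketch below prepares a POST to the scan endpoint using only the standard library, without sending it. The URL and the `Accept` and `Authorization` headers mirror the Quickstart example; the JSON field names `policy`, `detectionRuleUUIDs`, and `payload` are assumptions to verify against the API reference, and `build_scan_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import List


def build_scan_request(api_key: str,
                       payload: List[str],
                       detection_rule_uuids: List[str]) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the text scan endpoint."""
    body = {
        # Assumed field names; check them against the API reference.
        "policy": {"detectionRuleUUIDs": detection_rule_uuids},
        "payload": payload,
    }
    return urllib.request.Request(
        url="https://api.nightfall.ai/v3/scan",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; reading the API key from the NIGHTFALL_API_KEY environment variable keeps it out of source control.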

You can read more about obtaining an API key or about our available data detectors from the linked reference guides.

And that's it! 🎉

Scrub Your Claude Chatbot Prompts to Prevent Sensitive Data Disclosure (OWASP LLM06)

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

Let's look at a Python example using Anthropic Claude and Nightfall's Python SDK. You can download this sample code here.

Get an API key for Nightfall and set environment variables. Learn more about creating an API key here.

Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

Let's look at a Python example using OpenAI and Nightfall's Python SDK. You can download this sample code here.

Get an API key for Nightfall and set environment variables. Learn more about creating an API key here.

Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Next, we define the Detection Rule with which we wish to scan our data. The Detection Rule is referenced by UUID.

We are now ready to call the Nightfall API to scan our HubSpot tickets. This tutorial assumes that the totality of your tickets falls under the payload limit of the Nightfall API. In practice, you may want to check the size of your payload and chunk it across multiple requests if appropriate.

Once the files have been uploaded, begin using the scan endpoint mentioned . Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.

The Detection Rule is referenced by UUID.

The AWS CLI installed and configured on your local machine.

Local copy of the companion repository for this tutorial.

Open the IAM roles page in the AWS console.

Open the Kinesis console and select Create Data Stream

Open the Lambda console and select Create function

Once the function has been created, in the Code tab of the Lambda function select Upload from and choose .zip file. Select the local nightfall-lambda-package.zip file that you cloned earlier from the companion repository and upload it to AWS Lambda.

Open the DynamoDB console and select Create table

We've included a sample script in the companion repository that allows you to send fake data to Kinesis. The data that we are going to be sending looks like this:

Before running the script, make sure that you have the AWS CLI installed and configured locally. The user that you are logged in with should have the appropriate permissions to add records to the Kinesis stream. This script uses the boto3 library, which handles authentication based on the credentials file that is created with the AWS CLI.

Congrats! You've successfully integrated Nightfall with Amazon Kinesis, Lambda, and DynamoDB. If you have an existing Kinesis Stream, you should be able to take the same Lambda Function that we used in this tutorial and start scanning that data without any additional changes.

Plan | Requests Per Second (Avg) | Burst
Free | 5 | 15
Enterprise | 10 | 50

When your application exceeds the rate limit for a given API endpoint, the Nightfall API will return an HTTP 429 "Too Many Requests" response. If your use case requires a higher rate limit, please reach out to the Nightfall team.

Additionally, these unsuccessful requests return the number of seconds to wait before retrying the request in a Header.

Response Headers
Type
Description

Next, we define the Detection Rule with which we wish to scan our data. The Detection Rule is referenced by UUID.

Once the files have been uploaded, begin using the scan endpoint mentioned . Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.

Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:

Let's examine this in a Python example using the LangChain, Anthropic, and Nightfall Python SDKs. You can download this sample code here.

If you don't yet have a Nightfall account, sign up here.

Create a Nightfall key. Here are the .

Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.

If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

You may use or

Text scanning supports the use of,, and as well as other .

For scanning files, see .

Note that you must generate an API key to send requests to the Nightfall API.
pip install nightfall==0.6.0
import requests
import os
import json
import csv
from nightfall import Nightfall
hubspot_api_key = os.environ.get('HUBSPOT_API_KEY')
nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')

detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

nightfall = Nightfall(nightfall_api_key)
hubspot_headers = {'accept': 'application/json'}
page_limit = 100
hubspot_querystring = {
  "limit": str(page_limit),
  "archived": "false",
  "hapikey": hubspot_api_key
}

hubspot_base_url = "https://api.hubapi.com/crm/v3/objects/tickets"
hubspot_response = requests.get(
  url = hubspot_base_url, 
  headers = hubspot_headers,
  params = hubspot_querystring
)
response_dict = json.loads(hubspot_response.text)
all_tickets = []

keep_going = True
while keep_going:
  all_tickets.extend(response_dict['results'])
  if len(response_dict['results']) < page_limit:
    keep_going = False
  else:
    new_url = f"{response_dict['paging']['next']['link']}&hapikey={hubspot_api_key}"
    new_response = requests.get(url = new_url, headers = hubspot_headers)
    response_dict = json.loads(new_response.text)
all_ids = [ticket['id'] for ticket in all_tickets]
all_content = [ticket['properties']['content'] for ticket in all_tickets]
# Scan the ticket contents against the Detection Rule referenced by UUID.
nightfall_response = nightfall.scan_text(
  [all_content],
  detection_rule_uuids=[detectionRuleUUID]
)

findings = json.loads(nightfall_response)
all_findings = []
all_findings.append(
  [
    'ticket_id', 'detector', 'confidence',
    'byte_start', 'byte_end', 'codepoint_start', 'codepoint_end',
    'finding'
  ]
)
for c_idx, ticket in enumerate(findings):
  for f_idx, finding in enumerate(ticket):
    row = [
      all_ids[c_idx], 
      finding['detector']['name'],
      finding['confidence'],
      finding['location']['byteRange']['start'],
      finding['location']['byteRange']['end'],
      finding['location']['codepointRange']['start'],
      finding['location']['codepointRange']['end'],
      finding['finding']
    ] 
    all_findings.append(row)
if len(all_findings) > 1:
  with open('output_file.csv', 'w') as output_file:
    csv_writer = csv.writer(output_file, delimiter = ',')
    csv_writer.writerows(all_findings)
else:
  print('No sensitive data detected. Hooray!')
hubspot_headers = {'accept': 'application/json'}
page_limit = 100
hubspot_querystring = {
  "limit": str(page_limit),
  "archived": "false",
  "hapikey": hubspot_api_key
}

hubspot_base_url = "https://api.hubapi.com/crm/v3/objects/tickets"

hubspot_response = requests.get(
  url = hubspot_base_url, 
  headers = hubspot_headers,
  params = hubspot_querystring
)
response_dict = json.loads(hubspot_response.text)
all_tickets = []

keep_going = True
while keep_going:
  all_tickets.extend(response_dict['results'])
  if len(response_dict['results']) < page_limit:
    keep_going = False
  else:
    new_url = f"{response_dict['paging']['next']['link']}&hapikey={hubspot_api_key}"
    new_response = requests.get(url = new_url, headers = hubspot_headers)
    response_dict = json.loads(new_response.text)
filename = "nf_hubspot_input-" + str(int(time.time())) + ".csv"  

# Open the file once, then write one row per ticket
with open(filename, 'w', newline='') as output_file:
  csv_writer = csv.writer(output_file, delimiter=',')
  for ticket in all_tickets:
    csv_writer.writerow([ticket['properties']['content']])
     
print("Hubspot Ticket Data Written to: ", filename)
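The pagination loop above stops when a page comes back shorter than `page_limit`, and it would raise a `KeyError` if the final page happened to be exactly full. A minimal alternative sketch is shown here, assuming the response carries a `paging.next.after` cursor alongside the `paging.next.link` field the tutorial reads. The `fetch_page` callable and the sample pages are hypothetical stand-ins for the HTTP request, so the example runs offline.

```python
def collect_tickets(fetch_page):
    """Gather results from every page of a cursor-paginated listing.

    fetch_page is a hypothetical callable: it takes an `after` cursor
    (None for the first page) and returns the decoded JSON dict.
    """
    tickets = []
    after = None
    while True:
        page = fetch_page(after)
        tickets.extend(page.get('results', []))
        # Assumption: the last page has no paging.next.after entry
        after = page.get('paging', {}).get('next', {}).get('after')
        if after is None:
            return tickets


# Stand-in pages keyed by cursor, replacing the real requests.get call
fake_pages = {
    None: {
        'results': [{'id': '1'}, {'id': '2'}],
        'paging': {'next': {'after': '2'}},
    },
    '2': {'results': [{'id': '3'}]},
}

all_tickets = collect_tickets(lambda after: fake_pages[after])
print([ticket['id'] for ticket in all_tickets])
```

In a real run, `fetch_page` would wrap `requests.get` and pass the cursor as an `after` query parameter.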
pip install nightfall==1.2.0
pip install requests
import requests
import json
import os
import csv
import time
from nightfall import Nightfall
nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')
airtable_api_key = os.environ.get('AIRTABLE_API_KEY')
detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

nightfall = Nightfall(nightfall_api_key)
[
    {
        "base_id": "appp4vxoDwgURFwYp",
        "base_name": "Product Planning",
        "tables": [
            "Stories", 
            "Epics", 
            "Sprints", 
            "Release Milestones", 
            "Facets", 
            "App Sections"
        ]
    },
    {
        "base_id": "appwWnUfLVJhltYQv",
        "base_name": "Product Launch",
        "tables": [
            "Features",
            "Product Themes",
            "Monthly Newsletters"
        ]
    }
]
airtable_config = json.load(open('config.json', 'r'))
airtable_base_url = 'https://api.airtable.com/v0'
airtable_headers = {
  "Authorization": f"Bearer {airtable_api_key}"
}
all_airtable = []

for base in airtable_config:
    base_id = base['base_id']
    req_tables = [i.replace(' ', '%20') for i in base['tables']]

    for table in req_tables:
        airtable_url = f"{airtable_base_url}/{base_id}/{table}"
        airtable_response = requests.get(airtable_url, headers=airtable_headers)
        airtable_content = json.loads(airtable_response.text)

        for i in airtable_content['records']:
            # We enrich each datum with metadata so it can be easily located later
            cur_str = f"BaseName: {base['base_name']} -|- BaseID: {base_id} -|- Table: {table} -|- Record: {i['id']} -|- Field: "

            for j in i['fields']:
                str_to_send = f"{cur_str}{j} -|- Content: {i['fields'][j]}"
                all_airtable.append(str_to_send)
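The loop above builds each table URL by replacing spaces with `%20`. Table names that contain other reserved characters, such as `/` or `&`, would need fuller percent-encoding. A small sketch using the standard library's `urllib.parse.quote` follows; the helper name `table_url` and the second table name are illustrative assumptions, not part of the tutorial.

```python
from urllib.parse import quote


def table_url(base_url, base_id, table_name):
    """Build a table URL with the table name fully percent-encoded."""
    # safe='' makes quote() encode '/' as well, so it cannot split the path
    return f"{base_url}/{base_id}/{quote(table_name, safe='')}"


airtable_base_url = 'https://api.airtable.com/v0'
print(table_url(airtable_base_url, 'appp4vxoDwgURFwYp', 'Release Milestones'))
print(table_url(airtable_base_url, 'appp4vxoDwgURFwYp', 'R&D/Plans'))
```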
def str_parser(sent_str):
    split_str = sent_str.split(' -|- ')
    split_dict = {i[:i.find(': ')]: i[i.find(': ')+2:] for i in split_str[:5]}
    findertext = f" -|- Field: {split_dict['Field']} -|- Content: "
    split_dict['Content'] = sent_str[sent_str.find(findertext)+len(findertext):]
    return split_dict
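To check how the helper behaves, the sketch below runs it on one hand-written string. The record ID `rec123` and the content are made-up sample values. The helper is repeated so the example runs on its own. The last lines also show one way to express the metadata offset in bytes, which may matter because findings report byte ranges while `len()` counts characters.

```python
def str_parser(sent_str):
    # Same logic as the helper above, repeated so this example is standalone
    split_str = sent_str.split(' -|- ')
    split_dict = {i[:i.find(': ')]: i[i.find(': ')+2:] for i in split_str[:5]}
    findertext = f" -|- Field: {split_dict['Field']} -|- Content: "
    split_dict['Content'] = sent_str[sent_str.find(findertext)+len(findertext):]
    return split_dict


# Hypothetical sample; the content deliberately contains the ' -|- ' separator
sample = (
    "BaseName: Product Planning -|- BaseID: appp4vxoDwgURFwYp -|- "
    "Table: Stories -|- Record: rec123 -|- Field: Notes -|- "
    "Content: Card 4916-6734-7572-5015 -|- still declined"
)

parsed = str_parser(sample)
char_offset = len(sample) - len(parsed['Content'])
# Byte length of the metadata prefix; equals char_offset for ASCII-only text
byte_offset = len(sample[:char_offset].encode('utf-8'))

print(parsed['Field'], '|', parsed['Content'])
print(char_offset, byte_offset)
```

Because the content is recovered by searching for the `Field` marker rather than by splitting, a separator inside the content itself is preserved.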
all_findings = []
all_findings.append(
  [
    'base_name', 'base_id', 'table_name', 'record_id', 'field',
    'detector', 'confidence', 
    'finding_start', 'finding_end', 'finding'
  ]
)
findings, redactions = nightfall.scan_text(
    all_airtable,
    detection_rule_uuids=[detectionRuleUUID]
)

# This level of loop corresponds to each list item sent to the Nightfall API
for field_idx, field_findings in enumerate(findings):
    
    sent_str = all_airtable[field_idx]
    # We call the helper function we defined earlier to help us parse the string sent to the Nightfall API
    parsed_str = str_parser(sent_str)
    offset = len(sent_str) - len(parsed_str['Content'])

    # This loop corresponds to each finding within an item sent to the Nightfall API
    for finding in field_findings:

        # If a finding is returned within the metadata for the content, we discount it
        if finding.byte_range.start < offset:
            continue

        # Add finding data to all_findings
        all_findings.append([
            parsed_str['BaseName'],
            parsed_str['BaseID'],
            parsed_str['Table'],
            parsed_str['Record'],
            parsed_str['Field'],
            finding.detector_name,
            finding.confidence.value,
            finding.byte_range.start,
            finding.byte_range.end,
            finding.finding
        ])
if len(all_findings) > 1:
    with open('output_file.csv', 'w') as output_file:
        csv_writer = csv.writer(output_file, delimiter = ',')
        csv_writer.writerows(all_findings)
else:
    print('No sensitive data detected. Hooray!')
airtable_config = json.load(open('config.json', 'r'))
airtable_base_url = 'https://api.airtable.com/v0'
airtable_headers = {
  "Authorization": f"Bearer {airtable_api_key}"
}

all_airtable = []
all_airtable.append(
  ['base_name', 'base_id', 'table_name', 'record_id', 'field', 'content']
)
filename = "nf_airtable_input-" + str(int(time.time())) + ".csv"

for base in airtable_config:
    base_id = base['base_id']
    req_tables = [i.replace(' ', '%20') for i in base['tables']]

    for table in req_tables:
        airtable_url = f"{airtable_base_url}/{base_id}/{table}"
        airtable_response = requests.get(airtable_url, headers=airtable_headers)
        airtable_content = json.loads(airtable_response.text)

        for i in airtable_content['records']:
            for j in i['fields']:
                # We enrich each datum with metadata so it can be easily located later
                # BaseName, BaseID, Table, Record, Field, Content
                row = [base['base_name'], base_id, table, i['id'], j, i['fields'][j]]
                all_airtable.append(row)

with open(filename, 'w') as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    csv_writer.writerows(all_airtable)

print("Airtable Data Written to: ", filename)
scan_id, message = nightfall.scan_file(
    filename,
    webhook_url=WEBHOOK_URL,
    detection_rule_uuids=[detectionRuleUUID],
)
git clone https://github.com/nightfallai/nightfall-kinesis-demo
pip install nightfall==1.2.0
pip install boto3
import os
import base64
import boto3
from nightfall import Nightfall


def lambda_handler(event, context):
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('nightfall-findings')

    records = []
    for record in event['Records']:
        # Kinesis data is base64 encoded so decode here
        payload = base64.b64decode(record["kinesis"]["data"])
        records.append(payload.decode("utf-8"))

    nightfall = Nightfall(
        os.environ.get('NIGHTFALL_API_KEY')
    )

    findings, redactions = nightfall.scan_text(
        records,
        detection_rule_uuids=[os.environ.get('DETECTION_RULE_UUID')]
    )

    for record_i, record_findings in enumerate(findings):
        if record_findings:
            formatted_findings = []
            for finding in record_findings:
                formatted_findings.append({
                    'Finding': finding.finding,
                    'BeforeContext': finding.before_context,
                    'AfterContext': finding.after_context,
                    'DetectorName': finding.detector_name,
                    'DetectorUUID': finding.detector_uuid,
                    'ByteStart': finding.byte_range.start,
                    'ByteStop': finding.byte_range.end,
                    'Confidence': finding.confidence.value,
                })

            table.put_item(
                Item={
                    'KinesisEventID': event['Records'][record_i]['eventID'],
                    'KinesisRecord': records[record_i],
                    'NightfallFindings': formatted_findings,
                }
            )
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('nightfall-findings')
records = []
for record in event['Records']:
    # Kinesis data is base64 encoded so decode here
    payload = base64.b64decode(record["kinesis"]["data"])
    records.append(payload.decode("utf-8"))
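The decoding step can be tried locally without AWS. The sketch below builds a one-record event by hand, mirroring only the keys the handler reads (`Records`, `kinesis`, `data`, `eventID`). The event ID and message values are hypothetical.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the UTF-8 text of every Kinesis record in a Lambda event."""
    return [
        base64.b64decode(record['kinesis']['data']).decode('utf-8')
        for record in event['Records']
    ]


# Hand-built event with a single base64-encoded record
message = json.dumps({'name': 'Jane Doe', 'message': 'hello'})
sample_event = {
    'Records': [{
        'eventID': 'shardId-000000000000:1',  # hypothetical ID
        'kinesis': {
            'data': base64.b64encode(message.encode('utf-8')).decode('ascii'),
        },
    }]
}

decoded = decode_kinesis_records(sample_event)
print(decoded)
```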
nightfall = Nightfall(
    os.environ.get('NIGHTFALL_API_KEY')
)
    
findings, redactions = nightfall.scan_text(
    records,
    detection_rule_uuids=[os.environ.get('DETECTION_RULE_UUID')]
)
for record_i, record_findings in enumerate(findings):
    if record_findings:
        formatted_findings = []
        for finding in record_findings:
            formatted_findings.append({
                'Finding': finding.finding,
                'BeforeContext': finding.before_context,
                'AfterContext': finding.after_context,
                'DetectorName': finding.detector_name,
                'DetectorUUID': finding.detector_uuid,
                'ByteStart': finding.byte_range.start,
                'ByteStop': finding.byte_range.end,
                'Confidence': finding.confidence.value,
            })

        table.put_item(
            Item={
                'KinesisEventID': event['Records'][record_i]['eventID'],
                'KinesisRecord': records[record_i],
                'NightfallFindings': formatted_findings,
           }
        )
'id': fake.uuid4(),
'name': fake.name(),
'email': fake.email(),
'message': fake.paragraph()
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
SENT TO KINESIS: {'id': '8a69f3f5-432e-4ec1-8295-e8b79236e36e', 'name': 'Jessica Henderson', 'email': '[email protected]', 'message': 'Eye evening ahead field. With energy all personal soon sense. Method decision TV that.'}
SENT TO KINESIS: {'id': 'd4a90b48-cbcd-45ca-a231-3edbbc0c4792', 'name': 'Thomas Cuevas', 'email': '[email protected]', 'message': 'People write from season. Upon drive before summer exactly tonight practice expert. Actually news reason particularly in should.'}
SENT TO KINESIS: {'id': '084083bc-114a-4cc5-8cd6-2e15fd26b6db', 'name': 'Nathan Ward', 'email': '[email protected]', 'message': 'Add school air visit physical range. Child that company late. Boy than remain. Early ability economy thought event option.'}

X-Quota-Remaining

string

The bytes remaining in your quota for this period. Will be reset to the amount specified in your billing plan at the end of your quota cycle.

X-Quota-Period-End

datetime

The date and time at which your quota will be reset, encoded as a string in RFC 3339 format.
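A client could read these two headers from any API response to decide when to slow down. The sketch below parses them from a plain dictionary. The header values are invented samples, and the `Z` replacement is a workaround for Python versions before 3.11, whose `datetime.fromisoformat` does not accept a trailing `Z`.

```python
from datetime import datetime, timezone


def parse_quota_headers(headers):
    """Return (bytes remaining, quota reset time) from response headers."""
    remaining = int(headers['X-Quota-Remaining'])
    # Normalise the UTC designator so older fromisoformat versions accept it
    raw_end = headers['X-Quota-Period-End'].replace('Z', '+00:00')
    return remaining, datetime.fromisoformat(raw_end)


# Hypothetical header values for illustration only
sample_headers = {
    'X-Quota-Remaining': '1048576',
    'X-Quota-Period-End': '2024-07-01T00:00:00Z',
}

remaining, period_end = parse_quota_headers(sample_headers)
print(remaining, period_end.isoformat())
```

With the `requests` library, `response.headers` can be passed in directly, since it supports the same key lookup.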

pip install nightfall==1.2.0
pip install requests
import argparse
import csv
import json
import os
import sys
import time
import collections

import requests
from nightfall import Nightfall
dd_api_key = os.environ.get('DD_API_KEY')
dd_application_key = os.environ.get('DD_APPLICATION_KEY')
nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')
detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')
nightfall = Nightfall(nightfall_api_key)
# This will return the most recent 100 logs from Datadog.

dd_url = 'https://api.datadoghq.com/api/v2/logs/events?page[limit]=100'

dd_headers = {
    'Content-Type': 'application/json',
    'DD-API-KEY': dd_api_key,
    'DD-APPLICATION-KEY': dd_application_key
}

try:
    response = requests.get(
      url=dd_url,
      headers=dd_headers
    )

    response.raise_for_status()

except requests.HTTPError:
    msg = f"ERROR: Datadog API returned: {response.status_code}"
    sys.exit(msg)


# List of log ids and their message
data_to_scan = []
for log in response.json()['data']:
    data_to_scan.append((log['id'], log['attributes']['message']))
"""Uses Datadog API to retrieve all metric names submitted in the last 24 hours
    from datadog, then iterates over all the tags attached """

from_time = int(time.time()) - 60 * 60 * 24 * 1
dd_list_metrics_url = f"https://api.datadoghq.com/api/v1/metrics?from={from_time}"
dd_metric_metadata_url = "https://api.datadoghq.com/api/v2/metrics/{metric_name}/all-tags"

dd_headers = {
    'Content-Type': 'application/json',
    'DD-API-KEY': dd_api_key,
    'DD-APPLICATION-KEY': dd_application_key
}

try:
    response = requests.get(
        url=dd_list_metrics_url,
        headers=dd_headers
    )

    response.raise_for_status()

except requests.HTTPError:
    msg = f"ERROR: Datadog API returned: {response.status_code}"
    sys.exit(msg)

# List of metrics and their tags
data_to_scan = []
for metric_name in response.json()["metrics"]:
    try:
        response = requests.get(
            url=dd_metric_metadata_url.format(metric_name=metric_name),
            headers=dd_headers
        )

        response.raise_for_status()

    except requests.HTTPError:
        msg = f"ERROR: Datadog API returned: {response.status_code}"
        sys.exit(msg)

    json_resp = response.json()["data"]
    data_to_scan.append((metric_name, str(json_resp["attributes"]["tags"])))
"""Uses Datadog API to retrieve all events submitted in the last 24 hours
    from datadog and extracts scannable content"""

to_time = int(time.time())
from_time = to_time - 60 * 60 * 24 * 1
dd_list_events_url = f"https://api.datadoghq.com/api/v1/events"

dd_headers = {
    'Content-Type': 'application/json',
    'DD-API-KEY': dd_api_key,
    'DD-APPLICATION-KEY': dd_application_key
}

events = []
while from_time < to_time:
    dd_query = {
        'start': from_time,
        'end': to_time,
    }

    try:
        response = requests.get(
            url=dd_list_events_url,
            headers=dd_headers,
            params=dd_query,
        )

        response.raise_for_status()

    except requests.HTTPError:
        msg = f"ERROR: Datadog API returned: {response.status_code}"
        sys.exit(msg)

    dd_resp = response.json()

    events += dd_resp["events"]
    if len(dd_resp["events"]) < 1000:
        break

    from_time = events[-1]["date_happened"]

# List of event urls and their tags, titles, and texts
data_to_scan = []
for e in events:
    data_to_scan.append((e["url"], str((e["tags"], e["title"], e["text"]))))
findings, redactions = nightfall.scan_text(
    [data[1] for data in data_to_scan],
    detection_rule_uuids=[detectionRuleUUID]
)
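When `data_to_scan` grows large, it may be preferable to send it in several smaller requests instead of one. The helper below is a generic batching sketch. The batch size of 2 is an arbitrary placeholder, not a documented Nightfall limit, and the sample pairs are invented.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical (id, text) pairs standing in for data_to_scan
data_to_scan = [
    ('log-1', 'a'), ('log-2', 'b'), ('log-3', 'c'),
    ('log-4', 'd'), ('log-5', 'e'),
]

batches = list(batched(data_to_scan, batch_size=2))
for batch in batches:
    texts = [data[1] for data in batch]
    print(len(texts), texts)
```

Each batch keeps its own order, so a finding at index `i` in a batch's results still maps to `batch[i][0]`.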
all_findings = []
all_findings.append(
    [
        'id', 'detector', 'confidence',
        'finding_start', 'finding_end', 'finding'
    ]
)

# Use a separate loop variable so the findings list is not shadowed
for finding_idx, item_findings in enumerate(findings):
    data_id = data_to_scan[finding_idx][0]

    for item in item_findings:
        row = [
            data_id,
            item.detector_name,
            item.confidence.value,
            item.byte_range.start,
            item.byte_range.end,
            item.finding,
        ]
        all_findings.append(row)


if len(all_findings) > 1:
    filename = "nf_datadog_output-" + str(int(time.time())) + ".csv"
    with open(filename, 'w') as output_file:
        csv_writer = csv.writer(output_file, delimiter=',')
        csv_writer.writerows(all_findings)
    print("Output findings written to", filename)

else:
    print('No sensitive data detected. Hooray!')
# This will return the most recent 100 logs from Datadog.

dd_url = 'https://api.datadoghq.com/api/v2/logs/events?page[limit]=100'

dd_headers = {
    'Content-Type': 'application/json',
    'DD-API-KEY': dd_api_key,
    'DD-APPLICATION-KEY': dd_application_key
}

try:
    response = requests.get(
        url=dd_url,
        headers=dd_headers
    )

    response.raise_for_status()

except requests.HTTPError:
    msg = f"ERROR: Datadog API returned: {response.status_code}"
    sys.exit(msg)


dd_data = response.json()['data']

scan_logs = [
    ['event_id', 'message']
]
for log in dd_data:
    scan_logs.append([
        log['id'],
        log['attributes']['message']
    ])
filename = "nf_datadog_input-" + str(int(time.time())) + ".csv"  

with open(filename, 'w') as output_file:
  csv_writer = csv.writer(output_file, delimiter=',')
  csv_writer.writerows(scan_logs)
     
print("Datadog Logs Written to: ", filename)

import os
from dotenv import load_dotenv
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from typing import Dict, List
from langchain.chains.base import Chain
from langchain.schema.language_model import BaseLanguageModel
from langchain.schema.prompt_template import BasePromptTemplate
from langchain.prompts import PromptTemplate
from langchain_anthropic import ChatAnthropic
from langchain.schema.runnable import RunnableSequence, RunnablePassthrough
from pydantic import Field

# Load environment variables
load_dotenv()

# 1) Setup Nightfall
# By default Nightfall will read the NIGHTFALL_API_KEY environment variable
nightfall = Nightfall()

# 2) Define a Nightfall detection rule
detection_rule = [DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)]

# 3) Classify, Redact, Filter Your User Input

# Setup Nightfall Chain element
class NightfallSanitizationChain(Chain):
    input_key: str = "input"
    output_key: str = "sanitized_input"

    @property
    def input_keys(self) -> List[str]:
        return [self.input_key]

    @property
    def output_keys(self) -> List[str]:
        return [self.output_key]

    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
        text = inputs[self.input_key]
        payload = [text]
        try:
            findings, redacted_payload = nightfall.scan_text(
                payload,
                detection_rules=detection_rule
            )
            sanitized_text = redacted_payload[0] if redacted_payload[0] else text
            print(f"\nsanitized input:\n {sanitized_text}")
        except Exception as e:
            print(f"Error in sanitizing input: {e}")
            sanitized_text = text
        return {self.output_key: sanitized_text}

# Initialize the Anthropic LLM
llm = ChatAnthropic(model="claude-2.1")

# Create a prompt template
template = "The customer said: '{customer_input}' How should I respond to the customer?"
prompt = PromptTemplate(template=template, input_variables=["customer_input"])

# Create the sanitization chain
sanitization_chain = NightfallSanitizationChain()

# Create the full chain using RunnableSequence
full_chain = (
    RunnablePassthrough() |
    sanitization_chain |
    (lambda x: {"customer_input": x["sanitized_input"]}) |
    prompt |
    llm
)

# Use the combined chain
customer_input = "My credit card number is 4916-6734-7572-5015, and the card is getting declined."
print(f"\ncustomer input:\n {customer_input}")
try:
    response = full_chain.invoke({"input": customer_input})
    print("\nmodel response:\n", response.content)
except Exception as e:
    print("An error occurred:", e)
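The mask settings in the detection rule above (`masking_char="X"`, four characters left unmasked, right-to-left, `-` ignored) can be previewed locally. The function below is a stand-alone illustration written for this page, not the Nightfall SDK's implementation. It assumes that right-to-left masking keeps the trailing characters visible, which matches the `XXXX-XXXX-XXXX-5015` form shown in this tutorial's sample output.

```python
def mask_preview(text, masking_char="X", keep=4, ignore=("-",)):
    """Mask all but the last `keep` characters, leaving ignored characters as-is."""
    out = []
    kept = 0
    for ch in reversed(text):
        if ch in ignore:
            out.append(ch)            # separators pass through untouched
        elif kept < keep:
            out.append(ch)            # trailing characters stay visible
            kept += 1
        else:
            out.append(masking_char)  # everything else is masked
    return "".join(reversed(out))


print(mask_preview("4916-6734-7572-5015"))
```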
pip install langchain langchain-anthropic anthropic nightfall python-dotenv
ANTHROPIC_API_KEY=your_anthropic_api_key
NIGHTFALL_API_KEY=your_nightfall_api_key
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def sanitize_input(text):
    payload = [text]
    try:
        findings, redacted_payload = nightfall.scan_text(
            payload,
            detection_rules=detection_rule  # already a list of DetectionRule
        )
        if findings:
            logger.info("Sensitive information detected and redacted")
        return redacted_payload[0] if redacted_payload[0] else text
    except Exception as e:
        logger.error(f"Error in sanitizing input: {e}")
        # Depending on your use case, you might want to return the original text or an error message
        return text
python secure_langchain.py
from secure_langchain import process_customer_input

customer_input = "My credit card 4916-6734-7572-5015 isn't working. Contact me at alice@example.com."
response = process_customer_input(customer_input)
print(response)
> Entering new SimpleSequentialChain chain...

> Finished chain.

Sanitized input: The customer said: 'My credit card number is XXXX-XXXX-XXXX-5015, and the card is getting declined.' How should I respond to the customer?

Final Response: I understand you're having trouble with your credit card (XXXX-XXXX-XXXX-5015) being declined. I apologize for the inconvenience. To assist you better, I'll need some additional information...
curl --request POST \
     --url https://api.nightfall.ai/v3/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "minNumFindings": 1,
                              "minConfidence": "LIKELY",
                              "displayName": "US Social Security Number",
                              "detectorType": "NIGHTFALL_DETECTOR",
                              "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER"
                         }
                    ],
                    "name": "My Match Rule",
                    "logicalOp": "ANY"
               }
          ]
     },
     "payload": [
          "The customer social security number is 458-02-6124",
          "No PII in this string"
     ]
}
'
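For readers who prefer Python to curl, the same request can be assembled with the standard library. This is a sketch that mirrors the curl example's URL, headers, and body. The function name `build_scan_request` is an illustrative assumption, and the network call is left commented out so the example runs offline.

```python
import json
import os
import urllib.request


def build_scan_request(api_key, payload):
    """Build (but do not send) a POST request matching the curl example."""
    body = {
        "policy": {
            "detectionRules": [{
                "detectors": [{
                    "minNumFindings": 1,
                    "minConfidence": "LIKELY",
                    "displayName": "US Social Security Number",
                    "detectorType": "NIGHTFALL_DETECTOR",
                    "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER",
                }],
                "name": "My Match Rule",
                "logicalOp": "ANY",
            }]
        },
        "payload": payload,
    }
    return urllib.request.Request(
        "https://api.nightfall.ai/v3/scan",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_scan_request(
    os.environ.get("NIGHTFALL_API_KEY", "NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123"),
    ["The customer social security number is 458-02-6124", "No PII in this string"],
)
print(request.get_method(), request.full_url)

# Uncomment to send the request with a real API key:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```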
Datadog DLP Tutorial
New Relic DLP Tutorial
Airtable DLP Tutorial
Amazon Kinesis DLP Tutorial
Amazon RDS DLP Tutorial - Full Scan
Amazon RDS DLP Tutorial
Amazon S3 DLP Tutorial
Elasticsearch DLP Tutorial
Snowflake DLP Tutorial
The Nightfall Detection Rules page
Copying a UUID for a Detection Rule
Creating a New Detection Rule
Selecting Detectors for a Detection Rule
Setting confidence levels and minimum findings for a Detection Rule

Search posture events

get

Fetch a list of posture events based on some filters

Authorizations
Query parameters
createdAfter · integer · Optional

Unix timestamp in seconds, filters records created ≥ the value, defaults to -180 days UTC

createdBefore · integer · Optional

Unix timestamp in seconds, filters records created < the value, defaults to end of the current day UTC

updatedAfter · integer · Optional

Unix timestamp in seconds, filters records updated > the value

limit · integer · max: 100 · Optional

The maximum number of records to be returned in the response

Default: 50

pageToken · string · Optional

Cursor for getting the next page of results

sort · string · enum · Optional

Sort key and direction, defaults to descending order by creation time

Default: TIME_DESC

query · string · Required

The query containing filter clauses

Search query language

Query structure and terminology

A query clause consists of a field followed by an operator followed by a value:

| term | value |
| --- | --- |
| clause | user_email:"amy@rocketrides.io" |
| field | user_email |
| operator | : |
| value | amy@rocketrides.io |

You can combine multiple query clauses in a search by separating them with a space.

Field types, substring matching, and numeric comparators

Every search field supports exact matching with a :. Certain fields such as user_email and user_name support substring matching.

Quotes

You may use quotation marks around string values. Quotation marks are required when the value contains spaces. For example:

  • user_email:john@example.com
  • user_name:"John Doe"

Special Characters

+ - && || ! ( ) { } [ ] ^ " ~ * ? : are special characters that need to be escaped using \. For example:

  • a value like (1+1):2 should be searched for using \(1\+1\)\:2
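The escaping rule can be applied programmatically before a value is placed in a query. The helper below is a minimal sketch. Its name is an assumption, and it escapes every single `&` and `|` character even though only the doubled forms are listed as special, which errs on the side of over-escaping.

```python
# Characters listed as special in the search query language
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:')


def escape_query_value(value):
    """Prefix each special character in value with a backslash."""
    return ''.join('\\' + ch if ch in SPECIAL_CHARS else ch for ch in value)


print(escape_query_value('(1+1):2'))
print(escape_query_value('john@example.com'))
```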

Search Syntax

The following table lists the syntax that you can use to construct a query.

| SYNTAX | USAGE | DESCRIPTION | EXAMPLES |
| --- | --- | --- | --- |
| : | field:value | Exact match operator (case insensitive) | state:"pending" returns records where the state is exactly "PENDING" in a case-insensitive comparison |
| (space) | field1:value1 field2:value2 | The query returns only records that match both clauses | state:active slack.channel_name:general |
| OR | field:(value1 OR value2) | The query returns records that match either of the values (case insensitive) | state:(active OR pending) |
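The syntax rules above can be combined into a small query builder. The sketch below joins clauses with spaces, groups multiple values with `OR`, and quotes values that contain spaces. The helper names are illustrative assumptions, and the URL reuses the host and search path shown in this reference's request example.

```python
from urllib.parse import urlencode


def clause(field, value):
    """Render one field:value clause, using OR grouping for several values."""
    values = value if isinstance(value, (list, tuple)) else [value]
    rendered = [f'"{v}"' if ' ' in v else v for v in values]
    if len(rendered) == 1:
        return f"{field}:{rendered[0]}"
    return f"{field}:({' OR '.join(rendered)})"


def build_query(filters):
    """Join clauses with spaces, so a record must match every clause."""
    return ' '.join(clause(field, value) for field, value in filters.items())


query = build_query({'state': ['active', 'pending'], 'user_name': 'John Doe'})
url = 'https://api.nightfall.ai/posture/v1/events/search?' + urlencode(
    {'query': query, 'limit': 50}
)
print(query)
print(url)
```

Values containing special characters would still need escaping before being passed to `clause`.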

Query Fields

param description
event_id the unique identifier of the posture event to filter on
integration_name the name of the integration to filter on
state the state of the event to filter on (active, pending, resolved, expired)
event_type the type of posture event to filter on
actor_name the name of the actor who performed the action to filter on
actor_email the email of the actor who performed the action to filter on
user_name the username of the user to filter on (backward compatibility)
user_email the email of the user to filter on (backward compatibility)
notes the comment or notes associated with the event to filter on
policy_id the unique identifier of the policy to filter on
policy_name the name of the policy to filter on
resource_id the identifier of the resource to filter on
resource_name the name of the resource to filter on
resource_owner_name the name of the resource owner to filter on
resource_owner_email the email of the resource owner to filter on
resource_content_type the content type of the resource to filter on
endpoint.device_id the device identifier for endpoint events to filter on
endpoint.machine_name the machine name for endpoint events to filter on
gdrive.permission the permission setting for Google Drive files to filter on
gdrive.shared_internal_email the internal emails with which the file is shared to filter on
gdrive.shared_external_email the external emails with which the file is shared to filter on
gdrive.drive the Google Drive name to filter on
gdrive.file_owner the owner of the Google Drive file to filter on
gdrive.label_name the label name applied to Google Drive files to filter on
salesforce.report.scope the scope of the Salesforce report to filter on
salesforce.report.event_source the event source of the Salesforce report to filter on
salesforce.report.source_ip the source IP address of the Salesforce report to filter on
salesforce.report.session_level the session level of the Salesforce report to filter on
salesforce.report.operation the operation type of the Salesforce report to filter on
salesforce.report.description the description of the Salesforce report to filter on
salesforce.file.source_ip the source IP address for Salesforce file events to filter on
salesforce.file.session_level the session level for Salesforce file events to filter on
Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
get
GET /posture/v1/events/search HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "events": [
    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "integration": "text",
      "createdAt": 1,
      "state": "text",
      "eventType": "text",
      "policyUUIDs": [
        "123e4567-e89b-12d3-a456-426614174000"
      ],
      "assetsCount": 1,
      "userInfo": {
        "username": "text",
        "userEmail": "name@gmail.com",
        "userProfileLink": "https://example.com",
        "deviceId": "text",
        "machineName": "text",
        "isExternal": true
      },
      "appInfo": {
        "id": "text",
        "name": "text"
      }
    }
  ],
  "nextPageToken": "text"
}

Fetch posture events

get

Fetch a list of posture events for a period

Authorizations
Query parameters
createdAfter · integer · Optional

Unix timestamp in seconds, filters records created ≥ the value, defaults to -90 days UTC

createdBefore · integer · Optional

Unix timestamp in seconds, filters records created < the value, defaults to end of the current day UTC

updatedAfter · integer · Optional

Unix timestamp in seconds, filters records updated > the value

limit · integer · max: 100 · Optional

The maximum number of records to be returned in the response

Default: 50

pageToken · string · Optional

Cursor for getting the next page of results

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
get
GET /posture/v1/events HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "events": [
    {
      "id": "123e4567-e89b-12d3-a456-426614174000",
      "integration": "text",
      "createdAt": 1,
      "state": "text",
      "eventType": "text",
      "policyUUIDs": [
        "123e4567-e89b-12d3-a456-426614174000"
      ],
      "assetsCount": 1,
      "userInfo": {
        "username": "text",
        "userEmail": "name@gmail.com",
        "userProfileLink": "https://example.com",
        "deviceId": "text",
        "machineName": "text",
        "isExternal": true
      },
      "appInfo": {
        "id": "text",
        "name": "text"
      }
    }
  ],
  "nextPageToken": "text"
}
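Since each response carries a `nextPageToken`, a client can keep requesting pages until the token stops appearing. The generator below sketches that loop. `fetch_page` is a hypothetical callable standing in for the HTTP request, and treating a missing or empty token as the end of results is an assumption, not documented behaviour.

```python
def iter_posture_events(fetch_page):
    """Yield every event, following nextPageToken until it is absent.

    fetch_page is a hypothetical callable: it takes a pageToken (None for
    the first page) and returns the decoded JSON body of one response.
    """
    token = None
    while True:
        body = fetch_page(token)
        yield from body.get('events', [])
        token = body.get('nextPageToken')
        if not token:
            break


# Stand-in pages keyed by token, replacing real API calls
fake_pages = {
    None: {'events': [{'id': 'a'}, {'id': 'b'}], 'nextPageToken': 'p2'},
    'p2': {'events': [{'id': 'c'}]},
}

event_ids = [event['id'] for event in iter_posture_events(lambda t: fake_pages[t])]
print(event_ids)
```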

Fetch posture event details

get

Fetch a posture event's details by ID

Authorizations
Path parameters
eventId · string · uuid · Required

The UUID of the event to fetch

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
404
Event does not exist
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
get
GET /posture/v1/events/{eventId} HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "assets": {
    "id": "text",
    "name": "text",
    "path": "text",
    "sizeBytes": 1,
    "mimetype": "text",
    "owner": {
      "id": "text",
      "email": "name@gmail.com",
      "comment": "text",
      "metadata": {
        "gdrive": {
          "userBelongsToGroups": [
            "text"
          ],
          "isAdmin": true,
          "isSuspended": true,
          "createdAt": 1
        },
        "salesforce": {}
      }
    },
    "comment": "text",
    "ddrViolationIDs": [],
    "metadata": {
      "gdrive": {
        "fileID": "text",
        "fileName": "text",
        "fileSize": "text",
        "fileLink": "text",
        "permissionSetting": "text",
        "sharingExternalUsers": [
          "text"
        ],
        "sharingInternalUsers": [
          "text"
        ],
        "canViewersDownload": true,
        "fileOwner": "text",
        "isInTrash": true,
        "createdAt": 1,
        "updatedAt": 1,
        "drive": "text",
        "labels": [
          "text"
        ],
        "filePermissionType": "text"
      },
      "salesforce": {
        "resourceType": "text",
        "fileResourceMetadata": {
          "fileAction": "text",
          "sourceIP": "text",
          "sessionLevel": "text"
        },
        "reportResourceMetadata": {
          "description": "text",
          "displayEntityFields": [
            "text"
          ],
          "dashboardName": "text",
          "scope": "text",
          "operation": "text",
          "recordCount": 1,
          "queriedEntities": [
            "text"
          ],
          "groupedColumnHeaders": [
            "text"
          ],
          "columnCount": 1,
          "processedRowCount": 1,
          "sourceIP": "text",
          "eventSource": "text",
          "sessionLevel": "text"
        },
        "bulkApiResourceMetadata": {
          "query": "text",
          "eventIdentifier": "text",
          "sourceIP": "text",
          "sessionKey": "text",
          "sessionLevel": "text"
        }
      }
    }
  },
  "actor": {
    "id": "text",
    "email": "name@gmail.com",
    "comment": "text",
    "metadata": {
      "gdrive": {
        "userBelongsToGroups": [
          "text"
        ],
        "isAdmin": true,
        "isSuspended": true,
        "createdAt": 1
      },
      "salesforce": {}
    }
  },
  "events": {
    "type": "PERMISSION_CHANGE",
    "timestamp": 1,
    "metadata": {
      "gdrive": {
        "originatingAppId": "text",
        "originatingAppName": "text",
        "isClientSyncEvent": true
      },
      "salesforce": {
        "sourceIP": "text",
        "sessionLevel": "text",
        "sessionKey": "text",
        "sfUserId": "text"
      }
    },
    "assetIDs": []
  }
}

Fetch asset activity

get

Fetch the activity history for a specific asset

Authorizations
Query parameters
assetID (string, required)

The ID of the asset to fetch activities for

rangeStart (integer, required)

Unix timestamp in seconds; filters activities created ≥ the value

rangeEnd (integer, required)

Unix timestamp in seconds; filters activities created < the value

pageToken (string, optional)

Cursor for getting the next page of results

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
get
GET /posture/v1/asset/activity HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "activities": [
    {
      "type": "DOWNLOAD",
      "userEmail": "name@gmail.com",
      "eventTime": 1,
      "assetNames": [
        "text"
      ],
      "metadata": {
        "downloadEventMetadata": {
          "source": "text",
          "fileName": "text"
        },
        "browserUploadMetadata": {
          "domain": "text",
          "fileName": "text"
        },
        "cloudSyncMetadata": {
          "cloudApp": "text",
          "fileName": "text"
        },
        "clipboardMetadata": {
          "browserMetadata": {
            "domain": "text"
          }
        }
      }
    }
  ],
  "nextPageToken": "text"
}
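
The pageToken and nextPageToken fields suggest cursor-style pagination. The following is a minimal sketch of collecting every page for one asset, assuming that a missing or empty nextPageToken marks the last page. The helper names (http_get_json, fetch_asset_activities) are hypothetical and not part of any Nightfall SDK; only the endpoint path, query parameters, and response fields come from the reference above.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.nightfall.ai/posture/v1/asset/activity"


def http_get_json(url, api_key):
    """Perform one authenticated GET request and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_asset_activities(asset_id, range_start, range_end, api_key,
                           get_json=http_get_json):
    """Collect the activities from every page for one asset."""
    activities = []
    page_token = None
    while True:
        params = {
            "assetID": asset_id,
            "rangeStart": range_start,
            "rangeEnd": range_end,
        }
        if page_token:
            params["pageToken"] = page_token
        url = f"{API_URL}?{urllib.parse.urlencode(params)}"
        page = get_json(url, api_key)
        activities.extend(page.get("activities", []))
        page_token = page.get("nextPageToken")
        # Assumption: an absent or empty token means there are no more pages.
        if not page_token:
            return activities
```

The get_json parameter exists only so the HTTP call can be swapped out, for example to add retries on a 429 response or to test the loop without network access.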

Fetch actor activity

get

Fetch the activity history for a specific actor

Authorizations
Query parameters
actorID (string, required)

The Nightfall ID of the actor to fetch activities for

rangeStart (integer, required)

Unix timestamp in seconds; filters activities created ≥ the value

rangeEnd (integer, required)

Unix timestamp in seconds; filters activities created < the value

pageToken (string, optional)

Cursor for getting the next page of results

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
get
GET /posture/v1/actor/activity HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "activities": [
    {
      "type": "DOWNLOAD",
      "userEmail": "name@gmail.com",
      "eventTime": 1,
      "assetNames": [
        "text"
      ],
      "metadata": {
        "downloadEventMetadata": {
          "source": "text",
          "fileName": "text"
        },
        "browserUploadMetadata": {
          "domain": "text",
          "fileName": "text"
        },
        "cloudSyncMetadata": {
          "cloudApp": "text",
          "fileName": "text"
        },
        "clipboardMetadata": {
          "browserMetadata": {
            "domain": "text"
          }
        }
      }
    }
  ],
  "nextPageToken": "text"
}
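
Because rangeStart is inclusive (≥) and rangeEnd is exclusive (<), consecutive requests can share a boundary without returning the same activity twice. The sketch below splits a longer period into one-day windows expressed in Unix seconds. The function name daily_windows and the one-day window size are illustrative choices, not API requirements.

```python
from datetime import datetime, timedelta, timezone


def daily_windows(start, end):
    """Yield (rangeStart, rangeEnd) pairs in Unix seconds covering [start, end)."""
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=1), end)
        # The upper bound of one window is the lower bound of the next,
        # which is safe because rangeEnd is exclusive.
        yield int(cursor.timestamp()), int(upper.timestamp())
        cursor = upper


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 3, 12, tzinfo=timezone.utc)
for range_start, range_end in daily_windows(start, end):
    print(range_start, range_end)
```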

What can I do with Firewall for AI?

Firewall for AI is a powerful API that acts as a middleware layer or client wrapper to protect your AI models from consuming sensitive data. By integrating Firewall for AI into your application via API calls, you can proactively prevent data leaks and maintain compliance without disrupting your existing workflows or model updates.

What types of data can I scan with the API?

Firewall for AI provides a flexible and extensible API that allows you to scan a wide variety of data types, including plain text, structured and unstructured files, and even images. Our API can handle data in various formats such as JSON, XML, CSV, and more. Visit our detector glossary at docs.nightfall.ai/docs/detector-glossary to explore the comprehensive list of supported data types and file formats.

What types of detectors are supported out of the box?

Firewall for AI offers a rich set of pre-built detectors that can identify many different types of sensitive data, including personally identifiable information (PII), payment card industry data (PCI), protected health information (PHI), secrets, and credentials. These detectors are powered by advanced machine learning models and can be easily integrated into your application with just a few lines of code. Refer to our detector glossary at docs.nightfall.ai/docs/detector-glossary for a complete list of available detectors.

How do I know my data is secure?

At Nightfall, data security and privacy are our top priorities. We have implemented stringent security measures to protect your sensitive data at every stage of the scanning process. All data transmitted to our API is encrypted in transit using industry-standard protocols. We adhere to best practices for secure coding, undergo regular security audits, and maintain compliance with relevant security standards. Visit our security and compliance page at nightfall.ai/security for more details on our commitment to data protection.

Overview

This section contains use case tutorials for various Firewall for AI scenarios. It covers the following tutorials:

  • Deploy a File Scanner for Sensitive Data in 40 Lines of Code

  • Redacting Sensitive Data in 4 Lines of Code

  • Detecting Sensitive Data in SMS Automations

  • Building Endpoint DLP to Detect PII on Your Machine in Real-Time

Snowflake DLP Tutorial

Snowflake is a data warehouse built on top of Amazon Web Services or Microsoft Azure cloud infrastructure. This tutorial demonstrates how to scan a Snowflake database using the Nightfall API and SDK.

You will need a few things first to use this tutorial:

  • A Snowflake account with at least one database

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • The most recent version of the Nightfall Python SDK

We will first install the required Snowflake Python connector modules and the Nightfall SDK that we need to work with:

pip install snowflake-connector-python
pip install nightfall==0.6.0

To accomplish this, we will be using Python and importing the following libraries:

import requests
import snowflake.connector
import os
import sys
import json
from nightfall import Nightfall

We will set the size and length limits for data allowed by the Nightfall API per request.

size_limit = 500000
length_limit = 50000

Next, we read our API key and Detection Rule UUID from environment variables and create a Nightfall client from the SDK.

nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])
detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

First we will set up the connection with Snowflake, and get the data to be scanned from there.

Note that the Snowflake authentication information is stored in the environment variables below, and the code reads the values from there:

  • SNOWFLAKE_USER

  • SNOWFLAKE_PASSWORD

  • SNOWFLAKE_ACCOUNT

  • SNOWFLAKE_DATABASE

  • SNOWFLAKE_SCHEMA

  • SNOWFLAKE_TABLE

  • SNOWFLAKE_PRIMARY_KEY

connection = snowflake.connector.connect(
  user=os.environ.get('SNOWFLAKE_USER'),
  password=os.environ.get('SNOWFLAKE_PASSWORD'),
  account=os.environ.get('SNOWFLAKE_ACCOUNT'),
  schema=os.environ.get('SNOWFLAKE_SCHEMA'),
  database=os.environ.get('SNOWFLAKE_DATABASE')
)
table_name = os.environ.get('SNOWFLAKE_TABLE')
primary_key = os.environ.get('SNOWFLAKE_PRIMARY_KEY')

cursor = connection.cursor()

sql = f"""
        SELECT *
        FROM {table_name}
        LIMIT 1000;
        """

cursor.execute(sql)

cols = [i[0] for i in cursor.description]
data = cursor.fetchall()
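
Because the table name is interpolated directly into the SQL string, it may be worth checking it before the query runs. The following is a minimal sketch, assuming the table is referenced with unquoted identifiers (letters, digits, underscores, and dollar signs, optionally qualified as database.schema.table). The helper safe_table_name is hypothetical and not part of the Snowflake connector.

```python
import re

# Unquoted Snowflake identifiers start with a letter or underscore and may
# contain letters, digits, underscores, and dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def safe_table_name(name):
    """Return name unchanged if every dot-separated part is a plain identifier."""
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or not all(_IDENTIFIER.fullmatch(p) for p in parts):
        raise ValueError(f"Unsafe table name: {name!r}")
    return name
```

With this helper, the assignment could read table_name = safe_table_name(os.environ.get('SNOWFLAKE_TABLE')), so a malformed value fails before any SQL is sent. Quoted identifiers with spaces or special characters would need a different check.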

We can then check the data size, and as long as it is below the aforementioned limits, it can be run through the API.

If the data payloads are larger than the size or length limits of the API, extra code will be required to further chunk the data into smaller bits that are processable by the Nightfall scan API.

This can be seen in the second and third code panes below:

primary_key_col = []

if len(data) == 0:
    raise Exception('Table is empty! No data to scan.')

all_findings = []
for col_idx, col in enumerate(cols):
    # Collect every value in this column as a string
    payload = [str(row[col_idx]) for row in data]
    if col == primary_key:
        primary_key_col = payload
    col_size = sys.getsizeof(payload)

    # Scan the column in a single request only if it fits within the size limit
    if col_size < size_limit:
        resp = nightfall.scanText(
            [payload],
            detection_rule_uuids=[detectionRuleUUID])
        col_resp = json.loads(resp)

        # Tag each finding with its column and row identifier
        for item_idx, item in enumerate(col_resp):
            if item is not None:
                for finding in item:
                    finding['column'] = col
                    try:
                        finding['index'] = primary_key_col[item_idx]
                    except IndexError:
                        finding['index'] = item_idx
                    all_findings.append(finding)
# Runs inside the column loop when col_size >= size_limit: split the column into chunks
col_resp = []
chunks = []
chunk = []
running_size = 0
big_items = []

for item_idx, item in enumerate(payload):
    item_size = sys.getsizeof(item)
    if (running_size + item_size < size_limit) and (len(chunk) < length_limit):
        chunk.append(item)
        running_size += item_size
    elif item_size < size_limit:
        # The current chunk is full: start a new one with this item
        chunks.append(chunk)
        chunk = [item]
        running_size = item_size
    else:
        # The item alone exceeds the limit: keep an empty placeholder so that
        # indices stay aligned, and scan the item separately afterwards
        if len(chunk) >= length_limit:
            chunks.append(chunk)
            chunk = []
            running_size = 0
        chunk.append('')
        big_items.append(item_idx)

# Flush the final partial chunk
if chunk:
    chunks.append(chunk)

for chunk in chunks:
    resp = nightfall.scanText({
        "text": [chunk],
        "detectionRuleUUIDs": [detectionRuleUUID]})
    col_resp.extend(json.loads(resp.text))

for item_idx, item in enumerate(col_resp):
    if item is not None:
        for finding in item:
            finding['column'] = col
            try:
                finding['index'] = primary_key_col[item_idx]
            except IndexError:
                finding['index'] = item_idx
            all_findings.append(finding)
# Scan each oversized item on its own, splitting it across several requests
for big in big_items:
    item = payload[big]
    item_size = sys.getsizeof(item)
    chunks_req = (item_size // size_limit) + 1
    chunk_len = -(-len(item) // chunks_req)  # ceiling division so no characters are dropped
    offset = 0
    item_findings = []
    for _ in range(chunks_req):
        p = item[offset : offset + chunk_len]
        resp = nightfall.scanText({
            "text": [[p]],
            "detectionRuleUUIDs": [detectionRuleUUID]})
        item_findings.extend(json.loads(resp.text))
        offset += chunk_len

    if item_findings == []:
        raise Exception(f"Error while scanning large item at column {col}, Index {primary_key_col[big]}")
    for find_chunk in item_findings:
        if find_chunk is not None:
            for finding in find_chunk:
                finding['column'] = col
                try:
                    finding['index'] = primary_key_col[big]
                except IndexError:
                    finding['index'] = big
                all_findings.append(finding)
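
Note that sys.getsizeof() reports the memory footprint of a Python object, not the number of bytes that would be sent over the wire. As an alternative sketch, the chunking step can measure each item by its UTF-8 encoded length instead. The function name chunk_by_bytes is hypothetical, and the default limits simply reuse the size_limit and length_limit values from this tutorial.

```python
def chunk_by_bytes(items, size_limit=500000, length_limit=50000):
    """Group strings into chunks bounded by total UTF-8 bytes and item count.

    Returns (chunks, oversized), where oversized lists the indices of items
    that exceed size_limit on their own and need separate handling.
    """
    chunks, oversized = [], []
    chunk, chunk_bytes = [], 0
    for index, item in enumerate(items):
        item_bytes = len(item.encode("utf-8"))
        if item_bytes > size_limit:
            oversized.append(index)
            item, item_bytes = "", 0  # placeholder keeps positions aligned
        if chunk and (chunk_bytes + item_bytes > size_limit
                      or len(chunk) >= length_limit):
            chunks.append(chunk)
            chunk, chunk_bytes = [], 0
        chunk.append(item)
        chunk_bytes += item_bytes
    if chunk:
        chunks.append(chunk)
    return chunks, oversized
```

Because every input item occupies exactly one slot across the returned chunks, the position of a finding in the concatenated responses still maps back to the original row.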

To review the results, we will print the number of findings, and write the findings to an output file:

print(f"{len(all_findings)} sensitive findings in {os.environ.get('SNOWFLAKE_TABLE')}")
with open('snowflake_findings.json', 'w') as output_file:
  json.dump(all_findings, output_file)

The following are potential ways to continue building upon this service:

  • Writing Nightfall results to a database and reading that into a visualization tool

  • Redacting sensitive findings in place once they are detected, either automatically or as a follow-up script once findings have been reviewed
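
As a starting point for the first idea, the sketch below stores findings in a local SQLite database using only the Python standard library. It assumes each finding is a dictionary carrying the 'column' and 'index' keys added earlier in this tutorial. The table layout and the function name save_findings are illustrative.

```python
import json
import sqlite3


def save_findings(findings, db_path="nightfall_findings.db"):
    """Insert findings into a SQLite table and return the number of rows stored."""
    connection = sqlite3.connect(db_path)
    with connection:  # commits on success, rolls back on error
        connection.execute(
            """CREATE TABLE IF NOT EXISTS findings (
                   column_name TEXT,
                   row_index   TEXT,
                   detail      TEXT
               )"""
        )
        connection.executemany(
            "INSERT INTO findings VALUES (?, ?, ?)",
            [
                (f.get("column"), str(f.get("index")), json.dumps(f))
                for f in findings
            ],
        )
    count = connection.execute("SELECT COUNT(*) FROM findings").fetchone()[0]
    connection.close()
    return count
```

The detail column holds the full finding as JSON, which may include the sensitive value itself, so you may prefer to store only the location and detector name.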

Using Redaction to Mask Findings

With the Nightfall API, you are also able to redact and mask your Snowflake findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.

Using the File Scanning Endpoint with Snowflake

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process to the one we used with the text scanning endpoint. The process is broken down into the sections below, as the file scanning process is more intensive.

Prerequisites

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning, passed via the Authorization: Bearer header (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)

Steps to use the Endpoint

  1. Retrieve data from Snowflake

Similar to the process in the beginning of this tutorial for the text scanning endpoint, we will now initialize our Snowflake Connection. Once the session is established, we can query from Snowflake.

connection = snowflake.connector.connect(
  user=os.environ.get('SNOWFLAKE_USER'),
  password=os.environ.get('SNOWFLAKE_PASSWORD'),
  account=os.environ.get('SNOWFLAKE_ACCOUNT'),
  schema=os.environ.get('SNOWFLAKE_SCHEMA'),
  database=os.environ.get('SNOWFLAKE_DATABASE')
)
table_name = os.environ.get('SNOWFLAKE_TABLE')
primary_key = os.environ.get('SNOWFLAKE_PRIMARY_KEY')

cursor = connection.cursor()

sql = f"""
        SELECT *
        FROM {table_name}
        LIMIT 1000;
        """

cursor.execute(sql)

cols = [i[0] for i in cursor.description]
data = cursor.fetchall()

Now we go through the data and write to a .csv file.

import csv
import time

if len(data) == 0:
    raise Exception('Table is empty! No data to scan.')

filename = "nf_snowflake_input-" + str(int(time.time())) + ".csv"

# Write the column names, then every fetched row, to a single CSV file
with open(filename, 'w', newline='') as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    csv_writer.writerow(cols)
    csv_writer.writerows(data)

print("Snowflake Data Written to: ", filename)

  2. Begin the file upload process to the Scan API, with the above written .csv file, as shown here.

  3. The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.

Redacting Sensitive Data in 4 Lines of Code

In this tutorial, we'll demonstrate how easy it is to redact sensitive data and give you a more in-depth look at various redaction techniques, how Nightfall works, and touch upon use cases for redaction techniques.

Masking

Mask sensitive data with a configurable character, optionally leaving some characters unmasked and ignoring certain characters.

Examples

| Case | Additional Config | Before | After |
| --- | --- | --- | --- |
| Default | None | my ssn is 518-45-7708 | my ssn is *********** |
| Mask with custom character | masking_char="X" | my ssn is 518-45-7708 | my ssn is XXXXXXXXXXX |
| Leave first four characters unmasked | num_chars_to_leave_unmasked=4 | my ssn is 518-45-7708 | my ssn is 518-******* |
| Leave last four characters unmasked | num_chars_to_leave_unmasked=4, mask_right_to_left=True | my ssn is 518-45-7708 | my ssn is *******7708 |
| Don't mask - characters | chars_to_ignore=["-"] | my ssn is 518-45-7708 | my ssn is ***-**-**** |
| All of the above! | masking_char="X", num_chars_to_leave_unmasked=4, mask_right_to_left=True, chars_to_ignore=["-"] | my ssn is 518-45-7708 | my ssn is XXX-XX-7708 |
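
To make the interaction between these options concrete, here is a small local illustration that reproduces the "After" values above for the finding 518-45-7708. It is not Nightfall's implementation. It is a sketch that assumes the unmasked count applies to the first characters encountered in the masking direction, whatever they are, and that ignored characters are simply passed through. The function name mask is hypothetical.

```python
def mask(finding, masking_char="*", num_chars_to_leave_unmasked=0,
         mask_right_to_left=False, chars_to_ignore=()):
    """Return finding with its characters masked according to the options."""
    chars = list(finding)
    if mask_right_to_left:
        order = range(len(chars) - 1, -1, -1)
    else:
        order = range(len(chars))
    for position, i in enumerate(order):
        if position < num_chars_to_leave_unmasked:
            continue  # leave the leading characters in this direction untouched
        if chars[i] in chars_to_ignore:
            continue  # ignored characters are never replaced
        chars[i] = masking_char
    return "".join(chars)


print(mask("518-45-7708", masking_char="X", num_chars_to_leave_unmasked=4,
           mask_right_to_left=True, chars_to_ignore=["-"]))
```

In this sketch, setting mask_right_to_left=True on its own changes nothing visible, because every character is masked either way. The direction only matters once num_chars_to_leave_unmasked is greater than zero.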

Let's put this together in Python with the Nightfall SDK. In our example, we have an input string with a credit card number (4916-6734-7572-5015 is my credit card number) and we wish to mask it with the character X, leave the last 4 digits unmasked, and ignore hyphens.

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

nightfall = Nightfall()  # reads API key from NIGHTFALL_API_KEY environment variable by default
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule(
        [Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                mask_config=MaskConfig(
                    masking_char="X",
                    num_chars_to_leave_unmasked=4,
                    mask_right_to_left=True,
                    chars_to_ignore=["-"]))
        )]
    )]
)
print(findings)
print(redacted_payload)

We'll see our findings look like this (with line formatting added for clarity):

[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='XXXX-XXXX-XXXX-5015', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]

Also, we have received the input payload back as a redacted string in our redacted_payload object:

['XXXX-XXXX-XXXX-5015 is my credit card number']

When to use Masking?

Masking is especially useful in scenarios where you want to retain some of the original format of the data or a certain amount of non-sensitive information as context. For example, it's common to refer to credit card numbers by their last 4 digits, so masking everything but the last 4 digits would ensure that the output is still useful to the viewer.

Substitution

Substitute sensitive data findings with the InfoType, custom word, or an empty string.

Examples

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, Nightfall

nightfall = Nightfall()
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                substitution_phrase="SubMeIn")
        )]
    )]
)
print(findings)
print(redacted_payload)

We'll see our findings object returned to us looks like this (with line formatting added for clarity):

[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='SubMeIn', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]

And our redacted input payload in our redacted_payload object:

['SubMeIn is my credit card number']

Instead of using a custom string as the substitution (SubMeIn), we may want to use the name of the detector for additional context. We can make a one line change to the example above, replacing substitution_phrase="SubMeIn" with infotype_substitution=True.

This yields:

['[CREDIT_CARD_NUMBER] is my credit card number']
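
The codepoint_range reported with each finding (start=0, end=19 in the output above) is what ties a substitution back to the original string. As a local illustration only, and not the SDK's own code, the sketch below applies a replacement to a list of (start, end) spans, assuming end is exclusive as the 19-character finding implies. The function name substitute is hypothetical.

```python
def substitute(text, spans, replacement):
    """Replace each (start, end) codepoint span in text with replacement."""
    # Work from the last span to the first so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


payload = "4916-6734-7572-5015 is my credit card number"
print(substitute(payload, [(0, 19)], "[CREDIT_CARD_NUMBER]"))
# → [CREDIT_CARD_NUMBER] is my credit card number
```

The same approach can be useful for redacting stored data after findings have been reviewed, when the original scan was run without a redaction configuration.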

When to use Substitution

Substitution is effective in scenarios where you intend to replace sensitive data with a contextual label. For example, you may wish to replace a literal credit card number with the label "Credit Card Number". This tells the reader that the data is a credit card number, without exposing them to the actual token itself.

Encryption

Encrypt sensitive data findings with a public encryption key that is passed via the API. Make the encryption algorithm configurable.

Encryption is a complex topic so we'll go into a more in-depth tutorial on encrypting and decrypting sensitive data with Nightfall in a separate post, but let's run through the basics below.

Nightfall uses RSA encryption which is asymmetric, meaning it works with two different keys: a public one and a private one. Anyone with your public key can encrypt data. Encrypted data can only be decrypted with the private key. So, you'll pass Nightfall your public key to encrypt with, and only you will have your private key to decrypt the encrypted data.

Example

  • Default case public_key="MIG...AQAB" ("my ssn is 518-45-7708" → "my ssn is EhOp/DphEIA0LQd4q1BUq8FtuxKj66VA381Z9DtbiQaaHvy5Wlvtxg0je91DFXEJncOWbhgPbt7EvBl36k5MFlFdPbc5+bg40FxP676SnllEClEO+DDsuiRCk9VC4noAd0zLxgvV8qD/NPE/XhTfOpscqlKhllfTg7G5jZYYSG8=")

For our example, we'll use the cryptography package in Python, so let's install it first: pip3 install cryptography

Let's first generate a public/private RSA key pair in PEM format on the command line. We'll cover how to generate keys programmatically in Python in our encryption-specific tutorial.

First, we'll generate our private key and write it to a file called example_private.pem:

openssl genrsa -out example_private.pem 2048

Next, we'll generate our public key in PEM format from this private key:

openssl rsa -in example_private.pem -outform PEM -pubout -out example_public.pem

Let's take a look at our public key with cat example_public.pem:

-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAnszkbHNclOhYgEc1lMPn
6KLm3cXS+w2CRBSEC5HFlqOUmdcXWnBFa9tlJYvXhQYuMFhXBcjUYgVUSAftK703
oTFMwRGZNnBjcUnNSK+pD4iaCEmdskkSA85GFCPsO1yrcfJp4965c43FrgWqyo7A
Aka5sGW9gX2wibQpQhil9TS0vtWHvEOq1TZnFAJD/DEJFN7zIQhglA/53Vd5PEL9
8fSfXxzbtu68wwhRtRqTaVRjzslx6i2Xs/QWcS/sWnKhnuF/enjlcll+SLyDEoPO
6iGp8MpHkZzJHmjATQJBA1vyu+mqo+G3wWm7WPME6V83VBNfG4wdkZCx/n9N5KzH
yQIDAQAB
-----END PUBLIC KEY-----

Remember to keep your private key safe. Anyone with this key can decrypt your encrypted data.

Now we can use our public key to encrypt any content with Nightfall! To do so, we'll first read the public key into a string.

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization

with open('example_public.pem', 'rb') as key_file:
    public_key = serialization.load_pem_public_key(key_file.read())

pem = public_key.public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo
)

pem_str = pem.decode('utf-8')

Now, we'll pass the public key into our redaction configuration, similar to the above examples, so Nightfall can use it to encrypt your sensitive data.

nightfall = Nightfall()
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                public_key=pem_str)
        )]
    )]
)
print(findings)
print(redacted_payload)

We'll see our findings look like this (with line formatting added for clarity):

[[
  Finding(
    finding='4916-6734-7572-5015', 
    redacted_finding='ar4PGD1T3yCBjBdgJ+iX2Ak3hZYIyaaKcRY+AcNS3RjsGnss9hUA9Q0ycLtBOaMjFMeTdCupCEPNUFVYyzeWhHmL009DwWshV47Vkm84zB5O6HroJHAG0JpKHb6bLL58hAb9FHZ73usU4bI67ZEtJhX41HovlOfSCaeUnH4y3pPqRnh7d5roX7EIYQ39wzPGGo2TNbeyqm2pluC1G4Mqt9hLqy0tCwfbmKPXro41i9i1xED9GkVcnxTu0gS8bCMFkvAK4S+Hw0K/gqPq0hu2JGoryKo335IYBCit6S39JESJdNh7IafuE6mrmvYMlR9l4c60VkowEMZAPkUjOelPDw==', 
    before_context=None, 
    after_context=None, 
    detector_name='Credit Card Number', 
    detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
    confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
    byte_range=Range(start=0, end=19), 
    codepoint_range=Range(start=0, end=19), 
    matched_detection_rule_uuids=[], 
    matched_detection_rules=['Inline Detection Rule #1'])
]]

And our redacted input payload in our redacted_payload object (truncated for clarity):

['GpcjUg74...BpQHw== is my credit card number']

When to use Encryption

Third-party encryption is well-suited for use cases where you want to preserve the original sensitive data but ensure that it is only visible to sanctioned parties that have your private key. For example, if you are storing the data or passing it to a sanctioned third-party for processing, encrypting the sensitive tokens can add one additional layer of encryption and security, while still allowing a downstream processor to access the raw data as required with the key.

Congrats! You've now learned about and implemented multiple redaction techniques in just a few lines of code. You're ready to start adding redaction to your apps.

Amazon RDS DLP Tutorial - Full Scan

How to run a full scan of an Amazon database

The export process runs in the background and doesn't affect the performance of your active DB instance. Exporting RDS snapshots can take a while depending on your database type and size.

Once the snapshot has been exported, you will be able to scan the resulting parquet files with Nightfall like any other file. You can do this using our endpoints for uploading files or using our Amazon S3 Python integration.

Prerequisites

In addition to having created your RDS instance, you will need to define the following to export your snapshots so they can later be scanned by Nightfall:

Amazon S3 bucket

📘 S3 Bucket Requirements

This bucket must have snapshot permissions, and it must be in the same AWS Region as the snapshot being exported.

If you have not already created a designated S3 bucket, in the AWS console select Services > Storage > S3

Click the "Create bucket" button and give your bucket a unique name as per the instructions.

Identity and Access Management (IAM) Role

You need an Identity and Access Management (IAM) Role to transfer a snapshot to your S3 bucket.

This role may be defined at the time of backup, in which case it will be given the proper permissions.

You may also create the role under Services > Security, Identity, & Compliance > IAM and select "Roles" from under the "Access Management" section of the left-hand navigation.

From there you can click the "Create role" button and create a role where "AWS Service" is the trusted entity type.

AWS KMS Key

You must create a symmetric encryption key using the AWS Key Management Service (KMS).

From your AWS console, select Services > Security, Identity, & Compliance > Key Management Service.

From there you can click the "Create key" button and follow the instructions.

Walkthrough

To do this task manually, go to Amazon RDS Service (Services > Database > RDS) and select the database to export from your list of databases.

Select the "Maintenance & backups" tab. Go to the "Snapshots" section.

You can select an existing automated snapshot or manually create a new snapshot with the "Take snapshot" button.

Once the snapshot is complete, click the snapshot's name.

From the "Actions" menu in the upper right, select "Export to Amazon S3".

  1. Enter a unique export identifier

  2. Choose whether you want to export all or part of your data (You will be exporting to Parquet)

  3. Choose the S3 bucket

  4. Choose or create your designated IAM role for backup

  5. Choose your AWS KMS Key

  6. Click the Export button

Once the Status column of export is "Complete", you can click the link to the export under the S3 bucket column.

Within the export in the S3 bucket, you will find a series of folders corresponding to the different database entities that were exported.

Exported data for specific tables is stored in the format base_prefix/files, where the base prefix is the following:

export_identifier/database_name/schema_name.table_name/

For example:

export-1234567890123-459/rdststdb/rdststdb.DataInsert_7ADB5D19965123A2/

The current convention for file naming is as follows:

partition_index/part-00000-random_uuid.format-based_extension

For example:

1/part-00000-c5a881bb-58ff-4ee6-1111-b41ecff340a3-c000.gz.parquet
2/part-00000-d7a881cc-88cc-5ab7-2222-c41ecab340a4-c000.gz.parquet
3/part-00000-f5a991ab-59aa-7fa6-3333-d41eccd340a7-c000.gz.parquet
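
When listing the export programmatically, it can help to recover the table and partition from each object key. The following is a minimal sketch that assumes keys follow exactly the layout described above, starting at the export identifier with no extra bucket prefix. The names ExportedFile and parse_export_key are hypothetical.

```python
from typing import NamedTuple


class ExportedFile(NamedTuple):
    export_identifier: str
    database: str
    schema: str
    table: str
    partition_index: int
    file_name: str


def parse_export_key(key):
    """Split an S3 object key from an RDS snapshot export into its parts."""
    parts = key.strip("/").split("/")
    if len(parts) != 5:
        raise ValueError(f"Unexpected export key layout: {key!r}")
    export_identifier, database, qualified_table, partition, file_name = parts
    # The third segment has the form schema_name.table_name
    schema, _, table = qualified_table.partition(".")
    return ExportedFile(export_identifier, database, schema, table,
                        int(partition), file_name)


key = ("export-1234567890123-459/rdststdb/rdststdb.DataInsert_7ADB5D19965123A2/"
       "1/part-00000-c5a881bb-58ff-4ee6-1111-b41ecff340a3-c000.gz.parquet")
print(parse_export_key(key).table)
```

Keeping the table name alongside each uploaded file makes it easier to trace a Nightfall finding back to the table it came from.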

You may download these parquet files and upload them to Nightfall to scan as you would any other parquet file.

📘 Obtaining file size

To obtain the value for fileSizeBytes, you can run the command wc -c on the file.

#Start the upload
curl --location --request POST 'https://api.nightfall.ai/v3/upload' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer NF-<Your API Key>' \
--data-raw '{
    "fileSizeBytes": <Your File Size>,
    "mimeType" : "application/zip"
}'

#Resulting payload
{"id":"<Your File Upload ID>","fileSizeBytes":3693,"chunkSize":10485760,"mimeType":"application/zip"}

#Post the file using the 'id' from the returned payload in your path
curl --location --request PATCH 'https://api.nightfall.ai/v3/upload/<Your File Upload ID>' \
--header 'X-Upload-Offset: 0' \
--header 'Content-Type: application/octet-stream' \
--header 'Authorization: Bearer NF-<Your API Key>' \
--data-binary '@///Users/myuser/yourfilepath/userdata1.parquet'

#Finish the upload

curl --location --request POST 'https://api.nightfall.ai/v3/upload/<Your File Upload ID>/finish' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer NF-<Your API Key>'

# Scan the file using an alert config

curl --request POST \
     --url https://api.nightfall.ai/v3/upload/<Your File Upload ID>/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-<Your API Key>' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRuleUUIDs": [
               "<Your detection Rule UUID>"
          ],
          "alertConfig": {
               "email": {
                    "address": "<Your Email Address>"
               }
          }
     },
     "requestMetadata": "scan of parquet file"
}
'

In the above sequence of curl invocations, we upload the file and then initiate the file scan with a policy that uses a pre-configured detection rule as well as an alertConfig that sends the results to an email address.
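
The sample file above fits in a single PATCH request because it is smaller than the chunkSize returned when the upload was started. For larger files, a reasonable reading of the X-Upload-Offset header is that the file is sent as successive PATCH requests, each starting at the next offset. The sketch below only computes those offsets and assumes that interpretation. The function name upload_offsets is hypothetical.

```python
import os


def upload_offsets(file_size, chunk_size):
    """Return (offset, length) pairs that together cover file_size bytes."""
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


def plan_upload(path, chunk_size):
    """Compute fileSizeBytes and the per-request offsets for a local file."""
    file_size = os.path.getsize(path)  # same value that wc -c reports
    return file_size, upload_offsets(file_size, chunk_size)


# The 3693-byte example file with a 10485760-byte chunk size needs one request.
print(upload_offsets(3693, 10485760))
```

Each (offset, length) pair would correspond to one PATCH call whose X-Upload-Offset header is the offset and whose body is that slice of the file.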

Note that results you receive in this case will be an attachment with a JSON payload as follows:

{
   "errors":null,
   "findingsPresent":true,
   "findingsURL":"https://files.nightfall.ai/18b8b6b8-59c9-4891-9f92-9c357acc19cd.json?Expires=1655306887&Signature=ASDFcCfQpEush3QrWmdZX9A9RePQjNHZTfRkBlgPwdPf~RcNnPYgYzt3G4AAzkI8IDbUdc4CzBbAROTx0oYOtNODCTdoKHKB7Q0a7~hzRNx3BHYHH1msdhkS1qTl3z82RCh6DZi~nk~Oa~yt-XZvAf3ui4MyNU0wyfqjbKO9o79Ec9YWqMdUmTOP1Ss39YmA71e6ky0VOdjdN4baoQV5VElTQ1rHrkdgYHz-95Dnzd3YK3IxGQR92AU7KA3X-rrcmpIJwMUIJSsl8~or0WIg5ar4U9Ood1BFSE~GmlQsKclEo1LEaX2KclWaQtjmN9~3IQnxOmkhPeAhEt-5n~Hbug__&Key-Pair-Id=ASDFOPZ1EKX0YC",
   "requestMetadata":"scan of go sdk for URLs sdk",
   "uploadID":"18b8b6b8-59c9-4891-9f92-9c357acc19cd",
   "validUntil":"2022-06-15T15:28:07.86163221Z"
}

The findings themselves will be available at the URL specified in findingsURL until the date-time stamp contained in the validUntil property.
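
Since the link expires, a consumer of this payload may want to check validUntil before trying to download the findings. Below is a minimal sketch using only the fields shown in the example payload. The function names are hypothetical, and the timestamp parser assumes the format seen above, a UTC time ending in Z with an optional fractional part that can be longer than six digits.

```python
import json
from datetime import datetime, timezone


def parse_valid_until(value):
    """Parse a timestamp such as 2022-06-15T15:28:07.86163221Z into UTC."""
    base, _, fraction = value.rstrip("Z").partition(".")
    moment = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    # datetime stores at most microseconds, so keep six fractional digits.
    micros = int((fraction + "000000")[:6])
    return moment.replace(microsecond=micros, tzinfo=timezone.utc)


def findings_link(payload_text, now=None):
    """Return findingsURL if findings exist and the link has not expired."""
    payload = json.loads(payload_text)
    if payload.get("errors") or not payload.get("findingsPresent"):
        return None
    now = now or datetime.now(timezone.utc)
    if now >= parse_valid_until(payload["validUntil"]):
        return None
    return payload["findingsURL"]
```

Passing now explicitly is mainly a convenience for testing. In normal use the current UTC time is taken automatically.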

Below is a SQL script that creates a small table of generated data containing example personal data, including phone numbers and email addresses.

DROP TABLE IF EXISTS `myTable`;

CREATE TABLE `myTable` (
  `id` mediumint(8) unsigned NOT NULL auto_increment,
  `name` varchar(255) default NULL,
  `phone` varchar(100) default NULL,
  `email` varchar(255) default NULL,
  `address` varchar(255) default NULL,
  `postalZip` varchar(10) default NULL,
  `region` varchar(50) default NULL,
  `country` varchar(100) default NULL,
  `alphanumeric` varchar(255),
  `text` TEXT default NULL,
  PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;

INSERT INTO `myTable` (`name`,`phone`,`email`,`address`,`postalZip`,`region`,`country`,`alphanumeric`,`text`)
VALUES
  ("Malcolm Mcgee","1-831-777-4886","[email protected]","109-1617 Augue Av.","873766","Delta","Ukraine","HPJ88FSI6HJ","in faucibus orci luctus et ultrices posuere cubilia Curae Phasellus"),
  ("Harrison Dudley","(645) 987-7967","[email protected]","P.O. Box 311, 4823 Odio Street","81398-37524","Nordland","Germany","INL95CND6TF","egestas nunc sed libero. Proin sed turpis nec mauris blandit"),
  ("Driscoll Callahan","1-598-623-3631","[email protected]","689-226 Eu St.","4534 JV","California","Chile","QQI55BTP0CS","velit dui, semper et, lacinia vitae, sodales at, velit. Pellentesque"),
  ("Anne Rollins","(558) 943-1159","[email protected]","5361 Enim, Street","176814","Minas Gerais","Brazil","EVO25RST5RM","cursus vestibulum. Mauris magna. Duis dignissim tempor arcu. Vestibulum ut"),
  ("Noah Townsend","(514) 311-3416","[email protected]","797-7375 Consectetuer Ave","5177","Tây Ninh","Chile","CNG87EJF4EK","libero at auctor ullamcorper, nisl arcu iaculis enim, sit amet");

Below is an example finding from a scan of the resulting parquet file exported to S3, where the Detection Rule uses Nightfall's built-in Detectors for matching phone numbers and emails. This example shows a match in the 1st row and 4th column, which is what we would expect based on our table structure.

{
   "findings":[
      {
         "detector":{
            "id":"89f810aa-64a5-4269-b0a0-110d250d55ee",
            "name":"email address"
         },
         "finding":"[email protected]",
         "confidence":"LIKELY",
         "location":{
            "byteRange":{
               "start":36,
               "end":51
            },
            "codepointRange":{
               "start":36,
               "end":51
            },
            "lineRange":{
               "start":1,
               "end":1
            },
            "rowRange":{
               "start":1,
               "end":1
            },
            "columnRange":{
               "start":4,
               "end":4
            },
            "commitHash":""
         },
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      },

Similarly, it also finds phone numbers in the 3rd column.

{
         "detector":{
            "id":"d08edfc4-b5e2-420a-a5fe-3693fb6276c4",
            "name":"Phone number",
            "version":1
         },
         "finding":"514) 311-3416",
         "confidence":"POSSIBLE",
         "location":{
            "byteRange":{
               "start":814,
               "end":827
            },
            "codepointRange":{
               "start":814,
               "end":827
            },
            "lineRange":{
               "start":5,
               "end":5
            },
            "rowRange":{
               "start":5,
               "end":5
            },
            "columnRange":{
               "start":3,
               "end":3
            },
            "commitHash":""
         },
         "matchedDetectionRuleUUIDs":[
            "950833c9-8608-4c66-8a3a-0734eac11157"
         ],
         "matchedDetectionRules":[
            
         ]
      }
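Because rowRange and columnRange are reported as positions, it can help to translate them back to column names. The sketch below does this for the example table. The COLUMNS list is copied from the CREATE TABLE statement above, column numbering is assumed to be 1-based as the findings suggest, and the helper names column_for and summarize are hypothetical.

```python
# Column order of myTable, taken from the CREATE TABLE statement
COLUMNS = ["id", "name", "phone", "email", "address",
           "postalZip", "region", "country", "alphanumeric", "text"]


def column_for(finding: dict) -> str:
    """Map a finding's columnRange to a column name (assumes 1-based indexes)."""
    column_index = finding["location"]["columnRange"]["start"]
    return COLUMNS[column_index - 1]


def summarize(findings: list) -> list:
    """Return (row, column name, detector name) for each finding."""
    return [
        (f["location"]["rowRange"]["start"], column_for(f), f["detector"]["name"])
        for f in findings
    ]


# Trimmed copies of the two findings shown above
example = [
    {"detector": {"name": "email address"},
     "location": {"rowRange": {"start": 1, "end": 1},
                  "columnRange": {"start": 4, "end": 4}}},
    {"detector": {"name": "Phone number"},
     "location": {"rowRange": {"start": 5, "end": 5},
                  "columnRange": {"start": 3, "end": 3}}},
]
print(summarize(example))
```

For the two example findings this reports the email column for row 1 and the phone column for row 5.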

You may also use our tutorial for Integrating with Amazon S3 (Python) to scan through the S3 objects.

GenAI Content Filtering-How to prevent exposure of sensitive data

LangChain/OpenAI Tutorial: Integrating Nightfall for Secure Prompt Sanitization

Consider the following real-world scenarios:

  • Support Chatbots: You use LangChain/Claude to power a level-1 support chatbot to help users resolve issues. Users will likely overshare sensitive information like credit card and Social Security numbers. Without content filtering, this information would be transmitted to Anthropic and added to your support ticketing system.

  • Healthcare Apps: You are using LangChain/Claude to moderate content sent by patients or doctors in a health app you are developing. These queries may contain sensitive protected health information (PHI), which could be unnecessarily transmitted to Anthropic.

Implementing robust content filtering mechanisms is crucial to protect sensitive data and comply with data protection regulations. In this guide, we will explore how to sanitize prompts using Nightfall before sending them to Claude.

LangChain/OpenAI Example

Let's take a look at what this would look like in a Python example using the LangChain, Anthropic, and Nightfall Python SDKs:

Set up your environment

Install the necessary packages:

pip install langchain anthropic nightfall python-dotenv

Set up environment variables. Create a .env file in your project directory:

ANTHROPIC_API_KEY=your_anthropic_api_key
NIGHTFALL_API_KEY=your_nightfall_api_key

Implementing Nightfall Sanitization as a LangChain Component

We'll create a custom LangChain component for Nightfall sanitization. This allows us to incorporate content filtering into our LangChain pipeline seamlessly.

import os
from dotenv import load_dotenv
from langchain.llms import Anthropic
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains.base import Chain
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from typing import Dict, List

# Load environment variables
load_dotenv()

# Initialize Nightfall client
nightfall = Nightfall()

# Define the Nightfall detection rule as a single DetectionRule object;
# the scan_text calls below wrap it in a list themselves
detection_rule = DetectionRule(
    [Detector(
        min_confidence=Confidence.VERY_LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(
                masking_char="X",
                num_chars_to_leave_unmasked=4,
                mask_right_to_left=True,
                chars_to_ignore=["-"])
        )
    )]
)

class NightfallSanitizationChain(Chain):
    input_key: str = "input"
    output_key: str = "sanitized_input"

    @property
    def input_keys(self) -> List[str]:
        return [self.input_key]

    @property
    def output_keys(self) -> List[str]:
        return [self.output_key]

    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
        text = inputs[self.input_key]
        payload = [text]
        try:
            findings, redacted_payload = nightfall.scan_text(
                payload,
                detection_rules=[detection_rule]
            )
            sanitized_text = redacted_payload[0] if redacted_payload[0] else text
        except Exception as e:
            print(f"Error in sanitizing input: {e}")
            sanitized_text = text
        return {self.output_key: sanitized_text}

# Initialize the Anthropic LLM
llm = Anthropic(model="claude-v1")

# Create a prompt template
template = "The customer said: '{customer_input}' How should I respond to the customer?"
prompt = PromptTemplate(template=template, input_variables=["customer_input"])

# Create chains
sanitization_chain = NightfallSanitizationChain()
response_chain = LLMChain(llm=llm, prompt=prompt)

# Combine chains
from langchain.chains import SimpleSequentialChain

full_chain = SimpleSequentialChain(
    chains=[sanitization_chain, response_chain],
    verbose=True
)

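# Wrap the combined chain in a helper so that other scripts can import it
def process_customer_input(customer_input: str) -> str:
    return full_chain.run(customer_input)

if __name__ == "__main__":
    # Use the combined chain
    customer_input = "My credit card number is 4916-6734-7572-5015, and the card is getting declined."
    response = process_customer_input(customer_input)
    print("\nFinal Response:", response)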

Explanation

  1. We start by importing necessary modules and loading environment variables.

  2. We initialize the Nightfall client and define detection rules for credit card numbers.

  3. The NightfallSanitizationChain class is a custom LangChain component that handles content sanitization using Nightfall.

  4. We set up the Anthropic LLM and create a prompt template for customer service responses.

  5. We create separate chains for sanitization and response generation, then combine them using SimpleSequentialChain.

  6. The process_customer_input function provides an easy-to-use interface for our chain.

Error Handling and Logging

In a production environment, you might want to add more robust error handling and logging. For example:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def sanitize_input(text):
    payload = [text]
    try:
        findings, redacted_payload = nightfall.scan_text(
            payload,
            detection_rules=[detection_rule]
        )
        # findings holds one list per payload item, so check for any non-empty list
        if any(findings):
            logger.info("Sensitive information detected and redacted")
        return redacted_payload[0] if redacted_payload[0] else text
    except Exception as e:
        logger.error(f"Error in sanitizing input: {e}")
        # Depending on your use case, you might want to return the original text or an error message
        return text
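
If passing unsanitized text through is not acceptable, the fallback can instead fail closed. The following is a sketch under that assumption: sanitize_fail_closed and BLOCKED_MESSAGE are hypothetical names, and the scanner is passed in as a callable with the same return shape as the scan_text call used above (findings, redacted payload), so the policy can be exercised without calling Nightfall.

```python
import logging
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical placeholder sent downstream when sanitization cannot run
BLOCKED_MESSAGE = "[input withheld: sanitization unavailable]"

ScanFn = Callable[[List[str]], Tuple[list, List[Optional[str]]]]


def sanitize_fail_closed(text: str, scan: ScanFn) -> str:
    """Return the redacted text, or a placeholder if the scan itself fails."""
    try:
        findings, redacted_payload = scan([text])
    except Exception as exc:
        # Never forward the original text when it could not be checked
        logger.error("Sanitization failed, withholding input: %s", exc)
        return BLOCKED_MESSAGE
    # Mirrors the fallback used above: an empty entry keeps the original text
    return redacted_payload[0] if redacted_payload and redacted_payload[0] else text


def failing_scan(payload: List[str]):
    """Stand-in scanner that simulates an outage."""
    raise ConnectionError("scanner unreachable")


def clean_scan(payload: List[str]):
    """Stand-in scanner that finds nothing to redact."""
    return [[]], [""]


print(sanitize_fail_closed("My card is 4916-6734-7572-5015", failing_scan))
print(sanitize_fail_closed("Hello there", clean_scan))
```

With the real client, the scan argument would wrap the nightfall.scan_text call shown above. Whether to fail open or closed depends on how costly a leaked prompt is compared with a blocked one.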

Usage

To use this script, you can either run it directly or import the process_customer_input function in another script.

Running the Script Directly

Simply run the script:

python secure_langchain.py

This will process the example customer input and print the sanitized input and final response.

Using in Another Script

You can import the process_customer_input function in another script:

from secure_langchain import process_customer_input

customer_input = "My credit card 4111-1111-1111-1111 isn't working. Contact me at alice@example.com."
response = process_customer_input(customer_input)
print(response)

Expected Output

What does success look like?

If the example runs properly, you should expect to see an output demonstrating the sanitization process and the final response from Claude. Here's what the output might look like:

> Entering new SimpleSequentialChain chain...

> Finished chain.

Sanitized input: The customer said: 'My credit card number is XXXX-XXXX-XXXX-411, and the card is getting declined.' How should I respond to the customer?

Final Response: I understand you're having trouble with your credit card being declined. I apologize for the inconvenience. To assist you better, I'll need some additional information...

New Relic DLP Tutorial

New Relic is a Software as a Service offering that focuses on performance and availability monitoring.

This tutorial shows how to scan your New Relic logs using the Nightfall API/SDK.

You will need a few things first to use this tutorial:

  • A New Relic account with an API key and Account ID

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment (version 3.6 or later)

  • The most recent version of Python Nightfall SDK

To accomplish this, we will install the required version of the Nightfall SDK:

pip install nightfall==0.6.0

We will be using Python and installing/importing the following libraries:

import argparse
import csv
import os
import sys
import time
import collections

import requests
from nightfall import Nightfall

Note that we read the Nightfall API key and Detection Rule UUID from the NIGHTFALL_API_KEY and DETECTION_RULE_UUID environment variables, and the New Relic authentication information from the environment variables below:

  • NR_API_KEY

  • NR_ACCOUNT_ID

nr_api_key = os.environ.get('NR_API_KEY')
nr_account_id = os.environ.get('NR_ACCOUNT_ID')
nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')
detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

Next we instantiate the Nightfall client from the SDK with our API key.

nightfall = Nightfall(nightfall_api_key)

Now we set up the connection with New Relic and retrieve the data to be scanned.

The code sample below retrieves the 100 most recent logs from New Relic. (Note: This can be modified to meet your needs.)

# This will return the most recent 100 logs from New Relic.

url = 'https://api.newrelic.com/graphql'

headers = {
  'Content-Type': 'application/json',
  'Api-Key': nr_api_key
}

query = """query {{
    actor {{
        account(id: {nr_account_id}) {{
        name
        nrql(query: "SELECT * FROM Log") {{
            results
        }}
        }}
    }}
    }}""".format(nr_account_id=nr_account_id)

try:
  response = requests.post(
    url=url,
    headers=headers,
    json={'query': query}
  )

  response.raise_for_status()

except requests.HTTPError:
  msg = f"ERROR: NewRelic API returned: {response.status_code} {response.text}"
  sys.exit(msg)

logs = response.json()['data']['actor']['account']['nrql']['results']

We then run a scan on the aggregated data from New Relic, using the Nightfall SDK:

scan_logs = {log['messageId']:log['message'] for log in logs}

findings = nightfall.scan_text(
    [scan_logs],
    detection_rule_uuids=[detectionRuleUUID]
)

To review the results, we will write the findings to an output csv file:

all_findings = []
# Header row: one column name per value written in each row below
all_findings.append(
  [
    'message_id', 'fragment', 'detector', 'confidence',
    'byte_start', 'byte_end', 'codepoint_start', 'codepoint_end'
  ]
)

for messageID, finding in findings.items():
  # Skip log messages that produced no findings
  if finding is None:
    continue

  for item in finding:
    row = [
      messageID,
      item['finding'],
      item['detector']['name'],
      item['confidence'],
      item['location']['byteRange']['start'],
      item['location']['byteRange']['end'],
      item['location']['codepointRange']['start'],
      item['location']['codepointRange']['end']
    ]
    all_findings.append(row)

# Write the CSV once, after all findings have been collected
if len(all_findings) > 1:
  filename = "nf_newrelic_output-" + str(int(time.time())) + ".csv"
  with open(filename, 'w', newline='') as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    csv_writer.writerows(all_findings)
  print("Output findings written to", filename)
else:
  print('No sensitive data detected. Hooray!')

Note:

The results of the scan will be written to a file named nf_newrelic_output-TIMESTAMP.csv.

This example will include the full finding above. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.

Finding the Logs in New Relic

The New Relic API does not provide a great way to get a direct URL to a log message. The simplest way to find the log message with sensitive data is to navigate to the New Relic UI and search your logs with this query messageId:"$YOUR_MESSAGE_ID". You can copy the messageId from the CSV file generated using this script.
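To avoid copying IDs by hand, a short sketch like the following could read the CSV produced by the script and print one search query per affected log message. It assumes the first CSV column holds the messageId, as in the script above; the function name newrelic_queries and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, List


def newrelic_queries(csv_lines: Iterable[str]) -> List[str]:
    """Build one messageId search query per distinct message in the findings CSV."""
    reader = csv.reader(csv_lines)
    next(reader, None)  # skip the header row
    seen = []
    for row in reader:
        if row and row[0] not in seen:
            seen.append(row[0])
    return [f'messageId:"{message_id}"' for message_id in seen]


# Made-up rows standing in for the generated CSV file
sample = io.StringIO(
    "message_id,detector,confidence\n"
    "abc-123,Email address,LIKELY\n"
    "abc-123,Phone number,POSSIBLE\n"
    "def-456,Email address,LIKELY\n"
)
for query in newrelic_queries(sample):
    print(query)
```

For the real output, an open file handle (opened with newline='') can be passed in place of the StringIO object.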

Using Redaction to Mask Findings

With the Nightfall API, you are also able to redact and mask your New Relic findings. You can add a Redaction Config as part of your Detection Rule. For more information on how to use redaction and its specific options, please refer to the guide here.

Using the File Scanning Endpoint with New Relic

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a process similar to the one we used with the text scanning endpoint. The process is broken down into the sections below, as the file scanning process is more intensive.

Prerequisites

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning, passed via the Authorization: Bearer header (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)

Steps to use the Endpoint

  1. Retrieve data from New Relic

As at the beginning of this tutorial for the text scanning endpoint, we now set up the connection and retrieve the data we want from New Relic. The example below retrieves the 100 most recent logs:

# This will return the most recent 100 logs from New Relic.

url = 'https://api.newrelic.com/graphql'

headers = {
  'Content-Type': 'application/json',
  'Api-Key': nr_api_key
}

query = """query {{
    actor {{
        account(id: {nr_account_id}) {{
        name
        nrql(query: "SELECT * FROM Log") {{
            results
        }}
        }}
    }}
    }}""".format(nr_account_id=nr_account_id)

try:
  response = requests.post(
    url=url,
    headers=headers,
    json={'query': query}
  )

  response.raise_for_status()

except requests.HTTPError:
  msg = f"ERROR: NewRelic API returned: {response.status_code} {response.text}"
  sys.exit(msg)

logs = response.json()['data']['actor']['account']['nrql']['results']

scan_logs = {log['messageId']:log['message'] for log in logs}

Now we write the logs to a .csv file.

filename = "nf_newrelic_input-" + str(int(time.time())) + ".csv"  

with open(filename, 'w', newline='') as output_file:
  csv_writer = csv.writer(output_file, delimiter=',')
  # scan_logs is a dict, so write one (messageId, message) row per log
  csv_writer.writerows(scan_logs.items())
     
print("New Relic Logs Written to: ", filename)

  2. Begin the file upload process to the Scan API, with the above written .csv file, as shown here.

  3. The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
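
The upload steps can be sketched roughly as follows. This is an illustration under assumptions rather than a reference implementation: the /v3/upload endpoint paths, the X-Upload-Offset header, and the id and chunkSize response fields reflect our understanding of the Nightfall file upload flow and should be checked against the API reference. The function names are hypothetical, and the policy dictionary is supplied by the caller.

```python
import json
import urllib.request
from typing import Iterator, Optional, Tuple

API_BASE = "https://api.nightfall.ai/v3/upload"  # assumed base path for uploads


def iter_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs that cover data in order."""
    for offset in range(0, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]


def _call(method: str, url: str, api_key: str, body: bytes,
          extra_headers: Optional[dict] = None) -> bytes:
    """Send one authenticated request and return the raw response body."""
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    headers.update(extra_headers or {})
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return response.read()


def upload_and_scan(path: str, api_key: str, policy: dict) -> None:
    """Upload a file in chunks and start an asynchronous scan (network calls)."""
    with open(path, "rb") as handle:
        data = handle.read()
    # 1. Announce the upload; the response is assumed to carry id and chunkSize
    init_body = json.dumps({"fileSizeBytes": len(data)}).encode()
    upload = json.loads(_call("POST", API_BASE, api_key, init_body))
    file_url = f"{API_BASE}/{upload['id']}"
    # 2. Send each chunk together with its byte offset
    for offset, chunk in iter_chunks(data, upload["chunkSize"]):
        _call("PATCH", file_url, api_key, chunk,
              {"Content-Type": "application/octet-stream",
               "X-Upload-Offset": str(offset)})
    # 3. Mark the upload complete, then request the scan with the given policy
    _call("POST", f"{file_url}/finish", api_key, b"")
    _call("POST", f"{file_url}/scan", api_key, json.dumps({"policy": policy}).encode())
```

Only iter_chunks is free of network access, which makes it the easiest part to check in isolation; the results of the scan itself arrive later at the webhook server.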

Elasticsearch DLP Tutorial

Elasticsearch is a popular tool for storing, searching, and analyzing all kinds of structured and unstructured data, especially as a part of the larger ELK stack. However, as with all data storage tools, there is huge potential for unintentionally leaking sensitive data. By utilizing Elastic's own REST APIs in conjunction with Nightfall AI's Scan API, you can discover, classify, and remediate sensitive data within your Elastic stack.

You can follow along with your own instance or spin up a sample instance with the commands listed below. By default, you will be able to download and interact with sample datasets from the elk instance at localhost:5601. Your data can be queried from localhost:9200. The "Add sample data" function can be found underneath the Observability section on the Home page; in this tutorial we reference the "Sample Web Logs" dataset.

docker pull sebp/elk

docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk

You will need a few things to follow along with this tutorial:

  • An Elasticsearch instance with data to query

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment (version 3.7 or later)

  • Python Nightfall SDK

We will need to install the nightfall sdk library and the requests library using pip.

pip install nightfall==1.2.0
pip install requests

We will be using Python and importing the following libraries:

import requests
import json
import os
import csv
from nightfall import Nightfall

We first configure the URLs to communicate with. If you are following along with the Sample Web Logs dataset alluded to at the beginning of this article, you can copy this Elasticsearch URL. If not, your URL will probably take the format http://<hostname>/<index_name>/_search.

elasticsearch_base_url = 'http://localhost:9200/kibana_sample_data_logs/_search'

We also instantiate the Nightfall client from the SDK with our API key.

nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')

detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

nightfall = Nightfall(nightfall_api_key)

We now construct the payload and headers for our call to Elasticsearch. The payload represents whichever subset of data you wish to query. In this example, we are querying all results from the previous hour.

We then make our call to the Elasticsearch data store and save the resulting response.

elasticsearch_payload = {"query": {"range": {"timestamp": {"gte" : "now-1h"}}}}

elasticsearch_headers = {
  'Content-Type': 'application/json'
}

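# Send the query as a JSON request body; passing it via params would not
# serialize the nested query correctly
elasticsearch_response = requests.get(
  url = elasticsearch_base_url,
  headers = elasticsearch_headers,
  json = elasticsearch_payload
)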

logs = elasticsearch_response.json()['hits']['hits']

Now we send our Elasticsearch query results to the Nightfall SDK for scanning.

findings, redactions = nightfall.scan_text(
    [json.dumps(l) for l in logs],
    detection_rule_uuids=[detectionRuleUUID]
)

We will create an all_findings object to store Nightfall Scan results. The first row of our all_findings object will constitute our headers, since we will dump this object to a CSV file later.

This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.

all_findings = []
all_findings.append(
    [
        'index_name', 'log_id', 'detector', 'confidence', 
        'finding_start', 'finding_end', 'finding'
    ]
)

Next we go through our findings from the Nightfall Scan API and match them to the identifying fields from the Elasticsearch index so we can find them and remediate them in situ.

Finding locations here are byte offsets (byte_range) within the log serialized as a string. Each finding also reports a code point range.

# Each top level item in findings corresponds to one log
for log_idx, log_findings in enumerate(findings):
    for finding_idx, finding in enumerate(log_findings):

        index_name = logs[log_idx]['_index']
        log_id = logs[log_idx]['_id']

        row = [
            index_name,
            log_id,
            finding.detector_name,
            finding.confidence.value,
            finding.byte_range.start,
            finding.byte_range.end,
            finding.finding,
        ]
        all_findings.append(row)

Finally, we export our results to a csv so they can be easily reviewed.

if len(all_findings) > 1:
    with open('output_file.csv', 'w') as output_file:
        csv_writer = csv.writer(output_file, delimiter = ',')
        csv_writer.writerows(all_findings)
else:
    print('No sensitive data detected. Hooray!')

That's it! You now have insight into all sensitive data shared inside your Elasticsearch instance within the past hour.

However, in use cases such as this where the data is well-structured, it can be more informative to call out which fields are found to contain sensitive data, as opposed to the location of the data. While the above script is easy to implement without modifying the queried data, it does not provide insight into these fields.
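
One way to surface field names, sketched below under assumptions, is to flatten each hit's _source into (field path, value) pairs and scan the values individually, so that the position of each scanned string maps back to a field. flatten_source and fields_and_payload are hypothetical helpers, and the sample hit is made up for illustration.

```python
from typing import Any, Iterator, List, Tuple


def flatten_source(value: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted field path, stringified value) for every leaf in a document."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_source(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_source(child, f"{prefix}[{index}]")
    elif value is not None:
        yield prefix, str(value)


def fields_and_payload(hit: dict) -> Tuple[List[str], List[str]]:
    """Split a hit into parallel lists: field paths and the strings to scan."""
    pairs = list(flatten_source(hit.get("_source", {})))
    return [path for path, _ in pairs], [text for _, text in pairs]


# Made-up hit in the shape returned by the _search endpoint
hit = {
    "_index": "kibana_sample_data_logs",
    "_id": "1",
    "_source": {"clientip": "10.0.0.1", "geo": {"src": "US"},
                "tags": ["success", "info"]},
}
paths, payload = fields_and_payload(hit)
print(paths)
print(payload)
```

Passing payload to the scan in place of the serialized logs yields one findings list per entry, so the findings at position i belong to the field at paths[i]. The trade-off is a larger number of payload items per request.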

Using Redaction to Mask Findings

With the Nightfall API, you are also able to redact and mask your Elasticsearch findings. You can add a Redaction Config as part of your Detection Rule. For more information on how to use redaction and its specific options, please refer to the guide here.

Using the File Scanning Endpoint with Elasticsearch

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a process similar to the one we used with the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.

Prerequisites:

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning, passed via the Authorization: Bearer header (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)

Steps to use the Endpoint

  1. Retrieve data from Elasticsearch

As at the beginning of this tutorial for the text scanning endpoint, we now set up the connection and retrieve the data we want from Elasticsearch:

elasticsearch_base_url = 'http://localhost:9200/kibana_sample_data_logs/_search'

elasticsearch_payload = {"query": {"range": {"timestamp": {"gte" : "now-1h"}}}}

elasticsearch_headers = {
    'Content-Type': 'application/json'
}

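# Send the query as a JSON request body; passing it via params would not
# serialize the nested query correctly
elasticsearch_response = requests.get(
    url = elasticsearch_base_url,
    headers = elasticsearch_headers,
    json = elasticsearch_payload
)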

logs = elasticsearch_response.json()['hits']['hits']

Now we write the logs to a .csv file.

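import time

filename = "nf_elasticsearch_input-" + str(int(time.time())) + ".csv"

with open(filename, 'w', newline='') as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    # Write one single-column row per log; a bare string would be split into characters
    csv_writer.writerows([json.dumps(l)] for l in logs)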
     
print("Elasticsearch Data Written to: ", filename)

  2. Begin the file upload process to the Scan API, with the above written .csv file, as shown here.

  3. The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.

Creating a Webhook Server

Learn how to set up a server to handle results of file scans and alerts sent based on policy alert configurations.

Webhook Challenges

When Nightfall sends a message to a user-provided webhook address, it will send a POST request with a JSON payload containing a single field, challenge, that holds randomly generated bytes. This is to ensure that the caller owns the server.

{"challenge": "z78woE1uDFu7tPrPvEBV"}

In order to authenticate your webhook server to Nightfall, you must reply with (1) a 200 HTTP Status Code, and (2) a plaintext response body containing only the value of the challenge key.

If Nightfall receives the expected value back, then the file scan operation will proceed; otherwise it will be aborted.

When a server responds successfully to a challenge request, the validity of that URL will be cached for up to 24 hours, after which it will need to be validated again.

If the webhook cannot be reached, you will receive an error with the code "40012" and the description "Webhook URL validation failed" when you initiate the scan.

If the webhook challenge fails, you will receive an error with the code "42201" and the description "Webhook returned incorrect challenge response" when you initiate the scan.
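
The challenge exchange described above can be expressed as a small framework-independent function. This is only a sketch: challenge_reply is a hypothetical name, and it returns the status code and plaintext body that your web framework would then send.

```python
import json
from typing import Optional, Tuple


def challenge_reply(raw_body: bytes) -> Optional[Tuple[int, str]]:
    """Return (status, plaintext body) for a challenge request, else None."""
    try:
        content = json.loads(raw_body)
    except ValueError:
        return None  # body is not valid JSON
    if isinstance(content, dict) and isinstance(content.get("challenge"), str):
        # Echo back only the challenge value, with a 200 status
        return 200, content["challenge"]
    return None  # not a challenge; handle as a regular webhook event


print(challenge_reply(b'{"challenge": "z78woE1uDFu7tPrPvEBV"}'))
```

Returning None for everything else lets the caller fall through to its normal handling of scan results.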

Webhook Signature Verification

Nightfall uses a signing secret to sign requests to the customer's configured webhook URL.

Signing Secret Security

The signing secret should never be stored in plaintext, as a leak compromises the authenticity of webhook requests.

If you have any concerns that your signing secret may have leaked, you can request rotation at any time by reaching out to Nightfall Customer Success.

For security purposes, the webhook includes a signature header containing an HMAC-SHA256 digital signature that customers may use to authenticate the client.

In order to authenticate requests to the webhook URL, customers may use the following algorithm:

  1. Check for the presence of the headers X-Nightfall-Signature and X-Nightfall-Timestamp. If these headers are not both present, discard the request.

  2. Read the entire request body into a string body.

  3. Verify that the value in the X-Nightfall-Timestamp header (the POSIX time in seconds) occurred recently. This is to protect against replay attacks, so a threshold on the order of magnitude of minutes should be reasonable. If a request occurred too far in the past, it should be discarded.

  4. Concatenate the timestamp and body with a colon delimiter, i.e. timestamp:body.

  5. Compute the HMAC SHA-256 hash of the payload from the previous step, using your unique signing secret as the key. Encode this computed value in hex.

  6. Compare the value of the X-Nightfall-Signature header to the value computed in the previous step. If the values match, authentication is successful, and processing should proceed. Otherwise, the request must be discarded.

The snippet below shows how you might implement this authentication validation in Python:

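from datetime import datetime, timedelta
import hmac
import hashlib

from flask import request

# Placeholder only; load the real signing secret from a secure store
SIGNING_SECRET = "super-secret"

given_signature = request.headers.get('X-Nightfall-Signature')
req_timestamp = request.headers.get('X-Nightfall-Timestamp')
# Step 1: both headers must be present
if given_signature is None or req_timestamp is None:
    raise Exception("missing signature or timestamp header")
# Step 3: reject requests whose timestamp is not within the last five minutes
now = datetime.now()
if not (now - timedelta(minutes=5) <= datetime.fromtimestamp(int(req_timestamp)) <= now):
    raise Exception("could not validate timestamp is within the last few minutes")
# Steps 4 and 5: HMAC SHA-256 over "timestamp:body", encoded in hex
computed_signature = hmac.new(
    SIGNING_SECRET.encode(),
    msg=F"{req_timestamp}:{request.get_data(as_text=True)}".encode(),
    digestmod=hashlib.sha256
).hexdigest().lower()
# Step 6: compare in constant time
if not hmac.compare_digest(computed_signature, given_signature):
    raise Exception("could not validate signature of inbound request!")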
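
To exercise this verification without waiting for a real webhook call, the same computation can be written as pure functions and driven with a locally signed request. The sketch below assumes the timestamp:body scheme described in the steps above; sign_payload and is_authentic are hypothetical names, and the secret is a dummy value.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_payload(secret: str, timestamp: str, body: str) -> str:
    """Compute the hex HMAC SHA-256 of "timestamp:body" with the signing secret."""
    message = f"{timestamp}:{body}".encode()
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()


def is_authentic(secret: str, signature: Optional[str], timestamp: Optional[str],
                 body: str, max_age_seconds: int = 300,
                 now: Optional[float] = None) -> bool:
    """Check header presence, timestamp freshness, and the signature itself."""
    if not signature or not timestamp:
        return False
    current = time.time() if now is None else now
    try:
        age = current - int(timestamp)
    except ValueError:
        return False
    if not 0 <= age <= max_age_seconds:
        return False  # too old, or dated in the future
    expected = sign_payload(secret, timestamp, body)
    return hmac.compare_digest(expected.encode(), signature.encode())


# Sign a request locally and verify it ten seconds later
stamp = "1700000000"
signature = sign_payload("dummy-secret", stamp, '{"findingsPresent": false}')
print(is_authentic("dummy-secret", signature, stamp,
                   '{"findingsPresent": false}', now=1700000010))
```

The now parameter exists only to make the freshness check deterministic in tests; production code would leave it unset.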

Example Webhook Server

An example implementation of a simple webhook server is below.

import hmac
import hashlib
from os import getenv, path, mkdir

from flask import Flask, request
import requests

app = Flask(__name__)

output_dir = "findings"

SIGNING_SECRET = getenv("NF_SIGNING_SECRET")


@app.route("/", methods=['POST'])
def hello():
    content = request.get_json(silent=True)
    challenge = content.get("challenge")
    if challenge:
        return challenge
    else:
        verify_signature()

        print(F"Received request metadata: {content['requestMetadata']}")
        print(F"Received errors: {content['errors']}")

        if not content["findingsPresent"]:
            print(F"No findings for {content['uploadID']}")
            return "", 200
        print(F"S3 findings valid until {content['validUntil']}")
        response = requests.get(content["findingsURL"])
        save_findings(content["uploadID"], response.text)
        return "", 200


def verify_signature():
    if SIGNING_SECRET is None:
        return
    given_signature = request.headers.get('X-Nightfall-Signature')
    nonce = request.headers.get('X-Nightfall-Timestamp')
    computed_signature = hmac.new(
        SIGNING_SECRET.encode(),
        msg=F"{nonce}:{request.get_data(as_text=True)}".encode(),
        digestmod=hashlib.sha256
    ).hexdigest().lower()
    if computed_signature != given_signature:
        raise Exception("could not validate signature of inbound request!")


def save_findings(scan_id, finding_json):
    if not path.isdir(output_dir):
        mkdir(output_dir)
    output_path = path.join(output_dir, f"{scan_id}.json")
    with open(output_path, "w+") as out_file:
        out_file.write(finding_json)
    print(F"Findings for {scan_id} written to {output_path}")


if __name__ == "__main__":
    app.run(port=8075)

In the above example, the webhook server is running on port 8075. To route ngrok requests to this server, once you run the Python script (having installed the necessary dependencies such as requests and Flask), you would run ngrok as follows:

./ngrok http 8075

Using Scan API (with Python)

Say you have a number of files containing customer or patient data and you are not sure which of them are ok to share in a less secure manner. By leveraging Nightfall's API you can easily verify whether a file contains sensitive PII, PHI, or PCI.

To make a request to the Nightfall API you will need:

  • A Nightfall API key

  • A list of data types you wish to scan for

  • Data to scan. Note that the API interprets data as plaintext, so you may pass it in any structured or unstructured format.

To run the following API call, we will be using Python's standard json, os, and requests libraries.

import json
import os
import requests

First we define the endpoint we want to reach with our API call.

endpoint = 'https://api.nightfall.ai/v1/scan'

Next we define the headers of our API request. In this example, we have our API key set via an environment variable called "NIGHTFALL_API_KEY". Your API key should never be hard-coded directly into your script.

h = {
    'Content-Type': 'application/json',
    'x-api-key': os.getenv('NIGHTFALL_API_KEY')
}

Next we define the detectors with which we wish to scan our data. The detectors must be formatted as a list of key-value pairs of format {'name':'DETECTOR_NAME'}.

detector_list = ['US_SOCIAL_SECURITY_NUMBER', 'ICD9_CODE', 'US_DRIVERS_LICENSE_NUMBER']

detector_object = [{'name':detector} for detector in detector_list]

[{'name':'US_SOCIAL_SECURITY_NUMBER'}, 
 {'name':'ICD9_CODE'}, 
 {'name':'US_DRIVERS_LICENSE_NUMBER'}]

Next, we build the request body, which contains the detectors from above, as well as the raw data that you wish to scan. In this example, we will read it from a file called sample_data.csv.

Here we assume that the file is under the 500 KB payload limit of the Scan API. If your file is larger than the limit, consider breaking it down into smaller pieces across multiple API requests.

with open('sample_data.csv', 'r') as f:
  raw_data = f.read()

d = {
    'detectors': detector_object,
    'payload':{'items':[raw_data]}
}

import os

if os.stat('sample_data.csv').st_size < 500000:
  print('This file will fit in a single API call.')
else:
  print('This file will need to be broken into pieces across multiple calls.')
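If the file is over the limit, one way to break it up is by line, accumulating lines until the next one would push the chunk past the limit. The sketch below is only an illustration: the helper name chunk_text is hypothetical, the 500,000-byte default mirrors the size check above, and a single line longer than the limit is passed through as its own oversized chunk rather than being cut.

```python
def chunk_text(text, max_bytes=500_000):
    """Split text into chunks whose UTF-8 size stays within max_bytes.

    Hypothetical helper: it splits on line boundaries so that rows of a
    CSV are never cut in half. A single line larger than max_bytes is
    emitted as its own (oversized) chunk.
    """
    chunks, current, current_size = [], [], 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        # Close the current chunk if adding this line would exceed the limit.
        if current and current_size + line_size > max_bytes:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        chunks.append("".join(current))
    return chunks


# Tiny limit so the splitting is visible on a small sample.
sample = "name,ssn\n" + "jane,123-45-6789\n" * 5
pieces = chunk_text(sample, max_bytes=40)
print(len(pieces), [len(p.encode("utf-8")) for p in pieces])
```

Each chunk would then be sent as its own request body. Note that only the first chunk carries the CSV header row.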

Now we are ready to call the Nightfall API to check if there is any sensitive data in our file. If there are no sensitive findings in our file, the response will be "[[]]".

response = requests.post(endpoint, headers = h, data = json.dumps(d))

if response.status_code == 200 and len(response.content.decode()) > 4:
  print('This file contains sensitive data.')
  print(json.loads(response.content.decode()))
elif response.status_code == 200:
  print('No sensitive data detected. Hooray!')
else:
  print(f'Something went wrong -- Response {response.status_code}.')

[[]]

[
  [
  {'fragment': '172-32-1176',
   'detector': 'US_SOCIAL_SECURITY_NUMBER',
   'confidence': {'bucket': 'LIKELY'},
   'location': {'byteRange': {'start': 122, 'end': 133},
    'unicodeRange': {'start': 122, 'end': 133}}},
  {'fragment': '514-14-8905',
   'detector': 'US_SOCIAL_SECURITY_NUMBER',
   'confidence': {'bucket': 'LIKELY'},
   'location': {'byteRange': {'start': 269, 'end': 280},
    'unicodeRange': {'start': 269, 'end': 280}}},
  {'fragment': '213-46-8915',
   'detector': 'US_SOCIAL_SECURITY_NUMBER',
   'confidence': {'bucket': 'LIKELY'},
   'location': {'byteRange': {'start': 418, 'end': 429},
    'unicodeRange': {'start': 418, 'end': 429}}}
  ]
 ]
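Once the response has been parsed, it can be easier to work with the findings as data than to compare the length of the raw text. The sketch below assumes a response shaped like the example above (a list with one list of findings per scanned item); the function name count_findings is hypothetical and the sample data is abbreviated from that example.

```python
from collections import Counter


def count_findings(response_body):
    """Count findings per detector in a parsed response body.

    Assumes the shape shown in the example above: a list holding one
    list of finding dictionaries per scanned item.
    """
    counts = Counter()
    for item_findings in response_body:
        for finding in item_findings:
            counts[finding["detector"]] += 1
    return dict(counts)


# Abbreviated from the example response above.
example = [[
    {"fragment": "172-32-1176", "detector": "US_SOCIAL_SECURITY_NUMBER",
     "confidence": {"bucket": "LIKELY"}},
    {"fragment": "514-14-8905", "detector": "US_SOCIAL_SECURITY_NUMBER",
     "confidence": {"bucket": "LIKELY"}},
]]

print(count_findings(example))
print(count_findings([[]]))  # the "no findings" response
```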

FAQs

Using Redaction

The Nightfall API is capable of returning a redacted version of your scanned text when a Detector is triggered.

This functionality allows you to hide potentially sensitive information while retaining the original context in which that information appeared.

Specifying a RedactionConfig

In order to redact content, when you call the scan endpoint you must provide a RedactionConfig as part of the definition of your Detection Rule.

You may specify one of the following different methods to redact content:

A RedactionConfig is defined per Detector in a Detection Rule, allowing you to specify a different redaction method for each type of Detector in the rule.

By default, the redaction feature will return both the sensitive finding and the redacted version of that finding. You may set the removeFinding field to true if you want only the redacted version of the finding returned in the response.

Masking Characters

Specifying a MaskConfig as part of your RedactionConfig substitutes a character for each character in the matched text. By default the masking character is an asterisk (*). You may specify an alternate character to use instead (maskingChar).

You may also choose to only mask a portion of the original text by specifying a number of characters to leave unmasked (numCharsToLeaveUnmasked). For instance, if you want to mask all but the last 4 digits of a credit card number, set this value to 4 so that the redacted finding would be rendered as ***************4242.

In the case where you want to leave characters unmasked at the front of the string you may use the maskLeftToRight flag. This flag determines if masking is applied left to right (*****/1984) instead of right to left (01/01*****). By default, this value is false.

Below is an example of how a RedactionConfig would be configured to redact the text that triggers a DATE_OF_BIRTH Detector such that the text 01/11/1995 becomes ??/??/??95:

{
  "minNumFindings":1,
  "minConfidence":"POSSIBLE",
  "detectorType":"NIGHTFALL_DETECTOR",
  "nightfallDetector":"DATE_OF_BIRTH",
  "redactionConfig":{
     "maskConfig":{
     "charsToIgnore":[
        "/"
     ],
     "maskingChar":"?",
     "maskRightToLeft":true,
     "numCharsToLeaveUnMasked":2
     }
   }
 }
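To build intuition for how the masking character, the ignored characters, and the unmasked count interact, the effect can be imitated locally. This is a rough sketch and not Nightfall's implementation: the function mask_text and its parameter names are hypothetical, it always leaves the trailing characters unmasked, and it does not model the mask direction flag.

```python
def mask_text(text, masking_char="*", chars_to_ignore=(), num_chars_to_leave_unmasked=0):
    """Local illustration of masking, not the service's own logic.

    Every character is replaced by masking_char except characters listed
    in chars_to_ignore and the last num_chars_to_leave_unmasked characters.
    """
    keep_from = len(text) - num_chars_to_leave_unmasked
    masked = []
    for index, char in enumerate(text):
        if index >= keep_from or char in chars_to_ignore:
            masked.append(char)
        else:
            masked.append(masking_char)
    return "".join(masked)


print(mask_text("01/11/1995", "?", ("/",), 2))   # ??/??/??95
print(mask_text("4242-4242-4242-4242", num_chars_to_leave_unmasked=4))
```

Because only "/" is ignored in the first call, any other separator (a space, for example) would be masked like a digit.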

Phrase Substitution

The SubstitutionConfig substitutes a sensitive finding with the value assigned to the property substitutionPhrase.

If no value is assigned to substitutionPhrase, the finding will be replaced with an empty string.

InfoType Substitution

It is possible to replace a sensitive finding with the name of the NIGHTFALL_DETECTOR that triggered it by using an InfoTypeSubstitutionConfig.

If you use the built-in credit card Detector, the string 4242-4242-4242-4242 will be redacted to [CREDIT_CARD_NUMBER].

This config is only valid for Detectors with a detectorType of NIGHTFALL_DETECTOR.
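The two substitution styles can be assembled as plain dictionaries before being serialized into a request body. The sketch below is a minimal illustration; the helper detector_with_redaction is hypothetical, while the field names are the ones used in the request examples in this guide.

```python
import json


def detector_with_redaction(nightfall_detector, redaction_config):
    """Build one inline Detector entry with a redactionConfig attached."""
    return {
        "minNumFindings": 1,
        "minConfidence": "POSSIBLE",
        "detectorType": "NIGHTFALL_DETECTOR",
        "nightfallDetector": nightfall_detector,
        "redactionConfig": redaction_config,
    }


# Phrase substitution: replace the finding with a fixed phrase.
ssn_detector = detector_with_redaction(
    "US_SOCIAL_SECURITY_NUMBER",
    {"substitutionConfig": {"substitutionPhrase": "*REDACTED*"}},
)

# InfoType substitution: replace the finding with the Detector name,
# and ask for only the redacted version to be returned.
card_detector = detector_with_redaction(
    "CREDIT_CARD_NUMBER",
    {"infoTypeSubstitutionConfig": {}, "removeFinding": True},
)

print(json.dumps([ssn_detector, card_detector], indent=2))
```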

Encryption

A CryptoConfig will encrypt a sensitive finding with a public key (provided as the publicKey property of the config) using RSA encryption.

Note that you are responsible for passing public keys for encryption and handling any decryption of the response payload. Nightfall will not store your keys.

Below is an example of a CryptoConfig being used to redact an EMAIL_ADDRESS detector.

{
  "minNumFindings":1,
  "minConfidence":"POSSIBLE",
  "detectorType":"NIGHTFALL_DETECTOR",
  "nightfallDetector":"EMAIL_ADDRESS",
  "displayName":"email",
  "redactionConfig":{
	 "cryptoConfig":{
		"publicKey":"-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAydYMwOYUGyBXDgHkzv19YR/dYQES4kYTMUps39qv/amNDywz4nsBDvCUqUvcN3nEpplHlYGH5ShSeA4G/FcmRqynSLVyFPZat/8E7n+EeHsgihFrr8oDWo5UBjCwRinTrC0m11q/5SeNzwVCWkf9x40u94QBz13dQoa9yPwaZBX5uBzyH86R7yeZHpad2cLq0ltpmJ3j5UfsFilkOb3JB60TNpNDdfabprot/y30CEnDDOgAXGtV1m0AhQpQjKRnkUs39DntqSbS+i0UgbyqzEGNUkeR1WsotXekW4KnbWA7k6S8SfkO27vnTSY5b9g/KKaOdysn5YaWJPfTVT/nywIDAQAB\n-----END PUBLIC KEY-----"
	 }
  }
}

Redactions in the Scan Response

The original input payload with redactions made inline is returned as a list of strings under the redactedPayload property. Each item in the list of redacted payloads corresponds to the list of strings in the original input payload and, if a Detector was triggered, it will contain a redacted version of that corresponding string.

If an item in the input payload did not have any findings, the entry for that index will be an empty string ("").

The redactedPayload property is omitted if no RedactionConfig was provided.

Additionally, the fields redactedFinding and redactedLocation are added to the finding object when the redaction feature is invoked.

The redactedFinding field contains the redacted version of only the text of the finding without its surrounding context. This is useful when you are masking a portion of the text that triggered a Detector.

The redactedLocation property will be returned as part of the finding that corresponds to an item in the payload. This may be distinct from the location property that is returned for a finding by default.

In the unlikely case where there are findings that overlap, Nightfall will default to replacing the text of the overlapping findings with [REDACTED BY NIGHTFALL].
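Since items without findings come back as empty strings, a caller that wants one safe-to-share string per input item has to fall back to the original text for those positions. A minimal sketch of that step follows; the function name merge_redacted and the sample strings are hypothetical.

```python
def merge_redacted(original_items, redacted_items):
    """Return one string per input item.

    Uses the redacted version where one was returned and falls back to
    the original text where the redacted entry is an empty string.
    """
    if len(original_items) != len(redacted_items):
        raise ValueError("payload and redactedPayload must have the same length")
    return [
        redacted or original
        for original, redacted in zip(original_items, redacted_items)
    ]


payload = ["my ssn is 123-45-5555", "nothing sensitive here"]
redacted_payload = ["my ssn is *REDACTED*", ""]
print(merge_redacted(payload, redacted_payload))
```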

Example Redaction Call

The following example shows how the redaction functionality may be invoked, with a variety of different redaction methods applied to the different Detectors being used.

curl --location --request POST 'https://api.nightfall.ai/v3/scan' \
--header 'x-api-key: NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--header 'Content-Type: application/json' \
--data-raw '{
   "payload":[
      "my ssn is 123-45-5555 and date of birth is 01/11/1995 and my credit card number is  4242 4242 4242 4242 and my email is james@gmail.com.",
      "my date of birth is 03 23 4242 4242 4242 4242 amex"
   ],
   "policy":{
      "detectionRules":[
         {
            "detectors":[
               {
                  "minNumFindings":1,
                  "minConfidence":"POSSIBLE",
                  "detectorType":"NIGHTFALL_DETECTOR",
                  "nightfallDetector":"CREDIT_CARD_NUMBER",
                  "displayName":"cc",
                  "redactionConfig":{
                     "infoTypeSubstitutionConfig":{
                        
                     },
                     "removeFinding":true
                  }
               },
               {
                  "minNumFindings":1,
                  "minConfidence":"POSSIBLE",
                  "detectorType":"NIGHTFALL_DETECTOR",
                  "nightfallDetector":"US_SOCIAL_SECURITY_NUMBER",
                  "displayName":"ssn",
                  "redactionConfig":{
                     "substitutionConfig":{
                        "substitutionPhrase":"*REDACTED*"
                     }
                  }
               },
               {
                  "minNumFindings":1,
                  "minConfidence":"POSSIBLE",
                  "detectorType":"NIGHTFALL_DETECTOR",
                  "nightfallDetector":"EMAIL_ADDRESS",
                  "displayName":"email",
                  "redactionConfig":{
                     "cryptoConfig":{
                        "publicKey":"-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAydYMwOYUGyBXDgHkzv19YR/dYQES4kYTMUps39qv/amNDywz4nsBDvCUqUvcN3nEpplHlYGH5ShSeA4G/FcmRqynSLVyFPZat/8E7n+EeHsgihFrr8oDWo5UBjCwRinTrC0m11q/5SeNzwVCWkf9x40u94QBz13dQoa9yPwaZBX5uBzyH86R7yeZHpad2cLq0ltpmJ3j5UfsFilkOb3JB60TNpNDdfabprot/y30CEnDDOgAXGtV1m0AhQpQjKRnkUs39DntqSbS+i0UgbyqzEGNUkeR1WsotXekW4KnbWA7k6S8SfkO27vnTSY5b9g/KKaOdysn5YaWJPfTVT/nywIDAQAB\n-----END PUBLIC KEY-----"
                     }
                  }
               },
               {
                  "minNumFindings":1,
                  "minConfidence":"POSSIBLE",
                  "detectorType":"NIGHTFALL_DETECTOR",
                  "nightfallDetector":"DATE_OF_BIRTH",
                  "redactionConfig":{
                     "maskConfig":{
                        "charsToIgnore":[
                           "/"
                        ],
                        "maskingChar":"?",
                        "maskRightToLeft":true,
                        "numCharsToLeaveUnMasked":2
                     }
                  }
               }
            ],
            "name":"cc",
            "logicalOp":"ANY"
         }
      ]
   }
}'

You can see in the response how the RedactionConfig associated with the various Detectors affects the different findings.

Note that because the second item in the payload contains overlapping findings from multiple Detectors, the redacted text in the redactedPayload property becomes [REDACTED BY NIGHTFALL].

{
   "findings":[
      [
         {
            "finding":"james@gmail.com",
            "redactedFinding":"X8QL0mZGHZ+N47nPEccjsLHf2F/5cFqjF16P6wgYJhy8IaxHipHWMBRAufKR4T8FFkvTuTEanu6ZAA+V8NTkNmTLxHarcWPSVClJ8kjXAPltLuR4I2H4eeT+sWEvUP3ik/BF1KcxRpsYWDQO1bNYk+WReXkWlW72Q7rbWuTGFj2uDFCPS+DUraDh9wNBsMPELFOnh1GSQIKCp9U5GMp/kkpo/0idh83RVHXyjZPT4ReKEST2oG2lQ9UuP5LJy/mHX1VYgd8DwlETn8nkhqJ1T0mGs6kHSh22G6N0ic0PjHnj73RiMnQdPwlLw3qyPmFf6RRLKtFuzmFan8ZGtZhcKA==",
            "detector":{
               "name":"email",
               "uuid":"c0235299-0f26-4ad6-ad8c-71f83daf44e9"
            },
            "confidence":"VERY_LIKELY",
            "location":{
               "byteRange":{
                  "start":120,
                  "end":135
               },
               "codepointRange":{
                  "start":120,
                  "end":135
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "redactedLocation":{
               "byteRange":{
                  "start":120,
                  "end":135
               },
               "codepointRange":{
                  "start":120,
                  "end":135
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "cc"
            ]
         },
         {
            "finding":"01/11/1995",
            "redactedFinding":"??/??/??95",
            "detector":{
               "name":"DATE_OF_BIRTH",
               "uuid":"540856cb-99cb-42e7-b8aa-cd4f22f019d7"
            },
            "confidence":"LIKELY",
            "location":{
               "byteRange":{
                  "start":43,
                  "end":53
               },
               "codepointRange":{
                  "start":43,
                  "end":53
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "redactedLocation":{
               "byteRange":{
                  "start":43,
                  "end":53
               },
               "codepointRange":{
                  "start":43,
                  "end":53
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "cc"
            ]
         },
         {
            "finding":"",
            "redactedFinding":"[CREDIT_CARD_NUMBER]",
            "detector":{
               "name":"cc",
               "uuid":"74c1815e-c0c3-4df5-8b1e-6cf98864a454"
            },
            "confidence":"VERY_LIKELY",
            "location":{
               "byteRange":{
                  "start":84,
                  "end":103
               },
               "codepointRange":{
                  "start":84,
                  "end":103
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "redactedLocation":{
               "byteRange":{
                  "start":84,
                  "end":103
               },
               "codepointRange":{
                  "start":84,
                  "end":103
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "cc"
            ]
         },
         {
            "finding":"123-45-5555",
            "redactedFinding":"*REDACTED*",
            "detector":{
               "name":"ssn",
               "uuid":"e30d9a87-f6c7-46b9-a8f4-16547901e069"
            },
            "confidence":"VERY_LIKELY",
            "location":{
               "byteRange":{
                  "start":10,
                  "end":21
               },
               "codepointRange":{
                  "start":10,
                  "end":21
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "redactedLocation":{
               "byteRange":{
                  "start":10,
                  "end":21
               },
               "codepointRange":{
                  "start":10,
                  "end":21
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "cc"
            ]
         }
      ],
      [
         {
            "finding":"",
            "redactedFinding":"[CREDIT_CARD_NUMBER]",
            "detector":{
               "name":"cc",
               "uuid":"74c1815e-c0c3-4df5-8b1e-6cf98864a454"
            },
            "confidence":"VERY_LIKELY",
            "location":{
               "byteRange":{
                  "start":26,
                  "end":45
               },
               "codepointRange":{
                  "start":26,
                  "end":45
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "redactedLocation":{
               "byteRange":{
                  "start":26,
                  "end":45
               },
               "codepointRange":{
                  "start":26,
                  "end":45
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "cc"
            ]
         },
         {
            "finding":"03 23 4242",
            "redactedFinding":"????????42",
            "detector":{
               "name":"DATE_OF_BIRTH",
               "uuid":"540856cb-99cb-42e7-b8aa-cd4f22f019d7"
            },
            "confidence":"LIKELY",
            "location":{
               "byteRange":{
                  "start":20,
                  "end":30
               },
               "codepointRange":{
                  "start":20,
                  "end":30
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "redactedLocation":{
               "byteRange":{
                  "start":20,
                  "end":30
               },
               "codepointRange":{
                  "start":20,
                  "end":30
               },
               "rowRange":null,
               "columnRange":null,
               "commitHash":""
            },
            "matchedDetectionRuleUUIDs":[
               
            ],
            "matchedDetectionRules":[
               "cc"
            ]
         }
      ]
   ],
   "redactedPayload":[
      "my ssn is *REDACTED* and date of birth is ??/??/??95 and my credit card number is  [CREDIT_CARD_NUMBER] and my email is X8QL0mZGHZ+N47nPEccjsLHf2F/5cFqjF16P6wgYJhy8IaxHipHWMBRAufKR4T8FFkvTuTEanu6ZAA+V8NTkNmTLxHarcWPSVClJ8kjXAPltLuR4I2H4eeT+sWEvUP3ik/BF1KcxRpsYWDQO1bNYk+WReXkWlW72Q7rbWuTGFj2uDFCPS+DUraDh9wNBsMPELFOnh1GSQIKCp9U5GMp/kkpo/0idh83RVHXyjZPT4ReKEST2oG2lQ9UuP5LJy/mHX1VYgd8DwlETn8nkhqJ1T0mGs6kHSh22G6N0ic0PjHnj73RiMnQdPwlLw3qyPmFf6RRLKtFuzmFan8ZGtZhcKA==.",
      "my date of birth is [REDACTED BY NIGHTFALL] amex"
   ]
}

How quickly can I get started with Firewall for AI?

Policy User Scope Update API

Detecting Secrets

Leaked secrets, such as credentials needed to authenticate and authorize a cloud provider's API request, expose company software, services, infrastructure, and data to hackers.

Nightfall has developed technology to detect secrets and label findings, keeping SecOps workflows from being clogged and eliminating false positive alerts.

Overall Coverage

Nightfall uses machine learning models trained on a large (millions of lines of code) diverse dataset (including all programming languages and application types) to ensure best-in-class secret detection accuracy and coverage.

Explicit Labeling and Endpoint Validation for Popular Services

For a growing set of the most popular services, Nightfall will:

  • label detected secrets by vendor and service type (returned in the kind field of the response)

  • label detected secrets as active risks by validating supported credential types with their associated service endpoints (returned as the status of the service)

Our current solution supports the following vendors covering a diverse set of use cases, including cloud storage/infrastructure, communication, social networks, software development, banking, observability, and payment processing.

Key Detection Example

Below is an example of how an AWS Key would be shown in a finding.

The following values are returned for the status field:

  • ACTIVE

  • EXPIRED

  • UNVERIFIED

This value will be based on what information is returned by the corresponding service when attempting to validate the key. If no data is returned from the service, it will be considered UNVERIFIED.
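A common follow-up is to triage findings by status so that keys confirmed as active are handled first. The sketch below assumes findings shaped like the AWS key example in this guide (a findingMetadata object containing apiKeyMetadata); the function name active_secrets and the sample values are hypothetical.

```python
def active_secrets(findings):
    """Return (kind, description) for findings whose key validated as ACTIVE."""
    results = []
    for finding in findings:
        # Either level may be missing or null for non-secret findings.
        metadata = (finding.get("findingMetadata") or {}).get("apiKeyMetadata") or {}
        if metadata.get("status") == "ACTIVE":
            results.append((metadata.get("kind"), metadata.get("description")))
    return results


# Hypothetical, abbreviated findings for illustration.
sample_findings = [
    {"finding": "<secret value>",
     "findingMetadata": {"apiKeyMetadata": {
         "status": "ACTIVE", "kind": "AWS",
         "description": "Access Key ID: <example id>"}}},
    {"finding": "<secret value>",
     "findingMetadata": {"apiKeyMetadata": {"status": "UNVERIFIED", "kind": "Slack"}}},
    {"finding": "4242-4242-4242-4242", "findingMetadata": None},
]

print(active_secrets(sample_findings))
```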

Using Pre-Configured Detection Rules

In this example, we'll walk through making a request to the scan endpoint.

The endpoint inspects the data you provide via the request body and reports any detected occurrences of the sensitive data types you are searching for.

Please refer to the API reference of the scan endpoint for more detailed information on the request and response schemas.

In this sample request, we provide two main fields:

  1. a policy and its detection rules that we want to use when scanning the text payload

  2. a list of text strings to scan

The aggregate length of all strings in the payload list must not exceed 500 KB, and the number of items in the payload may not exceed 50,000.
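Both limits can be checked on the client before a request is sent. The sketch below is an assumption-laden illustration: check_payload is a hypothetical helper, "500 KB" is read as 500,000 bytes, and length is measured in UTF-8 bytes rather than characters.

```python
MAX_PAYLOAD_BYTES = 500_000  # assumption: "500 KB" read as 500,000 bytes
MAX_PAYLOAD_ITEMS = 50_000


def check_payload(items):
    """Return a list of human-readable problems; empty if the payload fits."""
    problems = []
    total_bytes = sum(len(item.encode("utf-8")) for item in items)
    if total_bytes > MAX_PAYLOAD_BYTES:
        problems.append(f"payload is {total_bytes} bytes, over {MAX_PAYLOAD_BYTES}")
    if len(items) > MAX_PAYLOAD_ITEMS:
        problems.append(f"payload has {len(items)} items, over {MAX_PAYLOAD_ITEMS}")
    return problems


print(check_payload(["4916-6734-7572-5015 is my credit card number"]))
print(check_payload(["x" * 500_001]))
```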

Executing the curl request will yield a response as follows.

The API call returns a list, where the item at each index is a sublist of matches for the provided detector types.

The indices of the response list correspond directly to the indices of the list provided in the request payload.

In this example, the first item in the response list contains a finding because one credit card number was detected in the first string we provided. The second item in the response list is an empty list because there is no sensitive data in the second input string we provided. The third item in the returned list contains multiple findings as a result of multiple Detectors within the Detection Rule being triggered.
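Because the response indices line up with the request indices, the two lists can be walked together. The following sketch shows one way to do that; findings_by_item is a hypothetical helper and the findings data is abbreviated for illustration.

```python
def findings_by_item(payload, findings):
    """Pair each payload string with the names of the Detectors found in it."""
    if len(payload) != len(findings):
        raise ValueError("expected one findings list per payload item")
    return [
        (text, [finding["detector"]["name"] for finding in item_findings])
        for text, item_findings in zip(payload, findings)
    ]


payload = [
    "4916-6734-7572-5015 is my credit card number",
    "This string does not have any sensitive data",
]
# Abbreviated findings list: one sublist per payload item.
findings = [
    [{"finding": "4916-6734-7572-5015", "detector": {"name": "Credit card number"}}],
    [],
]

for text, names in findings_by_item(payload, findings):
    print(names or "no findings", "->", text)
```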

Creating an Inline Detection Rule

  • the UUID of a Detector defined through the UI

  • a Regular Expression

  • a Word List.

Built In Detectors

In the example below two of Nightfall's native Detectors (detectorType = "NIGHTFALL_DETECTOR") are being used:

  • US_SOCIAL_SECURITY_NUMBER

  • CREDIT_CARD_NUMBER.

In the payload body, you can see that we are submitting a list of three different strings to scan (payload). The first will trigger the U.S. Social Security Detector. The last will trigger the credit card Detector. The middle example will trigger neither.

Below is the response payload to the previous request.

Regular Expression Example

The following example shows a Detection Rule composed of two Detectors defined using regular expressions: one for the format of an International Standard Recording Code (ISRC) and one for the format of an International Standard Musical Work Code (ISWC). Matching either of them will trigger the Detection Rule (by using the logicalOp "ANY").

We will provide a payload of two strings, one of which will match the ISRC and one of which will match the ISWC.
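Before submitting regular expressions in a Detection Rule, it can help to try them locally. The sketch below uses Python's re module with illustrative patterns for the two code formats; these patterns are assumptions written for this example, not the ones used in the request, and match_any imitates "ANY" semantics by reporting every pattern that matches.

```python
import re

# Illustrative patterns only (assumptions, not taken from the request).
PATTERNS = {
    "ISRC": re.compile(r"[A-Z]{2}-?[A-Z0-9]{3}-?\d{2}-?\d{5}"),
    "ISWC": re.compile(r"T-?\d{3}\.?\d{3}\.?\d{3}-?\d"),
}


def match_any(text):
    """Return (name, matched text, character offset) for each matching pattern."""
    hits = []
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            hits.append((name, match.group(), match.start()))
    return hits


print(match_any("Non sensitive string"))
print(match_any("US-S1Z-99-00001 is the ISRC"))
print(match_any("The work is registered as T-034.524.680-1"))
```

The offsets here are character offsets within the Python string, which equal byte offsets only for ASCII text.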

The returned response demonstrates how findings are returned, with a finding per payload entry and the Detection Rule and Detector that matched the payload, if any.

The byte range that triggered the match is also provided. In the case of the 2nd item in the payload, since the match occurred at the beginning of the string, it has a location where the byteRange start is 0. In the case of the 3rd payload entry the location offset is 31.
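A byte range can be turned back into the matched text by slicing the encoded payload string. The sketch below assumes offsets count UTF-8 bytes and that end is exclusive, which is consistent with the offsets shown in the redaction example response earlier in this guide; slice_by_byte_range is a hypothetical helper.

```python
def slice_by_byte_range(text, start, end):
    """Return the text covered by a byteRange (end treated as exclusive)."""
    return text.encode("utf-8")[start:end].decode("utf-8")


text = "my ssn is 123-45-5555 and date of birth is 01/11/1995"
print(slice_by_byte_range(text, 10, 21))
print(slice_by_byte_range(text, 43, 53))

# With non-ASCII characters, byte offsets and character offsets diverge.
emoji_text = "💳 4242"
print(emoji_text.index("4242"), emoji_text.encode("utf-8").index(b"4242"))
```

Slicing the Python string directly with byte offsets would give the wrong span whenever the payload contains multi-byte characters such as emoji.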

Word List Example

The following example shows how a word list may be used instead of a regular expression.

Below is the resulting payload with the findings detected in our different payload strings.

Note that since the isCaseSensitive flag is set to "false" for the detector, the first string in our payload matches a word from our word list.

Also note that the confidence level for a word list match defaults to "LIKELY" so you should not set a minConfidence level higher than that if you want matches to result.

Free          5                      15
Enterprise    10, more by request    50, more by request

Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.

Once the files have been uploaded, begin using the scan endpoint mentioned here. Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.

Before we get started, let's set our Nightfall API key as an environment variable and install our dependencies for our code samples in Python. If you don't have a Nightfall API key, generate one on your Nightfall Dashboard. If you don't have a Nightfall account, sign up for a free Nightfall Developer Platform account.

Default case ("my email is " → "my email is .")

Case with custom word="[REDACTED BY NIGHTFALL]" ("my email is " → "my email is [REDACTED BY NIGHTFALL].")

Substitute with InfoType ("my email is " → "my email is [EMAIL].")

To scan an Amazon database instance (e.g. MySQL, Postgres) you must create a snapshot of that instance and export the snapshot to S3.

To perform this scan, you will need to configure an Amazon S3 bucket to which you will export a snapshot.

For more information please see Amazon's documentation on identifying an Amazon S3 bucket for export.

For more information see Identity and Access Management in Amazon RDS and Providing access to an Amazon S3 bucket using an IAM role.

When parquet files are analyzed, as with other tabular data, not only will the location of the finding be shown within a given byte range, but also column and row data.

For more information please see the Amazon documentation Exporting DB snapshot data to Amazon S3.

LLMs like ChatGPT and Claude can inadvertently receive sensitive information from user inputs, posing significant privacy concerns (OWASP LLM06). Without content filtering, these AI platforms can process and retain confidential data such as health records, financial details, and personal identifying information.

If you're not using LangChain, check our OpenAI and Claude tutorials.

When a customer signs up for the developer platform, Nightfall automatically generates a unique signing secret for them.

You can test your webhook with a tool such as ngrok, which allows you to expose a web server running on your local machine to the internet.

See the section on for details about the json payloads for the different messages sent to webhook servers.

You can read more about obtaining a Nightfall API key or about our available data detectors in the linked reference guides.

  • apply masking (e.g. asterisks)

  • substitute a custom phrase

  • substitute the name of the Detector that was triggered (referred to as "InfoType substitution")

  • use encryption

The results of applying redactions are returned in the response payload for requests made to the scan endpoint, both as part of an array named redactedPayload and as additional properties of the finding object.

You can start scanning for sensitive data in just a few minutes. Our developer-friendly API and comprehensive documentation make it easy to integrate Firewall for AI into your application. Follow our Quickstart guide for step-by-step instructions on setting up the API, configuring detectors, and making your first API call.

This list is not static and will continue to grow as we add support for detecting API keys from additional services. If you want to detect API keys from a service not listed below, please .

To use this functionality, you use our existing built-in API_KEY detector to scan a data source such as . Below is an example using a detection rule defined in line for a text scan.

In the example below we will use a Detection Rule that has been configured in the by supplying its UUID.

Alternatively you may define your policy in code by using a built in Nightfall detector from the as follows:

See for more information about how policies and detection rules may be defined through code.

You can read further about the fields in the response object in the .

In addition to using , you may define Detection Rules within the body of your scan method by either supplying:

the identifier of one of Nightfall's

Out of the box, Nightfall comes with an of native detectors.

When defining a Detection Rule in line, you configure the minimum (minConfidence) and minimum number of times the match must be found (minNumFindings) for the rule to be triggered. .

For more information on the parameters related to redaction, see .

What can I do with the Firewall for AI?
How quickly can I get started with Firewall for AI?
What types of data can I scan with API?
What types of detectors are supported out of the box?
Can I customize or bring my own detectors?
What is the pricing model?
How do I know my data is secure?
How do I get in touch with you?
Can I test out the detection and my own detection rules before writing any code?
How does Nightfall support custom data types?
How does Nightfall's Firewall for AI differ from other solutions?
  • AWS

  • Azure

  • Confluence

  • Confluent

  • Datadog

  • ElasticSearch

  • Facebook

  • GCP

  • Google API

  • GitHub

  • GitLab

  • JIRA

  • JWT

  • Nightfall

  • Notion

  • Okta

  • Paypal

  • Plaid

  • Postmark

  • Postman

  • RapidAPI

  • Salesforce

  • Sendgrid

  • Slack

  • Snyk

  • Splunk

  • Square

  • Stripe

  • Twitter

  • Twilio

  • Zapier


{
  "finding": "zImaKNJJ8u/seIbm1UszokVz3SSARukJs6cghEBXD",
  "detector": {
    "name": "API key",
    "uuid": "0e95732f-bc5c-448f-9d15-bd1417177360"
  },
  "confidence": "VERY_LIKELY",
  ...
  "findingMetadata": {
    "apiKeyMetadata": {
      "status": "ACTIVE",
      "kind": "AWS",
      "description": "Access Key ID: AKIA52FSMBPZS1JIDTPX"
    }
  }
}

curl --request POST \
     --url https://api.nightfall.ai/v3/scan \
     --header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
     --header 'Content-Type: application/json' \
     --data '{
       "policy": {
            "detectionRules": [
                 {
                      "detectors": [
                           {
                                "detectorType": "NIGHTFALL_DETECTOR",
                                "nightfallDetector": "API_KEY",
                                "minNumFindings": 1,
                                "minConfidence": "LIKELY",
                                "displayName": "API Key"
                           }
                      ],
                      "name": "My Match Rule",
                      "logicalOp": "ANY"
                 }
            ]
       },
       "payload": [
            "Is this an active nightfall key? NF-OZ6F9fzF2z5mRxMrUdfL8FddFS51kPzE"
       ]
     }'

curl --request POST \
     --url https://api.nightfall.ai/v3/scan \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
     --header 'Content-Type: application/json' \
     --data '
{
     "policy": {
          "detectionRuleUUIDs": [
               "950833c9-8608-4c66-8a3a-0734eac11157"
          ]
     },
     "payload": [
          "4916-6734-7572-5015 is my credit card number",
          "This string does not have any sensitive data",
          "my api key is yr+ZWwIZp6ifFgaHV8410b2BxbRt5QiAj1EZx1qj and my ๐Ÿ’ณ credit card number ๐Ÿ’ฐ is 30204861594838"
     ]
}
'
curl --request POST \
     --url https://api.nightfall.ai/v3/scan \
     --header 'accept: application/json' \
     --header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
     --header 'content-type: application/json' \
     --data '
{
     "policy": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "nightfallDetector": "CREDIT_CARD_NUMBER",
                              "detectorType": "NIGHTFALL_DETECTOR",
                              "minConfidence": "POSSIBLE",
                              "minNumFindings": 1
                         }
                    ],
                    "logicalOp": "ALL"
               }
          ]
     },
     "payload": [
          "4916-6734-7572-5015 is my credit card number",
          "This string does not have any sensitive data",
          "my api key is yr+ZWwIZp6ifFgaHV8410b2BxbRt5QiAj1EZx1qj and my ๐Ÿ’ณ credit card number ๐Ÿ’ฐ is 30204861594838"
     ]
}
'
{
    "findings": [
        [
            {
                "finding": "4916-6734-7572-5015",
                "detector": {
                    "name": "Credit card number",
                    "uuid": "74c1815e-c0c3-4df5-8b1e-6cf98864a454"
                },
                "confidence": "VERY_LIKELY",
                "location": {
                    "byteRange": {
                        "start": 0,
                        "end": 19
                    },
                    "codepointRange": {
                        "start": 0,
                        "end": 19
                    }
                },
                "matchedDetectionRuleUUIDs": [
                    "950833c9-8608-4c66-8a3a-0734eac11157"
                ],
                "matchedDetectionRules": []
            }
        ],
        [],
        [
            {
                "finding": "30204861594838",
                "detector": {
                    "name": "Phone number",
                    "uuid": "d08edfc4-b5e2-420a-a5fe-3693fb6276c4"
                },
                "confidence": "LIKELY",
                "location": {
                    "byteRange": {
                        "start": 94,
                        "end": 108
                    },
                    "codepointRange": {
                        "start": 88,
                        "end": 102
                    }
                },
                "matchedDetectionRuleUUIDs": [
                    "950833c9-8608-4c66-8a3a-0734eac11157"
                ],
                "matchedDetectionRules": []
            },
            {
                "finding": "30204861594838",
                "detector": {
                    "name": "Credit card number",
                    "uuid": "74c1815e-c0c3-4df5-8b1e-6cf98864a454"
                },
                "confidence": "LIKELY",
                "location": {
                    "byteRange": {
                        "start": 94,
                        "end": 108
                    },
                    "codepointRange": {
                        "start": 88,
                        "end": 102
                    }
                },
                "matchedDetectionRuleUUIDs": [
                    "950833c9-8608-4c66-8a3a-0734eac11157"
                ],
                "matchedDetectionRules": []
            }
        ]
    ]
}
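The response above reports two ranges for the same finding: `byteRange` (94 to 108) and `codepointRange` (88 to 102). They differ because the third payload string contains two emoji, each of which is one code point but four bytes in UTF-8. The sketch below recovers the finding both ways; the helper names are illustrative, and the emoji are written as `\U` escapes only to keep the snippet encoding-safe.

```python
# Third payload string from the request above (emoji written as escapes).
payload = (
    "my api key is yr+ZWwIZp6ifFgaHV8410b2BxbRt5QiAj1EZx1qj "
    "and my \U0001F4B3 credit card number \U0001F4B0 is 30204861594838"
)

# Location of the finding, copied from the response above.
location = {
    "byteRange": {"start": 94, "end": 108},
    "codepointRange": {"start": 88, "end": 102},
}


def slice_by_codepoints(text, rng):
    """Python strings index by code point, so a plain slice is enough."""
    return text[rng["start"]:rng["end"]]


def slice_by_bytes(text, rng):
    """Byte offsets apply to the UTF-8 encoding of the text."""
    return text.encode("utf-8")[rng["start"]:rng["end"]].decode("utf-8")


by_codepoint = slice_by_codepoints(payload, location["codepointRange"])
by_byte = slice_by_bytes(payload, location["byteRange"])
print(by_codepoint, by_byte)
```

Which range to use depends on how your language indexes strings: code point offsets suit Python `str`, while byte offsets suit languages or buffers that work on raw UTF-8.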
curl --request POST \
     --url https://api.nightfall.ai/v3/scan \
     --header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
     --header 'Content-Type: application/json' \
     --data '{
       "policy": {
            "detectionRules": [
                 {
                      "detectors": [
                           {
                                "detectorType": "NIGHTFALL_DETECTOR",
                                "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER",
                                "minNumFindings": 1,
                                "minConfidence": "LIKELY",
                                "displayName": "US Social Security Number"
                           },
                           {
                                "detectorType": "NIGHTFALL_DETECTOR",
                                "nightfallDetector": "CREDIT_CARD_NUMBER",
                                "minNumFindings": 1,
                                "minConfidence": "LIKELY",
                                "displayName": "Credit Card Number",
                                "redactionConfig": {
                                    "maskConfig": {
                                        "maskingChar": "๐Ÿ‘€",
                                        "charsToIgnore": ["-"]
                                    }
                                }
                           }
                      ],
                      "name": "My Match Rule",
                      "logicalOp": "ANY"
                 }
            ]
       },
       "payload": [
            "The customer social security number is 458-02-6124",
            "No PII in this string",
            "My credit card number is 5310-2768-6832-9293"
       ]
     }'
{
  "findings": [
    [
      {
        "finding": "458-02-6124",
        "detector": {
          "name": "US Social Security Number",
          "uuid": "e30d9a87-f6c7-46b9-a8f4-16547901e069"
        },
        "confidence": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": 39,
            "end": 50
          },
          "codepointRange": {
            "start": 39,
            "end": 50
          }
        },
        "matchedDetectionRuleUUIDs": [],
        "matchedDetectionRules": [
          "My Match Rule"
        ]
      }
    ],
    [],
    [
      {
        "finding": "5310-2768-6832-9293",
       "redactedFinding": "๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€-๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€-๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€-๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€",
        "detector": {
          "name": "Credit Card Number",
          "uuid": "74c1815e-c0c3-4df5-8b1e-6cf98864a454"
        },
        "confidence": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": 25,
            "end": 44
          },
          "codepointRange": {
            "start": 25,
            "end": 44
          }
        },
        "redactedLocation": {
          "byteRange": {
            "start": 25,
            "end": 44
          },
          "codepointRange": {
            "start": 25,
            "end": 44
          }
        },
        "matchedDetectionRuleUUIDs": [],
        "matchedDetectionRules": [
          "My Match Rule"
        ]
      }
    ]
  ],
  "redactedPayload": [
    "",
    "",
    "My credit card number is ๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€-๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€-๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€-๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€๐Ÿ‘€"
  ]
}
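To make the effect of `maskingChar` and `charsToIgnore` concrete, here is a small local illustration that reproduces the `redactedFinding` shown above. It is not Nightfall's implementation, only a sketch of the behavior visible in the example: every character is replaced by the masking character except those listed as ignored. The function name `mask` is hypothetical.

```python
def mask(text, masking_char, chars_to_ignore=()):
    """Replace every character with masking_char, except the ignored ones."""
    ignore = set(chars_to_ignore)
    return "".join(ch if ch in ignore else masking_char for ch in text)


# Same inputs as the maskConfig in the request above ("\U0001F440" is the eyes emoji).
masked = mask("5310-2768-6832-9293", "\U0001F440", ["-"])
print(masked)
```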
curl --location --request POST 'https://api.nightfall.ai/v3/scan' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--header 'Content-Type: application/json' \
--data-raw '{
     "config": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "regex": {
                                   "isCaseSensitive": false,
                                   "pattern": "[A-Z]{2}-?\\w{3}-?\\d{2}-?\\d{5}"
                              },
                              "minNumFindings": 1,
                              "minConfidence": "POSSIBLE",
                              "detectorType": "REGEX",
                              "displayName": "ISRC Code Detector"
                         },
                         {
                              "regex": {
                                   "isCaseSensitive": false,
                                   "pattern": "T-[0-9]{3}\\.[0-9]{3}\\.[0-9]{3}-[0-9]"
                              },
                              "minNumFindings": 1,
                              "minConfidence": "POSSIBLE",
                              "detectorType": "REGEX",
                              "displayName": "ISWC Code Detector"
                         }                         
                    ],
                    "name": "ISRC and ISWC Code Detection Rule",
                    "logicalOp": "ANY"
               }
          ]
     },
     "payload": [
          "Non Matching Payload",
          "US-S1Z-99-00001 is an example ISRC Code: ",
          "The ISWC for Symphony No. 9 is T-905.029.737-5"
     ]
}
'
{
    "findings": [
        [],
        [
            {
                "finding": "US-S1Z-99-00001",
                "detector": {
                    "name": "ISRC Code Detector",
                    "uuid": "d8be87c9-4b44-41fd-b78c-8d638fe56069"
                },
                "confidence": "LIKELY",
                "location": {
                    "byteRange": {
                        "start": 0,
                        "end": 15
                    },
                    "codepointRange": {
                        "start": 0,
                        "end": 15
                    }
                },
                "matchedDetectionRuleUUIDs": [],
                "matchedDetectionRules": [
                    "ISRC and ISWC Code Detection Rule"
                ]
            }
        ],
        [
            {
                "finding": "T-905.029.737-5",
                "detector": {
                    "name": "ISWC Code Detector",
                    "uuid": "faf4c830-f2ac-4934-bf9c-ff20f5a6f420"
                },
                "confidence": "LIKELY",
                "location": {
                    "byteRange": {
                        "start": 31,
                        "end": 46
                    },
                    "codepointRange": {
                        "start": 31,
                        "end": 46
                    }
                },
                "matchedDetectionRuleUUIDs": [],
                "matchedDetectionRules": [
                    "ISRC and ISWC Code Detection Rule"
                ]
            }
        ]
    ]
}
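Before sending a regex detector to the API, it can help to try the patterns locally. The sketch below runs the two patterns from the request above with Python's `re` module and `re.IGNORECASE` (mirroring `"isCaseSensitive": false`). Python's regex engine is an assumption here and may not behave identically to the engine Nightfall uses, so treat this as a quick sanity check rather than a guarantee.

```python
import re

# Patterns copied from the request above (JSON escaping removed).
DETECTORS = {
    "ISRC Code Detector": r"[A-Z]{2}-?\w{3}-?\d{2}-?\d{5}",
    "ISWC Code Detector": r"T-[0-9]{3}\.[0-9]{3}\.[0-9]{3}-[0-9]",
}


def find_matches(text):
    """Return (detector name, matched text, start, end) for every match."""
    hits = []
    for name, pattern in DETECTORS.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((name, m.group(0), m.start(), m.end()))
    return hits


for line in [
    "Non Matching Payload",
    "US-S1Z-99-00001 is an example ISRC Code: ",
    "The ISWC for Symphony No. 9 is T-905.029.737-5",
]:
    print(find_matches(line))
```

The start and end offsets returned here line up with the `codepointRange` values in the response above (0 to 15 and 31 to 46).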
curl --location --request POST 'https://api.nightfall.ai/v3/scan' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'x-api-key: NF-rEpLaCeM3w1ThYoUrNiGhTfAlLKeY123' \
--data-raw '{
     "config": {
          "detectionRules": [
               {
                    "detectors": [
                         {
                              "wordList": {
                                   "values": [
                                        "cat",
                                        "dog",
                                        "rat"
                                   ],
                                   "isCaseSensitive": false
                              },
                              "minNumFindings": 1,
                              "minConfidence": "POSSIBLE",
                              "displayName": "animals",
                              "detectorType": "WORD_LIST"
                         }
                    ],
                    "name": "WordListExamples",
                    "logicalOp": "ANY"
               }
          ]
     },
     "payload": [
          "THE CAT SAT ON THE MAT",
          "The dog and the rat are on the west bank of the river",
          "No one here but use chickens"
     ]
}'
{
    "findings": [
        [
            {
                "finding": "cat",
                "detector": {
                    "name": "animals",
                    "uuid": "c033e224-034a-417f-9c0d-0c8d13f462bb"
                },
                "confidence": "LIKELY",
                "location": {
                    "byteRange": {
                        "start": 4,
                        "end": 7
                    },
                    "codepointRange": {
                        "start": 4,
                        "end": 7
                    }
                },
                "matchedDetectionRuleUUIDs": [],
                "matchedDetectionRules": [
                    "WordListExamples"
                ]
            }
        ],
        [
            {
                "finding": "dog",
                "detector": {
                    "name": "animals",
                    "uuid": "c033e224-034a-417f-9c0d-0c8d13f462bb"
                },
                "confidence": "LIKELY",
                "location": {
                    "byteRange": {
                        "start": 4,
                        "end": 7
                    },
                    "codepointRange": {
                        "start": 4,
                        "end": 7
                    }
                },
                "matchedDetectionRuleUUIDs": [],
                "matchedDetectionRules": [
                    "WordListExamples"
                ]
            },
            {
                "finding": "rat",
                "detector": {
                    "name": "animals",
                    "uuid": "c033e224-034a-417f-9c0d-0c8d13f462bb"
                },
                "confidence": "LIKELY",
                "location": {
                    "byteRange": {
                        "start": 16,
                        "end": 19
                    },
                    "codepointRange": {
                        "start": 16,
                        "end": 19
                    }
                },
                "matchedDetectionRuleUUIDs": [],
                "matchedDetectionRules": [
                    "WordListExamples"
                ]
            }
        ],
        []
    ],
    "redactedPayload": [
        "",
        "",
        ""
    ]
}
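In the responses above, `findings` is a list with one entry per payload string, in the same order, and an empty list means nothing was found in that string. A minimal sketch of pairing the two follows; the helper name `findings_by_payload` is hypothetical and the response is trimmed to the fields used.

```python
def findings_by_payload(payload, response):
    """Pair each payload string with the finding texts reported for it."""
    return [
        (text, [f["finding"] for f in item_findings])
        for text, item_findings in zip(payload, response["findings"])
    ]


payload = [
    "THE CAT SAT ON THE MAT",
    "The dog and the rat are on the west bank of the river",
    "No one here but us chickens",
]
# Trimmed version of the word list response above.
response = {
    "findings": [
        [{"finding": "cat"}],
        [{"finding": "dog"}, {"finding": "rat"}],
        [],
    ]
}

for text, found in findings_by_payload(payload, response):
    print(f"{text!r}: {found}")
```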
Nightfall APIs

Can I customize or bring my own detectors?

Absolutely! In addition to the pre-built detectors, Firewall for AI allows you to create custom detectors tailored to your specific requirements. You can either fine-tune one of our pre-configured detection rules or build your own detector from scratch using our API. Nightfall supports many traditional detector types, such as regular expressions, exact data matching, and word lists (dictionaries). Check out our dedicated guide on creating custom detectors for more information.

Fetch violations

get

Fetch a list of violations for a period

Authorizations
Query parameters
createdAfter · integer · Optional

Unix timestamp in seconds, filters records created ≥ the value, defaults to -90 days UTC

createdBefore · integer · Optional

Unix timestamp in seconds, filters records created < the value, defaults to end of the current day UTC

updatedAfter · integer · Optional

Unix timestamp in seconds, filters records updated > the value

limit · integer · max: 100 · Optional

The maximum number of records to be returned in the response

Default: 50

pageToken · string · Optional

Cursor for getting the next page of results
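The time filters above take Unix timestamps in seconds. As a hedged sketch, the helper below builds a query string covering the last N days and clamps `limit` to the documented maximum of 100. The function name `violations_query` and its `now` parameter (used to make the result reproducible) are illustrative, not part of the API.

```python
import time
from urllib.parse import urlencode

SECONDS_PER_DAY = 86400


def violations_query(days_back, limit=50, now=None):
    """Build a query string for violations created in the last `days_back` days."""
    now = int(time.time()) if now is None else int(now)
    params = {
        "createdAfter": now - days_back * SECONDS_PER_DAY,  # inclusive lower bound
        "createdBefore": now,                               # exclusive upper bound
        "limit": max(1, min(limit, 100)),                   # documented max is 100
    }
    return urlencode(params)


print(violations_query(7, limit=500, now=1_700_000_000))
```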

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
GET /dlp/v1/violations HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "violations": [
    {
      "id": "text",
      "integration": "SLACK",
      "createdAt": 1,
      "updatedAt": 1,
      "possibleActions": [
        "ACKNOWLEDGE"
      ],
      "state": "ACTIVE",
      "resourceLink": "text",
      "metadata": {
        "slackMetadata": {
          "location": "text",
          "locationType": "text",
          "username": "text",
          "userID": "text",
          "messagePermalink": "text",
          "locationMembers": [
            "text"
          ],
          "locationMemberCount": 1,
          "channelID": "text",
          "workspaceName": "text"
        },
        "confluenceMetadata": {
          "itemName": "text",
          "itemType": "text",
          "isArchived": true,
          "createdAt": 1,
          "updatedAt": 1,
          "labels": [
            "text"
          ],
          "spaceName": "text",
          "spaceKey": "text",
          "spaceNameLink": "text",
          "parentPageName": "text",
          "authorName": "text",
          "authorEmail": "text",
          "authorNameLink": "text",
          "permalink": "text",
          "confluenceID": "text",
          "confluenceUserID": "text",
          "itemVersion": 1,
          "parentPageID": "text",
          "parentVersion": 1
        },
        "gdriveMetadata": {
          "fileID": "text",
          "fileName": "text",
          "fileType": "text",
          "fileSize": "text",
          "fileLink": "text",
          "permissionSetting": "text",
          "sharingExternalUsers": [
            "text"
          ],
          "sharingInternalUsers": [
            "text"
          ],
          "canViewersDownload": true,
          "fileOwner": "text",
          "isInTrash": true,
          "createdAt": 1,
          "updatedAt": 1,
          "drive": "text",
          "updatedBy": "text"
        },
        "jiraMetadata": {
          "projectName": "text",
          "ticketNumber": "text",
          "projectType": "text",
          "issueID": "text",
          "projectLink": "text",
          "ticketLink": "text",
          "commentLink": "text",
          "attachmentLink": "text"
        },
        "githubMetadata": {
          "branchName": "text",
          "organization": "text",
          "repository": "text",
          "authorEmail": "text",
          "authorUsername": "text",
          "createdAt": 1,
          "isRepoPrivate": true,
          "filePath": "text",
          "githubPermalink": "text",
          "repositoryOwner": "text",
          "githubRepoLink": "text"
        },
        "salesforceMetadata": {
          "orgName": "text",
          "recordID": "text",
          "objectName": "text",
          "contentType": "text",
          "userID": "text",
          "userName": "text",
          "updatedAt": 1,
          "fields": [
            "text"
          ],
          "fileType": "text",
          "attachmentLink": "text",
          "attachmentName": "text",
          "objectLink": "text"
        },
        "zendeskMetadata": {
          "ticketStatus": "text",
          "ticketTitle": "text",
          "ticketRequestor": "text",
          "ticketGroupAssignee": "text",
          "ticketAgentAssignee": "text",
          "currentUserRole": "text",
          "ticketID": 1,
          "ticketFollowers": [
            "text"
          ],
          "ticketTags": "text",
          "createdAt": 1,
          "UpdatedAt": 1,
          "location": "text",
          "subLocation": "text",
          "ticketCommentID": 1,
          "ticketGroupID": 1,
          "ticketGroupLink": "text",
          "ticketAgentID": 1,
          "ticketAgentLink": "text",
          "ticketEvent": "text",
          "userRole": "text",
          "attachmentName": "text",
          "attachmentLink": "text"
        },
        "notionMetadata": {
          "createdBy": "text",
          "updatedBy": "text",
          "workspaceName": "text",
          "workspaceLink": "text",
          "pageID": "text",
          "pageTitle": "text",
          "createdAt": 1,
          "updatedAt": 1,
          "privatePageLink": "text",
          "publicPageLink": "text",
          "sharedExternally": true,
          "attachmentID": "text"
        },
        "browserMetadata": {
          "location": "text",
          "subLocation": "text",
          "browserName": "text",
          "userComment": "text"
        },
        "m365TeamsMetadata": {
          "teamName": "text",
          "tenantID": "text",
          "tenantDomain": "text",
          "teamID": "text",
          "teamVisibility": "text",
          "teamWebURL": "text",
          "channelID": "text",
          "channelName": "text",
          "channelType": "text",
          "channelWebURL": "text",
          "messageID": "text",
          "createdAt": 1,
          "updatedAt": 1,
          "chatMessageSender": "text",
          "userID": "text",
          "userPrincipalName": "text",
          "attachments": [
            {
              "attachmentID": "text",
              "attachmentName": "text",
              "attachmentURL": "text"
            }
          ],
          "chatMessageImportance": "text",
          "chatID": "text",
          "chatType": "text",
          "chatTopic": "text",
          "chatParticipants": [
            {
              "userID": "text",
              "email": "text",
              "displayName": "text"
            }
          ]
        },
        "m365OnedriveMetadata": {
          "tenantID": "text",
          "tenantDomain": "text",
          "driveItemID": "text",
          "driveItemName": "text",
          "driveItemURL": "text",
          "driveItemMimeType": "text",
          "driveItemSize": 1,
          "parentPath": "text",
          "createdByID": "text",
          "updatedByEmail": "text",
          "updatedByID": "text",
          "updatedByName": "text",
          "createdAt": 1,
          "updatedAt": 1,
          "specialFolderName": "text",
          "driveID": "text",
          "driveOwnerName": "text",
          "driveOwnerEmail": "text",
          "driveOwnerID": "text"
        },
        "inlineEmailMetadata": {
          "domain": "text",
          "user_name": "text",
          "from": "text",
          "to": [
            "text"
          ],
          "cc": [
            "text"
          ],
          "bcc": [
            "text"
          ],
          "subject": "text",
          "sent_at": 1,
          "thread_id": "text",
          "attachment_name": "text",
          "attachment_type": "text"
        }
      },
      "fileDetails": {
        "fileName": "text",
        "mimeType": "text",
        "permalink": "text"
      },
      "policyUUIDs": [
        "text"
      ],
      "detectionRuleUUIDs": [
        "text"
      ],
      "detectorUUIDs": [
        "text"
      ],
      "risk": "UNSPECIFIED",
      "riskSource": "NIGHTFALL",
      "riskScore": 1,
      "userInfo": {
        "username": "text",
        "userEmail": "text"
      }
    }
  ],
  "nextPageToken": "text"
}
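The `nextPageToken` field in the response pairs with the `pageToken` query parameter. The sketch below walks through all pages using only the standard library. It assumes that a missing or empty `nextPageToken` marks the last page, which the reference above does not state explicitly; the function names and the injectable `fetch` argument are illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.nightfall.ai/dlp/v1/violations"


def fetch_page(api_key, params):
    """Perform one GET request and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}?{urlencode(params)}",
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_violations(api_key, fetch=fetch_page, **params):
    """Yield violations page by page until no nextPageToken is returned."""
    while True:
        page = fetch(api_key, params)
        yield from page.get("violations", [])
        token = page.get("nextPageToken")
        if not token:  # assumption: no token means this was the last page
            return
        params = {**params, "pageToken": token}
```

A typical call would look like `for v in iter_violations(api_key, limit=100): ...`. Passing a different `fetch` function makes the loop easy to exercise without network access.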

Fetch violation

get

Fetch a violation by ID

Authorizations
Path parameters
violationId · string · uuid · Required

The UUID of the violation to fetch

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
404
Violation does not exist
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
GET /dlp/v1/violations/{violationId} HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "id": "text",
  "integration": "SLACK",
  "createdAt": 1,
  "updatedAt": 1,
  "possibleActions": [
    "ACKNOWLEDGE"
  ],
  "state": "ACTIVE",
  "resourceLink": "text",
  "metadata": {
    "slackMetadata": {
      "location": "text",
      "locationType": "text",
      "username": "text",
      "userID": "text",
      "messagePermalink": "text",
      "locationMembers": [
        "text"
      ],
      "locationMemberCount": 1,
      "channelID": "text",
      "workspaceName": "text"
    },
    "confluenceMetadata": {
      "itemName": "text",
      "itemType": "text",
      "isArchived": true,
      "createdAt": 1,
      "updatedAt": 1,
      "labels": [
        "text"
      ],
      "spaceName": "text",
      "spaceKey": "text",
      "spaceNameLink": "text",
      "parentPageName": "text",
      "authorName": "text",
      "authorEmail": "text",
      "authorNameLink": "text",
      "permalink": "text",
      "confluenceID": "text",
      "confluenceUserID": "text",
      "itemVersion": 1,
      "parentPageID": "text",
      "parentVersion": 1
    },
    "gdriveMetadata": {
      "fileID": "text",
      "fileName": "text",
      "fileType": "text",
      "fileSize": "text",
      "fileLink": "text",
      "permissionSetting": "text",
      "sharingExternalUsers": [
        "text"
      ],
      "sharingInternalUsers": [
        "text"
      ],
      "canViewersDownload": true,
      "fileOwner": "text",
      "isInTrash": true,
      "createdAt": 1,
      "updatedAt": 1,
      "drive": "text",
      "updatedBy": "text"
    },
    "jiraMetadata": {
      "projectName": "text",
      "ticketNumber": "text",
      "projectType": "text",
      "issueID": "text",
      "projectLink": "text",
      "ticketLink": "text",
      "commentLink": "text",
      "attachmentLink": "text"
    },
    "githubMetadata": {
      "branchName": "text",
      "organization": "text",
      "repository": "text",
      "authorEmail": "text",
      "authorUsername": "text",
      "createdAt": 1,
      "isRepoPrivate": true,
      "filePath": "text",
      "githubPermalink": "text",
      "repositoryOwner": "text",
      "githubRepoLink": "text"
    },
    "salesforceMetadata": {
      "orgName": "text",
      "recordID": "text",
      "objectName": "text",
      "contentType": "text",
      "userID": "text",
      "userName": "text",
      "updatedAt": 1,
      "fields": [
        "text"
      ],
      "fileType": "text",
      "attachmentLink": "text",
      "attachmentName": "text",
      "objectLink": "text"
    },
    "zendeskMetadata": {
      "ticketStatus": "text",
      "ticketTitle": "text",
      "ticketRequestor": "text",
      "ticketGroupAssignee": "text",
      "ticketAgentAssignee": "text",
      "currentUserRole": "text",
      "ticketID": 1,
      "ticketFollowers": [
        "text"
      ],
      "ticketTags": "text",
      "createdAt": 1,
      "UpdatedAt": 1,
      "location": "text",
      "subLocation": "text",
      "ticketCommentID": 1,
      "ticketGroupID": 1,
      "ticketGroupLink": "text",
      "ticketAgentID": 1,
      "ticketAgentLink": "text",
      "ticketEvent": "text",
      "userRole": "text",
      "attachmentName": "text",
      "attachmentLink": "text"
    },
    "notionMetadata": {
      "createdBy": "text",
      "updatedBy": "text",
      "workspaceName": "text",
      "workspaceLink": "text",
      "pageID": "text",
      "pageTitle": "text",
      "createdAt": 1,
      "updatedAt": 1,
      "privatePageLink": "text",
      "publicPageLink": "text",
      "sharedExternally": true,
      "attachmentID": "text"
    },
    "browserMetadata": {
      "location": "text",
      "subLocation": "text",
      "browserName": "text",
      "userComment": "text"
    },
    "m365TeamsMetadata": {
      "teamName": "text",
      "tenantID": "text",
      "tenantDomain": "text",
      "teamID": "text",
      "teamVisibility": "text",
      "teamWebURL": "text",
      "channelID": "text",
      "channelName": "text",
      "channelType": "text",
      "channelWebURL": "text",
      "messageID": "text",
      "createdAt": 1,
      "updatedAt": 1,
      "chatMessageSender": "text",
      "userID": "text",
      "userPrincipalName": "text",
      "attachments": [
        {
          "attachmentID": "text",
          "attachmentName": "text",
          "attachmentURL": "text"
        }
      ],
      "chatMessageImportance": "text",
      "chatID": "text",
      "chatType": "text",
      "chatTopic": "text",
      "chatParticipants": [
        {
          "userID": "text",
          "email": "text",
          "displayName": "text"
        }
      ]
    },
    "m365OnedriveMetadata": {
      "tenantID": "text",
      "tenantDomain": "text",
      "driveItemID": "text",
      "driveItemName": "text",
      "driveItemURL": "text",
      "driveItemMimeType": "text",
      "driveItemSize": 1,
      "parentPath": "text",
      "createdByID": "text",
      "updatedByEmail": "text",
      "updatedByID": "text",
      "updatedByName": "text",
      "createdAt": 1,
      "updatedAt": 1,
      "specialFolderName": "text",
      "driveID": "text",
      "driveOwnerName": "text",
      "driveOwnerEmail": "text",
      "driveOwnerID": "text"
    },
    "inlineEmailMetadata": {
      "domain": "text",
      "user_name": "text",
      "from": "text",
      "to": [
        "text"
      ],
      "cc": [
        "text"
      ],
      "bcc": [
        "text"
      ],
      "subject": "text",
      "sent_at": 1,
      "thread_id": "text",
      "attachment_name": "text",
      "attachment_type": "text"
    }
  },
  "fileDetails": {
    "fileName": "text",
    "mimeType": "text",
    "permalink": "text"
  },
  "policyUUIDs": [
    "text"
  ],
  "detectionRuleUUIDs": [
    "text"
  ],
  "detectorUUIDs": [
    "text"
  ],
  "risk": "UNSPECIFIED",
  "riskSource": "NIGHTFALL",
  "riskScore": 1,
  "userInfo": {
    "username": "text",
    "userEmail": "text"
  }
}
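A small sketch of calling this endpoint from Python follows. It validates that the ID is a UUID before building the request and maps the documented status codes to their descriptions. The helper names are hypothetical, the UUID in the example is a placeholder, and the request is built but not sent.

```python
import uuid
import urllib.request

# Status codes and descriptions as listed in the reference above.
STATUS_MEANINGS = {
    200: "Successful response",
    400: "Invalid request parameters",
    401: "Authentication failure",
    404: "Violation does not exist",
    429: "Rate Limit Exceeded or Daily Quota Exceeded",
    500: "Internal Nightfall Error",
}


def violation_request(api_key, violation_id):
    """Build a GET request for one violation; raises ValueError for a non-UUID ID."""
    violation_id = str(uuid.UUID(violation_id))
    return urllib.request.Request(
        f"https://api.nightfall.ai/dlp/v1/violations/{violation_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def describe_status(code):
    """Translate an HTTP status code into the documented description."""
    return STATUS_MEANINGS.get(code, "Unexpected status")


# Placeholder UUID for illustration only.
req = violation_request("YOUR_SECRET_TOKEN", "123e4567-e89b-12d3-a456-426614174000")
print(req.full_url, describe_status(404))
```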

Search violations

get

Fetch a list of violations based on some filters

Authorizations
Query parameters
createdAfter · integer · Optional

Unix timestamp in seconds, filters records created ≥ the value, defaults to -90 days UTC

createdBefore · integer · Optional

Unix timestamp in seconds, filters records created < the value, defaults to end of the current day UTC

updatedAfter · integer · Optional

Unix timestamp in seconds, filters records updated > the value

limit · integer · max: 100 · Optional

The maximum number of records to be returned in the response

Default: 50

pageToken · string · Optional

Cursor for getting the next page of results

sort · string · enum · Optional

Sort key and direction, defaults to descending order by creation time

Default: TIME_DESC

query · string · Required

The query containing filter clauses

Search query language

Query structure and terminology

A query clause consists of a field followed by an operator followed by a value:

| term | value |
| --- | --- |
| clause | user_email:"amy@rocketrides.io" |
| field | user_email |
| operator | : |
| value | amy@rocketrides.io |

You can combine multiple query clauses in a search by separating them with a space.

Field types, substring matching, and numeric comparators

Every search field supports exact matching with a :. Certain fields such as user_email and user_name support substring matching.

Quotes

You may use quotation marks around string values. Quotation marks are required when the value contains spaces. For example:

  • user_email:john@example.com
  • user_name:"John Doe"

Special Characters

+ - && || ! ( ) { } [ ] ^ " ~ * ? : are special characters that need to be escaped using \. For example:

  • a value like (1+1):2 should be searched for using \(1\+1\)\:2
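Escaping by hand is error-prone, so a small helper can do it. The sketch below prefixes each special character listed above with a backslash and treats `&&` and `||` as two-character tokens that get a single leading backslash. That handling of the two-character tokens, and the function name `escape_query_value`, are assumptions rather than documented behavior.

```python
import re

# Special characters from the list above; "&&" and "||" are matched as whole tokens.
_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:])')


def escape_query_value(value):
    """Prefix each special character (or token) in value with a backslash."""
    return _SPECIAL.sub(r"\\\1", value)


print(escape_query_value("(1+1):2"))
```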

Search Syntax

The following table lists the syntax that you can use to construct a query.

SYNTAX USAGE DESCRIPTION EXAMPLES
: field:value Exact match operator (case insensitive) state:"pending" returns records where the currency is exactly "PENDING" in a case-insensitive comparison
(space) field1:value1 field2:value2 The query returns only records that match both clauses state:active slack.channel_name:general
OR field:(value1 OR value2) The query returns records that match either of the values (case insensitive) state:(active OR pending)
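As a rough sketch of how the syntax in this table composes, the helpers below assemble a query string from individual clauses. The function names are hypothetical, and the sketch assumes values have already been escaped where necessary; it only adds quotation marks around values that contain spaces.

```python
def format_value(value: str) -> str:
    """Wrap the value in quotation marks when it contains a space."""
    return f'"{value}"' if " " in value else value


def clause(field: str, *values: str) -> str:
    """Render field:value, or field:(v1 OR v2) when several values are given."""
    if not values:
        raise ValueError("a clause needs at least one value")
    rendered = [format_value(v) for v in values]
    if len(rendered) == 1:
        return f"{field}:{rendered[0]}"
    return f"{field}:({' OR '.join(rendered)})"


def build_query(*clauses: str) -> str:
    """Join clauses with spaces, so a record must match all of them."""
    return " ".join(clauses)


query = build_query(
    clause("state", "active", "pending"),
    clause("slack.channel_name", "general"),
)
print(query)  # state:(active OR pending) slack.channel_name:general
```

The resulting string is what would be passed as the `query` parameter of the search request.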

Query Fields

param description
state the violation states to filter on
user_email the emails of users updating the resource resulting in the violation
user_name the usernames of users updating the resource resulting in the violation
integration_name the integration to filter on
confidence one or more likelihoods/confidences
policy_id one or more policy IDs
detection_rule_id one or more detection rule IDs
detector_id one or more detector IDs
risk_label the risk label to filter on
risk_source the risk determination source to filter on
slack.channel_name the slack channel names to filter on
slack.channel_id the slack channel IDs to filter on
slack.workspace the slack workspaces to filter on
confluence.parent_page_name the names of the parent pages in confluence to filter on
confluence.space_name the names of the spaces in confluence to filter on
gdrive.drive the drive names in gdrive to filter on
jira.project_name the jira project names to filter on
jira.ticket_number the jira ticket numbers to filter on
salesforce.org_name the salesforce organization names to filter on
salesforce.object the salesforce object names to filter on
salesforce.record_id the salesforce record IDs to filter on
github.author_email the github author emails to filter on
github.branch the github branches to filter on
github.commit the github commit ids to filter on
github.org the github organizations to filter on
github.repository the github repositories to filter on
github.repository_owner the github repository owners to filter on
teams.team_name the m365 teams team names to filter on
teams.channel_name the m365 teams channels to filter on
teams.channel_type the m365 teams channel types to filter on
teams.team_sensitivity the m365 teams sensitivities to filter on
teams.sender the m365 teams senders to filter on
teams.msg_importance the m365 teams importance to filter on
teams.msg_attachment the m365 teams attachment names to filter on
teams.chat_id the m365 teams chat ID to filter on
teams.chat_type the m365 teams chat type to filter on
teams.chat_topic the m365 teams chat topic to filter on
teams.chat_participant the m365 teams chat participant's display name to filter on
onedrive.drive_owner drive owner's display name to filter on
onedrive.drive_owner_email drive owner's email to filter on
onedrive.file_name the file name to filter on
onedrive.created_by the display name of the m365 user who created the file in the drive to filter on
onedrive.created_by_email the email of the m365 user who created the file in the drive to filter on
onedrive.modified_by the display name of the m365 user who last modified the file in the drive to filter on
onedrive.modified_by_email the email of the m365 user who last modified the file in the drive to filter on
zendesk.ticket_status the zendesk ticket status to filter on
zendesk.ticket_title the zendesk ticket titles to filter on
zendesk.ticket_group_assignee the zendesk ticket assignee groups to filter on
zendesk.current_user_role the zendesk ticket current assignee user's roles to filter on
notion.created_by the names of the users creating a resource in notion to filter on
notion.last_edited_by the names of the users editing a resource in notion to filter on
notion.page_title the page names in notion to filter on
notion.workspace_name the workspace names in notion to filter on
gmail.user_name the names of the sender to filter on
gmail.from the email of sender to filter on
gmail.to the email or name of recipients to filter on
gmail.cc the email or name of cc to filter on
gmail.bcc the email or name of bcc to filter on
gmail.thread_id the thread id of email to filter on
gmail.subject the subject of email to filter on
gmail.attachment_name the name of attachment to filter on
gmail.attachment_type the type of attachment to filter on
Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
get
GET /dlp/v1/violations/search HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "violations": [
    {
      "id": "text",
      "integration": "SLACK",
      "createdAt": 1,
      "updatedAt": 1,
      "possibleActions": [
        "ACKNOWLEDGE"
      ],
      "state": "ACTIVE",
      "resourceLink": "text",
      "metadata": {
        "slackMetadata": {
          "location": "text",
          "locationType": "text",
          "username": "text",
          "userID": "text",
          "messagePermalink": "text",
          "locationMembers": [
            "text"
          ],
          "locationMemberCount": 1,
          "channelID": "text",
          "workspaceName": "text"
        },
        "confluenceMetadata": {
          "itemName": "text",
          "itemType": "text",
          "isArchived": true,
          "createdAt": 1,
          "updatedAt": 1,
          "labels": [
            "text"
          ],
          "spaceName": "text",
          "spaceKey": "text",
          "spaceNameLink": "text",
          "parentPageName": "text",
          "authorName": "text",
          "authorEmail": "text",
          "authorNameLink": "text",
          "permalink": "text",
          "confluenceID": "text",
          "confluenceUserID": "text",
          "itemVersion": 1,
          "parentPageID": "text",
          "parentVersion": 1
        },
        "gdriveMetadata": {
          "fileID": "text",
          "fileName": "text",
          "fileType": "text",
          "fileSize": "text",
          "fileLink": "text",
          "permissionSetting": "text",
          "sharingExternalUsers": [
            "text"
          ],
          "sharingInternalUsers": [
            "text"
          ],
          "canViewersDownload": true,
          "fileOwner": "text",
          "isInTrash": true,
          "createdAt": 1,
          "updatedAt": 1,
          "drive": "text",
          "updatedBy": "text"
        },
        "jiraMetadata": {
          "projectName": "text",
          "ticketNumber": "text",
          "projectType": "text",
          "issueID": "text",
          "projectLink": "text",
          "ticketLink": "text",
          "commentLink": "text",
          "attachmentLink": "text"
        },
        "githubMetadata": {
          "branchName": "text",
          "organization": "text",
          "repository": "text",
          "authorEmail": "text",
          "authorUsername": "text",
          "createdAt": 1,
          "isRepoPrivate": true,
          "filePath": "text",
          "githubPermalink": "text",
          "repositoryOwner": "text",
          "githubRepoLink": "text"
        },
        "salesforceMetadata": {
          "orgName": "text",
          "recordID": "text",
          "objectName": "text",
          "contentType": "text",
          "userID": "text",
          "userName": "text",
          "updatedAt": 1,
          "fields": [
            "text"
          ],
          "fileType": "text",
          "attachmentLink": "text",
          "attachmentName": "text",
          "objectLink": "text"
        },
        "zendeskMetadata": {
          "ticketStatus": "text",
          "ticketTitle": "text",
          "ticketRequestor": "text",
          "ticketGroupAssignee": "text",
          "ticketAgentAssignee": "text",
          "currentUserRole": "text",
          "ticketID": 1,
          "ticketFollowers": [
            "text"
          ],
          "ticketTags": "text",
          "createdAt": 1,
          "UpdatedAt": 1,
          "location": "text",
          "subLocation": "text",
          "ticketCommentID": 1,
          "ticketGroupID": 1,
          "ticketGroupLink": "text",
          "ticketAgentID": 1,
          "ticketAgentLink": "text",
          "ticketEvent": "text",
          "userRole": "text",
          "attachmentName": "text",
          "attachmentLink": "text"
        },
        "notionMetadata": {
          "createdBy": "text",
          "updatedBy": "text",
          "workspaceName": "text",
          "workspaceLink": "text",
          "pageID": "text",
          "pageTitle": "text",
          "createdAt": 1,
          "updatedAt": 1,
          "privatePageLink": "text",
          "publicPageLink": "text",
          "sharedExternally": true,
          "attachmentID": "text"
        },
        "browserMetadata": {
          "location": "text",
          "subLocation": "text",
          "browserName": "text",
          "userComment": "text"
        },
        "m365TeamsMetadata": {
          "teamName": "text",
          "tenantID": "text",
          "tenantDomain": "text",
          "teamID": "text",
          "teamVisibility": "text",
          "teamWebURL": "text",
          "channelID": "text",
          "channelName": "text",
          "channelType": "text",
          "channelWebURL": "text",
          "messageID": "text",
          "createdAt": 1,
          "updatedAt": 1,
          "chatMessageSender": "text",
          "userID": "text",
          "userPrincipalName": "text",
          "attachments": [
            {
              "attachmentID": "text",
              "attachmentName": "text",
              "attachmentURL": "text"
            }
          ],
          "chatMessageImportance": "text",
          "chatID": "text",
          "chatType": "text",
          "chatTopic": "text",
          "chatParticipants": [
            {
              "userID": "text",
              "email": "text",
              "displayName": "text"
            }
          ]
        },
        "m365OnedriveMetadata": {
          "tenantID": "text",
          "tenantDomain": "text",
          "driveItemID": "text",
          "driveItemName": "text",
          "driveItemURL": "text",
          "driveItemMimeType": "text",
          "driveItemSize": 1,
          "parentPath": "text",
          "createdByID": "text",
          "updatedByEmail": "text",
          "updatedByID": "text",
          "updatedByName": "text",
          "createdAt": 1,
          "updatedAt": 1,
          "specialFolderName": "text",
          "driveID": "text",
          "driveOwnerName": "text",
          "driveOwnerEmail": "text",
          "driveOwnerID": "text"
        },
        "inlineEmailMetadata": {
          "domain": "text",
          "user_name": "text",
          "from": "text",
          "to": [
            "text"
          ],
          "cc": [
            "text"
          ],
          "bcc": [
            "text"
          ],
          "subject": "text",
          "sent_at": 1,
          "thread_id": "text",
          "attachment_name": "text",
          "attachment_type": "text"
        }
      },
      "fileDetails": {
        "fileName": "text",
        "mimeType": "text",
        "permalink": "text"
      },
      "policyUUIDs": [
        "text"
      ],
      "detectionRuleUUIDs": [
        "text"
      ],
      "detectorUUIDs": [
        "text"
      ],
      "risk": "UNSPECIFIED",
      "riskSource": "NIGHTFALL",
      "riskScore": 1,
      "userInfo": {
        "username": "text",
        "userEmail": "text"
      }
    }
  ],
  "nextPageToken": "text"
}
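The search endpoint returns results one page at a time, with `nextPageToken` acting as the cursor for the following request. The sketch below shows one way to walk all pages using only the Python standard library. It assumes that a missing or empty `nextPageToken` marks the last page, which the reference above does not state explicitly; the function names and the injectable `fetch` parameter are illustrative choices, not part of any Nightfall SDK.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

SEARCH_URL = "https://api.nightfall.ai/dlp/v1/violations/search"


def http_get_json(url: str, api_key: str) -> dict:
    """Send an authenticated GET request and decode the JSON response."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def search_violations(
    query: str,
    api_key: str,
    limit: int = 50,
    fetch: Callable[[str, str], dict] = http_get_json,
) -> Iterator[dict]:
    """Yield violations page by page, following nextPageToken."""
    page_token = None
    while True:
        params = {"query": query, "limit": limit}
        if page_token:
            params["pageToken"] = page_token
        page = fetch(f"{SEARCH_URL}?{urllib.parse.urlencode(params)}", api_key)
        yield from page.get("violations", [])
        # Assumption: an absent or empty token means there are no more pages.
        page_token = page.get("nextPageToken")
        if not page_token:
            break


# Example usage (performs real requests, so it is left commented out):
# import os
# for violation in search_violations("state:active", os.environ["NIGHTFALL_API_KEY"]):
#     print(violation["id"], violation["state"])
```

Passing `fetch` as a parameter keeps the pagination logic separate from the HTTP call, which makes it straightforward to substitute a retrying client or a stub in tests.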

Fetch violation findings

get

Get findings for a specific violation

Authorizations
Path parameters
violationId · string · uuid · Required

The UUID of the violation

Query parameters
pageToken · string · Optional

Cursor for getting the next page of results

limit · integer · int32 · max: 1000 · Optional

Number of findings to fetch in one page (max 1000)

Default: 1000
Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
404
Violation does not exist
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
get
GET /dlp/v1/violations/{violationId}/findings HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "findings": [
    {
      "id": "text",
      "detectorUUID": "text",
      "subDetectorUUID": "text",
      "confidence": "text",
      "redactedSensitiveText": "text",
      "redactedContext": {
        "beforeContext": "text",
        "afterContext": "text"
      },
      "redactedLocation": {
        "byteRange": {
          "start": 1,
          "end": 1
        },
        "lineRange": {
          "start": 1,
          "end": 1
        }
      },
      "metadata": {
        "apiKeyMetaData": {
          "status": "UNVERIFIED",
          "kind": "UNSPECIFIED",
          "description": "text"
        }
      },
      "subLocation": "text",
      "annotationUUID": "text"
    }
  ],
  "nextPageToken": "text"
}

Fetch annotation

get

Fetch an annotation by ID

Authorizations
Path parameters
annotationId · string · uuid · Required

The UUID of the annotation to fetch

Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
404
Annotation does not exist
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
get
GET /dlp/v1/annotations/{annotationId} HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "type": "DETECTOR_FALSE_POSITIVE",
  "comment": "text",
  "autoApply": true
}

Remove finding annotation

post

Remove the annotation for a finding

Authorizations
Path parameters
findingId · string · uuid · Required

The UUID of the finding to unannotate

Responses
200
Successful response (even if annotation does not exist)
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
post
POST /dlp/v1/findings/{findingId}/unannotate HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*

No content

Complete File Upload

post

Validates that all bytes of the file have been uploaded, and that the content type is supported by Nightfall.

Authorizations
Path parameters
fileId · string · uuid · Required

a file ID returned from a previous file creation request

Responses
200
Success
application/json
400
Invalid request payload
application/json
401
Authentication failure
application/json
404
Invalid File ID
application/json
409
File Upload in Incorrect State
application/json
429
Rate Limit Exceeded or Monthly Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
post
POST /v3/upload/{fileId}/finish HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Accept: */*
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "fileSizeBytes": 1,
  "chunkSize": 1,
  "mimeType": "text"
}

Take an action on Violations

post

Perform an action on a list of violations. If an action can't be performed on a violation, that violation is ignored. Depending on the action, it could be processed immediately or queued.

Authorizations
Body
violationUUIDs · string · uuid[] · Required

The UUIDs of the violations to perform the action on

action · string · enum · Required

The action to perform on the violations

Possible values:
Responses
200
Successful response (processed immediately)
application/json
202
Accepted response (queued for processing)
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
post
POST /dlp/v1/violations/actions HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 82

{
  "violationUUIDs": [
    "123e4567-e89b-12d3-a456-426614174000"
  ],
  "action": "ACKNOWLEDGE"
}
{
  "submitted": [
    "123e4567-e89b-12d3-a456-426614174000"
  ]
}
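A minimal sketch of building this request with the Python standard library follows. The helper names are hypothetical; the URL, headers, and body fields are taken from the example above, and the status descriptions mirror the 200 and 202 responses listed for this endpoint.

```python
import json
import urllib.request
from typing import Iterable

ACTIONS_URL = "https://api.nightfall.ai/dlp/v1/violations/actions"


def build_action_request(
    violation_ids: Iterable[str], action: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a violation action."""
    body = json.dumps(
        {"violationUUIDs": list(violation_ids), "action": action}
    ).encode("utf-8")
    return urllib.request.Request(
        ACTIONS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def describe_status(status: int) -> str:
    """Translate the two success codes documented for this endpoint."""
    if status == 200:
        return "processed immediately"
    if status == 202:
        return "queued for processing"
    return f"unexpected status {status}"


request = build_action_request(
    ["123e4567-e89b-12d3-a456-426614174000"], "ACKNOWLEDGE", "YOUR_SECRET_TOKEN"
)
# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     print(describe_status(response.status), json.load(response))
```

Because violations that cannot accept the action are ignored, comparing the `submitted` list in the response with the UUIDs that were sent is one way to spot skipped violations.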

Annotate finding

post

Annotate a finding

Authorizations
Path parameters
findingId · string · uuid · Required

The UUID of the finding to annotate

Body
type · string · enum · Required

The annotation type

Possible values:
comment · string · Optional

The comment to add to the annotation

autoApply · boolean · Optional

Whether the annotation applies to all findings of this sensitive data (defaults to true)

Default: true
Responses
200
Successful response
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
409
Finding already annotated
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
post
POST /dlp/v1/findings/{findingId}/annotate HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 68

{
  "type": "DETECTOR_FALSE_POSITIVE",
  "comment": "text",
  "autoApply": true
}
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "type": "DETECTOR_FALSE_POSITIVE",
  "comment": "text",
  "autoApply": true
}

Scan Plain Text

post

Provide a list of arbitrary string data, and scan each item with the provided detectors to uncover sensitive information. Returns a list equal in size to the number of provided string payloads. The item at each list index will be a list of all matches for the provided detectors, or an empty list if no occurrences are found.

Authorizations
Body

The request body of the /v3/scan endpoint

policyUUIDs · string[] · Optional

A list of UUIDs referring to policies to use to scan the request payload. Policies can be built in the Nightfall Dashboard. Maximum 1.

payload · string[] · Optional

The text sample(s) you wish to scan. This data is passed as a string list, so you may choose to segment your text into multiple items for better granularity. The aggregate size of your text (summed across all items in the list) must not exceed 500 KB for any individual request, and the number of items in that list may not exceed 50,000.

Responses
200
Success
application/json
400
Invalid request payload
application/json
401
Authentication failure
application/json
422
Unprocessable request payload
application/json
429
Rate Limit Exceeded or Monthly Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
post
POST /v3/scan HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 1595

{
  "policyUUIDs": [
    "text"
  ],
  "policy": {
    "detectionRuleUUIDs": [
      "text"
    ],
    "detectionRules": [
      {
        "name": "text",
        "logicalOp": "ANY",
        "detectors": [
          {
            "minNumFindings": 1,
            "minConfidence": "VERY_UNLIKELY",
            "detectorUUID": "text",
            "displayName": "text",
            "detectorType": "NIGHTFALL_DETECTOR",
            "nightfallDetector": "AMERICAN_BANKERS_CUSIP_ID",
            "regex": {
              "pattern": "text",
              "isCaseSensitive": true
            },
            "wordList": {
              "values": [
                "text"
              ],
              "isCaseSensitive": true
            },
            "contextRules": [
              {
                "regex": {
                  "pattern": "text",
                  "isCaseSensitive": true
                },
                "proximity": {
                  "windowBefore": 1,
                  "windowAfter": 1
                },
                "confidenceAdjustment": {
                  "fixedConfidence": "VERY_UNLIKELY"
                }
              }
            ],
            "exclusionRules": [
              {
                "matchType": "PARTIAL",
                "exclusionType": "REGEX",
                "regex": {
                  "pattern": "text",
                  "isCaseSensitive": true
                },
                "wordList": {
                  "values": [
                    "text"
                  ],
                  "isCaseSensitive": true
                }
              }
            ],
            "redactionConfig": {
              "maskConfig": {
                "maskingChar": "text",
                "charsToIgnore": [
                  "text"
                ],
                "numCharsToLeaveUnmasked": 1,
                "maskLeftToRight": true
              },
              "infoTypeSubstitutionConfig": {},
              "substitutionConfig": {
                "substitutionPhrase": "text"
              },
              "cryptoConfig": {
                "publicKey": "text"
              },
              "removeFinding": true
            },
            "scope": "Content"
          }
        ]
      }
    ],
    "contextBytes": 1,
    "defaultRedactionConfig": {
      "maskConfig": {
        "maskingChar": "text",
        "charsToIgnore": [
          "text"
        ],
        "numCharsToLeaveUnmasked": 1,
        "maskLeftToRight": true
      },
      "infoTypeSubstitutionConfig": {},
      "substitutionConfig": {
        "substitutionPhrase": "text"
      },
      "cryptoConfig": {
        "publicKey": "text"
      },
      "removeFinding": true
    },
    "alertConfig": {
      "slack": {
        "target": "text"
      },
      "email": {
        "address": "text"
      },
      "url": {
        "address": "text"
      },
      "siem": {
        "address": "text",
        "sensitiveHeaders": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        },
        "plainTextHeaders": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        }
      }
    }
  },
  "payload": [
    "text"
  ]
}
{
  "findings": [
    [
      {
        "finding": "text",
        "redactedFinding": "text",
        "beforeContext": "text",
        "afterContext": "text",
        "detector": {
          "name": "text",
          "uuid": "123e4567-e89b-12d3-a456-426614174000",
          "subdetector": {
            "name": "text",
            "uuid": "123e4567-e89b-12d3-a456-426614174000"
          }
        },
        "confidence": "VERY_UNLIKELY",
        "location": {
          "byteRange": {
            "start": 1,
            "end": 1
          },
          "codepointRange": {
            "start": 1,
            "end": 1
          }
        },
        "redactedLocation": {
          "byteRange": {
            "start": 1,
            "end": 1
          },
          "codepointRange": {
            "start": 1,
            "end": 1
          }
        }
      }
    ]
  ],
  "redactedPayload": [
    "text"
  ]
}
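The payload limits described for this endpoint (at most 50,000 items and 500 KB in aggregate) can be checked on the client before sending a request. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "500 KB" means 500,000 bytes of UTF-8 encoded text, which the reference does not spell out. It also shows how the `findings` list lines up index by index with the submitted payload.

```python
from typing import Dict, List, Tuple

MAX_ITEMS = 50_000
MAX_TOTAL_BYTES = 500 * 1000  # assumption: "500 KB" read as 500,000 UTF-8 bytes


def build_scan_body(payload: List[str], policy_uuid: str) -> Dict[str, object]:
    """Validate the payload against the documented limits and build the body."""
    if len(payload) > MAX_ITEMS:
        raise ValueError(f"payload has {len(payload)} items, limit is {MAX_ITEMS}")
    total_bytes = sum(len(item.encode("utf-8")) for item in payload)
    if total_bytes > MAX_TOTAL_BYTES:
        raise ValueError(f"payload is {total_bytes} bytes, limit is {MAX_TOTAL_BYTES}")
    return {"policyUUIDs": [policy_uuid], "payload": payload}


def pair_findings(payload: List[str], response: dict) -> List[Tuple[str, list]]:
    """Match each payload item with the findings list at the same index."""
    findings = response.get("findings", [])
    if len(findings) != len(payload):
        raise ValueError("response does not contain one findings list per item")
    return list(zip(payload, findings))


body = build_scan_body(
    ["my card is 4242-4242-4242-4242", "nothing here"],
    "123e4567-e89b-12d3-a456-426614174000",
)
```

An empty list at a given index of the paired result means no sensitive data was detected in that payload item.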

Initiate File Upload

post

Creates a new file upload session. If this operation returns successfully, the ID returned as part of the response object shall be used to refer to the file in all subsequent upload and scanning operations.

Authorizations
Body
fileSizeBytes · integer · Optional

the number of bytes representing the size of the file to-be-uploaded.

Responses
200
Success
application/json
400
Invalid request payload
application/json
401
Authentication failure
application/json
429
Rate Limit Exceeded or Monthly Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
post
POST /v3/upload HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 19

{
  "fileSizeBytes": 1
}
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "fileSizeBytes": 1,
  "chunkSize": 1,
  "mimeType": "text"
}

Upload File Chunk

patch

Upload all bytes contained in the request body to the file identified by the ID in the path parameter.

Authorizations
Path parameters
fileId · string · uuid · Required

a file ID returned from a previous file creation request

Header parameters
X-Upload-Offset · integer · Required

The numeric offset at which the bytes contained in the body should be written. This offset must be a multiple of the chunk size returned when the file upload was created.

Body
any · Optional

The payload bytes to upload; the size of the request body must exactly match the chunkSize that was returned when the file upload was created.

Responses
204
Success
400
Invalid request payload
application/json
401
Authentication failure
application/json
404
Invalid File ID
application/json
429
Rate Limit Exceeded or Monthly Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
patch
PATCH /v3/upload/{fileId} HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
X-Upload-Offset: 1
Content-Type: application/octet-stream
Accept: */*

No content
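The offset rule above (each `X-Upload-Offset` is a multiple of the chunk size) can be sketched as a generator that splits a file's bytes into upload-sized pieces. The helper names are hypothetical. The sketch also assumes that the final chunk may be shorter than `chunkSize` when the file size is not an exact multiple of it; the reference text does not address that case.

```python
from typing import Dict, Iterator, Tuple


def iter_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs; every offset is a multiple of chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        # Assumption: the last chunk may be shorter than chunk_size.
        yield offset, data[offset:offset + chunk_size]


def chunk_headers(api_key: str, offset: int) -> Dict[str, str]:
    """Headers for one PATCH /v3/upload/{fileId} request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Upload-Offset": str(offset),
        "Content-Type": "application/octet-stream",
    }


for offset, chunk in iter_chunks(b"abcdefghij", 4):
    print(offset, chunk)
# 0 b'abcd'
# 4 b'efgh'
# 8 b'ij'
```

In practice `chunk_size` would be the `chunkSize` value returned when the upload session was created, and each pair would be sent as the body and offset header of one PATCH request.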

Scan Uploaded File

post

Triggers a scan of the file identified by the provided fileID. As the underlying file might be arbitrarily large, this scan is conducted asynchronously. Results from the scan are delivered to the webhook URL provided in the request payload.

Authorizations
Path parameters
fileId · string · uuid · Required

a file ID returned from a previous file creation request

Body
policyUUID · string · uuid · Optional

the UUID of the Detection Policy to be used with this scan. Exactly one of this field or "policy" should be provided.

requestMetadata · string · Optional

A string containing arbitrary metadata. Callers may opt to use this to help identify their input file upon receiving a webhook response. Maximum length 10 KB.

Responses
200
Success
application/json
400
Invalid request payload
application/json
401
Authentication failure
application/json
404
Invalid File ID
application/json
409
Incorrect File State
application/json
422
Unprocessable request payload
application/json
429
Rate Limit Exceeded or Monthly Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
post
POST /v3/upload/{fileId}/scan HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 1672

{
  "policyUUID": "123e4567-e89b-12d3-a456-426614174000",
  "policy": {
    "detectionRuleUUIDs": [
      "123e4567-e89b-12d3-a456-426614174000"
    ],
    "detectionRules": [
      {
        "name": "text",
        "logicalOp": "ANY",
        "detectors": [
          {
            "minNumFindings": 1,
            "minConfidence": "VERY_UNLIKELY",
            "detectorUUID": "text",
            "displayName": "text",
            "detectorType": "NIGHTFALL_DETECTOR",
            "nightfallDetector": "AMERICAN_BANKERS_CUSIP_ID",
            "regex": {
              "pattern": "text",
              "isCaseSensitive": true
            },
            "wordList": {
              "values": [
                "text"
              ],
              "isCaseSensitive": true
            },
            "contextRules": [
              {
                "regex": {
                  "pattern": "text",
                  "isCaseSensitive": true
                },
                "proximity": {
                  "windowBefore": 1,
                  "windowAfter": 1
                },
                "confidenceAdjustment": {
                  "fixedConfidence": "VERY_UNLIKELY"
                }
              }
            ],
            "exclusionRules": [
              {
                "matchType": "PARTIAL",
                "exclusionType": "REGEX",
                "regex": {
                  "pattern": "text",
                  "isCaseSensitive": true
                },
                "wordList": {
                  "values": [
                    "text"
                  ],
                  "isCaseSensitive": true
                }
              }
            ],
            "redactionConfig": {
              "maskConfig": {
                "maskingChar": "text",
                "charsToIgnore": [
                  "text"
                ],
                "numCharsToLeaveUnmasked": 1,
                "maskLeftToRight": true
              },
              "infoTypeSubstitutionConfig": {},
              "substitutionConfig": {
                "substitutionPhrase": "text"
              },
              "cryptoConfig": {
                "publicKey": "text"
              },
              "removeFinding": true
            },
            "scope": "Content"
          }
        ]
      }
    ],
    "alertConfig": {
      "slack": {
        "target": "text"
      },
      "email": {
        "address": "text"
      },
      "url": {
        "address": "text"
      },
      "siem": {
        "address": "text",
        "sensitiveHeaders": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        },
        "plainTextHeaders": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        }
      }
    },
    "defaultRedactionConfig": {
      "maskConfig": {
        "maskingChar": "text",
        "charsToIgnore": [
          "text"
        ],
        "numCharsToLeaveUnmasked": 1,
        "maskLeftToRight": true
      },
      "infoTypeSubstitutionConfig": {},
      "substitutionConfig": {
        "substitutionPhrase": "text"
      },
      "cryptoConfig": {
        "publicKey": "text"
      },
      "removeFinding": true
    },
    "enableFileRedaction": true
  },
  "requestMetadata": "text"
}
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "message": "text"
}
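The body description for this endpoint says that exactly one of `policyUUID` or `policy` should be provided, and that `requestMetadata` is limited to 10 KB. A small sketch of enforcing both rules before sending the request is shown below. The function name is hypothetical, and treating "10 KB" as 10,000 bytes of UTF-8 text is an assumption.

```python
from typing import Optional

MAX_METADATA_BYTES = 10 * 1000  # assumption: "10 KB" read as 10,000 UTF-8 bytes


def build_file_scan_body(
    policy_uuid: Optional[str] = None,
    policy: Optional[dict] = None,
    request_metadata: Optional[str] = None,
) -> dict:
    """Build the body for POST /v3/upload/{fileId}/scan."""
    # Exactly one of the two policy fields must be supplied.
    if (policy_uuid is None) == (policy is None):
        raise ValueError("provide exactly one of policy_uuid or policy")
    body: dict = (
        {"policyUUID": policy_uuid} if policy_uuid is not None else {"policy": policy}
    )
    if request_metadata is not None:
        if len(request_metadata.encode("utf-8")) > MAX_METADATA_BYTES:
            raise ValueError("request_metadata exceeds the 10 KB limit")
        body["requestMetadata"] = request_metadata
    return body


body = build_file_scan_body(
    policy_uuid="123e4567-e89b-12d3-a456-426614174000",
    request_metadata="invoice-2024-03.pdf",
)
```

Since the scan runs asynchronously, `requestMetadata` is a convenient place to carry an identifier that lets the webhook handler match results back to the uploaded file.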

How does Nightfall's Firewall for AI differ from other solutions?

The Firewall for AI Platform differs from other solutions like Google DLP and Amazon Macie, as well as open source solutions like truffleHog, on a number of dimensions summarised below.

Accuracy

  • While solutions like Google DLP have a broad set of detectors, many of them are rules or regex based, which means many of the detectors are not usable in practice. Likewise, detection has been found to be inconsistent in some cases, perhaps due to internal A/B testing.

  • Because they rely on regex-based rules rather than machine learning based detectors, OSS detection solutions tend to have a much higher rate of false positives than Nightfall.

  • Detector configurability and the ability to provide metrics at the token level make Nightfall accurate and actionable for engineering & security teams.

Convenience

  • Want to leave the last 4 digits of a credit card number visible, securely encrypt emails, and completely remove SSNs from your data? The Nightfall platform allows you to redact/replace, substitute, and/or encrypt sensitive data findings in the same API call as your inspection request.

Ease of use

  • OSS secret detection tools tend to rely heavily on manually created regex-based detection, whereas Nightfall lets you programmatically scan text and file inputs using 150+ detectors. For example, truffleHog only enables you to scan for secrets like passwords and private keys, whereas Nightfall scans not only for secrets and credentials but also, through its detector library, for PII, PCI, and PHI.

File parsing

  • Google DLP and Macie can only parse files that are stored in their respective cloud storage (Google Cloud Storage or S3). With the Nightfall Developer Platform, we take care of storage requirements for you. Uploaded assets are stored encrypted at rest with minimal access permissions, and are automatically deleted after 24 hours.

  • Amazon's file parsers are limited to around 20 file types. Most notably, Macie does not support images. Text extraction via machine-learning based OCR for images is a core component of Nightfall's file scanning endpoint.

  • Open source secrets detection solutions are limited in their detection capabilities. Namely, these projects do not support scanning binary files. Nightfall supports binary files and the ability to scan diff files.

Platform agnostic

  • Each cloud provider's DLP products are geared towards protecting their own cloud services. For example, Google DLP's native integrations are limited to Google Cloud offerings such as BigQuery. Similarly, Macie is primarily designed around scanning AWS S3 buckets. The interface is largely geared towards exploring sensitive data across S3 buckets. To scan content outside of S3, Amazon's recommendation is to move or replicate the data into S3 to scan, which is impractical.

  • OSS solutions are primarily designed around git repositories.

  • Nightfall has native integrations with many cloud applications like Slack, Atlassian, GitHub, and Google Drive, as well as a broad set of tutorials and open source code so you can build integrations into any data silo with ease. For example, this includes services like Snowflake, Airtable, and more.

Support and documentation

  • Google DLP and Macie are loosely supported products and, as with many cloud offerings, support is hard to come by. Nightfall is laser-focused on best-of-breed content inspection and we are ready to address your questions and use cases.

  • Nightfall also has extensive documentation including SDKs for multiple languages including Python, Java, NodeJS, and Go - with more under consistent development.

Cost and scale

  • Costs can balloon quickly with commercial services. They also have rate limits that don't suit high data volumes.

  • Open source solutions have high hidden costs in the form of TCO, maintenance, and opportunity cost.

  • Nightfall offers a custom enterprise tier that can help you scale pricing based on your anticipated usage as well as custom rate limits.

How do I get in touch with you?

Can I test out the detection and my own detection rules before writing any code?

Test Datasets

The following sample datasets can be used to test Nightfall's advanced AI-based detection capabilities.

This data has been fully de-identified and can be used to test any data loss prevention (DLP) platform.

Overview

Leverage our software development kits (SDKs) to enable easier, faster, and more stable engagement with the Nightfall APIs. Nightfall has a growing library of language specific SDKs including for:

Overview

Nightfall provides you the flexibility to easily integrate into applications using programming languages. The supported languages are as follows.

Python

This guide describes how to use Nightfall with the Python programming language.

The example below will demonstrate how to use Nightfall's text scanning functionality to verify whether a string contains sensitive PII using the Nightfall Python SDK.

To request the Nightfall API you will need:

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • Data to scan. Note that the API interprets data as plaintext, so you may pass it in any structured or unstructured format.

In this tutorial, we will be downloading, setting up, and using the Python SDK provided by Nightfall.

You can download the Nightfall SDK from PyPI like this:

We will be using the built-in os library to help run this sample API script. This will be used to help extract the API Key from the OS as an environment variable.

Next, we read our API key and instantiate a Nightfall client from the SDK with it. In this example, we have our API key set via an environment variable called NIGHTFALL_API_KEY. Your API key should never be hard-coded directly into your script.

In this example, we will use some example data in the payload List.

🚧 Payload Limit

Payloads must be under 500 KB when using the Scan API. If your file is larger than the limit, consider using the file API, which is also available via the Python SDK.
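One way to stay under the payload limit is to group strings into batches by their encoded size before sending them. The following is a minimal sketch, not part of the Nightfall SDK: the function name `batch_by_bytes` is hypothetical, and it assumes that "500 KB" means 500,000 bytes of UTF-8 text.

```python
from typing import List, Tuple

# Assumption: "500 KB" is read as 500,000 bytes of UTF-8 encoded text.
MAX_PAYLOAD_BYTES = 500_000


def batch_by_bytes(
    items: List[str], max_bytes: int = MAX_PAYLOAD_BYTES
) -> Tuple[List[List[str]], List[int]]:
    """Group strings into batches whose total UTF-8 size stays under max_bytes.

    Returns the batches plus the indices of items that are too large to fit
    in any single batch (candidates for the file API or further splitting).
    """
    batches: List[List[str]] = []
    oversized: List[int] = []
    current: List[str] = []
    current_bytes = 0
    for idx, item in enumerate(items):
        size = len(item.encode("utf-8"))
        if size >= max_bytes:
            # This item alone would exceed the limit; report it separately.
            oversized.append(idx)
            continue
        if current and current_bytes + size >= max_bytes:
            # Adding the item would reach the limit, so start a new batch.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(item)
        current_bytes += size
    if current:
        batches.append(current)
    return batches, oversized
```

Each batch could then be passed as the payload of one scan request, while the oversized indices tell you which items need different handling.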

We will ignore the second return value, as we do not have redaction configured for this request.

With the Nightfall API, you can redact and mask your findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.

Reviewing Results

All data and findings shown below are validated, non-sensitive sample data.

If there are no sensitive findings in our payload, the response will be as shown in the 'empty response' pane below:
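The empty response shown in this guide is `[[], [], []]`: one empty list per payload item. As a small illustration, the sketch below reports which payload items produced findings. The helper name `payloads_with_findings` is hypothetical, and it only assumes that the result is a list of lists aligned with the payload.

```python
from typing import List, Sequence


def payloads_with_findings(results: Sequence[Sequence[object]]) -> List[int]:
    """Return the indices of payload items that produced at least one finding."""
    return [idx for idx, findings in enumerate(results) if findings]


# The empty response from this guide: no payload item has findings.
empty_response = [[], [], []]
print(payloads_with_findings(empty_response))
```

An empty return value means none of the payload items contained sensitive data under the detection rule used.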

You are now ready to use the Python SDK for other scenarios.


Amazon RDS DLP Tutorial

RDS is a service for managing relational databases and supports several database engines. This tutorial demonstrates connectivity with a PostgreSQL database but could be modified to support other database options.

This tutorial allows you to scan your RDS managed databases using the Nightfall API/SDK.

You will need a few things first to use this tutorial:

  • An AWS account with at least one RDS database (this example uses postgres but could be modified to support other varieties of SQL)

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment (version 3.6 or later)

  • Python Nightfall SDK

To accomplish this, we will install the required version of the Nightfall SDK:

We will be using Python and importing the following libraries:

We will set the size and length limits for data allowed by the Nightfall API per request. We also read our API key and instantiate a Nightfall client from the SDK with it.

Next, we set up the connection to the Postgres database in RDS and fetch the data to be scanned from the target table.

Note, we are setting the RDS authentication information as the below environment variables, and referencing the values from there:

  • 'RDS_ENDPOINT'

  • 'RDS_USER'

  • 'RDS_PASSWORD'

  • 'RDS_DATABASE'

  • 'RDS_TABLE'

  • 'RDS_PRIMARYKEY'
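Because the script depends on all six variables, it can help to check them up front and fail with one clear message. This is a minimal sketch under that assumption; the helper name `load_rds_settings` is hypothetical and not part of any SDK.

```python
import os
from typing import Dict, Mapping

# The environment variables listed in this tutorial.
REQUIRED_RDS_VARS = (
    "RDS_ENDPOINT",
    "RDS_USER",
    "RDS_PASSWORD",
    "RDS_DATABASE",
    "RDS_TABLE",
    "RDS_PRIMARYKEY",
)


def load_rds_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the RDS settings, raising one error that names every missing variable."""
    missing = [name for name in REQUIRED_RDS_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_RDS_VARS}
```

Calling `load_rds_settings()` before opening the database connection surfaces configuration mistakes before any query is run.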

We can then check the data size; as long as it is below the aforementioned limits, the data can be run through the API.

If the data payloads are larger than the size or length limits of the API, extra code will be required to further chunk the data into smaller bits that are processable by the Nightfall scan API.

This can be seen in the second and third code panes below:

To review the results, we will print the number of findings, and write the findings to an output file:

Please find the full script below, broken into functions, which can be run in full:

The following are potential ways to continue building upon this service:

  • Writing Nightfall results to a database and reading that into a visualization tool

  • Adding to this script to support other varieties of SQL

  • Redacting sensitive findings in place once they are detected, either automatically or as a follow-up script once findings have been reviewed

Using Redaction to Mask Findings

With the Nightfall API, you are also able to redact and mask your RDS findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.

Using the File Scanning Endpoint with RDS

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did with the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.

Prerequisites

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning, passed via the Authorization: Bearer header (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)
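To make the webhook requirement more concrete, here is a minimal sketch of the request-handling logic such a server might contain. It is an assumption, to be checked against the current webhook documentation, that verification requests arrive as a JSON body with a `challenge` field whose value must be echoed back as plain text. The function name `handle_webhook_body` is hypothetical, and signature verification is deliberately left out.

```python
import json
from typing import Tuple


def handle_webhook_body(raw_body: bytes) -> Tuple[int, str]:
    """Return (HTTP status, response text) for one incoming webhook POST."""
    try:
        data = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return 400, "invalid JSON"
    if not isinstance(data, dict):
        return 400, "expected a JSON object"
    # Assumption: verification requests carry a "challenge" value to echo back.
    challenge = data.get("challenge")
    if challenge:
        return 200, str(challenge)
    # Anything else is treated as a scan notification. A real server would
    # verify the request signature here and then fetch or store the findings.
    print("scan notification received with keys:", sorted(data.keys()))
    return 200, ""
```

The function is kept free of any web framework so it can be wired into whichever HTTP server you already run.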

Steps to use the Endpoint

  1. Retrieve data from RDS

Similar to the process in the beginning of this tutorial for the text scanning endpoint, we will now initialize our AWS RDS Connection. Once the session is established, we can query from RDS.

Now we go through the data and write to a .csv file.

  2. Begin the file upload process to the Scan API, with the above written .csv file, as shown here.

  3. The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.

Amazon S3 DLP Tutorial

Prerequisites

You will need the following for this tutorial:

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment

Installation

To install boto3 and the Nightfall SDK, run the following command.

Implementation

In addition to boto3, we will be utilizing the following Python libraries to interact with the Nightfall SDK and to process the data.

We've configured our AWS credentials, as well as our Nightfall API key, as environment variables so they don't need to be committed directly into our code.

Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID. Also, we extract our API Key, and abstract a nightfall class from the SDK, for it.

Now we create an iterable of scannable objects in our target S3 buckets, and specify a maximum file size to pass to the Nightfall API (500 KB). In practice, you could add additional code to chunk larger files across multiple API requests.
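As a rough illustration of the chunking mentioned above, the sketch below splits one large string into pieces that each stay within a byte budget, without cutting through a multi-byte UTF-8 character. The function name `split_text_by_bytes` and the 400,000-byte default (chosen to leave headroom under the 500 KB limit) are assumptions, not Nightfall recommendations.

```python
from typing import List

# Assumption: a budget comfortably below the 500 KB request limit.
CHUNK_BYTES = 400_000


def split_text_by_bytes(text: str, max_bytes: int = CHUNK_BYTES) -> List[str]:
    """Split text into pieces whose UTF-8 encoding is at most max_bytes each."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 to fit any UTF-8 character")
    data = text.encode("utf-8")
    pieces: List[str] = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Back up so the cut never lands inside a multi-byte character:
        # UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        pieces.append(data[start:end].decode("utf-8"))
        start = end
    return pieces
```

Note that a sensitive value straddling a chunk boundary may be missed when chunks are scanned separately; overlapping the chunks slightly is one possible mitigation.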

We will also create an all_findings object to store Nightfall Scan results. The first row of our all_findings object will constitute our headers, since we will dump this object to a CSV file later.
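The header-row-first layout can be sketched as follows. The column names and the helper names `record_finding` and `findings_to_csv` are hypothetical; pick whichever fields you want to review later.

```python
import csv
import io
from typing import List

# Hypothetical column names; the first row doubles as the CSV header.
all_findings: List[List[str]] = [
    ["bucket", "object_key", "detector", "confidence", "finding"]
]


def record_finding(bucket: str, key: str, detector: str,
                   confidence: str, finding: str) -> None:
    """Append one finding as a row under the header."""
    all_findings.append([bucket, key, detector, confidence, finding])


def findings_to_csv(rows: List[List[str]]) -> str:
    """Serialize the rows (header included) to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch easy to test; the same rows can be written to a file opened with `newline=''`. Keep in mind that the `finding` column holds the raw matched value, which is why masking it is recommended.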

This example will include the full finding below. As the finding might be a piece of sensitive data, we recommend using the Redaction feature of the Nightfall API to mask your data.

We will now initialize our AWS S3 Session. Once the session is established, we get a handle for the S3 resource.

Now we go through each bucket and retrieve the scannable objects, adding their text contents to objects_to_scan as we go.

In this tutorial, we assume that all files are text-readable. In practice, you may wish to filter out un-scannable file types such as images with the object.get()['ContentType'] property.
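A filter on the content type could look like the sketch below. The set of accepted types and the helper name `is_text_content_type` are assumptions; adjust them to the file types actually stored in your buckets.

```python
from typing import Optional

# Assumption: non-"text/*" types that are still plain text in practice.
TEXT_LIKE_TYPES = {
    "application/json",
    "application/xml",
    "application/x-yaml",
}


def is_text_content_type(content_type: Optional[str]) -> bool:
    """Return True when a Content-Type value looks like scannable text."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    main_type = content_type.split(";", 1)[0].strip().lower()
    return main_type.startswith("text/") or main_type in TEXT_LIKE_TYPES
```

Objects uploaded without an explicit content type may carry a generic binary type, so a filter like this can skip files that are in fact text; checking the file extension as a fallback is one option.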

For each object content we find in our S3 buckets, we send it as a payload to the Nightfall Scan API with our previously configured detectors.

On receiving the response, we break down each returned finding and assign it a new row in the CSV we are constructing.

In this tutorial, we scope each object to be scanned with its API request. At the cost of granularity, you may combine multiple smaller files into a single call to the Nightfall API.

Now that we have finished scanning our S3 buckets and collated the results, we are ready to export them to a CSV file for further review.

That's it! You now have insight into all of the sensitive data stored inside your organization's AWS S3 buckets.

As a next step, you could use boto3 to delete or redact the files in which sensitive data has been found.

Using the File Scanning Endpoint with S3

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did with the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.

Prerequisites for File Scanning

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning, passed via the Authorization: Bearer header (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)

File Scan Implementation

Retrieve a File List

The first step is to get a list of the objects in your S3 buckets.

Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our AWS S3 Session. Once the session is established, we get a handle for the S3 resource.

Now we go through each bucket and retrieve the scannable objects.

For each object content we find in our S3 buckets, we send it as an argument to the Nightfall File Scan API with our previously configured detectors.

Iterate through a list of files and begin the file upload process.

A webhook server is required for the scan endpoint to submit its results. See our example webhook server.

The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.

Zendesk DLP Tutorial

Customer support tickets are a potential vector for leaking customer PII. By utilizing Zendesk's API in conjunction with Nightfall's scan SDK you can discover, classify, and remediate sensitive data within your customer support system.

You will need a few things to follow along with this tutorial:

  • A Zendesk account and API key

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment

To accomplish this, we will install the required version of the Nightfall SDK:

We will be using Python and importing the following libraries:

We've configured the Zendesk user and API key, as well as the Nightfall API key, as environment variables so they don't need to be committed directly into our code.

Here we'll define the headers and other request parameters that we will be using later to call both APIs. Next we read our API key and instantiate a Nightfall client from the SDK with it.

Let's start by using Zendesk's API to retrieve all support tickets in our account. We'll set up an "all_findings" object to compile our findings as we go.

The first row of our all_findings object will constitute our headers, since we will dump this object to a CSV file later.

This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.

Now that we have a collection of all of our tickets, we will retrieve the set of user comments made on each of those tickets.

Within the above for loop, we compile all of the comment bodies into a list so that we can scan the entire comment thread for a ticket with a single call to the Nightfall SDK.
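Scanning a whole thread in one call works because the results come back as one list of findings per input string, in the same order as the payload. The sketch below shows how findings could be mapped back to their comments. The helper name `findings_by_comment` is hypothetical, and it assumes each comment is a dict with `id` and `body` keys.

```python
from typing import Any, Dict, List, Sequence, Tuple


def findings_by_comment(
    comments: Sequence[Dict[str, Any]],
    results: Sequence[Sequence[Any]],
) -> List[Tuple[Any, List[Any]]]:
    """Pair each comment id with its findings, skipping clean comments.

    `results` must hold one list of findings per comment body, in the same
    order in which the bodies were submitted for scanning.
    """
    if len(comments) != len(results):
        raise ValueError("expected exactly one result list per comment")
    return [
        (comment["id"], list(findings))
        for comment, findings in zip(comments, results)
        if findings
    ]


# Building the payload: one string per comment, order preserved.
comments = [
    {"id": 11, "body": "Thanks for the quick reply"},
    {"id": 12, "body": "My SSN is 458-02-6124"},
]
bodies = [comment["body"] for comment in comments]
```

Here `bodies` is what would be sent to the scan call, and the returned findings can be passed to `findings_by_comment` together with `comments`.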

For each set of results we receive, we can start to compile our findings into a csv format.

Finally, we export our results to a csv so they can be easily reviewed.

That's it! You now have insight into all of the sensitive data inside your customer support tickets. As a next step, we could use these findings as an input to Zendesk's redact API in order to clean up the original comments. We could also use Zendesk's API to add a comment to tickets with sensitive findings, triggering an email alert for the offending ticket owner.

Putting everything together:

That's it! You should now be set up to start using the Zendesk integration for the Nightfall Text Scanning SDK.

Using Redaction to Mask Findings

With the Nightfall API, you are also able to redact and mask your Zendesk ticket findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.

Using the File Scanning Endpoint with Zendesk

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did with the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.

Prerequisites

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning, passed via the Authorization: Bearer header (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)

Steps to use the endpoint

  1. Retrieve ticket data from Zendesk

Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now retrieve ticket data from Zendesk.

Now we write the ticket data to a .csv file.

  2. Begin the file upload process to the Scan API, with the above written .csv file, as shown here.

  3. The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.

All inspection configuration in Google DLP is done as code, which makes it challenging to update, visualize, and modify detection rules and configuration. Nightfall allows for configuration as code and also provides the Nightfall Dashboard for creating and updating detection rules, which makes it easier to collaborate.

Don't hesitate to get in touch with us directly via email at support@nightfall.ai or through the contact form on our website.

We host Nightfall Developer Office Hours on Wednesdays at 12 pm PT to help answer questions, talk through any ideas, and chat about data security. We would love to see you there!

Yes, you can test out the detection engine, including 70+ pre-built detectors, in our Playground without writing any code or having to sign up.

If there is a language-specific SDK that you would find valuable but is not here, please don't hesitate to reach out to product@nightfall.ai.

You can read more about or about our in the linked reference guides.

We recommend you first set up a virtual environment. You can learn more about that .

Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.

Now we are ready to review the results from the Nightfall SDK to check if there is any sensitive data in our file. Since the results will be in a , we can use the built-in __repr__ functions to format the results in a user-friendly and readable manner.

And that's it

Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.

Once the files have been uploaded, begin using the scan endpoint mentioned . Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.

AWS S3 is a popular tool for storing your data in the cloud, however, it also has huge potential for . By utilizing AWS SDKs in conjunction with Nightfall's Scan API, you can discover, classify, and remediate sensitive data within your S3 buckets.

most recent version of the

We will use boto3 as our AWS client in this demo. If you are using another language, check for AWS's recommended SDKs.

Once the files have been uploaded, begin using the .

most recent version of the

Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.

Note: If you are scanning a high volume of tickets, you may run into either the , or the . In this tutorial, we assume that you fall under these limits, but additional code may be required to ensure this.

To scan your support tickets on an ongoing basis, you may consider taking advantage of ZenDesk's functionality.

Once the files have been uploaded, begin using the scan endpoint mentioned . Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.

Dashboard
support@nightfall.ai
Contact form
Nightfall Developer Office Hours
Playground
Protected Health Information (PHI)
API Keys
Passwords
Cryptographic Keys
PII (Personally Identifiable Information)
Financial - Credit Card Numbers and Banking Info
Image ID Documents (ID Cards, Passports, Credit Cards, etc as images)
Java
Python
Go
Node.js
product@nightfall.ai
import os

from nightfall import Confidence, DetectionRule, Detector, LogicalOp, Nightfall
nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])
detection_rule_uuid = os.environ.get('DETECTION_RULE_UUID')
payload = [
    "The customer social security number is 458-02-6124",
    "No PII in this string",
    "My credit card number is 4916-6734-7572-5015"
]

result, _ = nightfall.scan_text(
        payload,
        detection_rule_uuids=[detection_rule_uuid]
    )
payload = [
    "The customer social security number is 458-02-6124",
    "No PII in this string",
    "My credit card number is 4916-6734-7572-5015"
]

result, _ = nightfall.scan_text(
    payload,
    detection_rules=[
        DetectionRule(
            name="Sample_Detection_Rule",
            logical_op=LogicalOp.ANY,
            detectors=[
                Detector(
                    min_confidence=Confidence.VERY_LIKELY,
                    min_num_findings=1,
                    display_name="Credit Card",
                    nightfall_detector="CREDIT_CARD_NUMBER",
                ),
                Detector(
                    min_confidence=Confidence.VERY_LIKELY,
                    min_num_findings=1,
                    display_name="Social",
                    nightfall_detector="US_SOCIAL_SECURITY_NUMBER",
                )
            ]
        )
    ]
)
[
    [Finding(finding='458-02-6124', redacted_finding=None, before_context=None, after_context=None, detector_name='US social security number (SSN)', detector_uuid='e30d9a87-f6c7-46b9-a8f4-16547901e069', confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, byte_range=Range(start=39, end=50), codepoint_range=Range(start=39, end=50), matched_detection_rule_uuids=['c67e3dd7-560e-438f-8c72-6ec54979396f'], matched_detection_rules=[])],
    [],
    [Finding(finding='4916-6734-7572-5015', redacted_finding=None, before_context=None, after_context=None, detector_name='Credit card number', detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, byte_range=Range(start=25, end=44), codepoint_range=Range(start=25, end=44), matched_detection_rule_uuids=['c67e3dd7-560e-438f-8c72-6ec54979396f'], matched_detection_rules=[])]
]
[
    [Finding(finding='458-02-6124', redacted_finding=None, before_context=None, after_context=None, detector_name='Social', detector_uuid='e30d9a87-f6c7-46b9-a8f4-16547901e069', confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, byte_range=Range(start=39, end=50), codepoint_range=Range(start=39, end=50), matched_detection_rule_uuids=[], matched_detection_rules=['Sample_Detection_Rule'])],
    [],
    [Finding(finding='4916-6734-7572-5015', redacted_finding=None, before_context=None, after_context=None, detector_name='Credit Card', detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, byte_range=Range(start=25, end=44), codepoint_range=Range(start=25, end=44), matched_detection_rule_uuids=[], matched_detection_rules=['Sample_Detection_Rule'])],
]
[[], [], []]
pip install nightfall==0.6.0
import requests
import psycopg2
import os
import sys
import json
from nightfall import Nightfall
size_limit = 500000
length_limit = 50000
nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])
detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')
connection = psycopg2.connect(
        host = os.environ.get('RDS_ENDPOINT'),
        port = 5432,
        user = os.environ.get('RDS_USER'),
        password = os.environ.get('RDS_PASSWORD'),
        database = os.environ.get('RDS_DATABASE')
    )
table_name = os.environ.get('RDS_TABLE')
primary_key = os.environ.get('RDS_PRIMARYKEY')
cursor = connection.cursor()

sql = f"""
SELECT *
FROM {table_name}
"""

cursor.execute(sql)
connection.commit()

cols = [i.name for i in cursor.description]
data = cursor.fetchall()
primary_key_col = []

if len(data) == 0:
  raise Exception('Table is empty! No data to scan.')

all_findings = []
for col_idx, col in enumerate(cols):
    payload = [str(i[col_idx]) for i in data]
    if col == primary_key:
        primary_key_col = payload
    col_size = sys.getsizeof(payload)

    if col_size < size_limit:
        resp = nightfall.scanText(
            [payload],
            detection_rule_uuids=[detectionRuleUUID])
        col_resp = json.loads(resp)

        # Tag each finding with its column and primary key (or row index).
        for item_idx, item in enumerate(col_resp):
            if item is not None:
                for finding in item:
                    finding['column'] = col
                    try:
                        finding['index'] = primary_key_col[item_idx]
                    except IndexError:
                        finding['index'] = item_idx
                    all_findings.append(finding)
col_resp = []
chunks = []
chunk = []
running_size = 0
big_items = []

for item_idx, item in enumerate(payload):
  item_size = sys.getsizeof(item)
  if (running_size + item_size < size_limit) and (len(chunk) < length_limit):
    chunk.append(item)
    running_size += item_size
  elif item_size < size_limit:
    chunks.append(chunk)
    chunk = [item]
    running_size = item_size
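  else:
    # The item exceeds the size limit on its own: keep a placeholder so
    # indices stay aligned, and scan the item separately later.
    if len(chunk) < length_limit:
      chunk.append('')
    else:
      chunks.append(chunk)
      chunk = ['']
    big_items.append(item_idx)

# Flush the final partial chunk.
chunks.append(chunk)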

chunk_cursor = 0

for chunk in chunks:
  resp = nightfall.scanText(
        [chunk],
        detection_rule_uuids=[detectionRuleUUID])
  col_resp.extend(json.loads(resp.text))
  chunk_cursor += len(chunk)

for item_idx, item in enumerate(col_resp):
  if item != None:
    for finding in item:
      finding['column'] = col
      try:
        finding['index'] = primary_key_col[item_idx]
      except:
          finding['index'] = item_idx
      all_findings.append(finding)
for big in big_items:
    item = payload[big]  # the oversized string itself, not its index
    item_size = sys.getsizeof(item)
    chunks_req = (item_size // size_limit) + 1
    # Round up so the final characters of the item are not dropped.
    chunk_len = -(-len(item) // chunks_req)
    cursor = 0
    item_findings = []
    for _ in range(chunks_req):
        p = item[cursor : min(cursor + chunk_len, len(item))]
        if not p:
            break
        resp = nightfall.scanText(
            [[p]],
            detection_rule_uuids=[detectionRuleUUID])
        item_findings.extend(json.loads(resp.text))
        cursor += chunk_len

    if item_findings == []:
        raise Exception(f"Error while scanning large item at column {col}, Index {primary_key_col[big]}")
    for find_chunk in item_findings:
        if find_chunk is not None:
            for finding in find_chunk:
                finding['column'] = col
                try:
                    finding['index'] = primary_key_col[big]
                except IndexError:
                    finding['index'] = big
                all_findings.append(finding)
print(f"{len(all_findings)} sensitive findings in {os.environ.get('RDS_TABLE')}")
with open('rds_findings.json', 'w') as output_file:
  json.dump(all_findings, output_file)
import requests
import psycopg2
import os
import sys
import json
from nightfall import Nightfall

size_limit = 500000
length_limit = 50000

nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])

detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

def get_from_rds():
    '''
    Gets data to be scanned from postgres table in RDS.
    Inputs: 
        None (all required info should be stored as environment variables)
    Returns: 
        data, list[tuple()] - data from postgres table as a list of tuples
        cols, list[str] - list of column names
        primary_key, str - name of primary key column
    '''
    connection = psycopg2.connect(
        host = os.environ.get('RDS_ENDPOINT'),
        port = 5432,
        user = os.environ.get('RDS_USER'),
        password = os.environ.get('RDS_PASSWORD'),
        database = os.environ.get('RDS_DATABASE')
    )
    table_name = os.environ.get('RDS_TABLE')
    primary_key = os.environ.get('RDS_PRIMARYKEY')
    cursor = connection.cursor()

    sql = f"""
    SELECT *
    FROM {table_name}
    """

    cursor.execute(sql)
    connection.commit()

    cols = [i.name for i in cursor.description]
    data = cursor.fetchall()

    return data, cols, primary_key

def nightfall_scan(payload):
    '''
    Calls the Nightfall scan API on input text
    Inputs:
        payload, list[str] - list of strings to be scanned
    Returns:
        resp - http response from the Nightfall API containing scan results
    '''
    return nightfall.scanText(
        [payload],
        detection_rule_uuids=[detectionRuleUUID])

def craft_chunks(payload, size_limit, length_limit):
    '''
    Chunks a payload into smaller bits processable by the Nightfall scan API
    Inputs:
        payload, list[str] - list of strings to be scanned
        size_limit, int - maximum data size allowed by Nightfall API per request
        length_limit, int - maximum no. of items allowed by Nightfall API per request
    Returns:
        chunks, list[list[str]] - list of lists of strings to be scanned
        big_items, list[int] - list of indices of items that exceed the size_limit on their own
    '''
    chunks = []
    chunk = []
    running_size = 0
    big_items = []
    for item_idx, item in enumerate(payload):
        item_size = sys.getsizeof(item)
        if (running_size + item_size < size_limit) and (len(chunk) < length_limit):
            chunk.append(item)
            running_size += item_size
        elif item_size < size_limit:
            chunks.append(chunk)
            chunk = [item]
            running_size = item_size
        else:
            if len(chunk) < length_limit:
                chunk.append('')
            else:
                chunks.append(chunk)
                chunk = ['']
            big_items.append(item_idx)
    chunks.append(chunk)
    return chunks, big_items

def scan_big_item(item, size_limit):
    '''
    Chunks a single large block of text and sends it to the Nightfall API
    in processable bits
    Inputs:
        item, str - a single string to be scanned by the Nightfall API
        size_limit, int - maximum data size allowed by Nightfall API per request
    Returns:
        item_findings, list[list[dict]] - findings from the Nightfall API for the entire item
    '''
    item_size = sys.getsizeof(item)
    chunks_req = (item_size // size_limit) + 1
    chunk_len = len(item) // chunks_req
    cursor = 0
    item_findings = []
    for _ in range(chunks_req):
        p = item[cursor : min(cursor + chunk_len, len(item))]
        resp = nightfall.scanText(
          [[p]],
          detection_rule_uuids=[detectionRuleUUID])
        item_findings.extend(json.loads(resp.text))
        cursor += chunk_len
    return item_findings

if __name__ == '__main__':
    # This script will be for Postgres but other SQL varieties 
    # will work with modifications
    data, columns, primary_key = get_from_rds()
    primary_key_col = []
    if len(data) == 0:
        raise Exception('Table is empty! No data to scan.')

    all_findings = []
    for col_idx, col in enumerate(columns):
        payload = [str(i[col_idx]) for i in data]
        if col == primary_key:
            primary_key_col = payload
        col_size = sys.getsizeof(payload)

        if col_size < size_limit:
            resp = nightfall_scan(payload)
            col_resp = json.loads(resp)
        else:
            col_resp = []
            chunks, big_items = craft_chunks(payload, size_limit, length_limit)
            chunk_cursor = 0
            for chunk in chunks:
                resp = nightfall.scanText(
                  [chunk],
                  detection_rule_uuids=[detectionRuleUUID])
                col_resp.extend(json.loads(resp))
                chunk_cursor += len(chunk)
            
            for big in big_items:
                item_resp = scan_big_item(payload[big], size_limit)
                if item_resp == None:
                    raise Exception(f"Error while scanning large item at column {col}, Index {primary_key_col[big]}")
                for find_chunk in item_resp:
                    if find_chunk != None:
                        for finding in find_chunk:
                            finding['column'] = col
                            try:
                                finding['index'] = primary_key_col[big]
                            except:
                                finding['index'] = big
                            all_findings.append(finding)

        # Add location within source Table
        for item_idx, item in enumerate(col_resp):
            if item is not None:
                for finding in item:
                    finding['column'] = col
                    try:
                        finding['index'] = primary_key_col[item_idx]
                    except IndexError:
                        finding['index'] = item_idx
                    all_findings.append(finding)

    print(f"{len(all_findings)} sensitive findings in {os.environ.get('RDS_TABLE')}")
    with open('rds_findings.json', 'w') as output_file:
        json.dump(all_findings, output_file)
connection = psycopg2.connect(
        host = os.environ.get('RDS_ENDPOINT'),
        port = 5432,
        user = os.environ.get('RDS_USER'),
        password = os.environ.get('RDS_PASSWORD'),
        database = os.environ.get('RDS_DATABASE')
    )
table_name = os.environ.get('RDS_TABLE')
primary_key = os.environ.get('RDS_PRIMARYKEY')
cursor = connection.cursor()

sql = f"""
SELECT *
FROM {table_name}
"""

cursor.execute(sql)
connection.commit()

cols = [i.name for i in cursor.description]
data = cursor.fetchall()
primary_key_col = []

if len(data) == 0:
  raise Exception('Table is empty! No data to scan.')

filename = "nf_rds_input-" + str(int(time.time())) + ".csv"  

# Write the header row and every table row once; opening the file inside a
# per-column loop would overwrite it on each pass
with open(filename, 'w', newline='') as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    csv_writer.writerow(cols)
    csv_writer.writerows(data)
     
print("RDS Data Written to: ", filename)
pip install boto3
pip install nightfall==1.2.0
import boto3
import requests
import json
import csv
import os
from nightfall import Nightfall
aws_session_token = os.environ.get('AWS_SESSION_TOKEN')
aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')

nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')
detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])
objects_to_scan = []
size_limit = 475000

all_findings = []
all_findings.append(
  [
    'bucket', 'object', 'detector', 'confidence', 
    'finding_byte_start', 'finding_byte_end',
    'finding_codepoint_start', 'finding_codepoint_end', 'fragment'
  ]
)
my_session = boto3.session.Session(
  aws_session_token = aws_session_token,
  aws_access_key_id = aws_access_key_id,
  aws_secret_access_key = aws_secret_access_key
)

s3 = my_session.resource('s3')
for bucket in s3.buckets.all():
  for obj in bucket.objects.all():
    temp_object = obj.get()
    size = temp_object['ContentLength']

    if size < size_limit:
      objects_to_scan.append((obj, temp_object['Body'].read().decode()))
for obj, data in objects_to_scan:
    findings, redactions = nightfall.scan_text(
        [data],
        detection_rule_uuids=[detectionRuleUUID]
    )

    for finding in findings[0]:
        row = [
            obj.bucket_name,
            obj.key,
            finding.detector_name,
            finding.confidence.value,
            finding.byte_range.start,
            finding.byte_range.end,
            finding.codepoint_range.start,
            finding.codepoint_range.end,
            finding.finding,
        ]
        all_findings.append(row)
if len(all_findings) > 1:
    with open('output_file.csv', 'w') as output_file:
        csv_writer = csv.writer(output_file, delimiter = ',')
        csv_writer.writerows(all_findings)
else:
    print('No sensitive data detected. Hooray!')
my_session = boto3.session.Session(
  aws_session_token = aws_session_token,
  aws_access_key_id = aws_access_key_id,
  aws_secret_access_key = aws_secret_access_key
)

s3 = my_session.resource('s3')
for bucket in s3.buckets.all():
  for obj in bucket.objects.all():
    # here we can call the file-scanning endpoints
    pass
pip install nightfall==0.6.0
import requests
import os
import json
import csv
from nightfall import Nightfall
zendesk_user = os.environ.get('ZENDESK_USER')
zendesk_api_key = os.environ.get('ZENDESK_API_KEY')
nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')
zendesk_auth = (f"{zendesk_user}/token",zendesk_api_key)

zendesk_base_url = 'https://YOUR_ORG_HERE.zendesk.com/api/v2/'

nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])
detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')
zendesk_response = requests.get(
                            url = f"{zendesk_base_url}tickets.json", 
                            auth = zendesk_auth
                            )
tickets = json.loads(zendesk_response.text)['tickets']

all_findings = []
all_findings.append(
  [
    'ticket_id', 'comment_id', 'detector', 'confidence', 
    'finding_byte_start', 'finding_byte_end',
    'finding_codepoint_start', 'finding_codepoint_end', 'finding'
  ]
)
for ticket in tickets:
    ticket_id = ticket['id']

    ticket_response = requests.get(
      url = f"{zendesk_base_url}tickets/{ticket_id}/comments.json",
      auth = zendesk_auth
    )
    
    comments = json.loads(ticket_response.text)['comments']
    comment_bodies = [comment['body'] for comment in comments]

    nightfall_response = nightfall.scanText(
        [comment_bodies],
        detection_rule_uuids=[detectionRuleUUID]
    )

    findings = json.loads(nightfall_response)
    for c_idx, comment in enumerate(findings):
        for f_idx, finding in enumerate(comment):
            row = [
                ticket_id,
                comments[c_idx]['id'],
                finding['detector']['name'],
                finding['confidence'],
                finding['location']['byteRange']['start'],
                finding['location']['byteRange']['end'],
                finding['location']['codepointRange']['start'],
                finding['location']['codepointRange']['end'],
                finding['finding']
            ]
            all_findings.append(row)
if len(all_findings) > 1:
  with open('output_file.csv', 'w') as output_file:
    csv_writer = csv.writer(output_file, delimiter = ',')
    csv_writer.writerows(all_findings)
else:
  print('No sensitive data detected. Hooray!')
# PUT /api/v2/tickets/{ticket_id}/comments/{comment_id}/redact.json
# -d '{"text": "987-65-4320"}'
import requests
import os
import json
import csv
from nightfall.api import Nightfall


zendesk_base_url = 'https://YOUR_ORG_HERE.zendesk.com/api/v2/'
nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])

# All credentials are stored as environment variables
zendesk_user = os.environ.get('ZENDESK_USER')
zendesk_api_key = os.environ.get('ZENDESK_API_KEY')
nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')


if __name__ == '__main__':

    # Set up the headers we need to call the Zendesk API
    zendesk_auth = (f"{zendesk_user}/token", zendesk_api_key)

    # Set up the detectors to scan for in our tickets
    detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

    # Start retrieving our support tickets from Zendesk

    zendesk_response = requests.get(
                            url = f"{zendesk_base_url}tickets.json", 
                            auth = zendesk_auth
                            )
    tickets = json.loads(zendesk_response.text)['tickets']

    all_findings = []
    all_findings.append(
        [
        'ticket_id', 'comment_id', 'detector', 'confidence', 
        'finding_byte_start', 'finding_byte_end',
        'finding_codepoint_start', 'finding_codepoint_end', 'finding'
        ]
        )

    # Note this code assumes you will not run into the ZenDesk or Nightfall 
    # API rate limits. Additional code is required to ensure this
    for ticket in tickets:
        ticket_id = ticket['id']
        
        ticket_response = requests.get(
                                url = f"{zendesk_base_url}tickets/{ticket_id}/comments.json",
                                auth = zendesk_auth
                                )
        comments = json.loads(ticket_response.text)['comments']

        # To correlate across API calls, we will aggregate all of the comments in 
        # a single support ticket for one call to the Nightfall API
        comment_bodies = [comment['body'] for comment in comments]

        # Here we assume that the comment_bodies object is smaller than the maximum
        # payload size allowed by the Nightfall API. You may wish to chunk the comments
        # into multiple, separate requests if they are too large.
                
        nightfall_response = nightfall.scanText(
        [comment_bodies],
        detection_rule_uuids=[detectionRuleUUID])


        findings = json.loads(nightfall_response)

        for c_idx, comment in enumerate(findings):
            for f_idx, finding in enumerate(comment):
                row = [
                    ticket_id, 
                    comments[c_idx]['id'], 
                    finding['detector']['name'],
                    finding['confidence'],
                    finding['location']['byteRange']['start'],
                    finding['location']['byteRange']['end'],
                    finding['location']['codepointRange']['start'],
                    finding['location']['codepointRange']['end'],
                    finding['finding']
                    ] 
                all_findings.append(row)
    
    if len(all_findings) > 1:
        with open('output_file.csv', 'w') as output_file:
            csv_writer = csv.writer(output_file, delimiter = ',')
            csv_writer.writerows(all_findings)
    else:
        print('No sensitive data detected. Hooray!')
# Retrieve the support tickets from Zendesk.

zendesk_auth = (f"{zendesk_user}/token",zendesk_api_key)

zendesk_base_url = 'https://YOUR_ORG_HERE.zendesk.com/api/v2/'

zendesk_response = requests.get(
                            url = f"{zendesk_base_url}tickets.json", 
                            auth = zendesk_auth
                            )
tickets = json.loads(zendesk_response.text)['tickets']
filename = "nf_zendesk_input-" + str(int(time.time())) + ".csv"  

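with open(filename, 'w', newline='') as output_file:
  csv_writer = csv.writer(output_file, delimiter=',')
  # Each ticket is a dict, so write its values rather than its keys
  csv_writer.writerows(ticket.values() for ticket in tickets)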
     
print("Zendesk Ticket Data Written to: ", filename)

Detecting Sensitive Data in SMS Automations

Transactional email and communication APIs like SendGrid, Twilio, SES, and Mailgun are critical components of modern applications. These services allow developers to easily incorporate end-user communication into their applications without the infrastructural overhead.

However, these services pose a new source of security risk as they can lead to accidental sharing of sensitive data if communications are sent to the wrong users or inadvertently contain sensitive data. Adding data loss prevention (DLP) into your business logic can provide the critical capability to classify & protect sensitive data before it is exposed, leaked, or stored.

Use Cases

The risk of exposing sensitive data is especially common in situations where these transactional communication services are handling user-generated content like messages between agents and users, or peer-to-peer. Here are a few examples:

Example 1

You're building a grocery delivery application like Instacart. The application allows Shoppers and Customers to send and receive text messages with each other, powered by Twilio. The Customer sends the Shopper a picture of their Driver's License since they won't be home for the delivery, even though their Driver's License needs to be verified in person. Now this image with sensitive PII is processed by Twilio, stored in your application's object store, and viewable by the Shopper and support agents.

Example 2

You're building an application for job seekers to connect with small business owners like restaurants that are hiring, and they can exchange messages over text, powered by Twilio. A malicious user signs up to impersonate a restaurant owner and uses this mechanism to collect PII from job seekers such as their SSN. Now this PII is transmitted by Twilio and is accessible by the attacker.

With the Nightfall API, you can scan transactional communications for sensitive data and remediate them accordingly. In this post, we'll describe the pattern for using Nightfall to scan for sensitive data in outgoing messages.

Standard Pattern for Sending Transactional Email/SMS

The typical pattern for using transactional communication services like SendGrid is as follows:

  1. Get an API key and set environment variables

  2. Initialize the SDK client (e.g. SendGrid Python client), or use the API directly to construct a request

  3. Construct your outgoing message, which includes information like the subject, body, recipients, etc.

  4. Use the SendGrid API or SDK client to send the message

Let's look at a simple example in Python. Note how easy it is to send sensitive data, in this case a credit card number.

import os
from twilio.rest import Client

account_sid = os.environ['TWILIO_ACCOUNT_SID']
auth_token = os.environ['TWILIO_AUTH_TOKEN']
client = Client(account_sid, auth_token)

message_body = "4916-6734-7572-5015 is my credit card number"

message = client.messages.create(
                     body=message_body,
                     from_='+15017122661',
                     to='+15558675310'
                 )

print(message.sid)

Adding Sensitive Data Classification/Protection to the Pattern

It is straightforward to update this pattern to use Nightfall to check for sensitive findings and ensure sensitive data isn't sent out. Here's how:

Step 1. Setup

Get API keys for both your communication service ("CS") and Nightfall ("NF"), and set environment variables. Learn more about creating a Nightfall API key here.

Step 2. Configure Detection

NF: Create a pre-configured detection rule in the Nightfall dashboard or inline detection rule with Nightfall API or SDK client.

📘 Consider using Redaction

Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.

Step 3. Classify and/or Redact

NF: Send your outgoing message text (and any other metadata like the subject line, etc.) in a request payload to the Nightfall API text scan endpoint. For example, if you are interested in scanning the subject and body of an outgoing email, you can send these both in the input array payload: [ body, subject ]

# Nightfall API Response
 
{
  "findings": [
    [
      {
        "finding": "458-02-6124",
        "redactedFinding": "***-**-****",
        "detector": {
          "name": "US Social Security Number",
          "uuid": "e30d9a87-f6c7-46b9-a8f4-16547901e069"
        },
        "confidence": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": 39,
            "end": 50
          },
          "codepointRange": {
            "start": 39,
            "end": 50
          }
        },
        "redactedLocation": {
          "byteRange": {
            "start": 39,
            "end": 50
          },
          "codepointRange": {
            "start": 39,
            "end": 50
          }
        },
        "matchedDetectionRuleUUIDs": [],
        "matchedDetectionRules": [
          "My Match Rule"
        ]
      }
    ]
  ],
  "redactedPayload": [
    "Thanks for getting back to me. My SSN is ***-**-****."
  ]
}
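
The findings array is parallel to the input payload: the first inner list belongs to the first input string, the second to the second, and so on. This makes it possible to map findings back to the field they came from. A minimal sketch, assuming a response shaped like the example above; the helper name `findings_by_field` and the field labels are hypothetical, not part of the Nightfall API:

```python
def findings_by_field(field_names, response):
    """Pair each input field name with the findings for that payload item."""
    findings = response.get("findings", [])
    # The i-th inner list holds the findings for the i-th payload string.
    return {
        name: [f["finding"] for f in (item or [])]
        for name, item in zip(field_names, findings)
    }


# Trimmed-down response for a payload of [body, subject]
example_response = {
    "findings": [
        [{"finding": "458-02-6124", "confidence": "VERY_LIKELY"}],
        [],
    ]
}
result = findings_by_field(["body", "subject"], example_response)
print(result)
```

Here the body contains one finding and the subject contains none, so only the body would need to be redacted or blocked.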

Step 4. Handle Response and Send

  • Review the response to see if Nightfall has returned sensitive findings:

    • If there are sensitive findings:

      • You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically

      • Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.

    • If no sensitive findings or you chose to redact findings with a redaction config:

      • Initialize the SDK client (e.g. SendGrid Python client), or use the API directly to construct a request

      • Construct your outgoing message, which includes information like the subject, body, recipients, etc.

        • If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you

      • Use the SendGrid API or SDK client to send the sanitized message
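
The decision described in the list above can be sketched as a small function. This is an illustration only: it assumes the response dictionary has the `findings` and `redactedPayload` keys shown in the sample response, and the names `choose_message_body` and `SensitiveDataError` are hypothetical.

```python
class SensitiveDataError(Exception):
    """Raised when a message has findings and no redacted copy is available."""


def choose_message_body(original, response, index=0):
    """Return the text that is safe to send for payload item `index`."""
    findings = response.get("findings") or []
    item_findings = (findings[index] if index < len(findings) else None) or []
    if not item_findings:
        return original  # nothing sensitive detected, send as-is
    redacted = response.get("redactedPayload") or []
    if index < len(redacted) and redacted[index]:
        return redacted[index]  # a redaction config was supplied
    # Findings without a redacted copy: stop before anything is sent
    raise SensitiveDataError(f"{len(item_findings)} sensitive finding(s); message blocked")
```

Raising an exception is only one option for the no-redaction case; logging the event or queueing the message for review would fit the same branch.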

Python Example

Let's take a look at what this would look like in a Python example using the Twilio and Nightfall Python SDKs in just 12 lines of code (with comments and formatting added for clarity):

import os
from twilio.rest import Client
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

# Initialize the Twilio and Nightfall clients
account_sid = os.environ['TWILIO_ACCOUNT_SID']
auth_token = os.environ['TWILIO_AUTH_TOKEN']
client = Client(account_sid, auth_token)

nightfall = Nightfall() # By default Nightfall will read the NIGHTFALL_API_KEY environment variable

# The message you intend to send
payload = [ "4916-6734-7572-5015 is my credit card number" ]

# Send the message to Nightfall to scan it for sensitive data
# Nightfall returns the sensitive findings, and a copy of your input payload with sensitive data redacted
findings, redacted_payload = nightfall.scan_text(
    payload,
    # Define an inline detection rule that looks for Likely Credit Card Numbers
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                substitution_phrase="[Redacted]"))
    ])]
)

# If the message has sensitive data, use the redacted version, otherwise use the original message
if redacted_payload[0]:
	message_body = redacted_payload[0]
else:
	message_body = payload[0]

print(message_body)

# Send the sanitized message body via Twilio
message = client.messages.create(
                     body=message_body,
                     from_='+15017122661',
                     to='+15558675310'
                 )

print(message.sid)

You'll see that the message we originally intended to send had sensitive data: 4916-6734-7572-5015 is my credit card number

And the message we ultimately sent was redacted!

[Redacted] is my credit card number

Services like Twilio and SendGrid also support inbound communications from end-users, typically via webhook: inbound messages are sent to a webhook handler that you specify. To scan inbound messages, insert the above logic in your webhook handler upon receipt of an event payload.
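
A minimal sketch of the inbound case, kept free of any web framework so it runs offline. It assumes the inbound event has already been parsed into a dict with a `Body` field (the form field Twilio uses for the message text in inbound SMS webhooks), and that `scan` is any callable returning `(findings, redacted_payload)` in the same shape as `scan_text` above. The names `handle_inbound_message` and `fake_scan` are hypothetical.

```python
def handle_inbound_message(event, scan):
    """Scan an inbound message and return the text that is safe to store."""
    text = event.get("Body", "")
    if not text:
        return text
    findings, redacted_payload = scan([text])
    # Prefer the redacted copy when the scanner produced one
    if findings and findings[0] and redacted_payload and redacted_payload[0]:
        return redacted_payload[0]
    # No findings (or no redacted copy): this sketch keeps the original text
    return text


def fake_scan(texts):
    """Stand-in for nightfall.scan_text so the sketch runs without network access."""
    findings, redacted = [], []
    for t in texts:
        if "4916-6734-7572-5015" in t:
            findings.append(["CREDIT_CARD_NUMBER"])
            redacted.append(t.replace("4916-6734-7572-5015", "[Redacted]"))
        else:
            findings.append([])
            redacted.append("")
    return findings, redacted


safe = handle_inbound_message(
    {"Body": "4916-6734-7572-5015 is my credit card number"}, fake_scan
)
print(safe)
```

In a real handler you would pass `nightfall.scan_text` with a detection rule bound in, and decide what to do when findings exist but no redaction config was supplied.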

Now that you understand the pattern, give it a shot!

Deploy a File Scanner for Sensitive Data in 40 Lines of Code

The service ingests a local file, scans it for sensitive data with Nightfall, and displays the results in a simple table UI.

We'll deploy the server on Render (a PaaS Heroku alternative) so that you can serve your application publicly in production instead of running it off your local machine. You'll build familiarity with the following tools and frameworks: Python, Flask, Nightfall, Ngrok, Jinja, Render.

Key Concepts

In a nutshell, file scanning is done asynchronously by Nightfall; after you upload a file to Nightfall and trigger the scan, we perform the scan in the background. When the scan completes, Nightfall delivers the results to you by making a request to your webhook server. This asynchronous behavior allows Nightfall to scan files of varying sizes and complexities without requiring you to hold open a long synchronous request, or continuously poll for updates. The impact of this pattern is that you need a webhook endpoint that can receive inbound notifications from Nightfall when scans are completed - that's what we are building in this tutorial.

Getting Started

Setting Up Dependencies

Create a requirements.txt file that lists our dependencies:

nightfall
Flask
Gunicorn

Then run pip install -r requirements.txt to do the installation.

Configuring Detection with Nightfall

Your Nightfall API key and webhook signing secret are unique to your account and should be kept safe. Store them as environment variables rather than writing them directly in code or committing them to version control. If these values are ever leaked, visit the Nightfall Dashboard to regenerate them.

export NIGHTFALL_API_KEY=<your_key_here>
export NIGHTFALL_SIGNING_SECRET=<your_secret_here>
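
It can help to fail fast at startup when either variable is missing, rather than discovering it on the first webhook. A minimal sketch using only the standard library; the helper name `load_nightfall_config` is hypothetical and not part of the Nightfall SDK:

```python
import os


def load_nightfall_config(env=os.environ):
    """Read the two Nightfall secrets, raising early if either is missing."""
    names = ("NIGHTFALL_API_KEY", "NIGHTFALL_SIGNING_SECRET")
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: env[n] for n in names}
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.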

Setting Up Our Server

Let's start writing our Flask server. Create a file called app.py. We'll start by importing our dependencies and initializing the Flask and Nightfall clients:

import os
from flask import Flask, request, render_template
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from datetime import datetime, timedelta
import urllib.request, urllib.parse, json

app = Flask(__name__)

nightfall = Nightfall(
	key=os.getenv('NIGHTFALL_API_KEY'),
	signing_secret=os.getenv('NIGHTFALL_SIGNING_SECRET')
)

Next, we'll add our first route, which will display "Hello World" when the client navigates to /ping simply as a way to validate things are working:

@app.route("/ping")
def ping():
	return "Hello World", 200

Run gunicorn app:app on the command line to fire up your server, and navigate to your local server in your web browser. The Gunicorn logs show where the server is listening, typically 127.0.0.1:8000 (that is, localhost:8000).

[2021-11-26 14:22:53 -0800] [61196] [INFO] Starting gunicorn 20.1.0
[2021-11-26 14:22:53 -0800] [61196] [INFO] Listening at: http://127.0.0.1:8000 (61196)
[2021-11-26 14:22:53 -0800] [61196] [INFO] Using worker: sync
[2021-11-26 14:22:53 -0800] [61246] [INFO] Booting worker with pid: 61246

After running this command, ngrok will create a tunnel on the public internet that redirects traffic from their site to your local machine. Copy the HTTPS tunnel endpoint that ngrok has created: we can use this as the webhook URL when we trigger a file scan.

Account                       Nightfall Example
Version                       2.3.40
Region                        United States (us)
Web Interface                 http://127.0.0.1:4040
Forwarding                    http://3ecedafba368.ngrok.io -> http://localhost:8000
Forwarding                    https://3ecedafba368.ngrok.io -> http://localhost:8000

Let's set this HTTPS endpoint as a local environment variable so we can reference it later:

export NIGHTFALL_SERVER_URL=https://3ecedafba368.ngrok.io

Tip: With a Pro ngrok account, you can create a subdomain so that your tunnel URL is consistent, instead of randomly generated each time you start the tunnel.

Handling an Inbound Webhook

Before you send a file scan request to Nightfall, let's add logic for our incoming webhook endpoint, so that when Nightfall finishes scanning a file, it can successfully send the sensitive findings to us.

First, what does it mean to have findings? If a file has findings, this means that Nightfall identified sensitive data in the file that matched the detection rules you configured. For example, if you told Nightfall to look for credit card numbers, any substring from the request payload that matched our credit card detector would constitute sensitive findings.

We'll host our incoming webhook at /ingest with a POST method.

Nightfall will POST to the webhook endpoint, and in the inbound payload, Nightfall will indicate if there are sensitive findings in the file, and provide a link where we can access the sensitive findings as JSON.

# respond to POST requests at /ingest
# Nightfall will send requests to this webhook endpoint with file scan results
@app.route("/ingest", methods=['POST'])
def ingest():
	data = request.get_json(silent=True)
	# validate webhook URL with challenge response
	challenge = data.get("challenge") 
	if challenge:
		return challenge
	# challenge was passed, now validate the webhook payload
	else: 
		# get details of the inbound webhook request for validation
		request_signature = request.headers.get('X-Nightfall-Signature')
		request_timestamp = request.headers.get('X-Nightfall-Timestamp')
		request_data = request.get_data(as_text=True)

		if nightfall.validate_webhook(request_signature, request_timestamp, request_data):
			# check if any sensitive findings were found in the file, return if not
			if not data["findingsPresent"]: 
				print("No sensitive data present!")
				return "", 200

			# there are sensitive findings in the file
			# URL escape the temporary signed S3 URL where findings are available for download
			escaped_url = urllib.parse.quote(data['findingsURL'])
			# print the download URL and the URL where we can view the results in our web app
			print(f"Sensitive data present. Findings available until {data['validUntil']}.\n\nDownload:\n{data['findingsURL']}\n\nView:\n{request.url_root}view?findings_url={escaped_url}\n")
			return "", 200
		else:
			return "Invalid webhook", 500
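
The SDK's validate_webhook call hides the signature check. For intuition, here is a sketch of the general technique behind signed webhooks: recompute an HMAC over the timestamp and body with the shared secret, compare it in constant time, and reject stale timestamps. The message layout (`timestamp:body`), the hex encoding, and the 5-minute window are assumptions made for illustration, not a statement of what the SDK does. Keep using validate_webhook in real code.

```python
import hashlib
import hmac
import time


def sign(secret, timestamp, body):
    """Hex HMAC-SHA256 over 'timestamp:body' (layout assumed for this sketch)."""
    message = f"{timestamp}:{body}".encode()
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()


def is_valid_webhook(secret, signature, timestamp, body, max_age_seconds=300, now=None):
    """Check the signature and reject requests outside the allowed time window."""
    if not signature or not timestamp:
        return False
    now = time.time() if now is None else now
    try:
        age = now - int(timestamp)
    except ValueError:
        return False
    if age < 0 or age > max_age_seconds:
        return False
    expected = sign(secret, timestamp, body)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature)
```

Including the timestamp in the signed message is what lets the receiver reject a captured request that is replayed later.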

Restart your server so the changes propagate. We'll take a look at the console output of our webhook endpoint and explain what it means in the next section.

Scan a File

import os
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

nightfall = Nightfall() # reads API key from NIGHTFALL_API_KEY environment variable by default

filepath = "sample-pci-xs.csv" # sample file with sensitive data
webhook_url = f"{os.getenv('NIGHTFALL_SERVER_URL')}/ingest"

Next, we will initiate the scan request to Nightfall, by specifying our filepath, webhook URL where the scan results should be posted, and our Detection Rule that specifies what sensitive data we are looking for.

scan_id, message = nightfall.scan_file(filepath, 
	webhook_url=webhook_url,
	detection_rules=[ DetectionRule([ 
		Detector(
			min_confidence=Confidence.LIKELY,
   			nightfall_detector="CREDIT_CARD_NUMBER",
   			display_name="Credit Card Number"
       	)])
	])

print(scan_id, message)

The scan_id is useful for identifying your scan results later.

View Sensitive Findings

Let's run scan.py to trigger our file scan job.

Once Nightfall has finished scanning the file, we'll see our Flask server receive the request at our webhook endpoint (/ingest). In our code above, we parse the webhook payload, and print the following when there are sensitive findings:

Sensitive data present. Findings available until 2021-11-28T00:29:00.479700877Z.

Download:
https://files.nightfall.ai/d2160270-6b07-4304-b1ee-e7b98498be82.json?Expires=1638059340&Signature=AjSdNGlXWGXO0QGSi-lOoDBtbhJdLPE7IWXA7IaBCfLr~3X2IcZ1vavHF5iaEDaoZ-3etnZA4Nu8K8Dq8Kd81ShuX6Ze1o87mzb~8lD6WBk8hXShgW-TPBPpLMoBx2sA9TnefTqy94gI4ykt4tt1MttB67Cj69Miw-46cpFkgY9tannNPOF-90b3vlcS44PwqDUGrtTpQiN6WdsTT6LbpN1N92KbPJIRj3PkGwQW7VvpfM8L4wKmyVmVnRO3ixaW-mXXiOWk9rmfHP9UFMYnk99yaGHp4dZ1JfJiClci~Z8dBx288CrvXVjGUCXBJbdlwo6UrKQJCEk9i9vSbCpI2Q__&Key-Pair-Id=K24YOPZ1EKX0YC

View:
https://d3vwatchtower.ngrok.io/ingest/view?findings_url=https%3A//files.nightfall.ai/d2160270-6b07-4304-b1ee-e7b98498be82.json%3FExpires%3D1638059340%26Signature%3DAjSdNGlXWGXO0QGSi-lOoDBtbhJdLPE7IWXA7IaBCfLr~3X2IcZ1vavHF5iaEDaoZ-3etnZA4Nu8K8Dq8Kd81ShuX6Ze1o87mzb~8lD6WBk8hXShgW-TPBPpLMoBx2sA9TnefTqy94gI4ykt4tt1MttB67Cj69Miw-46cpFkgY9tannNPOF-90b3vlcS44PwqDUGrtTpQiN6WdsTT6LbpN1N92KbPJIRj3PkGwQW7VvpfM8L4wKmyVmVnRO3ixaW-mXXiOWk9rmfHP9UFMYnk99yaGHp4dZ1JfJiClci~Z8dBx288CrvXVjGUCXBJbdlwo6UrKQJCEk9i9vSbCpI2Q__%26Key-Pair-Id%3DK24YOPZ1EKX0YC

In our output, we are printing two URLs.

The first URL is provided to us by Nightfall. It is the temporary signed S3 URL that we can access to fetch the sensitive findings that Nightfall detected.

The second URL won't work yet; we'll implement it next. We constructed this URL in our ingest() method above: it calls /view and passes the findings URL as a URL-escaped query parameter.
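
To see why the escaping matters, here is a small standard-library sketch of the round trip. The signed URL and base URL below are made-up placeholders, not real Nightfall values:

```python
from urllib.parse import parse_qs, quote, urlparse

findings_url = "https://files.nightfall.ai/example.json?Expires=1638059340&Key-Pair-Id=EXAMPLE"
base_url = "http://localhost:8000/"  # stands in for request.url_root

# Escape the signed URL so its own ?, & and = do not split our query string
view_url = f"{base_url}view?findings_url={quote(findings_url)}"

# This mirrors what request.args.get('findings_url') recovers in Flask
recovered = parse_qs(urlparse(view_url).query)["findings_url"][0]
print(recovered == findings_url)
```

Without the call to quote, everything after the first & in the signed URL would be parsed as separate query parameters of /view, and the signature would be lost.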

Let's add a method to our Flask server that opens this URL and displays the findings in a formatted table so that the results are easier to view than downloading them as JSON.

Add the following to our Flask server in app.py:

# respond to GET requests at /view
# Users can access this page to view their file scan results in a table
@app.route("/view")
def view():
	# get the findings URL from the query parameters
	findings_url = request.args.get('findings_url')
	if findings_url:
		# download the findings from the findings URL and parse them as JSON
		with urllib.request.urlopen(findings_url) as url:
			data = json.loads(url.read().decode())
			# render the view.html template and provide the findings object to display in the template
			return render_template('view.html', findings=data['findings'])
	# without a findings URL there is nothing to display
	return "Missing findings_url query parameter", 400

Create the Table View

To display the findings in an HTML table, we'll create a new Flask template. Create a folder in your project directory called templates and add a new file within it called view.html.

Our template uses Jinja to iterate through our findings, and create a table row for each sensitive finding.

<!DOCTYPE HTML>
<html>
<head>
    <title>File Scan Viewer</title>
    <style>
    	table, th, td {
		  border: 1px solid black;
		}
		table {
			width: 100%;
		}
	</style>
</head>

<body>
	<table>
		<thead>
			<tr>
				<th>Detector</th>
				<th>beforeContext</th>
				<th>Finding</th>
				<th>afterContext</th>
				<th>byteRangeStart</th>
				<th>byteRangeEnd</th>
				<th>Confidence</th>
			</tr>
		</thead>

		<tbody>
			{% for finding in findings %}
				<tr>
					<td>{{ finding['detector']['name'] }}</td>
					<td>{{ finding['beforeContext'] }}</td>
					<td>{{ finding['finding'] }}</td>
					<td>{{ finding['afterContext'] }}</td>
					<td>{{ finding['location']['byteRange']['start'] }}</td>
					<td>{{ finding['location']['byteRange']['end'] }}</td>
					<td>{{ finding['confidence'] }}</td>
				</tr>
			{% endfor %}
		</tbody>
	</table>

</body>
</html>

Now, if we restart our Flask server, trigger a file scan request, and navigate to the "View" URL printed in the server logs, we should see a formatted table with our results! In fact, we can input any Nightfall-provided signed S3 URL (after URL-escaping it) in the findings_url parameter of the /view route to view it.
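
Because /view fetches whatever URL it is given, you may want to restrict it to Nightfall's findings host before exposing the server publicly. A minimal sketch, assuming findings are always served over HTTPS from files.nightfall.ai as in the sample output above. The helper name and the allow-list are hypothetical:

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"files.nightfall.ai"}  # assumption based on the sample URLs


def is_trusted_findings_url(url):
    """Accept only HTTPS URLs on the allow-listed findings host."""
    parsed = urlparse(url or "")
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

In view(), you could call this before urlopen and return a 400 response when it fails.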

Deploy on Render

As a longtime Heroku user, I was initially inclined to write this tutorial with instructions to deploy our app on Heroku. However, new PaaS vendors have been emerging and I was curious to try them out and see how they compare to Heroku. One such vendor is Render, which is where we'll deploy our app.

Deploying our service on Render is straightforward. If you're familiar with Heroku, the process is quite similar. Once you've signed up or logged into Render (free), we'll do the following:

  1. Create a new Web Service on Render, and permit Render to access your new repo.

  2. Use the following values during creation:

  • Environment: Python

  • Build Command: pip install -r requirements.txt

  • Start Command: gunicorn app:app

Let's also set our environment variables during creation. These are the same values we set locally.

NIGHTFALL_API_KEY
NIGHTFALL_SIGNING_SECRET

Scan a file (in production)

Once Render has finished deploying, you'll get the base URL of your application. Set this as your NIGHTFALL_SERVER_URL locally and re-run scan.py - this time, the file scan request is served by your production Flask server running on Render!

export NIGHTFALL_SERVER_URL=https://your-app-url.onrender.com
python3 scan.py

To confirm this, navigate to the Logs tab in your Render app console, where you'll see the webhook's output of your file scan results:

Nov 26 04:29:06 PM  Sensitive data present. Findings available until 2021-11-28T00:28:24.564972786Z.
Nov 26 04:29:06 PM  
Nov 26 04:29:06 PM  Download:
Nov 26 04:29:06 PM  https://files.nightfall.ai/d6b6ee4f-d1a8-4fb6-b35a-cb6f88d58083.json?Expires=1638059304&Signature=hz1TN5UXjCGTxCxq~jT2wfuUWlj9Se-mWNL1K-tJhiAIXUg1FxJrCVP2iH1I4TNymFBuOnj5TTiLGpD8tZAKGm9J0lTHncZkaeaU8KZQ2j-~8qYQVlunNj019sqtTkMbVRfakzYzW-qWHEvLXN-PFcGYX05g3LZHvW802-lAVlM-WpGApw2u8BnzoY1pdWAxpJ0VIN1Zax4UuVeQBKieR7k8H9v9HdYYJlVGkVA5F9EzklLy99fyD8r4WR~jfqN5Fr1KceDtsxffC6MPuZ8nIIdSG5~tVtjCjgIjyh3IePPW1Wq-E8yZiVAhpDDbYX1wngUTwlAu~MU7N39vd8mlYQ__&Key-Pair-Id=K24YOPZ1EKX0YC
Nov 26 04:29:06 PM  
Nov 26 04:29:06 PM  View:
Nov 26 04:29:06 PM  https://flask-file-scanner-example.onrender.com/view?findings_url=https%3A//files.nightfall.ai/d6b6ee4f-d1a8-4fb6-b35a-cb6f88d58083.json%3FExpires%3D1638059304%26Signature%3Dhz1TN5UXjCGTxCxq~jT2wfuUWlj9Se-mWNL1K-tJhiAIXUg1FxJrCVP2iH1I4TNymFBuOnj5TTiLGpD8tZAKGm9J0lTHncZkaeaU8KZQ2j-~8qYQVlunNj019sqtTkMbVRfakzYzW-qWHEvLXN-PFcGYX05g3LZHvW802-lAVlM-WpGApw2u8BnzoY1pdWAxpJ0VIN1Zax4UuVeQBKieR7k8H9v9HdYYJlVGkVA5F9EzklLy99fyD8r4WR~jfqN5Fr1KceDtsxffC6MPuZ8nIIdSG5~tVtjCjgIjyh3IePPW1Wq-E8yZiVAhpDDbYX1wngUTwlAu~MU7N39vd8mlYQ__%26Key-Pair-Id%3DK24YOPZ1EKX0YC

Navigate to the View link above in your browser to verify that you can see the results formatted in a table on your production site.

Congrats, you've successfully created a file scanning server and deployed it in production! You're now ready to build more advanced business logic around your file scanner. Here are some ideas on how to extend this tutorial:

  • Use WebSockets to send a notification back from the webhook to the client that initiated the file scan request

  • Build a more advanced detection rule using pre-built or custom detectors

  • Add a user interface to add more interactive capabilities, for example allowing users to upload files or read files from URLs

Building Endpoint DLP to Detect PII on Your Machine in Real-Time

  • Python

  • Flask

  • Nightfall

  • Ngrok

  • Watchdog

Key Concepts

In a nutshell, file scanning is done asynchronously by Nightfall; after you upload a file to Nightfall and trigger the scan, we perform the scan in the background. When the scan completes, Nightfall delivers the results to you by requesting your webhook server. This asynchronous behavior allows Nightfall to scan files of varying sizes and complexities without requiring you to hold open a long synchronous request, or continuously poll for updates. The impact of this pattern is that you need a webhook endpoint that can receive inbound notifications from Nightfall when scans are completed - that's one of the two services we are building in this tutorial.

Getting Started

Setting Up Dependencies

nightfall
Flask
Gunicorn
watchdog

Then run pip install -r requirements.txt to do the installation.

Configuring Detection with Nightfall

These values are unique to your account and should be kept safe. This means that we will store them as environment variables and should not store them directly in code or commit them into version control. If these values are ever leaked, be sure to visit the Nightfall Dashboard to re-generate new values for these secrets.

export NIGHTFALL_API_KEY=<your_key_here>
export NIGHTFALL_SIGNING_SECRET=<your_secret_here>

Monitoring File System Events

Watchdog is a Python module that watches for file system events. Create a file called scanner.py. We'll start by importing our dependencies and setting up a basic event handler. This event handler responds to file change events for file paths that match a given set of regular expressions (regexes). In this case, the .* indicates we are matching on any file path - we'll customize this a bit later. When a file system event is triggered, we'll print a line to the console.

import os
import time
from watchdog.observers import Observer
from watchdog.events import RegexMatchingEventHandler
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

class MyHandler(RegexMatchingEventHandler):
    # event handler callback that is called when a file is modified (created or changed)
    def on_modified(self, event):
        print(f'Event type: {event.event_type} | Path: {event.src_path}')

if __name__ == "__main__":
    regexes = [ ".*" ]

    # register event handler to monitor file paths that match our regex
    event_handler = MyHandler(regexes=regexes)
    observer = Observer()
    observer.schedule(event_handler,  path='',  recursive=True)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

Run python scanner.py and you'll notice lots of lines getting printed to the console. These are all the files that are getting created and changed on your machine in real-time. You'll notice that your operating system and the apps you're running are constantly writing, modifying, and deleting files on disk!

Event type: modified | Path: /Users/myuser/Library/Caches
Event type: modified | Path: /Users/myuser/Library/Caches/com.apple.nsservicescache.plist
Event type: modified | Path: /Users/myuser/Library/Caches
Event type: modified | Path: /Users/myuser/Library/Caches/Google/Chrome/Default/Cache
Event type: modified | Path: /private/tmp
Event type: modified | Path: /Users/myuser/Library/Preferences/ContextStoreAgent.plist
Event type: modified | Path: /private/tmp
Event type: modified | Path: /Users/myuser/Library/Assistant
Event type: modified | Path: /Users/myuser/Library/Assistant/SyncSnapshot.plist
...

Next, we'll update our event handler so that instead of simply printing to the console, it sends the file to Nightfall to be scanned. We initiate the scan request by specifying the file path of the changed or created file, a webhook URL where the scan results should be sent, and a Detection Rule that specifies what sensitive data we are looking for. If the file scan is initiated successfully, we'll print the Upload ID that Nightfall returns to the console. This ID will be useful later when identifying scan results.

Here's our complete scanner.py, explained further below:

import os
import time
from watchdog.observers import Observer
from watchdog.events import RegexMatchingEventHandler
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

class MyHandler(RegexMatchingEventHandler):
    def scan_file(self, filepath):
        nightfall = Nightfall() # reads API key from NIGHTFALL_API_KEY environment variable by default
        webhook_url = f"{os.getenv('NIGHTFALL_SERVER_URL')}/ingest" # webhook server we'll create

        try:
            scan_id, message = nightfall.scan_file(
                filepath, 
                webhook_url=webhook_url,
                # detection rule to detect credit card numbers, SSNs, and API keys
                detection_rules=[ DetectionRule([ 
                    Detector(
                        min_confidence=Confidence.LIKELY,
                        nightfall_detector="CREDIT_CARD_NUMBER",
                        display_name="Credit Card Number"),
                    Detector(
                        min_confidence=Confidence.LIKELY,
                        nightfall_detector="US_SOCIAL_SECURITY_NUMBER",
                        display_name="US Social Security Number"),
                    Detector(
                        min_confidence=Confidence.LIKELY,
                        nightfall_detector="API_KEY",
                        display_name="API Key")
                    ])
                ])
            return scan_id, message
        except Exception as err:
            print(f"Error processing {filepath} | {err}")
            return None, None

    def on_modified(self, event):
        # log the event first, then skip directories since only files can be scanned
        print(f'Event type: {event.event_type} | Path: {event.src_path}')
        if event.is_directory:
            return
        # scan file with Nightfall
        scan_id, message = self.scan_file(event.src_path)
        if scan_id:
            print(f"Scan initiated | Path {event.src_path} | UploadID {scan_id}")

if __name__ == "__main__":
    regexes = [ ".*/Downloads/.*", ".*/Desktop/.*", ".*/Documents/.*" ]

    # register event handler to monitor file paths that match our regexes
    event_handler = MyHandler(regexes=regexes)
    observer = Observer()
    observer.schedule(event_handler,  path='',  recursive=True)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

We can't run this just yet, since we need to set our webhook URL, which is currently reading from an environment variable that we haven't set yet. We'll create our webhook server and set the webhook URL in the next set of steps.

Also note that we've updated our regex from .* to a set of file paths on Macs that commonly contain user generated files - the Desktop, Documents, and Downloads folders:

regexes = [ ".*/Downloads/.*", ".*/Desktop/.*", ".*/Documents/.*" ]

You can customize these regexes to whatever file paths are of interest to you. Another option is to write a catch-all regex that ignores/excludes paths to config and temp files:

regexes = [ "(?!/opt/|.*/Library/|.*/private/|/System/|/Applications/|/usr/).*" ]
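To see which paths the catch-all regex above lets through, here is a small sketch that tests it with the standard re module. It assumes watchdog applies each regex from the start of the path, as re.match does; the helper name should_watch is hypothetical.

```python
import re

# Same pattern as above: the negative lookahead rejects paths that start with,
# or contain, common system and config directories
EXCLUDE_SYSTEM_PATHS = re.compile(
    r"(?!/opt/|.*/Library/|.*/private/|/System/|/Applications/|/usr/).*"
)


def should_watch(path):
    """Return True if the path would match the catch-all regex (hypothetical helper)."""
    return EXCLUDE_SYSTEM_PATHS.match(path) is not None


if __name__ == "__main__":
    for candidate in (
        "/Users/myuser/Downloads/sample-pci.csv",
        "/Users/myuser/Library/Caches/com.apple.nsservicescache.plist",
        "/private/tmp/file.txt",
    ):
        print(candidate, should_watch(candidate))
```

Trying a regex against a few known paths like this is a quick way to check a pattern before handing it to the event handler.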

Setting Up Webhook Server

Next, we'll set up our Flask webhook server, so we can receive file scanning results from Nightfall. Create a file called app.py. We'll start by importing our dependencies and initializing the Flask and Nightfall clients:

import os
from flask import Flask, request, render_template
from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall
from datetime import datetime, timedelta
import urllib.request, urllib.parse, json
import csv

app = Flask(__name__)

nightfall = Nightfall(
	key=os.getenv('NIGHTFALL_API_KEY'),
	signing_secret=os.getenv('NIGHTFALL_SIGNING_SECRET')
)

Next, we'll add our first route, which will display "Hello World" when the client navigates to /ping simply as a way to validate things are working:

@app.route("/ping")
def ping():
	return "Hello World", 200

In a second command line window, run gunicorn app:app to fire up your server, and navigate to your local server in your web browser. The Gunicorn logs show where the server is listening, typically 127.0.0.1:8000 (also known as localhost:8000).

[2021-11-26 14:22:53 -0800] [61196] [INFO] Starting gunicorn 20.1.0
[2021-11-26 14:22:53 -0800] [61196] [INFO] Listening at: http://127.0.0.1:8000 (61196)
[2021-11-26 14:22:53 -0800] [61196] [INFO] Using worker: sync
[2021-11-26 14:22:53 -0800] [61246] [INFO] Booting worker with pid: 61246
./ngrok http 8000

After running this command, ngrok will create a tunnel on the public internet that redirects traffic from their site to your local machine. Copy the HTTPS tunnel endpoint that ngrok has created: we can use this as the webhook URL when we trigger a file scan.

Account                       Nightfall Example
Version                       2.3.40
Region                        United States (us)
Web Interface                 http://127.0.0.1:4040
Forwarding                    http://3ecedafba368.ngrok.io -> http://localhost:8000
Forwarding                    https://3ecedafba368.ngrok.io -> http://localhost:8000

Let's set this HTTPS endpoint as a local environment variable so we can reference it later:

export NIGHTFALL_SERVER_URL=https://3ecedafba368.ngrok.io

With a Pro ngrok account, you can create a subdomain so that your tunnel URL is consistent, instead of randomly generated each time you start the tunnel.

Handling Inbound Webhooks

Before we send a file scan request to Nightfall, let's implement our incoming webhook endpoint, so that when Nightfall finishes scanning a file, it can successfully send the sensitive findings to us.

First, what does it mean to have findings? If a file has findings, this means that Nightfall identified sensitive data in the file that matched the detection rules you configured. For example, if you told Nightfall to look for credit card numbers, any substring from the request payload that matched our credit card detector would constitute sensitive findings.

We'll host our incoming webhook at /ingest with a POST method.

Nightfall will POST to the webhook endpoint, and in the inbound payload, Nightfall will indicate if there are sensitive findings in the file, and provide a link where we can access the sensitive findings as JSON.

We'll validate the inbound webhook from Nightfall, retrieve the JSON findings from the link provided, and write the findings to a CSV file. First, let's initialize our CSV file where we will write results, and add our /ingest POST method.

# create CSV where sensitive findings will be written
headers = ["upload_id", "#", "datetime", "before_context", "finding", "after_context", "detector", "confidence", "loc", "detection_rules"]
with open("results.csv", 'a', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(headers)

# respond to POST requests at /ingest
# Nightfall will send requests to this webhook endpoint with file scan results
@app.route("/ingest", methods=['POST'])
def ingest():
    data = request.get_json(silent=True)
    # validate webhook URL with challenge response
    challenge = data.get("challenge") 
    if challenge:
        return challenge
    # challenge was passed, now validate the webhook payload
    else: 
        # get details of the inbound webhook request for validation
        request_signature = request.headers.get('X-Nightfall-Signature')
        request_timestamp = request.headers.get('X-Nightfall-Timestamp')
        request_data = request.get_data(as_text=True)

        if nightfall.validate_webhook(request_signature, request_timestamp, request_data):
            # check if any sensitive findings were found in the file, return if not
            if not data["findingsPresent"]: 
                print("No sensitive data present!")
                return "", 200

            # there are sensitive findings in the file
            output_results(data)
            return "", 200
        else:
            return "Invalid webhook", 500
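For intuition about what the validation step checks, the sketch below verifies a signed webhook with the standard library. It assumes the signature is a hex-encoded HMAC-SHA256 of the string "timestamp:body" keyed with the signing secret, and that requests older than five minutes are rejected; both details are assumptions made for illustration, and the function name is_valid_signature is hypothetical. In the tutorial code, nightfall.validate_webhook remains the call to rely on.

```python
import hashlib
import hmac
import time


def is_valid_signature(secret, signature, timestamp, body, max_age_seconds=300, now=None):
    """Illustrative stand-in for webhook validation (signing scheme is assumed)."""
    if not signature or not timestamp:
        return False
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    # reject future-dated or stale requests to limit replay attacks
    if sent_at > now or now - sent_at > max_age_seconds:
        return False
    expected = hmac.new(
        secret.encode(),
        f"{timestamp}:{body}".encode(),
        hashlib.sha256,
    ).hexdigest()
    # constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected, signature)
```

Validating against the raw request body, rather than a re-serialized copy of the parsed JSON, matters because any change in whitespace or key order would change the digest.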

You'll notice that when there are sensitive findings, we call the output_results() method. Let's write that next. In output_results(), we are going to parse the findings and write them as rows into our CSV file.

def output_results(data):
	findings_url = data['findingsURL']
	# open findings URL provided by Nightfall to access findings
	with urllib.request.urlopen(findings_url) as url:
		findings = json.loads(url.read().decode())
		findings = findings['findings']

	print(f"Sensitive data found, outputting {len(findings)} finding(s) to CSV | UploadID {data['uploadID']}")
	# loop through findings JSON, get relevant finding metadata, write each finding as a row into output CSV
	with open("results.csv", 'a', newline='') as csvfile:
		writer = csv.writer(csvfile)
		for i, finding in enumerate(findings):
			writer.writerow([
				data['uploadID'],
				i+1,
				datetime.now(),
				repr(finding['beforeContext']),
				repr(finding['finding']),
				repr(finding['afterContext']),
				finding['detector']['name'],
				finding['confidence'],
				finding['location']['byteRange'],
				finding['matchedDetectionRules']
			])

Restart your server so the changes propagate. We'll take a look at the console and CSV output of our webhook endpoint in the next section.

Scan Changed Files in Real-Time

In our previous command line window, we can now turn our attention back to scanner.py. We now have our webhook URL so let's set it here as well and run our scanner.

export NIGHTFALL_SERVER_URL=https://3ecedafba368.ngrok.io
python scanner.py
curl https://raw.githubusercontent.com/nightfallai/dlp-sample-data/main/sample-pci.csv > ~/Downloads/sample-pci.csv

You'll see the following console output from scanner.py:

Event type: modified | Path: /Users/myuser/Downloads/sample-pci.csv
Scan initiated | Path /Users/myuser/Downloads/sample-pci.csv | UploadID c23fdde2-5e98-4183-90b0-31e2cdd20ac0

And the following console output from our webhook server:

Sensitive data found, outputting 10 finding(s) to CSV | UploadID ac6a4a9d-a7b9-4a78-810d-8a66f7644704

And the following sensitive findings written to results.csv:

upload_id,#,datetime,before_context,finding,after_context,detector,confidence,loc,detection_rules
ac6a4a9d-a7b9-4a78-810d-8a66f7644704,1,2021-12-04 22:12:21.039602,'Name\tCredit Card\nRep. Viviana Hintz\t','5433-9502-3725-7862','\nEloisa Champlin\t3457-389808-83234\nOmega',Credit Card Number,VERY_LIKELY,"{'start': 36, 'end': 55}",[]
...

Each row in the output CSV corresponds to one sensitive finding. Each row has the following fields, which you can customize in app.py: the upload ID provided by Nightfall, an incrementing index, a timestamp, the characters before the sensitive finding (for context), the sensitive finding itself, the characters after the sensitive finding (for context), the name of the detector that matched, the confidence level of the detection, the byte range location (character indices) of the sensitive finding in its parent file, and the detection rules that flagged the sensitive finding.
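As a rough sketch of how the results file could be consumed, the function below counts findings per detector using the column names from the header row shown above. The function name count_findings_by_detector is hypothetical. Because app.py appends a header row each time the server starts, repeated header rows are skipped.

```python
import csv
from collections import Counter


def count_findings_by_detector(csvfile):
    """Count findings per detector from an open results CSV (hypothetical helper)."""
    counts = Counter()
    for row in csv.DictReader(csvfile):
        # app.py writes the header again on every restart, so skip repeated header rows
        if row["upload_id"] == "upload_id":
            continue
        counts[row["detector"]] += 1
    return counts
```

To try it against the real output, open the file with open("results.csv", newline="") and pass the file object to the function.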

Note that you may also see events for system files like .DS_Store or errors corresponding to failed attempts to scan temporary versions of files. This is because doing things like downloading a file can trigger multiple file modification events. As an extension to this tutorial, you could consider filtering those out further, though they shouldn't impact our ability to scan files of interest.
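One possible way to reduce duplicate scans when a single download fires several modification events is to debounce events per path. The class below is a minimal sketch under that assumption; EventDebouncer and its two-second window are hypothetical and are not part of watchdog or the tutorial code.

```python
import time


class EventDebouncer:
    """Suppress repeat events for a path that arrive shortly after the previous one."""

    def __init__(self, window_seconds=2.0):
        self.window_seconds = window_seconds
        self._last_seen = {}  # path -> time of the most recent event

    def should_process(self, path, now=None):
        now = time.monotonic() if now is None else now
        last = self._last_seen.get(path)
        self._last_seen[path] = now
        # process the first event, or any event arriving after a quiet period
        return last is None or now - last >= self.window_seconds
```

In scanner.py, the handler could hold one EventDebouncer instance and call should_process(event.src_path) before sending a file to be scanned.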

If we leave these services running, we'll continue to monitor files for sensitive data and append to our results CSV when sensitive findings are discovered!

Running Endpoint DLP in the Background

We can run both of our services in the background with nohup so that we don't need to leave two command line tabs open indefinitely. We'll redirect console output to log files so that we can always reference the application's output or determine if the services crashed for any reason.

nohup python -u scanner.py > scanner.log &
nohup gunicorn app:app > server.log &

This will return the corresponding process IDs - we can always check on these later with the ps command.

Next Steps

This post is simply a proof-of-concept version of endpoint DLP. Building a production-grade endpoint DLP application will involve additional complexity and functionality. However, the detection engine is one of the biggest components of an endpoint DLP system, and this example should give you a sense of how easy it is to integrate with Nightfall's APIs and the power of Nightfall's detection engine.

Here are a few ideas on how you can extend this service further:

  • Run the scanner on EC2 machines to scan your production machines in real-time

  • Respond to more system events like I/O of USB drives and external ports

  • Implement remediation actions like end-user notifications or file deletion

  • Redact the sensitive findings prior to writing them to the results file

  • Store the results in the cloud for central reporting

  • Package in an executable so the application can be run easily

  • Scan all files on disk on the first boot of the application
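As a sketch of one of the ideas above, redacting findings before they are written to the results file, the helper below masks all but the last few alphanumeric characters locally. The name mask_finding is hypothetical, and this does not use Nightfall's own redaction configuration.

```python
def mask_finding(finding, keep_last=4, mask_char="*"):
    """Mask all but the last keep_last alphanumeric characters (hypothetical helper)."""
    remaining = keep_last
    masked = []
    # walk from the end so the trailing characters stay visible
    for ch in reversed(finding):
        if not ch.isalnum():
            masked.append(ch)  # keep separators such as "-" for readability
        elif remaining > 0:
            masked.append(ch)
            remaining -= 1
        else:
            masked.append(mask_char)
    return "".join(reversed(masked))


if __name__ == "__main__":
    print(mask_finding("5433-9502-3725-7862"))
```

In output_results(), the finding column could be passed through a helper like this before the row is written, so that results.csv does not itself become a store of sensitive data.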

Before we get started on our implementation, start by familiarizing yourself with how scanning files works with Nightfall, so you're acquainted with the flow we are implementing.

You can fork the sample repo and view the complete code here, or follow along below. If you're starting from scratch, create a new GitHub repository.

First, let's start by installing our dependencies. We'll be using Nightfall for data classification, the Flask web framework in Python, and Gunicorn as our web server. Create requirements.txt and add the following to the file:

Next, we'll need our Nightfall API Key and Webhook Signing Secret; the former authenticates us to the Nightfall API, while the latter authenticates that incoming webhooks are originating from Nightfall. You can retrieve your API Key and Webhook Signing Secret from the Nightfall Dashboard. Complete the Nightfall Quickstart for a more detailed walk-through. Sign up for a free Nightfall account if you don't have one.

To expose our local webhook server via a public tunnel that Nightfall can send requests to, we'll use ngrok. Download and install ngrok via their quickstart documentation here. We'll create an ngrok tunnel as follows:

Now, we want to trigger a file scan request, so that Nightfall will scan the file and send a POST request to our /ingest webhook endpoint when the scan is complete. We'll write a simple script that sends a file to Nightfall to scan it for credit card numbers. Create a new file called scan.py.

First, we'll establish our dependencies, initialize the Nightfall client, and specify the filepath to the file we wish to scan as well as the webhook endpoint we created above. The filepath is a relative path to any file; in this case we are scanning the sample-pci-xs.csv file, which is in the same directory as scan.py. This is a sample CSV file with 10 credit card numbers in it - you can download it in the tutorial's GitHub repo.

In this simple example, we have specified an inline Detection Rule that detects Likely Credit Card Numbers. This Detection Rule is a simple starting point that just scratches the surface of the types of detection you can build with Nightfall. Learn more about building inline detection rules here or how to configure them in the Nightfall Dashboard.

We'll do this by adding a view method that responds to GET requests to the /view route. The /view route will read the S3 findings URL from a query parameter. It will then open the findings URL, parse it as JSON, pass the results to an HTML template, and display the results in a simple HTML table using Jinja. Jinja is a simple templating engine in Python.

Endpoint data loss prevention (DLP) discovers, classifies, and protects sensitive data - like PII, credit card numbers, and secrets - that proliferates onto endpoint devices, like your computer or EC2 machines. This is a way to help keep data safe, so that you can detect and stop occurrences of data exfiltration. Our endpoint DLP application will be composed of two core services that will run locally. The first service will monitor for file system events using the Watchdog package in Python. When a file system event is triggered, such as when a file is created or modified, the service will send the file to Nightfall to be scanned for sensitive data. The second service is a webhook server that will receive scan results from Nightfall, parse the sensitive findings, and write them to a CSV file as output. You'll build familiarity with the following tools and frameworks:

Before we get started on our implementation, start by familiarizing yourself with how file scanning works with Nightfall, so you're acquainted with the flow we are implementing.

You can fork the sample repo and view the complete code here, or follow along below. If you're starting from scratch, create a new GitHub repository. This tutorial was developed on a Mac and assumes that's the endpoint operating system you're running; however, it should work across operating systems with minor modifications. For example, you may wish to extend this tutorial by running endpoint DLP on an EC2 machine to monitor your production systems.

First, let's start by installing our dependencies. We'll be using Nightfall for data classification, the Flask web framework in Python, watchdog for monitoring file system events, and Gunicorn as our web server. Create requirements.txt and add the following to the file:

Next, we'll need our Nightfall API Key and Webhook Signing Secret; the former authenticates us to the Nightfall API, while the latter authenticates that incoming webhooks are originating from Nightfall. You can retrieve your API Key and Webhook Signing Secret from the Nightfall Dashboard. Complete the Nightfall Quickstart for a more detailed walk-through. Sign up for a free Nightfall account if you don't have one.

In this example, we have specified an inline Detection Rule that detects Likely Credit Card Numbers, Social Security Numbers, and API Keys. This Detection Rule is a simple starting point that just scratches the surface of the types of detection you can build with Nightfall. Learn more about building inline detection rules here or how to configure them in the Nightfall Dashboard.

To expose our local webhook server via a public tunnel that Nightfall can send requests to, we'll use ngrok. Download and install ngrok via their quickstart documentation here. We'll create an ngrok tunnel as follows:

To trigger a file scan event, download the following sample data file. Assuming it automatically downloads to your Downloads folder, this should immediately trigger a file change event and you'll see console log output! If not, you can also download the file with curl into a location that matches the event handler regexes we set earlier.

What is the pricing model?

We offer a free tier that allows you to sign up and start using Firewall for AI with zero upfront costs or commitments. This tier provides a generous data scanning capacity and access to all the core features.

We offer enterprise pricing plans for advanced requirements such as higher data volumes, custom rate limits, and dedicated support.

Contact our team at sales@nightfall.ai or via the contact form on our website to discuss your specific needs and get a tailored pricing quote.

More information on pricing...

How does Nightfall support custom data types?

In two ways:

  • Nightfall's out-of-the-box detectors can be modified with context rules and exclusion rules.

  • Nightfall also supports custom regular expressions (using RE2 syntax) and word lists (i.e. dictionaries) as detectors, as documented here.

Contact Us

Schedule a Demo

You can schedule a demo or a meeting with our sales/solutions engineering team directly via Calendly here. If you don't see a suitable time, please email us at sales@nightfall.ai.

Email Us

For support inquiries, please email us at support@nightfall.ai.

For sales inquiries, please email us at sales@nightfall.ai.
Getting Started

Learn how to create a Nightfall API key, Detectors, Detection rules, and Policies

Nightfall APIs

Nightfall Scan and Workflow APIs enable you to integrate DLP protection programmatically

SDKs

Learn how to leverage the Nightfall Software Development Kits (SDKs)

Language Specific Guides

Learn how to use Nightfall's APIs/SDKs with specific programming languages.

Integration Tutorials

Learn how to integrate Nightfall into some GenAI apps and datastores.

Popular Use Cases

Review popular scenarios and apply them to your DLP use case.

Detection Playground

A code-free environment for test driving Firewall for AI.

Update a policy user scope

post

Updates a policy's user scope by defining inclusion and exclusion rules based on user email addresses. Only Google Drive policies are supported; users are classified as internal or external based on the Google domains registered in Nightfall. You can use this endpoint with Nightfall Sensitive Data Protection, Exfiltration Prevention, and Data Security Posture Management.

Authorizations
Path parameters
policyID · string (uuid) · Required

The UUID of the policy to update

Body
Responses
200
Successful response (processed immediately)
application/json
400
Invalid request parameters
application/json
401
Authentication failure
application/json
403
Operation prohibited on the policy
application/json
404
Policy not found
application/json
429
Rate Limit Exceeded or Daily Quota Exceeded
application/json
500
Internal Nightfall Error
application/json
post
POST /policy/v1/{policyID}/scope/users HTTP/1.1
Host: api.nightfall.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 136

{
  "add": {
    "include": [
      "name@gmail.com"
    ],
    "exclude": [
      "name@gmail.com"
    ]
  },
  "delete": {
    "include": [
      "name@gmail.com"
    ],
    "exclude": [
      "name@gmail.com"
    ]
  }
}
{
  "includedUsers": [
    "text"
  ],
  "excludedUsers": [
    "text"
  ]
}
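The request above can be assembled in Python with the standard library, as in the sketch below. The function name build_scope_update_request and its parameters are hypothetical; the URL is derived from the Host header and path shown above, and whether the API accepts empty lists for unused keys is an assumption. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scope_update_request(policy_id, api_key, add_include=(), add_exclude=(),
                               delete_include=(), delete_exclude=()):
    """Build (but do not send) a POST request for the user scope endpoint."""
    body = {
        "add": {"include": list(add_include), "exclude": list(add_exclude)},
        "delete": {"include": list(delete_include), "exclude": list(delete_exclude)},
    }
    return urllib.request.Request(
        f"https://api.nightfall.ai/policy/v1/{policy_id}/scope/users",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response codes listed above are the ones to handle, with 429 being a reasonable candidate for a retry after a delay.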