At Nightfall, data security and privacy are our top priorities. We have implemented stringent security measures to protect your sensitive data at every stage of the scanning process. All data transmitted to our API is encrypted in transit using industry-standard protocols. We adhere to best practices for secure coding, undergo regular security audits, and maintain compliance with relevant security standards. Visit our security and compliance page at nightfall.ai/security for more details on our commitment to data protection.
This document will guide you through making your first API request.
This page will get you up and running with the Nightfall API so you can start scanning for sensitive data.
The Nightfall API requires a valid API key to authenticate your API requests.
You can create API keys in the Dashboard.
Learn more about Authentication and Security.
Below is an example request to the scan endpoint.
To run this example yourself, replace the API key (NF-rEpLaCe...) with the one you created in the Dashboard, or set it as the environment variable NIGHTFALL_API_KEY as necessary.
The cURL example may be run from the command line without any additional installation. To run the Python example, you will need to download the corresponding SDK.
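As a rough illustration, here is a sketch of such a request in Python using the requests library rather than the SDK; the detector configuration and payload strings follow the walkthrough below, and the exact field names should be verified against the API reference.

```python
import os
import requests

# Read the API key from the environment, falling back to a placeholder.
api_key = os.environ.get("NIGHTFALL_API_KEY", "NF-rEpLaCe...")

body = {
    "policy": {
        "detectionRules": [
            {
                "name": "Quickstart rule",
                "logicalOp": "ANY",
                "detectors": [
                    {
                        "detectorType": "NIGHTFALL_DETECTOR",
                        "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER",
                        "displayName": "US SSN",
                        "minConfidence": "POSSIBLE",
                        "minNumFindings": 1,
                    },
                    {
                        "detectorType": "NIGHTFALL_DETECTOR",
                        "nightfallDetector": "CREDIT_CARD_NUMBER",
                        "displayName": "Credit card",
                        "minConfidence": "POSSIBLE",
                        "minNumFindings": 1,
                    },
                ],
            }
        ]
    },
    # Three strings: the first and last contain sensitive data, the middle does not.
    "payload": [
        "My SSN is 458-02-6124",
        "No sensitive data here",
        "My card number is 4242-4242-4242-4242",
    ],
}

resp = requests.post(
    "https://api.nightfall.ai/v3/scan",
    headers={"Authorization": f"Bearer {api_key}"},
    json=body,
)
print(resp.json())
```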
The Policy (policy) you define indicates what to scan for in your payload with a logically grouped (ANY or ALL) set of Detection Rules (detectionRules).
Detection Rules can be defined in two ways:
inline as code, as shown above
in the Nightfall app, which you then reference by UUID.
Learn more about setting up Nightfall in the Nightfall app to create your own Detectors, Detection Rules, and Policies. See Using Pre-Configured Detection Rules for an example of how to execute queries using an existing Detection Rule's UUID.
In the example above, two of Nightfall's native Detectors are being used: US_SOCIAL_SECURITY_NUMBER and CREDIT_CARD_NUMBER.
You can find a full list of native Detectors in the Detector Glossary.
If you would prefer to define your Detectors, Detection Rules, and Policies in code rather than in the Nightfall app, you can define Detectors inline with your own regular expressions or word lists, as well as extend our native Detectors with exclusion and context rules.
When defining a Detection Rule, you configure the minimum confidence level (minConfidence) and the minimum number of times a match must be found (minNumFindings) for the rule to be triggered.
Another feature Nightfall offers is the ability to redact sensitive findings. Detectors may be configured (via redactionConfig) to replace the text that triggered them with a variety of customizable masks, including an encrypted version of the text.
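As a rough sketch of what this can look like, the snippet below attaches a masking configuration to a Detector; the nested field names here are assumptions, so consult the API reference for the authoritative redaction schema.

```python
# Illustrative only: field names under redactionConfig are assumptions.
detector_with_redaction = {
    "detectorType": "NIGHTFALL_DETECTOR",
    "nightfallDetector": "CREDIT_CARD_NUMBER",
    "minConfidence": "LIKELY",
    "minNumFindings": 1,
    "redactionConfig": {
        "maskConfig": {
            "maskingChar": "*",            # character used for the mask
            "numCharsToLeaveUnmasked": 4,  # keep the last few characters visible
        }
        # Substitution- and encryption-based redaction are also available;
        # see the API reference for their configuration blocks.
    },
}
```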
In the payload body, you can see that we are submitting a list of three different strings to scan (payload). The first will trigger the U.S. Social Security Detector. The last will trigger the credit card Detector. The middle example will trigger neither.
The Nightfall API returns a response with an array (findings) whose length corresponds to the length of the payload array. In this example, only the first and last items in the request payload triggered the Detectors, so the second element of the array is empty.
In the first element of the array, you can see details about which Detection Rule was triggered and the data that was found (finding). The response also provides a confidence level (confidence), as well as the location within the original text where the data was found, either in terms of bytes (byteRange) or characters (codepointRange).
Congratulations! You have successfully completed the Nightfall Quickstart.
You can modify the Detectors or payload in the example request to get more practice with the Nightfall API.
The Nightfall API uses API keys to authenticate requests. You can create and view your API keys in the Nightfall app on the Manage API Keys page.
Your API keys carry many privileges, so be sure to keep them secure. Do not share your secret API keys in publicly accessible areas such as GitHub, client-side code, or anywhere else that would compromise their secrecy. If you believe one of your API Keys has been compromised, you should delete it through the Dashboard.
All API requests must be made over HTTPS.
Calls made over plain HTTP will fail.
API requests without authentication will fail.
The Nightfall Developer Platform offers two types of subscription plans: Free and Enterprise. Pricing is based on the uncompressed data volume scanned by Nightfall.
Free Plan When you sign up for the Nightfall Developer Platform, you are automatically enrolled in the Free plan, which comes with a set volume limit of 3 GB of data scanned per month.
Enterprise Plans If you are consistently scanning significant data volumes each month, you may want to reach out to sales@nightfall.ai to discuss our Enterprise plans which offer custom pricing and rate limits for DLP APIs and SaaS APIs.
For additional information, please contact our team to discuss your use cases.
Welcome to Nightfall's Firewall for AI Developers Scan and Workflow APIs documentation. This documentation helps developers leverage Nightfall AI's industry-leading detection engine to identify and protect sensitive customer and corporate data anywhere. It prevents unauthorized access and data breaches and allows you to focus on innovation.
Scan prompts, text, documents, spreadsheets, logs, zips, JSON, images, etc., for PII, PHI, PCI, banking information, API keys, passwords, and network information with the highest accuracy and lightning-fast response times. Redact sensitive findings with customizable formatting.
Leverage the full potential of the Nightfall console application through our Workflow APIs. Customize your SIEM workflows and reporting, take actions, update support tickets, alert users, search violations, annotate findings, create reports, and more.
AI-Powered Identification: Utilize advanced AI models to detect and prevent security threats in real-time.
Comprehensive Sensitive Data Detection: Identify PII, PHI, PCI, banking information, API keys, passwords, and network information across various formats including text, documents, spreadsheets, logs, zips, and images.
Customizable Redaction: Tailor data protection to your needs with fully customizable redaction for each sensitive entity type.
Flexible Detectors: Leverage Nightfall’s comprehensive list of machine learning-based detectors, customize them, or create your own with specialized logic.
High Accuracy and Performance: Achieve precision and recall rates of 95% or higher, handle over 1K requests per second, and experience latency of less than 100 ms.
Seamless Integration: Easily integrate with your existing AI development and data engineering tools for smooth and efficient operation.
You can leverage Nightfall’s machine learning-based detectors or create your own detectors with customized logic to scan third-party apps, internal services, and data silos to identify instances of potentially sensitive types of data such as:
Personally Identifiable Information (PII) including Social Security Numbers, passport numbers, email addresses, or date of birth
Protected Health Information (PHI) such as insurance claim numbers or ICD10 codes
Financial information like credit card numbers or bank routing numbers
Secrets such as API and cryptographic Keys, database connection strings, passwords, etc.
Network information such as IP Address or MAC Address
Key features of Nightfall’s detection engine include:
Defining minimum confidence thresholds and minimum finding counts on detectors to reduce the chance of false positives.
Specifying context rules and exclusion rules on detectors to fine-tune their accuracy to better suit your use cases.
Choosing which detectors are triggered for each policy.
The Nightfall API consumes arbitrary data as input either as strings or as files and allows you to use any combination of detectors to return a collection of “findings" objects.
The detectors may be defined in our web app and referenced in an API call or defined as part of the payload to an API call.
The findings indicate the relevant detector, the likelihood of a match, and the location within the given data where the matched token occurred (not only in terms of bytes; there is support for tabular and JSON data as well).
You can take protective action on sensitive text by redacting, substituting, or encrypting it with the API. You may also set up webhooks to receive asynchronous notifications when findings are detected.
The Nightfall API is RESTful and uses JSON for its payloads. Our API is designed to have predictable, resource-oriented URLs for each endpoint and uses HTTP response codes to indicate any API errors.
You may test out the API through the interactive reference documentation.
The following guide will walk you through getting started and describe the API functionality in more detail. If you want to execute an API call immediately, see our Quickstart guide to see how to obtain an API Key and make a simple scan request.
After that, you can learn about Nightfall with our Key Concepts section, which will also help you get set up with Nightfall.
If you're looking for ideas about how best to leverage Nightfall's functionality, see our Use Cases guide.
We have created numerous tutorials and example implementations that demonstrate how to implement DLP for a variety of platforms (including OpenAI, LangChain, Amazon, Datadog, and Elasticsearch) and handle various scenarios (such as detecting sensitive data in GenAI prompts or detecting PII on your machine in real-time).
We also have several language-specific SDKs to get you up and running in Java, Python, Go, Node.js, and Ruby.
You can also quickly test out Nightfall detectors or your custom Detection Rules in the Nightfall Playground. Please also consult our Detector Glossary to see the variety of built-in detectors that Nightfall offers.
The Firewall for AI Overview page allows you to create API keys and manage Detectors and Detection Rules through a straightforward user interface. Log in here to access the Dashboard, or sign up to create a free account.
For frequently asked questions, feedback, and other help, please contact Nightfall support at support@nightfall.ai. We also host Nightfall Developer Office Hours on Wednesdays at 12pm PT to help answer questions, talk through any ideas, and chat about data security. We would love to see you there!
This section describes the terms you will need to know when using the API.
Detectors provide the logic to find potentially sensitive pieces of data.
When this logic detects such data, the Detector is considered "triggered."
Nightfall has numerous pre-built Detectors that are trained via machine learning. Detectors may also be defined with regular expressions or dictionaries. Their accuracy may be further refined with exclusion rules and context rules. Whether a Detector is triggered may be controlled by a minimum confidence threshold per Detector and a minimum number of findings per Detector, as set on a Detection Rule.
The built-in set of Detectors cover a number of different categories of data, including:
Standard PII (e.g. social security number, driver's license number, ID card image)
PCI (Credit Card Number, credit card image)
Healthcare (e.g. PHI, US Medicare Beneficiary Number)
Finance - Banking (e.g. SWIFT code, IBAN code, US bank routing number)
Network (e.g. an IP Address)
The full set is enumerated in the Detector Glossary.
Nightfall also supports RE2 regexes and word lists for any custom detectors that you may want to implement.
Over time, we've aggregated the following regex library, which you're welcome to select from to save you some time. Please note that a regular expression is an established yet limited method that searches for pre-defined patterns, so your mileage may vary.
You can test regular expressions here.
You can define custom detectors in two ways: directly in the Nightfall Dashboard by navigating to Detectors → New Detector → Regular expression, or inline in your API requests.
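As an illustration, an inline regular expression Detector might be defined as follows; the pattern and display name are made up, and the field names should be checked against the API reference.

```python
# Illustrative inline detector; the pattern and display name are made up.
employee_id_detector = {
    "detectorType": "REGEX",
    "regex": {
        "pattern": r"EMP-\d{6}",   # matches strings like EMP-123456
        "isCaseSensitive": True,
    },
    "displayName": "Employee ID",
    "minConfidence": "POSSIBLE",
    "minNumFindings": 1,
}
```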
An exclusion rule is a regular expression or word list that is applied after a Detector has been triggered by its primary expression or word list, in order to eliminate false positives.
For instance, you may have a Detector designed to detect phone numbers. However, you may have a particular set of phone numbers used for testing purposes that are known not to be valid (e.g. they start with the prefix 555) and should therefore be ignored. Adding an exclusion rule allows you to prevent those matches from being returned by the API.
Context Rules are additional matching expressions for a Detector that may be used to adjust the confidence score of a match.
You may provide a regular expression and the number of leading or trailing characters within which a match of that expression must occur in order to adjust the confidence level to a particular level.
For instance, if you found a sequence that appeared to be a Social Security number based on its length or formatting, you might boost the confidence score if it was preceded by text like "SSN" or "Social Security Number."
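A sketch of such a context rule is shown below; the field names are assumptions based on Nightfall's public examples, so verify them against the API reference.

```python
# Illustrative context rule: raise confidence to VERY_LIKELY when "SSN" or
# "Social Security" appears within 30 characters before the match.
context_rule = {
    "regex": {"pattern": "SSN|Social Security", "isCaseSensitive": False},
    "proximity": {"windowBefore": 30, "windowAfter": 0},
    "confidenceAdjustment": {"fixedConfidence": "VERY_LIKELY"},
}
```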
You may request that a sequence of bytes of a given length be provided from before and after the text that triggers a Detection Rule.
This information can help you better understand whether or not something is an actual violation by observing the circumstances within which the detected text was found.
You are limited to a maximum of 40 bytes of this context text preceding and trailing the match for a total of 80 bytes overall.
See: Using Context
Detection Rules are aggregations of Detectors that are assigned a minimum confidence level. The identifiers of Detection Rules are used as a parameter to the API.
You may create Detection Rules as described in the section Creating Detection Rules and use their identifier as part of API calls to scan content.
Alternatively you may specify Detection Rules programmatically in each API call, as described in the scan method documentation below.
A Detection Rule is composed of a list of Detectors with which you wish to scan each request payload, where any or all Detectors may be satisfied in order to trigger the rule. You can add up to 50 total Detectors with a limit of 30 regular expression type custom detectors.
Additionally, each Detector in the Detection Rule is assigned a “minimum confidence” level (see below) and a minimum number of findings to determine whether the Detection Rule should be considered triggered.
Detection results will be returned with one of the following confidence values.
In practice, the API will only return detections assigned a POSSIBLE or higher confidence level.
VERY_LIKELY (recommended)
LIKELY
POSSIBLE
UNLIKELY
VERY_UNLIKELY
Learn more about what different confidence levels mean and how to choose the right minimum confidence level for your detection rule here.
Policies allow you to create templates for the most common workflows by unifying a set of Detection Rules with the actions to be taken when those rules are triggered, including:
automated actions such as redaction of findings
alerting through webhooks
Once defined, a Policy may be used in requests to the Nightfall API, such as calls to scan file uploads, though automated redactions are not available for uploaded files at this time.
There are many use cases for a high accuracy data classification and protection system like Nightfall. Here are some of the most popular to spark your imagination.
We can't wait to hear more about what you're planning to build: reach out to us anytime at support@nightfall.ai to discuss your use case.
Motivation
Third-party APIs provide services that greatly augment the capabilities of your applications.
For example, GenAI LLMs can automatically generate content. These LLMs can be accessed via APIs, such as OpenAI or Anthropic APIs.
Other examples are telecom/communications APIs like SendGrid and Twilio that provide communications infrastructure.
The challenge is that these services may unnecessarily receive sensitive or confidential information from your application that is calling these APIs, which can pose data privacy risks because customer data is being shared outside the intended scope. For example, LLMs can handle very large inputs, or prompts, and these prompts may contain sensitive customer information.
Benefits
By filtering out customer data from API inputs, you will be able to leverage cutting-edge third-party services and APIs without introducing data privacy risks by oversharing sensitive or confidential information.
Motivation
Applications collect and store sensitive information from consumers. Users may “overshare” or incorrectly input information, leading to sensitive data ending up in places it is not expected, or internal services may proliferate or handle this data in unexpected ways.
Fintech applications that intake, store, and generate files with PII like W-2s and paystubs.
Healthcare applications that handle protected health information or SSNs.
Marketplaces and social media applications allow user-generated content that may contain sensitive or illicit information, such as profanity or toxicity.
Support channels receive all manner of inbound information from consumers, which can include highly sensitive information or over-sharing that is then exposed to support agents.
This data can come in a variety of unstructured formats, whether screenshots, images, documents, plaintext, or compressed folders and archives, so inspecting this content requires high-quality text extraction.
Benefits
Reduce the possibility of users inputting sensitive data that should not be collected or retained within your application or service by scanning data upon submission. Warn or prevent users from inputting sensitive data into form fields or file uploads.
Diminish collection of sensitive data types that could result in regulatory fines or brand damage, if leaked or breached.
Limit exposure of sensitive data to internal personnel like support agents that could lead to accidental misuse or intentional theft.
Motivation
Compliance regimes like FedRAMP, PCI, and HIPAA may require that sensitive data is not proliferating into unsanctioned data silos, like project management systems, data warehouses, and logging infrastructure.
Many different development teams may be writing data into these internal services like logging and data warehousing, so it is challenging to enforce data sanitization on data ingress.
CDP tools like Segment and Fivetran can further proliferate sensitive data into a broader set of data silos than its original location.
Data analytics and data science teams may replicate and transform data, leading to further copies and versions across internal systems.
Edge cases, unexpected errors, and stack traces can lead to sensitive data landing or replicating in application logs.
Benefits
Identify and remove sensitive data from places that it shouldn’t be.
Monitor data at rest in data silos instead of at points of ingress/egress that would be hard to monitor or track.
Scan extremely high volumes of unstructured data at scale.
Build workflows to delete data, redact data, or alert the right teams when sensitive data is found where it shouldn’t be.
Motivation
Data classification and DLP capabilities are increasingly expected by regulated institutions such as big banks.
Building data classification and DLP from scratch is complex and has high opportunity costs in moving developers away from working on the core product offering. Building a half-baked solution erodes customer trust, especially when there is already a high degree of skepticism around the quality of traditional DLP solutions.
SaaS and security vendors can deliver additional customer value and drive additional revenue through premium enterprise feature tiers that include security features like DLP, SAML SSO, audit logging, and more.
Benefits
Reduce time-to-market by leveraging out of the box components.
Reduce the overhead of an in-house data classification service that requires text extraction services, detector research and tuning, machine learning model development and deployment, maintenance & support.
Deliver best in class accuracy, reducing the risk of alert fatigue or missing sensitive data that erodes customer trust.
Motivation
Detecting a single type of sensitive data well (e.g. a credit card number) can be complex - requiring research and maintenance as the detector evolves over time. This becomes especially challenging for esoteric detectors, for example those that are region or industry-specific.
Managing regexes and input validation is complex and evolving. For example, a regex embedded in code to validate a Google Docs link may need to be updated over time as the format for Google Docs links changes, as false positives are identified and accounted for, and as performance implications are observed.
Many data types cannot be detected accurately with a regex because they require a certain level of validation, are heavily context dependent, or are highly variable or entropic in nature leading to a regex being overly sensitive or overly specific.
Benefits
Leverage out-of-the-box detectors so no engineering time is spent researching, training, or tuning detectors. No need to reinvent the wheel. These detectors span the categories of PII, PCI, PHI, credentials & secrets, ID numbers, and more.
Reduce time spent finding, tuning, and sharing regular expressions.
Build upon out of the box detectors with custom logic, instead of having to start from scratch with a regex or custom validation logic.
Motivations
Existing content inspection systems may yield a high degree of false positives (i.e. noise), leading to alert fatigue and significant time wasted on inaccurate alerts.
Conversely, existing solutions may be very limited in detection scope, leading to a high degree of false negatives (i.e. misses) and putting the business at risk when sensitive data is missed.
Benefits
Replace existing, brittle solutions with a highly accurate content inspection system.
Reduce engineering time spent analyzing false positives and attempting to tune them out.
Motivation
In training complex learning models, data scientists must compile and use large corpuses of data to improve the accuracy of the trained model. Unknowingly leveraging sensitive data in this effort can lead to violations of compliance regimes like HIPAA, GDPR, or PCI.
Models that focus on health, finance, or public sector applications are particularly at risk for ingesting sensitive data that may violate industry-specific compliance mandates.
Labeled data is often ingested from unregulated sources like customer communications, emails, public repos, and more. Inspecting all of these input sources manually is untenable.
Additionally, the data being leveraged may be in a variety of unstructured formats like screenshots, images, documents, plaintext, compressed folders or archives – to inspect this content requires high quality text extraction.
Benefits
Ensure the hygiene of the labeled data you are using to train your machine learning models
Diminish collection of sensitive data types that could result in regulatory fines or brand damage, if leaked or breached.
Healthcare: Detect PHI to ensure HIPAA compliance in your apps
Financial services: Secure PII and PCI like bank account numbers, payment card details, and social security numbers
E-commerce: Prevent costly data breaches of PII and PCI that can damage brand reputation
Education: Protect student and faculty privacy within applications
Customer support: Redact sensitive data in customer support systems, shielding agents from information they shouldn’t see
IT Operations: Search for API keys, credentials, and secrets across internal and external data silos
Product: Create custom solutions for data classification, DLP, content moderation and more within your applications
Compliance: Address PCI-DSS, HIPAA, FedRAMP, GDPR, CCPA, GLBA, FERPA, PHIPA, and more
People & Community: Content moderation to detect profanity, toxicity
Gaming: Detecting profanity, toxicity, or even personal or financial information being shared in community chat rooms
Welcome to the amazing world of the Nightfall Firewall for AI (formerly known as the Nightfall Developer Platform). Here you can find information about Nightfall's APIs and SDKs, along with usage examples for both.
Before you use the scan endpoint, there are a number of actions to do within the Nightfall dashboard to get your environment set up properly.
See Creating an API Key to learn how to create the authentication token necessary for making API calls.
See Creating Detectors to learn how to define your own custom logic for detecting sensitive data.
See Creating Detection Rules to learn how to aggregate Detectors for use in the scan endpoint.
See Creating Policies to learn how to set up common workflows that combine your Detection Rules with remediation actions such as alerting.
Nightfall has the ability to send alerts when a violation is detected.
Policies for alerting may be configured through the Nightfall app user interface, or they may be set up programmatically. Policies that are configured under Developer Platform > Overview > Policies may be used in the API by referencing their Policy UUID.
The way that an alert notification presents itself depends on the platform in question.
For example, notifications sent to Slack will appear as formatted messages sent by the Nightfall Alerts Bot. Other destinations, such as email, SIEM URLs, and webhooks, will present the information as JSON objects.
In the case of webhooks, detailed information about the finding will be sent. For other destinations, sensitive information is redacted.
In order to use asynchronous notifications with Slack, you must install the Nightfall Alerts plugin from the Slack Marketplace.
See our end user documentation on installing the Nightfall Alerts app for more details.
Once you have authenticated Nightfall to your Slack workspace, you can provide any public channel name (e.g. #general) as part of a request to the Nightfall API.
To send notifications to a private channel, a member of the channel should invite the Nightfall bot to the specific private channel and allow channel access to the bot.
Follow the steps below to invite Nightfall Alerts bot to a private channel:
Go to the Slack channel in question
Type /invite @Nightfall Alerts as a message
Press 'Enter' (you should see a message that Nightfall Alerts has now joined the channel)
If any findings are detected as part of that request, then the Nightfall Alerts bot will send a message to the channel you configured. Conversely, if there are no findings in the request payload, then Nightfall will not send an alert message.
Documentation TBD
Email is unauthenticated, so you can get started using Nightfall to send email alerts without any initial setup work.
Nightfall will send an email to the provided address only if findings were detected as part of the request. The findings themselves will be attached in a JSON file.
You may send your alerts to a designated URL, such as an endpoint hosted by SIEM software for log collection.
In addition to the url, you may provide headers, either for security or logging purposes.
You may use a webhook server to programmatically handle a finding, allowing you to create your own custom workflows with your own or 3rd party systems.
Nightfall will always send an alert to the client's webhook server if it is provided as part of an API request, even if the scan request yielded no findings.
The request body sent by Nightfall is JSON, and uses the schemas in the section documented below.
Since file scans can produce a large number of results, findings are not transmitted directly in the notification that Nightfall sends. The notification object includes the following fields:
The requestMetadata field contains arbitrary contents provided by the client at request time, and can be used by the client to correlate this response to the original request.
The value of the findingsURL field is a pre-signed URL, which means anyone with the link can download the file. Therefore, this URL itself should be treated as sensitive and must not be leaked. The object stored at this URL is a JSON file containing a single key, findings, containing a list of all data detected from the request. The schema for the finding object inside the list is shared between the text-based and file-based API endpoints.
You can define Detection Rules “inline” in the body of each request to the scan endpoint. See the example in the walk through of the scan endpoint Creating an Inline Detection Rule.
You can also use the Nightfall Dashboard to predefine your Detection Rules. Once you have created a Detection Rule, you will receive a UUID, which you can pass in as part of your API request payloads.
You may add up to 50 detectors to your detection rule.
To create a Detection Rule in the Nightfall UI, select "Detection Rules" from the left-hand navigation.
Click the + New Detection Rule button in the upper right hand corner.
First, enter a name for your Detection Rule as well as an optional description.
Then click the + Detectors button to add Detectors to your Detection Rule.
In this example we have selected the US drivers license and Canada Government ID detectors.
Click the Add button in the lower right hand corner at the end of the detector list when you are done adding detectors.
Now that your Detectors are set, choose a minimum confidence level and a minimum # of findings for each detector.
If these minimums for a Detector are not met, the Detection Rule will not be triggered.
Save your Detection Rule in the lower left hand corner once you are done.
Once the Detection Rule is saved, it is available for use in requests to the Nightfall API to scan your data for sensitive information. Pass in the UUID of the Detection Rule as the detectionRuleUUIDs field of your requests to the scan endpoints.
The UUID may be obtained by clicking the "copy" icon, the leftmost icon in the set of icons that appears next to the Detection Rule's name when your cursor hovers over a Detection Rule in the list of Detection Rules.
See Using Pre-Configured Detection Rules for an example of using a Detection Rule UUID.
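As a sketch, a scan request that references a saved Detection Rule might look like the following; the UUID is a placeholder, and the field names should be confirmed against the API reference.

```python
import os
import requests

body = {
    # Placeholder UUID; use the value copied from the Detection Rules page.
    "policy": {"detectionRuleUUIDs": ["00000000-0000-0000-0000-000000000000"]},
    "payload": ["My SSN is 458-02-6124"],
}

resp = requests.post(
    "https://api.nightfall.ai/v3/scan",
    headers={"Authorization": f"Bearer {os.environ['NIGHTFALL_API_KEY']}"},
    json=body,
)
print(resp.json())
```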
You can customize your Detection Rules by creating custom detectors in the Nightfall Dashboard.
To create a Detector, select "Detectors" from the left-hand navigation and click the + New Detector button.
Custom detectors can add context and exclusion rules on top of pre-built Nightfall detectors, or can be built off your own custom regular expressions.
Be aware that you may not have two detectors based on the same Nightfall data type within the same detection rule.
A full glossary of Nightfall's prebuilt detectors can be found in the Detector Glossary.
The API expects an API Key to be passed via the Authorization: Bearer <key> HTTP header.
To create and manage API keys:
Log in to Nightfall.
Click Overview under the Firewall for AI section.
Click Create key.
The Generate API Key window is displayed.
Enter a name for the API key and click Create.
The API key is generated and displayed (blurred in the following image). Click the copy button to copy the API key and store it in a secure location. Once you click the Got it button, you cannot retrieve the API key again.
🚧 Be Sure to Record the API Key's Value: For security reasons, after closing the window, you will not be able to recover the key's value.
Once you close the window, the My API Keys page will display your newly generated key, with the majority of the Key redacted.
You can return to the Overview page at any time to create new keys (assuming your license allows you to generate additional keys) or delete old keys.
This document applies only to Nightfall Firewall for AI customers. If you are a Nightfall SaaS application customer, refer to the documentation for the Nightfall SaaS application.
Policies allow customers to create templates for their most common workflows by unifying a set of Detection Rules with the actions to be taken when those rules are triggered, including:
automated actions such as redaction of findings
alerting through webhooks
Once defined, a Policy may be used in requests to the Nightfall API, such as calls to scan file uploads, though automated redactions are not available for uploaded files at this time.
To create a policy:
Log in to Nightfall.
Click Overview under the Firewall for AI section.
Click Create Policy.
The policy creation page is displayed as follows.
If you click the Policies button under the Setting Up section, you need to execute a couple of additional steps to reach the policy creation page, as displayed in the following image.
Enter a name for the policy.
(Optional) Enter a Description for the policy.
Click + Detection rule to add a Detection Rule to the policy.
Select the check box of each Detection Rule that you wish to add to the Policy.
Select the Redact Violations check box to mask sensitive information found in your transmitted data.
Select one of the available alerting methods.
Click Save Policy.
When you click + Application Webhook, the following window is displayed.
If you have custom headers you would like to add to requests sent to the Webhook URL, you can do this from the overlay that appears when you click the "+ Webhook" button on the policy creation and edit page. These headers may be used for authentication as well as for integrating with Security Information and Event Management (SIEM) systems or similar tools that aggregate content through HTTP event collection.
Click the "Add Header" button to add your custom headers.
Once your header key and value are entered, you may obfuscate the value by clicking the "lock" icon next to the value field for the header. Click the "Save" button to persist your changes to the headers.
When you have completed configuring your Webhook URL and Headers, click the "Save" button.
🚧 Limits On Webhook Headers: It is currently not possible to configure headers for webhooks programmatically when defining policies through the API.
After you click the "Save Policy" button, your policy should be immediately available for use. You can refer to the API Docs for the comprehensive list of endpoints that support policy UUIDs.
See our end user guide for more details.
See for more details.
The payload that is forwarded on behalf of text scanning requests is identical to the response body that is synchronously returned to the client. Refer to the API reference for more details on this payload.
Click + Application Webhook to add the URL of a webhook that needs to be notified. See Webhooks and Asynchronous Notifications to learn more.
The scan endpoint allows you to apply Policies and Detection Rules to a list of text strings provided as a payload.
You may use Pre-Configured Detection Rules or create Inline Detection Rules.
Text scanning supports the use of Exclusion Rules, Context Rules, and Redaction as well as other Scanning Features.
For scanning files, see Scanning Files.
Note that you must generate an API key to send requests to the Nightfall API.
As part of submitting a file scan request, the request payload must contain a reference to a webhook server URL defined as part of a policy, whether the policy is referenced by UUID or defined inline.
When Nightfall prepares a file scan operation, it will issue a challenge to the webhook server to verify its legitimacy.
After the file scan has been processed asynchronously, the results will be delivered to the webhook.
For a file scan, your webhook will receive a request body that will be a JSON payload containing:
the upload UUID (uploadID)
a boolean indicating whether or not any data in the file matched the provided detection rules (findingsPresent)
a pre-signed S3 URL where the caller may fetch the findings for the scan (findingsURL). If there are no findings in the file, this field will be empty.
the date until which the findingsURL is valid (validUntil), formatted per RFC 3339. Results are valid for 24 hours after scan completion. The time will be in UTC.
the value you supplied for requestMetadata. Callers may opt to use this to help identify their input file upon receiving a webhook response. Maximum length 10 KB.
Below is an example of a payload sent to the webhook URL.
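A sketch of such a payload is shown below with placeholder values; the authoritative shape is defined by the fields listed above.

```python
# Placeholder values; Nightfall delivers this notification as a JSON object.
example_notification = {
    "uploadID": "00000000-0000-0000-0000-000000000000",
    "findingsPresent": True,
    "findingsURL": "https://files.nightfall.ai/findings/...",  # pre-signed; treat as sensitive
    "validUntil": "2024-01-01T12:00:00Z",
    "requestMetadata": "zip-upload-localhost-check",
}
```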
If you follow the URL (before it expires) it will return a JSON representation of the findings similar to those returned by the Scan Plain Text endpoint.
In this example, we have uploaded a zip file with a Python script (upload.py) and a README.md file. A Detector in our Detection Rule checks for the presence of the string http://localhost.
File scans of Microsoft Office, Apache Parquet, CSV, and tab-separated files will provide additional properties to locate findings within the document, beyond the standard byteRange, codepointRange, and lineRange properties.
Findings will contain a columnRange and a rowRange that allow you to identify the specific row and column within the tabular data wherein the finding is present.
This functionality is applicable to the following mime types:
text/csv
text/tab-separated-values
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.ms-excel
Apache parquet data files are also accepted.
Below is a sample match of a spreadsheet containing dummy PII where a SSN was detected in the 2nd column and 55th row.
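A sketch of what such a finding can look like is shown below; the values and surrounding fields are illustrative, so refer to the findings schema in the API reference.

```python
# Illustrative finding for an SSN in the 2nd column, 55th row of a spreadsheet.
tabular_finding = {
    "finding": "458-02-6124",
    "detector": {"name": "US Social Security Number"},
    "confidence": "VERY_LIKELY",
    "location": {
        "rowRange": {"start": 55, "end": 55},
        "columnRange": {"start": 2, "end": 2},
    },
}
```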
Nightfall provides special handling for archives of GitHub repositories.
Nightfall will scan the repository history to discover findings in particular check-ins, returning the hash for each check-in.
In order to scan the repository, you will need to create a clone, i.e.
git clone https://github.com/nightfallai/nightfall-go-sdk.git
This creates a clone of the Nightfall go SDK.
You will then need to create an archive that can be uploaded using Nightfall's file scanning sequence.
zip -r directory.zip directory
Note that in order for this to work, the hidden directory .git must be included in the archive.
When you initiate the file upload sequence with this file, you will receive scan results that contain the commitHash property filled in.
Using the Nightfall Go SDK archive created above, a simple example would be to scan for URLs (i.e. strings starting with http:// or https://), which will send results such as the following:
Sensitive Data in GitHub Repositories
If a finding in a GitHub repository is considered sensitive, it should be treated as compromised, and appropriate mitigation steps should be taken (e.g. secrets should be rotated).
To retrieve the specific commit, you will need to clone the repository, i.e.
git clone https://github.com/nightfallai/nightfall-go-sdk.git
You can then checkout the specific commit using the commit hash returned by Nightfall.
Note that you are in a 'detached HEAD' state when working with this sort of check out of a repository.
The file scan API has first-class support for text extraction and scanning on all MIME types enumerated below.
Certain file types, such as tabular data and archives of Git repositories, receive special handling that results in more precise information about the location of findings within the source file.
application/json
application/x-ndjson
application/x-php
text/calendar
text/css
text/csv (treated as tabular data and may be redacted)
text/html
text/javascript
text/plain
text/tab-separated-values (treated as tabular data)
text/tsv (treated as tabular data)
text/x-php
application/pdf
application/vnd.openxmlformats-officedocument.presentationml.presentation
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (treated as tabular data)
application/vnd.openxmlformats-officedocument.wordprocessingml.document
application/vnd.ms-excel (treated as tabular data)
application/bzip2
application/ear
application/gzip
application/jar
application/java-archive
application/tar+gzip
application/vnd.android.package-archive
application/war
application/x-bzip2
application/x-gzip
application/x-rar-compressed
application/x-tar
application/x-webarchive
application/x-zip-compressed
application/x-zip
application/zip
image/apng
image/avif
image/gif
image/jpeg
image/jpg
image/png
image/svg+xml
image/tiff
image/webp
The file scan API explicitly rejects requests with MIME types that are not conducive to extracting or scanning text. Sample rejected MIME types include:
application/photoshop
audio/midi
audio/wav
video/mp4
video/quicktime
File scans of Microsoft Office, Apache Parquet, CSV, and tab-separated files will provide additional properties to locate findings within the document, beyond the standard byteRange, codepointRange, and lineRange properties.
Findings will contain a columnRange and a rowRange that allow you to identify the specific row and column within the tabular data wherein the finding is present.
This functionality is applicable to the following mime types:
text/csv
text/tab-separated-values
text/tsv
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.ms-excel
Apache parquet data files are also accepted.
Below is a sample match of a spreadsheet containing dummy PII where a SSN was detected in the 2nd column and 55th row.
Findings within csv files may be redacted.
To enable redaction in files, set the enableFileRedaction flag of your policy to true.
The csv file will be redacted based on the configuration of the defaultRedactionConfig of the policy.
Below is an example curl request for a csv file that has already been uploaded.
When results are sent to the location specified in the alertConfig (in this case, an email address), a redactedFile property will be set with a fileURL in addition to the findingsURL.
Below is an example of a redacted csv file.
Nightfall provides special handling for archives of Git repositories.
Nightfall will scan the repository history to discover findings in particular check-ins, returning the hash for each check-in.
In order to scan the repository, you will need to create a clone, i.e.
git clone https://github.com/nightfallai/nightfall-go-sdk.git
This creates a clone of the Nightfall go SDK.
You will then need to create an archive that can be uploaded using Nightfall's file scanning sequence.
zip -r directory.zip directory
Note that in order for this to work, the hidden directory .git must be included in the archive.
When you initiate the file upload sequence with this file, you will receive scan results that contain the commitHash property filled in.
Using the Nightfall Go SDK archive created above, a simple example would be to scan for URLs (i.e. strings starting with http:// or https://), which will send results such as the following:
Large repositories result in a large volume of data sent at once. We are working on changes to allow these and other large surges of data to be processed in a more controlled manner, and will increase the limit or remove it altogether once those changes are complete.
To retrieve the specific commit, you will need to clone the repository, i.e.
git clone https://github.com/nightfallai/nightfall-go-sdk.git
You can then checkout the specific commit using the commit hash returned by Nightfall.
Note that you are in a 'detached HEAD' state when working with this sort of checkout of a repository.
Nightfall's upload process is built to accommodate files of any size. Once files are uploaded, they may be scanned with Detection Rules and Policies to detect potential violations.
Many users will find it more convenient to use our native language SDKs to complete the upload process.
Uploading files using the client SDK libraries requires fewer steps, as all the required API operations are wrapped in a single function call. Furthermore, these SDKs handle all the programmatic logic necessary to send files in smaller chunks to Nightfall.
For users that are looking to understand the entire upload process end-to-end, that is also outlined in this document. We will walk you through the order of operations necessary to upload the file.
Rather than implementing the full sequence of API calls for the upload functionality yourself, Nightfall's native language SDKs provide a single method that wraps the steps required to upload your file.
Below is an example of uploading a file from our Python SDK and our Node SDK.
To run the Node sample script you must compile it as TypeScript. Save it as a .ts file and run
tsc <yourfilename>.ts --lib ES2015,DOM
You can then run the resulting JavaScript file:
NIGHTFALL_API_KEY=<YourApiKey> node yourscriptname.js
Note that these examples use an email address to receive the results for simplicity.
You may also want to use a webhook. See Webhooks and Asynchronous Notifications for additional information on how to set up Webhook server to receive these results.
The upload process consists of three stages: initiating the upload, uploading the file contents in chunks, and marking the upload as complete.
Once the upload is complete, you may initiate the file scan.
After we discuss each API call in the sequence, you will find a script that walks through the full sequence at the end of this guide.
POST /v3/upload
The first step in the process of scanning a binary file is to initiate an upload in order to get a fileId through the Initiate a File Upload endpoint.
As part of the initialization you must provide the total byte size of the file being uploaded.
You may also provide the mime-type, otherwise the system will attempt to determine it once the upload is complete.
The id of the returned JSON object will be used as the fileId in subsequent requests.
The chunkSize is the maximum number of bytes to upload during the uploading phase.
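A sketch of this initialization step using Python's requests library is shown below; the fileSizeBytes field name and file path are assumptions, so check the Initiate a File Upload reference for the exact request body.

```python
import os
import requests

API_KEY = os.environ["NIGHTFALL_API_KEY"]
FILE_PATH = "data.csv"  # illustrative file

resp = requests.post(
    "https://api.nightfall.ai/v3/upload",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"fileSizeBytes": os.path.getsize(FILE_PATH)},  # total size of the file
)
resp.raise_for_status()
upload = resp.json()
file_id, chunk_size = upload["id"], upload["chunkSize"]
```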
Use the Upload a Chunk of a File endpoint to upload the file contents in chunks.
The size of these chunks is determined by the chunkSize value returned by the POST /v3/upload endpoint used in the previous step.
Below is a simple example where the file is smaller than the chunkSize, so it may safely be uploaded with one call to the upload endpoint.
If your file's size exceeds the chunkSize, you will need to send iterative requests as you read portions of the file's contents in order to upload the complete file. This means you will send multiple requests to the upload endpoint as shown above. As you do so, you will update the value of the X-Upload-Offset header based on the portion of the file being sent.
Each request should send a chunk of the file exactly chunkSize bytes long, except for the final uploaded chunk. The final chunk is allowed to contain fewer bytes, as the remainder of the file may be less than the chunkSize returned by the initialization step.
The request body should be the contents of the chunk being uploaded.
The value of the X-Upload-Offset header should be the byte offset specifying where to insert the data into the file, as an integer. This byte offset is zero-indexed.
Successful calls to this endpoint return an empty response with an HTTP status code of 204.
See the full example script below for an illustration as to how this upload process can be done programmatically.
POST /v3/upload/<uploadUUID>/finish
Once all chunks are uploaded, mark the upload as completed using the Complete a File Upload endpoint.
When an upload completes successfully, the returned payload will indicate the mimeType that the system determined the file to be, if one was not provided during upload initialization.
Once a file has been marked as completed, you may initiate a scan of the uploaded file.
After an upload is finalized, it can be scanned against a Detection Policy. A Detection Policy represents a pairing of:
a webhook URL
a set of detection rules to scan data against
The scanning process is asynchronous, with results being delivered to the webhook URL configured on the detection policy. See Webhooks and Asynchronous Notifications for more information about creating a Webhook server.
Exactly one policy should be provided in the request body. The policy includes a webhookURL to which the callback will be made once the file scan has been completed (this must be an HTTPS URL), as well as a Detection Rule supplied either as a list of UUIDs or as a rule that has been defined inline.
You may also supply a value in the requestMetadata field to help identify the input file upon receiving a response to your webhook. This field has a maximum length of 10 KB.
Below is a sample Python script that handles the complete sequence of API calls to upload a file using a path specified as an argument.
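A sketch of such a script is shown below. The endpoint paths and field names mirror the steps above but should be verified against the API reference, and the webhook URL and Detection Rule UUID are placeholders.

```python
import os
import sys
import requests

API_KEY = os.environ["NIGHTFALL_API_KEY"]
BASE = "https://api.nightfall.ai/v3"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def upload_and_scan(path: str) -> None:
    size = os.path.getsize(path)

    # 1. Initiate the upload to obtain a fileId and chunkSize.
    upload = requests.post(
        f"{BASE}/upload", headers=HEADERS, json={"fileSizeBytes": size}
    ).json()
    file_id, chunk_size = upload["id"], upload["chunkSize"]

    # 2. Upload the file contents chunkSize bytes at a time.
    offset = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            requests.patch(  # assumed method for the "Upload a Chunk of a File" endpoint
                f"{BASE}/upload/{file_id}",
                headers={**HEADERS, "X-Upload-Offset": str(offset)},
                data=chunk,
            ).raise_for_status()
            offset += len(chunk)

    # 3. Mark the upload as complete.
    requests.post(f"{BASE}/upload/{file_id}/finish", headers=HEADERS).raise_for_status()

    # 4. Initiate the asynchronous scan; results are delivered to the webhook URL.
    scan = requests.post(
        f"{BASE}/upload/{file_id}/scan",  # assumed path for the scan trigger
        headers=HEADERS,
        json={
            "policy": {
                "webhookURL": "https://example.com/nightfall-webhook",  # placeholder
                "detectionRuleUUIDs": ["00000000-0000-0000-0000-000000000000"],  # placeholder
            },
            "requestMetadata": path,  # echoed back to help correlate results
        },
    )
    print(scan.json())


if __name__ == "__main__":
    upload_and_scan(sys.argv[1])
```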
Nightfall’s file scan API allows a user to upload a file in chunks, then to scan it with Detection Rules once the upload is complete.
The scan will then be processed asynchronously before sending the results to the webhook URL that is provided along with your Detection Rules.
The following sequence diagram illustrates the full process for scanning a binary file with Nightfall.
For a detailed walkthrough of the API calls necessary to upload and scan a file and full script that shows the entire process, see Uploading and Scanning Files.
In order to utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer <key>
— see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (detailed information to follow)
File scanning also supports Nightfall's functionality for Using Exclusion Rules and Using Context Rules as part of your scan requests.
Nightfall offers many useful features beyond its detectors, including:
The ability to use Exclusion Rules and Context Rules to narrow the scope of matches.
The ability to redact findings in a highly configurable way so that sensitive data is appropriately obfuscated.
The ability to create Policies that determine how leaks of sensitive information should be mitigated (i.e. through alerts sent to email or Slack).
In order to accept requests from Nightfall, a Webhook server must use a signing key to verify requests.
To access or generate your Webhook signing key, start by logging in to the Nightfall Dashboard.
Select the Developer Platform > Manage API Keys using the navigation bar on the left side of the page. You will see the Webhook signing section:
Unlike the API Key, it is possible to reveal the signing key via the "eye" icon, the furthest to the left of the three icons displayed.
You may copy the current value to your clipboard with the "copy" icon in the center of the three icons displayed.
You may also regenerate the key with the circular arrow icon furthest to the right.
Use this value as shown in the code examples that are used in the following sections.
The Nightfall API supports the ability to send asynchronous notifications when findings are detected as part of a scan request.
The supported destinations for these notifications include external platforms, such as Slack, email, or url to a SIEM log collector as well as to a webhook server.
Nightfall issues notifications under the following scenarios:
to notify a client about the results of a file scan. File scans themselves are always performed asynchronously because of complexity relating to text extraction and data volume.
to notify a client about results from a text scan request. Although results are already delivered synchronously in the response object, clients may configure the request to forward results to other platforms such as a webhook, SIEM endpoint, or email through an alert configuration.
To create a webhook, you will need to obtain your webhook signing key and then set up a webhook server.
For more information on how webhooks and asynchronous notifications are used please see our guides on:
Learn how to set up a server to handle results of file scans and alerts sent based on policy alert configurations.
When Nightfall sends a message to a user-provided webhook address, it will first send a POST request with a JSON payload containing a single field, challenge, made up of randomly generated bytes. This is to ensure that the caller owns the server.
In order to authenticate your webhook server to Nightfall, you must reply with (1) a 200 HTTP status code, and (2) a plaintext response body containing only the value of the challenge key.
If Nightfall receives the expected value back, then the file scan operation will proceed; otherwise it will be aborted.
When a server responds successfully to a challenge request, the validity of that URL will be cached for up to 24 hours, after which it will need to be validated again.
If the webhook cannot be reached, you will receive an error with the code "40012" and the description "Webhook URL validation failed" when you initiate the scan.
If the webhook challenge fails, you will receive an error with the code "42201" and the description "Webhook returned incorrect challenge response" when you initiate the scan.
When a customer signs up for the developer platform, Nightfall automatically generates a unique webhook signing secret for them.
This secret is used to sign requests to the customer's configured webhook URL.
If you have any concerns that your signing secret may have leaked, you can request rotation at any time by reaching out to Nightfall Customer Success.
For security purposes, each webhook request includes a signature header containing an HMAC-SHA256 digital signature that customers may use to verify the authenticity of the request.
In order to authenticate requests to the webhook URL, customers may use the following algorithm:
Check for the presence of the headers X-Nightfall-Signature and X-Nightfall-Timestamp. If these headers are not both present, discard the request.
Read the entire request body into a string body.
Verify that the value in the X-Nightfall-Timestamp header (the POSIX time in seconds) occurred recently. This is to protect against replay attacks, so a threshold on the order of minutes should be reasonable. If a request occurred too far in the past, it should be discarded.
Concatenate the timestamp and body with a colon delimiter, i.e. timestamp:body.
Compute the HMAC-SHA256 hash of the payload from the previous step, using your unique signing secret as the key. Encode this computed value in hex.
Compare the value of the X-Nightfall-Signature header to the value computed in the previous step. If the values match, authentication is successful and processing should proceed. Otherwise, the request must be discarded.
The snippet below shows how you might implement this authentication validation in Python:
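A sketch of that validation is shown below; the function name and the five-minute replay threshold are illustrative choices.

```python
import hashlib
import hmac
import time


def validate_nightfall_request(headers, body: str, signing_secret: str,
                               max_age_seconds: int = 300) -> bool:
    signature = headers.get("X-Nightfall-Signature")
    timestamp = headers.get("X-Nightfall-Timestamp")
    if not signature or not timestamp:
        return False  # both headers must be present

    # Reject stale requests to guard against replay attacks.
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    if abs(time.time() - ts) > max_age_seconds:
        return False

    # HMAC-SHA256 over "timestamp:body" using the signing secret, hex-encoded.
    expected = hmac.new(
        signing_secret.encode("utf-8"),
        f"{timestamp}:{body}".encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, signature)
```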
An example implementation of a simple webhook server is below.
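The sketch below uses Flask and reuses the validate_nightfall_request helper from the previous snippet; the route path and environment variable name are illustrative.

```python
import os

from flask import Flask, request

app = Flask(__name__)
SIGNING_SECRET = os.environ["NIGHTFALL_SIGNING_SECRET"]  # illustrative variable name


@app.route("/nightfall-webhook", methods=["POST"])
def nightfall_webhook():
    payload = request.get_json(silent=True) or {}

    # Answer Nightfall's challenge request with the challenge value as plaintext.
    if "challenge" in payload:
        return payload["challenge"], 200

    # Authenticate all other requests using the signing secret (helper defined above).
    if not validate_nightfall_request(request.headers,
                                      request.get_data(as_text=True),
                                      SIGNING_SECRET):
        return "invalid signature", 401

    # Handle the notification, e.g. fetch findings from findingsURL if present.
    if payload.get("findingsPresent"):
        print("Findings available at:", payload.get("findingsURL"))
    return "", 200


if __name__ == "__main__":
    app.run(port=8075)
```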
In the above example, the webhook server is running on port 8075. To route ngrok requests to this server, once you run the Python script (having installed the necessary dependencies such as Flask), you would run ngrok as follows:
./ngrok http 8075
Nightfall supports Detectors that will scan for file names, file types, and file fingerprints.
In addition to scanning the content of files, you may configure the Detectors to scan file names as well.
This is done through the “scope” attribute of a Detector.
The scope attribute allows you to scan either within file contents, the file name, or both the file contents and file name.
File extensions can be scanned for by creating a Regular Expression type custom Detector with a scope set to scan only file names ("File") or both the content and file name ("ContentAndFile"), as shown in the example request below.
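A sketch of such a Detector definition is shown below; the scope values come from the description above, while the exact placement of the attribute and the example pattern are assumptions to be checked against the API reference.

```python
# Illustrative file-name detector; "scope" placement and values follow the
# description above but should be verified against the API reference.
file_name_detector = {
    "detectorType": "REGEX",
    "regex": {"pattern": r".*\.xlsx?$", "isCaseSensitive": False},
    "displayName": "Spreadsheet file name",
    "scope": "File",  # use "ContentAndFile" to scan both content and file name
    "minNumFindings": 1,
    # minConfidence is omitted: confidence does not apply to file name matches.
}
```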
Note that confidence sensitivity does not apply to file names. Sensitive findings will always be reported on.
Nightfall’s File Type detection allows you to implement compliance policies that detect and alert you when particular file types that are not allowed in a given location are discovered.
This functionality is implemented by creating a specific Detector called a "File Type Detector".
To create a File Type Detector, select “Detectors” from the left hand navigation and click the button labeled “+New Detector” in the upper right hand corner. From there a drop down list of Detector types will be displayed which will include the “File Type” Detector type.
You can either scroll through the list of mime-types in the select box or you may type in a portion of the mime-type and the contents of the select box will be filtered to match your input.
File Type Detectors vary from other Nightfall Detectors in that the scope and confidence attributes are not relevant to them.
Nightfall allows you to discover the location of specific files that you have deemed sensitive and want to avoid sharing.
This discovery is done through document fingerprinting. Fingerprinting is the process of algorithmically creating a unique identifier for a file by mapping the data of the document to a signature that can be recalled quickly. This allows the file to be identified in a manner akin to how human fingerprints uniquely identify individual people.
This functionality is achieved in Nightfall by creating a specific Detector type called a File Fingerprint Detector.
The Fingerprint Detector allows you to create a fingerprint for one or more files (a sort of "handful" of fingerprints, if you will).
To create a Fingerprint Detector, select “Detectors” from the left hand navigation and click the button labeled “+New Detector” in the upper right hand corner. From there a drop down list of Detector types will be displayed which will include the “Fingerprint” Detector type.
When you create a File Fingerprint Detector you can upload up to 50 files that need to be fingerprinted. The file size limit is 25MB.
Once the fingerprint is generated, the actual content of the file is discarded so no sensitive content is stored on Nightfall’s system.
These Detectors may only be created through the console.
In this example, we'll walk through making a request to the scan endpoint.
The endpoint inspects the data you provide via the request body and reports any detected occurrences of the sensitive data types you are searching for.
Please refer to the API reference of the scan endpoint for more detailed information on the request and response schemas.
In this sample request, we provide two main fields:
a policy and its detection rules that we want to use when scanning the text payload
a list of text strings to scan
In the example below, we will use a Detection Rule that has been configured in the Nightfall app by supplying its UUID.
The aggregate length of all strings in the payload list must not exceed 500 KB, and the number of items in the payload may not exceed 50,000.
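A minimal sketch of an equivalent request in Python (using the requests library) is shown below; the detectionRuleUUIDs field name, the placeholder UUID, and the payload strings are assumptions to be replaced with your own values.

```python
import os
import requests

response = requests.post(
    "https://api.nightfall.ai/v3/scan/plaintext",
    headers={"Authorization": f"Bearer {os.environ['NIGHTFALL_API_KEY']}"},
    json={
        "policy": {
            # Reference a Detection Rule configured in the Nightfall app by its UUID.
            "detectionRuleUUIDs": ["00000000-0000-0000-0000-000000000000"],
        },
        "payload": [
            "my credit card number is 4242-4242-4242-4242",
            "nothing sensitive in this string",
            "my SSN is 123-45-6789 and my card is 4242-4242-4242-4242",
        ],
    },
)
print(response.json())
```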
Executing the request will yield a response as follows.
The API call returns a list, where the item at each index is a sublist of matches for the provided detector types.
The indices of the response list correspond directly to the indices of the list provided in the request payload.
In this example, the first item in the response list contains a finding because one credit card number was detected in the first string we provided. The second item in the response list is an empty list because there is no sensitive data in the second input string we provided. The third item in the returned list contains multiple findings as a result of multiple Detectors within the Detection Rule being triggered.
You can test your webhook with a tool such as ngrok, which allows you to expose a web server running on your local machine to the internet.
See the section on webhooks for details about the JSON payloads for the different messages sent to webhook servers.
In addition to scanning based on file name, you may also use a File Type Detector, which allows you to scan for files based on their mime-type.
You will then select one or more file types for which to scan by selecting from a list of mime-types.
Nightfall supports detection for a wide variety of mime-types. See the Internet Assigned Numbers Authority's (IANA) website for a definitive list of mime-types. Note, however, that Nightfall does not support the detection of audio and video related mime-types.
Detection of file types is done based on the file contents, not its extension. However, you can also match on file names or extensions by setting the scope attribute.
Once you have added all the mime-types you wish to scan for, save your new Detector. You may then add your new Detector to Detection Rules and Policies.
You may then treat the Fingerprint Detector like any other and incorporate it into a Detection Rule using its unique Detector identifier.
You may incorporate these Detectors into Policies that will alert you whenever files that match the fingerprint are detected.
Alternatively, you may define your policy in code by using a built-in Nightfall Detector from the Detector Glossary as follows:
See the sections on Detection Rules and Policies for more information about how they may be defined through code.
You can read further about the fields in the response object in the API reference.
An Exclusion Rule allows you to refine a Detector to make sure false positives are not surfaced by Nightfall.
For instance, you may want to detect whether credit card numbers are being shared inappropriately in your organization. However, there may be cases where members of your QA team are sharing test credit card numbers, which should not be considered a violation and should be ignored by Nightfall.
In the following example, we define a Detector with a regular expression to match credit cards.
We then add an exclusion for some known test credit cards.
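A sketch of what such a Detector might look like follows; the exclusionRules field names and the test card values are illustrative assumptions.

```python
detector = {
    "detectorType": "REGEX",
    "displayName": "Credit card regex",
    "regex": {"pattern": "\\d{4}-\\d{4}-\\d{4}-\\d{4}", "isCaseSensitive": False},
    "minConfidence": "POSSIBLE",
    "minNumFindings": 1,
    # Ignore known test card numbers so they are not reported as findings.
    "exclusionRules": [
        {
            "matchType": "FULL",
            "exclusionType": "WORD_LIST",
            "wordList": {
                "values": ["4242-4242-4242-4242", "4000-0000-0000-0002"],
                "isCaseSensitive": False,
            },
        }
    ],
}
```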
As the resulting payload shows, only the third provided credit card number matches, because the first two items in the payload are included in our exclusion rule's word list.
You can use the surrounding context of a match to help determine how likely it is that a potential match is a true match by adjusting its confidence rating.
You can also tell the Detection Rule to return a portion of the surrounding context for manual review.
In the following example, in addition to providing a regular expression to match Social Security Numbers, we also look to see if someone has written the text "SSN" before or after the match, which might be a label indicating it is indeed a Social Security number; in that case, we raise our confidence score to "VERY_LIKELY". We then provide two possible matches in our payload, the first of which contains the string "SSN".
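A sketch of what this Detector definition might look like follows; the contextRules field names and the window sizes are illustrative assumptions.

```python
detector = {
    "detectorType": "REGEX",
    "displayName": "SSN regex with context",
    "regex": {"pattern": "\\d{3}-\\d{2}-\\d{4}", "isCaseSensitive": False},
    "minConfidence": "POSSIBLE",
    "minNumFindings": 1,
    # If the text "SSN" appears near a match, raise its confidence to VERY_LIKELY.
    "contextRules": [
        {
            "regex": {"pattern": "SSN", "isCaseSensitive": False},
            "proximity": {"windowBefore": 20, "windowAfter": 20},
            "confidenceAdjustment": {"fixedConfidence": "VERY_LIKELY"},
        }
    ],
}
```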
In the results, you can see the confidence for the first finding in the payload has been set to VERY_LIKELY while the second item is only LIKELY.
In addition to using pre-defined Detection Rules, you may define Detection Rules within the body of your scan method by either supplying:
the identifier of one of Nightfall's native detectors
the UUID of a Detector defined through the UI
a Regular Expression
a Word List.
Out of the box, Nightfall comes with an extensive library of native detectors.
In the example below, two of Nightfall's native Detectors (detectorType = "NIGHTFALL_DETECTOR") are being used:
US_SOCIAL_SECURITY_NUMBER
CREDIT_CARD_NUMBER.
When defining a Detection Rule inline, you configure the minimum confidence level (minConfidence) and the minimum number of times the match must be found (minNumFindings) for the rule to be triggered.
In the payload body, you can see that we are submitting a list of three different strings to scan (payload
). The first will trigger the U.S. Social Security Detector. The last will trigger the credit card Detector. The middle example will trigger neither.
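A sketch of such a request body is shown below, assuming the plaintext scan endpoint; the payload strings and rule name are illustrative.

```python
import os
import requests

request_body = {
    "policy": {
        "detectionRules": [
            {
                "name": "PII rule",
                "logicalOp": "ANY",
                "detectors": [
                    {
                        "detectorType": "NIGHTFALL_DETECTOR",
                        "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER",
                        "displayName": "SSN",
                        "minConfidence": "LIKELY",
                        "minNumFindings": 1,
                    },
                    {
                        "detectorType": "NIGHTFALL_DETECTOR",
                        "nightfallDetector": "CREDIT_CARD_NUMBER",
                        "displayName": "Credit card",
                        "minConfidence": "LIKELY",
                        "minNumFindings": 1,
                        # Optional: mask all but the last four digits of any credit card finding.
                        "redactionConfig": {
                            "maskConfig": {"maskingChar": "*", "numCharsToLeaveUnmasked": 4}
                        },
                    },
                ],
            }
        ]
    },
    "payload": [
        "my social security number is 123-45-6789",
        "nothing sensitive in this string",
        "my credit card number is 4242-4242-4242-4242",
    ],
}

response = requests.post(
    "https://api.nightfall.ai/v3/scan/plaintext",
    headers={"Authorization": f"Bearer {os.environ['NIGHTFALL_API_KEY']}"},
    json=request_body,
)
print(response.json())
```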
For more information on the parameters related to redaction, see Using Redaction.
Below is the response payload to the previous request.
The following example shows a Detection Rule composed of two Detectors defined using regular expressions – one for the format of an International Standard Recording Code (ISRC) and one for the format of an International Standard Musical Work Code (ISWC) – matching either of which will trigger the Detection Rule (by using the logicalOp “Any”).
We will provide a payload of two strings, one of which will match the ISRC and one of which will match the ISWC.
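A sketch of what this Detection Rule might look like follows; the regular expression patterns are rough approximations of the ISRC and ISWC formats and are illustrative only.

```python
detection_rule = {
    "name": "Music industry codes",
    "logicalOp": "ANY",  # either detector triggers the rule
    "detectors": [
        {
            "detectorType": "REGEX",
            "displayName": "ISRC",
            "regex": {"pattern": "[A-Z]{2}-?[A-Z0-9]{3}-?\\d{2}-?\\d{5}", "isCaseSensitive": False},
            "minConfidence": "POSSIBLE",
            "minNumFindings": 1,
        },
        {
            "detectorType": "REGEX",
            "displayName": "ISWC",
            "regex": {"pattern": "T-?\\d{9}-?\\d", "isCaseSensitive": False},
            "minConfidence": "POSSIBLE",
            "minNumFindings": 1,
        },
    ],
}
```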
The returned response demonstrates how findings are returned, with a finding per payload entry and the Detection Rule and Detector that matched the payload, if any.
The byte range that triggered the match is also provided. In the case of the 2nd item in the payload, since the match occurred at the beginning of the string, it has a location where the byteRange start is 0. In the case of the 3rd payload entry the location offset is 31.
The following example shows how a word list may be used instead of a regular expression.
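A sketch of a word list Detector is shown below; the word list values are illustrative.

```python
detector = {
    "detectorType": "WORD_LIST",
    "displayName": "Project code names",
    "wordList": {
        "values": ["whiskey", "tango", "foxtrot"],
        "isCaseSensitive": False,
    },
    "minConfidence": "LIKELY",  # word list matches default to LIKELY
    "minNumFindings": 1,
}
```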
Below is the resulting payload with the findings detected in our different payload strings.
Note that because the isCaseSensitive flag is set to "false" for the detector, the first string in our payload matches a word from our word list.
Also note that the confidence level for a word list match defaults to "LIKELY", so you should not set a minConfidence level higher than that if you want matches to be returned.
The Nightfall API is capable of returning a redacted version of your scanned text when a Detector is triggered.
This functionality allows you to hide potentially sensitive information while retaining the original context in which that information appeared.
In order to redact content, when you call the scan endpoint you must provide a RedactionConfig as part of the definition of your Detection Rule.
You may specify one of the following different methods to redact content:
apply masking (e.g. asterisks)
substitute a custom phrase
substitute the name of the Detector triggered (referred to as "InfoType substitution")
use encryption
A RedactionConfig is defined per Detector in a Detection Rule, allowing you to specify a different redaction method for each type of Detector in the rule.
By default, the redaction feature will return both the sensitive finding and the redacted version of that finding. You may set the removeFinding
field to true
if you want only the redacted version of the finding returned in the response.
Specifying a MaskConfig as part of your RedactionConfig substitutes a character for each character in the matched text. By default the masking character is an asterisk (*
). You may specify an alternate character to use instead (maskingChar
).
You may also choose to only mask a portion of the original text by specifying a number of characters to leave unmasked (numCharsToLeaveUnmasked
). For instance, if you want to mask all but the last 4 digits of a credit card number, set this value to 4 so that the redacted finding would be rendered as ***************4242
.
In the case where you want to leave characters unmasked at the front of the string you may use the maskLeftToRight
flag. This flag determines if masking is applied left to right (*****/1984
) instead of right to left (01/01*****
). By default, this value is false
.
Below is an example of how a RedactionConfig would be configured to redact the text that triggers a DATE_OF_BIRTH Detector such that the text 01/11/1995 becomes ??/??/??95.
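A sketch of such a RedactionConfig is below; the charsToIgnore field is an assumption used here to keep the date separators visible.

```python
redaction_config = {
    "maskConfig": {
        "maskingChar": "?",
        "numCharsToLeaveUnmasked": 2,   # keep the trailing "95"
        "maskLeftToRight": False,
        "charsToIgnore": ["/"],         # assumed field: leave the date separators visible
    },
    "removeFinding": False,
}
```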
The SubstitutionConfig substitutes a sensitive finding with the value assigned to the property substitutionPhrase
.
If no value is assigned to substitutionPhrase
, the finding will be replaced with an empty string.
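A minimal sketch of a SubstitutionConfig follows; the substitution phrase shown is illustrative.

```python
redaction_config = {
    # Replace each sensitive finding with this phrase.
    "substitutionConfig": {"substitutionPhrase": "[REDACTED]"},
    "removeFinding": False,
}
```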
It is possible to replace a sensitive finding with the name of the NIGHTFALL_DETECTOR
that triggered it by using an InfoTypeSubstitutionConfig.
If you use the built in credit card Detector, the string 4242-4242-4242-4242
will be redacted to [CREDIT_CARD_NUMBER]
This config is only valid for Detectors with a detectorType of NIGHTFALL_DETECTOR.
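A sketch of a Detector using InfoType substitution might look like the following; the surrounding Detector fields mirror the earlier examples.

```python
detector = {
    "detectorType": "NIGHTFALL_DETECTOR",
    "nightfallDetector": "CREDIT_CARD_NUMBER",
    "displayName": "Credit card",
    "minConfidence": "LIKELY",
    "minNumFindings": 1,
    # Replace the finding with the detector name, e.g. [CREDIT_CARD_NUMBER].
    "redactionConfig": {"infoTypeSubstitutionConfig": {}},
}
```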
A CryptoConfig will encrypt a sensitive finding with a public key (provided as the publicKey
property of the config) using RSA encryption.
Note that you are responsible for passing public keys for encryption and handling any decryption of the response payload. Nightfall will not store your keys.
Below is an example of a CryptoConfig being used to redact an EMAIL_ADDRESS
detector.
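A sketch of such a configuration is below; the PEM placeholder stands in for your own RSA public key.

```python
public_key_pem = """-----BEGIN PUBLIC KEY-----
...your RSA public key...
-----END PUBLIC KEY-----"""

detector = {
    "detectorType": "NIGHTFALL_DETECTOR",
    "nightfallDetector": "EMAIL_ADDRESS",
    "displayName": "Email address",
    "minConfidence": "LIKELY",
    "minNumFindings": 1,
    # Findings are encrypted with this RSA public key; decryption happens on your side.
    "redactionConfig": {"cryptoConfig": {"publicKey": public_key_pem}},
}
```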
The results of applying redactions are returned in the response payload for requests made to the scan endpoint as both part of an array named redactedPayload
as well as additional properties of the finding
object.
The original input payload with redactions made inline are returned as a list of strings under the redactedPayload
property. Each item in the list of redacted payloads corresponds to the list of strings in the original input payload and, if a Detector was triggered, it will contain a redacted version of that corresponding string.
If an item in the input payload did not have any findings, the entry for that index will be an empty string ("").
The redactedPayload
property is omitted if no RedactionConfig was provided.
Additionally, the fields redactedFinding
and redactedLocation
are added to the finding
object when the redaction feature is invoked.
The redactedFinding
field contains the redacted version of only the text of the finding without its surrounding context. This is useful when you are masking a portion of the text that triggered a Detector.
The redactedLocation
property will be returned as part of the finding that corresponds to an item in the payload. This may be distinct from the location
property that is returned for a finding by default.
In the unlikely case where there are findings that overlap, Nightfall will default to replacing the text of the overlapping findings with [REDACTED BY NIGHTFALL]
.
The following example shows how the redaction functionality may be invoked, with a variety of different redaction methods applied to the different Detectors being used.
You can see in the response how the RedactionConfig associated with the various Detectors affects the different findings.
Note that because the 2nd item in the payload matches multiple Detectors, the redacted text in the redactedPayload property becomes [REDACTED BY NIGHTFALL].
Firewall for AI DLP APIs enable developers to write custom code to sanitize data anywhere: RAG data sets, analytics data stores, data pipelines, and unsupported SaaS applications.
Policies allow customers to create templates for their most common workflows such as sending alerts when detection rules are triggered.
These policies may be created manually through the dashboard or may be defined programmatically.
When defining a Policy inline, in addition to specifying the Detection Rules (either by referencing the UUID of an existing Detection Rule or defining a Detection Rule and its Detectors inline), you must define an alertConfig, which determines where findings are sent.
The alertConfig can be either:
an email address
a Slack channel
a webhook url
a URL to a SIEM host, as well as authentication and other headers
Below is a simple example of a payload that you would use with our Scan Plain Text endpoint, with a policy that will send alerts to an email address.
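A sketch of such a payload follows; the rule name, email address, and payload string are illustrative, and the email field name within alertConfig is an assumption.

```python
request_body = {
    "policy": {
        "name": "Email alert policy",
        "detectionRules": [
            {
                "name": "Credit cards",
                "logicalOp": "ANY",
                "detectors": [
                    {
                        "detectorType": "NIGHTFALL_DETECTOR",
                        "nightfallDetector": "CREDIT_CARD_NUMBER",
                        "displayName": "Credit card",
                        "minConfidence": "LIKELY",
                        "minNumFindings": 1,
                    }
                ],
            }
        ],
        # Send findings to this address whenever the detection rules are triggered.
        "alertConfig": {"email": {"address": "security-alerts@example.com"}},
    },
    "payload": ["my credit card number is 4242-4242-4242-4242"],
}
```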
You will receive the following response:
Note that you may also use a pre-defined policy defined under Developer Platform > Overview > Policies by copying the Policy UUID and sending a request as shown below.
The policy object supersedes the config object. config objects will continue to be supported, but their use should be considered deprecated. If you specify a policy object, you cannot also specify a config object.
Also note that previous iterations of the API allowed a simple list of policyUUIDs to be specified instead of a policy object. This has been preserved for backwards compatibility, but it is recommended that you use the policy object, as it has a richer set of features. You may not use both a policyUUIDs list and a policy object.
The following payload will be sent to the given email address with the subject "🚨 Findings Detected by Nightfall! 🚨" as an attachment with the name nightfall-findings.json
:
This attachment has the same content as the response payload to the initial request.
Note that the sender address will be no-reply@nightfall.ai. This address will not respond to messages sent to it.
Policies also allow you to send findings to a callback designated URL using the url
property of the alertConfig
object.
This mechanism allows you to programmatically consume findings. The data sent will contain sensitive information as well as additional metadata, like the location of the findings in the payload. For this reason, the URL must be an HTTPS URL, and the service backing it must be implemented to properly respond using your webhook signing key and act as a Webhook Server.
Below is what the webhook URL should look like in your policy's alertConfig in a payload sent to our endpoint used for scanning plain text.
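A sketch of how this might look follows; the URL and UUID are placeholders, and the address field name is an assumption.

```python
policy = {
    "detectionRuleUUIDs": ["00000000-0000-0000-0000-000000000000"],
    "alertConfig": {
        # Must be an HTTPS URL backed by a server that implements the
        # Nightfall webhook challenge and signature validation.
        "url": {"address": "https://example.com/nightfall/ingest"},
    },
}
```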
Another option supported by Policies is sending finding data to a designated Slack channel.
This feature requires that you have configured the Nightfall Slack integration.
Below is a sample payload for scanning plain text.
Below is an example as to how the violation will appear in Slack.
See the section on Slack in the overview on Alerting for more details.
SIEM (pronounced “sim”) is a combination of security information management (SIM) and security event management systems. SIEM technology collects event log data for analysis in order to provide visibility into network activity.
It is possible to send findings from a policy to a SIEM service such as LogRhythm, SumoLogic, or Splunk using the siem
alertConfig.
This configuration will require a URL to a collector that uses an HTTPS endpoint.
Note that the URL for the siem
alertConfig must:
use the HTTPS scheme
be able to accept requests made with the POST verb
respond with a 200 status code upon receipt of the event
See the documentation for your SIEM service for how to set up this URL.
Unlike the url
alertConfig option, the siem
alertConfig does not require that the endpoint for the service implement a custom challenge response. Events sent to the siem
alertConfig endpoint contain a subset of what is sent to the url
alertConfig. Furthermore the findings are sent in a redacted form similar to Slack or email alerts.
In addition to the URL, you may provide headers such as those that are used for authorization.
The headers in the SIEM alertConfig are divided into sensitiveHeaders
and plainTextHeaders
header mappings.
The sensitiveHeaders
field is specifically for header values like authentication. Nightfall ensures that these header values are always hidden in our service. They are never logged or saved in analytic events.
You can use plainTextHeaders for all other types of information you would like passed along with Nightfall alerts to your HTTP endpoint. Nightfall assumes that the values stored in plainTextHeaders do not contain any sensitive information, so we do not take any action to hide or protect these values.
Below is an example of a payload using a siem
alertConfig.
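A sketch of such a policy follows; the collector URL and header names and values are illustrative, and the address field name is an assumption.

```python
policy = {
    "detectionRuleUUIDs": ["00000000-0000-0000-0000-000000000000"],
    "alertConfig": {
        "siem": {
            "address": "https://collector.example.com/services/collector/event",
            # Values here are never logged or stored by Nightfall.
            "sensitiveHeaders": {"Authorization": "Splunk 00000000-0000-0000-0000-000000000000"},
            # Non-sensitive metadata to pass through with each alert.
            "plainTextHeaders": {"X-Environment": "production"},
        }
    },
}
```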
A policy may be configured with default redaction rules as a defaultRedactionConfig that will affect the redactedPayload field of the content sent to the alert locations specified in the policy alertConfig. Note that this redaction does not affect the findings themselves.
These redaction rules will be applied to Detection Rules that do not have a specified redaction configuration.
The redactionConfig
specified must be one and only one of the four available redaction types:
maskConfig
infoTypeSubstitutionConfig
substitutionConfig
cryptoConfig
For more information on Redactions see: Using Redaction
Below is a simple example of a payload for scanning plain text using a policy set up to use a defaultRedactionConfig
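A sketch of such a payload might look like the following; the UUID, email address, payload string, and mask settings are illustrative.

```python
request_body = {
    "policy": {
        "detectionRuleUUIDs": ["00000000-0000-0000-0000-000000000000"],
        "alertConfig": {"email": {"address": "security-alerts@example.com"}},
        # Applied to any Detection Rule that does not specify its own redaction config.
        "defaultRedactionConfig": {
            "maskConfig": {"maskingChar": "*", "numCharsToLeaveUnmasked": 4},
        },
        # Include up to 40 bytes of surrounding context with each finding.
        "contextBytes": 40,
    },
    "payload": ["my credit card number is 4242-4242-4242-4242"],
}
```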
In addition to a defaultRedactionConfig, it is possible to set the number of bytes to include before and after a given finding as the contextBytes. This context shows how the finding appears within the text, allowing human readers to better understand its meaning. The maximum value for contextBytes is 40.
Leaked secrets, such as credentials needed to authenticate and authorize a cloud provider’s API request, expose company software, services, infrastructure, and data to hackers.
Nightfall has developed technology to detect secrets and label findings, keeping SecOps workflows from being clogged and eliminating false positive alerts.
Nightfall uses machine learning models trained on a large (millions of lines of code) diverse dataset (including all programming languages and application types) to ensure best-in-class secret detection accuracy and coverage.
For a growing set of the most popular services, Nightfall will:
label detected secrets by vendor and service type (returned in the kind field of the response)
label detected secrets as active risks by validating supported credential types with their associated service endpoints (returned as the status
of the service)
Our current solution supports the following vendors covering a diverse set of use cases, including cloud storage/infrastructure, communication, social networks, software development, banking, observability, and payment processing.
This list is not static and will continue to grow as we add support for detecting API keys from additional services. If you want to detect API keys from a service not listed below, please contact us.
Below is an example of how an AWS Key would be shown in a finding.
The following values are returned for the status
field:
ACTIVE
EXPIRED
UNVERIFIED
This value will be based on what information is returned by the corresponding service when attempting to validate the key. If no data is returned from the service, it will be considered UNVERIFIED.
To use this functionality, you use our existing built-in API_KEY detector to scan a data source such as a Git repository. Below is an example using a detection rule defined inline for a text scan.
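A sketch of such an inline Detection Rule follows; the rule name and thresholds are illustrative.

```python
detection_rule = {
    "name": "Secrets",
    "logicalOp": "ANY",
    "detectors": [
        {
            "detectorType": "NIGHTFALL_DETECTOR",
            "nightfallDetector": "API_KEY",
            "displayName": "API keys",
            "minConfidence": "LIKELY",
            "minNumFindings": 1,
        }
    ],
}
```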
The Nightfall Developer Playground is a sample app that you may use to test out API functionality before writing any code.
Our playground environment allows you to:
Test Detectors and Detection Rules.
Generate sample data for DLP testing.
Explore a sample app built on our APIs
While using Nightfall's Scan API, you may encounter some of the common errors outlined below. Try following the provided troubleshooting steps.
If problems persist, please contact Nightfall support for further assistance.
The following error codes are returned as part of a standard HTTP response.
HTTP Error Code | Description | Troubleshooting |
---|---|---|
400 | Bad Request | This error most often occurs when there is something syntactically incorrect in the body of your request. Check your request format and try again. For example, this error could occur if the request body size is greater than 500 KB, or if the number of items to scan in the payload exceeds 50,000. |
401 | Unauthorized | You may be using an incorrect API key or calling the wrong endpoint. |
422 | Unprocessable Entity | You may be using an invalid or unrecognized detector set. You may also have exceeded the maximum allowable payload size; try spreading your payload across multiple requests. |
429 | Too Many Requests or Quota Exceeded | Either your monthly request limit has been exceeded, or you have exceeded the allowed rate limit. Consider upgrading to a higher volume plan, or wait several moments to retry the requests. |
500 | Internal Server Error | Wait a few moments and try again. If the problem persists, Nightfall may be experiencing an outage. |
Protected health information (PHI), also referred to as personal health information, describes a patient's medical history — including ailments, various treatments, and outcomes. PHI may include:
demographic information
test and laboratory results
mental health conditions
insurance information
The Health Insurance Portability and Accountability Act (HIPAA) of 1996 is the primary law that oversees the use of, access to, and disclosure of PHI in the United States. HIPAA lists 18 different personal information identifiers (PII) that, when paired with health information, become PHI. In order to more accurately detect potential PHI, Nightfall has introduced specific new detectors that allow for specialized combinations.
These HIPAA PII and PHI-specific detectors intelligently aggregate Nightfall's built-in detectors to ensure compliance with governing law. For example, finding a patient's name in a document or message is not considered HIPAA PII, as it does not uniquely identify an individual; many people can share the same name. However, the information would be considered HIPAA PII if the patient's name and address were in the same message.
Specific PHI and HIPAA PII can be detected with greater confidence, especially as they relate to specific medical codes or terms in association with specific logical combinations of other PII. For instance, a patient's name combined with a date of birth, a street address, or any of a set of particular PII (phone number, email, SSN, etc.) would be considered HIPAA PII.
If the combined detectors all match with a confidence of "Very Likely" it would match our "HIPAA PII Very Likely" Detection Rule. Otherwise if these detectors match with a confidence of "Likely" it would match our "HIPAA PII Likely" Detection Rule.
Alternatively, when any of the above PII options are found in conjunction with a specific set of medical-related codes or terms (ICD codes, FDA drug names or codes, procedures), that finding could be flagged as PHI.
When all the detectors within these PHI Detection Rules make findings with a confidence of "Very Likely," that would match our "PHI Very Likely" Detection Rule, while if some or all are matched with a confidence of "Likely," that would match our "PHI Likely" Detection Rule.
The following sample datasets can be used to test Nightfall's advanced AI-based detection capabilities.
This data has been fully de-identified and can be used to test any data loss prevention (DLP) platform.
Our PHI Detectors may be used just like any other Detectors.
AWS
Azure
Confluence
Confluent
Datadog
ElasticSearch
GCP
Google API
GitHub
GitLab
JIRA
JWT
Nightfall
Notion
Okta
Paypal
Plaid
Postmark
Postman
RapidAPI
Salesforce
Sendgrid
Slack
Snyk
Splunk
Square
Stripe
Twilio
Zapier
The native SaaS app APIs can be utilized by customers using Nightfall’s SaaS apps, supported natively, to fetch violations, search violations by app meta-data attributes, and fetch findings within violations. These DLP APIs do not provide access to violations for apps scanned via the developer platform. These APIs require you to create an API key as outlined in the Getting Started with the Developer Platform section. However, to use these APIs, you need not create any detectors, detection rules, and policies in the developer platform.
If you are using Nightfall SaaS apps, you can use APIs to fetch violations, search through the violations, and fetch specific findings within the Violations. To scan data in any custom apps or cloud infrastructure services like AWS S3, you must use the APIs in the DLP APIs - Firewall for AI Platform section.
To prevent misuse and ensure the stability of our platform, we enforce a rate limit on an API Key and endpoint basis, similar to the way many other APIs enforce rate limits.
When operating under our Free plan, accounts and their corresponding API Keys have a rate limit of 5 requests per second on average, with support for bursts of 15 requests per second. If you upgrade to a paid plan – the Enterprise plan – this rate increases to a limit of 10 requests per second on average and bursts of 50 requests per second.
Plan | Requests Per Second (Avg) | Burst |
---|---|---|
Free | 5 | 15 |
Enterprise | 10, more by request | 50, more by request |
The Nightfall API follows standard practices and conventions to signal when these rate limits have been exceeded.
Successful requests return a header X-Rate-Limit-Remaining
with the integer number of requests remaining before errors will be returned to the client.
When your application exceeds the rate limit for a given API endpoint, the Nightfall API will return an HTTP response code of 429 "Too Many Requests.” If your use case requires increased rate limiting, please reach out to support@nightfall.ai.
Additionally, these unsuccessful requests return the number of seconds to wait before retrying the request in a Retry-After Header.
Request rate limiting throttles how frequently you can make requests to the API. You can monitor your rate limit usage via the X-Rate-Limit-Remaining header, which tells you how many remaining requests you can make within the next second before being throttled.
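The sketch below shows one way a client might honor these headers when calling the API; the helper name and retry budget are illustrative.

```python
import time
import requests

def post_with_retry(url, headers, body, max_attempts=5):
    """Retry requests that are throttled with a 429, honoring the Retry-After header."""
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, json=body)
        if response.status_code != 429:
            # Successful (or non-throttled) responses report the remaining request budget.
            remaining = response.headers.get("X-Rate-Limit-Remaining")
            print(f"Requests remaining this second: {remaining}")
            return response
        # Wait the number of seconds the API asks for before retrying.
        wait = int(response.headers.get("Retry-After", "1"))
        time.sleep(wait)
    raise RuntimeError("Rate limited after several retries")
```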
Your Quota limits how many bytes of data you're permitted to scan within a given period. Your current remaining quota and the end of your current quota period are denoted by the following response headers.
Response Headers | Type | Description |
---|---|---|
X-Quota-Remaining | string | The bytes remaining in your quota for this period. Will be reset to the amount specified in your billing plan at the end of your quota cycle. |
X-Quota-Period-End | datetime | The date and time at which your quota will be reset, encoded as a string in the RFC 3339 format. |
Leverage our software development kits (SDKs) to enable easier, faster, and more stable engagement with the Nightfall APIs. Nightfall has a growing library of language specific SDKs including for:
If there is a language-specific SDK that you would find valuable but is not here, please don't hesitate to reach out to product@nightfall.ai.
Note
Internal-only endpoint. This will change once Nightfall introduces CRUD APIs for policies.
To prevent misuse and ensure the stability of our platform, we enforce a rate limit on an API Key and endpoint basis, similar to the way many other APIs enforce rate limits.
When operating under our Free plan, accounts and their corresponding API Keys have a rate limit of 5 requests per second on average, with support for bursts of 15 requests per second. If you upgrade to a paid plan – the Enterprise plan – this rate increases to a limit of 10 requests per second on average and bursts of 50 requests per second.
Plan | Requests Per Second (Avg) | Burst |
---|---|---|
Free | 5 | 15 |
Enterprise | 10 | 50 |
The Nightfall API follows standard practices and conventions to signal when these rate limits have been exceeded.
Successful requests return a header X-Rate-Limit-Remaining
with the integer number of requests remaining before errors will be returned to the client.
When your application exceeds the rate limit for a given API endpoint, the Nightfall API will return an HTTP response code of 429 "Too Many Requests.” If your use case requires increased rate limiting, please reach out to support@nightfall.ai.
Additionally, these unsuccessful requests return the number of seconds to wait before retrying the request in a Retry-After Header.
Request rate limiting throttles how frequently you can make requests to the API. You can monitor your rate limit usage via the X-Rate-Limit-Remaining header, which tells you how many remaining requests you can make within the next second before being throttled.
Your Quota limits how many requests you can make within a given period. Your current remaining quota and the end of your current quota period are denoted by the following response headers.
Response Headers | Type | Description |
---|---|---|
X-Quota-Remaining | string | The requests remaining in your quota for this period. Will be reset to the amount specified in your billing plan at the end of your quota cycle. |
X-Quota-Period-End | datetime | The date and time at which your quota will be reset, encoded as a string in the RFC 3339 format. |
For the free plan, we allow 5 requests per second and 10,000 requests per day.
Getting Started
Learn how to create a Nightfall API key, Detectors, Detection Rules, and Policies
Nightfall APIs
Nightfall Scan and Workflow APIs enable you to integrate DLP protection programmatically
SDKs
Learn how to leverage the Nightfall Software Development Kits (SDKs)
Language Specific Guides
Learn how to use Nightfall's APIs/SDKs with specific programming languages.
Integration Tutorials
Learn how to integrate Nightfall into some GenAI apps and datastores.
Popular Use Cases
Review popular scenarios and apply them to your DLP use case.
Detection Playground
A code-free environment for test-driving Firewall for AI.
Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:
Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property
Real-world scenarios highlight the urgency of this issue:
Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.
Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.
Let's look at a Python example using OpenAI and Nightfall's Python SDK. You can download this sample code here.
Step 1: Setup Nightfall
Get an API key for Nightfall and set environment variables. Learn more about creating an API key here.
Step 2: Configure Detection
Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.
If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.
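A sketch of such an inline rule using the Python SDK might look like the following; the class and argument names are assumed to follow the SDK's conventions, and the mask settings are illustrative.

```python
import os
from nightfall import Confidence, DetectionRule, Detector, MaskConfig, Nightfall, RedactionConfig

nightfall = Nightfall(os.getenv("NIGHTFALL_API_KEY"))

# Inline detection rule: flag likely credit card numbers and mask all but the last 4 digits.
detection_rule = DetectionRule([
    Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(masking_char="X", num_chars_to_leave_unmasked=4),
        ),
    )
])
```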
Step 3: Classify, Redact, Filter Your User Input
Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.
For example, let’s say we send Nightfall the following:
We get back the following redacted text:
Step 4: Send Redacted Prompt to OpenAI
Review the response to see if Nightfall has returned sensitive findings:
If there are sensitive findings:
You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.
Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request.
Construct your outgoing prompt.
If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
Use the OpenAI API or SDK client to send the prompt to the AI model.
You'll see that the message we originally intended to send had sensitive data:
And the message we ultimately sent was redacted, and that’s what we sent to OpenAI:
OpenAI sends us the same response either way because it doesn’t need to receive sensitive data to generate a cogent response. This means we were able to leverage ChatGPT just as easily but we didn’t risk sending OpenAI any unnecessary sensitive data. Now, you are one step closer to leveraging generative AI safely in an enterprise setting.
This guide describes how to use Nightfall with the Python programming language.
The example below will demonstrate how to use Nightfall’s text scanning functionality to verify whether a string contains sensitive PII using the Nightfall Python SDK.
To request the Nightfall API you will need:
A Nightfall API key
An existing Nightfall Detection Rule
Data to scan. Note that the API interprets data as plaintext, so you may pass it in any structured or unstructured format.
You can read more about obtaining a Nightfall API key or about our available data detectors in the linked reference guides.
In this tutorial, we will be downloading, setting up, and using the Python SDK provided by Nightfall.
We recommend you first set up a virtual environment. You can learn more about that here.
You can download the Nightfall SDK from PyPi like this:
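Assuming the SDK is published on PyPI under the name nightfall, the install command would be:

```
pip install nightfall
```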
We will be using the built-in os
library to help run this sample API script. This will be used to help extract the API Key from the OS as an environment variable.
Next, we extract our API key and instantiate the Nightfall client from the SDK with it. In this example, we have our API key set via an environment variable called NIGHTFALL_API_KEY. Your API key should never be hard-coded directly into your script.
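A minimal sketch of this setup, assuming the SDK exposes a Nightfall client class that accepts the key as its argument:

```python
import os
from nightfall import Nightfall

# Read the API key from the environment rather than hard-coding it.
nightfall = Nightfall(os.getenv("NIGHTFALL_API_KEY"))
```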
Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.
In this example, we will use some example data in the payload
List.
🚧 Payload Limit: Payloads must be under 500 KB when using the Scan API. If your file is larger than the limit, consider using the File Scan API, which is also available via the Python SDK.
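A sketch of the scan call might look like the following; the UUID and payload strings are placeholders, and the detection_rule_uuids argument name is assumed from the SDK's conventions.

```python
detection_rule_uuid = "00000000-0000-0000-0000-000000000000"  # UUID from the Nightfall app

payload = [
    "my credit card number is 4242-4242-4242-4242",
    "no sensitive data here",
]

# scan_text returns a tuple; the second element holds redacted payloads, which we ignore here.
findings, _ = nightfall.scan_text(payload, detection_rule_uuids=[detection_rule_uuid])
print(findings)
```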
We will ignore the second value returned, as we do not have redaction configured for this request.
With the Nightfall API, you can redact and mask your findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.
Now we are ready to review the results from the Nightfall SDK to check if there is any sensitive data in our file. Since the results will be in a dataclass, we can use the built-in __repr__
functions to format the results in a user-friendly and readable manner.
All data and sample findings shown below are validated, non-sensitive, examples of sample data.
If there are no sensitive findings in our payload, the response will be as shown in the 'empty response' pane below:
You are now ready to use the Python SDK for other scenarios.
This section consists of various documents that assist you in scanning various popular SaaS GenAI services and frameworks using Nightfall APIs.
This section consists of various documents that assist you in scanning various popular SaaS applications using Nightfall APIs.
This guide describes how to use Nightfall with the Java programming language.
The example below will demonstrate how to use Nightfall’s text scanning functionality to verify whether a string contains sensitive PII using the Nightfall Java SDK.
In this tutorial, we will be downloading, setting up, and using the Java SDK provided by Nightfall.
To make a request to the Nightfall API you will need:
A Nightfall API key
Plaintext data to scan.
You can read more about obtaining or about our available from the linked reference guides.
You can add the Nightfall package to your project by adding a dependency to your pom.xml
:
First add the required imports to the top of the file.
These are the objects we will use from the Nightfall SDK, as well as some collection classes for data handling.
We can then declare some data to scan in a List
:
Create a ScanTextRequest
to scan the payload with. First create a new instance of the credit card detector, and set to trigger if there are any findings that are confidence LIKELY
or above.
Add a second detector, looking for social security numbers. Set it to be triggered if there is at least a possible finding.
Combine these detectors into a detection rule, which will return findings if either of these detectors are triggered.
Finally, combine the payload and configuration together as a new ScanTextRequest
, and return it.
Use the ScanTextRequest
instance with a NightfallClient to send your request to Nightfall.
The resulting ScanTextResponse
may be used to print out the results:
You are now ready to use the Java SDK for other scenarios.
Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:
Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property
Real-world scenarios highlight the urgency of this issue:
Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.
Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.
Let's examine this in a Python example using the LangChain, Anthropic, and Nightfall Python SDKs. You can download this sample code .
Step 1: Setup Nightfall
Install the necessary packages using the command line:
Set up environment variables. Create a .env
file in your project directory:
Step 3: Classify, Redact, Filter Your User Input
We'll create a custom LangChain component for Nightfall sanitization, which allows us to seamlessly integrate content filtering into our LangChain pipeline.
We start by importing necessary modules and loading environment variables.
We initialize the Nightfall client and define detection rules for credit card numbers.
The NightfallSanitizationChain
class is a custom LangChain component that handles content sanitization using Nightfall.
We set up the Anthropic LLM and create a prompt template for customer service responses.
We create separate chains for sanitization and response generation, then combine them using SimpleSequentialChain
.
The process_customer_input
function provides an easy-to-use interface for our chain.
In a production environment, you might want to add more robust error handling and logging. For example:
To use this script, you can either run it directly or import the process_customer_input
function in another script.
Simply run the script:
This will process the example customer input and print the sanitized input and final response.
You can import the process_customer_input
function in another script:
If the example runs properly, you should expect to see an output demonstrating the sanitization process and the final response from Claude. Here's what the output might look like:
And that's it!
If you don't yet have a Nightfall account, sign up for one.
Create a Nightfall API key.
Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.
If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction in the Using Redaction guide.
Customer support tickets are a potential vector for leaking customer PII. By utilizing HubSpot's CRM tickets API in conjunction with Nightfall AI’s scan API you can discover, classify, and remediate sensitive data within your customer support system.
You will need a few things to follow along with this tutorial:
A HubSpot account and API key
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment (version 3.6 or later)
Most recent version of Python Nightfall SDK
To accomplish this, we will install the required version of the Nightfall SDK:
We will be using Python and importing the following libraries:
We've configured the HubSpot and Nightfall API keys as environment variables so they don't need to be committed directly into our code.
Next, we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.
Also, we instantiate the Nightfall client from the SDK using our API key.
Here we'll define the headers and other request parameters that we will be using later to call the Hubspot API.
Let’s start by using HubSpot API to retrieve all support tickets in our account. As the HubSpot API takes a "page limit" parameter, we will query the tickets over multiple requests to the HubSpot API, checking for list completion on each call. We'll compile the tickets into a list called all_tickets
.
The first row of our all_findings object will constitute our headers since we will dump this object to a CSV file later. We won't include the sensitive fragments themselves to avoid replicating PII unnecessarily, but we'll include a redacted copy with 3 characters exposed to help identify it during the review process.
'Properties' -> 'Content' is the only field where users can supply their data, so it is the only field we need to pass to the Nightfall API. We store the ticket IDs in a matching list so that we can put a location to our findings later.
We are now ready to call the Nightfall API to scan our HubSpot tickets. This tutorial assumes that the totality of your tickets falls under the payload limit of the Nightfall API. In practice, you may want to check the size of your payload using a method like sys.getsizeof() and chunk the payload across multiple requests if appropriate.
Now that we have a collection of all of our tickets, we will begin constructing an all_findings
object to collect our results. The first row of our all_findings object will constitute our headers since we will dump this object to a CSV file later.
This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.
For each finding in each ticket, we collect the required information from the Nightfall API to identify and locate the sensitive data, pairing them with the HubSpot ticket IDs we set aside earlier.
Finally, we export our results to a CSV so they can be easily reviewed.
That's it! You now have insight into all of the sensitive data inside your customer support tickets. As a next step, we could utilize HubSpot's API to add a comment to tickets with sensitive findings, and then trigger an email alert for the offending ticket owner.
To scan your support tickets on an ongoing basis, you may consider persisting your last ticket query's paging value and/or checking the last modified date of your tickets.
With the Nightfall API, you are also able to redact and mask your HubSpot findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.
The example above is specific for the Nightfall Text Scanning API. To scan files, we can use a similar process as we did the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.
To utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
Retrieve ticket data from Hubspot
Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our Nightfall client and retrieve the ticket data from Hubspot:
Now we go through and write the logs to a .csv file.
Begin the file upload process to the Scan API, with the above written .csv file, as shown here.
Once the files have been uploaded, begin using the scan endpoint mentioned here. Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
Customer support tickets are a potential vector for leaking customer PII. By utilizing ZenDesk’s API in conjunction with Nightfall’s scan SDK you can discover, classify, and remediate sensitive data within your customer support system.
You will need a few things to follow along with this tutorial:
A ZenDesk account and API key
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment
most recent version of the Nightfall Python SDK
To accomplish this, we will install the required version of the Nightfall SDK:
We will be using Python and importing the following libraries:
We've configured the ZenDesk user and API key, as well as the Nightfall API key as environment variables so they don't need to be committed directly into our code.
Here we'll define the headers and other request parameters that we will be using later to call both APIs. Next, we extract our API key and instantiate the Nightfall client from the SDK with it.
Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.
Let’s start by using ZenDesk’s API to retrieve all support tickets in our account. We'll set up an "all_findings" object to compile our findings as we go.
The first row of our all_findings object will constitute our headers, since we will dump this object to a CSV file later.
This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.
Now that we have a collection of all of our tickets, we will retrieve the set of user comments made on each of those tickets.
Note: If you are scanning a high volume of tickets, you may run into either the ZenDesk API's rate limits, or the Nightfall API's rate limits. In this tutorial, we assume that you fall under these limits, but additional code may be required to ensure this.
Within the above for loop, we compile all of the comment bodies into a list so that we can scan the entire comment thread for a ticket with a single call to the Nightfall SDK.
For each set of results we receive, we can start to compile our findings into a csv format.
Finally, we export our results to a csv so they can be easily reviewed.
That's it! You now have insight into all of the sensitive data inside your customer support tickets. As a next step, we could use these findings as an input to ZenDesk's redact API in order to clean up the original comments. We could also use ZenDesk's API to add a comment to tickets with sensitive findings triggering an email alert for the offending ticket owner.
To scan your support tickets on an ongoing basis, you may consider taking advantage of ZenDesk's Incremental Exports functionality.
Putting everything together:
That's it! You should now be set up to start using the Zendesk integration for the Nightfall Text Scanning SDK.
With the Nightfall API, you are also able to redact and mask your Zendesk ticket findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.
The example above is specific for the Nightfall Text Scanning API. To scan files, we can use a similar process as we did the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.
To utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
Retrieve ticket data from Zendesk
Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our Nightfall client and retrieve ticket data from Zendesk.
Now we go through and write the ticket data to a .csv file.
Begin the file upload process to the Scan API, with the above written .csv file, as shown here.
Once the files have been uploaded, begin using the scan endpoint mentioned here. Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
Datadog is a monitoring and analytics tool for information technology (IT) and DevOps teams that can be used to determine performance metrics as well as event monitoring for infrastructure and cloud services. This tutorial demonstrates how to use the Nightfall API for scanning your Datadog logs/metrics/events.
This tutorial allows you to scan your Datadog instance using the Nightfall API/SDK.
You will need a few things first to use this tutorial:
A Datadog account with an API key and Application key
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment (version 3.7 or later)
Python Nightfall SDK
We need to install the nightfall and requests library using pip. All the other libraries we will be using are built into Python.
We will be using Python and installing/importing the following libraries:
Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.
Note, we are setting the Datadog authentication information as the below environment variables, and referencing the values from there:
DD_API_KEY
DD_APPLICATION_KEY
Next, we instantiate the Nightfall client from the SDK with our API key.
First we will set up the connection with Datadog, and get the data to be scanned from there.
The three different code sample options below are for the three different available items from Datadog to scan:
logs - Scans the 100 most recent logs from Datadog.
metrics - Scans all active metric tags from the last 24 hours.
events - Scans all events from the last 24 hours.
Each one of these options saves the data into a data_to_scan
list of tuples where the first element in the tuple is the id of the data to scan and the second element is a string of data to scan.
Please follow that same option in the next few panes:
We then run a scan on the aggregated data from using the Nightfall SDK. Since all of the examples create the same data_to_scan
list, we can use the same code to scan them all.
To review the results, we will write the findings to an output csv file:
Note
The results of the scan will be outputted to a file named nf_datadog_output-TIMESTAMP.csv.
This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.
With the Nightfall API, you are also able to redact and mask your Datadog findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.
The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.
To utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
Retrieve data from Datadog
Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our Nightfall client and retrieve the data we would like from Datadog. This can be logs, metrics, or events. The example below will show logs:
Now we go through and write the logs to a .csv file.
Begin the file upload process to the Scan API, with the above written .csv file, as shown here.
Once the files have been uploaded, begin using the scan endpoint mentioned here. Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
Generative AI systems like OpenAI's ChatGPT have revolutionized how we interact with technology, but they come with a significant risk: the inadvertent exposure of sensitive information (OWASP LLM06). Without proper safeguards, these AI platforms may receive, process, and potentially retain confidential data, including:
Personally Identifiable Information (PII)
Protected Health Information (PHI)
Financial details (e.g., credit card numbers, bank account information)
Intellectual property
Real-world scenarios highlight the urgency of this issue:
Support Chatbots: Imagine a customer service AI powered by OpenAI. Users, in their quest for help, might unknowingly share credit card numbers or Social Security information. Without content filtering, this sensitive data could be transmitted to OpenAI and logged in your support system.
Healthcare Applications: Consider an AI-moderated health app that processes patient and doctor communications. These exchanges may contain protected health information (PHI), which, if not filtered, could be unnecessarily exposed to the AI system.
Content filtering is a crucial safeguard, removing sensitive data before it reaches the AI system. This ensures that only necessary, non-sensitive information is used for content generation, effectively preventing the spread of confidential data to AI platforms.
A typical pattern for leveraging Claude is as follows:
Get an API key and set environment variables
Initialize the Anthropic SDK client (e.g. Anthropic Python client), or use the API directly to construct a request
Construct your prompt and decide which endpoint and model is most applicable.
Send the request to Anthropic
Let's look at a simple example in Python. We’ll ask a Claude model for an auto-generated response we can send to a customer who is asking our customer support team about an issue with their payment method. Note how easy it is to send sensitive data, in this case, a credit card number, to Claude.
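A minimal sketch of that risky pattern, with an illustrative model name and message content, might look like this:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = (
    "A customer wrote: 'My card 4242-4242-4242-4242 was declined when I tried to "
    "renew my subscription.' Draft a polite support response."
)

# The customer's raw credit card number is sent to Anthropic verbatim.
message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=300,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```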
This is a risky practice because now we are sending sensitive customer information to Anthropic. Next, let’s explore how we can prevent this while still benefitting from Claude.
It is straightforward to update this pattern with Nightfall to check for sensitive findings and ensure sensitive data isn't sent out. Here's how:
Step 1: Setup Nightfall
Get an API key for Nightfall and set environment variables. Learn more about creating a Nightfall API key here. In this example, we’ll use the Nightfall Python SDK.
Step 2: Configure Detection
Create a pre-configured detection rule in the Nightfall dashboard or an inline detection rule with the Nightfall API or SDK client.
Consider using Redaction
Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.
Step 3: Classify, Redact, Filter
Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.
For example, let’s say we send Nightfall the following:
We get back the following redacted text:
Step 4: Send Redacted Prompt to Anthropic
Review the response to see if Nightfall has returned sensitive findings:
If there are sensitive findings:
You can specify a redaction config in your request so that sensitive findings are redacted automatically.
Without a redaction config, you can break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
Initialize the Anthropic SDK client (e.g., Anthropic Python client), or use the API directly to construct a request.
Construct your outgoing prompt.
If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
Use the Anthropic API or SDK client to send the prompt to the AI model.
Let's look at a Python example using Anthropic Claude and Nightfall's Python SDK. You can download this sample code here.
Get an API key for Nightfall and set environment variables. Learn more about creating an API key here.
Step 2: Configure Detection
Create an inline detection rule with the Nightfall API or SDK client, or use a pre-configured detection rule in the Nightfall account. In this example, we will do the former.
If you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.
Step 3: Classify, Redact, Filter Your User Input
Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint. The Nightfall API will respond with detections and the redacted payload.
For example, let’s say we send Nightfall the following:
We get back the following redacted text:
Step 4: Send Redacted Prompt to Anthropic
Review the response to see if Nightfall has returned sensitive findings:
If there are sensitive findings:
You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically.
Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
Construct your outgoing prompt.
If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you.
Use the Claude API or SDK client to send the prompt to the AI model.
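Before looking at the output, here is a condensed sketch of how those steps can fit together. The redaction class names, parameters, and model identifier are assumptions, and the downloadable sample remains the authoritative version:

```python
import os
import anthropic
from nightfall import Confidence, DetectionRule, Detector, MaskConfig, Nightfall, RedactionConfig

nightfall = Nightfall(os.environ["NIGHTFALL_API_KEY"])

detection_rule = DetectionRule([
    Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        # Redaction class and parameter names are assumptions; see the SDK reference.
        redaction_config=RedactionConfig(
            remove_finding=False,
            mask_config=MaskConfig(masking_char="*", num_chars_to_leave_unmasked=4),
        ),
    )
])

prompt = (
    "A customer says their card 4916-6734-7572-5015 was declined. "
    "Write a short, friendly reply for our support team."
)

findings, redacted_payload = nightfall.scan_text([prompt], detection_rules=[detection_rule])

# If Nightfall found sensitive data, use the redacted copy of the prompt instead.
safe_prompt = redacted_payload[0] if findings and findings[0] else prompt

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=300,
    messages=[{"role": "user", "content": safe_prompt}],
)
print(response.content[0].text)
```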
You'll see that the message we originally intended to send had sensitive data:
And the message we ultimately sent was redacted, and that’s what we sent to Anthropic:
Anthropic sends us the same response either way because it doesn’t need to receive sensitive data to generate a cogent response. This means we were able to leverage Claude just as easily but we didn’t risk sending Anthropic any unnecessary sensitive data. Now, you are one step closer to leveraging generative AI safely in an enterprise setting.
This section consists of documents that assist you in scanning popular observability platforms using the Nightfall APIs.
This section consists of documents that assist you in scanning popular data stores using the Nightfall APIs.
New Relic is a Software as a Service offering that focuses on performance and availability monitoring.
This tutorial allows you to scan your New Relic logs using the Nightfall API/SDK.
You will need a few things first to use this tutorial:
A New Relic account with an API key and Account ID
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment (version 3.6 or later)
The most recent version of Python Nightfall SDK
To accomplish this, we will install the required version of the Nightfall SDK:
We will be using Python and installing/importing the following libraries:
Note that we set the New Relic authentication information in the following environment variables and reference the values from there:
NR_API_KEY
NR_ACCOUNT_ID
Next, we instantiate a Nightfall client from the SDK using our API key.
First we will set up the connection with New Relic, and get the data to be scanned from there.
The code sample below will help to scan:
logs - Scans the 100 most recent logs from New Relic. (Note: This can be modified to meet your needs)
The corresponding code is shown in the next pane:
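A rough sketch of that retrieval step, assuming New Relic's NerdGraph (GraphQL) API and an NRQL query over the Log event type, might look like this; the query, endpoint, and header names are illustrative:

```python
import os
import requests

# NRQL query for the 100 most recent logs (adjust the time window and LIMIT as needed).
query = """
{
  actor {
    account(id: %s) {
      nrql(query: "SELECT messageId, message FROM Log SINCE 1 day ago LIMIT 100") {
        results
      }
    }
  }
}
""" % os.environ["NR_ACCOUNT_ID"]

resp = requests.post(
    "https://api.newrelic.com/graphql",
    headers={"API-Key": os.environ["NR_API_KEY"], "Content-Type": "application/json"},
    json={"query": query},
)
resp.raise_for_status()

results = resp.json()["data"]["actor"]["account"]["nrql"]["results"]
# Build (messageId, text) tuples so findings can be traced back to New Relic.
data_to_scan = [(r.get("messageId", ""), str(r.get("message", ""))) for r in results]
```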
We then run a scan on the aggregated data from New Relic, using the Nightfall SDK:
To review the results, we will write the findings to an output csv file:
Note:
The results of the scan will be outputted to a file named nf_newrelic_output-TIMESTAMP.csv.
This example will include the full finding above. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.
Finding the Logs in New Relic
The New Relic API does not provide a great way to get a direct URL to a log message. The simplest way to find the log message with sensitive data is to navigate to the New Relic UI and search your logs with this query messageId:"$YOUR_MESSAGE_ID". You can copy the messageId from the CSV file generated using this script.
With the Nightfall API, you are also able to redact and mask your New Relic findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.
The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did with the text scanning endpoint. The process is broken down into the sections below, as the file scanning process is more intensive.
To utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
Retrieve data from New Relic
Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our client and retrieve the data we would like from New Relic. The example below will show the most recent 100 logs:
Now we write the logs to a .csv file.
Begin the file upload process to the Scan API, with the above written .csv file, as shown here.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
How to scan for sensitive data in Airtable
Airtable is a popular cloud collaboration tool that lands somewhere between a spreadsheet and a database. As such, it can house all sorts of sensitive data that you may not want to surface in a shared environment.
By utilizing Airtable's API in conjunction with Nightfall AI’s scan API, you can discover, classify, and remediate sensitive data within your Airtable bases.
You will need a few things to follow along with this tutorial:
An Airtable account and API key
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment (version 3.7 or later)
The most recent version of Python Nightfall SDK
Install the Nightfall SDK and the requests library using pip.
To start, import all the libraries we will be using.
The JSON, OS, and CSV libraries are part of Python so we don't need to install them.
We've configured the Airtable and Nightfall API keys as environment variables so they are not written directly into the code.
Next, we define the Detection Rule with which we wish to scan our data.
Also, we instantiate a Nightfall client from the SDK using our API key.
The Airtable API doesn't list all bases in a workspace or all tables in a base; instead, you must specifically call each table to get its contents.
In this example, we have set up a config.json
file to store that information for the Airtable My First Workspace
bases. You may also wish to consider setting up a separate Base and Table that stores your schema and retrieves that information with a call to the Airtable API.
As an extension of this exercise, you could write Nightfall findings back to another table within that Base.
Now we set up the parameters we will need to call the Airtable API using the previously referenced API key and config file.
We will now call the Airtable API to retrieve the contents of our Airtable workspace. The data hierarchy in Airtable goes Workspace > Base > Table. We will need to perform a GET request on each table in turn.
As we go along, we will convert each data field into a string enriched with identifying metadata so that we can locate and remediate the data later should sensitive findings occur.
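A minimal sketch of that retrieval and enrichment step might look like the following; the config.json structure and the "||" delimiter are illustrative assumptions:

```python
import json
import os
import requests

airtable_headers = {"Authorization": f"Bearer {os.environ['AIRTABLE_API_KEY']}"}

with open("config.json") as f:
    # Assumed shape: {"bases": [{"id": "appXXXXXXXXXXXXXX", "tables": ["Table 1"]}]}
    config = json.load(f)

payload = []
for base in config["bases"]:
    for table in base["tables"]:
        url = f"https://api.airtable.com/v0/{base['id']}/{table}"
        records = requests.get(url, headers=airtable_headers).json().get("records", [])
        for record in records:
            for field_name, value in record.get("fields", {}).items():
                # Prefix each value with base/table/record/field so findings can be traced back.
                payload.append(
                    f"{base['id']}||{table}||{record['id']}||{field_name}||{value}"
                )
```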
🚧 Warning: If you are sending more than 50,000 items or more than 500KB, consider using the file API. You can learn more about how to use the file API in the Using the File Scanning Endpoint with Airtable section below.
Before moving on we will define a helper function to use later so that we can unpack the metadata from the strings we send to the Nightfall API.
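A simple helper matching the delimiter used in the sketch above might look like this:

```python
def unpack_metadata(sent_string):
    """Split the metadata prefix back out of a string we sent to Nightfall."""
    # The "||" delimiter matches the sketch above and is an assumption,
    # not the tutorial's exact format.
    base_id, table, record_id, field_name, value = sent_string.split("||", 4)
    return base_id, table, record_id, field_name, value
```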
We will begin constructing an all_findings
object to collect our results. The first row of our all_findings object will constitute our headers since we will dump this object to a CSV file later.
This example will include the full finding below. As the finding might be a piece of sensitive data, we recommend using the Redaction feature of the Nightfall API to mask your data.
Now we call the Nightfall API on content retrieved from Airtable. For every sensitive data finding we receive, we strip out the identifying metadata from the sent string and store it with the finding in all_findings
so we can analyze it later.
Finally, we export our results to a CSV so they can be easily reviewed.
That's it! You now have insight into all of the sensitive data stored within your Airtable workspace!
As a next step, you could write your findings to a separate 'Nightfall Findings' Airtable base for review, or you could update and redact confirmed findings in situ using the Airtable API.
The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did with the text scanning endpoint. The process is broken down into the sections below, as the file scanning process is more intensive.
To utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our client and retrieve the data we want from Airtable.
Now we write the data to a .csv file.
Using the above .csv file, begin the Scan API file upload process.
Once the files have been uploaded, use the scan endpoint.
A webhook server is required for the scan endpoint to submit its results. See our example webhook server.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
This guide describes how to use Nightfall with the Ruby programming language.
The example below will demonstrate how to use Nightfall’s text scanning functionality to verify whether a string contains sensitive PII, calling the API directly with Ruby's built-in HTTP libraries.
To follow along, you will need:
A Nightfall API Key
An existing Detection Rule
Data to scan. Note that the API interprets data as plaintext, so you may pass it in any structured or unstructured format.
A local Ruby 2.6 or greater environment.
Start by creating a new file called nightfall_demo.rb
Now we will walk through the code step by step. If you'd like to skip ahead you can see the complete code sample at the bottom of this page.
We will be using a few built-in Ruby libraries to run this sample API script.
First, we will load some environment variables that will be used to interact with the Nightfall API. NIGHTFALL_API_KEY
should be your Nightfall API Key, and NIGHTFALL_DETECTION_RULE_UUID
should be the UUID for your existing Nightfall detection rule.
Next, we will construct our payload to scan as an array. You can replace this with any data you'd like, or read plaintext from a file.
Next, we build the HTTP request headers and body using the environment variables that we previously defined.
Next, we build the HTTP object and make a request to the Nightfall API.
Lastly, we make the API request and process the response from Nightfall. If there are sensitive findings in the response we pretty-print them to the console. If there are no findings, we print a message to the console. Otherwise, if there is a problem with the HTTP request we print the status code and message to the console.
Now we can run our script:
If there are sensitive findings based on your Nightfall detection rule, you should see output similar to this in your console, corresponding to each of the 3 items submitted to scan in the payload.
For your convenience, the complete Ruby code sample is shown below.
Amazon Kinesis allows you to collect, process, and analyze real-time streaming data. In this tutorial, we will set up Nightfall DLP to scan Kinesis streams for sensitive data. An overview of what we are going to build is shown in the diagram below.
We will send data to Kinesis using a simple producer written in Python. Next, we will use an AWS Lambda function to send data from Kinesis to Nightfall. Nightfall will scan the data for sensitive information. If there are any findings returned by Nightfall, the Lambda function will write the findings to a DynamoDB table.
To complete this tutorial you will need the following:
An AWS Account with access to Kinesis, Lambda, and DynamoDB
A Nightfall API Key
An existing Nightfall Detection Rule which contains at least one detector for email addresses.
Before continuing, you should clone the companion repository locally.
First, we will configure all of our required Services on AWS.
Choose Create role.
Create a role with the following properties:
Lambda as the trusted entity
Permissions
AWSLambdaKinesisExecutionRole
AmazonDynamoDBFullAccess
Role name: nightfall-kinesis-role
Enter nightfall-demo
as the Data stream name
Enter 1
as the Number of open shards
Select Create data stream
Choose Author from scratch and add the following Basic information:
nightfall-lambda
as the Function name
Python 3.8 as the Runtime
Select Change default execution role, Use an existing role, and select the previously created nightfall-kinesis-role
You should now see the previous sample code replaced with our Nightfall-specific Lambda function.
Next, we need to configure environment variables for the Lambda function.
Within the same Lambda view, select the Configuration tab and then select Environment variables.
Add the following environment variables that will be used during the Lambda function invocation.
NIGHTFALL_API_KEY
: your Nightfall API Key
DETECTION_RULE_UUID
: your Nightfall Detection Rule UUID.
🚧 Detection Rule Requirements: This tutorial uses a data set that contains a name, email, and random text. In order to see results, please make sure that the Nightfall Detection Rule you choose contains at least one detector for email addresses.
Lastly, we need to create a trigger that connects our Lambda function to our Kinesis stream.
In the function overview screen on the top of the page, select Add trigger.
Choose Kinesis as the trigger.
Select the previously created nightfall-demo
Kinesis stream.
Select Add
The last step in creating our demo environment is to create a DynamoDB table.
Enter nightfall-findings
as the Table Name
Enter KinesisEventID
as the Primary Key
Be sure to also run the following before the Lambda function is created:
This is to ensure that the required version of the Python SDK for Nightfall has been installed. We also need to install boto3.
Before we start processing the Kinesis stream data with Nightfall, we will provide a brief overview of how the Lambda function code works. The entire function is shown below:
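As a condensed sketch of similar logic (not the repository's exact function), the handler might look like this; the Finding attribute names and the detection-rule parameter are assumptions based on the Nightfall SDK:

```python
import base64
import json
import os

import boto3
from nightfall import Nightfall

nightfall = Nightfall(os.environ["NIGHTFALL_API_KEY"])
table = boto3.resource("dynamodb").Table("nightfall-findings")


def lambda_handler(event, context):
    # 1. Decode the Kinesis records into a list of strings.
    records = event["Records"]
    payload = [base64.b64decode(r["kinesis"]["data"]).decode("utf-8") for r in records]

    # 2. Scan the batch with the configured Detection Rule.
    findings, _ = nightfall.scan_text(
        payload, detection_rule_uuids=[os.environ["DETECTION_RULE_UUID"]]
    )

    # 3. Copy any record with findings (plus basic metadata) into DynamoDB.
    for record, record_findings in zip(records, findings):
        if record_findings:
            table.put_item(Item={
                "KinesisEventID": record["eventID"],
                "Data": base64.b64decode(record["kinesis"]["data"]).decode("utf-8"),
                "Findings": json.dumps(
                    [{"detector": f.detector_name, "finding": f.finding} for f in record_findings]
                ),
            })
```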
This is a relatively simple function that does four things.
Create a DynamoDB client using the boto3
library.
Extract and decode data from the Kinesis stream and add it to a single list of strings.
Create a Nightfall client using the nightfall
library and scan the records that were extracted in the previous step.
Iterate through the response from Nightfall; if there are findings for a record, we copy the record and findings metadata into a DynamoDB table. We need to process the list of Finding objects into a list of dicts before passing them to DynamoDB.
Now that you've configured all of the required AWS services, and understand how the Lambda function works, you're ready to start sending data to Kinesis and scanning it with Nightfall.
The script will send one record with the data shown above every 10 seconds.
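A rough sketch of such a producer is shown below; the companion repository's script is the authoritative version, and the record fields here are illustrative:

```python
import json
import time

import boto3

kinesis = boto3.client("kinesis")  # uses credentials from the AWS CLI configuration

while True:
    record = {"name": "Jane Doe", "email": "jane.doe@example.com", "text": "lorem ipsum"}
    kinesis.put_record(
        StreamName="nightfall-demo",
        Data=json.dumps(record),
        PartitionKey=record["email"],
    )
    print(f"Sent record: {record}")
    time.sleep(10)  # send one record every 10 seconds
```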
You can start sending data with the following steps:
Open the companion repo that you cloned earlier in a terminal.
Create and activate a new Python virtualenv
Install Dependencies
Start sending data
If everything worked, you should see output similar to this in your terminal:
As the data starts to get sent to Kinesis, the Lambda function that we created earlier will begin to process each record and check for sensitive data using the Nightfall Detection Rule that we specified in the configuration.
If Nightfall detects a record with sensitive data, the Lambda function will copy that record and additional metadata from Nightfall to the DynamoDB table that we created previously.
If you'd like to clean up the created resources in AWS after completing this tutorial you should remove the following resources:
nightfall-kinesis-role
IAM Role
nightfall-demo
Kinesis data stream
nightfall-lambda
Lambda Function
nightfall-findings
DynamoDB Table
With the Nightfall API, you are also able to redact and mask your Kinesis findings. You can add a Redaction Config, as part of your Detection Rule, as a section within the lambda function. For more information on how to use redaction with the Nightfall API, and its specific options, please refer to the guide here.
Next, we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.
Once the files have been uploaded, begin using the scan endpoint mentioned here. Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.
Congrats! You've successfully scanned text for sensitive data with Ruby using the Nightfall API.
The AWS CLI installed and configured on your local machine.
A local copy of the companion repository for this tutorial.
Open the IAM service in the AWS console.
Open the Kinesis console and select Create Data Stream
Open the Lambda console and select Create function
Once the function has been created, in the Code tab of the Lambda function select Upload from and choose .zip file. Select the local nightfall-lambda-package.zip
file that you cloned earlier from the companion repository and upload it to AWS Lambda.
Open the DynamoDB console and select Create table
We've included a sample script in the companion repository that allows you to send fake data to Kinesis. The data that we are going to be sending looks like this:
Before running the script, make sure that you have the AWS CLI installed and configured locally. The user that you are logged in with should have the appropriate permissions to add records to the Kinesis stream. This script uses the boto3 library, which handles authentication based on the credentials file that is created with the AWS CLI.
Congrats! You've successfully integrated Nightfall with Amazon Kinesis, Lambda, and DynamoDB. If you have an existing Kinesis Stream, you should be able to take the same Lambda Function that we used in this tutorial and start scanning that data without any additional changes.
This section consists of use case tutorials for various scenarios of Firewall for AI. The tutorials explained in this section are as follows.
AWS S3 is a popular tool for storing your data in the cloud, however, it also has huge potential for unintentionally leaking sensitive data. By utilizing AWS SDKs in conjunction with Nightfall’s Scan API, you can discover, classify, and remediate sensitive data within your S3 buckets.
You will need the following for this tutorial:
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment
most recent version of the Nightfall Python SDK
We will use boto3 as our AWS client in this demo. If you are using another language, check this page for AWS's recommended SDKs.
To install boto3 and the Nightfall SDK, run the following command.
In addition to boto3, we will be utilizing the following Python libraries to interact with the Nightfall SDK and to process the data.
We've configured our AWS credentials, as well as our Nightfall API key, as environment variables so they don't need to be committed directly into our code.
Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID. Also, we extract our API key and instantiate a Nightfall client from the SDK with it.
Now we create an iterable of scannable objects in our target S3 buckets, and specify a maximum file size to pass to the Nightfall API (500 KB). In practice, you could add additional code to chunk larger files across multiple API requests.
We will also create an all_findings
object to store Nightfall Scan results. The first row of our all_findings object will constitute our headers, since we will dump this object to a CSV file later.
This example will include the full finding below. As the finding might be a piece of sensitive data, we recommend using the Redaction feature of the Nightfall API to mask your data.
We will now initialize our AWS S3 Session. Once the session is established, we get a handle for the S3 resource.
Now we go through each bucket and retrieve the scannable objects, adding their text contents to objects_to_scan
as we go.
In this tutorial, we assume that all files are text-readable. In practice, you may wish to filter out un-scannable file types such as images with the object.get()['ContentType']
property.
For each object content we find in our S3 buckets, we send it as a payload to the Nightfall Scan API with our previously configured detectors.
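Putting those steps together, a minimal sketch might look like the following; the detection rule environment variable name is an assumption, and larger files would need to be chunked across requests:

```python
import os

import boto3
from nightfall import Nightfall

MAX_FILE_SIZE = 500 * 1024  # 500 KB payload cap described above

nightfall = Nightfall(os.environ["NIGHTFALL_API_KEY"])
s3 = boto3.Session().resource("s3")

objects_to_scan = []
for bucket in s3.buckets.all():
    for obj in bucket.objects.all():
        if obj.size > MAX_FILE_SIZE:
            continue  # skip objects over the payload limit in this sketch
        body = obj.get()["Body"].read().decode("utf-8", errors="ignore")
        objects_to_scan.append((f"{bucket.name}/{obj.key}", body))

# One input string per object; findings has one entry per input.
findings, _ = nightfall.scan_text(
    [content for _, content in objects_to_scan],
    detection_rule_uuids=[os.environ["NIGHTFALL_DETECTION_RULE_UUID"]],
)
```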
On receiving the response, we break down each returned finding and assign it a new row in the CSV we are constructing.
In this tutorial, we scope each object to be scanned with its API request. At the cost of granularity, you may combine multiple smaller files into a single call to the Nightfall API.
Now that we have finished scanning our S3 buckets and collated the results, we are ready to export them to a CSV file for further review.
That's it! You now have insight into all of the sensitive data stored inside your organization's AWS S3 buckets.
As a next step, you could attempt to delete or redact your files in which sensitive data has been found by further utilizing boto3.
The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did with the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.
To utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
The first step is to get a list of files in your S3 buckets/objects
Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our AWS S3 Session. Once the session is established, we get a handle for the S3 resource.
Now we go through each bucket and retrieve the scannable objects.
For each object content we find in our S3 buckets, we send it as an argument to the Nightfall File Scan API with our previously configured detectors.
Iterate through a list of files and begin the file upload process.
Once the files have been uploaded, begin using the scan endpoint.
A webhook server is required for the scan endpoint to submit its results. See our example webhook server.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
Elasticsearch is a popular tool for storing, searching, and analyzing all kinds of structured and unstructured data, especially as a part of the larger ELK stack. However, along with all data storage tools, there is huge potential for unintentionally leaking sensitive data. By utilizing Elastic's own REST APIs in conjunction with Nightfall AI’s Scan API, you can discover, classify, and remediate sensitive data within your Elastic stack.
You can follow along with your own instance or spin up a sample instance with the commands listed below. By default, you will be able to download and interact with sample datasets from the ELK instance at localhost:5601. Your data can be queried from localhost:9200. The "Add sample data" function can be found underneath the Observability section on the Home page; in this tutorial we reference the "Sample Web Logs" dataset.
You will need a few things to follow along with this tutorial:
An Elasticsearch instance with data to query
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment (version 3.7 or later)
Python Nightfall SDK
We will need to install the Nightfall SDK and the requests library using pip.
We will be using Python and importing the following libraries:
We first configure the URLs to communicate with. If you are following along with the Sample Web Logs dataset alluded to at the beginning of this article, you can copy this Elasticsearch URL. If not, your URL will probably take the format http://<hostname>/<index_name>/_search.
Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.
Also, we instantiate a Nightfall client from the SDK using our API key.
We now construct the payload and headers for our call to Elasticsearch. The payload represents whichever subset of data you wish to query. In this example, we are querying all results from the previous hour.
We then make our call to the Elasticsearch data store and save the resulting response.
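A minimal sketch of that query, assuming the Sample Web Logs index name and its timestamp field, might look like this:

```python
import requests

elasticsearch_url = "http://localhost:9200/kibana_sample_data_logs/_search"

# Query up to 100 documents from the previous hour.
payload = {
    "size": 100,
    "query": {"range": {"timestamp": {"gte": "now-1h", "lte": "now"}}},
}

response = requests.get(
    elasticsearch_url,
    headers={"Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()
hits = response.json()["hits"]["hits"]
```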
Now we send our Elasticsearch query results to the Nightfall SDK for scanning.
We will create an all_findings
object to store Nightfall Scan results. The first row of our all_findings
object will constitute our headers, since we will dump this object to a CSV file later.
This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.
Next we go through our findings from the Nightfall Scan API and match them to the identifying fields from the Elasticsearch index so we can find them and remediate them in situ.
Finding locations here represent the location within the log as a string. Finding locations can also be found in byteRange.
Finally, we export our results to a csv so they can be easily reviewed.
That's it! You now have insight into all sensitive data shared inside your Elasticsearch instance within the past hour.
However, in use cases such as this where the data is well-structured, it can be more informative to call out which fields are found to contain sensitive data, as opposed to the location of the data. While the above script is easy to implement without modifying the queried data, it does not provide insight into these fields.
With the Nightfall API, you are also able to redact and mask your Elasticsearch findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.
The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did with the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.
To utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
Retrieve data from Elasticsearch
Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our client and retrieve the data we would like from Elasticsearch:
Now we write the logs to a .csv file.
Begin the file upload process to the Scan API, with the above written .csv file, as shown here.
Once the files have been uploaded, begin using the scan endpoint mentioned here. Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
Snowflake is a data warehouse built on top of the Amazon Web Services or Microsoft Azure cloud infrastructure. This tutorial demonstrates how to use the Nightfall API for scanning a Snowflake database.
This tutorial allows you to scan your Snowflake databases using the Nightfall API/SDK.
You will need a few things first to use this tutorial:
A Snowflake account with at least one database
A Nightfall API key
An existing Nightfall Detection Rule
Most recent version of Python Nightfall SDK
We will first install the required Snowflake Python connector modules and the Nightfall SDK that we need to work with:
To accomplish this, we will be using Python and importing the following libraries:
We will set the size and length limits for data allowed by the Nightfall API per request.
Next, we extract our API key and instantiate a Nightfall client from the SDK with it.
Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.
First we will set up the connection with Snowflake, and get the data to be scanned from there.
Note that we set the Snowflake authentication information in the following environment variables and reference the values from there (a minimal connection sketch follows the list):
SNOWFLAKE_USER
SNOWFLAKE_PASSWORD
SNOWFLAKE_ACCOUNT
SNOWFLAKE_DATABASE
SNOWFLAKE_SCHEMA
SNOWFLAKE_TABLE
SNOWFLAKE_PRIMARY_KEY
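A minimal connection sketch using the Snowflake Python connector and the environment variables above might look like this; the query itself is illustrative:

```python
import os

import snowflake.connector

conn = snowflake.connector.connect(
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)

table = os.environ["SNOWFLAKE_TABLE"]

# Pull the rows to be scanned; chunking logic would be added for large tables.
cur = conn.cursor()
cur.execute(f"SELECT * FROM {table}")
rows = cur.fetchall()
cur.close()
conn.close()
```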
We can then check the data size; as long as it is below the aforementioned limits, it can be run through the API.
If the data payloads are larger than the size or length limits of the API, extra code will be required to further chunk the data into smaller bits that are processable by the Nightfall scan API.
This can be seen in the second and third code panes below:
To review the results, we will print the number of findings, and write the findings to an output file:
The following are potential ways to continue building upon this service:
Writing Nightfall results to a database and reading that into a visualization tool
Redacting sensitive findings in place once they are detected, either automatically or as a follow-up script once findings have been reviewed
With the Nightfall API, you are also able to redact and mask your Snowflake findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.
The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did with the text scanning endpoint. The process is broken down into the sections below, as the file scanning process is more intensive.
To utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
Retrieve data from Snowflake
Similar to the process in the beginning of this tutorial for the text scanning endpoint, we will now initialize our Snowflake Connection. Once the session is established, we can query from Snowflake.
Now we go through the data and write to a .csv file.
Begin the file upload process to the Scan API, with the above written .csv file, as shown here.
Once the files have been uploaded, begin using the scan endpoint mentioned here. Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
RDS is a service for managing relational databases and can contain databases from several different varieties. This tutorial demonstrates connectivity with a postgresSQL database but could be modified to support other database options.
This tutorial allows you to scan your RDS managed databases using the Nightfall API/SDK.
You will need a few things first to use this tutorial:
An AWS account with at least one RDS database (this example uses postgres but could be modified to support other varieties of SQL)
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment (version 3.6 or later)
Python Nightfall SDK
To accomplish this, we will install the required version of the Nightfall SDK:
We will be using Python and importing the following libraries:
We will set the size and length limits for data allowed by the Nightfall API per request.
Next, we extract our API key and instantiate a Nightfall client from the SDK with it.
Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.
First we will set up the connection with the Postgres table in RDS and get the data to be scanned from there.
Note that we set the RDS authentication information in the following environment variables and reference the values from there (a minimal connection sketch follows the list):
'RDS_ENDPOINT'
'RDS_USER'
'RDS_PASSWORD'
'RDS_DATABASE'
'RDS_TABLE'
'RDS_PRIMARYKEY'
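A minimal connection sketch using psycopg2 as an example Postgres driver and the environment variables above might look like this; the query itself is illustrative:

```python
import os

import psycopg2

conn = psycopg2.connect(
    host=os.environ["RDS_ENDPOINT"],
    user=os.environ["RDS_USER"],
    password=os.environ["RDS_PASSWORD"],
    dbname=os.environ["RDS_DATABASE"],
)

# Pull the rows to be scanned; chunking logic would be added for large tables.
cur = conn.cursor()
cur.execute(f"SELECT * FROM {os.environ['RDS_TABLE']}")
rows = cur.fetchall()
cur.close()
conn.close()
```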
We can then check the data size; as long as it is below the aforementioned limits, it can be run through the API.
If the data payloads are larger than the size or length limits of the API, extra code will be required to further chunk the data into smaller bits that are processable by the Nightfall scan API.
This can be seen in the second and third code panes below:
To review the results, we will print the number of findings, and write the findings to an output file:
The full script is shown below, broken into functions that can be run in full:
The following are potential ways to continue building upon this service:
Writing Nightfall results to a database and reading that into a visualization tool
Adding to this script to support other varieties of SQL
Redacting sensitive findings in place once they are detected, either automatically or as a follow-up script once findings have been reviewed
With the Nightfall API, you are also able to redact and mask your RDS findings. You can add a Redaction Config, as part of your Detection Rule. For more information on how to use redaction, and its specific options, please refer to the guide here.
The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did with the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.
To utilize the File Scanning API you need the following:
An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
Retrieve data from RDS
Similar to the process in the beginning of this tutorial for the text scanning endpoint, we will now initialize our AWS RDS Connection. Once the session is established, we can query from RDS.
Now we go through the data and write to a .csv file.
Begin the file upload process to the Scan API, with the above written .csv file, as shown here.
Once the files have been uploaded, begin using the scan endpoint mentioned here. Note: As can be seen in the documentation, a webhook server is required for the scan endpoint, to which it will send the scanning results. An example webhook server setup can be seen here.
The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
How to run a full scan of an Amazon database
To scan an Amazon database instance (e.g., MySQL, Postgres), you must create a snapshot of that instance and export the snapshot to S3.
The export process runs in the background and doesn't affect the performance of your active DB instance. Exporting RDS snapshots can take a while depending on your database type and size
Once the snapshot has been exported, you will be able to scan the resulting parquet files with Nightfall like any other file. You can do this using our endpoints for uploading files or using our Amazon S3 Python integration.
In addition to having created your RDS instance, you will need to define the following to export your snapshots so they can later be scanned by Nightfall:
To perform this scan, you will need to configure an Amazon S3 bucket to which you will export a snapshot.
📘 S3 Bucket Requirements: This bucket must have snapshot permissions, and the bucket you export to must be in the same AWS Region as the snapshot being exported.
If you have not already created a designated S3 bucket, in the AWS console select Services > Storage > S3
Click the "Create bucket" button and give your bucket a unique name as per the instructions.
For more information please see Amazon's documentation on identifying an Amazon S3 bucket for export.
You need an Identity and Access Management (IAM) Role to perform the transfer for a snapshot to your S3 bucket.
This role may be defined at the time of backup, and it will be granted the specific permissions required.
You may also create the role under Services > Security, Identity, & Compliance > IAM and select “Roles” from under the “Access Management” section of the left-hand navigation.
From there you can click the “Create role” button and create a role where “AWS Service” is the trusted entity type.
For more information see Identity and Access Management in Amazon RDS and Providing access to an Amazon S3 bucket using an IAM role
You must create a symmetric encryption AWS Key using the Key Management Service (KMS).
From your AWS console, select the Services > Security, Identity, & Compliance > Key Management Service from the adjacent submenu.
From there you can click the “Create key” button and follow the instructions.
To do this task manually, go to Amazon RDS Service (Services > Database > RDS) and select the database to export from your list of databases.
Select the “Maintenance & backups” tab. Go to the “Snapshots” section.
You can select an existing automated snapshot or manually create a new snapshot with the “Take snapshot” button
Once the snapshot is complete, click the snapshot’s name.
From the “Actions” menu in the upper right select “Export to Amazon S3"
Enter a unique export identifier
Choose whether you want to export all or part of your data (You will be exporting to Parquet)
Choose the S3 bucket
Choose or create your designated IAM role for backup
Choose your AWS KMS Key
Click the Export button
Once the Status column of export is "Complete", you can click the link to the export under the S3 bucket column.
Within the export in the S3 bucket, you will find a series of folders corresponding to the different database entities that were exported.
Exported data for specific tables is stored in the format base_prefix/files, where the base prefix is the following:
export_identifier/database_name/schema_name.table_name/
For example:
export-1234567890123-459/rdststdb/rdststdb.DataInsert_7ADB5D19965123A2/
The current convention for file naming is as follows:
partition_index/part-00000-random_uuid.format-based_extension
For example:
You may download these parquet files and upload them to Nightfall to scan as you would any other parquet file.
📘 Obtaining file size: You can obtain the value for fileSizeBytes by running the command wc -c against the file.
In the above sequence of curl invocations, we upload the file and then initiate the file scan with a policy that uses a pre-configured detection rule, as well as an alertConfig that sends the results to an email address.
Note that results you receive in this case will be an attachment with a JSON payload as follows:
The findings themselves will be available at the URL specified in findingsURL
until the date-time stamp contained in the validUntil
property.
When parquet files are analyzed, as with other tabular data, not only will the location of the finding be shown within a given byte range, but column and row data will be shown as well.
Below is a SQL script that creates a small table of generated data containing example personal data, including phone numbers and email addresses.
Below is an example finding from a scan of the resulting parquet file exported to S3, where the Detection Rule uses Nightfall's built-in Detectors for matching phone numbers and emails. This example shows a match in the 1st row and 4th column, which is what we would expect based on our table structure.
Similarly, it also finds phone numbers in the 3rd column.
You may also use our tutorial for Integrating with Amazon S3 (Python) to scan through the S3 objects.
For more information please see the Amazon documentation Exporting DB snapshot data to Amazon S3
Say you have a number of files containing customer or patient data and you are not sure which of them are ok to share in a less secure manner. By leveraging Nightfall’s API you can easily verify whether a file contains sensitive PII, PHI, or PCI.
To make a request to the Nightfall API you will need:
A Nightfall API key
A list of data types you wish to scan for
Data to scan. Note that the API interprets data as plaintext, so you may pass it in any structured or unstructured format.
You can read more about these in the linked reference guides.
To run the following API call, we will be using Python's standard json, os, and requests libraries.
First we define the endpoint we want to reach with our API call.
Next we define the headers of our API request. In this example, we have our API key set via an environment variable called "NIGHTFALL_API_KEY". Your API key should never be hard-coded directly into your script.
Next we define the detectors with which we wish to scan our data. The detectors must be formatted as a list of key-value pairs of format {‘name’:’DETECTOR_NAME’}.
Next, we build the request body, which contains the detectors from above, as well as the raw data that you wish to scan. In this example, we will read it from a file called sample_data.csv.
Here we assume that the file is under the 500 KB payload limit of the Scan API. If your file is larger than the limit, consider breaking it down into smaller pieces across multiple API requests.
Now we are ready to call the Nightfall API to check if there is any sensitive data in our file. If there are no sensitive findings in our file, the response will be "[[]]".
[[]]
LLMs like ChatGPT and Claude can inadvertently receive sensitive information from user inputs, posing significant privacy concerns (OWASP LLM06). Without content filtering, these AI platforms can process and retain confidential data such as health records, financial details, and personal identifying information.
Consider the following real-world scenarios:
Support Chatbots: You use LangChain/Claude to power a level-1 support chatbot to help users resolve issues. Users will likely overshare sensitive information like credit card and Social Security numbers. Without content filtering, this information would be transmitted to Anthropic and added to your support ticketing system.
Healthcare Apps: You are using LangChain/Claude to moderate content sent by patients or doctors in your developing health app. These queries may contain sensitive protected health information (PHI), which could be unnecessarily transmitted to Anthropic.
Implementing robust content filtering mechanisms is crucial to protect sensitive data and comply with data protection regulations. In this guide, we will explore how to sanitize prompts using Nightfall before sending them to Claude.
If you're not using LangChain, check our other generative AI tutorials.
Let's take a look at what this would look like in a Python example using the LangChain, Anthropic, and Nightfall Python SDKs:
Install the necessary packages:
Set up environment variables. Create a .env
file in your project directory:
We'll create a custom LangChain component for Nightfall sanitization. This allows us to integrate content filtering into our LangChain pipeline seamlessly.
We start by importing necessary modules and loading environment variables.
We initialize the Nightfall client and define detection rules for credit card numbers.
The NightfallSanitizationChain
class is a custom LangChain component that handles content sanitization using Nightfall.
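A rough sketch of such a component, assuming LangChain's classic Chain interface (input_keys, output_keys, and _call) and a module-level Nightfall client, might look like this; adjust it for the LangChain version you are using:

```python
from typing import Dict, List

from langchain.chains.base import Chain


class NightfallSanitizationChain(Chain):
    """Redacts sensitive findings in the input text before it reaches the LLM."""

    @property
    def input_keys(self) -> List[str]:
        return ["customer_input"]

    @property
    def output_keys(self) -> List[str]:
        return ["sanitized_input"]

    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
        text = inputs["customer_input"]
        # nightfall_client and detection_rules are assumed to be initialized at module
        # level, as described in the surrounding steps.
        findings, redacted = nightfall_client.scan_text([text], detection_rules=detection_rules)
        sanitized = redacted[0] if findings and findings[0] and redacted[0] else text
        return {"sanitized_input": sanitized}
```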
We set up the Anthropic LLM and create a prompt template for customer service responses.
We create separate chains for sanitization and response generation, then combine them using SimpleSequentialChain
.
The process_customer_input
function provides an easy-to-use interface for our chain.
In a production environment, you might want to add more robust error handling and logging. For example:
To use this script, you can either run it directly or import the process_customer_input
function in another script.
Simply run the script:
This will process the example customer input and print the sanitized input and final response.
You can import the process_customer_input
function in another script:
If the example runs properly, you should expect to see an output demonstrating the sanitization process and the final response from Claude. Here's what the output might look like:
Transactional email and communication APIs like SendGrid, Twilio, SES, and Mailgun are critical components to modern applications. These services allow developers to easily incorporate end-user communication into their applications without the infrastructural overhead.
However, these services pose a new source of security risk as they can lead to accidental sharing of sensitive data if communications are sent to the wrong users or inadvertently contain sensitive data. Adding data loss prevention (DLP) into your business logic can provide the critical capability to classify & protect sensitive data before it is exposed, leaked, or stored.
The risk of exposing sensitive data is especially common in situations where these transactional communication services are handling user-generated content like messages between agents and users, or peer-to-peer. Here are a few examples:
You're building a grocery delivery application like Instacart. The application allows Shoppers and Customers to send and receive text messages with each other, powered by Twilio. The Customer sends the Shopper a picture of their Driver's License since they won't be home for the delivery, even though their Driver's License needs to be verified in person. Now this image with sensitive PII is processed by Twilio, stored in your application's object store, and viewable by the Shopper and support agents.
You're building an application for job seekers to connect with small business owners like restaurants that are hiring, and they can exchange messages over text, powered by Twilio. A malicious user signs up to impersonate a restaurant owner and uses this mechanism to collect PII from job seekers such as their SSN. Now this PII is transmitted by Twilio and is accessible by the attacker.
With the Nightfall API, you can scan transactional communications for sensitive data and remediate them accordingly. In this post, we’ll describe the pattern behind how to use Nightfall to scan for sensitive data in outgoing emails.
The typical pattern for using transactional communication services like SendGrid is as follows:
Get an API key and set environment variables
Initialize the SDK client (e.g. SendGrid Python client), or use the API directly to construct a request
Construct your outgoing message, which includes information like the subject, body, recipients, etc.
Use the SendGrid API or SDK client to send the message
Let's look at a simple example in Python. Note how easy it is to send sensitive data, in this case a credit card number.
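A minimal sketch of that pattern with the SendGrid Python SDK, using placeholder addresses and message content, might look like this:

```python
import os

from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

# The outgoing message contains a raw credit card number.
message = Mail(
    from_email="support@example.com",
    to_emails="customer@example.com",
    subject="Your payment details",
    html_content="Thanks! We charged the card 4916-6734-7572-5015 as requested.",
)

sg = SendGridAPIClient(os.environ["SENDGRID_API_KEY"])
response = sg.send(message)
print(response.status_code)
```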
It is straightforward to update this pattern to use Nightfall to check for sensitive findings and ensure sensitive data isn’t sent out. Here’s how:
Get API keys for both communication service (“CS”) and Nightfall (“NF”), and set environment variables. Learn more about creating a Nightfall API key here.
NF: Create a pre-configured detection rule in the Nightfall dashboard or inline detection rule with Nightfall API or SDK client.
📘 Consider using Redaction: Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.
NF: Send your outgoing message text (and any other metadata like the subject line, etc.) in a request payload to the Nightfall API text scan endpoint. For example, if you are interested in scanning the subject and body of an outgoing email, you can send these both in the input array payload: [ body, subject ]
Review the response to see if Nightfall has returned sensitive findings:
If there are sensitive findings:
You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically
Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
If no sensitive findings or you chose to redact findings with a redaction config:
Initialize the SDK client (e.g. SendGrid Python client), or use the API directly to construct a request
Construct your outgoing message, which includes information like the subject, body, recipients, etc.
If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you
Use the SendGrid API or SDK client to send the sanitized message
Let's take a look at what this would look like in a Python example using the Twilio and Nightfall Python SDKs in just 12 lines of code (with comments and formatting added for clarity):
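A condensed sketch of that flow is shown below; the redaction class names are assumptions based on the Nightfall SDK, and the Twilio numbers are placeholders:

```python
import os

from nightfall import (
    Confidence, DetectionRule, Detector, Nightfall, RedactionConfig, SubstitutionConfig,
)
from twilio.rest import Client

nightfall = Nightfall(os.environ["NIGHTFALL_API_KEY"])

detection_rule = DetectionRule([
    Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            substitution_config=SubstitutionConfig(substitution_phrase="[Redacted]"),
        ),
    )
])

body = "4916-6734-7572-5015 is my credit card number"
findings, redacted = nightfall.scan_text([body], detection_rules=[detection_rule])

# Use the redacted copy when Nightfall reports findings.
safe_body = redacted[0] if findings and findings[0] else body

twilio = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])
twilio.messages.create(to="+15558675310", from_="+15017122661", body=safe_body)
```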
You'll see that the message we originally intended to send had sensitive data:
4916-6734-7572-5015 is my credit card number
And the message we ultimately sent was redacted!
[Redacted] is my credit card number
Services like Twilio and SendGrid also support inbound communications from end users, typically via webhook, meaning inbound messages will be sent to a webhook handler that you specify. Hence, you would insert the above logic in your webhook handler upon receipt of an event payload.
Now that you understand the pattern, give it a shot!
In this tutorial, we'll demonstrate how easy it is to redact sensitive data and give you a more in-depth look at various redaction techniques, how Nightfall works, and touch upon use cases for redaction techniques.
Before we get started, let's set our Nightfall API key as an environment variable and install our dependencies for our code samples in Python. If you don't have a Nightfall API key, generate one in your Nightfall dashboard. If you don't have a Nightfall account, sign up for a free Nightfall Developer Platform account.
Mask sensitive data with a configurable character, allow leaving some characters unmasked, and allow ignoring certain characters.
Cases | Additional Config | Before | After
Let's put this together in Python with the Nightfall SDK. In our example, we have an input string with a credit card number (4916-6734-7572-5015 is my credit card number
) and we wish to mask with an asterisk, unmask the last 4 digits, and ignore hyphens.
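A sketch of that configuration, assuming the SDK's MaskConfig parameter names, might look like this:

```python
import os

from nightfall import Confidence, DetectionRule, Detector, MaskConfig, Nightfall, RedactionConfig

nightfall = Nightfall(os.environ["NIGHTFALL_API_KEY"])

detection_rule = DetectionRule([
    Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
        redaction_config=RedactionConfig(
            remove_finding=False,
            # Mask with "*", keep 4 characters unmasked, and ignore hyphens.
            mask_config=MaskConfig(
                masking_char="*",
                num_chars_to_leave_unmasked=4,
                chars_to_ignore=["-"],
            ),
        ),
    )
])

payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(payload, detection_rules=[detection_rule])
print(redacted_payload[0])  # masked copy of the input string
```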
We'll see our findings
look like this (with line formatting added for clarity):
Also, we have received the input payload back as a redacted string in our redacted_payload
object:
Masking is especially useful in scenarios where you want to retain some of the original format of the data or a certain amount of non-sensitive information as context. For example, it's common to refer to credit card numbers by their last 4 digits, so masking everything but the last 4 digits would ensure that the output is still useful to the viewer.
Substitute sensitive data findings with the InfoType, a custom word, or an empty string. For example:
- Default case: “my email is ” → “my email is .”
- Custom word “[REDACTED BY NIGHTFALL]”: “my email is ” → “my email is [REDACTED BY NIGHTFALL].”
- Substitute with InfoType: “my email is ” → “my email is [EMAIL].”
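Reusing the masking sketch above, the only change needed is the redaction config; as a rough sketch (the keyword names follow the SDK's snake_case conventions and should be checked against your version):

```python
# Swap the mask_config for a substitution phrase.
redaction_config = RedactionConfig(
    remove_finding=False,
    substitution_phrase="SubMeIn",
)
```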
We'll see our findings object returned to us looks like this (with line formatting added for clarity):
And our redacted input payload in our redacted_payload object:
Instead of using a custom string as the substitution (SubMeIn), we may want to use the name of the detector for additional context. We can make a one-line change to the example above, replacing substitution_phrase="SubMeIn" with infotype_substitution=True.
This yields:
Substitution is effective in scenarios where you intend to replace sensitive data with a contextual label. For example, you may wish to replace a literal credit card number with the label "Credit Card Number". This provides context to the reader that the data is a credit card number, without exposing them to the actual token itself.
Encrypt sensitive data findings with a public encryption key that is passed via the API. Make the encryption algorithm configurable.
Encryption is a complex topic so we'll go into a more in-depth tutorial on encrypting and decrypting sensitive data with Nightfall in a separate post, but let's run through the basics below.
Nightfall uses RSA encryption which is asymmetric, meaning it works with two different keys: a public one and a private one. Anyone with your public key can encrypt data. Encrypted data can only be decrypted with the private key. So, you'll pass Nightfall your public key to encrypt with, and only you will have your private key to decrypt the encrypted data.
Default case public_key=”MIG...AQAB” (“my ssn is 518-45-7708” → “my ssn is EhOp/DphEIA0LQd4q1BUq8FtuxKj66VA381Z9DtbiQaaHvy5Wlvtxg0je91DFXEJncOWbhgPbt7EvBl36k5MFlFdPbc5+bg40FxP676SnllEClEO+DDsuiRCk9VC4noAd0zLxgvV8qD/NPE/XhTfOpscqlKhllfTg7G5jZYYSG8=”)
For our example, we'll use the cryptography package in Python, so let's install it first:
pip3 install cryptography
Let's first generate a public/private RSA key pair in PEM format on the command line. We'll cover how to generate keys programmatically in Python in our encryption-specific tutorial.
First, we'll generate our private key and write it to a file called example_private.pem:
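One common way to do this, assuming OpenSSL is installed:

```
openssl genrsa -out example_private.pem 2048
```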
Next, we'll generate our public key in PEM format from this private key:
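Again with OpenSSL, deriving the public key from the private key:

```
openssl rsa -in example_private.pem -pubout -out example_public.pem
```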
Let's take a look at our public key with cat example_public.pem:
Remember to keep your private key safe. Anyone with this key can decrypt your encrypted data.
Now we can use our public key to encrypt any content with Nightfall! To do so, we'll first read the public key into a string.
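A minimal way to do this in Python:

```python
with open("example_public.pem", "r") as f:
    public_key = f.read()
```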
Now, we'll pass the public key into our redaction configuration, similar to the above examples, so Nightfall can use it to encrypt your sensitive data.
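As a sketch, assuming the SDK exposes the REST cryptoConfig.publicKey field directly on RedactionConfig (the exact keyword may differ by SDK version):

```python
# Assumption: the public key is passed straight to RedactionConfig;
# the underlying REST field is cryptoConfig.publicKey.
redaction_config = RedactionConfig(
    remove_finding=False,
    public_key=public_key,
)
```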
We'll see our findings look like this (with line formatting added for clarity):
And our redacted input payload in our redacted_payload object (truncated for clarity):
Third-party encryption is well-suited for use cases where you want to preserve the original sensitive data but ensure that it is only visible to sanctioned parties that have your private key. For example, if you are storing the data or passing it to a sanctioned third-party for processing, encrypting the sensitive tokens can add one additional layer of encryption and security, while still allowing a downstream processor to access the raw data as required with the key.
Congrats! You've now learned about and implemented multiple redaction techniques in just a few lines of code. You're ready to start adding redaction to your apps.
The service ingests a local file, scans it for sensitive data with Nightfall, and displays the results in a simple table UI.
We'll deploy the server on Render (a PaaS Heroku alternative) so that you can serve your application publicly in production instead of running it off your local machine. You'll build familiarity with the following tools and frameworks: Python, Flask, Nightfall, Ngrok, Jinja, Render.
Before we get started on our implementation, start by familiarizing yourself with how file scanning works with Nightfall, so you're acquainted with the flow we are implementing.
In a nutshell, file scanning is done asynchronously by Nightfall; after you upload a file to Nightfall and trigger the scan, we perform the scan in the background. When the scan completes, Nightfall delivers the results to you by making a request to your webhook server. This asynchronous behavior allows Nightfall to scan files of varying sizes and complexities without requiring you to hold open a long synchronous request, or continuously poll for updates. The impact of this pattern is that you need a webhook endpoint that can receive inbound notifications from Nightfall when scans are completed - that's what we are building in this tutorial.
You can fork the sample repo and view the complete code, or follow along below. If you're starting from scratch, create a new GitHub repository.
First, let's start by installing our dependencies. We'll be using Nightfall for data classification, the Flask web framework in Python, and Gunicorn as our web server. Create requirements.txt and add the following to the file:
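For example (unpinned here for brevity; pin versions as you see fit):

```
nightfall
Flask
gunicorn
```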
Then run pip install -r requirements.txt to do the installation.
Next, we'll need our Nightfall API Key and Webhook Signing Secret; the former authenticates us to the Nightfall API, while the latter verifies that incoming webhooks originate from Nightfall. You can retrieve your API Key and Webhook Signing Secret from the Nightfall Dashboard. Complete the Nightfall Quickstart for a more detailed walk-through. Sign up for a free Nightfall account if you don't have one.
These values are unique to your account and should be kept safe. This means that we will store them as environment variables and should not store them directly in code or commit them into version control. If these values are ever leaked, be sure to visit the Nightfall Dashboard to re-generate new values for these secrets.
Let's start writing our Flask server. Create a file called app.py. We'll start by importing our dependencies and initializing the Flask and Nightfall clients:
Next, we'll add our first route, which will display "Hello World" when the client navigates to /ping, simply as a way to validate things are working:
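A minimal sketch of this starting point; the constructor argument is an assumption (some SDK versions read NIGHTFALL_API_KEY from the environment automatically):

```python
import os

from flask import Flask
from nightfall import Nightfall

app = Flask(__name__)

# Initialize the Nightfall client with the API key from the environment.
nightfall = Nightfall(os.environ.get("NIGHTFALL_API_KEY"))


@app.route("/ping")
def ping():
    return "Hello World", 200
```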
Run gunicorn app:app on the command line to fire up your server, and navigate to your local server in your web browser. You'll see where the web server is hosted in the Gunicorn logs; typically it will be 127.0.0.1:8000, aka localhost:8000.
To expose our local webhook server via a public tunnel that Nightfall can send requests to, we'll use ngrok. Download and install ngrok via their quickstart documentation. We'll create an ngrok tunnel as follows:
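A typical invocation, assuming Gunicorn's default port (adjust if your server runs elsewhere):

```
ngrok http 8000
```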
After running this command, ngrok
will create a tunnel on the public internet that redirects traffic from their site to your local machine. Copy the HTTPS tunnel endpoint that ngrok has created: we can use this as the webhook URL when we trigger a file scan.
Let's set this HTTPS endpoint as a local environment variable so we can reference it later:
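For example, substituting the HTTPS URL that ngrok printed for you:

```
export NIGHTFALL_SERVER_URL=https://your-tunnel.ngrok.io
```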
Tip: With a Pro ngrok account, you can create a subdomain so that your tunnel URL is consistent, instead of randomly generated each time you start the tunnel.
Before you send a file scan request to Nightfall, let's add logic for our incoming webhook endpoint, so that when Nightfall finishes scanning a file, it can successfully send the sensitive findings to us.
First, what does it mean to have findings? If a file has findings, this means that Nightfall identified sensitive data in the file that matched the detection rules you configured. For example, if you told Nightfall to look for credit card numbers, any substring from the request payload that matched our credit card detector would constitute sensitive findings.
We'll host our incoming webhook at /ingest with a POST method.
Nightfall will POST to the webhook endpoint, and in the inbound payload, Nightfall will indicate if there are sensitive findings in the file, and provide a link where we can access the sensitive findings as JSON.
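A minimal sketch of such a handler; the payload field names (findingsPresent, findingsURL) are assumptions based on the description above, so verify them against the webhook documentation, and a production handler should also validate the webhook signature with your signing secret.

```python
from flask import request


@app.route("/ingest", methods=["POST"])
def ingest():
    data = request.get_json(force=True)
    # Assumed fields: a flag for whether findings exist, plus a temporary
    # signed URL where the findings can be fetched as JSON.
    if data.get("findingsPresent"):
        print("Sensitive findings at:", data.get("findingsURL"))
    return "", 200
```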
Restart your server so the changes propagate. We'll take a look at the console output of our webhook endpoint and explain what it means in the next section.
Now, we want to trigger a file scan request, so that Nightfall will scan the file and send a POST request to our /ingest webhook endpoint when the scan is complete. We'll write a simple script that sends a file to Nightfall to scan it for sensitive data. Create a new file called scan.py.
First, we'll establish our dependencies, initialize the Nightfall client, and specify the filepath to the file we wish to scan as well as the webhook endpoint we created above. The filepath is a relative path to any file; in this case we are scanning the sample-pci-xs.csv file, which is in the same directory as scan.py. This is a sample CSV file with 10 credit card numbers in it - you can download it from the tutorial's GitHub repository.
Next, we will initiate the scan request to Nightfall by specifying our filepath, the webhook URL where the scan results should be posted, and our Detection Rule that specifies what sensitive data we are looking for.
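A sketch of scan.py; the scan_file method and its keyword arguments are assumptions patterned on the SDK's text-scanning interface and the REST fields described elsewhere in this guide, so check them against the SDK reference.

```python
import os

from nightfall import Confidence, DetectionRule, Detector, Nightfall

nightfall = Nightfall()  # assumes NIGHTFALL_API_KEY is set in the environment

filepath = "sample-pci-xs.csv"
webhook_url = os.environ["NIGHTFALL_SERVER_URL"] + "/ingest"

# Inline Detection Rule that looks for likely credit card numbers.
detection_rule = DetectionRule([
    Detector(
        min_confidence=Confidence.LIKELY,
        nightfall_detector="CREDIT_CARD_NUMBER",
        display_name="Credit Card Number",
    ),
])

# Kick off the asynchronous file scan; results are POSTed to the webhook URL.
scan_id, message = nightfall.scan_file(
    filepath,
    webhook_url=webhook_url,
    detection_rules=[detection_rule],
)
print("Started scan:", scan_id)
```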
In this simple example, we have specified an inline Detection Rule that detects Likely Credit Card Numbers. This Detection Rule is a simple starting point that just scratches the surface of the types of detection you can build with Nightfall. Learn more about building inline detection rules here or how to configure them in the Nightfall Dashboard.
The scan_id is useful for identifying your scan results later.
Let's run scan.py to trigger our file scan job.
Once Nightfall has finished scanning the file, we'll see our Flask server receive the request at our webhook endpoint (/ingest). In our code above, we parse the webhook payload, and print the following when there are sensitive findings:
In our output, we are printing two URLs.
The first URL is provided to us by Nightfall. It is the temporary signed S3 URL that we can access to fetch the sensitive findings that Nightfall detected.
The second URL won't work yet; we'll implement it next. This URL is one we constructed in our ingest() method above - it calls /view and passes the findings URL above as a URL-escaped query parameter.
Let's add a method to our Flask server that opens this URL and displays the findings in a formatted table so that the results are easier to view than downloading them as JSON.
We'll do this by adding a view method that responds to GET requests to the /view route. The /view route will read the signed S3 findings URL via a query parameter. It will then open the findings URL, parse it as JSON, pass the results to an HTML template, and display the results in a simple HTML table using Jinja. Jinja is a simple templating engine in Python.
Add the following to our Flask server in app.py:
To display the findings in an HTML table, we'll create a new Flask template. Create a folder in your project directory called templates and add a new file within it called view.html.
Our template uses Jinja to iterate through our findings, and create a table row for each sensitive finding.
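A sketch of templates/view.html; the keys read off each finding (finding, detector name, confidence) are assumptions about the findings JSON, so adjust them to match the actual response.

```html
<!-- templates/view.html -- finding keys below are assumptions -->
<table>
  <tr>
    <th>Finding</th>
    <th>Detector</th>
    <th>Confidence</th>
  </tr>
  {% for finding in findings %}
  <tr>
    <td>{{ finding["finding"] }}</td>
    <td>{{ finding["detector"]["name"] }}</td>
    <td>{{ finding["confidence"] }}</td>
  </tr>
  {% endfor %}
</table>
```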
Now, if we restart our Flask server, trigger a file scan request, and navigate to the "View" URL printed in the server logs, we should see a formatted table with our results! In fact, we can input any Nightfall-provided signed S3 URL (after URL-escaping it) in the findings_url parameter of the /view route to view it.
As a longtime Heroku user, I was initially inclined to write this tutorial with instructions to deploy our app on Heroku. However, new PaaS vendors have been emerging and I was curious to try them out and see how they compare to Heroku. One such vendor is Render, which is where we'll deploy our app.
Deploying our service on Render is straightforward. If you're familiar with Heroku, the process is quite similar. Once you've signed up or logged into Render (free), we'll do the following:
Create a new Web Service on Render, and permit Render to access your new repo.
Use the following values during creation:
Environment: Python
Build Command: pip install -r requirements.txt
Start Command: gunicorn app:app
Let's also set our environment variables during creation. These are the same values we set locally.
Once Render has finished deploying, you'll get the base URL of your application. Set this as your NIGHTFALL_SERVER_URL locally and re-run scan.py - this time, the file scan request is served by your production Flask server running on Render!
To confirm this, navigate to the Logs tab in your Render app console, where you'll see the webhook's output of your file scan results:
Navigate to the View link above in your browser to verify that you can see the results formatted in a table on your production site.
Congrats, you've successfully created a file scanning server and deployed it in production! You're now ready to build more advanced business logic around your file scanner. Here are some ideas on how to extend this tutorial:
Use WebSockets to send a notification back from the webhook to the client that initiated the file scan request
Build a more advanced detection rule using pre-built or custom detectors
Add a user interface to add more interactive capabilities, for example allowing users to upload files or read files from URLs
Endpoint data loss prevention (DLP) discovers, classifies, and protects sensitive data - like PII, credit card numbers, and secrets - that proliferates onto endpoint devices, like your computer or EC2 machines. This is a way to help keep data safe, so that you can detect and stop occurrences of data exfiltration. Our endpoint DLP application will be composed of two core services that will run locally. The first service will monitor for file system events using the watchdog package in Python. When a file system event is triggered, such as when a file is created or modified, the service will send the file to Nightfall to be scanned for sensitive data. The second service is a webhook server that will receive scan results from Nightfall, parse the sensitive findings, and write them to a CSV file as output. You'll build familiarity with the following tools and frameworks:
Python
Flask
Nightfall
Ngrok
Watchdog
Before we get started on our implementation, start by familiarizing yourself with how file scanning works with Nightfall, so you're acquainted with the flow we are implementing.
In a nutshell, file scanning is done asynchronously by Nightfall; after you upload a file to Nightfall and trigger the scan, we perform the scan in the background. When the scan completes, Nightfall delivers the results to you by making a request to your webhook server. This asynchronous behavior allows Nightfall to scan files of varying sizes and complexities without requiring you to hold open a long synchronous request, or continuously poll for updates. The impact of this pattern is that you need a webhook endpoint that can receive inbound notifications from Nightfall when scans are completed - that's one of the two services we are building in this tutorial.
You can fork the sample repo and view the complete code, or follow along below. If you're starting from scratch, create a new GitHub repository. This tutorial was developed on a Mac and assumes that's the endpoint operating system you're running; however, it should work across operating systems with minor modifications. For example, you may wish to extend this tutorial by running endpoint DLP on an EC2 machine to monitor your production systems.
First, let's start by installing our dependencies. We'll be using Nightfall for data classification, the Flask web framework in Python, watchdog for monitoring file system events, and Gunicorn as our web server. Create requirements.txt and add the following to the file:
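For example (unpinned here for brevity; pin versions as you see fit):

```
nightfall
Flask
gunicorn
watchdog
```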
Then run pip install -r requirements.txt to do the installation.
Next, we'll need our Nightfall API Key and Webhook Signing Secret; the former authenticates us to the Nightfall API, while the latter verifies that incoming webhooks originate from Nightfall. You can retrieve your API Key and Webhook Signing Secret from the Nightfall Dashboard. Complete the Nightfall Quickstart for a more detailed walk-through. Sign up for a free Nightfall account if you don't have one.
These values are unique to your account and should be kept safe. This means that we will store them as environment variables and should not store them directly in code or commit them into version control. If these values are ever leaked, be sure to visit the Nightfall Dashboard to re-generate new values for these secrets.
Watchdog is a Python module that watches for file system events. Create a file called scanner.py. We'll start by importing our dependencies and setting up a basic event handler. This event handler responds to file change events for file paths that match a given set of regular expressions (regexes). In this case, the .* indicates we are matching on any file path - we'll customize this a bit later. When a file system event is triggered, we'll print a line to the console.
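A sketch of this starting point using watchdog's RegexMatchingEventHandler (the original code isn't reproduced verbatim; the watch path here is an assumption):

```python
import os
import time

from watchdog.events import RegexMatchingEventHandler
from watchdog.observers import Observer


class ScanEventHandler(RegexMatchingEventHandler):
    def on_any_event(self, event):
        # For now, just log every file system event we see.
        print(f"{event.event_type}: {event.src_path}")


if __name__ == "__main__":
    handler = ScanEventHandler(regexes=[r".*"])  # match any file path for now
    observer = Observer()
    # Watch the home directory recursively (adjust the path as needed).
    observer.schedule(handler, path=os.path.expanduser("~"), recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```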
Run python scanner.py and you'll notice lots of lines getting printed to the console. These are all the files that are getting created and changed on your machine in real-time. You'll notice that your operating system and the apps you're running are constantly writing, modifying, and deleting files on disk!
Next, we'll update our event handler so that instead of simply printing to the console, we are sending the file to Nightfall to be scanned. We will initiate the scan request to Nightfall, by specifying the file path of the changed/created file, a webhook URL where the scan results should be sent, and our Detection Rule that specifies what sensitive data we are looking for. If the file scan is initiated successfully, we'll print the corresponding Upload ID that Nightfall provides us to the console. This ID will be useful later when identifying scan results.
Here's our complete scanner.py, explained further below:
In this example, we have specified an inline Detection Rule that detects Likely Credit Card Numbers, Social Security Numbers, and API Keys. This Detection Rule is a simple starting point that just scratches the surface of the types of detection you can build with Nightfall. Learn more about building inline detection rules here or how to configure them in the Nightfall Dashboard.
We can't run this just yet, since we need to set our webhook URL, which is currently read from an environment variable that we haven't set yet. We'll create our webhook server and set the webhook URL in the next set of steps.
Also note that we've updated our regex from .* to a set of file paths on Macs that commonly contain user-generated files - the Desktop, Documents, and Downloads folders:
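For example (illustrative patterns, not necessarily the exact ones from the sample repo):

```python
# Only react to files under common user-content folders on macOS.
regexes = [
    r".*/Desktop/.*",
    r".*/Documents/.*",
    r".*/Downloads/.*",
]
```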
You can customize these regexes to whatever file paths are of interest to you. Another option is to write a catch-all regex that ignores/excludes paths to config and temp files:
Next, we'll set up our Flask webhook server, so we can receive file scanning results from Nightfall. Create a file called app.py. We'll start by importing our dependencies and initializing the Flask and Nightfall clients:
Next, we'll add our first route, which will display "Hello World" when the client navigates to /ping, simply as a way to validate things are working:
In a second command line window, run gunicorn app:app to fire up your server, and navigate to your local server in your web browser. You'll see where the web server is hosted in the Gunicorn logs; typically it will be 127.0.0.1:8000, aka localhost:8000.
To expose our local webhook server via a public tunnel that Nightfall can send requests to, we'll use ngrok. Download and install ngrok via their quickstart documentation. We'll create an ngrok tunnel as follows:
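As before, a typical invocation against Gunicorn's default port:

```
ngrok http 8000
```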
After running this command, ngrok
will create a tunnel on the public internet that redirects traffic from their site to your local machine. Copy the HTTPS tunnel endpoint that ngrok has created: we can use this as the webhook URL when we trigger a file scan.
Let's set this HTTPS endpoint as a local environment variable so we can reference it later:
With a Pro ngrok account, you can create a subdomain so that your tunnel URL is consistent, instead of randomly generated each time you start the tunnel.
Before we send a file scan request to Nightfall, let's implement our incoming webhook endpoint, so that when Nightfall finishes scanning a file, it can successfully send the sensitive findings to us.
First, what does it mean to have findings? If a file has findings, this means that Nightfall identified sensitive data in the file that matched the detection rules you configured. For example, if you told Nightfall to look for credit card numbers, any substring from the request payload that matched our credit card detector would constitute sensitive findings.
We'll host our incoming webhook at /ingest with a POST method.
Nightfall will POST to the webhook endpoint, and in the inbound payload, Nightfall will indicate if there are sensitive findings in the file, and provide a link where we can access the sensitive findings as JSON.
We'll validate the inbound webhook from Nightfall, retrieve the JSON findings from the link provided, and write the findings to a CSV file. First, let's initialize our CSV file where we will write results, and add our /ingest POST method.
You'll notice that when there are sensitive findings, we call the output_results() method. Let's write that next. In output_results(), we are going to parse the findings and write them as rows into our CSV file.
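A sketch of output_results(); its signature and the keys read off each finding are assumptions about the findings JSON (it writes only a subset of the columns described later), so adapt it to the payload you actually receive.

```python
import csv
import time


def output_results(upload_id, findings):
    # Append one CSV row per sensitive finding. The argument names and the
    # keys below are assumptions -- adjust them to the real findings JSON.
    with open("results.csv", "a", newline="") as f:
        writer = csv.writer(f)
        for index, finding in enumerate(findings):
            writer.writerow([
                upload_id,
                index,
                int(time.time()),
                finding.get("beforeContext"),
                finding.get("finding"),
                finding.get("afterContext"),
                finding.get("confidence"),
            ])
```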
Restart your server so the changes propagate. We'll take a look at the console and CSV output of our webhook endpoint in the next section.
In our previous command line window, we can now turn our attention back to scanner.py. We now have our webhook URL, so let's set it here as well and run our scanner.
To trigger a file scan event, download a sample file containing test data. Assuming it automatically downloads to your Downloads folder, this should immediately trigger a file change event and you'll see console log output! If not, you can also download the file with curl into a location that matches the event handler regexes we set earlier.
You'll see the following console output from scanner.py:
And the following console output from our webhook server:
And the following sensitive findings written to results.csv:
Each row in the output CSV will correspond to a sensitive finding. Each row will have the following fields, which you can customize in app.py: the upload ID provided by Nightfall, an incrementing index, a timestamp, the characters before the sensitive finding (for context), the sensitive finding itself, the characters after the sensitive finding (for context), the confidence level of the detection, the byte range location (character indices) of the sensitive finding in its parent file, and the corresponding detection rules that flagged the sensitive finding.
Note that you may also see events for system files like .DS_Store, or errors corresponding to failed attempts to scan temporary versions of files. This is because doing things like downloading a file can trigger multiple file modification events. As an extension to this tutorial, you could consider filtering those out further, though they shouldn't impact our ability to scan files of interest.
If we leave these services running, we'll continue to monitor files for sensitive data and append to our results CSV when sensitive findings are discovered!
We can run both of our services in the background with nohup so that we don't need to leave two command line tabs open indefinitely. We'll pipe console output to log files so that we can always reference the application's output or determine if the services crashed for any reason.
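For example (the log file names are arbitrary choices here):

```
nohup gunicorn app:app > webhook.log 2>&1 &
nohup python scanner.py > scanner.log 2>&1 &
```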
This will return the corresponding process IDs - we can always check on these later with the ps command.
This post is simply a proof-of-concept version of endpoint DLP. A production-grade endpoint DLP application will have additional complexity and functionality. However, the detection engine is one of the biggest components of an endpoint DLP system, and this example should give you a sense of how easy it is to integrate with Nightfall's APIs and the power of Nightfall's detection engine.
Here are a few ideas on how you can extend this service further:
Run the scanner on EC2 machines to scan your production machines in real-time
Respond to more system events like I/O of USB drives and external ports
Implement remediation actions like end-user notifications or file deletion
Redact the sensitive findings prior to writing them to the results file
Store the results in the cloud for central reporting
Package in an executable so the application can be run easily
Scan all files on disk on the first boot of the application
Firewall for AI provides a flexible and extensible API that allows you to scan a wide variety of data types, including plain text, structured and unstructured files, and even images. Our API can handle data in various formats such as JSON, XML, CSV, and more. Visit our detector glossary at docs.nightfall.ai/docs/detector-glossary to explore the comprehensive list of supported data types and file formats
Firewall for AI offers a rich set of pre-built detectors that can identify many different types of sensitive data, including personally identifiable information (PII), payment card industry data (PCI), protected health information (PHI), secrets, and credentials. These detectors are powered by advanced machine learning models and can be easily integrated into your application with just a few lines of code. Refer to our detector glossary at docs.nightfall.ai/docs/detector-glossary for a complete list of available detectors.
Firewall for AI is a powerful API that acts as a middleware layer or client wrapper to protect your AI models from consuming sensitive data. By integrating Firewall for AI into your application via API calls, you can proactively prevent data leaks and maintain compliance without disrupting your existing workflows or model updates.
Absolutely! In addition to the pre-built detectors, Firewall for AI allows you to create custom detectors tailored to your specific requirements. You can either fine-tune one of our pre-configured detection rules or build your own detector from scratch using our intuitive API. Nightfall supports many traditional detector types such as regular expressions, exact data matching, and word list/dictionaries. Check out our dedicated guide on creating custom detectors for more information.
You can start scanning for sensitive data in just a few minutes. Our developer-friendly API and comprehensive documentation make it easy to integrate Firewall for AI into your application. Follow our Quickstart guide at this link for step-by-step instructions on setting up the API, configuring detectors, and making your first API call.
We offer a free tier that allows you to sign up and start using Firewall for AI with zero upfront costs or commitments. This tier provides a generous data scanning capacity and access to all the core features.
We offer enterprise pricing plans for advanced requirements such as higher data volumes, custom rate limits, and dedicated support.
Contact our team at sales@nightfall.ai or via the contact form on our website to discuss your specific needs and get a tailored pricing quote.
Don't hesitate to get in touch with us directly via email or through the contact form on our website.
We host on Wednesdays at 12 pm PT to help answer questions, talk through any ideas, and chat about data security. We would love to see you there!
Remove the annotation for a finding
The UUID of the finding to unannotate
Successful response (even if annotation does not exist)
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
Fetch an annotation by ID
The UUID of the annotation to fetch
Successful response
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
The annotation id
The annotation comment
Whether the annotation applies to all findings of this sensitive data
Annotate a finding
The UUID of the finding to annotate
The comment to add to the annotation
Whether the annotation applies to all findings of this sensitive data (defaults to true)
Successful response
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
The annotation id
The annotation comment
Whether the annotation applies to all findings of this sensitive data
Perform an action on a list of violations. If an action can't be performed on a violation, that violation is ignored. Depending on the action, it could be processed immediately or queued.
The UUIDs of the violations to perform the action on
Successful response (processed immediately)
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
violation UUIDs that were processed
Fetch a list of violations based on some filters
Unix timestamp in seconds, filters records created ≥ the value, defaults to -90 days UTC
Unix timestamp in seconds, filters records created < the value, defaults to end of the current day UTC
Unix timestamp in seconds, filters records updated > the value
The maximum number of records to be returned in the response
Cursor for getting the next page of results
Sort key and direction, defaults to descending order by creation time
The query containing filter clauses
Query structure and terminology
A query clause consists of a field followed by an operator followed by a value:
term | value |
---|---|
clause | user_email:"amy@rocketrides.io" |
field | user_email |
operator | : |
value | amy@rocketrides.io |
You can combine multiple query clauses in a search by separating them with a space.
Field types, substring matching, and numeric comparators
Every search field supports exact matching with a :. Certain fields, such as user_email and user_name, support substring matching.
Quotes
You may use quotation marks around string values. Quotation marks are required if the value contains spaces. For example:
user_email:john@example.com
user_name:"John Doe"
Special Characters
+ - && || ! ( ) { } [ ] ^ " ~ * ? : are special characters that need to be escaped using \. For example:
(1+1):2 should be searched for using \(1\+1)\:2
Search Syntax
The following table lists the syntax that you can use to construct a query.
SYNTAX | USAGE | DESCRIPTION | EXAMPLES |
---|---|---|---|
: | field:value | Exact match operator (case insensitive) | state:"pending" returns records where the state is exactly "PENDING" in a case-insensitive comparison |
(space) | field1:value1 field2:value2 | The query returns only records that match both clauses | state:active slack.channel_name:general |
OR | field:(value1 OR value2) | The query returns records that match either of the values (case insensitive) | state:(active OR pending) |
Query Fields
param | description |
---|---|
state | the violation states to filter on |
user_email | the emails of users updating the resource resulting in the violation |
user_name | the usernames of users updating the resource resulting in the violation |
integration_name | the integration to filter on |
confidence | one or more likelihoods/confidences |
policy_id | one or more policy IDs |
detection_rule_id | one or more detection rule IDs |
detector_id | one or more detector IDs |
risk_label | the risk label to filter on |
risk_source | the risk determination source to filter on |
slack.channel_name | the slack channel names to filter on |
slack.channel_id | the slack channel IDs to filter on |
slack.workspace | the slack workspaces to filter on |
confluence.parent_page_name | the names of the parent pages in confluence to filter on |
confluence.space_name | the names of the spaces in confluence to filter on |
gdrive.drive | the drive names in gdrive to filter on |
jira.project_name | the jira project names to filter on |
jira.ticket_number | the jira ticket numbers to filter on |
salesforce.org_name | the salesforce organization names to filter on |
salesforce.object | the salesforce object names to filter on |
salesforce.record_id | the salesforce record IDs to filter on |
github.author_email | the github author emails to filter on |
github.branch | the github branches to filter on |
github.commit | the github commit ids to filter on |
github.org | the github organizations to filter on |
github.repository | the github repositories to filter on |
github.repository_owner | the github repository owners to filter on |
teams.team_name | the m365 teams team names to filter on |
teams.channel_name | the m365 teams channels to filter on |
teams.channel_type | the m365 teams channel types to filter on |
teams.team_sensitivity | the m365 teams sensitivities to filter on |
teams.sender | the m365 teams senders to filter on |
teams.msg_importance | the m365 teams importance to filter on |
teams.msg_attachment | the m365 teams attachment names to filter on |
teams.chat_id | the m365 teams chat ID to filter on |
teams.chat_type | the m365 teams chat type to filter on |
teams.chat_topic | the m365 teams chat topic to filter on |
teams.chat_participant | the m365 teams chat participant's display name to filter on |
onedrive.drive_owner | drive owner's display name to filter on |
onedrive.drive_owner_email | drive owner's email to filter on |
onedrive.file_name | the file name to filter on |
onedrive.created_by | the display name of the m365 user who created the file in the drive, to filter on |
onedrive.created_by_email | the email of the m365 user who created the file in the drive, to filter on |
onedrive.modified_by | the display name of the m365 user who last modified the file in the drive, to filter on |
onedrive.modified_by_email | the email of the m365 user who last modified the file in the drive, to filter on |
zendesk.ticket_status | the zendesk ticket status to filter on |
zendesk.ticket_title | the zendesk ticket titles to filter on |
zendesk.ticket_group_assignee | the zendesk ticket assignee groups to filter on |
zendesk.current_user_role | the zendesk ticket current assignee user's roles to filter on |
notion.created_by | the names of the users creating a resource in notion to filter on |
notion.last_edited_by | the names of the users editing a resource in notion to filter on |
notion.page_title | the page names in notion to filter on |
notion.workspace_name | the workspace names in notion to filter on |
gmail.user_name | the names of the sender to filter on |
gmail.from | the email of sender to filter on |
gmail.to | the email or name of recipients to filter on |
gmail.cc | the email or name of cc to filter on |
gmail.bcc | the email or name of bcc to filter on |
gmail.thread_id | the thread id of email to filter on |
gmail.subject | the subject of email to filter on |
gmail.attachment_name | the name of attachment to filter on |
gmail.attachment_type | the type of attachment to filter on |
Successful response
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
The violation id
Unix timestamp when the violation was created
Unix timestamp when the violation was updated
Possible actions for the violation
The link to the resource on the integration
The channel name in case of a message in a channel
Type of location
User name
ID - user
Link to message
Members for the location
Count of members for the location
ID - channel
Name of workspace
Name of item
Type of item
Archived status
Unix timestamp
Unix timestamp
List of labels
Name of space
Key of space
Link of space
Parent page
Name of author
Email of author
Link of author name
Link to resource
ID - Confluence internal
ID - Confluence user
Version of item
ID - parent page
Version of parent page
ID of file
The name of the file
Type of file
File size
Link to file
Permissions
User list shared with - external
User list shared with - internal
Available for viewers to download
File owner
In trash
Unix timestamp, when the file was created
Unix timestamp, when the file was updated
Drive name
Updated by user
Name of project
Ticket number
Type of project
ID for the issue
Link to project
Link to ticket
Link to comment
Link to attachment
Branch on which violation occurred
Name of the organization or username in case of an individual account
Name of the repository
Email of the user who pushed the changes to GitHub
Username of the user who pushed the changes to GitHub
Unix timestamp
Boolean to check if the repo is private or public
Path of the file on which violation occurred
Permalink to the version of the file where sensitive content was identified
Owner of the repository
Link to the repository
Name of the Salesforce organization
ID of the record
Name of the object
Attachment or Object
ID of the user
Salesforce username of the author
Unix timestamp when the object was last updated
Fields of the Object
File Type
Link to the attachment
Name of the attachment
Link to the object
Status of the ticket
Title of the ticket
Ticket requested by
Group the ticket is assigned to
Agent the ticket is assigned to
User role
ID of the ticket
Followers of the ticket
Tags for the ticket
Unix timestamp
Unix timestamp
Location
Sub-location
ID - ticket comment
ID - ticket group
Link to the ticket group
ID - ticket agent
Link - ticket agent
Ticket event
Role of the user
Name of the attachment
Link for the attachment
Page creator
Page update by
Workspace name
Link to workspace
ID of the page
Title of the page
Unix timestamp
Unix timestamp
Private page link
Public page link
Externally shared state
ID of the attachment
Page URL where the extension is launched
Specific location on the page
Browser type
Remediation comment from the user
Name of the team containing the channel where the message was sent
ID of the tenant
Domain name of the tenant
ID of the team containing the channel where the message was sent
Visibility of the team containing the channel where the message was sent
Web URL of the team containing the channel where the message was sent
ID of the channel where the message was sent
Name of the channel where the message was sent
Type of the channel where the message was sent
Web URL of the channel where the message was sent
ID of the message
Unix timestamp
Unix timestamp
Sender of the chat message
ID of the user who sent the message
Principal name of the user who sent the message
Attachment details
ID of the attachment present in the message
Name of the attachment present in the message
URL of the attachment present in the message
Importance of the sent message
ID of the chat conversation
Type of the chat conversation (one-on-one, group, meeting)
Topic or subject of the chat conversation
ID of the user participating in the chat conversation
email address of the chat participant
display name of the chat participant
ID of the tenant
Domain name of the tenant
ID of the drive item
Name of the drive item
URL of the drive item
Mime type of the drive item
Size of the drive item in bytes
Path to the drive item relative to the root of the drive
ID of the user who created the drive item
Email of the user who last updated the drive item
ID of the user who last updated the drive item
Name of the user who last updated the drive item
Unix timestamp when the drive item was created
Unix timestamp when the drive item was last updated
Name of the special folder if drive item is inside one
ID of the drive where the drive item is present
Name of user who owns the drive where the drive item is present
Email of user who owns the drive where the drive item is present
ID of user who owns the drive where the drive item is present
Domain of the company where email was sent from
User Name who sent the email
Email of the sender
Recipients of the Email
Recipients mentioned in the CC field of the Email
Recipients mentioned in the BCC field of the Email
Subject of the email
Unix timestamp of when email was sent
ThreadID of the email
Name of the attachment
Type of attachment
The name of the file
The file mime type
The link to the resource on the integration
Policies violated
Detection rules triggered
Detectors triggered
The calculated score of the risk for this violation
Username as on the integration
User email as on the integration, may be empty
Next page cursor, omitted if end of results reached
Fetch a violation by ID
The UUID of the violation to fetch
Successful response
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
The violation id
Unix timestamp when the violation was created
Unix timestamp when the violation was updated
Possible actions for the violation
The link to the resource on the integration
The channel name in case of a message in a channel
Type of location
User name
ID - user
Link to message
Members for the location
Count of members for the location
ID - channel
Name of workspace
Name of item
Type of item
Archived status
Unix timestamp
Unix timestamp
List of labels
Name of space
Key of space
Link of space
Parent page
Name of author
Email of author
Link of author name
Link to resource
ID - Confluence internal
ID - Confluence user
Version of item
ID - parent page
Version of parent page
ID of file
The name of the file
Type of file
File size
Link to file
Permissions
User list shared with - external
User list shared with - internal
Available for viewers to download
File owner
In trash
Unix timestamp, when the file was created
Unix timestamp, when the file was updated
Drive name
Updated by user
Name of project
Ticket number
Type of project
ID for the issue
Link to project
Link to ticket
Link to comment
Link to attachment
Branch on which violation occurred
Name of the organization or username in case of an individual account
Name of the repository
Email of the user who pushed the changes to GitHub
Username of the user who pushed the changes to GitHub
Unix timestamp
Boolean to check if the repo is private or public
Path of the file on which violation occurred
Permalink to the version of the file where sensitive content was identified
Owner of the repository
Link to the repository
Name of the Salesforce organization
ID of the record
Name of the object
Attachment or Object
ID of the user
Salesforce username of the author
Unix timestamp when the object was last updated
Fields of the Object
File Type
Link to the attachment
Name of the attachment
Link to the object
Status of the ticket
Title of the ticket
Ticket requested by
Group the ticket is assigned to
Agent the ticket is assigned to
User role
ID of the ticket
Followers of the ticket
Tags for the ticket
Unix timestamp
Unix timestamp
Location
Sub-location
ID - ticket comment
ID - ticket group
Link to the ticket group
ID - ticket agent
Link - ticket agent
Ticket event
Role of the user
Name of the attachment
Link for the attachment
Page creator
Page update by
Workspace name
Link to workspace
ID of the page
Title of the page
Unix timestamp
Unix timestamp
Private page link
Public page link
Externally shared state
ID of the attachment
Page URL where the extension is launched
Specific location on the page
Browser type
Remediation comment from the user
Name of the team containing the channel where the message was sent
ID of the tenant
Domain name of the tenant
ID of the team containing the channel where the message was sent
Visibility of the team containing the channel where the message was sent
Web URL of the team containing the channel where the message was sent
ID of the channel where the message was sent
Name of the channel where the message was sent
Type of the channel where the message was sent
Web URL of the channel where the message was sent
ID of the message
Unix timestamp
Unix timestamp
Sender of the chat message
ID of the user who sent the message
Principal name of the user who sent the message
Attachment details
ID of the attachment present in the message
Name of the attachment present in the message
URL of the attachment present in the message
Importance of the sent message
ID of the chat conversation
Type of the chat conversation (one-on-one, group, meeting)
Topic or subject of the chat conversation
ID of the user participating in the chat conversation
email address of the chat participant
display name of the chat participant
ID of the tenant
Domain name of the tenant
ID of the drive item
Name of the drive item
URL of the drive item
Mime type of the drive item
Size of the drive item in bytes
Path to the drive item relative to the root of the drive
ID of the user who created the drive item
Email of the user who last updated the drive item
ID of the user who last updated the drive item
Name of the user who last updated the drive item
Unix timestamp when the drive item was created
Unix timestamp when the drive item was last updated
Name of the special folder if drive item is inside one
ID of the drive where the drive item is present
Name of user who owns the drive where the drive item is present
Email of user who owns the drive where the drive item is present
ID of user who owns the drive where the drive item is present
Domain of the company where email was sent from
User Name who sent the email
Email of the sender
Recipients of the Email
Recipients mentioned in the CC field of the Email
Recipients mentioned in the BCC field of the Email
Subject of the email
Unix timestamp of when email was sent
ThreadID of the email
Name of the attachment
Type of attachment
The name of the file
The file mime type
The link to the resource on the integration
Policies violated
Detection rules triggered
Detectors triggered
The calculated score of the risk for this violation
Username as on the integration
User email as on the integration, may be empty
Get findings for a specific violation
The UUID of the violation
Cursor for getting the next page of results
Number of findings to fetch in one page (max 1000)
Successful response
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
The id of the findings
The id of the detector that was triggered
The sub detector id in case the detector uses a combination of detectors
The likelihood of the detection
The redacted sensitive data
Data preceding the sensitive data
Data after the sensitive data
Start point for a range
End point for a range
Start point for a range
End point for a range
Additional details about the key
Metadata/sub-location of the finding in the resource. For example - title or description for a Jira ticket.
The annotation id, if present
Next page cursor, omitted if end of results reached
Fetch a list of violations for a period
Unix timestamp in seconds, filters records created ≥ the value, defaults to -90 days UTC
Unix timestamp in seconds, filters records created < the value, defaults to end of the current day UTC
Unix timestamp in seconds, filters records updated > the value
The maximum number of records to be returned in the response
Cursor for getting the next page of results
Successful response
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
The violation id
Unix timestamp when the violation was created
Unix timestamp when the violation was updated
Possible actions for the violation
The link to the resource on the integration
The channel name in case of a message in a channel
Type of location
User name
ID - user
Link to message
Members for the location
Count of members for the location
ID - channel
Name of workspace
Name of item
Type of item
Archived status
Unix timestamp
Unix timestamp
List of labels
Name of space
Key of space
Link of space
Parent page
Name of author
Email of author
Link of author name
Link to resource
ID - Confluence internal
ID - Confluence user
Version of item
ID - parent page
Version of parent page
ID of file
The name of the file
Type of file
File size
Link to file
Permissions
User list shared with - external
User list shared with - internal
Available for viewers to download
File owner
In trash
Unix timestamp, when the file was created
Unix timestamp, when the file was updated
Drive name
Updated by user
Name of project
Ticket number
Type of project
ID for the issue
Link to project
Link to ticket
Link to comment
Link to attachment
Branch on which violation occurred
Name of the organization or username in case of an individual account
Name of the repository
Email of the user who pushed the changes to GitHub
Username of the user who pushed the changes to GitHub
Unix timestamp
Boolean to check if the repo is private or public
Path of the file on which violation occurred
Permalink to the version of the file where sensitive content was identified
Owner of the repository
Link to the repository
Name of the Salesforce organization
ID of the record
Name of the object
Attachment or Object
ID of the user
Salesforce username of the author
Unix timestamp when the object was last updated
Fields of the Object
File Type
Link to the attachment
Name of the attachment
Link to the object
Status of the ticket
Title of the ticket
Ticket requested by
Group the ticket is assigned to
Agent the ticket is assigned to
User role
ID of the ticket
Followers of the ticket
Tags for the ticket
Unix timestamp
Unix timestamp
Location
Sub-location
ID - ticket comment
ID - ticket group
Link to the ticket group
ID - ticket agent
Link - ticket agent
Ticket event
Role of the user
Name of the attachment
Link for the attachment
Page creator
Page update by
Workspace name
Link to workspace
ID of the page
Title of the page
Unix timestamp
Unix timestamp
Private page link
Public page link
Externally shared state
ID of the attachment
Page URL where the extension is launched
Specific location on the page
Browser type
Remediation comment from the user
Name of the team containing the channel where the message was sent
ID of the tenant
Domain name of the tenant
ID of the team containing the channel where the message was sent
Visibility of the team containing the channel where the message was sent
Web URL of the team containing the channel where the message was sent
ID of the channel where the message was sent
Name of the channel where the message was sent
Type of the channel where the message was sent
Web URL of the channel where the message was sent
ID of the message
Unix timestamp
Unix timestamp
Sender of the chat message
ID of the user who sent the message
Principal name of the user who sent the message
Attachment details
ID of the attachment present in the message
Name of the attachment present in the message
URL of the attachment present in the message
Importance of the sent message
ID of the chat conversation
Type of the chat conversation (one-on-one, group, meeting)
Topic or subject of the chat conversation
ID of the user participating in the chat conversation
email address of the chat participant
display name of the chat participant
ID of the tenant
Domain name of the tenant
ID of the drive item
Name of the drive item
URL of the drive item
Mime type of the drive item
Size of the drive item in bytes
Path to the drive item relative to the root of the drive
ID of the user who created the drive item
Email of the user who last updated the drive item
ID of the user who last updated the drive item
Name of the user who last updated the drive item
Unix timestamp when the drive item was created
Unix timestamp when the drive item was last updated
Name of the special folder if drive item is inside one
ID of the drive where the drive item is present
Name of user who owns the drive where the drive item is present
Email of user who owns the drive where the drive item is present
ID of user who owns the drive where the drive item is present
Domain of the company where email was sent from
User Name who sent the email
Email of the sender
Recipients of the Email
Recipients mentioned in the CC field of the Email
Recipients mentioned in the BCC field of the Email
Subject of the email
Unix timestamp of when email was sent
ThreadID of the email
Name of the attachment
Type of attachment
The name of the file
The file mime type
The link to the resource on the integration
Policies violated
Detection rules triggered
Detectors triggered
The calculated score of the risk for this violation
Username as on the integration
User email as on the integration, may be empty
Next page cursor, omitted if end of results reached
Upload all bytes contained in the request body to the file identified by the ID in the path parameter.
a file ID returned from a previous file creation request
The numeric offset at which the bytes contained in the body should be written. This offset must be a multiple of the chunk size returned when the file upload was created.
The payload bytes to upload; the size of the request body must exactly match the chunkSize that was returned when the file upload was created.
Success
Creates a new file upload session. If this operation returns successfully, the ID returned as part of the response object shall be used to refer to the file in all subsequent upload and scanning operations.
the number of bytes representing the size of the file to-be-uploaded.
Success
a UUID to uniquely identify a particular file upload
the size of the file in bytes
the number of bytes to upload in each chunk upload request
an RFC2045 media type that describes the underlying content type
Validates that all bytes of the file have been uploaded, and that the content type is supported by Nightfall.
a file ID returned from a previous file creation request
Success
a UUID to uniquely identify a particular file upload
the size of the file in bytes
the number of bytes to upload in each chunk upload request
an RFC2045 media type that describes the underlying content type
Triggers a scan of the file identified by the provided fileID. As the underlying file might be arbitrarily large, this scan is conducted asynchronously. Results from the scan are delivered to the webhook URL provided in the request payload.
a file ID returned from a previous file creation request
the UUID of the Detection Policy to be used with this scan. Exactly one of this field or "policy" should be provided.
A list of pre-existing detection rule UUIDs to scan a file against. These UUIDs can be fetched from the Nightfall Dashboard.
A list of inlined detection rule definitions to scan a file against.
An optional name for the detection rule.
Supported values ALL or ANY. Applies a logical "AND" or "OR" (respectively) to the list of detectors to decide when a finding should be surfaced.
A list of detectors the request payload should be scanned against.
The minimum number of findings required in order for this detector to be reported.
The confidence level of a finding.
The UUID of a pre-existing detector to use. If this value is provided, all below fields are ignored.
The display name for this detector's findings in the response.
The type of detector.
The name for a Nightfall detector.
The regex object for the regex detector, context rules, and exclusion rules.
The regex pattern to match on.
The case sensitivity for the regex pattern.
The WordList object for wordList detector and exclusion rules.
A list of words for wordList.
The case sensitivity for words in the wordList. If false, ignore the case of findings.
A list of context rules.
The regex object for the regex detector, context rules, and exclusion rules.
The regex pattern to match on.
The case sensitivity for the regex pattern.
The object containing the length of characters before and after finding to evaluate context.
The number of leading characters to include as context before the finding itself.
The number of trailing characters to include as context after the finding itself.
The object containing the confidence level to adjust findings to.
The confidence level of a finding.
A list of exclusion rules.
The type of match for a pattern.
The type of exclusion rule.
The regex object for the regex detector, context rules, and exclusion rules.
The regex pattern to match on.
The case sensitivity for the regex pattern.
The WordList object for wordList detector and exclusion rules.
A list of words for wordList.
The case sensitivity for words in the wordList. If false, ignore the case of findings.
A config that determines how a finding will be redacted. Must contain exactly one of [maskConfig, infoTypeSubstitutionConfig, substitutionConfig, cryptoConfig].
A config that masks a sensitive finding. e.g. '4242-4242-4242-4242' can be configured to be redacted to '####-####-####-4242'.
The UTF-8 character used to mask a finding. If not provided, we will mask with an asterisk "*". Other examples include "#", "X", "🙅🏽", "🙈", etc.
A list of characters that will not be masked. For example, you could set this field to ["-","@"] to preserve formatting context that is typically present in credit cards or emails (e.g. ****-****-****-**** rather than *******************, or ****@*********** rather than ****************).
A character that will not be masked. e.g. "-"
The number of characters that will be left unmasked. For instance, if you want to mask all but the last 4 digits of a credit card number, set this value to 4 so that the redacted finding would look like ***************4242.
Determines if masking is applied left to right (e.g. leaving a trailing "1984" visible) instead of right to left (e.g. leaving a leading "01/01" visible). By default, this value is false.
A config that substitutes a sensitive finding with the name of the NIGHTFALL_DETECTOR that triggered it. This config is only valid for detectors with detectorType NIGHTFALL_DETECTOR. e.g. '4242-4242-4242-4242' can be configured to be redacted to '[CREDIT_CARD_NUMBER]'.
A config that substitutes a sensitive finding with the configured substitutionPhrase. If no substitutionPhrase is configured, it will substitute the finding with an empty string. For example, 'my cc is 4242-4242-4242-4242' can be configured to be redacted to 'my cc is <oh no!🙈>'
The value that will replace a sensitive finding. e.g. '<oh no!🙈>'
A config that will encrypt a sensitive finding with the provided PEM formatted public key using RSA encryption.
The PEM formatted public key block that will be used to encrypt findings. Currently, only RSA encryption is supported.
Here's an example PEM formatted public key block:
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAydYMwOYUGyBXDgHkzv19
YR/dYQES4kYTMUps39qv/amNDywz4nsBDvCUqUvcN3nEpplHlYGH5ShSeA4G/Fcm
RqynSLVyFPZat/8E7n+EeHsgihFrr8oDWo5UBjCwRinTrC0m11q/5SeNzwVCWkf9
x40u94QBz13dQoa9yPwaZBX5uBzyH86R7yeZHpad2cLq0ltpmJ3j5UfsFilkOb3J
B60TNpNDdfabprot/y30CEnDDOgAXGtV1m0AhQpQjKRnkUs39DntqSbS+i0Ugbyq
zEGNUkeR1WsotXekW4KnbWA7k6S8SfkO27vnTSY5b9g/KKaOdysn5YaWJPfTVT/n
ywIDAQAB
-----END PUBLIC KEY-----
Determines if the response object will contain the un-redacted sensitive finding that was triggered by the scan. Defaults to false.
The scope to run the detector over. Setting any detector to File will cause it to run against the file name.
A configuration object that allows clients to specify where alerts should be delivered when findings are discovered as part of a scan. These alerts are delivered asynchronously to the provided platforms.
Contains the configuration required to allow clients to send asynchronous alerts to a Slack workspace when findings are detected. In order to use this alert destination, you must first authenticate Nightfall to your Slack workspace under the Settings menu on the Nightfall Dashboard. Alerts are only sent if findings are detected.
The name of the Slack conversation to which alerts should be sent. Currently, Nightfall supports sending alerts to public channels, formatted like "#general".
Contains the configuration required to allow clients to send an asynchronous email message when findings are detected. Alerts are only sent if findings are detected.
The email address to which alerts should be sent.
Contains the configuration required to allow clients to send a webhook event to an external URL when a scan completes. Unlike the other alert destinations, an event is always sent to the webhook, even when no findings are detected.
The URL to which alerts should be sent. This URL must (1) use the HTTPS scheme, (2) be able to accept requests made with the POST verb, and (3) respond with a 200 status code upon receipt of the event.
Contains the configuration required to allow clients to send SIEM events to an external URL when a scan completes. Unlike the other alert destinations, an event is always sent to the SIEM endpoint, even when no findings are detected.
The URL to which alerts should be sent. This URL must (1) use the HTTPS scheme, (2) be able to accept requests made with the POST verb, and (3) respond with a 200 status code upon receipt of the event.
Sensitive header key value pairs to include in the SIEM request. Used for adding sensitive content like authentication tokens.
Header key value pairs to include in the SIEM request.
A config that determines how a finding will be redacted. Must contain exactly one of [maskConfig, infoTypeSubstitutionConfig, substitutionConfig, cryptoConfig].
A config that masks a sensitive finding. e.g. '4242-4242-4242-4242' can be configured to be redacted to '####-####-####-4242'.
The UTF-8 character used to mask a finding. If not provided, we will mask with an asterisk "*". Other examples include "#", "X", "🙅🏽", "🙈", etc.
A list of characters that will not be masked. For example, you could set this field to ["-","@"] to preserve formatting context that is typically present in credit cards or emails (e.g. ****-****-****-**** rather than *******************, or ****@*********** rather than ****************).
A character that will not be masked. e.g. "-"
The number of characters that will be left unmasked. For instance, if you want to mask all but the last 4 digits of a credit card number, set this value to 4 so that the redacted finding would look like ***************4242.
Determines if masking is applied left to right (e.g. leaving a trailing "1984" visible) instead of right to left (e.g. leaving a leading "01/01" visible). By default, this value is false.
A config that substitutes a sensitive finding with the name of the NIGHTFALL_DETECTOR that triggered it. This config is only valid for detectors with detectorType NIGHTFALL_DETECTOR. e.g. '4242-4242-4242-4242' can be configured to be redacted to '[CREDIT_CARD_NUMBER]'.
A config that substitutes a sensitive finding with the configured substitutionPhrase. If no substitutionPhrase is configured, it will substitute the finding with an empty string. For example, 'my cc is 4242-4242-4242-4242' can be configured to be redacted to 'my cc is <oh no!🙈>'
The value that will replace a sensitive finding. e.g. '<oh no!🙈>'
A config that will encrypt a sensitive finding with the provided PEM formatted public key using RSA encryption.
The PEM formatted public key block that will be used to encrypt findings. Currently, only RSA encryption is supported.
Here's an example PEM formatted public key block:
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAydYMwOYUGyBXDgHkzv19
YR/dYQES4kYTMUps39qv/amNDywz4nsBDvCUqUvcN3nEpplHlYGH5ShSeA4G/Fcm
RqynSLVyFPZat/8E7n+EeHsgihFrr8oDWo5UBjCwRinTrC0m11q/5SeNzwVCWkf9
x40u94QBz13dQoa9yPwaZBX5uBzyH86R7yeZHpad2cLq0ltpmJ3j5UfsFilkOb3J
B60TNpNDdfabprot/y30CEnDDOgAXGtV1m0AhQpQjKRnkUs39DntqSbS+i0Ugbyq
zEGNUkeR1WsotXekW4KnbWA7k6S8SfkO27vnTSY5b9g/KKaOdysn5YaWJPfTVT/n
ywIDAQAB
-----END PUBLIC KEY-----
Determines if the response object will contain the un-redacted sensitive finding that was triggered by the scan. Defaults to false.
Determines if a redacted version of the file will be returned, if available for the mime type. Currently supported mime types are CSV and TSV. Defaults to false.
A string containing arbitrary metadata. Callers may opt to use this to help identify their input file upon receiving a webhook response. Maximum length 10 KB.
Success
a UUID to uniquely identify a particular file upload
message indicating that file scanning has been initiated
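Putting the request fields above together, here is a hedged sketch that triggers an asynchronous scan of a completed upload. The `/v3/upload/{fileID}/scan` path and the exact JSON casing (`policy`, `detectionRuleUUIDs`, `alertConfig`, `requestMetadata`) are assumptions mirroring the field descriptions; the webhook URL is a placeholder.

```python
import os
import requests

API_KEY = os.environ["NIGHTFALL_API_KEY"]

def scan_uploaded_file(file_id: str, detection_rule_uuid: str) -> dict:
    """Kick off an asynchronous file scan. Findings are delivered to the
    configured alert destinations, not in this response."""
    body = {
        # Field names are assumptions based on the request fields described above.
        "policy": {
            "detectionRuleUUIDs": [detection_rule_uuid],
            "alertConfig": {
                "url": {"address": "https://example.com/nightfall-webhook"}
            },
        },
        "requestMetadata": "invoice-batch-2024-05",
    }
    resp = requests.post(
        f"https://api.nightfall.ai/v3/upload/{file_id}/scan",  # assumed endpoint path
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=body,
    )
    resp.raise_for_status()
    return resp.json()  # contains the file UUID and an initiation message
```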
Provide a list of arbitrary string data, and scan each item with the provided detectors to uncover sensitive information. Returns a list equal in size to the number of provided string payloads. The item at each list index will be a list of all matches for the provided detectors, or an empty list if no occurrences are found.
A list of UUIDs referring to policies to use to scan the request payload. Policies can be built in the Nightfall Dashboard. Maximum 1.
A policy UUID.
A policy can contain a list of pre-configured detection rule UUIDs and/or a list of inline detection rules with which to scan the request payload. At least one list must be non-empty.
A list of UUIDs referring to detection rules to use to scan the request payload. Detection rules can be built in the Nightfall dashboard. Maximum 20.
A detection rule UUID.
A list of inline detection rule definitions to use to scan the request payload. Maximum 20.
An optional name for the detection rule.
Supported values are ALL or ANY. Applies a logical "AND" or "OR" (respectively) to the list of detectors to decide when a finding should be surfaced.
A list of detectors the request payload should be scanned against.
The minimum number of findings required in order for this detector to be reported.
The confidence level of a finding.
The UUID of a pre-existing detector to use. If this value is provided, all below fields are ignored.
The display name for this detector's findings in the response.
The type of detector.
The name for a Nightfall detector.
The regex object for the regex detector, context rules, and exclusion rules.
The regex pattern to match on.
The case sensitivity for the regex pattern.
The WordList object for wordList detector and exclusion rules.
A list of words for wordList.
The case sensitivity for words in the wordList. If false, ignore the case of findings.
A list of context rules.
The regex object for the regex detector, context rules, and exclusion rules.
The regex pattern to match on.
The case sensitivity for the regex pattern.
The object containing the length of characters before and after finding to evaluate context.
The number of leading characters to include as context before the finding itself.
The number of trailing characters to include as context after the finding itself.
The object containing the confidence level to adjust findings to.
The confidence level of a finding.
A list of exclusion rules.
The type of match for a pattern.
The type of exclusion rule.
The regex object for the regex detector, context rules, and exclusion rules.
The regex pattern to match on.
The case sensitivity for the regex pattern.
The WordList object for wordList detector and exclusion rules.
A list of words for wordList.
The case sensitivity for words in the wordList. If false, ignore the case of findings.
A config that determines how a finding will be redacted. Must contain exactly one of [maskConfig, infoTypeSubstitutionConfig, substitutionConfig, cryptoConfig].
A config that masks a sensitive finding. e.g. '4242-4242-4242-4242' can be configured to be redacted to '####-####-####-4242'.
The UTF-8 character used to mask a finding. If not provided, we will mask with an asterisk "*". Other examples include "#", "X", "🙅🏽", "🙈", etc.
A list of characters that will not be masked. For example, you could set this field to ["-","@"] to preserve formatting context that is typically present in credit cards or emails (e.g. ****-****-****-**** rather than *******************, or ****@*********** rather than ****************).
A character that will not be masked. e.g. "-"
The number of characters that will be left unmasked. For instance, if you want to mask all but the last 4 digits of a credit card number, set this value to 4 so that the redacted finding would look like ***************4242.
Determines if masking is applied left to right (e.g. leaving a trailing "1984" visible) instead of right to left (e.g. leaving a leading "01/01" visible). By default, this value is false.
A config that substitutes a sensitive finding with the name of the NIGHTFALL_DETECTOR that triggered it. This config is only valid for detectors with detectorType NIGHTFALL_DETECTOR. e.g. '4242-4242-4242-4242' can be configured to be redacted to '[CREDIT_CARD_NUMBER]'.
A config that substitutes a sensitive finding with the configured substitutionPhrase. If no substitutionPhrase is configured, it will substitute the finding with an empty string. For example, 'my cc is 4242-4242-4242-4242' can be configured to be redacted to 'my cc is <oh no!🙈>'
The value that will replace a sensitive finding. e.g. '<oh no!🙈>'
A config that will encrypt a sensitive finding with the provided PEM formatted public key using RSA encryption.
The PEM formatted public key block that will be used to encrypt findings. Currently, only RSA encryption is supported.
Here's an example PEM formatted public key block:
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAydYMwOYUGyBXDgHkzv19
YR/dYQES4kYTMUps39qv/amNDywz4nsBDvCUqUvcN3nEpplHlYGH5ShSeA4G/Fcm
RqynSLVyFPZat/8E7n+EeHsgihFrr8oDWo5UBjCwRinTrC0m11q/5SeNzwVCWkf9
x40u94QBz13dQoa9yPwaZBX5uBzyH86R7yeZHpad2cLq0ltpmJ3j5UfsFilkOb3J
B60TNpNDdfabprot/y30CEnDDOgAXGtV1m0AhQpQjKRnkUs39DntqSbS+i0Ugbyq
zEGNUkeR1WsotXekW4KnbWA7k6S8SfkO27vnTSY5b9g/KKaOdysn5YaWJPfTVT/n
ywIDAQAB
-----END PUBLIC KEY-----
Determines if the response object will contain the un-redacted sensitive finding that was triggered by the scan. Defaults to false.
The scope to run the detector over. Setting any detector to File will cause it to run against the file name.
The number of bytes to include as before / after context when a finding is returned. Maximum 40.
A config that determines how a finding will be redacted. Must contain exactly one of [maskConfig, infoTypeSubstitutionConfig, substitutionConfig, cryptoConfig].
A config that masks a sensitive finding. e.g. '4242-4242-4242-4242' can be configured to be redacted to '####-####-####-4242'.
The UTF-8 character used to mask a finding. If not provided, we will mask with an asterisk "*". Other examples include "#", "X", "🙅🏽", "🙈", etc.
A list of characters that will not be masked. For example, you could set this field to ["-","@"] to preserve formatting context that is typically present in credit cards or emails (e.g. ****-****-****-**** rather than *******************, or ****@*********** rather than ****************).
A character that will not be masked. e.g. "-"
The number of characters that will be left unmasked. For instance, if you want to mask all but the last 4 digits of a credit card number, set this value to 4 so that the redacted finding would look like ***************4242.
Determines if masking is applied left to right (e.g. leaving a trailing "1984" visible) instead of right to left (e.g. leaving a leading "01/01" visible). By default, this value is false.
A config that substitutes a sensitive finding with the name of the NIGHTFALL_DETECTOR that triggered it. This config is only valid for detectors with detectorType NIGHTFALL_DETECTOR. e.g. '4242-4242-4242-4242' can be configured to be redacted to '[CREDIT_CARD_NUMBER]'.
A config that substitutes a sensitive finding with the configured substitutionPhrase. If no substitutionPhrase is configured, it will substitute the finding with an empty string. For example, 'my cc is 4242-4242-4242-4242' can be configured to be redacted to 'my cc is <oh no!🙈>'
The value that will replace a sensitive finding. e.g. '<oh no!🙈>'
A config that will encrypt a sensitive finding with the provided PEM formatted public key using RSA encryption.
The PEM formatted public key block that will be used to encrypt findings. Currently, only RSA encryption is supported.
Here's an example PEM formatted public key block:
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAydYMwOYUGyBXDgHkzv19
YR/dYQES4kYTMUps39qv/amNDywz4nsBDvCUqUvcN3nEpplHlYGH5ShSeA4G/Fcm
RqynSLVyFPZat/8E7n+EeHsgihFrr8oDWo5UBjCwRinTrC0m11q/5SeNzwVCWkf9
x40u94QBz13dQoa9yPwaZBX5uBzyH86R7yeZHpad2cLq0ltpmJ3j5UfsFilkOb3J
B60TNpNDdfabprot/y30CEnDDOgAXGtV1m0AhQpQjKRnkUs39DntqSbS+i0Ugbyq
zEGNUkeR1WsotXekW4KnbWA7k6S8SfkO27vnTSY5b9g/KKaOdysn5YaWJPfTVT/n
ywIDAQAB
-----END PUBLIC KEY-----
Determines if the response object will contain the un-redacted sensitive finding that was triggered by the scan. Defaults to false.
A configuration object that allows clients to specify where alerts should be delivered when findings are discovered as part of a scan. These alerts are delivered asynchronously to the provided platforms.
Contains the configuration required to allow clients to send asynchronous alerts to a Slack workspace when findings are detected. In order to use this alert destination, you must first authenticate Nightfall to your Slack workspace under the Settings menu on the Nightfall Dashboard. Alerts are only sent if findings are detected.
The name of the Slack conversation to which alerts should be sent. Currently, Nightfall supports sending alerts to public channels, formatted like "#general".
Contains the configuration required to allow clients to send an asynchronous email message when findings are detected. Alerts are only sent if findings are detected.
The email address to which alerts should be sent.
Contains the configuration required to allow clients to send a webhook event to an external URL when a scan completes. Unlike the other alert destinations, an event is always sent to the webhook, even when no findings are detected.
The URL to which alerts should be sent. This URL must (1) use the HTTPS scheme, (2) be able to accept requests made with the POST verb, and (3) respond with a 200 status code upon receipt of the event.
Contains the configuration required to allow clients to send SIEM events to an external URL when a scan completes. Unlike the other alert destinations, an event is always sent to the SIEM endpoint, even when no findings are detected.
The URL to which alerts should be sent. This URL must (1) use the HTTPS scheme, (2) be able to accept requests made with the POST verb, and (3) respond with a 200 status code upon receipt of the event.
Sensitive header key value pairs to include in the SIEM request. Used for adding sensitive content like authentication tokens.
Header key value pairs to include in the SIEM request.
The text sample(s) you wish to scan. This data is passed as a string list, so you may choose to segment your text into multiple items for better granularity. The aggregate size of your text (summed across all items in the list) must not exceed 500 KB for any individual request, and the number of items in that list may not exceed 50,000.
A collection of strings to scan.
Success
A list of all findings that were detected in the request payload. Each item in the list is a list of all findings that occurred at the corresponding list index from the input payload.
The string that triggered a match during the scan.
The redacted version of the finding. This key is omitted if no redactionConfig was configured for the detector that triggered the match.
The sequence of bytes that occurred directly prior to the matched finding. The number of bytes is usually equal to the requested number from the request config, but it could be smaller if the finding occurs near the beginning of the payload. This key is omitted if no context was requested.
The sequence of bytes that occurred directly after the matched finding. The number of bytes is usually equal to the requested number from the request config, but it could be smaller if the finding occurs near the end of the payload. This key is omitted if no context was requested.
Metadata describing the detector that matched the finding.
The display name of the detector that matched the finding.
The UUID of the detector that matched the finding. This UUID can be looked up in the Nightfall dashboard.
Optional metadata describing the subdetector that matched the finding.
The display name of the subdetector that matched the finding.
The UUID of the subdetector that matched the finding. This UUID can be looked up in the Nightfall dashboard.
The confidence level of a finding.
The location of the finding in the corresponding original input payload string.
The index of the fragment's starting byte.
The index of the fragment's ending byte.
The index of the fragment's starting codepoint character.
The index of the fragment's ending codepoint character.
The location of the redacted finding in the corresponding redactedPayload string.
The index of the fragment's starting byte.
The index of the fragment's ending byte.
The index of the fragment's starting codepoint character.
The index of the fragment's ending codepoint character.
A list containing the redacted version of each string in the input payload. If no redactions were applied, the corresponding string will be empty.
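To tie the request and response fields above together, here is a hedged Python sketch that scans two strings with an inline regex detector plus a masking redaction config, then walks the findings. The `/v3/scan` path and Bearer authentication follow Nightfall's v3 API; the exact JSON casing of the detector and finding fields is an assumption based on the descriptions above, and the employee-ID pattern is a made-up illustration.

```python
import os
import requests

API_KEY = os.environ["NIGHTFALL_API_KEY"]

detection_rule = {
    "name": "Employee IDs",  # optional rule name
    "logicalOp": "ANY",
    "detectors": [
        {
            "detectorType": "REGEX",
            "displayName": "Employee ID",
            "regex": {"pattern": r"EMP-\d{6}", "isCaseSensitive": True},
            "minConfidence": "POSSIBLE",
            "minNumFindings": 1,
            "redactionConfig": {
                # Keep the dash and the last two characters unmasked,
                # e.g. 'EMP-123456' should come back roughly as '***-****56'.
                "maskConfig": {
                    "charsToIgnore": ["-"],
                    "numCharsToLeaveUnmasked": 2,
                },
            },
        },
    ],
}

resp = requests.post(
    "https://api.nightfall.ai/v3/scan",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "policy": {"detectionRules": [detection_rule]},
        "payload": ["badge EMP-123456 was issued today", "nothing sensitive here"],
    },
)
resp.raise_for_status()
result = resp.json()

# result["findings"] is parallel to the input payload: the entry at index i is
# the (possibly empty) list of findings for payload[i].
for i, findings in enumerate(result["findings"]):
    for f in findings:
        print(i, f["detector"]["name"], f.get("redactedFinding", f["finding"]))
```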
Update a policy's user scope by defining inclusion/exclusion rules based on user emails. Only gDrive policies are supported; users are separated into internal or external based on the Google domains registered in Nightfall.
The UUID of the policy to update
user emails to add to the inclusion setting; supports both internal and external users
user emails to add to the exclusion setting; supports both internal and external users
user emails to remove from the inclusion setting; supports both internal and external users
user emails to remove from the exclusion setting; supports both internal and external users
Successful response (processed immediately)
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
a list of all included user identifiers (emails or IDs) in the policy
a list of all excluded user identifiers (emails or IDs) in the policy
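The exact endpoint path and JSON field names for this operation are not spelled out above, so the following Python sketch is purely illustrative: the URL, HTTP method, and body keys are hypothetical placeholders that mirror the inclusion/exclusion parameters described in this section.

```python
import os
import requests

API_KEY = os.environ["NIGHTFALL_API_KEY"]

def update_policy_user_scope(policy_uuid: str) -> dict:
    """Illustrative only: add and remove user emails from a gDrive policy's
    inclusion/exclusion settings. Path and body keys are hypothetical."""
    body = {
        # Hypothetical key names; consult the API reference for the real ones.
        "addInclusionEmails": ["alice@example.com"],
        "addExclusionEmails": [],
        "removeInclusionEmails": [],
        "removeExclusionEmails": ["bob@contractor-example.com"],
    }
    resp = requests.patch(
        f"https://api.nightfall.ai/v3/policies/{policy_uuid}/user-scope",  # hypothetical path
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=body,
    )
    resp.raise_for_status()
    return resp.json()  # includes the full inclusion and exclusion lists
```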
Return a list of GitHub repositories that Nightfall has access to. Each repository includes details such as whether it is being monitored and the last time Nightfall scanned it against the applicable policies.
The maximum number of records to be returned in the response
Cursor for getting the next page of results
Successful response
How many remaining requests you can make within the next second before being throttled
How many remaining requests you can make within the next quota period
When the current quota period expires
the list of repositories being scanned
The GitHub repository ID
The name of the repository
Whether the repo is private
The URL of the repository
Unix timestamp of the last scan of any file/commit in the repository. Omitted if the repository has not been scanned yet.
Whether the repository is covered by a policy
GitHub username in case of a personal account and organization name in case of an organization
Next page cursor, omitted if end of results reached
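As with the previous operation, the path below is a hypothetical placeholder; only the `limit` and `cursor` query parameters mirror the parameters described above. The sketch pages through results until the next-page cursor is omitted.

```python
import os
import requests

API_KEY = os.environ["NIGHTFALL_API_KEY"]

def list_github_repositories(limit: int = 50) -> list:
    """Illustrative only: page through the GitHub repositories Nightfall can see."""
    url = "https://api.nightfall.ai/v3/github/repositories"  # hypothetical path
    headers = {"Authorization": f"Bearer {API_KEY}"}
    repos, cursor = [], None
    while True:
        params = {"limit": limit}
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        page = resp.json()
        repos.extend(page.get("repositories", []))
        cursor = page.get("cursor")  # omitted when the end of results is reached
        if not cursor:
            break
    return repos
```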