Tips for improving finding accuracy.
Our models learn from diverse data sets, aim for high accuracy, and often perform well. However, customer data sometimes contains new patterns. Think of our models as capable students who excel in some subjects but still have unavoidable gaps in their knowledge. New data may expose gaps.
By reporting these misses, you directly help close knowledge gaps and improve alerting for your company and others.
Reporting misidentified findings
If your company is a member of our ML training program, annotate the false positive in the Violations UI. Your annotated sample will be added to our ML data set and used in upcoming model re-training.
If your company is not a member of the ML training program, please send us anonymized samples that preserve the key pattern.
Reach out to support@nightfall.ai for more information about our ML training program.
If a fix is needed urgently, many cases can be addressed by extending a Nightfall detector. If you need help, please don’t hesitate to ask.
Reporting missed sensitive data.
Please provide us with anonymized samples without changing the key pattern.
If anonymization is impossible, coordinate with us to send the data using a secure tool like Keybase.
With the Nightfall Detection Engine, there are various ways you can improve the accuracy of detection:
Share your false positives with Nightfall. We'll identify the issue and retrain our models. See Reporting misidentified findings above for how to submit samples.
Modify the detector using an exclusion rule recognizing known test or mock values. Read more about exclusion rules in How do I use Exclusion Rules?
Modify the detector using a context rule recognizing common patterns in the text surrounding the sensitive token. Read more about context rules in How do I use Context Rules?
Increase the detector's minimum confidence to “Likely” or “Very Likely”. Read more about detector confidence levels in What do different “Confidence Levels” mean?
Set up the Detection Rule to trigger only when every entity type is found. Common entities such as names, addresses, phone numbers, and emails are found everywhere and are generally not sensitive unless found in combination.
Set up policies appropriate to the scanned data source (e.g., integration, channel, folder, or instance). For example, with Slack Enterprise you can enable or disable detection in private channels and direct messages by channel ID, by workspace ID, or across all private channels and DMs. Reach out to support@nightfall.ai and we will help.
Increase the Detection Rule’s minimum number of findings threshold. For example, when trying to detect phone list sharing, set your detection rule to only trigger alerts when it sees 10 or more person names and phone numbers.
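The combination and threshold tips in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Nightfall's implementation: the `should_alert` function, its parameters, and the detector name strings are hypothetical, and "10 or more person names and phone numbers" is read here as at least 10 of each.

```python
from collections import Counter

def should_alert(findings, required_detectors, min_findings=1):
    """Return True only if every required detector has at least
    `min_findings` findings. `findings` is a list of detector names,
    one entry per finding (a hypothetical shape for this sketch)."""
    counts = Counter(findings)  # missing detectors count as 0
    return all(counts[name] >= min_findings for name in required_detectors)

# A message with two names and one phone number is not a phone list.
few = ["PERSON_NAME", "PERSON_NAME", "PHONE_NUMBER"]
# A shared phone list: 12 names and 12 phone numbers.
many = ["PERSON_NAME"] * 12 + ["PHONE_NUMBER"] * 12

required = ["PERSON_NAME", "PHONE_NUMBER"]
print(should_alert(few, required, min_findings=10))   # False
print(should_alert(many, required, min_findings=10))  # True
```

Requiring every entity type and a minimum count together is what keeps a single name or a lone phone number from raising an alert.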
Learn how Nightfall assigns confidence levels to sensitive data classifications.
Detection results consist of two parts:
A data type detector that was scanned for (either a Nightfall detector or a custom regex detector). For example, a Credit Card Number.
A confidence that the scanned data matches the specific info type. For example, a “Very Likely” match.
Confidence may be any of the following values corresponding to the given intervals:
Likelihood | Probability Threshold | Interpretation
POSSIBLE (“Possible”) | 40%+ | “It is possible that the data matches the info type.”
LIKELY (“Likely”) | 60%+ | “It is likely that the data matches the info type.”
VERY_LIKELY (“Very Likely”) | 80%+ | “It is very likely that the data matches the info type.”
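As a rough illustration of the thresholds listed above, the mapping from a match probability to a confidence label could look like the sketch below. The function name `confidence_label` is hypothetical, and returning `None` below 40% is an assumption, since no label is listed for that range here.

```python
def confidence_label(probability):
    """Map a match probability in [0, 1] to a confidence label,
    using the 40% / 60% / 80% thresholds listed above."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.8:
        return "VERY_LIKELY"
    if probability >= 0.6:
        return "LIKELY"
    if probability >= 0.4:
        return "POSSIBLE"
    return None  # assumption: no label is listed below the 40% threshold

print(confidence_label(0.85))  # VERY_LIKELY
print(confidence_label(0.45))  # POSSIBLE
```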
Each info type is unique, so the features that yield a “Very Likely” match for one detector will not necessarily yield a “Very Likely” result for another. Nonetheless, we can provide some high-level guidance. Generally speaking, higher-confidence detections have certain features that increase confidence that the detection is accurate. Some examples of these features include:
Formatting of token
Passes validation functions
Passes substring checks
Context clues that are indicative of that info type
You can learn more about these specific features and how detection works in How does detection work?
Minimum confidence may be specified as part of any condition in the Nightfall Detection Engine. In the Nightfall scan API and Atlassian integrations, the confidence is returned to the end-user as part of the results. In Nightfall’s Slack and GitHub integrations, the confidence is not returned with the results at this time. You can tune the accuracy of the results by adjusting the confidence thresholds in the Nightfall dashboard.
For pre-built detectors, a “Possible” confidence level is triggered by the appearance of the token, without considering context, whereas “Likely” and “Very Likely” take context into account. When a custom regex is detected, its Confidence Level is assessed as “Likely” - you may determine how the Confidence Level adjusts from there based on context.
Of course, there is a tradeoff - a lower Confidence Level may result in more noise. We highly recommend setting the Minimum Confidence of every detector to Likely or Very Likely in order to reduce noise and focus your DLP efforts on priority violations. Setting your detectors to Possible or below will lead to many more findings and is best suited for scenarios in which risk tolerance is very low, or for special / advanced use cases that involve optimizing for reducing false negatives.
When setting confidence thresholds, also consider how structured the data tends to be. For example, a Social Security Number or Credit Card Number has a very typical structure and false positives may be less likely - so you could decrease the confidence threshold in order to implement a very conservative policy. On the other hand, less structured data such as Names could result in more false positives, and thus you may want to increase the confidence threshold.
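To make the tradeoff concrete, here is a minimal sketch of filtering findings by a per-detector minimum confidence. Everything in it is hypothetical (the `filter_findings` helper, the detector names, and the shape of a finding); in practice these thresholds are set in the Nightfall dashboard rather than in code.

```python
# Ordered from lowest to highest confidence.
LEVELS = ["POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Hypothetical per-detector minimums: a highly structured type can use a
# lower bar, while a loosely structured type such as names gets a higher one.
MIN_CONFIDENCE = {
    "CREDIT_CARD_NUMBER": "POSSIBLE",
    "PERSON_NAME": "VERY_LIKELY",
}

def meets_minimum(confidence, minimum):
    """True if `confidence` is at or above `minimum`."""
    return LEVELS.index(confidence) >= LEVELS.index(minimum)

def filter_findings(findings, default_minimum="LIKELY"):
    """Keep only findings at or above their detector's minimum confidence."""
    kept = []
    for finding in findings:
        minimum = MIN_CONFIDENCE.get(finding["detector"], default_minimum)
        if meets_minimum(finding["confidence"], minimum):
            kept.append(finding)
    return kept

sample = [
    {"detector": "CREDIT_CARD_NUMBER", "confidence": "POSSIBLE"},
    {"detector": "PERSON_NAME", "confidence": "LIKELY"},
    {"detector": "API_KEY", "confidence": "LIKELY"},
]
print([f["detector"] for f in filter_findings(sample)])
# ['CREDIT_CARD_NUMBER', 'API_KEY']
```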
Nightfall scans for sensitive data in a broad range of file types
Nightfall supports scanning most file types, except for audio and video files. Image files are scanned using optical character recognition (OCR). Embedded files, images, and non-visible URLs are included in file scans. The maximum file size is 1 GB. Contact us at support@nightfall.ai if you need us to raise this limit.
Supported file types:
Microsoft Office documents (Word, Excel, PowerPoint, etc.)
Google Workspace documents (Doc, Sheet, Slide, etc.)
Image file types (PNG, JPG, TIFF, GIF, etc.)
PDF files
All text-based document files (HTML, TXT, etc.)
All text-based code and config file types
All text-based data file types (CSV, TSV, JSON, XML, parquet, etc.)
Compressed file types (zip, gzip, etc.) are decompressed, and the contained files are scanned.
Unsupported file types (audio, video, and animation):
audio/midi
audio/mpeg
audio/wav
audio/x-midi
video/mp4
video/mpeg
application/photoshop
video/quicktime
Autodesk 3d animation files (ma, fbx)
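If you want to pre-filter files before sending them for scanning, a client-side check along these lines may help. It is a sketch under assumptions: the `can_scan` helper is hypothetical, the 1 GB limit is read as 10**9 bytes, and the MIME type is taken as already known rather than detected.

```python
import os

# MIME types and extensions taken from the unsupported list above.
UNSUPPORTED_MIME_TYPES = {
    "audio/midi", "audio/mpeg", "audio/wav", "audio/x-midi",
    "video/mp4", "video/mpeg", "video/quicktime", "application/photoshop",
}
UNSUPPORTED_EXTENSIONS = {".ma", ".fbx"}  # Autodesk 3D animation files
MAX_FILE_SIZE_BYTES = 1_000_000_000  # assumption: 1 GB read as 10**9 bytes

def can_scan(filename, mime_type, size_bytes):
    """Return True if the file passes the type and size checks above."""
    if size_bytes > MAX_FILE_SIZE_BYTES:
        return False
    if mime_type.lower() in UNSUPPORTED_MIME_TYPES:
        return False
    extension = os.path.splitext(filename)[1].lower()
    return extension not in UNSUPPORTED_EXTENSIONS

print(can_scan("report.pdf", "application/pdf", 250_000))  # True
print(can_scan("standup.mp4", "video/mp4", 250_000))       # False
```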
Select from an aggregated library of regexes
Yes! While Nightfall's pre-built detectors listed above are trained via machine learning, Nightfall also supports RE2 regexes and word lists for any custom detectors that may be of interest to you. Over time, we've aggregated the following Regex Library, which you're welcome to select from to save you some time.
Please note that a regular expression is an established yet limited method that searches for pre-defined patterns, so your mileage may vary.
You can test regular expressions here.
You can input custom detectors directly in the Nightfall console at app.nightfall.ai by navigating to Detectors → New Detector → Regular expression.
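Before saving a regex as a custom detector, it can help to try it against a few sample strings locally. The sketch below uses Python's `re` module, which is not RE2, so the pattern deliberately sticks to syntax both engines share (RE2 does not support lookarounds or backreferences). The employee ID format is a made-up example.

```python
import re

# Hypothetical custom detector: employee IDs such as "EMP-004217".
# Only constructs common to RE2 and Python's `re` are used here.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

def find_employee_ids(text):
    """Return every substring that matches the employee ID pattern."""
    return EMPLOYEE_ID.findall(text)

print(find_employee_ids("Badge EMP-004217 replaced EMP-12."))
# ['EMP-004217']
```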
Learn about typical cases of Nightfall
Social security numbers (SSNs) are a common data type we scan for. You may notice 9-digit numbers in your data that resemble SSNs. If you lower your detection rule confidence to possible, these numbers will appear in your dashboard.
However, over 90% of 9-digit numbers are typically false positives that don't require action. Numbers formatted like SSNs are often used as internal IDs - for tickets, service calls, events, dispatches, transactions, etc. They can also be valid driver's licenses or bank account numbers.
With more context, our models can better determine if a 9-digit number is likely an SSN. For example, when a number appears in a sentence or tabular data with a descriptive header, our confidence in predicting it as an SSN increases to likely or very likely.
Providing contextual cues helps our models accurately identify SSNs and avoid false positives. We continue improving our algorithms to balance detection with precision.
The same thinking applies to other alpha-numeric info types like credit card numbers and passport numbers.
Please reach out if you have any further questions!
Customize finding confidence to suit your business logic
Context Rules are a way to tune detectors and increase their sensitivity by up-weighting or down-weighting detection of a sensitive finding based on its surrounding context. Contrary to Exclusion Rules, which will disqualify “tokens” or items from detection, Context Rules are a way to increase or decrease the Confidence Level in your detections if there are certain types of tokens, also known as “hot words”, that surround the sensitive token.
For example, let’s say you are detecting the Social Security Number “489-36-8350”. If you see “MyCo Test SSN is 489-36-8350”, you may want to lower the Confidence Level of this alert because you know this is a test or invalid SSN. Alternatively, if you see “MyCo Customer SSN: 489-36-8350”, you may want to raise the Confidence Level because you know that at your company SSNs are real when they are formatted in this way. In this way, Context Rules help adapt detectors to the specific business context relevant to you.
Context Rules are based on a regular expression (a known pattern to match against, also known as a “regex”) that you can create. They follow RE2 syntax listed here. You can test your regular expression here.
Context Rules can be added to any detector (either pre-made by Nightfall or to your own custom regex). You can control how many characters surrounding a finding define its context, and adjust how the original confidence level is affected by this context.
Here’s how you can build your own Context Rule and attach it to a detector:
Enter a regex pattern for your context rule. If you wish the pattern to be case sensitive, check the box to the right of the input window.
Define the window to be considered as context by specifying the number of characters away to look for your regex pattern and whether to consider context before the discovered token, after it, or both.
Define the confidence adjustment for your context rule from the dropdown menu. If your context rule is triggered, the finding will automatically be adjusted to your selected confidence level.
Add your Context Rule to the detector by clicking the "Save" button in the lower right.
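The steps above (a regex, a character window before and/or after the token, and a confidence adjustment) can be sketched as follows. This is an illustration of the idea only, not how Nightfall evaluates Context Rules: `apply_context_rule` and all of its parameters are hypothetical names, and Python's `re` stands in for RE2.

```python
import re

def apply_context_rule(text, start, end, confidence, pattern,
                       adjusted_confidence, window=30,
                       before=True, after=True, case_sensitive=False):
    """Return `adjusted_confidence` if `pattern` appears within `window`
    characters before and/or after the token at text[start:end];
    otherwise return the original `confidence`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pieces = []
    if before:
        pieces.append(text[max(0, start - window):start])
    if after:
        pieces.append(text[end:end + window])
    if any(re.search(pattern, piece, flags) for piece in pieces):
        return adjusted_confidence
    return confidence

token = "489-36-8350"
examples = [
    ("MyCo Test SSN is 489-36-8350", r"test ssn", "POSSIBLE"),
    ("MyCo Customer SSN: 489-36-8350", r"customer ssn", "VERY_LIKELY"),
    ("Ticket 489-36-8350 closed", r"customer ssn", "VERY_LIKELY"),
]
for text, pattern, adjusted in examples:
    start = text.index(token)
    print(apply_context_rule(text, start, start + len(token),
                             "LIKELY", pattern, adjusted))
# prints POSSIBLE, then VERY_LIKELY, then LIKELY
```

The third example has no matching hot words in its window, so the original confidence is left unchanged.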
Please find links to our Detection Platform FAQs below:
Exclude certain tokens from detection to suit your business logic
Exclusion Rules, also known as an allowlist, help you reduce false positives in your sensitive findings by ignoring content from detection, or “allowing” it to pass through without being flagged.
For example, let’s say you are using the Email Address detector and you don’t want any addresses on your corporate domain “@example.com” to be detected by Nightfall. You can add “.*@example\.com” as a regex Exclusion Rule.
You can add in a list of known safe tokens as a dictionary or craft a regular expression. A dictionary is a list of literal values, for example, a list of dummy credit card numbers or API keys. Regular expressions are known patterns to match against, for example if you want to allow all emails of a given domain as in the example above. Regular expressions follow RE2 syntax listed here. You can test your regular expression here.
Exclusion Rules can be added to any detector (either pre-made by Nightfall or built on your own custom regular expression, or “regex”) to keep certain “tokens” or items from resulting in detection events.
Here’s how you can build your own Exclusion Rule and attach it to a detector:
Select the type of Exclusion Rule you would like to use. It can be either a dictionary of words (which may be entered manually or uploaded from a list) or a regex pattern.
Select the match type from the dropdown menu on the right, either Partial or Full match.
Append your Exclusion Rule to your custom detector by clicking the "Save" button in the lower right.
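The two rule types and two match types described above can be illustrated with a short sketch. The `is_excluded` helper is hypothetical, and the reading of Partial match as "the rule matches some part of the token" is an assumption; Python's `re` stands in for RE2 here.

```python
import re

def is_excluded(token, dictionary=(), pattern=None, full_match=True):
    """Return True if `token` should be ignored.
    Full match: the token equals a dictionary entry or the regex matches
    the whole token. Partial match (assumed meaning): a dictionary entry
    or the regex matches anywhere inside the token."""
    if full_match:
        if token in dictionary:
            return True
        return pattern is not None and re.fullmatch(pattern, token) is not None
    if any(word in token for word in dictionary):
        return True
    return pattern is not None and re.search(pattern, token) is not None

# Regex rule: allow every address on the corporate domain.
corporate = r".*@example\.com"
print(is_excluded("alice@example.com", pattern=corporate))  # True
print(is_excluded("bob@other.org", pattern=corporate))      # False

# Dictionary rule: a list of known dummy values.
dummy_values = {"4242424242424242"}
print(is_excluded("4242424242424242", dictionary=dummy_values))  # True
```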
Zoom Password Findings
Previously, the Nightfall Password detector reported Zoom passwords as sensitive data findings. The detector now recognizes passwords found in personal Zoom links, so customers sharing these links in docs and messages will no longer be alerted for these non-sensitive findings.
The detector can now identify patterns such as:
Zoom meeting invites containing embedded PWD parameters and separate alpha-numeric passcodes.
Zoom recording URLs with associated passcodes.
Examples:
Multi-line meeting invite with embedded Zoom password and passcode
./j?pwd=VBhZM3h5SEhQTXEzQzEyYUREWkJHQT09
Meeting ID: 991 1959 8432
Passcode: 475749
Recording URL with passcode
./share/zuBQvxkbLce3PRl6eYSBXN0Uj47mE9grL_6G4J1qeFcaWCqB94hdgC5hb4B3R3SF.VxapNL9oXfGuiokO
Passcode: BH%$!.06
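For readers who want to see what recognizing these patterns might involve, here is a rough regex-based sketch. It is not the Password detector's actual logic: the patterns and the `zoom_passwords` helper are hypothetical, and a passcode line is only treated as Zoom-related when the same text also contains a `pwd=` parameter or a `/share/` path.

```python
import re

# Hypothetical patterns, written only for this illustration.
PWD_PARAM = re.compile(r"[?&]pwd=([A-Za-z0-9]+)")
RECORDING_PATH = re.compile(r"/share/\S+")
PASSCODE_LINE = re.compile(r"^passcode:[ \t]*(\S+)[ \t]*$",
                           re.IGNORECASE | re.MULTILINE)

def zoom_passwords(text):
    """Return password-like tokens that appear to belong to a Zoom link."""
    tokens = PWD_PARAM.findall(text)
    has_zoom_link = bool(tokens) or RECORDING_PATH.search(text) is not None
    if has_zoom_link:
        tokens += PASSCODE_LINE.findall(text)
    return tokens

invite = (
    "./j?pwd=VBhZM3h5SEhQTXEzQzEyYUREWkJHQT09\n"
    "Meeting ID: 991 1959 8432\n"
    "Passcode: 475749"
)
print(zoom_passwords(invite))
# ['VBhZM3h5SEhQTXEzQzEyYUREWkJHQT09', '475749']
```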