Evaluating Detection
To evaluate the success of your cloud DLP detection, you must first have clarity around your organization's goals. These goals will depend heavily on your organization's unique circumstances and needs.
Here are some guiding questions:
- What level of risk are you comfortable with? Would you like to see all of the possible occurrences of a given sensitive data type, or would you like to alert on only the most risky findings?
- What sensitive data would you like to detect and which are the highest priorities for each SaaS application?
- How much operational load are you willing to take on? The more granular your approach, the more human discretion will be required in the filtering and remediation steps.
- What are the requirements of any compliance regimes you are accountable to?
- What user strategy would you like to implement?
  Some potential answers include:
  - Minimal impact on user experience, with alerting and monitoring visible only to security staff
  - Educating and empowering users about DLP best practices
  - Enforcing DLP through strict controls and automation
Once you have a sense of your goals, we recommend that you optimize your detection through a phased, iterative implementation - start small, and broaden gradually. The biggest pitfall to avoid during implementation is getting flooded by so many low-priority alerts that they become unmanageable, which is typically caused by taking too broad an approach to detection. By taking the time to optimize your detection gradually during the first few weeks, your team will be set up for ongoing success.
We recommend that you select just a few detectors to start out with, in order to optimize detection of the most critical information types. If possible, start with detectors that will naturally have very high accuracy due to the structured nature of the information (e.g. Credit Card Number, US Social Security Number).
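For example, the dashboard concepts above map naturally to a small, declarative starting point. The sketch below is purely illustrative: the detector names mirror Nightfall's built-in Credit Card Number and US Social Security Number detectors, but the dictionary structure is a hypothetical representation of a detection rule, not Nightfall's API schema.

```python
# Illustrative starting point: two structured, high-accuracy detectors.
# Hypothetical config representation only - not Nightfall's actual API schema.
starter_detection_rule = {
    "name": "Starter rule - structured identifiers",
    "detectors": [
        {
            "detector": "CREDIT_CARD_NUMBER",        # built-in detector
            "min_confidence": "LIKELY",              # dashboard "Minimum Confidence"
            "min_num_findings": 1,                   # dashboard "Minimum Number of Findings"
        },
        {
            "detector": "US_SOCIAL_SECURITY_NUMBER",
            "min_confidence": "LIKELY",
            "min_num_findings": 1,
        },
    ],
}

for d in starter_detection_rule["detectors"]:
    print(f'{d["detector"]}: confidence >= {d["min_confidence"]}, '
          f'findings >= {d["min_num_findings"]}')
```

Starting with only these two detectors keeps the alert volume small while you learn how findings surface in your environment.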
Once a few detectors are in place, test some sample data and monitor the results. We suggest you evaluate metrics such as false positive and false negative rates, and assess whether the volume of alerts and results is manageable for your team. You may adjust parameters such as Minimum Confidence and Minimum Number of Findings to optimize the level of alerts.
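As a rough sketch of the bookkeeping this involves, the snippet below compares scan results against hand-labeled sample data and reports false positive and false negative rates. The data structures are hypothetical placeholders; substitute whatever your team uses to track findings.

```python
# Minimal sketch: compare scan findings against hand-labeled sample data.
# "labels" marks which sample snippets truly contain sensitive data;
# "flagged" marks which snippets the detection rule alerted on.
labels  = {"s1": True, "s2": True, "s3": False, "s4": False, "s5": True}
flagged = {"s1": True, "s2": False, "s3": True, "s4": False, "s5": True}

tp = sum(1 for s in labels if labels[s] and flagged[s])
fp = sum(1 for s in labels if not labels[s] and flagged[s])
fn = sum(1 for s in labels if labels[s] and not flagged[s])
tn = sum(1 for s in labels if not labels[s] and not flagged[s])

false_positive_rate = fp / (fp + tn)   # share of clean samples that were flagged
false_negative_rate = fn / (fn + tp)   # share of sensitive samples that were missed
print(f"FP rate: {false_positive_rate:.0%}, FN rate: {false_negative_rate:.0%}")
print(f"Total alerts to review: {tp + fp}")
```

Tracking the total alerts to review alongside the error rates helps you judge whether a given Minimum Confidence or Minimum Number of Findings setting is operationally sustainable for your team.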
Then, continue to iterate and broaden your detection by adding and optimizing a few more detectors. Focus on balancing the volume of alerts against the accuracy you need, and slowly relax or broaden the rules over time.
Many organizations choose to optimize detection using their own sample data. However, Nightfall does offer sample data in our Developer Playground that you may leverage. Please speak with your account manager if you need help accessing this sample data.
It’s important to understand that the rate of true positive data “in the wild” tends to be significantly lower than true positive data in a test environment. That’s because in a test environment, many occurrences of sensitive information will be intentionally planted.
As a result, test environments are typically most useful for optimizing sensitivity, or the true positive rate - detection is optimized toward identifying sensitive information as sensitive. While this helps ensure that occurrences of sensitive information are, in fact, correctly flagged, it also means detection is somewhat skewed toward flagging information as sensitive, thus increasing the potential for false positives (see above on the tradeoff between sensitivity and specificity).
What this means for you is that if you fine-tune a Detection Rule in a test environment, you may over-index toward sensitivity rather than specificity, which can result in a higher degree of false positives in the wild. As a result, we highly recommend evaluating Detection Rules in a production environment rather than a test environment, when possible.
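To make this skew concrete, the back-of-the-envelope calculation below shows how the same detector, with fixed sensitivity and specificity, produces a very different mix of alerts when the share of truly sensitive content drops from a planted test set to realistic production traffic. The rates used are illustrative assumptions, not Nightfall benchmarks.

```python
# Back-of-the-envelope: same detector, different base rates of sensitive content.
# The sensitivity/specificity values below are illustrative assumptions only.
def alert_mix(n_items: int, base_rate: float, sensitivity: float, specificity: float):
    """Return (true-positive alerts, false-positive alerts) for a batch of items."""
    positives = n_items * base_rate
    negatives = n_items - positives
    true_alerts = positives * sensitivity            # sensitive items correctly flagged
    false_alerts = negatives * (1 - specificity)     # clean items incorrectly flagged
    return true_alerts, false_alerts

for label, base_rate in [("planted test set", 0.50), ("production traffic", 0.01)]:
    tp, fp = alert_mix(n_items=10_000, base_rate=base_rate,
                       sensitivity=0.95, specificity=0.98)
    precision = tp / (tp + fp)
    print(f"{label}: {tp:.0f} true alerts, {fp:.0f} false alerts "
          f"({precision:.0%} of alerts are real)")
```

With these illustrative numbers, nearly every alert in the planted test set is a real finding, while only about a third of alerts are real against production traffic - which is why a rule tuned purely in a test environment can feel much noisier once deployed.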
There are a handful of evaluation scenarios we commonly observe that can generate false negatives (misses), especially in a test or sandbox environment. We recommend avoiding these test scenarios, and avoiding overfitting to them, as they are not typically representative of production environments.
| False Negative Scenario | Reason Behind False Negative | Solution |
| --- | --- | --- |
| Inputting a broad range of data types, including uncommon or noisy data types. Not all of these data types are getting flagged. | Your detection rule may be tightly defined. More specifically, your detection rule may not have certain detectors enabled, minimum confidence may be set high to reduce the chance of noise in production, or minimum number of findings may be greater than 1. | Confirm that the relevant detectors are enabled in your detection rule, or limit test data to the data types your rule covers. Lowering minimum confidence or minimum number of findings will surface more findings, but weigh this against the additional noise it may introduce in production. |
| Inputting invalid test data, which is not getting flagged. | Nightfall validates data where possible to reduce noise. Preventing invalid test data from getting flagged is a feature, not a bug, and leads to better accuracy. For example, a random 16-digit number is unlikely to meet the validation requirements of a valid Credit Card Number, so it will not get flagged. | Use valid test data wherever possible. Nightfall can provide valid sample test data and has a sample data generator in the Nightfall Playground. There are generators and validators available online as well (see the Luhn checksum sketch after this table). Lowering the minimum confidence of a detector can relax its validation and formatting requirements; however, this can also introduce more noise in production. |
| Inputting the same file across various file types. Some of the files are getting flagged, but not all. | File types are parsed differently in order to cater to their specific nuances. Thus, the same content saved as different file types may not parse in exactly the same way. If two file types parse differently, their detections may differ as a result. | Focus on file types that are common in your work environment and common for the sensitive data types in question. |
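As noted in the invalid-test-data row above, a random 16-digit number generally will not pass the checksum that real card numbers satisfy. Nightfall's exact validation logic is not public, but payment card numbers use the Luhn checksum, and a quick local check like the sketch below can confirm whether a test value is even plausible before you scan it.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

print(luhn_valid("4242 4242 4242 4242"))  # True  - a well-known valid test card number
print(luhn_valid("1234 5678 9012 3456"))  # False - random digits, unlikely to be flagged
```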
For the reasons mentioned above, optimizing for reducing false negatives (misses) in a test environment has the potential to lead to a higher degree of false positives in production. Thus, we recommend running your detection rules in production to get a comparative baseline on detection quality before investing time in over-optimizing detection in a test environment.