What file types will Nightfall scan for sensitive data? What are the limitations?
Nightfall scans for sensitive data in a broad range of file types
Nightfall supports scanning for sensitive data within files as well as plaintext. Nightfall parses text from both text-based and non-text-based file types.
Text-based files are those that contain electronic text, for example, text files or Word docs. Non-text-based files are those that do not contain parsable electronic text, for example, images, screenshots, and scanned PDFs. For these non-text-based file types, Nightfall performs machine-learning-based optical character recognition (OCR) to extract the text electronically for classification.
Most Nightfall integrations, such as Nightfall for Slack, support scanning a very broad range of file types up to 0.5 MB in size, except for audio and video files. Supported file types include but are not limited to:
- .txt and other text formats
- .pdf files
- .csv and .tsv files
- .json files
- .xml files
- Code files
- Microsoft Office documents (e.g., Word, Excel, PowerPoint)
- Common image file types (e.g., .png, .jpg, .tiff)
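As a rough illustration of the limits described above, the following sketch performs a local pre-check before a file is handed to an integration. It is not part of Nightfall: the helper name `precheck`, the extension set, and the reading of "0.5 MB" as 500,000 bytes are all assumptions made for the example, and the Office extensions shown are only the common modern ones.

```python
import mimetypes
from pathlib import Path

# Assumption: "0.5 MB" is read here as 500,000 bytes; the article does not
# say whether decimal or binary megabytes are meant.
MAX_BYTES = 500_000

# Illustrative only, based on the list above; the real list is broader.
LISTED_EXTENSIONS = {
    ".txt", ".pdf", ".csv", ".tsv", ".json", ".xml",
    ".docx", ".xlsx", ".pptx", ".png", ".jpg", ".tiff",
}


def precheck(filename: str, size_bytes: int, max_bytes: int = MAX_BYTES) -> str:
    """Return a rough local verdict (hypothetical helper, not a Nightfall API)."""
    # Guess the MIME type from the file extension using the standard library.
    mime, _ = mimetypes.guess_type(filename)
    if mime and mime.split("/")[0] in ("audio", "video"):
        return "skip: audio/video"
    if size_bytes > max_bytes:
        return "skip: too large"
    if Path(filename).suffix.lower() in LISTED_EXTENSIONS:
        return "ok: listed type"
    return "unknown: not in the illustrative list"
```

The MIME guess is based on the file extension only, so it is a heuristic. An "unknown" verdict does not mean the file is unsupported, since the list above is not exhaustive.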
The Text Scanning API of the Nightfall Developer Platform accepts ONLY raw text input up to 0.5 MB in size and does not accept files of any type. The File Scanning API of the Developer Platform, by contrast, scans a much more comprehensive list of file types, which can be found here.
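Because the Text Scanning API takes raw text with a size limit, longer text would need to be split before submission. The sketch below shows one way to do that on UTF-8 character boundaries. It makes no Nightfall API calls; the function name `split_utf8` is hypothetical, and the default limit of 500,000 bytes is an assumed reading of "0.5 MB".

```python
def split_utf8(text: str, max_bytes: int = 500_000) -> list[str]:
    """Split text into pieces whose UTF-8 encoding is at most max_bytes each.

    Hypothetical helper; max_bytes defaults to an assumed reading of "0.5 MB".
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 to fit any UTF-8 character")
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Back up so the cut does not land inside a multi-byte character:
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks
```

Note that a naive split like this can cut a sensitive value (for example, a credit card number) across two chunks, so splitting at line or record boundaries may be preferable where the data allows it.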
Nightfall Radar for GitHub supports all text and code file types up to 0.5 MB in size.
Nightfall’s integrations may also be limited to the set of file types that each third-party cloud application supports.
An incomplete but illustrative list of unsupported MIME types is: