Elasticsearch DLP Tutorial

Elasticsearch is a popular tool for storing, searching, and analyzing all kinds of structured and unstructured data, especially as part of the larger ELK stack. However, as with all data storage tools, there is huge potential for unintentionally leaking sensitive data. By using Elastic's own REST APIs in conjunction with Nightfall AI's Scan API, you can discover, classify, and remediate sensitive data within your Elastic stack.

You can follow along with your own instance or spin up a sample instance with the commands listed below. By default, you will be able to download and interact with sample datasets from the ELK instance at localhost:5601, and your data can be queried from localhost:9200. The "Add sample data" function can be found under the Observability section on the Home page; in this tutorial we reference the "Sample Web Logs" dataset.

docker pull sebp/elk

docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk

You will need a few things to follow along with this tutorial:

  • An Elasticsearch instance with data to query

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment (version 3.7 or later)

  • Python Nightfall SDK

We will need to install the Nightfall SDK and the requests library using pip:

pip install nightfall==1.2.0
pip install requests

We will be using Python and importing the following libraries:

import requests
import json
import os
import csv
import time
from nightfall import Nightfall

We first configure the URL we will communicate with. If you are following along with the Sample Web Logs dataset mentioned at the beginning of this article, you can copy this Elasticsearch URL as-is. If not, your URL will likely take the format http://<hostname>/<index_name>/_search.

elasticsearch_base_url = 'http://localhost:9200/kibana_sample_data_logs/_search'

Next, we instantiate the Nightfall client from the SDK using our API key. We also define the Detection Rule with which we wish to scan our data; the Detection Rule can be pre-made in the Nightfall web app and referenced by its UUID.

nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')

detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

nightfall = Nightfall(nightfall_api_key)

We now construct the payload and headers for our call to Elasticsearch. The payload represents whichever subset of data you wish to query. In this example, we are querying all results from the previous hour.

We then make our call to the Elasticsearch data store and save the resulting response.

elasticsearch_payload = {"query": {"range": {"timestamp": {"gte" : "now-1h"}}}}

elasticsearch_headers = {
  'Content-Type': 'application/json'
}

elasticsearch_response = requests.get(
  url = elasticsearch_base_url,
  headers = elasticsearch_headers,
  # send the range query as a JSON request body
  data = json.dumps(elasticsearch_payload)
)

logs = elasticsearch_response.json()['hits']['hits']

Now we send our Elasticsearch query results to the Nightfall SDK for scanning.

findings, redactions = nightfall.scan_text(
    [json.dumps(l) for l in logs],
    detection_rule_uuids=[detectionRuleUUID]
)

We will create an all_findings object to store Nightfall Scan results. The first row of our all_findings object will constitute our headers, since we will dump this object to a CSV file later.

This example includes the full finding in the output. As the finding itself might be a piece of sensitive data, we recommend using the Redaction feature of the Nightfall API to mask your data. More information can be found in the 'Using Redaction to Mask Findings' section below.

all_findings = []
all_findings.append(
    [
        'index_name', 'log_id', 'detector', 'confidence', 
        'finding_start', 'finding_end', 'finding'
    ]
)

Next we go through our findings from the Nightfall Scan API and match them to the identifying fields from the Elasticsearch index so we can find them and remediate them in situ.

Finding locations here represent where each finding occurs within the log treated as a string; the start and end offsets used below come from the finding's byte range.

# Each top level item in findings corresponds to one log
for log_idx, log_findings in enumerate(findings):
    for finding_idx, finding in enumerate(log_findings):

        index_name = logs[log_idx]['_index']
        log_id = logs[log_idx]['_id']

        row = [
            index_name,
            log_id,
            finding.detector_name,
            finding.confidence.value,
            finding.byte_range.start,
            finding.byte_range.end,
            finding.finding,
        ]
        all_findings.append(row)

Finally, we export our results to a csv so they can be easily reviewed.

if len(all_findings) > 1:
    with open('output_file.csv', 'w') as output_file:
        csv_writer = csv.writer(output_file, delimiter = ',')
        csv_writer.writerows(all_findings)
else:
    print('No sensitive data detected. Hooray!')

That's it! You now have insight into all sensitive data shared inside your Elasticsearch instance within the past hour.

However, in use cases such as this where the data is well structured, it can be more informative to call out which fields contain sensitive data, rather than just the location of the data. While the above script is easy to implement without modifying the queried data, it does not provide insight into those fields.
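As a rough illustration (not part of the original script), you could instead scan each field of a log's _source individually so that every finding can be attributed to a field name. The sketch below reuses the nightfall client and Detection Rule UUID defined earlier and ignores payload size limits for brevity:

# Build (field name, field value) pairs for every log returned by Elasticsearch
source_fields = [
    (field, str(value))
    for log in logs
    for field, value in log['_source'].items()
]

# Scan each field value as its own payload item
field_findings, _ = nightfall.scan_text(
    [value for _, value in source_fields],
    detection_rule_uuids=[detectionRuleUUID]
)

# Findings come back in the same order as the inputs, so zip them back together
for (field, _), item_findings in zip(source_fields, field_findings):
    for finding in item_findings:
        print(f"{field}: {finding.detector_name} ({finding.confidence.value})")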

Using Redaction to Mask Findings

With the Nightfall API, you are also able to redact and mask your Elasticsearch findings. You can add a Redaction Config as part of your Detection Rule. For more information on how to use redaction and its specific options, please refer to the Using Redaction guide.
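For example, a request-level redaction configuration with the Python SDK might look like the sketch below. The RedactionConfig and MaskConfig helpers and the default_redaction_config argument are assumptions based on the SDK; check your SDK version and the Using Redaction guide for the exact options.

from nightfall import RedactionConfig, MaskConfig

# Mask each finding with '*', leaving 4 characters unmasked (a sketch; adjust
# the masking options to your needs)
findings, redactions = nightfall.scan_text(
    [json.dumps(l) for l in logs],
    detection_rule_uuids=[detectionRuleUUID],
    default_redaction_config=RedactionConfig(
        remove_finding=False,
        mask_config=MaskConfig(
            masking_char='*',
            num_chars_to_leave_unmasked=4
        )
    )
)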

Using the File Scanning Endpoint with Elasticsearch

The example above is specific to the Nightfall Text Scanning API. To scan files, we can use a similar process as we did the text scanning endpoint. The process is broken down in the sections below, as the file scanning process is more intensive.

Prerequisites:

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)
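The webhook server itself is outside the scope of this tutorial, but a minimal listener might look like the sketch below (assuming Flask, installed with pip install flask). Nightfall's validation request includes a challenge value that must be echoed back, and a production server should also verify the request signature with your webhook signing key; see the Creating a Webhook Server guide for details.

from flask import Flask, request

app = Flask(__name__)

@app.route('/ingest', methods=['POST'])
def ingest():
    data = request.get_json(silent=True) or {}
    # Echo back the challenge Nightfall sends when validating the webhook URL
    if 'challenge' in data:
        return data['challenge']
    # Otherwise the payload describes asynchronous file scan results
    print("Received file scan results:", data)
    return '', 200

if __name__ == '__main__':
    app.run(port=8000)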

Steps to use the Endpoint

  1. Retrieve data from Elasticsearch

Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our request and retrieve the data we want from Elasticsearch:

elasticsearch_base_url = 'http://localhost:9200/kibana_sample_data_logs/_search'

elasticsearch_payload = {"query": {"range": {"timestamp": {"gte" : "now-1h"}}}}

elasticsearch_headers = {
    'Content-Type': 'application/json'
}

elasticsearch_response = requests.get(
    url = elasticsearch_base_url,
    headers = elasticsearch_headers,
    # send the range query as a JSON request body
    data = json.dumps(elasticsearch_payload)
)

logs = elasticsearch_response.json()['hits']['hits']

Now we write the logs to a .csv file.

filename = "nf_elasticsearch_input-" + str(int(time.time())) + ".csv"

with open(filename, 'w') as output_file:
    csv_writer = csv.writer(output_file, delimiter=',')
    # write each log as a single-column row of serialized JSON
    csv_writer.writerows([[json.dumps(l)] for l in logs])

print("Elasticsearch Data Written to: ", filename)
  2. Begin the file upload process to the Scan API with the .csv file written above, as described in the Uploading and Scanning API Calls guide (a rough sketch using the Python SDK follows these steps).

  3. Once the files have been uploaded, use the scan endpoint to kick off scanning. Note: the scan endpoint requires a webhook server to which it will send the scanning results; a minimal example is sketched in the Prerequisites section above.

  4. The scanning endpoint works asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
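
As a rough end-to-end illustration of steps 2 and 3, the sketch below assumes the Python SDK exposes a scan_file convenience method that wraps the upload and scan calls (check your SDK version), and that WEBHOOK_ENDPOINT holds your own webhook server's public URL:

webhook_url = os.environ.get('WEBHOOK_ENDPOINT')

# scan_file wraps the upload, finish, and scan API calls; findings are
# delivered asynchronously to the webhook URL rather than returned here
scan_id, message = nightfall.scan_file(filename, webhook_url=webhook_url)
print("Started file scan:", scan_id, message)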
