Amazon S3 DLP Tutorial

Last updated 10 months ago

AWS S3 is a popular tool for storing your data in the cloud. However, it also has huge potential for unintentionally leaking sensitive data. By using AWS SDKs in conjunction with Nightfall’s Scan API, you can discover, classify, and remediate sensitive data within your S3 buckets.

Prerequisites

You will need the following for this tutorial:

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment

  • The most recent version of the Nightfall Python SDK

  • AWS credentials

We will use boto3 as our AWS client in this demo. If you are using another language, check AWS's documentation for its recommended SDKs.

Installation

To install boto3 and the Nightfall SDK, run the following commands.

pip install boto3
pip install nightfall==1.2.0

Implementation

In addition to boto3, we will be utilizing the following Python libraries to interact with the Nightfall SDK and to process the data.

import boto3
import requests
import json
import csv
import os
from nightfall import Nightfall

We've configured our AWS credentials, as well as our Nightfall API key, as environment variables so they don't need to be committed directly into our code.

aws_session_token = os.environ.get('AWS_SESSION_TOKEN')
aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')

nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')
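
Because os.environ.get returns None for an unset variable, a missing credential would only surface later as an authentication error. One possible safeguard is to check the variables up front. The sketch below uses a hypothetical require_env helper that is not part of boto3 or the Nightfall SDK.

```python
import os


def require_env(names, environ=None):
    """Return a dict of the named variables, raising if any is unset or empty.

    Hypothetical helper for illustration. `environ` defaults to os.environ
    and can be replaced with a plain dict when testing.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

For example, require_env(['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'NIGHTFALL_API_KEY']) fails immediately if any of the three is unset. AWS_SESSION_TOKEN is only needed with temporary credentials, so you may prefer to leave it out of the required list.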

Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID. We also read our API key from the environment and use it to instantiate a Nightfall client from the SDK.

detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])

Now we create an iterable of scannable objects in our target S3 buckets and specify a maximum file size to pass to the Nightfall API. The API accepts up to 500 KB, so we use a limit of 475,000 bytes, just under that. In practice, you could add additional code to chunk larger files across multiple API requests.

We will also create an all_findings object to store Nightfall Scan results. The first row of our all_findings object will constitute our headers, since we will dump this object to a CSV file later.

The example below stores the full text of each finding. Because a finding may itself be a piece of sensitive data, we recommend using the Redaction feature of the Nightfall API to mask your data.

objects_to_scan = []
size_limit = 475000

all_findings = []
all_findings.append(
  [
    'bucket', 'object', 'detector', 'confidence', 
    'finding_byte_start', 'finding_byte_end',
    'finding_codepoint_start', 'finding_codepoint_end', 'fragment'
  ]
)
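
The chunking mentioned above could look something like the following sketch. It is an illustration only: chunk_text is a hypothetical helper, not part of the Nightfall SDK. It splits on line boundaries so that each piece stays under a byte budget, which also means a finding that straddles two chunks could be missed and any offsets returned by a scan would be relative to the chunk rather than the whole file.

```python
def chunk_text(text, max_bytes=475000):
    """Split text into pieces whose UTF-8 encoding is at most max_bytes.

    Hypothetical helper (not part of the Nightfall SDK). Splits on line
    boundaries where possible; a single line longer than max_bytes is cut
    at character boundaries instead.
    """
    chunks, current, current_size = [], [], 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        if line_size > max_bytes:
            # Flush what we have, then hard-split the oversized line.
            if current:
                chunks.append("".join(current))
                current, current_size = [], 0
            piece, piece_size = [], 0
            for ch in line:
                ch_size = len(ch.encode("utf-8"))
                if piece and piece_size + ch_size > max_bytes:
                    chunks.append("".join(piece))
                    piece, piece_size = [], 0
                piece.append(ch)
                piece_size += ch_size
            if piece:
                chunks.append("".join(piece))
            continue
        if current_size + line_size > max_bytes:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk could then be sent as its own payload, at the cost of one API request per chunk.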

We will now initialize our AWS S3 Session. Once the session is established, we get a handle for the S3 resource.

my_session = boto3.session.Session(
  aws_session_token = aws_session_token,
  aws_access_key_id = aws_access_key_id,
  aws_secret_access_key = aws_secret_access_key
)

s3 = my_session.resource('s3')

Now we go through each bucket and retrieve the scannable objects, adding their text contents to objects_to_scan as we go.

In this tutorial, we assume that all files are text-readable. In practice, you may wish to filter out un-scannable file types such as images with the object.get()['ContentType'] property.

for bucket in s3.buckets.all():
  for obj in bucket.objects.all():
    temp_object = obj.get()
    size = temp_object['ContentLength']

    if size < size_limit:
      objects_to_scan.append((obj, temp_object['Body'].read().decode()))
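
The ContentType filter mentioned above might be sketched as follows. The helper name is_text_scannable and the allow-list of media types are assumptions made for illustration; adjust them to the kinds of files your buckets actually hold.

```python
SCANNABLE_PREFIXES = ("text/",)
SCANNABLE_TYPES = {"application/json", "application/xml", "application/x-yaml"}


def is_text_scannable(content_type):
    """Return True if an S3 ContentType value looks like plain text.

    The allow-list above is an assumption for illustration only.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith(SCANNABLE_PREFIXES) or media_type in SCANNABLE_TYPES
```

Inside the loop, the size check could then become `if size < size_limit and is_text_scannable(temp_object.get('ContentType'))`. Keep in mind that ContentType is set at upload time, so objects uploaded without an explicit type may carry a generic value and be skipped by a filter like this one.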

For each object we find in our S3 buckets, we send its content as a payload to the Nightfall Scan API with our previously configured Detection Rule.

On receiving the response, we break down each returned finding and assign it a new row in the CSV we are constructing.

In this tutorial, each object is scanned in its own API request. At the cost of granularity, you may combine multiple smaller files into a single call to the Nightfall API.

for obj, data in objects_to_scan:
    findings, redactions = nightfall.scan_text(
        [data],
        detection_rule_uuids=[detectionRuleUUID]
    )

    for finding in findings[0]:
        row = [
            obj.bucket_name,
            obj.key,
            finding.detector_name,
            finding.confidence.value,
            finding.byte_range.start,
            finding.byte_range.end,
            finding.codepoint_range.start,
            finding.codepoint_range.end,
            finding.finding,
        ]
        all_findings.append(row)
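
Combining several small objects into one call, as mentioned above, could be approached with a grouping step like the one below. This is a sketch under assumptions: batch_payloads is a hypothetical helper, and it relies on the scan results coming back as one list of findings per input string, which is how findings[0] is used in the loop above. Any per-request limit on the number of payload items is not handled here.

```python
def batch_payloads(items, max_bytes=475000):
    """Group (label, text) pairs into batches whose combined size fits max_bytes.

    Hypothetical helper. Each batch is a list of (label, text) pairs, so the
    i-th entry of a scan result can be matched back to the i-th label.
    Items that exceed max_bytes on their own are skipped.
    """
    batches, current, current_size = [], [], 0
    for label, text in items:
        size = len(text.encode("utf-8"))
        if size > max_bytes:
            continue
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append((label, text))
        current_size += size
    if current:
        batches.append(current)
    return batches
```

With objects_to_scan as the input, each batch would be scanned with `nightfall.scan_text([text for _, text in batch], detection_rule_uuids=[detectionRuleUUID])`, and the findings at index i would belong to the object at `batch[i][0]`.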

Now that we have finished scanning our S3 buckets and collated the results, we are ready to export them to a CSV file for further review.

if len(all_findings) > 1:
    # newline='' lets the csv module control line endings on every platform
    with open('output_file.csv', 'w', newline='') as output_file:
        csv_writer = csv.writer(output_file, delimiter=',')
        csv_writer.writerows(all_findings)
else:
    print('No sensitive data detected. Hooray!')
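
Since the exported CSV contains the raw text of each finding, one option is to mask the fragment column before writing it out. The sketch below does this locally with two hypothetical helpers, mask_finding and mask_rows. It is not the Nightfall Redaction feature, only a simple stand-in that assumes the header layout used in all_findings.

```python
def mask_finding(value, visible=4, mask_char="*"):
    """Mask all but the last `visible` characters of a finding.

    Values no longer than `visible` are masked entirely so that short
    findings are not written out in full.
    """
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def mask_rows(rows, column="fragment"):
    """Return a copy of rows (header row first) with one column masked."""
    header, body = rows[0], rows[1:]
    idx = header.index(column)
    masked = [list(header)]
    for row in body:
        new_row = list(row)
        new_row[idx] = mask_finding(str(new_row[idx]))
        masked.append(new_row)
    return masked
```

Passing mask_rows(all_findings) to csv_writer.writerows instead of all_findings would keep the location of each finding in the report while hiding most of its content.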

That's it! You now have insight into the sensitive data stored inside your organization's AWS S3 buckets.

As a next step, you could attempt to delete or redact your files in which sensitive data has been found by further utilizing boto3.
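
A deletion step along these lines might look like the sketch below. The helper names flagged_objects and delete_flagged are hypothetical, and the code assumes the row layout used in all_findings (header row first, then bucket name and object key in the first two columns). The s3 argument is expected to be a boto3 S3 resource like the one created earlier. Deletion is destructive, so the sketch defaults to a dry run.

```python
def flagged_objects(rows):
    """Return the sorted, unique (bucket, key) pairs from the findings rows.

    Assumes a header row first, then rows whose first two columns are the
    bucket name and the object key.
    """
    return sorted({(row[0], row[1]) for row in rows[1:]})


def delete_flagged(s3, rows, dry_run=True):
    """Delete every flagged object through a boto3 S3 resource.

    With dry_run=True (the default) nothing is deleted; the function only
    prints what it would remove.
    """
    targets = flagged_objects(rows)
    for bucket_name, key in targets:
        if dry_run:
            print(f"Would delete s3://{bucket_name}/{key}")
        else:
            s3.Object(bucket_name, key).delete()
    return targets
```

Running delete_flagged(s3, all_findings) first and reviewing the printed list before switching to dry_run=False is a reasonable precaution.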

Using the File Scanning Endpoint with S3

The example above is specific to the Nightfall Text Scanning API. To scan files, we can follow a process similar to the one we used for the text scanning endpoint. Because file scanning is more involved, the process is broken down in the sections below.

Prerequisites for File Scanning

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning, passed via the Authorization: Bearer header (see Authentication and Security)

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)

File Scan Implementation

Retrieve a File List

The first step is to get a list of the objects (files) in your S3 buckets.

Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now initialize our AWS S3 Session. Once the session is established, we get a handle for the S3 resource.

my_session = boto3.session.Session(
  aws_session_token = aws_session_token,
  aws_access_key_id = aws_access_key_id,
  aws_secret_access_key = aws_secret_access_key
)

s3 = my_session.resource('s3')

Now we go through each bucket and retrieve the scannable objects.

for bucket in s3.buckets.all():
  for obj in bucket.objects.all():
    # here we can call the file-scanning endpoints
    pass

For each object we find in our S3 buckets, we send it as an argument to the Nightfall File Scan API with our previously configured detectors.

Iterate through a list of files and begin the file upload process.

A webhook server is required for the scan endpoint to submit its results. See our example webhook server.

The scanning endpoint will work asynchronously for the files uploaded, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
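
A webhook server should confirm that incoming requests really come from Nightfall before trusting them. The sketch below shows one way such a check could be written with the Python standard library. The message layout ("<timestamp>:<body>" signed with HMAC-SHA256 using your webhook signing key) and the five-minute freshness window are assumptions made for illustration, and is_valid_signature is a hypothetical helper, so confirm the details against the Nightfall webhook documentation before relying on it.

```python
import hashlib
import hmac
import time


def is_valid_signature(signing_secret, timestamp, body, signature,
                       max_age_seconds=300, now=None):
    """Check an HMAC-SHA256 signature over "<timestamp>:<body>".

    The message layout and freshness window are assumptions for
    illustration; verify both against the Nightfall webhook docs.
    """
    now = time.time() if now is None else now
    try:
        age = now - int(timestamp)
    except (TypeError, ValueError):
        return False
    if age < 0 or age > max_age_seconds:
        return False  # reject stale or future-dated requests
    expected = hmac.new(
        signing_secret.encode("utf-8"),
        f"{timestamp}:{body}".encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode("utf-8"), str(signature).encode("utf-8"))
```

In a request handler, the timestamp and signature would be read from the request headers and the body passed in exactly as received, before any JSON parsing.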

Once the files have been uploaded, begin using the scan endpoint.
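
Put together, the upload and scan steps for a single object might be sketched as below. This is an assumption-laden illustration, not a definitive implementation: scan_s3_object is a hypothetical helper, s3 is expected to be a boto3 S3 resource, and nightfall_client is expected to be a Nightfall Python SDK client whose scan_file method accepts a local path plus a policy_uuid keyword and returns an (upload id, message) pair. Check the scan_file signature in your installed SDK version before using it.

```python
import os
import tempfile


def scan_s3_object(s3, nightfall_client, bucket_name, key, policy_uuid):
    """Download one S3 object to a temporary file and submit it for scanning.

    Assumes `s3` is a boto3 S3 resource and that the client exposes
    scan_file(path, policy_uuid=...); verify against your SDK version.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, os.path.basename(key) or "object")
        s3.Bucket(bucket_name).download_file(key, local_path)
        # The scan is asynchronous: findings arrive at the webhook tied to
        # the policy, not in this return value.
        return nightfall_client.scan_file(local_path, policy_uuid=policy_uuid)
```

Inside the bucket loop, a call such as scan_s3_object(s3, nightfall, obj.bucket_name, obj.key, policy_uuid) would submit each object in turn, where policy_uuid is a placeholder for the UUID of your own Detection Policy.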
