How to run a full scan of an Amazon database
To scan an Amazon database instance (i.e. mySQL, Postgres) you must create a snapshot of that instance and export the snapshot to S3.
The export process runs in the background and doesn't affect the performance of your active DB instance. Exporting RDS snapshots can take a while depending on your database type and size
Once the snapshot has been exported you will be able to scan the resulting parquet files with Nightfall like another file. You can do this using our endpoints for uploading files or using our Amazon S3 Python integration.
In addition to having created your RDS instance, you will need to define the following to export your snapshots so they can later be scanned by Nightfall:
To perform this scan, you will need to configure an Amazon S3 bucket to which you will export a snapshot.
đS3 Bucket RequirementsThis bucket must have snapshot permissions and the bucket to export must be in the same AWS Region as the the snapshot being exported.
If you have not already created a designated S3 bucket, in the AWS console select Services > Storage > S3
Click the "Create bucket" button and give your bucket a unique name as per the instructions.
For more information please see Amazon's documentation on identifying an Amazon S3 bucket for export.
You need an Identity and Access Management (IAM) Role to perform the transfer for a snapshot to your S3 bucket.
This role may be defined at the time of backup and it will be given the proper specific permissions.
You may also create the role under Services > Security, Identity, & Compliance > IAM and select âRolesâ from under the âAccess Managementâ section of the left-hand navigation.
From there you can click the âCreate roleâ button and create a role where âAWS Serviceâ is the trusted entity type.
For more information see Identity and Access Management in Amazon RDS and Providing access to an Amazon S3 bucket using an IAM role
You must create a symmetric encryption AWS Key using the Key Management Service (KMS).
From your AWS console, select the Services > Security, Identity, & Compliance > Key Management Service from the adjacent submenu.
From there you can click the âCreate keyâ button and follow the instructions.
To do this task manually, go to Amazon RDS Service (Services > Database > RDS) and select the database to export from your list of databases.
Select the âMaintenance & backupsâ tab. Go to the âSnapshotsâ section.
You can select an existing automated snapshot or manually create a new snapshot with the âTake snapshotâ button
Once the snapshot is complete, click the snapshotâs name.
From the âActionsâ menu in the upper right select âExport to Amazon S3"
Enter a unique export identifier
Choose whether you want to export all or part of your data (You will be exporting to Parquet)
Choose the S3 bucket
Choose or create your designated IAM role for backup
Choose your AWS KMS Key
Click the Export button
Once the Status column of export is "Complete", you can click the link to the export under the S3 bucket column.
Within the export in the S3 bucket, you will find a series of folders corresponding to the different database entities that were exported.
Exported data for specific tables is stored in the format base_prefix/files, where the base prefix is the following:
export_identifier/database_name/schema_name.table_name/
For example:
export-1234567890123-459/rdststdb/rdststdb.DataInsert_7ADB5D19965123A2/
The current convention for file naming is as follows:
partition_index/part-00000-random_uuid.format-based_extension
For example:
You may download these parquet files and upload them to Nightfall to scan as you would any other parquet file.
đObtaining file sizeYou can obtain the value for
fileSizeBytes
you can run the commandwc -c
In the above sequence of curl invocations, we upload the file and then initiate the file scan with a policy that uses pre-configured detection rule as well as an alertConfig that send the results to an email address.
Note that results you receive in this case will be an attachment with a JSON payload as follows:
The findings themselves will be available at the URL specified in findingsURL
until the date-time stamp contained in the validUntil
property.
When parquet files are analyzed, as with other tabular data, not only will the the location of the finding be shown within a given byte range, but also column and row data as well.
Below is a SQL script small table of generated data containing example personal data, including phone numbers and email addresses.
Below is an example finding when a scan is done of the resulting parquet exported to S3 where the Detection Rule use Nightfall's built in Detectors for matching phone numbers and emails. In this example shows a match in the 1st row and and 4th column. This is what we would expect based on our table structure.
similarly, it also finds phone numbers in the 3rd column.
You may also use our tutorial for Integrating with Amazon S3 (Python) to scan through the S3 objects.
For more information please see the Amazon documentation Exporting DB snapshot data to Amazon S3