HubSpot DLP Tutorial

Customer support tickets are a potential vector for leaking customer PII. By utilizing HubSpot's CRM tickets API in conjunction with Nightfall AI’s scan API you can discover, classify, and remediate sensitive data within your customer support system.

You will need a few things to follow along with this tutorial:

A HubSpot account and API key
A Nightfall API key
An existing Nightfall Detection Rule
A Python 3 environment (version 3.6 or later)
The Nightfall Python SDK (this tutorial uses version 0.6.0)

First, we install the required version of the Nightfall SDK:

pip install nightfall==0.6.0

We will be using Python and importing the following libraries:

import requests
import os
import json
import csv
from nightfall import Nightfall

We've configured the HubSpot and Nightfall API keys as environment variables so they don't need to be committed directly into our code.

Next, we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.

We also create a Nightfall client from the SDK using our API key.

hubspot_api_key = os.environ.get('HUBSPOT_API_KEY')
nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')

detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

nightfall = Nightfall(nightfall_api_key)
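
Because os.environ.get() silently returns None for an unset variable, a missing key only surfaces later as a failed API call. A minimal sketch of an up-front check follows; require_env is a hypothetical helper, not part of either SDK.

```python
import os

def require_env(names, environ=os.environ):
    """Return {name: value} for each required variable, or raise if any are missing or empty."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}

# Possible usage, with the variable names read above:
# settings = require_env(['HUBSPOT_API_KEY', 'NIGHTFALL_API_KEY', 'DETECTION_RULE_UUID'])
```

Failing at startup keeps a configuration problem from being mistaken for an API error.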

Here we define the headers and other request parameters that we will use later to call the HubSpot API.

hubspot_headers = {'accept': 'application/json'}
page_limit = 100
hubspot_querystring = {
  "limit": str(page_limit),
  "archived": "false",
  "hapikey": hubspot_api_key
}

hubspot_base_url = "https://api.hubapi.com/crm/v3/objects/tickets"

Let’s start by using the HubSpot API to retrieve all support tickets in our account. Because the HubSpot API returns at most "limit" tickets per request, we query the tickets over multiple requests, checking on each call whether the list is complete. We'll compile the tickets into a list called all_tickets.

hubspot_response = requests.get(
  url = hubspot_base_url, 
  headers = hubspot_headers,
  params = hubspot_querystring
)
response_dict = json.loads(hubspot_response.text)
all_tickets = []

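keep_going = True
while keep_going:
  all_tickets.extend(response_dict['results'])
  # Stop when the response carries no next-page link; checking only the result
  # count would raise a KeyError when the final page is exactly page_limit long
  next_link = response_dict.get('paging', {}).get('next', {}).get('link')
  if len(response_dict['results']) < page_limit or next_link is None:
    keep_going = False
  else:
    new_url = f"{next_link}&hapikey={hubspot_api_key}"
    new_response = requests.get(url = new_url, headers = hubspot_headers)
    response_dict = json.loads(new_response.text)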
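
The paging logic can also be separated from the HTTP call, which makes it easier to test. The sketch below assumes each page is a dict with a 'results' list and, when more pages exist, a 'paging' -> 'next' -> 'link' URL; collect_all_results and fetch_page are hypothetical names.

```python
def collect_all_results(fetch_page, first_url):
    """Follow 'paging' -> 'next' -> 'link' until a page has no next link.

    fetch_page is any callable that takes a URL and returns the decoded JSON dict.
    """
    results = []
    url = first_url
    while url is not None:
        page = fetch_page(url)
        results.extend(page.get('results', []))
        # Assumption: the last page omits the paging block entirely
        url = page.get('paging', {}).get('next', {}).get('link')
    return results
```

With requests, fetch_page could be a small function that performs the GET with the headers and API key defined earlier and returns response.json().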

'Properties' -> 'Content' is the only field where users can supply their own data, so it is the only field we need to pass to the Nightfall API. We store the ticket IDs in a matching list so that we can trace each finding back to its ticket later.

all_ids = [ticket['id'] for ticket in all_tickets]
# A ticket without a body may come back with None for 'content';
# substitute an empty string so both lists stay the same length
all_content = [ticket['properties'].get('content') or '' for ticket in all_tickets]

We are now ready to call the Nightfall API to scan our HubSpot tickets. This tutorial assumes that all of your tickets together fall under the payload limit of the Nightfall API. In practice, you may want to check the size of your payload, for example by summing the UTF-8 encoded length of each string (sys.getsizeof() reports the in-memory size of a Python object, not the size of the request body), and chunk the payload across multiple requests if appropriate.

# all_content is already a list of strings, one per ticket, so it is passed as-is
nightfall_response = nightfall.scan_text(
  all_content,
  detection_rule_uuids=[detectionRuleUUID]
)

findings = json.loads(nightfall_response)
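
If the tickets do not fit in a single request, they can be batched by size first. The following is a minimal sketch; chunk_by_bytes is a hypothetical helper, and max_bytes and max_items are placeholders to be set from the limits in the Nightfall documentation.

```python
def chunk_by_bytes(texts, max_bytes, max_items):
    """Group strings into batches that stay within a UTF-8 byte budget and item count."""
    batches, current, current_bytes = [], [], 0
    for text in texts:
        size = len(text.encode('utf-8'))
        # Start a new batch if adding this string would exceed either limit
        if current and (current_bytes + size > max_bytes or len(current) >= max_items):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(text)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Order is preserved, so concatenating the per-batch results keeps them aligned with all_ids. A single string larger than max_bytes still ends up alone in its own batch and would need to be split separately.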

Now that we have a collection of all of our tickets, we will begin constructing an all_findings object to collect our results. The first row of our all_findings object will constitute our headers since we will dump this object to a CSV file later.

This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.

all_findings = []
all_findings.append(
  [
    'ticket_id', 'detector', 'confidence',
    'byte_start', 'byte_end',
    'codepoint_start', 'codepoint_end', 'finding'
  ]
)

For each finding in each ticket, we collect the required information from the Nightfall API to identify and locate the sensitive data, pairing them with the HubSpot ticket IDs we set aside earlier.

for c_idx, ticket in enumerate(findings):
  for f_idx, finding in enumerate(ticket):
    row = [
      all_ids[c_idx], 
      finding['detector']['name'],
      finding['confidence'],
      finding['location']['byteRange']['start'],
      finding['location']['byteRange']['end'],
      finding['location']['codepointRange']['start'],
      finding['location']['codepointRange']['end'],
      finding['finding']
    ] 
    all_findings.append(row)
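
To avoid copying raw sensitive values into the export, the finding can be masked locally before it is added to a row. This is a sketch of a client-side alternative, not the Nightfall Redaction feature; mask_finding and its defaults are hypothetical.

```python
def mask_finding(fragment, visible=3, mask_char='*'):
    """Keep the first `visible` characters and replace the rest with mask_char."""
    if len(fragment) <= visible:
        # Too short to expose any part safely, so mask everything
        return mask_char * len(fragment)
    return fragment[:visible] + mask_char * (len(fragment) - visible)
```

In the loop above, finding['finding'] could be passed through mask_finding so that reviewers can still recognize the value without the CSV holding it in full.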

Finally, we export our results to a CSV so they can be easily reviewed.

if len(all_findings) > 1:
  # newline='' lets the csv module control line endings (avoids blank rows on Windows)
  with open('output_file.csv', 'w', newline='') as output_file:
    csv_writer = csv.writer(output_file, delimiter = ',')
    csv_writer.writerows(all_findings)
else:
  print('No sensitive data detected. Hooray!')

That's it! You now have insight into all of the sensitive data inside your customer support tickets. As a next step, we could utilize HubSpot's API to add a comment to tickets with sensitive findings, and then trigger an email alert for the offending ticket owner.

To scan your support tickets on an ongoing basis, you may consider persisting your last ticket query's paging value and/or checking the last modified date of your tickets.
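
One way to approach this is to persist a timestamp checkpoint and scan only tickets modified after it. The sketch below assumes each ticket dict carries an ISO 8601 'updatedAt' field; that field name, the checkpoint file layout, and the helper names are assumptions to verify against your own data. It uses datetime.fromisoformat, which requires Python 3.7 or later.

```python
import json
import os
from datetime import datetime

def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace('Z', '+00:00'))

def load_checkpoint(path):
    """Return the stored timestamp string, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)['last_scanned']

def tickets_since(tickets, checkpoint):
    """Keep tickets updated after the checkpoint (all tickets if it is None)."""
    if checkpoint is None:
        return list(tickets)
    cutoff = parse_timestamp(checkpoint)
    return [t for t in tickets if parse_timestamp(t['updatedAt']) > cutoff]

def save_checkpoint(path, tickets):
    """Record the newest 'updatedAt' among the scanned tickets."""
    if tickets:
        newest = max(tickets, key=lambda t: parse_timestamp(t['updatedAt']))
        with open(path, 'w') as f:
            json.dump({'last_scanned': newest['updatedAt']}, f)
```

A run would load the checkpoint, filter all_tickets with tickets_since, scan the remainder, and save the checkpoint only after the scan succeeds.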

Using Redaction to Mask Findings

With the Nightfall API, you can also redact and mask your HubSpot findings by adding a Redaction Config to your Detection Rule. For more information on how to use redaction and its specific options, please refer to the guide here.

Using the File Scanning Endpoint with HubSpot

The example above is specific to the Nightfall Text Scanning API. To scan files, we can follow a process similar to the one we used with the text scanning endpoint. Because file scanning involves more steps, the process is broken down in the sections below.

Prerequisites

To utilize the File Scanning API you need the following:

An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security
A Nightfall Detection Policy associated with a webhook URL
A web server configured to listen for file scanning results (more information below)
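
For local experimentation, a listener can be sketched with the standard library alone. The code below assumes the webhook is verified by a POST containing a 'challenge' value that must be echoed back as plain text, and that later POSTs carry scan results as JSON; check both assumptions against the Nightfall webhook documentation. handle_payload, WebhookHandler, run, and the port are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_payload(payload):
    """Return (status_code, body_text) for a decoded webhook payload."""
    if 'challenge' in payload:
        # Assumption: verification requests are answered by echoing the challenge
        return 200, payload['challenge']
    # Otherwise treat it as a scan result; a real handler would queue it for review
    print('Received scan result:', json.dumps(payload)[:200])
    return 200, 'ok'

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        try:
            payload = json.loads(self.rfile.read(length) or b'{}')
            status, body = handle_payload(payload)
        except json.JSONDecodeError:
            status, body = 400, 'invalid JSON'
        self.send_response(status)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        self.wfile.write(body.encode('utf-8'))

def run(port=8075):
    """Serve until interrupted; the port is an arbitrary choice for local testing."""
    HTTPServer(('0.0.0.0', port), WebhookHandler).serve_forever()
```

Calling run() starts the listener. This sketch does not authenticate incoming requests, so it is only suitable for trying out the flow, not for production use.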

Steps to use the Endpoint

Retrieve ticket data from HubSpot

As at the beginning of this tutorial for the text scanning endpoint, we initialize our request parameters and retrieve the ticket data from HubSpot:

hubspot_headers = {'accept': 'application/json'}
page_limit = 100
hubspot_querystring = {
  "limit": str(page_limit),
  "archived": "false",
  "hapikey": hubspot_api_key
}

hubspot_base_url = "https://api.hubapi.com/crm/v3/objects/tickets"

hubspot_response = requests.get(
  url = hubspot_base_url, 
  headers = hubspot_headers,
  params = hubspot_querystring
)
response_dict = json.loads(hubspot_response.text)
all_tickets = []

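keep_going = True
while keep_going:
  all_tickets.extend(response_dict['results'])
  # Stop when the response carries no next-page link; checking only the result
  # count would raise a KeyError when the final page is exactly page_limit long
  next_link = response_dict.get('paging', {}).get('next', {}).get('link')
  if len(response_dict['results']) < page_limit or next_link is None:
    keep_going = False
  else:
    new_url = f"{next_link}&hapikey={hubspot_api_key}"
    new_response = requests.get(url = new_url, headers = hubspot_headers)
    response_dict = json.loads(new_response.text)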

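Now we write the content of each ticket to a .csv file.

import time

filename = "nf_hubspot_input-" + str(int(time.time())) + ".csv"

# Open the file once; reopening it in 'w' mode per ticket would overwrite earlier rows
with open(filename, 'w', newline='') as output_file:
  csv_writer = csv.writer(output_file, delimiter=',')
  for ticket in all_tickets:
    # writerow expects a sequence of fields, so wrap the content string in a list
    csv_writer.writerow([ticket['properties'].get('content') or ''])

print("HubSpot Ticket Data Written to: ", filename)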

Upload the .csv file written above to the Scan API, as shown here.
Once the file has been uploaded, start the scan using the scan endpoint mentioned here. Note: as described in the documentation, the scan endpoint requires a webhook server, to which it sends the scanning results. An example webhook server setup can be seen here.
The scan endpoint processes uploaded files asynchronously, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
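
For the upload step, larger files are typically sent in pieces. The sketch below shows only the local half of that process, reading a file as ordered (offset, data) pairs. It assumes the upload API accepts fixed-size chunks identified by byte offset and tells you the chunk size to use; iter_chunks is a hypothetical helper, and the requests that send each chunk are left out.

```python
def iter_chunks(path, chunk_size):
    """Yield (offset, data) pairs covering the file in order."""
    offset = 0
    with open(path, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield offset, data
            offset += len(data)
```

Each pair can then be sent as one upload request, with the offset telling the server where the chunk belongs in the file.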
