Zendesk DLP Tutorial

Customer support tickets are a potential vector for leaking customer PII. By combining Zendesk’s API with Nightfall’s scan SDK, you can discover, classify, and remediate sensitive data within your customer support system.

You will need a few things to follow along with this tutorial:

  • A Zendesk account and API key

  • A Nightfall API key

  • An existing Nightfall Detection Rule

  • A Python 3 environment

  • The Nightfall Python SDK (this tutorial uses version 0.6.0)

Install the version of the Nightfall SDK that this tutorial was written against:

pip install nightfall==0.6.0

We will be using Python and importing the following libraries:

import requests
import os
import json
import csv
from nightfall import Nightfall

We've configured the Zendesk user and API key, as well as the Nightfall API key, as environment variables so they don't need to be committed directly into our code.

zendesk_user = os.environ.get('ZENDESK_USER')
zendesk_api_key = os.environ.get('ZENDESK_API_KEY')
nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')
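Because os.environ.get returns None for an unset variable, a missing credential would only surface later as a failed API call. A minimal sketch of an up-front check follows; the helper name require_env is hypothetical and not part of either SDK.

```python
import os


def require_env(names, environ=None):
    """Return the values of the required variables, or raise if any are unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Example (variable names taken from this tutorial):
# settings = require_env(["ZENDESK_USER", "ZENDESK_API_KEY", "NIGHTFALL_API_KEY"])
```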

Here we'll define the authentication details and base URL that we will use later to call the Zendesk API. We then pass our Nightfall API key to the SDK to create a Nightfall client object.

zendesk_auth = (f"{zendesk_user}/token",zendesk_api_key)

zendesk_base_url = 'https://YOUR_ORG_HERE.zendesk.com/api/v2/'

nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])

Next we define the Detection Rule with which we wish to scan our data. The Detection Rule can be pre-made in the Nightfall web app and referenced by UUID.

detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

Let’s start by using Zendesk’s API to retrieve the support tickets in our account (a single request returns one page of results). We'll set up an "all_findings" object to compile our findings as we go.

The first row of our all_findings object will constitute our headers, since we will dump this object to a CSV file later.

This example will include the full finding below. As the finding might be a piece of sensitive data, we would recommend using the Redaction feature of the Nightfall API to mask your data. More information can be seen in the 'Using Redaction to Mask Findings' section below.

zendesk_response = requests.get(
                            url = f"{zendesk_base_url}tickets.json", 
                            auth = zendesk_auth
                            )
tickets = json.loads(zendesk_response.text)['tickets']

all_findings = []
all_findings.append(
  [
    'ticket_id', 'comment_id', 'detector', 'confidence',
    'byte_start', 'byte_end', 'codepoint_start', 'codepoint_end',
    'finding'
  ]
)
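The request above reads a single page of tickets. For accounts with more tickets than fit on one page, a sketch along the following lines could collect every page, assuming the list response carries a next_page URL that is empty on the last page. The function name fetch_all_tickets and the get_json parameter are hypothetical, introduced only for illustration.

```python
def fetch_all_tickets(first_url, auth, get_json=None):
    """Collect tickets from every page by following the `next_page` links."""
    if get_json is None:
        # Default fetcher: an authenticated GET that returns the parsed JSON body.
        def get_json(url):
            import requests  # imported lazily so a stub fetcher needs no network

            response = requests.get(url, auth=auth)
            response.raise_for_status()
            return response.json()

    tickets = []
    url = first_url
    while url:
        page = get_json(url)
        tickets.extend(page.get("tickets", []))
        url = page.get("next_page")  # empty on the last page, which ends the loop
    return tickets


# Example:
# tickets = fetch_all_tickets(f"{zendesk_base_url}tickets.json", zendesk_auth)
```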

Now that we have a collection of all of our tickets, we will retrieve the set of user comments made on each of those tickets.

Note: If you are scanning a high volume of tickets, you may run into the rate limits of the Zendesk API or the Nightfall API. This tutorial assumes that you stay under these limits; additional code may be required to handle them.
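As one possible shape for that additional code, the sketch below retries a request when the server answers with HTTP 429, waiting for the number of seconds in the Retry-After header when one is present and backing off exponentially otherwise. It assumes rate limiting is signalled with status 429; the helper name get_with_retry and its parameters are hypothetical. It wraps raw HTTP calls such as the Zendesk requests in this tutorial, and calls made through the Nightfall SDK would need their own handling.

```python
import time


def _wait_seconds(response, attempt, default_wait):
    """Use the Retry-After header if it is a number, else exponential backoff."""
    try:
        return float(response.headers.get("Retry-After"))
    except (TypeError, ValueError):
        return default_wait * (2 ** attempt)


def get_with_retry(send, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call `send()` until it returns a response whose status is not 429."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(_wait_seconds(response, attempt, default_wait))
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")


# Example:
# response = get_with_retry(lambda: requests.get(url, auth=zendesk_auth))
```

The loop that follows keeps the plain requests.get call for brevity.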

for ticket in tickets:
    ticket_id = ticket['id']

    ticket_response = requests.get(
      url = f"{zendesk_base_url}tickets/{ticket_id}/comments.json",
      auth = zendesk_auth
    )
    
    comments = json.loads(ticket_response.text)['comments']

Within the above for loop, we compile all of the comment bodies into a list so that we can scan the entire comment thread for a ticket with a single call to the Nightfall SDK.

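    # Still inside the `for ticket in tickets:` loop
    comment_bodies = [comment['body'] for comment in comments]

    # Pass the flat list of strings so each result lines up with one comment
    nightfall_response = nightfall.scanText(
        comment_bodies,
        detection_rule_uuids=[detectionRuleUUID]
    )

    findings = json.loads(nightfall_response)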
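The single call above assumes that a ticket's comments fit within the Nightfall API's maximum payload size. If they do not, one option is to split the comment bodies into batches while remembering where each batch starts, so that results can still be matched back to the right comment. The sketch below is an illustration only: batch_by_size is a hypothetical helper, and max_bytes is a placeholder for whatever limit applies to your account.

```python
def batch_by_size(texts, max_bytes):
    """Yield (offset, batch) pairs whose UTF-8 size stays within max_bytes.

    `offset` is the index in `texts` of the first item in the batch.
    A single item larger than max_bytes is still yielded on its own.
    """
    batch, batch_size, offset = [], 0, 0
    for index, text in enumerate(texts):
        size = len(text.encode("utf-8"))
        if batch and batch_size + size > max_bytes:
            yield offset, batch
            batch, batch_size, offset = [], 0, index
        batch.append(text)
        batch_size += size
    if batch:
        yield offset, batch
```

Each batch can then be scanned separately, and a result at position i within a batch corresponds to the comment at index offset + i.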

For each set of results we receive, we can start to compile our findings into a CSV format.

    # Still inside the `for ticket in tickets:` loop
    for c_idx, comment in enumerate(findings):
        for finding in comment:
            row = [
                ticket_id,
                comments[c_idx]['id'],
                finding['detector']['name'],
                finding['confidence'],
                finding['location']['byteRange']['start'],
                finding['location']['byteRange']['end'],
                finding['location']['codepointRange']['start'],
                finding['location']['codepointRange']['end'],
                finding['finding']
            ]
            all_findings.append(row)

Finally, we export our results to a CSV file so they can be easily reviewed.

if len(all_findings) > 1:
  with open('output_file.csv', 'w', newline='') as output_file:
    csv_writer = csv.writer(output_file, delimiter = ',')
    csv_writer.writerows(all_findings)
else:
  print('No sensitive data detected. Hooray!')

That's it! You now have insight into the sensitive data inside your customer support tickets. As a next step, we could use these findings as input to Zendesk's redact API to clean up the original comments. We could also use Zendesk's API to add a comment to tickets with sensitive findings, triggering an email alert for the offending ticket owner.

# PUT /api/v2/tickets/{ticket_id}/comments/{comment_id}/redact.json
# -d '{"text": "987-65-4320"}'
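The request outlined above could be issued from Python roughly as follows. This is a sketch based only on the endpoint path and body shown in the comment; the function names are hypothetical, and the response format is not covered here.

```python
def build_redact_request(base_url, ticket_id, comment_id, text):
    """Return the (url, payload) pair for redacting `text` from one comment."""
    url = f"{base_url}tickets/{ticket_id}/comments/{comment_id}/redact.json"
    return url, {"text": text}


def redact_finding(base_url, auth, ticket_id, comment_id, text):
    """Send the PUT request built above and raise on an HTTP error status."""
    import requests  # imported here so build_redact_request has no dependencies

    url, payload = build_redact_request(base_url, ticket_id, comment_id, text)
    response = requests.put(url, json=payload, auth=auth)
    response.raise_for_status()
    return response


# Example, using one row of all_findings (ticket_id, comment_id, ..., finding):
# redact_finding(zendesk_base_url, zendesk_auth, row[0], row[1], row[-1])
```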

To scan your support tickets on an ongoing basis, consider taking advantage of Zendesk's Incremental Exports functionality.
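A rough sketch of such an incremental run is shown below. It assumes the time-based export endpoint at incremental/tickets.json, which takes a start_time in Unix seconds and returns tickets, end_time, next_page, and end_of_stream fields; check these names against Zendesk's documentation before relying on them. The function name export_tickets_since and the get_json parameter are hypothetical.

```python
def export_tickets_since(base_url, start_time, get_json):
    """Return (tickets, end_time) for tickets changed since `start_time`.

    `get_json` is any callable that takes a URL and returns the parsed JSON body.
    The returned end_time can be stored and used as start_time for the next run.
    """
    tickets, end_time = [], start_time
    url = f"{base_url}incremental/tickets.json?start_time={start_time}"
    while url:
        page = get_json(url)
        tickets.extend(page.get("tickets", []))
        end_time = page.get("end_time") or end_time
        # Stop when the export reports that it has caught up to the present.
        if page.get("end_of_stream", True):
            break
        url = page.get("next_page")
    return tickets, end_time
```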

Putting everything together:

import requests
import os
import json
import csv
from nightfall.api import Nightfall


zendesk_base_url = 'https://YOUR_ORG_HERE.zendesk.com/api/v2/'
nightfall = Nightfall(os.environ['NIGHTFALL_API_KEY'])

# All credentials are stored as environment variables
zendesk_user = os.environ.get('ZENDESK_USER')
zendesk_api_key = os.environ.get('ZENDESK_API_KEY')
nightfall_api_key = os.environ.get('NIGHTFALL_API_KEY')


if __name__ == '__main__':

    # Set up the headers we need to call the Zendesk API
    zendesk_auth = (f"{zendesk_user}/token", zendesk_api_key)

    # Set up the detectors to scan for in our tickets
    detectionRuleUUID = os.environ.get('DETECTION_RULE_UUID')

    # Start retrieving our support tickets from Zendesk

    zendesk_response = requests.get(
                            url = f"{zendesk_base_url}tickets.json", 
                            auth = zendesk_auth
                            )
    tickets = json.loads(zendesk_response.text)['tickets']

    all_findings = []
    all_findings.append(
        [
        'ticket_id', 'comment_id', 'detector', 'confidence',
        'byte_start', 'byte_end', 'codepoint_start', 'codepoint_end',
        'finding'
        ]
        )

    # Note: this code assumes you will not run into the Zendesk or Nightfall
    # API rate limits. Additional code is required to handle them.
    for ticket in tickets:
        ticket_id = ticket['id']
        
        ticket_response = requests.get(
                                url = f"{zendesk_base_url}tickets/{ticket_id}/comments.json",
                                auth = zendesk_auth
                                )
        comments = json.loads(ticket_response.text)['comments']

        # To correlate across API calls, we will aggregate all of the comments in 
        # a single support ticket for one call to the Nightfall API
        comment_bodies = [comment['body'] for comment in comments]

        # Here we assume that the comment_bodies object is smaller than the maximum
        # payload size allowed by the Nightfall API. You may wish to chunk the comments
        # into multiple, separate requests if they are too large.
                
        # Pass the flat list of strings so each result lines up with one comment
        nightfall_response = nightfall.scanText(
            comment_bodies,
            detection_rule_uuids=[detectionRuleUUID]
        )


        findings = json.loads(nightfall_response)

        for c_idx, comment in enumerate(findings):
            for f_idx, finding in enumerate(comment):
                row = [
                    ticket_id, 
                    comments[c_idx]['id'], 
                    finding['detector']['name'],
                    finding['confidence'],
                    finding['location']['byteRange']['start'],
                    finding['location']['byteRange']['end'],
                    finding['location']['codepointRange']['start'],
                    finding['location']['codepointRange']['end'],
                    finding['finding']
                    ] 
                all_findings.append(row)
    
    if len(all_findings) > 1:
        with open('output_file.csv', 'w', newline='') as output_file:
            csv_writer = csv.writer(output_file, delimiter = ',')
            csv_writer.writerows(all_findings)
    else:
        print('No sensitive data detected. Hooray!')

That's it! You should now be set up to start using the Zendesk integration for the Nightfall Text Scanning SDK.

Using Redaction to Mask Findings

With the Nightfall API, you can also redact and mask the findings from your Zendesk tickets. To do so, add a Redaction Config as part of your Detection Rule. For more information on how to use redaction and its specific options, please refer to the guide here.
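Until a Redaction Config is in place, the raw finding text can at least be masked locally before it is written to the CSV file. The sketch below is a local stand-in and not Nightfall's Redaction feature; mask_finding and its parameters are hypothetical.

```python
def mask_finding(finding, visible=4, mask_char="*"):
    """Mask all but the last `visible` characters of a finding string.

    Findings no longer than `visible` characters are masked entirely.
    """
    if visible <= 0 or len(finding) <= visible:
        return mask_char * len(finding)
    hidden = len(finding) - visible
    return mask_char * hidden + finding[hidden:]


# Example: store mask_finding(finding['finding']) in the row instead of the raw value.
```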

Using the File Scanning Endpoint with Zendesk

The example above is specific to the Nightfall Text Scanning API. To scan files, we can follow a process similar to the one we used for the text scanning endpoint. Because file scanning is more involved, the process is broken down in the sections below.

Prerequisites

To utilize the File Scanning API you need the following:

  • An active API Key authorized for file scanning passed via the header Authorization: Bearer — see Authentication and Security

  • A Nightfall Detection Policy associated with a webhook URL

  • A web server configured to listen for file scanning results (more information below)
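To make the last prerequisite concrete, here is a minimal sketch of a listener built on Python's standard library. It only accepts POST requests with a JSON body and keeps the parsed payloads in a list. Any verification handshake or signature checking that Nightfall's webhook documentation requires is not shown, and the names WebhookHandler, received, and start_webhook_server are hypothetical.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # parsed JSON bodies, in arrival order


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body))
            status = 200
        except ValueError:
            status = 400  # body was not valid JSON
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_webhook_server(port=0):
    """Start the listener in a background thread and return the server object."""
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Example: server = start_webhook_server(8000); call server.shutdown() to stop it.
```

For real use, the listener would need to be reachable at the webhook URL configured on the Detection Policy.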

Steps to use the endpoint

  1. Retrieve ticket data from Zendesk

Similar to the process at the beginning of this tutorial for the text scanning endpoint, we will now set up our Zendesk credentials and retrieve ticket data from Zendesk.

# This retrieves the first page of tickets from Zendesk.

zendesk_auth = (f"{zendesk_user}/token",zendesk_api_key)

zendesk_base_url = 'https://YOUR_ORG_HERE.zendesk.com/api/v2/'

zendesk_response = requests.get(
                            url = f"{zendesk_base_url}tickets.json", 
                            auth = zendesk_auth
                            )
tickets = json.loads(zendesk_response.text)['tickets']

Now we write the ticket data to a .csv file.

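import time

filename = "nf_zendesk_input-" + str(int(time.time())) + ".csv"

# Each ticket is a dict, so use DictWriter with the union of all ticket keys as columns
fieldnames = sorted({key for ticket in tickets for key in ticket})

with open(filename, 'w', newline='') as output_file:
  csv_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
  csv_writer.writeheader()
  csv_writer.writerows(tickets)

print("Zendesk Ticket Data Written to: ", filename)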

  2. Begin the file upload process to the Scan API with the .csv file written above, as shown here.

  3. Once the files have been uploaded, begin using the scan endpoint mentioned here. Note: As described in the documentation, the scan endpoint requires a webhook server, to which it will send the scanning results. An example webhook server setup can be seen here.

  4. The scanning endpoint works asynchronously on the uploaded files, so you can monitor the webhook server to see the API responses and file scan findings as they come in.
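For the upload step, large files are typically sent in pieces. The sketch below only shows reading the .csv file in fixed-size chunks with their byte offsets, on the assumption that the upload process accepts the file that way. The actual upload requests and the chunk size to use are defined by the linked upload documentation and are not shown; iter_file_chunks is a hypothetical helper.

```python
def iter_file_chunks(path, chunk_size):
    """Yield (offset, data) pairs covering the file in chunk_size-byte pieces."""
    offset = 0
    with open(path, "rb") as handle:
        while True:
            data = handle.read(chunk_size)
            if not data:
                break
            yield offset, data
            offset += len(data)


# Example:
# for offset, data in iter_file_chunks(filename, chunk_size):
#     ...  # send `data` at `offset` using the upload request from the linked docs
```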
