Creating an AI Agent Security Policy

AI Agent Security policies are configured as exfiltration policies in Nightfall. This guide walks through each step of the policy creation wizard.


Getting Started

  1. Navigate to Configuration > Policies > Exfiltration.

  2. Click + New Policy.

  3. Select AI Agent Security as the integration type.


Step 1: Choose Hook Types

Enable one or more hook types. Each can be independently toggled:

Hook Type        | Can Block | What It Scans
User Prompts     | Yes       | Prompt text before it reaches the AI model
Tool Calls       | Yes       | Tool name and input parameters before execution
Tool Responses   | No        | Tool output after execution (monitor only)
Model Responses  | No        | Model response after execution (monitor only)
Shell Commands   | Yes       | Shell command string before execution
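The blocking capabilities listed above can be written down as a small lookup, which may help when checking a planned configuration before entering it in the wizard. This is only a sketch: `HOOK_CAN_BLOCK` and `effective_action` are hypothetical names, not part of any Nightfall API, and the fallback from "block" to "monitor" is an assumption made for illustration.

```python
# Hypothetical lookup mirroring the hook type table; not a Nightfall API.
HOOK_CAN_BLOCK = {
    "User Prompts": True,
    "Tool Calls": True,
    "Tool Responses": False,   # monitor only
    "Model Responses": False,  # monitor only
    "Shell Commands": True,
}


def effective_action(hook_type: str, requested: str) -> str:
    """Return the action a hook type can support.

    Assumption: a "block" request on a monitor-only hook falls back to "monitor".
    """
    if hook_type not in HOOK_CAN_BLOCK:
        raise ValueError(f"unknown hook type: {hook_type}")
    if requested == "block" and not HOOK_CAN_BLOCK[hook_type]:
        return "monitor"
    return requested
```

For instance, `effective_action("Tool Responses", "block")` yields `"monitor"` under this assumption, because tool responses are scanned only after execution.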


Step 2: MCP Server Scope

This step defines which MCP servers the policy is evaluated against. The scope covers tool responses (data coming back from the server) as well as outbound calls. What happens when a match occurs (block, alert, and so on) is configured separately under Remediation Actions.

  • All MCP servers - the policy applies to every connected MCP server.

  • Specific MCP servers - the policy applies only to a chosen list of servers.

  • All except these MCP servers - the policy applies to every server except a chosen list of excluded servers.
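The three scope options amount to an include-all, allow-list, or deny-list check. The sketch below illustrates that logic only; `ScopeMode` and `server_in_scope` are hypothetical names and do not describe how Nightfall implements scoping.

```python
from enum import Enum
from typing import Iterable


class ScopeMode(Enum):
    ALL = "All MCP servers"
    SPECIFIC = "Specific MCP servers"
    ALL_EXCEPT = "All except these MCP servers"


def server_in_scope(mode: ScopeMode, listed: Iterable[str], server: str) -> bool:
    """Decide whether a policy with the given scope applies to a server."""
    listed_set = set(listed)
    if mode is ScopeMode.ALL:
        return True                     # every connected server is covered
    if mode is ScopeMode.SPECIFIC:
        return server in listed_set     # allow-list behavior
    return server not in listed_set     # deny-list behavior
```

With `ScopeMode.ALL_EXCEPT` and a list containing only `github`, a server named `slack` is in scope while `github` is not.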

When you select "Specific MCP servers" or "All except these MCP servers", a drop-down picker appears:

1: MCP Server Collections

Select one or more named server collections. All servers across selected collections are combined.

Pre-defined collections are organized by category:

  • Code Hosting

  • Databases

  • Communication

  • Cloud Infrastructure

  • Observability

  • Project Management

  • File System

To add an MCP server, or tools for a server, manually, go to the Collections list page under AI Governance > Collections. There you can select individual servers and optionally limit a collection to specific tools within each server. The discovered tool inventory is captured and made available on the Collections list page through the Add server and Add tools buttons. There is no catch-all collection that contains every discovered server and tool.

For example, you could allow the GitHub MCP server but only for read operations. To do so, limit the GitHub server in the MCP server collection to the relevant tools, then configure a policy that uses that collection.

2: Wildcard Patterns

How servers are identified

Nightfall identifies the MCP server from the tool name reported by each AI client. Because clients format these names differently, the server is not always identifiable. The table below uses the fetch tool on a server named github as an example.

AI client       | Tool name format       | Example             | Server identified?
Claude Code     | mcp__<server>__<tool>  | mcp__github__fetch  | Yes
GitHub Copilot  | mcp_<server>_<tool>    | mcp_github_fetch    | Usually
Cursor          | MCP:<tool>             | MCP:fetch           | No

What this means for your policies

  • Claude Code - Server-specific scoping works as expected.

  • GitHub Copilot - Server-specific scoping works in most cases. When a server or tool name contains underscores, Nightfall may not be able to tell the server and tool apart reliably.

  • Cursor - Cursor does not include the server name in its tool names. A Specific MCP servers or All except these MCP servers policy therefore cannot match Cursor traffic by server, and Cursor activity is treated as if All MCP servers were selected.

Recommendation: If you need to scope policies by server and your organization uses Cursor, pair the policy with a broader All MCP servers rule so Cursor traffic is still covered.
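To see why the server is recoverable for some clients and not others, the following sketch parses the three tool name formats described above. It is an illustration only, not Nightfall's actual parsing logic: `parse_tool_name` and the client keys (`claude_code`, `copilot`, `cursor`) are hypothetical, and treating any extra underscore in a Copilot name as ambiguous is a deliberately conservative assumption.

```python
from typing import Optional, Tuple


def parse_tool_name(client: str, tool_name: str) -> Tuple[Optional[str], str]:
    """Return (server, tool); server is None when it cannot be determined."""
    if client == "claude_code" and tool_name.startswith("mcp__"):
        # Double underscores give an unambiguous server/tool boundary.
        parts = tool_name[len("mcp__"):].split("__", 1)
        if len(parts) == 2:
            return parts[0], parts[1]
    elif client == "copilot" and tool_name.startswith("mcp_"):
        parts = tool_name[len("mcp_"):].split("_")
        if len(parts) == 2:
            return parts[0], parts[1]
        # Extra underscores: the boundary between server and tool is unclear.
        return None, tool_name
    elif client == "cursor" and tool_name.startswith("MCP:"):
        # The server name is not part of the tool name at all.
        return None, tool_name[len("MCP:"):]
    return None, tool_name
```

Here `mcp__github__fetch` and `mcp_github_fetch` both resolve to server `github` and tool `fetch`, whereas a name such as `mcp_my_server_fetch` could be split as `my` + `server_fetch` or `my_server` + `fetch`, so the sketch reports no server. `MCP:fetch` never carries a server name.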


Step 4: Shell Command Patterns (Optional)

When Shell Commands monitoring is enabled, you can optionally scope to specific command patterns. Leaving this field empty scans all shell commands.

Enter each pattern as a chip: type the pattern and press Enter to add it. Recommended patterns appear as clickable suggestions below the input.
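The "empty means scan everything" rule can be sketched as follows. The pattern syntax is not specified in this guide, so the example assumes shell-style glob matching via Python's `fnmatch` module purely for illustration; `command_in_scope` is a hypothetical helper, and Nightfall's real matching rules may differ.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def command_in_scope(command: str, patterns: Sequence[str]) -> bool:
    """Return True if the shell command should be scanned.

    Assumption: patterns are glob-style. An empty pattern list scans all commands.
    """
    if not patterns:
        return True
    return any(fnmatchcase(command, pattern) for pattern in patterns)
```

Under this assumption, a pattern list of `["curl *"]` would scan `curl https://example.com` but skip `ls -la`, while an empty list scans both.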


Step 5: Detection Rules

Select the Nightfall detectors that define what sensitive data to look for. This works the same as any other exfiltration policy:

  • Built-in detectors: PII (SSN, credit cards, phone numbers), credentials (API keys, passwords, tokens), source code patterns

  • Custom detectors: Regular expressions, dictionaries, or ML-based classifiers you have created

  • Detection rule logic: Combine multiple detectors with AND/OR logic and set confidence thresholds
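The AND/OR combination with a confidence threshold can be pictured with a short sketch. Everything here is hypothetical: the `Finding` type, the `rule_matches` helper, and the numeric 0.0 to 1.0 confidence scale are assumptions for illustration and do not reflect Nightfall's detector output or rule engine.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Finding:
    detector: str      # e.g. "API Keys"
    confidence: float  # assumed numeric scale from 0.0 to 1.0


def rule_matches(
    findings: Iterable[Finding],
    required: Iterable[str],
    min_confidence: float,
    mode: str = "any",
) -> bool:
    """Combine detectors with OR ("any") or AND ("all") logic."""
    # Keep only detectors that fired at or above the threshold.
    hits = {f.detector for f in findings if f.confidence >= min_confidence}
    wanted = set(required)
    if mode == "all":
        return wanted <= hits    # AND: every required detector fired
    return bool(wanted & hits)   # OR: at least one required detector fired
```

With OR logic a single high-confidence API key finding is enough to match; with AND logic the rule matches only when every listed detector fires above the threshold.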


Step 6: Actions and Alerts

Enforcement action

Action   | Behavior
Block    | The AI agent action is denied. The end-user sees a block message.
Monitor  | The action proceeds. An incident is created for review.

Admin alerting

Configure where violation alerts are sent:

  • Slack - post to a channel

  • Jira - create a ticket

  • Email - send to specified recipients

  • Webhook - POST to a custom endpoint

End-user notification

Separate end-user notifications are not available for AI Agent Security policies at this time. Instead, the custom block message is displayed directly in the AI client (for example, Cursor, Claude Code, or VS Code).

  • The displayed text is the custom block message (e.g., "This action was blocked because it contains sensitive data. Contact [email protected] for help.")

Policy metadata

  • Policy name and description

  • Risk score - use the Nightfall default or set a custom severity (Critical, High, Medium, Low)


Example: Block Credentials in Prompts

Here is an example of a common policy configuration:

  1. AI Clients: Claude Code, Cursor, VS Code

  2. Hook Types: User Prompts (Block), Tool Calls (Block), Shell Commands (Monitor)

  3. MCP Server Scope: All MCP servers

  4. Detection Rules: API Keys, Passwords, AWS Credentials (High confidence)

  5. Action: Block

  6. Alerts: Slack #security-alerts + Email to security team

This policy prevents developers from accidentally pasting API keys or credentials into AI prompts or tool calls, while monitoring shell commands for credential exposure.
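For review or change tracking, the example configuration could be recorded as plain data. The sketch below is not an export format or API payload: the field names in `EXAMPLE_POLICY` and the `hooks_by_action` helper are hypothetical and chosen only to mirror the list above.

```python
# Hypothetical record of the example policy; field names are illustrative only.
EXAMPLE_POLICY = {
    "name": "Block Credentials in Prompts",
    "ai_clients": ["Claude Code", "Cursor", "VS Code"],
    "hooks": {
        "User Prompts": "block",
        "Tool Calls": "block",
        "Shell Commands": "monitor",
    },
    "mcp_server_scope": "All MCP servers",
    "detectors": ["API Keys", "Passwords", "AWS Credentials"],
    "confidence": "High",
    "alerts": ["Slack #security-alerts", "Email to security team"],
}


def hooks_by_action(policy: dict) -> dict:
    """Group the policy's hook types by their configured action."""
    grouped: dict = {}
    for hook, action in policy["hooks"].items():
        grouped.setdefault(action, []).append(hook)
    return grouped
```

Calling `hooks_by_action(EXAMPLE_POLICY)` groups User Prompts and Tool Calls under `block` and Shell Commands under `monitor`, matching the description of the example.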
