Creating an AI Agent Security Policy
AI Agent Security policies are configured as exfiltration policies in Nightfall. This guide walks through each step of the policy creation wizard.
Getting Started
Navigate to Configuration > Policies > Exfiltration.
Click + New Policy.
Select AI Agent Security as the integration type.
Step 1: Choose Hook Types
Enable one or more hook types. Each can be independently toggled:
| Hook Type | Can Block | What It Scans |
| --- | --- | --- |
| User Prompts | Yes | Prompt text before it reaches the AI model |
| Tool Calls | Yes | Tool name and input parameters before execution |
| Tool Responses | No | Tool output after execution (monitor only) |
| Model Responses | No | Model response after execution (monitor only) |
| Shell Commands | Yes | Shell command string before execution |
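The blocking capability of each hook type can be sketched as a small lookup. This is an illustrative model only, not Nightfall's implementation: the `HookType` enum, the `effective_action` helper, and the fallback from "block" to "monitor" on monitor-only hooks are all assumptions made for the example.

```python
from enum import Enum


class HookType(Enum):
    USER_PROMPTS = "User Prompts"
    TOOL_CALLS = "Tool Calls"
    TOOL_RESPONSES = "Tool Responses"
    MODEL_RESPONSES = "Model Responses"
    SHELL_COMMANDS = "Shell Commands"


# Hook types scanned before the action happens, so they can block it
# (the "Can Block: Yes" rows of the hook type table).
BLOCKING_HOOKS = frozenset(
    {HookType.USER_PROMPTS, HookType.TOOL_CALLS, HookType.SHELL_COMMANDS}
)


def can_block(hook: HookType) -> bool:
    """Return True if this hook type is scanned before execution."""
    return hook in BLOCKING_HOOKS


def effective_action(hook: HookType, requested: str) -> str:
    """Assumption: a 'block' request on a monitor-only hook becomes 'monitor'."""
    if requested not in ("block", "monitor"):
        raise ValueError(f"unknown action: {requested!r}")
    if requested == "block" and not can_block(hook):
        return "monitor"
    return requested
```

Under this sketch, requesting `"block"` for `HookType.TOOL_RESPONSES` yields `"monitor"`, because the tool has already run by the time its output is scanned.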
Step 2: MCP Server Scope
This step defines which MCP servers the policy is evaluated against. The scope also applies to tool responses (data coming back from the server, not just outbound calls). What happens when a match occurs - block, alert, etc. - is configured separately under Remediation Actions.
All MCP servers - the policy applies to every connected MCP server.
Specific MCP servers - the policy applies only to a chosen list of servers.
All except these MCP servers - the policy applies to every server except a chosen list of excluded servers.
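The three scope options amount to a simple membership test. The sketch below is a hypothetical model of that logic; `ScopeMode` and `server_in_scope` are names invented for illustration and are not part of any Nightfall API.

```python
from enum import Enum
from typing import AbstractSet


class ScopeMode(Enum):
    ALL = "All MCP servers"
    SPECIFIC = "Specific MCP servers"
    ALL_EXCEPT = "All except these MCP servers"


def server_in_scope(mode: ScopeMode, listed: AbstractSet[str], server: str) -> bool:
    """Return True if the policy should be evaluated for this server."""
    if mode is ScopeMode.ALL:
        return True  # the list is ignored
    if mode is ScopeMode.SPECIFIC:
        return server in listed  # allow-list behavior
    return server not in listed  # ALL_EXCEPT: exclude-list behavior
```

For instance, with `listed = {"github"}`, the `SPECIFIC` mode covers only `github`, while `ALL_EXCEPT` covers every server other than `github`.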
When you select "Specific MCP servers" or "All except these MCP servers", a drop-down picker appears:
1: MCP Server Collections
Select one or more named server collections. All servers across selected collections are combined.
There are pre-defined collections organized by category:
Code Hosting
Databases
Communication
Cloud Infrastructure
Observability
Project Management
File System
To add servers manually, go to the Collections list page under AI Governance > Collections. There you can add a new MCP server, or add tool calls for an existing server. Within a collection, select individual servers and optionally limit each one to specific tools. Nightfall captures the tool inventory and makes it available on the Collections list page through the Add server and Add tools buttons. There is no blanket collection that contains every discovered server and tool.
For example, you could allow the GitHub MCP server but only for read operations. To do so, specify this in the MCP server collection and configure an appropriate policy.
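How several collections combine, and how a per-server tool limit narrows the match, can be sketched as follows. This is an assumed model, not Nightfall's data format: the dictionary layout, the merge rule (tools are unioned, and "all tools" wins over a limited list), and the tool names are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical shape: server name -> allowed tools (None means every tool).
Collection = Dict[str, Optional[Set[str]]]


def combine_collections(collections: List[Collection]) -> Collection:
    """Merge the servers of all selected collections into one scope."""
    combined: Collection = {}
    for collection in collections:
        for server, tools in collection.items():
            if server not in combined:
                combined[server] = None if tools is None else set(tools)
            elif combined[server] is None or tools is None:
                combined[server] = None  # assumption: "all tools" wins
            else:
                combined[server] |= tools
    return combined


def in_collection_scope(combined: Collection, server: str, tool: str) -> bool:
    """Return True if the server is listed and the tool is not excluded."""
    if server not in combined:
        return False
    tools = combined[server]
    return tools is None or tool in tools


# Illustrative collections; the tool names are made up for this example.
code_hosting: Collection = {"github": {"get_file_contents", "list_issues"}}
databases: Collection = {"postgres": None}
scope = combine_collections([code_hosting, databases])
```

Here `scope` covers the two read-style GitHub tools and every tool on `postgres`, so a call to a GitHub write tool would fall outside the collection.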
2: Wildcard Patterns
How servers are identified
Nightfall identifies the MCP server from the tool name reported by each AI client. Because clients format these names differently, the server is not always identifiable. The table below uses the fetch tool on a server named github as an example.
| AI client | Tool name format | Example | Server identified? |
| --- | --- | --- | --- |
| Claude Code | mcp__<server>__<tool> | mcp__github__fetch | Yes |
| GitHub Copilot | mcp_<server>_<tool> | mcp_github_fetch | Usually |
| Cursor | MCP:<tool> | MCP:fetch | No |
What this means for your policies
Claude Code - Server-specific scoping works as expected.
GitHub Copilot - Server-specific scoping works in most cases. When a server or tool name contains underscores, Nightfall may not be able to tell the server and tool apart reliably.
Cursor - Cursor does not include the server name in its tool names. A Specific MCP servers or All except these MCP servers policy therefore cannot match Cursor traffic by server, and Cursor activity is treated as if All MCP servers were selected.
Recommendation: If you need to scope policies by server and your organization uses Cursor, pair the policy with a broader All MCP servers rule so Cursor traffic is still covered.
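The identification behavior described in this section can be illustrated with a small parser. This is a sketch under stated assumptions, not Nightfall's actual parsing logic: the client identifiers (`"claude_code"`, `"github_copilot"`, `"cursor"`) and the rule of splitting Copilot names at the first underscore after the prefix are choices made for the example.

```python
import re
from typing import Optional, Tuple


def parse_tool_name(client: str, tool_name: str) -> Tuple[Optional[str], str]:
    """Return (server, tool); server is None when it cannot be identified."""
    if client == "claude_code":
        # mcp__<server>__<tool>: the double underscore is a clear delimiter.
        match = re.fullmatch(r"mcp__(.+?)__(.+)", tool_name)
        if match:
            return match.group(1), match.group(2)
    elif client == "github_copilot":
        # mcp_<server>_<tool>: a single underscore also appears inside names,
        # so splitting at the first one is only a guess.
        match = re.fullmatch(r"mcp_([^_]+)_(.+)", tool_name)
        if match:
            return match.group(1), match.group(2)
    elif client == "cursor":
        # MCP:<tool>: the server name is not present at all.
        if tool_name.startswith("MCP:"):
            return None, tool_name[len("MCP:"):]
    return None, tool_name
```

With this sketch, `mcp__github__fetch` and `mcp_github_fetch` both resolve to server `github` and tool `fetch`, while `MCP:fetch` yields no server. A Copilot name such as `mcp_my_server_fetch` is split as server `my` and tool `server_fetch`, which shows why underscores in server or tool names make the result unreliable.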
Step 4: Shell Command Patterns (Optional)
When Shell Commands monitoring is enabled, you can optionally scope to specific command patterns. Leaving this field empty scans all shell commands.
Enter patterns as chips (type + Enter to add). Recommended patterns are shown as clickable suggestions below the input.
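The scoping rule (no patterns means scan everything) can be sketched as below. The pattern syntax is not specified in this guide, so the example assumes shell-style glob patterns and uses Python's `fnmatch` module; both the `should_scan` helper and the sample patterns are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def should_scan(command: str, patterns: Sequence[str]) -> bool:
    """Return True if the shell command falls within the policy's scope."""
    if not patterns:
        return True  # an empty pattern list scans all shell commands
    # Assumption: patterns are case-sensitive globs matched against the
    # full command string.
    return any(fnmatchcase(command, pattern) for pattern in patterns)
```

Under these assumptions, a policy scoped to `["curl *", "scp *"]` would scan `curl https://example.com` but skip `ls -la`.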
Step 5: Detection Rules
Select the Nightfall detectors that define what sensitive data to look for. This works the same as any other exfiltration policy:
Built-in detectors: PII (SSN, credit cards, phone numbers), credentials (API keys, passwords, tokens), source code patterns
Custom detectors: Regular expressions, dictionaries, or ML-based classifiers you have created
Detection rule logic: Combine multiple detectors with AND/OR logic and set confidence thresholds
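To make the AND/OR logic concrete, the sketch below evaluates a rule against a set of detector findings. It is a simplified model, not Nightfall's engine: the `Condition` class, the numeric 0.0 to 1.0 confidence scale, and the `findings` dictionary are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Condition:
    detector: str          # detector name, e.g. "API Key"
    min_confidence: float  # hypothetical threshold on a 0.0 to 1.0 scale


def rule_matches(
    conditions: List[Condition], operator: str, findings: Dict[str, float]
) -> bool:
    """Combine per-detector results with AND or OR.

    `findings` maps a detector name to the highest confidence it reported;
    detectors that found nothing are simply absent.
    """
    results = [
        findings.get(c.detector, 0.0) >= c.min_confidence for c in conditions
    ]
    if operator == "AND":
        return bool(results) and all(results)
    if operator == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {operator!r}")


conditions = [Condition("API Key", 0.8), Condition("Password", 0.8)]
findings = {"API Key": 0.9, "Password": 0.4}
```

With these sample values, the OR rule matches because the API key finding clears its threshold, while the AND rule does not because the password finding falls below 0.8.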
Step 6: Actions and Alerts
Enforcement action
| Action | Behavior |
| --- | --- |
| Block | The AI agent action is denied. The end-user sees a block message. |
| Monitor | The action proceeds. An incident is created for review. |
Admin alerting
Configure where violation alerts are sent:
Slack - post to a channel
Jira - create a ticket
Email - send to specified recipients
Webhook - POST to a custom endpoint
End-user notification
Separate end-user notifications are not available for AI Agent Security policies at this time. Instead, the custom block message is displayed directly in the AI client, such as Cursor, Claude Code, or VS Code.
The notification text is whatever you set as the custom block message (e.g., "This action was blocked because it contains sensitive data. Contact [email protected] for help.")
Policy metadata
Policy name and description
Risk score - use the Nightfall default or set a custom severity (Critical, High, Medium, Low)
Example: Block Credentials in Prompts
Here is an example of a common policy configuration:
AI Clients: Claude Code, Cursor, VS Code
Hook Types: User Prompts (Block), Tool Calls (Block), Shell Commands (Monitor)
MCP Server Scope: All MCP servers
Detection Rules: API Keys, Passwords, AWS Credentials (High confidence)
Action: Block
Alerts: Slack #security-alerts + Email to security team
This policy prevents developers from accidentally pasting API keys or credentials into AI prompts or tool calls, while monitoring shell commands for credential exposure.
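For teams that keep a record of their policy settings alongside other configuration, the example above could be written down as structured data. The layout below is purely illustrative: Nightfall policies are created in the wizard, and none of these field names come from a Nightfall API or export format.

```python
from typing import Any, Dict, List

# Hypothetical record of the example policy; field names are made up.
example_policy: Dict[str, Any] = {
    "name": "Block Credentials in Prompts",
    "ai_clients": ["Claude Code", "Cursor", "VS Code"],
    "hook_actions": {
        "User Prompts": "block",
        "Tool Calls": "block",
        "Shell Commands": "monitor",
    },
    "mcp_server_scope": "All MCP servers",
    "detectors": ["API Keys", "Passwords", "AWS Credentials"],
    "confidence": "High",
    "alerts": ["Slack #security-alerts", "Email to security team"],
}


def hooks_with_action(policy: Dict[str, Any], action: str) -> List[str]:
    """List the hook types configured with the given action."""
    return [
        hook
        for hook, configured in policy["hook_actions"].items()
        if configured == action
    ]
```

Calling `hooks_with_action(example_policy, "block")` returns the two hook types that deny the action, which makes it easy to review at a glance which surfaces are enforced and which are only monitored.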