What it does
- Analyzes request content for jailbreak signals (PreRequest)
- Uses an external firewall to score content risk
- Compares the maximum score against a threshold
- Actions when the threshold is exceeded:
  - Block: return Forbidden (HTTP 403)
  - Throttle: delay the request, then allow it
  - Alert only: allow the request and record the detection in logs/telemetry
- Adds a response header `X-Jailbreak-Detected` set to `true`/`false`
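The decision flow in this list can be sketched in Python. This is a minimal illustration under stated assumptions, not the plugin's actual implementation: `Decision`, `decide`, and `throttle_delay` are hypothetical names, the scores are assumed to arrive as a list of floats from the firewall, and "exceeded" is assumed to mean strictly greater than the threshold.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Decision:
    allowed: bool
    status_code: Optional[int] = None   # set only when the request is rejected
    delay_seconds: float = 0.0          # non-zero only in throttle mode
    headers: dict = field(default_factory=dict)


def decide(scores, threshold, mode, throttle_delay=1.0):
    """Map firewall scores to an action (hypothetical helper)."""
    max_score = max(scores, default=0.0)
    detected = max_score > threshold    # assumption: strict comparison
    headers = {"X-Jailbreak-Detected": "true" if detected else "false"}

    if not detected:
        return Decision(allowed=True, headers=headers)
    if mode == "block":
        return Decision(allowed=False, status_code=403, headers=headers)
    if mode == "throttle":
        # The real delay length is not documented here; throttle_delay is illustrative.
        return Decision(allowed=True, delay_seconds=throttle_delay, headers=headers)
    if mode == "alert_only":
        return Decision(allowed=True, headers=headers)
    raise ValueError(f"unknown mode: {mode!r}")
```

Only `block` rejects the request; the other two modes let it through and differ in whether a delay is applied first.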
Configuration Parameters
| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| `mode` | string | `throttle`, `block`, or `alert_only` | Yes | - |
| `credentials.base_url` | string | Firewall service base URL | Yes | - |
| `credentials.token` | string | Firewall service token | Yes | - |
| `mapping_field` | string | JSON path to extract content to analyze | No | - |
| `threshold` | number | Detection threshold (0–1) | Yes | - |
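As a rough aid for checking settings before deploying them, the constraints in the table can be expressed as a small Python validator. This is a sketch, not part of the product: `validate_settings` is a hypothetical helper, the nested `credentials` layout is an assumption inferred from the dotted parameter names, and the URL, token, and `mapping_field` values are placeholders.

```python
VALID_MODES = {"throttle", "block", "alert_only"}


def validate_settings(settings):
    """Return a list of problems found in a settings dict (empty if none)."""
    errors = []

    if settings.get("mode") not in VALID_MODES:
        errors.append("mode must be one of throttle, block, alert_only")

    creds = settings.get("credentials") or {}
    for key in ("base_url", "token"):
        if not creds.get(key):
            errors.append(f"credentials.{key} is required")

    threshold = settings.get("threshold")
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or not 0 <= threshold <= 1:
        errors.append("threshold must be a number between 0 and 1")

    mapping = settings.get("mapping_field")
    if mapping is not None and not isinstance(mapping, str):
        errors.append("mapping_field must be a string when set")

    return errors


# Placeholder values for illustration only.
example_settings = {
    "mode": "alert_only",
    "credentials": {
        "base_url": "https://firewall.example.com",
        "token": "<your-token>",
    },
    "threshold": 0.8,
    "mapping_field": "input.prompt",
}
```

Calling `validate_settings(example_settings)` returns an empty list; removing `threshold` or setting `mode` to an unsupported value adds a message for each problem.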
- Stage: PreRequest
- If no content or parsing fails, the request is allowed (with telemetry)
- Uses a provider-specific parser internally (e.g., OpenAI)
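The fail-open behavior noted above (allow the request when there is no content or parsing fails) can be illustrated with a short Python sketch. It assumes `mapping_field` is a simple dotted path such as `input.prompt`; the actual path syntax supported by the plugin may differ, and `extract_by_path` is a hypothetical helper.

```python
import json


def extract_by_path(raw_body, path):
    """Return the string at a dotted path in a JSON body, or None.

    None signals "nothing to analyze", in which case the caller
    would allow the request and record the event in telemetry.
    """
    try:
        node = json.loads(raw_body)
    except (TypeError, ValueError):
        return None  # body is not valid JSON: fail open

    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # path does not exist: fail open
        node = node[key]

    if isinstance(node, str) and node.strip():
        return node
    return None  # empty or non-text content: fail open
```

For example, `extract_by_path('{"input": {"prompt": "hi"}}', "input.prompt")` returns `"hi"`, while a malformed body or a missing field returns `None`.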
Prerequisites
These agent security plugins require upstreams configured in provider mode. See Upstream Services & Routing for details: /trustgate/core-concepts/upstream-services-overview
Example upstream (provider mode):
Example configuration
Best practices
- Start with `alert_only` to tune the threshold using real traffic
- Use `mapping_field` to point to relevant content (e.g., nested prompt fields)
- Pair with Tool Budget Limiter for cost control and layered agent security
- Monitor telemetry and headers to track detections over time
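One way to approach threshold tuning during an `alert_only` period is to compare candidate thresholds against scores already observed. The Python sketch below assumes you can export the per-request maximum scores from your telemetry as a list of floats; the export format and the `flagged_share` helper are hypothetical.

```python
def flagged_share(scores, thresholds):
    """For each candidate threshold, return the fraction of requests
    whose score is strictly above it."""
    total = len(scores)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: sum(s > t for s in scores) / total for t in thresholds}
```

With observed scores `[0.1, 0.5, 0.9, 0.95]`, a threshold of 0.4 would flag 75% of requests and a threshold of 0.8 would flag 50%, which gives a feel for how many requests `block` or `throttle` would affect at each setting.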