What it does
- Parses agent tool usage in requests and responses
- Enforces an overall token budget and optional per-tool budgets
- Actions when exceeding limits:
  - Block: return an error (HTTP 429)
  - Throttle: add a delay, then allow
  - Alert only: allow and log/telemetry
- Adds a response header `X-TOKEN-LIMIT` with `within` or `exceeded`
- Works in both PreRequest and PreResponse stages
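The mapping from a budget check to the three actions can be sketched as follows. This is a minimal illustration, not the plugin's implementation: the `Decision` type, the `decide` function, and the choice to treat a fully used budget as exceeded are all assumptions made for the example. Only the mode names, the HTTP 429 status, and the `X-TOKEN-LIMIT` values come from the description above.

```python
import logging
import time
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("token_limiter_sketch")


@dataclass
class Decision:
    allowed: bool
    status_code: Optional[int]  # 429 when blocked, None when the request proceeds
    header_value: str           # value for the X-TOKEN-LIMIT response header
    delayed_seconds: int        # delay applied before allowing (throttle only)


def decide(mode: str, used_tokens: int, max_tokens: int,
           throttle_delay: int = 0, sleep=time.sleep) -> Decision:
    """Map a budget check onto one of the three documented actions.

    Assumption: the budget counts as exceeded once used_tokens reaches
    max_tokens. The real plugin may draw the boundary differently.
    """
    if used_tokens < max_tokens:
        return Decision(True, None, "within", 0)
    if mode == "block":
        return Decision(False, 429, "exceeded", 0)
    if mode == "throttle":
        sleep(throttle_delay)  # add a delay, then allow
        return Decision(True, None, "exceeded", throttle_delay)
    if mode == "alert_only":
        logger.warning("token budget exceeded: %d/%d", used_tokens, max_tokens)
        return Decision(True, None, "exceeded", 0)
    raise ValueError(f"unknown mode: {mode!r}")
```

Passing `sleep` as a parameter keeps the throttle path easy to exercise in tests without actually waiting.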
Configuration Parameters
| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| mode | string | Action when limits are exceeded: throttle, block, alert_only | Yes | — |
| max_limit.max_tokens | int | Overall token budget | Yes | — |
| max_limit.window | int | Budget window in seconds | Yes | — |
| max_limit.ttl | int | Cache TTL in seconds | Yes | — |
| tool_limits[].name | string | Tool name | No | — |
| tool_limits[].max_tokens | int | Per-tool token budget | No | — |
| tool_limits[].window | int | Per-tool window in seconds | No | — |
| tool_limits[].ttl | int | Per-tool TTL in seconds | No | — |
| throttle_delay | int | Delay (seconds) when mode=throttle | No | 0 |
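As an illustration of how the parameters in the table fit together, the sketch below expresses a settings object as a Python dict and checks it against the table's types and required fields. The dict layout mirrors the dotted parameter names only; the actual configuration format the plugin expects is not shown here. The tool name `web_search`, the numeric values, and the `validate_settings` helper are hypothetical.

```python
VALID_MODES = {"throttle", "block", "alert_only"}


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def validate_settings(settings: dict) -> list:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if settings.get("mode") not in VALID_MODES:
        problems.append("mode must be one of: throttle, block, alert_only")

    # max_limit.* fields are all required
    max_limit = settings.get("max_limit") or {}
    for key in ("max_tokens", "window", "ttl"):
        if not _is_int(max_limit.get(key)):
            problems.append(f"max_limit.{key} is required and must be an int")

    # tool_limits is optional; check types only for fields that are present
    for i, tool in enumerate(settings.get("tool_limits", [])):
        if "name" in tool and not isinstance(tool["name"], str):
            problems.append(f"tool_limits[{i}].name must be a string")
        for key in ("max_tokens", "window", "ttl"):
            if key in tool and not _is_int(tool[key]):
                problems.append(f"tool_limits[{i}].{key} must be an int")

    # throttle_delay is optional and defaults to 0
    if not _is_int(settings.get("throttle_delay", 0)):
        problems.append("throttle_delay must be an int")
    return problems


# Illustrative values only: a one-hour overall budget plus one per-tool budget.
example_settings = {
    "mode": "throttle",
    "max_limit": {"max_tokens": 10000, "window": 3600, "ttl": 3600},
    "tool_limits": [
        {"name": "web_search", "max_tokens": 2000, "window": 3600, "ttl": 3600},
    ],
    "throttle_delay": 2,
}
```

Calling `validate_settings(example_settings)` returns an empty list, while a dict with a missing `mode` or a non-integer `max_limit.window` returns one message per problem.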
- Stages: PreRequest (checks before execution), PreResponse (updates usage after execution)
- Keys usage by gateway and rule internally; safe for multi-tenant setups
- If parsing fails or no tools are found, the request is allowed
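The three behaviors above (a check before execution, a usage update after execution, and allowing the request when parsing fails) can be sketched together. This is an in-memory approximation under stated assumptions: the fixed-window counter, the `(gateway_id, rule_id)` key shape, the `tool_calls` body layout, and every function and class name here are hypothetical, chosen only to make the flow concrete.

```python
import json
import time


class UsageStore:
    """In-memory fixed-window token counters (a stand-in for the real cache)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._buckets = {}  # key -> (window_start, tokens_used)

    def used(self, key, window):
        start, tokens = self._buckets.get(key, (self._clock(), 0))
        if self._clock() - start >= window:
            return 0  # window elapsed, so the budget has reset
        return tokens

    def add(self, key, window, tokens):
        now = self._clock()
        start, current = self._buckets.get(key, (now, 0))
        if now - start >= window:
            start, current = now, 0  # start a fresh window
        self._buckets[key] = (start, current + tokens)


def parse_tools(body):
    """Return tool names from a JSON body, or [] if parsing fails."""
    try:
        payload = json.loads(body)
        # Assumed body shape: {"tool_calls": [{"name": "..."}]}
        return [call["name"] for call in payload.get("tool_calls", [])]
    except (ValueError, KeyError, TypeError, AttributeError):
        return []


def pre_request(store, gateway_id, rule_id, body, max_tokens, window):
    """Check the budget before execution; returns True if the request may proceed."""
    if not parse_tools(body):
        return True  # parsing failed or no tools found: allow
    # Keying by gateway and rule keeps tenants' counters separate
    return store.used((gateway_id, rule_id), window) < max_tokens


def pre_response(store, gateway_id, rule_id, tokens_used, window):
    """Record actual usage after execution."""
    store.add((gateway_id, rule_id), window, tokens_used)
```

Because the key includes both the gateway and the rule, usage recorded for one gateway never counts against another, which is the property the multi-tenant note relies on.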
Prerequisites
These agent security plugins require upstreams configured in provider mode. See Upstream Services & Routing for details: /trustgate/core-concepts/upstream-services-overview
Example upstream (provider mode):
Example configuration
pre_response stage with the same settings:
Best practices
- Start with alert-only to observe typical budgets, then switch to block/throttle
- Use per-tool budgets for expensive tools to prevent localized spikes
- Prefer reasonable windows (e.g., 30–60 minutes) and align TTL with window
- Surface the `X-TOKEN-LIMIT` header to clients/ops dashboards for visibility
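For the last point, a client or dashboard collector might classify each response from its status code and the `X-TOKEN-LIMIT` header. The sketch below is one hedged way to do that: the `token_limit_status` helper and its return labels are hypothetical, and it assumes the header may be absent on some responses (for example when no tools were parsed), in which case it reports `unknown`.

```python
def token_limit_status(status_code: int, headers: dict) -> str:
    """Classify a response using the X-TOKEN-LIMIT header.

    Returns one of: "ok", "over_budget", "blocked", "unknown".
    These labels are illustrative, not part of the plugin.
    """
    # HTTP header names are case-insensitive, so normalize before lookup
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("x-token-limit", "").strip().lower()

    if value == "exceeded":
        # 429 means the request was rejected (block mode); any other status
        # means it was allowed despite the overage (throttle or alert-only)
        return "blocked" if status_code == 429 else "over_budget"
    if value == "within":
        return "ok"
    return "unknown"
```

Counting `over_budget` responses during an alert-only rollout gives a rough sense of how often a block or throttle mode would have triggered.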