Contextual Security
The Contextual Security plugin (`contextual_security`) adds behavioral fingerprint-based rate limiting and fraud prevention to the TrustGate AI Gateway. It proactively monitors for repeated malicious activity, analyzes behavioral similarity across users, and applies configurable countermeasures such as throttling, blocking, or alerting.
This plugin is especially useful in environments where prompt-level filtering (e.g., via Guardrails) is not sufficient, and where user behavior over time needs to be evaluated to prevent evasion or abuse.
ℹ️ Dependency: This plugin requires the `neuraltrust_guardrail` plugin to be enabled in the same rule or plugin chain.
How It Works
- Each user/requestor is identified by a unique fingerprint (derived from contextual metadata).
- TrustGate maintains a historical profile for each fingerprint (sketched below), including:
  - Count of malicious requests (e.g., from Guardrail classifications).
  - Whether the fingerprint has been blocked in the past.
  - Behavioral similarity to other known malicious or blocked fingerprints.
- Once thresholds are crossed, the plugin triggers the configured rate limit action:
  - `block`: Reject future requests for a configured time.
  - `throttle`: Delay requests to increase cost/friction.
  - `alert_only`: Flag the request but let it proceed.
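Conceptually, the per-fingerprint profile can be pictured as a small record like the Go sketch below. The type and field names are illustrative assumptions, not TrustGate's internal representation:

```go
package contextualsecurity

import "time"

// FingerprintProfile is an illustrative sketch of the per-fingerprint state
// described above; field names are assumptions, not TrustGate's actual types.
type FingerprintProfile struct {
	ID             string    // unique fingerprint derived from contextual request metadata
	MaliciousCount int       // requests previously flagged as malicious by neuraltrust_guardrail
	Blocked        bool      // whether the fingerprint is currently blocked
	BlockedUntil   time.Time // when an active block expires
}
```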
Requirements
- The `neuraltrust_guardrail` plugin must be active in the same rule. It classifies prompts and flags malicious behavior, which this plugin uses as input.
- The plugin must run at the `pre_request` stage to intercept abusive activity before request execution.
Configuration Parameters
| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| `max_failures` | int | Number of past malicious requests allowed before action is taken. | No | 5 |
| `block_duration` | int | Duration (in seconds) to block the fingerprint once a threshold is reached. | No | 600 |
| `rate_limit_mode` | string | Action to apply when thresholds are exceeded: `block`, `throttle`, or `alert_only`. | Yes | — |
| `similar_malicious_threshold` | int | Number of similar fingerprints with malicious activity needed to trigger action. | No | 5 |
| `similar_blocked_threshold` | int | Number of similar, previously blocked fingerprints needed to trigger action. | No | 5 |
ℹ️ Similarity between fingerprints is computed based on metadata and behavior using TrustGate's internal `fingerprint.Manager`.
Execution Flow
The `contextual_security` plugin follows a multi-step evaluation pipeline to determine whether a request should be allowed, throttled, flagged, or blocked. Below is a detailed breakdown of each step in the flow:
1. Fingerprint Resolution
The plugin retrieves a unique fingerprint identifier for the current request. This fingerprint is expected to be injected into the request context by upstream middleware and typically encodes a combination of request metadata, such as:
- IP address
- User-Agent header
- Authorization token
- User ID
If no fingerprint is found, the plugin will log an error and skip further checks.
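As a rough illustration of this step, the sketch below pulls a fingerprint identifier out of the request context. The context key and helper name are hypothetical; the actual plugin receives the fingerprint from TrustGate's upstream middleware under its own key:

```go
package contextualsecurity

import (
	"context"
	"errors"
	"log"
)

// ctxKeyFingerprint is a hypothetical context key used only for this sketch.
type ctxKey string

const ctxKeyFingerprint ctxKey = "fingerprint_id"

// resolveFingerprint returns the fingerprint injected by upstream middleware.
// If none is present, the plugin logs an error and skips further checks.
func resolveFingerprint(ctx context.Context) (string, error) {
	fp, ok := ctx.Value(ctxKeyFingerprint).(string)
	if !ok || fp == "" {
		log.Println("contextual_security: no fingerprint in request context; skipping checks")
		return "", errors.New("fingerprint not found")
	}
	return fp, nil
}
```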
2. Historical Analysis
Once a fingerprint is identified, the plugin retrieves its behavioral history from TrustGate's internal storage. This history includes:
- Malicious request count: Number of previous requests flagged as malicious by `neuraltrust_guardrail`.
- Block status: Whether the fingerprint is currently blocked due to prior abuse.
If the fingerprint does not yet exist in the system (i.e., first-time request), it is initialized and persisted with a temporary TTL in Redis.
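A minimal sketch of this lookup, assuming a Redis-backed counter and block flag per fingerprint, might look like the following. The key names and TTL are assumptions for illustration; TrustGate's actual storage schema may differ:

```go
package contextualsecurity

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// profileTTL is an assumed temporary TTL for first-time fingerprints.
const profileTTL = 24 * time.Hour

// loadOrInitHistory fetches the malicious-request count and block status for a
// fingerprint, initializing a fresh profile with a temporary TTL on first sight.
func loadOrInitHistory(ctx context.Context, rdb *redis.Client, fp string) (maliciousCount int64, blocked bool, err error) {
	countKey := "contextual_security:" + fp + ":malicious_count" // hypothetical key
	blockKey := "contextual_security:" + fp + ":blocked"         // hypothetical key

	maliciousCount, err = rdb.Get(ctx, countKey).Int64()
	switch {
	case err == redis.Nil:
		// First-time fingerprint: initialize and persist with a temporary TTL.
		if setErr := rdb.Set(ctx, countKey, 0, profileTTL).Err(); setErr != nil {
			return 0, false, setErr
		}
		maliciousCount, err = 0, nil
	case err != nil:
		return 0, false, err
	}

	blocked, err = rdb.Get(ctx, blockKey).Bool()
	if err == redis.Nil {
		blocked, err = false, nil
	}
	return maliciousCount, blocked, err
}
```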
3. Similarity Checks
To detect evasion techniques and coordinated abusive behavior, the plugin performs an internal similarity analysis by consulting TrustGate’s fingerprinting engine.
This engine evaluates the behavioral and contextual proximity between the current fingerprint and other known fingerprints in the system. Similarity is determined using internal heuristics based on:
- Historical behavior patterns (e.g., frequency and type of malicious activity)
- Metadata characteristics (e.g., origin IP ranges, tokens, etc.)
- Request timing and flow characteristics
Once evaluated, the plugin interprets the results to determine:
- How many similar fingerprints have a history of malicious behavior.
- How many similar fingerprints are currently blocked by the system.
This analysis acts as a behavioral firewall layer, surfacing patterns of abuse that may not be visible through prompt-level inspection alone. It allows TrustGate to proactively respond to coordinated attacks or attempts to rotate identities (e.g., bots using IP cycling) while maintaining low friction for legitimate users.
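Interpreting those results boils down to two tallies, roughly as in the sketch below. `SimilarFingerprint` and `countSimilar` are hypothetical stand-ins; the real work happens inside TrustGate's `fingerprint.Manager`:

```go
package contextualsecurity

// SimilarFingerprint is a hypothetical view of what the similarity engine
// returns for each neighbor; the real fingerprint.Manager API differs.
type SimilarFingerprint struct {
	MaliciousCount int
	Blocked        bool
}

// countSimilar tallies how many similar fingerprints have a malicious history
// and how many are currently blocked, for comparison against the thresholds.
func countSimilar(similar []SimilarFingerprint) (malicious, blocked int) {
	for _, s := range similar {
		if s.MaliciousCount > 0 {
			malicious++
		}
		if s.Blocked {
			blocked++
		}
	}
	return malicious, blocked
}
```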
4. Threshold Evaluation & Action
The plugin checks whether any of the following thresholds have been breached:
- Local malicious count ≥ `max_failures`
- Similar malicious fingerprints ≥ `similar_malicious_threshold`
- Similar blocked fingerprints ≥ `similar_blocked_threshold`
If any of the above conditions are met, the plugin proceeds to execute the configured `rate_limit_mode` action (see the sketch after the mode descriptions):

Mode: `block`
- Immediately blocks the request.
- Marks the fingerprint as blocked in Redis for `block_duration` seconds.
- Returns `403 Forbidden`.

Mode: `throttle`
- Artificially delays the request (default: 5 seconds).
- Adds the header `X-TrustGate-Alert: malicious-request` to the response.
- Allows the request to proceed.

Mode: `alert_only`
- Adds the header `X-TrustGate-Alert: malicious-request` to the response.
- Allows the request without delay or blocking.
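Putting the pieces together, the threshold check and action dispatch can be sketched roughly as follows. The `Config` struct mirrors the documented parameters, but the function names, the `net/http` handling, and the persistence of the block flag are illustrative assumptions rather than the plugin's actual implementation:

```go
package contextualsecurity

import (
	"net/http"
	"time"
)

// Config mirrors the parameters documented above.
type Config struct {
	MaxFailures               int
	BlockDuration             time.Duration
	RateLimitMode             string // "block", "throttle", or "alert_only"
	SimilarMaliciousThreshold int
	SimilarBlockedThreshold   int
}

// exceeded reports whether any of the three documented thresholds is breached.
func (c Config) exceeded(localMalicious, similarMalicious, similarBlocked int) bool {
	return localMalicious >= c.MaxFailures ||
		similarMalicious >= c.SimilarMaliciousThreshold ||
		similarBlocked >= c.SimilarBlockedThreshold
}

// apply executes the configured action once a threshold is breached and
// reports whether the request may proceed.
func (c Config) apply(w http.ResponseWriter) (allow bool) {
	switch c.RateLimitMode {
	case "block":
		// A real implementation would also mark the fingerprint as blocked
		// in Redis for BlockDuration before rejecting the request.
		http.Error(w, "request blocked by contextual_security", http.StatusForbidden)
		return false
	case "throttle":
		time.Sleep(5 * time.Second) // default delay described above
		w.Header().Set("X-TrustGate-Alert", "malicious-request")
		return true
	default: // "alert_only"
		w.Header().Set("X-TrustGate-Alert", "malicious-request")
		return true
	}
}
```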
Response Behavior
200 OK
Request is allowed to proceed when no threshold is breached.
Alert Only
When `rate_limit_mode` is set to `alert_only` or `throttle`, the request is allowed, but a custom header is added:
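```
X-TrustGate-Alert: malicious-request
```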
Throttle Mode
When in `throttle` mode, the request is artificially delayed (e.g., 5 seconds) before proceeding. This helps rate-limit abuse without fully blocking the user.
403 Forbidden
Returned when `rate_limit_mode` is set to `block` and one or more thresholds (`max_failures`, `similar_malicious_threshold`, or `similar_blocked_threshold`) have been exceeded.
Response Body
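The exact payload may vary between TrustGate versions; a representative shape, with assumed field names, is:

```json
{
  "error": "Request blocked by contextual security policy",
  "plugin": "contextual_security",
  "retry_after": 600
}
```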
Configuration Example
This example enables full fingerprint monitoring and blocking when thresholds are exceeded:
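A sketch of such a rule is shown below. The plugin parameters match the table above, but the surrounding chain structure (field names such as `plugin_chain`, `stage`, and `settings`) is assumed for illustration and should be adapted to your TrustGate rule schema. Note that `neuraltrust_guardrail` is chained first, as required:

```json
{
  "plugin_chain": [
    {
      "name": "neuraltrust_guardrail",
      "enabled": true,
      "stage": "pre_request"
    },
    {
      "name": "contextual_security",
      "enabled": true,
      "stage": "pre_request",
      "settings": {
        "rate_limit_mode": "block",
        "max_failures": 5,
        "block_duration": 600,
        "similar_malicious_threshold": 5,
        "similar_blocked_threshold": 5
      }
    }
  ]
}
```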
Troubleshooting
| Symptom | Resolution |
|---|---|
| Requests never blocked | Ensure `rate_limit_mode` is not set to `alert_only`. Check threshold values. |
| Guardrail not triggering malicious detection | Validate the `neuraltrust_guardrail` plugin configuration and classification rules. |
| Unexpected 403 responses | Check logs to verify which thresholds were exceeded. Tune plugin configuration accordingly. |
Best Practices
- Always pair with `neuraltrust_guardrail`: This plugin relies on Guardrail to classify prompts as malicious; it must be active in the same rule or plugin chain.
- Tune thresholds based on risk tolerance: For higher-security environments, lower the values of `max_failures`, `similar_malicious_threshold`, and `similar_blocked_threshold` to detect and act on malicious behavior more aggressively.
- Monitor alerts: When operating in `alert_only` mode, capture and analyze the `X-TrustGate-Alert` response header to monitor potentially fraudulent behavior without blocking users.