Contextual Security
The Contextual Security plugin (`contextual_security`) adds behavioral fingerprint-based rate limiting and fraud prevention to the TrustGate AI Gateway. It proactively monitors for repeated malicious activity, analyzes behavioral similarity across users, and applies configurable countermeasures such as throttling, blocking, or alerting.
This plugin is especially useful in environments where prompt-level filtering (e.g., via Guardrails) is not sufficient, and where user behavior over time needs to be evaluated to prevent evasion or abuse.
ℹ️ Dependency: This plugin requires the `neuraltrust_guardrail` plugin to be enabled in the same rule or plugin chain.
How It Works
- Each user/requestor is identified by a unique fingerprint (derived from contextual metadata).
- TrustGate maintains a historical profile for each fingerprint (sketched below), including:
  - Count of malicious requests (e.g., from Guardrail classifications).
  - Whether the fingerprint has been blocked in the past.
  - Behavioral similarity to other known malicious or blocked fingerprints.
- Once thresholds are crossed, the plugin triggers the configured rate limit action:
  - `block`: Reject future requests for a configured time.
  - `throttle`: Delay requests to increase cost/friction.
  - `alert_only`: Flag the request but let it proceed.
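Conceptually, the per-fingerprint profile can be pictured as a small record like the Go sketch below. The type and field names are illustrative assumptions, not TrustGate's internal representation:

```go
package contextualsecurity

import "time"

// FingerprintProfile is an illustrative sketch of the per-fingerprint state
// described above; field names are assumptions, not TrustGate's actual types.
type FingerprintProfile struct {
	ID             string    // unique fingerprint derived from contextual request metadata
	MaliciousCount int       // requests previously flagged as malicious by neuraltrust_guardrail
	Blocked        bool      // whether the fingerprint is currently blocked
	BlockedUntil   time.Time // when an active block expires
}
```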
Requirements
- The `neuraltrust_guardrail` plugin must be active in the same rule. It classifies prompts and flags malicious behavior, which this plugin uses as input.
- The plugin must run at the `pre_request` stage to intercept abusive activity before request execution.
Configuration Parameters
| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| `max_failures` | int | Number of past malicious requests allowed before action is taken. | No | 5 |
| `block_duration` | int | Duration (in seconds) to block the fingerprint once a threshold is reached. | No | 600 |
| `rate_limit_mode` | string | Action to apply when thresholds are exceeded: `block`, `throttle`, or `alert_only`. | Yes | — |
| `similar_malicious_threshold` | int | Number of similar fingerprints with malicious activity needed to trigger action. | No | 5 |
| `similar_blocked_threshold` | int | Number of similar, previously blocked fingerprints needed to trigger action. | No | 5 |
ℹ️ Similarity between fingerprints is computed based on metadata and behavior using TrustGate's internal `fingerprint.Manager`.
Execution Flow
The `contextual_security` plugin follows a multi-step evaluation pipeline to determine whether a request should be allowed, throttled, flagged, or blocked. Below is a detailed breakdown of each step in the flow:
1. Fingerprint Resolution
The plugin retrieves a unique fingerprint identifier for the current request. This fingerprint is expected to be injected into the request context by upstream middleware and typically encodes a combination of request metadata, such as:
- IP address
- User-Agent header
- Authorization token
- User ID
If no fingerprint is found, the plugin will log an error and skip further checks.
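As a rough illustration of this step, the sketch below pulls a fingerprint identifier out of the request context. The context key and helper name are hypothetical; the actual plugin receives the fingerprint from TrustGate's upstream middleware under its own key:

```go
package contextualsecurity

import (
	"context"
	"errors"
	"log"
)

// ctxKeyFingerprint is a hypothetical context key used only for this sketch.
type ctxKey string

const ctxKeyFingerprint ctxKey = "fingerprint_id"

// resolveFingerprint returns the fingerprint injected by upstream middleware.
// If none is present, the plugin logs an error and skips further checks.
func resolveFingerprint(ctx context.Context) (string, error) {
	fp, ok := ctx.Value(ctxKeyFingerprint).(string)
	if !ok || fp == "" {
		log.Println("contextual_security: no fingerprint in request context; skipping checks")
		return "", errors.New("fingerprint not found")
	}
	return fp, nil
}
```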
2. Historical Analysis
Once a fingerprint is identified, the plugin retrieves its behavioral history from TrustGate's internal storage. This history includes:
- Malicious request count: Number of previous requests flagged as malicious by `neuraltrust_guardrail`.
- Block status: Whether the fingerprint is currently blocked due to prior abuse.
If the fingerprint does not yet exist in the system (i.e., first-time request), it is initialized and persisted with a temporary TTL in Redis.
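A minimal sketch of this lookup, assuming a Redis-backed counter and block flag per fingerprint, might look like the following. The key names and TTL are assumptions for illustration; TrustGate's actual storage schema may differ:

```go
package contextualsecurity

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// profileTTL is an assumed temporary TTL for first-time fingerprints.
const profileTTL = 24 * time.Hour

// loadOrInitHistory fetches the malicious-request count and block status for a
// fingerprint, initializing a fresh profile with a temporary TTL on first sight.
func loadOrInitHistory(ctx context.Context, rdb *redis.Client, fp string) (maliciousCount int64, blocked bool, err error) {
	countKey := "contextual_security:" + fp + ":malicious_count" // hypothetical key
	blockKey := "contextual_security:" + fp + ":blocked"         // hypothetical key

	maliciousCount, err = rdb.Get(ctx, countKey).Int64()
	switch {
	case err == redis.Nil:
		// First-time fingerprint: initialize and persist with a temporary TTL.
		if setErr := rdb.Set(ctx, countKey, 0, profileTTL).Err(); setErr != nil {
			return 0, false, setErr
		}
		maliciousCount, err = 0, nil
	case err != nil:
		return 0, false, err
	}

	blocked, err = rdb.Get(ctx, blockKey).Bool()
	if err == redis.Nil {
		blocked, err = false, nil
	}
	return maliciousCount, blocked, err
}
```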
3. Similarity Checks
To detect evasion techniques and coordinated abusive behavior, the plugin performs an internal similarity analysis by consulting TrustGate’s fingerprinting engine.
This engine evaluates the behavioral and contextual proximity between the current fingerprint and other known fingerprints in the system. Similarity is determined using internal heuristics based on:
- Historical behavior patterns (e.g., frequency and type of malicious activity)
- Metadata characteristics (e.g., origin IP ranges, tokens, etc.)
- Request timing and flow characteristics
Once evaluated, the plugin interprets the results to determine:
- How many similar fingerprints have a history of malicious behavior.
- How many similar fingerprints are currently blocked by the system.
This analysis acts as a behavioral firewall layer, surfacing patterns of abuse that may not be visible through prompt-level inspection alone. It allows TrustGate to proactively respond to coordinated attacks or attempts to rotate identities (e.g., bots using IP cycling) while maintaining low friction for legitimate users.
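Interpreting those results boils down to two tallies, roughly as in the sketch below. `SimilarFingerprint` and `countSimilar` are hypothetical stand-ins; the real work happens inside TrustGate's `fingerprint.Manager`:

```go
package contextualsecurity

// SimilarFingerprint is a hypothetical view of what the similarity engine
// returns for each neighbor; the real fingerprint.Manager API differs.
type SimilarFingerprint struct {
	MaliciousCount int
	Blocked        bool
}

// countSimilar tallies how many similar fingerprints have a malicious history
// and how many are currently blocked, for comparison against the thresholds.
func countSimilar(similar []SimilarFingerprint) (malicious, blocked int) {
	for _, s := range similar {
		if s.MaliciousCount > 0 {
			malicious++
		}
		if s.Blocked {
			blocked++
		}
	}
	return malicious, blocked
}
```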
4. Threshold Evaluation & Action
The plugin checks whether any of the following thresholds have been breached:
- Local malicious count ≥ `max_failures`
- Similar malicious fingerprints ≥ `similar_malicious_threshold`
- Similar blocked fingerprints ≥ `similar_blocked_threshold`
If any of the above conditions are met, the plugin proceeds to execute the configured `rate_limit_mode` action (see the sketch after the mode descriptions):

Mode: `block`
- Immediately blocks the request.
- Marks the fingerprint as blocked in Redis for `block_duration` seconds.
- Returns `403 Forbidden`.

Mode: `throttle`
- Artificially delays the request (default: 5 seconds).
- Adds the header `X-TrustGate-Alert: malicious-request` to the response.
- Allows the request to proceed.

Mode: `alert_only`
- Adds the header `X-TrustGate-Alert: malicious-request` to the response.
- Allows the request without delay or blocking.
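Putting the pieces together, the threshold check and action dispatch can be sketched roughly as follows. The `Config` struct mirrors the documented parameters, but the function names, the `net/http` handling, and the persistence of the block flag are illustrative assumptions rather than the plugin's actual implementation:

```go
package contextualsecurity

import (
	"net/http"
	"time"
)

// Config mirrors the parameters documented above.
type Config struct {
	MaxFailures               int
	BlockDuration             time.Duration
	RateLimitMode             string // "block", "throttle", or "alert_only"
	SimilarMaliciousThreshold int
	SimilarBlockedThreshold   int
}

// exceeded reports whether any of the three documented thresholds is breached.
func (c Config) exceeded(localMalicious, similarMalicious, similarBlocked int) bool {
	return localMalicious >= c.MaxFailures ||
		similarMalicious >= c.SimilarMaliciousThreshold ||
		similarBlocked >= c.SimilarBlockedThreshold
}

// apply executes the configured action once a threshold is breached and
// reports whether the request may proceed.
func (c Config) apply(w http.ResponseWriter) (allow bool) {
	switch c.RateLimitMode {
	case "block":
		// A real implementation would also mark the fingerprint as blocked
		// in Redis for BlockDuration before rejecting the request.
		http.Error(w, "request blocked by contextual_security", http.StatusForbidden)
		return false
	case "throttle":
		time.Sleep(5 * time.Second) // default delay described above
		w.Header().Set("X-TrustGate-Alert", "malicious-request")
		return true
	default: // "alert_only"
		w.Header().Set("X-TrustGate-Alert", "malicious-request")
		return true
	}
}
```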
Response Behavior
200 OK
Request is allowed to proceed when no threshold is breached.
Alert Only
When `rate_limit_mode` is set to `alert_only` or `throttle`, the request is allowed, but a custom header is added:
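```
X-TrustGate-Alert: malicious-request
```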
Throttle Mode
When in `throttle` mode, the request is artificially delayed (e.g., 5 seconds) before proceeding. This helps rate-limit abuse without fully blocking the user.
403 Forbidden
Returned when `rate_limit_mode` is set to `block` and one or more thresholds (`max_failures`, `similar_malicious_threshold`, or `similar_blocked_threshold`) have been exceeded.
Response Body
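The exact payload may vary between TrustGate versions; a representative shape, with assumed field names, is:

```json
{
  "error": "Request blocked by contextual security policy",
  "plugin": "contextual_security",
  "retry_after": 600
}
```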
Configuration Example
This example enables full fingerprint monitoring and blocking when thresholds are exceeded:
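A sketch of such a rule is shown below. The plugin parameters match the table above, but the surrounding chain structure (field names such as `plugin_chain`, `stage`, and `settings`) is assumed for illustration and should be adapted to your TrustGate rule schema. Note that `neuraltrust_guardrail` is chained first, as required:

```json
{
  "plugin_chain": [
    {
      "name": "neuraltrust_guardrail",
      "enabled": true,
      "stage": "pre_request"
    },
    {
      "name": "contextual_security",
      "enabled": true,
      "stage": "pre_request",
      "settings": {
        "rate_limit_mode": "block",
        "max_failures": 5,
        "block_duration": 600,
        "similar_malicious_threshold": 5,
        "similar_blocked_threshold": 5
      }
    }
  ]
}
```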
Troubleshooting
| Symptom | Resolution |
|---|---|
| Requests never blocked | Ensure `rate_limit_mode` is not set to `alert_only`. Check threshold values. |
| Guardrail not triggering malicious detection | Validate the `neuraltrust_guardrail` plugin configuration and classification rules. |
| Unexpected 403 responses | Check logs to verify which thresholds were exceeded. Tune plugin configuration accordingly. |
Best Practices
- Always pair with `neuraltrust_guardrail`: This plugin relies on Guardrail to classify prompts as malicious; it must be active in the same rule or plugin chain.
- Tune thresholds based on risk tolerance: For higher-security environments, lower the values of `max_failures`, `similar_malicious_threshold`, and `similar_blocked_threshold` to detect and act on malicious behavior more aggressively.
- Monitor alerts: When operating in `alert_only` mode, capture and analyze the `X-TrustGate-Alert` response header to monitor potentially fraudulent behavior without blocking users.