The Contextual Security plugin (contextual_security) adds behavioral fingerprint-based rate limiting and fraud prevention to the TrustGate AI Gateway. It proactively monitors for repeated malicious activity, analyzes behavioral similarity across users, and applies configurable countermeasures such as throttling, blocking, or alerting.

This plugin is especially useful in environments where prompt-level filtering (e.g., via Guardrails) is not sufficient, and where user behavior over time needs to be evaluated to prevent evasion or abuse.

ℹ️ Dependency: This plugin requires the neuraltrust_guardrail plugin to be enabled in the same rule or plugin chain.


How It Works

  • Each user/requestor is identified by a unique fingerprint (derived from contextual metadata).
  • TrustGate maintains a historical profile for each fingerprint, including:
    • Count of malicious requests (e.g., from Guardrail classifications).
    • Whether a fingerprint has been blocked in the past.
    • Behavioral similarity to other known malicious or blocked fingerprints.
  • Once thresholds are crossed, the plugin triggers rate limit actions:
    • block: Reject future requests for a configured time.
    • throttle: Delay requests to increase cost/friction.
    • alert_only: Flag the request but let it proceed.

Requirements

  • The neuraltrust_guardrail plugin must be active in the same rule. It classifies prompts and flags malicious behavior, which this plugin uses as input.
  • The plugin must run at the pre_request stage to intercept abusive activity before request execution.

For example, the neuraltrust_guardrail plugin must appear alongside contextual_security in the same rule's plugin chain:

{
  "name": "neuraltrust_guardrail",
  "enabled": true,
  ...
}

Configuration Parameters

Parameter | Type | Description | Required | Default
max_failures | int | Number of past malicious requests allowed before action is taken. | No | 5
block_duration | int | Duration (in seconds) to block the fingerprint once threshold is reached. | No | 600
rate_limit_mode | string | Action to apply when thresholds are exceeded: block, throttle, or alert_only. | Yes | (none)
similar_malicious_threshold | int | Number of similar fingerprints with malicious activity needed to trigger action. | No | 5
similar_blocked_threshold | int | Number of similar fingerprints that have been blocked needed to trigger action. | No | 5

ℹ️ Similarity between fingerprints is computed from metadata and behavior by TrustGate's internal fingerprint manager (fingerprint.Manager).
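
For illustration only, the settings above could be modeled as a Go struct; the type and field names below are assumptions, and only the JSON keys and defaults come from the table:

package contextualsecurity // illustrative package name

// ContextualSecuritySettings is a hypothetical mapping of the plugin settings.
type ContextualSecuritySettings struct {
	MaxFailures               int    `json:"max_failures"`                // default: 5
	BlockDuration             int    `json:"block_duration"`              // seconds; default: 600
	RateLimitMode             string `json:"rate_limit_mode"`             // required: "block", "throttle", or "alert_only"
	SimilarMaliciousThreshold int    `json:"similar_malicious_threshold"` // default: 5
	SimilarBlockedThreshold   int    `json:"similar_blocked_threshold"`   // default: 5
}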


Execution Flow

The contextual_security plugin follows a multi-step evaluation pipeline to determine whether a request should be allowed, throttled, alerted, or blocked. Below is a detailed breakdown of each step in the flow:


1. Fingerprint Resolution

The plugin retrieves a unique fingerprint identifier for the current request. This fingerprint is expected to be injected into the request context by upstream middleware and typically encodes a combination of request metadata, such as:

  • IP address
  • User-Agent header
  • Authorization token
  • User ID

If no fingerprint is found, the plugin will log an error and skip further checks.
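
As a rough sketch (not TrustGate's actual code), fingerprint resolution amounts to reading an identifier that upstream middleware stored in the request context; the context key name and types below are assumptions:

package contextualsecurity

import (
	"context"
	"log"
)

// resolveFingerprint returns the fingerprint injected by upstream middleware,
// or false if none is present. The key "fingerprint_id" is illustrative.
func resolveFingerprint(ctx context.Context) (string, bool) {
	fp, ok := ctx.Value("fingerprint_id").(string)
	if !ok || fp == "" {
		// No fingerprint available: log an error and skip further checks.
		log.Println("contextual_security: fingerprint not found in request context")
		return "", false
	}
	return fp, true
}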


2. Historical Analysis

Once a fingerprint is identified, the plugin retrieves its behavioral history from TrustGate’s internal storage (Redis).

This history includes:

  • Malicious request count: Number of previous requests flagged as malicious by neuraltrust_guardrail.
  • Block status: Whether the fingerprint is currently blocked due to prior abuse.

If the fingerprint does not yet exist in the system (i.e., first-time request), it is initialized and persisted with a temporary TTL in Redis.
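
A simplified sketch of the bookkeeping this step implies is shown below; the history fields, store interface, and TTL value are assumptions based on the description above, not TrustGate's internal schema:

package contextualsecurity

import (
	"context"
	"time"
)

// FingerprintHistory summarizes what is tracked per fingerprint.
type FingerprintHistory struct {
	MaliciousCount int  // requests previously flagged by neuraltrust_guardrail
	Blocked        bool // currently blocked due to prior abuse
}

// historyStore abstracts the Redis-backed storage described above.
type historyStore interface {
	Get(ctx context.Context, fingerprint string) (*FingerprintHistory, error) // nil if unknown
	SaveWithTTL(ctx context.Context, fingerprint string, h *FingerprintHistory, ttl time.Duration) error
}

func loadOrInitHistory(ctx context.Context, store historyStore, fp string) (*FingerprintHistory, error) {
	h, err := store.Get(ctx, fp)
	if err != nil {
		return nil, err
	}
	if h == nil {
		// First-time requester: initialize and persist with a temporary TTL
		// (the 10-minute value here is illustrative).
		h = &FingerprintHistory{}
		if err := store.SaveWithTTL(ctx, fp, h, 10*time.Minute); err != nil {
			return nil, err
		}
	}
	return h, nil
}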


3. Similarity Checks

To detect evasion techniques and coordinated abusive behavior, the plugin performs an internal similarity analysis by consulting TrustGate’s fingerprinting engine.

This engine evaluates the behavioral and contextual proximity between the current fingerprint and other known fingerprints in the system. Similarity is determined using internal heuristics based on:

  • Historical behavior patterns (e.g., frequency and type of malicious activity)
  • Metadata characteristics (e.g., origin IP ranges, authorization tokens)
  • Request timing and flow characteristics

Once evaluated, the plugin interprets the results to determine:

  • How many similar fingerprints have a history of malicious behavior.
  • How many similar fingerprints are currently blocked by the system.

This analysis acts as a behavioral firewall layer, surfacing patterns of abuse that may not be visible through prompt-level inspection alone. It allows TrustGate to proactively respond to coordinated attacks or attempts to rotate identities (e.g., bots using IP cycling) while maintaining low friction for legitimate users.
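
In code terms, the plugin only needs two numbers out of this analysis: how many similar fingerprints are malicious and how many are blocked. The interface below is an assumed stand-in for the fingerprinting engine, reusing the FingerprintHistory type from the previous sketch:

package contextualsecurity

import "context"

// similarityEngine stands in for TrustGate's fingerprinting engine; the
// method name is an assumption.
type similarityEngine interface {
	// SimilarFingerprints returns the stored history of fingerprints that are
	// behaviorally and contextually close to the given one.
	SimilarFingerprints(ctx context.Context, fp string) ([]FingerprintHistory, error)
}

func countSimilar(ctx context.Context, eng similarityEngine, fp string) (malicious, blocked int, err error) {
	similar, err := eng.SimilarFingerprints(ctx, fp)
	if err != nil {
		return 0, 0, err
	}
	for _, h := range similar {
		if h.MaliciousCount > 0 {
			malicious++ // similar fingerprint with a history of malicious requests
		}
		if h.Blocked {
			blocked++ // similar fingerprint currently blocked
		}
	}
	return malicious, blocked, nil
}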


4. Threshold Evaluation & Action

The plugin checks whether any of the following thresholds have been breached:

  • Local malicious count reaches max_failures
  • Number of similar malicious fingerprints reaches similar_malicious_threshold
  • Number of similar blocked fingerprints reaches similar_blocked_threshold

If any of the above conditions are met, the plugin proceeds to execute the configured rate_limit_mode action:

Mode: block

  • Immediately blocks the request.
  • Marks the fingerprint as blocked in Redis for block_duration seconds.
  • Returns 403 Forbidden.

Mode: throttle

  • Artificially delays the request (default: 5 seconds).
  • Adds the header X-TrustGate-Alert: malicious-request to the response.
  • Allows the request to proceed.

Mode: alert_only

  • Adds the header X-TrustGate-Alert: malicious-request to the response.
  • Allows the request without delay or blocking.
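
Putting the thresholds and modes together, the decision logic reduces to a sketch like the one below. The function and parameter names are assumptions (as is the use of ">=" for the comparison); the header name, 403 status, and 5-second delay come from the text above, and the types reuse the earlier sketches:

package contextualsecurity

import (
	"net/http"
	"time"
)

// evaluateAndAct applies the configured rate_limit_mode once a threshold is breached.
func evaluateAndAct(history *FingerprintHistory, similarMalicious, similarBlocked int,
	cfg ContextualSecuritySettings, w http.ResponseWriter) (allow bool) {

	exceeded := history.MaliciousCount >= cfg.MaxFailures ||
		similarMalicious >= cfg.SimilarMaliciousThreshold ||
		similarBlocked >= cfg.SimilarBlockedThreshold

	if !exceeded {
		return true // no threshold breached: request proceeds normally
	}

	switch cfg.RateLimitMode {
	case "block":
		// Marking the fingerprint as blocked in Redis for block_duration
		// seconds is omitted here for brevity.
		http.Error(w, "blocked request due to fraudulent activity", http.StatusForbidden)
		return false
	case "throttle":
		// Add friction, flag the response, and let the request proceed.
		time.Sleep(5 * time.Second)
		w.Header().Set("X-TrustGate-Alert", "malicious-request")
		return true
	default: // "alert_only"
		w.Header().Set("X-TrustGate-Alert", "malicious-request")
		return true
	}
}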

Response Behavior

200 OK

Request is allowed to proceed when no threshold is breached.

Alert Only

When rate_limit_mode is set to alert_only or throttle, the request is allowed, but a custom header is added:

X-TrustGate-Alert: malicious-request

Throttle Mode

When in throttle mode, the request is artificially delayed (5 seconds by default) before proceeding. This helps rate-limit abuse without fully blocking the user.


403 Forbidden

Returned when rate_limit_mode is set to block and one or more thresholds (max_failures, similar_malicious_threshold, or similar_blocked_threshold) have been exceeded.

Response Body

{
  "error": "blocked request due to fraudulent activity"
}

(The exact error message is an example and may vary.)
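
From a client's perspective, the signals described in this section are the status code and the alert header. A minimal, hypothetical client-side check might look like this:

package client // illustrative package name

import (
	"log"
	"net/http"
)

// checkGatewayResponse inspects a TrustGate response for the behaviors
// described above (403 on block, alert header on throttle/alert_only).
func checkGatewayResponse(resp *http.Response) {
	switch {
	case resp.StatusCode == http.StatusForbidden:
		// rate_limit_mode "block": a threshold was exceeded.
		log.Println("request blocked by contextual_security")
	case resp.Header.Get("X-TrustGate-Alert") == "malicious-request":
		// alert_only or throttle mode: the request was flagged but allowed.
		log.Println("request flagged as potentially malicious")
	default:
		// Request allowed with no alert header.
	}
}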

Configuration Example

This example enables full fingerprint monitoring and blocking when thresholds are exceeded:

{
  "name": "contextual_security",
  "enabled": true,
  "priority": 0,
  "stage": "pre_request",
  "parallel": false,
  "settings": {
    "max_failures": 5,
    "block_duration": 600,
    "rate_limit_mode": "block",
    "similar_malicious_threshold": 2,
    "similar_blocked_threshold": 2
  }
}

Troubleshooting

Symptom | Resolution
Requests never blocked | Ensure rate_limit_mode is not set to alert_only. Check threshold values.
Guardrail not triggering malicious detection | Validate the neuraltrust_guardrail plugin configuration and classification rules.
Unexpected 403 responses | Check logs to verify which thresholds were exceeded. Tune plugin configuration accordingly.

Best Practices

  • Always pair with neuraltrust_guardrail. This plugin relies on Guardrail to classify prompts as malicious; it must be active in the same rule or plugin chain.

  • Tune thresholds based on risk tolerance. For higher-security environments, lower the values of max_failures, similar_malicious_threshold, and similar_blocked_threshold to detect and act on malicious behavior more aggressively.

  • Monitor alerts. When operating in alert_only mode, capture and analyze the X-TrustGate-Alert response header to monitor potential fraudulent behavior without blocking users.