TrustGuard is NeuralTrust’s runtime security engine for generative-AI and agentic traffic. It inspects prompts, model output, documents, URLs, and agent/tool activity in flight and returns a structured security verdict for each request. TrustGuard is deliberately decision-only: it tells you what it found and what it would do, but it never blocks or drops traffic itself. The component that calls TrustGuard (the collector) decides whether to allow, mask, or block based on the verdict. This keeps TrustGuard safe to deploy inline anywhere — a failure or timeout can never take your application down.
TrustGuard is one of four NeuralTrust products. It pairs naturally with TrustGate (the AI gateway, the most common collector), TrustLens (AI security posture), and TrustTest (red teaming).

What it does

  • Prompt & content security: Jailbreak and prompt-injection detection, topic moderation, toxicity scoring, and document/URL analysis on both input and output.
  • Data loss prevention: Detect and mask 60+ PII entities and secrets (API keys, tokens, JWTs) in flight, returning a transformed payload.
  • Agent & MCP security: Guard tool/function definitions, enforce tool allow/deny lists, and validate the tool calls a model emits.
  • Behavioral security: Detect abusive actors (bot-like cadence, repeated payloads, and escalation) across requests and collectors.

The model in one minute

TrustGuard has two building blocks. You compose them once, then send traffic.
Concept     What it is
Detector    A reusable, named detection: one capability from the built-in catalog (e.g. prompt_guard, data_loss_prevention) configured with a mode (observe / redact / block), protocol, direction, and settings.
Collector   A traffic tap (gateway, SDK, browser, WAF) authenticated by an API key. Each collector has a chain of attached detectors.

                    POST /v1/guard  (Authorization: Bearer <collector key>)
                                      │
   ┌──────────┐   resolves    ┌───────▼────────┐   runs        ┌─────────────┐
   │ API key  │──────────────▶│   Collector    │──────────────▶│  Detector   │ prompt_guard (block)
   └──────────┘   collector   │  (the tap)     │   its chain   │  Detector   │ data_loss_prevention (redact)
                              └────────────────┘   concurrently └─────────────┘ anomaly_detector (observe)
                                      │
                                      ▼
              { is_flagged, transformed_payload, findings[], trace_id }
A request names no detectors: TrustGuard runs the collector’s entire effective chain (every enabled, attached detector that matches the request’s direction and protocol). See How it works.
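
The effective-chain rule can be pictured as a filter over a collector's attached detectors. The sketch below is an illustration in Python, not TrustGuard's implementation. The class names, field names, detector names, and the example direction and protocol values ("input", "output", "llm") are assumptions made for this example. Only the capability names and the three modes come from this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Detector:
    """Hypothetical model of a detector; field names are assumptions."""
    name: str          # user-chosen name, e.g. "block-jailbreaks" (made up)
    capability: str    # catalog entry, e.g. "prompt_guard"
    mode: str          # "observe", "redact", or "block"
    protocol: str      # assumed value such as "llm"
    direction: str     # assumed value such as "input" or "output"
    enabled: bool = True


@dataclass
class Collector:
    """Hypothetical model of a collector and its attached detectors."""
    name: str
    chain: list[Detector] = field(default_factory=list)

    def effective_chain(self, direction: str, protocol: str) -> list[Detector]:
        # Keep every enabled, attached detector that matches the request's
        # direction and protocol. The request itself names no detectors.
        return [
            d for d in self.chain
            if d.enabled and d.direction == direction and d.protocol == protocol
        ]


gateway = Collector(
    name="gateway",
    chain=[
        Detector("block-jailbreaks", "prompt_guard", "block", "llm", "input"),
        Detector("mask-secrets", "data_loss_prevention", "redact", "llm", "output"),
        Detector("watch-actors", "anomaly_detector", "observe", "llm", "input", enabled=False),
    ],
)

# Only the enabled input-side detector is selected for an input request.
selected = [d.capability for d in gateway.effective_chain("input", "llm")]
```

In this example, `selected` contains only `prompt_guard`: the data-loss detector is attached to the other direction and the anomaly detector is disabled.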

What makes it safe to run inline

  • Never drops traffic. A detection always returns HTTP 200, with the finding in the response body. Non-2xx statuses are reserved for auth failures (401), bad requests (400), and system failures (500).
  • Fail-open by default. If a detector errors (e.g. an upstream model API is down), TrustGuard returns no finding for it rather than failing the request, so a TrustGuard issue never breaks your traffic. Fail-closed is opt-in.
  • Enforcement is the caller’s choice. is_flagged is advisory. Your collector decides what to do with it.
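
Because enforcement lives in the caller, a collector needs a small amount of decision logic around each call. The following is a minimal sketch of that logic in Python using only the standard library. It assumes a hypothetical base URL and leaves the request body opaque, since the body schema is defined in the Guard API reference rather than on this page. The mapping from verdict to action (block when is_flagged is true, mask when transformed_payload differs from the original, otherwise allow) is one possible policy chosen for illustration, not a rule TrustGuard imposes.

```python
import json
import urllib.request


def build_guard_request(base_url, collector_key, body):
    """Build a POST /v1/guard request authenticated with the collector key.

    `base_url` is a placeholder and `body` is passed through unchanged;
    see the Guard API reference for the actual request fields.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/guard",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {collector_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decide(verdict, original_payload):
    """Map a verdict to (action, payload_to_forward). Assumed policy."""
    if verdict.get("is_flagged"):
        return "block", None
    transformed = verdict.get("transformed_payload")
    if transformed is not None and transformed != original_payload:
        return "mask", transformed
    return "allow", original_payload


def guard(request, original_payload, *, timeout=2.0, fail_open=True,
          send=urllib.request.urlopen):
    """Call the guard endpoint and decide what to do with the traffic.

    Non-2xx responses (401, 400, 500) raise urllib.error.HTTPError, which is
    an OSError subclass, so they take the same path as timeouts and
    connection errors. `send` is injectable so the logic can be tested
    without a network.
    """
    try:
        with send(request, timeout=timeout) as response:
            verdict = json.load(response)
    except (OSError, ValueError):
        # Caller-side fail-open: a guard outage does not break traffic.
        # Passing fail_open=False turns this into fail-closed.
        return ("allow", original_payload) if fail_open else ("block", None)
    return decide(verdict, original_payload)
```

A collector that prefers to fail closed for a sensitive route can pass `fail_open=False`; the rest of the logic is unchanged. Whether to treat a 401 the same way as a timeout is also a caller decision, and a production collector would likely surface auth errors separately.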

Where to go next

  • How it works: The guard pipeline, modes, and the findings model.
  • Core concepts: Collectors and detectors.
  • Detector catalog: Every built-in detection and its settings.
  • Guard API: The /v1/guard request and response contract.