TrustGuard is one of four NeuralTrust products. It pairs naturally with TrustGate (the AI gateway, the most common collector), TrustLens (AI security posture), and TrustTest (red teaming).
What it does
Prompt & content security
Jailbreak and prompt-injection detection, topic moderation, toxicity scoring,
and document/URL analysis on both input and output.
Data loss prevention
Detect and mask 60+ PII entities and secrets (API keys, tokens, JWTs) in flight,
returning a transformed payload.
Agent & MCP security
Guard tool/function definitions, enforce tool allow/deny lists, and validate the
tool calls a model emits.
Behavioral security
Detect abusive actors — bot-like cadence, repeated payloads, and escalation —
across requests and collectors.
The model in one minute
TrustGuard has two building blocks. You compose them once, then send traffic.

| Concept | What it is |
|---|---|
| Detector | A reusable, named detection: one capability from the built-in catalog (e.g. `prompt_guard`, `data_loss_prevention`) configured with a mode (`observe` / `redact` / `block`), protocol, direction, and settings. |
| Collector | A traffic tap (gateway, SDK, browser, WAF) authenticated by an API key. Each collector has a chain of attached detectors. |
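The relationship between the two building blocks can be sketched as plain data structures. This is an illustrative model only, not TrustGuard's SDK or configuration schema: the class names, field names, and the `attach` helper are assumptions, as are the example values for `protocol` and `direction`. Only the capability names and the three modes come from the table above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    """The three detector modes named in the table above."""
    OBSERVE = "observe"
    REDACT = "redact"
    BLOCK = "block"


@dataclass
class Detector:
    """Hypothetical shape of a reusable, named detection."""
    name: str          # user-chosen name (assumed field)
    capability: str    # catalog entry, e.g. "prompt_guard"
    mode: Mode
    protocol: str      # example values are assumptions
    direction: str     # example values are assumptions
    settings: dict = field(default_factory=dict)


@dataclass
class Collector:
    """Hypothetical shape of a traffic tap with a chain of attached detectors."""
    name: str
    kind: str          # "gateway", "sdk", "browser", or "waf"
    api_key: str
    detectors: list = field(default_factory=list)

    def attach(self, detector: Detector) -> None:
        # Append to the chain; attachment order is preserved.
        self.detectors.append(detector)


# Compose once: define detectors, then attach them to a collector.
mask_pii = Detector(
    name="mask-pii",
    capability="data_loss_prevention",
    mode=Mode.REDACT,
    protocol="http",      # assumed value
    direction="input",    # assumed value
)
gateway = Collector(name="prod-gateway", kind="gateway", api_key="<API_KEY>")
gateway.attach(mask_pii)
```

Because a detector is defined independently of any collector, the same `mask_pii` object could be attached to several collectors in this sketch, which mirrors the "reusable" wording in the table.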
What makes it safe to run inline
- Never drops traffic. A detection is always `HTTP 200` with the finding in the response body. Non-2xx is reserved for auth (401), bad requests (400), and system failures (500).
- Fail-open by default. If a detector errors (e.g. an upstream model API is down), TrustGuard returns no finding for it rather than failing the request, so a TrustGuard issue never breaks your traffic. Fail-closed is opt-in.
- Enforcement is the caller's choice. `is_flagged` is advisory. Your collector decides what to do with it.
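The three properties above suggest how a caller might interpret a guard response. The following is a minimal sketch, not an official client: the function name, the `enforce` and `fail_open` parameters, and the `"allow"` / `"deny"` return values are hypothetical, and the only response field assumed is `is_flagged`. How to treat a 500 is the caller's own policy; the sketch defaults to letting traffic through.

```python
from typing import Any, Optional


def decide(
    status: int,
    body: Optional[dict],
    enforce: bool = True,
    fail_open: bool = True,
) -> str:
    """Map a guard response to a caller-side decision: 'allow' or 'deny'.

    Hypothetical helper: only the status codes and `is_flagged` are taken
    from the text above; everything else is an illustrative assumption.
    """
    if status == 200:
        # A detection still arrives as HTTP 200; the finding is in the body.
        flagged = bool((body or {}).get("is_flagged", False))
        # is_flagged is advisory: the caller chooses whether to enforce it.
        return "deny" if (flagged and enforce) else "allow"

    if status in (400, 401):
        # Bad request or auth failure: a caller-side bug or misconfiguration,
        # so surface it instead of silently allowing or denying traffic.
        raise ValueError(f"guard request rejected with HTTP {status}")

    # System failure (e.g. 500): apply the caller's own failure policy.
    return "allow" if fail_open else "deny"


# A flagged finding is enforced only if the caller opts to enforce it.
decision_enforced = decide(200, {"is_flagged": True})
decision_observed = decide(200, {"is_flagged": True}, enforce=False)
```

Keeping the decision in a small pure function like this makes the enforcement policy easy to test separately from the HTTP call itself.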
Where to go next
How it works
The guard pipeline, modes, and the findings model.
Core concepts
Collectors and detectors.
Detector catalog
Every built-in detection and its settings.
Guard API
The `/v1/guard` request and response contract.