NeuralTrust | Platform for Agent Security.

A detector is a reusable, named instance you create from one entry in the detector catalog (e.g. prompt_guard) plus the settings you configure (thresholds, entity lists, allow/deny lists, …). You create and edit detectors in the console’s Detectors screen, and the same detector can be referenced by many policies.

Detectors are detection-only. A detector decides what it finds and how confident it is — it never decides whether to block, mask, or allow. That decision lives on the policy that uses the detector (its Gates and Detectors tabs). This separation lets you reuse one well-tuned detector across many policies with different enforcement.

What defines a detector

Property	Values	Meaning
Type	one entry from the catalog	Which catalog detector this is an instance of (e.g. `prompt_guard`, `data_loss_prevention`). Fixed — you can’t change its code, only its settings. Sent to the API as `plugin_slug`.
Name	free text	A human label (e.g. “Jailbreak — strict”) so you can tell two detectors of the same type apart.
Settings	type-specific	The detector’s configuration: thresholds, entity lists, keyword/regex lists, and so on. The console renders and validates the form.
Enabled	on / off	A disabled detector is skipped everywhere it’s referenced.
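The four properties above can be pictured as a small record. The sketch below is illustrative only: `plugin_slug` is the field name the table gives for the type, while `name`, `settings`, `enabled`, and the `threshold` key are hypothetical names chosen for this example and may not match the actual API.

```python
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class Detector:
    """Illustrative model of a detector; not the real API schema."""
    plugin_slug: str                       # Type: the catalog entry, e.g. "prompt_guard"
    name: str                              # Name: human label (hypothetical field name)
    settings: dict[str, Any] = field(default_factory=dict)  # type-specific (hypothetical)
    enabled: bool = True                   # a disabled detector is skipped (hypothetical)


def to_payload(detector: Detector) -> dict[str, Any]:
    """Convert the record to a plain dict, e.g. for JSON serialization."""
    return asdict(detector)


# Two detectors of the same type, told apart by name and settings.
strict = Detector("prompt_guard", "Jailbreak (strict)", {"threshold": 0.4})
lenient = Detector("prompt_guard", "Jailbreak (lenient)", {"threshold": 0.8})
```

Both instances share one catalog type but differ in name and settings, which is the reuse pattern the Name property is meant to support.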

A detector does not carry a mode, a direction, or a protocol. Those belong to the policy rule that puts the detector to work (the Input / Output phase). At request time the collector must send a matching direction on /v1/evaluate. TrustGate sets it automatically; applications and other collectors must set it themselves.
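For a collector that has to set the direction itself, a request body could be assembled along these lines. This is a minimal sketch under assumptions: the text only states that a direction is sent on /v1/evaluate, so the field names `direction` and `payload` and the values `"input"` and `"output"` are hypothetical and should be checked against the Evaluate API reference.

```python
from typing import Any

# Assumed direction values, mirroring the Input / Output phases of a policy.
VALID_DIRECTIONS = {"input", "output"}


def build_evaluate_body(payload: str, direction: str) -> dict[str, Any]:
    """Build a hypothetical /v1/evaluate request body with an explicit direction."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"direction must be one of {sorted(VALID_DIRECTIONS)}")
    # Field names here are illustrative, not the documented schema.
    return {"direction": direction, "payload": payload}


body = build_evaluate_body("Summarize this document.", "input")
```

Rejecting an unknown direction before sending avoids a request that no policy phase would match.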

Settings

Every catalog detector exposes its own settings schema (the catalog API, GET /v1/plugins, returns each detector type and its fields). Examples:

prompt_guard, toxicity — a threshold in [0, 1].
data_loss_prevention — which PII entities to detect/mask, plus custom keyword/regex rules.
prompt_moderation — keyword/regex lists and/or NeuralTrust topic thresholds.
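As an illustration of the kind of check a threshold setting implies, the sketch below validates that a value lies in [0, 1]. The console performs the real validation; the function name and the `threshold` key are assumptions made for this example.

```python
def validate_threshold(settings: dict, key: str = "threshold") -> float:
    """Return the threshold as a float, or raise if it is missing or outside [0, 1]."""
    value = settings.get(key)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{key!r} must be a number")
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{key!r} must be within [0, 1]")
    return float(value)


checked = validate_threshold({"threshold": 0.75})
```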

See each detector’s full settings on its category page.

Mutable (transform-capable) detectors

Most detectors only read the payload. A mutable detector can rewrite it — today only data_loss_prevention, which masks matched values in flight and populates transformed_payload. Only a mutable detector can be used with the Transform action in a policy; choosing Transform for any other detector is rejected when you save the policy.
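The save-time rule described above can be sketched as a simple check. The set of mutable types reflects the text (only `data_loss_prevention` today); the function name and the lowercase action strings are hypothetical stand-ins for however the platform represents them.

```python
# Per the text, data_loss_prevention is currently the only mutable detector type.
MUTABLE_DETECTOR_TYPES = frozenset({"data_loss_prevention"})

# Lowercase spellings of the Monitor / Block / Transform actions (assumed encoding).
ACTIONS = frozenset({"monitor", "block", "transform"})


def check_rule_action(plugin_slug: str, action: str) -> None:
    """Raise ValueError if the action is unknown or Transform targets a read-only detector."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action == "transform" and plugin_slug not in MUTABLE_DETECTOR_TYPES:
        raise ValueError(f"{plugin_slug!r} is not mutable; Transform is not allowed")


check_rule_action("data_loss_prevention", "transform")  # accepted
check_rule_action("prompt_guard", "block")              # accepted
```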

Putting a detector to work

Creating a detector doesn’t run it. To evaluate traffic you reference the detector from a policy:

1. Open a policy → Detectors tab.
2. Pick an evaluation phase: Input (prompt/request) or Output (completion/response).
3. Add a rule that selects the detector and an action: Monitor (record a finding only), Block, or Transform (mutable detectors).
4. Optionally add conditions so the rule only runs for certain consumers, models, collectors, protocols, or sessions.

Then attach the policy to a collector so its traffic is evaluated.
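To make the relationship between detectors, rules, phases, and conditions concrete, here is a minimal sketch of how applicable rules might be selected for one request. It is a conceptual model, not the platform's implementation: the class and field names, the lowercase phase and action strings, and the exact-match condition semantics are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Detector:
    plugin_slug: str
    name: str
    enabled: bool = True


@dataclass
class Rule:
    detector: Detector
    phase: str    # "input" or "output" (assumed encoding of the phase)
    action: str   # "monitor", "block", or "transform" (assumed encoding)
    conditions: dict[str, str] = field(default_factory=dict)  # e.g. {"model": "m1"}


def applicable_rules(rules: list[Rule], phase: str, context: dict[str, str]) -> list[Rule]:
    """Select rules for this phase whose detector is enabled and whose conditions all match."""
    selected = []
    for rule in rules:
        if rule.phase != phase:
            continue  # rule belongs to the other phase
        if not rule.detector.enabled:
            continue  # a disabled detector is skipped everywhere it is referenced
        if any(context.get(k) != v for k, v in rule.conditions.items()):
            continue  # at least one condition does not match (exact match assumed)
        selected.append(rule)
    return selected


jailbreak = Detector("prompt_guard", "Jailbreak (strict)")
dlp = Detector("data_loss_prevention", "PII masking", enabled=False)
rules = [
    Rule(jailbreak, "input", "block"),
    Rule(dlp, "input", "transform"),
    Rule(jailbreak, "output", "monitor", {"model": "m1"}),
]
matched = applicable_rules(rules, "input", {"model": "m2"})
```

Here only the first rule is selected: the second references a disabled detector, and the third belongs to the output phase.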
