settings.credentials.
| Detector | Slug | Sides | Protocols | Backend |
|---|---|---|---|---|
| Prompt Guard | prompt_guard | input, output | all | NeuralTrust Firewall |
| Multi-turn Guard | multiturn_guard | input | all | Stateful (per session) |
| Toxicity | toxicity | input, output | all | NeuralTrust / OpenAI |
| Toxicity (OpenAI) | toxicity_openai | input, output | all | OpenAI Moderation |
| Toxicity (Azure) | toxicity_azure | input, output | all | Azure Content Safety |
| Prompt Moderation | prompt_moderation | input, output | all | keyword/regex + NeuralTrust topics |
| URL Analyzer | url_analyzer | input | llm, mcp | fetch + NeuralTrust Firewall |
| Document Analyzer | doc_analyzer | input | llm | extract/OCR + PII + Firewall |
| Bedrock Guardrail | bedrock_guardrail | input, output | all | AWS Bedrock |
Prompt Guard — prompt_guard
Scores input or output text with the NeuralTrust Firewall jailbreak detector and flags
content that scores above the configured threshold. Sensitivity 1–4 (default 2).
| Field | Type | Required | Notes |
|---|---|---|---|
| jailbreak.threshold | number | ✅ | Score in [0,1] above which content is flagged. |
| credentials.{base_url,token,openai_api_key} | string | — | Override global firewall creds. |
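
The threshold rule above can be sketched as a strict comparison. This is a minimal illustration rather than the detector's actual code: the `is_flagged` helper and the nested dictionary shape are assumptions, and only the `jailbreak.threshold` field name comes from the table.

```python
def is_flagged(score: float, threshold: float) -> bool:
    """Return True when a jailbreak score is above the configured threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("jailbreak.threshold must be in [0, 1]")
    # "Above which content is flagged" is read here as a strict comparison.
    return score > threshold


# Hypothetical settings fragment; the nesting is an assumption.
prompt_guard_settings = {"jailbreak": {"threshold": 0.6}}
threshold = prompt_guard_settings["jailbreak"]["threshold"]
```

Lowering the threshold flags more content; whether a score exactly equal to the threshold is flagged is not stated in the table, so the strict comparison is a guess.
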
Multi-turn Guard — multiturn_guard
Records conversation turns keyed on session_id and evaluates them together, catching
jailbreaks that build up gradually across turns rather than in a single message. Pass a
stable session_id on the guard request.
| Field | Type | Default | Notes |
|---|---|---|---|
| session_ttl | integer | 3600 | Seconds to retain session state. |
| retention_period | integer | — | Seconds; malicious-counter TTL. |
| threshold | number | 0.7 | Score in [0,1]. |
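
To make the role of session_id and session_ttl concrete, here is a rough in-memory sketch of per-session turn retention. The `SessionStore` class is hypothetical, and treating the TTL as a sliding window that resets on each turn is an assumption; the real detector's storage and scoring are not described here.

```python
import time


class SessionStore:
    """Keeps conversation turns per session_id and drops them after session_ttl seconds."""

    def __init__(self, session_ttl=3600, clock=time.monotonic):
        self.session_ttl = session_ttl
        self._clock = clock
        self._sessions = {}  # session_id -> (last_seen, list of turns)

    def record(self, session_id, text):
        """Append a turn and return every turn still retained for this session."""
        now = self._clock()
        last_seen, turns = self._sessions.get(session_id, (now, []))
        if now - last_seen > self.session_ttl:
            turns = []  # state expired: start a fresh conversation
        turns = turns + [text]
        self._sessions[session_id] = (now, turns)
        return turns
```

A multi-turn evaluator would score the returned list as a whole, which is why a changing session_id defeats the check: each request would look like the first turn of a new conversation.
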
Toxicity — toxicity
Scores content for toxicity via the configured provider.
| Field | Type | Default | Notes |
|---|---|---|---|
| provider | enum | neuraltrust | neuraltrust or openai. |
| toxicity.threshold | number | — (required) | Score in [0,1]. |
| mapping_field | string | — | JSON path to the text to score. |
| credentials.* | object | — | Override global creds. |
Toxicity (OpenAI) — toxicity_openai
Uses the OpenAI Moderation API (omni-moderation-latest).
| Field | Type | Required | Notes |
|---|---|---|---|
| openai_key | string | ✅ | OpenAI API key. |
| categories | array<string> | — | Moderation categories to consider. |
| thresholds | map<string, number> | — | Per-category score threshold [0,1]. |
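
The interaction between categories and thresholds can be illustrated as follows. This is a sketch under assumptions: the `flagged_categories` helper and its `default_threshold` fallback are hypothetical, and the table does not say what happens to a category with no explicit threshold.

```python
def flagged_categories(scores, categories=None, thresholds=None, default_threshold=0.5):
    """Return the categories whose score exceeds its threshold, sorted by name."""
    thresholds = thresholds or {}
    # With no category list, every scored category is considered (an assumption).
    considered = categories if categories else scores.keys()
    return sorted(
        name
        for name in considered
        if scores.get(name, 0.0) > thresholds.get(name, default_threshold)
    )


# Example scores in the shape of a moderation result's per-category scores.
scores = {"hate": 0.12, "violence": 0.81, "self-harm": 0.40}
hits = flagged_categories(
    scores, categories=["hate", "violence"], thresholds={"violence": 0.7}
)
```
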
Toxicity (Azure) — toxicity_azure
Uses Azure AI Content Safety.
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| api_key | string | ✅ | — | Azure key. |
| endpoints.text | string | ✅ | — | Text analyze URL. |
| endpoints.image | string | — | — | Image analyze URL. |
| output_type | enum | — | FourSeverityLevels | or EightSeverityLevels. |
| categories | array<string> | — | Hate, Violence, SelfHarm, Sexual | |
| category_severity | map<string,int> | — | — | Per-category min severity. |
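
A per-category minimum severity check might look like the sketch below. The `azure_violations` helper is hypothetical, and ignoring categories that have no entry in category_severity is an assumption. With FourSeverityLevels, Azure reports severities 0, 2, 4, or 6; with EightSeverityLevels it reports 0 through 7.

```python
def azure_violations(severities, category_severity):
    """Return categories whose reported severity meets or exceeds the configured minimum."""
    return {
        category: level
        for category, level in severities.items()
        # Categories without a configured minimum are skipped (an assumption).
        if category in category_severity and level >= category_severity[category]
    }


# Example severities on the FourSeverityLevels scale (0, 2, 4, 6).
severities = {"Hate": 2, "Violence": 4, "SelfHarm": 0, "Sexual": 0}
minimums = {"Hate": 4, "Violence": 4}
result = azure_violations(severities, minimums)
```
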
Prompt Moderation — prompt_moderation
Dual-mode moderation; enable at least one mode.
| Field | Type | Default | Notes |
|---|---|---|---|
| keyreg_moderation.enabled | boolean | false | Keyword/regex matching. |
| keyreg_moderation.keywords | array<string> | — | |
| keyreg_moderation.regex | array<string> | — | Each must compile. |
| keyreg_moderation.similarity_threshold | number | 0.8 | |
| nt_topic_moderation.enabled | boolean | false | NeuralTrust topic probability. |
| nt_topic_moderation.topics | array<string> | — | |
| nt_topic_moderation.thresholds | map<string, number> | — | Per-topic threshold. |
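
The two constraints stated above (at least one mode enabled, every regex must compile) can be checked before deploying a policy. The validator below is a hypothetical pre-flight sketch, not part of the product; it assumes the settings arrive as nested dictionaries and uses Python's `re` syntax, which may differ from the regex engine the detector actually uses.

```python
import re


def validate_prompt_moderation(settings):
    """Return a list of human-readable problems; an empty list means the settings look valid."""
    errors = []
    keyreg = settings.get("keyreg_moderation", {})
    topics = settings.get("nt_topic_moderation", {})
    if not (keyreg.get("enabled", False) or topics.get("enabled", False)):
        errors.append("enable at least one of keyreg_moderation or nt_topic_moderation")
    for pattern in keyreg.get("regex", []):
        try:
            re.compile(pattern)
        except re.error as exc:
            errors.append(f"invalid regex {pattern!r}: {exc}")
    return errors
```

For example, `validate_prompt_moderation({"keyreg_moderation": {"enabled": True, "regex": ["("]}})` reports one problem, because the unbalanced parenthesis does not compile.
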
URL Analyzer — url_analyzer
Extracts URLs from content, fetches each page (SSRF-guarded, size/timeout-bounded), and
screens the fetched text for jailbreaks and PII.
| Field | Type | Default | Notes |
|---|---|---|---|
| threshold | number | 0.7 | Jailbreak score threshold. |
| url.timeout | integer | 20000 | Milliseconds. |
| url.max_content_size | integer | 1048576 | Bytes. |
| url.allowed_domains / url.blocked_domains | array<string> | — | Allow/deny lists. |
| pii.entities | array<enum> | — | PII entities to check in fetched content. |
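
The allow/deny lists could be applied along the lines of the sketch below. The precedence (deny wins over allow), the subdomain matching, and the `domain_permitted` helper itself are all assumptions made for illustration; the table only says that both lists exist.

```python
from urllib.parse import urlsplit


def domain_permitted(url, allowed_domains=None, blocked_domains=None):
    """Decide whether a URL's host may be fetched under the allow/deny lists."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False  # no parseable host: refuse to fetch

    def matches(domain):
        domain = domain.lower()
        # Match the domain itself or any of its subdomains.
        return host == domain or host.endswith("." + domain)

    if any(matches(d) for d in blocked_domains or []):
        return False  # assumed: the deny list takes precedence
    if allowed_domains:
        return any(matches(d) for d in allowed_domains)
    return True  # assumed: no allow list means every non-blocked host passes
```

Matching on a leading dot keeps `notexample.com` from passing an allow list that contains `example.com`.
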
Document Analyzer — doc_analyzer
Extracts text from uploaded documents (PDF, Office, images via OCR, plain text) sent as
input.attachments, then screens for PII and (optionally) jailbreaks.
| Field | Type | Default | Notes |
|---|---|---|---|
| max_file_size | integer | 52428800 (50 MiB) | Bytes. |
| entities | array<enum> | — | PII entities to detect. |
| firewall.enabled | boolean | false | Jailbreak screening of extracted text. |
| firewall.threshold | number | 0.7 | |
| ocr.enabled | boolean | false | OCR for images/scans (requires the OCR build). |
| ocr.languages | array<string> | — | Tesseract language codes. |
Bedrock Guardrail — bedrock_guardrail
Applies an AWS Bedrock guardrail and reports topic, content, or sensitive-information
violations.
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| guardrail_id | string | ✅ | — | |
| version | string | — | "1" | |
| credentials.aws_access_key / aws_secret_key / aws_region / aws_session_token | string | — | — | Static creds. |
| credentials.use_role | boolean | — | false | Assume a role instead. |
| credentials.role_arn | string | — | — | Required when use_role=true. |
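
The conditional requirement on role_arn is easy to miss, so a small pre-flight check can help. The `validate_bedrock` helper is hypothetical, and the nested dictionary layout is an assumption; only the field names and the two rules (guardrail_id required, role_arn required when use_role is true) come from the table.

```python
def validate_bedrock(settings):
    """Return a list of problems with bedrock_guardrail settings; empty means valid."""
    errors = []
    if not settings.get("guardrail_id"):
        errors.append("guardrail_id is required")
    creds = settings.get("credentials", {})
    if creds.get("use_role", False) and not creds.get("role_arn"):
        errors.append("credentials.role_arn is required when credentials.use_role is true")
    return errors
```
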
When to use
- prompt_guard (input, block) is the baseline jailbreak defense; add multiturn_guard for chat where attacks unfold over turns.
- url_analyzer / doc_analyzer for RAG and agent flows that ingest links/files.
- Pick one toxicity provider per side that matches your stack.
- prompt_moderation for topic/scope control (“only answer about X”).