Prompt Guard
Prompt Guard analyzes prompts before they reach the LLM and flags attempts to bypass the model’s safety rules or inject instructions. It is powered by NeuralTrust’s firewall.
What it detects
Prompt Guard recognizes a broad catalog of jailbreak and prompt-injection techniques, including:
| Technique | What it looks like |
|---|---|
| Hypothetical | Prompts that ask the model to imagine or simulate forbidden scenarios. |
| Instruction Manipulation | Hidden or nested instructions embedded in user queries. |
| List-based Injection | Using lists to circumvent filters or elicit forbidden responses. |
| Obfuscation | Encoding, symbol replacement, or Unicode tricks to evade filters. |
| Payload Splitting | Splitting malicious instructions across multiple inputs. |
| Prompt Leaking | Extracting system prompts to alter model behavior. |
| Role Play | Pretending to be characters or systems (DAN-style) to trick the model. |
| Special Token Insertion | Using unusual tokens to confuse moderation mechanisms. |
| Single Prompt Jailbreaks | Direct, one-shot jailbreak instructions. |
| Multi-Turn Jailbreaks | Incremental jailbreaks built over multi-message interactions. |
| Prompt Obfuscation | Misdirection or encoding used to hide intent. |
| Prompt Injection by Delegation | Indirect injection hidden inside benign-looking content (RAG chunks, tool outputs, emails, web pages). |
| Encoding & Unicode Obfuscation | Base64, UTF-16, and other encoding tricks to bypass detection. |
Configuring Prompt Guard
Prompt Guard is configured inline, on the policy that uses it. There is no separate plugin page; you wire it up as part of a policy’s When condition and set its sensitivity right there.
1. Pick the detection
In Create Policy → When, add a condition and open the detection picker. Prompt Guard lives under the Content Security category, alongside other prompt- and response-level detections (Prompt Moderation, Response Moderation, Toxicity Protection, Azure Content Safety, OpenAI Toxicity, URL Analyzer, Document Analyzer).
Select Prompt Guard and confirm. The condition row now reads Input · triggers · Prompt Guard.
2. Set the sensitivity
Once Prompt Guard is selected, a configuration panel appears under the condition with a Sensitivity control. The scale has four levels, with L2 labeled Balanced and used as the recommended default:
| Level | Profile | Good for |
|---|---|---|
| L1 | Minimal filtering — only the most obvious threats. | Internal or already-hardened traffic. |
| L2 — Balanced | Recommended default — tuned mix of recall and false-positive rate. | Most production routes. |
| L3 | Higher sensitivity — catches more borderline attempts. | Regulated or higher-risk workloads. |
| L4 | Maximum sensitivity — strictest filtering, more false positives. | Kids / health / finance / government. |
Start with a Log policy, review the hits on real traffic, then raise the level and/or promote the policy’s action to Block once the false-positive rate is acceptable.
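The rollout logic above (stay in Log mode until manually reviewed hits show an acceptable false-positive rate, then promote to Block) can be sketched as a small review helper. The function name, action strings, and the 5% budget are assumptions for illustration, not part of NeuralTrust's product:

```python
def next_action(reviewed_hits: list[bool], current_action: str, fp_budget: float = 0.05) -> str:
    """Decide whether a Log policy is ready for promotion to Block.

    reviewed_hits: manual-review verdicts, True = confirmed attack,
                   False = false positive.
    """
    if not reviewed_hits:
        # No evidence yet: keep observing.
        return current_action
    fp_rate = reviewed_hits.count(False) / len(reviewed_hits)
    if current_action == "log" and fp_rate <= fp_budget:
        return "block"
    return current_action
```

For example, 19 confirmed attacks and 1 false positive is a 5% rate, just inside the budget, so the policy would be promoted; a 50% rate keeps it in Log mode for further tuning.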
3. Finish the policy
The rest of the policy is a standard Where / When / Then:
- Where — typically the Gateway surface, optionally filtered by Routes (e.g. /openai/*) or Upstreams.
- When — Input · triggers · Prompt Guard (with the sensitivity set in step 2).
- Then — Log while tuning, Block once the policy is ready for enforcement.
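Assembled, the Where / When / Then shape of such a policy can be sketched as a config fragment. All field names here are invented for illustration and are not NeuralTrust's actual schema:

```python
# Hypothetical representation of the policy built in steps 1-3.
policy = {
    "where": {
        "surface": "gateway",
        "routes": ["/openai/*"],  # optional route filter
    },
    "when": [
        {
            "scope": "input",
            "detection": "prompt_guard",
            "sensitivity": "L2",  # Balanced, the recommended default
        }
    ],
    "then": {"action": "log"},  # promote to "block" after tuning
}
```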