NeuralTrust | The leading security platform for generative AI

Prompt security protects the input side of every LLM call. It focuses on two attack families that look innocent at the HTTP layer but are the leading causes of model misbehavior: jailbreaks and prompt injections. In TrustGate, prompt security is delivered by the Prompt Guard detection.

Prompt Guard

Prompt Guard analyzes prompts before they reach the LLM and flags attempts to bypass the model’s safety rules or inject instructions. It is powered by NeuralTrust’s firewall.

What it detects

Prompt Guard recognizes a broad catalog of jailbreak and prompt-injection techniques, including:

Technique	What it looks like
Hypothetical	Prompts that ask the model to imagine or simulate forbidden scenarios.
Instruction Manipulation	Hidden or nested instructions embedded in user queries.
List-based Injection	Using lists to circumvent filters or elicit forbidden responses.
Obfuscation	Encoding, symbol replacement, or Unicode tricks to evade filters.
Payload Splitting	Splitting malicious instructions across multiple inputs.
Prompt Leaking	Extracting system prompts to alter model behavior.
Role Play	Pretending to be characters or systems (DAN-style) to trick the model.
Special Token Insertion	Using unusual tokens to confuse moderation mechanisms.
Single Prompt Jailbreaks	Direct, one-shot jailbreak instructions.
Multi-Turn Jailbreaks	Incremental jailbreaks built over multi-message interactions.
Prompt Obfuscation	Misdirection or encoding used to hide intent.
Prompt Injection by Delegation	Indirect injection hidden inside benign-looking content (RAG chunks, tool outputs, emails, web pages).
Encoding & Unicode Obfuscation	Base64, UTF-16, and other encoding tricks to bypass detection.

Configuring Prompt Guard

Prompt Guard is configured inline, on the policy that uses it. There is no separate plugin page — you wire it up as part of a policy’s When condition and set its sensitivity right there.

1. Pick the detection

In Create Policy → When, add a condition and open the detection picker. Prompt Guard lives under the Content Security category, alongside other prompt- and response-level detections (Prompt Moderation, Response Moderation, Toxicity Protection, Azure Content Safety, OpenAI Toxicity, URL Analyzer, Document Analyzer). Select Prompt Guard and confirm. The condition row now reads:

Input · triggers · 1 detection selected   →   Prompt Guard

2. Set the sensitivity

Once Prompt Guard is selected, a configuration panel appears under the condition with a Sensitivity control. The scale has four levels, with L2 labeled Balanced and used as the recommended default:

Level	Profile	Good for
L1	Minimal filtering — only the most obvious threats.	Internal or already-hardened traffic.
L2 — Balanced	Recommended default — tuned mix of recall and false-positive rate.	Most production routes.
L3	Higher sensitivity — catches more borderline attempts.	Regulated or higher-risk workloads.
L4	Maximum sensitivity — strictest filtering, more false-positives.	Kids / health / finance / government.

Start at L2 — Balanced with a Log policy, review the hits on real traffic, then raise the level and/or promote the policy’s action to Block once the false-positive rate is acceptable.

3. Finish the policy

The rest of the policy is a standard Where / When / Then:

Where — typically the Gateway surface, optionally filtered by Routes (e.g. /openai/*) or Upstreams.
When — Input · triggers · Prompt Guard (with the sensitivity set in step 2).
Then — Log while tuning, Block once the policy is ready for enforcement.

A single policy can combine Prompt Guard with other Content Security detections — add more rows and they’ll be AND-combined, per the policy model.

​Prompt Guard

​What it detects

​Configuring Prompt Guard

​1. Pick the detection

​2. Set the sensitivity

​3. Finish the policy