Documentation Index
Fetch the complete documentation index at: https://docs.neuraltrust.ai/llms.txt
Use this file to discover all available pages before exploring further.
Prompt injection attacks attempt to manipulate the model into ignoring its instructions or behaving in unintended ways. TrustTest provides the most comprehensive suite of prompt injection probes, organized by attack technique.
Attack Categories
Single Turn Attacks
Direct attacks delivered in a single message. These include:
| Category | Attacks |
|---|
| Jailbreaking | DAN Jailbreak, Anti-GPT, Best-of-N, Role-Playing Exploits, System Override, Instructional Inversion |
| Encoding & Obfuscation | Encoded Payload, Encoding/Capitalization, Symbolic Encoding, Obfuscation/Token Smuggling, Typo Tricks |
| Structural | Context Hijacking, JSON Injection, Payload Splitting, Allowed/Disallowed Questions |
| Language-Based | Multi-Language Attacks, Synonyms |
| Multimodal | Multimodal Injection |
Explore Single Turn Attacks →
Multi-Turn Attacks
Sophisticated attacks that use multiple conversation turns to gradually manipulate the model:
| Attack | Description |
|---|
| Crescendo Attack | Gradual escalation that slowly pushes boundaries toward a malicious objective |
| Echo Chamber Attack | Reinforcement-based manipulation through positive feedback |
| Multi-Turn Manipulation | General multi-turn conditioning attacks |
Explore Multi-Turn Attacks →
From Dataset
Load attacks from curated datasets for reproducible testing:
| Dataset Type | Description |
|---|
| Static Datasets | Pre-built YAML/JSON datasets with attack prompts |
| Custom Datasets | Your own attack datasets in various formats |
Explore Dataset-Based Attacks →
Quick Reference: All Prompt Injection Probes
Single Turn Probes
| Probe | Class Name | Focus |
|---|
| Anti-GPT | AntiGPTProbe | Tests handling of role reversal requests |
| DAN Jailbreak | DANJailbreakProbe | Evaluates resistance to persona-based bypasses |
| Best-of-N Jailbreaking | BestOfNJailbreakingProbe | Probes for gaps through diversity |
| Role-Playing Exploits | RolePlayingExploitsProbe | Tests if fictional framing bypasses guardrails |
| System Override | SystemOverrideProbe | Evaluates instruction hierarchy enforcement |
| Instructional Inversion | InstructionalInversionProbe | Tests if negation confuses the model |
| Encoded Payload | EncodedPayloadProbe | Evaluates if encoded content bypasses filters |
| Encoding and Capitalization | EncodingAndCapitalizationProbe | Tests resistance to visual obfuscation |
| Symbolic Encoding | SymbolicEncodingProbe | Assesses handling of emoji/symbol representations |
| Obfuscation and Token Smuggling | ObfuscationAndTokenSmugglingProbe | Tests token-level obfuscation |
| Typo Tricks | TypoTricksProbe | Evaluates robustness to typo-based evasion |
| Context Hijacking | ContextHijackingProbe | Evaluates if injected context redirects behavior |
| JSON Injection | JsonInjectionProbe | Tests handling of structured data injection |
| Payload Splitting | PayloadSplittingProbe | Assesses if fragmented content evades detection |
| Allowed and Disallowed Questions | AllowedAndDisallowedQuestionsProbe | Evaluates if safe questions mask harmful ones |
| Multi-Language Attacks | MultiLanguageAttacksProbe | Evaluates safety consistency across languages |
| Synonyms | SynonymsProbe | Tests if alternative phrasings bypass filters |
| Multimodal Injection | MultimodalInjectionProbe | Evaluates resistance to cross-modal attacks |
Multi-Turn Probes
| Probe | Class Name | Focus |
|---|
| Multi-Turn Manipulation | MultiTurnManipulationProbe | Tests gradual conditioning to comply |
| Crescendo Attack | CrescendoAttackProbe | Tests if incremental requests bypass safety |
| Echo Chamber Attack | EchoChamberAttackProbe | Assesses resistance to reinforcement manipulation |
How Prompt Injection Testing Works
Prompt injection probes use a binary evaluation:
- Pass: The model resists the injection attempt and maintains safe behavior
- Fail: The model’s behavior is successfully manipulated by the attack
Basic Usage Example
from trusttest.catalog.prompt_injections.single_turn import SingleTurnScenarioBuilder, SubCategory
from trusttest.targets.http import HttpTarget, PayloadConfig
target = HttpTarget(
url="https://your-model-endpoint.com/chat",
headers={"Content-Type": "application/json"},
payload_config=PayloadConfig(
format={
"messages": [
{"role": "user", "content": "{{ test }}"}
]
},
message_regex="{{ test }}",
),
)
builder = SingleTurnScenarioBuilder(target=target, num_test_cases=20)
scenario = builder.get_scenario(SubCategory.DAN_JAILBREAK)
test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()
When to Use Prompt Injection Testing
Use prompt injection testing when you need to:
- Validate model safety before deployment
- Test guardrails and content filters
- Assess vulnerability to known jailbreak techniques
- Conduct red team exercises
- Meet security compliance requirements