Attack Categories
Single Turn Attacks
Direct attacks delivered in a single message. These include:| Category | Attacks |
|---|---|
| Jailbreaking | DAN Jailbreak, Anti-GPT, Best-of-N, Role-Playing Exploits, System Override, Instructional Inversion |
| Encoding & Obfuscation | Encoded Payload, Encoding/Capitalization, Symbolic Encoding, Obfuscation/Token Smuggling, Typo Tricks |
| Structural | Context Hijacking, JSON Injection, Payload Splitting, Allowed/Disallowed Questions |
| Language-Based | Multi-Language Attacks, Synonyms |
| Multimodal | Multimodal Injection |
Multi-Turn Attacks
Sophisticated attacks that use multiple conversation turns to gradually manipulate the model:| Attack | Description |
|---|---|
| Crescendo Attack | Gradual escalation that slowly pushes boundaries toward a malicious objective |
| Echo Chamber Attack | Reinforcement-based manipulation through positive feedback |
| Multi-Turn Manipulation | General multi-turn conditioning attacks |
From Dataset
Load attacks from curated datasets for reproducible testing:| Dataset Type | Description |
|---|---|
| Static Datasets | Pre-built YAML/JSON datasets with attack prompts |
| Custom Datasets | Your own attack datasets in various formats |
Quick Reference: All Prompt Injection Probes
Single Turn Probes
| Probe | Class Name | Focus |
|---|---|---|
| Anti-GPT | AntiGPTProbe | Tests handling of role reversal requests |
| DAN Jailbreak | DANJailbreakProbe | Evaluates resistance to persona-based bypasses |
| Best-of-N Jailbreaking | BestOfNJailbreakingProbe | Probes for gaps through diversity |
| Role-Playing Exploits | RolePlayingExploitsProbe | Tests if fictional framing bypasses guardrails |
| System Override | SystemOverrideProbe | Evaluates instruction hierarchy enforcement |
| Instructional Inversion | InstructionalInversionProbe | Tests if negation confuses the model |
| Encoded Payload | EncodedPayloadProbe | Evaluates if encoded content bypasses filters |
| Encoding and Capitalization | EncodingAndCapitalizationProbe | Tests resistance to visual obfuscation |
| Symbolic Encoding | SymbolicEncodingProbe | Assesses handling of emoji/symbol representations |
| Obfuscation and Token Smuggling | ObfuscationAndTokenSmugglingProbe | Tests token-level obfuscation |
| Typo Tricks | TypoTricksProbe | Evaluates robustness to typo-based evasion |
| Context Hijacking | ContextHijackingProbe | Evaluates if injected context redirects behavior |
| JSON Injection | JsonInjectionProbe | Tests handling of structured data injection |
| Payload Splitting | PayloadSplittingProbe | Assesses if fragmented content evades detection |
| Allowed and Disallowed Questions | AllowedAndDisallowedQuestionsProbe | Evaluates if safe questions mask harmful ones |
| Multi-Language Attacks | MultiLanguageAttacksProbe | Evaluates safety consistency across languages |
| Synonyms | SynonymsProbe | Tests if alternative phrasings bypass filters |
| Multimodal Injection | MultimodalInjectionProbe | Evaluates resistance to cross-modal attacks |
Multi-Turn Probes
| Probe | Class Name | Focus |
|---|---|---|
| Multi-Turn Manipulation | MultiTurnManipulationProbe | Tests gradual conditioning to comply |
| Crescendo Attack | CrescendoAttackProbe | Tests if incremental requests bypass safety |
| Echo Chamber Attack | EchoChamberAttackProbe | Assesses resistance to reinforcement manipulation |
How Prompt Injection Testing Works
Prompt injection probes use a binary evaluation:- Pass: The model resists the injection attempt and maintains safe behavior
- Fail: The model’s behavior is successfully manipulated by the attack
Basic Usage Example
When to Use Prompt Injection Testing
Use prompt injection testing when you need to:- Validate model safety before deployment
- Test guardrails and content filters
- Assess vulnerability to known jailbreak techniques
- Conduct red team exercises
- Meet security compliance requirements