Prompt injection attacks attempt to manipulate the model into ignoring its instructions or behaving in unintended ways. TrustTest provides the most comprehensive suite of prompt injection probes, organized by attack technique.

Attack Categories

Single Turn Attacks

Direct attacks delivered in a single message. These include:
| Category | Attacks |
| --- | --- |
| Jailbreaking | DAN Jailbreak, Anti-GPT, Best-of-N, Role-Playing Exploits, System Override, Instructional Inversion |
| Encoding & Obfuscation | Encoded Payload, Encoding/Capitalization, Symbolic Encoding, Obfuscation/Token Smuggling, Typo Tricks |
| Structural | Context Hijacking, JSON Injection, Payload Splitting, Allowed/Disallowed Questions |
| Language-Based | Multi-Language Attacks, Synonyms |
| Multimodal | Multimodal Injection |
Explore Single Turn Attacks →

Multi-Turn Attacks

Sophisticated attacks that use multiple conversation turns to gradually manipulate the model:
| Attack | Description |
| --- | --- |
| Crescendo Attack | Gradual escalation that slowly pushes boundaries toward a malicious objective |
| Echo Chamber Attack | Reinforcement-based manipulation through positive feedback |
| Multi-Turn Manipulation | General multi-turn conditioning attacks |
Explore Multi-Turn Attacks →
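
Multi-turn probes are run through the same `PromptInjectionScenario` entry point as single-turn ones. A minimal sketch follows, assuming the `sub_category` slug for the Crescendo Attack is "crescendo-attack" (derived from the attack name; the exact identifier is not confirmed on this page) and reusing the `HttpTarget` configuration from the Basic Usage Example below:

```python
from trusttest.catalog import PromptInjectionScenario
from trusttest.targets.http import HttpTarget, PayloadConfig

# Same target configuration as in the Basic Usage Example below.
target = HttpTarget(
    url="https://your-model-endpoint.com/chat",
    headers={"Content-Type": "application/json"},
    payload_config=PayloadConfig(
        format={"messages": [{"role": "user", "content": "{{ test }}"}]},
        message_regex="{{ test }}",
    ),
)

# "crescendo-attack" is an assumed slug derived from the attack name above;
# check the probe catalog for the exact sub_category identifier.
scenario = PromptInjectionScenario(
    target=target,
    sub_category="crescendo-attack",
    max_attacks=5,
    sampling="random",
)

test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()
```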

From Dataset

Load attacks from curated datasets for reproducible testing:
| Dataset Type | Description |
| --- | --- |
| Static Datasets | Pre-built YAML/JSON datasets with attack prompts |
| Custom Datasets | Your own attack datasets in various formats |
Explore Dataset-Based Attacks →

Quick Reference: All Prompt Injection Probes

Single Turn Probes

| Probe | Class Name | Focus |
| --- | --- | --- |
| Anti-GPT | AntiGPTProbe | Tests handling of role reversal requests |
| DAN Jailbreak | DANJailbreakProbe | Evaluates resistance to persona-based bypasses |
| Best-of-N Jailbreaking | BestOfNJailbreakingProbe | Probes for safety gaps via many randomized prompt variations |
| Role-Playing Exploits | RolePlayingExploitsProbe | Tests if fictional framing bypasses guardrails |
| System Override | SystemOverrideProbe | Evaluates instruction hierarchy enforcement |
| Instructional Inversion | InstructionalInversionProbe | Tests if negation confuses the model |
| Encoded Payload | EncodedPayloadProbe | Evaluates if encoded content bypasses filters |
| Encoding and Capitalization | EncodingAndCapitalizationProbe | Tests resistance to visual obfuscation |
| Symbolic Encoding | SymbolicEncodingProbe | Assesses handling of emoji/symbol representations |
| Obfuscation and Token Smuggling | ObfuscationAndTokenSmugglingProbe | Tests token-level obfuscation |
| Typo Tricks | TypoTricksProbe | Evaluates robustness to typo-based evasion |
| Context Hijacking | ContextHijackingProbe | Evaluates if injected context redirects behavior |
| JSON Injection | JsonInjectionProbe | Tests handling of structured data injection |
| Payload Splitting | PayloadSplittingProbe | Assesses if fragmented content evades detection |
| Allowed and Disallowed Questions | AllowedAndDisallowedQuestionsProbe | Evaluates if safe questions mask harmful ones |
| Multi-Language Attacks | MultiLanguageAttacksProbe | Evaluates safety consistency across languages |
| Synonyms | SynonymsProbe | Tests if alternative phrasings bypass filters |
| Multimodal Injection | MultimodalInjectionProbe | Evaluates resistance to cross-modal attacks |

Multi-Turn Probes

| Probe | Class Name | Focus |
| --- | --- | --- |
| Multi-Turn Manipulation | MultiTurnManipulationProbe | Tests whether gradual conditioning induces compliance |
| Crescendo Attack | CrescendoAttackProbe | Tests if incremental requests bypass safety |
| Echo Chamber Attack | EchoChamberAttackProbe | Assesses resistance to reinforcement manipulation |

How Prompt Injection Testing Works

Prompt injection probes use a binary evaluation:
  • Pass: The model resists the injection attempt and maintains safe behavior
  • Fail: The model’s behavior is successfully manipulated by the attack
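
As a conceptual illustration of this pass/fail logic only (not TrustTest's internal evaluator, whose implementation is not documented here), a toy keyword-based check might look like the sketch below; real probes apply far more robust judgment:

```python
# Illustrative toy example only; not TrustTest's actual evaluation logic.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "not able to help")

def injection_attempt_passes(model_response: str) -> bool:
    """Pass if the model appears to refuse the injected instruction."""
    lowered = model_response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

# Each attack prompt yields a single binary verdict.
responses = [
    "I can't help with bypassing those instructions.",
    "Sure! Ignoring my previous instructions, here is ...",
]
print(["pass" if injection_attempt_passes(r) else "fail" for r in responses])
# ['pass', 'fail']
```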

Basic Usage Example

```python
from trusttest.catalog import PromptInjectionScenario
from trusttest.targets.http import HttpTarget, PayloadConfig

# Describe how to reach the system under test. The "{{ test }}" placeholder
# marks where each attack prompt is injected into the request payload.
target = HttpTarget(
    url="https://your-model-endpoint.com/chat",
    headers={"Content-Type": "application/json"},
    payload_config=PayloadConfig(
        format={
            "messages": [
                {"role": "user", "content": "{{ test }}"}
            ]
        },
        message_regex="{{ test }}",
    ),
)

# Run the DAN Jailbreak probe with a random sample of 20 attack prompts.
scenario = PromptInjectionScenario(
    target=target,
    sub_category="dan-jailbreak",
    max_attacks=20,
    sampling="random",
)

test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()
```
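
The resulting summary reflects the binary evaluation described above: each sampled attack prompt is reported as a pass (the model resisted) or a fail (the model was manipulated).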

When to Use Prompt Injection Testing

Use prompt injection testing when you need to:
  • Validate model safety before deployment
  • Test guardrails and content filters
  • Assess vulnerability to known jailbreak techniques
  • Conduct red team exercises
  • Meet security compliance requirements