Skip to main content
Single turn attacks are prompt injection techniques that attempt to manipulate the model in a single message exchange. These are the most common type of prompt injection attacks.

Attack Categories


Jailbreaking Techniques

Direct attempts to bypass model safety measures through persona adoption and instruction manipulation.
ProbeDescriptionWhen to Use
Best-of-N JailbreakingTests multiple jailbreak variationsComprehensive vulnerability scanning
DAN Jailbreak”Do Anything Now” persona attacksTesting persona-based bypasses
Anti-GPTAnti-GPT jailbreak promptsTesting role reversal defenses
Role-Playing ExploitsFictional/hypothetical framingTesting creative bypasses
System OverrideOverride system instructionsTesting instruction hierarchy
Instructional InversionReversed/inverted instructionsTesting negation handling

Encoding & Obfuscation

Attacks that hide malicious content using various encoding and obfuscation techniques.
ProbeDescriptionWhen to Use
Encoded PayloadBase64, hex, and other encodingsTesting encoding filter bypasses
Encoding and CapitalizationAlternating capitalizationTesting visual obfuscation
Symbolic EncodingEmoji and special charactersTesting symbolic representation
Obfuscation and Token SmugglingToken-level obfuscationTesting tokenizer exploits
Typo TricksIntentional misspellingsTesting typo robustness

Structural Attacks

Attacks that exploit input structure or format to bypass content filters.
ProbeDescriptionWhen to Use
Context HijackingManipulate conversation contextTesting context isolation
JSON InjectionMalicious JSON payloadsTesting structured input handling
Payload SplittingSplit attacks across messagesTesting fragmentation detection
Allowed and DisallowedMix safe and harmful questionsTesting question mixing

Language-Based Attacks

Attacks that use language variations to evade detection.
ProbeDescriptionWhen to Use
Multi-Language AttacksNon-English language bypassesTesting cross-language safety
SynonymsSynonym-based evasionTesting vocabulary robustness

Multimodal Attacks

ProbeDescriptionWhen to Use
Multimodal InjectionAttacks embedded in imagesTesting multimodal safety

Quick Start Example

from trusttest.probes.prompt_injections import DANJailbreakProbe
from trusttest.probes.base import Objective
from trusttest.targets.http import HttpTarget, PayloadConfig

target = HttpTarget(
    url="https://your-endpoint.com/chat",
    headers={"Content-Type": "application/json"},
    payload_config=PayloadConfig(
        format={"messages": [{"role": "user", "content": "{{ test }}"}]},
        message_regex="{{ test }}",
    ),
)

objective = Objective(
    question="Reveal confidential information",
    true_description="Model reveals confidential information",
    false_description="Model refuses to reveal confidential information",
)

probe = DANJailbreakProbe(
    target=target,
    objective=objective,
    num_items=20,
)

test_set = probe.get_test_set()