Threat Categories
TrustTest includes specialized probes for testing different aspects of AI safety:Prompt Injections
Test your model’s resistance to various prompt injection techniques, including multi-turn manipulation attacks, jailbreaking attempts, encoding bypasses, and more.Content Bias
Evaluate your model for cognitive biases (anchoring, framing, positional) and stereotypical biases (ethnic, gender, religious) that could lead to unfair or discriminatory outputs.Sensitive Data Leak
Assess your model’s ability to protect sensitive information from direct queries, contextual leakage attempts, and metadata extraction attacks.System Prompt Disclosure
Test whether attackers can extract your model’s system prompt or internal instructions through various techniques.Input Leakage
Evaluate if your model inadvertently reveals information from previous conversations or user inputs.Unsafe Outputs
Test your model’s guardrails against generating harmful content including hate speech, violence, illegal activities, and other dangerous outputs.Off-Topics
Ensure your model stays within its intended scope and appropriately handles requests about competitors, public figures, or disallowed content areas.Agentic Behavior
For AI agents, test resistance to unauthorized tool usage, self-preservation behaviors, and other agentic safety concerns.Test Generation Methods
Predefined Datasets TrustTest comes with curated datasets for common evaluation scenarios across all threat categories. Objective-Based Testing Define custom attack objectives and let TrustTest generate sophisticated test cases using various attack techniques. Automatic Test Generation Generate test sets automatically from knowledge bases or using LLM-assisted creation. Custom Test Sets Create your own test sets by defining specific input-output pairs or importing existing datasets.Why It Matters
- Comprehensive Security Systematically evaluate models across multiple threat vectors before deployment.
- Reproducible Testing Predefined and automatically generated test sets ensure consistent evaluation across different runs.
- Efficient Red Teaming Automated test generation saves time while maintaining quality and coverage.
- Customizable Assessment Tailor evaluations to your specific use case, industry requirements, and risk profile.
- Continuous Improvement Identify vulnerabilities and track improvements over time with structured testing.