Skip to main contentTrustTest provides comprehensive red teaming capabilities to evaluate the safety, security, and reliability of AI models. Test sets are collections of attack scenarios designed to systematically assess how well models handle various threats and requirements.
Threat Categories
TrustTest includes specialized probes for testing different aspects of AI safety:
Prompt Injections
Test your model’s resistance to various prompt injection techniques, including multi-turn manipulation attacks, jailbreaking attempts, encoding bypasses, and more.
Content Bias
Evaluate your model for cognitive biases (anchoring, framing, positional) and stereotypical biases (ethnic, gender, religious) that could lead to unfair or discriminatory outputs.
Sensitive Data Leak
Assess your model’s ability to protect sensitive information from direct queries, contextual leakage attempts, and metadata extraction attacks.
System Prompt Disclosure
Test whether attackers can extract your model’s system prompt or internal instructions through various techniques.
Evaluate if your model inadvertently reveals information from previous conversations or user inputs.
Unsafe Outputs
Test your model’s guardrails against generating harmful content including hate speech, violence, illegal activities, and other dangerous outputs.
Off-Topics
Ensure your model stays within its intended scope and appropriately handles requests about competitors, public figures, or disallowed content areas.
Agentic Behavior
For AI agents, test resistance to unauthorized tool usage, self-preservation behaviors, and other agentic safety concerns.
Test Generation Methods
Predefined Datasets
TrustTest comes with curated datasets for common evaluation scenarios across all threat categories.
Objective-Based Testing
Define custom attack objectives and let TrustTest generate sophisticated test cases using various attack techniques.
Automatic Test Generation
Generate test sets automatically from knowledge bases or using LLM-assisted creation.
Custom Test Sets
Create your own test sets by defining specific input-output pairs or importing existing datasets.
Why It Matters
-
Comprehensive Security
Systematically evaluate models across multiple threat vectors before deployment.
-
Reproducible Testing
Predefined and automatically generated test sets ensure consistent evaluation across different runs.
-
Efficient Red Teaming
Automated test generation saves time while maintaining quality and coverage.
-
Customizable Assessment
Tailor evaluations to your specific use case, industry requirements, and risk profile.
-
Continuous Improvement
Identify vulnerabilities and track improvements over time with structured testing.