Skip to main content
TrustTest provides comprehensive red teaming capabilities to evaluate the safety, security, and reliability of AI models. Test sets are collections of attack scenarios designed to systematically assess how well models handle various threats and requirements.

Threat Categories

TrustTest includes specialized probes for testing different aspects of AI safety:

Prompt Injections

Test your model’s resistance to various prompt injection techniques, including multi-turn manipulation attacks, jailbreaking attempts, encoding bypasses, and more.

Content Bias

Evaluate your model for cognitive biases (anchoring, framing, positional) and stereotypical biases (ethnic, gender, religious) that could lead to unfair or discriminatory outputs.

Sensitive Data Leak

Assess your model’s ability to protect sensitive information from direct queries, contextual leakage attempts, and metadata extraction attacks.

System Prompt Disclosure

Test whether attackers can extract your model’s system prompt or internal instructions through various techniques.

Input Leakage

Evaluate if your model inadvertently reveals information from previous conversations or user inputs.

Unsafe Outputs

Test your model’s guardrails against generating harmful content including hate speech, violence, illegal activities, and other dangerous outputs.

Off-Topics

Ensure your model stays within its intended scope and appropriately handles requests about competitors, public figures, or disallowed content areas.

Agentic Behavior

For AI agents, test resistance to unauthorized tool usage, self-preservation behaviors, and other agentic safety concerns.

Test Generation Methods

Predefined Datasets TrustTest comes with curated datasets for common evaluation scenarios across all threat categories. Objective-Based Testing Define custom attack objectives and let TrustTest generate sophisticated test cases using various attack techniques. Automatic Test Generation Generate test sets automatically from knowledge bases or using LLM-assisted creation. Custom Test Sets Create your own test sets by defining specific input-output pairs or importing existing datasets.

Why It Matters

  • Comprehensive Security Systematically evaluate models across multiple threat vectors before deployment.
  • Reproducible Testing Predefined and automatically generated test sets ensure consistent evaluation across different runs.
  • Efficient Red Teaming Automated test generation saves time while maintaining quality and coverage.
  • Customizable Assessment Tailor evaluations to your specific use case, industry requirements, and risk profile.
  • Continuous Improvement Identify vulnerabilities and track improvements over time with structured testing.