TrustTest includes specialized components, called evaluators, that assess whether an AI model's response complies with a specific set of criteria. They provide a systematic way to measure different aspects of model outputs against predefined standards, ensuring reliable and consistent evaluation across use cases.


Key Areas

Heuristic Evaluators

These evaluators use rule-based approaches and predefined metrics to assess responses (a minimal sketch follows the list below). They include:

  • Language-based evaluations (checking whether the response is in the correct language)
  • Exact matching and pattern recognition
  • BLEU score for text similarity
  • Regular expression pattern matching
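
For illustration, here is a minimal sketch of what a rule-based check can look like in plain Python. The `RegexEvaluator` class, its `evaluate` method, and the `EvaluationResult` container are hypothetical names used for this sketch, not the actual TrustTest API.

```python
import re
from dataclasses import dataclass


@dataclass
class EvaluationResult:
    passed: bool       # whether the response satisfied the criterion
    explanation: str   # human-readable reason for the verdict


class RegexEvaluator:
    """Rule-based check: pass if the response matches a required pattern."""

    def __init__(self, pattern: str):
        self.pattern = re.compile(pattern)

    def evaluate(self, response: str) -> EvaluationResult:
        matched = bool(self.pattern.search(response))
        return EvaluationResult(
            passed=matched,
            explanation=f"Pattern {self.pattern.pattern!r} "
                        f"{'found' if matched else 'not found'} in the response.",
        )


# Example: require the response to contain an ISO-formatted date.
evaluator = RegexEvaluator(r"\d{4}-\d{2}-\d{2}")
print(evaluator.evaluate("The release is planned for 2024-06-01."))
```

The same structure applies to exact matching or BLEU scoring: only the rule inside `evaluate` changes, while the result shape stays the same.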

LLM-based Evaluators

These evaluators leverage language models to perform more nuanced assessments (a simplified sketch follows the list below):

  • Response correctness
  • Response completeness
  • Tone and style analysis
  • URL correctness validation
  • Custom evaluation criteria
  • True/false assessment
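
A simplified sketch of the LLM-as-judge idea behind these evaluators is shown below. The `LLMCriteriaEvaluator` class, the judge prompt, and the `call_llm` callable are illustrative assumptions rather than the actual TrustTest API; plug in whichever model client you use as `call_llm`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvaluationResult:
    passed: bool
    explanation: str


JUDGE_PROMPT = """You are an evaluator. Given a criterion and a model response,
answer 'PASS' or 'FAIL' on the first line, then give a one-sentence explanation.

Criterion: {criterion}
Response: {response}"""


class LLMCriteriaEvaluator:
    """LLM-as-judge: ask a language model whether a response meets a criterion."""

    def __init__(self, criterion: str, call_llm: Callable[[str], str]):
        self.criterion = criterion
        self.call_llm = call_llm  # maps a prompt string to the model's completion

    def evaluate(self, response: str) -> EvaluationResult:
        prompt = JUDGE_PROMPT.format(criterion=self.criterion, response=response)
        verdict = self.call_llm(prompt)
        first_line, _, rest = verdict.partition("\n")
        return EvaluationResult(
            passed=first_line.strip().upper().startswith("PASS"),
            explanation=rest.strip() or first_line.strip(),
        )
```

Correctness, completeness, tone, URL validation, and true/false checks all fit this pattern: the criterion text changes, the judging loop does not.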

Why It Matters

  • Quality Assurance: Evaluators provide objective metrics to ensure AI responses meet quality standards and requirements.

  • Consistent Assessment: By standardizing evaluation criteria, evaluators enable reproducible and comparable results across different models and use cases.

  • Flexible Evaluation: The modular design allows custom evaluators to be created for specific needs while maintaining a consistent interface.

  • Comprehensive Analysis: Different types of evaluators can be combined to provide a holistic assessment of model performance across multiple dimensions (see the sketch after this list).

  • Trust and Reliability: Systematic evaluation helps build confidence in AI systems by providing clear metrics and explanations for assessment results.
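
To make the combination concrete, the hypothetical sketch below runs several checks over the same response and collects their verdicts. It assumes the `RegexEvaluator` and `LLMCriteriaEvaluator` sketches above are in scope and share the same `evaluate(response)` interface; `fake_llm` stands in for a real model client.

```python
def run_evaluators(response: str, evaluators: dict) -> dict:
    """Run every evaluator against the same response and collect the verdicts."""
    return {name: ev.evaluate(response) for name, ev in evaluators.items()}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model client so the sketch runs end to end."""
    return "PASS\nThe response is polite and on topic."


suite = {
    "contains_date": RegexEvaluator(r"\d{4}-\d{2}-\d{2}"),
    "is_polite": LLMCriteriaEvaluator("The response is polite.", call_llm=fake_llm),
}
results = run_evaluators("The meeting is on 2024-06-01. Thank you!", suite)
for name, result in results.items():
    print(f"{name}: {'PASS' if result.passed else 'FAIL'} - {result.explanation}")
```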