Heuristic evaluators use mathematical and logical formulas to approximate whether a response is correct or incorrect.

Why Heuristic Evaluators are Important

Heuristic evaluators are valuable because they:

  1. Consistency: Provide consistent evaluations across different runs and scenarios
  2. Speed: Execute quickly, since scoring is a direct computation rather than a round trip to a model
  3. Cost-Effectiveness: Don’t require additional LLM API calls, making them more economical

However, there are some limitations:

  • Rigidity: May miss nuanced or context-dependent aspects of responses
  • Limited Scope: Can only evaluate what has been explicitly defined in the rules
  • Maintenance: Require regular updates to handle new patterns or edge cases
  • Complexity: May become unwieldy when trying to capture complex evaluation criteria
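
The rigidity limitation is easiest to see with an exact-match check. The sketch below is illustrative only: exact_match is a hypothetical helper written for this example, not part of the TrustTest API. It shows a correct but differently worded answer being rejected.

```python
def exact_match(response: str, expected: str) -> bool:
    """Hypothetical exact-match check: passes only if the trimmed texts are identical."""
    return response.strip() == expected.strip()


expected = "Paris"

# An identical answer passes.
print(exact_match("Paris", expected))  # True

# A correct answer phrased as a full sentence fails, because the rule
# compares characters and has no notion of meaning.
print(exact_match("The capital of France is Paris.", expected))  # False
```

Loosening the rule (for example, testing whether the expected value appears anywhere in the response) fixes this case but admits new false positives, which is how rule sets grow and become costly to maintain.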

Current TrustTest Heuristic Evaluators

TrustTest provides several specialized heuristic evaluators:

  1. Regex Evaluator: Uses regular expressions to validate response patterns
  2. Equals Evaluator: Checks if responses exactly match expected values
  3. BLEU Evaluator: Measures the similarity between responses using the BLEU score metric
  4. Expected Language Evaluator: Verifies if responses are in the expected language
  5. Equal Language Evaluator: Compares the language of responses to ensure consistency
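
To make the idea behind two of these evaluators concrete, here is a minimal Python sketch of a regex check and a simplified BLEU-style score. The function names (regex_evaluator, simple_bleu) and their signatures are assumptions made for illustration; they are not the TrustTest classes, and the scoring here (whitespace tokenization, unigrams and bigrams only, no smoothing) is a simplification of BLEU rather than the exact metric TrustTest computes.

```python
import math
import re
from collections import Counter


def regex_evaluator(response: str, pattern: str) -> bool:
    """Hypothetical regex check: pass if the pattern occurs anywhere in the response."""
    return re.search(pattern, response) is not None


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(response: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU-style score in [0, 1] against a single reference.

    Geometric mean of clipped n-gram precisions for n = 1..max_n,
    multiplied by a brevity penalty for responses shorter than the reference.
    """
    cand = response.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        total = sum(cand_counts.values())
        # Clip each n-gram's count to how often it appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize responses shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


print(regex_evaluator("Your order ID is 12345.", r"\b\d{5}\b"))
print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))
print(simple_bleu("the cat sat", "the cat sat on the mat"))
```

An identical response scores 1.0, a response sharing no words with the reference scores 0.0, and a truncated response lands in between because of the brevity penalty. In practice a threshold on the score would decide pass or fail; the right threshold depends on the task.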

While heuristic evaluators are fast and consistent, we recommend using LLM-as-a-Judge evaluators when possible, because they can better understand semantic relationships and reason about content in a more human-like way.