Evaluation
Evaluator
An Evaluator is responsible for assessing a single aspect of an LLM’s response. It evaluates the response against specific criteria and returns an EvaluatorResult containing the evaluation outcome, score, and reasoning.
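To make the shape of an evaluator concrete, here is a minimal sketch in Python. It is not the library's actual API: the field names on EvaluatorResult (passed, score, reasoning), the KeywordEvaluator class, and the evaluate method are assumptions chosen to mirror the description above.

```python
from dataclasses import dataclass


@dataclass
class EvaluatorResult:
    # Hypothetical fields mirroring "outcome, score, and reasoning".
    passed: bool
    score: float
    reasoning: str


class KeywordEvaluator:
    """Hypothetical evaluator that checks one aspect: presence of a keyword."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword

    def evaluate(self, response: str) -> EvaluatorResult:
        # Case-insensitive substring check against the single criterion.
        found = self.keyword.lower() in response.lower()
        status = "found" if found else "missing"
        return EvaluatorResult(
            passed=found,
            score=1.0 if found else 0.0,
            reasoning=f"keyword {self.keyword!r} {status}",
        )


result = KeywordEvaluator("refund").evaluate("You can request a refund within 30 days.")
```

Each evaluator stays narrow on purpose: one criterion per class keeps the reasoning string easy to interpret when several results are later combined.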
EvaluatorSuite
An EvaluatorSuite is a collection of Evaluator objects that work together to comprehensively evaluate an LLM’s response. It combines multiple evaluators’ results to determine if a test case passes or fails based on defined criteria.
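The combination step could look roughly like the sketch below. The pass criterion used here (every evaluator must pass) is an assumption for illustration; the real criteria are configurable and may differ. The evaluator functions not_empty and short_enough, and the passes method, are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvaluatorResult:
    passed: bool
    score: float
    reasoning: str


# For brevity, an evaluator is modeled as a function from response text to a result.
EvaluatorFn = Callable[[str], EvaluatorResult]


def not_empty(response: str) -> EvaluatorResult:
    ok = bool(response.strip())
    return EvaluatorResult(ok, 1.0 if ok else 0.0, "non-empty" if ok else "empty response")


def short_enough(response: str) -> EvaluatorResult:
    ok = len(response) <= 200
    return EvaluatorResult(ok, 1.0 if ok else 0.0, f"length={len(response)}")


class EvaluatorSuite:
    def __init__(self, evaluators: List[EvaluatorFn]) -> None:
        self.evaluators = evaluators

    def evaluate(self, response: str) -> List[EvaluatorResult]:
        # Run every evaluator and keep the individual results for reporting.
        return [evaluator(response) for evaluator in self.evaluators]

    def passes(self, response: str) -> bool:
        # Assumed criterion: the response passes only if all evaluators pass.
        return all(result.passed for result in self.evaluate(response))


suite = EvaluatorSuite([not_empty, short_enough])
```

Keeping the per-evaluator results alongside the overall verdict means a failing response can still be traced back to the specific criterion it missed.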
EvaluationScenario
An EvaluationScenario represents a complete testing scenario that combines a TestSet with an EvaluatorSuite. It manages the execution of test cases, evaluates responses, and generates comprehensive results. Each scenario has a unique ID, name, description, and specific evaluation criteria.
InteractionResult
An InteractionResult captures the evaluation outcome of a single interaction between the LLM and the user. It contains the question, response, evaluation results, failure status, and context for that specific interaction.
TestCaseResult
A TestCaseResult represents the outcome of evaluating a complete test case. It includes the overall failure status, all interaction results, test case ID, execution time, and execution date. A test case fails if any of its interactions fail.
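The rule that a test case fails if any interaction fails can be sketched as a derived property. The attribute names below (failed, interaction_results, test_case_id, execution_time, execution_date) are assumptions based on the description, not confirmed field names, and the unit of execution_time is assumed to be seconds.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class InteractionResult:
    # Reduced to the fields needed for this illustration.
    question: str
    response: str
    failed: bool


@dataclass
class TestCaseResult:
    test_case_id: str
    interaction_results: List[InteractionResult]
    execution_time: float  # assumed to be in seconds
    execution_date: datetime = field(default_factory=datetime.now)

    @property
    def failed(self) -> bool:
        # A test case fails as soon as one of its interactions failed.
        return any(interaction.failed for interaction in self.interaction_results)


ok_case = TestCaseResult(
    "tc-1",
    [InteractionResult("Hi", "Hello!", failed=False)],
    execution_time=0.4,
)
bad_case = TestCaseResult(
    "tc-2",
    [
        InteractionResult("Hi", "Hello!", failed=False),
        InteractionResult("Refund?", "", failed=True),
    ],
    execution_time=0.9,
)
```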
EvaluationRun
An EvaluationRun represents the complete results of running an evaluation scenario. It contains the scenario details (ID, name, description, fail criteria) and all test case results. It provides methods to analyze and display the results, including success rates and detailed failure information.
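As an illustration of the kind of analysis such a run supports, the sketch below computes a success rate and lists failing test cases. The method names success_rate and failed_ids are hypothetical, as are the reduced field sets; the actual methods may be named and shaped differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TestCaseResult:
    # Reduced to the fields needed for this illustration.
    test_case_id: str
    failed: bool


@dataclass
class EvaluationRun:
    scenario_id: str
    scenario_name: str
    test_case_results: List[TestCaseResult]

    def success_rate(self) -> float:
        # Fraction of test cases that did not fail; 0.0 for an empty run.
        if not self.test_case_results:
            return 0.0
        passed = sum(1 for result in self.test_case_results if not result.failed)
        return passed / len(self.test_case_results)

    def failed_ids(self) -> List[str]:
        # IDs of failing test cases, in execution order.
        return [r.test_case_id for r in self.test_case_results if r.failed]


run = EvaluationRun(
    scenario_id="s1",
    scenario_name="smoke",
    test_case_results=[
        TestCaseResult("tc-1", failed=False),
        TestCaseResult("tc-2", failed=True),
        TestCaseResult("tc-3", failed=False),
        TestCaseResult("tc-4", failed=False),
    ],
)
```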