EvaluationScenarios, EvaluatorSuites and EvaluationContext.
This architecture allows for flexible and comprehensive testing of language model responses through multiple evaluation criteria and formats.
ExpectedResponseContext, while an evaluator that checks for specific keywords might only need a QuestionContext.
This context system makes the evaluation process more robust and maintainable by clearly defining the data requirements for each type of evaluation.
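As a rough illustration of evaluators declaring different data requirements, the sketch below uses the context names from the text, but everything else is an assumption: the field names (`question`, `expected_response`), the two evaluator classes, and the `evaluate` signature are hypothetical and are not taken from the library itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QuestionContext:
    """Hypothetical context carrying only the question that was asked."""

    question: str


@dataclass(frozen=True)
class ExpectedResponseContext(QuestionContext):
    """Hypothetical context that adds a reference answer to the question."""

    expected_response: str


class KeywordEvaluator:
    """Hypothetical evaluator: passes when every keyword appears in the response."""

    def __init__(self, keywords: Tuple[str, ...]) -> None:
        self.keywords = keywords

    def evaluate(self, response: str, context: QuestionContext) -> bool:
        # A QuestionContext is enough here; no reference answer is consulted.
        text = response.lower()
        return all(keyword.lower() in text for keyword in self.keywords)


class ExactMatchEvaluator:
    """Hypothetical evaluator: compares the response with the expected response."""

    def evaluate(self, response: str, context: ExpectedResponseContext) -> bool:
        # This evaluator cannot work without the expected_response field.
        return response.strip().lower() == context.expected_response.strip().lower()


context = ExpectedResponseContext(
    question="What is the capital of France?",
    expected_response="Paris",
)

# The richer context satisfies both evaluators.
keyword_ok = KeywordEvaluator(("paris",)).evaluate("The capital is Paris.", context)
exact_ok = ExactMatchEvaluator().evaluate(" paris ", context)
exact_fail = ExactMatchEvaluator().evaluate("London", context)
```

Because each `evaluate` method names the context type it needs in its signature, the data requirement is visible at the point of definition instead of being discovered at run time.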
ExpectedResponseContext is a child class of QuestionContext, so they can be used together in the same suite.
ExpectedResponseContext is not a child of ObjectiveContext, so they cannot be used together.
With a type checker such as mypy or pylance, you will be able to see if a context is valid to be used in a suite.
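One way such a check could be expressed in Python's type system is sketched below. This is not the library's actual implementation: the `Evaluator` protocol, the generic `EvaluatorSuite` class, the three small evaluators, and all field names are assumptions made for illustration. The idea is that an evaluator is contravariant in its context type, so an evaluator that accepts a QuestionContext is also acceptable wherever an ExpectedResponseContext will be supplied.

```python
from dataclasses import dataclass
from typing import Generic, List, Protocol, Sequence, TypeVar


@dataclass(frozen=True)
class QuestionContext:
    question: str  # hypothetical field


@dataclass(frozen=True)
class ExpectedResponseContext(QuestionContext):
    expected_response: str  # hypothetical field


@dataclass(frozen=True)
class ObjectiveContext:
    objective: str  # hypothetical field


ContextIn = TypeVar("ContextIn", contravariant=True)
ContextT = TypeVar("ContextT")


class Evaluator(Protocol[ContextIn]):
    """Hypothetical protocol: anything with a matching evaluate method."""

    def evaluate(self, response: str, context: ContextIn) -> bool: ...


class EvaluatorSuite(Generic[ContextT]):
    """Hypothetical suite: every evaluator must accept the suite's context type."""

    def __init__(self, evaluators: Sequence[Evaluator[ContextT]]) -> None:
        self.evaluators = list(evaluators)

    def run(self, response: str, context: ContextT) -> List[bool]:
        return [e.evaluate(response, context) for e in self.evaluators]


class DiffersFromQuestion:
    def evaluate(self, response: str, context: QuestionContext) -> bool:
        return response.strip() != context.question.strip()


class MatchesExpected:
    def evaluate(self, response: str, context: ExpectedResponseContext) -> bool:
        return response.strip() == context.expected_response.strip()


class MentionsObjective:
    def evaluate(self, response: str, context: ObjectiveContext) -> bool:
        return context.objective in response


# Valid: both evaluators can be served by an ExpectedResponseContext.
suite: EvaluatorSuite[ExpectedResponseContext] = EvaluatorSuite(
    [DiffersFromQuestion(), MatchesExpected()]
)
results = suite.run(
    "Paris",
    ExpectedResponseContext(
        question="What is the capital of France?", expected_response="Paris"
    ),
)

# A static type checker would be expected to reject the line below, because
# ExpectedResponseContext is not a subclass of ObjectiveContext:
# bad: EvaluatorSuite[ExpectedResponseContext] = EvaluatorSuite([MentionsObjective()])
```

The incompatible combination is left commented out so the file still runs; uncommenting it and running a type checker over the file is a quick way to see the kind of error described above.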