If you want to go straight to the point, jump directly to the Complete Example section.
Step 1: Evaluation Target
Define the model that we are going to evaluate.
In trusttest we have defined a set of dummy models to easily test the library. A dummy model has a fixed set of responses for a fixed set of inputs; for anything else it returns "I don't know the answer to that question."
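As a minimal sketch of such a target (this is plain Python for illustration, not trusttest's actual dummy model, whose class name and interface may differ):

```python
class DummyModel:
    """Toy evaluation target with a fixed set of canned responses."""

    def __init__(self, responses: dict[str, str]):
        self.responses = responses

    def respond(self, question: str) -> str:
        # Return the canned answer, or the fallback for unknown inputs.
        return self.responses.get(
            question, "I don't know the answer to that question."
        )


model = DummyModel({
    "What is the capital of France?": "Paris",
    "What language is this text written in?": "English",
})
```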
Step 2: Probe
Define the probe that will generate the test cases.
When our model is ready, we can choose the probe that will generate the test cases to evaluate the target.
In this case we are going to use the `DatasetProbe` to generate test cases from a dataset. The generated `test_set` has two test cases. A test case is a set of questions and model responses, together with other metadata used for evaluation.
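A hedged sketch of this step: `DatasetProbe` is the class named in this guide, but the import path, constructor arguments, and `generate()` method are assumptions for illustration.

```python
# Hypothetical import path; check the trusttest package layout.
from trusttest.probes import DatasetProbe

# A tiny dataset: each entry pairs a question with its expected answer.
dataset = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What language is this text written in?", "expected": "English"},
]

# Constructor arguments and method name are assumptions for illustration.
probe = DatasetProbe(model=model, dataset=dataset)
test_set = probe.generate()  # the two test cases described above
```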
Step 3: Evaluation Scenario
Define the evaluation metrics and criteria.
Once the `test_set` is ready, we can define which evaluation metrics and criteria we want to use to evaluate the target. In this Evaluation Scenario we are using the `BleuEvaluator` and the `ExpectedLanguageEvaluator`, with the criteria `any_fail`: if any of the evaluators fails, the scenario will fail.
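A sketch under the same caveat: the evaluator names and the `any_fail` criteria come from this guide, but the import paths, the `EvaluationScenario` class, and the `ExpectedLanguageEvaluator` parameter are assumptions.

```python
# Hypothetical import paths; check the trusttest package layout.
from trusttest.evaluators import BleuEvaluator, ExpectedLanguageEvaluator
from trusttest.scenarios import EvaluationScenario

scenario = EvaluationScenario(
    evaluators=[
        BleuEvaluator(),  # scores responses against reference answers
        ExpectedLanguageEvaluator(language="en"),  # parameter is an assumption
    ],
    criteria="any_fail",  # the scenario fails if any evaluator fails
)
```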
Step 4: Run the Scenario
Evaluate the test set.
Now that we have defined our model and the way to evaluate it, we are ready to run the scenario and get the evaluation results.
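A minimal sketch of running the scenario; the `run` method name is an assumption for illustration.

```python
# Method name is an assumption, not the library's confirmed API.
results = scenario.run(test_set)
print(results)  # evaluation results printed to the console
```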
If everything is working as expected, the results should be displayed in the console. And that's it! 🎉 You have just created your first functional test with TrustTest. Go to the next section to understand everything that is possible with TrustTest.
Complete Example
Full python code for the quickstart.
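Putting the four steps together gives a sketch like the one below. As noted throughout, everything beyond the names taken from this guide (`DatasetProbe`, `BleuEvaluator`, `ExpectedLanguageEvaluator`, `any_fail`) is an assumption; treat this as a template to adapt to the actual trusttest API, not a verbatim script.

```python
# Quickstart sketch: names marked "hypothetical" are assumptions, not
# the library's confirmed API.
from trusttest.probes import DatasetProbe                                   # hypothetical path
from trusttest.evaluators import BleuEvaluator, ExpectedLanguageEvaluator   # hypothetical path
from trusttest.scenarios import EvaluationScenario                          # hypothetical path


class DummyModel:
    """Toy evaluation target with a fixed set of canned responses."""

    def __init__(self, responses: dict[str, str]):
        self.responses = responses

    def respond(self, question: str) -> str:
        return self.responses.get(
            question, "I don't know the answer to that question."
        )


# Step 1: the evaluation target.
model = DummyModel({
    "What is the capital of France?": "Paris",
    "What language is this text written in?": "English",
})

# Step 2: generate a test set from a dataset.
dataset = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What language is this text written in?", "expected": "English"},
]
probe = DatasetProbe(model=model, dataset=dataset)  # hypothetical signature
test_set = probe.generate()                         # hypothetical method

# Step 3: define the evaluation scenario.
scenario = EvaluationScenario(
    evaluators=[
        BleuEvaluator(),
        ExpectedLanguageEvaluator(language="en"),   # hypothetical parameter
    ],
    criteria="any_fail",
)

# Step 4: run the scenario and print the results.
results = scenario.run(test_set)                    # hypothetical method
print(results)
```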