Test Generation
Interaction
An Interaction
represents a single interaction between the LLM and the user. It includes the question posed to the LLM, the response generated by the LLM, and the evaluation context.
EvaluationContext
The EvaluationContext
is a class that contains the context of the evaluation i.e. the required data for and Evaluator to decide if a test case is passed or failed.
TestCase
A TestCase
consists of a list of Interaction
. If a test case has more than one interaction, it means that the test case is a conversation between the LLM and the user.
TestSet
A TestSet
is a collection of TestCase
objects. This serves as the container for all the test cases needed to evaluate the LLM on various tasks. The test set can be serialized to or deserialized from a dictionary format.
Objective
An Objective
defines a test goal. It includes an initial question (prompt) for the LLM and descriptions of what constitutes a true or false response. This is crucial for assessing the accuracy of the LLM’s responses.
Probe
A Probe
generates a TestSet
. It is the algorithm that creates the prompt for evaluating the LLM, being malicious or funcional.
Knowledge Base
A KnowledgeBase
is a collection of documents that are used to generate test cases for a specific domain or task. Usually refered as the Vector Database for retrieval augmented generation (RAG).