Interaction

An Interaction represents a single interaction between the LLM and the user. It includes the question posed to the LLM, the response generated by the LLM, and the evaluation context.

EvaluationContext

The EvaluationContext is a class that contains the context of the evaluation i.e. the required data for and Evaluator to decide if a test case is passed or failed.

TestCase

A TestCase consists of a list of Interaction. If a test case has more than one interaction, it means that the test case is a conversation between the LLM and the user.

TestSet

A TestSet is a collection of TestCase objects. This serves as the container for all the test cases needed to evaluate the LLM on various tasks. The test set can be serialized to or deserialized from a dictionary format.

Objective

An Objective defines a test goal. It includes an initial question (prompt) for the LLM and descriptions of what constitutes a true or false response. This is crucial for assessing the accuracy of the LLM’s responses.

Probe

A Probe generates a TestSet. It is the algorithm that creates the prompt for evaluating the LLM, being malicious or funcional.

Knowledge Base

A KnowledgeBase is a collection of documents that are used to generate test cases for a specific domain or task. Usually refered as the Vector Database for retrieval augmented generation (RAG).