This library focuses on testing and evaluating large language models (LLMs) by providing a structured approach to defining, generating, and evaluating test cases across various domains and tasks. Whether you’re testing compliance with ethical standards, evaluating task performance, or ensuring robustness in different scenarios, the library supports a wide range of LLM testing use cases.

Key Entities

The documentation is divided into several key sections, each focusing on a different aspect of the evaluation process. Below is a high-level overview of the library's main entities:

  • Test cases are the building blocks of model evaluation. Each test case contains a set of interactions between the model and the user; an interaction consists of the user prompt, the model response, and the evaluation context (see the sketch after this list).
  • Probes generate sets of test cases.
  • Evaluators determine whether a test case passes or fails.
  • EvaluatorScenarios define a group of evaluators and their failure criteria.
  • Knowledge bases are collections of documents used to gather context for generating test cases.
  • Models are the target LLMs to evaluate.
  • LLM clients are the providers of the LLM model.
  • Embeddings models are the providers of the embeddings model.
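To make these relationships concrete, here is a minimal sketch of how the entities could fit together. The class names, fields, and pass/fail logic below are illustrative assumptions made for this example, not the library's actual API.

```python
# Illustrative sketch only: these dataclasses are assumptions meant to show
# how the entities relate, not the library's actual API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interaction:
    """A single exchange: the user prompt, the model response, and the
    context used to judge that response."""
    user_prompt: str
    model_response: str
    evaluation_context: dict = field(default_factory=dict)


@dataclass
class TestCase:
    """A test case groups the interactions generated for one scenario."""
    interactions: List[Interaction] = field(default_factory=list)


@dataclass
class Evaluator:
    """Judges whether a test case passes or fails."""
    name: str

    def evaluate(self, test_case: TestCase) -> bool:
        # Placeholder pass/fail logic for illustration only.
        return all(i.model_response for i in test_case.interactions)


@dataclass
class EvaluatorScenario:
    """Bundles evaluators with a failure criterion (here: all must pass)."""
    evaluators: List[Evaluator]

    def run(self, test_case: TestCase) -> bool:
        return all(e.evaluate(test_case) for e in self.evaluators)
```

In this sketch, a probe would produce `TestCase` objects (optionally drawing context from a knowledge base), and an `EvaluatorScenario` would aggregate the verdicts of its evaluators into an overall pass/fail result.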

This document will guide you through the architecture of the Neural Trust library, explaining the various components and how they work together to provide a comprehensive testing and evaluation system for large language models. Whether you’re interested in compliance testing, performance evaluation, or generating dynamic test cases, this library provides the tools you need for effective LLM assessment.