NeuralTrust | The leading security platform for generative AI

Evaluator

An Evaluator is responsible for assessing a single aspect of an LLM’s response. It evaluates the response against specific criteria and returns an EvaluatorResult containing the evaluation outcome, score, and reasoning.
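As an illustration only, a single-aspect evaluator could be sketched as below. The class `KeywordEvaluator`, the `evaluate` method, and the field names on `EvaluatorResult` (`passed`, `score`, `reasoning`) are hypothetical names chosen for this sketch, not the NeuralTrust SDK's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvaluatorResult:
    """Hypothetical result shape: outcome, score, and reasoning."""
    passed: bool
    score: float
    reasoning: str


class KeywordEvaluator:
    """Assesses one aspect only: does the response avoid forbidden terms?"""

    def __init__(self, forbidden: List[str]):
        self.forbidden = [word.lower() for word in forbidden]

    def evaluate(self, response: str) -> EvaluatorResult:
        text = response.lower()
        hits = [word for word in self.forbidden if word in text]
        if hits:
            return EvaluatorResult(False, 0.0, "Found forbidden terms: " + ", ".join(hits))
        return EvaluatorResult(True, 1.0, "No forbidden terms found")
```

Calling `KeywordEvaluator(["password"]).evaluate("Your password is 1234")` in this sketch yields a failed result with a score of 0.0 and a reasoning string naming the matched term.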

EvaluatorSuite

An EvaluatorSuite is a collection of Evaluator objects that work together to comprehensively evaluate an LLM’s response. It combines multiple evaluators’ results to determine if a test case passes or fails based on defined criteria.
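One way to picture how a suite combines results is the sketch below. It assumes evaluators are plain callables and that the pass criterion is either "all evaluators pass" or "at least one passes"; the `require_all` flag and the two sample evaluators are hypothetical and stand in for whatever criteria the real platform supports.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvaluatorResult:
    passed: bool
    score: float
    reasoning: str


# Assumption: an evaluator is any callable mapping a response to a result.
Evaluator = Callable[[str], EvaluatorResult]


def not_empty(response: str) -> EvaluatorResult:
    ok = bool(response.strip())
    return EvaluatorResult(ok, 1.0 if ok else 0.0, "non-empty" if ok else "empty response")


def under_200_chars(response: str) -> EvaluatorResult:
    ok = len(response) <= 200
    return EvaluatorResult(ok, 1.0 if ok else 0.0, f"length is {len(response)}")


class EvaluatorSuite:
    def __init__(self, evaluators: List[Evaluator], require_all: bool = True):
        self.evaluators = evaluators
        self.require_all = require_all  # hypothetical pass criterion

    def evaluate(self, response: str) -> Tuple[bool, List[EvaluatorResult]]:
        # Run every evaluator, then combine the individual verdicts.
        results = [evaluator(response) for evaluator in self.evaluators]
        verdicts = [result.passed for result in results]
        passed = all(verdicts) if self.require_all else any(verdicts)
        return passed, results
```

With `require_all=True`, a 300-character response fails this suite because one evaluator rejects it; with `require_all=False`, the same response passes.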

EvaluationScenario

An EvaluationScenario represents a complete testing scenario that combines a TestSet with an EvaluatorSuite. It manages the execution of test cases, evaluates responses, and generates comprehensive results. Each scenario has a unique ID, name, description, and specific evaluation criteria.
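The relationship between a scenario, its test set, and its suite can be sketched roughly as follows. Everything here is an assumption made for illustration: the test set is reduced to a mapping of test case IDs to questions, the suite to a single pass/fail callable, and `run` and `target` are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EvaluationScenario:
    name: str
    description: str
    test_set: Dict[str, List[str]]   # test case ID -> questions (simplified)
    suite: Callable[[str], bool]     # True when a response passes (simplified)
    scenario_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def run(self, target: Callable[[str], str]) -> Dict[str, bool]:
        """Send each question to the target and return test case ID -> failed."""
        results: Dict[str, bool] = {}
        for case_id, questions in self.test_set.items():
            responses = [target(question) for question in questions]
            # A test case is marked failed if any response is rejected.
            results[case_id] = any(not self.suite(r) for r in responses)
        return results
```

Here `target` stands in for the connected application: any function that takes a question and returns the model's response.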

InteractionResult

An InteractionResult captures the evaluation outcome of a single interaction between the LLM and the user. It contains the question, the response, the evaluation results, the failure status, and the context for that specific interaction.
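The fields listed above could be modeled as in the following sketch. The attribute names are hypothetical, and deriving the failure status from the evaluation results (rather than storing it) is an assumption of this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InteractionResult:
    question: str
    response: str
    evaluation_results: Dict[str, bool]               # evaluator name -> passed
    context: List[str] = field(default_factory=list)  # e.g. retrieved passages

    @property
    def failed(self) -> bool:
        # Assumption: the interaction fails if any evaluator rejected it.
        return not all(self.evaluation_results.values())
```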

TestCaseResult

A TestCaseResult represents the outcome of evaluating a complete test case. It includes the overall failure status, all interaction results, test case ID, execution time, and execution date. A test case fails if any of its interactions fail.
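The failure rule stated above (a test case fails if any interaction fails) can be expressed in a short sketch. The field names are hypothetical, and each interaction is reduced to its failure flag to keep the example small.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class TestCaseResult:
    test_case_id: str
    interaction_failures: List[bool]  # one failed flag per interaction
    execution_time: float             # seconds
    execution_date: datetime

    @property
    def failed(self) -> bool:
        # The test case fails as soon as one interaction has failed.
        return any(self.interaction_failures)


result = TestCaseResult(
    test_case_id="case-1",
    interaction_failures=[False, True, False],
    execution_time=1.2,
    execution_date=datetime.now(timezone.utc),
)
```

In this sketch `result.failed` is `True`, because the second interaction failed even though the other two passed.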

EvaluationRun

An EvaluationRun represents the complete results of running an evaluation scenario. It contains the scenario details (ID, name, description, fail criteria) and all test case results. It provides methods to analyze and display the results, including success rates and detailed failure information.
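A minimal sketch of the analysis such a run object might offer is shown below. The method names `success_rate`, `failed_cases`, and `summary` are hypothetical, and test case results are simplified to a mapping of test case ID to failure flag.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvaluationRun:
    scenario_id: str
    scenario_name: str
    results: Dict[str, bool]  # test case ID -> failed

    def success_rate(self) -> float:
        """Fraction of test cases that did not fail (0.0 for an empty run)."""
        if not self.results:
            return 0.0
        passed = sum(1 for failed in self.results.values() if not failed)
        return passed / len(self.results)

    def failed_cases(self) -> List[str]:
        return [case_id for case_id, failed in self.results.items() if failed]

    def summary(self) -> str:
        return (
            f"{self.scenario_name}: {self.success_rate():.0%} passed, "
            f"failed cases: {self.failed_cases()}"
        )


run = EvaluationRun("s-1", "demo", {"a": False, "b": True, "c": False, "d": False})
```

For this run, three of the four test cases pass, so `run.success_rate()` is 0.75 and `run.failed_cases()` is `["b"]`.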
