LLM as judge
Correctness
The Correctness Evaluator is a specialized tool that assesses the accuracy of a response by comparing it against an expected or ground-truth response. It uses an LLM (Large Language Model) as a judge to rate how closely the actual response matches the expected one.
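At its core, the judge receives both responses in a single prompt and is asked to grade the match. The template below is a minimal sketch of what such a prompt might look like; the exact wording and placeholder names (`expected_response`, `actual_response`) are illustrative assumptions, not the evaluator's actual template.

```python
# Hypothetical judge prompt; the evaluator's real template may differ.
JUDGE_PROMPT = """You are grading an answer for correctness.

Expected (ground-truth) response:
{expected_response}

Actual response:
{actual_response}

Compare the actual response to the expected response and rate how well they
match on a scale from 1 (direct contradiction) to 5 (fully equivalent),
then briefly explain your score.
"""

# Filled in per example, e.g.:
# JUDGE_PROMPT.format(expected_response=..., actual_response=...)
```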
Purpose
The Correctness Evaluator is particularly useful when you need to:
- Verify the factual accuracy of responses
- Ensure responses align with expected answers
- Detect contradictions or misinformation
- Evaluate the semantic similarity between responses
How It Works
The evaluator uses a 5-point scale to rate responses (a sketch showing how this rubric can be applied follows the list):
- Score 1 (Direct Contradiction): The actual response directly contradicts the expected response
- Score 2 (Partial Contradiction): The actual response shares some facts with the expected response but also directly contradicts it in places
- Score 3 (Similar but Not Equivalent): The actual response does not contradict the expected response, but it is not equivalent to it either
- Score 4 (Partial Equivalence): Some of the information is equivalent, but not all of it
- Score 5 (Fully Equivalent): The actual and expected responses are fully equivalent
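The sketch below shows one way this rubric might drive the evaluation: the rubric is embedded in the judge prompt, the judge is asked to reply in JSON, and the reply is parsed into a score and a list of explanations. The function name `evaluate_correctness`, the `judge` callable, and the JSON reply format are assumptions for illustration, not the evaluator's actual implementation.

```python
import json
from typing import Callable, List, Tuple

# The 5-point rubric from above, as it could appear in the judge prompt.
RUBRIC = """\
1 (Direct Contradiction): the actual response directly contradicts the expected one.
2 (Partial Contradiction): some facts match, but there are direct contradictions.
3 (Similar but Not Equivalent): not contradictory, but not equivalent either.
4 (Partial Equivalence): some of the information is equivalent, but not all.
5 (Fully Equivalent): the two responses are fully equivalent."""


def evaluate_correctness(
    actual: str,
    expected: str,
    judge: Callable[[str], str],  # any function that sends a prompt to an LLM and returns its text reply
) -> Tuple[int, List[str]]:
    """Ask an LLM judge to score `actual` against `expected` on the 1-5 rubric."""
    prompt = (
        "You are grading an answer for correctness.\n\n"
        f"Expected response:\n{expected}\n\n"
        f"Actual response:\n{actual}\n\n"
        "Rate the actual response using this rubric:\n"
        f"{RUBRIC}\n\n"
        'Reply with JSON only: {"score": <1-5>, "explanations": ["..."]}'
    )
    reply = judge(prompt)
    parsed = json.loads(reply)  # assumes the judge honours the JSON-only instruction
    return int(parsed["score"]), list(parsed["explanations"])
```

Keeping the LLM call behind a plain `judge` callable keeps the rubric and parsing logic independent of any particular model provider.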
Usage Example
The evaluator returns a tuple (unpacked in the sketch after this list) containing:
- A score (1-5) indicating the level of correctness
- A list of explanations for the given score
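A minimal usage sketch, assuming the `evaluate_correctness` helper from the section above and the OpenAI Python client as the judge backend; the model name and example strings are placeholders.

```python
from openai import OpenAI  # any chat-capable LLM client could serve as the judge

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def openai_judge(prompt: str) -> str:
    """Send the judge prompt to a chat model and return its text reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# `evaluate_correctness` is the sketch from "How It Works" above.
score, explanations = evaluate_correctness(
    actual="The Eiffel Tower was completed in 1889 and stands in Paris.",
    expected="The Eiffel Tower, finished in 1889, is located in Paris, France.",
    judge=openai_judge,
)

print(score)         # e.g. 5, meaning the responses are fully equivalent
print(explanations)  # the judge's reasons for the score
```

Swapping in a different judge only requires another function with the same `(prompt: str) -> str` signature.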
When to Use
Use the Correctness Evaluator when you need to:
- Validate factual accuracy in QA systems
- Check response quality in chatbots
- Ensure consistency in information retrieval systems
- Evaluate the reliability of AI-generated content
- Test the accuracy of automated responses