The Completeness Evaluator is a specialized tool that assesses how well a response captures the relevant information from an expected (ground truth) response. It uses an LLM (Large Language Model) as a judge to rate the extent to which an actual response covers the critical aspects of the expected response.

Purpose

The Completeness Evaluator is particularly useful when you need to:

  • Verify that all essential information is included in responses
  • Ensure no critical components are missing from answers
  • Evaluate the coverage of key points in responses
  • Assess the thoroughness of information provided

How It Works

The evaluator uses a 5-point scale to rate responses:

  • Score: 1 (No Information): The actual response does not contain any information from the expected response
  • Score: 2 (Very Little Information): The actual response contains very little information from the expected response
  • Score: 3 (Missing Key Information): The actual response lacks some key information and also adds extra information
  • Score: 4 (Most Key Information): The actual response contains most of the key information and adds extra information
  • Score: 5 (All Key Information): The actual response contains all the key information from the expected response
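
In practice, the numeric score can be used to gate automated checks. Here is a minimal sketch, assuming the score is available as a plain integer; the passing threshold of 4 is an arbitrary choice for illustration, not part of the library:

def is_complete_enough(score: int, threshold: int = 4) -> bool:
    """Treat 4 (Most Key Information) and 5 (All Key Information)
    as passing, and anything lower as failing."""
    return score >= threshold

assert is_complete_enough(5)
assert not is_complete_enough(3)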

Usage Example

import asyncio

from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.evaluators import CompletenessEvaluator


async def evaluate():
    evaluator = CompletenessEvaluator()

    # Judge the actual response against the expected (ground truth) response.
    result = await evaluator.evaluate(
        response="The capital of Osona is Vic, which is located in Catalonia.",
        context=ExpectedResponseContext(
            expected_response="The capital of Osona is Vic."
        ),
    )
    print(result)


if __name__ == "__main__":
    asyncio.run(evaluate())

The evaluator returns a tuple containing:

  • A score (1-5) indicating the level of completeness
  • A list of explanations for the given score
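
Assuming the result is the (score, explanations) tuple described above, it can be unpacked and reported directly; a minimal sketch:

score, explanations = result  # unpack the (score, explanations) tuple

print(f"Completeness score: {score}/5")
for explanation in explanations:
    print(f"- {explanation}")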

When to Use

Use the Completeness Evaluator when you need to:

  • Verify comprehensive coverage of topics in responses
  • Ensure no critical information is omitted
  • Check the thoroughness of AI-generated content
  • Evaluate the completeness of automated responses
  • Assess the coverage of key points in information retrieval systems
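
For example, to assess the completeness of a batch of automated responses, the same evaluator can be reused across a list of cases. This is a minimal sketch, assuming the evaluate signature shown earlier; the TEST_CASES pairs are hypothetical illustration data:

import asyncio

from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.evaluators import CompletenessEvaluator

# Hypothetical (actual, expected) response pairs for illustration.
TEST_CASES = [
    ("The capital of Osona is Vic.", "The capital of Osona is Vic."),
    ("Osona is a comarca in Catalonia.", "The capital of Osona is Vic."),
]


async def evaluate_all():
    evaluator = CompletenessEvaluator()
    for actual, expected in TEST_CASES:
        result = await evaluator.evaluate(
            response=actual,
            context=ExpectedResponseContext(expected_response=expected),
        )
        print(result)


if __name__ == "__main__":
    asyncio.run(evaluate_all())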