LLM as judge
Completeness
The Completeness Evaluator assesses how well an actual response captures the relevant information in an expected (ground-truth) response. It uses a large language model (LLM) as a judge to rate the extent to which the actual response covers the critical aspects of the expected response.
Purpose
The Completeness Evaluator is particularly useful when you need to:
- Verify if all essential information is included in responses
- Ensure no critical components are missing from answers
- Evaluate the coverage of key points in responses
- Assess the thoroughness of information provided
How It Works
The evaluator uses a 5-point scale to rate responses:
- Score 1 (No Information): The actual response contains no information from the expected response
- Score 2 (Very Little Information): The actual response contains very little of the information in the expected response
- Score 3 (Missing Key Information): The actual response lacks some key information and also adds extra information
- Score 4 (Most Key Information): The actual response contains most of the key information and adds extra information
- Score 5 (All Key Information): The actual response contains all the key information from the expected response
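The rubric above could be embedded in a judge prompt along these lines. This is a minimal sketch only: `COMPLETENESS_RUBRIC`, `build_judge_prompt`, and the requested JSON reply format are hypothetical names and assumptions, not the evaluator's actual internals.

```python
# Hypothetical sketch: turning the 5-point rubric into a judge prompt.
COMPLETENESS_RUBRIC = """\
Score 1 (No Information): the actual response contains no information from the expected response.
Score 2 (Very Little Information): the actual response contains very little of the information.
Score 3 (Missing Key Information): the actual response lacks some key information and adds extra information.
Score 4 (Most Key Information): the actual response contains most key information and adds extra information.
Score 5 (All Key Information): the actual response contains all the key information."""


def build_judge_prompt(expected: str, actual: str) -> str:
    """Assemble the text sent to the judge LLM (names here are illustrative)."""
    return (
        "You are a strict judge of response completeness.\n"
        f"Rubric:\n{COMPLETENESS_RUBRIC}\n\n"
        f"Expected response:\n{expected}\n\n"
        f"Actual response:\n{actual}\n\n"
        'Reply with JSON: {"score": <1-5>, "explanations": [<reasons>]}'
    )
```

In a real setup the returned string would be sent to the judge model, whose structured reply is then parsed into a score.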
Usage Example
The evaluator returns a tuple containing:
- A score (1-5) indicating the level of completeness
- A list of explanations for the given score
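As a concrete illustration of that return value, the judge model's reply might be parsed into the `(score, explanations)` tuple roughly as follows. `parse_judgment` and the JSON reply shape are assumptions for this sketch, not the tool's actual API.

```python
import json


def parse_judgment(raw: str) -> tuple[int, list[str]]:
    """Parse a judge model's JSON reply into (score, explanations).

    The reply format is a hypothetical example, e.g.:
    {"score": 4, "explanations": ["..."]}
    """
    data = json.loads(raw)
    score = int(data["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return score, list(data["explanations"])


# Example with a hand-written reply (not real model output):
reply = '{"score": 4, "explanations": ["Covers X", "Omits Y"]}'
score, reasons = parse_judgment(reply)  # → (4, ['Covers X', 'Omits Y'])
```

Validating the score range here guards against a judge model that drifts off the 1-5 scale.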
When to Use
Use the Completeness Evaluator when you need to:
- Verify comprehensive coverage of topics in responses
- Ensure no critical information is omitted
- Check the thoroughness of AI-generated content
- Evaluate the completeness of automated responses
- Assess the coverage of key points in information retrieval systems