Heuristic
BLEU
The BLEU (Bilingual Evaluation Understudy) Evaluator assesses the quality of generated text by comparing it against a reference. It uses n-gram precision to measure how closely the generated output matches the reference text.
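As a rough illustration of n-gram precision (unigrams only, not the evaluator's actual implementation), the fraction of generated tokens that also appear in the reference can be counted directly, with each token's count clipped to its count in the reference:

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also occur in the reference (clipped counts)."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # Clip each candidate token count by its count in the reference.
    overlap = sum(min(count, ref_counts[token]) for token, count in cand_counts.items())
    return overlap / max(sum(cand_counts.values()), 1)

print(unigram_precision("the cat sat on the mat", "the cat is on the mat"))  # 5/6 ≈ 0.83
```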
Purpose
The BLEU Evaluator is particularly useful when you need to:
- Measure the similarity between generated and reference text
- Evaluate machine translation quality
- Assess text generation quality
- Compare different text generation models
- Set quality thresholds for text generation
How It Works
The evaluator calculates a BLEU score between 0 and 1 (or 0-100 when expressed as a percentage), where:
- Score 0: The generated text shares no matching n-grams with the reference
- Score 1: The generated text matches the reference exactly
The score is calculated using the following settings (see the sketch after this list):
- N-gram precision (default: 1-gram)
- Smoothing method (default: method1)
- Customizable weights for different n-gram orders
- Configurable threshold (default: 0.7)
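The internal implementation is not shown here, but the settings above map naturally onto NLTK's sentence_bleu. A minimal sketch, assuming NLTK is installed and simple whitespace tokenization is acceptable:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_score(candidate: str, reference: str, weights=(1.0,)) -> float:
    """Compute a BLEU score in [0, 1] using 1-gram weights and method1 smoothing."""
    smoothing = SmoothingFunction().method1
    return sentence_bleu(
        [reference.split()],      # list of tokenized references
        candidate.split(),        # tokenized candidate
        weights=weights,          # default here: unigram precision only
        smoothing_function=smoothing,
    )

score = bleu_score("the cat sat on the mat", "the cat is on the mat")
threshold = 0.7
print(f"BLEU = {score:.2f}, passes threshold: {score >= threshold}")
```

Method1 smoothing adds a small epsilon to n-gram precisions with zero counts, which prevents the score from collapsing to zero when some n-gram order has no matches.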
Usage Example
The evaluator returns a tuple containing (see the sketch after this list):
- A score (0-100) indicating the BLEU score percentage
- A list of explanations including the BLEU score, n-gram configuration, and threshold comparison
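The exact import path, class name, and method signature are not given here, so the following is a hypothetical sketch of how such a call might look; every name in it is an assumption:

```python
# Hypothetical API: the import path, class name, constructor arguments, and
# method name are assumptions for illustration, not a confirmed interface.
# from your_eval_framework import BleuEvaluator

evaluator = BleuEvaluator(threshold=0.7, n_gram=1, smoothing="method1")

score, explanations = evaluator.evaluate(
    output="the cat sat on the mat",       # generated text
    reference="the cat is on the mat",     # reference text
)

print(score)         # e.g. 83.3 -- BLEU score expressed as a percentage
for line in explanations:
    print(line)      # BLEU score, n-gram configuration, threshold comparison
```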
When to Use
Use the BLEU Evaluator when you need to:
- Evaluate machine translation systems
- Assess text generation quality
- Compare different text generation models
- Set quality thresholds for automated text generation
- Measure similarity between generated and reference text
- Evaluate the performance of language models