The BLEU (Bilingual Evaluation Understudy) Evaluator assesses the quality of generated text by comparing it against a reference text. It uses n-gram precision to measure how closely the generated text matches the reference.
```python
import asyncio

from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.evaluators import BleuEvaluator


async def evaluate():
    # Score up to 4-grams and pass if the BLEU score meets the 0.7 threshold.
    evaluator = BleuEvaluator(
        threshold=0.7,
        n_grams=4,
        smoothing_method="method1",
    )
    result = await evaluator.evaluate(
        response="The capital of France is Paris.",
        context=ExpectedResponseContext(
            expected_response="Paris is the capital of France."
        ),
    )
    print(result)


if __name__ == "__main__":
    asyncio.run(evaluate())
```
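To build intuition for what the evaluator measures, the sketch below computes a comparable 4-gram BLEU score directly with NLTK's `sentence_bleu`, using simple whitespace tokenization and `method1` smoothing. This is only an illustration of the metric itself, not the evaluator's actual implementation, and it assumes the `nltk` package is installed.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Whitespace tokenization is an assumption made for this sketch only.
reference = "Paris is the capital of France.".split()
candidate = "The capital of France is Paris.".split()

# 4-gram BLEU with uniform weights and method1 smoothing.
score = sentence_bleu(
    [reference],
    candidate,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(round(score * 100, 2))  # BLEU expressed on a 0-100 scale
```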
The evaluator returns a tuple containing:

- A score (0-100) indicating the BLEU score as a percentage
- A list of explanations including the BLEU score, the n-gram configuration, and the threshold comparison
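For instance, the tuple can be unpacked and inspected with a small helper like the one below. This is a hypothetical sketch: the exact contents of the explanations list, and how the 0.7 threshold maps onto the 0-100 score, depend on your trusttest version.

```python
def summarize(result: tuple) -> None:
    # Unpack the (score, explanations) tuple described above.
    score, explanations = result
    # Assumption: threshold=0.7 corresponds to a score of 70 on the 0-100 scale.
    status = "passed" if score >= 70 else "failed"
    print(f"BLEU score: {score:.1f}/100 ({status})")
    for explanation in explanations:
        print(f"- {explanation}")
```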