LLM as judge
Tone
The Tone Evaluator assesses how closely the tone and style of an actual response match those of an expected response. It uses a large language model (LLM) as a judge to rate how well the tone, formatting, and formality of the two responses match, without considering the content or information they provide.
Purpose
The Tone Evaluator is particularly useful when you need to:
- Ensure consistent communication style across responses
- Maintain appropriate formality levels in automated responses
- Verify proper formatting and presentation of information
- Check for inappropriate use of emojis or special characters
- Maintain brand voice consistency in automated communications
How It Works
The evaluator uses a 5-point scale to rate responses:
- Score: 1 (Inappropriate Characters): The actual response contains emojis or offensive characters not used in the expected response
- Score: 2 (Extreme Differences): The actual response presents extreme differences in both formatting and formality
- Score: 3 (Significant Differences): The actual response presents extreme differences in either formatting or formality, but not both
- Score: 4 (Minor Differences): The actual response does not present extreme differences in formatting or formality
- Score: 5 (Similar Tone): The actual response is similar in tone and complexity to the expected response
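The scale above lends itself to being spelled out inside the judge prompt. The sketch below shows one way to do that; it assumes nothing about the evaluator's real prompt. `ToneScore`, `RUBRIC`, and `build_judge_prompt` are hypothetical names, and the rubric wording is paraphrased from the list above.

```python
from enum import IntEnum


class ToneScore(IntEnum):
    """Hypothetical encoding of the 5-point tone scale."""
    INAPPROPRIATE_CHARACTERS = 1
    EXTREME_DIFFERENCES = 2
    SIGNIFICANT_DIFFERENCES = 3
    MINOR_DIFFERENCES = 4
    SIMILAR_TONE = 5


# Paraphrased rubric; the wording the real evaluator sends is not documented here.
RUBRIC = {
    ToneScore.INAPPROPRIATE_CHARACTERS: "The actual response contains emojis or offensive characters not used in the expected response.",
    ToneScore.EXTREME_DIFFERENCES: "The actual response presents extreme differences in both formatting and formality.",
    ToneScore.SIGNIFICANT_DIFFERENCES: "The actual response presents extreme differences in either formatting or formality, but not both.",
    ToneScore.MINOR_DIFFERENCES: "The actual response does not present extreme differences in formatting or formality.",
    ToneScore.SIMILAR_TONE: "The actual response is similar in tone and complexity to the expected response.",
}


def build_judge_prompt(actual: str, expected: str) -> str:
    """Build a prompt that asks the judge to score tone only, not content."""
    # Dicts keep insertion order, so the scale is listed from 1 to 5.
    scale = "\n".join(f"{int(score)}: {text}" for score, text in RUBRIC.items())
    return (
        "Compare ONLY the tone, formatting, and formality of the two responses. "
        "Ignore whether the information they contain is correct.\n\n"
        f"Scale:\n{scale}\n\n"
        f"Expected response:\n{expected}\n\n"
        f"Actual response:\n{actual}\n\n"
        "Reply with a line 'Score: <1-5>' followed by one explanation per line."
    )
```

Keeping the rubric in one mapping means the prompt and any code that interprets the returned score share a single definition of each level.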
Usage Example
The evaluator returns a tuple containing:
- A score (1-5) indicating the level of tone similarity
- A list of explanations for the given score
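The following is a minimal sketch of a function with this return shape; it is not the evaluator's actual API. `evaluate_tone` and the `judge` callable are hypothetical: `judge` stands in for whatever client call sends a prompt to an LLM and returns its text reply, and the `Score: N` reply format is an assumption made so the reply can be parsed.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical: any function that sends a prompt to an LLM and returns its reply text.
Judge = Callable[[str], str]

# Assumed reply format: a line such as "Score: 4".
SCORE_LINE = re.compile(r"^\s*Score:\s*([1-5])\s*$", re.IGNORECASE)


def evaluate_tone(actual: str, expected: str, judge: Judge) -> Tuple[int, List[str]]:
    """Return (score, explanations) for how closely `actual` matches the tone of `expected`."""
    prompt = (
        "Rate from 1 to 5 how similar the tone, formatting, and formality of the "
        "actual response are to the expected response, ignoring content.\n"
        f"Expected:\n{expected}\n\nActual:\n{actual}\n\n"
        "Reply with 'Score: <1-5>' on the first line, then one explanation per line."
    )
    reply = judge(prompt)

    score = None
    explanations: List[str] = []
    # The first "Score: N" line sets the score; every other non-blank line is an explanation.
    for line in reply.splitlines():
        match = SCORE_LINE.match(line)
        if match and score is None:
            score = int(match.group(1))
        elif line.strip():
            explanations.append(line.strip())

    if score is None:
        raise ValueError(f"Judge reply did not contain a score line: {reply!r}")
    return score, explanations


def fake_judge(prompt: str) -> str:
    # Stand-in for a real LLM call so the example runs offline.
    return "Score: 1\nThe actual response uses an emoji.\nThe expected response uses none."


score, reasons = evaluate_tone(
    actual="Thanks for reaching out! 😊",
    expected="Thank you for contacting us.",
    judge=fake_judge,
)
print(score, reasons)
# prints: 1 ['The actual response uses an emoji.', 'The expected response uses none.']
```

Passing the judge in as a plain function keeps the parsing logic testable without network access; in practice you would pass a thin wrapper around your LLM client.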
When to Use
Use the Tone Evaluator when you need to:
- Ensure consistent communication style in customer service responses
- Maintain professional tone in business communications
- Verify appropriate use of formatting and special characters
- Check for consistency in automated response systems
- Evaluate the style and presentation of AI-generated content