LLM as judge
URL Correctness
The URL Correctness Evaluator is a specialized tool for assessing how relevant webpage or document content is to a user's question. It uses a large language model (LLM) as a judge to decide whether the content at the provided URLs is relevant and contains the information needed to answer the user's query.
Purpose
The URL Correctness Evaluator is particularly useful when you need to:
- Verify if linked content is relevant to the user’s question
- Ensure referenced documents contain the necessary information
- Validate the accuracy of URL-based responses
- Check if web resources support the provided answers
- Evaluate the quality of information sources in responses
How It Works
The evaluator uses a 3-point scale to rate URL relevance:
- Score: 0 (Unrelated/Broken): Content is completely unrelated to the question or the link is broken
- Score: 1 (Partially Relevant): Content shares the same domain as the question but either:
  - Addresses different aspects than those asked about
  - Only partially addresses the required aspects
- Score: 2 (Fully Relevant): Content fully addresses all specific aspects in the question
The evaluator analyzes both the content and the user’s intent to determine relevance, providing detailed explanations for its scoring decisions.
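The rubric above can be sketched as a judge prompt. This is a minimal illustration, not the evaluator's actual prompt: the `build_judge_prompt` helper and its wording are assumptions for demonstration purposes.

```python
# Minimal sketch of an LLM-as-judge prompt for URL relevance scoring.
# The rubric text mirrors the 3-point scale described above; the helper
# name and prompt layout are hypothetical, not the evaluator's real API.

RUBRIC = """Rate the relevance of the page content to the question:
0 - Content is completely unrelated to the question, or the link is broken.
1 - Content shares the same domain but addresses different aspects, or only some of them.
2 - Content fully addresses all specific aspects of the question."""

def build_judge_prompt(question: str, url: str, content: str) -> str:
    """Assemble the grading prompt sent to the judge LLM."""
    return (
        f"{RUBRIC}\n\n"
        f"Question: {question}\n"
        f"URL: {url}\n"
        f"Content:\n{content}\n\n"
        "Reply with a score (0-2) and a short explanation citing the content."
    )

prompt = build_judge_prompt(
    "How do I reset my password?",
    "https://example.com/help/password-reset",
    "Step-by-step guide to resetting your account password...",
)
print(prompt)
```

The judge model's free-text reply is then parsed into the score and explanations described in the next section.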
Usage Example
The evaluator returns a tuple containing:
- A score (0-2) indicating the level of URL relevance
- A list of explanations for the given score, including specific references to relevant content
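A sketch of consuming that return value, assuming a judge reply in a simple `Score:`/`Reason:` format; the `parse_judge_reply` helper and the reply format are illustrative assumptions, not the evaluator's documented interface.

```python
# Hypothetical parser turning a judge LLM's reply into the
# (score, explanations) tuple described above.

def parse_judge_reply(reply: str) -> tuple[int, list[str]]:
    """Parse 'Score: N' followed by 'Reason: ...' lines into a
    (score, explanations) tuple."""
    lines = [ln.strip() for ln in reply.splitlines() if ln.strip()]
    score = int(lines[0].split(":", 1)[1])          # e.g. "Score: 2" -> 2
    explanations = [ln.split(":", 1)[1].strip() for ln in lines[1:]]
    return score, explanations

score, explanations = parse_judge_reply(
    "Score: 2\nReason: The page lists every password-reset step asked about."
)
print(score)         # 2
print(explanations)  # ['The page lists every password-reset step asked about.']
```

A score of 2 with content-specific explanations indicates the URL fully supports the answer; a 0 or 1 flags the link for review.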
When to Use
Use the URL Correctness Evaluator when you need to:
- Validate the relevance of linked resources in responses
- Ensure information sources are appropriate and accurate
- Check if referenced documents contain the required information
- Verify the quality of web-based answers
- Evaluate the completeness of URL-based responses