The URL Correctness Evaluator assesses whether the webpages or documents linked in a response are relevant to the user's question. It uses a Large Language Model (LLM) as a judge to decide whether the content behind the provided URLs is relevant and contains the information needed to answer the user's query.

Purpose

The URL Correctness Evaluator is particularly useful when you need to:

  • Verify if linked content is relevant to the user’s question
  • Ensure referenced documents contain the necessary information
  • Validate the accuracy of URL-based responses
  • Check if web resources support the provided answers
  • Evaluate the quality of information sources in responses

How It Works

The evaluator uses a 3-point scale to rate URL relevance:

  • Score: 0 (Unrelated/Broken): Content is completely unrelated to the question or the link is broken
  • Score: 1 (Partially Relevant): Content covers the same subject area but either:
    • Addresses different aspects than asked
    • Only partially addresses required aspects
  • Score: 2 (Fully Relevant): Content fully addresses all specific aspects in the question
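To make the scale concrete in downstream code, the sketch below mirrors it as a Python `IntEnum`. The names `UrlRelevance` and `interpret_score` are hypothetical helpers for illustration, not part of trusttest; they only assume that the evaluator's score is an integer from 0 to 2.

```python
from enum import IntEnum


class UrlRelevance(IntEnum):
    """Hypothetical mirror of the 3-point URL relevance scale (not a trusttest class)."""

    UNRELATED_OR_BROKEN = 0
    PARTIALLY_RELEVANT = 1
    FULLY_RELEVANT = 2


def interpret_score(score: int) -> UrlRelevance:
    """Map a raw 0-2 score to its label, rejecting out-of-range values."""
    try:
        return UrlRelevance(score)
    except ValueError:
        raise ValueError(f"score must be 0, 1 or 2, got {score!r}") from None


print(interpret_score(1).name)  # PARTIALLY_RELEVANT
```

Because `UrlRelevance` is an `IntEnum`, its members still compare equal to the plain integers the evaluator returns, so existing numeric checks keep working.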

The evaluator analyzes both the content and the user’s intent to determine relevance, providing detailed explanations for its scoring decisions.
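Before any linked content can be judged, the URLs have to be located in the response text. This page does not describe how trusttest does that, so the following is only a hypothetical sketch of such a preprocessing step: `extract_urls` and its regular expression are assumptions, and the pattern is deliberately rough (it handles plain http(s) links and trailing sentence punctuation, not every valid URL).

```python
import re

# Rough pattern for http(s) URLs: stops at whitespace, quotes, angle brackets and brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()\[\]]+")


def extract_urls(response: str) -> list[str]:
    """Return the http(s) URLs found in a response, in order, without duplicates."""
    urls: list[str] = []
    for match in URL_PATTERN.findall(response):
        url = match.rstrip(".,;:!?")  # drop sentence punctuation glued to the end
        if url not in urls:
            urls.append(url)
    return urls


text = "See https://example.com/cancel-card. Also (https://example.com/faq)."
print(extract_urls(text))  # ['https://example.com/cancel-card', 'https://example.com/faq']
```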

Usage Example

```python
import asyncio

from trusttest.evaluation_contexts import QuestionContext
from trusttest.evaluators import UrlCorrectnessEvaluator


async def evaluate():
    evaluator = UrlCorrectnessEvaluator()
    # The evaluator returns a (score, explanations) tuple.
    score, explanations = await evaluator.evaluate(
        response="You can find more information about credit card cancellation at https://example.com/cancel-card",
        context=QuestionContext(
            question="How do I cancel my credit card?"
        ),
    )
    print(f"Score: {score}")
    for explanation in explanations:
        print(f"- {explanation}")


if __name__ == "__main__":
    asyncio.run(evaluate())
```

The evaluator returns a tuple containing:

  • A score (0-2) indicating the level of URL relevance
  • A list of explanations for the given score, including specific references to relevant content
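Because each call returns a plain `(score, explanations)` tuple, results from many test responses are easy to aggregate. The `summarize` helper below is a hypothetical sketch, not a trusttest API, and the sample results are invented for illustration. The default `min_score=2` treats only fully relevant responses as passing; that threshold is an assumption you may want to lower to 1.

```python
from collections import Counter
from statistics import mean

# Each item mimics the documented return value: (score, explanations).
# The sample data is made up for illustration.
results: list[tuple[int, list[str]]] = [
    (2, ["The page explains every step of cancelling a card."]),
    (1, ["The page covers card fees but not cancellation."]),
    (0, ["The link returns a 404 error."]),
    (2, ["The page lists the cancellation phone number and steps."]),
]


def summarize(results: list[tuple[int, list[str]]], min_score: int = 2) -> dict:
    """Report the score distribution, the mean score, and the share at or above min_score."""
    if not results:
        raise ValueError("need at least one result to summarize")
    scores = [score for score, _ in results]
    return {
        "distribution": dict(sorted(Counter(scores).items())),
        "mean": mean(scores),
        "pass_rate": sum(s >= min_score for s in scores) / len(scores),
    }


print(summarize(results))
# {'distribution': {0: 1, 1: 1, 2: 2}, 'mean': 1.25, 'pass_rate': 0.5}
```

Keeping the explanations alongside the scores makes it straightforward to print the reasons for every response that falls below the threshold.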

When to Use

Use the URL Correctness Evaluator when you need to:

  • Validate the relevance of linked resources in responses
  • Ensure information sources are appropriate and accurate
  • Check if referenced documents contain the required information
  • Verify the quality of web-based answers
  • Evaluate the completeness of URL-based responses