In this guide we will see how to configure trusttest to use any LLM-as-a-Judge evaluator.
In our experience, LLM-as-a-Judge evaluators provide a better evaluation of LLM outputs than other metrics, because they are able to capture more complex patterns and relationships between the input and the output.
For this example we will use OpenAI gpt-4o-mini as our LLM client, so we need an OpenAI API token and the openai optional dependency installed.
Currently we support OpenAI, AzureOpenAI, Anthropic, Google and Ollama as LLM clients.
uv add "trusttest[openai]"
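If you are not using uv, installing the same optional extra with pip should work as well:

pip install "trusttest[openai]"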
Define the OpenAI token in your .env file.
OPENAI_API_KEY="your_openai_token"
Once the optional dependency is installed and the token is defined, we can configure the LLM client.
To check that the LLM client is working correctly, you can run:
from dotenv import load_dotenv

from trusttest.llm_clients import OpenAiClient

load_dotenv()

llm_client = OpenAiClient(
    model="gpt-4o-mini",
    temperature=0.2,
)


async def main():
    response = await llm_client.complete(
        system_prompt="""
        You are a helpful assistant that can answer questions about the world.
        Return as json with the key 'answer'.
        """,
        instructions="What is the capital of Madagascar?",
    )
    print(response)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
Usually you won't run the evaluator directly; instead, you will use it in an evaluation scenario. We will define a scenario that uses the evaluator to check whether the LLM's answers are correct.
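If you want to wire the evaluator to a specific client explicitly, a minimal sketch could look like this; the import path and the llm_client argument of CorrectnessEvaluator are assumptions based on this guide, so check the trusttest API reference for the exact signature.

# Assumed import path for the evaluator.
from trusttest.evaluators import CorrectnessEvaluator

# Assumption: the evaluator accepts an explicit llm_client; if omitted,
# it falls back to the globally configured client (see below).
evaluator = CorrectnessEvaluator(llm_client=llm_client)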
LLM clients can also be configured globally, so you don't need to pass an llm_client to the evaluator or to other components.
import trusttest

# Import path assumed; check the trusttest API reference for the exact module.
from trusttest.evaluators import CorrectnessEvaluator

trusttest.set_config(
    {
        "evaluator": {"provider": "google", "model": "gemini-2.0-flash", "temperature": 0.2},
        "question_generator": {"provider": "openai", "model": "gpt-4o-mini"},
        "embeddings": {"provider": "openai", "model": "text-embedding-3-small"},
        "topic_summarizer": {"provider": "google", "model": "gemini-2.0-flash"},
    }
)

# Now we can use the evaluator without passing the llm_client:
# the evaluator will use google gemini-2.0-flash as the LLM client.
evaluator = CorrectnessEvaluator()
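Note that the global configuration can mix providers per role: in the example above the evaluator and topic summarizer use Google gemini-2.0-flash, while question generation and embeddings use OpenAI models, so each component can use the model that fits it best.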