Prerequisites
Before starting, make sure you have:

- Ollama installed and running locally
- A model pulled in Ollama (e.g., `gemma3:1b` or `llama3.2`)
Model Requirements and Hardware Considerations
This example uses two different models:

- `gemma3:1b` (1 billion parameters) as the model being evaluated
- `llama3.2` (3 billion parameters) as the judge model for evaluation
The `gemma3:1b` model requires less memory, while `llama3.2` is used only as the judge during evaluation. Make sure to pull both models in Ollama before running the example:
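```bash
ollama pull gemma3:1b
ollama pull llama3.2
```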
Target
The `LocalLLMTarget` defines the model being evaluated. In this case, it's the `gemma3:1b` model:
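As a rough sketch, the target definition could look like the following. The import path and constructor parameters are assumptions rather than the library's confirmed API; only `LocalLLMTarget` and the model name come from the text above.

```python
# Minimal sketch -- the import path and constructor parameters are
# assumptions, not the library's confirmed API.
from llm_eval import LocalLLMTarget  # hypothetical package name

# Target the gemma3:1b model served by the local Ollama instance.
target = LocalLLMTarget(
    model="gemma3:1b",
    base_url="http://localhost:11434",  # Ollama's default endpoint
)
```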
Creating a Test Dataset
You can create a simple test dataset with questions and expected answers:
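As an illustration, the dataset might be a plain list of question/expected-answer pairs; the exact structure the library expects is an assumption here.

```python
# Hypothetical structure: a list of question/expected-answer pairs.
dataset = [
    {
        "question": "What is the capital of Denmark?",
        "expected_answer": "Copenhagen",
    },
    {
        "question": "How many planets are in the Solar System?",
        "expected_answer": "Eight",
    },
]
```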
Setting Up Evaluation

Configure your evaluation scenario with the desired evaluators. In this case, we'll use the `CorrectnessEvaluator` to assess the correctness of the model's answers, with the `llama3.2` model acting as the judge:
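A sketch of that setup might look like the following. Apart from `LocalLLMTarget` and `CorrectnessEvaluator`, the names here (for example, the `EvaluationScenario` class and its parameters) are assumptions, not the library's confirmed API.

```python
# Minimal sketch -- only LocalLLMTarget and CorrectnessEvaluator are named
# in the text above; everything else here is an assumption.
from llm_eval import CorrectnessEvaluator, EvaluationScenario, LocalLLMTarget

# llama3.2 acts as the judge model that scores the answers.
judge = LocalLLMTarget(
    model="llama3.2",
    base_url="http://localhost:11434",
)

scenario = EvaluationScenario(
    target=target,    # the gemma3:1b target defined earlier
    dataset=dataset,  # the question/answer pairs defined above
    evaluators=[CorrectnessEvaluator(judge=judge)],
)

results = scenario.run()
print(results)
```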