Prerequisites
Before starting, make sure you have:

- Ollama installed and running locally
- A model pulled in Ollama (e.g., `gemma3:1b` or `llama3.2`)
Model Requirements and Hardware Considerations
This example uses two different models:

- `gemma3:1b` (1 billion parameters) as the model being evaluated
- `llama3.2` (3 billion parameters by default) as the judge model for evaluation

The `gemma3:1b` model requires less memory, while the `llama3.2` model is used only for evaluation. Make sure to pull both models in Ollama before running the example:
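```bash
ollama pull gemma3:1b
ollama pull llama3.2
```

If you are unsure which models you already have, `ollama list` shows everything pulled locally.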
Target
The `LocalLLMTarget` defines the model being evaluated. In this case, it's the `gemma3:1b` model:
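The sketch below assumes a `LocalLLMTarget` constructor that takes a model name and the local Ollama endpoint (Ollama's default is `http://localhost:11434`); the import path and parameter names are assumptions, so adjust them to match your framework's API.

```python
# Sketch only: the import path and parameter names are assumptions --
# substitute the ones from your evaluation framework.
from your_eval_framework.targets import LocalLLMTarget  # hypothetical import

target = LocalLLMTarget(
    model="gemma3:1b",                  # the model under evaluation
    base_url="http://localhost:11434",  # Ollama's default local endpoint
)
```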
Creating a Test Dataset
You can create a simple test dataset with questions and expected answers:
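As an illustration, a dataset can be as simple as a list of question/expected-answer pairs. The field names used here (`question`, `expected_answer`) are assumptions; your framework may expect different keys or a dedicated dataset class.

```python
# Illustrative structure -- the field names are assumptions, not a fixed schema.
test_dataset = [
    {
        "question": "What is the capital of France?",
        "expected_answer": "Paris",
    },
    {
        "question": "How many continents are there?",
        "expected_answer": "Seven",
    },
]
```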
Setting Up Evaluation

Configure your evaluation scenario with the desired evaluators. In this case, we'll use the `CorrectnessEvaluator` to score the model's answers, with the `llama3.2` model as the judge:
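Here is a minimal sketch of how the pieces might be wired together, assuming the framework exposes a `CorrectnessEvaluator` that accepts a judge model and a scenario object that ties the target, dataset, and evaluators together. Apart from `LocalLLMTarget` and `CorrectnessEvaluator`, every name below (`EvaluationScenario`, the import paths, the `judge` parameter) is hypothetical; consult your framework's documentation for the real API.

```python
# Hypothetical wiring -- class and parameter names beyond those mentioned
# in the text are assumptions, not this framework's actual API.
from your_eval_framework.targets import LocalLLMTarget            # hypothetical import
from your_eval_framework.evaluators import CorrectnessEvaluator   # hypothetical import
from your_eval_framework.scenarios import EvaluationScenario      # hypothetical import

# The judge model scores the evaluated model's answers for correctness.
judge = LocalLLMTarget(
    model="llama3.2",
    base_url="http://localhost:11434",
)

scenario = EvaluationScenario(
    target=target,                                   # gemma3:1b, defined above
    dataset=test_dataset,                            # questions and expected answers
    evaluators=[CorrectnessEvaluator(judge=judge)],  # llama3.2 as the judge
)

results = scenario.run()
print(results)
```

Using a separate, somewhat larger model as the judge keeps the evaluation independent of the model under test, at the cost of keeping two models loaded in Ollama at once.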