This example uses two local models:

- `gemma3:1b` (1 billion parameters) as the model being evaluated
- `llama3.2` (3 billion parameters) as the judge model for evaluation

The `gemma3:1b` model requires less memory, while the `llama3.2` model will be used only for evaluation purposes. Make sure to pull both models in Ollama before running the example:
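With the standard Ollama CLI, pulling the two models named above looks like this (the commands assume Ollama is installed and its daemon is running):

```shell
# Download both models into the local Ollama cache.
# gemma3:1b is the model under evaluation; llama3.2 is the judge.
ollama pull gemma3:1b
ollama pull llama3.2
```

Both downloads are one-time operations; subsequent runs reuse the cached weights.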
The `LocalLLMTarget` defines the model being evaluated. In this case, it’s the `gemma3:1b` model:
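The original listing is not reproduced here. As a rough, hypothetical sketch of what such a target might look like when backed by Ollama's local HTTP endpoint, consider the following; the constructor parameters and the `generate` method are assumptions for illustration, not the library's actual interface:

```python
import json
import urllib.request


class LocalLLMTarget:
    """Hypothetical sketch: a target that sends prompts to a local Ollama model."""

    def __init__(self, model: str, base_url: str = "http://localhost:11434"):
        # `model` is the Ollama model tag, e.g. "gemma3:1b".
        self.model = model
        self.base_url = base_url

    def generate(self, prompt: str) -> str:
        # Call Ollama's /api/generate endpoint with streaming disabled,
        # returning the model's full text response.
        payload = json.dumps(
            {"model": self.model, "prompt": prompt, "stream": False}
        ).encode()
        req = urllib.request.Request(
            f"{self.base_url}/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]


# The model under evaluation, per the text above.
target = LocalLLMTarget(model="gemma3:1b")
```

Constructing the target performs no network calls; only `generate` contacts the Ollama server.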
The example then uses a `CorrectnessEvaluator` to evaluate the model’s correctness, with the `llama3.2` model as the judge model:
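The judge-side listing is likewise elided. A minimal, hypothetical sketch of the judge pattern follows; the prompt template, the `evaluate` signature, and the verdict parsing are illustrative assumptions, not the library's real API. The judge can be any object exposing a `generate(prompt) -> str` method, such as a target wrapping the `llama3.2` model:

```python
class CorrectnessEvaluator:
    """Hypothetical sketch: asks a judge model whether an answer is correct."""

    # Illustrative judge prompt; the real library's template will differ.
    JUDGE_PROMPT = (
        "Question: {question}\n"
        "Reference answer: {expected}\n"
        "Candidate answer: {actual}\n"
        "Reply with exactly CORRECT or INCORRECT."
    )

    def __init__(self, judge):
        # `judge` is anything with generate(prompt) -> str,
        # e.g. a target backed by the llama3.2 model in Ollama.
        self.judge = judge

    def evaluate(self, question: str, expected: str, actual: str) -> bool:
        # Ask the judge model for a verdict and map it to a boolean.
        verdict = self.judge.generate(
            self.JUDGE_PROMPT.format(
                question=question, expected=expected, actual=actual
            )
        )
        return verdict.strip().upper().startswith("CORRECT")
```

Keeping the judge behind a tiny interface makes the evaluator easy to unit-test with a stub, while in real runs the verdict comes from the local judge model.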