To start using TrustTest, you need to install the package in your Python environment:
uv add trusttest
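Alternatively, if you are not using uv, pip should work as well (assuming the package is published on PyPI under the same name):
pip install trusttest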
For this quickstart, we are going to run a basic functional test against a dummy API and save the test locally.
If you want to go straight to the point, jump directly to the Complete Example section.
In trusttest we have defined a set of dummy models to easily test the library.
from trusttest.targets.testing import DummyTarget

target = DummyTarget()
response = target.respond("Hello, how are you?")
print(response)
This dummy model just has a fixed set of responses for a fixed set of inputs. For any other input, it returns "I don't know the answer to that question."
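For example, an input outside the fixed set triggers the fallback answer (the question below is arbitrary):
# Any question outside the dummy model's fixed set gets the fallback answer.
print(target.respond("What is the meaning of life?"))
# I don't know the answer to that question.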
When our model is ready, we can choose the probe that will generate the test cases to evaluate the target. In this case, we are going to use the DatasetProbe to generate test cases from a dataset.
from trusttest.dataset_builder import Dataset, DatasetItem
from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.targets.testing import DummyTarget
from trusttest.probes.dataset import DatasetProbe

target = DummyTarget()
probe = DatasetProbe(
    target=target,
    dataset=Dataset(
        [
            [
                DatasetItem(
                    question="What is Python?",
                    context=ExpectedResponseContext(
                        expected_response="Python is a high-level, interpreted programming language."
                    ),
                )
            ],
            [
                DatasetItem(
                    question="What is the capital of France?",
                    context=ExpectedResponseContext(
                        expected_response="The capital of France is Paris."
                    ),
                )
            ],
        ]
    ),
)

test_set = probe.get_test_set()
The generated test_set contains two test cases. A test case is a set of questions and model responses, together with other metadata used for evaluation.
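If you want to inspect what the probe generated, you can iterate over the test set. Note that the test_cases attribute below is a hypothetical name used only for illustration; check the TrustTest API reference for the actual accessor:
# Hypothetical sketch: 'test_cases' is an assumed attribute name,
# not confirmed TrustTest API.
for case in test_set.test_cases:
    print(case)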
Once our test_set is ready, we can define which evaluation metrics and criteria we want to use to evaluate the target.
from trusttest.dataset_builder import Dataset, DatasetItem
from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.evaluation_scenarios import EvaluationScenario
from trusttest.evaluator_suite import EvaluatorSuite
from trusttest.evaluators import (
    BleuEvaluator,
    ExpectedLanguageEvaluator,
)
from trusttest.targets.testing import DummyTarget
from trusttest.probes.dataset import DatasetProbe

target = DummyTarget()
probe = DatasetProbe(...)  # same dataset probe as in the previous step

test_set = probe.get_test_set()


scenario = EvaluationScenario(
    name="Quickstart Functional Test",
    description="Functional test example.",
    evaluator_suite=EvaluatorSuite(
        evaluators=[
            BleuEvaluator(threshold=0.3),
            ExpectedLanguageEvaluator(expected_language="en"),
        ],
        criteria="any_fail",
    ),
)

In this Evaluation Scenario we use the BleuEvaluator and the ExpectedLanguageEvaluator with the any_fail criteria, so if any of the evaluators fails, the scenario fails.
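As a rough illustration (this sketch is not TrustTest's internal code), the any_fail criteria behaves like requiring every evaluator to pass:
# Illustrative sketch of the "any_fail" criteria, not TrustTest internals:
# the suite fails as soon as any single evaluator fails.
evaluator_passed = {"bleu": True, "expected_language": False}  # hypothetical outcomes
suite_passed = all(evaluator_passed.values())
print(suite_passed)  # False, because one evaluator failed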
Now that we have defined our model and the way to evaluate it, we are ready to get the evaluation results.
# ... (define target, probe, test_set and scenario as in the previous steps)
results = scenario.evaluate(test_set)
results.display()
results.display_summary()
If everything is working as expected, the results will be displayed in the console. And that's it! 🎉 You have just created your first functional test with TrustTest. Go to the next section to understand everything that is possible with TrustTest.
The Complete Example below puts all of these pieces together in a single script:
from trusttest.dataset_builder import Dataset, DatasetItem
from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.evaluation_scenarios import EvaluationScenario
from trusttest.evaluator_suite import EvaluatorSuite
from trusttest.evaluators import (
    BleuEvaluator,
    ExpectedLanguageEvaluator,
)
from trusttest.targets.testing import DummyTarget
from trusttest.probes.dataset import DatasetProbe

target = DummyTarget()
probe = DatasetProbe(
    target=target,
    dataset=Dataset(
        [
            [
                DatasetItem(
                    question="What is Python?",
                    context=ExpectedResponseContext(
                        expected_response="Python is a high-level, interpreted programming language."
                    ),
                )
            ],
            [
                DatasetItem(
                    question="What is the capital of France?",
                    context=ExpectedResponseContext(
                        expected_response="The capital of France is Paris."
                    ),
                )
            ],
        ]
    ),
)

test_set = probe.get_test_set()

scenario = EvaluationScenario(
    name="Quickstart Functional Test",
    description="Functional test example.",
    evaluator_suite=EvaluatorSuite(
        evaluators=[
            BleuEvaluator(threshold=0.3),
            ExpectedLanguageEvaluator(expected_language="en"),
        ],
        criteria="any_fail",
    ),
)


results = scenario.evaluate(test_set)
results.display()
results.display_summary()
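To run the complete example, save it to a file (the name quickstart.py is arbitrary) and execute it:
python quickstart.py
Or, inside a uv-managed project:
uv run quickstart.py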