Quickstart
To start using TrustTest, install the package in your Python environment:
uv add trusttest
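If you prefer pip over uv, the equivalent command should be the following (this assumes the package is published on PyPI under the same name):
pip install trusttest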
For this quickstart, we are going to run a basic functional test against a dummy API and save the test locally.
TrustTest includes a set of dummy models so you can easily try out the library.
from trusttest.models.testing import DummyEndpoint
model = DummyEndpoint()
response = model.respond("Hello, how are you?")
print(response)
This dummy model simply has a fixed set of responses for a fixed set of inputs. For any other input, it returns "I don't know the answer to that question."
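For example, asking something outside that fixed set should return the fallback answer (the question below is just an arbitrary unrecognized input):
# Any input the dummy model does not recognize falls back to the default reply.
fallback = model.respond("What is the weather like today?")
print(fallback)  # expected: "I don't know the answer to that question."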
When our model is ready, we can choose the probe that will generate the test cases to evaluate the model.
In this case, we are going to use DatasetProbe to generate test cases from a dataset.
from trusttest.dataset_builder import Dataset, DatasetItem
from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.models.testing import DummyEndpoint
from trusttest.probes.dataset import DatasetProbe
model = DummyEndpoint()
probe = DatasetProbe(
    model=model,
    dataset=Dataset(
        [
            [
                DatasetItem(
                    question="What is Python?",
                    context=ExpectedResponseContext(
                        expected_response="Python is a high-level, interpreted programming language."
                    ),
                )
            ],
            [
                DatasetItem(
                    question="What is the capital of France?",
                    context=ExpectedResponseContext(
                        expected_response="The capital of France is Paris."
                    ),
                )
            ],
        ]
    ),
)
test_set = probe.get_test_set()
The generated test_set has two test cases. A test case is a set of questions and model responses, plus other metadata used for evaluation.
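If you want a quick sanity check of what the probe produced before evaluating, printing the object is enough for this quickstart (the printed representation depends on the library version):
# Quick look at the generated test cases.
print(test_set)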
When our test_set is ready, we can define which evaluation metrics and criteria we want to use to evaluate the model.
from trusttest.dataset_builder import Dataset, DatasetItem
from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.evaluation_scenarios import EvaluationScenario
from trusttest.evaluator_suite import EvaluatorSuite
from trusttest.evaluators import (
BleuEvaluator,
ExpectedLanguageEvaluator,
)
from trusttest.models.testing import DummyEndpoint
from trusttest.probes.dataset import DatasetProbe
model = DummyEndpoint()
probe = DatasetProbe(...)
test_set = probe.get_test_set()
scenario = EvaluationScenario(
    name="Quickstart Functional Test",
    description="Functional test example.",
    evaluator_suite=EvaluatorSuite(
        evaluators=[
            BleuEvaluator(threshold=0.3),
            ExpectedLanguageEvaluator(expected_language="en"),
        ],
        criteria="any_fail",
    ),
)
In this Evaluation Scenario we are using the BleuEvaluator and the ExpectedLanguageEvaluator, with the any_fail criteria. This means that if any of the evaluators fails, the scenario fails.
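As an illustration only (the single-evaluator setup and the 0.7 threshold below are arbitrary, not recommended values), you could make the suite stricter or more lenient by changing its evaluators:
# A BLEU-only suite with a higher similarity threshold, reusing the same criteria.
strict_suite = EvaluatorSuite(
    evaluators=[BleuEvaluator(threshold=0.7)],
    criteria="any_fail",
)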
Now that we have defined our model and the way to evaluate it, we are ready to get the evaluation results.
# ...
results = scenario.evaluate(test_set)
results.display()
results.display_summary()
If everything is working as expected, the results should be displayed in the console.
And that's it! 🎉 You have just created your first functional test with TrustTest. Go to the next section to see everything that is possible with TrustTest. Here is the full example:
from trusttest.dataset_builder import Dataset, DatasetItem
from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.evaluation_scenarios import EvaluationScenario
from trusttest.evaluator_suite import EvaluatorSuite
from trusttest.evaluators import (
BleuEvaluator,
ExpectedLanguageEvaluator,
)
from trusttest.models.testing import DummyEndpoint
from trusttest.probes.dataset import DatasetProbe
model = DummyEndpoint()
probe = DatasetProbe(
    model=model,
    dataset=Dataset(
        [
            [
                DatasetItem(
                    question="What is Python?",
                    context=ExpectedResponseContext(
                        expected_response="Python is a high-level, interpreted programming language."
                    ),
                )
            ],
            [
                DatasetItem(
                    question="What is the capital of France?",
                    context=ExpectedResponseContext(
                        expected_response="The capital of France is Paris."
                    ),
                )
            ],
        ]
    ),
)
test_set = probe.get_test_set()
scenario = EvaluationScenario(
    name="Quickstart Functional Test",
    description="Functional test example.",
    evaluator_suite=EvaluatorSuite(
        evaluators=[
            BleuEvaluator(threshold=0.3),
            ExpectedLanguageEvaluator(expected_language="en"),
        ],
        criteria="any_fail",
    ),
)
results = scenario.evaluate(test_set)
results.display()
results.display_summary()