In this guide, we will see how to configure and run responsibility and safety evaluations of your LLM outputs.
Safety evaluations are essential for ensuring your LLM behaves responsibly across categories such as toxicity, prompt injection, and other unsafe behaviors.

Configure Safety Scenarios

Use UnsafeOutputsScenarioBuilder to evaluate whether your model generates harmful content, and SingleTurnScenarioBuilder to test prompt injection resistance.

Unsafe Outputs (e.g. Toxicity)

from dotenv import load_dotenv
from trusttest.catalog.unsafe_outputs import UnsafeOutputsScenarioBuilder, SubCategory
from trusttest.targets.testing import DummyTarget

load_dotenv()

# Build a scenario that probes the target for hateful / toxic outputs.
# DummyTarget is a stand-in; point the builder at your own target in practice.
builder = UnsafeOutputsScenarioBuilder(target=DummyTarget(), num_test_cases=5)
scenario = builder.get_scenario(SubCategory.HATE)

Prompt Injection Resistance

from trusttest.catalog.prompt_injections.single_turn import SingleTurnScenarioBuilder, SubCategory as SingleTurnSubCategory

# Build a single-turn scenario with DAN-style jailbreak prompts.
injection_builder = SingleTurnScenarioBuilder(target=DummyTarget(), num_test_cases=5)
injection_scenario = injection_builder.get_scenario(SingleTurnSubCategory.DAN_JAILBREAK)

Run the Evaluation

Once you have configured a scenario, generate its test set, evaluate it against the target, and display the results:
test_set = scenario.probe.get_test_set()    # generate the adversarial test cases
results = scenario.eval.evaluate(test_set)  # run them against the target and score the responses
results.display()           # per-test-case results
results.display_summary()   # aggregate summary
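
If you evaluate several scenarios, these steps can be wrapped in a small helper. The sketch below is illustrative only: it assumes every scenario returned by a builder exposes the same probe / eval interface used above, and run_scenario is a hypothetical name, not part of the trusttest API.

def run_scenario(scenario, show_details: bool = False):
    """Generate a scenario's test set, evaluate it, and print the results.

    Hypothetical helper: it only composes the probe/eval calls shown in this guide.
    """
    test_set = scenario.probe.get_test_set()
    results = scenario.eval.evaluate(test_set)
    if show_details:
        results.display()
    results.display_summary()
    return results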

Complete Example

from dotenv import load_dotenv
from trusttest.catalog.unsafe_outputs import UnsafeOutputsScenarioBuilder, SubCategory
from trusttest.catalog.prompt_injections.single_turn import SingleTurnScenarioBuilder, SubCategory as SingleTurnSubCategory
from trusttest.targets.testing import DummyTarget

load_dotenv()

target = DummyTarget()  # stand-in target for this example; use your own target in practice

# Evaluate unsafe outputs (toxicity)
unsafe_builder = UnsafeOutputsScenarioBuilder(target=target, num_test_cases=5)
unsafe_scenario = unsafe_builder.get_scenario(SubCategory.HATE)

unsafe_test_set = unsafe_scenario.probe.get_test_set()
unsafe_results = unsafe_scenario.eval.evaluate(unsafe_test_set)
unsafe_results.display_summary()

# Evaluate prompt injection resistance
injection_builder = SingleTurnScenarioBuilder(target=target, num_test_cases=5)
injection_scenario = injection_builder.get_scenario(SingleTurnSubCategory.DAN_JAILBREAK)

injection_test_set = injection_scenario.probe.get_test_set()
injection_results = injection_scenario.eval.evaluate(injection_test_set)
injection_results.display_summary()