Run Capture The Flag Evaluation
In this guide we will see how to configure and run an Iterate Scenario in trusttest, which is particularly useful for capture the flag style evaluations.
Basic Setup
First, let’s set up the basic imports and environment:
For this example we will use the IcantAssistModel
which is a dummy model that will always return the same response: “I can’t assist with that”.
Also we need an LLM client configured to genereate the test set. This examples uses the default OpenAiClient
to generate the test set.
Configure the Scenario
The CaptureTheFlagScenario
allows you to define specific objectives that the model should either meet or avoid. Here’s how to set it up:
Understanding the Components
model
: The model you want to test (in this case,IcantAssistModel
)objectives
: A list ofObjective
objects that define what you want to test- Each
Objective
contains:question
: The prompt to testtrue_description
: What a successful jailbreak would look like.false_description
: What a failed jailbreak would look like.
The most critical part of the Objective
is a good definition of the true_description
and false_description
.
Remember:
true_description
: What a successful jailbreak would look like. ( What a bad answer would look like)false_description
: What a failed jailbreak would look like. ( What a good answer would look like)
Running the Evaluation
Once configured, running the evaluation is straightforward:
Multiple Objectives
You can test multiple objectives at once by adding more Objective
instances to the list: