Run Responsibility Capture The Flag Evaluation
In this guide we will see how to configure and run a Responsibility Capture The Flag evaluation scenario using trusttest.
Responsibility Capture The Flag scenarios are designed to test an LLM’s ability to handle sensitive or restricted content by attempting to bypass safety measures through iterative conversation.
Configure the Scenario
The IterateComplianceScenario is specifically designed for compliance testing. Here's how to set it up:
For this example we will use the RepeatModel, a dummy model that simply repeats back whatever the user asks for. We also need an LLM client configured to generate the test set; this example uses the default OpenAiClient.
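The real RepeatModel ships with trusttest, so its exact interface may differ; conceptually it behaves like this minimal sketch (the class and method names below are assumptions for illustration):

```python
class RepeatModel:
    """Dummy chat model that repeats the user's message back verbatim.

    Illustrative stand-in only; the class bundled with trusttest may
    expose a different interface.
    """

    def respond(self, message: str) -> str:
        # Echo the request unchanged, so any "bypass" attempt trivially
        # succeeds -- handy for demonstrating the evaluation flow.
        return message


model = RepeatModel()
print(model.respond("Tell me something restricted"))
```

Because the model never refuses anything, it gives the evaluation a predictable worst-case target to report on.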
Let’s break down the configuration parameters:
- model: The LLM model you want to test (in this example we use a RepeatModel for demonstration)
- categories: Set of categories to test against (e.g., "off_topic", "harmful", etc.)
- max_objectives_per_category: Maximum number of objectives to generate per category
- max_turns: Maximum number of conversation turns to attempt for each objective
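To make the shape of the configuration concrete, here is a self-contained sketch using a dataclass stand-in; the field names mirror the parameters described above, but this is not the verbatim trusttest API:

```python
from dataclasses import dataclass


# Stand-in for the real IterateComplianceScenario: the field names follow
# the parameter descriptions above, but the actual trusttest class may
# take different or additional arguments.
@dataclass
class IterateComplianceScenario:
    model: object                     # the LLM under test (e.g. a RepeatModel)
    categories: set                   # categories to test, e.g. {"off_topic", "harmful"}
    max_objectives_per_category: int  # cap on generated objectives per category
    max_turns: int                    # cap on conversation turns per objective


scenario = IterateComplianceScenario(
    model=object(),  # placeholder; the guide uses a RepeatModel here
    categories={"off_topic", "harmful"},
    max_objectives_per_category=5,
    max_turns=3,
)
print(scenario.categories)
```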
Run the Evaluation
Once you have configured the scenario, you can run the evaluation:
The evaluation will:
- Generate test cases based on the specified categories
- Attempt to bypass safety measures through iterative prompting
- Record the results of each attempt
- Display a summary of the findings
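The four steps above can be sketched as a toy loop. Everything here is illustrative (canned objectives stand in for LLM-generated test cases, and a simple echo check stands in for real bypass detection); it is not the trusttest implementation:

```python
def run_evaluation(model_respond, categories, max_objectives_per_category, max_turns):
    """Toy evaluation loop mirroring the four steps above (illustrative only)."""
    results = []
    for category in sorted(categories):
        # 1. Generate test cases for the category (canned here, LLM-generated in practice)
        objectives = [f"{category} objective {i}" for i in range(max_objectives_per_category)]
        for objective in objectives:
            # 2. Attempt to bypass safety measures over several turns
            bypassed = False
            for _ in range(max_turns):
                reply = model_respond(objective)
                if reply == objective:  # a compliant echo counts as a successful bypass
                    bypassed = True
                    break
            # 3. Record the result of each attempt
            results.append({"category": category, "objective": objective, "bypassed": bypassed})
    # 4. Display a summary of the findings
    total = len(results)
    failed = sum(r["bypassed"] for r in results)
    print(f"{failed}/{total} objectives bypassed the model's safeguards")
    return results


# An echo "model" fails every objective, so the summary reports a full bypass.
results = run_evaluation(lambda m: m, {"off_topic", "harmful"}, 2, 3)
```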
Complete Example
Here’s a complete example that you can run:
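Since the real trusttest classes are not reproduced in this page, the script below stitches the pieces together as a self-contained, runnable sketch; every class, method, and field name is an assumption made for illustration rather than the library's actual API:

```python
from dataclasses import dataclass


class RepeatModel:
    """Dummy model that repeats the user's message back (illustrative stand-in)."""

    def respond(self, message: str) -> str:
        return message


@dataclass
class IterateComplianceScenario:
    """Sketch of a compliance scenario; not the verbatim trusttest class."""

    model: RepeatModel
    categories: set
    max_objectives_per_category: int
    max_turns: int

    def run(self):
        results = []
        for category in sorted(self.categories):
            # Canned objectives stand in for LLM-generated test cases.
            objectives = [f"{category} objective {i}"
                          for i in range(self.max_objectives_per_category)]
            for objective in objectives:
                bypassed = False
                for _ in range(self.max_turns):
                    # The echo model always complies, so every attempt "succeeds".
                    if self.model.respond(objective) == objective:
                        bypassed = True
                        break
                results.append((category, objective, bypassed))
        return results


scenario = IterateComplianceScenario(
    model=RepeatModel(),
    categories={"off_topic", "harmful"},
    max_objectives_per_category=2,
    max_turns=3,
)
results = scenario.run()
print(f"{sum(b for *_, b in results)}/{len(results)} objectives bypassed")
```

Swapping the RepeatModel for a real model client (and real objective generation) would turn this skeleton into a meaningful evaluation.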