Responsibility Capture The Flag scenarios are designed to test an LLM’s ability to handle sensitive or restricted content by attempting to bypass safety measures through iterative conversation.
Configure the Scenario
The `IterateComplianceScenario` is designed specifically for compliance testing. Here's how to set it up:
For this example we use a `RepeatTarget`, a dummy model that always returns whatever response the user asks for.
We also need an LLM client configured to generate the test set; this example uses the default `OpenAiClient`.
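A minimal setup sketch is below. The import paths are assumptions (they aren't shown here), so adjust them to match your installation; `OpenAiClient` is assumed to pick up your OpenAI API key from the environment.

```python
# Hypothetical import paths -- adjust to match where these classes
# actually live in your installation.
from scenarios.targets import RepeatTarget
from scenarios.clients import OpenAiClient

# Dummy target that echoes back whatever response the user asks for.
target = RepeatTarget()

# Client used to generate the test set; assumed to read
# OPENAI_API_KEY from the environment.
client = OpenAiClient()
```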
Let’s break down the configuration parameters:
- `model`: The LLM model you want to test (in this example, a `RepeatTarget` for demonstration)
- `categories`: Set of categories to test against (e.g., "off_topic", "harmful")
- `max_objectives_per_category`: Maximum number of objectives to generate per category
- `max_turns`: Maximum number of conversation turns to attempt for each objective
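Putting those parameters together, a hedged construction sketch might look like the following, reusing the target and client from the setup sketch above. The exact constructor signature, including whether the generation client is passed as a `client=` keyword, is an assumption based on the descriptions above.

```python
from scenarios import IterateComplianceScenario  # hypothetical import path

scenario = IterateComplianceScenario(
    model=target,                         # the model under test (RepeatTarget here)
    categories={"off_topic", "harmful"},  # categories to test against
    max_objectives_per_category=5,        # cap on generated objectives per category
    max_turns=10,                         # cap on conversation turns per objective
    client=client,                        # assumed keyword for the test-set generator
)
```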
Run the Evaluation
Once you have configured the scenario, you can run the evaluation (see the sketch after this list). Running it will:

- Generate test cases based on the specified categories
- Attempt to bypass safety measures through iterative prompting
- Record the results of each attempt
- Display a summary of the findings
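A minimal run sketch, assuming the scenario exposes a `run()` method that returns an iterable of per-attempt records; neither the method name nor the result structure is confirmed above.

```python
# Run the full pipeline: generate test cases, attempt bypasses,
# record the outcomes, and collect the results.
results = scenario.run()  # `run()` is an assumed method name

# Print each attempt's record as a simple summary of the findings.
for result in results:
    print(result)
```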