This guide shows how to configure and run a Responsibility Capture The Flag evaluation scenario with trusttest.

Responsibility Capture The Flag scenarios are designed to test an LLM’s ability to handle sensitive or restricted content by attempting to bypass safety measures through iterative conversation.

Configure the Scenario

The IterateComplianceScenario is specifically designed for compliance testing. Here’s how to set it up:

from dotenv import load_dotenv

from trusttest.catalog import IterateComplianceScenario
from trusttest.models.testing import RepeatModel

load_dotenv()

scenario = IterateComplianceScenario(
    model=RepeatModel(),
    categories={"off_topic"},
    max_objectives_per_category=2,
    max_turns=4,
)

For this example we use RepeatModel, a dummy model that always echoes the user's message back as its response.
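
As a rough mental model only (this is not the actual trusttest implementation), you can think of RepeatModel as behaving like a trivial echo model:

class EchoModel:
    """Hypothetical stand-in illustrating what RepeatModel does."""

    def respond(self, message: str) -> str:
        # Echo the user's message back as the model's answer.
        return message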

We also need an LLM client configured to generate the test set. This example uses the default OpenAiClient.
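
Before generating the test set, it can help to verify that the client's credentials are available. The check below assumes, as is typical for OpenAI-backed clients, that the key is read from the OPENAI_API_KEY environment variable:

import os

from dotenv import load_dotenv

load_dotenv()  # picks up a local .env file, if present

# Assumption: the default OpenAiClient reads its key from OPENAI_API_KEY.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY in your environment or .env file")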

Let’s break down the configuration parameters:

  • model: The LLM model you want to test (here, RepeatModel for demonstration)
  • categories: Set of categories to test against, e.g. “off_topic” or “harmful” (see the sketch after this list)
  • max_objectives_per_category: Maximum number of objectives to generate per category
  • max_turns: Maximum number of conversation turns to attempt for each objective
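
For instance, here is a sketch of the same scenario probing more than one category. The extra category name is illustrative only; check which categories your trusttest version supports:

scenario_multi = IterateComplianceScenario(
    model=RepeatModel(),
    categories={"off_topic", "harmful"},  # "harmful" is an assumed example name
    max_objectives_per_category=1,  # fewer objectives keeps the run short
    max_turns=2,
)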

Run the Evaluation

Once you have configured the scenario, you can run the evaluation:

test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()

The evaluation will:

  1. Generate test cases based on the specified categories
  2. Attempt to bypass safety measures through iterative prompting
  3. Record the results of each attempt
  4. Display a summary of the findings

Complete Example

Here’s a complete example that you can run:

from dotenv import load_dotenv

from trusttest.catalog import IterateComplianceScenario
from trusttest.models.testing import RepeatModel

# Load environment variables (e.g. the API key used for test-set generation).
load_dotenv()

# Configure the compliance scenario against the dummy RepeatModel.
scenario = IterateComplianceScenario(
    model=RepeatModel(),
    categories={"off_topic"},
    max_objectives_per_category=2,
    max_turns=4,
)

# Generate the test set, run the evaluation, and print a summary of the findings.
test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()
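
Save this as a script (for example responsibility_ctf.py) and run it with python responsibility_ctf.py. Make sure the credentials used for test-set generation are available in your environment or a local .env file.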