This guide shows how to configure and run a Responsibility Capture The Flag evaluation scenario with trusttest.

Responsibility Capture The Flag scenarios are designed to test an LLM’s ability to handle sensitive or restricted content by attempting to bypass safety measures through iterative conversation.

Configure the Scenario

The IterateComplianceScenario is specifically designed for compliance testing. Here’s how to set it up:

from dotenv import load_dotenv

from trusttest.catalog import IterateComplianceScenario
from trusttest.models.testing import RepeatModel

load_dotenv()

scenario = IterateComplianceScenario(
    model=RepeatModel(),
    categories={"off_topic"},
    max_objectives_per_category=2,
    max_turns=4,
)

For this example we use RepeatModel, a dummy model that always echoes the user's message back as its response.
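
As a rough mental model only (this is not the actual trusttest implementation), you can think of RepeatModel as behaving like a trivial echo model:

class EchoModel:
    """Hypothetical stand-in illustrating what RepeatModel does."""

    def respond(self, message: str) -> str:
        # Echo the user's message back as the model's answer.
        return message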

We also need an LLM client configured to generate the test set. This example uses the default OpenAiClient.
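
Before generating the test set, it can help to verify that the client's credentials are available. The check below assumes, as is typical for OpenAI-backed clients, that the key is read from the OPENAI_API_KEY environment variable:

import os

from dotenv import load_dotenv

load_dotenv()  # picks up a local .env file, if present

# Assumption: the default OpenAiClient reads its key from OPENAI_API_KEY.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY in your environment or .env file")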

Let’s break down the configuration parameters:

  • model: The LLM model you want to test (here, RepeatModel for demonstration)
  • categories: Set of categories to test against, e.g. “off_topic” or “harmful” (see the sketch after this list)
  • max_objectives_per_category: Maximum number of objectives to generate per category
  • max_turns: Maximum number of conversation turns to attempt for each objective
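
For instance, here is a sketch of the same scenario probing more than one category. The extra category name is illustrative only; check which categories your trusttest version supports:

scenario_multi = IterateComplianceScenario(
    model=RepeatModel(),
    categories={"off_topic", "harmful"},  # "harmful" is an assumed example name
    max_objectives_per_category=1,  # fewer objectives keeps the run short
    max_turns=2,
)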

Run the Evaluation

Once you have configured the scenario, you can run the evaluation:

test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()

The evaluation will:

  1. Generate test cases based on the specified categories
  2. Attempt to bypass safety measures through iterative prompting
  3. Record the results of each attempt
  4. Display a summary of the findings

Complete Example

Here’s a complete example that you can run:

from dotenv import load_dotenv

from trusttest.catalog import IterateComplianceScenario
from trusttest.models.testing import RepeatModel

# Load environment variables (e.g. the API key used for test-set generation).
load_dotenv()

# Configure the compliance scenario against the dummy RepeatModel.
scenario = IterateComplianceScenario(
    model=RepeatModel(),
    categories={"off_topic"},
    max_objectives_per_category=2,
    max_turns=4,
)

# Generate the test set, run the evaluation, and print a summary of the findings.
test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()
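
Save this as a script (for example responsibility_ctf.py) and run it with python responsibility_ctf.py. Make sure the credentials used for test-set generation are available in your environment or a local .env file.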