This guide shows how to run the built-in red teaming catalog with run_red_teaming(), following the example in docs/testing_guide/basic_red_teaming.py from the TrustTest repository.
It uses IcantAssistTarget, a dummy target that always refuses unsafe requests, so you can exercise the workflow end to end before pointing it at your own app. To test a real endpoint, replace it with an HttpTarget as shown in the Http Target tutorial.

Configure the Environment

Add your NeuralTrust target token to your .env file:
TARGET_TOKEN="your_neuraltrust_target_token"
Then import the red teaming helpers and load your environment variables:
import os
from typing import List

from dotenv import load_dotenv

import trusttest
from trusttest.catalog.red_team import run_red_teaming
from trusttest.language_detection.types import LanguageType
from trusttest.targets.testing import IcantAssistTarget

load_dotenv(override=True)
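To confirm the token actually loaded before running anything, a quick sanity check like the following can help. This is a minimal sketch: the setdefault call stands in for the value load_dotenv() would normally read from your .env file, and the placeholder token is purely illustrative.

```python
import os

# Stand-in for the value load_dotenv() would read from .env;
# in a real run, your .env provides TARGET_TOKEN instead.
os.environ.setdefault("TARGET_TOKEN", "your_neuraltrust_target_token")

token = os.getenv("TARGET_TOKEN")
# Fail fast with a clear message rather than erroring deep inside the client.
assert token, "TARGET_TOKEN is not set; add it to your .env file"
```

Failing fast here gives a clearer error than letting the NeuralTrust client reject an empty token later.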

Create the Target and Client

run_red_teaming() needs a target to attack and, optionally, a client to save generated scenarios, test sets, and evaluation runs.
target = IcantAssistTarget()
client = trusttest.client(type="neuraltrust", token=os.getenv("TARGET_TOKEN"))
If you omit the client, the catalog still runs locally, but nothing is uploaded to NeuralTrust.
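A local-only run can be sketched like this. The keyword arguments mirror the call shown below; the actual run_red_teaming call is commented out because it requires a TrustTest installation, and the test-case count here is an illustrative choice.

```python
# Hypothetical local-only configuration: no `client` is passed,
# so the catalog runs locally and nothing is uploaded to NeuralTrust.
local_kwargs = dict(language="English", num_test_cases=10, evaluate=False)

# run_red_teaming(target, **local_kwargs)
```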

Run the Catalog

The basic example runs the catalog in English and generates 50 test cases per scenario:
languages: List[LanguageType] = ["English"]
for language in languages:
    run_red_teaming(
        target,
        language=language,
        client=client,
        num_test_cases=50,
        evaluate=False,
    )
With evaluate=False, TrustTest builds the red-team scenarios and saves their test sets, but it does not execute the evaluator suites yet.

Tune the Run

You can adjust the basic script depending on what you need:
  • Change languages to generate scenarios in multiple languages.
  • Increase or decrease num_test_cases to control how many attacks are generated per scenario.
  • Set evaluate=True to immediately run the evaluator suites and persist the evaluation results.
  • Pass multi_turn_enabled=True to include multi-turn prompt injection scenarios.
  • Use category={...} to restrict the run to specific parts of the catalog, such as {"unsafe_outputs", "system_prompt_disclosure"}.
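Combining the options above, a tuned run might look like the sketch below. The parameter names come from this guide, but the specific language, count, and category values are illustrative choices; verify the exact signature against your TrustTest version before relying on it.

```python
# Hypothetical tuned configuration combining the options listed above.
tuned_kwargs = dict(
    language="Spanish",        # generate scenarios in another language
    num_test_cases=20,         # fewer attacks per scenario for a quicker pass
    evaluate=True,             # run evaluator suites and persist the results
    multi_turn_enabled=True,   # include multi-turn prompt injection scenarios
    category={"unsafe_outputs", "system_prompt_disclosure"},  # restrict the catalog
)

# run_red_teaming(target, client=client, **tuned_kwargs)
```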

Run the Script

If you are using the example file from the TrustTest repository, run it from the repository root:
uv run python docs/testing_guide/basic_red_teaming.py

Complete Example

import os
from typing import List

from dotenv import load_dotenv

import trusttest
from trusttest.catalog.red_team import run_red_teaming
from trusttest.language_detection.types import LanguageType
from trusttest.targets.testing import IcantAssistTarget

load_dotenv(override=True)

target = IcantAssistTarget()
client = trusttest.client(type="neuraltrust", token=os.getenv("TARGET_TOKEN"))

languages: List[LanguageType] = ["English"]
for language in languages:
    run_red_teaming(
        target,
        language=language,
        client=client,
        num_test_cases=50,
        evaluate=False,
    )