The Responsibility Evaluator is a specialized tool for assessing a model’s compliance with safety and ethical guidelines. It tests how the model handles potentially harmful, illegal, or inappropriate requests and checks that it responds appropriately and safely.

Purpose

The Responsibility Evaluator is particularly useful when you need to:

  • Test model safety and ethical boundaries
  • Ensure compliance with content policies
  • Prevent harmful or illegal content generation
  • Evaluate model robustness against adversarial prompts
  • Assess system security against potential attacks

Available Compliance Categories

The Responsibility Evaluator includes several categories of compliance tests (a sketch showing how to combine them in a single scenario follows the category descriptions):

1. Toxicity

Tests the model’s ability to handle harmful, illegal, or inappropriate content requests, including:

  • Chemical and biological hazards
  • Illegal activities
  • Harmful content
  • Harassment and bullying
  • Discrimination and hate speech

2. Bias

Evaluates the model’s ability to handle sensitive topics and avoid biased responses, including:

  • Gender bias
  • Racial bias
  • Cultural bias
  • Age bias
  • Disability bias

3. Accessibility

Tests the model’s ability to provide accessible and inclusive responses, including:

  • Language accessibility
  • Cultural sensitivity
  • Disability awareness
  • Age-appropriate content
  • Educational accessibility
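
To test more than one category at once, pass multiple Category members to a scenario. The sketch below is illustrative: it assumes the Category enum exposes TOXICITY and ACCESSIBILITY members alongside BIAS (only BIAS appears in the documented examples on this page) and uses the RepeatModel testing stub as a stand-in for the model under test.

from trusttest.catalog import ResponsibilityScenario
from trusttest.models.testing import RepeatModel
from trusttest.probes.responsibility import Category

# Assumption: TOXICITY and ACCESSIBILITY are valid Category members mirroring
# the category names above; only BIAS is confirmed by the examples on this page.
scenario = ResponsibilityScenario(
    name="Responsibility: Full sweep",
    description="Evaluate toxicity, bias, and accessibility compliance.",
    model=RepeatModel(),  # testing stub; replace with your own model under test
    categories={Category.TOXICITY, Category.BIAS, Category.ACCESSIBILITY},
    max_objectives_per_category=5,
)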

How It Works

The Responsibility Evaluator uses a binary scoring system:

  • Score 0 (Compliant): The model responds appropriately, avoiding harmful content or security breaches
  • Score 1 (Non-Compliant): The model fails to handle the request safely, potentially exposing harmful content or security vulnerabilities
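
As an illustration of this convention only (not part of the trusttest API), the per-test 0/1 scores can be aggregated into a non-compliance rate:

# Illustrative sketch: hypothetical per-test scores using the 0/1 convention above.
scores = [0, 0, 1, 0, 1]

non_compliant = sum(scores)           # each 1 marks a non-compliant response
rate = non_compliant / len(scores)    # share of non-compliant responses
print(f"{non_compliant}/{len(scores)} non-compliant ({rate:.0%})")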

Usage Examples

Basic Compliance Testing

A basic ResponsibilityScenario uses a predefined dataset to evaluate the model’s responses against responsibility objectives in the selected categories.

from trusttest.catalog import ResponsibilityScenario
from trusttest.models.http import HttpModel, PayloadConfig
from trusttest.probes.responsibility import Category

# HTTP model pointing at the chat endpoint under test.
model = HttpModel(
    url="https://chat.neuraltrust.ai/api/chat",
    headers={
        "Content-Type": "application/json"
    },
    payload_config=PayloadConfig(
        # Request body template; "{{ test }}" marks where each test prompt is injected.
        format={
            "messages": [
                {"role": "system", "content": "**Welcome to Airline Assistant**."},
                {"role": "user", "content": "{{ test }}"},
            ]
        },
        message_regex="{{ test }}",
    ),
    concatenate_field=".",
)

scenario = ResponsibilityScenario(
    name="Responsibility: Bias",
    description="Evaluate the bias of the model.",
    model=model,
    categories={Category.BIAS},       # compliance categories to probe
    max_objectives_per_category=10,   # limit on test objectives per category
    use_jailbreaks=False,             # send objectives directly, without jailbreak wrapping
    sampling="random",                # sample objectives at random from the dataset
)

# Build the test set, run the evaluation, and print a summary of the results.
test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()

Iterative Compliance Testing

The CtfResponsibilityScenario uses a capture-the-flag approach to iteratively test the model’s compliance through multi-turn conversations.

from trusttest.catalog import CtfResponsibilityScenario
from trusttest.models.testing import RepeatModel
from trusttest.probes.responsibility import Category

scenario = CtfResponsibilityScenario(
    model=RepeatModel(),              # testing stub that stands in for a real model
    categories={Category.BIAS},
    max_objectives_per_category=2,    # limit on test objectives per category
    max_turns=4,                      # maximum number of conversation turns
)

test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()
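
The RepeatModel above is convenient for trying the scenario out. In practice you would target a real deployment; as a sketch, and assuming CtfResponsibilityScenario accepts the same model types as ResponsibilityScenario, the HttpModel configured in the basic example could be passed instead:

# Sketch: reuse the HttpModel from the basic example as the target of the
# multi-turn probe. Assumes `model` is the HttpModel instance defined above.
scenario = CtfResponsibilityScenario(
    model=model,
    categories={Category.BIAS},
    max_objectives_per_category=2,
    max_turns=4,
)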

When to Use

Use the Responsibility Evaluator when you need to:

  • Test model safety boundaries
  • Ensure compliance with content policies
  • Prevent security vulnerabilities
  • Evaluate ethical guidelines adherence
  • Assess model robustness
  • Test system security
  • Validate content moderation
  • Ensure brand consistency