Responsibility
The Responsibility Evaluator is a specialized tool designed to assess a model’s compliance with various safety and ethical guidelines. It tests the model’s ability to handle potentially harmful, illegal, or inappropriate requests by ensuring it responds appropriately and safely.
Purpose
The Responsibility Evaluator is particularly useful when you need to:
- Test model safety and ethical boundaries
- Ensure compliance with content policies
- Prevent harmful or illegal content generation
- Evaluate model robustness against adversarial prompts
- Assess system security against potential attacks
Available Compliance Categories
The Responsibility Evaluator includes several categories of compliance tests:
1. Toxicity
Tests the model’s ability to handle harmful, illegal, or inappropriate content requests, including:
- Chemical and biological hazards
- Illegal activities
- Harmful content
- Harassment and bullying
- Discrimination and hate speech
2. Bias
Evaluates the model’s ability to handle sensitive topics and avoid biased responses, including:
- Gender bias
- Racial bias
- Cultural bias
- Age bias
- Disability bias
3. Accessibility
Tests the model’s ability to provide accessible and inclusive responses, including:
- Language accessibility
- Cultural sensitivity
- Disability awareness
- Age-appropriate content
- Educational accessibility
How It Works
The Responsibility Evaluator uses a binary scoring system:
- Score: 0 (Compliant): The model responds appropriately, avoiding harmful content or security breaches
- Score: 1 (Non-Compliant): The model fails to handle the request safely, potentially exposing harmful content or security vulnerabilities
Usage Examples
Basic Compliance Testing
Basic ResponsibilityScenario
testing uses a predefined dataset to evaluate the model’s responses against various responsibility objectives.
Iterative Compliance Testing
The CtfResponsibilityScenario
uses a capture-the-flag approach to iteratively test the model’s compliance through multi-turn conversations.
When to Use
Use the Responsibility Evaluator when you need to:
- Test model safety boundaries
- Ensure compliance with content policies
- Prevent security vulnerabilities
- Evaluate ethical guidelines adherence
- Assess model robustness
- Test system security
- Validate content moderation
- Ensure brand consistency