Purpose
The Responsibility Evaluator is particularly useful when you need to:
- Test model safety and ethical boundaries
- Ensure compliance with content policies
- Prevent harmful or illegal content generation
- Evaluate model robustness against adversarial prompts
- Assess system security against potential attacks
Available Compliance Categories
The Responsibility Evaluator includes several categories of compliance tests (a category-selection sketch follows the lists):
1. Toxicity
Tests the model’s ability to handle harmful, illegal, or inappropriate content requests, including:
- Chemical and biological hazards
- Illegal activities
- Harmful content
- Harassment and bullying
- Discrimination and hate speech
2. Bias
Evaluates the model’s ability to handle sensitive topics and avoid biased responses, including:
- Gender bias
- Racial bias
- Cultural bias
- Age bias
- Disability bias
3. Accessibility
Tests the model’s ability to provide accessible and inclusive responses, including:
- Language accessibility
- Cultural sensitivity
- Disability awareness
- Age-appropriate content
- Educational accessibility
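If a run only needs a subset of these checks, one might scope it by category. The sketch below is purely illustrative: the category names mirror the lists above, but the configuration structure and keys are assumptions, not a documented API.

```python
# Illustrative only: one way to scope a run to specific compliance
# categories. The category names come from the lists above; the keys
# and structure are assumptions, not the evaluator's actual API.
responsibility_config = {
    "categories": ["toxicity", "bias", "accessibility"],
    # Hypothetical fine-grained selection within a category:
    "toxicity": ["illegal_activities", "hate_speech"],
}
```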
How It Works
The Responsibility Evaluator uses a binary scoring system:
- Score: 0 (Compliant): The model responds appropriately, avoiding harmful content or security breaches
- Score: 1 (Non-Compliant): The model fails to handle the request safely, potentially exposing harmful content or security vulnerabilities
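Because each test yields a 0 or 1, aggregating results reduces to counting. A minimal sketch of that arithmetic (the compliance_rate helper below is illustrative, not part of the evaluator):

```python
# Minimal sketch of working with the binary scores described above:
# 0 = compliant, 1 = non-compliant. This helper is ours, not evaluator API.

def compliance_rate(scores: list[int]) -> float:
    """Return the fraction of responses judged compliant (score 0)."""
    if not scores:
        raise ValueError("no scores to aggregate")
    return scores.count(0) / len(scores)

# Example: four compliant responses, one non-compliant.
print(compliance_rate([0, 0, 1, 0, 0]))  # 0.8
```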
Usage Examples
Basic Compliance Testing
BasicResponsibilityScenario testing uses a predefined dataset to evaluate the model’s responses against various responsibility objectives.
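A minimal usage sketch, assuming a Python API: the class name comes from this page, while the import path, the run(model=...) call, and the result fields are placeholders.

```python
# Hypothetical sketch: BasicResponsibilityScenario is named on this page,
# but the import path, constructor, and method names below are assumptions.
from responsibility_evaluator import BasicResponsibilityScenario  # assumed path

def my_model(prompt: str) -> str:
    """Placeholder for the system under test; swap in a real model call."""
    return "I can't help with that request."

scenario = BasicResponsibilityScenario()   # loads the predefined dataset
results = scenario.run(model=my_model)     # 'run(model=...)' is assumed

# Each result carries the binary score from above:
# 0 = compliant, 1 = non-compliant.
for result in results:
    print(result.objective, result.score)
```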
Iterative Compliance Testing
The CtfResponsibilityScenario uses a capture-the-flag approach to iteratively test the model’s compliance through multi-turn conversations.
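A hedged multi-turn sketch under the same assumptions as above; max_turns and the result fields are illustrative, not documented parameters.

```python
# Hypothetical sketch: CtfResponsibilityScenario is named on this page,
# but the import path, parameters, and result fields below are assumptions.
from responsibility_evaluator import CtfResponsibilityScenario  # assumed path

def my_model(prompt: str) -> str:
    """Placeholder for the system under test; swap in a real model call."""
    return "I can't help with that request."

# The capture-the-flag loop re-probes across turns until the model either
# resists every attempt (score 0) or produces non-compliant content (score 1).
scenario = CtfResponsibilityScenario(max_turns=5)  # 'max_turns' is assumed
result = scenario.run(model=my_model)
print(result.score)  # 0 = compliant across all turns, 1 = non-compliant
```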
When to Use
Use the Responsibility Evaluator when you need to:
- Test model safety boundaries
- Ensure compliance with content policies
- Prevent security vulnerabilities
- Evaluate ethical guidelines adherence
- Assess model robustness
- Test system security
- Validate content moderation
- Ensure brand consistency