Implementation Details
Message Processing
The plugin analyzes message content of the following types:

Content Types
- Text Content
  - Direct text analysis
  - Multi-message support
  - UTF-8 encoding
  - Length validation
- Image Content
  - URL-based processing
  - Image format validation
  - Size restrictions
  - Accessibility checks
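As an illustration, the sketch below normalizes both content types into moderation input items. The item shapes follow OpenAI's multimodal moderation input format; the surrounding `messages` wrapper and the helper name are assumptions about the gateway's request shape.

```python
# Sketch: flatten gateway messages into moderation API input items.
# The "messages" wrapper and helper name are assumptions; the item
# shapes follow OpenAI's multimodal moderation input format.

def extract_moderation_input(messages):
    """Collect text and image-URL content parts for analysis."""
    items = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            # Plain text message: analyze directly
            items.append({"type": "text", "text": content})
        elif isinstance(content, list):
            for part in content:
                if part.get("type") == "text":
                    items.append({"type": "text", "text": part["text"]})
                elif part.get("type") == "image_url":
                    # URL-based image processing
                    items.append({"type": "image_url",
                                  "image_url": {"url": part["image_url"]["url"]}})
    return items

messages = [
    {"role": "user", "content": "Is this message safe?"},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/img.png"}},
    ]},
]
print(extract_moderation_input(messages))
```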
Moderation Categories
The plugin supports comprehensive content analysis across multiple categories:

Category | Description | Implementation Details |
---|---|---|
Sexual | Sexual content detection | Base category scoring; sub-category detection; context analysis |
Violence | Violence and threats | Direct violence detection; graphic content analysis; threat assessment |
Hate | Hate speech and bias | Bias detection; discriminatory content; hate speech patterns |
Self-harm | Self-harm content | Intent detection; instruction filtering; risk assessment |
Harassment | Harassment detection | Personal attacks; threatening behavior; bullying patterns |
Illicit | Illegal activity | Criminal content; prohibited activities; legal compliance |
API Integration
The plugin integrates with OpenAI’s moderation API:

- Request Formation
- Response Processing
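A minimal sketch of these two steps, assuming the `/v1/moderations` payload and response fields (`results`, `flagged`, `category_scores`) of OpenAI's moderation API; the helper names and the summary shape returned here are illustrative assumptions:

```python
# Sketch of request formation and response processing for OpenAI's
# moderation endpoint. Helper names are assumptions; the payload and
# response fields follow the moderation API.

def build_request(inputs, model="omni-moderation-latest"):
    """Form the JSON body sent to /v1/moderations."""
    return {"model": model, "input": inputs}

def process_response(response):
    """Summarize each result as its highest-scoring category."""
    processed = []
    for result in response.get("results", []):
        scores = result.get("category_scores", {})
        top = max(scores, key=scores.get) if scores else None
        processed.append({"flagged": result.get("flagged", False),
                          "top_category": top,
                          "top_score": scores.get(top, 0.0)})
    return processed

sample = {"results": [{"flagged": True,
                       "categories": {"violence": True, "hate": False},
                       "category_scores": {"violence": 0.91, "hate": 0.05}}]}
print(process_response(sample))
# → [{'flagged': True, 'top_category': 'violence', 'top_score': 0.91}]
```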
🛠️ Error Handling
The plugin includes robust error handling across multiple stages:

🔧 Configuration Validation
- ✅ API key verification
- ✅ Action type validation
- ✅ Threshold validation
- ✅ Category validation
⚠️ Runtime Error Handling
- 🔌 API connection errors
- 🔍 Response parsing errors
- ⏱️ Timeout handling
- 🚦 Rate limit management
📄 Content Processing Errors
- ❌ Invalid content format
- 🚫 Missing required fields
- 📏 Size limit violations
- 🧬 Encoding issues
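One common pattern for the runtime errors above is retry with exponential backoff. The sketch below is an illustrative assumption, not the plugin's actual implementation; the error classes and retry policy are hypothetical.

```python
import time

# Sketch of runtime error handling: retry transient failures (rate
# limits, timeouts) with exponential backoff. Error classes and the
# retry policy are illustrative assumptions.

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 from the moderation API."""

def call_with_retries(call, max_attempts=3, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except (RateLimitError, TimeoutError):
            if attempt == max_attempts:
                raise                       # exhausted: surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitError("429 Too Many Requests")
    return {"results": []}

print(call_with_retries(flaky), len(attempts))  # → {'results': []} 3
```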
🚀 Performance Optimizations
The plugin is designed with performance in mind, optimizing both request and response handling:

📥 Request Processing
- 📦 Batch message processing
- ⚡ Efficient JSON parsing
- 🧠 Minimal memory allocation
- 🔁 Request pooling
📤 Response Handling
- 📡 Streaming response processing
- 📊 Efficient score calculation
- ⛔ Early termination when thresholds are met
- 💾 Result caching for repeated evaluations
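Two of the optimizations above can be sketched together: caching results keyed by a content hash, and terminating early once any category exceeds its threshold. The function and cache layout are assumptions for illustration; threshold values mirror the Category Thresholds table.

```python
import hashlib

# Sketch of result caching (keyed by content hash) and early
# termination on the first exceeded threshold. Names and cache layout
# are illustrative assumptions.

cache = {}

def content_key(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def evaluate(text, scores, thresholds):
    key = content_key(text)
    if key in cache:                      # repeated evaluation: cache hit
        return cache[key]
    verdict = "allow"
    for category, threshold in thresholds.items():
        if scores.get(category, 0.0) > threshold:
            verdict = "block"             # early termination on first hit
            break
    cache[key] = verdict
    return verdict

thresholds = {"sexual": 0.3, "violence": 0.5, "hate": 0.4}
print(evaluate("some text", {"violence": 0.7}, thresholds))  # → block
print(evaluate("some text", {}, thresholds))                 # cached → block
```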
Configuration Reference
Required Settings

The required settings (name, enabled, stage, and priority) are listed in the Plugin Settings table below; the OpenAI API key must also be provided and is verified during configuration validation.
Advanced Options
- Custom error messages
- Category-specific actions
- Threshold adjustments
- Logging configuration
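As an illustration, an advanced settings fragment might look like the following; every field name here is an assumption rather than the plugin's documented schema:

```json
{
  "settings": {
    "error_message": "Request blocked by content policy.",
    "category_actions": { "violence": "block", "self-harm": "block" },
    "thresholds": { "sexual": 0.2 },
    "logging": { "level": "info", "log_scores": true }
  }
}
```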
Monitoring and Metrics
The plugin provides detailed monitoring capabilities:

- Request/response logging
- Category score tracking
- Error rate monitoring
- Performance metrics
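The counters behind such monitoring can be sketched as below; the class, metric names, and recording interface are illustrative assumptions.

```python
# Sketch of monitoring counters for the capabilities listed above.
# The class and metric names are illustrative assumptions.
from collections import Counter

class ModerationMetrics:
    def __init__(self):
        self.counts = Counter()
        self.latencies_ms = []          # raw data for performance metrics

    def record(self, blocked, latency_ms, error=False):
        self.counts["requests"] += 1    # request/response logging hook
        if blocked:
            self.counts["blocked"] += 1
        if error:
            self.counts["errors"] += 1  # feeds error rate monitoring
        self.latencies_ms.append(latency_ms)

    def error_rate(self):
        return self.counts["errors"] / max(self.counts["requests"], 1)

m = ModerationMetrics()
m.record(blocked=False, latency_ms=42)
m.record(blocked=True, latency_ms=55)
print(m.counts["requests"], m.counts["blocked"], m.error_rate())  # → 2 1 0.0
```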
Features
Feature | Capabilities |
---|---|
Multi-Category Detection | • Comprehensive content analysis across multiple categories (sexual, violence, hate, etc.) • Real-time detection with configurable sensitivity levels • Customizable thresholds per category |
Flexible Actions | • Configurable response actions • Custom error messages • Block or allow decisions |
OpenAI Integration | • Powered by OpenAI’s moderation API • Real-time content analysis • High accuracy detection |
Request Stage Processing | • Pre-request content analysis • Configurable priority in plugin chain • Non-blocking architecture |
How It Works
Content Analysis
The plugin analyzes incoming requests by examining both text and image content for toxic or inappropriate material. Text content is sent directly to OpenAI’s moderation API; for images, the plugin analyzes image URLs provided in the request. The results are then evaluated against the configured thresholds.

Threshold Evaluation
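A minimal sketch of this evaluation step, assuming the moderation response exposes per-category scores (the helper name is hypothetical; threshold values mirror the Category Thresholds table):

```python
# Sketch of threshold evaluation: content is blocked if any category's
# score exceeds its configured threshold. Helper name is hypothetical;
# defaults mirror the Category Thresholds table.

def exceeds_thresholds(category_scores, thresholds):
    """Return the categories whose score exceeds the configured threshold."""
    return [c for c, score in category_scores.items()
            if score > thresholds.get(c, 1.0)]

thresholds = {"sexual": 0.3, "violence": 0.5, "hate": 0.4}
print(exceeds_thresholds({"violence": 0.72, "hate": 0.1}, thresholds))
# → ['violence']
```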
Each category has its own configurable threshold; content is blocked if any category’s score exceeds its threshold.

Action Execution
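A hedged sketch of the action step: return a block decision with a configurable error message, or let the request pass. The response shape and field names are assumptions, not the gateway's actual contract.

```python
# Sketch of action execution. The decision/response shape is an
# illustrative assumption, not the gateway's actual contract.

def execute_action(violations, error_message="Content blocked by moderation policy."):
    if violations:
        return {"action": "block",
                "status": 403,
                "message": error_message,   # custom error message support
                "categories": violations}
    return {"action": "allow"}

print(execute_action(["violence"]))
print(execute_action([]))
```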
Based on the evaluation results, the plugin takes the configured action.

Configuration Examples
Basic Configuration
A simple configuration that enables toxicity detection with default settings:

Plugin Settings
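A sketch of what such a configuration might look like; the top-level fields follow the Plugin Settings table, while the settings block (including its key names) is an assumption:

```json
{
  "name": "toxicity_openai",
  "enabled": true,
  "stage": "pre_request",
  "priority": 1,
  "settings": {
    "openai_api_key": "${OPENAI_API_KEY}",
    "thresholds": { "sexual": 0.3, "violence": 0.5, "hate": 0.4 }
  }
}
```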
Property | Description | Required | Default |
---|---|---|---|
name | Plugin identifier | Yes | "toxicity_openai" |
enabled | Enable/disable plugin | Yes | true |
stage | Processing stage | Yes | "pre_request" |
priority | Plugin execution priority | Yes | 1 |
Category Thresholds
Category | Description | Default Threshold | Impact |
---|---|---|---|
sexual | Sexual content detection | 0.3 | Lower values enforce stricter filtering |
violence | Violence detection | 0.5 | Higher values are more permissive |
hate | Hate speech detection | 0.4 | Tune to balance safety against false positives |
Best Practices
Threshold Configuration
- Content Policy Alignment:
  - Set thresholds according to your content policy
  - Consider your audience and use case
  - Test thresholds with sample content
- Category Selection:
  - Enable relevant categories for your use case
  - Consider regulatory requirements
  - Balance between safety and usability
- Performance Considerations:
  - Set appropriate plugin priority
  - Consider API rate limits
  - Monitor response times
Security Considerations
- API Key Management:
  - Secure storage of the OpenAI API key
  - Regular key rotation
  - Access control for configuration changes
- Logging and Monitoring:
  - Enable appropriate logging
  - Monitor blocked content patterns
  - Adjust thresholds regularly