Category | Description | Implementation Details |
---|---|---|
Sexual | Sexual content detection | - Base category scoring<br>- Sub-category detection<br>- Context analysis |
Violence | Violence and threats | - Direct violence detection<br>- Graphic content analysis<br>- Threat assessment |
Hate | Hate speech and bias | - Bias detection<br>- Discriminatory content<br>- Hate speech patterns |
Self-harm | Self-harm content | - Intent detection<br>- Instruction filtering<br>- Risk assessment |
Harassment | Harassment detection | - Personal attacks<br>- Threatening behavior<br>- Bullying patterns |
Illicit | Illegal activity | - Criminal content<br>- Prohibited activities<br>- Legal compliance |
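The six categories above correspond to categories returned by OpenAI's moderation endpoint, which scores each one between 0 and 1. As a rough sketch of how those per-category scores can be retrieved (the plugin's internals are not shown in this document; the model name and the helper below are illustrative assumptions):

```python
# Illustrative sketch only: fetch per-category moderation scores from
# OpenAI's moderation endpoint. The model choice is an assumption, not
# necessarily what the plugin uses internally.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def category_scores(text: str) -> dict[str, float | None]:
    result = client.moderations.create(
        model="omni-moderation-latest",  # assumed; "illicit" requires an omni model
        input=text,
    ).results[0]
    s = result.category_scores
    # The six categories from the table above.
    return {
        "sexual": s.sexual,
        "violence": s.violence,
        "hate": s.hate,
        "self-harm": s.self_harm,
        "harassment": s.harassment,
        "illicit": s.illicit,  # may be None on legacy moderation models
    }
```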
Feature | Capabilities |
---|---|
Multi-Category Detection | • Comprehensive content analysis across multiple categories (sexual, violence, hate, etc.)<br>• Real-time detection with configurable sensitivity levels<br>• Customizable thresholds per category |
Flexible Actions | • Configurable response actions<br>• Custom error messages<br>• Block or allow decisions |
OpenAI Integration | • Powered by OpenAI’s moderation API<br>• Real-time content analysis<br>• High accuracy detection |
Request Stage Processing | • Pre-request content analysis<br>• Configurable priority in plugin chain<br>• Non-blocking architecture |
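To make the block-or-allow decision concrete, a minimal pre-request gate is sketched below. The function name, error message shape, and threshold values (taken from the default thresholds listed further down) are illustrative assumptions, not the plugin's actual code:

```python
# Illustrative pre-request gate: block when any category score meets or
# exceeds its configured threshold. Names and error shape are assumptions.
THRESHOLDS = {"sexual": 0.3, "violence": 0.5, "hate": 0.4}  # defaults from the table below

def check_request(text: str) -> tuple[bool, str | None]:
    scores = category_scores(text)  # from the sketch above
    flagged = {c: s for c, s in scores.items()
               if c in THRESHOLDS and s is not None and s >= THRESHOLDS[c]}
    if flagged:
        worst = max(flagged, key=flagged.get)
        # Block and surface a custom error message.
        return False, f"Request blocked: '{worst}' score {flagged[worst]:.2f} met its threshold"
    return True, None  # allow the request to continue through the plugin chain
```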
Property | Description | Required | Default |
---|---|---|---|
name | Plugin identifier | Yes | "toxicity_detection"
enabled | Enable/disable plugin | Yes | true |
stage | Processing stage | Yes | "pre_request"
priority | Plugin execution priority | Yes | 1 |
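Putting the four properties together, a plugin entry might look like the following (JSON is assumed as the configuration format; check your gateway's plugin schema):

```json
{
  "name": "toxicity_detection",
  "enabled": true,
  "stage": "pre_request",
  "priority": 1
}
```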
Category | Description | Default Threshold | Impact |
---|---|---|---|
sexual | Sexual content detection | 0.3 | Lower values = stricter filtering
violence | Violence detection | 0.5 | Higher values = more permissive
hate | Hate speech detection | 0.4 | Tune to balance strictness against false positives
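Per-category thresholds would extend the same plugin entry. The `settings.thresholds` nesting below is an assumption based on the tables above, not a confirmed schema:

```json
{
  "name": "toxicity_detection",
  "enabled": true,
  "stage": "pre_request",
  "priority": 1,
  "settings": {
    "thresholds": {
      "sexual": 0.3,
      "violence": 0.5,
      "hate": 0.4
    }
  }
}
```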