Skip to main content

Azure Toxicity Detection

Overview

The Azure Toxicity Detection plugin is an advanced content moderation layer that leverages Azure's Content Safety API Azure Content Safety to analyze and filter potentially harmful content in both text and images. This plugin provides comprehensive content analysis across multiple categories and supports both text-based and image-based content moderation.

The plugin features a sophisticated multi-category detection system that can identify various types of inappropriate content, with configurable severity levels for each category:

CategoryDescription
HateContent expressing hatred or discrimination
ViolenceContent depicting or promoting violence
SelfHarmContent related to self-harm behaviors
SexualSexually explicit or inappropriate content

Each category can be individually configured with specific severity thresholds, allowing for fine-grained control over content moderation policies. The plugin supports two output types for severity levels:

Severity Level DescriptionFourSeverityLevelsEightSeverityLevels
Safety/Very Low Risk Content0 0
1
Low Risk Content2 2
3
Medium Risk Content4 4
5
High Risk Content6 6
7

Features

Content Analysis Capabilities

The Azure Toxicity Detection plugin offers comprehensive content analysis features:

Multi-Modal Analysis: Supports both text and image content analysis, enabling comprehensive content screening across different media types. The plugin processes text inputs for harmful language and analyzes images for inappropriate visual content, providing a unified moderation solution for mixed-media applications

Configurable Categories: Flexible category selection and threshold configuration, allowing for tailored content moderation policies

Severity Level Control: Two output types for different granularity needs, providing flexibility in how content is evaluated and acted upon

Custom Actions: Configurable response actions and error messages, allowing for tailored responses to detected content violations

Content Path Specification: Flexible content extraction from various request formats, enabling precise content extraction from different request types

Performance and Integration

The Azure Toxicity Detection plugin is engineered for seamless integration and highly efficient operation in production environments. It provides real-time content evaluation capabilities through its immediate analysis system, allowing for instant moderation decisions. The plugin implements configurable endpoints that enable separate processing paths for both text and image content, ensuring optimal handling of different content types. This architecture supports rapid content screening while maintaining high performance standards.

The plugin delivers comprehensive analysis results through detailed response data that includes precise severity scores across all configured categories. These detailed insights enable informed decision-making based on content risk levels. Additionally, the system incorporates robust error handling and logging mechanisms that ensure reliable operation even under challenging conditions, maintaining system stability while providing clear visibility into the moderation process. The combination of detailed analytics and dependable error management makes the plugin a reliable choice for content moderation needs.

Configuration

Basic Configuration

Here's a basic configuration example that enables both text and image content moderation:

{
"name": "toxicity_azure",
"enabled": true,
"stage": "pre_request",
"priority": 1,
"settings": {
"api_key": "${AZURE_API_KEY}",
"endpoints": {
"text": "https://your-region.api.cognitive.microsoft.com/contentsafety/text/analyze",
"image": "https://your-region.api.cognitive.microsoft.com/contentsafety/image/analyze"
},
"output_type": "FourSeverityLevels",
"content_types": [
{
"type": "text",
"path": "text"
},
{
"type": "image",
"path": "image_data"
}
],
"actions": {
"type": "block",
"message": "Content violates safety guidelines"
},
"category_severity": {
"Hate": 2,
"Violence": 2,
"SelfHarm": 2,
"Sexual": 2
}
}
}

This basic configuration establishes a comprehensive content moderation setup that processes both text and image content through Azure's Content Safety API. It operates at the pre-request stage with priority 1, ensuring content is analyzed before reaching your application. The configuration uses the FourSeverityLevels system with a conservative threshold of 2 across all categories, meaning it will flag content that presents even low-risk concerns. The content_types setting enables the plugin to extract content from specific JSON paths in the request body, with 'text' being extracted from the 'text' field and image content from the 'image_data' field. When violations are detected, the plugin blocks the request and returns a clear violation message, providing immediate feedback to users about content policy violations.

Configuration Parameters

Essential Settings

API_KEY: Your Azure Content Safety API key

ENDPOINTS: Configuration for text and image analysis endpoints

TEXT: Azure endpoint for text content analysis

IMAGE: Azure endpoint for image content analysis

OUTPUT_TYPE: Severity level format ("FourSeverityLevels" or "EightSeverityLevels")

Content Type Configuration

CONTENT_TYPES: Array of content type configurations

         • TYPE: Content type ("text" or "image")

         • PATH: JSON path to extract content from request

Category Settings

CATEGORY_SEVERITY: Threshold configuration for each category

         • Values for FourSeverityLevels: 0, 2, 4, or 6

         • Values for EightSeverityLevels: 0 to 7

Action Configuration

ACTIONS: Response configuration for detected violations

         • TYPE: Action type (e.g., "block")

         • MESSAGE: Custom error message

Advanced Configuration

Here's an example of a more detailed configuration with custom severity levels:

{
"name": "toxicity_azure",
"enabled": true,
"stage": "pre_request",
"priority": 1,
"settings": {
"api_key": "${AZURE_API_KEY}",
"endpoints": {
"text": "${AZURE_TEXT_ENDPOINT}",
"image": "${AZURE_IMAGE_ENDPOINT}"
},
"output_type": "EightSeverityLevels",
"content_types": [
{
"type": "text",
"path": "text"
},
{
"type": "image",
"path": "image_data"
}
],
"actions": {
"type": "block",
"message": "Content violates our community guidelines"
},
"category_severity": {
"Hate": 3,
"Violence": 4,
"SelfHarm": 2,
"Sexual": 5
}
}
}

Usage Examples

Text Content Analysis

Here's an example of analyzing text content:

curl -X POST "http://localhost:8081/post" \
-H "Host: your-subdomain.example.com" \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "This is a test message for content analysis"
}'

Image Content Analysis

Example of analyzing an image:

curl -X POST "http://localhost:8081/post" \
-H "Host: your-subdomain.example.com" \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"image_data": "BASE64_ENCODED_IMAGE_CONTENT"
}'

Response Format

The plugin provides detailed analysis results in its response:

{
"analysis_results": [
{
"category": "Hate",
"severity": 1,
"severityLevel": 2
},
{
"category": "Violence",
"severity": 0,
"severityLevel": 2
}
],
"is_blocked": false,
"blocked_categories": []
}

Best Practices

Configuration Guidelines

When configuring the Azure Toxicity Detection plugin, consider these practices:

API Key Security: Always use environment variables for the API key. Store sensitive credentials in a secure environment file (.env) and never commit them to version control. Implement proper key rotation and access control policies.

Endpoint Configuration: Use region-specific endpoints for better performance. Choose the Azure endpoint closest to your application's geographic location. Consider implementing endpoint failover for high availability scenarios.

Severity Thresholds: Start with conservative thresholds and adjust based on needs. Monitor false positives and gradually tune thresholds based on real usage patterns. Document threshold changes and their impact on detection accuracy.

Content Paths: Configure precise content paths to ensure accurate content extraction. Map your application's content structure carefully and validate path configurations. Regularly test path configurations with different content types.

Performance Optimization

To optimize the plugin's performance:

Request Size: Keep image sizes reasonable to improve response times. Implement client-side image compression before upload. Consider setting maximum size limits and providing feedback to users when exceeded.

Content Type Selection: Only enable needed content types. Disable unused content type analyzers to reduce processing overhead. Review and update enabled content types based on your application's requirements.

Category Selection: Configure only necessary categories for analysis. Focus on categories relevant to your use case to minimize processing time. Regularly review category effectiveness and remove unused ones.

Error Handling: Implement appropriate error handling in your application. Set up retry mechanisms for transient failures and graceful degradation. Provide meaningful error messages to end users while logging detailed information for debugging.

Monitoring and Maintenance

For effective operation:

Log Monitoring: Regular review of plugin logs for performance and issues. Set up automated alerts for critical errors and performance degradation. Maintain log retention policies aligned with your compliance requirements.

Threshold Adjustment: Periodic review and adjustment of severity thresholds. Analyze false positive/negative rates and adjust thresholds accordingly. Keep detailed records of threshold changes and their impact.

API Limits: Monitor Azure API usage and limits. Set up usage alerts before reaching API quotas. Implement rate limiting and queueing mechanisms for high-traffic scenarios.

Response Times: Track and optimize response times for different content types. Set up performance benchmarks and monitor trends over time. Implement caching strategies where appropriate for frequently analyzed content.

Error Handling

The plugin provides detailed error information in case of issues:

Invalid Configuration: Clear error messages for configuration problems. Includes specific validation errors for each configuration parameter. Provides guidance on how to correct common configuration mistakes.

API Errors: Detailed Azure API error responses. Includes error codes, descriptions, and recommended actions. Maintains correlation IDs for tracking issues across systems.

Content Extraction: Specific errors for content extraction issues. Details about file format problems, encoding issues, or size limitations. Suggests appropriate content formatting requirements.

Threshold Violations: Detailed information about blocked content. Includes specific categories and severity scores that triggered the block. Provides context for content moderation decisions.