Overview
The NeuralTrust Moderation plugin (`neuraltrust_moderation`) is a sophisticated content filtering system designed to protect your AI gateway from potentially harmful or unwanted content. It employs multiple layers of content analysis to ensure comprehensive protection while maintaining high performance.
At its core, the plugin implements content filtering through two primary mechanisms: keyword-based blocking and regular expression pattern matching. The keyword system is enhanced with fuzzy matching capabilities, allowing it to detect not just exact matches but also similar variations of prohibited words. This is particularly effective in catching attempts to circumvent the filter through minor word modifications.
Fuzzy matching uses the Levenshtein distance (the number of single-character edits needed to turn one string into another) to measure word similarity. Matching is case-insensitive by default and governed by a configurable similarity threshold between 0 and 1. The default threshold is 0.8, which balances strict matching against flexibility: words that are at least 80% similar to a blocked keyword trigger the filter, which catches common evasion techniques such as character substitutions or misspellings.
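The similarity check described above can be sketched roughly as follows. This is an illustrative sketch, not the plugin's actual code: the function names are hypothetical, and normalizing the distance by the longer string's length is an assumption about how the 0 to 1 score is derived.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive score in [0, 1]; 1.0 means identical.

    Assumption: score = 1 - distance / length of the longer string.
    """
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_blocked(word: str, keywords: list[str], threshold: float = 0.8) -> bool:
    """True if the word is at least `threshold` similar to any keyword."""
    return any(similarity(word, keyword) >= threshold for keyword in keywords)
```

Under this normalization, `similarity("Password", "passw0rd")` is 0.875 (one substitution across eight characters), so `is_blocked("passw0rd", ["password"])` returns `True` at the default threshold.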
Features
- Keyword-based Blocking:
  - Fuzzy matching for similar words
  - Configurable similarity threshold
  - Case-insensitive matching
- Pattern-based Blocking:
  - Regular expression support
  - Complex pattern matching
  - Pre-compiled patterns for performance
  - Support for common attack patterns
- Action Configuration:
  - Customizable block messages
  - Configurable response codes
  - Detailed error reporting
  - Logging and monitoring
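To illustrate pattern-based blocking with pre-compiled patterns and a configurable block response, here is a minimal sketch. The patterns, the `check_content` function, and the shape of the returned dictionary are all hypothetical and only stand in for whatever the plugin does internally.

```python
import re
from typing import Optional

# Hypothetical example patterns; a real deployment supplies its own.
RAW_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",  # digits shaped like a US Social Security number
    r"\bdrop\s+table\b",       # fragment typical of SQL injection attempts
]

# Compile once at startup so each request only pays for matching.
COMPILED_PATTERNS = [re.compile(p, re.IGNORECASE) for p in RAW_PATTERNS]


def check_content(
    text: str,
    status_code: int = 403,
    message: str = "Content blocked",
) -> Optional[dict]:
    """Return a block response if any pattern matches, otherwise None."""
    for pattern in COMPILED_PATTERNS:
        if pattern.search(text):
            return {
                "status": status_code,
                "message": message,
                "pattern": pattern.pattern,  # reported for logging and debugging
            }
    return None
```

Compiling the patterns once, rather than on every request, is what the "pre-compiled patterns" feature refers to in general terms; the status code and message parameters mirror the customizable block messages and response codes listed above.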
Configuration Examples
Basic Configuration
The basic configuration provides essential content filtering with commonly needed protections.
Similarity Threshold
| Value Range | Description | Impact |
|---|---|---|
| 0 to 1 (0.8 default) | Determines how closely strings must match | Controls matching sensitivity |
| Higher values (>0.8) | Requires closer matches | Reduces false positives |
| Lower values (<0.8) | Allows more variation | Catches more variations |
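To make the threshold's effect concrete, the sketch below computes how many single-character edits a keyword of a given length can absorb before it stops matching. It assumes the score is `1 - edits / length` for a candidate of the same length as the keyword; that normalization and the `max_edits` helper are illustrative assumptions, not documented plugin behavior.

```python
def max_edits(keyword_length: int, threshold: float) -> int:
    """Largest edit count whose score (1 - edits / length) still meets the threshold.

    Assumes the candidate word has the same length as the keyword.
    """
    edits = 0
    while (
        edits + 1 <= keyword_length
        and 1 - (edits + 1) / keyword_length >= threshold
    ):
        edits += 1
    return edits


# Tolerated edits for an 8-letter keyword at three thresholds.
for threshold in (0.7, 0.8, 0.9):
    print(threshold, max_edits(8, threshold))
```

Under these assumptions an 8-letter keyword tolerates two edits at 0.7, one at 0.8, and none at 0.9, while a 4-letter keyword tolerates none at the default of 0.8. Short keywords therefore behave almost like exact matches unless the threshold is lowered.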