Monitors
Monitoring LLM applications is essential for maintaining performance, controlling costs, and ensuring quality. Through the NeuralTrust platform, you can track key metrics and receive alerts when issues arise.
Types of Monitors
- Metric Monitor: Metric monitors compare a metric's value against a static threshold. During each alert evaluation, NeuralTrust calculates the selected metric over the selected period and checks whether it is above or below the threshold. This is the standard alerting case, where you already know what unexpected values look like.
For example, you might want to be alerted when the average response time exceeds 5 seconds or when the total cost goes above a certain budget threshold.
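The evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NeuralTrust's implementation: the `evaluate_metric_monitor` function, its parameters, and the sample data are all hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence


def evaluate_metric_monitor(
    values: Sequence[float],
    threshold: float,
    aggregate: Callable[[Sequence[float]], float] = mean,
    direction: str = "above",
) -> bool:
    """Return True when the aggregated metric breaches the static threshold.

    `values` holds the metric samples collected during the evaluation period.
    All names here are hypothetical and used for illustration only.
    """
    if not values:
        return False  # assumption: no data in the period means no alert
    current = aggregate(values)
    if direction == "above":
        return current > threshold
    if direction == "below":
        return current < threshold
    raise ValueError(f"unknown direction: {direction!r}")


# Response times (in seconds) observed during the selected period.
response_times = [3.2, 4.8, 6.1, 7.4]
alert = evaluate_metric_monitor(response_times, threshold=5.0)
```

Here the average response time is above the 5-second threshold, so `alert` is `True`. Passing `aggregate=sum` would model a total-cost budget check instead.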
- Change Monitor: Change monitors evaluate the difference between a past value and the current value. During each alert evaluation, NeuralTrust calculates the difference between the current series and the series from N minutes ago, then computes the selected metric over the selected period. An alert is triggered when this computed series crosses the threshold.
This type of monitor is particularly useful for detecting sudden changes in your metrics, such as an unexpected spike in error rates or a significant drop in successful completions.
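A change evaluation might look like the sketch below. It assumes the two series are already aligned point by point and that the selected metric is an average. The function name, parameters, and sample error counts are hypothetical and do not reflect NeuralTrust's actual API.

```python
from statistics import mean
from typing import Sequence


def evaluate_change_monitor(
    current: Sequence[float],
    previous: Sequence[float],
    threshold: float,
) -> bool:
    """Return True when the average point-by-point change exceeds the threshold.

    `current` holds samples from the evaluation period; `previous` holds the
    samples for the same period shifted N minutes into the past.
    """
    if not current or len(current) != len(previous):
        raise ValueError("both series must be non-empty and aligned")
    # Difference between the current series and the series from N minutes ago.
    deltas = [now - before for now, before in zip(current, previous)]
    # Aggregate the difference series, then compare it with the threshold.
    return mean(deltas) > threshold


# Error counts per interval now and (hypothetically) 30 minutes earlier.
errors_now = [12, 15, 20, 22]
errors_30_min_ago = [4, 5, 5, 6]
alert = evaluate_change_monitor(errors_now, errors_30_min_ago, threshold=10)
```

The average increase here is above 10 errors per interval, so `alert` is `True`. To detect a drop rather than a spike, the same difference series could be compared against a negative threshold using `<`.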
Metrics
Each monitor tracks a single metric, which is also the value used to trigger the alert. The following metrics are currently supported:
- Number of Messages: Total count of individual prompts and responses exchanged with the LLM.
- Number of Conversations: Count of distinct chat sessions or interactions with the LLM.
- Dialog time: Duration of the entire conversation from start to finish.
- Words per prompt: Average number of words in user inputs sent to the LLM.
- Words per response: Average number of words in responses generated by the LLM.
- Prompt language: Detected language of the user inputs.
- Latency: Time taken by the LLM to generate and return a response.
- Cost: Financial expense associated with LLM API calls.
- Tokens per prompt: Number of tokens (chunks of text) in user inputs, which directly affects API costs.
- Tokens per response: Number of tokens in LLM-generated responses, which directly affects API costs.
- Readability: Measure of how easy it is to understand the LLM's responses.
- + Response sentiment: Frequency of positive emotional tone in LLM responses.
- - Response sentiment: Frequency of negative emotional tone in LLM responses.
- + Prompt sentiment: Frequency of positive emotional tone in user inputs.
- - Prompt sentiment: Frequency of negative emotional tone in user inputs.
- Number of Prompt data leakage: Count of instances where sensitive information might be exposed in prompts.
- Number of Prompt injection: Count of potential malicious prompt attempts trying to manipulate the LLM's behavior.
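To make a few of these definitions concrete, the sketch below derives the message count, conversation count, and words-per-prompt and words-per-response averages from a small list of messages. The `Message` class, the `"user"`/`"assistant"` role labels, and whitespace-based word splitting are assumptions made for illustration; NeuralTrust may compute these metrics differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Message:
    conversation_id: str
    role: str  # assumed labels: "user" for prompts, "assistant" for responses
    text: str


def summarize(messages: List[Message]) -> Dict[str, float]:
    """Compute a handful of the simpler metrics from raw messages."""
    prompts = [m for m in messages if m.role == "user"]
    responses = [m for m in messages if m.role == "assistant"]

    def avg_words(items: List[Message]) -> float:
        # Words are approximated by splitting on whitespace.
        return mean(len(m.text.split()) for m in items) if items else 0.0

    return {
        "number_of_messages": len(messages),
        "number_of_conversations": len({m.conversation_id for m in messages}),
        "words_per_prompt": avg_words(prompts),
        "words_per_response": avg_words(responses),
    }


messages = [
    Message("c1", "user", "What is the refund policy?"),
    Message("c1", "assistant", "Refunds are available within 30 days of purchase."),
    Message("c2", "user", "Hello"),
    Message("c2", "assistant", "Hi, how can I help?"),
]
summary = summarize(messages)
```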
Priority Levels
Alerts are categorized by priority levels to help teams quickly assess and respond to issues:
- P1 Critical: Severe incidents requiring immediate attention that directly impact system availability or security. Examples include system outages, security breaches, or critical performance degradation.
- P2 High: Urgent issues that significantly affect system performance or user experience but don't cause complete service disruption. Examples include significant latency increases or high error rates.
- P3 Medium: Important issues that should be addressed soon but don't require immediate action. Examples include moderate cost increases or declining performance metrics.
- P4 Low: Minor issues that should be monitored but don't impact system functionality. Examples include slight variations in response times or small increases in token usage.
- P5 Info: Informational alerts for tracking system changes or trends. Examples include routine usage statistics or planned maintenance notifications.
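If you process alerts in your own tooling, the five levels map naturally onto an ordered enumeration. The sketch below is one possible representation, not part of the NeuralTrust platform: the `Priority` enum and the sample alerts are hypothetical.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Lower numbers are more urgent, mirroring the P1 to P5 naming."""
    P1_CRITICAL = 1
    P2_HIGH = 2
    P3_MEDIUM = 3
    P4_LOW = 4
    P5_INFO = 5


# Hypothetical alerts as (description, priority) pairs.
alerts = [
    ("Small increase in token usage", Priority.P4_LOW),
    ("Security breach detected", Priority.P1_CRITICAL),
    ("Moderate cost increase", Priority.P3_MEDIUM),
]

# Sort so that the most urgent alerts come first.
triage_queue = sorted(alerts, key=lambda alert: alert[1])
```

Because `IntEnum` members compare as integers, sorting puts P1 alerts at the front of the queue.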