TrustGate implements a token-bucket rate limiting system designed specifically for AI model interactions. It limits both the number of requests and the token usage for each API key.
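To make the idea concrete, here is a minimal sketch of a token-bucket limiter that tracks requests and tokens together for a single API key. It is not TrustGate's implementation: the names `Bucket` and `KeyLimiter`, the 60-second window, and the assumption that a request's token cost is known when it is admitted are all hypothetical choices made for illustration.

```python
class Bucket:
    """A token bucket that refills continuously up to a fixed capacity."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.level = float(capacity)  # start full
        self.updated = 0.0            # timestamp of the last refill

    def refill(self, now):
        # Add the allowance earned since the last refill, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.level = min(self.capacity, self.level + elapsed * self.refill_per_sec)
        self.updated = now


class KeyLimiter:
    """Hypothetical per-API-key limiter: one bucket for requests, one for tokens."""

    def __init__(self, max_requests, max_tokens, window_sec=60.0):
        # Assumption: each bucket refills fully over one window.
        self.requests = Bucket(max_requests, max_requests / window_sec)
        self.tokens = Bucket(max_tokens, max_tokens / window_sec)

    def allow(self, token_cost, now):
        # Refill both buckets, then admit the request only if both can pay.
        self.requests.refill(now)
        self.tokens.refill(now)
        if self.requests.level < 1 or self.tokens.level < token_cost:
            return False
        self.requests.level -= 1
        self.tokens.level -= token_cost
        return True


limiter = KeyLimiter(max_requests=2, max_tokens=100)
first = limiter.allow(60, now=0.0)    # fits within both budgets
second = limiter.allow(60, now=0.0)   # rejected: only 40 tokens are left
third = limiter.allow(40, now=0.0)    # accepted: uses the remaining tokens
fourth = limiter.allow(1, now=0.0)    # rejected: no requests are left
later = limiter.allow(50, now=60.0)   # accepted after a full window of refill
```

Both buckets are checked before either is debited, so a rejected request does not consume any allowance. Passing `now` explicitly keeps the sketch deterministic; a real limiter would more likely read a monotonic clock.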
The plugin adds the following headers to track usage:
X-Ratelimit-Limit-Tokens: [maximum tokens]
X-Ratelimit-Remaining-Tokens: [tokens remaining]
X-Ratelimit-Reset-Tokens: [seconds until next refill]
X-Ratelimit-Limit-Requests: [maximum requests per minute]
X-Ratelimit-Remaining-Requests: [requests remaining]
X-Ratelimit-Reset-Requests: [seconds until request count reset]
X-Tokens-Consumed: [tokens used in this request]
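A client can read these headers to decide when to slow down. The sketch below assumes that each header value is a plain integer and that the headers arrive as a simple name-to-value mapping. The helper names `parse_rate_limit_headers` and `suggested_wait_seconds` are hypothetical and not part of TrustGate.

```python
HEADER_FIELDS = {
    "limit_tokens": "X-Ratelimit-Limit-Tokens",
    "remaining_tokens": "X-Ratelimit-Remaining-Tokens",
    "reset_tokens": "X-Ratelimit-Reset-Tokens",
    "limit_requests": "X-Ratelimit-Limit-Requests",
    "remaining_requests": "X-Ratelimit-Remaining-Requests",
    "reset_requests": "X-Ratelimit-Reset-Requests",
    "tokens_consumed": "X-Tokens-Consumed",
}


def parse_rate_limit_headers(headers):
    """Return the rate limit headers as integers, or None where missing or invalid."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    parsed = {}
    for field, header_name in HEADER_FIELDS.items():
        raw = lowered.get(header_name.lower())
        try:
            parsed[field] = int(raw)
        except (TypeError, ValueError):
            parsed[field] = None  # assumption: treat unparseable values as unknown
    return parsed


def suggested_wait_seconds(info):
    """Return how long to wait before retrying, or 0 if neither budget is exhausted."""
    waits = [0]
    if info["remaining_tokens"] == 0 and info["reset_tokens"] is not None:
        waits.append(info["reset_tokens"])
    if info["remaining_requests"] == 0 and info["reset_requests"] is not None:
        waits.append(info["reset_requests"])
    return max(waits)


# Example values are invented for illustration only.
example = {
    "x-ratelimit-limit-tokens": "1000",
    "x-ratelimit-remaining-tokens": "0",
    "x-ratelimit-reset-tokens": "12",
    "x-ratelimit-limit-requests": "60",
    "x-ratelimit-remaining-requests": "41",
    "x-ratelimit-reset-requests": "30",
    "x-tokens-consumed": "250",
}
info = parse_rate_limit_headers(example)
wait = suggested_wait_seconds(info)
```

With the sample values, only the token budget is exhausted, so the wait comes from `X-Ratelimit-Reset-Tokens` rather than the request reset.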