NeuralTrust | The leading security platform for generative AI

Rate limiting is a crucial mechanism in modern API security and traffic control. It prevents abuse, mitigates denial-of-service attacks, and ensures fair usage of system resources across users, IPs, and applications.

TrustGate provides a comprehensive and extensible rate limiting system that supports multiple strategies and levels of granularity, allowing you to protect your infrastructure without compromising performance or user experience.

Why Use Rate Limiting?

Rate limiting helps you:

Prevent abuse from malicious users or bots.
Protect backend services from traffic spikes.
Ensure fair usage in multi-tenant environments.
Enforce service quotas aligned with billing tiers.
Control AI token usage for LLM-based applications.

What TrustGate Offers

TrustGate includes built-in support for the following rate limiting strategies:

Strategy	Description
Per IP	Limits requests based on client IP address. Useful for blocking abusive IPs or preventing spam.
Per User ID	Tracks usage per authenticated user. Ideal for SaaS and authenticated API scenarios.
Global	Applies a global cap across all users and IPs. Acts as a system-wide fail-safe against overload.
Token-Based	Controls requests based on token consumption (e.g., LLM usage). Especially useful for AI workloads.
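
To make the difference between these strategies concrete, here is a minimal sketch of a fixed-window counter keyed by strategy. It is an illustration only and not TrustGate's implementation: the class FixedWindowLimiter, the helper limiter_key, and the choice of a fixed-window algorithm are all assumptions made for this example. The token-based case reuses the same counter with a cost equal to the number of tokens consumed.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical fixed-window counter (illustrative, not TrustGate code)."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit                    # max requests or tokens per window
        self.window_seconds = window_seconds  # window length in seconds
        self.clock = clock                    # injectable clock, handy for tests
        # (key, window index) -> units used; old windows are never evicted here,
        # which a real limiter would need to handle.
        self.counts = defaultdict(int)

    def allow(self, key, cost=1):
        """Consume `cost` units for `key`; return False if the limit would be exceeded."""
        window_index = int(self.clock() // self.window_seconds)
        bucket = (key, window_index)
        if self.counts[bucket] + cost > self.limit:
            return False
        self.counts[bucket] += cost
        return True


def limiter_key(strategy, client_ip=None, user_id=None):
    """Map a strategy name to the key its counter is tracked under."""
    if strategy == "per_ip":
        return f"ip:{client_ip}"
    if strategy == "per_user":
        return f"user:{user_id}"
    if strategy == "global":
        return "global"  # one shared counter for all traffic
    raise ValueError(f"unknown strategy: {strategy}")
```

With this sketch, a per-IP check is limiter.allow(limiter_key("per_ip", client_ip="203.0.113.7")), while a token-based check passes the token count as cost, for example limiter.allow("user:alice", cost=350).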

Configuration Overview

Each limiter supports granular settings via the plugin configuration, including:

limit: Maximum allowed requests or tokens.
window: Duration in which the limit applies (e.g., 30s, 1m, 1h).
actions: What to do when limits are exceeded (e.g., reject, block, or retry_after).
headers: Rate limit feedback headers are automatically added to responses.
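
As a rough illustration of how limit and window relate, the sketch below converts window strings such as 30s, 1m, or 1h into seconds and derives the average allowed rate. The function names parse_window and average_rate and the example dictionary are hypothetical; the exact configuration schema TrustGate accepts is not shown here, and only the three unit suffixes that appear above are handled.

```python
import re

# Seconds per unit for the suffixes used in the examples above (s, m, h).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_window(text):
    """Convert a window string like '30s', '1m', or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported window format: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def average_rate(settings):
    """Average allowed units per second for a {'limit', 'window'} mapping."""
    return settings["limit"] / parse_window(settings["window"])


# Hypothetical settings: 120 requests per 1-minute window.
example = {"limit": 120, "window": "1m"}
print(average_rate(example))  # average requests per second
```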

Response Headers

TrustGate exposes rate limit feedback through response headers:

X-RateLimit-{type}-Limit: [maximum requests]
X-RateLimit-{type}-Remaining: [requests remaining]
X-RateLimit-{type}-Reset: [reset timestamp]

Where {type} is one of: global, per_ip, per_user, or tokens.
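
A client can read these headers to see how close it is to a limit. The following is a minimal sketch, assuming the response headers are available as a plain mapping of names to string values and that Limit and Remaining are integers. The function name read_rate_limit is hypothetical, and because the format of the Reset value is not specified here, it is returned unparsed.

```python
def read_rate_limit(headers, limit_type):
    """Extract X-RateLimit-{type}-* values from a header mapping.

    `limit_type` is one of: global, per_ip, per_user, tokens.
    Missing headers are reported as None.
    """
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    prefix = f"x-ratelimit-{limit_type}-".lower()

    def as_int(raw):
        return int(raw) if raw is not None else None

    return {
        "limit": as_int(lowered.get(prefix + "limit")),
        "remaining": as_int(lowered.get(prefix + "remaining")),
        # Reset format is an unknown here, so the raw string is kept as-is.
        "reset": lowered.get(prefix + "reset"),
    }
```

For example, calling read_rate_limit(response_headers, "per_ip") returns a dictionary with limit, remaining, and reset entries that a client can log or use to slow down before it is rejected.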

Best Practices

Combine per-IP, per-user, and global limits for layered protection.
Use token-based limits when handling AI/LLM requests to prevent excessive consumption.
Monitor rate limit headers and metrics to adjust thresholds as your traffic evolves.
Leverage retry_after to guide clients on when they can retry.
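
On the client side, retry guidance can be honored with a small helper like the one sketched below. It assumes that a rejected request comes back with HTTP status 429 and a standard Retry-After header expressed in seconds; whether TrustGate's retry_after action produces exactly this response is an assumption, and the function name retry_delay is hypothetical.

```python
def retry_delay(status_code, headers, fallback_seconds=1.0):
    """Return how many seconds to wait before retrying, or None if no retry is needed.

    Assumes rate-limited responses use status 429 and may carry a
    Retry-After header in seconds (both are assumptions of this sketch).
    """
    if status_code != 429:
        return None
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    try:
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        # Header missing, or in a form this sketch does not parse (e.g. an HTTP date).
        return fallback_seconds
```

A caller would sleep for the returned number of seconds before retrying, ideally with an upper bound on the number of attempts so that a persistent limit does not cause an endless loop.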
