NeuralTrust | The leading security platform for generative AI

Global rate limiting imposes a system-wide limit on all requests passing through the gateway. This is useful as an upper bound for overall capacity protection, ensuring the gateway or downstream services aren’t overwhelmed by total traffic.

Overview

What it does: Caps the total request volume across all IPs and users.
Common use cases:
- Protecting shared infrastructure.
- Enforcing service-level quotas or performance thresholds.

Basic Configuration

Below is an example showing how to enable global limits:

curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id} \
  -H "Content-Type: application/json" \
  -d '{
    "required_plugins": [
      {
        "name": "rate_limiter",
        "enabled": true,
        "stage": "pre_request",
        "priority": 1,
        "settings": {
          "limits": {
            "global": {
              "limit": 15,
              "window": "1m"
            }
          },
          "actions": {
            "type": "reject",
            "retry_after": "60"
          }
        }
      }
    ]
  }'

Configuration Fields

limit: Maximum number of requests allowed across all clients (every IP and user combined) within the specified window.
window: Time frame (e.g., 1m, 30s) over which requests are counted.
actions
- type:
  - reject: Returns 429 status with retry information
  - block: Similar to reject but for permanent blocks
- retry_after: Seconds to wait before retrying
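
As a rough illustration of these fields, the settings object from the example could be modeled and validated in Go along the following lines. The type and function names (RateLimiterSettings, Limit, Actions, parseSettings) are hypothetical and not taken from the gateway's source; only the JSON keys come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Limit mirrors one entry under "limits" (JSON keys taken from the example).
type Limit struct {
	Limit  int    `json:"limit"`
	Window string `json:"window"`
}

// Actions mirrors the "actions" object; retry_after is a string in the example.
type Actions struct {
	Type       string `json:"type"`
	RetryAfter string `json:"retry_after"`
}

// RateLimiterSettings is a hypothetical name for the plugin's "settings" object.
type RateLimiterSettings struct {
	Limits  map[string]Limit `json:"limits"`
	Actions Actions          `json:"actions"`
}

// parseSettings decodes a settings document and checks that a usable
// global limit is present.
func parseSettings(raw string) (RateLimiterSettings, error) {
	var s RateLimiterSettings
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return s, err
	}
	g, ok := s.Limits["global"]
	if !ok {
		return s, fmt.Errorf("no global limit configured")
	}
	if g.Limit <= 0 || g.Window == "" {
		return s, fmt.Errorf("global limit needs a positive limit and a window")
	}
	return s, nil
}

func main() {
	raw := `{"limits":{"global":{"limit":15,"window":"1m"}},"actions":{"type":"reject","retry_after":"60"}}`
	s, err := parseSettings(raw)
	if err != nil {
		fmt.Println("invalid settings:", err)
		return
	}
	g := s.Limits["global"]
	fmt.Printf("global: %d requests per %s, action=%s\n", g.Limit, g.Window, s.Actions.Type)
}
```

Validating the payload locally before sending it to the gateway catches a missing or zero limit early; the checks shown are illustrative, not the gateway's own validation rules.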

Window Configuration

The window parameter supports any valid duration string:

s: seconds (e.g., "30s")
m: minutes (e.g., "5m")
h: hours (e.g., "1h")
d: days (e.g., "1d")

Example combinations:

{
  "limits": {
    "per_ip": {
      "limit": 30,
      "window": "30s"
    },
    "per_user": {
      "limit": 100,
      "window": "1h"
    },
    "global": {
      "limit": 1000,
      "window": "1d"
    }
  }
}
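
To make the unit suffixes concrete, here is a minimal sketch of how such window strings could be turned into durations in Go. It is not the gateway's parser: Go's standard time.ParseDuration does not accept a "d" unit, so the hypothetical parseWindow helper below handles the four suffixes itself, assuming a single positive integer followed by one unit letter.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseWindow converts strings such as "30s", "5m", "1h" or "1d" into a
// time.Duration. Assumption: one positive integer followed by a single unit.
func parseWindow(s string) (time.Duration, error) {
	if len(s) < 2 {
		return 0, fmt.Errorf("invalid window %q", s)
	}
	n, err := strconv.Atoi(s[:len(s)-1])
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid window %q", s)
	}
	units := map[byte]time.Duration{
		's': time.Second,
		'm': time.Minute,
		'h': time.Hour,
		'd': 24 * time.Hour,
	}
	unit, ok := units[s[len(s)-1]]
	if !ok {
		return 0, fmt.Errorf("unknown unit in window %q", s)
	}
	return time.Duration(n) * unit, nil
}

func main() {
	// The three windows used in the example combinations.
	for _, w := range []string{"30s", "1h", "1d"} {
		d, err := parseWindow(w)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %v\n", w, d)
	}
}
```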

Response Headers

The rate limiter adds the following headers to each response:

Per Limit Type Headers

X-RateLimit-{type}-Limit: [maximum requests]
X-RateLimit-{type}-Remaining: [requests remaining]
X-RateLimit-{type}-Reset: [reset timestamp]

Where {type} is one of:

global
per_ip
per_user
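
A client can read these headers to see how much of a budget is left. The sketch below shows one way to do that in Go with the standard net/http header type. The LimitStatus type, the readLimitStatus and sampleHeaders helpers, and the sample values are hypothetical; it assumes Limit and Remaining are decimal integers and leaves Reset as a string because its timestamp format is not specified here.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// LimitStatus holds the values of one X-RateLimit-{type}-* header group.
type LimitStatus struct {
	Limit     int
	Remaining int
	Reset     string // kept as a string: the timestamp format is not specified
}

// readLimitStatus extracts the headers for one limit type
// ("global", "per_ip" or "per_user").
func readLimitStatus(h http.Header, limitType string) (LimitStatus, error) {
	prefix := "X-RateLimit-" + limitType + "-"
	limit, err := strconv.Atoi(h.Get(prefix + "Limit"))
	if err != nil {
		return LimitStatus{}, fmt.Errorf("missing or invalid %sLimit: %w", prefix, err)
	}
	remaining, err := strconv.Atoi(h.Get(prefix + "Remaining"))
	if err != nil {
		return LimitStatus{}, fmt.Errorf("missing or invalid %sRemaining: %w", prefix, err)
	}
	return LimitStatus{Limit: limit, Remaining: remaining, Reset: h.Get(prefix + "Reset")}, nil
}

// sampleHeaders builds an illustrative header set; the values are made up.
func sampleHeaders() http.Header {
	h := http.Header{}
	h.Set("X-RateLimit-global-Limit", "15")
	h.Set("X-RateLimit-global-Remaining", "14")
	h.Set("X-RateLimit-global-Reset", "1700000000")
	return h
}

func main() {
	st, err := readLimitStatus(sampleHeaders(), "global")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("global: %d of %d remaining, resets at %s\n", st.Remaining, st.Limit, st.Reset)
}
```

Header lookups through http.Header are case-insensitive, so the mixed-case names above match however the server capitalizes them.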

Rate Limit Exceeded Response

{
  "error": "per_ip rate limit exceeded",
  "retry_after": "60"
}
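
On the client side, a 429 body like the one above can be decoded to decide how long to back off. The following is a minimal sketch, assuming retry_after is a whole number of seconds encoded as a string, as in the sample; the rateLimitError type and retryDelay function are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// rateLimitError mirrors the JSON body shown above.
type rateLimitError struct {
	Error      string `json:"error"`
	RetryAfter string `json:"retry_after"`
}

// retryDelay decodes a 429 body and returns the wait time and the error message.
func retryDelay(body []byte) (time.Duration, string, error) {
	var e rateLimitError
	if err := json.Unmarshal(body, &e); err != nil {
		return 0, "", err
	}
	secs, err := strconv.Atoi(e.RetryAfter)
	if err != nil || secs < 0 {
		return 0, e.Error, fmt.Errorf("invalid retry_after %q", e.RetryAfter)
	}
	return time.Duration(secs) * time.Second, e.Error, nil
}

func main() {
	body := []byte(`{"error": "per_ip rate limit exceeded", "retry_after": "60"}`)
	delay, msg, err := retryDelay(body)
	if err != nil {
		fmt.Println("could not parse response:", err)
		return
	}
	fmt.Printf("%s; retrying in %v\n", msg, delay)
}
```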

Implementation Details

Storage and Tracking

- Uses Redis sorted sets for tracking
- Key format: ratelimit:{level}:{id}:{limit_type}:{key}
- Automatic cleanup of expired entries
- Thread-safe operations

Counter Implementation

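// Unique member per request, so requests arriving in the same second do not collide.
requestID := fmt.Sprintf("%d:%s", now.Unix(), uuid.New().String())
pipe := redis.Pipeline()
// Drop entries whose score (Unix timestamp) is at or before the window start.
pipe.ZRemRangeByScore(ctx, key, "0", windowStart)
// Record this request, scored by its Unix timestamp.
pipe.ZAdd(ctx, key, &redis.Z{
    Score:  float64(now.Unix()),
    Member: requestID,
})
// Let Redis delete the key once a full window passes without traffic.
pipe.Expire(ctx, key, window)
// The queued commands run only when the pipeline is executed (pipe.Exec).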
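
The same sliding-window idea can be tried without Redis. The sketch below is an in-memory stand-in, not the gateway's implementation: a slice of Unix timestamps plays the role of the sorted set, trimming old entries corresponds to ZREMRANGEBYSCORE, and the length check corresponds to counting the set. The SlidingWindow type is hypothetical, and it records only accepted requests, which is an assumption since the excerpt does not show how rejected requests are handled.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// SlidingWindow keeps one Unix timestamp per accepted request.
type SlidingWindow struct {
	mu         sync.Mutex
	limit      int
	windowSecs int64
	scores     []int64 // ascending, like sorted-set scores
}

// NewSlidingWindow creates a limiter allowing limit requests per windowSecs.
func NewSlidingWindow(limit int, windowSecs int64) *SlidingWindow {
	return &SlidingWindow{limit: limit, windowSecs: windowSecs}
}

// Allow reports whether a request at nowUnix fits in the window and, if so,
// records it. Callers are assumed to pass non-decreasing timestamps.
func (s *SlidingWindow) Allow(nowUnix int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	// Trim entries at or before the window start.
	windowStart := nowUnix - s.windowSecs
	i := 0
	for i < len(s.scores) && s.scores[i] <= windowStart {
		i++
	}
	s.scores = s.scores[i:]

	// Reject when the window is already full.
	if len(s.scores) >= s.limit {
		return false
	}
	s.scores = append(s.scores, nowUnix)
	return true
}

func main() {
	sw := NewSlidingWindow(15, 60) // 15 requests per minute, as in the basic example
	now := time.Now().Unix()
	allowed := 0
	for i := 0; i < 20; i++ {
		if sw.Allow(now) {
			allowed++
		}
	}
	fmt.Printf("allowed %d of 20 requests\n", allowed)
}
```

Unlike a fixed counter that resets at window boundaries, this log-style approach never admits more than the limit in any trailing window, at the cost of storing one entry per request.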

Use Cases and Considerations

System-Wide Quotas

If you have a backend with limited capacity, global limiting ensures no single spike can breach that capacity.

Fallback Mechanism

Even if you have per-IP or per-user limits, global limiting acts as a final line of defense when total traffic volume surges.
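
The layering can be pictured as a chain of checks where the global budget is consulted last. The sketch below is illustrative only: the layeredLimiter type is hypothetical, the evaluation order is an assumption, window expiry is omitted, and the code is not safe for concurrent use.

```go
package main

import "fmt"

// layeredLimiter counts requests for a single window at three levels.
type layeredLimiter struct {
	perIPLimit, perUserLimit, globalLimit int
	perIP                                 map[string]int
	perUser                               map[string]int
	global                                int
}

func newLayeredLimiter(perIP, perUser, global int) *layeredLimiter {
	return &layeredLimiter{
		perIPLimit:   perIP,
		perUserLimit: perUser,
		globalLimit:  global,
		perIP:        map[string]int{},
		perUser:      map[string]int{},
	}
}

// check returns the name of the first exceeded limit, or "" when the
// request is allowed (in which case all three counters are incremented).
func (l *layeredLimiter) check(ip, user string) string {
	switch {
	case l.perIP[ip] >= l.perIPLimit:
		return "per_ip"
	case l.perUser[user] >= l.perUserLimit:
		return "per_user"
	case l.global >= l.globalLimit:
		return "global" // final line of defense
	}
	l.perIP[ip]++
	l.perUser[user]++
	l.global++
	return ""
}

func main() {
	l := newLayeredLimiter(2, 2, 3)
	clients := []string{"a", "b", "c", "d"}
	for _, c := range clients {
		// Each client stays under its own limits; the fourth hits the global cap.
		if exceeded := l.check("ip-"+c, "user-"+c); exceeded != "" {
			fmt.Printf("client %s rejected: %s rate limit exceeded\n", c, exceeded)
		} else {
			fmt.Printf("client %s allowed\n", c)
		}
	}
}
```

Here no individual client misbehaves, yet the combined traffic still trips the global cap, which is the situation the fallback is meant to cover.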

Fair Resource Distribution

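In multi-tenant environments, a global limit caps the capacity that all tenants share. On its own it does not stop a single tenant from using up that shared budget, so pair it with per-IP or per-user limits to ensure every tenant receives a baseline level of service.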
