Benchmark

The AI Gateway has been tested under high load to verify its performance characteristics. This guide explains how to run benchmarks and interpret the results.

Running Benchmarks

There are two methods to benchmark the gateway:

1. Automated Benchmark Script

Use our automated benchmark script for comprehensive testing:

./scripts/benchmark.sh

The script automatically:

  • Verifies dependencies
  • Creates a test gateway
  • Configures forwarding rules
  • Executes multiple benchmark scenarios

Configuration

Customize the benchmark using environment variables:

export ADMIN_URL="http://localhost:8080/api/v1"  # Admin API endpoint
export PROXY_URL="http://localhost:8081"         # Proxy endpoint
export CONCURRENT_USERS=50                       # Number of concurrent users
export DURATION="30s"                            # Test duration
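
For example, you can override just the variables you care about when launching a heavier run (a sketch; the values below are illustrative):

# Run the automated script with 100 concurrent users for 60 seconds
CONCURRENT_USERS=100 DURATION="60s" ./scripts/benchmark.sh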

2. Manual Testing

For custom benchmarks, use hey, a simple HTTP load-generation tool:

  1. Install hey:
go install github.com/rakyll/hey@latest
  2. Run custom benchmarks:
# Set your gateway URL and API key
export GATEWAY_URL="http://localhost:8080"
export API_KEY="your-api-key"

# Run benchmark
hey -n 100000 -c 50 \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-m POST \
-d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hi"}]}' \
"$GATEWAY_URL/v1/chat/completions"

Performance Results

A recent benchmark run produced the following results:

Summary:
Total: 3.9210 secs
Slowest: 0.1347 secs
Fastest: 0.0004 secs
Average: 0.0078 secs
Requests/sec: 25503.4763

Total data: 4900000 bytes
Size/request: 49 bytes

Latency distribution:
10% in 0.0047 secs
25% in 0.0059 secs
50% in 0.0073 secs
75% in 0.0090 secs
90% in 0.0109 secs
95% in 0.0123 secs
99% in 0.0158 secs

Key Metrics

Metric         Value          Description
Throughput     25,500 req/s   Requests processed per second
Avg Latency    7.8 ms         Average response time
P99 Latency    15.8 ms        99th percentile response time
Success Rate   100%           Percentage of successful requests

Performance Characteristics

The gateway demonstrates:

  • Low Latency: 90% of requests complete under 11ms
  • High Throughput: Exceeds 25k requests per second
  • Stability: Minimal performance variance
  • Reliability: No failed requests during testing

Notes

  • Results were obtained in controlled conditions
  • Actual performance may vary based on:
    • Network conditions
    • Hardware specifications
    • Specific use cases
    • Load patterns

Best Practices

  1. Environment Setup

    • Use production-grade hardware
    • Ensure sufficient network bandwidth
    • Configure appropriate system limits
  2. Test Configuration

    • Start with lower concurrency
    • Gradually increase load (see the ramp-up sketch after this list)
    • Monitor system resources
  3. Results Analysis

    • Focus on P95 and P99 latencies
    • Monitor error rates
    • Check resource utilization
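
As a starting point for the test configuration and results analysis advice above, the loop below steps hey through increasing concurrency levels and saves per-request CSV output for later P95/P99 analysis. It is a minimal sketch: the step values, duration, and output file names are illustrative, and it assumes GATEWAY_URL and API_KEY are exported as shown earlier.

# Ramp-up sketch: increase concurrency step by step and keep per-request
# CSV output (hey's -o csv prints request metrics to stdout) for later analysis
for c in 10 25 50 100; do
  echo "Running 30s at concurrency $c"
  hey -z 30s -c "$c" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -m POST \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hi"}]}' \
  -o csv \
  "$GATEWAY_URL/v1/chat/completions" > "hey_c${c}.csv"
done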