Weighted round-robin is a variation of the round-robin load-balancing strategy. Instead of cycling evenly through the available targets, each target is assigned a weight that determines its share of the traffic. This lets you direct traffic in proportion to the capacity, performance, or other criteria of each backend service.

  • Core Concept: Requests are distributed in a round-robin fashion, but targets with higher weights receive a proportionally larger share of the requests.
  • Fine-Grained Control: You can increase or decrease a target’s weight to adjust how much traffic it handles, making it an excellent approach for deployments with varying resource capacities.
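
To make the proportional behavior concrete, here is a minimal Python sketch of one common way to implement weighted selection, the "smooth" weighted round-robin variant used by servers such as nginx. It is an illustration only: this page does not say which variant the gateway uses internally, and the names `Target`, `pick`, and `simulate` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    weight: int
    current: int = 0  # running score, adjusted on every selection


def pick(targets: list[Target]) -> Target:
    """Select the next target using smooth weighted round-robin."""
    total = sum(t.weight for t in targets)
    # Every target earns its weight each round...
    for t in targets:
        t.current += t.weight
    # ...the highest score wins and pays back the total weight.
    chosen = max(targets, key=lambda t: t.current)
    chosen.current -= total
    return chosen


def simulate(weights: dict[str, int], requests: int) -> Counter:
    """Count how many of `requests` selections land on each target."""
    targets = [Target(name, weight) for name, weight in weights.items()]
    return Counter(pick(targets).name for _ in range(requests))


if __name__ == "__main__":
    print(simulate({"openai": 60, "anthropic": 40}, 100))
```

With weights of 60 and 40, 100 simulated requests split exactly 60/40, and the picks are interleaved rather than sent in long runs to one target, which is the main reason to prefer the smooth variant over simply repeating each target `weight` times.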

Create an Upstream with Weighted Round-Robin

Below is an example command that creates an Upstream using the weighted-round-robin algorithm. The sample config includes two targets, one for OpenAI and one for Anthropic, with weights of 60 and 40, so roughly 60% of requests go to OpenAI and 40% to Anthropic.

# Create an upstream with weighted distribution
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/upstreams \
  -H "Content-Type: application/json" \
  -d '{
    "name": "weighted-upstream",
    "algorithm": "weighted-round-robin",
    "targets": [
      {
        "host": "api.openai.com",
        "port": 443,
        "protocol": "https",
        "weight": 60,
        "priority": 1,
        "default_model": "gpt-4o-mini",
        "models": ["gpt-3.5-turbo", "gpt-4", "gpt-4o-mini"],
        "credentials": {
          "header_name": "Authorization",
          "header_value": "Bearer your-openai-key"
        }
      },
      {
        "host": "api.anthropic.com",
        "port": 443,
        "protocol": "https",
        "weight": 40,
        "priority": 1,
        "default_model": "claude-3-5-sonnet-20241022",
        "models": ["claude-3-5-sonnet-20241022"],
        "credentials": {
          "header_name": "Authorization",
          "header_value": "Bearer your-anthropic-key"
        }
      }
    ],
    "health_checks": {
      "passive": true,
      "threshold": 3,
      "interval": 60
    }
  }'
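
JSON does not allow comments, so the request body must contain only data. If you prefer to generate the payload programmatically, the following Python sketch builds the same kind of body with `json.dumps`, which always emits valid JSON, and computes the traffic share each weight implies. The helper names `build_upstream_payload` and `expected_shares` are hypothetical, and the sketch only builds the payload; it does not call the gateway.

```python
import json


def expected_shares(targets: list[dict]) -> dict[str, float]:
    """Return each host's expected fraction of traffic: weight / sum of weights."""
    total = sum(t["weight"] for t in targets)
    if total <= 0:
        raise ValueError("the sum of target weights must be positive")
    return {t["host"]: t["weight"] / total for t in targets}


def build_upstream_payload(name: str, targets: list[dict]) -> str:
    """Serialize an upstream definition using the field names from the example above."""
    body = {
        "name": name,
        "algorithm": "weighted-round-robin",
        "targets": targets,
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Abbreviated targets: only the fields needed to illustrate the weights.
    targets = [
        {"host": "api.openai.com", "port": 443, "protocol": "https", "weight": 60},
        {"host": "api.anthropic.com", "port": 443, "protocol": "https", "weight": 40},
    ]
    print(expected_shares(targets))
    print(build_upstream_payload("weighted-upstream", targets))
```

Weights are relative, so 60 and 40 give the same split as 3 and 2. Using values that sum to 100 simply makes the percentages easy to read.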