Load Balancing

Load balancing distributes AI model requests across multiple targets to improve performance and reliability. TrustGate supports round-robin and weighted round-robin algorithms, passive health checks, and priority-based fallback.

Configure Load Balancing

Create an Upstream with Round-Robin Strategy

# Create an upstream with round-robin load balancing
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/upstreams \
  -H "Content-Type: application/json" \
  -d '{
    "name": "round-robin-upstream",
    "algorithm": "round-robin",
    "targets": [
      {
        "host": "api.openai.com",
        "port": 443,
        "protocol": "https",
        "weight": 50,
        "priority": 1
      },
      {
        "host": "api.anthropic.com",
        "port": 443,
        "protocol": "https",
        "weight": 50,
        "priority": 1
      }
    ],
    "health_checks": {
      "passive": true,
      "threshold": 3,
      "interval": 60
    }
  }'

Create an Upstream with Weighted Round-Robin

# Create an upstream with weighted distribution:
# 60% of traffic goes to api.openai.com and 40% to api.anthropic.com
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/upstreams \
  -H "Content-Type: application/json" \
  -d '{
    "name": "weighted-upstream",
    "algorithm": "weighted-round-robin",
    "targets": [
      {
        "host": "api.openai.com",
        "port": 443,
        "protocol": "https",
        "weight": 60,
        "priority": 1
      },
      {
        "host": "api.anthropic.com",
        "port": 443,
        "protocol": "https",
        "weight": 40,
        "priority": 1
      }
    ],
    "health_checks": {
      "passive": true,
      "threshold": 3,
      "interval": 60
    }
  }'

Create a Service Using the Upstream

# Create a service that uses the load-balanced upstream
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/services \
  -H "Content-Type: application/json" \
  -d '{
    "name": "load-balanced-service",
    "type": "upstream",
    "description": "Load balanced AI service",
    "upstream_id": "{upstream-id}"
  }'

Configure Routing Rules

# Create a rule to route traffic to the load-balanced service
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/rules \
  -H "Content-Type: application/json" \
  -d '{
    "path": "/ai",
    "service_id": "{service-id}",
    "methods": ["POST"],
    "strip_path": true,
    "active": true
  }'
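
The calls above depend on each other: the service needs the upstream's ID, and the rule needs the service's ID. As a rough sketch of how the same requests could be built from a script, the Python helper below constructs each POST without sending it. The endpoints and payload fields come from the curl examples; the `build_post` and `send` helpers are hypothetical, and the assumption that each response body carries an `id` field is not confirmed here, so check the actual response before relying on it.

```python
import json
import urllib.request

# Base URL taken from the curl examples above.
BASE = "http://localhost:8080/api/v1"


def build_post(gateway_id: str, resource: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for a gateway sub-resource."""
    return urllib.request.Request(
        url=f"{BASE}/gateways/{gateway_id}/{resource}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Intended order of calls (left as comments so nothing is sent on import).
# The "id" key in each response is an assumption, not documented above.
#
#   upstream = send(build_post(gw, "upstreams", {...}))
#   service = send(build_post(gw, "services", {
#       "name": "load-balanced-service",
#       "type": "upstream",
#       "upstream_id": upstream["id"],
#   }))
#   rule = send(build_post(gw, "rules", {
#       "path": "/ai",
#       "service_id": service["id"],
#       "methods": ["POST"],
#       "strip_path": True,
#       "active": True,
#   }))
```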

Load Balancing Features

Supported Algorithms

- Round Robin: Distributes requests evenly across all targets
- Weighted Round Robin: Distributes traffic based on target weights
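
To make the difference between the two algorithms concrete, here is a minimal sketch of one common weighted scheme, "smooth" weighted round-robin. It is not taken from TrustGate's implementation, which this page does not describe; the `Target` class and `pick` function are illustrative names only. With equal weights the same code degenerates to plain round-robin.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    host: str
    weight: int
    current: int = 0  # running score used by the smooth algorithm


def pick(targets: List[Target]) -> Target:
    """Return the next target using smooth weighted round-robin."""
    total = sum(t.weight for t in targets)
    for t in targets:
        t.current += t.weight  # every target gains its own weight
    chosen = max(targets, key=lambda t: t.current)
    chosen.current -= total  # the winner is pushed back by the total weight
    return chosen


# Weights mirror the weighted-upstream example above.
targets = [Target("api.openai.com", 60), Target("api.anthropic.com", 40)]
counts = {t.host: 0 for t in targets}
for _ in range(100):
    counts[pick(targets).host] += 1
print(counts)
```

Over 100 picks the counts match the 60/40 weights, and the picks are interleaved rather than sent in long runs to one target.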

Health Checks

TrustGate supports passive health checking:

"health_checks": {
  "passive": true,      // Enable passive health checks
  "threshold": 3,       // Number of failures before marking target as unhealthy
  "interval": 60        // Check interval in seconds
}
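
As an illustration of how passive checking can work, the sketch below counts failures observed on real traffic and marks a target unhealthy once `threshold` is reached. How TrustGate uses `interval` internally is not specified here; the sketch assumes it is a cool-down after which an unhealthy target becomes eligible for traffic again. The `PassiveHealth` class is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PassiveHealth:
    threshold: int = 3      # failures before the target is marked unhealthy
    interval: float = 60.0  # assumed meaning: cool-down in seconds before retrying
    failures: int = 0
    unhealthy_since: Optional[float] = None

    def record(self, ok: bool, now: float) -> None:
        """Update state from the outcome of one proxied request."""
        if ok:
            self.failures = 0
            self.unhealthy_since = None
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.unhealthy_since = now  # start or restart the cool-down

    def is_healthy(self, now: float) -> bool:
        """True if the target may receive traffic at time `now`."""
        if self.unhealthy_since is None:
            return True
        return now - self.unhealthy_since >= self.interval


health = PassiveHealth(threshold=3, interval=60)
for t in (0.0, 1.0, 2.0):
    health.record(ok=False, now=t)  # three failures in a row
print(health.is_healthy(now=3.0))   # still inside the cool-down window
print(health.is_healthy(now=62.0))  # cool-down elapsed, eligible again
```

Timestamps are passed in explicitly so the behaviour is easy to reason about and test; a real caller would pass `time.monotonic()`.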

Target Configuration

Each target can be configured with:

- Weight: Determines traffic distribution (1-100)
- Priority: Determines target selection order (lower numbers = higher priority)
- Protocol: Supports HTTP/HTTPS
- Host/Port: Target service endpoint
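
Checking a target definition against these constraints before sending it can catch mistakes early. The sketch below is a hypothetical client-side validator, not part of TrustGate; it treats all five fields as required, which is an assumption, and the TCP port range check is standard rather than specific to the gateway.

```python
from typing import List


def validate_target(target: dict) -> List[str]:
    """Return a list of problems found in one target definition."""
    errors = []
    if not target.get("host"):
        errors.append("host is required")
    port = target.get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        errors.append("port must be an integer in 1-65535")
    if target.get("protocol") not in ("http", "https"):
        errors.append("protocol must be http or https")
    weight = target.get("weight")
    if not isinstance(weight, int) or not 1 <= weight <= 100:
        errors.append("weight must be an integer in 1-100")
    if not isinstance(target.get("priority"), int):
        errors.append("priority must be an integer")
    return errors


good = {"host": "api.openai.com", "port": 443, "protocol": "https",
        "weight": 60, "priority": 1}
print(validate_target(good))                     # no problems found
print(validate_target({**good, "weight": 150}))  # weight outside 1-100
```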

Payload Transformation

When an upstream contains targets from multiple providers, send a single request that includes the fields each provider needs. The gateway automatically transforms the request for whichever provider is selected.

For example, when load balancing between OpenAI and Anthropic:

{
  "model": "gpt-4",                    // OpenAI model
  "messages": [                        // OpenAI format
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "max_tokens": 1000,                  // Common field
  "system": "You are an assistant",    // Anthropic system prompt
  "temperature": 0.7                   // Common field
}

The gateway will:

- Select a target based on the load balancing algorithm
- Transform the request to match the selected provider's format
- Remove unnecessary fields for that provider
- Add any required provider-specific headers

You don't need to handle the transformation yourself. Include all necessary fields in your request, and the gateway handles the rest based on the provider schemas.

For streaming requests, add "stream": true to enable streaming for all providers.
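
The field-removal step can be pictured as filtering the combined payload against a per-provider allow-list. The sketch below is only an illustration under that assumption: the `PROVIDER_FIELDS` table and `transform` function are hypothetical, the gateway's real provider schemas are not shown on this page, and other steps such as header injection or any model-name handling are out of scope.

```python
# Hypothetical allow-lists; a real gateway would derive these from provider schemas.
PROVIDER_FIELDS = {
    "openai": {"model", "messages", "max_tokens", "temperature", "stream"},
    "anthropic": {"model", "messages", "max_tokens", "temperature", "system", "stream"},
}


def transform(payload: dict, provider: str) -> dict:
    """Keep only the fields the selected provider accepts."""
    allowed = PROVIDER_FIELDS[provider]
    return {key: value for key, value in payload.items() if key in allowed}


# The combined request from the example above.
request = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 1000,
    "system": "You are an assistant",
    "temperature": 0.7,
}

print(sorted(transform(request, "openai")))     # "system" is dropped
print(sorted(transform(request, "anthropic")))  # "system" is kept
```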

Priority-Based Fallback

Targets can be assigned different priorities to create fallback chains:

{
  "targets": [
    {
      "host": "api.openai.com",
      "priority": 1,     // Primary target
      "weight": 100
    },
    {
      "host": "api.anthropic.com",
      "priority": 2,     // Fallback target
      "weight": 100
    }
  ]
}

When a higher priority target fails:

- The request automatically fails over to the next priority level
- Load balancing continues among targets of the same priority
- Health checks determine when to return to higher priority targets
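
One way to read the fallback behaviour above: at any moment, traffic is balanced across the healthy targets that share the best (lowest) priority number. The sketch below expresses that rule; the `eligible` function and the `unhealthy` set are hypothetical names, and the gateway's actual selection logic may differ.

```python
from typing import List, Set


def eligible(targets: List[dict], unhealthy: Set[str]) -> List[dict]:
    """Return the healthy targets at the best available priority level."""
    healthy = [t for t in targets if t["host"] not in unhealthy]
    if not healthy:
        return []
    best = min(t["priority"] for t in healthy)  # lower number = higher priority
    return [t for t in healthy if t["priority"] == best]


# Targets mirror the fallback example above.
targets = [
    {"host": "api.openai.com", "priority": 1, "weight": 100},
    {"host": "api.anthropic.com", "priority": 2, "weight": 100},
]

print([t["host"] for t in eligible(targets, unhealthy=set())])
print([t["host"] for t in eligible(targets, unhealthy={"api.openai.com"})])
```

With no failures only the primary is eligible; once it is marked unhealthy the fallback takes over, and the primary returns as soon as it leaves the unhealthy set.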

Best Practices

Health Checking
- Enable passive health checks for automatic failure detection
- Set appropriate threshold values based on your requirements

Load Distribution
- Use weighted distribution for heterogeneous targets
- Consider target capacity when setting weights

Monitoring
- Regularly monitor target health status
- Review traffic distribution patterns

Next Steps

After configuring load balancing:

Configure Rate Limiting to protect your services
