Load Balancing
Load balancing helps distribute AI model requests across multiple instances for better performance and reliability. TrustGate supports multiple load balancing algorithms and health checking capabilities.
Configure Load Balancing
Create an Upstream with Round-Robin Strategy
# Create an upstream with round-robin load balancing
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/upstreams \
-H "Content-Type: application/json" \
-d '{
"name": "round-robin-upstream",
"algorithm": "round-robin",
"targets": [
{
"host": "api.openai.com",
"port": 443,
"protocol": "https",
"weight": 50,
"priority": 1
},
{
"host": "api.anthropic.com",
"port": 443,
"protocol": "https",
"weight": 50,
"priority": 1
}
],
"health_checks": {
"passive": true,
"threshold": 3,
"interval": 60
}
}'
Create an Upstream with Weighted Round-Robin
# Create an upstream with weighted distribution:
# 60% of traffic to api.openai.com, 40% to api.anthropic.com
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/upstreams \
-H "Content-Type: application/json" \
-d '{
"name": "weighted-upstream",
"algorithm": "weighted-round-robin",
"targets": [
{
"host": "api.openai.com",
"port": 443,
"protocol": "https",
"weight": 60,
"priority": 1
},
{
"host": "api.anthropic.com",
"port": 443,
"protocol": "https",
"weight": 40,
"priority": 1
}
],
"health_checks": {
"passive": true,
"threshold": 3,
"interval": 60
}
}'
Create a Service Using the Upstream
# Create a service that uses the load-balanced upstream
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/services \
-H "Content-Type: application/json" \
-d '{
"name": "load-balanced-service",
"type": "upstream",
"description": "Load balanced AI service",
"upstream_id": "{upstream-id}"
}'
Configure Routing Rules
# Create a rule to route traffic to the load-balanced service
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/rules \
-H "Content-Type: application/json" \
-d '{
"path": "/ai",
"service_id": "{service-id}",
"methods": ["POST"],
"strip_path": true,
"active": true
}'
Load Balancing Features
Supported Algorithms
- Round Robin: Distributes requests evenly across all targets
- Weighted Round Robin: Distributes traffic based on target weights
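The two algorithms can be sketched in a few lines of Python. This is only an illustration of the general techniques, not TrustGate's actual implementation: the function names are hypothetical, and the weighted variant uses the "smooth" weighted round-robin scheme as one common way to interleave targets in proportion to their weights.

```python
from collections import Counter
from itertools import cycle, islice


def round_robin(targets):
    """Yield targets in a fixed rotation, one per request."""
    return cycle(targets)


def weighted_round_robin(weights):
    """Yield hosts so that each appears in proportion to its weight.

    Smooth weighted round-robin: every pick adds each host's weight to a
    running score, selects the highest score, then subtracts the total
    weight from the winner. This interleaves hosts instead of sending
    long bursts to the heaviest one.
    """
    total = sum(weights.values())
    scores = {host: 0 for host in weights}
    while True:
        for host, weight in weights.items():
            scores[host] += weight
        chosen = max(scores, key=scores.get)
        scores[chosen] -= total
        yield chosen


# Weights taken from the weighted upstream example (60 / 40).
picks = list(islice(weighted_round_robin(
    {"api.openai.com": 60, "api.anthropic.com": 40}), 10))
counts = Counter(picks)
```

Over ten requests, `counts` records six picks for `api.openai.com` and four for `api.anthropic.com`, matching the 60/40 split.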
Health Checks
TrustGate supports passive health checking:
"health_checks": {
"passive": true, // Enable passive health checks
"threshold": 3, // Number of failures before marking target as unhealthy
"interval": 60 // Check interval in seconds
}
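To make the three settings concrete, here is a minimal Python sketch of how a passive health checker might use them. It rests on two assumptions that the configuration above does not confirm: that `threshold` counts consecutive failures, and that `interval` is how long an unhealthy target waits before it is tried again. The class and method names are hypothetical.

```python
import time


class PassiveHealthTracker:
    """Track target health from the outcome of real requests."""

    def __init__(self, threshold=3, interval=60, clock=time.monotonic):
        self.threshold = threshold    # failures before marking unhealthy
        self.interval = interval      # seconds before retrying (assumed)
        self.clock = clock            # injectable for testing
        self.failures = {}            # host -> consecutive failure count
        self.unhealthy_since = {}     # host -> time it was marked unhealthy

    def record_success(self, host):
        # Any success resets the failure count and restores the target.
        self.failures[host] = 0
        self.unhealthy_since.pop(host, None)

    def record_failure(self, host):
        self.failures[host] = self.failures.get(host, 0) + 1
        if self.failures[host] >= self.threshold:
            self.unhealthy_since.setdefault(host, self.clock())

    def is_healthy(self, host):
        marked = self.unhealthy_since.get(host)
        if marked is None:
            return True
        # Assumption: once `interval` seconds pass, give the target
        # another chance with a clean failure count.
        if self.clock() - marked >= self.interval:
            self.failures[host] = 0
            del self.unhealthy_since[host]
            return True
        return False
```

Passive checking needs no probe traffic: the tracker only learns from requests that clients were sending anyway.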
Target Configuration
Each target can be configured with:
- Weight: Determines traffic distribution (1-100)
- Priority: Determines target selection order (lower numbers = higher priority)
- Protocol: Supports HTTP/HTTPS
- Host/Port: Target service endpoint
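A small client-side check can catch out-of-range values before a target is submitted. The sketch below is a hypothetical helper, not part of TrustGate. It applies only the constraints listed above plus the standard TCP port range, and it skips fields that are absent, since this page does not say which fields are required or what their defaults are.

```python
def validate_target(target):
    """Return a list of problems found in one target dict (empty if none)."""
    problems = []
    if not target.get("host"):
        problems.append("host is required")
    weight = target.get("weight")
    if weight is not None and not 1 <= weight <= 100:
        problems.append("weight must be between 1 and 100")
    protocol = target.get("protocol")
    if protocol is not None and protocol not in ("http", "https"):
        problems.append("protocol must be http or https")
    port = target.get("port")
    if port is not None and not 1 <= port <= 65535:
        problems.append("port must be between 1 and 65535")
    return problems


# A target copied from the weighted upstream example passes cleanly.
ok = validate_target({
    "host": "api.openai.com",
    "port": 443,
    "protocol": "https",
    "weight": 60,
    "priority": 1,
})
```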
Payload Transformation
When an upstream mixes providers, each request must include the fields that every provider needs. The gateway automatically transforms the request for whichever provider it selects.
For example, when load balancing between OpenAI and Anthropic:
{
"model": "gpt-4", // OpenAI model
"messages": [ // OpenAI format
{
"role": "user",
"content": "Hello!"
}
],
"max_tokens": 1000, // Common field
"system": "You are an assistant", // Anthropic system prompt
"temperature": 0.7 // Common field
}
The gateway will:
- Select a target based on the load balancing algorithm
- Transform the request to match the selected provider's format
- Remove unnecessary fields for that provider
- Add any required provider-specific headers
You don't need to handle the transformation yourself - just include all necessary fields in your request, and the gateway will handle the rest based on the provider schemas.
For streaming requests, add "stream": true to enable streaming for all providers.
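The `//` annotations in the example request are explanatory only. JSON has no comment syntax, so the body actually sent must omit them. As a minimal sketch, the same combined request (with streaming enabled) can be built and serialized in Python. How it is sent depends on your gateway URL and credentials, which this section does not specify.

```python
import json

# Combined payload: OpenAI-style fields plus the Anthropic system prompt.
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 1000,
    "system": "You are an assistant",
    "temperature": 0.7,
    "stream": True,
}

# Serialize to a comment-free JSON string suitable for a request body.
body = json.dumps(payload)
```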
Priority-Based Fallback
Targets can be assigned different priorities to create fallback chains:
{
"targets": [
{
"host": "api.openai.com",
"priority": 1, // Primary target
"weight": 100
},
{
"host": "api.anthropic.com",
"priority": 2, // Fallback target
"weight": 100
}
]
}
When a higher priority target fails:
- The request automatically fails over to the next priority level
- Load balancing continues among targets of the same priority
- Health checks determine when to return to higher priority targets
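The fallback behavior described above can be sketched as a selection step that runs before load balancing. This is an illustration under assumptions, not TrustGate's code: `eligible_targets` and the `is_healthy` callback are hypothetical names, and the sketch assumes a priority level is abandoned only when none of its targets is healthy.

```python
def eligible_targets(targets, is_healthy):
    """Return the healthy targets at the best (lowest-numbered) priority."""
    healthy = [t for t in targets if is_healthy(t["host"])]
    if not healthy:
        return []
    best = min(t["priority"] for t in healthy)
    return [t for t in healthy if t["priority"] == best]


targets = [
    {"host": "api.openai.com", "priority": 1, "weight": 100},
    {"host": "api.anthropic.com", "priority": 2, "weight": 100},
]

# While the primary is healthy, only priority 1 receives traffic.
normal = eligible_targets(targets, lambda host: True)

# If the primary is marked unhealthy, traffic falls over to priority 2.
down = {"api.openai.com"}
failover = eligible_targets(targets, lambda host: host not in down)
```

The load balancing algorithm then picks among the returned targets, so traffic returns to priority 1 as soon as health checks report it healthy again.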
Best Practices
- Health Checking
  - Enable passive health checks for automatic failure detection
  - Set appropriate threshold values based on your requirements
- Load Distribution
  - Use weighted distribution for heterogeneous targets
  - Consider target capacity when setting weights
- Monitoring
  - Regularly monitor target health status
  - Review traffic distribution patterns
Next Steps
After configuring load balancing:
- Configure Rate Limiting to protect your services