Prerequisites
- TrustGate installed and running
- API keys for multiple providers (e.g., OpenAI and Anthropic)
- Basic understanding of load balancing concepts
Step 1: Create a Gateway
First, create a gateway that will handle the load balancing:
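A minimal sketch using Python against the admin REST API (the admin URL, endpoint path, and payload/response fields here are assumptions; adjust them to match your TrustGate deployment):

```python
import requests

# Assumed admin API location and endpoint; adjust to your TrustGate deployment.
ADMIN_URL = "http://localhost:8080/api/v1"

gateway = {
    "name": "multi-provider-gateway",   # illustrative name
}

resp = requests.post(f"{ADMIN_URL}/gateways", json=gateway, timeout=10)
resp.raise_for_status()
gateway_id = resp.json()["id"]          # assumed response field; keep it for later steps
print("Created gateway:", gateway_id)
```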
Step 2: Configure Multi-Provider Upstream
Set up an upstream that includes multiple AI providers. This example demonstrates load balancing between OpenAI and Anthropic:
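A sketch of the upstream creation call. The target fields follow the Configuration Parameters list below; the endpoint path, credential structure, and model names are illustrative assumptions:

```python
import requests

ADMIN_URL = "http://localhost:8080/api/v1"      # assumed admin API location
gateway_id = "<gateway-id-from-step-1>"

# Round-robin upstream with weighted OpenAI and Anthropic targets.
upstream = {
    "name": "ai-providers",
    "algorithm": "round-robin",
    "targets": [
        {
            "provider": "openai",
            "weight": 50,
            "path": "/v1/chat/completions",
            # "completions" keeps OpenAI compatible with multi-provider balancing.
            "provider_options": {"api": "completions"},
            "models": ["gpt-4o", "gpt-4o-mini"],
            "default_model": "gpt-4o-mini",
            "credentials": {"header_name": "Authorization",
                            "header_value": "Bearer <OPENAI_API_KEY>"},
        },
        {
            "provider": "anthropic",
            "weight": 50,
            "path": "/v1/messages",
            "models": ["claude-3-5-sonnet-20241022"],
            "default_model": "claude-3-5-sonnet-20241022",
            "headers": {"anthropic-version": "2023-06-01"},
            "credentials": {"header_name": "x-api-key",
                            "header_value": "<ANTHROPIC_API_KEY>"},
        },
    ],
}

resp = requests.post(f"{ADMIN_URL}/gateways/{gateway_id}/upstreams",
                     json=upstream, timeout=10)
resp.raise_for_status()
upstream_id = resp.json()["id"]                  # assumed response field
```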
Provider-specific examples
The following examples show minimal upstream definitions for individual providers; a combined sketch appears after this list.
- OpenAI:
  - The provider_options field lets you specify {"api": "responses"} or {"api": "completions"}.
  - Use api: "completions" when you want to load balance between OpenAI and other providers (Anthropic, Gemini, etc.).
  - Important: api: "responses" is not compatible with multi-provider load balancing; if you configure an upstream with several providers, avoid responses and use completions to keep the targets compatible.
- Anthropic:
- Google Gemini:
- AWS Bedrock (standalone):
- AWS Bedrock (example with Anthropic via Bedrock):
- Azure OpenAI:
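A combined sketch of individual target entries (paths, header names, and model names are illustrative assumptions; AWS Bedrock and Azure OpenAI additionally need region or deployment details that are not shown here):

```python
# Illustrative per-provider target entries; drop the one you need into "targets".

openai_target = {
    "provider": "openai",
    "path": "/v1/chat/completions",
    # "responses" works for a standalone OpenAI upstream, but use "completions"
    # whenever the upstream mixes OpenAI with other providers (see notes above).
    "provider_options": {"api": "completions"},
    "default_model": "gpt-4o-mini",
}

anthropic_target = {
    "provider": "anthropic",
    "path": "/v1/messages",
    "headers": {"anthropic-version": "2023-06-01"},
    "default_model": "claude-3-5-sonnet-20241022",
}

gemini_target = {
    "provider": "gemini",
    "path": "/v1beta/models",                 # assumed path
    "default_model": "gemini-1.5-flash",
}
```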
Configuration Parameters
- algorithm: Load balancing algorithm (e.g., round-robin)
- weight: Relative traffic distribution weight for each target
- path: Provider-specific API endpoint path
- provider: AI provider identifier
- provider_options: Provider-specific options. For OpenAI, {"api": "responses"} or {"api": "completions"}. Important: responses is not compatible with load balancing across other providers; use completions for multi-provider scenarios.
- models: List of supported models for each provider
- default_model: Default model when none is specified
- headers: Provider-specific headers
- credentials: Authentication credentials for each provider
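Putting these parameters together, one fully annotated target might look like the following sketch (all values are illustrative):

```python
# One annotated target, mapping each parameter listed above.
target = {
    "provider": "openai",                        # provider identifier
    "weight": 70,                                # relative share of traffic
    "path": "/v1/chat/completions",              # provider-specific endpoint path
    "provider_options": {"api": "completions"},  # OpenAI API flavor (see note above)
    "models": ["gpt-4o", "gpt-4o-mini"],         # models this target may serve
    "default_model": "gpt-4o-mini",              # used when the request omits a model
    "headers": {},                               # extra provider-specific headers
    "credentials": {"header_name": "Authorization",            # auth for the provider
                    "header_value": "Bearer <OPENAI_API_KEY>"},
}
```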
Step 3: Create a Service
Create a service that uses the multi-provider upstream:
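A sketch of the service call, assuming the same admin API conventions as the earlier steps (endpoint path and payload fields are assumptions):

```python
import requests

ADMIN_URL = "http://localhost:8080/api/v1"   # assumed admin API location
gateway_id = "<gateway-id>"
upstream_id = "<upstream-id-from-step-2>"

service = {
    "name": "ai-chat-service",
    "type": "upstream",                       # assumed: a service backed by an upstream
    "upstream_id": upstream_id,
}

resp = requests.post(f"{ADMIN_URL}/gateways/{gateway_id}/services",
                     json=service, timeout=10)
resp.raise_for_status()
service_id = resp.json()["id"]                # assumed response field
```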
Step 4: Add a Rule
Configure a rule to route requests to your service:
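A sketch of the rule call (endpoint path and payload fields are assumptions):

```python
import requests

ADMIN_URL = "http://localhost:8080/api/v1"   # assumed admin API location
gateway_id = "<gateway-id>"
service_id = "<service-id-from-step-3>"

# Route POST requests on /v1/chat to the multi-provider service.
rule = {
    "path": "/v1/chat",
    "methods": ["POST"],
    "service_id": service_id,
}

resp = requests.post(f"{ADMIN_URL}/gateways/{gateway_id}/rules",
                     json=rule, timeout=10)
resp.raise_for_status()
```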
Step 5: Generate an API Key
Create an API key for authentication:
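A sketch of the key-generation call (endpoint path and response field are assumptions):

```python
import requests

ADMIN_URL = "http://localhost:8080/api/v1"   # assumed admin API location
gateway_id = "<gateway-id>"

resp = requests.post(f"{ADMIN_URL}/gateways/{gateway_id}/keys",
                     json={"name": "load-balancing-demo"}, timeout=10)
resp.raise_for_status()
api_key = resp.json()["key"]                  # assumed response field; send it with each request
print("API key:", api_key)
```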
Using the Load Balanced API
When making requests to your load-balanced API, TrustGate automatically handles provider selection and request transformation. A single request can carry fields from both provider formats (see the example after this list):
- model and messages: OpenAI format
- system: Anthropic system prompt
- max_tokens: Common field for both providers
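For example, one request can mix OpenAI-style and Anthropic-style fields (the proxy address and API-key header name are assumptions):

```python
import requests

PROXY_URL = "http://localhost:8081"           # assumed TrustGate proxy address
API_KEY = "<api-key-from-step-5>"

payload = {
    "model": "gpt-4o-mini",                                   # OpenAI-style model field
    "messages": [{"role": "user",
                  "content": "Summarize load balancing in one sentence."}],
    "system": "You are a concise assistant.",                 # Anthropic-style system prompt
    "max_tokens": 200,                                        # understood by both providers
}

resp = requests.post(f"{PROXY_URL}/v1/chat", json=payload,
                     headers={"X-API-Key": API_KEY},          # assumed auth header name
                     timeout=30)
resp.raise_for_status()
print(resp.json())
```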
For each request, TrustGate will:
- Select a target based on the load balancing algorithm
- Transform the request to match the selected provider’s format
- Remove unnecessary fields for that provider
- Add any required provider-specific headers
- Use the default model for the selected provider if different from the request
"stream": true
to enable streaming for all providers.
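A streaming variant of the request (a sketch; the proxy address and header name remain assumptions):

```python
import requests

PROXY_URL = "http://localhost:8081"           # assumed TrustGate proxy address
API_KEY = "<your-trustgate-api-key>"

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Write a haiku about gateways."}],
    "max_tokens": 100,
    "stream": True,                           # enable streaming for all providers
}

with requests.post(f"{PROXY_URL}/v1/chat", json=payload,
                   headers={"X-API-Key": API_KEY},
                   stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))       # raw streamed chunks from the selected provider
```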
Response Headers
The API returns headers indicating which provider was selected:
- X-Selected-Provider: The provider that handled the request
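For example, you can log which provider served each call:

```python
import requests

PROXY_URL = "http://localhost:8081"           # assumed TrustGate proxy address
API_KEY = "<your-trustgate-api-key>"

resp = requests.post(f"{PROXY_URL}/v1/chat",
                     json={"model": "gpt-4o-mini",
                           "messages": [{"role": "user", "content": "ping"}],
                           "max_tokens": 10},
                     headers={"X-API-Key": API_KEY}, timeout=30)
# The header below reports which upstream target served this request.
print("Handled by:", resp.headers.get("X-Selected-Provider"))
```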
Load Balancing Features
- Weighted Distribution
  - Configure traffic distribution using weights
  - Adjust weights based on provider costs or performance
- Failover Support
  - Set priority levels for providers
  - Automatic failover when the primary provider fails
  - Health checks for provider availability
- Request Transformation
  - Automatic conversion between provider formats
  - Model name mapping
  - Request/response adaptation
- Health Monitoring
  - Passive health checks
  - Configurable failure thresholds
  - Automatic provider recovery
Best Practices
- Provider Selection
  - Choose complementary providers
  - Consider provider strengths and pricing
  - Match models across providers
- Load Distribution
  - Balance cost vs. performance
  - Monitor provider quotas
  - Adjust weights based on usage patterns
- Error Handling
  - Implement proper fallback logic (see the client-side sketch after this list)
  - Monitor provider errors
  - Set appropriate timeouts
- Request Design
  - Use a provider-agnostic request format
  - Include all required fields for both providers
  - Handle provider-specific features gracefully
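On the client side, a small timeout-and-retry wrapper covers the error-handling points above (a sketch; the retry counts, backoff, and endpoint details are illustrative):

```python
import time
import requests

PROXY_URL = "http://localhost:8081"           # assumed TrustGate proxy address
API_KEY = "<your-trustgate-api-key>"

def chat_with_retries(payload, attempts=3, timeout=30):
    """Call the load-balanced endpoint, retrying transient failures with backoff."""
    for attempt in range(attempts):
        try:
            resp = requests.post(f"{PROXY_URL}/v1/chat", json=payload,
                                 headers={"X-API-Key": API_KEY}, timeout=timeout)
            if resp.status_code < 500:
                return resp                   # success, or a client error worth surfacing
        except requests.RequestException as exc:
            print(f"attempt {attempt + 1} failed: {exc}")
        time.sleep(2 ** attempt)              # simple exponential backoff
    raise RuntimeError("request failed after retries")
```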
Troubleshooting
If you encounter issues:
- Check provider selection headers
- Verify provider health status
- Review load balancing configuration
- Monitor provider error responses
Next Steps
- Set up monitoring for provider metrics
- Configure provider-specific rate limits
- Implement cost optimization strategies
- Add more providers for redundancy