This guide demonstrates how to set up load balancing between multiple AI providers in TrustGate. By distributing traffic across different providers, you can improve reliability, optimize costs, and ensure high availability for your AI applications.
Prerequisites
- TrustGate installed and running
- API keys for multiple providers (e.g., OpenAI and Anthropic)
- Basic understanding of load balancing concepts
Step 1: Create a Gateway
First, create a gateway that will handle the load balancing:
curl -X POST "http://localhost:8080/api/v1/gateways" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-d '{
"name": "multi-provider-gateway"
}'
Step 2: Configure the Upstream
Set up an upstream that includes multiple AI providers. This example load balances between OpenAI and Anthropic:
curl -X POST "http://localhost:8080/api/v1/gateways/{gateway_id}/upstreams" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-d '{
"name": "ai-providers-upstream",
"algorithm": "round-robin",
"targets": [
{
"path": "/v1/chat/completions",
"provider": "openai",
"weight": 50,
"default_model": "gpt-4o-mini",
"models": ["gpt-3.5-turbo", "gpt-4", "gpt-4o-mini"],
"credentials": {
"api_key": "your-openai-key"
}
},
{
"path": "/v1/messages",
"provider": "anthropic",
"weight": 50,
"default_model": "claude-3-sonnet",
"models": ["claude-3-sonnet"],
"headers": {
"anthropic-version": "2023-06-01"
},
"credentials": {
"api_key": "your-anthropic-key"
}
}
],
"health_checks": {
"passive": true,
"threshold": 3,
"interval": 60
}
}'
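To see how the weights in this upstream translate into traffic, the sketch below computes each provider's expected share. It assumes weights are relative and normalized by their sum; the helper name `expected_shares` is hypothetical and not part of TrustGate.

```python
from collections import defaultdict


def expected_shares(targets):
    """Return each provider's expected fraction of traffic.

    Assumption: weights are relative, so a target's share is
    weight / sum(all weights). Targets of the same provider are summed.
    """
    total = sum(t["weight"] for t in targets)
    if total <= 0:
        raise ValueError("at least one target needs a positive weight")
    shares = defaultdict(float)
    for t in targets:
        shares[t["provider"]] += t["weight"] / total
    return dict(shares)


targets = [
    {"provider": "openai", "weight": 50},
    {"provider": "anthropic", "weight": 50},
]
print(expected_shares(targets))
```

Under the same assumption, weights of 75 and 25 would shift the expected split to three quarters and one quarter.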
Provider-specific examples
The following examples show minimal upstream definitions for individual providers.
- OpenAI:
{
"name": "{{upstream_service_name}}",
"algorithm": "round-robin",
"targets": [
{
"provider": "openai",
"provider_options": {"api": "responses"},
"weight": 50,
"priority": 1,
"default_model": "gpt-4o-mini",
"models": ["gpt-3.5-turbo", "gpt-4", "gpt-4o-mini"],
"stream": false,
"credentials": {
"api_key": "sk-proj-"
}
}
],
"health_checks": {
"passive": true,
"threshold": 3,
"interval": 60
}
}
Note on OpenAI provider_options:
- The provider_options field accepts {"api": "responses"} or {"api": "completions"}.
- Use api: "completions" when you want to load balance between OpenAI and other providers (Anthropic, Gemini, etc.).
- Important: api: "responses" is not compatible with multi-provider load balancing. If you configure an upstream with several providers, avoid responses and use completions to maintain compatibility.
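The compatibility rule in this note can be checked before an upstream is submitted. The following is a minimal sketch of such a pre-flight check; `check_provider_options` is a hypothetical helper written for illustration, not a TrustGate API.

```python
def check_provider_options(upstream):
    """Return a list of problems with OpenAI provider_options in an upstream.

    Encodes the rule from the note: api "responses" must not be combined
    with targets from other providers.
    """
    targets = upstream.get("targets", [])
    providers = {t.get("provider") for t in targets}
    problems = []
    for t in targets:
        if t.get("provider") != "openai":
            continue
        api = (t.get("provider_options") or {}).get("api")
        if api is not None and api not in ("responses", "completions"):
            problems.append(f"unknown api value: {api!r}")
        if api == "responses" and len(providers) > 1:
            problems.append(
                'api "responses" cannot be load balanced with other '
                'providers; use "completions"'
            )
    return problems


mixed = {
    "targets": [
        {"provider": "openai", "provider_options": {"api": "responses"}},
        {"provider": "anthropic"},
    ]
}
print(check_provider_options(mixed))
```

An upstream with a single OpenAI target passes the check with either api value.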
- Anthropic:
{
"name": "{{upstream_service_name}}",
"algorithm": "round-robin",
"targets": [
{
"provider": "anthropic",
"weight": 50,
"default_model": "claude-3-5-sonnet-20241022",
"models": ["claude-3-5-sonnet-20241022"],
"stream": false,
"headers": {
"anthropic-version": "2023-06-01"
},
"credentials": {
"api_key": ""
}
}
],
"health_checks": {
"passive": true,
"threshold": 3,
"interval": 60
}
}
- Gemini:
{
"name": "{{upstream_service_name}}",
"algorithm": "round-robin",
"targets": [
{
"provider": "gemini",
"weight": 50,
"default_model": "gemini-2.0-flash-001",
"models": ["gemini-2.0-flash-001"],
"stream": false,
"credentials": {
"api_key": ""
}
}
],
"health_checks": {
"passive": true,
"threshold": 3,
"interval": 60
}
}
- AWS Bedrock (standalone):
{
"name": "{{upstream_service_name}}",
"algorithm": "round-robin",
"targets": [
{
"provider": "bedrock",
"weight": 50,
"default_model": "eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
"models": [
"eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
"amazon.titan-text-express-v1",
"eu.meta.llama3-2-1b-instruct-v1:0",
"eu.mistral.pixtral-large-2502-v1:0",
"deepseek-llm-r1-distill-llama-8b"
],
"stream": true,
"headers": {
"anthropic-version": "2023-06-01"
},
"credentials": {
"aws_access_key_id": "",
"aws_secret_access_key": "",
"aws_session_token": "",
"aws_region": "eu-west-1"
}
}
],
"health_checks": {
"passive": true,
"threshold": 3,
"interval": 60
}
}
- AWS Bedrock (example with Anthropic via Bedrock):
{
"name": "{{upstream_service_name}}",
"algorithm": "round-robin",
"targets": [
{
"provider": "bedrock",
"weight": 50,
"default_model": "eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
"models": [
"eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
"amazon.titan-text-express-v1",
"eu.meta.llama3-2-1b-instruct-v1:0",
"eu.mistral.pixtral-large-2502-v1:0",
"deepseek-llm-r1-distill-llama-8b"
],
"stream": true,
"headers": {
"anthropic-version": "2023-06-01"
},
"credentials": {
"aws_access_key_id": "A....",
"aws_secret_access_key": "u1H49D0GfnbtDhYF...",
"aws_session_token": "I...",
"aws_region": "eu-west-1"
}
}
],
"health_checks": {
"passive": true,
"threshold": 3,
"interval": 60
}
}
- Azure OpenAI:
{
"name": "{{upstream_service_name}}",
"algorithm": "round-robin",
"targets": [
{
"provider": "azure",
"weight": 50,
"default_model": "gpt-4o-mini",
"models": ["gpt-4o-mini"],
"credentials": {
"api_key": "7LQ...",
"azure_endpoint": "https://example2.cognitiveservices.azure.com",
"azure_version": "",
"azure_use_managed_identity": false
}
}
],
"health_checks": {
"passive": true,
"threshold": 3,
"interval": 60
}
}
Configuration Parameters
- algorithm: Load balancing algorithm (e.g., round-robin)
- weight: Relative traffic distribution weight for each target
- path: Provider-specific API endpoint path
- provider: AI provider identifier
- provider_options: Provider-specific options. For OpenAI, {"api": "responses"} or {"api": "completions"}. Important: responses is not compatible with load balancing across other providers; use completions for multi-provider scenarios.
- models: List of supported models for each provider
- default_model: Default model when none is specified
- headers: Provider-specific headers
- credentials: Authentication credentials for each provider
Step 3: Create a Service
Create a service that uses the multi-provider upstream:
curl -X POST "http://localhost:8080/api/v1/gateways/{gateway_id}/services" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-d '{
"name": "ai-chat-service",
"description": "Load balanced AI chat completion service",
"upstream_id": "{upstream_id}",
"type": "upstream",
"tags": ["ai", "chat"]
}'
Step 4: Add a Rule
Configure a rule to route requests to your service:
curl -X POST "http://localhost:8080/api/v1/gateways/{gateway_id}/rules" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-d '{
"path": "/v1",
"service_id": "{service_id}",
"methods": ["POST"],
"strip_path": false,
"active": true
}'
Step 5: Generate an API Key
Create an API key for authentication:
curl -X POST "http://localhost:8080/api/v1/gateways/{gateway_id}/keys" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-d '{
"name": "test-key",
"expires_at": "2026-01-01T00:00:00Z"
}'
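Steps 1 through 5 each need an ID produced by an earlier step. The sketch below chains the five calls in Python. It rests on one assumption that this guide does not confirm: every create response is JSON carrying the new resource's ID in an `id` field. `http_post` and `provision` are hypothetical helper names.

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:8080/api/v1"


def http_post(path, body, token):
    """POST JSON to the admin API and return the decoded JSON response."""
    req = urllib.request.Request(
        ADMIN_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def provision(post, upstream):
    """Run Steps 1-5 in order, threading each new ID into the next call.

    `post(path, body)` performs the request and returns the response dict.
    Assumption: every create response carries the new resource's "id".
    """
    gateway_id = post("/gateways", {"name": "multi-provider-gateway"})["id"]
    base = f"/gateways/{gateway_id}"
    upstream_id = post(f"{base}/upstreams", upstream)["id"]
    service_id = post(f"{base}/services", {
        "name": "ai-chat-service",
        "upstream_id": upstream_id,
        "type": "upstream",
    })["id"]
    post(f"{base}/rules", {
        "path": "/v1",
        "service_id": service_id,
        "methods": ["POST"],
        "strip_path": False,
        "active": True,
    })
    # The final response is the API key resource from Step 5.
    return post(f"{base}/keys", {
        "name": "test-key",
        "expires_at": "2026-01-01T00:00:00Z",
    })
```

Against a running instance, this could be invoked as `provision(lambda path, body: http_post(path, body, token), upstream)`, where `upstream` is the Step 2 payload as a Python dict. Injecting `post` keeps the sequencing testable without a server.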
Using the Load Balanced API
When making requests to your load-balanced API, TrustGate automatically handles provider selection and request transformation:
curl -X POST "http://localhost:8081/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "X-TG-API-Key: your-api-key" \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "You are an assistant"
},
{
"role": "user",
"content": "Hello, how are you?"
}
],
"max_tokens": 1020,
"system": "You are an assistant",
"stream": true
}'
When using multiple providers in an upstream, you need to include fields that cover all providers in your request. The gateway will automatically transform the request for the selected provider.
For example, when load balancing between OpenAI and Anthropic:
{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"max_tokens": 1020,
"system": "You are an assistant"
}
The fields in this request serve different purposes:
- model and messages: OpenAI format
- system: Anthropic system prompt
- max_tokens: Common field for both providers
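A client can build this combined body with a small helper so that every request carries the fields both providers need. This is a minimal sketch; `build_chat_request` and its defaults are illustrative choices, not part of TrustGate.

```python
import json


def build_chat_request(user_text, system_prompt, model="gpt-4",
                       max_tokens=1020, stream=False):
    """Build one body that carries the fields both providers need."""
    body = {
        "model": model,                  # OpenAI-format model name
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,        # shared by both providers
        "system": system_prompt,         # Anthropic system prompt
    }
    if stream:
        body["stream"] = True
    return body


print(json.dumps(build_chat_request("Hello!", "You are an assistant"), indent=2))
```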
The gateway will:
- Select a target based on the load balancing algorithm
- Transform the request to match the selected provider’s format
- Remove unnecessary fields for that provider
- Add any required provider-specific headers
- Use the selected provider's default_model when the model named in the request does not match that provider
You don’t need to handle the transformation yourself - just include all necessary fields in your request, and the gateway will handle the rest based on the provider schemas.
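To make the transformation steps concrete, the sketch below reshapes the combined request for one target. It is an illustration only and not TrustGate's implementation: `adapt_request` is a hypothetical function, and the rule of falling back to `default_model` when the requested model is missing from the target's `models` list is an assumption.

```python
def adapt_request(body, target):
    """Reshape a combined request for one target (illustration only)."""
    # Assumed rule: keep the requested model only if the target lists it.
    model = body.get("model")
    if model not in target.get("models", []):
        model = target["default_model"]

    out = {"model": model, "max_tokens": body["max_tokens"]}
    if "stream" in body:
        out["stream"] = body["stream"]

    # Take the system prompt from the top-level field, else from a system message.
    system = body.get("system")
    if system is None:
        system = next(
            (m["content"] for m in body["messages"] if m["role"] == "system"),
            None,
        )
    chat = [m for m in body["messages"] if m["role"] != "system"]

    provider = target["provider"]
    if provider == "anthropic":
        # Anthropic's Messages API takes the system prompt as a top-level field.
        out["messages"] = chat
        if system:
            out["system"] = system
    elif provider == "openai":
        # OpenAI chat completions takes it as a leading system message.
        prefix = [{"role": "system", "content": system}] if system else []
        out["messages"] = prefix + chat
    else:
        raise ValueError(f"unsupported provider in this sketch: {provider}")
    return out
```

With the example request above, an Anthropic target would receive `claude-3-sonnet` plus a top-level `system` field, while an OpenAI target would keep `gpt-4` and receive the system prompt as the first message.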
For streaming requests, add "stream": true to enable streaming for all providers.
The API returns a response header indicating which provider was selected:
- X-Selected-Provider: The provider that handled the request
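Collecting this header across several responses is a quick way to confirm that traffic is actually being distributed. The sketch below tallies it from a list of response-header mappings; the helper names are hypothetical, and the sample data is made up for illustration.

```python
from collections import Counter


def selected_provider(headers):
    """Return the X-Selected-Provider value, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-selected-provider":
            return value
    return None


def tally_providers(responses_headers):
    """Count how many responses each provider handled."""
    return Counter(selected_provider(h) or "unknown" for h in responses_headers)


sample = [
    {"X-Selected-Provider": "openai"},
    {"x-selected-provider": "anthropic"},
    {"X-Selected-Provider": "openai"},
]
print(tally_providers(sample))
```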
Load Balancing Features
- Weighted Distribution
  - Configure traffic distribution using weights
  - Adjust weights based on provider costs or performance
- Failover Support
  - Set priority levels for providers
  - Automatic failover when primary provider fails
  - Health checks for provider availability
- Request Transformation
  - Automatic conversion between provider formats
  - Model name mapping
  - Request/response adaptation
- Health Monitoring
  - Passive health checks
  - Configurable failure thresholds
  - Automatic provider recovery
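The interplay of passive health checks and priority-based failover can be sketched as follows. The semantics here are assumptions, since this guide does not define them: a target is marked unhealthy after `threshold` consecutive failures, becomes eligible again after `interval` seconds, and a lower `priority` number is preferred. `PassiveHealth` and `pick_target` are hypothetical names.

```python
import time


class PassiveHealth:
    """Track consecutive failures per target (assumed semantics)."""

    def __init__(self, threshold=3, interval=60, clock=time.monotonic):
        self.threshold = threshold
        self.interval = interval
        self.clock = clock
        self.failures = {}      # target name -> consecutive failure count
        self.down_since = {}    # target name -> time it was marked unhealthy

    def record(self, name, ok):
        """Record the outcome of one real request to a target."""
        if ok:
            self.failures[name] = 0
            self.down_since.pop(name, None)
            return
        self.failures[name] = self.failures.get(name, 0) + 1
        if self.failures[name] >= self.threshold and name not in self.down_since:
            self.down_since[name] = self.clock()

    def is_healthy(self, name):
        since = self.down_since.get(name)
        if since is None:
            return True
        if self.clock() - since >= self.interval:
            # Recovery window elapsed: give the target another chance.
            self.failures[name] = 0
            del self.down_since[name]
            return True
        return False


def pick_target(targets, health):
    """Return the healthy target with the lowest priority number, or None."""
    healthy = [t for t in targets if health.is_healthy(t["provider"])]
    return min(healthy, key=lambda t: t.get("priority", 0), default=None)
```

Passing the clock in as a parameter keeps the recovery logic testable without waiting for real time to pass.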
Best Practices
- Provider Selection
  - Choose complementary providers
  - Consider provider strengths and pricing
  - Match models across providers
- Load Distribution
  - Balance cost vs. performance
  - Monitor provider quotas
  - Adjust weights based on usage patterns
- Error Handling
  - Implement proper fallback logic
  - Monitor provider errors
  - Set appropriate timeouts
- Request Design
  - Use provider-agnostic request format
  - Include all required fields for both providers
  - Handle provider-specific features gracefully
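As a client-side complement to the error-handling points above, the sketch below retries a failed call with exponential backoff. It is a generic pattern rather than anything TrustGate provides; `call_with_retry` and its default values are illustrative, and `send` stands for whatever function issues your request with a timeout set.

```python
import time


def call_with_retry(send, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds or attempts run out.

    `send` is any zero-argument callable that raises on failure.
    Waits base_delay, 2 * base_delay, ... between attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except Exception as exc:  # narrow this to your HTTP client's errors
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Keeping `attempts` small avoids multiplying load on a provider that is already struggling.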
Troubleshooting
If you encounter issues:
- Check provider selection headers
- Verify provider health status
- Review load balancing configuration
- Monitor provider error responses
Next Steps
- Set up monitoring for provider metrics
- Configure provider-specific rate limits
- Implement cost optimization strategies
- Add more providers for redundancy