NeuralTrust | The leading security platform for generative AI

Load Balancing Between AI Providers

This guide demonstrates how to set up load balancing between multiple AI providers in TrustGate. By distributing traffic across different providers, you can improve reliability, optimize costs, and ensure high availability for your AI applications.

Prerequisites

TrustGate installed and running
API keys for multiple providers (e.g., OpenAI and Anthropic)
Basic understanding of load balancing concepts

Step 1: Create a Gateway

First, create a gateway that will handle the load balancing. Use the ID returned in the response as {gateway_id} in the steps that follow:

curl -X POST "http://localhost:8080/api/v1/gateways" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -d '{
    "name": "multi-provider-gateway"
  }'
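If you script these steps instead of typing curl commands, building the body with a JSON serializer rules out syntax mistakes such as trailing commas. Below is a minimal Python sketch using only the standard library. The `build_admin_request` helper is hypothetical, and the sketch assumes the admin API listens on http://localhost:8080 as in the curl examples. It builds the request without sending it.

```python
import json
import os
import urllib.request

ADMIN_URL = "http://localhost:8080/api/v1"  # admin API base URL used in this guide


def build_admin_request(path: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the TrustGate admin API.

    build_admin_request is a hypothetical helper for illustration only.
    """
    data = json.dumps(body).encode("utf-8")  # the serializer never emits trailing commas
    return urllib.request.Request(
        url=f"{ADMIN_URL}{path}",
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Same request as the curl command above; the JWT is read from the environment.
req = build_admin_request(
    "/gateways",
    {"name": "multi-provider-gateway"},
    token=os.environ.get("JWT_TOKEN", ""),
)
```

Sending it with `urllib.request.urlopen(req)` creates the gateway. The response presumably carries the new gateway's ID; check the API reference for the exact field name.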

Step 2: Configure Multi-Provider Upstream

Set up an upstream that includes multiple AI providers. This example demonstrates load balancing between OpenAI and Anthropic:

curl -X POST "http://localhost:8080/api/v1/gateways/{gateway_id}/upstreams" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -d '{
    "name": "ai-providers-upstream",
    "algorithm": "round-robin",
    "targets": [
      {
        "path": "/v1/chat/completions",
        "provider": "openai",
        "weight": 50,
        "default_model": "gpt-4o-mini",
        "models": ["gpt-3.5-turbo", "gpt-4", "gpt-4o-mini"],
        "credentials": {
          "api_key": "your-openai-key"
        }
      },
      {
        "path": "/v1/messages",
        "provider": "anthropic",
        "weight": 50,
        "default_model": "claude-3-sonnet",
        "models": ["claude-3-sonnet"],
        "headers": {
          "anthropic-version": "2023-06-01"
        },
        "credentials": {
          "api_key": "your-anthropic-key"
        }
      }
    ],
    "health_checks": {
      "passive": true,
      "threshold": 3,
      "interval": 60
    }
  }'
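The weight fields set each target's relative share of traffic. The sketch below shows one common way to turn weights into a round-robin sequence: the smooth weighted round-robin scheme popularized by nginx. It illustrates the general technique and is not TrustGate's actual selection code. `Target` and `WeightedRoundRobin` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    provider: str
    weight: int
    current: int = 0  # running score used by the smooth weighted algorithm


class WeightedRoundRobin:
    """Smooth weighted round-robin over a fixed list of targets (illustrative)."""

    def __init__(self, targets: List[Target]) -> None:
        if not targets or any(t.weight <= 0 for t in targets):
            raise ValueError("targets must be non-empty with positive weights")
        self.targets = targets
        self.total = sum(t.weight for t in targets)

    def next(self) -> Target:
        # Every target earns its weight; the highest running score wins this round.
        for t in self.targets:
            t.current += t.weight
        chosen = max(self.targets, key=lambda t: t.current)
        # The winner pays back the total, so heavier targets win proportionally more often.
        chosen.current -= self.total
        return chosen


lb = WeightedRoundRobin([Target("openai", 50), Target("anthropic", 50)])
picks = [lb.next().provider for _ in range(4)]
print(picks)  # ['openai', 'anthropic', 'openai', 'anthropic']
```

With weights of 50 and 50 the two targets simply alternate. With 75 and 25, three of every four requests go to the first target, interleaved (first, first, second, first) rather than sent in bursts.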

Provider-specific examples

The following examples show minimal upstream definitions for individual providers.

OpenAI:

{
  "name": "{{upstream_service_name}}",
  "algorithm": "round-robin",
  "targets": [
    {
      "provider": "openai",
      "provider_options": {"api": "responses"},
      "weight": 50,
      "priority": 1,
      "default_model": "gpt-4o-mini",
      "models": ["gpt-3.5-turbo", "gpt-4", "gpt-4o-mini"],
      "stream": false,
      "credentials": {
        "api_key": "sk-proj-"
      }
    }
  ],
  "health_checks": {
    "passive": true,
    "threshold": 3,
    "interval": 60
  }
}

Note on OpenAI provider_options:

The provider_options field accepts {"api": "responses"} or {"api": "completions"}.
Use api: "completions" when you want to load balance between OpenAI and other providers (Anthropic, Gemini, etc.).
Important: api: "responses" is not compatible with multi-provider load balancing. If you configure an upstream with several providers, avoid responses and use completions to keep the targets compatible.

Anthropic:

{
  "name": "{{upstream_service_name}}",
  "algorithm": "round-robin",
  "targets": [
    {
      "provider": "anthropic",
      "weight": 50,
      "default_model": "claude-3-5-sonnet-20241022",
      "models": ["claude-3-5-sonnet-20241022"],
      "stream": false,
      "headers": {
        "anthropic-version": "2023-06-01"
      },
      "credentials": {
        "api_key": ""
      }
    }
  ],
  "health_checks": {
    "passive": true,
    "threshold": 3,
    "interval": 60
  }
}

Google Gemini:

{
  "name": "{{upstream_service_name}}",
  "algorithm": "round-robin",
  "targets": [
    {
      "provider": "gemini",
      "weight": 50,
      "default_model": "gemini-2.0-flash-001",
      "models": ["gemini-2.0-flash-001"],
      "stream": false,
      "credentials": {
        "api_key": ""
      }
    }
  ],
  "health_checks": {
    "passive": true,
    "threshold": 3,
    "interval": 60
  }
}

AWS Bedrock (standalone):

{
  "name": "{{upstream_service_name}}",
  "algorithm": "round-robin",
  "targets": [
    {
      "provider": "bedrock",
      "weight": 50,
      "default_model": "eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
      "models": [
        "eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
        "amazon.titan-text-express-v1",
        "eu.meta.llama3-2-1b-instruct-v1:0",
        "eu.mistral.pixtral-large-2502-v1:0",
        "deepseek-llm-r1-distill-llama-8b"
      ],
      "stream": true,
      "headers": {
        "anthropic-version": "2023-06-01"
      },
      "credentials": {
        "aws_access_key_id": "",
        "aws_secret_access_key": "",
        "aws_session_token": "",
        "aws_region": "eu-west-1"
      }
    }
  ],
  "health_checks": {
    "passive": true,
    "threshold": 3,
    "interval": 60
  }
}

AWS Bedrock (example with Anthropic via Bedrock):

{
  "name": "{{upstream_service_name}}",
  "algorithm": "round-robin",
  "targets": [
    {
      "provider": "bedrock",
      "weight": 50,
      "default_model": "eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
      "models": [
        "eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
        "amazon.titan-text-express-v1",
        "eu.meta.llama3-2-1b-instruct-v1:0",
        "eu.mistral.pixtral-large-2502-v1:0",
        "deepseek-llm-r1-distill-llama-8b"
      ],
      "stream": true,
      "headers": {
        "anthropic-version": "2023-06-01"
      },
      "credentials": {
        "aws_access_key_id": "A....",
        "aws_secret_access_key": "u1H49D0GfnbtDhYF...",
        "aws_session_token": "I...",
        "aws_region": "eu-west-1"
      }
    }
  ],
  "health_checks": {
    "passive": true,
    "threshold": 3,
    "interval": 60
  }
}

Azure OpenAI:

{
  "name": "{{upstream_service_name}}",
  "algorithm": "round-robin",
  "targets": [
    {
      "provider": "azure",
      "weight": 50,
      "default_model": "gpt-4o-mini",
      "models": ["gpt-4o-mini"],
      "credentials": {
        "api_key": "7LQ...",
        "azure_endpoint": "https://example2.cognitiveservices.azure.com",
        "azure_version": "",
        "azure_use_managed_identity": false
      }
    }
  ],
  "health_checks": {
    "passive": true,
    "threshold": 3,
    "interval": 60
  }
}

Configuration Parameters

algorithm: Load balancing algorithm (e.g., round-robin)
weight: Relative traffic distribution weight for each target
path: Provider-specific API endpoint path
provider: AI provider identifier
provider_options: Provider-specific options. For OpenAI, {"api": "responses"} or {"api": "completions"}. Important: responses is not compatible with load balancing across other providers; use completions for multi-provider scenarios.
models: List of supported models for each provider
default_model: Default model when none is specified
headers: Provider-specific headers
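credentials: Authentication credentials for each provider
priority: Priority level of a target, used for failover between providers
health_checks: Passive health-check settings for the upstream's targets, including a failure threshold and an interval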
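Because the Responses API cannot be mixed with other providers, it can help to check an upstream definition before submitting it. The hypothetical Python check below enforces the documented rule (no OpenAI target with {"api": "responses"} in a multi-provider upstream) and adds two assumed sanity checks that are not documented TrustGate validation: weights should be positive, and default_model should appear in models.

```python
def check_upstream(upstream: dict) -> list:
    """Return human-readable problems found in an upstream definition.

    check_upstream is a hypothetical pre-flight check, not part of TrustGate.
    """
    problems = []
    targets = upstream.get("targets", [])
    multi_provider = len({t.get("provider") for t in targets}) > 1
    for i, t in enumerate(targets):
        label = f"targets[{i}] ({t.get('provider')})"
        api = (t.get("provider_options") or {}).get("api")
        # Documented rule: the Responses API cannot be mixed with other providers.
        if multi_provider and t.get("provider") == "openai" and api == "responses":
            problems.append(f"{label}: use api 'completions' in a multi-provider upstream")
        # Assumed sanity checks, not documented TrustGate validation.
        weight = t.get("weight")
        if weight is not None and weight <= 0:
            problems.append(f"{label}: weight should be positive")
        models = t.get("models") or []
        if models and t.get("default_model") not in models:
            problems.append(f"{label}: default_model is not listed in models")
    return problems


upstream = {
    "algorithm": "round-robin",
    "targets": [
        {"provider": "openai", "provider_options": {"api": "responses"}, "weight": 50,
         "default_model": "gpt-4o-mini", "models": ["gpt-4o-mini"]},
        {"provider": "anthropic", "weight": 50,
         "default_model": "claude-3-5-sonnet-20241022",
         "models": ["claude-3-5-sonnet-20241022"]},
    ],
}
for problem in check_upstream(upstream):
    print(problem)
```

Switching the OpenAI target to {"api": "completions"} clears the warning. A single-provider OpenAI upstream may keep responses.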

Step 3: Create a Service

Create a service that uses the multi-provider upstream, replacing {upstream_id} with the ID returned in Step 2:

curl -X POST "http://localhost:8080/api/v1/gateways/{gateway_id}/services" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -d '{
    "name": "ai-chat-service",
    "description": "Load balanced AI chat completion service",
    "upstream_id": "{upstream_id}",
    "type": "upstream",
    "tags": ["ai", "chat"]
  }'

Step 4: Add a Rule

Configure a rule that routes POST requests on the /v1 path to your service, replacing {service_id} with the ID returned in Step 3:

curl -X POST "http://localhost:8080/api/v1/gateways/{gateway_id}/rules" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -d '{
    "path": "/v1",
    "service_id": "{service_id}",
    "methods": ["POST"],
    "strip_path": false,
    "active": true
  }'
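Steps 1 to 4 each reuse an ID returned by the previous call. The Python sketch below chains them with a pluggable `send` function so the ID plumbing is easy to follow and to test. `provision` and `send` are hypothetical names, and the sketch assumes each admin response is a JSON object carrying the new resource's ID in an `id` field; confirm the actual field name against the API reference.

```python
from typing import Callable, Dict

# send(path, body) -> parsed JSON response; plug in a real HTTP client in practice.
Send = Callable[[str, dict], dict]


def provision(send: Send, upstream: dict) -> Dict[str, str]:
    """Create the gateway, upstream, service, and rule in order (illustrative sketch).

    Assumes every admin response carries the new resource's ID in an "id" field.
    """
    gateway_id = send("/gateways", {"name": "multi-provider-gateway"})["id"]
    base = f"/gateways/{gateway_id}"
    upstream_id = send(f"{base}/upstreams", upstream)["id"]
    service_id = send(f"{base}/services", {
        "name": "ai-chat-service",
        "upstream_id": upstream_id,
        "type": "upstream",
    })["id"]
    rule_id = send(f"{base}/rules", {
        "path": "/v1",
        "service_id": service_id,
        "methods": ["POST"],
        "strip_path": False,
        "active": True,
    })["id"]
    return {"gateway_id": gateway_id, "upstream_id": upstream_id,
            "service_id": service_id, "rule_id": rule_id}


# Usage with a fake transport that records paths and returns sequential IDs.
calls = []


def fake_send(path: str, body: dict) -> dict:
    calls.append(path)
    return {"id": f"id-{len(calls)}"}


ids = provision(fake_send, {"name": "ai-providers-upstream", "targets": []})
print(ids["service_id"], calls[-1])  # id-3 /gateways/id-1/rules
```

Injecting `send` keeps the ordering logic separate from HTTP details, so the same function can run against the real admin API or a stub.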

Step 5: Generate an API Key

Create an API key for authentication. Clients send this key in the X-TG-API-Key header when calling the gateway:

curl -X POST "http://localhost:8080/api/v1/gateways/{gateway_id}/keys" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -d '{
    "name": "test-key",
    "expires_at": "2026-01-01T00:00:00Z"
  }'

Using the Load Balanced API

When you send requests to the load-balanced API through the gateway's proxy port (8081 in this example; the admin API above uses 8080), TrustGate automatically handles provider selection and request transformation:

curl -X POST "http://localhost:8081/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-TG-API-Key: your-api-key" \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are an assistant"
      },
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "max_tokens": 1020,
    "system": "You are an assistant",
    "stream": true
  }'

When an upstream contains multiple providers, include in your request the fields that every provider needs. The gateway then transforms the request for whichever provider is selected. For example, when load balancing between OpenAI and Anthropic:

{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "max_tokens": 1020,
  "system": "You are an assistant"
}

The fields in this request serve different purposes:

model and messages: OpenAI chat format (Anthropic's Messages API uses the same field names)
system: Anthropic's top-level system prompt
max_tokens: Common to both providers (required by Anthropic)

The gateway will:

Select a target based on the load balancing algorithm
Transform the request to match the selected provider’s format
Remove unnecessary fields for that provider
Add any required provider-specific headers
Fall back to the selected provider's default model if the requested model is not one of that provider's models

You don’t need to handle the transformation yourself: include all necessary fields in your request, and the gateway handles the rest based on the provider schemas. For streaming requests, add "stream": true to enable streaming for all providers.
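To make the transformation concrete, here is a simplified Python illustration of how one unified request could be reduced to each provider's shape. It is not TrustGate's transformation code, and `to_provider` is a hypothetical name. The mapping rules reflect the public API formats: OpenAI Chat Completions takes the system prompt as a system-role message, while Anthropic's Messages API takes it as a top-level `system` field and allows only user and assistant roles in `messages`. The model fallback mirrors the default-model behavior described above.

```python
import copy
from typing import List


def to_provider(request: dict, provider: str, models: List[str], default_model: str) -> dict:
    """Reduce a unified request to one provider's shape (simplified illustration)."""
    body = copy.deepcopy(request)  # never mutate the caller's request
    # Fall back to the provider's default model if the requested one is not offered.
    if body.get("model") not in models:
        body["model"] = default_model
    system = body.pop("system", None)
    messages = body.get("messages", [])
    if provider == "anthropic":
        # Anthropic: system prompt is top-level; only user/assistant roles in messages.
        system_msgs = [m["content"] for m in messages if m.get("role") == "system"]
        body["messages"] = [m for m in messages if m.get("role") != "system"]
        prompt = system or "\n".join(system_msgs)
        if prompt:
            body["system"] = prompt
    elif provider == "openai":
        # OpenAI Chat Completions: system prompt travels as a system-role message.
        if system and not any(m.get("role") == "system" for m in messages):
            body["messages"] = [{"role": "system", "content": system}, *messages]
    else:
        raise ValueError(f"no mapping for provider {provider!r} in this sketch")
    return body


unified = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 1020,
    "system": "You are an assistant",
}
anthropic_body = to_provider(unified, "anthropic",
                             ["claude-3-5-sonnet-20241022"], "claude-3-5-sonnet-20241022")
openai_body = to_provider(unified, "openai", ["gpt-4", "gpt-4o-mini"], "gpt-4o-mini")
```

The sketch covers only OpenAI and Anthropic. The gateway's own schemas also cover other providers, such as Gemini, Bedrock, and Azure, that this example does not attempt to model.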

Response Headers

The response includes a header indicating which provider handled the request:

X-Selected-Provider: The provider that handled the request
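To see how traffic is actually distributed, you can tally this header across several responses. The Python sketch below uses only the standard library. `send_chat` and `tally` are hypothetical helpers, the URL and headers are taken from the example request above, and `send_chat` needs a running gateway, so only `tally` runs offline.

```python
import json
import urllib.request
from collections import Counter
from typing import List, Optional

PROXY_URL = "http://localhost:8081/v1/chat/completions"  # proxy endpoint used in this guide


def send_chat(api_key: str, jwt: Optional[str] = None) -> Optional[str]:
    """Send one small chat request and return the X-Selected-Provider header value."""
    body = {
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 16,
        "system": "You are an assistant",
    }
    headers = {"Content-Type": "application/json", "X-TG-API-Key": api_key}
    if jwt:
        # The example request above also sends the JWT as a bearer token.
        headers["Authorization"] = f"Bearer {jwt}"
    req = urllib.request.Request(
        PROXY_URL, data=json.dumps(body).encode("utf-8"), method="POST", headers=headers
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Response header lookup is case-insensitive.
        return resp.headers.get("X-Selected-Provider")


def tally(providers: List[Optional[str]]) -> Counter:
    """Count responses per provider; a missing header is counted as "unknown"."""
    return Counter(p or "unknown" for p in providers)


# Usage against a running gateway:
# counts = tally([send_chat("your-api-key") for _ in range(20)])
# print(counts.most_common())
```

With equal weights, the counts should come out roughly even; a persistent skew is a hint to review weights or health status.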

Load Balancing Features

Weighted Distribution
- Configure traffic distribution using weights
- Adjust weights based on provider costs or performance
Failover Support
- Set priority levels for providers
- Automatic failover when the primary provider fails
- Health checks for provider availability
Request Transformation
- Automatic conversion between provider formats
- Model name mapping
- Request/response adaptation
Health Monitoring
- Passive health checks
- Configurable failure thresholds
- Automatic provider recovery
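The Python sketch below shows how passive health checks, failure thresholds, recovery, and priority failover can fit together, using the threshold and interval values from the upstream examples. It is a hypothetical model, not TrustGate's implementation. It assumes that threshold counts consecutive failures, that interval is a cool-down in seconds before an unhealthy target is retried, and that a lower priority number means a more preferred target.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TargetState:
    provider: str
    priority: int = 1                       # assumed: lower number = preferred
    failures: int = 0                       # consecutive failed requests
    unhealthy_since: Optional[float] = None


class PassiveHealthChecker:
    """Hypothetical model of passive health checks with priority failover."""

    def __init__(self, targets: List[TargetState], threshold: int = 3,
                 interval: float = 60, clock: Callable[[], float] = time.monotonic):
        self.targets = targets
        self.threshold = threshold  # failures before a target is taken out of rotation
        self.interval = interval    # assumed: seconds before an unhealthy target is retried
        self.clock = clock

    def record(self, target: TargetState, ok: bool) -> None:
        """Passive check: observe the outcome of a real request instead of probing."""
        if ok:
            target.failures = 0
            return
        target.failures += 1
        if target.failures >= self.threshold and target.unhealthy_since is None:
            target.unhealthy_since = self.clock()

    def is_healthy(self, target: TargetState) -> bool:
        if target.unhealthy_since is None:
            return True
        # Automatic recovery: give the target another chance after the cool-down.
        if self.clock() - target.unhealthy_since >= self.interval:
            target.unhealthy_since = None
            target.failures = 0
            return True
        return False

    def pick(self) -> Optional[TargetState]:
        """Failover: the most preferred healthy target, or None if all are down."""
        healthy = [t for t in self.targets if self.is_healthy(t)]
        return min(healthy, key=lambda t: t.priority) if healthy else None


now = [0.0]  # fake clock so the example runs instantly
primary = TargetState("openai", priority=1)
backup = TargetState("anthropic", priority=2)
checker = PassiveHealthChecker([primary, backup], threshold=3, interval=60,
                               clock=lambda: now[0])

for _ in range(3):
    checker.record(primary, ok=False)  # three consecutive failures
print(checker.pick().provider)  # anthropic
now[0] = 61.0                   # cool-down elapsed
print(checker.pick().provider)  # openai
```

In a real gateway, targets that share a priority would presumably still be load balanced by weight; this model only shows how health and priority narrow the candidate set.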

Best Practices

Provider Selection
- Choose complementary providers
- Consider provider strengths and pricing
- Match models across providers
Load Distribution
- Balance cost vs. performance
- Monitor provider quotas
- Adjust weights based on usage patterns
Error Handling
- Implement proper fallback logic
- Monitor provider errors
- Set appropriate timeouts
Request Design
- Use provider-agnostic request format
- Include all required fields for every provider in the upstream
- Handle provider-specific features gracefully

Troubleshooting

If you encounter issues:

Check the X-Selected-Provider response header to see which provider handled each request
Verify provider health status
Review load balancing configuration
Monitor provider error responses

Next Steps

Set up monitoring for provider metrics
Configure provider-specific rate limits
Implement cost optimization strategies
Add more providers for redundancy
