The Semantic Loadbalancer is a load balancing strategy that routes each request to the backend target whose description best matches the request's semantic content. Unlike traditional load balancing methods, which distribute traffic according to predefined patterns or server metrics, semantic load balancing analyzes the content of each request to decide where to route it.

  • Key Benefit: Routes requests to the most semantically appropriate target based on content analysis.
  • AI-Powered Routing: Uses embeddings to calculate similarity between request content and target descriptions.
  • Intelligent Distribution: Ensures that specialized models or services receive the most relevant requests.
  • Fallback Mechanism: If no suitable match is found, defaults to the first available target.

How It Works

The Semantic Loadbalancer works by:

  1. Extracting the prompt or content from incoming requests
  2. Generating embeddings (vector representations) of the request content
  3. Comparing these embeddings with pre-stored embeddings of target descriptions
  4. Routing the request to the target with the highest semantic similarity
  5. Falling back to the first target if no suitable match is found or in case of errors

This approach sends each request to the backend service whose description best matches the request's content, which can improve response quality and resource utilization.
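The steps above can be sketched in Python as follows. This is a minimal illustration, not the gateway's implementation: it assumes cosine similarity as the similarity measure (the text says only "semantic similarity"), and `embed`, `pick_target`, and `min_similarity` are hypothetical names introduced for the example.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_target(
    prompt: str,
    target_embeddings: Sequence[Vector],
    embed: Callable[[str], Vector],
    min_similarity: float = 0.0,  # hypothetical cutoff for a "suitable" match
) -> int:
    """Return the index of the best-matching target, or 0 as the fallback."""
    try:
        # Step 2: embed the request content (embed is a hypothetical callable).
        query = embed(prompt)
        # Step 3: compare against the pre-stored target description embeddings.
        scores = [cosine_similarity(query, t) for t in target_embeddings]
    except Exception:
        # Step 5: any error falls back to the first target.
        return 0
    if not scores:
        return 0
    # Step 4: choose the target with the highest similarity.
    best = max(range(len(scores)), key=scores.__getitem__)
    # Step 5: no suitable match also falls back to the first target.
    return best if scores[best] > min_similarity else 0
```

In practice `embed` would call the configured embedding service, and the target embeddings would be computed once from the target descriptions rather than on every request.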


Create an Upstream with Semantic Loadbalancing Strategy

Below is an example command that creates an Upstream using the semantic load balancing algorithm. The request defines multiple targets, each with a description that is used for semantic matching. Replace {gateway-id} and the placeholder API keys with your own values.

# Create an upstream with semantic load balancing
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/upstreams \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ai-providers-semantic-upstream",
    "algorithm": "semantic",
    "embedding_config": {
      "provider": "openai",
      "model": "text-embedding-ada-002",
      "credentials": {
        "header_name": "Authorization",
        "header_value": "Bearer your-openai-key"
      }
    },
    "targets": [
      {
        "path": "/v1/chat/completions",
        "provider": "openai",
        "description": "Specialized in creative writing, storytelling, and content generation. Good for marketing copy, blog posts, and creative fiction.",
        "default_model": "gpt-4",
        "models": ["gpt-3.5-turbo", "gpt-4", "gpt-4o-mini"],
        "credentials": {
          "header_name": "Authorization",
          "header_value": "Bearer your-openai-key"
        }
      },
      {
        "path": "/v1/messages",
        "provider": "anthropic",
        "description": "Excels at technical documentation, code explanation, and scientific content. Ideal for programming help and technical problem-solving.",
        "default_model": "claude-3-5-sonnet-20241022",
        "models": ["claude-3-5-sonnet-20241022"],
        "headers": {
          "anthropic-version": "2023-06-01"
        },
        "credentials": {
          "header_name": "x-api-key",
          "header_value": "your-anthropic-key"
        }
      },
      {
        "path": "/v1/chat/completions",
        "provider": "openai",
        "description": "Specialized in data analysis, mathematical reasoning, and logical problem-solving. Best for analytical tasks and structured data processing.",
        "default_model": "gpt-4-turbo",
        "models": ["gpt-4-turbo"],
        "credentials": {
          "header_name": "Authorization",
          "header_value": "Bearer your-openai-key"
        }
      }
    ],
    "health_checks": {
      "passive": true,
      "threshold": 3,
      "interval": 60
    }
  }'

Configuration Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| algorithm | string | Must be set to "semantic" to use semantic load balancing |
| embedding_config | object | Configuration for the embedding service |
| embedding_config.provider | string | Embedding provider (currently only "openai" is supported) |
| embedding_config.model | string | Model to use for generating embeddings |
| embedding_config.credentials | object | Credentials for the embedding service |
| embedding_config.credentials.header_name | string | Name of the header for authentication |
| embedding_config.credentials.header_value | string | Value of the header for authentication |
| targets[].description | string | Description of the target's capabilities or specialization (used for semantic matching) |
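As a rough aid, the parameters above can be checked on the client side before the request is sent. The sketch below is an assumption-laden illustration, not the gateway's own validation: `validate_semantic_upstream` is a hypothetical helper, and it treats every listed field as required, which the parameter list does not state explicitly.

```python
from typing import Any, Dict, List


def validate_semantic_upstream(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an upstream payload (empty if none)."""
    problems: List[str] = []

    if payload.get("algorithm") != "semantic":
        problems.append('algorithm must be "semantic"')

    emb = payload.get("embedding_config")
    if not isinstance(emb, dict):
        problems.append("embedding_config is missing")
    else:
        # The documentation lists "openai" as the only supported provider.
        if emb.get("provider") != "openai":
            problems.append('embedding_config.provider must be "openai"')
        if not emb.get("model"):
            problems.append("embedding_config.model is missing")
        creds = emb.get("credentials")
        if not isinstance(creds, dict):
            creds = {}
        for key in ("header_name", "header_value"):
            if not creds.get(key):
                problems.append(f"embedding_config.credentials.{key} is missing")

    targets = payload.get("targets") or []
    if not targets:
        problems.append("targets is empty")
    for i, target in enumerate(targets):
        # Each target needs a description, since it drives semantic matching.
        if not str(target.get("description", "")).strip():
            problems.append(f"targets[{i}].description is missing or empty")

    return problems
```

An empty result means only that the payload has the fields listed above; the gateway may apply further checks of its own.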

Considerations

  • Each target should have a meaningful description that accurately represents its capabilities or specialization.
  • The quality of routing depends on the quality of the descriptions provided for each target.
  • The semantic loadbalancer adds processing time to each request, because it must generate an embedding of the request content and compare it with the target embeddings.
  • If all targets fail semantic matching or if an error occurs during the matching process, the first target in the list will be used as a fallback.
  • Ensure your embedding service has sufficient capacity to handle the request volume.
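
Because routing quality depends on how well the descriptions distinguish the targets, it can help to flag descriptions that are nearly identical before deploying. The sketch below uses plain word overlap (Jaccard similarity) as a crude stand-in for embedding similarity; this is a swapped-in technique for illustration only, and `overlapping_descriptions` and its `threshold` are hypothetical.

```python
import re
from itertools import combinations
from typing import List, Set, Tuple


def _words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlapping_descriptions(
    descriptions: List[str], threshold: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for each pair whose word overlap is >= threshold."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(descriptions), 2):
        words_a, words_b = _words(a), _words(b)
        union = words_a | words_b
        # Jaccard similarity: shared words divided by all distinct words.
        score = len(words_a & words_b) / len(union) if union else 1.0
        if score >= threshold:
            flagged.append((i, j, score))
    return flagged
```

Pairs that this check flags are candidates for rewording so that each target's specialization is stated distinctly. Word overlap will miss paraphrases, so a clean result does not guarantee that the embedding-based matcher can tell the targets apart.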