NeuralTrust | The leading security platform for generative AI

The Semantic Loadbalancer is a load balancing strategy that routes each request to the backend target whose description best matches the content of the request. Traditional load balancing methods distribute traffic according to fixed patterns (such as round robin) or server metrics. Semantic load balancing instead analyzes what each request actually says and uses that to choose a target.

Key Benefit: Routes requests to the most semantically appropriate target based on content analysis.
AI-Powered Routing: Uses embeddings to calculate similarity between request content and target descriptions.
Intelligent Distribution: Ensures that specialized models or services receive the most relevant requests.
Fallback Mechanism: If no suitable match is found, or an error occurs during matching, the request goes to the first target in the list.

How It Works

The Semantic Loadbalancer works by:

Extracting the prompt or content from incoming requests
Generating embeddings (vector representations) of the request content
Comparing these embeddings with pre-stored embeddings of target descriptions
Routing the request to the target with the highest semantic similarity
Falling back to the first target if no suitable match is found or in case of errors
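The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the gateway's implementation: the similarity measure (cosine similarity), the optional `min_similarity` threshold, and the names `pick_target`, `Embedder`, and `toy_embed` are all hypothetical. The documentation only states that embeddings are compared, the most similar target wins, and the first target is the fallback.

```python
import math
from typing import Callable, Sequence

# Hypothetical stand-in for a call to the provider set in embedding_config.
Embedder = Callable[[str], list[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_target(prompt: str, targets: list[dict], target_embeddings: list[list[float]],
                embed: Embedder, min_similarity: float = 0.0) -> dict:
    """Return the target whose description embedding is most similar to the prompt.

    Falls back to targets[0] if embedding the prompt raises an error, or if the
    best score is below min_similarity (an assumed threshold, not a documented setting).
    """
    try:
        query = embed(prompt)  # step 2: embed the request content
    except Exception:
        return targets[0]  # error during matching -> first target
    best_index, best_score = 0, float("-inf")
    for i, vec in enumerate(target_embeddings):  # step 3: compare with stored vectors
        score = cosine_similarity(query, vec)
        if score > best_score:
            best_index, best_score = i, score
    if best_score < min_similarity:
        return targets[0]  # no suitable match -> first target
    return targets[best_index]  # step 4: highest similarity wins


# Toy 3-dimensional vectors standing in for real description embeddings
# (axes: creative, technical, analytical).
targets = [{"name": "creative"}, {"name": "technical"}, {"name": "analytical"}]
target_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def toy_embed(text: str) -> list[float]:
    # Pretend prompts mentioning "code" are technical, everything else analytical.
    return [0.1, 0.9, 0.2] if "code" in text else [0.2, 0.1, 0.9]


print(pick_target("Explain this code", targets, target_embeddings, toy_embed)["name"])  # technical
```

Because the description embeddings are stored ahead of time, each routed request in this sketch costs one embedding call for the prompt, which is where the extra latency noted under Considerations comes from.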

Because routing follows the content of each request, specialized backend services receive the requests they are best suited for, which can improve response quality and resource utilization.

Create an Upstream with the Semantic Load Balancing Strategy

The following command creates an Upstream that uses the semantic load balancing algorithm. It defines three targets (two OpenAI targets and one Anthropic target), each with a description that is used for semantic matching.

# Create an upstream with semantic load balancing
curl -X POST http://localhost:8080/api/v1/gateways/{gateway-id}/upstreams \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ai-providers-semantic-upstream",
    "algorithm": "semantic",
    "embedding_config": {
      "provider": "openai",
      "model": "text-embedding-ada-002",
      "credentials": {
        "header_name": "Authorization",
        "header_value": "Bearer your-openai-key"
      }
    },
    "targets": [
      {
        "path": "/v1/chat/completions",
        "provider": "openai",
        "description": "Specialized in creative writing, storytelling, and content generation. Good for marketing copy, blog posts, and creative fiction.",
        "default_model": "gpt-4",
        "models": ["gpt-3.5-turbo", "gpt-4", "gpt-4o-mini"],
        "credentials": {
          "header_name": "Authorization",
          "header_value": "Bearer your-openai-key"
        }
      },
      {
        "path": "/v1/messages",
        "provider": "anthropic",
        "description": "Excels at technical documentation, code explanation, and scientific content. Ideal for programming help and technical problem-solving.",
        "default_model": "claude-3-5-sonnet-20241022",
        "models": ["claude-3-5-sonnet-20241022"],
        "headers": {
          "anthropic-version": "2023-06-01"
        },
        "credentials": {
          "header_name": "x-api-key",
          "header_value": "your-anthropic-key"
        }
      },
      {
        "path": "/v1/chat/completions",
        "provider": "openai",
        "description": "Specialized in data analysis, mathematical reasoning, and logical problem-solving. Best for analytical tasks and structured data processing.",
        "default_model": "gpt-4-turbo",
        "models": ["gpt-4-turbo"],
        "credentials": {
          "header_name": "Authorization",
          "header_value": "Bearer your-openai-key"
        }
      }
    ],
    "health_checks": {
      "passive": true,
      "threshold": 3,
      "interval": 60
    }
  }'
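The targets above receive OpenAI Chat Completions and Anthropic Messages request bodies, and in both formats a message's `content` is either a plain string or a list of typed parts. Before a request can be embedded, its prompt text has to be pulled out of such a body. The helper below is a hypothetical sketch of that first step; it assumes that only the most recent user message is embedded, which this page does not specify.

```python
from typing import Any


def _content_text(content: Any) -> str:
    """Flatten a message "content" field.

    Handles a plain string or a list of typed parts such as
    {"type": "text", "text": "..."}; non-text parts (images, etc.) are skipped.
    """
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    return ""


def extract_prompt(body: dict) -> str:
    """Return the text of the last user message in a chat-style request body.

    Assumption: only the most recent user turn is embedded. Returns "" when no
    user message is present, which a router could treat as "no suitable match".
    """
    for message in reversed(body.get("messages", [])):
        if isinstance(message, dict) and message.get("role") == "user":
            return _content_text(message.get("content"))
    return ""
```

Embedding only the latest user turn keeps the embedding input short; embedding the whole conversation would be an equally plausible design with different cost and accuracy trade-offs.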

Configuration Parameters

Parameter	Type	Description
`algorithm`	string	Must be set to `"semantic"` to use semantic load balancing
`embedding_config`	object	Configuration for the embedding service
`embedding_config.provider`	string	Embedding provider (currently only `"openai"` is supported)
`embedding_config.model`	string	Model to use for generating embeddings
`embedding_config.credentials`	object	Credentials for the embedding service
`embedding_config.credentials.header_name`	string	Name of the header for authentication
`embedding_config.credentials.header_value`	string	Value of the header for authentication
`targets[].description`	string	Description of the target’s capabilities or specialization (used for semantic matching)
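To catch configuration mistakes before sending the request, the table above can be turned into a client-side check. The function below is a hypothetical sketch that mirrors only the documented fields; it is not the gateway's own validation, and treating a missing or blank `description` (and an empty `targets` list) as an error is an assumption based on the Considerations below.

```python
def _nonempty_str(value) -> bool:
    return isinstance(value, str) and value.strip() != ""


def validate_semantic_upstream(payload: dict) -> list[str]:
    """Return a list of problems in a semantic upstream payload (empty if none)."""
    problems = []
    if payload.get("algorithm") != "semantic":
        problems.append('algorithm must be "semantic"')

    cfg = payload.get("embedding_config")
    if not isinstance(cfg, dict):
        problems.append("embedding_config is missing")
    else:
        # The table states that "openai" is currently the only supported provider.
        if cfg.get("provider") != "openai":
            problems.append('embedding_config.provider must be "openai"')
        if not _nonempty_str(cfg.get("model")):
            problems.append("embedding_config.model is missing")
        creds = cfg.get("credentials")
        if not isinstance(creds, dict):
            problems.append("embedding_config.credentials is missing")
        else:
            for key in ("header_name", "header_value"):
                if not _nonempty_str(creds.get(key)):
                    problems.append(f"embedding_config.credentials.{key} is missing")

    targets = payload.get("targets")
    if not isinstance(targets, list) or not targets:
        problems.append("targets must be a non-empty list")
    else:
        for i, target in enumerate(targets):
            if not isinstance(target, dict) or not _nonempty_str(target.get("description")):
                problems.append(f"targets[{i}].description is missing")
    return problems
```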

Considerations

Each target should have a meaningful description that accurately represents its capabilities or specialization.
The quality of routing depends on the quality of the descriptions provided for each target.
Semantic load balancing adds latency to each request, because the request content must be embedded and compared with the target embeddings before the request is forwarded.
If no target is a suitable match, or an error occurs during matching, the first target in the list is used as the fallback.
Every routed request triggers a call to the embedding service, so make sure it has enough capacity (and a high enough rate limit) for your request volume.
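Since routing quality depends on the descriptions, one practical check is whether any two target descriptions embed to nearly the same vector: a prompt that matches one of them will match the other almost equally, so the choice between those targets becomes close to arbitrary. The helper below is a hypothetical sketch of that check; it assumes cosine similarity as the metric and uses an arbitrary default threshold of 0.9. The embeddings would come from the same model configured in `embedding_config`.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlapping_descriptions(embeddings: dict[str, list[float]],
                             threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, score) for every pair of target descriptions whose
    embeddings are at least `threshold` similar, most similar pair first."""
    pairs = [
        (a, b, cosine_similarity(embeddings[a], embeddings[b]))
        for a, b in combinations(embeddings, 2)
    ]
    flagged = [pair for pair in pairs if pair[2] >= threshold]
    return sorted(flagged, key=lambda pair: pair[2], reverse=True)
```

Any flagged pair is a signal to rewrite one of the descriptions so it names capabilities the other target does not have.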
