Semantic
The Semantic Loadbalancer is a load balancing strategy that routes each request to the backend target that best matches the request's semantic content. Unlike traditional load balancing methods, which distribute traffic according to predefined patterns (such as round-robin) or server metrics, semantic load balancing analyzes the actual content of each request to decide where to route it.
- Key Benefit: Routes requests to the most semantically appropriate target based on content analysis.
- AI-Powered Routing: Uses embeddings to calculate similarity between request content and target descriptions.
- Intelligent Distribution: Ensures that specialized models or services receive the most relevant requests.
- Fallback Mechanism: If no suitable match is found or an error occurs, the request is routed to the first target in the list.
How It Works
The Semantic Loadbalancer works by:
- Extracting the prompt or content from incoming requests
- Generating embeddings (vector representations) of the request content
- Comparing these embeddings with pre-stored embeddings of target descriptions
- Routing the request to the target with the highest semantic similarity
- Falling back to the first target if no suitable match is found or in case of errors
This approach ensures that requests are sent to the most appropriate backend service based on their actual content, improving response quality and resource utilization.
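The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the gateway's actual implementation: `toy_embed` is a bag-of-words stand-in for a real embedding model, cosine similarity is an assumed similarity measure, and `min_similarity` is a hypothetical threshold for what counts as a "suitable match".

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pick_target(prompt, targets, min_similarity=0.1):
    """Return the target whose description is most similar to the prompt.

    Falls back to the first target when nothing clears min_similarity
    (a hypothetical threshold) or when matching raises an error.
    """
    try:
        query = toy_embed(prompt)
        best, best_score = None, min_similarity
        for target in targets:
            score = cosine(query, target["embedding"])
            if score > best_score:
                best, best_score = target, score
        return best if best is not None else targets[0]
    except Exception:
        return targets[0]

targets = [
    {"id": "general", "description": "general purpose chat and everyday questions"},
    {"id": "code", "description": "writing and debugging python code"},
]
# Embed each description once up front, mirroring the pre-stored embeddings step.
for t in targets:
    t["embedding"] = toy_embed(t["description"])

print(pick_target("help debugging my python code", targets)["id"])
print(pick_target("zzz", targets)["id"])
```

The first call shares words with the second target's description, so that target is selected. The second call matches nothing and falls back to the first target in the list.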
Create an Upstream with Semantic Loadbalancing Strategy
Below is an example command to create an Upstream using the semantic loadbalancing algorithm. The sample request includes multiple targets, each with a description that will be used for semantic matching.
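As a rough sketch of what such a request body might contain, the Python snippet below assembles a payload from the parameters documented in this section. Several details are assumptions made for illustration: the `name` field, the `host` and `port` fields on each target, the placement of `embedding_config` at the top level of the Upstream, the `text-embedding-3-small` model, and the `Authorization: Bearer` header. Check the API reference for the exact schema and for the endpoint to send it to.

```python
import json

def build_semantic_upstream(name, embedding_api_key, targets):
    """Assemble a request body for an Upstream that uses semantic load balancing."""
    return {
        "name": name,  # hypothetical field, not listed in the parameter table
        "algorithm": "semantic",
        "embedding_config": {
            "provider": "openai",
            "model": "text-embedding-3-small",  # assumed model choice
            "credentials": {
                "header_name": "Authorization",  # assumed header for OpenAI
                "header_value": f"Bearer {embedding_api_key}",
            },
        },
        "targets": targets,
    }

# host and port are hypothetical target fields; description drives the matching.
example_targets = [
    {
        "host": "general-llm.internal",
        "port": 443,
        "description": "General-purpose assistant for everyday questions and conversation",
    },
    {
        "host": "code-llm.internal",
        "port": 443,
        "description": "Specialized in writing, reviewing, and debugging source code",
    },
]

payload = build_semantic_upstream("semantic-upstream", "YOUR_API_KEY", example_targets)
print(json.dumps(payload, indent=2))
```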
Configuration Parameters
Parameter | Type | Description |
---|---|---|
algorithm | string | Must be set to "semantic" to use semantic load balancing |
embedding_config | object | Configuration for the embedding service |
embedding_config.provider | string | Embedding provider (currently only "openai" is supported) |
embedding_config.model | string | Model to use for generating embeddings |
embedding_config.credentials | object | Credentials for the embedding service |
embedding_config.credentials.header_name | string | Name of the header for authentication |
embedding_config.credentials.header_value | string | Value of the header for authentication |
targets[].description | string | Description of the target’s capabilities or specialization (used for semantic matching) |
Considerations
- Each target should have a meaningful `description` that accurately represents its capabilities or specialization.
- The quality of routing depends on the quality of the descriptions provided for each target.
- The semantic loadbalancer requires additional processing time to generate and compare embeddings.
- If all targets fail semantic matching or if an error occurs during the matching process, the first target in the list will be used as a fallback.
- Ensure your embedding service has sufficient capacity to handle the request volume.
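Because routing quality depends on the descriptions, it can help to check them before creating the Upstream. The helper below is a hypothetical pre-flight check, not part of the product: it flags targets whose `description` is missing, very short, or identical to another target's. The `min_words` cutoff is an arbitrary assumption.

```python
def check_descriptions(targets, min_words=3):
    """Return a list of human-readable problems found in target descriptions."""
    problems = []
    seen = {}  # normalized description -> index of first target using it
    for index, target in enumerate(targets):
        description = (target.get("description") or "").strip()
        if not description:
            problems.append(f"target {index}: missing description")
            continue
        if len(description.split()) < min_words:
            problems.append(
                f"target {index}: description shorter than {min_words} words"
            )
        key = description.lower()
        if key in seen:
            problems.append(
                f"target {index}: same description as target {seen[key]}"
            )
        else:
            seen[key] = index
    return problems

sample = [
    {"description": "Specialized in writing and debugging source code"},
    {"description": "Chat"},
    {},
    {"description": "specialized in writing and debugging source code"},
]
for problem in check_descriptions(sample):
    print(problem)
```

An empty result means every target has a distinct description of reasonable length. It does not guarantee that the descriptions are good enough to separate real traffic.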