How It Works
- Connection Count Tracking: The gateway monitors the number of active (in-flight) connections to each target.
- Decision Process: When a new request arrives, the gateway routes it to the target with the lowest active connection count, balancing the load in real time.
- Adaptive Distribution: As traffic fluctuates, targets frequently shift in and out of the “least connections” spot.
This approach is ideal when you have backends with relatively similar performance characteristics and want an adaptive load balancing method.
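The selection logic described above can be illustrated with a minimal Python sketch. This is not the gateway's actual implementation; the `Target` and `LeastConnectionsBalancer` names are hypothetical, and breaking ties by list order is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    active: int = 0  # in-flight connections currently held by this target


class LeastConnectionsBalancer:
    """Illustrative least-connections picker (hypothetical, not the gateway's code)."""

    def __init__(self, names):
        self.targets = [Target(n) for n in names]

    def acquire(self):
        # Choose the target with the fewest active connections.
        # Assumption: on a tie, min() returns the first target in list order.
        target = min(self.targets, key=lambda t: t.active)
        target.active += 1
        return target

    def release(self, target):
        # Call when the request finishes so the count reflects in-flight work only.
        target.active = max(0, target.active - 1)


if __name__ == "__main__":
    lb = LeastConnectionsBalancer(["openai", "anthropic"])
    first = lb.acquire()   # both idle, so the first target is chosen
    second = lb.acquire()  # the other target now has fewer connections
    lb.release(first)      # first request completes
    third = lb.acquire()   # the freed target is least loaded again
    print(first.name, second.name, third.name)
```

Because the counts are updated on every acquire and release, the "least connections" target changes as requests start and finish, which is the adaptive behavior the list describes.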
Example: Creating an Upstream with Least Connections
Below is an example curl command that creates an Upstream using the least-connections load balancing algorithm. It sets up two targets: one pointing to OpenAI and another to Anthropic.