An upstream represents a virtual hostname that can be used to load balance incoming requests across multiple services (targets). In AI Gateway, upstreams define where and how requests are forwarded to backend AI services.
Active health checking proactively tests upstream targets. The gateway sends test requests to each target at a regular, configurable interval, so problems can be detected before they affect user traffic. The checks are highly configurable and can be customized to match each service's requirements.
When a target fails its health checks, it is automatically removed from the active pool. Requests are no longer routed to the malfunctioning target, and the remaining targets continue to serve traffic. Removal is governed by two configurable thresholds. The first is the number of failed checks after which a target is considered unhealthy. The second is the number of successful checks required before the target can rejoin the active pool. Tune these thresholds to balance fast failure detection against your tolerance for false positives.
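The threshold logic described above can be sketched as a small per-target state tracker. This is a minimal Python sketch under stated assumptions. The class, the field names `unhealthy_threshold` and `healthy_threshold`, and the default values are hypothetical illustrations, not the gateway's actual configuration keys. `probe` stands in for whatever test request the gateway sends to a target.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ActiveHealthChecker:
    """Threshold-based health state for one target (hypothetical field names)."""
    unhealthy_threshold: int = 3  # consecutive failed checks before removal
    healthy_threshold: int = 2    # consecutive passed checks before re-admission
    healthy: bool = True
    _failures: int = field(default=0, init=False)
    _successes: int = field(default=0, init=False)

    def record(self, probe_ok: bool) -> bool:
        """Record one scheduled check result and return the target's health."""
        if probe_ok:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True   # rejoin the active pool
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False  # remove from the active pool
        return self.healthy


def run_checks(checker: ActiveHealthChecker, probe: Callable[[], bool],
               rounds: int, interval_s: float = 0.0) -> List[bool]:
    """Probe `rounds` times, waiting `interval_s` seconds between checks."""
    history = []
    for _ in range(rounds):
        history.append(checker.record(probe()))
        time.sleep(interval_s)  # the scheduled check interval
    return history
```

In a real deployment, `probe` would send a request to a health endpoint on the target. Here it is any callable that returns `True` or `False`. Counting *consecutive* results means that a single failure between successes never removes a target.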
Passive health checking monitors targets based on the results of actual traffic. Instead of sending test requests, it analyzes real user requests and tracks the failures and successes of each one. As a result, the picture of target health reflects how the service actually behaves under real-world conditions.
The passive system reintroduces previously failed targets into the rotation gradually, which prevents a sudden traffic spike from overwhelming a recovering target. It also provides automatic circuit breaking to respond to degraded performance or rising error rates. When a target's errors exceed the configured thresholds, the circuit breaker trips and temporarily removes the target from the pool. This prevents cascading failures and gives the target time to recover.
Combining active and passive health checking catches problems both through proactive testing and through real-world usage patterns. Issues are detected and handled quickly while the service stays available.
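A minimal sketch of the passive behavior described above follows. It rests on three assumptions:

- an error-rate circuit breaker over a sliding window of real request outcomes;
- a fixed cooldown while the breaker is open;
- a linear ramp (slow start) for gradual recovery.

These are common techniques chosen for illustration. The window size, thresholds, and timings are hypothetical, not documented gateway parameters. Time is passed in explicitly so the behavior is easy to test.

```python
from collections import deque
from typing import Optional


class PassiveCircuitBreaker:
    """Passive health tracking for one target, driven by real request outcomes.

    Hypothetical parameters: trips when the failure rate over the last
    `window` requests reaches `error_rate_threshold`, blocks traffic for
    `cooldown_s` seconds, then ramps traffic back up linearly over `ramp_s`.
    """

    def __init__(self, window: int = 20, min_requests: int = 5,
                 error_rate_threshold: float = 0.5,
                 cooldown_s: float = 30.0, ramp_s: float = 60.0):
        self.outcomes = deque(maxlen=window)  # sliding window of results
        self.min_requests = min_requests
        self.error_rate_threshold = error_rate_threshold
        self.cooldown_s = cooldown_s
        self.ramp_s = ramp_s
        self.tripped_at: Optional[float] = None

    def record(self, success: bool, now: float) -> None:
        """Record the outcome of a real request proxied to this target."""
        if self.tripped_at is not None and now < self.tripped_at + self.cooldown_s:
            return  # breaker is open; ignore stray late results
        self.outcomes.append(success)
        failures = self.outcomes.count(False)
        if (len(self.outcomes) >= self.min_requests
                and failures / len(self.outcomes) >= self.error_rate_threshold):
            self.tripped_at = now   # trip: remove the target from the pool
            self.outcomes.clear()   # judge the recovering target afresh

    def traffic_share(self, now: float) -> float:
        """Fraction of its normal weight the target should receive at `now`."""
        if self.tripped_at is None:
            return 1.0
        elapsed = now - (self.tripped_at + self.cooldown_s)
        if elapsed < 0:
            return 0.0  # breaker open: no traffic
        return min(1.0, elapsed / self.ramp_s)  # gradual recovery
```

The breaker never sends a probe of its own: it reacts only to traffic it observes. If the target keeps failing while it ramps back up, the breaker trips again.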
Targets represent the actual backend instances that handle incoming requests. The management of these targets is crucial for maintaining a reliable and efficient system. Let’s explore the key aspects of target management:
The foundation of target management is configuring each target's essential properties. Every target needs a host and port, which define its network location and access point. These connection parameters let the gateway establish and maintain reliable connections to the backend service.
The weight property gives fine-grained control over how load balancing distributes traffic. Assigning different weights to targets changes the share of traffic each one receives. This lets you account for servers with different capacities and processing capabilities.
The priority property determines the order in which targets are used for failover. When preferred targets fail, traffic is redirected to the most appropriate backup targets. This maintains service continuity while respecting your operational preferences and infrastructure capabilities.
Health check settings can be configured per target. Monitoring can then be tailored to the characteristics of each backend service, so health is assessed accurately and degradation is handled appropriately.
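To make weight and priority concrete, the following sketch selects a target in two steps. It first keeps only the healthy targets in the best priority tier, then applies smooth weighted round robin within that tier. The `Target` fields mirror the properties above, but several details are assumptions for illustration:

- the field names;
- the convention that a lower `priority` value is preferred;
- the choice of smooth weighted round robin.

The gateway's actual balancing algorithm may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Target:
    host: str
    port: int
    weight: int = 100     # relative share of traffic (hypothetical default)
    priority: int = 0     # assumption: lower value = preferred tier
    healthy: bool = True  # would be maintained by the health checks
    current: int = field(default=0, repr=False)  # smooth WRR state


def pick(targets: List[Target]) -> Target:
    """Choose a target: best healthy priority tier, then smooth weighted round robin."""
    candidates = [t for t in targets if t.healthy and t.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy targets available")
    best = min(t.priority for t in candidates)
    tier = [t for t in candidates if t.priority == best]  # failover tier
    total = sum(t.weight for t in tier)
    for t in tier:
        t.current += t.weight            # every candidate earns its weight
    chosen = max(tier, key=lambda t: t.current)
    chosen.current -= total              # the winner pays back the total
    return chosen
```

Consider two priority-0 targets with weights 2 and 1. Three consecutive picks go to the first target twice and the second once, interleaved rather than in bursts. A priority-1 target receives traffic only after both priority-0 targets are unhealthy.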
Target states describe the operational status of backend instances and determine how they participate in request handling. A Healthy target has passed its health checks and is operating within expected parameters. It receives traffic as a full member of the load-balancing rotation.
When a target fails health checks or otherwise misbehaves, it becomes Unhealthy and is automatically removed from the pool. No new requests are routed to it while the underlying issue is investigated and resolved, which protects overall system stability.
The Draining state is a graceful transition before a target is removed from service. A draining target finishes the requests it is already processing but accepts no new ones. Maintenance or removal therefore does not interrupt in-flight requests.
An administrator can also Disable a target manually to remove it explicitly from the active pool. This is useful for planned maintenance, for testing, or for temporarily excluding a specific target without affecting the rest of the system.
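The four states can be modeled as a small state machine. This sketch rests on a hypothetical assumption: health-check results only move a target between Healthy and Unhealthy, and never override the operator-controlled Draining and Disabled states. `ManagedTarget` and its methods are illustrative names, not part of any documented API.

```python
from enum import Enum


class TargetState(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DRAINING = "draining"
    DISABLED = "disabled"


class ManagedTarget:
    """Hypothetical per-target state tracking; not the gateway's own API."""

    def __init__(self, address: str):
        self.address = address
        self.state = TargetState.HEALTHY
        self.in_flight = 0  # requests currently being processed

    def on_health_result(self, ok: bool) -> None:
        # Health checks only toggle Healthy/Unhealthy; Draining and
        # Disabled are operator decisions that checks do not override.
        if self.state in (TargetState.HEALTHY, TargetState.UNHEALTHY):
            self.state = TargetState.HEALTHY if ok else TargetState.UNHEALTHY

    def drain(self) -> None:
        self.state = TargetState.DRAINING

    def disable(self) -> None:
        self.state = TargetState.DISABLED  # manual intervention

    def try_start_request(self) -> bool:
        """Admit a new request only if the target is Healthy."""
        if self.state is not TargetState.HEALTHY:
            return False
        self.in_flight += 1
        return True

    def finish_request(self) -> None:
        self.in_flight = max(0, self.in_flight - 1)

    def safe_to_remove(self) -> bool:
        """A draining target can be removed once in-flight work is done."""
        return self.state is TargetState.DRAINING and self.in_flight == 0
```

Tracking `in_flight` is what makes draining graceful. The target stops admitting work at once, but it is only reported as removable after the last existing request finishes.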
The proxy configuration lets you route traffic to upstream services through an intermediary proxy server. Sending traffic through designated proxy servers helps maintain network segmentation and security boundaries, because connections to backend services are mediated through controlled channels rather than made directly. This is particularly valuable in enterprise environments where network policies or security requirements restrict direct access to backend services.
The proxy configuration consists of two essential properties:
Host: The hostname or IP address of the proxy server that will handle the traffic to upstream targets. This defines the network location where requests will be forwarded before reaching the actual backend services.
Port: The port number on which the proxy server is listening for incoming connections. This completes the address information needed to establish a connection to the proxy.
When configured, all traffic to the upstream’s targets will be routed through this proxy, allowing for centralized control and management of the communication flow.
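At the HTTP-client level, routing every request through a proxy looks like the following sketch. It uses Python's standard `urllib.request.ProxyHandler`. The `ProxyConfig` class is a hypothetical stand-in for the Host and Port properties above, not the gateway's configuration schema.

```python
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProxyConfig:
    """Hypothetical mirror of the upstream's proxy Host and Port settings."""
    host: str
    port: int

    def __post_init__(self) -> None:
        if not self.host:
            raise ValueError("proxy host must not be empty")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"invalid proxy port: {self.port}")

    @property
    def url(self) -> str:
        # IPv6 literals must be bracketed inside a URL.
        host = f"[{self.host}]" if ":" in self.host else self.host
        return f"http://{host}:{self.port}"


def opener_for(proxy: Optional[ProxyConfig]) -> urllib.request.OpenerDirector:
    """Build an opener whose HTTP and HTTPS requests all go through `proxy`."""
    if proxy is None:
        # An empty mapping disables proxying, including environment settings.
        return urllib.request.build_opener(urllib.request.ProxyHandler({}))
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy.url, "https": proxy.url})
    )
```

Consider calling `opener_for(ProxyConfig("proxy.internal", 3128)).open(...)` with a target URL, where `proxy.internal` is a placeholder hostname. The request would go to `proxy.internal:3128`, which forwards it to the target. Because the proxy is set once on the opener, every request made through it is routed the same way.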