Service Types
AI Gateway supports different types of services that help you integrate and manage various AI model endpoints. The two main service types are:
- Upstream Services
  - Direct connection to backend AI models, allowing you to integrate your own hosted models and AI services
  - Load-balanced distribution across multiple target endpoints to optimize performance and resource utilization
  - Built-in health checking to monitor service availability and performance
  - Automatic failover to maintain high availability when issues occur
- Proxy Services
  - Proxying of requests to external AI providers such as OpenAI, Anthropic, and others
  - Authentication and rate limiting to control access and usage
  - Request and response transformation to modify payloads as needed
  - Response caching where possible to improve performance and reduce costs
Upstream Services
An upstream represents a virtual hostname that can be used to load balance incoming requests across multiple services (targets). In AI Gateway, upstreams define where and how requests are forwarded to backend AI services.

Upstream Structure
An upstream consists of:
- Name: A unique identifier for the upstream
- Algorithm: Load balancing algorithm to use
- Targets: List of backend services
- Health Checks: Configuration for monitoring target health
- Proxy: Optional configuration for proxy settings
- Tags: Optional metadata tags to organize upstreams
- Websocket: Optional WebSocket behavior configuration for upstream connections
- Embedding Config: Required only when using the `semantic` algorithm, to enable semantic load balancing
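
As a rough illustration of the structure listed above, the sketch below models an upstream as a Python dataclass. The field names, the dictionary shapes, and the `round-robin` algorithm name are assumptions made for this example, not the gateway's actual schema; only the `semantic` algorithm and its embedding requirement come from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Upstream:
    """Illustrative model of an upstream (field names are assumed)."""
    name: str                                   # unique identifier
    algorithm: str                              # load balancing algorithm
    targets: List[Dict[str, Any]] = field(default_factory=list)
    health_checks: Optional[Dict[str, Any]] = None
    proxy: Optional[Dict[str, Any]] = None      # optional proxy settings
    tags: List[str] = field(default_factory=list)
    websocket: Optional[Dict[str, Any]] = None  # optional WebSocket behavior
    embedding_config: Optional[Dict[str, Any]] = None

    def validate(self) -> None:
        """Raise ValueError if the upstream breaks a rule from the list above."""
        if not self.name:
            raise ValueError("name is required")
        # The embedding config is only required for the semantic algorithm.
        if self.algorithm == "semantic" and self.embedding_config is None:
            raise ValueError(
                "embedding_config is required for the semantic algorithm"
            )


# Hypothetical usage: a host/port target behind an assumed algorithm name.
pool = Upstream(
    name="llm-pool",
    algorithm="round-robin",
    targets=[{"host": "10.0.0.1", "port": 8080}],
)
pool.validate()
```

Calling `validate()` on an upstream that uses `semantic` without an embedding config raises a `ValueError` in this sketch; a real gateway may report the problem differently.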
Health Checking
Upstreams support two types of health checks that work in tandem to ensure optimal system reliability and performance:

Active Health Checks
Active health checking proactively tests upstream targets. The gateway sends test requests to each target at a regular, configurable interval, so potential issues can be detected before they affect user traffic. The checks can be customized to match your service requirements.

When health checks fail, the target is automatically removed from the active pool, so requests are no longer routed to it. Removal is governed by configurable thresholds: how many failed checks must occur before a target is considered unhealthy, and how many successful checks are required before it can rejoin the active pool. Tune these thresholds to your reliability requirements and tolerance for false positives.

Passive Health Checks
Passive health checking monitors targets using the results of actual traffic. Unlike active checks, it analyzes real user requests, tracking the failures and successes of each request to build a picture of target health under real-world conditions.

The passive system gradually reintroduces previously failed targets into the rotation, which prevents sudden traffic spikes to recovering targets. It also provides automatic circuit breaking: when certain error thresholds are exceeded, the circuit breaker trips and temporarily removes the affected target from the pool to prevent cascading failures and allow time for recovery.

Together, active and passive health checks catch problems through both proactive testing and real-world usage patterns.

Target Management
Targets represent the actual backend instances that handle incoming requests. The management of these targets is crucial for maintaining a reliable and efficient system. Let’s explore the key aspects of target management:

1. Target Properties
Each target requires a host and port that define its network location. These connection parameters let the gateway establish connections to the backend service.

The weight property controls how traffic is distributed. By assigning different weights to targets, administrators can influence how much traffic each target receives, for example to account for varying server capacities.

Targets can also be configured as provider-based instead of host/port endpoints. A provider-based target specifies a `provider`, `models`, and a `default_model` along with `credentials`. In this mode:
- `host`/`port` must not be set
- `credentials` are required
- at least one entry in `models` is required
- `default_model` is required and must be one of the `models`

Targets can also configure streaming (`stream`) and insecure SSL (`insecure_ssl`) behavior where appropriate.
Comprehensive health check settings for each target enable customized monitoring approaches. These settings can be tailored to match the specific characteristics and requirements of each backend service, ensuring accurate health assessment and appropriate response to service degradation.
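
The rules for provider-based targets described above can be sketched as a small validation function. This is an illustration only: representing a target as a plain dictionary, the error messages, and the example provider, model, and credential values are all assumptions, and the gateway's own validation may behave differently.

```python
from typing import Any, Dict, List


def validate_target(target: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations (an empty list means the target is valid)."""
    errors: List[str] = []
    if "provider" in target:
        # Provider-based mode: host/port are not allowed.
        if "host" in target or "port" in target:
            errors.append("host/port must not be set on a provider-based target")
        if not target.get("credentials"):
            errors.append("credentials are required")
        models = target.get("models") or []
        if not models:
            errors.append("at least one entry in models is required")
        default_model = target.get("default_model")
        if not default_model:
            errors.append("default_model is required")
        elif default_model not in models:
            errors.append("default_model must be one of models")
    else:
        # Host/port mode: both connection parameters are needed.
        if not target.get("host") or target.get("port") is None:
            errors.append("host and port are required on a host/port target")
    return errors


# Hypothetical provider-based target; all values are placeholders.
provider_target = {
    "provider": "openai",
    "models": ["model-a", "model-b"],
    "default_model": "model-a",
    "credentials": {"api_key": "placeholder"},
}
print(validate_target(provider_target))
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once, which is a design choice of this sketch.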
2. Target States
Target states represent the operational status of backend instances and determine how they participate in request handling.

A target in the Healthy state actively receives traffic and participates fully in the load balancing rotation. These targets have passed all health checks and are operating within expected parameters, making them eligible to handle incoming requests.

When a target fails health checks or exhibits problematic behavior, it enters an Unhealthy state and is automatically removed from the pool. This state change prevents new requests from being routed to problematic targets while the underlying issues are investigated and resolved, protecting overall system stability.

The Draining state represents a graceful transition phase where a target is preparing to be removed from service. In this state, the target continues to process existing requests but doesn’t accept new ones, ensuring smooth maintenance operations and preventing request interruption.

Targets can also be Disabled through manual intervention, allowing administrators to explicitly remove targets from the active pool. This state is useful for planned maintenance, testing, or when specific targets need to be temporarily excluded from service without affecting the overall system operation.

Proxy Configuration
The proxy configuration allows you to route traffic to upstream services through an intermediary proxy server. This can be useful in various scenarios:

1. Network Architecture
The proxy configuration enables sophisticated network topology management by allowing traffic to flow through designated proxy servers. This approach can help maintain proper network segmentation and security boundaries, ensuring that direct connections between clients and backend services are mediated through controlled channels. It’s particularly valuable in complex enterprise environments where direct access to backend services may be restricted by network policies or security requirements.

2. Security Enhancement
Using a proxy provides an additional security layer between clients and backend services. The proxy can implement various security measures such as:
- Traffic inspection and filtering
- Access control enforcement
- Protection against direct attacks on backend services
- Masking of internal network architecture
3. Configuration Properties
The proxy configuration consists of these essential properties:
- Host: The hostname or IP address of the proxy server that will handle the traffic to upstream targets. This defines the network location where requests will be forwarded before reaching the actual backend services.
- Port: The port number on which the proxy server is listening for incoming connections. This completes the address information needed to establish a connection to the proxy.
- Protocol: The protocol used to communicate with the proxy (e.g., `http` or `https`).
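
The three properties above can be combined into a single proxy address. The sketch below is a minimal illustration; the `ProxyConfig` class, its field names, and the restriction to `http` and `https` are assumptions for this example (the text above lists those two protocols only as examples), and the proxy hostname is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyConfig:
    """Illustrative proxy settings (class and field names are assumed)."""
    host: str      # hostname or IP address of the proxy server
    port: int      # port the proxy listens on
    protocol: str  # protocol used to reach the proxy

    def url(self) -> str:
        """Build the proxy URL, rejecting obviously invalid values."""
        # Assumption: only http and https are accepted in this sketch.
        if self.protocol not in ("http", "https"):
            raise ValueError(f"unsupported protocol: {self.protocol}")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")
        if not self.host:
            raise ValueError("host is required")
        return f"{self.protocol}://{self.host}:{self.port}"


# Hypothetical proxy address.
proxy = ProxyConfig(host="proxy.internal", port=3128, protocol="http")
print(proxy.url())
```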
Best Practices
- Load Balancing
  - Choose appropriate algorithms
  - Set proper target weights
  - Configure connection limits
  - Plan for scaling
- Health Checks
  - Enable both check types
  - Set appropriate thresholds
  - Configure check intervals
  - Define recovery behavior
- Target Management
  - Maintain adequate capacity
  - Plan for failover
  - Consider geographic distribution
  - Document target roles
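
To make the threshold and weight practices above more concrete, the following sketch tracks target health with failure and success thresholds and picks a target by weight among the healthy ones. It is an illustration under stated assumptions: the class and function names, the default threshold values, and the choice to count consecutive results are all hypothetical, and the gateway's own algorithms may differ.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TargetHealth:
    """Threshold-based health state for one target (names and defaults assumed)."""
    unhealthy_threshold: int = 3  # consecutive failures before removal
    healthy_threshold: int = 2    # consecutive successes before rejoining
    healthy: bool = True
    failures: int = 0
    successes: int = 0

    def record(self, success: bool) -> bool:
        """Record one check or request result and return the current state."""
        if success:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


def pick_target(
    weights: Dict[str, int], health: Dict[str, TargetHealth]
) -> Optional[str]:
    """Pick a healthy target at random, proportionally to its weight."""
    candidates = [
        name for name, weight in weights.items()
        if weight > 0 and health[name].healthy
    ]
    if not candidates:
        return None  # no healthy target is available
    return random.choices(
        candidates, weights=[weights[name] for name in candidates], k=1
    )[0]
```

Lower thresholds react faster but are more prone to false positives, while higher ones tolerate transient errors at the cost of slower detection.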