Service Types

AI Gateway supports different types of services that help you integrate and manage various AI model endpoints. The two main service types are:

Upstream Services

These services provide direct connections to your backend AI models and infrastructure. They offer robust features including:

Direct connection to backend AI models, allowing you to integrate your own hosted models and AI services
Load balanced distribution across multiple target endpoints to optimize performance and resource utilization
Built-in health checking capabilities to monitor service availability and performance
Automatic failover support to maintain high availability when issues occur

Proxy Services

These services act as intermediaries to external AI providers, adding important management capabilities:

Proxying of requests to external AI providers such as OpenAI and Anthropic
Authentication and rate limiting to control access and usage
Request and response transformation to modify payloads as needed
Response caching, where applicable, to improve performance and reduce costs

Upstream Services

An upstream represents a virtual hostname that can be used to load balance incoming requests across multiple services (targets). In AI Gateway, upstreams define where and how requests are forwarded to backend AI services.

Upstream Structure

An upstream consists of:

Name: A unique identifier for the upstream
Algorithm: Load balancing algorithm to use
Targets: List of backend services
Health Checks: Configuration for monitoring target health
Proxy: Optional configuration for proxy settings
Tags: Optional metadata tags to organize upstreams
Websocket: Optional WebSocket behavior configuration for upstream connections
Embedding Config: Required only when using the semantic algorithm, to enable semantic load balancing
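
As a rough illustration of how these pieces fit together, the sketch below models an upstream as a Python data class and checks the one cross-field rule stated above (an embedding config is needed only for the semantic algorithm). The class, its field spellings, and the "round-robin" algorithm name are assumptions made for this example and are not taken from the gateway's schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Upstream:
    # Field names mirror the list above; their exact spelling is an assumption.
    name: str
    algorithm: str
    targets: list = field(default_factory=list)
    health_checks: Optional[dict] = None
    proxy: Optional[dict] = None
    tags: list = field(default_factory=list)
    websocket: Optional[dict] = None
    embedding_config: Optional[dict] = None

    def validate(self) -> list:
        """Return a list of human-readable problems (empty when valid)."""
        problems = []
        if not self.name:
            problems.append("name is required")
        # The embedding config is only mandatory for the semantic algorithm.
        if self.algorithm == "semantic" and self.embedding_config is None:
            problems.append("embedding_config is required for the semantic algorithm")
        return problems
```

Calling `validate()` on `Upstream(name="llm-pool", algorithm="semantic")` reports the missing embedding config, while the same upstream with a non-semantic algorithm passes.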

Health Checking

Upstreams support two types of health checks that work in tandem to ensure optimal system reliability and performance:

Active Health Checks

Active health checking proactively tests upstream targets. The gateway sends test requests to each target at regular intervals, which allows issues to be detected before they affect user traffic. These checks are configurable and can be adapted to your service requirements.

When health checks fail, the gateway automatically removes the target from the active pool so that requests are no longer routed to it. Removal is governed by configurable thresholds: one determines how many failed checks must occur before a target is considered unhealthy, and another determines how many successful checks are required before it can rejoin the active pool. Tune these thresholds to your reliability requirements and tolerance for false positives.
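
The threshold behavior for active checks can be sketched as a small state tracker. This is a minimal illustration, not the gateway's implementation: the parameter names `unhealthy_threshold` and `healthy_threshold` are hypothetical, and the sketch assumes that thresholds count consecutive probe results.

```python
class ActiveHealthTracker:
    """Tracks probe results for one target.

    Parameter names are hypothetical, not the gateway's configuration keys.
    Assumes thresholds apply to consecutive results.
    """

    def __init__(self, unhealthy_threshold: int = 3, healthy_threshold: int = 2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._failures = 0
        self._successes = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the target's current health."""
        if probe_ok:
            self._successes += 1
            self._failures = 0
            # Rejoin the pool only after enough successful probes in a row.
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            # Leave the pool only after enough failed probes in a row.
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

With `unhealthy_threshold=3` and `healthy_threshold=2`, a target stays in the pool through two failed probes, leaves on the third, and returns after two successful probes in a row.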

Passive Health Checks

Passive health checking monitors targets using the results of real traffic. Unlike active checks, it relies on real user requests rather than test requests: the gateway tracks the failures and successes of each request to build a picture of target health under real-world conditions.

Passive checking reintroduces previously failed targets gradually, which prevents sudden traffic spikes on recovering targets and helps ensure stability during recovery. It also provides automatic circuit breaking: when error thresholds are exceeded, the circuit breaker trips and temporarily removes the affected target from the pool, which prevents cascading failures and allows time for recovery.

Used together, active and passive health checks catch problems both through proactive testing and through real-world usage patterns.
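
The circuit-breaking behavior described for passive checks can be sketched as follows. This is a simplified illustration under stated assumptions: the names `max_failures` and `cooldown_seconds` are hypothetical, failures are counted consecutively, and gradual recovery is reduced to letting traffic through again once a cooldown has elapsed.

```python
class PassiveCircuitBreaker:
    """Trips after repeated request failures observed on real traffic.

    `max_failures` and `cooldown_seconds` are hypothetical names.
    """

    def __init__(self, max_failures: int = 5, cooldown_seconds: float = 30.0):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self._failures = 0
        self._opened_at = None  # time the breaker tripped, or None when closed

    def record(self, success: bool, now: float) -> None:
        """Feed the outcome of one real request, observed at time `now`."""
        if success:
            # A success closes the breaker and resets the failure count.
            self._failures = 0
            self._opened_at = None
        else:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = now

    def allows_traffic(self, now: float) -> bool:
        """True when the target may receive requests at time `now`."""
        if self._opened_at is None:
            return True
        # After the cooldown, let traffic through again to test recovery.
        return now - self._opened_at >= self.cooldown_seconds
```

Passing the clock value in as `now` keeps the sketch deterministic. A real implementation would typically read a monotonic clock and limit how much traffic a recovering target receives.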

Target Management

Targets represent the actual backend instances that handle incoming requests. The management of these targets is crucial for maintaining a reliable and efficient system. Let’s explore the key aspects of target management:

1. Target Properties

Configure targets either as network endpoints (protocol/host/port) or as provider-based targets. Key properties include:

Network endpoint (when not using a provider):
- protocol: Request protocol to upstream (e.g., http, https, ws, wss).
- host and port: Destination host and TCP port.
- path: Optional base path to prefix to all forwarded requests.
- headers: Static headers to append to every request sent to this target.
- insecure_ssl: Skip TLS certificate verification for https/wss when true.
Provider-based (virtual) targets:
- provider: External AI provider identifier (e.g., openai, anthropic, etc.).
- provider_options: Free-form options required by the provider (JSON object).
- models and default_model: Allowed models and the default selection.
- credentials: Secret material used for provider authentication or templating.
- In this mode, do not set host/port.
Load balancing and behavior:
- weight: Relative weight for distributing requests to this target.
- stream: Enable streaming behavior to the upstream when supported.
- tags and description: Optional metadata for organizing and documenting targets.
Authentication:
- auth: Optional authentication block. See “Target Authentication (OAuth2)” below.

Comprehensive health check settings can be applied per target (see Health Checking above) to tailor monitoring and recovery behavior to each backend service.
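
To illustrate what a relative weight means in practice, the sketch below computes the expected share of traffic per target and picks a target at random in proportion to its weight. Weighted random selection is only one possible strategy and is used here as an assumption; the gateway's actual algorithms may distribute requests differently. The host names are placeholders.

```python
import random


def traffic_shares(targets: list) -> dict:
    """Expected fraction of requests per target, proportional to weight."""
    total = sum(t["weight"] for t in targets)
    return {t["host"]: t["weight"] / total for t in targets}


def pick_target(targets: list, rng: random.Random) -> dict:
    """Pick one target at random, with probability proportional to weight."""
    weights = [t["weight"] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]
```

For two targets with weights 3 and 1, `traffic_shares` yields 0.75 and 0.25, so the first target is expected to receive about three quarters of the requests.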

2. Target Authentication (OAuth2)

Targets support OAuth2 for acquiring and attaching access tokens to upstream requests. Configure by setting auth.type to oauth2 and providing the oauth options. Unless otherwise configured, tokens are added as Authorization: Bearer <token>. Required and optional fields:

token_url (required): OAuth2 token endpoint.
grant_type (required): One of client_credentials, authorization_code, password, or refresh_token.
client_id, client_secret: Client credentials as required by the grant.
use_basic_auth: When true, send client credentials via HTTP Basic Auth; otherwise send in the request body.
scopes: List of scopes to request.
audience: Audience parameter when required by the authorization server.
code, redirect_uri, code_verifier: Used for authorization_code grant (PKCE supported when code_verifier is present).
username, password: Used for password (Resource Owner Password) grant.
refresh_token: Used for refresh_token grant or to refresh expired tokens.
extra: Additional form parameters to include in the token request.

Example: Client Credentials grant

{
  "targets": [
    {
      "protocol": "https",
      "host": "api.example.com",
      "port": 443,
      "path": "/v1",
      "auth": {
        "type": "oauth2",
        "oauth": {
          "token_url": "https://auth.example.com/oauth/token",
          "grant_type": "client_credentials",
          "client_id": "your-client-id",
          "client_secret": "your-client-secret",
          "scopes": ["read", "write"],
          "use_basic_auth": true
        }
      }
    }
  ]
}

Example: Authorization Code with PKCE

{
  "auth": {
    "type": "oauth2",
    "oauth": {
      "token_url": "https://auth.example.com/oauth/token",
      "grant_type": "authorization_code",
      "code": "<authorization_code>",
      "redirect_uri": "https://your.app/callback",
      "client_id": "your-client-id",
      "code_verifier": "<pkce-code-verifier>",
      "scopes": ["openid", "profile"]
    }
  }
}

Example: Resource Owner Password Credentials

{
  "auth": {
    "type": "oauth2",
    "oauth": {
      "token_url": "https://auth.example.com/oauth/token",
      "grant_type": "password",
      "client_id": "your-client-id",
      "client_secret": "your-client-secret",
      "username": "alice",
      "password": "s3cret",
      "scopes": ["read"]
    }
  }
}

Notes:

When use_basic_auth is true and client credentials are provided, the gateway sends them to the token endpoint in an HTTP Basic Authorization header rather than in the request body.
If refresh_token is provided or returned by the authorization server, it can be used to obtain new access tokens when previous ones expire.
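
The effect of use_basic_auth on the token request can be sketched in Python. The helper below is hypothetical and not part of the gateway; it follows the standard OAuth2 token request format and reuses the field names documented above to show where the client credentials end up.

```python
import base64


def build_token_request(oauth: dict) -> tuple:
    """Return (headers, form) for an OAuth2 token request.

    `oauth` uses the field names documented above; the function itself is
    an illustrative helper, not part of the gateway.
    """
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    form = {"grant_type": oauth["grant_type"]}

    if oauth.get("scopes"):
        form["scope"] = " ".join(oauth["scopes"])  # space-delimited per RFC 6749
    if oauth.get("audience"):
        form["audience"] = oauth["audience"]
    # Grant-specific parameters are copied only when present.
    for key in ("code", "redirect_uri", "code_verifier",
                "username", "password", "refresh_token"):
        if oauth.get(key):
            form[key] = oauth[key]
    form.update(oauth.get("extra", {}))

    client_id = oauth.get("client_id")
    client_secret = oauth.get("client_secret", "")
    if oauth.get("use_basic_auth") and client_id:
        # RFC 6749 additionally form-encodes each part; omitted for brevity.
        raw = f"{client_id}:{client_secret}".encode()
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode()
    elif client_id:
        # Without Basic Auth, credentials travel in the request body.
        form["client_id"] = client_id
        if client_secret:
            form["client_secret"] = client_secret
    return headers, form
```

Applied to the Client Credentials example above, the credentials appear only in the Authorization header and the form body contains just grant_type and scope.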

3. Target States

Target states represent the operational status of backend instances and determine how they participate in request handling:

Healthy: The target has passed its health checks, receives traffic, and participates fully in the load balancing rotation.
Unhealthy: The target has failed health checks or exhibited problematic behavior and is automatically removed from the pool. No new requests are routed to it while the underlying issue is investigated and resolved.
Draining: The target is being gracefully removed from service. It continues to process existing requests but doesn’t accept new ones, which avoids interrupting requests during removal or maintenance.
Disabled: An administrator has manually removed the target from the active pool. This is useful for planned maintenance, testing, or temporarily excluding a target without affecting the rest of the system.
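
The routing consequence of each state can be summarized in a short sketch. The enum values and function names are hypothetical; the only rule encoded is the one described above, namely that only healthy targets receive new requests.

```python
from enum import Enum


class TargetState(Enum):
    # String values are illustrative, not the gateway's identifiers.
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DRAINING = "draining"
    DISABLED = "disabled"


def accepts_new_requests(state: TargetState) -> bool:
    """Only healthy targets are eligible for new requests."""
    return state is TargetState.HEALTHY


def eligible_targets(states: dict) -> list:
    """Names of targets that may be chosen by the load balancer."""
    return [name for name, state in states.items() if accepts_new_requests(state)]
```

A draining target is excluded from `eligible_targets` even though it is still finishing the requests it already accepted.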

Proxy Configuration

The proxy configuration allows you to route traffic to upstream services through an intermediary proxy server. This can be useful in various scenarios:

1. Network Architecture

The proxy configuration enables sophisticated network topology management by allowing traffic to flow through designated proxy servers. This approach can help maintain proper network segmentation and security boundaries, ensuring that direct connections between clients and backend services are mediated through controlled channels. It’s particularly valuable in complex enterprise environments where direct access to backend services may be restricted by network policies or security requirements.

2. Security Enhancement

Using a proxy provides an additional security layer between clients and backend services. The proxy can implement various security measures such as:

Traffic inspection and filtering
Access control enforcement
Protection against direct attacks on backend services
Masking of internal network architecture

3. Configuration Properties

The proxy configuration consists of these essential properties:

Host: The hostname or IP address of the proxy server that will handle the traffic to upstream targets. This defines the network location where requests will be forwarded before reaching the actual backend services.
Port: The port number on which the proxy server is listening for incoming connections. This completes the address information needed to establish a connection to the proxy.
Protocol: The protocol used to communicate with the proxy (e.g., http or https).

When configured, all traffic to the upstream’s targets will be routed through this proxy, allowing for centralized control and management of the communication flow.
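
As a client-side analogy, the three properties combine into a single proxy address. The sketch below uses Python's standard library to build that address and an opener that routes requests through it. The dictionary keys and the host name are assumptions for illustration; the gateway applies its own proxy settings internally.

```python
import urllib.request


def proxy_url(proxy: dict) -> str:
    """Build a proxy URL from the protocol, host, and port properties."""
    return f"{proxy['protocol']}://{proxy['host']}:{proxy['port']}"


def opener_via_proxy(proxy: dict) -> urllib.request.OpenerDirector:
    """Create an opener that sends http and https requests through the proxy."""
    url = proxy_url(proxy)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)
```

For a proxy with protocol http, host proxy.internal, and port 3128, `proxy_url` returns `http://proxy.internal:3128`.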

Best Practices

Load Balancing

Choose appropriate algorithms
Set proper target weights
Configure connection limits
Plan for scaling

Health Checks

Enable both check types
Set appropriate thresholds
Configure check intervals
Define recovery behavior

Target Management

Maintain adequate capacity
Plan for failover
Consider geographic distribution
Document target roles
