The Upstream object defines the targets to which the AI Gateway routes requests, along with their related configuration. It specifies the load balancing algorithm, health checks, credential settings, and other properties needed to manage one or more backend services or providers.

The sections below list the fields of the Upstream object with a brief explanation of each.

Fields Reference

Top-Level Fields

  • algorithm (string) Specifies the load balancing algorithm used to distribute requests among multiple targets. Values such as round_robin or least_conn may be available, depending on which algorithms the gateway supports.

  • health_checks (object) Defines active or passive health checks for monitoring the availability of upstream targets.

  • name (string) A human-friendly name for this upstream configuration, often used for clarity or logging.

  • tags (array of strings) An optional array of metadata tags. Useful for organizing, filtering, or searching among multiple upstream configurations.

  • targets (array of objects) Lists one or more backend endpoints (targets) that the gateway routes requests to. Each target contains its own configuration and credentials.
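
To make the shape of the object concrete, here is a minimal sketch of an Upstream definition built as a Python dictionary and serialized to JSON. Only the field names come from this reference; every value (the name, tags, host, interval, and so on) is a made-up placeholder, and the nested health_checks and targets fields are described in the sections that follow.

```python
import json

# Hypothetical values throughout; only the field names are taken from
# the field reference. Adjust everything to your own environment.
upstream = {
    "name": "example-upstream",
    "algorithm": "round_robin",
    "tags": ["example", "staging"],
    "health_checks": {
        "path": "/health",
        "interval": 10,       # seconds between probes
        "threshold": 3,       # consecutive failures/successes
        "passive": True,
        "headers": {"X-Health-Check": "1"},
    },
    "targets": [
        {
            "host": "api.example.com",
            "port": 443,
            "protocol": "https",
            "weight": 1,
        },
    ],
}

# Serialize the definition, for example to send it to an admin API.
payload = json.dumps(upstream, indent=2)
print(payload)
```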


Health Checks

The health_checks object configures how the gateway monitors the health of its upstream targets.

  • headers (object) Key-value pairs representing custom headers to send during health check probes.

  • interval (number) The interval (in seconds) between consecutive health check probes.

  • passive (boolean) When set to true, enables passive health checks in addition to active checks. Passive checks rely on real request/response interactions, marking targets unhealthy if errors are detected.

  • path (string) The URL path used by active health checks to test the availability of upstream targets (for example, /health).

  • threshold (number) The number of consecutive failures required to mark a target as unhealthy (or successes required to restore it to a healthy state).
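
The threshold semantics described above can be sketched as a small state tracker. This is an illustration of the general "N consecutive results flip the state" idea, not the gateway's actual implementation; the class name TargetHealth and its record method are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Tracks one target's health from a stream of probe results.

    Hypothetical helper: it only illustrates how a consecutive-result
    threshold can work, not how the gateway implements it.
    """

    threshold: int
    healthy: bool = True
    failures: int = 0   # consecutive failed probes
    successes: int = 0  # consecutive successful probes

    def record(self, ok: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if ok:
            self.successes += 1
            self.failures = 0
            # An unhealthy target recovers after `threshold` successes in a row.
            if not self.healthy and self.successes >= self.threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            # A healthy target is marked down after `threshold` failures in a row.
            if self.healthy and self.failures >= self.threshold:
                self.healthy = False
        return self.healthy
```

With threshold set to 2, a single failed probe leaves the target healthy, a second consecutive failure marks it unhealthy, and two consecutive successes restore it.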


Targets

Each object in the targets array represents an individual endpoint or provider. You can have multiple targets to enable load balancing or to support different services under a single upstream definition.

  • host (string) The hostname or IP address of the target server (for example, api.example.com).

  • port (number) The TCP port on which the target is listening.

  • protocol (string) The protocol used to communicate with the target (e.g., http, https, grpc).

  • path (string) The path appended to the base URL when routing requests to this target.

  • provider (string) An optional identifier to denote the provider type or environment (e.g., aws, azure, gcp, or a custom provider name).

  • priority (number) A numerical priority, which can influence how load balancing is handled, depending on the chosen algorithm.

  • weight (number) For load balancing methods that support weights, this value affects how many requests are sent to the target relative to others.

  • headers (object) Additional headers to include in requests sent to this specific target.

  • tags (array of strings) Optional array of tags for metadata or organizational purposes.

  • default_model (string) If applicable, specifies a default model (e.g., for AI or ML deployments) used when routing requests to this target.

  • models (array of strings) Lists model identifiers supported by this target (e.g., AI models or deployment versions).
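
As a rough illustration of how the target fields fit together, the sketch below builds a request URL from protocol, host, port, and path, and picks a model from models and default_model. Both helper functions are hypothetical, and the model-selection rule (use the requested model only if the target lists it, otherwise fall back to default_model when no model is requested) is an assumption rather than documented gateway behavior.

```python
from typing import Optional


def target_url(target: dict) -> str:
    """Combine protocol, host, port, and path into a single URL."""
    path = target.get("path", "")
    # Normalize the path so it always starts with a slash when present.
    if path and not path.startswith("/"):
        path = "/" + path
    return f"{target['protocol']}://{target['host']}:{target['port']}{path}"


def resolve_model(target: dict, requested: Optional[str] = None) -> Optional[str]:
    """Pick the model to use for this target (assumed rule, see lead-in)."""
    if requested is None:
        # No model requested: fall back to the target's default, if any.
        return target.get("default_model")
    # A requested model is honored only if the target lists it.
    return requested if requested in target.get("models", []) else None


# Hypothetical target definition for demonstration.
example_target = {
    "protocol": "https",
    "host": "api.example.com",
    "port": 443,
    "path": "/v1/chat",
    "default_model": "model-a",
    "models": ["model-a", "model-b"],
}
```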


Credentials

The credentials object within each target controls how authentication and related parameters are handled.

  • allow_override (boolean) Indicates whether these credentials can be overridden by other configuration scopes.

  • aws_access_key_id (string) AWS access key ID for authentication (if using AWS-based services).

  • aws_secret_access_key (string) AWS secret access key for authentication (if using AWS-based services).

  • azure_client_id (string) Azure client ID for authentication.

  • azure_client_secret (string) Azure client secret for authentication.

  • azure_tenant_id (string) ID of the Azure tenant (directory) against which the client authenticates.

  • azure_use_managed_identity (boolean) If true, use Azure’s managed identity instead of explicit credentials.

  • gcp_service_account_json (string) A JSON string with Google Cloud service account credentials.

  • gcp_use_service_account (boolean) If true, the gateway uses GCP service accounts for authentication instead of explicit credentials.

  • header_name (string) The HTTP header name that will carry credentials or tokens.

  • header_value (string) The HTTP header value (e.g., a token) to be used.

  • param_location (string) Indicates where the authentication parameter is placed (e.g., query, body, or path).

  • param_name (string) Name of the parameter used for authentication.

  • param_value (string) Value of the parameter used for authentication.
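
The header and parameter fields can be pictured as instructions for decorating an outgoing request. The following sketch shows one plausible interpretation: header_name and header_value become a request header, and a parameter with param_location set to query is appended to the URL's query string. The function apply_credentials is hypothetical, only the query location is handled, and the gateway's real behavior may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_credentials(url: str, headers: dict, credentials: dict) -> tuple:
    """Return a (url, headers) pair with the credentials applied.

    Hypothetical helper: covers header credentials and query-string
    parameters only; body and path locations are not modeled.
    """
    headers = dict(headers)  # copy so the caller's dict is not mutated

    name = credentials.get("header_name")
    value = credentials.get("header_value")
    if name and value is not None:
        headers[name] = value

    if credentials.get("param_location") == "query" and credentials.get("param_name"):
        parts = urlsplit(url)
        # Preserve any existing query parameters, then append the new one.
        query = parse_qsl(parts.query, keep_blank_values=True)
        query.append((credentials["param_name"], credentials.get("param_value", "")))
        url = urlunsplit(parts._replace(query=urlencode(query)))

    return url, headers
```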


Usage Notes

Multiple Targets, One Upstream

You can define multiple targets to distribute traffic among different hosts or providers, all managed under the same upstream configuration.

Load Balancing

The algorithm field, together with each target's weight and priority, determines how traffic is distributed across targets.
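
To give a feel for what weight means, here is a deliberately naive weighted round-robin sketch: each target appears in the rotation as many times as its weight. It is an illustration only, not the gateway's algorithm; the function weighted_cycle is hypothetical, weights are assumed to be non-negative integers, and priority is not modeled.

```python
import itertools
from typing import Iterator


def weighted_cycle(targets: list) -> Iterator[dict]:
    """Yield targets forever, repeating each one `weight` times per round.

    Naive illustration: a production balancer would typically interleave
    targets more smoothly and skip unhealthy ones.
    """
    expanded = [t for t in targets for _ in range(int(t.get("weight", 1)))]
    if not expanded:
        raise ValueError("at least one target needs a positive weight")
    return itertools.cycle(expanded)


# Hypothetical targets: the first should receive three requests
# for every one request sent to the second.
rotation = weighted_cycle([
    {"host": "a.example.com", "weight": 3},
    {"host": "b.example.com", "weight": 1},
])
first_round = [next(rotation)["host"] for _ in range(4)]
```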

Health Checks

Properly configured health_checks help the gateway route requests only to targets it currently considers healthy.

Credentials Management

Sensitive credentials should be stored securely (e.g., via a vault or environment variables). The fields shown here indicate where the gateway expects credentials, not how they must be stored.
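
One way to keep secrets out of the configuration source is to read them from the environment when the credentials object is assembled. The sketch below assumes a token stored in an environment variable named UPSTREAM_API_TOKEN and sent as a Bearer token in the Authorization header; the variable name, the function name, and the Bearer scheme are all illustrative assumptions.

```python
import os
from typing import Mapping


def credentials_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build a credentials object from an environment mapping.

    Hypothetical helper: UPSTREAM_API_TOKEN is a made-up variable name.
    A missing variable raises KeyError, so a misconfigured deployment
    fails fast instead of sending an empty token.
    """
    token = env["UPSTREAM_API_TOKEN"]
    return {
        "header_name": "Authorization",
        "header_value": f"Bearer {token}",
    }
```

Passing the mapping as a parameter keeps the function easy to exercise with a plain dictionary instead of the real process environment.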

Tags

Use tags to organize or categorize upstream objects and targets; this is especially helpful in large-scale environments.