The two objects
Upstream
A single LLM target the gateway can call: an LLM provider, a self-hosted model, or a custom backend that fronts an LLM. Holds the URL, credentials, timeouts, retries, and health check.
Route
The match condition (path, method, model, headers, body fields) plus everything that happens when a request matches — upstream selection, load-balancing strategy, plugin chain, transformations, and attached policies.
What a route binds
A route is the coupling point for everything that makes a request AI-aware. When a request matches, the route tells the gateway:

| Binding | What it controls |
|---|---|
| Upstream(s) | One or many — a single provider, a weighted pool, a failover chain. |
| Load-balancing strategy | Round-robin, weighted, least-connections, random, or semantic. See Load balancing. |
| Plugin chain | Which plugins run pre-request and post-response (Prompt Guard, PII, moderation, tool guards, rate limits, etc.). Plugins feed detection signals to the policy engine. |
| Transformations | Path rewrites, header injection / stripping, payload normalization between provider dialects. |
| Timeouts & retries | Per-route timeouts and retry budgets for the selected upstream pool. |
| Policies | The policy set that evaluates detections and chooses Allow / Log / Mask / Block. Policies can also attach at the app or global level — see Policy scope. |
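To make the bindings above concrete, here is one route written out as a plain data structure. This is an illustrative sketch only: the field names (`match`, `upstreams`, `plugins`, and so on) are hypothetical and do not correspond to any particular gateway's configuration schema.

```python
# A single route bundling every binding from the table above.
# All field names are illustrative, not a real gateway schema.
route = {
    "name": "chat-default",
    # Match condition: what makes a request hit this route.
    "match": {"path_prefix": "/v1/chat/completions", "method": "POST"},
    # Upstream pool with a weighted 90/10 split.
    "upstreams": [
        {"name": "openai-primary", "weight": 90},
        {"name": "azure-fallback", "weight": 10},
    ],
    "load_balancing": "weighted",
    # Plugin chain: runs pre-request / post-response, emits detections.
    "plugins": ["prompt-guard", "pii-detector", "rate-limit"],
    # Transformations applied before the request reaches the upstream.
    "transformations": {
        "strip_headers": ["cookie"],
        "inject_headers": {"x-env": "prod"},
    },
    "timeout_s": 30,
    "retries": 2,
    # Policy set that turns detections into Allow / Log / Mask / Block.
    "policies": ["default-masking-policy"],
}
```

Everything a matched request needs — where it goes, what runs around it, and what rules judge it — hangs off this one object.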
Match conditions
A route can match on any combination of:

- Path — exact, prefix, or glob (`/v1/chat/completions`, `/openai/*`, `/v1/embeddings`).
- Method — `GET`, `POST`, `PUT`, `DELETE`, etc.
- Headers — `X-Tenant-Id: acme`, `Authorization`, content-type, custom tags.
- Query parameters — `?team=support`, feature flags.
- Body fields — OpenAI-style fields like `model`, `stream`, `tools[]`, or any JSON path into the payload.
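A minimal sketch of how these conditions combine — every condition present on the route must hold for the request to match. The `route` and `request` shapes here are assumptions for illustration, not a real gateway API:

```python
from fnmatch import fnmatch


def matches(route, request):
    """Return True if the request satisfies every match condition on the route.
    Route/request shapes are illustrative, not a real gateway API."""
    m = route["match"]
    # Path: glob-style match covers exact, prefix (/openai/*), and glob cases.
    if "path" in m and not fnmatch(request["path"], m["path"]):
        return False
    if "method" in m and request["method"] != m["method"]:
        return False
    # Headers: every required header must be present with the right value.
    for name, value in m.get("headers", {}).items():
        if request.get("headers", {}).get(name) != value:
            return False
    # Body fields: e.g. match on the OpenAI-style `model` field.
    for field, value in m.get("body", {}).items():
        if request.get("body", {}).get(field) != value:
            return False
    return True


gpt4o_route = {"match": {"path": "/v1/chat/completions", "method": "POST",
                         "body": {"model": "gpt-4o"}}}
req = {"path": "/v1/chat/completions", "method": "POST",
       "headers": {}, "body": {"model": "gpt-4o", "stream": True}}
```

Here `matches(gpt4o_route, req)` is true; the same request with a different `model` would fall through to the next route.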
Matching order
Routes are evaluated top-down. The first route that matches wins. Use priority or explicit ordering to disambiguate overlapping routes. A typical priority stack looks like this:

- Specific model override — e.g. “all `model = gpt-4o` requests go to the Azure pool”.
- Per-tenant route — “tenant A uses EU upstreams”.
- Per-path route — `/openai/*` goes to the OpenAI upstream set.
- Catch-all — everything else falls through to a default pool (or is rejected).
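The first-match-wins evaluation can be sketched in a few lines. The route list and predicate shapes are hypothetical; the point is the ordering, which mirrors the priority stack above:

```python
def select_route(routes, request):
    """Top-down, first-match-wins route selection (sketch).
    `routes` is assumed pre-sorted by priority, most specific first."""
    for route in routes:
        if route["when"](request):
            return route["name"]
    return None  # no catch-all configured: reject


routes = [
    {"name": "gpt4o-azure", "when": lambda r: r.get("model") == "gpt-4o"},
    {"name": "tenant-a-eu", "when": lambda r: r.get("tenant") == "A"},
    {"name": "openai-path", "when": lambda r: r.get("path", "").startswith("/openai/")},
    {"name": "catch-all",   "when": lambda r: True},
]
```

Note that a request with both `model = gpt-4o` and `tenant = A` hits the model override, not the tenant route — ordering, not specificity scoring, decides.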
Typical patterns
| Pattern | Match on | Forwards to | Use case |
|---|---|---|---|
| Per-provider | Path prefix (/openai/*, /anthropic/*) | One upstream per provider | Clean abstraction when each provider is called explicitly. |
| Per-model | Body field model | Different upstreams per model family | Same client path, different upstreams per model. |
| Per-tenant | Header (X-Tenant-Id) or API key | Different upstream or pool per tenant | Quota, cost, or data-residency isolation. |
| Per-region | Header or client IP | EU / US / APAC upstream pools | Data-residency or latency optimization. |
| Canary | Header or % of traffic | Weighted pool (90 / 10) | Roll out a new provider or model gradually. |
| Shadow | Same match as primary | Primary plus a secondary upstream, response discarded | Eval / A/B testing without affecting clients. |
| Model fallback | Any chat request | Primary upstream with failover to a secondary | Survive a provider incident without downtime. |
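The canary pattern above hinges on weighted upstream selection. A minimal sketch, assuming a pool of `(name, weight)` pairs — real gateways expose this as configuration rather than code:

```python
import random


def pick_upstream(pool, rng=random):
    """Pick one upstream from a weighted pool (canary sketch).
    With weights 90/10, roughly 10% of traffic hits the canary."""
    total = sum(weight for _, weight in pool)
    point = rng.uniform(0, total)
    for upstream, weight in pool:
        point -= weight
        if point <= 0:
            return upstream
    return pool[-1][0]  # guard against float edge cases


canary_pool = [("current-provider", 90), ("new-provider", 10)]
```

Shifting the split (90/10 → 50/50 → 0/100) is then a config change on the route, invisible to clients.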
Request transformations
Routes can rewrite the request before it hits the upstream. This is how a single client contract keeps working while the actual providers behind it change:

- Path rewrite — `/openai/chat` on the client becomes `/v1/chat/completions` on the upstream.
- Header injection — add provider-specific auth (`Authorization: Bearer …`) or tracing headers (`traceparent`).
- Header stripping — remove client headers that shouldn’t leak upstream.
- Dialect normalization — translate between provider payload shapes (OpenAI ↔ Anthropic ↔ Bedrock) so your client only ever speaks one format.
- Payload shaping — inject a system prompt, trim message history, or clamp parameters like `temperature` / `max_tokens`.
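Several of these transformations composed on one request might look like the sketch below. The request shape and the clamp limits are illustrative assumptions, and the bearer token is a placeholder, not a real credential:

```python
def transform(request):
    """Apply per-route request transformations (sketch; names illustrative)."""
    # Path rewrite: client-facing path -> provider path.
    if request["path"] == "/openai/chat":
        request["path"] = "/v1/chat/completions"
    # Header stripping: don't leak client-side headers upstream.
    request["headers"].pop("cookie", None)
    # Header injection: provider-specific auth (placeholder token).
    request["headers"]["authorization"] = "Bearer <provider-key>"
    # Payload shaping: clamp parameters to route-level limits (assumed caps).
    body = request["body"]
    body["temperature"] = min(body.get("temperature", 1.0), 1.0)
    body["max_tokens"] = min(body.get("max_tokens", 1024), 4096)
    return request
```

The client keeps sending the same shape it always has; the route absorbs every provider-side difference.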
Relationship to Runtime Security
The AI Gateway’s routes handle routing and plugin wiring. Security decisions — Allow / Log / Mask / Block — are the responsibility of the Runtime Security layer, typically attached through the Gateway enforcement surface. The handoff looks like this:

- Route matches → plugin chain runs → detections emitted.
- Runtime Security policies evaluate `Where` / `When` against the detections. Then the `action` is applied — the request is allowed, logged, masked, or blocked.
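The handoff can be sketched as a pure function: detections in, one action out. The policy shape and the most-restrictive-action-wins rule are assumptions for illustration, not the documented evaluation semantics:

```python
def evaluate(detections, policies):
    """Map plugin detections to a single action (sketch).
    Assumes the most restrictive matching action wins."""
    severity = {"allow": 0, "log": 1, "mask": 2, "block": 3}
    action = "allow"  # default when nothing matches
    for policy in policies:
        if policy["detection"] in detections:
            if severity[policy["action"]] > severity[action]:
                action = policy["action"]
    return action


policies = [
    {"detection": "pii", "action": "mask"},
    {"detection": "prompt-injection", "action": "block"},
]
```

A request that trips both the PII and prompt-injection detectors is blocked, since block outranks mask; one that trips neither passes through untouched.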