Routing is how the NeuralTrust AI Gateway decides where a request goes and which plugins run along the way. It is configured with two objects: Upstreams (where traffic can go) and Routes (the rules that decide when to send it there).

The two objects

Upstream

A single LLM target the gateway can call: an LLM provider, a self-hosted model, or a custom backend that fronts an LLM. Holds the URL, credentials, timeouts, retries, and health check.

Route

The match condition (path, method, model, headers, body fields) plus everything that happens when a request matches — upstream selection, load-balancing strategy, plugin chain, transformations, and attached policies.
A route can forward to a single upstream or to a pool of upstreams with a load-balancing strategy. Retries, failover, and health checks are configured on the route itself, so the same upstream can behave differently on different routes.
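As a rough sketch of how the two objects relate, the following models an Upstream and a Route as plain dataclasses. All field names here are illustrative assumptions, not the actual NeuralTrust configuration schema:

```python
from dataclasses import dataclass, field

@dataclass
class Upstream:
    """A single LLM target: URL, credentials, timeout, health check."""
    name: str
    url: str
    api_key_ref: str            # reference to stored credentials (hypothetical)
    timeout_s: float = 30.0
    health_check_path: str = "/health"

@dataclass
class Route:
    """Match condition plus everything that happens on a match."""
    name: str
    match: dict                 # e.g. {"path_prefix": "/v1/chat"}
    upstreams: list             # one upstream or a pool
    strategy: str = "round-robin"
    plugins: list = field(default_factory=list)
    retries: int = 2

# The same upstream can appear on several routes with different retry
# and strategy settings, so its behavior is route-specific.
openai = Upstream("openai", "https://api.openai.com", api_key_ref="openai-key")
chat_route = Route("chat", {"path_prefix": "/v1/chat"}, [openai], retries=3)
```

Note that retries live on the Route, not the Upstream, which is what lets one upstream behave differently on different routes.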

What a route binds

A route is the coupling point for everything that makes a request AI-aware. When a request matches, the route tells the gateway:
  • Upstream(s) — one or many: a single provider, a weighted pool, a failover chain.
  • Load-balancing strategy — round-robin, weighted, least-connections, random, or semantic. See Load balancing.
  • Plugin chain — which plugins run pre-request and post-response (Prompt Guard, PII, moderation, tool guards, rate limits, etc.). Plugins feed detection signals to the policy engine.
  • Transformations — path rewrites, header injection / stripping, payload normalization between provider dialects.
  • Timeouts & retries — per-route timeouts and retry budgets for the selected upstream pool.
  • Policies — the policy set that evaluates detections and chooses Allow / Log / Mask / Block. Policies can also attach at the app or global level — see Policy scope.

Match conditions

A route can match on any combination of:
  • Path — exact, prefix, or glob (/v1/chat/completions, /openai/*, /v1/embeddings).
  • Method — GET, POST, PUT, DELETE, etc.
  • Headers — X-Tenant-Id: acme, Authorization, content-type, custom tags.
  • Query parameters — ?team=support, feature flags.
  • Body fields — OpenAI-style fields like model, stream, tools[], or any JSON path into the payload.
The body-field match is what makes routing AI-aware: the same HTTP path can fork to different upstreams depending on which model the client asked for, whether the request contains tool calls, whether it’s streaming, or what message role is involved.
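A minimal illustration of such a matcher, assuming dict-shaped requests and routes (the gateway's real match syntax may differ — this only shows how path, method, header, and body-field conditions combine):

```python
import fnmatch

def route_matches(route: dict, request: dict) -> bool:
    """Return True when every configured condition holds (AND semantics)."""
    cond = route["match"]
    # Path: glob match, so "/v1/*" also covers "/v1/chat/completions"
    if "path" in cond and not fnmatch.fnmatch(request["path"], cond["path"]):
        return False
    if "method" in cond and request["method"] not in cond["method"]:
        return False
    # Headers and body fields: exact-value checks for simplicity
    for name, want in cond.get("headers", {}).items():
        if request.get("headers", {}).get(name) != want:
            return False
    for fld, want in cond.get("body", {}).items():
        if request.get("body", {}).get(fld) != want:
            return False
    return True

# The same HTTP path forks on the body field "model":
gpt4o_route = {"match": {"path": "/v1/*", "method": ["POST"],
                         "body": {"model": "gpt-4o"}}}
req = {"path": "/v1/chat/completions", "method": "POST",
       "headers": {"X-Tenant-Id": "acme"},
       "body": {"model": "gpt-4o", "stream": True}}
```

Here `gpt4o_route` matches `req` only because the body's `model` field agrees; a `gpt-3.5-turbo` request on the same path would fall through to another route.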

Matching order

Routes are evaluated top-down. The first route that matches wins. Use priority or explicit ordering to disambiguate overlapping routes. A typical priority stack looks like this:
  1. Specific model override — e.g. “all model = gpt-4o requests go to the Azure pool”.
  2. Per-tenant route — “tenant A uses EU upstreams”.
  3. Per-path route — /openai/* goes to the OpenAI upstream set.
  4. Catch-all — everything else falls through to a default pool (or is rejected).
Put the narrowest route first and the catch-all last. If two routes could match the same request, give the one that should win a higher priority.
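The priority stack above can be sketched as a first-match-wins scan over a priority-ordered list. Route names, priorities, and the lambda-based match predicates are all illustrative:

```python
def select_route(routes, request):
    """Evaluate routes in priority order; the first match wins."""
    for route in sorted(routes, key=lambda r: r["priority"]):
        if route["match"](request):
            return route["name"]
    return None  # no route matched: nothing downstream runs

routes = [
    {"name": "gpt4o-azure", "priority": 1,            # specific model override
     "match": lambda r: r.get("model") == "gpt-4o"},
    {"name": "tenant-a-eu", "priority": 2,            # per-tenant route
     "match": lambda r: r.get("tenant") == "A"},
    {"name": "openai-path", "priority": 3,            # per-path route
     "match": lambda r: r.get("path", "").startswith("/openai/")},
    {"name": "catch-all", "priority": 99,             # default pool
     "match": lambda r: True},
]
```

A gpt-4o request from tenant A matches both of the first two routes, but the model override wins because it sorts first — which is exactly why overlapping routes need explicit priorities.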

Typical patterns

  • Per-provider — matches a path prefix (/openai/*, /anthropic/*); forwards to one upstream per provider. Clean abstraction when each provider is called explicitly.
  • Per-model — matches the body field model; forwards to different upstreams per model family. Same client path, different upstreams per model.
  • Per-tenant — matches a header (X-Tenant-Id) or API key; forwards to a different upstream or pool per tenant. Quota, cost, or data-residency isolation.
  • Per-region — matches a header or client IP; forwards to EU / US / APAC upstream pools. Data-residency or latency optimization.
  • Canary — matches a header or a % of traffic; forwards to a weighted pool (90 / 10). Roll out a new provider or model gradually.
  • Shadow — same match as the primary; forwards to the primary plus a secondary upstream whose response is discarded. Eval / A/B testing without affecting clients.
  • Model fallback — matches any chat request; forwards to a primary upstream with failover to a secondary. Survive a provider incident without downtime.
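The canary pattern's weighted pool can be sketched as weighted random selection. This is an illustration of the 90 / 10 split, not the gateway's actual load-balancer implementation:

```python
import random

def pick_upstream(pool, rng=random):
    """Pick an upstream from (name, weight) pairs, proportional to weight."""
    total = sum(weight for _, weight in pool)
    roll = rng.uniform(0, total)
    for upstream, weight in pool:
        roll -= weight
        if roll <= 0:
            return upstream
    return pool[-1][0]  # guard against float rounding at the boundary

# 90% of traffic stays on the stable provider, 10% tries the canary.
pool = [("stable-provider", 90), ("canary-provider", 10)]
```

Over many requests the split converges to the configured weights, so the canary can be promoted by gradually shifting weight rather than flipping a switch.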

Request transformations

Routes can rewrite the request before it hits the upstream. This is how a single client contract keeps working while the actual providers behind it change:
  • Path rewrite — /openai/chat on the client becomes /v1/chat/completions on the upstream.
  • Header injection — add provider-specific auth (Authorization: Bearer …) or tracing headers (traceparent).
  • Header stripping — remove client headers that shouldn’t leak upstream.
  • Dialect normalization — translate between provider payload shapes (OpenAI ↔ Anthropic ↔ Bedrock) so your client only ever speaks one format.
  • Payload shaping — inject a system prompt, trim message history, or clamp parameters like temperature / max_tokens.
Response-side transformations work symmetrically — the gateway can strip provider-specific fields or rewrite streaming deltas on the way back to the client.
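A sketch of the request-side transformations above — path rewrite, header injection and stripping, parameter clamping, and system-prompt injection. The header names, clamp limits, and prompt text are assumptions for illustration, not the gateway's defaults:

```python
def transform_request(req: dict, provider_key: str = "sk-example") -> dict:
    """Rewrite a client request into the shape the upstream expects."""
    out = {"path": req["path"],
           "headers": dict(req.get("headers", {})),
           "body": dict(req.get("body", {}))}
    # Path rewrite: client-facing path -> provider path
    if out["path"] == "/openai/chat":
        out["path"] = "/v1/chat/completions"
    # Header injection (provider auth) and stripping (internal headers)
    out["headers"]["Authorization"] = f"Bearer {provider_key}"
    out["headers"].pop("X-Internal-Debug", None)
    # Payload shaping: clamp temperature, inject a system prompt
    if out["body"].get("temperature", 0.0) > 1.0:
        out["body"]["temperature"] = 1.0
    out["body"]["messages"] = ([{"role": "system", "content": "Be concise."}]
                               + list(out["body"].get("messages", [])))
    return out
```

Because the client contract is rewritten at the route, the providers behind it can change without any client-side code change.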

Relationship to Runtime Security

The AI Gateway’s routes handle routing and plugin wiring. Security decisions — Allow / Log / Mask / Block — are the responsibility of the Runtime Security layer, typically attached through the Gateway enforcement surface. The handoff looks like this:
  1. Route matches → plugin chain runs → detections emitted.
  2. Runtime Security policies evaluate Where / When against the detections.
  3. The resulting action is applied — the request is allowed, logged, masked, or blocked.
In other words, routing precedes enforcement. If a request never matches a route, nothing downstream runs — so the default-deny posture for new paths is “add a route that blocks or forwards to a safe upstream”, not “hope no one finds the URL”. The same pipeline applies whether the Gateway surface sits on top of the NeuralTrust AI Gateway or a third-party AI gateway.
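The three-step handoff can be sketched like this; the plugin and policy interfaces are hypothetical stand-ins for the real plugin chain and Runtime Security policy engine:

```python
def handle(request, route):
    """Route matched -> plugins emit detections -> policy picks the action."""
    detections = []
    for plugin in route["plugins"]:        # 1. plugin chain runs
        detections.extend(plugin(request))
    action = route["policy"](detections)   # 2. policies evaluate detections
    if action == "Block":                  # 3. action is applied
        return {"status": 403, "action": action}
    return {"status": 200, "action": action}

# Toy detector and policy: flag PII, block when it is present.
pii_plugin = lambda req: ["pii"] if "ssn" in req["body"] else []
block_on_pii = lambda detections: "Block" if "pii" in detections else "Allow"
route = {"plugins": [pii_plugin], "policy": block_on_pii}
```

Note that `handle` is only reached when a route matched — an unmatched request never runs plugins or policies, which is the routing-precedes-enforcement point above.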