The two objects
Upstream
A single LLM target the gateway can call: an LLM provider, a self-hosted model, or a custom backend that fronts an LLM. Holds the URL, credentials, timeouts, retries, and health check.
Route
The match condition (path, method, model, headers, body fields) plus everything that happens when a request matches — upstream selection, load-balancing strategy, plugin chain, transformations, and attached policies.
What a route binds
A route is the coupling point for everything that makes a request AI-aware. When a request matches, the route tells the gateway:

| Binding | What it controls |
|---|---|
| Upstream(s) | One or many — a single provider, a weighted pool, a failover chain. |
| Load-balancing strategy | Round-robin, weighted, least-connections, random, or semantic. See Load balancing. |
| Plugin chain | Which plugins run pre-request and post-response (Prompt Guard, PII, moderation, tool guards, rate limits, etc.). Plugins feed detection signals to the policy engine. |
| Transformations | Path rewrites, header injection / stripping, payload normalization between provider dialects. |
| Timeouts & retries | Per-route timeouts and retry budgets for the selected upstream pool. |
| Policies | The policy set that evaluates detections and chooses Allow / Log / Mask / Block. Policies can also attach at the app or global level — see Policy scope. |
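To make the bindings above concrete, here is one route written out as a plain data structure. This is an illustrative sketch only: the field names (`match`, `upstreams`, `plugins`, and so on) are hypothetical and do not correspond to any particular gateway's configuration schema.

```python
# A single route bundling every binding from the table above.
# All field names are illustrative, not a real gateway schema.
route = {
    "name": "chat-default",
    # Match condition: what makes a request hit this route.
    "match": {"path_prefix": "/v1/chat/completions", "method": "POST"},
    # Upstream pool with a weighted 90/10 split.
    "upstreams": [
        {"name": "openai-primary", "weight": 90},
        {"name": "azure-fallback", "weight": 10},
    ],
    "load_balancing": "weighted",
    # Plugin chain: runs pre-request / post-response, emits detections.
    "plugins": ["prompt-guard", "pii-detector", "rate-limit"],
    # Transformations applied before the request reaches the upstream.
    "transformations": {
        "strip_headers": ["cookie"],
        "inject_headers": {"x-env": "prod"},
    },
    "timeout_s": 30,
    "retries": 2,
    # Policy set that turns detections into Allow / Log / Mask / Block.
    "policies": ["default-masking-policy"],
}
```

Everything a matched request needs — where it goes, what runs around it, and what rules judge it — hangs off this one object.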
Match conditions
A route can match on any combination of:

- Path — exact, prefix, or glob (`/v1/chat/completions`, `/openai/*`, `/v1/embeddings`).
- Method — `GET`, `POST`, `PUT`, `DELETE`, etc.
- Headers — `X-Tenant-Id: acme`, `Authorization`, content-type, custom tags.
- Query parameters — `?team=support`, feature flags.
- Body fields — OpenAI-style fields like `model`, `stream`, `tools[]`, or any JSON path into the payload.
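A minimal sketch of how these conditions combine — every condition present on the route must hold for the request to match. The `route` and `request` shapes here are assumptions for illustration, not a real gateway API:

```python
from fnmatch import fnmatch


def matches(route, request):
    """Return True if the request satisfies every match condition on the route.
    Route/request shapes are illustrative, not a real gateway API."""
    m = route["match"]
    # Path: glob-style match covers exact, prefix (/openai/*), and glob cases.
    if "path" in m and not fnmatch(request["path"], m["path"]):
        return False
    if "method" in m and request["method"] != m["method"]:
        return False
    # Headers: every required header must be present with the right value.
    for name, value in m.get("headers", {}).items():
        if request.get("headers", {}).get(name) != value:
            return False
    # Body fields: e.g. match on the OpenAI-style `model` field.
    for field, value in m.get("body", {}).items():
        if request.get("body", {}).get(field) != value:
            return False
    return True


gpt4o_route = {"match": {"path": "/v1/chat/completions", "method": "POST",
                         "body": {"model": "gpt-4o"}}}
req = {"path": "/v1/chat/completions", "method": "POST",
       "headers": {}, "body": {"model": "gpt-4o", "stream": True}}
```

Here `matches(gpt4o_route, req)` is true; the same request with a different `model` would fall through to the next route.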
Matching order
Routes are evaluated top-down. The first route that matches wins. Use priority or explicit ordering to disambiguate overlapping routes. A typical priority stack looks like this:

- Specific model override — e.g. “all `model = gpt-4o` requests go to the Azure pool”.
- Per-tenant route — “tenant A uses EU upstreams”.
- Per-path route — `/openai/*` goes to the OpenAI upstream set.
- Catch-all — everything else falls through to a default pool (or is rejected).
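The first-match-wins evaluation can be sketched in a few lines. The route list and predicate shapes are hypothetical; the point is the ordering, which mirrors the priority stack above:

```python
def select_route(routes, request):
    """Top-down, first-match-wins route selection (sketch).
    `routes` is assumed pre-sorted by priority, most specific first."""
    for route in routes:
        if route["when"](request):
            return route["name"]
    return None  # no catch-all configured: reject


routes = [
    {"name": "gpt4o-azure", "when": lambda r: r.get("model") == "gpt-4o"},
    {"name": "tenant-a-eu", "when": lambda r: r.get("tenant") == "A"},
    {"name": "openai-path", "when": lambda r: r.get("path", "").startswith("/openai/")},
    {"name": "catch-all",   "when": lambda r: True},
]
```

Note that a request with both `model = gpt-4o` and `tenant = A` hits the model override, not the tenant route — ordering, not specificity scoring, decides.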
Typical patterns
| Pattern | Match on | Forwards to | Use case |
|---|---|---|---|
| Per-provider | Path prefix (/openai/*, /anthropic/*) | One upstream per provider | Clean abstraction when each provider is called explicitly. |
| Per-model | Body field model | Different upstreams per model family | Same client path, different upstreams per model. |
| Per-tenant | Header (X-Tenant-Id) or API key | Different upstream or pool per tenant | Quota, cost, or data-residency isolation. |
| Per-region | Header or client IP | EU / US / APAC upstream pools | Data-residency or latency optimization. |
| Canary | Header or % of traffic | Weighted pool (90 / 10) | Roll out a new provider or model gradually. |
| Shadow | Same match as primary | Primary plus a secondary upstream, response discarded | Eval / A/B testing without affecting clients. |
| Model fallback | Any chat request | Primary upstream with failover to a secondary | Survive a provider incident without downtime. |
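The canary pattern above hinges on weighted upstream selection. A minimal sketch, assuming a pool of `(name, weight)` pairs — real gateways expose this as configuration rather than code:

```python
import random


def pick_upstream(pool, rng=random):
    """Pick one upstream from a weighted pool (canary sketch).
    With weights 90/10, roughly 10% of traffic hits the canary."""
    total = sum(weight for _, weight in pool)
    point = rng.uniform(0, total)
    for upstream, weight in pool:
        point -= weight
        if point <= 0:
            return upstream
    return pool[-1][0]  # guard against float edge cases


canary_pool = [("current-provider", 90), ("new-provider", 10)]
```

Shifting the split (90/10 → 50/50 → 0/100) is then a config change on the route, invisible to clients.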
Request transformations
Routes can rewrite the request before it hits the upstream. This is how a single client contract keeps working while the actual providers behind it change:

- Path rewrite — `/openai/chat` on the client becomes `/v1/chat/completions` on the upstream.
- Header injection — add provider-specific auth (`Authorization: Bearer …`) or tracing headers (`traceparent`).
- Header stripping — remove client headers that shouldn’t leak upstream.
- Dialect normalization — translate between provider payload shapes (OpenAI ↔ Anthropic ↔ Bedrock) so your client only ever speaks one format.
- Payload shaping — inject a system prompt, trim message history, or clamp parameters like `temperature` / `max_tokens`.
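Several of these transformations composed on one request might look like the sketch below. The request shape and the clamp limits are illustrative assumptions, and the bearer token is a placeholder, not a real credential:

```python
def transform(request):
    """Apply per-route request transformations (sketch; names illustrative)."""
    # Path rewrite: client-facing path -> provider path.
    if request["path"] == "/openai/chat":
        request["path"] = "/v1/chat/completions"
    # Header stripping: don't leak client-side headers upstream.
    request["headers"].pop("cookie", None)
    # Header injection: provider-specific auth (placeholder token).
    request["headers"]["authorization"] = "Bearer <provider-key>"
    # Payload shaping: clamp parameters to route-level limits (assumed caps).
    body = request["body"]
    body["temperature"] = min(body.get("temperature", 1.0), 1.0)
    body["max_tokens"] = min(body.get("max_tokens", 1024), 4096)
    return request
```

The client keeps sending the same shape it always has; the route absorbs every provider-side difference.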
Relationship to Runtime Security
The AI Gateway’s routes handle routing and plugin wiring. Security decisions — Allow / Log / Mask / Block — are the responsibility of the Runtime Security layer, typically attached through the Gateway enforcement surface. The handoff looks like this:

- Route matches → plugin chain runs → detections emitted.
- Runtime Security policies evaluate `Where` / `When` against the detections. Then the `action` is applied — the request is allowed, logged, masked, or blocked.
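The handoff can be sketched as a pure function: detections in, one action out. The policy shape and the most-restrictive-action-wins rule are assumptions for illustration, not the documented evaluation semantics:

```python
def evaluate(detections, policies):
    """Map plugin detections to a single action (sketch).
    Assumes the most restrictive matching action wins."""
    severity = {"allow": 0, "log": 1, "mask": 2, "block": 3}
    action = "allow"  # default when nothing matches
    for policy in policies:
        if policy["detection"] in detections:
            if severity[policy["action"]] > severity[action]:
                action = policy["action"]
    return action


policies = [
    {"detection": "pii", "action": "mask"},
    {"detection": "prompt-injection", "action": "block"},
]
```

A request that trips both the PII and prompt-injection detectors is blocked, since block outranks mask; one that trips neither passes through untouched.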