The NeuralTrust AI Gateway is NeuralTrust’s first-party, AI-aware gateway. It sits in front of LLMs — applications and agents send requests to a gateway, and the gateway forwards them to the right upstream (an LLM provider, a self-hosted model, an embedding / moderation API, or a custom LLM-fronting backend) while handling routing, load balancing, and traffic control.
The AI Gateway does not sit in front of MCP servers. MCP and tool security is handled separately by the Runtime Security layer — see Agent & MCP security.
Think of it as an API gateway, but AI-aware: it understands prompts, responses, OpenAI-style chat payloads, streaming, tool calls, and embeddings — not just HTTP verbs and paths.
How you get an AI Gateway in the platform: you don't provision a separate product — you create a Gateway integration (Integrations → Add Integration → Gateway) and add Routes to it. When a route points directly at an LLM provider or a self-hosted model, that Gateway is acting as an embedded AI Gateway: the same hop is both the Gateway enforcement surface and the AI gateway. When a route points at a third-party AI gateway or a custom backend instead, the Gateway is only a security layer on top — the underlying component keeps doing its own routing.

What an AI gateway adds on top of an API gateway

| A plain API gateway gives you… | The AI Gateway additionally gives you… |
| --- | --- |
| Path- and header-based routing | Routing by model, prompt contents, or tool call |
| Fixed rate limits by IP or key | Token-aware rate limits (prompt + completion tokens) |
| Request / response logging | Prompt, response, and tool-call capture for observability and replay |
| Basic auth and mTLS | Provider-specific credentials per upstream, normalized across providers |
| Size limits on bodies | Size and shape limits aware of LLM payloads (messages, attachments, embeddings) |
| Backend health checks | Health checks plus semantic load balancing across providers and models |

What the AI Gateway does

Routing

Matches requests to the right LLM upstream (provider, self-hosted model, or custom LLM-fronting backend) using path, method, headers, model name, or body fields. See Routes & forwarding.
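
The matching described above can be sketched as a first-match route table. This is a minimal illustration under assumed field names (`path_prefix`, `model`, `header`, `upstream`) — not the product's actual configuration schema:

```python
# First-match routing sketch. Every criterion a route declares must match;
# the first route that passes all its checks wins. Field names are hypothetical.
def match_route(request, routes):
    """Return the first route whose declared criteria all match the request."""
    for route in routes:
        if "path_prefix" in route and not request["path"].startswith(route["path_prefix"]):
            continue
        if "model" in route and request["body"].get("model") != route["model"]:
            continue
        if "header" in route:
            name, value = route["header"]
            if request["headers"].get(name) != value:
                continue
        return route
    return None

routes = [
    {"path_prefix": "/v1/chat", "model": "gpt-4o", "upstream": "openai"},
    {"path_prefix": "/v1/chat", "upstream": "self-hosted-vllm"},  # catch-all for chat
]
request = {"path": "/v1/chat/completions", "headers": {}, "body": {"model": "gpt-4o"}}
print(match_route(request, routes)["upstream"])  # → openai
```

Order matters: the more specific model-pinned route must come before the catch-all, mirroring the top-down evaluation described under "How a request flows".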

Load balancing

Distributes traffic across providers or replicas with weights, health checks, failover, and semantic routing. See Load balancing.
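
Weighted, health-aware selection can be sketched as follows (upstream names and the `healthy` flag are illustrative; in the real gateway health state comes from the configured health checks):

```python
import random

# Weighted, health-aware upstream selection sketch: unhealthy upstreams are
# excluded, and traffic splits across the rest in proportion to their weights.
def pick_upstream(upstreams, rng=random):
    healthy = [u for u in upstreams if u["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy upstream available; failover required")
    total = sum(u["weight"] for u in healthy)
    point = rng.uniform(0, total)
    for u in healthy:
        point -= u["weight"]
        if point <= 0:
            return u
    return healthy[-1]

upstreams = [
    {"name": "openai", "weight": 80, "healthy": True},
    {"name": "anthropic", "weight": 20, "healthy": True},
    {"name": "vllm-replica", "weight": 50, "healthy": False},  # never selected
]
print(pick_upstream(upstreams)["name"])  # "openai" ~80% of the time
```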

Traffic control

Rate limits, token caps, size limits, bot detection, and anomaly detection protect both your app and the upstream provider. See Traffic control.
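
What makes these limits token-aware rather than request-count-based can be sketched as a budget that reserves worst-case tokens up front and refunds the unused part after the response. This is an illustrative simplification (fixed window, single budget), not the gateway's actual algorithm:

```python
# Token-aware rate limiting sketch: the budget is spent in tokens
# (prompt + completion), not in request counts.
class TokenBudget:
    def __init__(self, tokens_per_window):
        self.limit = tokens_per_window
        self.used = 0

    def allow(self, prompt_tokens, max_completion_tokens):
        # Reserve the worst case up front so a long generation can't overshoot.
        cost = prompt_tokens + max_completion_tokens
        if self.used + cost > self.limit:
            return False
        self.used += cost
        return True

    def reconcile(self, reserved_completion, actual_completion):
        # Refund the unused part of the reservation once the response arrives.
        self.used -= max(0, reserved_completion - actual_completion)

budget = TokenBudget(tokens_per_window=10_000)
assert budget.allow(prompt_tokens=1_200, max_completion_tokens=2_000)
budget.reconcile(reserved_completion=2_000, actual_completion=350)
print(budget.used)  # 1550
```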

Interception hooks

Every request and response runs through a plugin chain so upstream selection, transformations, and — when Runtime Security is attached — detections and policies can fire with AI-specific context.
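
The plugin chain can be sketched as two ordered hook lists around the forward call — pre-request plugins may rewrite or block the request, post-response plugins may rewrite the response. The plugin names here (`token_limit`, `mask_pii`) are stand-ins for real detectors, not the platform's plugin API:

```python
# Plugin-chain sketch: pre-request hooks run before forwarding and may block;
# post-response hooks run on the upstream's reply before it reaches the client.
def run_chain(request, pre_plugins, post_plugins, forward):
    for plugin in pre_plugins:
        request = plugin(request)
        if request.get("blocked"):
            return {"status": 403, "reason": request["blocked"]}
    response = forward(request)
    for plugin in post_plugins:
        response = plugin(response)
    return response

def token_limit(req):
    if len(req["prompt"].split()) > 50:       # toy limit for illustration
        req["blocked"] = "token limit exceeded"
    return req

def mask_pii(resp):
    resp["text"] = resp["text"].replace("alice@example.com", "[EMAIL]")
    return resp

echo = lambda req: {"status": 200, "text": f"echo: {req['prompt']} alice@example.com"}
result = run_chain({"prompt": "hello"}, [token_limit], [mask_pii], echo)
print(result["text"])  # → echo: hello [EMAIL]
```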

How a request flows

A single request through the AI Gateway goes through these phases. When the Runtime Security layer is placed in front of (or composed with) the gateway, its detectors and policies plug into the pre-request and post-response hooks below.
  1. Receive — the gateway accepts the client request (HTTP, streaming, or WebSocket).
  2. Match route — routes are evaluated top-down; the first matching route binds the request to one or more upstreams plus the plugin chain to run.
  3. Pre-request plugins — input-side plugins run: transformations, rate limits, token limits, custom plugins, and — when Runtime Security is active — Prompt Guard, PII detection, content moderation, URL analyzer, tool permission, application-security checks, etc.
  4. Transform & forward — path rewrites, header injection, and provider-dialect normalization are applied, and the request is forwarded to the selected upstream (with optional load balancing, retries, and failover).
  5. Post-response plugins — output-side plugins run on the response (or on each streaming chunk): response moderation, PII masking on generations, tool-selection validation, toxicity detection, etc.
  6. Emit — the (possibly rewritten) response is returned to the client, and telemetry — prompts, responses, latencies, token counts, and any detections — is emitted to observability.
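
Step 5's per-chunk behavior can be sketched as a generator that applies an output-side check to each streaming chunk before it is forwarded. The SSN-style regex is a stand-in for real PII detectors:

```python
import re

# Per-chunk post-response sketch: each streaming chunk passes through the
# output-side plugin before reaching the client.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def stream_with_masking(chunks):
    for chunk in chunks:
        yield SSN.sub("[REDACTED]", chunk)

upstream_chunks = ["The customer's SSN is ", "123-45-6789", " on file."]
print("".join(stream_with_masking(upstream_chunks)))
```

Note one simplification: a real streaming detector must buffer across chunk boundaries, since a pattern can be split between two chunks; here the sensitive value happens to arrive in one piece.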

Anatomy

The AI Gateway exposes two routing primitives on top of the gateway instance:
| Primitive | Role |
| --- | --- |
| Gateway | The deployment boundary — the gateway instance (or cluster) that receives traffic. |
| Upstream | An LLM target the gateway can forward to: a provider, a self-hosted model, or a custom backend that fronts an LLM. Holds URL, credentials, timeouts, and health checks. |
| Route | The match-and-forward configuration (path, method, model, headers, body fields) that sends a request to one or more upstreams, with optional load balancing and the plugin chain that should run. |
See Routes & forwarding for the matching model and examples.
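
The relationship between the three primitives can be sketched as data types. Field names are illustrative, not the platform's actual configuration schema:

```python
from dataclasses import dataclass, field

# Anatomy sketch: a Gateway holds Routes; each Route forwards to one or more
# Upstreams. Field names are hypothetical illustrations.
@dataclass
class Upstream:
    name: str
    url: str
    credentials_ref: str           # reference to stored provider credentials
    timeout_s: float = 30.0
    health_check_path: str = "/health"

@dataclass
class Route:
    path_prefix: str
    upstreams: list                # one or more Upstream targets
    model: str = ""                # optional model-based match ("" = any)
    plugins: list = field(default_factory=list)

@dataclass
class Gateway:
    name: str
    routes: list

gw = Gateway(
    name="prod-ai-gateway",
    routes=[Route(path_prefix="/v1/chat",
                  upstreams=[Upstream("openai", "https://api.openai.com/v1",
                                      credentials_ref="openai-key")],
                  model="gpt-4o")],
)
print(gw.routes[0].upstreams[0].name)  # → openai
```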

What can be an upstream

The gateway’s upstreams are LLM targets. A single gateway can sit in front of many of them, so clients are not locked into one vendor. Upstreams fall into these categories:
  • LLM providers (direct) — OpenAI, Anthropic, Azure OpenAI, Google Vertex, AWS Bedrock, Cohere, Mistral, Fireworks, Together, Groq, and others.
  • Self-hosted models — vLLM, Ollama, TGI, Triton, or any OpenAI-compatible endpoint.
  • Embedding and moderation APIs — OpenAI embeddings, Voyage, Cohere rerank, OpenAI / Azure moderation.
  • Custom / internal LLM-fronting backends — your own REST or gRPC services that terminate in an LLM call.
MCP servers are not routed through the AI Gateway. Tool-calling and MCP governance are handled by the Runtime Security layer at the policy and detector level — the AI Gateway only carries the LLM request/response that references the tools. See Agent & MCP security.

Embedded vs pure-security mode

Because the NeuralTrust AI Gateway and the Gateway enforcement surface are the same gateway integration in the platform, the way you describe a deployment depends on what its routes point at.
| Route upstream | Mode | What the gateway is doing |
| --- | --- | --- |
| LLM provider (OpenAI, Anthropic, Azure OpenAI, Bedrock, …) | Embedded AI Gateway | Acts as the AI gateway itself: routing, load balancing, traffic control, plus all the detections, masking, and policy decisions of the Gateway surface. One hop, both jobs. |
| Self-hosted model (vLLM, Ollama, TGI, …) | Embedded AI Gateway | Same as above, fronting your own model. |
| Third-party AI gateway | Pure security layer | The third-party AI gateway keeps doing its own routing below; this gateway only adds detections, masking, and policy decisions on top. |
| Custom client backend | Pure security layer | A backend the customer already operates keeps doing whatever it does; this gateway adds the security layer in front of it. |
Detectors, policies, and the request lifecycle are identical in both modes. “AI Gateway” is the label we use when the routes land directly on an LLM, because then the same gateway is also the AI-aware routing layer.

Deployment

A Gateway integration can run as a multi-tenant SaaS endpoint, a single-tenant hybrid instance in your VPC, or fully on-prem. You pick this when you create the integration (Serverless or Dedicated), combined with the hosting mode. The choice of deployment affects where data flows, not what the gateway can do — see Deployment modes.

Routes & forwarding

How requests match routes and reach upstreams.

Load balancing

Round-robin, weighted, semantic, and health-aware strategies.

Traffic control

Rate limits, size limits, bot and anomaly detection.

Enforcement surfaces

Where Runtime Security enforces — including the Gateway surface that sits on top of this AI Gateway or a third-party one.