The AI Gateway does not sit in front of MCP servers. MCP and tool security is handled separately by the Runtime Security layer — see Agent & MCP security.

Think of it as an API gateway, but AI-aware: it understands prompts, responses, OpenAI-style chat payloads, streaming, tool calls, and embeddings — not just HTTP verbs and paths.
How you get an AI Gateway in the platform
You don’t provision a separate product — you create a Gateway integration (Integrations → Add Integration → Gateway) and add Routes on it. When a route points directly at an LLM provider or a self-hosted model, that Gateway is acting as an embedded AI Gateway: the same hop is both the Gateway enforcement surface and the AI gateway. When a route points at a third-party AI gateway or a custom backend instead, the Gateway is only a security layer on top — the underlying component keeps doing its own routing.
What an AI gateway adds on top of an API gateway
| A plain API gateway gives you… | The AI Gateway additionally gives you… |
|---|---|
| Path- and header-based routing | Routing by model, prompt contents, or tool call |
| Fixed rate limits by IP or key | Token-aware rate limits (prompt + completion tokens) |
| Request / response logging | Prompt, response, and tool-call capture for observability and replay |
| Basic auth and mTLS | Provider-specific credentials per upstream, normalized across providers |
| Size limits on bodies | Size + shape limits aware of LLM payloads (messages, attachments, embeddings) |
| Backend health checks | Health checks + semantic load balancing across providers and models |
What the AI Gateway does
Routing
Matches requests to the right LLM upstream (provider, self-hosted model, or custom LLM-fronting backend) using path, method, headers, model name, or body fields. See Routes & forwarding.
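First-match routing can be sketched as follows. This is a minimal illustration, not the platform's actual schema: routes are plain dicts, and field names such as `path_prefix` and `model` are assumptions made for the example.

```python
# Sketch of top-down, first-match route selection. A route can match on
# path, method, or a model name pulled from the parsed request body.
def matches(route, request):
    if not request["path"].startswith(route.get("path_prefix", "")):
        return False
    if "method" in route and request["method"] != route["method"]:
        return False
    # Model-based matching inspects the JSON body, not just the URL.
    if "model" in route and request["body"].get("model") != route["model"]:
        return False
    return True

def select_route(routes, request):
    """Return the first matching route, mirroring top-down evaluation."""
    for route in routes:
        if matches(route, request):
            return route
    return None

routes = [
    {"path_prefix": "/v1/chat", "model": "gpt-4o", "upstream": "azure-openai"},
    {"path_prefix": "/v1/chat", "upstream": "self-hosted-vllm"},
]
req = {"path": "/v1/chat/completions", "method": "POST",
       "body": {"model": "gpt-4o", "messages": []}}
```

Because evaluation is top-down, the more specific route (pinned to `gpt-4o`) is listed before the catch-all.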
Load balancing
Distributes traffic across providers or replicas with weights, health checks, failover, and semantic routing. See Load balancing.
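Weighted, health-aware selection might look like this sketch. The upstream shape (a dict with `weight` and `healthy` fields) is an assumption for illustration; the real health checks and strategies are described in Load balancing.

```python
import random

def pick_upstream(upstreams, rng=random):
    """Pick a healthy upstream, with probability proportional to weight."""
    candidates = [u for u in upstreams if u["healthy"]]
    if not candidates:
        raise RuntimeError("no healthy upstream available")
    total = sum(u["weight"] for u in candidates)
    point = rng.uniform(0, total)
    for u in candidates:
        point -= u["weight"]
        if point <= 0:
            return u
    return candidates[-1]

upstreams = [
    {"name": "openai",    "weight": 80, "healthy": True},
    {"name": "anthropic", "weight": 20, "healthy": True},
    {"name": "fallback",  "weight": 50, "healthy": False},  # excluded by health check
]
```

Failover falls out naturally: when a health check marks an upstream unhealthy, it simply drops out of the candidate set.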
Traffic control
Rate limits, token caps, size limits, bot detection, and anomaly detection protect both your app and the upstream provider. See Traffic control.
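The difference between a fixed rate limit and a token-aware one can be made concrete with a small sketch: instead of counting requests, the limiter budgets prompt plus completion tokens over a sliding window. The window length and budget here are illustrative defaults, not platform values.

```python
import time
from collections import deque

class TokenRateLimiter:
    """Sliding-window budget over total tokens, not request count."""

    def __init__(self, tokens_per_minute, window_s=60.0):
        self.budget = tokens_per_minute
        self.window_s = window_s
        self.events = deque()  # (timestamp, tokens) pairs

    def allow(self, tokens, now=None):
        now = time.monotonic() if now is None else now
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if used + tokens > self.budget:
            return False  # would exceed the per-window token budget
        self.events.append((now, tokens))
        return True
```

A single expensive completion can exhaust the budget that dozens of cheap requests would not, which is exactly what a per-request limit cannot express.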
Interception hooks
Every request and response runs through a plugin chain so upstream selection, transformations, and — when Runtime Security is attached — detections and policies can fire with AI-specific context.
How a request flows
A single request through the AI Gateway goes through these phases. When the Runtime Security layer is placed in front of (or composed with) the gateway, its detectors and policies plug into the pre-request and post-response hooks below.
- Receive — the gateway accepts the client request (HTTP, streaming, or WebSocket).
- Match route — routes are evaluated top-down; the first matching route binds the request to one or more upstreams plus the plugin chain to run.
- Pre-request plugins — input-side plugins run: transformations, rate limits, token limits, custom plugins, and — when Runtime Security is active — Prompt Guard, PII detection, content moderation, URL analyzer, tool permission, application-security checks, etc.
- Transform & forward — path rewrites, header injection, and provider-dialect normalization are applied, and the request is forwarded to the selected upstream (with optional load balancing, retries, and failover).
- Post-response plugins — output-side plugins run on the response (or on each streaming chunk): response moderation, PII masking on generations, tool-selection validation, toxicity detection, etc.
- Emit — the (possibly rewritten) response is returned to the client, and telemetry — prompts, responses, latencies, token counts, and any detections — is emitted to observability.
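The six phases above can be sketched as one pipeline. The plugin and upstream interfaces here are assumptions made for the example: a pre-request plugin returns a blocking response or `None`, a post-response plugin returns a (possibly rewritten) response, and `forward` stands in for transform-and-forward.

```python
def handle(request, routes, pre_plugins, post_plugins, forward):
    # 1. Receive: `request` has already been accepted by the gateway.
    # 2. Match route: first matching route wins (top-down).
    route = next((r for r in routes if r["match"](request)), None)
    if route is None:
        return {"status": 404, "body": "no route"}
    # 3. Pre-request plugins: any plugin may short-circuit the request
    #    (e.g. a rate limit hit or a blocked prompt).
    for plugin in pre_plugins:
        verdict = plugin(request)
        if verdict is not None:
            return verdict
    # 4. Transform & forward to the route's upstream.
    response = forward(route["upstream"], request)
    # 5. Post-response plugins may rewrite the response (e.g. PII masking).
    for plugin in post_plugins:
        response = plugin(response)
    # 6. Emit the (possibly rewritten) response; telemetry omitted here.
    return response
```

Streaming works the same way conceptually, with step 5 applied per chunk rather than once on the full body.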
Anatomy
The AI Gateway exposes two routing primitives — Upstream and Route — on top of the gateway instance:

| Primitive | Role |
|---|---|
| Gateway | The deployment boundary — the gateway instance (or cluster) that receives traffic. |
| Upstream | An LLM target the gateway can forward to: a provider, a self-hosted model, or a custom backend that fronts an LLM. Holds URL, credentials, timeouts, and health checks. |
| Route | The match-and-forward configuration (path, method, model, headers, body fields) that sends a request to one or more upstreams, with optional load balancing and the plugin chain that should run. |
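The three primitives could be modeled roughly like this. Field names are illustrative, not the platform's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Upstream:
    """An LLM target the gateway can forward to."""
    name: str
    url: str
    credentials_ref: str            # reference to stored provider credentials
    timeout_s: float = 30.0
    health_check_path: str = "/health"

@dataclass
class Route:
    """Match-and-forward config binding requests to upstreams and plugins."""
    path_prefix: str
    upstreams: list = field(default_factory=list)
    model: Optional[str] = None     # optional model-name match
    plugins: list = field(default_factory=list)

@dataclass
class Gateway:
    """The deployment boundary that receives traffic."""
    name: str
    routes: list = field(default_factory=list)
```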
What can be an upstream
The gateway’s upstreams are LLM targets, and a single gateway can front many of them, so clients avoid vendor lock-in. Upstreams fall into these categories:
- LLM providers (direct) — OpenAI, Anthropic, Azure OpenAI, Google Vertex, AWS Bedrock, Cohere, Mistral, Fireworks, Together, Groq, and others.
- Self-hosted models — vLLM, Ollama, TGI, Triton, or any OpenAI-compatible endpoint.
- Embedding and moderation APIs — OpenAI embeddings, Voyage, Cohere rerank, OpenAI / Azure moderation.
- Custom / internal LLM-fronting backends — your own REST or gRPC services that terminate in an LLM call.
Embedded vs pure-security mode
Because the NeuralTrust AI Gateway and the Gateway enforcement surface are the same gateway integration in the platform, the way you describe a deployment depends on what its routes point at.

| Route upstream | Mode | What the gateway is doing |
|---|---|---|
| LLM provider (OpenAI, Anthropic, Azure OpenAI, Bedrock, …) | Embedded AI Gateway | Acts as the AI gateway itself: routing, load balancing, traffic control, plus all the detections, masking, and policy decisions of the Gateway surface. One hop, both jobs. |
| Self-hosted model (vLLM, Ollama, TGI, …) | Embedded AI Gateway | Same as above, fronting your own model. |
| Third-party AI gateway | Pure security layer | The third-party AI gateway keeps doing its own routing below; this gateway only adds detections, masking, and policy decisions on top. |
| Custom client backend | Pure security layer | A backend the customer already operates keeps doing whatever it does; this gateway adds the security layer in front of it. |
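The table reduces to a simple rule: the mode follows from the upstream's kind. The category labels below are illustrative shorthand for the four rows.

```python
# Embedded mode: the gateway does routing, load balancing, traffic control,
# AND security in one hop. Pure-security mode: something below keeps routing.
EMBEDDED = {"llm_provider", "self_hosted_model"}
PURE_SECURITY = {"third_party_ai_gateway", "custom_backend"}

def gateway_mode(upstream_kind):
    if upstream_kind in EMBEDDED:
        return "embedded AI gateway"
    if upstream_kind in PURE_SECURITY:
        return "pure security layer"
    raise ValueError(f"unknown upstream kind: {upstream_kind!r}")
```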
Deployment
A Gateway integration can run as a multi-tenant SaaS endpoint, a single-tenant hybrid instance in your VPC, or fully on-prem — you pick this when you create the integration (Serverless or Dedicated) and combine it with the hosting mode. The choice of deployment affects where data flows, not what the gateway can do — see Deployment modes.
What to read next
Routes & forwarding
How requests match routes and reach upstreams.
Load balancing
Round-robin, weighted, semantic, and health-aware strategies.
Traffic control
Rate limits, size limits, bot and anomaly detection.
Enforcement surfaces
Where Runtime Security enforces — including the Gateway surface that sits on top of this AI Gateway or a third-party one.