The AI Gateway does not sit in front of MCP servers. MCP and tool security is handled separately by the Runtime Security layer — see Agent & MCP security.

Think of it as an API gateway, but AI-aware: it understands prompts, responses, OpenAI-style chat payloads, streaming, tool calls, and embeddings — not just HTTP verbs and paths.
How you get an AI Gateway in the platform
You don’t provision a separate product — you create a Gateway integration (Integrations → Add Integration → Gateway) and add Routes on it. When a route points directly at an LLM provider or a self-hosted model, that Gateway is acting as an embedded AI Gateway: the same hop is both the Gateway enforcement surface and the AI gateway. When a route points at a third-party AI gateway or a custom backend instead, the Gateway is only a security layer on top — the underlying component keeps doing its own routing.
What an AI gateway adds on top of an API gateway
| A plain API gateway gives you… | The AI Gateway additionally gives you… |
|---|---|
| Path- and header-based routing | Routing by model, prompt contents, or tool call |
| Fixed rate limits by IP or key | Token-aware rate limits (prompt + completion tokens) |
| Request / response logging | Prompt, response, and tool-call capture for observability and replay |
| Basic auth and mTLS | Provider-specific credentials per upstream, normalized across providers |
| Size limits on bodies | Size + shape limits aware of LLM payloads (messages, attachments, embeddings) |
| Backend health checks | Health checks + semantic load balancing across providers and models |
What the AI Gateway does
Routing
Matches requests to the right LLM upstream (provider, self-hosted model, or custom LLM-fronting backend) using path, method, headers, model name, or body fields. See Routes & forwarding.
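First-match routing can be sketched as follows. This is a minimal illustration, not the platform's actual schema: routes are plain dicts, and field names such as `path_prefix` and `model` are assumptions made for the example.

```python
# Sketch of top-down, first-match route selection. A route can match on
# path, method, or a model name pulled from the parsed request body.
def matches(route, request):
    if not request["path"].startswith(route.get("path_prefix", "")):
        return False
    if "method" in route and request["method"] != route["method"]:
        return False
    # Model-based matching inspects the JSON body, not just the URL.
    if "model" in route and request["body"].get("model") != route["model"]:
        return False
    return True

def select_route(routes, request):
    """Return the first matching route, mirroring top-down evaluation."""
    for route in routes:
        if matches(route, request):
            return route
    return None

routes = [
    {"path_prefix": "/v1/chat", "model": "gpt-4o", "upstream": "azure-openai"},
    {"path_prefix": "/v1/chat", "upstream": "self-hosted-vllm"},
]
req = {"path": "/v1/chat/completions", "method": "POST",
       "body": {"model": "gpt-4o", "messages": []}}
```

Because evaluation is top-down, the more specific route (pinned to `gpt-4o`) is listed before the catch-all.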
Load balancing
Distributes traffic across providers or replicas with weights, health checks, failover, and semantic routing. See Load balancing.
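Weighted, health-aware selection might look like this sketch. The upstream shape (a dict with `weight` and `healthy` fields) is an assumption for illustration; the real health checks and strategies are described in Load balancing.

```python
import random

def pick_upstream(upstreams, rng=random):
    """Pick a healthy upstream, with probability proportional to weight."""
    candidates = [u for u in upstreams if u["healthy"]]
    if not candidates:
        raise RuntimeError("no healthy upstream available")
    total = sum(u["weight"] for u in candidates)
    point = rng.uniform(0, total)
    for u in candidates:
        point -= u["weight"]
        if point <= 0:
            return u
    return candidates[-1]

upstreams = [
    {"name": "openai",    "weight": 80, "healthy": True},
    {"name": "anthropic", "weight": 20, "healthy": True},
    {"name": "fallback",  "weight": 50, "healthy": False},  # excluded by health check
]
```

Failover falls out naturally: when a health check marks an upstream unhealthy, it simply drops out of the candidate set.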
Traffic control
Rate limits, token caps, size limits, bot detection, and anomaly detection protect both your app and the upstream provider. See Traffic control.
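The difference between a fixed rate limit and a token-aware one can be made concrete with a small sketch: instead of counting requests, the limiter budgets prompt plus completion tokens over a sliding window. The window length and budget here are illustrative defaults, not platform values.

```python
import time
from collections import deque

class TokenRateLimiter:
    """Sliding-window budget over total tokens, not request count."""

    def __init__(self, tokens_per_minute, window_s=60.0):
        self.budget = tokens_per_minute
        self.window_s = window_s
        self.events = deque()  # (timestamp, tokens) pairs

    def allow(self, tokens, now=None):
        now = time.monotonic() if now is None else now
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if used + tokens > self.budget:
            return False  # would exceed the per-window token budget
        self.events.append((now, tokens))
        return True
```

A single expensive completion can exhaust the budget that dozens of cheap requests would not, which is exactly what a per-request limit cannot express.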
Interception hooks
Every request and response runs through a plugin chain so upstream selection, transformations, and — when Runtime Security is attached — detections and policies can fire with AI-specific context.
How a request flows
A single request through the AI Gateway goes through these phases. When the Runtime Security layer is placed in front of (or composed with) the gateway, its detectors and policies plug into the pre-request and post-response hooks below.
- Receive — the gateway accepts the client request (HTTP, streaming, or WebSocket).
- Match route — routes are evaluated top-down; the first matching route binds the request to one or more upstreams plus the plugin chain to run.
- Pre-request plugins — input-side plugins run: transformations, rate limits, token limits, custom plugins, and — when Runtime Security is active — Prompt Guard, PII detection, content moderation, URL analyzer, tool permission, application-security checks, etc.
- Transform & forward — path rewrites, header injection, and provider-dialect normalization are applied, and the request is forwarded to the selected upstream (with optional load balancing, retries, and failover).
- Post-response plugins — output-side plugins run on the response (or on each streaming chunk): response moderation, PII masking on generations, tool-selection validation, toxicity detection, etc.
- Emit — the (possibly rewritten) response is returned to the client, and telemetry — prompts, responses, latencies, token counts, and any detections — is emitted to observability.
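The six phases above can be sketched as one pipeline. The plugin and upstream interfaces here are assumptions made for the example: a pre-request plugin returns a blocking response or `None`, a post-response plugin returns a (possibly rewritten) response, and `forward` stands in for transform-and-forward.

```python
def handle(request, routes, pre_plugins, post_plugins, forward):
    # 1. Receive: `request` has already been accepted by the gateway.
    # 2. Match route: first matching route wins (top-down).
    route = next((r for r in routes if r["match"](request)), None)
    if route is None:
        return {"status": 404, "body": "no route"}
    # 3. Pre-request plugins: any plugin may short-circuit the request
    #    (e.g. a rate limit hit or a blocked prompt).
    for plugin in pre_plugins:
        verdict = plugin(request)
        if verdict is not None:
            return verdict
    # 4. Transform & forward to the route's upstream.
    response = forward(route["upstream"], request)
    # 5. Post-response plugins may rewrite the response (e.g. PII masking).
    for plugin in post_plugins:
        response = plugin(response)
    # 6. Emit the (possibly rewritten) response; telemetry omitted here.
    return response
```

Streaming works the same way conceptually, with step 5 applied per chunk rather than once on the full body.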
Anatomy
The AI Gateway exposes two routing primitives — Upstream and Route — on top of the gateway instance:

| Primitive | Role |
|---|---|
| Gateway | The deployment boundary — the gateway instance (or cluster) that receives traffic. |
| Upstream | An LLM target the gateway can forward to: a provider, a self-hosted model, or a custom backend that fronts an LLM. Holds URL, credentials, timeouts, and health checks. |
| Route | The match-and-forward configuration (path, method, model, headers, body fields) that sends a request to one or more upstreams, with optional load balancing and the plugin chain that should run. |
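The three primitives could be modeled roughly like this. Field names are illustrative, not the platform's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Upstream:
    """An LLM target the gateway can forward to."""
    name: str
    url: str
    credentials_ref: str            # reference to stored provider credentials
    timeout_s: float = 30.0
    health_check_path: str = "/health"

@dataclass
class Route:
    """Match-and-forward config binding requests to upstreams and plugins."""
    path_prefix: str
    upstreams: list = field(default_factory=list)
    model: Optional[str] = None     # optional model-name match
    plugins: list = field(default_factory=list)

@dataclass
class Gateway:
    """The deployment boundary that receives traffic."""
    name: str
    routes: list = field(default_factory=list)
```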
What can be an upstream
The gateway’s upstreams are LLM targets, and a single gateway can front many of them, so clients avoid vendor lock-in. Upstreams fall into these categories:
- LLM providers (direct) — OpenAI, Anthropic, Azure OpenAI, Google Vertex, AWS Bedrock, Cohere, Mistral, Fireworks, Together, Groq, and others.
- Self-hosted models — vLLM, Ollama, TGI, Triton, or any OpenAI-compatible endpoint.
- Embedding and moderation APIs — OpenAI embeddings, Voyage, Cohere rerank, OpenAI / Azure moderation.
- Custom / internal LLM-fronting backends — your own REST or gRPC services that terminate in an LLM call.
Embedded vs pure-security mode
Because the NeuralTrust AI Gateway and the Gateway enforcement surface are the same gateway integration in the platform, the way you describe a deployment depends on what its routes point at.

| Route upstream | Mode | What the gateway is doing |
|---|---|---|
| LLM provider (OpenAI, Anthropic, Azure OpenAI, Bedrock, …) | Embedded AI Gateway | Acts as the AI gateway itself: routing, load balancing, traffic control, plus all the detections, masking, and policy decisions of the Gateway surface. One hop, both jobs. |
| Self-hosted model (vLLM, Ollama, TGI, …) | Embedded AI Gateway | Same as above, fronting your own model. |
| Third-party AI gateway | Pure security layer | The third-party AI gateway keeps doing its own routing below; this gateway only adds detections, masking, and policy decisions on top. |
| Custom client backend | Pure security layer | A backend the customer already operates keeps doing whatever it does; this gateway adds the security layer in front of it. |
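The table reduces to a simple rule: the mode follows from the upstream's kind. The category labels below are illustrative shorthand for the four rows.

```python
# Embedded mode: the gateway does routing, load balancing, traffic control,
# AND security in one hop. Pure-security mode: something below keeps routing.
EMBEDDED = {"llm_provider", "self_hosted_model"}
PURE_SECURITY = {"third_party_ai_gateway", "custom_backend"}

def gateway_mode(upstream_kind):
    if upstream_kind in EMBEDDED:
        return "embedded AI gateway"
    if upstream_kind in PURE_SECURITY:
        return "pure security layer"
    raise ValueError(f"unknown upstream kind: {upstream_kind!r}")
```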
Deployment
A Gateway integration can run as a multi-tenant SaaS endpoint, a single-tenant hybrid instance in your VPC, or fully on-prem — you pick this when you create the integration (Serverless or Dedicated) and combine it with the hosting mode. The choice of deployment affects where data flows, not what the gateway can do — see Deployment modes.
What to read next
Routes & forwarding
How requests match routes and reach upstreams.
Load balancing
Round-robin, weighted, semantic, and health-aware strategies.
Traffic control
Rate limits, size limits, bot and anomaly detection.
Enforcement surfaces
Where Runtime Security enforces — including the Gateway surface that sits on top of this AI Gateway or a third-party one.