This guide walks you end-to-end through a fully self-hosted deployment on AKS — Control Plane API, UI, and Scheduler run in your cluster alongside the Data Plane, TrustGate, and Firewall.
For the SaaS-hosted Control Plane alternative, see Azure hybrid.
What you’ll end up with
| Component | Location | Replicas |
|---|---|---|
| Control Plane API, UI, Scheduler | Your AKS cluster | 2, 2, 1 |
| Data Plane API, worker, Kafka Connect | Your AKS cluster | 2, 1, 1 |
| TrustGate admin / gateway / actions | Your AKS cluster | 2 each |
| Firewall gateway + 5 workers | Your AKS cluster | 2 + 5 |
| ClickHouse, Kafka, PostgreSQL, Redis | Your AKS cluster (or external) | 1 each |
Sizing baseline: ~21–23 vCPU / 45–50 GiB RAM / 80 GiB PVC. See Image catalog.
Prerequisites
| Resource | Recommended |
|---|---|
| AKS version | 1.28+ |
| CPU pool node SKU | Standard_D8s_v5 or Standard_D8ds_v5 (8 vCPU / 32 GiB) |
| Min CPU nodes | ≥ 5 across 3 availability zones. Drop to 4 if Firewall workers run on GPU nodes. |
| GPU pool (optional, for GPU Firewall) | Standard_NC4as_T4_v3 — 5 nodes (one per default Firewall worker) |
| Sizing baseline | ~23.1 vCPU / 61.8 GiB requests / 80 GiB PVC (defaults, CPU Firewall) |
| Storage | managed-csi-premium recommended for ClickHouse + Postgres |
| Ingress | AGIC or NGINX (+ cert-manager) |
| Certificate | Key Vault cert (AGIC) or Let’s Encrypt via cert-manager (NGINX) |
| Image pull | gcr-keys.json from NeuralTrust |
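As a rough cross-check of the node count, the sketch below computes the raw lower bound on nodes from the sizing baseline and the node SKU in the table above. The function name `nodes_needed` is hypothetical, and the calculation ignores kubelet and system reservations, DaemonSets, and zone spreading, so treat the result as a floor rather than a recommendation.

```shell
# nodes_needed VCPU_REQ MEM_REQ_GIB VCPU_PER_NODE MEM_PER_NODE_GIB
# Prints the smallest node count that covers both the CPU and memory requests.
# Hypothetical helper: ignores system reservations and per-zone spreading.
nodes_needed() {
  awk -v c="$1" -v m="$2" -v nc="$3" -v nm="$4" '
    function ceil(x) { return (x == int(x)) ? x : int(x) + 1 }
    BEGIN {
      by_cpu = ceil(c / nc)
      by_mem = ceil(m / nm)
      n = (by_cpu > by_mem) ? by_cpu : by_mem
      print n
    }'
}

# Baseline from the table (23.1 vCPU, 61.8 GiB) on 8 vCPU / 32 GiB nodes.
nodes_needed 23.1 61.8 8 32   # prints 3
```

CPU is the binding constraint here (3 nodes); the recommended minimum of 5 sits above that floor.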
Step 1 — Provision AKS and ingress
Provisioning is the same as for the hybrid deployment; see Azure hybrid › Step 1. The only difference is that self-hosted needs slightly more node headroom for the Control Plane (CP) components.
Step 2 — Namespace and image pull secret
kubectl create namespace neuraltrust
kubectl create secret docker-registry gcr-secret \
--docker-server=europe-west1-docker.pkg.dev \
--docker-username=_json_key \
--docker-password="$(cat path/to/gcr-keys.json)" \
--docker-email=<YOUR_EMAIL> \
-n neuraltrust
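Before creating the secret, it can help to confirm that the key file looks like a Google service account key, since a truncated or wrong file typically only surfaces later as an image pull failure. This is a minimal sketch, assuming `gcr-keys.json` is a standard service account key (which is what the `_json_key` username expects); `check_key_file` is a hypothetical helper name.

```shell
# check_key_file PATH
# Returns 0 if PATH is a non-empty file containing the fields found in a
# Google service account key; prints a reason and returns 1 otherwise.
check_key_file() {
  f="$1"
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f"
    return 1
  fi
  for field in '"type"' '"private_key"' '"client_email"'; do
    if ! grep -q "$field" "$f"; then
      echo "field $field not found in $f"
      return 1
    fi
  done
  echo "ok: $f"
}

# Usage: check_key_file path/to/gcr-keys.json
```

The check only looks for field names; it does not validate the JSON or the key itself.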
Step 3 — Write your values overlay
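Two overlays follow, one per ingress option: AGIC first, then NGINX + cert-manager. Use the one that matches your ingress controller.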
# Self-hosted deployment on AKS with AGIC
global:
  platform: "azure"
  domain: "platform.example.com"
  storageClass: "managed-csi-premium"
  autoGenerateSecrets: true
neuraltrust-control-plane:
  controlPlane:
    enabled: true # ← key difference from hybrid
    components:
      api:
        enabled: true
        ingress:
          enabled: true
          className: "azure-application-gateway"
          annotations: &agic
            kubernetes.io/ingress.class: azure/application-gateway
            appgw.ingress.kubernetes.io/ssl-redirect: "true"
            appgw.ingress.kubernetes.io/appgw-ssl-certificate: "<KEY_VAULT_CERT_NAME>"
      app:
        enabled: true
        ingress:
          enabled: true
          className: "azure-application-gateway"
          annotations: *agic
      scheduler:
        enabled: true
        ingress:
          enabled: true
          className: "azure-application-gateway"
          annotations: *agic
  infrastructure:
    postgresql:
      deploy: true
neuraltrust-data-plane:
  dataPlane:
    enabled: true
    components:
      api:
        ingress:
          enabled: true
          className: "azure-application-gateway"
          annotations: *agic
trustgate:
  enabled: true
  global:
    env:
      SERVER_BASE_DOMAIN: "platform.example.com"
  ingress:
    controlPlane:
      className: "azure-application-gateway"
      annotations: *agic
    dataPlane:
      className: "azure-application-gateway"
      annotations: *agic
    actions:
      className: "azure-application-gateway"
      annotations: *agic
neuraltrust-firewall:
  firewall:
    enabled: true
infrastructure:
  clickhouse:
    deploy: true
    persistence:
      storageClass: "managed-csi-premium"
      size: 200Gi
  kafka:
    deploy: true
# Self-hosted deployment on AKS with NGINX
global:
  platform: "azure"
  domain: "platform.example.com"
  storageClass: "managed-csi-premium"
  autoGenerateSecrets: true
neuraltrust-control-plane:
  controlPlane:
    enabled: true
    components:
      api:
        enabled: true
        ingress:
          enabled: true
          className: "nginx"
          annotations: &nginx
            cert-manager.io/cluster-issuer: "letsencrypt-prod"
            nginx.ingress.kubernetes.io/ssl-redirect: "true"
      app:
        enabled: true
        ingress:
          enabled: true
          className: "nginx"
          annotations: *nginx
      scheduler:
        enabled: true
        ingress:
          enabled: true
          className: "nginx"
          annotations: *nginx
  infrastructure:
    postgresql:
      deploy: true
neuraltrust-data-plane:
  dataPlane:
    enabled: true
    components:
      api:
        ingress:
          enabled: true
          className: "nginx"
          annotations: *nginx
trustgate:
  enabled: true
  global:
    env:
      SERVER_BASE_DOMAIN: "platform.example.com"
  ingress:
    controlPlane:
      className: "nginx"
      annotations: *nginx
    dataPlane:
      className: "nginx"
      annotations: *nginx
    actions:
      className: "nginx"
      annotations: *nginx
neuraltrust-firewall:
  firewall:
    enabled: true
infrastructure:
  clickhouse:
    deploy: true
  kafka:
    deploy: true
Using managed Azure data services (recommended for production)
neuraltrust-control-plane:
  infrastructure:
    postgresql:
      deploy: false
  controlPlane:
    components:
      postgresql:
        secrets:
          host: "<flexible-server>.postgres.database.azure.com"
          port: "5432"
          user: "neuraltrust"
          password: ""
          database: "neuraltrust"
infrastructure:
  clickhouse:
    deploy: false
    external:
      host: "your-tenant.azure.clickhouse.cloud"
      port: "8443"
      user: "neuraltrust"
      password: ""
      database: "neuraltrust"
  kafka:
    deploy: false
    external:
      bootstrapServers: "<event-hubs-namespace>.servicebus.windows.net:9093"
ClickHouse Cloud caveat: the Data Plane API runs migrations against the native port 9000, which is hardcoded in the chart today. ClickHouse Cloud only exposes the native protocol on port 9440 with TLS, so an external ClickHouse Cloud instance will fail the migration init container out of the box. Workarounds: pre-provision the schema, run a port-translating proxy 9000→9440, or stay in-cluster. See Feature flags › ClickHouse Cloud caveat.

Event Hubs caveat: the Event Hubs Kafka surface uses SASL/PLAIN over TLS (port 9093). Inject credentials via extraEnv on each Kafka consumer, since the chart does not auto-wire SASL. See Feature flags › Authentication for external Kafka.
Step 4 — Install
helm upgrade --install neuraltrust-platform \
oci://europe-west1-docker.pkg.dev/neuraltrust-app-prod/helm-charts/neuraltrust-platform \
--version <VERSION> \
--namespace neuraltrust \
-f my-values.yaml
kubectl get pods -n neuraltrust -w
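While the pods come up, a small filter over the `kubectl get pods` output can list only the ones that are not yet fully ready. This is a sketch, assuming the default column layout (NAME, READY, STATUS, ...) of `kubectl get pods --no-headers`; `not_ready` is a hypothetical helper.

```shell
# not_ready: reads `kubectl get pods --no-headers` output on stdin and prints
# the name and status of pods whose READY column is not n/n.
# Pods in the Completed state (finished Jobs) are skipped.
not_ready() {
  awk '$3 != "Completed" {
    split($2, r, "/")
    if (r[1] != r[2]) print $1, $3
  }'
}

# Usage: kubectl get pods -n neuraltrust --no-headers | not_ready
```

Empty output means every remaining pod reports all of its containers ready.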
Step 5 — DNS
Get the Application Gateway public IP (AGIC) or the NGINX LoadBalancer external IP, then add A/CNAME records for:
| Host | Component |
|---|---|
| app.platform.example.com | Control Plane UI |
| api.platform.example.com | Control Plane API |
| scheduler.platform.example.com | Control Plane Scheduler |
| data-plane-api.platform.example.com | Data Plane API |
| admin.platform.example.com | TrustGate admin |
| gateway.platform.example.com | TrustGate proxy |
| actions.platform.example.com | TrustGate actions |
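The hosts in the table appear to follow one pattern, a fixed prefix plus the base domain, so the list can be generated and checked in a loop. A minimal sketch; `platform_hosts` is a hypothetical helper, and the prefixes are copied from the table.

```shell
# platform_hosts DOMAIN
# Prints the seven hostnames from the table for the given base domain.
platform_hosts() {
  for prefix in app api scheduler data-plane-api admin gateway actions; do
    echo "$prefix.$1"
  done
}

platform_hosts platform.example.com
```

Feeding each name to a resolver, for example `for h in $(platform_hosts platform.example.com); do dig +short "$h"; done`, shows which records are still missing.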
Step 6 — First login to the Control Plane
Look for the bootstrap entry in the init-db container logs of the Control Plane UI deployment:
kubectl logs -n neuraltrust deploy/control-plane-app -c init-db | grep -i bootstrap
Sign in at https://app.platform.example.com, configure SSO (Platform › SSO), rotate the bootstrap admin password, and set up LLM providers + policies.
Step 7 — Send traffic through TrustGate
Point your AI applications at https://gateway.platform.example.com.
Verification
kubectl get pods -n neuraltrust
kubectl get ingress -n neuraltrust -o wide
curl https://api.platform.example.com/health
curl https://app.platform.example.com
curl https://data-plane-api.platform.example.com/health
curl https://gateway.platform.example.com/__health
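The curl checks above print response bodies, which is awkward to scan. The sketch below reduces each endpoint to a single status line. It treats any 2xx or 3xx response as healthy, which is an assumption (the UI root may redirect), and the helper names are hypothetical.

```shell
# http_ok CODE: succeeds for 2xx and 3xx status codes, fails otherwise.
http_ok() {
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# check_url URL: prints OK or FAIL with the status code curl observed.
# curl reports 000 when it cannot connect at all.
check_url() {
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true)"
  if http_ok "$code"; then
    echo "OK   $code $1"
  else
    echo "FAIL $code $1"
  fi
}

# Usage: check_url https://api.platform.example.com/health
```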
Upgrading
helm upgrade neuraltrust-platform \
oci://europe-west1-docker.pkg.dev/neuraltrust-app-prod/helm-charts/neuraltrust-platform \
--version <NEW_VERSION> \
--namespace neuraltrust \
-f my-values.yaml
Watch the init-db init container on the new CP UI pod for any Prisma migration errors before traffic shifts.
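One way to watch for migration problems is to filter the init-db logs for error lines. This is a sketch: the deployment and container names are taken from Step 6, while the match patterns are a guess at what a failed migration logs, not a documented format. `migration_errors` is a hypothetical helper.

```shell
# migration_errors: reads container logs on stdin and prints lines that
# mention an error or failure (case-insensitive). Prints nothing if none match.
migration_errors() {
  grep -iE 'error|failed' || true
}

# Usage:
#   kubectl logs -n neuraltrust deploy/control-plane-app -c init-db | migration_errors
```

During a rollout, `deploy/control-plane-app` may resolve to an old pod, so selecting the new pod by name is the safer choice.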
Migration to hybrid
neuraltrust-control-plane:
  controlPlane:
    enabled: false
Enroll the existing data plane with NeuralTrust SaaS — see Azure hybrid › Step 6.
Air-gapped AKS
- Mirror chart images to Azure Container Registry (see Image catalog › Mirroring).
- Set global.imageRegistry to your ACR.
- Configure global.proxy.* if egress goes through a forward proxy.
- Pre-load Firewall model weights or mirror huggingface.co.
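When mirroring, each source image reference has to be mapped to a name under your ACR. The sketch below shows one such mapping, assuming the repository path and tag are kept unchanged under the new registry host; whether the chart expects exactly that layout is an assumption to confirm against Image catalog › Mirroring. `retarget_image` and the example image reference are hypothetical.

```shell
# retarget_image IMAGE_REF ACR_LOGIN_SERVER
# Replaces the registry host (everything before the first "/") with the ACR
# login server and keeps the repository path and tag unchanged.
retarget_image() {
  echo "$2/${1#*/}"
}

# Hypothetical image reference, for illustration only.
retarget_image \
  europe-west1-docker.pkg.dev/example-project/example-repo/example-image:1.0.0 \
  myregistry.azurecr.io
```

The resulting name can then be used as the target of whichever copy tool you prefer, for example `az acr import` with its `--source` and `--image` options.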
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| CP UI blank | API URL wrong | Verify api.<domain> ingress + config |
| Login fails | DB migration failed | kubectl logs -c init-db on CP UI pod |
| Scheduler not running jobs | Can’t reach Data Plane API | Verify data-plane-api.<domain> and TLS |
| PVC stuck Pending | Storage class missing | kubectl get storageclass; check quota |