What you’ll end up with
| Component | Location | Replicas |
|---|---|---|
| Control Plane API, UI, Scheduler | Your AKS cluster | 2, 2, 1 |
| Data Plane API, worker, Kafka Connect | Your AKS cluster | 2, 1, 1 |
| TrustGate admin / gateway / actions | Your AKS cluster | 2 each |
| Firewall gateway + 5 workers | Your AKS cluster | 2 + 5 |
| ClickHouse, Kafka, PostgreSQL, Redis | Your AKS cluster (or external) | 1 each |
Prerequisites
| Resource | Recommended |
|---|---|
| AKS version | 1.28+ |
| CPU pool node SKU | Standard_D8s_v5 or Standard_D8ds_v5 (8 vCPU / 32 GiB) |
| Min CPU nodes | ≥ 5 across 3 availability zones. Drop to 4 if Firewall workers run on GPU nodes. |
| GPU pool (optional, for GPU Firewall) | Standard_NC4as_T4_v3 — 5 nodes (one per default Firewall worker) |
| Sizing baseline | ~23.1 vCPU / 61.8 GiB requests / 80 GiB PVC (defaults, CPU Firewall) |
| Storage | managed-csi-premium recommended for ClickHouse + Postgres |
| Ingress | AGIC or NGINX (+ cert-manager) |
| Certificate | Key Vault cert (AGIC) or Let’s Encrypt via cert-manager (NGINX) |
| Image pull | gcr-keys.json from NeuralTrust |
Step 1 — Provision AKS and ingress
Same as hybrid — see Azure hybrid › Step 1. Self-hosted is identical aside from slightly higher headroom for CP components.Step 2 — Namespace and image pull secret
Step 3 — Write your values overlay
- AGIC
- NGINX + cert-manager
Using managed Azure data services (recommended for production)
Step 4 — Install
Step 5 — DNS
Get the Application Gateway public IP (AGIC) or the NGINX LoadBalancer external IP, then add A/CNAME records for:| Host | Component |
|---|---|
app.platform.example.com | Control Plane UI |
api.platform.example.com | Control Plane API |
scheduler.platform.example.com | Control Plane Scheduler |
data-plane-api.platform.example.com | Data Plane API |
admin.platform.example.com | TrustGate admin |
gateway.platform.example.com | TrustGate proxy |
actions.platform.example.com | TrustGate actions |
Step 6 — First login to the Control Plane
https://app.platform.example.com, configure SSO (Platform › SSO), rotate the bootstrap admin password, and set up LLM providers + policies.
Step 7 — Send traffic through TrustGate
Point your AI applications athttps://gateway.platform.example.com.
Verification
Upgrading
init-db init container on the new CP UI pod for any Prisma migration errors before traffic shifts.
Migration to hybrid
Air-gapped AKS
- Mirror chart images to Azure Container Registry (see Image catalog › Mirroring).
- Set
global.imageRegistryto your ACR. - Configure
global.proxy.*if egress goes through a forward proxy. - Pre-load Firewall model weights or mirror
huggingface.co.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| CP UI blank | API URL wrong | Verify api.<domain> ingress + config |
| Login fails | DB migration failed | kubectl logs -c init-db on CP UI pod |
| Scheduler not running jobs | Can’t reach Data Plane API | Verify data-plane-api.<domain> and TLS |
PVC stuck Pending | Storage class missing | kubectl get storageclass; check quota |
Related guides
- Hybrid deployment on AKS — Control Plane on SaaS
- Azure overview — cluster prerequisites and Azure-specific defaults
- Deployment models — hybrid vs self-hosted comparison
- Image catalog — what runs in self-hosted mode
- Secrets management — auto-generation, External Secrets Operator
- Firewall deployment — GPU workers on AKS
- Configuration scenarios — external infrastructure modes