What you’ll end up with
| Component | Location | Replicas |
|---|---|---|
| Control Plane API, UI, Scheduler | Your cluster | 2, 2, 1 |
| Data Plane API, worker, Kafka Connect | Your cluster | 2, 1, 1 |
| TrustGate admin / gateway / actions | Your cluster | 2 each |
| Firewall gateway + 5 workers | Your cluster | 2 + 5 |
| ClickHouse, Kafka, PostgreSQL, Redis | Your cluster (or external) | 1 each |
Prerequisites
| Resource | Recommended |
|---|---|
| Kubernetes version | 1.24+ |
| CPU pool | ≥ 5 × (8 vCPU / 32 GiB) for HA. Drop to 4 if Firewall workers run on GPU nodes. |
| GPU pool (optional, for GPU Firewall) | 4 vCPU / 16 GiB + 1 × NVIDIA GPU per node — 5 nodes (one per default Firewall worker) |
| Sizing baseline | ~23.1 vCPU / 61.8 GiB requests / 80 GiB PVC (defaults, CPU Firewall) |
| Storage | Default StorageClass with ReadWriteOnce PVs (SSD-backed recommended for ClickHouse + Postgres) |
| Ingress | NGINX, Traefik, HAProxy, or any conformant controller |
| TLS | cert-manager with Let’s Encrypt or internal CA |
| DNS | Control over a base domain (e.g. platform.example.com) |
| Image pull | gcr-keys.json from NeuralTrust OR a mirrored internal registry for air-gapped |
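To see how much headroom the recommended CPU pool leaves over the sizing baseline, the arithmetic from the table can be sketched in shell. The helper name `pool_headroom` is illustrative and not part of any NeuralTrust tooling; the figures are the ones quoted above (5 nodes of 8 vCPU / 32 GiB against roughly 23.1 vCPU / 61.8 GiB of requests).

```shell
# Compare raw pool capacity with the default request baseline from the table.
# pool_headroom is a hypothetical helper used only for this illustration.
pool_headroom() (
  nodes="$1"; vcpu_per_node="$2"; gib_per_node="$3"
  awk -v n="$nodes" -v c="$vcpu_per_node" -v m="$gib_per_node" 'BEGIN {
    req_cpu = 23.1; req_mem = 61.8   # baseline requests (defaults, CPU Firewall)
    cpu = n * c; mem = n * m         # raw capacity, before system reservations
    printf "capacity: %d vCPU / %d GiB\n", cpu, mem
    printf "requested: %.0f%% CPU, %.0f%% memory\n", req_cpu / cpu * 100, req_mem / mem * 100
  }'
)

# Usage: pool_headroom 5 8 32
```

With the recommended pool this reports about 58% of CPU and 39% of memory requested. Allocatable capacity is lower than raw capacity once kubelet and system reservations are subtracted, so treat the result as an upper bound on headroom.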
Step 1 — Cluster prep
Same as hybrid; see Vanilla Kubernetes hybrid › Step 1. Add cluster components:
- Ingress controller (NGINX/Traefik/HAProxy).
- cert-manager (or your own cert workflow).
- Storage class with ReadWriteOnce.
- (Bare metal) MetalLB or an external LB.
- (GPU Firewall) NVIDIA device plugin.
Step 2 — Namespace and image pull secret
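A minimal sketch of this step, assuming the pull secret is a standard `docker-registry` secret built from `gcr-keys.json`. The namespace, secret name, and registry host are placeholders passed as arguments; use the values NeuralTrust provided with the key file.

```shell
# Create the namespace and an image pull secret from the NeuralTrust key file.
# Assumptions: namespace, secret name, and registry host are placeholders;
# none of them are confirmed by this guide.
create_pull_secret() (
  ns="$1"; secret="$2"; registry_host="$3"; key_file="$4"
  kubectl create namespace "$ns"
  # "_json_key" is the username Google registries expect for JSON key auth.
  kubectl create secret docker-registry "$secret" \
    --namespace "$ns" \
    --docker-server="$registry_host" \
    --docker-username=_json_key \
    --docker-password="$(cat "$key_file")"
)

# Usage: create_pull_secret <namespace> <secret-name> <registry-host> ./gcr-keys.json
```

`kubectl create namespace` reports an error if the namespace already exists; the secret is still created in that case.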
Step 3 — Write your values overlay
Save as my-values.yaml:
Using external infrastructure
For ClickHouse Cloud, see the native-port caveat. For external Kafka with SASL, see Authentication for external Kafka.
Step 4 — Install
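The install itself might look like the sketch below, assuming the platform ships as a Helm chart and that `my-values.yaml` from Step 3 is in the current directory. The release name, chart reference, and namespace are placeholders; this guide does not state them.

```shell
# Install (or upgrade) the release with the overlay from Step 3.
# Assumptions: release name, chart reference, and namespace are placeholders
# that you replace with the values NeuralTrust gave you.
install_platform() (
  release="$1"; chart="$2"; ns="$3"
  helm upgrade --install "$release" "$chart" \
    --namespace "$ns" \
    --values my-values.yaml \
    --wait --timeout 15m
)

# Usage: install_platform <release> <chart-reference> <namespace>
```

`helm upgrade --install` is used so the same command covers the first install and later upgrades; `--wait` blocks until the workloads report ready or the timeout expires.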
Step 5 — DNS
Get the ingress controller’s external IP / hostname and add A / CNAME records:

| Host | Component |
|---|---|
| app.platform.example.com | Control Plane UI |
| api.platform.example.com | Control Plane API |
| scheduler.platform.example.com | Control Plane Scheduler |
| data-plane-api.platform.example.com | Data Plane API |
| admin.platform.example.com | TrustGate admin |
| gateway.platform.example.com | TrustGate proxy |
| actions.platform.example.com | TrustGate actions |
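Once the records exist, a quick way to confirm them is to loop over the seven hostnames from the table. The helper names below are illustrative; the sketch uses `nslookup`, and `dig` or `getent hosts` would work equally well.

```shell
# Print the seven hostnames from the table above for a given base domain.
platform_hosts() (
  base="$1"
  for name in app api scheduler data-plane-api admin gateway actions; do
    printf '%s.%s\n' "$name" "$base"
  done
)

# Report which hostnames resolve and which are still missing.
check_dns() (
  platform_hosts "$1" | while read -r host; do
    if nslookup "$host" >/dev/null 2>&1; then
      echo "ok      $host"
    else
      echo "MISSING $host"
    fi
  done
)

# Usage: check_dns platform.example.com
```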
Step 6 — First login to the Control Plane
Open https://app.platform.example.com, configure SSO (Platform › SSO), rotate the bootstrap admin password, and configure LLM providers and policies.
Step 7 — Send traffic through TrustGate
Point your AI applications at https://gateway.platform.example.com.
Verification
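One possible verification pass, assuming the release lives in a single namespace and the hostnames from Step 5 are in place. The helper names are illustrative, and this guide does not specify which status codes each endpoint returns.

```shell
# List pods that are neither Running nor Succeeded in the install namespace.
# An empty result means nothing is stuck in Pending, Failed, or Unknown.
pending_pods() (
  ns="$1"
  kubectl get pods --namespace "$ns" \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
)

# Print the HTTP status code returned by a public endpoint over TLS.
endpoint_status() (
  curl -sS -o /dev/null -w '%{http_code}\n' "https://$1"
)

# Usage:
#   pending_pods <namespace>
#   endpoint_status gateway.platform.example.com
```

A pod in the Running phase is not necessarily Ready, so check the READY column of `kubectl get pods` as well. A TLS error from `curl` usually means the certificate has not been issued yet.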
Upgrading
Check the init-db init container on the new CP UI pod for Prisma migration errors.
Air-gapped installs
For fully disconnected clusters:
1. Mirror images
Use crane, skopeo, or docker pull/push.
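As a sketch of the mirroring step with `crane`, the helper below copies every image named in a list file from the source registry to your mirror. The file `images.txt` and the registry hosts are hypothetical; build the list from the Image catalog, and authenticate to both registries first (for example with `crane auth login`).

```shell
# Copy each image listed in a file (one "repo/name:tag" per line) into the mirror.
# Assumptions: the list file and both registry hosts are supplied by you.
mirror_images() (
  src_registry="$1"; dst_registry="$2"; list_file="$3"
  while read -r image || [ -n "$image" ]; do
    [ -n "$image" ] || continue   # skip blank lines
    crane copy "$src_registry/$image" "$dst_registry/$image"
  done < "$list_file"
)

# Usage: mirror_images <source-registry> <mirror-registry> images.txt
```

`skopeo copy docker://SRC docker://DST` is an equivalent per-image command if you prefer skopeo.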
2. Point the chart at your mirror
3. Forward proxy (if applicable)
4. Internal CA for TLS
Bring your own certificates instead of Let’s Encrypt:
5. Firewall model weights
The Firewall workers fetch models from Hugging Face on first start. For air-gapped:
- Pre-bake models into a custom Firewall image, OR
- Mirror huggingface.co internally and set the Firewall HUGGINGFACE_HUB_ENDPOINT, OR
- Provide a HUGGINGFACE_TOKEN and allow egress just to Hugging Face.
Migration to hybrid
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| CP UI blank | API URL wrong | Verify api.<domain> ingress and config |
| Login fails | DB migration failed | kubectl logs -c init-db on CP UI pod |
| Scheduler not running jobs | Can’t reach Data Plane API | Verify data-plane-api.<domain> and TLS |
| PVC stuck Pending | No default storage class | Mark one default and re-apply |
| cert-manager challenge stuck | DNS not resolving or unreachable for HTTP-01 | kubectl describe challenge |
| ImagePullBackOff (air-gapped) | Registry mirror missing images | Re-run image mirror; confirm global.imageRegistry |
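The fixes in the table map to a handful of kubectl commands, sketched here as small helpers. The helper names are illustrative, and the namespace, pod, and StorageClass names are arguments you supply.

```shell
# Show init-db logs from a Control Plane UI pod (find the pod name with
# `kubectl get pods` first).
init_db_logs() (
  kubectl logs --namespace "$1" "$2" -c init-db
)

# Mark an existing StorageClass as the cluster default.
set_default_storageclass() (
  kubectl patch storageclass "$1" \
    -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
)

# Inspect stuck cert-manager challenges in a namespace.
describe_challenges() (
  kubectl describe challenge --namespace "$1"
)

# Usage:
#   init_db_logs <namespace> <cp-ui-pod>
#   set_default_storageclass <storage-class-name>
#   describe_challenges <namespace>
```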
Related guides
- Hybrid deployment on vanilla Kubernetes — Control Plane on SaaS
- Vanilla Kubernetes overview — cluster prerequisites
- Deployment models — hybrid vs self-hosted comparison
- Image catalog — what runs in self-hosted mode, plus mirroring
- Secrets management — auto-generation, External Secrets Operator
- Firewall deployment — GPU workers, air-gapped configuration
- Configuration scenarios — external infrastructure modes