What you’ll end up with
| Component | Location | Replicas (default) |
|---|---|---|
| Control Plane API | Your GKE cluster | 2 |
| Control Plane UI | Your GKE cluster | 2 |
| Control Plane Scheduler | Your GKE cluster | 1 |
| Data Plane API | Your GKE cluster | 2 |
| Data Plane worker | Your GKE cluster | 1 |
| Kafka Connect | Your GKE cluster | 1 |
| TrustGate admin / gateway / actions | Your GKE cluster | 2 each |
| Firewall gateway + 5 workers | Your GKE cluster | 2 + 5 |
| ClickHouse, Kafka, PostgreSQL, Redis | Your GKE cluster (or external) | 1 each |
Prerequisites
| Resource | Recommended |
|---|---|
| GKE version | 1.28+ |
| Cluster mode | Standard (recommended) or Autopilot |
| CPU pool machine type | n2-standard-8 (8 vCPU / 32 GiB) |
| Min CPU nodes | ≥ 5 across 3 zones (regional cluster). Drop to 4 if Firewall workers run on GPU nodes. |
| GPU pool (optional, for GPU Firewall) | n1-standard-4 + 1 × T4 — 5 nodes (one per default Firewall worker) |
| Sizing baseline | ~23.1 vCPU / 61.8 GiB requests / 80 GiB PVC (defaults, CPU Firewall) |
| Storage | pd-ssd recommended for ClickHouse + PostgreSQL in self-hosted |
| DNS | Control over a base domain (e.g. platform.example.com) |
| Image pull | gcr-keys.json from NeuralTrust |
Step 1 — Provision the GKE cluster
--num-nodes 2 is per-zone in a regional cluster → 6 worker nodes across 3 zones, which fits the self-hosted CPU pool with HA headroom.
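The node arithmetic above can be sketched in shell. This is a minimal sketch rather than the guide's own command: the cluster name and region are placeholder assumptions, and the gcloud line is printed instead of executed so it can be reviewed first. Only `--num-nodes 2` and the n2-standard-8 machine type come from this guide.

```shell
# Worker-node arithmetic for a regional cluster: nodes per zone x zones.
total_nodes() {
  echo $(( $1 * $2 ))
}

# Hypothetical values (assumptions, not from the guide); substitute your own.
CLUSTER_NAME="neuraltrust-platform"
REGION="europe-west1"
NODES_PER_ZONE=2
ZONES=3

echo "Expected worker nodes: $(total_nodes "$NODES_PER_ZONE" "$ZONES")"

# Dry run: print the command for review instead of executing it.
echo gcloud container clusters create "$CLUSTER_NAME" \
  --region "$REGION" \
  --num-nodes "$NODES_PER_ZONE" \
  --machine-type n2-standard-8
```

With the defaults shown, the first line of output reports 6 worker nodes, which clears the minimum of 5 CPU nodes listed in the prerequisites. Remove the leading `echo` on the last command once the printed values look right.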
Step 2 — Namespace and image pull secret
Step 3 — Write your values overlay
Save as my-values.yaml:
External managed services (recommended for production)
For ClickHouse Cloud, see the native-port caveat. For Confluent Cloud Kafka, inject SASL credentials via extraEnv; see Authentication for external Kafka.
Step 4 — Install
Step 5 — DNS and Managed Certificates
The chart creates GCE Ingresses for every public component. Get the assigned IPs:
| Host | Component | Required |
|---|---|---|
| app.platform.example.com | Control Plane UI | ✅ |
| api.platform.example.com | Control Plane API | ✅ |
| scheduler.platform.example.com | Control Plane Scheduler | ✅ |
| data-plane-api.platform.example.com | Data Plane API | ✅ |
| admin.platform.example.com | TrustGate admin | ✅ |
| gateway.platform.example.com | TrustGate proxy | ✅ |
| actions.platform.example.com | TrustGate actions | ✅ |
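The host list in the table can be generated from the base domain and checked for resolution before requesting certificates. This is a minimal sketch: the subdomain prefixes are taken from the table, `BASE_DOMAIN` is the guide's example value, and the function names are hypothetical helpers, not part of the chart.

```shell
# Example base domain from this guide; replace with your own.
BASE_DOMAIN="platform.example.com"

# Print one fully qualified host per line for the given base domain.
platform_hosts() {
  for prefix in app api scheduler data-plane-api admin gateway actions; do
    echo "$prefix.$1"
  done
}

# Report what each host currently resolves to. Requires `dig`.
# Defined here but not called, since it needs network access.
check_dns() {
  for host in $(platform_hosts "$1"); do
    ip=$(dig +short "$host" | tail -n 1)
    echo "$host -> ${ip:-<not resolving>}"
  done
}

platform_hosts "$BASE_DOMAIN"
```

Run `check_dns "$BASE_DOMAIN"` after creating the DNS records; every host should resolve to the IP assigned to its Ingress.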
Add Managed Certificates
Run helm upgrade … to apply.
Step 6 — First login to the Control Plane
- Open https://app.platform.example.com in your browser.
- The chart creates a bootstrap admin during the Prisma migration (CP UI init container). Get the bootstrap credentials:
- Sign in, configure SSO (Platform › SSO), and rotate the bootstrap admin password.
- From the dashboard, configure your LLM provider keys, integrations, and policies. See Platform overview.
Step 7 — Send traffic through TrustGate
Point your AI applications at https://gateway.platform.example.com. Traffic flows through TrustGate → Data Plane → ClickHouse, surfaced by the Control Plane UI you’re hosting.
Verification
Upgrading
Check the init-db init container of the new pod for any migration errors before traffic shifts.
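One way to make the init-db check repeatable is a small log filter. This is a rough sketch under stated assumptions: the namespace is a placeholder, the pod name is passed in by the caller, and matching on "error" or "fail" is a crude first-pass heuristic, not the chart's own health check.

```shell
# Hypothetical namespace (assumption); adjust to your install.
NAMESPACE="neuraltrust"

# Read log text on stdin; succeed only if no line mentions an error or failure.
logs_clean() {
  ! grep -i -E 'error|fail' >/dev/null
}

# Fetch the init-db container logs for one pod and summarize the result.
# Defined here but not called, since it needs cluster access.
check_init_db() {
  if kubectl logs "$1" -c init-db -n "$NAMESPACE" | logs_clean; then
    echo "init-db: no migration errors found"
  else
    echo "init-db: possible migration errors, inspect the full log"
  fi
}
```

Call `check_init_db <pod-name>` against the new pod during the rollout. A clean result only means no matching lines were found, so read the full log if the pod still fails to start.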
Migration to hybrid
To hand off the Control Plane to NeuralTrust SaaS later, flip the flag and upgrade:
Air-gapped GKE
For clusters without outbound internet:
- Mirror all chart images to your internal Artifact Registry (see Image catalog › Mirroring).
- Set global.imageRegistry to your internal registry.
- Configure global.proxy.* if egress goes through a forward proxy.
- Provide huggingface.co mirrors or pre-load Firewall model weights (if using the Firewall with HF token).
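The mirroring step above amounts to rewriting each image reference to point at the internal registry, then pulling, tagging, and pushing. The sketch below illustrates that pattern only: the registry path and the sample image name are hypothetical, and the authoritative image list is the one in the Image catalog.

```shell
# Hypothetical internal Artifact Registry path (assumption).
INTERNAL_REGISTRY="europe-west1-docker.pkg.dev/my-project/neuraltrust"

# Rewrite an image reference: replace everything before the first '/'
# (the source registry host) with the target registry path.
mirror_ref() {
  echo "$2/${1#*/}"
}

# Pull, retag, and push one image. Requires docker and registry credentials.
# Defined here but not called.
mirror_image() {
  target=$(mirror_ref "$1" "$INTERNAL_REGISTRY")
  docker pull "$1" &&
    docker tag "$1" "$target" &&
    docker push "$target"
}

# Example with a made-up source image:
mirror_ref "gcr.io/example/app:1.0" "$INTERNAL_REGISTRY"
# prints europe-west1-docker.pkg.dev/my-project/neuraltrust/example/app:1.0
```

Keeping the repository path and tag unchanged after the registry prefix means a single registry override can redirect every image, which is presumably what global.imageRegistry relies on; confirm the expected path layout against the chart before mirroring.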
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| CP UI shows blank page | CP API URL wrong in config | Verify api.<domain> ingress reachable; confirm controlPlane.components.app.config.apiUrl |
| Login fails | Bootstrap credentials not set, or CP DB migration failed | Check init-db init container logs |
| Scheduler not running jobs | Scheduler can’t reach Data Plane API | Verify data-plane-api.<domain> resolves and TLS is valid |
| PVC stuck Pending | Wrong storage class | kubectl get storageclass; ensure cluster quota for pd-ssd |
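For the Pending PVC case in the table, the following sketch lists the affected claims and prints their events. The namespace is a placeholder assumption and the helper names are hypothetical; the kubectl subcommands themselves are standard.

```shell
# Hypothetical namespace (assumption); adjust to your install.
NAMESPACE="neuraltrust"

# Read `kubectl get pvc --no-headers` output on stdin and print the names
# of claims whose STATUS column (column 2) is Pending.
pending_pvcs() {
  awk '$2 == "Pending" { print $1 }'
}

# Describe every Pending claim so its provisioning events are visible.
# Defined here but not called, since it needs cluster access.
describe_pending() {
  kubectl get pvc -n "$NAMESPACE" --no-headers | pending_pvcs |
    while read -r name; do
      kubectl describe pvc "$name" -n "$NAMESPACE"
    done
}
```

The events at the end of each `kubectl describe pvc` output usually name the storage class or quota problem; compare the class shown there with the output of `kubectl get storageclass`.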
Related guides
- Hybrid deployment on GKE — Control Plane on SaaS
- GCP overview — cluster prerequisites and GCP-specific defaults
- Deployment models — hybrid vs self-hosted comparison
- Image catalog — what runs in self-hosted mode
- Secrets management — auto-generation, External Secrets Operator
- Firewall deployment — GPU workers on GKE
- Configuration scenarios — external infrastructure modes