The NeuralTrust Firewall provides in-cluster prompt and response safety for TrustGate and the Control Plane. It deploys as a lightweight gateway plus a pool of specialized workers. The same chart and image versions support both CPU and GPU inference. The firewall subchart is off by default (neuraltrust-firewall.firewall.enabled: false). Enable it when you need TrustGate or the Control Plane to call into local safety classifiers instead of (or in addition to) NeuralTrust-hosted endpoints.

Architecture

            ┌─────────────────────────────────────────┐
            │            Firewall Gateway             │
            │   (CPU router, fans out to workers)     │
            └────────────────────┬────────────────────┘
                                 │
      ┌───────────┬──────────────┼──────────────┬───────────────┐
      ▼           ▼              ▼              ▼               ▼
  toxicity    toolguard      prompt-        prompt-        response-
                             jailbreak      moderation     jailbreak
Worker               Purpose
toxicity             Toxicity detection on prompts and responses
toolguard            Tool-use guardrails for agent calls
prompt-jailbreak     Prompt-side jailbreak detection
prompt-moderation    Prompt moderation classifier
response-jailbreak   Response-side jailbreak detection

CPU vs GPU

Two images share the same version tag (e.g. v2.6.0):
Image          Use case
firewall-cpu   Default for both gateway and workers. No GPU scheduling required.
firewall-gpu   Workers with GPU inference. Requires nvidia.com/gpu, nodeSelector, tolerations, and hostIPC: true.
The gateway always runs on CPU. Only workers can be switched to the GPU image.
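For example, a minimal values sketch that moves every worker to the GPU image (the registry path is elided here just as in the OpenShift example below, which also shows the full set of GPU scheduling fields):
neuraltrust-firewall:
  firewall:
    enabled: true
    workerDefaults:
      image:
        repository: "europe-west1-docker.pkg.dev/.../firewall-gpu"   # elided path; use your registry
      resources:
        limits:
          nvidia.com/gpu: "1"
      hostIPC: true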

Enable the Firewall

Minimal config — workers run on CPU using the chart defaults:
neuraltrust-firewall:
  firewall:
    enabled: true
Use this when GPUs aren’t available or for lower-volume workloads. Latency is higher than with the GPU image, but no specialized node pool is needed.
The gateway uses firewall-cpu even when workers are on GPU — don’t override gateway.image to a GPU image.
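Then apply the values with a normal Helm upgrade (release and chart names below are placeholders for whatever you used at install time):
helm upgrade --install neuraltrust neuraltrust/neuraltrust \
  -n neuraltrust -f values.yaml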

TrustGate integration

TrustGate calls the firewall via two values stored in trustgate-secrets:
trustgate:
  global:
    env:
      NEURAL_TRUST_FIREWALL_URL: "http://firewall-gateway:8000"
      NEURAL_TRUST_FIREWALL_SECRET_KEY: ""   # auto-populated from firewall JWT
When global.autoGenerateSecrets: true (the default), NEURAL_TRUST_FIREWALL_SECRET_KEY is automatically aligned with the firewall’s JWT. With pre-generated secrets, you must set both to matching values yourself. After changing firewall secrets, restart TrustGate to pick up the new values:
kubectl rollout restart deployment/trustgate-control-plane -n neuraltrust
kubectl rollout restart deployment/trustgate-data-plane -n neuraltrust
kubectl rollout restart deployment/trustgate-actions -n neuraltrust
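To confirm the two sides actually match after a rotation, read back the value TrustGate holds and compare it with the firewall’s JWT secret (the firewall-side secret name varies by chart version, so list and grep rather than guessing a name):
kubectl get secret trustgate-secrets -n neuraltrust \
  -o jsonpath='{.data.NEURAL_TRUST_FIREWALL_SECRET_KEY}' | base64 -d; echo
kubectl get secrets -n neuraltrust | grep firewall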

Per-worker overrides

workerDefaults applies to every worker. To override a single worker (e.g. give toxicity more memory), set values under workers.<name>:
neuraltrust-firewall:
  firewall:
    enabled: true
    workerDefaults:
      resources:
        requests:
          memory: 2Gi
          cpu: 500m
        limits:
          memory: 4Gi
    workers:
      toxicity:
        resources:
          requests:
            memory: 4Gi
          limits:
            memory: 8Gi
      response-jailbreak:
        replicaCount: 2
The keys under workers.* mirror those available under workerDefaults, including image, resources, nodeSelector, tolerations, and hostIPC.
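Because workers.* mirrors workerDefaults, you can also move a single worker to the GPU image while the rest stay on CPU, for example (registry path elided, as in the OpenShift example below):
neuraltrust-firewall:
  firewall:
    enabled: true
    workers:
      response-jailbreak:
        image:
          repository: "europe-west1-docker.pkg.dev/.../firewall-gpu"   # elided path; use your registry
        resources:
          limits:
            nvidia.com/gpu: "1"
        hostIPC: true
        tolerations:
          - key: "nvidia.com/gpu"
            operator: "Exists"
            effect: "NoSchedule"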

OpenShift notes

GPU workers on OpenShift typically need a permissive SCC for hostIPC: true and a node pool taint that matches your MachineSet:
neuraltrust-firewall:
  firewall:
    enabled: true
    workerDefaults:
      image:
        repository: "europe-west1-docker.pkg.dev/.../firewall-gpu"
      nodeSelector:
        nvidia.com/gpu.present: "true"
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
      resources:
        limits:
          nvidia.com/gpu: "1"
      hostIPC: true
If pods fail with SCC errors, see OpenShift › SCC.
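If you choose to grant an SCC directly, the usual pattern is oc adm policy; the service account name below is an assumption — check which one the chart actually creates for the workers:
# Grant the privileged SCC (permits hostIPC) to the workers' service account only
oc adm policy add-scc-to-user privileged -z firewall-worker -n neuraltrust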

Verification

# Gateway pod
kubectl get pods -n neuraltrust -l app.kubernetes.io/name=firewall,app.kubernetes.io/component=gateway

# Worker pods
kubectl get pods -n neuraltrust -l app.kubernetes.io/name=firewall,app.kubernetes.io/component=worker-toxicity
kubectl get pods -n neuraltrust -l app.kubernetes.io/name=firewall,app.kubernetes.io/component=worker-toolguard
kubectl get pods -n neuraltrust -l app.kubernetes.io/name=firewall,app.kubernetes.io/component=worker-prompt-jailbreak
kubectl get pods -n neuraltrust -l app.kubernetes.io/name=firewall,app.kubernetes.io/component=worker-prompt-moderation
kubectl get pods -n neuraltrust -l app.kubernetes.io/name=firewall,app.kubernetes.io/component=worker-response-jailbreak
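Rather than checking each list by eye, you can block until everything is ready using the shared label from the selectors above:
kubectl wait --for=condition=Ready pod \
  -l app.kubernetes.io/name=firewall -n neuraltrust --timeout=300s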

# Internal health check (port-forward then curl)
kubectl port-forward -n neuraltrust svc/firewall-gateway 8000:8000
curl http://localhost:8000/health

Troubleshooting

Gateway can’t reach a worker

kubectl logs -n neuraltrust -l app.kubernetes.io/name=firewall,app.kubernetes.io/component=gateway
kubectl get svc -n neuraltrust | grep firewall
Check that the worker Service exists and matches the gateway’s expected name (firewall-worker-<name>).
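A quick in-cluster DNS probe can confirm the Service name resolves (the worker name and busybox image here are just illustrative choices):
kubectl run dns-probe --rm -it --restart=Never -n neuraltrust \
  --image=busybox:1.36 -- nslookup firewall-worker-toxicity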

GPU worker stuck pending

kubectl describe pod -n neuraltrust <worker-pod>
Common causes:
  • No node has the requested nodeSelector label — check kubectl get nodes --show-labels.
  • Required toleration missing — confirm the GPU taint key matches.
  • Cluster is out of GPU capacity — scale up the GPU node pool (see the capacity check below).
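The capacity check referenced above lists allocatable GPUs per node (the backslash escapes the dot inside the extended resource name):
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'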

CUDA MPS errors

If you see CUDA MPS errors at startup, ensure both cudaMpsActiveThreadPercentage and cudaMpsPinnedDeviceMemLimit are set, or remove both. Setting only one causes the worker to start without a usable MPS configuration.
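A sketch of setting the pair together, assuming these keys sit under workerDefaults alongside the other per-worker fields (their exact location may differ in your chart version, and the values below are purely illustrative):
neuraltrust-firewall:
  firewall:
    workerDefaults:
      cudaMpsActiveThreadPercentage: 50     # illustrative value; set both keys...
      cudaMpsPinnedDeviceMemLimit: "8G"     # ...together, or remove both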

TrustGate not calling the firewall

kubectl get secret trustgate-secrets -n neuraltrust -o jsonpath='{.data.NEURAL_TRUST_FIREWALL_URL}' | base64 -d
kubectl get secret trustgate-secrets -n neuraltrust -o jsonpath='{.data.NEURAL_TRUST_FIREWALL_SECRET_KEY}' | base64 -d
If either is empty, set them via trustgate.global.env.* and helm upgrade, then restart TrustGate deployments.
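For example, to set the URL in place while keeping all other values (release and chart names are placeholders; pass the secret key via a values file rather than --set so it doesn’t end up in shell history):
helm upgrade neuraltrust neuraltrust/neuraltrust -n neuraltrust \
  --reuse-values \
  --set trustgate.global.env.NEURAL_TRUST_FIREWALL_URL=http://firewall-gateway:8000
kubectl rollout restart deployment/trustgate-control-plane -n neuraltrust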