Content moderation controls what your application is allowed to say and be asked about — financial / legal / medical advice, political or religious content, competitor mentions, internal-policy leakage, and more — on either side of the LLM call. Moderation is not a separate screen: it’s a detection you pick inside a policy’s When condition, just like Prompt Guard.

Apply to prompts or responses

The same detections can run in either direction of the traffic. You choose the direction with the condition’s Field:
Field | Moderates | Typical use
Input | What the user sends to the model. | Stop off-policy questions before they hit the LLM.
Output | What the model generates. | Prevent the LLM from saying something the product, brand, or regulator forbids.
You can run both in the same policy, or split into two policies so each side tunes independently.
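
To make the two directions concrete, here is a minimal Python sketch of where each check sits relative to the model call. Everything in it is hypothetical: `detect_topics` is a toy keyword stand-in for the product's classifier, `fake_llm` is a placeholder for your model client, and in the product this wiring is configured in a policy rather than written by hand.

```python
from typing import Callable, Set


def detect_topics(text: str) -> Set[str]:
    """Toy stand-in for the topic classifier (hypothetical, keyword-based)."""
    hints = {
        "Financial Advice": ("invest", "stock", "portfolio"),
        "Medical Advice": ("dosage", "diagnos", "prescri"),
    }
    lowered = text.lower()
    return {topic for topic, words in hints.items()
            if any(word in lowered for word in words)}


def moderated_call(prompt: str,
                   call_llm: Callable[[str], str],
                   input_topics: Set[str],
                   output_topics: Set[str]) -> str:
    # Field = Input: inspect what the user sends before it reaches the model.
    if detect_topics(prompt) & input_topics:
        return "[blocked: input]"
    response = call_llm(prompt)
    # Field = Output: inspect what the model generated before returning it.
    if detect_topics(response) & output_topics:
        return "[blocked: output]"
    return response


def fake_llm(prompt: str) -> str:
    # Placeholder model that always drifts into financial advice.
    return "You should invest everything in one stock."


print(moderated_call("What dosage should I take?", fake_llm,
                     {"Medical Advice"}, {"Financial Advice"}))
print(moderated_call("Tell me a joke", fake_llm,
                     {"Medical Advice"}, {"Financial Advice"}))
```

The first call is stopped on the Input side and never reaches the model; the second passes the Input check and is stopped on the Output side.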

Topics — the built-in catalog

In Create Policy → When, open the detection picker and pick the Topics category. It ships with the following out-of-the-box topics:
Topic | Catches
Financial Advice | Recommendations on investments, trades, tax strategy, etc.
Legal Advice | Advice on contracts, litigation, legal interpretation.
Medical Advice | Diagnoses, treatment suggestions, dosage guidance.
Political Content | Political opinions, parties, elections, activism.
Religious Content | Religious beliefs, practices, comparative religion discussions.
Internal Policies | Leakage of internal guidelines, SOPs, or playbooks.
Confidential Projects | Codenames, unannounced initiatives, sensitive programs.
Pick the subset relevant to the route. A fintech assistant might enable Financial Advice + Legal Advice; a healthcare assistant might enable Medical Advice; an internal copilot might enable Internal Policies + Confidential Projects.

How a topic condition looks

Once selected, the condition row reads like any other When condition:
Input  ·  triggers  ·  Topics: Financial Advice, Legal Advice
Add more rows via + Add Condition — per the policy model, rows always combine with AND, so for OR semantics use multiple policies.
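
The AND/OR behaviour can be sketched in a few lines of Python. This is an illustrative model only, not the product's implementation: the `Condition` and `Policy` classes are hypothetical, and the sketch assumes a row fires when any one of its listed topics is detected, which the text above does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Condition:
    field: str        # "Input" or "Output"
    topics: Set[str]  # topics selected in this row


@dataclass
class Policy:
    name: str
    conditions: List[Condition]


def row_triggers(cond: Condition, detected: Dict[str, Set[str]]) -> bool:
    # Assumption: a row fires when any of its topics is detected on its field.
    return bool(cond.topics & detected.get(cond.field, set()))


def policy_matches(policy: Policy, detected: Dict[str, Set[str]]) -> bool:
    # Rows inside one policy combine with AND.
    return all(row_triggers(c, detected) for c in policy.conditions)


def any_policy_matches(policies: List[Policy],
                       detected: Dict[str, Set[str]]) -> bool:
    # OR semantics come from evaluating several policies independently.
    return any(policy_matches(p, detected) for p in policies)


# Detections for one request: financial advice in the prompt, nothing in the reply.
detected = {"Input": {"Financial Advice"}, "Output": set()}

both_rows = Policy("both", [
    Condition("Input", {"Financial Advice"}),
    Condition("Output", {"Legal Advice"}),
])
split = [
    Policy("input-only", [Condition("Input", {"Financial Advice"})]),
    Policy("output-only", [Condition("Output", {"Legal Advice"})]),
]

print(policy_matches(both_rows, detected))   # False: the Output row did not fire
print(any_policy_matches(split, detected))   # True: the input-only policy fired
```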

Keyword & Regex — deterministic rules

Alongside the classifier-driven Topics catalog, you can attach deterministic lists:
List | Use for
Blocked Keywords | Exact or substring matches: brand names, competitors, policy vocabulary.
Regex Patterns | Structured matches: SKUs, internal IDs, complex compliance patterns.
Use Keyword & Regex when you need auditable, literal rules that supplement the classifier’s intent-based matches.
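
As a rough illustration of what deterministic matching means, the following Python sketch checks a text against a keyword list and a regex list using the standard `re` module. The keywords, the SKU and ID formats, and the case-insensitive substring behaviour are all assumptions made for the example; the product's actual matching rules may differ.

```python
import re
from typing import List

# Hypothetical entries; replace with your own vocabulary.
BLOCKED_KEYWORDS = ["acme corp", "project falcon"]

# Hypothetical structured formats for SKUs and internal IDs.
REGEX_PATTERNS = [
    re.compile(r"\bSKU-\d{6}\b"),
    re.compile(r"\bEMP-[A-Z]{2}\d{4}\b"),
]


def deterministic_hits(text: str) -> List[str]:
    """Return one auditable label per literal match found in the text."""
    hits = []
    lowered = text.lower()
    for keyword in BLOCKED_KEYWORDS:
        # Assumed behaviour: case-insensitive substring match.
        if keyword in lowered:
            hits.append(f"keyword:{keyword}")
    for pattern in REGEX_PATTERNS:
        for match in pattern.finditer(text):
            hits.append(f"regex:{match.group(0)}")
    return hits


print(deterministic_hits("Compare us with Acme Corp on SKU-123456"))
```

Because each hit names the exact rule or string that matched, the result is easy to audit, which is the main reason to pair these lists with the classifier.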

Sensitivity

Topic and keyword detections expose a 5-level sensitivity scale:
Level | Name | Profile | Good for
L1 | Lenient | Minimal filtering, only the most obvious violations. | Internal or already-hardened traffic.
L2 | Light | Low sensitivity, catches clear cases. | Low-risk customer apps.
L3 | Balanced | Recommended default: clear threats with low false-positives (~85% threshold). | Typical production routes.
L4 | Enhanced | Higher sensitivity, flags borderline content. | Regulated or high-risk workloads.
L5 | Strict | Maximum protection, strictest filtering. | Kids / health / finance / government.
Start at L3 — Balanced with a Log policy, review the hits on real traffic, then raise the level and/or promote the action to Block once the false-positive rate is acceptable.
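
One way to decide when the false-positive rate is acceptable is to label a sample of logged hits and measure the share that reviewers rejected. The Python sketch below does this under several assumptions: the `ReviewedHit` record and its `score` field are hypothetical, the ~85% figure for L3 is treated as a confidence cutoff, and the 5% tolerance and minimum sample size are arbitrary illustrative values. Thresholds for the other levels are not given here, so none are invented.

```python
from dataclasses import dataclass
from typing import List

L3_THRESHOLD = 0.85  # from "~85% threshold"; assumed to be a confidence cutoff


@dataclass
class ReviewedHit:
    text: str
    score: float          # classifier confidence (hypothetical field)
    true_violation: bool  # label assigned by a human reviewer


def false_positive_share(hits: List[ReviewedHit],
                         threshold: float = L3_THRESHOLD) -> float:
    """Share of flagged hits that reviewers marked as not a violation."""
    flagged = [h for h in hits if h.score >= threshold]
    if not flagged:
        return 0.0
    wrong = sum(1 for h in flagged if not h.true_violation)
    return wrong / len(flagged)


def ready_to_block(hits: List[ReviewedHit],
                   max_share: float = 0.05,
                   min_hits: int = 3,
                   threshold: float = L3_THRESHOLD) -> bool:
    """Promote Log to Block only with enough reviewed hits and few mistakes."""
    flagged = [h for h in hits if h.score >= threshold]
    if len(flagged) < min_hits:
        return False
    return false_positive_share(hits, threshold) <= max_share


hits = [
    ReviewedHit("Buy this stock now", 0.97, True),
    ReviewedHit("Stock the shelves by noon", 0.88, False),
    ReviewedHit("Rebalance your portfolio into bonds", 0.91, True),
    ReviewedHit("Our stock photo library", 0.60, False),
]

print(false_positive_share(hits))
print(ready_to_block(hits))
```

Note that this measures mistakes among flagged traffic only, since a Log policy records hits rather than everything that passed.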

Using moderation in a policy

Standard Where / When / Then flow:
  • Where: typically the Gateway surface, optionally filtered by Routes or Applications.
  • When:
    • Input · triggers · Topics: … (for the prompt side), and/or
    • Output · triggers · Topics: … (for the generation side).
  • Then: Log while tuning, Block once the policy is ready for enforcement.
A single policy can mix Topics with Keyword/Regex rows and other Content Security detections — they all AND together.
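
The difference between a Log policy and a Block policy, and the effect of scoping by route, can be sketched as follows. This is a simplified Python model with hypothetical names (`Policy`, `apply_policy`, the route strings); it takes the result of the When conditions as a ready-made boolean and assumes that a Block policy also records its matches.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Policy:
    name: str
    routes: Optional[Set[str]]  # Where: None means every route is in scope
    action: str                 # Then: "Log" or "Block"


def apply_policy(policy: Policy, route: str, when_matched: bool,
                 audit_log: List[str]) -> bool:
    """Return True if the traffic may proceed, False if it is blocked."""
    # Where: skip traffic outside the policy's scope.
    if policy.routes is not None and route not in policy.routes:
        return True
    # When: nothing to do unless every condition row fired.
    if not when_matched:
        return True
    # Then: record the match (assumed for both actions), block only on "Block".
    audit_log.append(f"{policy.name}: {policy.action} on {route}")
    return policy.action != "Block"


audit_log: List[str] = []
tuning = Policy("fin-advice", {"/chat"}, "Log")
enforcing = Policy("fin-advice", {"/chat"}, "Block")

print(apply_policy(tuning, "/chat", True, audit_log))         # True: logged only
print(apply_policy(enforcing, "/chat", True, audit_log))      # False: blocked
print(apply_policy(enforcing, "/internal", True, audit_log))  # True: out of scope
print(audit_log)
```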

Common policies

  • Keep a fintech assistant off financial advice: Output · triggers · Topics: Financial Advice, Legal Advice. Then: Block.
  • Block medical advice on a general-purpose chatbot: Output · triggers · Topics: Medical Advice. Then: Block.
  • Prevent internal policy leakage on external routes: Output · triggers · Topics: Internal Policies, Confidential Projects. Then: Block.
  • Log political and religious content for review: Input + Output · triggers · Topics: Political Content, Religious Content. Then: Log.