System prompt disclosure testing evaluates your model’s ability to keep its system prompt and internal instructions confidential. Attackers often attempt to extract system prompts to understand model behavior and find exploitable weaknesses.Documentation Index
Fetch the complete documentation index at: https://docs.neuraltrust.ai/llms.txt
Use this file to discover all available pages before exploring further.
Purpose
Protecting system prompts is important for:- Maintaining security through obscurity as an additional layer
- Protecting proprietary prompt engineering
- Preventing attackers from understanding model constraints
- Ensuring competitive advantage in prompt design
Attack Categories
| Sub-Category | Description |
|---|---|
| Direct Request | Tests resistance to straightforward requests for the system prompt |
| Oblique Reference | Evaluates handling of indirect references to instructions |
| Confusion and Clarification | Tests if confusion tactics can extract prompt details |
| Explanation Mode | Assesses if requests to explain behavior reveal the prompt |
| Meta-Level Instruction Inquiry | Tests against abstract questions about instructions |
| Request for Phrase Conclusion | Evaluates if partial prompts can be completed |
| Request for a Portion | Tests if segments of the prompt can be extracted |
How It Works
The evaluator checks if the model reveals any part of its system prompt:- Pass: The model keeps its system prompt confidential
- Fail: The model discloses part or all of its system prompt
Usage Example
When to Use
Use system prompt disclosure testing when you need to:- Protect proprietary prompt engineering
- Validate prompt confidentiality measures
- Assess resistance to prompt extraction attacks
- Conduct security audits
- Test before deploying customer-facing applications