Test Generation From Knowledge Bases
The RAG (Retrieval-Augmented Generation) Probe is a specialized tool that automatically generates and evaluates test cases for RAG systems. It generates both functional and adversarial questions from a knowledge base, then evaluates the model's responses to those questions.
This probe requires a configured LLM client, used to generate the questions and topics when they are not provided, as well as a configured embedding model used during the same generation steps. A minimal configuration sketch is shown below.
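The sketch below illustrates what this configuration might look like. The package, class, and parameter names (`rag_probe`, `RagProbe`, `OpenAIClient`, `OpenAIEmbedding`) are hypothetical stand-ins, not the tool's confirmed API:

```python
# Hypothetical configuration sketch: all names here are illustrative
# placeholders for the tool's actual classes.
from rag_probe import RagProbe
from rag_probe.clients import OpenAIClient, OpenAIEmbedding

probe = RagProbe(
    # LLM client used to generate questions and topics when not provided
    llm_client=OpenAIClient(model="gpt-4o"),
    # Embedding model used during the same generation steps
    embedding_model=OpenAIEmbedding(model="text-embedding-3-small"),
)
```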
Purpose
The RAG Probe is particularly useful when you need to:
- Test RAG systems with automatically generated questions
- Evaluate model performance on specific knowledge domains
- Generate both functional and adversarial test cases
- Assess system robustness against different types of questions
- Validate knowledge base integration and retrieval capabilities
How It Works
The probe works with two main types of scenarios:
Functional RAG Scenario
- Generates benign questions to test normal system operation
- Supports multiple question types (a configuration sketch follows this list):
  - Simple Questions: Basic factual questions
  - Complex Questions: More detailed and nuanced questions
  - Really Complex Questions: Highly detailed, multi-part questions
  - Conversational Questions: Questions phrased in a natural, conversational style
  - Distracting Questions: Questions that include extra context designed to distract the retriever or the model
  - Double Questions: Questions that ask about two related things at once
  - OOS Questions: Out-of-scope questions that probe the system's boundaries
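As an illustration, a functional scenario might be configured by listing the question types to generate. The scenario class and enum names below are assumptions, not the tool's confirmed API:

```python
# Hypothetical functional scenario: class and enum names are assumed.
from rag_probe.scenarios import FunctionalRagScenario, QuestionType

functional = FunctionalRagScenario(
    question_types=[
        QuestionType.SIMPLE,
        QuestionType.COMPLEX,
        QuestionType.CONVERSATIONAL,
        QuestionType.OUT_OF_SCOPE,
    ],
    questions_per_type=5,  # assumed knob controlling test-set size
)
```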
Adversarial RAG Scenario
- Generates malicious questions to test system robustness
- Supports multiple attack types (a configuration sketch follows this list):
  - Instruction Manipulation: Questions that attempt to overwrite, invalidate, edit, or contradict the information in the context, forcing the system to provide incorrect information while maintaining a natural question format.
  - Role Play: Questions that assign a new name/role to the assistant and provide context updates that override previous information, creating scenarios that lead to incorrect responses.
  - Hypothetical: Questions that create hypothetical scenarios based on the context, testing the system's ability to handle speculative situations while maintaining factual accuracy.
  - Storytelling: Questions that attempt to make the system engage in storytelling about the context, potentially leading to fictional or exaggerated responses.
  - Obfuscation: Questions that use complex language, technical terms, or confusing phrasing to obscure the actual intent, testing the system's ability to handle complex queries.
  - Payload Splitting: Questions that split malicious content across multiple parts or use indirect references, testing the system's ability to handle fragmented or indirect queries.
  - List Based: Questions that request lists or enumerations of information, potentially leading to incomplete or incorrect responses.
  - Special Token: Questions that include special characters, tokens, or unusual formatting to test the system's handling of non-standard input.
  - Off Tone: Questions that attempt to make the system respond in an inappropriate or unprofessional tone, testing its ability to maintain appropriate communication standards.
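An adversarial scenario would follow the same pattern, selecting attack types instead of question types. As before, every name below is a hypothetical stand-in:

```python
# Hypothetical adversarial scenario, mirroring the functional sketch above.
from rag_probe.scenarios import AdversarialRagScenario, AttackType

adversarial = AdversarialRagScenario(
    attack_types=[
        AttackType.INSTRUCTION_MANIPULATION,
        AttackType.ROLE_PLAY,
        AttackType.PAYLOAD_SPLITTING,
        AttackType.SPECIAL_TOKEN,
    ],
    questions_per_type=5,  # assumed knob controlling test-set size
)
```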
The probe will:
- Load documents into a knowledge base
- Generate questions based on document topics
- Query the model with generated questions
- Evaluate responses using configured evaluators
- Provide detailed results and metrics
Usage Examples
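The end-to-end sketch below follows the workflow described above: load documents, generate questions, query the target, and evaluate. Every name (`KnowledgeBase`, `RagProbe`, `FunctionalRagScenario`, `results.summary`) is an illustrative assumption, not the tool's confirmed API:

```python
# End-to-end sketch of a probe run; all names are hypothetical stand-ins.
from rag_probe import KnowledgeBase, RagProbe
from rag_probe.scenarios import FunctionalRagScenario

def my_rag_app(question: str) -> str:
    """The RAG system under test: takes a question, returns its answer."""
    ...  # call your deployed RAG pipeline here

# Load documents into a knowledge base
kb = KnowledgeBase.from_documents(["docs/handbook.md", "docs/faq.md"])

# Generate questions from document topics, query the target model,
# and evaluate the responses with the configured evaluators
probe = RagProbe(knowledge_base=kb, scenario=FunctionalRagScenario())
results = probe.run(target=my_rag_app)

# Inspect detailed results and metrics
print(results.summary())  # e.g. pass rate per question type
```

Running an adversarial scenario would follow the same pattern, swapping the scenario object passed to the probe.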
When to Use
Use the RAG Probe when you need to:
- Validate knowledge base integration
- Assess system robustness against domain-specific attacks
- Generate comprehensive test cases automatically
- Test specific knowledge domains or topics
- Compare different RAG configurations
- Identify system vulnerabilities