Test Sets
Test Sets are a fundamental component of the NeuralTrust platform that enable you to create and manage collections of test cases for AI model evaluation. Each test set contains multiple query-response pairs, where each pair consists of two parts (sketched in code after this list):
- Query: The prompt or input that will be sent to the LLM (e.g., "What is the capital of France?")
- Expected Response: The correct or desired response that the LLM should provide (e.g., "The capital of France is Paris.")
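For illustration, a single-turn test case boils down to a simple pair. The sketch below uses hypothetical field names (`query`, `expected_response`) chosen for clarity, not the exact NeuralTrust schema:

```python
# A minimal sketch of a single-turn test case. The field names are
# hypothetical and chosen for illustration, not the platform's schema.
test_case = {
    "query": "What is the capital of France?",
    "expected_response": "The capital of France is Paris.",
}
```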
A single test set can contain dozens or hundreds of these query-response pairs, allowing you to test your LLM thoroughly across a wide range of scenarios. Test sets support both single-turn interactions and multi-turn conversations, enabling you to test the following (a sketch of a multi-turn case follows the list):
- Simple question-answer pairs
- Complex dialogue scenarios with context
- Multi-turn conversations where previous interactions matter
- Conversation memory and context retention
- Chat-history-dependent responses
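To make the multi-turn case concrete, one way such a test case could be represented is shown below. The structure and field names are hypothetical, included only to illustrate how earlier turns supply the context the final query depends on:

```python
# Hypothetical shape of a multi-turn test case: the final query is
# answerable only if the model retains the earlier conversation turns.
multi_turn_case = {
    "history": [
        {"role": "user", "content": "Who wrote Don Quixote?"},
        {"role": "assistant", "content": "Miguel de Cervantes wrote Don Quixote."},
    ],
    "query": "When was he born?",  # "he" resolves only via the chat history
    "expected_response": "Miguel de Cervantes was born in 1547.",
}
```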
These test sets work in conjunction with Evaluation Sets to provide comprehensive testing capabilities for your LLM applications.
With Test Sets, you can:
- Create large collections of query-response pairs with specific testing objectives
- Generate multiple test cases automatically from knowledge bases
- Organize test sets by type (functional, security, compliance, etc.)
- Reuse test cases across multiple evaluation sets
- Track how well your LLM responses match the expected responses over time
- Test conversation flows and multi-turn interactions
Test Sets are particularly useful for:
- Systematic testing of model capabilities and accuracy across many scenarios
- Security and vulnerability assessment through multiple adversarial prompts
- Compliance verification with expected response patterns at scale
- Performance benchmarking against a comprehensive set of known good responses
- Regression testing to ensure model behavior remains consistent across updates
- Validation of contextual understanding in conversations
Test Set API Methods
```python
from neuraltrust import NeuralTrustApi

client = NeuralTrustApi(api_key="YOUR_API_KEY")

# List all test sets
client.testset.list()

# Create a new test set. The num_questions parameter determines how many
# query-response pairs will be generated from the knowledge base.
client.testset.create(
    name="My Test Set",
    type="functional",
    evaluation_set_id="eval-123",
    knowledge_base_id="kb-456",
    num_questions=10,  # Will generate 10 query-response pairs
)

# Get a specific test set
client.testset.get(id="testset_123")

# Delete a test set
client.testset.delete(id="testset_123")
```
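Putting these calls together, a typical lifecycle might look like the sketch below. It assumes (this page does not confirm it) that `create` returns an object exposing an `id` attribute accepted by `get` and `delete`; check the Test Sets API Reference for the actual return shape:

```python
# A minimal end-to-end sketch using only the methods shown above.
# Assumption: create() returns an object with an `id` attribute.
testset = client.testset.create(
    name="Capital Cities",
    type="functional",
    evaluation_set_id="eval-123",
    knowledge_base_id="kb-456",
    num_questions=25,
)

fetched = client.testset.get(id=testset.id)  # retrieve it back
client.testset.delete(id=testset.id)         # clean up when done
```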
For more information, see the Test Sets API Reference.