Testsets
TestSets are a fundamental component of the NeuralTrust platform that lets you create and manage collections of test cases for AI model evaluation. Each TestSet contains multiple query-response pairs, where each pair consists of:
- Query: The prompt or input that will be sent to the LLM (e.g., "What is the capital of France?")
- Expected Response: The correct or desired response that the LLM should provide (e.g., "The capital of France is Paris.")
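For example, a single-turn test case can be pictured as a plain query/expected-response record. The Python structure below is purely illustrative; the field names are assumptions rather than the platform's exact schema.

```python
# Illustrative structure only: field names are assumptions, not the exact
# NeuralTrust TestSet schema (see the Testsets API Reference for that).
single_turn_case = {
    "query": "What is the capital of France?",
    "expected_response": "The capital of France is Paris.",
}
```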
A single TestSet can contain dozens or hundreds of these query-response pairs, allowing you to thoroughly test your LLM across a wide range of scenarios. TestSets support both single-turn interactions and multi-turn conversations, enabling you to test the following (a multi-turn sketch appears after this list):
- Simple question-answer pairs
- Complex dialogue scenarios with context
- Multi-turn conversations where previous interactions matter
- Conversation memory and context retention
- Chat history dependent responses
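As a rough illustration, a multi-turn test case can bundle the prior conversation with the final query and its expected response. The turn/role layout below is an assumption for illustration, not the platform's exact multi-turn format.

```python
# Illustrative structure only: the conversation/turn layout is an assumption,
# not the platform's exact multi-turn schema.
multi_turn_case = {
    "conversation": [
        {"role": "user", "content": "I'm planning a trip to France."},
        {"role": "assistant", "content": "Great! What would you like to know?"},
        # The last turn only makes sense with the earlier context ("there" = France),
        # which is what makes it a test of context retention.
        {"role": "user", "content": "What is the capital city there?"},
    ],
    "expected_response": "The capital of France is Paris.",
}
```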
These TestSets work in conjunction with EvaluationSets to provide comprehensive testing capabilities for your LLM applications.
With TestSets, you can:
- Create large collections of query-response pairs with specific testing objectives (see the creation sketch after this list)
- Generate multiple test cases automatically from knowledge bases
- Organize sets of tests by type (functional, security, compliance, etc.)
- Reuse test cases across multiple EvaluationSets
- Track how well your LLM responses match the expected responses over time
- Test conversation flows and multi-turn interactions
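To make the workflow concrete, the sketch below assembles the two example cases into a typed TestSet and registers it over HTTP. The endpoint URL, auth header, and payload fields are hypothetical placeholders, not the documented NeuralTrust API; the Testsets API Reference defines the real contract.

```python
# A minimal sketch, assuming a REST-style endpoint for creating TestSets.
# The URL, auth header, and payload fields below are illustrative placeholders.
import requests

test_set = {
    "name": "geography-functional-v1",
    "type": "functional",        # e.g. functional, security, compliance
    "test_cases": [
        single_turn_case,        # from the single-turn sketch above
        multi_turn_case,         # from the multi-turn sketch above
    ],
}

resp = requests.post(
    "https://api.neuraltrust.example/v1/testsets",   # hypothetical endpoint
    json=test_set,
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    timeout=30,
)
resp.raise_for_status()
print("Created TestSet:", resp.json())
```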
TestSets are particularly useful for:
- Systematic testing of model capabilities and accuracy across many scenarios
- Security and vulnerability assessment through multiple adversarial prompts
- Compliance verification with expected response patterns at scale
- Performance benchmarking against a comprehensive set of known good responses
- Regression testing to ensure model behavior remains consistent across updates (a rough scoring sketch follows this list)
- Validation of contextual understanding in conversations
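As a rough sketch of how regression tracking can work, the helper below replays each single-turn query against a model and records how often the answer matches the expected response. On the platform this scoring is handled by EvaluationSets; `call_model` is a placeholder for your own LLM client, and exact-match comparison is a deliberately naive stand-in for real response matching.

```python
# A naive regression sketch: `call_model` is a placeholder for your own LLM
# client, and exact-match comparison stands in for proper response scoring
# (which EvaluationSets provide on the platform).
def run_regression(test_cases, call_model):
    results = []
    for case in test_cases:
        actual = call_model(case["query"])
        results.append({
            "query": case["query"],
            "expected": case["expected_response"],
            "actual": actual,
            "match": actual.strip() == case["expected_response"].strip(),
        })
    pass_rate = sum(r["match"] for r in results) / len(results)
    return pass_rate, results

# Example usage with a stubbed model:
rate, details = run_regression([single_turn_case], lambda q: "The capital of France is Paris.")
print(f"Pass rate: {rate:.0%}")
```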
For more information, see the Testsets API Reference.