
Test Sets

Test Sets are a fundamental component of the NeuralTrust platform, enabling you to create and manage collections of test cases for AI model evaluation. Each test set contains multiple query-response pairs (illustrated in the sketch after this list), where each pair consists of:

  • Query: The prompt or input that will be sent to the LLM (e.g., "What is the capital of France?")
  • Expected Response: The correct or desired response that the LLM should provide (e.g., "The capital of France is Paris.")
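
For instance, a single test case can be pictured as one query paired with one expected response. A minimal sketch in Python; the field names here are illustrative assumptions, not the platform's actual schema:

# A minimal sketch of a single-turn test case.
# Field names are illustrative, not the actual NeuralTrust schema.
test_case = {
    "query": "What is the capital of France?",
    "expected_response": "The capital of France is Paris.",
}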

A single test set can contain dozens or hundreds of these query-response pairs, allowing you to test your LLM thoroughly across a wide range of scenarios. Test sets support both single-turn interactions and multi-turn conversations (see the sketch after this list), enabling you to test:

  • Simple question-answer pairs
  • Complex dialogue scenarios with context
  • Multi-turn conversations where previous interactions matter
  • Conversation memory and context retention
  • Chat history dependent responses
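
For multi-turn scenarios, a test case also carries the prior conversation so the expected response can depend on earlier context. Again, a minimal sketch with illustrative field names rather than the platform's actual schema:

# A minimal sketch of a multi-turn test case. The "history" field carries
# prior turns so the expected response can depend on earlier context.
# Field names are illustrative, not the actual NeuralTrust schema.
multi_turn_case = {
    "history": [
        {"role": "user", "content": "Who wrote The Old Man and the Sea?"},
        {"role": "assistant", "content": "Ernest Hemingway wrote it."},
    ],
    "query": "What year was it published?",
    "expected_response": "It was published in 1952.",
}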

These test sets work in conjunction with Evaluation Sets to provide comprehensive testing capabilities for your LLM applications.

With Test Sets, you can:

  • Create large collections of query-response pairs with specific testing objectives
  • Generate multiple test cases automatically from knowledge bases
  • Organize test sets by type (functional, security, compliance, etc.), as sketched after this list
  • Reuse test cases across multiple evaluation sets
  • Track how well your LLM responses match the expected responses over time
  • Test conversation flows and multi-turn interactions
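
For example, you might keep a separate test set per testing objective. This sketch reuses the client.testset.create call documented below, assuming the type parameter accepts the category names listed above; the names and IDs are placeholders:

from neuraltrust import NeuralTrustApi

client = NeuralTrustApi(api_key="YOUR_API_KEY")

# One test set per testing objective, generated from the same knowledge base.
# Assumes `type` accepts these category names; names and IDs are placeholders.
for set_type in ["functional", "security", "compliance"]:
    client.testset.create(
        name=f"Checkout bot - {set_type}",
        type=set_type,
        evaluation_set_id="eval-123",
        knowledge_base_id="kb-456",
        num_questions=25,  # 25 query-response pairs per set
    )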

Test Sets are particularly useful for:

  • Systematic testing of model capabilities and accuracy across many scenarios
  • Security and vulnerability assessment through multiple adversarial prompts
  • Compliance verification with expected response patterns at scale
  • Performance benchmarking against a comprehensive set of known good responses
  • Regression testing to ensure model behavior remains consistent across updates (see the sketch after this list)
  • Validation of contextual understanding in conversations
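
As a regression-testing starting point, you can fetch existing test sets and replay their cases against a new model version. A minimal sketch that assumes list() returns objects exposing id and type attributes; the actual response shape may differ:

from neuraltrust import NeuralTrustApi

client = NeuralTrustApi(api_key="YOUR_API_KEY")

# Fetch every existing test set so the same cases can be replayed against a
# new model version. Assumes list() returns objects with `id` and `type`
# attributes; the actual response shape may differ.
for testset in client.testset.list():
    if testset.type == "functional":
        details = client.testset.get(id=testset.id)
        # Re-run the queries in `details` against the updated model here,
        # then compare outputs to the stored expected responses.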

Test Set API Methods

from neuraltrust import NeuralTrustApi

client = NeuralTrustApi(api_key="YOUR_API_KEY")

# List all test sets
client.testset.list()

# Create a new test set
# num_questions parameter determines how many query-response pairs
# will be generated from the knowledge base
client.testset.create(
    name="My Test Set",
    type="functional",
    evaluation_set_id="eval-123",
    knowledge_base_id="kb-456",
    num_questions=10  # will generate 10 query-response pairs
)

# Get a specific test set
client.testset.get(id="testset_123")

# Delete a test set
client.testset.delete(id="testset_123")

For more information, see the Test Sets API Reference.