EvaluationSets
EvaluationSets are a core feature of the NeuralTrust platform that allows you to systematically test and evaluate AI models. They provide a structured way to organize and automate collections of tests to assess model performance, safety, and compliance.
With EvaluationSets, you can:
- Create reusable test collections for consistent model evaluation
- Schedule automated evaluations to run on a regular basis (daily, weekly, etc.)
- Track model performance and safety metrics over time
- Add custom metadata to organize and categorize your EvaluationSets
- Run evaluations on-demand or automatically through the API (see the sketch after this list)
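As a minimal sketch of the API workflow, the snippet below creates an EvaluationSet with a weekly schedule and then triggers an on-demand run. The base URL, endpoint paths, field names (`name`, `description`, `scheduled_frequency`), and response shape are illustrative assumptions, not the documented NeuralTrust API; consult the EvaluationSets API Reference for the actual contract.

```python
import os
import requests

# Hypothetical base URL and auth scheme; the real values are documented
# in the EvaluationSets API Reference.
BASE_URL = "https://api.neuraltrust.ai/v1"
headers = {"Authorization": f"Bearer {os.environ['NEURALTRUST_API_KEY']}"}

# Create an EvaluationSet with a weekly schedule (field names are assumptions).
resp = requests.post(
    f"{BASE_URL}/evaluation-sets",
    headers=headers,
    json={
        "name": "Safety regression suite",
        "description": "Jailbreak and toxicity checks for the support bot",
        "scheduled_frequency": "weekly",
    },
)
resp.raise_for_status()
evaluation_set_id = resp.json()["id"]

# Trigger an on-demand run of the new EvaluationSet (path is an assumption).
run = requests.post(
    f"{BASE_URL}/evaluation-sets/{evaluation_set_id}/run",
    headers=headers,
)
run.raise_for_status()
print("Run started:", run.json())
```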
EvaluationSets are particularly useful for:
- Quality assurance testing of AI models
- Continuous monitoring of model behavior
- Compliance verification and documentation
- Regression testing after model updates (see the CI sketch after this list)
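For example, an EvaluationSet can act as a regression gate in a CI pipeline: run it after every model update and fail the build if a safety metric drops. The sketch below assumes a hypothetical SDK client and result shape (`NeuralTrust`, `run_evaluation_set`, a `metrics` dict with a `safety_score` key); adapt the names to the actual SDK.

```python
import sys

# Assumed package and class name, for illustration only; the real import
# is documented in the NeuralTrust SDK reference.
from neuraltrust import NeuralTrust

client = NeuralTrust(api_key="YOUR_API_KEY")

# Run the regression EvaluationSet against the newly deployed model
# (method name and arguments are assumptions).
result = client.run_evaluation_set(evaluation_set_id="es_123")

# Fail the CI job if the (assumed) safety score regresses below a threshold.
safety_score = result["metrics"]["safety_score"]
if safety_score < 0.95:
    print(f"Safety regression detected: score={safety_score}")
    sys.exit(1)
print("Safety checks passed.")
```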
For more information, see the EvaluationSets API Reference.
How to create an EvaluationSet manually
You can create and manage EvaluationSets directly through the NeuralTrust web interface without needing to use the SDK:
- Navigate to the "EvaluationSets" section in the NeuralTrust dashboard
- Click the "Create New EvaluationSet" button
- Fill in the basic information:
  - Name: Give your EvaluationSet a descriptive name
  - Description: Add details about the purpose and scope
Once created, you can:
- Add test cases directly through the web interface
- Edit existing test cases
- Run evaluations with a single click
- View detailed results and analytics
- Schedule automated runs
The web interface provides an intuitive way to manage your EvaluationSets while offering the same functionality as the SDK methods.
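For reference, the same lifecycle can be driven programmatically. The sketch below pairs each web-interface action with an assumed SDK call; every method name here (`create_evaluation_set`, `add_test_case`, `run_evaluation_set`, `get_evaluation_results`) and every field is illustrative, not the confirmed SDK surface.

```python
from neuraltrust import NeuralTrust  # assumed package and class name

client = NeuralTrust(api_key="YOUR_API_KEY")

# "Create New EvaluationSet" button -> create call (method name assumed).
es = client.create_evaluation_set(
    name="Support bot QA",
    description="Accuracy and tone checks for customer-facing answers",
)

# "Add test cases directly through the web interface" -> add a test case
# (fields are assumptions).
client.add_test_case(
    evaluation_set_id=es["id"],
    input="How do I reset my password?",
    expected_behavior="Provides reset steps without asking for the password",
)

# "Run evaluations with a single click" -> on-demand run.
run = client.run_evaluation_set(evaluation_set_id=es["id"])

# "View detailed results and analytics" -> fetch results (shape assumed).
results = client.get_evaluation_results(run_id=run["id"])
print(results)
```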