Evaluation Sets

Evaluation Sets are a core feature of the NeuralTrust platform for systematically testing AI models. They organize tests into structured collections that you can run automatically to assess model performance, safety, and compliance.

With Evaluation Sets, you can:

- Create reusable test collections for consistent model evaluation
- Schedule automated evaluations to run on a regular basis (daily, weekly, etc.)
- Track model performance and safety metrics over time
- Add custom metadata to organize and categorize your evaluation sets
- Run evaluations on demand or automatically through the API

Evaluation Sets are particularly useful for:

- Quality assurance testing of AI models
- Continuous monitoring of model behavior
- Compliance verification and documentation
- Regression testing after model updates
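As an illustration of the regression-testing use case, the sketch below compares metric scores from two evaluation runs and reports which ones dropped. It does not use the NeuralTrust SDK: the function name `find_regressions`, the metric names, the scores, and the tolerance are all hypothetical, and it assumes each run can be reduced to a mapping of metric name to a score where higher is better.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, float]:
    """Return metrics whose score dropped by more than `tolerance`.

    Both inputs map a metric name to a score where higher is better
    (an assumption of this sketch). The result maps each regressed
    metric to the size of its drop.
    """
    regressions = {}
    for metric, old_score in baseline.items():
        new_score = current.get(metric)
        if new_score is None:
            # Metric missing from the newer run: nothing to compare.
            continue
        drop = old_score - new_score
        if drop > tolerance:
            regressions[metric] = round(drop, 4)
    return regressions


# Hypothetical scores from runs before and after a model update.
before_update = {"accuracy": 0.91, "safety": 0.99}
after_update = {"accuracy": 0.85, "safety": 0.985}
regressed = find_regressions(before_update, after_update)
```

A check like this could run after each scheduled evaluation, so that an update which lowers a score by more than the tolerance is caught before release.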

Evaluation Set API Methods

from neuraltrust import NeuralTrustApi

client = NeuralTrustApi(api_key="YOUR_API_KEY")

# List all evaluation sets
client.evaluation_set.list()

# Create a new evaluation set
client.evaluation_set.create(name="My Set", description="...")

# Get a specific evaluation set
client.evaluation_set.get(id="evaluation_set_123")

# Update an existing evaluation set
client.evaluation_set.update(id="evaluation_set_123", ...)

# Delete an evaluation set
client.evaluation_set.delete(id="evaluation_set_123")

# Run an evaluation set
client.evaluation_set.run(id="evaluation_set_123")

For more information, see the Evaluation Sets API Reference.
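The methods above can be combined into a simple "create if missing, then run" flow. The following is a minimal sketch rather than the SDK's documented behavior. It assumes that `list()` returns dictionaries with `"id"` and `"name"` keys and that `create()` returns a dictionary containing the new `"id"`; check the API Reference for the actual return types. The helpers `find_set_id` and `ensure_and_run` and the `FakeEvaluationSets` stand-in are hypothetical and exist only so the example runs without an API key.

```python
from typing import Any, Optional


def find_set_id(evaluation_sets: Any, name: str) -> Optional[str]:
    """Return the id of the first evaluation set called `name`, or None."""
    # Assumption: list() yields dicts with "id" and "name" keys.
    for item in evaluation_sets.list():
        if item.get("name") == name:
            return item["id"]
    return None


def ensure_and_run(evaluation_sets: Any, name: str, description: str) -> str:
    """Create the evaluation set if it is missing, run it, and return its id."""
    set_id = find_set_id(evaluation_sets, name)
    if set_id is None:
        # Assumption: create() returns a dict that includes the new "id".
        created = evaluation_sets.create(name=name, description=description)
        set_id = created["id"]
    evaluation_sets.run(id=set_id)
    return set_id


class FakeEvaluationSets:
    """In-memory stand-in for client.evaluation_set (illustration only)."""

    def __init__(self):
        self.items = []  # stored evaluation sets
        self.runs = []   # ids passed to run(), in call order

    def list(self):
        return [dict(item) for item in self.items]

    def create(self, name, description):
        item = {
            "id": f"evaluation_set_{len(self.items) + 1}",
            "name": name,
            "description": description,
        }
        self.items.append(item)
        return item

    def run(self, id):
        self.runs.append(id)


fake = FakeEvaluationSets()
first_id = ensure_and_run(fake, "My Set", "Nightly regression checks")
second_id = ensure_and_run(fake, "My Set", "Nightly regression checks")
```

The second call finds the existing set by name and reuses it instead of creating a duplicate. With the real SDK, you would pass `client.evaluation_set` in place of `fake`, provided the return shapes match the assumptions stated above.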

How to create an Evaluation Set manually?

You can create and manage Evaluation Sets directly through the NeuralTrust web interface without needing to use the SDK:

1. Navigate to the "Evaluation Sets" section in the NeuralTrust dashboard
2. Click the "Create New Evaluation Set" button
3. Fill in the basic information:
   - Name: Give your evaluation set a descriptive name
   - Description: Add details about the purpose and scope

Once created, you can:

- Add test cases directly through the web interface
- Edit existing test cases
- Run evaluations with a single click
- View detailed results and analytics
- Schedule automated runs

The web interface provides an intuitive way to manage your evaluation sets while offering the same functionality as the SDK methods.
