From Prompt
The Prompt Dataset Builder is a specialized tool designed to automatically generate test datasets for evaluating LLM performance. It uses an LLM to generate questions and their corresponding evaluation contexts based on provided instructions and examples.
This tool needs a configured LLM client to generate the dataset items.
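As a rough illustration of the idea (not the library's actual API), the sketch below shows how a builder could turn instructions and examples into dataset items by prompting an LLM. Everything here is a hypothetical stand-in: the `LLMClient` type, `DatasetItem`, `build_dataset`, and the canned `fake_llm` used in place of a real client.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a configured LLM client:
# any callable that maps a prompt string to a completion string.
LLMClient = Callable[[str], str]


@dataclass
class DatasetItem:
    question: str
    context: dict  # evaluation context attached to the question


def build_dataset(llm: LLMClient, instructions: str, examples: List[dict], n: int) -> List[DatasetItem]:
    """Ask the LLM for n question/context pairs and parse its JSON reply."""
    prompt = (
        f"{instructions}\n"
        f"Examples: {json.dumps(examples)}\n"
        f"Return a JSON list of {n} objects with keys 'question' and 'context'."
    )
    rows = json.loads(llm(prompt))
    return [DatasetItem(question=r["question"], context=r["context"]) for r in rows[:n]]


def fake_llm(prompt: str) -> str:
    """Canned reply so the sketch runs without network access (assumption, not a real client)."""
    return json.dumps([
        {"question": "What is the capital of France?", "context": {"expected": "Paris"}},
        {"question": "What is 2 + 2?", "context": {"expected": "4"}},
    ])


items = build_dataset(fake_llm, "Generate factual questions.", [], n=2)
```

In a real setup the `fake_llm` callable would be replaced by whatever client the tool is configured with; the parsing step is where a consistent dataset structure gets enforced.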
Purpose
The Prompt Dataset Builder is particularly useful when you need to:
- Generate test datasets for LLM evaluation
- Create datasets with specific evaluation contexts
- Generate both single-prompt and conversational test cases
- Ensure consistent dataset structure and format
- Automate dataset generation for different evaluation scenarios
How It Works
The builder works with two main types of datasets:
Single Prompt Dataset
Generates single-prompt questions, each paired with its evaluation context.
Conversation Dataset
Generates multi-turn conversations for evaluation.
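To make the difference between the two dataset types concrete, here is a minimal sketch of what their items might look like. The class names (`SinglePromptItem`, `Turn`, `ConversationItem`) and fields are assumptions chosen for illustration; the library's own types may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SinglePromptItem:
    """One question plus the context used to evaluate the answer."""
    question: str
    context: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Turn:
    role: str      # "user" or "assistant"
    content: str


@dataclass
class ConversationItem:
    """A multi-turn exchange plus the context used to evaluate it."""
    turns: List[Turn]
    context: Dict[str, Any] = field(default_factory=dict)

    def last_user_message(self) -> str:
        # The final user turn is typically what the model under test must answer.
        for turn in reversed(self.turns):
            if turn.role == "user":
                return turn.content
        raise ValueError("conversation has no user turn")


single = SinglePromptItem("What is your refund policy?", {"expected_topic": "refunds"})
conversation = ConversationItem(
    turns=[
        Turn("user", "Hi"),
        Turn("assistant", "Hello, how can I help?"),
        Turn("user", "Can I get a refund?"),
    ],
    context={"expected_topic": "refunds"},
)
```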
Flexible Evaluation Contexts
The Dataset Builder supports any type of evaluation context. You can define your own context types by creating a new class that inherits from Context. The builder automatically adapts to generate datasets with your custom context types.
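As a sketch of what custom contexts could look like, the example below defines two context types and a helper that reads their field names, which is one plausible way a builder could adapt to an arbitrary context class. The `Context` class here is a local stand-in for the library's base class, and `CorrectnessContext`, `ToneContext`, and `schema_for` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import List


class Context:
    """Local stand-in for the library's Context base class (assumption)."""


@dataclass
class CorrectnessContext(Context):
    expected_answer: str


@dataclass
class ToneContext(Context):
    required_tone: str
    max_words: int


def schema_for(context_type: type) -> List[str]:
    """List the fields a generator would need to fill for this context type."""
    return [f.name for f in fields(context_type)]


correctness_fields = schema_for(CorrectnessContext)
tone_fields = schema_for(ToneContext)
```

Because the field list is derived from the class itself, adding a new context type would not require changing the generation code in this sketch.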
Generate Tests
To use the generated dataset in a test scenario, you can use the PromptDatasetProbe. This probe takes a dataset builder and a model, and automatically generates test cases from the dataset.
The PromptDatasetProbe will:
- Generate the dataset using the provided builder
- For each item in the dataset:
  - Send the question to the model
  - Record the model’s response
  - Create a test case with the question, response, and evaluation context
- Yield test cases that can be used for evaluation
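The steps above can be sketched as a small generator. This is not the actual PromptDatasetProbe implementation; `Item`, `ProbeCase`, `run_probe`, and the toy builder and model are hypothetical names used only to show the control flow.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator


@dataclass
class Item:
    question: str
    context: Any


@dataclass
class ProbeCase:
    question: str
    response: str
    context: Any


def run_probe(
    build: Callable[[], Iterable[Item]],
    model: Callable[[str], str],
) -> Iterator[ProbeCase]:
    """Generate the dataset, query the model per item, and yield test cases."""
    for item in build():                      # generate the dataset via the builder
        response = model(item.question)       # send the question, record the response
        yield ProbeCase(item.question, response, item.context)


def toy_builder() -> Iterable[Item]:
    # Stand-in for a real dataset builder (assumption).
    return [Item("2 + 2?", {"expected": "4"}), Item("Capital of France?", {"expected": "Paris"})]


def echo_model(question: str) -> str:
    # Stand-in for the model under test (assumption).
    return f"echo: {question}"


cases = list(run_probe(toy_builder, echo_model))
```

Yielding cases one at a time, rather than returning a full list, lets an evaluator start scoring responses before the whole dataset has been processed.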
This allows you to:
- Automatically generate test cases from your dataset
- Evaluate model responses against the expected criteria
- Test both single-prompt and conversation scenarios
- Use any type of evaluation context