From Dataset
The Dataset Probe creates test cases from structured datasets. It loads questions and contexts from a JSON, Parquet, or YAML file, or from an in-memory Python list, and generates test cases by querying a model with each question.
Purpose
The Dataset Probe is particularly useful when you need to:
- Create test cases from existing datasets
- Create your own test cases by hand
How It Works
The probe accepts four kinds of input:
- JSON Format: A list of objects containing questions and contexts
- Parquet Format: A columnar storage format with ‘question’ and ‘context’ columns
- YAML Format: A human-readable format for storing questions and contexts
- Python list: A list of dictionaries built in memory, each containing a question and its context
The probe will:
- Load the dataset from the specified format
- For each item in the dataset, query the model with the question
- Create test cases containing the interactions between the model and the questions
- Allow saving the results back to JSON, Parquet, or YAML format
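The steps above can be sketched in plain Python. This is an illustration of the flow only, not the Dataset Probe's actual API: `echo_model`, `build_test_cases`, and the `answer` key are hypothetical names chosen for the example, and the model is a stand-in function rather than a real model client.

```python
import json
from typing import Callable, Dict, List

Item = Dict[str, str]


def echo_model(question: str, context: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"Answer to '{question}' using {len(context)} characters of context"


def build_test_cases(items: List[Item], model: Callable[[str, str], str]) -> List[Item]:
    """Query the model once per dataset item and record the interaction."""
    test_cases = []
    for item in items:
        question = item["question"]
        context = item.get("context", "")
        test_cases.append(
            {
                "question": question,
                "context": context,
                "answer": model(question, context),  # assumed output field
            }
        )
    return test_cases


# Step 1: the dataset (here already loaded into memory).
dataset = [
    {"question": "What is the capital of France?", "context": "France is a country in Europe."},
    {"question": "What is 2 + 2?", "context": "Basic arithmetic."},
]

# Steps 2 and 3: query the model and collect one test case per item.
test_cases = build_test_cases(dataset, echo_model)

# Step 4: serialize the results (JSON here; Parquet or YAML work the same way).
serialized = json.dumps(test_cases, indent=2)
```

Passing the model as a callable keeps the loop independent of any particular model client, so the same function can be exercised with a stub in tests.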
When to Use
Use the Dataset Probe when you need to:
- Convert existing datasets into test cases
- Work with large datasets efficiently (using Parquet format)
- Use human-readable formats (using YAML)
- Automate the process of creating test cases from structured data
- Maintain consistency in test case generation
- Store and share test cases in a standardized format
Dataset Options
The Dataset Probe supports four different ways to create test cases:
1. From a Python List
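The in-memory input can be pictured as a list of dictionaries, each with a `question` and a `context` key (the same names the Parquet columns use). The sketch below only builds and checks such a list. `validate_items` is a hypothetical helper, not part of the Dataset Probe, and treating `context` as optional is an assumption made for the example.

```python
from typing import Dict, List


def validate_items(items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Check that every item is a dict with a non-empty string question."""
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise TypeError(f"item {index} is not a dictionary")
        question = item.get("question")
        if not isinstance(question, str) or not question.strip():
            raise ValueError(f"item {index} has no usable 'question'")
        # Assumption: a missing context is allowed, but it must be text if present.
        if not isinstance(item.get("context", ""), str):
            raise ValueError(f"item {index} has a non-string 'context'")
    return items


items = validate_items(
    [
        {"question": "Who wrote Hamlet?", "context": "Hamlet is a tragedy by William Shakespeare."},
        {"question": "What is the boiling point of water?", "context": "At sea level, water boils at 100 °C."},
    ]
)
```

Checking the list before any model call surfaces malformed entries early, instead of partway through a long run.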
2. From a JSON File
You can load test cases from a JSON file and save the results back to JSON:
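As a minimal sketch of the file side of this, the snippet below reads and writes a JSON list of question/context objects using only the standard library. `load_items`, `save_items`, and the file name `questions.json` are hypothetical; the probe's own loading and saving calls are not shown here and may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_items(path: Path) -> List[Dict[str, str]]:
    """Read a JSON file that holds a list of question/context objects."""
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list) or not all(isinstance(entry, dict) for entry in data):
        raise ValueError("expected a JSON list of objects")
    return data


def save_items(items: List[Dict[str, str]], path: Path) -> None:
    """Write items back to JSON, keeping non-ASCII text readable."""
    with path.open("w", encoding="utf-8") as handle:
        json.dump(items, handle, indent=2, ensure_ascii=False)


# Round trip through a temporary file to show the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "questions.json"
    save_items(
        [{"question": "What is the capital of Japan?", "context": "Japan is an island country in East Asia."}],
        source,
    )
    loaded = load_items(source)
```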
3. From a Parquet File
For handling large datasets efficiently, you can use Parquet format:
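One possible way to prepare such a file is with pandas, sketched below. It assumes pandas plus a Parquet engine (pyarrow or fastparquet) is installed; `validate_columns`, `load_parquet_items`, and `save_parquet_items` are hypothetical helpers rather than Dataset Probe functions. pandas is imported inside the functions so the column check can be used without it.

```python
from typing import Dict, Iterable, List

# Column names taken from the format description above.
REQUIRED_COLUMNS = ("question", "context")


def validate_columns(columns: Iterable[str]) -> None:
    """Raise if any required column is absent."""
    present = set(columns)
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"Parquet file is missing columns: {missing}")


def load_parquet_items(path: str) -> List[Dict[str, str]]:
    """Read a Parquet file and return one dict per row."""
    import pandas as pd  # optional dependency; needs pyarrow or fastparquet

    frame = pd.read_parquet(path)
    validate_columns(frame.columns)
    return frame[list(REQUIRED_COLUMNS)].to_dict(orient="records")


def save_parquet_items(items: List[Dict[str, str]], path: str) -> None:
    """Write question/context items to a Parquet file."""
    import pandas as pd  # optional dependency; needs pyarrow or fastparquet

    pd.DataFrame(items, columns=list(REQUIRED_COLUMNS)).to_parquet(path, index=False)
```

Calling `save_parquet_items(items, "questions.parquet")` followed by `load_parquet_items("questions.parquet")` should return the same rows; the file name is only an example.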
4. From a YAML File
For human-readable configuration, you can use YAML format:
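A YAML dataset could be laid out as a top-level list of question/context entries, as the comment in the sketch below shows. The helpers assume PyYAML is installed and are hypothetical (`normalize_items`, `load_yaml_items`, `save_yaml_items`); the exact layout the Dataset Probe expects is not confirmed by this page.

```python
from typing import Any, Dict, List

# Assumed file layout:
#
# - question: What is the capital of Italy?
#   context: Italy is a country in southern Europe.
# - question: How many days are in a leap year?
#   context: A leap year has one extra day in February.


def normalize_items(raw: Any) -> List[Dict[str, str]]:
    """Validate parsed YAML and keep question/context as strings."""
    if not isinstance(raw, list):
        raise ValueError("expected a top-level YAML list")
    items = []
    for index, entry in enumerate(raw):
        if not isinstance(entry, dict) or "question" not in entry:
            raise ValueError(f"item {index} has no 'question' key")
        items.append(
            {
                "question": str(entry["question"]),
                "context": str(entry.get("context", "")),  # assumption: context may be omitted
            }
        )
    return items


def load_yaml_items(path: str) -> List[Dict[str, str]]:
    """Parse a YAML file with safe_load and normalize its entries."""
    import yaml  # PyYAML, optional dependency

    with open(path, encoding="utf-8") as handle:
        return normalize_items(yaml.safe_load(handle))


def save_yaml_items(items: List[Dict[str, str]], path: str) -> None:
    """Write items to YAML in their original key order."""
    import yaml  # PyYAML, optional dependency

    with open(path, "w", encoding="utf-8") as handle:
        yaml.safe_dump(items, handle, sort_keys=False, allow_unicode=True)
```

Using `safe_load` rather than `load` avoids constructing arbitrary Python objects from a dataset file that may come from someone else.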
Each of these methods provides flexibility in how you create and manage your test cases, allowing you to choose the most appropriate format for your specific use case.