NeuralTrust | The leading security platform for generative AI

The Dataset Probe is a specialized tool for creating test cases from structured datasets. It loads questions and their contexts from a data file (JSON, Parquet, or YAML) or from a dataset built directly in Python, then generates test cases by querying a model with each question.

Purpose

The Dataset Probe is particularly useful when you need to:

Create test cases from existing datasets
Create your own test cases by hand

How It Works

The probe works with four main data formats:

JSON Format: A list of objects containing questions and contexts
Parquet Format: A columnar storage format with ‘question’ and ‘context’ columns
YAML Format: A human-readable format for storing questions and contexts
Python List: A Dataset built in code from lists of DatasetItem objects, each holding a question and a context

The probe will:

Load the dataset from the specified format
For each item in the dataset, query the model with the question
Create test cases that record each question together with the model's response
Allow saving the dataset back to JSON, Parquet, or YAML format
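
The steps above can be sketched in plain Python. This is not trusttest's implementation. `Item`, `Interaction`, `run_dataset_probe` and `echo_model` are hypothetical names, and the model is assumed to be any function that maps a question string to a response string. Each question is sent on its own, because this page does not say whether multi-turn test cases pass earlier turns to the model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins for the library's classes; names are illustrative only.
@dataclass
class Item:
    question: str
    expected_response: str

@dataclass
class Interaction:
    question: str
    response: str
    expected_response: str

# Assumption: a model is any callable from question text to response text.
Model = Callable[[str], str]

def run_dataset_probe(dataset: List[List[Item]], model: Model) -> List[List[Interaction]]:
    """Query the model once per item and keep the dataset's test-case grouping."""
    test_cases: List[List[Interaction]] = []
    for items in dataset:  # each inner list is one test case (one or more turns)
        interactions = [
            Interaction(item.question, model(item.question), item.expected_response)
            for item in items
        ]
        test_cases.append(interactions)
    return test_cases

def echo_model(question: str) -> str:
    # Toy model for the sketch: it echoes the question back.
    return f"You asked: {question}"

dataset = [
    [Item("What is Python?", "A programming language."),
     Item("What is JavaScript?", "A web language.")],
    [Item("What is Python?", "A programming language.")],
]
test_cases = run_dataset_probe(dataset, echo_model)
print(len(test_cases), [len(tc) for tc in test_cases])  # 2 [2, 1]
```

Keeping the inner lists intact is what lets a conversation and a single question remain separate test cases after probing.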

When to Use

Use the Dataset Probe when you need to:

Convert existing datasets into test cases
Work with large datasets efficiently (using Parquet format)
Use human-readable formats (using YAML)
Automate the process of creating test cases from structured data
Maintain consistency in test case generation
Store and share test cases in a standardized format

Dataset Options

The Dataset Probe supports four different ways to create test cases:

1. From a Python List

from trusttest.dataset_builder import Dataset, DatasetItem
from trusttest.evaluation_contexts import ExpectedResponseContext
from trusttest.models.testing import DummyEndpoint
from trusttest.probes import DatasetProbe

model = DummyEndpoint()

# Create a Dataset with test cases
dataset = Dataset(
    [
        [  # this test case represents a conversation
            DatasetItem(
                question="What is Python?",
                context=ExpectedResponseContext(
                    expected_response="Python is a high-level, interpreted programming language."
                ),
            ),
            DatasetItem(
                question="What is JavaScript?",
                context=ExpectedResponseContext(
                    expected_response="JavaScript is a programming language used primarily for web development."
                ),
            )
        ],
        [  # this test case represents a single question
            DatasetItem(
                question="What is Python?",
                context=ExpectedResponseContext(
                    expected_response="Python is a high-level, interpreted programming language."
                ),
            )
        ]
    ]
)

# Create the probe with the dataset
probe = DatasetProbe(model=model, dataset=dataset)

# Get test set
test_set = probe.get_test_set()
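
To turn an existing flat dataset into the nested shape used above (one inner list per test case), you can group rows by conversation before building DatasetItem objects. This is a minimal sketch, assuming a hypothetical export with `conversation_id`, `question` and `expected_response` fields; `group_into_test_cases` is an illustrative helper, not part of trusttest.

```python
from itertools import groupby
from typing import Dict, List

# Hypothetical flat export: one row per turn, tagged with a conversation id.
# "conversation_id", "question" and "expected_response" are assumed field names.
rows: List[Dict[str, str]] = [
    {"conversation_id": "c1", "question": "What is Python?",
     "expected_response": "Python is a high-level, interpreted programming language."},
    {"conversation_id": "c1", "question": "What is JavaScript?",
     "expected_response": "JavaScript is a programming language used primarily for web development."},
    {"conversation_id": "c2", "question": "What is Python?",
     "expected_response": "Python is a high-level, interpreted programming language."},
]

def group_into_test_cases(rows: List[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Group consecutive rows that share a conversation_id, keeping turn order."""
    return [list(turns) for _, turns in groupby(rows, key=lambda r: r["conversation_id"])]

test_cases = group_into_test_cases(rows)
print([len(tc) for tc in test_cases])  # [2, 1]
```

Each inner list then maps to one inner list of the Dataset, with every row becoming `DatasetItem(question=row["question"], context=ExpectedResponseContext(expected_response=row["expected_response"]))`. Because `itertools.groupby` only merges adjacent rows, sort interleaved rows with `sorted(rows, key=lambda r: r["conversation_id"])` first; Python's sort is stable, so turn order within each conversation is preserved.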

2. From a JSON File

You can load a dataset from a JSON file and save it back to JSON:

from trusttest.dataset_builder import Dataset
from trusttest.probes import DatasetProbe
from trusttest.models.testing import DummyEndpoint

model = DummyEndpoint()

# Load dataset from JSON
dataset = Dataset.from_json("path/to/your/dataset.json")
# Save dataset to JSON
dataset.to_json("path/to/save/dataset.json")

# Create probe with the dataset
probe = DatasetProbe(model=model, dataset=dataset)

# Get test set
test_set = probe.get_test_set()
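
Before handing a file to `Dataset.from_json`, a quick standard-library check can catch malformed input early. The exact schema trusttest expects is not documented on this page, so the layout below (a list of objects with `question` and `context` keys, where `context` holds an `expected_response`) is an assumption based on the format description. If in doubt, build a small Dataset in Python and call `to_json` to see the layout the library actually writes. `check_question_file` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List

REQUIRED_KEYS = {"question", "context"}  # assumed from the JSON format description

def check_question_file(path: Path) -> List[Any]:
    """Raise ValueError unless the file holds a JSON list of objects with the required keys."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not an object")
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"record {i} is missing {sorted(missing)}")
    return data

# Write a small sample file to a temporary directory and check it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "dataset.json"
    sample.write_text(json.dumps([
        {"question": "What is Python?",
         "context": {"expected_response": "Python is a high-level, interpreted programming language."}},
    ]), encoding="utf-8")
    records = check_question_file(sample)

print(len(records))  # 1
```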

3. From a Parquet File

For handling large datasets efficiently, use the Parquet format:

from trusttest.dataset_builder import Dataset
from trusttest.probes import DatasetProbe
from trusttest.models.testing import DummyEndpoint

model = DummyEndpoint()

# Load dataset from Parquet
dataset = Dataset.from_parquet("path/to/your/dataset.parquet")
# Save dataset to Parquet
dataset.to_parquet("path/to/save/dataset.parquet")

# Create probe with the dataset
probe = DatasetProbe(model=model, dataset=dataset)

# Get test set
test_set = probe.get_test_set()
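
Parquet is columnar: instead of storing one record after another, it stores all `question` values together and all `context` values together, which lets Parquet readers load a single column without scanning the others. The sketch below only illustrates that row-to-column layout in plain Python. It does not read or write Parquet, which requires a library such as pyarrow or pandas. The string-valued `context` field and the `to_columns` helper are simplifications for illustration.

```python
from typing import Dict, List

# Row-oriented records, as in the JSON and YAML formats (field values simplified to strings).
rows: List[Dict[str, str]] = [
    {"question": "What is Python?", "context": "Python is a programming language."},
    {"question": "What is JavaScript?", "context": "JavaScript is used for web development."},
]

def to_columns(rows: List[Dict[str, str]], names: List[str]) -> Dict[str, List[str]]:
    """Pivot row records into one list per column, preserving row order."""
    return {name: [row[name] for row in rows] for name in names}

columns = to_columns(rows, ["question", "context"])
print(columns["question"])  # ['What is Python?', 'What is JavaScript?']
```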

4. From a YAML File

For human-readable files, use the YAML format:

from trusttest.dataset_builder import Dataset
from trusttest.probes import DatasetProbe
from trusttest.models.testing import DummyEndpoint

model = DummyEndpoint()

# Load dataset from YAML
dataset = Dataset.from_yaml("path/to/your/dataset.yaml")
# Save dataset to YAML
dataset.to_yaml("path/to/save/dataset.yaml")

# Create probe with the dataset
probe = DatasetProbe(model=model, dataset=dataset)

# Get test set
test_set = probe.get_test_set()

All four options produce a Dataset that you pass to DatasetProbe in the same way, so choose the format that fits your data: Python for small hand-written cases, JSON or YAML for readable files, and Parquet for large datasets.
