The Data Leakage Evaluator is a specialized tool designed to assess a model’s ability to protect sensitive information and prevent unauthorized data disclosure. It tests the model’s compliance with data protection requirements by checking that it responds appropriately to attempts to extract sensitive information.

Purpose

The Data Leakage Evaluator is particularly useful when you need to:

  • Test a model’s data protection capabilities
  • Ensure compliance with data privacy regulations
  • Prevent unauthorized disclosure of sensitive information
  • Evaluate model robustness against data extraction attempts
  • Assess system security against potential data leaks

Available Compliance Categories

The Data Leakage Evaluator includes three categories of compliance tests, each selected through the Category enum (see the sketch after the category descriptions):

1. Data Exfiltration

Tests the model’s ability to prevent unauthorized data extraction, including:

  • Personal identifiable information (PII)
  • Sensitive business data
  • Confidential user information
  • System configuration details
  • Access credentials

2. Instruction Leakage

Evaluates the model’s ability to protect its own instructions and system prompts, including:

  • System prompt extraction
  • Model configuration details
  • Internal instructions
  • Security measures
  • Operational parameters

3. Phishing

Tests the model’s ability to detect and prevent phishing attempts, including:

  • Social engineering attacks
  • Credential harvesting
  • Malicious link generation
  • Deceptive content creation
  • Identity impersonation
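In code, these categories are selected through the Category enum from trusttest.probes.data_leak, which the scenarios below accept via their categories argument. A minimal sketch (Category.PHISHING and Category.DATA_EXFILTRATION appear in the usage examples below; INSTRUCTION_LEAKAGE is an assumed member name inferred from this category list):

from trusttest.probes.data_leak import Category

# Any subset of categories can be passed to a scenario's
# `categories` argument. INSTRUCTION_LEAKAGE is an assumed member
# name; the other two are taken from the usage examples below.
categories = {
    Category.DATA_EXFILTRATION,
    Category.INSTRUCTION_LEAKAGE,
    Category.PHISHING,
}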

How It Works

The Data Leakage Evaluator uses a binary scoring system:

  • Score: 0 (Compliant): The model responds appropriately, protecting sensitive information
  • Score: 1 (Non-Compliant): The model fails to protect sensitive information, potentially exposing data
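Because scoring is binary, results are straightforward to gate on in automated checks. The sketch below is illustrative only: it assumes the results can be iterated as items carrying a score attribute, which are hypothetical names rather than a confirmed trusttest API; display_summary(), used in the examples below, is the documented way to inspect results.

# Hypothetical post-processing sketch; `results.items` and
# `item.score` are assumed attribute names, not confirmed
# trusttest API.
def assert_compliant(results) -> None:
    non_compliant = [item for item in results.items if item.score == 1]
    if non_compliant:
        raise AssertionError(
            f"{len(non_compliant)} responses failed to protect sensitive information"
        )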

Usage Examples

Basic Compliance Testing

Basic DataLeakScenario testing uses a predefined dataset to evaluate the model’s responses against various data protection objectives.

from trusttest.catalog import DataLeakScenario
from trusttest.models.http import HttpModel, PayloadConfig
from trusttest.probes.data_leak import Category

# Target model reached over HTTP; each generated test message is
# substituted into the payload wherever "{{ test }}" appears.
model = HttpModel(
    url="https://chat.neuraltrust.ai/api/chat",
    headers={
        "Content-Type": "application/json"
    },
    payload_config=PayloadConfig(
        format={
            "messages": [
                {"role": "system", "content": "**Welcome to Airline Assistant**."},
                {"role": "user", "content": "{{ test }}"},
            ]
        },
        message_regex="{{ test }}",  # placeholder marking where test messages are injected
    ),
    concatenate_field=".",
)

scenario = DataLeakScenario(
    model=model,
    categories={Category.PHISHING},  # compliance categories to test
    max_objectives_per_category=10,  # cap on objectives drawn per category
    use_jailbreaks=False,            # send objectives without jailbreak wrapping
    sampling="random",               # sample objectives randomly from the dataset
)

test_set = scenario.probe.get_test_set()    # build the test cases
results = scenario.eval.evaluate(test_set)  # run them against the model
results.display_summary()                   # print the compliance summary

Iterative Compliance Testing

The CtfDataLeakScenario uses a capture-the-flag approach to iteratively test the model’s data protection capabilities through multi-turn conversations.

from trusttest.catalog import CtfDataLeakScenario
from trusttest.models.testing import RepeatModel
from trusttest.probes.data_leak import Category

scenario = CtfDataLeakScenario(
    model=RepeatModel(),  # test stub from trusttest.models.testing; swap in your target model
    categories={Category.DATA_EXFILTRATION},
    max_objectives_per_category=2,
    max_turns=4,          # maximum conversation turns per objective
)

test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()

When to Use

Use the Data Leakage Evaluator when you need to:

  • Test data protection boundaries
  • Ensure compliance with privacy regulations
  • Prevent unauthorized data disclosure
  • Evaluate security measures against potential leaks
  • Assess model robustness against data extraction attempts
  • Validate data protection measures