Functional testing evaluates whether your AI model produces correct, relevant, and high-quality responses. Unlike threat detection, which focuses on security vulnerabilities, functional testing ensures your model performs its intended tasks accurately.

What is Functional Testing?

Functional testing validates that your AI model:
  • Answers questions correctly based on provided context or knowledge
  • Maintains consistency across similar queries
  • Provides relevant responses that address user intent
  • Meets quality standards for your specific use case

Test Generation Methods

trusttest can generate functional tests in three ways:
  • From RAG: Generate tests directly from your knowledge base, so questions reflect your actual content
  • From Dataset: Use a curated set of question-and-answer pairs you provide
  • From Prompt: Generate tests dynamically from a prompt describing your use case

When to Use Functional Testing

Recommended approach by use case:
  • RAG applications: From RAG - tests against your actual knowledge base
  • Customer support bots: From Dataset - curated Q&A pairs
  • General assistants: From Prompt - dynamic test generation
  • Domain-specific models: Combination of all approaches

Evaluation Methods

Functional tests can be evaluated using:
  • LLM-as-Judge: Use an LLM to assess response quality
  • Heuristics: Use BLEU scores, exact matching, or regex patterns
  • Custom evaluators: Define your own evaluation logic (see the sketch below)
Learn more about evaluation →
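
trusttest's own evaluator interface is covered in the evaluation guide linked above. As a generic illustration of the heuristic approach, here is a minimal evaluator in plain Python; evaluate_response, HeuristicResult, and the sample pattern are illustrative names, not trusttest APIs:

import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeuristicResult:
    passed: bool
    reason: str

def evaluate_response(
    response: str,
    expected: Optional[str] = None,
    pattern: Optional[str] = None,
) -> HeuristicResult:
    """Pass if the response exactly matches the expected answer
    or matches a regex pattern, whichever is provided."""
    if expected is not None and response.strip() == expected.strip():
        return HeuristicResult(True, "exact match")
    if pattern is not None and re.search(pattern, response):
        return HeuristicResult(True, f"matched pattern {pattern!r}")
    return HeuristicResult(False, "no heuristic matched")

# Example: require that a refund answer mentions a 30-day window
print(evaluate_response(
    "Refunds are accepted within 30 days of purchase.",
    pattern=r"\b30[- ]day",
))

A BLEU-based heuristic would have the same shape: score the response against a reference answer and pass above a chosen threshold.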

Quick Example

from trusttest.catalog import FunctionalScenario
from trusttest.targets.http import HttpTarget, PayloadConfig

target = HttpTarget(
    url="https://your-model-endpoint.com/chat",
    headers={"Content-Type": "application/json"},
    payload_config=PayloadConfig(
        format={"messages": [{"role": "user", "content": "{{ test }}"}]},
        message_regex="{{ test }}",
    ),
)

# Generate functional tests from your knowledge base
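# ("your_knowledge_base" is a placeholder for a knowledge base object you have already created)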
scenario = FunctionalScenario(
    target=target,
    knowledge_base=your_knowledge_base,
    num_tests=50,
)

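# Build the test set, run the evaluation, and display a summary of the results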
test_set = scenario.probe.get_test_set()
results = scenario.eval.evaluate(test_set)
results.display_summary()
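
As a rough illustration of what the HttpTarget configuration above amounts to, and assuming the {{ test }} substitution works the way the field names suggest, each generated test would be posted to the endpoint roughly like this (the requests call below is illustrative; trusttest performs this internally):

import requests

# Illustrative only: approximately the request trusttest would send
# for one generated test prompt, after "{{ test }}" substitution.
response = requests.post(
    "https://your-model-endpoint.com/chat",
    headers={"Content-Type": "application/json"},
    json={"messages": [{"role": "user", "content": "What is your refund policy?"}]},
)

The model's reply to each such request is what the configured evaluator scores.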