Generate functional tests directly from your RAG knowledge base to validate that your model correctly retrieves and synthesizes information from your documents.

Overview

Testing RAG applications requires validating that:
  1. Retrieval works correctly: Relevant documents are found
  2. Synthesis is accurate: Information is correctly combined
  3. Responses are grounded: Answers are based on the knowledge base
  4. No hallucinations: Model doesn’t make up information

How It Works

TrustTest automatically:
  1. Connects to your knowledge base (vector store, database, etc.)
  2. Retrieves document chunks
  3. Generates question-answer pairs based on the content
  4. Creates test cases with expected responses
  5. Evaluates your model’s actual responses against expectations

Supported Knowledge Bases

Connector              | Description
In-Memory              | Local vector store for testing
Azure AI Search        | Azure's cognitive search
Neo4j                  | Graph database
PostgreSQL + pgvector  | PostgreSQL with vector extension
Upstash                | Serverless Redis vector store

Code Examples

Using In-Memory Knowledge Base

from trusttest.knowledge_base import InMemoryKnowledgeBase
from trusttest.probes.rag import RAGProbe
from trusttest.targets.http import HttpTarget, PayloadConfig
from trusttest.evaluators.llm_judges import CorrectnessEvaluator
from trusttest.evaluator_suite import EvaluatorSuite
from trusttest.evaluation_scenarios import EvaluationScenario

# Your document chunks
documents = [
    "TrustTest is a framework for testing AI models for safety and reliability.",
    "TrustTest supports multiple knowledge base connectors including Azure, Neo4j, and PostgreSQL.",
    "Probes in TrustTest generate test cases to evaluate model behavior.",
]

# Create knowledge base
kb = InMemoryKnowledgeBase(documents=documents)

# Configure target
target = HttpTarget(
    url="https://your-rag-endpoint.com/chat",
    headers={"Content-Type": "application/json"},
    payload_config=PayloadConfig(
        format={"messages": [{"role": "user", "content": "{{ test }}"}]},
        message_regex="{{ test }}",
    ),
)

# Create RAG probe
probe = RAGProbe(
    target=target,
    knowledge_base=kb,
    num_questions=20,
)

# Generate test set
test_set = probe.get_test_set()

# Evaluate with correctness judge
evaluator = CorrectnessEvaluator()
suite = EvaluatorSuite(evaluators=[evaluator])
scenario = EvaluationScenario(evaluator_suite=suite)

results = scenario.evaluate(test_set)
results.display_summary()

Using Azure AI Search

from trusttest.knowledge_base import AzureSearchKnowledgeBase

kb = AzureSearchKnowledgeBase(
    endpoint="https://your-search.search.windows.net",
    index_name="your-index",
    api_key="your-api-key",
)

probe = RAGProbe(
    target=target,
    knowledge_base=kb,
    num_questions=50,
)

Using PostgreSQL with pgvector

from trusttest.knowledge_base import PgVectorKnowledgeBase

kb = PgVectorKnowledgeBase(
    connection_string="postgresql://user:pass@localhost/db",
    table_name="documents",
    embedding_column="embedding",
    content_column="content",
)

probe = RAGProbe(
    target=target,
    knowledge_base=kb,
    num_questions=50,
)

Configuration Options

Parameter       | Type           | Default                     | Description
target          | Target         | Required                    | The RAG model to test
knowledge_base  | KnowledgeBase  | Required                    | Your knowledge base connector
num_questions   | int            | 20                          | Number of test questions to generate
question_types  | List[str]      | ["factual", "inferential"]  | Types of questions to generate
language        | LanguageType   | "English"                   | Language for generated questions
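
For instance, a probe that sets these optional parameters might look like the sketch below. The keyword names follow the table above; the "comparative" identifier is an assumption based on the Question Types section below, and is not a confirmed API value.

# A hedged configuration sketch: parameter names follow the table above,
# but the specific question_types strings are assumptions.
probe = RAGProbe(
    target=target,
    knowledge_base=kb,
    num_questions=50,
    question_types=["factual", "comparative"],  # assumed lowercase identifiers
    language="English",                         # documented default for LanguageType
)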

Question Types

TrustTest generates different types of questions:

Type         | Description                     | Example
Factual      | Direct fact retrieval           | "What connectors does TrustTest support?"
Inferential  | Requires combining information  | "How would you test a RAG app with Azure?"
Comparative  | Comparing entities              | "What's the difference between probes and evaluators?"

Evaluating RAG Responses

For RAG applications, use these evaluators:

from trusttest.evaluators.llm_judges import (
    CorrectnessEvaluator,
    CompletenessEvaluator,
    RAGPoisoningEvaluator,
)

evaluators = [
    CorrectnessEvaluator(),      # Is the answer factually correct?
    CompletenessEvaluator(),     # Does it cover all relevant points?
    RAGPoisoningEvaluator(),     # Is the response grounded in context?
]