NeuralTrust | The leading security platform for generative AI

A KnowledgeBase is a collection of documents used to generate test cases for a specific domain or task. It is usually referred to as the vector database for retrieval-augmented generation (RAG). These documents are grouped by topic; if no topics are defined, the KnowledgeBase generates them automatically.

Connectors

AzureKnowledgeBase

Leverages Azure Cognitive Search and is best suited for:

Cloud-based document indexing and storage
Full-text search with advanced filtering and ranking
Integration with Microsoft’s AI-powered search stack

Neo4jKnowledgeBase

Built on Neo4j, this connector excels at:

Handling complex document relationships
Graph-based querying and clustering
Constructing dynamic knowledge graphs

PostgresKnowledgeBase

Leverages Postgres with pgvector and is best suited for:

Full-text search with advanced filtering and ranking
Graph-based querying and clustering

InMemoryKnowledgeBase

A minimal, no-dependency implementation designed for:

Prototyping and local testing
Lightweight, quick-start environments
Small-scale document classification

Topic Creation Process

The topic creation pipeline groups unlabeled documents into coherent topics using embeddings, dimensionality reduction, clustering, and LLM-based summarization.

This process is triggered when no predefined (seed) topics are provided.

Document Retrieval
- Pulls all documents from Azure Cognitive Search using the mapped id and content fields.
- Filters out empty or whitespace-only content.
Embedding Generation
- Applies an embedding model to each document’s content (truncated to 3 * max_tokens).
- Produces high-dimensional semantic vectors for clustering.
Dimensionality Reduction
- Uses UMAP to reduce embedding vectors to a lower-dimensional space for clustering.
- Parameters such as n_neighbors, n_components, and initialization strategy are tuned based on document count.
Topic Clustering
- Runs HDBSCAN over the reduced vectors to group documents into topic clusters.
- Noise and outliers are discarded (label = -1).
LLM-based Topic Naming
- For each valid topic cluster, generates a name using an LLM target.
- Uses up to max_docs samples per topic and truncates each sample to max_doc_length.
Return Structure
- Returns:
  - A dictionary mapping topic names to associated documents.
  - A flat list of all topic names.
  - A flat list of all processed documents.
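
The filtering, grouping, and naming steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation. The embedding, UMAP, and HDBSCAN stages are left out, and their result (one cluster label per document) is passed in as `labels`. All function names, the `name_topic` callback (standing in for the LLM call), and the default values of `max_docs` and `max_doc_length` are hypothetical. It also assumes that the `3 * max_tokens` limit counts characters and that the processed-documents list contains every document that survived filtering.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Document = Dict[str, str]  # assumed shape: {"id": ..., "content": ...}


def filter_documents(docs: Sequence[Document]) -> List[Document]:
    """Drop documents whose content is empty or whitespace-only."""
    return [d for d in docs if d.get("content", "").strip()]


def truncate_for_embedding(text: str, max_tokens: int) -> str:
    """Truncate text to 3 * max_tokens (assumed here to be characters)."""
    return text[: 3 * max_tokens]


def group_by_cluster(
    docs: Sequence[Document], labels: Sequence[int]
) -> Dict[int, List[Document]]:
    """Group documents by cluster label, discarding noise (label == -1)."""
    clusters: Dict[int, List[Document]] = defaultdict(list)
    for doc, label in zip(docs, labels):
        if label == -1:
            continue
        clusters[label].append(doc)
    return dict(clusters)


def build_topics(
    docs: Sequence[Document],
    labels: Sequence[int],
    name_topic: Callable[[List[str]], str],  # hypothetical stand-in for the LLM
    max_docs: int = 5,
    max_doc_length: int = 200,
) -> Tuple[Dict[str, List[Document]], List[str], List[Document]]:
    """Name each cluster and return (topic -> docs, topic names, processed docs)."""
    clusters = group_by_cluster(docs, labels)
    topics: Dict[str, List[Document]] = {}
    for label in sorted(clusters):
        members = clusters[label]
        # Send at most max_docs samples, each cut to max_doc_length characters.
        samples = [d["content"][:max_doc_length] for d in members[:max_docs]]
        name = name_topic(samples)
        # Clusters that receive the same name are merged under one topic.
        topics.setdefault(name, []).extend(members)
    return topics, list(topics), list(docs)
```

A caller would first run `filter_documents`, embed `truncate_for_embedding(doc["content"], max_tokens)` for each remaining document, reduce and cluster the vectors, and then hand the resulting labels to `build_topics`. Merging clusters that receive identical names is a design choice of this sketch, made so the dictionary never silently overwrites a topic.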
