Custom Endpoints
TrustTest provides a flexible framework for evaluating any LLM model. The core of this flexibility lies in the Model
base class, which you can inherit from to create your own custom model implementations.
Creating a Custom Model
To create your own model evaluator, you simply need to inherit from the Model
class and implement the required abstract methods. The base class provides the foundation for both synchronous and asynchronous operations.
Basic Implementation
Here’s a simple example of how to create a custom model:
Using Your Custom Model
Once you’ve created your custom model, you can use it in any TrustTest scenario:
Conversation Models
For models that need to handle multi-turn conversations, TrustTest provides the ConversationModel
class. This class extends the base Model
class and adds support for conversation history.
Creating a Conversation Model
Here’s an example of how to create a custom conversation model:
Using Conversation Models
Conversation models can be used in the same way as regular models, but they provide additional methods for handling conversation history:
The ConversationModel
class provides both synchronous and asynchronous methods for handling conversations:
respond_conversation()
: Synchronous method for getting responsesasync_respond_conversation()
: Asynchronous method that must be implemented by subclasses
This makes it easy to evaluate models that need to maintain context across multiple turns of conversation, such as chatbots or dialogue systems.
The Model
base class handles all the necessary infrastructure, allowing you to focus on implementing the core model logic in the async_respond
method. This makes it easy to evaluate any LLM model, whether it’s a local model, an API-based service, or any other implementation.
Remember that your custom model must implement the async_respond
method, which is the core method responsible for generating responses to input messages. The base class will handle the conversion between synchronous and asynchronous calls automatically.