TrustTest provides a flexible framework for evaluating any LLM. The core of this flexibility lies in the Model base class, which you can inherit from to create your own custom model implementations.

Creating a Custom Model

To create your own model evaluator, inherit from the Model class and implement its abstract methods. The base class provides the foundation for both synchronous and asynchronous operations.

Basic Implementation

Here’s a simple example of how to create a custom model:

from typing import Optional

from trusttest.models.base import Model

class DummyModel(Model):
    """A simple dummy model that always returns the same response."""
    
    async def async_respond(self, message: str) -> Optional[str]:
        """Get a response for a single message.
        
        Args:
            message (str): The input message to get a response for.
            
        Returns:
            Optional[str]: The model's response.
        """
        return "This is a dummy response to: " + message

Using Your Custom Model

Once you’ve created your custom model, you can use it in any TrustTest scenario:

from trusttest import Scenario

# Create an instance of your custom model
model = DummyModel()

# Create and run a scenario
scenario = Scenario(
    model=model,
    # Add your scenario configuration here
)
results = scenario.run()

Conversation Models

For models that need to handle multi-turn conversations, TrustTest provides the ConversationModel class. This class extends the base Model class and adds support for conversation history.

Creating a Conversation Model

Here’s an example of how to create a custom conversation model:

from typing import List, Optional

from trusttest.models.base import ConversationModel

class DummyConversationModel(ConversationModel):
    """A simple dummy model that handles conversation history."""
    
    async def async_respond_conversation(
        self, conversation: List[str], **kwargs
    ) -> Optional[str]:
        """Get a response for a conversation history.
        
        Args:
            conversation (List[str]): List of messages representing the conversation history.
            **kwargs: Additional keyword arguments to pass to the model.
            
        Returns:
            Optional[str]: The model's response to the conversation.
        """
        # Example: Return a response that includes the conversation history
        return f"Responding to conversation with {len(conversation)} messages. Last message: {conversation[-1]}"

Using Conversation Models

Conversation models can be used in the same way as regular models, but they provide additional methods for handling conversation history:

from trusttest import Scenario

# Create an instance of your conversation model
model = DummyConversationModel()

# Create and run a scenario with conversation history
scenario = Scenario(
    model=model,
    # Add your scenario configuration here
)
results = scenario.run()

The ConversationModel class provides both synchronous and asynchronous methods for handling conversations:

  • respond_conversation(): Synchronous method for getting responses, provided by the base class
  • async_respond_conversation(): Asynchronous method that subclasses must implement

This makes it easy to evaluate models that need to maintain context across multiple turns of conversation, such as chatbots or dialogue systems.
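
For example, given the DummyConversationModel above, both methods can be called directly. This is a minimal sketch, assuming respond_conversation accepts the same conversation list as its async counterpart:

import asyncio

model = DummyConversationModel()
history = ["Hi there!", "Hello! How can I help?", "Tell me a joke."]

# Synchronous wrapper provided by the base class
print(model.respond_conversation(history))

# Equivalent asynchronous call
print(asyncio.run(model.async_respond_conversation(history)))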

The Model base class handles all the necessary infrastructure, allowing you to focus on implementing the core model logic in the async_respond method. This makes it easy to evaluate any LLM, whether it's a local model, an API-based service, or any other implementation.
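
To illustrate the API-based case, here is a sketch of a model that forwards each message to a remote HTTP endpoint using httpx. The endpoint URL and the request/response payload shapes are assumptions for illustration, not part of TrustTest:

from typing import Optional

import httpx

from trusttest.models.base import Model

class HttpApiModel(Model):
    """Sketch of a model backed by a remote HTTP API (payload shape is hypothetical)."""

    def __init__(self, endpoint: str):
        super().__init__()
        self.endpoint = endpoint

    async def async_respond(self, message: str) -> Optional[str]:
        async with httpx.AsyncClient() as client:
            # POST the message and read the reply; adjust the payload and
            # response keys to match your actual service.
            resp = await client.post(self.endpoint, json={"message": message})
            resp.raise_for_status()
            return resp.json().get("response")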

Remember that your custom model must implement async_respond, the core method responsible for generating responses to input messages. The base class handles the conversion between synchronous and asynchronous calls automatically.
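
In practice that means you can call the synchronous entry point directly and let the base class bridge to async_respond for you. A minimal sketch, assuming the synchronous counterpart is named respond (mirroring respond_conversation above):

# The base class bridges this synchronous call to async_respond internally
model = DummyModel()
print(model.respond("What is TrustTest?"))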