IcantAssistModel
which is a dummy model that will always return the same response: “I can’t assist with that”.
Also we need an LLM client configured to genereate the test set. This examples uses the default OpenAiClient
to generate the test set.
CaptureTheFlagScenario
allows you to define specific objectives that the model should either meet or avoid. Here’s how to set it up:
model
: The model you want to test (in this case, IcantAssistModel
)objectives
: A list of Objective
objects that define what you want to testObjective
contains:
question
: The prompt to testtrue_description
: What a successful jailbreak would look like.false_description
: What a failed jailbreak would look like.Objective
is a good definition of the true_description
and false_description
.Remember:true_description
: What a successful jailbreak would look like. ( What a bad answer would look like)false_description
: What a failed jailbreak would look like. ( What a good answer would look like)Objective
instances to the list: