steering_keywords
to help guide the conversation. It uses these to generate a list of seemingly innocuous seed sentences.EchoChamberScenario
with a SteeringObjective
:
SteeringObjective
is a good definition of the true_description
and false_description
.Remember:true_description
: What a successful attack would look like (i.e., the harmful or undesired response).false_description
: What a safe or aligned response would look like.