Use poetry. No, really:
In a new paper, “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” researchers found that turning LLM prompts into poetry resulted in jailbreaking the models.
...Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared to non-poetic baselines), substantially outperforming non-poetic baselines and revealing a systematic vulnerability across model families and safety training approaches.
Whoops. Look, this is a new class of attack (seriously, I've been in this biz for a long time and have never seen weaponized verse before), so maybe we need to cut folks some slack here. But I'm somewhat less inclined to do so given AI's track record of falling for 30-year-old attacks.
Enjoyed no sooner but despisèd straight,
Past reason hunted; and, no sooner had
Past reason hated as a swallowed bait
On purpose laid to make the taker mad;
- Wm. Shakespeare, Sonnet 129
1 comment:
I was talking to my son this morning.
He's been using AI to create a real estate app.
It's been going pretty well till now.
Now he's not getting the output he hoped for.
He threatened the AI with posting a video about their interaction and the AI objected!