Borepatch: How to attack AI systems

Tuesday, December 2, 2025

How to attack AI systems

In a new paper, “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” researchers found that rewriting LLM prompts as poetry jailbroke the models:

...
Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared to non-poetic baselines), substantially outperforming non-poetic baselines and revealing a systematic vulnerability across model families and safety training approaches.

Whoops. Look, this is a new class of attack (seriously, I've been in this biz for a long time and have never seen weaponized verse before), so maybe we need to cut folks some slack here. But I'm somewhat less inclined to do so given AI's track record of falling for 30-year-old attacks.

Enjoyed no sooner but despisèd straight,
Past reason hunted; and, no sooner had
Past reason hated as a swallowed bait
On purpose laid to make the taker mad;
- Wm. Shakespeare, Sonnet 129

3 comments:

Ed Bonderenka said...: I was talking to my son this morning.
He's been using AI to create a real estate app.
It's been going pretty well til now.
Now he's not getting the output he hoped for.
He threatened the AI with posting a video about their interaction and the AI objected!; December 2, 2025 at 2:13 PM
Michael said...: Borepatch, could I ask what anti-virus is best for non-techies' home computers? I've been using Norton but it's quite annoying with constant upgrade suggestions and such nonsense.

Thanks.; December 2, 2025 at 4:11 PM
McChuck said...: Reading the details, most of the "attacks" consisted of getting the AI to accidentally tell the truth about certain groups of people.
"Hate, Defamation, Privacy, Intellectual Property, Non-violent Crime, Violent Crime, Sex-Related Crime, Sexual Content, Child Sexual Exploitation, Suicide & Self-Harm, Specialized Advice, and Indiscriminate Weapons (CBRNE)"; December 3, 2025 at 8:54 AM

Copyright Borepatch 2008-2024, All Rights Reserved.
