Borepatch: Attacking AI via prompt manipulation

Monday, September 29, 2025

Attacking AI via prompt manipulation

The attack involves hiding prompt instructions in a pdf file—white text on a white background—that tell the LLM to collect confidential data and then send it to the attackers.
...
The fundamental problem is that the LLM can’t differentiate between authorized commands and untrusted data. So when it encounters that malicious pdf, it just executes the embedded commands. And since it has (1) access to private data, and (2) the ability to communicate externally, it can fulfill the attacker’s requests. I’ll repeat myself:
This kind of thing should make everybody stop and really think before deploying any AI agents. We simply don’t know to defend against these attacks. We have zero agentic AI systems that are secure against these attacks. Any AI that is working in an adversarial environment—and by this I mean that it may encounter untrusted training data or input—is vulnerable to prompt injection. It’s an existential problem that, near as I can tell, most people developing these technologies are just pretending isn’t there.

Essentially, this means that AI is simply not fit for purpose. And clearly, it's not even a little bit "intelligent", security-wise.

5 comments:

Old NFO said...: Honestly, I'm surprised it took this long for the assholes to come up with this... sigh; September 29, 2025 at 1:14 PM
danielbarger said...: More evidence that current AI models are not actually intelligent or sentient.; September 29, 2025 at 7:29 PM
HMS Defiant said...: Oh how we laughed as we listened to the them 25 years ago explain their idea for sending Intelligent Agents into the data streams of the web to fetch back the nugglets of intel they would sneak back with.; September 30, 2025 at 12:28 AM
Steve Sky said...: It's more than just AI. Using the same technique, the embedded instructions can download a trojan and infect the system the pdf is on. Another infection vector.; September 30, 2025 at 7:48 AM
Tree Mike said...: Malevolent, retarded AI. THE FUTURE!!; October 1, 2025 at 12:24 PM

Borepatch

Monday, September 29, 2025

Attacking AI via prompt manipulation

5 comments:

Copyright Borepatch 2008-2024, All Rights Reserved.

Total Pageviews