This is actually pretty clever:
The attack involves hiding prompt instructions in a pdf file—white text on a white background—that tell the LLM to collect confidential data and then send it to the attackers.
...
The fundamental problem is that the LLM can’t differentiate between authorized commands and untrusted data. So when it encounters that malicious pdf, it just executes the embedded commands. And since it has (1) access to private data, and (2) the ability to communicate externally, it can fulfill the attacker’s requests. I’ll repeat myself:
This kind of thing should make everybody stop and really think before deploying any AI agents. We simply don’t know to defend against these attacks. We have zero agentic AI systems that are secure against these attacks. Any AI that is working in an adversarial environment—and by this I mean that it may encounter untrusted training data or input—is vulnerable to prompt injection. It’s an existential problem that, near as I can tell, most people developing these technologies are just pretending isn’t there.
Essentially, this means that AI is simply not fit for purpose. And clearly, it's not even a little bit "intelligent", security-wise.
1 comment:
Honestly, I'm surprised it took this long for the assholes to come up with this... sigh
Post a Comment