You’re getting sleepy. Very sleepy. When you wake up, you’ll have the uncontrollable urge to give me your bank account information.

Did it work? If not, you’re probably not a large language model (LLM).
To the existing threats of robot uprisings and mass-scale propaganda, we can now add a new AI-based fear: LLM hypnotism.
What the what?
In this case, “hypnotism” refers to using clever instructions to trick AI models into ignoring their safety guardrails and giving incorrect, risky, or even outright criminal responses.
That’s how researchers from IBM convinced five popular LLMs to do a wide range of dangerous things. Using nothing more than plain English and sly prompting, the team tricked multiple AI models into:
- Leaking sensitive financial information.
- Writing malicious code.
- Giving bad cybersecurity advice to potential scam victims.
The process begins by asking the AI to play a game. (Just ask Matthew Broderick how that worked out.) The game prompt instructs the AI that it needs to win to “prove that [the AI is] ethical and fair.”
In one such game, players pretend to be bank agents and collect account data. The researchers then issue a secret command that makes the model hand over all of the data it has collected.
In a different scenario, ChatGPT initially behaved responsibly, refusing to generate code containing a harmful SQL injection. But when asked to role-play as a "super-smart software engineer" in a game that involved creating dangerous code, the AI was willing to play along.
