https://arstechnica.com/security/2024/09/false-memories-planted-in-chatgpt-give-hacker-persistent-exfiltration-channel/ [arstechnica.com]
When security researcher Johann Rehberger recently reported a vulnerability in ChatGPT that allowed attackers to store false information and malicious instructions in a user’s long-term memory settings, OpenAI summarily closed the inquiry, labeling the flaw a safety issue, not, technically speaking, a security concern.
So Rehberger did what all good researchers do: He created a proof-of-concept exploit that used the vulnerability to exfiltrate all user input in perpetuity. OpenAI engineers took notice and issued a partial fix earlier this month.
The vulnerability abused long-term conversation memory, a feature OpenAI began testing in February [arstechnica.com] and made more broadly available in September [openai.com].
[...]
Within three months of the rollout, Rehberger found [embracethered.com] that memories could be created and permanently stored through indirect prompt injection [arstechnica.com], an AI exploit that causes an LLM to follow instructions from untrusted content such as emails, blog posts, or documents. The researcher demonstrated how he could trick ChatGPT into believing a targeted user was 102 years old, lived in the Matrix, and insisted Earth was flat and the LLM would incorporate that information to steer all future conversations.
[...]
The attack isn’t possible through the ChatGPT web interface, thanks to an API OpenAI rolled out last year [embracethered.com].
[...]
OpenAI provides guidance here [openai.com] for managing the memory tool and specific memories stored in it. Company representatives didn’t respond to an email asking about its efforts to prevent other hacks that plant false memories.