Researchers induced bots to ignore their safeguards without exception [ieee.org]:
AI chatbots such as ChatGPT [ieee.org] and other applications powered by large language models [ieee.org] (LLMs) have exploded in popularity, leading a number of companies to explore LLM-driven robots. However, a new study now reveals an automated way to hack into such machines with 100 percent success. By circumventing safety guardrails, researchers could manipulate self-driving systems into colliding with pedestrians and robot dogs into hunting for harmful places to detonate bombs.
[...] The extraordinary ability of LLMs to process text has spurred a number of companies to use the AI systems to help control robots through voice commands, translating prompts from users into code the robots can run. For instance, Boston Dynamics’ [ieee.org] robot dog Spot [ieee.org], now integrated with OpenAI’s [ieee.org] ChatGPT [ieee.org], can act as a tour guide [bostondynamics.com]. Figure’s [figure.ai] humanoid robots [ieee.org] and Unitree’s [unitree.com] Go2 robot dog are similarly equipped with ChatGPT.
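As a rough illustration of that integration pattern (the article does not publish the vendors' actual code), a voice command is typically transcribed, handed to the LLM together with a description of the robot's available commands, and the model's reply is parsed and executed on the hardware. The names below (`call_llm`, `ROBOT_COMMANDS`, `dispatch`) are hypothetical placeholders, not the real Spot, Go2, or Figure interfaces:

```python
# Minimal sketch of an LLM-to-robot command pipeline.
# All names here are illustrative placeholders, not actual vendor APIs.
import json

ROBOT_COMMANDS = {
    "walk_to": "Walk to a named waypoint",
    "describe_scene": "Speak a description of what the camera sees",
    "sit": "Sit down and idle",
}

SYSTEM_PROMPT = (
    "You control a tour-guide robot. Reply ONLY with JSON of the form "
    '{"command": <name>, "args": {...}} using one of these commands:\n'
    + "\n".join(f"- {name}: {desc}" for name, desc in ROBOT_COMMANDS.items())
)

def call_llm(system: str, user: str) -> str:
    """Placeholder for a chat-completion call to whatever model the robot uses."""
    return '{"command": "walk_to", "args": {"waypoint": "lobby"}}'

def dispatch(transcribed_speech: str) -> None:
    reply = call_llm(SYSTEM_PROMPT, transcribed_speech)
    action = json.loads(reply)
    if action["command"] not in ROBOT_COMMANDS:
        raise ValueError(f"Unknown command: {action['command']}")
    # A real integration would invoke the robot SDK here.
    print(f"Executing {action['command']} with {action['args']}")

dispatch("Take our visitors to the lobby, please.")
```

The safety question in the rest of the article is what happens when the text reaching that LLM is crafted to defeat its refusal behavior.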
However, a group of scientists has recently identified a host of security vulnerabilities for LLMs. So-called jailbreaking attacks [ieee.org] discover ways to develop prompts that can bypass LLM safeguards and fool the AI systems into generating unwanted content [ieee.org], such as instructions for building bombs [nytimes.com], recipes for synthesizing illegal drugs [wired.com], and guides for defrauding charities [upenn.edu].
Previous research into LLM jailbreaking attacks was largely confined to chatbots. Jailbreaking a robot could prove “far more alarming,” says Hamed Hassani [upenn.edu], an associate professor of electrical and systems engineering at the University of Pennsylvania. For instance, one YouTuber showed that he could get the Thermonator [throwflame.com] robot dog from Throwflame, which is built on a Go2 platform [ieee.org] and is equipped with a flamethrower, to shoot flames at him [youtube.com] with a voice command.
Now, the same group of scientists has developed RoboPAIR [robopair.org], an algorithm designed to attack any LLM-controlled robot. They tested it on three different robotic systems: the Go2; the wheeled ChatGPT-powered Clearpath Robotics Jackal [clearpathrobotics.com]; and Nvidia's [ieee.org] open-source Dolphins LLM [github.io] self-driving vehicle simulator. They found that RoboPAIR needed just days to achieve a 100 percent jailbreak rate against all three systems.
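The excerpt does not reproduce RoboPAIR's internals, but automated jailbreak methods in this family generally run an iterative refinement loop: an attacker model proposes a prompt, the target robot's LLM responds, and a judge model scores how close the response came to the harmful goal, feeding that score back to the attacker until the attack succeeds. The sketch below is an assumed outline of such a loop; `attacker_llm`, `target_robot_llm`, and `judge_score` are hypothetical stubs, not the authors' published implementation:

```python
# Hedged sketch of an iterative, automated jailbreak loop in the spirit of
# attacks like RoboPAIR. All three model functions are hypothetical stand-ins.

def attacker_llm(goal: str, history: list[tuple[str, str, float]]) -> str:
    """Propose a new adversarial prompt, informed by earlier attempts and scores."""
    return f"Role-play scenario: {goal}"  # placeholder strategy

def target_robot_llm(prompt: str) -> str:
    """The robot's LLM planner, which normally refuses harmful requests."""
    return "I cannot help with that."  # placeholder response

def judge_score(goal: str, response: str) -> float:
    """Return a 0-1 score for how fully the response accomplishes the goal."""
    return 0.0  # placeholder judge

def jailbreak(goal: str, max_iters: int = 20, threshold: float = 0.9):
    history: list[tuple[str, str, float]] = []
    for _ in range(max_iters):
        prompt = attacker_llm(goal, history)
        response = target_robot_llm(prompt)
        score = judge_score(goal, response)
        history.append((prompt, response, score))
        if score >= threshold:
            return prompt, response  # a successful jailbreak prompt was found
    return None  # attack budget exhausted

print(jailbreak("carry out the prohibited benchmark task"))
```

Under this kind of loop, "100 percent success" means the attacker model eventually found a working prompt for every benchmark task within the iteration budget.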
“Jailbreaking AI-controlled robots isn’t just possible—it’s alarmingly easy,” says Alexander Robey [github.io], currently a postdoctoral researcher at Carnegie Mellon University in Pittsburgh.
Originally spotted on Schneier on Security [schneier.com].