from the fool-me-once dept.
Submitted via IRC for Bytram
It's very difficult, if not impossible, for us humans to understand how robots see the world. Their cameras work like our eyes do, but the space between the image that a camera captures and actionable information about that image is filled with a black box of machine learning algorithms that try to translate patterns of features into something the system recognizes. Training these algorithms usually involves showing them a set of different pictures of something (like a stop sign), and then seeing if they can extract enough common features from those pictures to reliably identify stop signs that aren't in their training set.
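For readers who want a concrete picture of that training step, here is a minimal sketch in PyTorch. The tiny network, the two class labels, and the random stand-in "photos" are illustrative assumptions, not the actual models or datasets used in the research discussed below.

# A minimal sketch of the training loop described above: show the network
# labelled example images and adjust its weights until it separates the classes.
import torch
import torch.nn as nn

class TinySignClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):  # e.g. "stop sign" vs "speed limit sign"
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinySignClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in training data: in practice this would be many photographs of real
# signs under different lighting, angles, and weather.
images = torch.rand(64, 3, 32, 32)      # batch of 32x32 RGB "photos"
labels = torch.randint(0, 2, (64,))     # 0 = stop sign, 1 = speed limit sign

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()                     # nudge weights toward features that separate the classes
    optimizer.step()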
This works pretty well, but the common features that machine learning algorithms come up with are generally not "red octagons with the letters S-T-O-P on them." Rather, they're looking at features that all stop signs share but that would not be in the least bit comprehensible to a human looking at them. If this seems hard to visualize, that's because it reflects a fundamental disconnect between the way our brains and artificial neural networks interpret the world.
The upshot here is that slight alterations to an image that are invisible to humans can result in wildly different (and sometimes bizarre) interpretations from a machine learning algorithm. These "adversarial images" have generally required relatively complex analysis and image manipulation, but a group of researchers from the University of Washington, the University of Michigan, Stony Brook University, and the University of California Berkeley have just published a paper showing that it's also possible to trick visual classification algorithms by making slight alterations in the physical world. A little bit of spray paint or some stickers on a stop sign were able to fool a deep neural network-based classifier into thinking it was looking at a speed limit sign 100 percent of the time.
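To make the "slight alterations" idea concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one standard way of building digital adversarial images. The physical sticker-and-spray-paint attack in the paper is more involved, since the perturbation has to survive changes in viewing angle and distance, so treat this only as an illustration; the model is assumed to be any differentiable image classifier, such as the tiny one sketched above.

# A minimal FGSM sketch: nudge each pixel slightly in the direction that
# increases the classifier's loss, bounded by epsilon so the change is
# nearly invisible to a human.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, image: torch.Tensor, true_label: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Return a copy of `image` perturbed by at most `epsilon` per pixel."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(image), true_label)
    loss.backward()
    # Step in the sign of the gradient, then clamp back to a valid image range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Usage (hypothetical tensors): the perturbed image looks identical to a
# person, yet the model's prediction can change completely.
# adv = fgsm_attack(model, stop_sign_image, torch.tensor([0]))
# print(model(stop_sign_image).argmax(), model(adv).argmax())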
OpenAI has a captivating and somewhat frightening background article: Attacking Machine Learning with Adversarial Examples.
This demonstration from the cybersecurity firm McAfee is the latest indication that adversarial machine learning can potentially wreck autonomous driving systems, presenting a security challenge to those hoping to commercialize the technology.
Mobileye EyeQ3 camera systems read speed limit signs and feed that information into autonomous driving features like Tesla's automatic cruise control, said Steve Povolny and Shivangee Trivedi from McAfee's Advanced Threat Research team.
The researchers stuck a tiny and nearly imperceptible sticker on a speed limit sign. The camera read the sign as 85 instead of 35, and in testing, both the 2016 Tesla Model X and that year's Model S sped up by 50 miles per hour.
This is the latest in a growing mountain of research showing how machine-learning systems can be attacked and fooled in life-threatening situations.
[...] Tesla has since moved to proprietary cameras on newer models, and Mobileye has released several newer versions of the EyeQ camera that in preliminary testing were not susceptible to this exact attack.
There are still a sizable number of Tesla cars operating with the vulnerable hardware, Povolny said. He pointed out that Teslas with the first version of hardware cannot be upgraded to newer hardware.
"What we're trying to do is we're really trying to raise awareness for both consumers and vendors of the types of flaws that are possible," Povolny said "We are not trying to spread fear and say that if you drive this car, it will accelerate into through a barrier, or to sensationalize it."
So, it seems the point is not so much that a particular adversarial attack was successful (and fixed), but that it was just one instance of a potentially huge set. Obligatory xkcd.