At the RSA Conference in San Francisco, Google demonstrated how machine learning algorithms that use neural networks can be fooled into giving erroneous results. By targeting these algorithms with inputs specifically tuned to trick them, hackers could seize upon yet another way to compromise computer systems. The attacks Google demonstrated underscore how brittle some of these machine learning algorithms can be.
Many machine learning algorithms use artificial neural networks to match new input data to an existing database of experiences. This mimics the way we humans tend to interpret our world: we take in our environment through our senses, and we match the visual, audio, tactile, and olfactory signal patterns we experience against patterns we have gathered before. This helps us categorize our new experiences in terms of our previous ones and make judgments based on how they match.
Neural networks attempt to do the same for machines. Let’s consider a simple neural network designed to classify images. The pixels that make up the image – the tiny colored dots of a digital picture – each have a distinct color. Each of these colors becomes an input into our neural network. So, if an image has 100,000 pixels, then our neural network has 100,000 inputs. These inputs are fed through a network of nodes and branches that route them to a smaller set of outputs. The outputs could be buckets that represent types of things. For example, one output could be “cat”, another “dog”, another “tangerine”. Each of these is a kind of object we want our neural network to be able to identify from the input picture.
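The flow just described – pixel inputs passing through weighted branches to a handful of output buckets – can be sketched in a few lines. This is an illustrative toy only: the dimensions, the random weights, and the three labels are made up, and a real image classifier would have vastly more inputs and layers.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs = 12        # stand-in for the per-pixel color inputs (a real image has ~100,000)
n_hidden = 8         # internal nodes between inputs and outputs
labels = ["cat", "dog", "tangerine"]  # the output buckets

# Each branch between two nodes carries a weight (randomly initialized here,
# i.e. untrained).
W1 = rng.normal(size=(n_inputs, n_hidden))
W2 = rng.normal(size=(n_hidden, len(labels)))

def classify(pixels):
    """Feed the pixel values through the network and pick the likeliest bucket."""
    hidden = np.tanh(pixels @ W1)                   # nodes combine weighted inputs
    scores = hidden @ W2                            # one raw score per output bucket
    probs = np.exp(scores) / np.exp(scores).sum()   # softmax: scores -> probabilities
    return labels[int(np.argmax(probs))], probs

image = rng.random(n_inputs)   # a stand-in "image" as a vector of pixel values
label, probs = classify(image)
```

Because the weights are random, the answer is meaningless – which is exactly why the network needs the training described next.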
A neural network must be trained before it can actually work. To train a neural network, you feed it hundreds of images. You, the trainer, know what each image actually represents – a cat, a dog, a tangerine. The first time you ask the neural network, it will likely classify the image incorrectly. After all, it’s completely new to this and has no prior learning to lean on. After it guesses, you tell it the correct answer, and it will then incrementally tune itself to narrow the gap between its guess and that answer. It tunes itself by adjusting the weights associated with each of the branches that connect the nodes of the network. The weight associated with each branch determines how much the value of one node influences the value of a neighboring node. If the branch’s weight is one, then the node and its neighbor tend to change in unison. At the other extreme, if the branch’s weight is zero, then the node and its neighbor have little to do with each other. By repeatedly inputting images and applying corrections, the weights of all the branches are adjusted to minimize the difference between what the network guesses and the correct answer in each case. In this way, the neural network eventually becomes an adept classifier.
Google demonstrated that these networks respond overly sensitively to minute changes in their inputs. For example, by adjusting just a few dots in a picture of a cat, they were able to fool an otherwise effective image classifier into concluding that the picture actually showed a bowl of guacamole. They were able to play similar tricks on an audio classifier, convincing its neural network that a classical music piece was actually a reading of a passage from a novel. A human would be able to discard these wild guesses as ridiculous based on context. An artificial neural network, however, doesn’t consider context easily, and so it can be easy to trick.
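A toy version of this trick, in the spirit of the published “fast gradient sign” technique, shows how little the input needs to change. The linear classifier, the input, and the labels below are stand-ins invented for illustration, not Google’s actual demo: every “pixel” is nudged by the same imperceptibly small amount, chosen in exactly the direction that most moves the classifier’s score.

```python
import numpy as np

rng = np.random.default_rng(2)

w = rng.normal(size=100)                 # weights of a toy linear classifier

# An input the classifier confidently labels "cat" (constructed for the demo).
x = 0.02 * np.sign(w) + 0.001 * rng.normal(size=100)

def label(v):
    return "cat" if v @ w > 0 else "guacamole"

eps = 0.05                               # tiny per-pixel change, invisible to a human
x_adv = x - eps * np.sign(w)             # nudge every pixel against the score

original = label(x)                      # "cat"
fooled = label(x_adv)                    # "guacamole"
max_change = float(np.max(np.abs(x_adv - x)))   # no pixel moved more than eps
```

No single pixel changes by more than `eps`, yet the classifier’s answer flips entirely – the brittleness the demo exposed.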
The kinds of attacks Google demonstrated implement a form of active steganography. Steganography is usually a passive technology. It hides secret data imperceptibly within a larger, more complicated, and completely unrelated work. For example, by adding the data for a textual message to the bytes representing the colors of each pixel in a picture, a steganographer can hide the data in the picture without noticeably changing its appearance, because human eyes lack the sensitivity to detect relatively small changes to colors scattered throughout the image. If the recipient knows there is a message hidden in the color data of the picture, she will be able to extract it by reversing the process used to hide it in the first place. What Google demonstrated, however, is that it is possible to alter a picture or an audio signal in a way that a human will not be able to detect but that will fundamentally alter how the machine interprets it. One scary example the article describes is tricking the vision system of a self-driving car into believing that a stop sign ahead is actually a speed limit sign.
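The classic passive technique described above – hiding message bits in pixel color bytes and reversing the process to recover them – can be sketched with least-significant-bit embedding. The cover “pixels” and the message here are made up; the point is that no byte changes by more than 1, which is why the eye can’t tell.

```python
def hide(pixels, message):
    """Embed message bytes, bit by bit, into the lowest bit of each pixel byte."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    assert len(bits) <= len(pixels), "image too small for this message"
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit   # overwrite only the least significant bit
    return out

def reveal(pixels, length):
    """Reverse the process: read the low bits back out into message bytes."""
    msg = bytearray()
    for b in range(length):
        byte = 0
        for i in range(8):
            byte |= (pixels[b * 8 + i] & 1) << i
        msg.append(byte)
    return bytes(msg)

cover = [137, 80, 78, 71, 200, 13, 10, 26] * 20   # stand-in pixel color bytes
secret = b"meet at 9"
stego = hide(cover, secret)
recovered = reveal(stego, len(secret))
max_pixel_change = max(abs(a - b) for a, b in zip(cover, stego))
```

The recipient who knows the scheme recovers the message exactly; an adversarial input works the same way in reverse, smuggling a “message” past the human and straight to the machine.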
With the emergence of ubiquitous data, advances in machine learning have accelerated greatly over the past five years. As with all quickly developing technologies, we must carefully consider how the technology can be exploited through cyberattack before deploying it on a grand scale, particularly for applications where human safety and well-being are at stake. Google’s demonstrations this week illustrate how far from prime time some of these algorithms are. Although commercial interests will beckon us to make haste, these sobering trials should have us pumping the brakes, at least until we can figure out how to convince machines that cats and guacamole are two completely different things.