Google DeepDream – turning a neural network inside out
Modern convolutional neural networks, like the one we built in the previous post, are trained to detect faces and other objects in images. With multiple layers in a neural network, it can be difficult to understand exactly how the neurons respond to patterns in an image.
One way to understand the emergent structure of a trained deep neural net is to run it in reverse. Instead of adjusting the weights of the neurons in each training iteration, the image itself is adjusted so that a chosen output neuron gives a higher confidence score. Put simply, when a neural net trained to recognise dogs is run in reverse, the image is altered to appear more ‘dog-like’. The technique was popularised by the DeepDream program, created by Google engineer Alexander Mordvintsev.
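The core of this reversal is plain gradient ascent on the image. As a toy sketch of the idea (the 3×3 ‘dog detector’ filter below is invented purely for illustration; a real network learns its filters during training), we can nudge an image so a fixed detector fires more strongly:

```python
# Toy "run the network in reverse": gradient ascent on the *image*,
# not on the weights, to raise one detector's activation.
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(3, 3))          # stand-in for the input photo
dog_filter = np.array([[0., 1., 0.],
                       [1., 2., 1.],
                       [0., 1., 0.]])    # hypothetical learned weights

def score(img):
    # Detector activation: a simple dot product with the filter.
    return float(np.sum(img * dog_filter))

before = score(image)
for _ in range(20):
    # For a linear detector, d(score)/d(image) is the filter itself,
    # so each step pushes the image toward the pattern the filter likes.
    image += 0.1 * dog_filter
after = score(image)
```

After a few steps the image contains more of whatever the detector responds to, which is exactly why a dog-trained network ‘dreams’ dogs into a photo.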
Let’s have a look at some examples of images generated by the DeepDream technique:
The TensorFlow documentation provides code that allows us to create new DeepDream images from convolutional neural networks. The network we’ll look at ‘reversing’ is the GoogLeNet architecture, trained on the ImageNet Large Scale Visual Recognition Challenge 2014 (ILSVRC2014) data set.
The network above is particularly interesting, as it was trained to classify images into one of 1,000 possible categories. With DeepDream images, it becomes possible to visualise the features that each convolutional layer is attempting to detect from the image.
The GoogLeNet ‘Inception’ network can be downloaded here. The first step in replicating the DeepDream algorithm is to create a TensorFlow session and load the network. GoogLeNet has 59 convolutional layers and 7,548 feature channels, which gives us a lot of potential patterns that can be ‘dreamed’ onto an image. The Jupyter notebook used to generate the images below can be downloaded from my GitHub.
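The loading step can be sketched as follows, assuming the TF 1.x-style graph API (available as `tf.compat.v1` in TensorFlow 2) and the released protobuf file `tensorflow_inception_graph.pb`; the `'input'` tensor name and the ImageNet mean of 117 come from that released model, so treat them as assumptions if you use a different checkpoint:

```python
import numpy as np

def load_inception(model_path='tensorflow_inception_graph.pb'):
    # Deferred import so the function can be defined without TF present.
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    graph = tf.Graph()
    sess = tf.InteractiveSession(graph=graph)

    # Read the frozen GraphDef from the downloaded protobuf file.
    with tf.io.gfile.GFile(model_path, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # Feed images through a placeholder, subtracting the training-time mean.
    t_input = tf.placeholder(np.float32, name='input')
    imagenet_mean = 117.0
    t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)

    # Wire the preprocessed tensor into the imported graph's input.
    tf.import_graph_def(graph_def, {'input': t_preprocessed})
    return graph, sess, t_input
```

Calling `load_inception()` once gives us the graph, session, and input placeholder that the rest of the notebook reuses.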
Deep neural networks are so effective at classification because of their ability to interpret images both by their low and high level features. From the Google AI Blog:
We know that after training, each layer progressively extracts higher and higher-level features of the image, until the final layer essentially makes a decision on what the image shows. For example, the first layer maybe looks for edges or corners. Intermediate layers interpret the basic features to look for overall shapes or components, like a door or a leaf. The final few layers assemble those into complete interpretations—these neurons activate in response to very complex things such as entire buildings or trees.
The same low- and high-level features should appear when we project those convolutional layers onto an image. Let’s run it on a picture of me at Arches National Park in Utah.
We’ll start by visualising the lowest level convolutions.
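The projection itself is naive gradient ascent on a single feature channel. A minimal sketch, assuming the graph, session, and input placeholder from a TF 1.x-style setup; the layer name `mixed4d_3x3_bottleneck_pre_relu` and channel 139 are the ones used in the TensorFlow tutorial, and any other layer/channel pair works the same way:

```python
def render_naive(graph, sess, t_input,
                 layer='mixed4d_3x3_bottleneck_pre_relu', channel=139,
                 img0=None, iter_n=20, step=1.0):
    import numpy as np
    import tensorflow.compat.v1 as tf

    # Objective: mean activation of one feature channel in the chosen layer.
    t_obj = graph.get_tensor_by_name('import/%s:0' % layer)[:, :, :, channel]
    t_score = tf.reduce_mean(t_obj)

    # Gradient of the objective with respect to the *image*, not the weights.
    t_grad = tf.gradients(t_score, t_input)[0]

    # Start from the supplied photo, or from grey noise.
    img = img0.copy() if img0 is not None else \
        np.random.uniform(size=(224, 224, 3)) + 100.0

    for _ in range(iter_n):
        g = sess.run(t_grad, {t_input: img})
        g /= g.std() + 1e-8   # normalise so step size is comparable across layers
        img += g * step
    return img
```

Choosing shallower or deeper layers in place of `mixed4d...` is what moves the ‘dream’ from simple line textures to whole objects in the sections below.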
Projecting the first layers onto the image shows that the network appears to be looking for simple features like horizontal lines that might indicate an animal’s fur or the lines of a building.
As we run layers deeper in the network, new geometric shapes emerge. These basic features are likely used to detect edges and their orientations.
The higher level layers begin to produce entire objects. This is the key idea of DeepDream – if the algorithm detects something within the image that looks similar to a dog, that part of the image is altered to look even more like a dog.
The visuals produced by DeepDream can be aesthetically pleasing, but the process also has more practical applications. Applying the technique can give us a better understanding of how a deep neural network abstracts an image to perform complex classification tasks.