Neural networks ‘disentangled’ for computer vision without the black box

Researchers from Duke University have trained a deep neural network to share its understanding of concepts, shedding light on how it processes visual information.

Deep neural networks are loosely modelled on real brains, with layers of interconnected “neurons” which respond to features in the input data.

For image recognition, the first layer processes the raw image and passes its output as the input to the next layer, and so on, until the network eventually arrives at a determination of what the image contains.
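To make that layer-by-layer flow concrete, the sketch below is a deliberately simplified stand-in, not the Duke team’s architecture; the layer sizes, input resolution and class count are illustrative assumptions only.

```python
import torch
import torch.nn as nn

# Minimal sketch of layer-by-layer image processing (illustrative only, not the
# researchers' model): each layer transforms the previous layer's output until a
# final layer produces class scores -- a determination of what the image contains.
class TinyImageClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.layer2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # assumes 224x224 inputs

    def forward(self, x):
        x = self.layer1(x)          # early layers respond to simple features (edges, textures)
        x = self.layer2(x)          # deeper layers respond to more complex patterns
        x = x.flatten(start_dim=1)
        return self.classifier(x)   # final scores: one per candidate class

scores = TinyImageClassifier()(torch.randn(1, 3, 224, 224))
print(scores.shape)  # torch.Size([1, 10])
```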

However, there is a widely acknowledged problem: it is extremely difficult, even for the engineers who build these models, to understand what happens between input and output. This is known as the black box problem, and it is an issue when it comes to troubleshooting the networks, or understanding whether they are trustworthy and fair.

“We can input, say, a medical image and observe what comes out the other end – ‘This is a picture of a malignant lesion’ – but it’s hard to know what happened in between,” said Professor Cynthia Rudin, a computer science expert at Duke University.

Most approaches to uncovering the workings of computer vision systems focus on the key features or pixels that led to an image being identified. However, this does not reveal the reasoning of the neural network.

Rudin and her colleagues have developed an alternative method for addressing the black box problem. Rather than attempting to comprehend the reasoning of the network on a post hoc basis, they trained the network to show its work by expressing its “understanding” of concepts throughout the process. For instance, if given an image of a library, the approach makes it possible to determine whether the layers of the network relied on the representation of “books” to identify it.

This reveals how much the network calls on different concepts to help comprehend an image. “It disentangles how different concepts are represented within the layers of the network,” said Rudin.

The adjustment involves replacing one standard part of a neural network with a new part, which constrains a single neuron to fire in response to a particular concept that makes sense to humans, e.g. an object or a descriptor. Having just one neuron control the information about one concept at a time makes it far easier to understand the hidden processes within the network.
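The sketch below is one way such a drop-in part could look; it is not the researchers’ published module, but a minimal stand-in in which each unit of a hypothetical “concept layer” is nudged, via an equally hypothetical auxiliary loss, to respond to a single named concept.

```python
import torch
import torch.nn as nn

# Hedged sketch (not the paper's formulation): a drop-in "concept layer" in which
# each output unit is intended to respond to one human-understandable concept
# (e.g. unit 0 = "books", unit 1 = "bed"). An auxiliary loss, computed from images
# known to contain each concept, pushes the matching unit -- and only that unit -- to fire.
class ConceptLayer(nn.Module):
    def __init__(self, in_channels, concept_names):
        super().__init__()
        self.concept_names = concept_names
        self.project = nn.Conv2d(in_channels, len(concept_names), kernel_size=1)

    def forward(self, x):
        return self.project(x)  # one feature map per named concept

def concept_alignment_loss(concept_maps, concept_index):
    """Hypothetical auxiliary loss: on images known to contain a given concept,
    encourage only that concept's unit to dominate the layer's response."""
    scores = concept_maps.mean(dim=(2, 3))                      # average activation per concept unit
    target = torch.full((scores.size(0),), concept_index)       # the one unit that should fire
    return nn.functional.cross_entropy(scores, target)
```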

The researchers tested their approach on a neural network trained with millions of labelled images to recognise various kinds of indoor and outdoor scenes, then used it to identify new images while looking at which concepts the network layers drew on as they processed the data. As the information travels through successive layers, the network relies on increasingly sophisticated representations of each concept.
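An inspection step along these lines might look as follows, assuming a concept layer like the sketch above has been inserted at a given depth; the helper function and its inputs are illustrative assumptions, not the team’s tooling.

```python
import torch

def top_concepts(concept_maps, concept_names, k=3):
    """Rank the concepts a layer drew on for one image (illustrative helper only)."""
    scores = concept_maps.mean(dim=(2, 3)).squeeze(0)   # one average activation per concept unit
    values, indices = scores.topk(k)
    return [(concept_names[int(i)], float(v)) for i, v in zip(indices, values)]

# Illustrative usage, with random activations standing in for a real forward pass:
maps = torch.randn(1, 3, 7, 7)                          # one image, three concept units
print(top_concepts(maps, ["books", "bed", "sand"], k=2))
```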

They found that this adjustment makes it possible to identify objects and scenes in images just as accurately as the original network, while gaining substantial understanding about its reasoning process.

The module can be added to any neural network used for image recognition. In one experiment, they incorporated it into a neural network trained to detect skin cancer from photographs and found that the network had summoned a concept of “irregular borders” without any guidance from the training labels.

“Our method revealed a shortcoming in the dataset,” said Rudin, who suggested that if this information had been included in the data, it would have been clearer whether the model was performing so accurately for the right reasons.

“This example just illustrates why we shouldn’t put blind faith in ‘black box’ models with no clue of what goes on inside them, especially for tricky medical diagnoses.”