Abstract
As Deep Neural Networks (DNNs) have demonstrated superhuman performance in many computer vision tasks, there is an increasing interest in revealing the complex internal mechanisms of DNNs. In this paper, we propose Relative Attributing Propagation (RAP), which decomposes the output predictions of DNNs with a new perspective that precisely separates the positive and negative attributions. By identifying the fundamental causes of activation and the proper inversion of relevance, RAP allows each neuron to be assigned an actual contribution to the output. Furthermore, we devise pragmatic methods to handle the effect of bias and batch normalization properly in the attributing procedures. Therefore, our method makes it possible to interpret various kinds of very deep neural network models with clear and attentive visualizations of positive and negative attributions. By utilizing the region perturbation method and comparing the distribution of attributions for a quantitative evaluation, we verify the correctness of our RAP whether the positive and negative attributions correctly account for each meaning. The positive and negative attributions propagated by RAP show the characteristics of vulnerability and robustness to the distortion of the corresponding pixels, respectively. We apply RAP to DNN models; VGG-16, ResNet-50 and Inception-V3, demonstrating its generation of more intuitive and improved interpretation compared to the existing attribution methods.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1904.00605