Abstract
To detect salient objects accurately, existing methods usually design complex backbone network architectures to learn and fuse powerful features. However, the saliency inference module that performs saliency prediction from the fused features receives much less attention on its architecture design and typically adopts only a few fully convolutional layers. In this paper, we find the limited capacity of the saliency inference module indeed makes a fundamental performance bottleneck, and enhancing its capacity is critical for obtaining better saliency prediction. Correspondingly, we propose a deep yet light-weight saliency inference module that adopts a multi-dilated depth-wise convolution architecture. Such a deep inference module, though with simple architecture, can directly perform reasoning about salient objects from the multi-scale convolutional features fast, and give superior salient object detection performance with less computational cost. To our best knowledge, we are the first to reveal the importance of the inference module for salient object detection, and present a novel architecture design with attractive efficiency and accuracy. Extensive experimental evaluations demonstrate that our simple framework performs favorably compared with the state-of-the-art methods with complex backbone design.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1901.08362