Abstract
Visual saliency patterns result from a variety of factors beyond the image being parsed; however, existing approaches have ignored these factors. To address this limitation, we propose a novel saliency estimation model which leverages the semantic modelling power of conditional generative adversarial networks together with memory architectures that capture the subject’s behavioural patterns and task-dependent factors. Our contributions aim to bridge the gap between the bottom-up feature learning capabilities of modern deep learning architectures and traditional top-down methods based on hand-crafted features for task-specific saliency modelling. The conditional nature of the proposed framework enables us to learn contextual semantics and relationships among different tasks jointly, instead of learning them separately for each task. Our studies not only shed light on a novel application area for generative adversarial networks, but also emphasise the importance of task-specific saliency modelling and demonstrate the plausibility of fully capturing this context via an augmented memory architecture.
URL
https://arxiv.org/abs/1803.03354