Abstract
Despite their success, convolutional neural networks are computationally expensive because they must examine all image locations. Stochastic attention-based models have been shown to improve computational efficiency at test time, but they remain difficult to train because of intractable posterior inference and high variance in the stochastic gradient estimates. Borrowing techniques from the literature on training deep generative models, we present the Wake-Sleep Recurrent Attention Model, a method for training stochastic attention networks that improves posterior inference and reduces the variance of the stochastic gradients. We show that our method can greatly speed up the training of stochastic attention networks in the domains of image classification and caption generation.
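As a rough illustration of the gradient-variance problem the abstract refers to, the sketch below implements a score-function (REINFORCE) estimator for a toy stochastic attention policy over glimpse locations, and shows how subtracting a constant baseline reduces variance without biasing the gradient. This is a minimal sketch of the general technique, not the paper's WS-RAM algorithm; `GRID`, `TRUE_LOC`, and the reward function are illustrative assumptions.

```python
# Toy illustration (not the paper's WS-RAM algorithm) of the score-function
# (REINFORCE) gradient used to train stochastic attention, and of how a
# simple baseline reduces its variance. GRID, TRUE_LOC, and the reward
# function are hypothetical, chosen only for this demo.
import numpy as np

rng = np.random.default_rng(0)
GRID = 16                 # number of candidate glimpse locations (hypothetical)
theta = np.zeros(GRID)    # logits of the attention policy pi(l) = softmax(theta)
TRUE_LOC = 3              # the single informative location (hypothetical)

def reinforce_grad(theta, baseline=0.0):
    """One-sample estimate: grad log pi(l) * (reward - baseline)."""
    p = np.exp(theta - theta.max())
    p /= p.sum()
    l = rng.choice(GRID, p=p)
    reward = 1.0 if l == TRUE_LOC else 0.9   # mostly-constant stand-in reward
    grad_logp = -p
    grad_logp[l] += 1.0                      # d/dtheta of log softmax at l
    return grad_logp * (reward - baseline), reward

def total_variance(baseline, n=5000):
    """Monte Carlo estimate of the summed per-coordinate gradient variance."""
    grads = np.stack([reinforce_grad(theta, baseline)[0] for _ in range(n)])
    return grads.var(axis=0).sum()

# A constant baseline leaves the estimator unbiased (E[grad log pi] = 0)
# but strips out the large constant component of the reward.
avg_reward = np.mean([reinforce_grad(theta)[1] for _ in range(5000)])
print("gradient variance, no baseline:     %.5f" % total_variance(0.0))
print("gradient variance, reward baseline: %.5f" % total_variance(avg_reward))
```

With the mostly-constant reward above, the baselined estimator's variance is orders of magnitude smaller. The paper goes further than this plain-baseline trick, using wake-sleep-style training to improve posterior inference over glimpse sequences in addition to reducing gradient variance.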
URL
https://arxiv.org/abs/1509.06812