Text-guided Attention Model for Image Captioning

Abstract

Visual attention plays an important role in understanding images and has proven effective in generating natural language descriptions of images. Recent studies also show that language associated with an image can steer visual attention within the scene during human cognitive processing. Inspired by this, we introduce a text-guided attention model for image captioning, which learns to drive visual attention using associated captions. For this model, we propose an exemplar-based learning approach that retrieves, from the training data, captions associated with each image and uses them to learn attention over visual features. Our attention model makes it possible to describe scenes in detail by effectively distinguishing small or confusable objects. We validate our model on the MS-COCO Captioning benchmark and achieve state-of-the-art performance on standard metrics.
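The abstract gives only the outline of the approach: retrieve captions from similar training images, then use them to weight visual features. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. The function names (`cosine`, `retrieve_guidance`, `text_guided_attention`), the cosine-similarity retrieval, the averaging of caption vectors, and the dot-product scoring are all assumptions chosen for brevity; the paper's actual encoders and attention function are not described in the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_guidance(query_feat, train_feats, train_caption_vecs, k=2):
    """Average the caption vectors of the k training images most similar to the query.

    Assumption: captions are already embedded as fixed-length vectors.
    """
    ranked = sorted(range(len(train_feats)),
                    key=lambda i: cosine(query_feat, train_feats[i]),
                    reverse=True)[:k]
    dim = len(train_caption_vecs[0])
    return [sum(train_caption_vecs[i][d] for i in ranked) / len(ranked)
            for d in range(dim)]

def text_guided_attention(region_feats, guidance):
    """Softmax over region/guidance dot products, then a weighted sum of regions."""
    scores = [sum(r * g for r, g in zip(region, guidance))
              for region in region_feats]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(region_feats[0])
    context = [sum(w * region[d] for w, region in zip(weights, region_feats))
               for d in range(dim)]
    return weights, context

# Toy data (hypothetical): three training images with 2-D global features
# and 2-D caption vectors, plus a query image with two regions.
train_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
train_caption_vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
guidance = retrieve_guidance([1.0, 0.05], train_feats, train_caption_vecs, k=2)
weights, context = text_guided_attention([[2.0, 0.0], [0.0, 2.0]], guidance)
```

In this toy setup the two nearest training images share the same caption vector, so the guidance vector points along the first dimension and the first region receives the larger attention weight. In a real system the region features would come from a convolutional network and the caption vectors from a learned text encoder.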

URL

https://arxiv.org/abs/1612.03557

PDF

https://arxiv.org/pdf/1612.03557
