Guiding Long-Short Term Memory for Image Caption Generation

2015-09-16
Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars

Abstract

In this work we focus on the problem of image caption generation. We propose an extension of the long short-term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search in order to avoid favoring short sentences. On various benchmark datasets such as Flickr8K, Flickr30K and MS COCO, we obtain results that are on par with or even outperform the current state of the art.
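
To illustrate why length normalization matters in beam search, here is a minimal sketch in Python. A caption's raw score is the sum of its per-word log-probabilities, so every extra word lowers it and short captions tend to win. Dividing by the length raised to a power is one simple way to counter this; it is used here only as an example and is not claimed to be the exact strategy from the paper. The function names, the `alpha` parameter, and the candidate captions with their probabilities are all hypothetical.

```python
import math


def sequence_log_prob(token_probs):
    """Sum of per-token log-probabilities (the raw beam search score)."""
    return sum(math.log(p) for p in token_probs)


def normalized_score(token_probs, alpha=1.0):
    """Raw score divided by length**alpha.

    alpha is an illustrative knob (an assumption, not from the paper):
    alpha=0.0 leaves the raw score unchanged, alpha=1.0 gives the
    average log-probability per token.
    """
    return sequence_log_prob(token_probs) / (len(token_probs) ** alpha)


def rank_candidates(candidates, alpha=1.0):
    """Sort (caption, token_probs) pairs from best to worst score."""
    return sorted(
        candidates,
        key=lambda c: normalized_score(c[1], alpha),
        reverse=True,
    )


# Made-up candidates: a short caption and a longer, more descriptive one.
candidates = [
    ("a dog", [0.5, 0.4]),
    ("a brown dog runs on grass", [0.6, 0.5, 0.7, 0.6, 0.7, 0.6]),
]

best_raw = rank_candidates(candidates, alpha=0.0)[0][0]
best_normalized = rank_candidates(candidates, alpha=1.0)[0][0]
print("raw:", best_raw)
print("normalized:", best_normalized)
```

With these made-up numbers, the raw score ranks the short caption first, while the per-token average ranks the longer caption first, because each of its words is individually more probable.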

URL

https://arxiv.org/abs/1509.04942

PDF

https://arxiv.org/pdf/1509.04942

