
Boosting Image Captioning with Attributes

2016-11-05
Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei

Abstract

Automatically describing an image with natural language has been an emerging challenge in both computer vision and natural language processing. In this paper, we present Long Short-Term Memory with Attributes (LSTM-A), a novel architecture that integrates attributes into the successful Convolutional Neural Networks (CNNs) plus Recurrent Neural Networks (RNNs) image captioning framework by training them in an end-to-end manner. To incorporate attributes, we construct variants of the architecture that feed image representations and attributes into the RNN in different ways, exploring the mutual but also fuzzy relationship between them. Extensive experiments are conducted on the COCO image captioning dataset, and our framework achieves superior results compared to state-of-the-art deep models. Most remarkably, when extracting image representations with GoogLeNet, we obtain METEOR/CIDEr-D of 25.2%/98.6% on the test data of the widely used, publicly available splits of (Karpathy & Fei-Fei, 2015) and achieve, to date, the top-1 performance on the COCO captioning leaderboard.
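
The sketch below is a minimal illustration of the general idea, not the authors' implementation: one way to feed high-level attributes into a CNN-plus-RNN captioner is to inject the attribute distribution and the image feature as the first inputs to the LSTM before unrolling over words. All layer sizes, the single-LSTM-cell structure, and the injection order here are assumptions chosen for readability; the paper explores several such variants.

```python
import torch
import torch.nn as nn

class LSTMACaptionerSketch(nn.Module):
    """Hypothetical LSTM-A-style variant: attributes first, then the image
    feature, then the word sequence. Dimensions are illustrative only."""

    def __init__(self, image_dim=1024, attr_dim=1000, embed_dim=512,
                 hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.image_proj = nn.Linear(image_dim, embed_dim)  # project CNN feature
        self.attr_proj = nn.Linear(attr_dim, embed_dim)    # project attribute scores
        self.lstm = nn.LSTMCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feat, attr_probs, captions):
        # image_feat: (B, image_dim), attr_probs: (B, attr_dim),
        # captions: (B, T) ground-truth token ids (teacher forcing)
        B = image_feat.size(0)
        h = image_feat.new_zeros(B, self.lstm.hidden_size)
        c = image_feat.new_zeros(B, self.lstm.hidden_size)

        # Step 1: inject the detected attributes into the LSTM state.
        h, c = self.lstm(self.attr_proj(attr_probs), (h, c))
        # Step 2: inject the global image representation.
        h, c = self.lstm(self.image_proj(image_feat), (h, c))

        # Then unroll over the caption words and predict the next token.
        logits = []
        for t in range(captions.size(1)):
            h, c = self.lstm(self.embed(captions[:, t]), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (B, T, vocab_size)
```

Other variants described in the paper differ mainly in whether the image feature, the attributes, or both are fed to the RNN, and at which step; the skeleton above only changes in those first injection lines.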

URL

https://arxiv.org/abs/1611.01646

PDF

https://arxiv.org/pdf/1611.01646

