papers AI Learner

Image Representations and New Domains in Neural Image Captioning

2015-08-09
Jack Hessel, Nicolas Savva, Michael J. Wilber

Abstract

We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a state-of-the-art neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations. We replicate this result in a new, fine-grained, transfer learned captioning domain, consisting of 66K recipe image/title pairs. We also provide some experiments regarding the appropriateness of datasets for automatic captioning, and find that having multiple captions per image is beneficial, but not an absolute requirement.
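The core experimental idea, varying the quality of the CNN image representation fed to the captioner, can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: `degrade_features` is a hypothetical helper that corrupts a feature vector with Gaussian noise, standing in for whatever degradation (e.g. weaker networks or noisier features) the authors use.

```python
import random

def degrade_features(features, noise_level=0.5, seed=0):
    """Simulate a lower-quality CNN image representation by adding
    Gaussian noise to each feature dimension (hypothetical helper;
    the paper's real degradation method may differ)."""
    rng = random.Random(seed)
    return [f + rng.gauss(0.0, noise_level) for f in features]

# A toy 4-d "CNN feature" degraded at increasing noise levels; a captioning
# model would then be conditioned on each version to see how caption
# quality responds to representation quality.
clean = [0.9, 0.1, 0.4, 0.7]
mild = degrade_features(clean, noise_level=0.1)
severe = degrade_features(clean, noise_level=1.0)
```

The paper's finding is that caption quality degrades more gracefully than one might expect under such corruption, suggesting the language model carries much of the weight.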


URL

https://arxiv.org/abs/1508.02091

PDF

https://arxiv.org/pdf/1508.02091

