Generating captions without looking beyond objects

Abstract

This paper explores new evaluation perspectives for image captioning and introduces a noun translation task, which achieves image caption generation performance comparable to existing captioning approaches by translating from a set of nouns to captions. This implies that, in image captioning, all word categories other than nouns can be supplied by a powerful language model without sacrificing n-gram precision. The paper also investigates lower and upper bounds on how much individual word categories in the captions contribute to the final BLEU score. A large potential improvement exists for nouns, verbs, and prepositions.
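
As a rough illustration of the n-gram precision that BLEU is built on, the sketch below computes clipped n-gram precision for one candidate caption against one reference. It is a minimal sketch, not the paper's evaluation code: the function names (`ngrams`, `clipped_precision`) and the example captions are hypothetical, only a single reference is used, and the brevity penalty and the geometric mean over n-gram orders that full BLEU applies are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference (the "clipping" step used by BLEU).
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total


# Hypothetical captions, chosen only for illustration.
reference = "a man rides a horse on the beach".split()
candidate = "a man on a horse".split()

unigram_p = clipped_precision(candidate, reference, 1)
bigram_p = clipped_precision(candidate, reference, 2)
print(unigram_p, bigram_p)
```

In this toy case every candidate word appears in the reference, so unigram precision is perfect, while only two of the four candidate bigrams match. Higher-order n-grams are where word categories other than nouns, and the order they impose, affect the score.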

URL

https://arxiv.org/abs/1610.03708

PDF

https://arxiv.org/pdf/1610.03708
