Abstract
The task of generating natural language descriptions from images has received considerable attention in recent years. Consequently, it is becoming increasingly important to evaluate such image captioning approaches automatically. In this paper, we provide an in-depth evaluation of existing image captioning metrics through a series of carefully designed experiments. Moreover, we explore the use of the recently proposed Word Mover’s Distance (WMD) document metric for evaluating image captions. Our findings outline the differences and similarities between the metrics, and their relative robustness, by means of extensive correlation-, accuracy-, and distraction-based evaluations. Our results also demonstrate that WMD provides strong advantages over other metrics.
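To make concrete how a WMD-style score compares a candidate caption with a reference, the sketch below computes the relaxed lower bound on WMD, not the exact optimal-transport distance. It is an illustration under stated assumptions rather than the paper's implementation: the 2-D word vectors in `TOY_VECTORS` are invented, and the names `nbow`, `one_sided_cost`, and `relaxed_wmd` are hypothetical. A real setup would load pretrained word embeddings and solve the full transport problem.

```python
import math
from collections import Counter

# Toy 2-D word vectors, invented for illustration only (an assumption);
# a real setup would load pretrained embeddings such as word2vec.
TOY_VECTORS = {
    "a": (0.0, 0.0),
    "dog": (1.0, 0.0),
    "puppy": (1.0, 0.2),
    "runs": (0.0, 1.0),
    "sprints": (0.2, 1.0),
    "car": (3.0, 3.0),
}


def nbow(tokens):
    """Normalized bag-of-words weights: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def one_sided_cost(src, dst, vectors):
    """Send each source word's full mass to its nearest destination word."""
    return sum(
        weight * min(math.dist(vectors[word], vectors[other]) for other in dst)
        for word, weight in src.items()
    )


def relaxed_wmd(candidate, reference, vectors=TOY_VECTORS):
    """Lower bound on WMD: the larger of the two one-sided relaxations."""
    cand = nbow(candidate.split())
    ref = nbow(reference.split())
    return max(
        one_sided_cost(cand, ref, vectors),
        one_sided_cost(ref, cand, vectors),
    )


if __name__ == "__main__":
    reference = "a dog runs"
    for candidate in ("a dog runs", "a puppy sprints", "a car runs"):
        print(candidate, "->", round(relaxed_wmd(candidate, reference), 3))
```

With these toy vectors, a paraphrase built from nearby words ("a puppy sprints") lands closer to the reference than a caption with one unrelated word ("a car runs"), even though the paraphrase shares fewer exact tokens with it. That is the kind of behavior that distinguishes an embedding-based distance from n-gram overlap.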
URL
https://arxiv.org/abs/1612.07600