
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

2016-12-22
Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut

Abstract

We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options. Accomplishing the task requires comprehension beyond recognizing "keywords" (or key phrases) and their corresponding visual concepts. Instead, it requires an alignment between the representations of the two modalities that achieves a visually grounded "understanding" of various linguistic elements and their dependencies. The new task also admits an easy-to-compute and well-studied metric: the accuracy in detecting the true target among the decoys. The paper makes several contributions: an effective and extensible mechanism for generating decoys from (human-created) image captions; an instance of applying this mechanism, yielding a large-scale machine comprehension dataset (based on the COCO images and captions) that we make publicly available; human evaluation results on this dataset, which indicate an upper bound on achievable performance; and several baseline and competitive learning approaches that illustrate the utility of the proposed task and dataset in advancing both image and language comprehension. We also show that, in a multi-task learning setting, performance on the proposed task is positively correlated with performance on the end-to-end task of image captioning.
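The evaluation protocol described above (pick one caption out of several candidates, then report accuracy against the true target) can be sketched as follows. This is a minimal illustration, not the authors' models: it assumes each image and each candidate caption has already been mapped to a fixed-length vector by some learned encoder (not specified here), and it uses cosine similarity as a stand-in scoring function. The names `select_caption` and `mc_accuracy` are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_caption(image_vec: Sequence[float],
                   caption_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate caption scored highest against the image.

    Assumption: a higher cosine similarity between (hypothetical) image and text
    embeddings means a better match. Ties go to the earliest candidate.
    """
    scores = [cosine(image_vec, c) for c in caption_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


def mc_accuracy(predicted: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of questions where the chosen index equals the true target index."""
    if len(predicted) != len(targets) or not targets:
        raise ValueError("predicted and targets must be non-empty and equal length")
    return sum(p == t for p, t in zip(predicted, targets)) / len(targets)


if __name__ == "__main__":
    # Toy 2-D embeddings: candidate 1 is closest in direction to the image vector.
    image = [1.0, 0.0]
    candidates = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(select_caption(image, candidates))   # 1
    print(mc_accuracy([1, 0, 2], [1, 2, 2]))   # 0.6666666666666666
```

Because every question has exactly one true target, accuracy is directly comparable across models and against the human evaluation results; with K equally likely candidates, random guessing would score about 1/K.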


URL

https://arxiv.org/abs/1612.07833

PDF

https://arxiv.org/pdf/1612.07833

