papers AI Learner

Generating Natural Questions About an Image

2016-06-09
Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende

Abstract

There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language.


URL

https://arxiv.org/abs/1603.06059

PDF

https://arxiv.org/pdf/1603.06059
