The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering

2016-09-21
Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada

Abstract

The Visual Question Answering (VQA) task has showcased a new stage of interaction between language and vision, two of the most pivotal components of artificial intelligence. However, it has mostly focused on generating short and repetitive answers, mostly single words, which fall short of the rich linguistic capabilities of humans. We introduce the Full-Sentence Visual Question Answering (FSVQA) dataset, consisting of nearly 1 million pairs of questions and full-sentence answers for images, built by applying a number of rule-based natural language processing techniques to the original VQA dataset and to captions in the MS COCO dataset. This adds many complexities to the conventional VQA task, and we provide a baseline for approaching and evaluating the task, on which we invite the research community to build further improvements.
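
To make the idea of rule-based conversion concrete, the following is a minimal sketch of how a question and a short answer might be merged into a full-sentence answer. It is not the authors' pipeline: the function name `to_full_sentence` and its single regular-expression rule are hypothetical, chosen only for illustration. It handles only questions of the form "What is/are <noun phrase>?" and falls back to the bare answer otherwise.

```python
import re


def to_full_sentence(question: str, answer: str) -> str:
    """Hypothetical rule: 'What is X?' + short answer -> 'X is <answer>.'

    Illustrative only; the rules used to build FSVQA are not reproduced here.
    """
    # Drop surrounding whitespace and the trailing question mark.
    q = question.strip().rstrip("?").strip()
    # Match copular questions such as "What is the color of the cat".
    m = re.match(r"^what\s+(is|are)\s+(.+)$", q, flags=re.IGNORECASE)
    if m:
        verb, subject = m.group(1).lower(), m.group(2)
        sentence = f"{subject} {verb} {answer.strip()}"
    else:
        # No rule applies: fall back to the short answer on its own.
        sentence = answer.strip()
    # Capitalize the first character and close the sentence with a period.
    return sentence[:1].upper() + sentence[1:] + "."


print(to_full_sentence("What is the color of the cat?", "gray"))
```

With the inputs shown, the sketch prints "The color of the cat is gray." A single pattern like this misfires on questions such as "What are the people doing?", which is why building a dataset of this kind needs a larger set of rules, as the abstract indicates.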

URL

https://arxiv.org/abs/1609.06657

PDF

https://arxiv.org/pdf/1609.06657

