Ask Your Neurons: A Deep Learning Approach to Visual Question Answering

2016-11-24
Mateusz Malinowski, Marcus Rohrbach, Mario Fritz

Abstract

We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining the latest advances in image representation and natural language processing, we propose Ask Your Neurons, a scalable, jointly trained, end-to-end formulation of this problem. In contrast to previous efforts, we face a multi-modal problem in which the language output (answer) is conditioned on visual and natural language inputs (image and question). We provide additional insights into the problem by analyzing how much information is contained in the language part alone, for which we provide a new human baseline. To study human consensus, which is related to the ambiguities inherent in this challenging task, we propose two novel metrics and collect additional answers that extend the original DAQUAR dataset to DAQUAR-Consensus. Moreover, we extend our analysis to VQA, a large-scale dataset for question answering about images, where we investigate particular design choices and show the importance of stronger visual models. At the same time, our model achieves strong performance while still using a global image representation. Finally, based on this analysis, we refine Ask Your Neurons on DAQUAR, which also leads to better performance on this challenging task.
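
The multi-modal conditioning described in the abstract (an answer predicted from both a global image representation and the question) can be illustrated with a toy sketch. This is not the authors' architecture: the vocabulary, answer set, dimensions, random weights, and the mean-of-embeddings question encoder are all assumptions made for illustration, and the names `encode_question` and `answer_distribution` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary, answer set and sizes, chosen only for illustration.
VOCAB = ["what", "is", "on", "the", "table", "color", "chair"]
ANSWERS = ["lamp", "red", "book"]
IMG_DIM, EMB_DIM = 8, 4

# Randomly initialised parameters; a trained model would learn these jointly.
word_emb = rng.normal(size=(len(VOCAB), EMB_DIM))
W = rng.normal(size=(IMG_DIM + EMB_DIM, len(ANSWERS)))
b = np.zeros(len(ANSWERS))


def encode_question(question):
    """Mean of word embeddings: a simple stand-in for a learned question encoder."""
    ids = [VOCAB.index(w) for w in question.lower().split() if w in VOCAB]
    if not ids:
        return np.zeros(EMB_DIM)
    return word_emb[ids].mean(axis=0)


def answer_distribution(image_feature, question):
    """Softmax over candidate answers, conditioned on image and question."""
    joint = np.concatenate([image_feature, encode_question(question)])
    logits = joint @ W + b
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Stand-in for a global image feature vector (e.g. from a pretrained CNN).
image_feature = rng.normal(size=IMG_DIM)
probs = answer_distribution(image_feature, "what is on the table")
print(dict(zip(ANSWERS, probs.round(3))))
```

Setting `image_feature` to zeros in this sketch gives a crude language-only variant, loosely analogous to the abstract's question of how much information the language part carries on its own.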

URL

https://arxiv.org/abs/1605.02697

PDF

https://arxiv.org/pdf/1605.02697

