
Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources

2016-04-14
Qi Wu, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

Abstract

We propose a method for visual question answering that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. This allows more complex questions to be answered using the predominant neural network-based approach than has previously been possible. In particular, it allows questions to be asked about the contents of an image even when the image itself does not contain the whole answer. The method constructs a textual representation of the semantic content of an image and merges it with textual information sourced from a knowledge base, to develop a deeper understanding of the scene viewed. Priming a recurrent neural network with this combined information, together with the submitted question, leads to a very flexible visual question answering approach. We are specifically able to answer questions posed in natural language that refer to information not contained in the image. We demonstrate the effectiveness of our model on two publicly available datasets, Toronto COCO-QA and MS COCO-VQA, and show that it produces the best reported results in both cases.
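
To make the priming scheme in the abstract concrete, below is a minimal PyTorch sketch of one way the pipeline could be wired up. It is an illustrative assumption, not the authors' released implementation: the class name, dimensions, and the pseudo-token prefix are all hypothetical. The three source vectors stand in for the image attribute predictions, a caption embedding, and a fixed-size summary of retrieved knowledge-base text (the full paper retrieves DBpedia comment text for the detected attributes and summarizes it with Doc2Vec). Feeding these to the LSTM before the question words means the question is interpreted in the context of both the image content and the external knowledge.

```python
import torch
import torch.nn as nn

class AskMeAnythingSketch(nn.Module):
    """Illustrative sketch of the abstract's pipeline (not the authors' code).

    Three text-derived vectors prime an LSTM before the question is fed in:
    an image attribute vector, a caption embedding, and an external-knowledge
    summary vector (e.g. a Doc2Vec encoding of retrieved DBpedia text).
    """

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                 attr_dim=256, caption_dim=256, knowledge_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Project each information source into the LSTM's input space
        # so it can be consumed as a pseudo-token.
        self.attr_proj = nn.Linear(attr_dim, embed_dim)
        self.caption_proj = nn.Linear(caption_dim, embed_dim)
        self.knowledge_proj = nn.Linear(knowledge_dim, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, attrs, caption_vec, knowledge_vec, question_tokens):
        # Prime the LSTM with the three sources as a pseudo-token prefix.
        prefix = torch.stack([
            self.attr_proj(attrs),
            self.caption_proj(caption_vec),
            self.knowledge_proj(knowledge_vec),
        ], dim=1)                                # (B, 3, embed_dim)
        question = self.embed(question_tokens)   # (B, T, embed_dim)
        seq = torch.cat([prefix, question], dim=1)
        out, _ = self.lstm(seq)
        # Predict the (first) answer word from the final hidden state;
        # the real model generates the answer word by word.
        return self.classifier(out[:, -1])

if __name__ == "__main__":
    model = AskMeAnythingSketch(vocab_size=10_000)
    batch = 2
    logits = model(
        torch.randn(batch, 256),                  # attribute vector
        torch.randn(batch, 256),                  # caption embedding
        torch.randn(batch, 256),                  # knowledge summary
        torch.randint(0, 10_000, (batch, 12)),    # tokenized question
    )
    print(logits.shape)  # torch.Size([2, 10000])
```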

URL

https://arxiv.org/abs/1511.06973

PDF

https://arxiv.org/pdf/1511.06973

