
Textually Enriched Neural Module Networks for Visual Question Answering

2018-09-23
Khyathi Raghavi Chandu, Mary Arpita Pyreddy, Matthieu Felix, Narendra Nath Joshi

Abstract

Problems at the intersection of language and vision, such as visual question answering, have recently attracted considerable attention in multi-modal machine learning as computer vision research moves beyond traditional recognition tasks. Deep neural network models that use the linguistic structure of a question to dynamically instantiate a network layout have had recent success in visual question answering. However, converting the question into a network layout simplifies it, so information from the question is lost to the model. In this paper, we enrich the image information with textual data from image captions and external knowledge bases to generate more coherent answers. We achieve 57.1% overall accuracy on the test-dev open-ended questions of the visual question answering (VQA 1.0) real image dataset.
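The idea of instantiating a layout from a question, and the information loss the abstract mentions, can be illustrated with a toy sketch. This is not the authors' model: the modules `find` and `describe`, the tuple-based layout format, and the dictionary "scene" are hypothetical symbolic stand-ins for learned neural modules and image features, chosen only to show how a simplified layout can drop words from the question.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for image features: one dict of attributes per object.
Scene = List[Dict[str, str]]


def find(scene: Scene, word: str) -> List[float]:
    """Toy attention module: weight 1.0 on objects whose name matches `word`."""
    return [1.0 if obj["name"] == word else 0.0 for obj in scene]


def describe(scene: Scene, attention: List[float], attribute: str) -> str:
    """Toy answer module: read `attribute` from the most attended object."""
    best = max(range(len(scene)), key=lambda i: attention[i])
    if attention[best] == 0.0:
        return "unknown"  # nothing in the scene was attended to
    return scene[best].get(attribute, "unknown")


def execute(layout: Tuple, scene: Scene):
    """Recursively assemble and run the modules named in a nested layout."""
    name = layout[0]
    if name == "find":  # ("find", word)
        return find(scene, layout[1])
    if name == "describe":  # ("describe", attribute, sub_layout)
        attention = execute(layout[2], scene)
        return describe(scene, attention, layout[1])
    raise ValueError(f"unknown module: {name}")


scene = [
    {"name": "cat", "color": "black", "size": "small"},
    {"name": "cat", "color": "white", "size": "large"},
]

# Assumed simplification of "What color is the large cat?":
# the layout keeps "color" and "cat" but drops the modifier "large".
layout = ("describe", "color", ("find", "cat"))
answer = execute(layout, scene)
```

Because the layout no longer carries the word "large", both cats receive equal attention and the sketch answers with the first one's color rather than the one the question asked about. Gaps of this kind are what the abstract proposes to reduce by adding textual information from captions and knowledge bases.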

URL

https://arxiv.org/abs/1809.08697

PDF

https://arxiv.org/pdf/1809.08697

