Open-Ended Visual Question-Answering

2016-10-09
Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto

Abstract

This thesis report studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle text-based Question-Answering. We then modify the previous model to accept an image as an input in addition to the question. For this purpose, we explore the VGG-16 and K-CNN convolutional neural networks to extract visual features from the image. These features are merged with the word embedding or with a sentence embedding of the question to predict the answer. This work was successfully submitted to the Visual Question Answering Challenge 2016, where it achieved 53.62% accuracy on the test dataset. The developed software follows best programming practices and Python code style, providing a consistent baseline in Keras for different configurations.
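
The merge step described in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' Keras code. It assumes the visual features and the question's sentence embedding are already available as vectors, that they are merged by concatenation, and that the answer is picked by a single softmax layer over a fixed answer vocabulary. The dimensions, the random weights, and the names `dense`, `softmax`, and `predict_answer` are all hypothetical.

```python
import math
import random

# Hypothetical toy sizes; real CNN features and answer vocabularies are far larger.
IMAGE_DIM = 8      # length of the visual feature vector from the CNN
QUESTION_DIM = 6   # length of the question's sentence embedding
NUM_ANSWERS = 4    # size of the candidate answer vocabulary


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_features, question_embedding, weights, bias):
    """Merge both modalities by concatenation (an assumption) and classify."""
    merged = list(image_features) + list(question_embedding)
    return softmax(dense(merged, weights, bias))


rng = random.Random(0)
# Random stand-ins for the CNN output, the LSTM output, and trained weights.
image_features = [rng.random() for _ in range(IMAGE_DIM)]
question_embedding = [rng.random() for _ in range(QUESTION_DIM)]
weights = [[rng.uniform(-0.1, 0.1) for _ in range(IMAGE_DIM + QUESTION_DIM)]
           for _ in range(NUM_ANSWERS)]
bias = [0.0] * NUM_ANSWERS

probs = predict_answer(image_features, question_embedding, weights, bias)
best = max(range(NUM_ANSWERS), key=probs.__getitem__)
print(f"predicted answer index: {best}")
```

In a trained model the random vectors would be replaced by the CNN features and the LSTM output, and the weights would be learned. The sketch only shows how the two inputs combine into one distribution over answers.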

URL

https://arxiv.org/abs/1610.02692

PDF

https://arxiv.org/pdf/1610.02692

