
Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering

2016-07-28
Arun Mallya, Svetlana Lazebnik

Abstract

This paper proposes deep convolutional network models that utilize local and global context to make human activity label predictions in still images, achieving state-of-the-art performance on two recent datasets with hundreds of labels each. We use multiple instance learning to handle the lack of supervision on the level of individual person instances, and weighted loss to handle unbalanced training data. Further, we show how specialized features trained on these datasets can be used to improve accuracy on the Visual Question Answering (VQA) task, in the form of multiple choice fill-in-the-blank questions (Visual Madlibs). Specifically, we tackle two types of questions on person activity and person-object relationship and show improvements over generic features trained on the ImageNet classification task.
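The two training ideas named in the abstract, multiple instance learning over person instances and a weighted loss for unbalanced labels, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how instance predictions are aggregated or how the loss is weighted, so the sketch assumes max-aggregation over persons and a single positive-class weight. The names `mil_image_scores`, `weighted_bce`, and `pos_weight`, and the value 10.0, are hypothetical and chosen only for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mil_image_scores(instance_scores):
    """Aggregate per-person label scores into image-level scores.

    instance_scores: one list of raw label scores (logits) per detected person.
    Assumption: the image-level score for a label is the maximum over persons,
    so only image-level labels are needed for supervision.
    """
    num_labels = len(instance_scores[0])
    return [max(person[k] for person in instance_scores)
            for k in range(num_labels)]


def weighted_bce(scores, targets, pos_weight):
    """Mean sigmoid cross-entropy with positive labels up-weighted.

    pos_weight is a hypothetical hyperparameter: values above 1 penalize
    missed positives more heavily, which counteracts rare positive labels.
    """
    total = 0.0
    for s, t in zip(scores, targets):
        # -log(sigmoid(s)) == softplus(-s); -log(1 - sigmoid(s)) == softplus(s)
        total += pos_weight * t * softplus(-s) + (1 - t) * softplus(s)
    return total / len(scores)


# Two detected persons, two activity labels (made-up logits).
persons = [[2.0, -1.0], [0.5, 3.0]]
image_scores = mil_image_scores(persons)
loss = weighted_bce(image_scores, targets=[1, 0], pos_weight=10.0)
```

With max-aggregation, the gradient for each label flows only through the person instance that scored highest, which is one common way to train without per-person annotations.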


URL

https://arxiv.org/abs/1604.04808

PDF

https://arxiv.org/pdf/1604.04808

