Unseen Action Recognition with Multimodal Learning

2018-10-01
AJ Piergiovanni, Michael S. Ryoo

Abstract

In this paper, we present a method to learn a joint multimodal representation space that allows for the recognition of unseen activities in videos. We compare the effect of placing various constraints on the embedding space using paired text and video data. Additionally, we propose a method to improve the joint embedding space using an adversarial formulation with unpaired text and video data. In addition to testing on publicly available datasets, we introduce a new, large-scale text/video dataset. We experimentally confirm that learning such a shared embedding space benefits three difficult tasks: (i) zero-shot activity classification, (ii) unsupervised activity discovery, and (iii) unseen activity captioning.
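
To make the idea of zero-shot classification in a shared embedding space concrete, here is a minimal sketch. It is not the authors' implementation: it assumes that some already-trained encoders have mapped a video and the names of unseen activity classes into the same vector space, and it assumes cosine similarity as the matching rule, which the abstract does not specify. The function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def zero_shot_classify(video_embedding, class_text_embeddings):
    """Return the label whose text embedding is most similar to the video embedding.

    Assumption: both embeddings live in the same joint space, so no video
    examples of the candidate classes are needed at prediction time.
    """
    return max(
        class_text_embeddings,
        key=lambda label: cosine_similarity(
            video_embedding, class_text_embeddings[label]
        ),
    )


# Hypothetical toy embeddings; real ones would come from trained encoders.
class_text_embeddings = {
    "surfing": [0.9, 0.1, 0.0],
    "juggling": [0.0, 0.8, 0.6],
}
video_embedding = [0.7, 0.2, 0.1]

predicted = zero_shot_classify(video_embedding, class_text_embeddings)
print(predicted)
```

Under these assumptions, a class never seen in video form at training time can still be predicted, because only its text embedding is needed at test time.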

URL

https://arxiv.org/abs/1806.08251

PDF

https://arxiv.org/pdf/1806.08251

