Unseen Action Recognition with Multimodal Learning

Abstract
Abstract (translated by Google)
URL
PDF

Abstract

In this paper, we present a method to learn a joint multimodal representation space that allows for the recognition of unseen activities in videos. We compare the effect of placing various constraints on the embedding space using paired text and video data. Additionally, we propose a method to improve the joint embedding space using an adversarial formulation with unpaired text and video data. In addition to testing on publicly available datasets, we introduce a new, large-scale text/video dataset. We experimentally confirm that learning such shared embedding space benefits three difficult tasks (i) zero-shot activity classification, (ii) unsupervised activity discovery, and (iii) unseen activity captioning.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1806.08251

PDF

https://arxiv.org/pdf/1806.08251

Unseen Action Recognition with Multimodal Learning

Abstract

Abstract (translated by Google)

URL

PDF

Similar Posts

Comments