Weakly Supervised Gaussian Networks for Action Detection

Abstract

Detecting the temporal extents of human actions in videos is a challenging computer vision problem that requires detailed manual supervision, including frame-level labels. This expensive annotation process restricts the deployment of action detectors to a limited number of categories. We propose a novel action recognition method, called WSGN, that learns to detect actions from “weak supervision”, that is, video-level labels. WSGN learns to exploit both video-specific and dataset-wide statistics to predict the relevance of each frame to an action category. We show that combining the local and global channels leads to significant gains on two standard benchmarks, THUMOS14 and Charades. Our method improves by more than 12% mAP over a weakly supervised baseline, outperforms other weakly supervised state-of-the-art methods, and is only 4% behind the state-of-the-art supervised method on the THUMOS14 dataset for action detection. Similarly, our method is only 0.3% mAP behind a state-of-the-art supervised method on the challenging Charades dataset for action localisation.
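
The abstract does not spell out how the video-specific (local) and dataset-wide (global) statistics are combined, so the following is only a rough sketch of one plausible reading, not the authors' formulation. It assumes each frame has a raw scalar relevance score, standardises that score once against the video's own mean and standard deviation and once against dataset-wide ones, averages the two channels, and uses the result to pool frame-level class scores into a video-level prediction that video-level labels could supervise. The function names, the equal-weight fusion, and the softmax over time are all assumptions made for illustration.

```python
import numpy as np


def frame_relevance(raw_scores, global_mean, global_std, eps=1e-6):
    """Turn raw per-frame scores (shape (T,)) into weights that sum to 1.

    Hypothetical helper: the fusion rule below is an assumption,
    not taken from the paper.
    """
    raw_scores = np.asarray(raw_scores, dtype=float)
    # Local channel: standardise against this video's own statistics.
    local_z = (raw_scores - raw_scores.mean()) / (raw_scores.std() + eps)
    # Global channel: standardise against dataset-wide statistics.
    global_z = (raw_scores - global_mean) / (global_std + eps)
    # Assumed fusion: equal-weight average of the two channels.
    fused = 0.5 * (local_z + global_z)
    # Softmax over time (shifted by the max for numerical stability).
    exp = np.exp(fused - fused.max())
    return exp / exp.sum()


def video_level_scores(frame_class_scores, weights):
    """Pool frame-level class scores (T, C) into video-level scores (C,)."""
    return weights @ np.asarray(frame_class_scores, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(size=8)               # one raw relevance score per frame
    class_scores = rng.normal(size=(8, 3))  # frame-level scores for 3 classes
    w = frame_relevance(raw, global_mean=0.0, global_std=1.0)
    print(video_level_scores(class_scores, w))
```

In a weakly supervised setting of this kind, only the pooled video-level scores would be compared with the labels during training; at test time the per-frame weights could be thresholded to propose temporal extents.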

URL

http://arxiv.org/abs/1904.07774

PDF

http://arxiv.org/pdf/1904.07774
