Abstract
Activity recognition in shopping environments is an important and challenging computer vision task. We introduce a framework that integrates human body pose and object motion to both temporally detect and classify activities in a fine-grained manner (i.e., very short and visually similar activities). To achieve this, we propose a multi-stream recurrent convolutional neural network architecture guided by a spatiotemporal \emph{attention} mechanism for both activity recognition and detection. Because accurate pose supervision is unavailable, we incorporate generative adversarial networks (GANs) to generate candidate body joints. Additionally, motivated by the intuition that complex actions require more than one source of information to be identified precisely, even by humans, we integrate a second stream encoding object motion into our network; this stream acts as prior knowledge and, as we show quantitatively, improves the results. Furthermore, we empirically demonstrate the capabilities of our approach by achieving state-of-the-art results on the MERL Shopping Dataset. Finally, we further investigate the effectiveness of the approach on a new shopping dataset that we collected to address existing shortcomings in this area, including but not limited to the lack of training data.
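To make the described design concrete, below is a minimal sketch (not the authors' released code) of a two-stream recurrent network with soft spatiotemporal attention, written in PyTorch. All module names, feature dimensions, the attention formulation, and the late-fusion strategy here are illustrative assumptions; the paper's exact architecture may differ.

```python
# Illustrative sketch only: a two-stream recurrent classifier with soft
# spatial attention per frame. Feature sizes, fusion, and class count
# are assumptions, not the paper's specification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionStream(nn.Module):
    """One stream: per-frame CNN features -> soft spatial attention -> LSTM."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)       # scores each spatial location
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, feats):
        # feats: (batch, time, locations, feat_dim), pre-extracted CNN features
        scores = self.attn(feats)                # (B, T, L, 1)
        weights = F.softmax(scores, dim=2)       # attention over spatial locations
        pooled = (weights * feats).sum(dim=2)    # (B, T, feat_dim)
        out, _ = self.rnn(pooled)                # temporal modeling
        return out                               # (B, T, hidden_dim)

class TwoStreamClassifier(nn.Module):
    """Fuse a body-pose stream and an object-motion stream per frame."""
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=6):
        super().__init__()
        self.pose_stream = AttentionStream(feat_dim, hidden_dim)
        self.motion_stream = AttentionStream(feat_dim, hidden_dim)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, pose_feats, motion_feats):
        h = torch.cat([self.pose_stream(pose_feats),
                       self.motion_stream(motion_feats)], dim=-1)
        return self.classifier(h)                # per-frame class logits

# Usage with dummy tensors: batch of 2 clips, 16 frames, 49 spatial cells.
model = TwoStreamClassifier()
pose = torch.randn(2, 16, 49, 512)
motion = torch.randn(2, 16, 49, 512)
logits = model(pose, motion)                     # shape: (2, 16, 6)
```

Per-frame logits support both tasks the abstract mentions: classification (pool over time) and temporal detection (threshold or decode the frame-level scores).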
URL
http://arxiv.org/abs/1905.04430