papers AI Learner

Investigation on Combining 3D Convolution of Image Data and Optical Flow to Generate Temporal Action Proposals

2019-03-11
Patrick Schlosser, David Münch, Michael Arens

Abstract

In this paper, a novel two-stream architecture for the task of temporal action proposal generation in long, untrimmed videos is presented. Inspired by recent advances in human action recognition that combine 3D convolutions with two-stream networks, and building on the Single-Stream Temporal Action Proposals (SST) architecture, four different two-stream architectures are investigated, processing sequences of images on one stream and optical flow images on the other. The four architectures fuse the two streams at different depths in the model; for each of them, a broad range of parameters is investigated systematically and an optimal parametrization is determined empirically. Experiments on action and sports datasets show that all four two-stream architectures outperform the original single-stream SST and achieve state-of-the-art results. Additional experiments reveal that the improvements are not tied to a single method of computing optical flow: replacing the previously used method of Brox with FlowNet2 still yields improvements.
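As a rough illustration of the idea described in the abstract, the following is a minimal PyTorch sketch of one possible two-stream variant: an RGB 3D-conv stream and an optical-flow 3D-conv stream fused late by concatenation, feeding an SST-style recurrent proposal head. The layer sizes, the fusion point, the GRU head, and the number of proposal scales are all assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch (not the authors' implementation): a late-fusion variant of a
# two-stream temporal proposal network. All layer sizes, the fusion point, and
# the number of proposal scales are illustrative assumptions.
import torch
import torch.nn as nn

class Conv3DStream(nn.Module):
    """Small 3D-conv feature extractor applied to a short clip of frames."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),            # collapse to one vector per clip
        )

    def forward(self, clip):                    # clip: (batch, channels, frames, H, W)
        return self.features(clip).flatten(1)   # (batch, 64)

class TwoStreamProposalNet(nn.Module):
    """Late fusion: concatenate RGB and flow clip features, then score
    num_scales proposal lengths per time step with a GRU (SST-style head)."""
    def __init__(self, num_scales=32):
        super().__init__()
        self.rgb_stream = Conv3DStream(in_channels=3)    # RGB clips
        self.flow_stream = Conv3DStream(in_channels=2)   # optical-flow clips (x/y)
        self.gru = nn.GRU(input_size=128, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128, num_scales)

    def forward(self, rgb_clips, flow_clips):
        # rgb_clips / flow_clips: (batch, time, channels, frames, H, W)
        b, t = rgb_clips.shape[:2]
        rgb = self.rgb_stream(rgb_clips.flatten(0, 1)).view(b, t, -1)
        flow = self.flow_stream(flow_clips.flatten(0, 1)).view(b, t, -1)
        fused = torch.cat([rgb, flow], dim=-1)            # late fusion by concatenation
        seq, _ = self.gru(fused)
        return torch.sigmoid(self.head(seq))              # proposal confidences per step

# Example: 2 videos, 5 clip positions, 8-frame clips at 32x32 resolution.
model = TwoStreamProposalNet()
rgb = torch.randn(2, 5, 3, 8, 32, 32)
flow = torch.randn(2, 5, 2, 8, 32, 32)
print(model(rgb, flow).shape)   # torch.Size([2, 5, 32])
```

The other variants studied in the paper fuse the streams earlier in the network; under the same assumptions, that would amount to concatenating intermediate 3D-conv feature maps instead of the final clip vectors.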

URL

https://arxiv.org/abs/1903.04176

PDF

https://arxiv.org/pdf/1903.04176

