Abstract
Action anticipation and forecasting in videos do not require a hat-trick, as far as there are signs in the context to foresee how actions are going to be deployed. Capturing these signs is hard because the context includes the past. We propose an end-to-end network for action anticipation and forecasting with memory, to both anticipate the current action and foresee the next one. Experiments on action sequence datasets show excellent results indicating that training on histories with a dynamic memory can significantly improve forecasting performance.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1901.03728