Abstract
We predict future video frames from complex dynamic scenes, using an invertible neural network as the encoder of a nonlinear dynamic system with latent linear state evolution. Our invertible linear embedding (ILE) demonstrates successful learning, prediction and latent state inference. In contrast to other approaches, ILE does not use any explicit reconstruction loss or simplistic pixel-space assumptions. Instead, it leverages invertibility to optimize the likelihood of image sequences exactly, albeit indirectly. Comparison with a state-of-the-art method demonstrates the viability of our approach.
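The ingredients named above (an invertible encoder, linear latent evolution, and an exact likelihood) can be illustrated with a toy sketch. It is not the paper's architecture, which the abstract does not specify. The affine coupling layer, the scalar parameters `w` and `b`, the transition matrix `A`, and the standard-normal latent prior are all assumptions chosen for illustration. The likelihood uses the standard change-of-variables formula for invertible maps, log p(x) = log p(z) + log|det J|; whether ILE applies it in this form is likewise an assumption.

```python
import math

def _scale_shift(x1, w, b):
    # Hypothetical stand-in for a learned network: log-scale s and shift t
    # are simple functions of the untouched half x1.
    s = [math.tanh(w * v) for v in x1]
    t = [b * v for v in x1]
    return s, t

def encode(x, w, b):
    """Affine coupling layer: returns latent z and log|det Jacobian|."""
    h = len(x) // 2
    x1, x2 = x[:h], x[h:]
    s, t = _scale_shift(x1, w, b)
    z2 = [v * math.exp(si) + ti for v, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular with diagonal (1, ..., 1, exp(s)),
    # so its log-determinant is the sum of the log-scales.
    return x1 + z2, sum(s)

def decode(z, w, b):
    """Exact inverse of encode; no separate decoder network is needed."""
    h = len(z) // 2
    z1, z2 = z[:h], z[h:]
    s, t = _scale_shift(z1, w, b)
    x2 = [(v - ti) * math.exp(-si) for v, si, ti in zip(z2, s, t)]
    return z1 + x2

def linear_step(A, z):
    """Latent linear state evolution: z_next = A z."""
    return [sum(a * v for a, v in zip(row, z)) for row in A]

def predict_next(x, A, w, b):
    """Encode the observation, evolve linearly in latent space, decode."""
    z, _ = encode(x, w, b)
    return decode(linear_step(A, z), w, b)

def log_likelihood(x, w, b):
    """Exact log p(x) under an assumed standard-normal latent prior."""
    z, log_det = encode(x, w, b)
    log_pz = -0.5 * sum(v * v for v in z) - 0.5 * len(z) * math.log(2 * math.pi)
    return log_pz + log_det
```

Because `decode` inverts `encode` exactly, a prediction made in latent space maps back to observation space without a reconstruction term, which is the property the abstract relies on. The sketch assumes an even-length input vector; a real frame would be flattened or processed by stacked, image-shaped coupling layers.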
URL
http://arxiv.org/abs/1903.00133