Abstract
This paper entertains the hypothesis that the primary purpose of the cells of the primary visual cortex (V1) is to perceive motion and predict changes in local image content. Specifically, we propose a model that couples vector representations of local image content with matrix representations of the local pixel displacements caused by relative motion between the agent and the surrounding objects and scene. When the image changes from one time frame to the next due to pixel displacements, the vector at each pixel is multiplied by a matrix that represents the displacement of that pixel. We show that both the vector and the matrix representations can be learned from pairs of images that are deformed versions of each other. The units in the learned vector representations resemble V1 cells. The learned vector-matrix representations enable prediction of image frames over time and, more importantly, inference of the local pixel displacements caused by relative motion.
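The coupling described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, a 1-D displacement and a hand-built block-diagonal rotation matrix in place of the learned matrices; the names `displacement_matrix`, `infer_displacement`, and `freqs` are hypothetical and do not come from the paper. The sketch shows the two operations the abstract mentions: predicting the next-frame vector by a matrix multiplication, and inferring the displacement by searching for the matrix that best explains the observed change.

```python
import numpy as np

def displacement_matrix(delta, freqs):
    """Block-diagonal matrix for a 1-D displacement `delta` (illustrative assumption).

    Each 2x2 block rotates one 2-D sub-vector by the angle freqs[k] * delta.
    In the paper the matrices are learned; here they are fixed by hand.
    """
    n = len(freqs)
    M = np.zeros((2 * n, 2 * n))
    for k, w in enumerate(freqs):
        c, s = np.cos(w * delta), np.sin(w * delta)
        M[2 * k:2 * k + 2, 2 * k:2 * k + 2] = [[c, -s], [s, c]]
    return M

def infer_displacement(v_t, v_next, freqs, candidates):
    """Return the candidate displacement whose matrix best maps v_t onto v_next."""
    errors = [
        np.sum((v_next - displacement_matrix(d, freqs) @ v_t) ** 2)
        for d in candidates
    ]
    return candidates[int(np.argmin(errors))]

# Hypothetical setup: three sub-vectors with different rotation rates.
freqs = np.array([0.3, 0.7, 1.1])
rng = np.random.default_rng(0)
v_t = rng.normal(size=2 * len(freqs))      # vector at one pixel, frame t

# Prediction: the next-frame vector is the matrix times the current vector.
true_delta = 2.0
v_next = displacement_matrix(true_delta, freqs) @ v_t

# Inference: recover the displacement from the pair (v_t, v_next).
candidates = np.arange(-3.0, 3.5, 0.5)
estimate = infer_displacement(v_t, v_next, freqs, candidates)
print("estimated displacement:", estimate)
```

Rotation blocks are chosen here because they preserve the norm of the vector and compose additively, so applying the matrix for one displacement and then another equals applying the matrix for their sum. Whether the learned matrices in the paper have exactly this form is not stated in the abstract.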
URL
http://arxiv.org/abs/1902.03871