Abstract
Visual segmentation is a key perceptual function that partitions visual space and allows for detection, recognition and discrimination of objects in complex environments. The processes underlying human segmentation of natural images are still poorly understood. In part, this is because we lack segmentation models consistent with experimental and theoretical knowledge in visual neuroscience. Biological sensory systems have been shown to approximate probabilistic inference to interpret their inputs. This requires a generative model that captures both the statistics of the sensory inputs and expectations about the causes of those inputs. Following this hypothesis, we propose a probabilistic generative model of visual segmentation that combines knowledge about 1) the sensitivity of neurons in the visual cortex to statistical regularities in natural images; and 2) the preference of humans to form contiguous partitions of visual space. We develop an efficient algorithm for training and inference based on expectation-maximization and validate it on synthetic data. Importantly, with the appropriate choice of the prior, we derive an intuitive closed-form update rule for assigning pixels to segments: at each iteration, the probability of assigning a pixel to a segment combines the evidence (i.e. local pixel statistics) and the prior (i.e. the assignments of neighboring pixels), weighted by their relative uncertainty. The model performs competitively on natural images from the Berkeley Segmentation Dataset (BSD), and we illustrate how the likelihood and prior components improve segmentation relative to traditional mixture models. Furthermore, our model explains some variability across human subjects as reflecting local uncertainty about the number of segments. Our model thus provides a viable approach to probe human visual segmentation.
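As a rough illustration of the update rule described above (not the authors' implementation), the sketch below runs an EM-style segmentation where each pixel's responsibilities combine a per-segment Gaussian likelihood on intensity with a prior taken from the responsibilities of its 4-connected neighbors. The function name `em_segment`, the isotropic Gaussian appearance model, and the fixed mixing weight `w_prior` standing in for the paper's uncertainty-based weighting are all illustrative assumptions.

```python
# Minimal sketch of an EM-style segmentation update that combines a per-pixel
# likelihood term (evidence) with a contiguity-favoring prior from neighboring
# pixel assignments. Not the paper's model: the Gaussian appearance model and
# the fixed weight w_prior are simplifying assumptions for illustration.

import numpy as np

def em_segment(image, k=3, n_iter=20, w_prior=0.5, rng=None):
    """image: (H, W) grayscale array in [0, 1]; returns (H, W) hard labels."""
    rng = np.random.default_rng(rng)
    h, w = image.shape
    x = image.reshape(-1, 1)                      # pixel features (intensity only)
    # Random soft initialization of pixel-to-segment responsibilities.
    r = rng.dirichlet(np.ones(k), size=h * w)

    for _ in range(n_iter):
        # M-step: per-segment mean and variance from current responsibilities.
        nk = r.sum(axis=0) + 1e-9
        mu = (r * x).sum(axis=0) / nk
        var = (r * (x - mu) ** 2).sum(axis=0) / nk + 1e-6

        # Evidence: log-likelihood of each pixel under each segment.
        log_like = -0.5 * ((x - mu) ** 2 / var + np.log(2 * np.pi * var))

        # Prior: average responsibilities of the 4-connected neighbors,
        # a simple stand-in for the contiguity prior described in the abstract.
        r_img = r.reshape(h, w, k)
        padded = np.pad(r_img, ((1, 1), (1, 1), (0, 0)), mode="edge")
        neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                 + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        log_prior = np.log(neigh.reshape(-1, k) + 1e-9)

        # E-step: combine evidence and prior; w_prior plays the role of the
        # relative-uncertainty weighting (assumed fixed here).
        logits = (1 - w_prior) * log_like + w_prior * log_prior
        logits -= logits.max(axis=1, keepdims=True)
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)

    return r.argmax(axis=1).reshape(h, w)

if __name__ == "__main__":
    # Toy example: two constant regions plus noise.
    img = np.zeros((32, 32)); img[:, 16:] = 1.0
    img += 0.1 * np.random.default_rng(0).standard_normal(img.shape)
    labels = em_segment(img, k=2)
    print(labels[0, :5], labels[0, -5:])
```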
URL
http://arxiv.org/abs/1806.00111