Abstract
We introduce a new method for category-level pose estimation that produces a distribution over predicted poses by integrating 3D shape estimates from a generative object model with segmentation information. Given an input depth image of an object, our variable-time method uses a mixture density network architecture to produce a multi-modal distribution over 3DOF poses; this distribution is then combined with a prior probability encouraging silhouette agreement between the observed input and the predicted object pose. Our approach significantly outperforms the current state of the art in category-level 3DOF pose estimation, which outputs only a point estimate and does not explicitly incorporate shape and segmentation information, as measured on the Pix3D and ShapeNet datasets.
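To make the mixture-density idea concrete, below is a minimal PyTorch sketch of an MDN head that maps a feature vector (e.g., from a depth-image encoder) to a K-component Gaussian mixture over a 3DOF pose. The class and function names, the Euler-angle-style 3-vector pose parameterization, the feature dimension, and the component count are all illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical MDN head: predicts mixture weights, means, and per-dimension
# scales for a mixture of diagonal Gaussians over a 3DOF pose vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseMDNHead(nn.Module):
    def __init__(self, feat_dim: int = 256, n_components: int = 5, pose_dim: int = 3):
        super().__init__()
        self.n_components = n_components
        self.pose_dim = pose_dim
        self.pi = nn.Linear(feat_dim, n_components)                    # mixture weights
        self.mu = nn.Linear(feat_dim, n_components * pose_dim)         # component means
        self.log_sigma = nn.Linear(feat_dim, n_components * pose_dim)  # per-dim log-scales

    def forward(self, feats: torch.Tensor):
        B = feats.shape[0]
        log_pi = F.log_softmax(self.pi(feats), dim=-1)
        mu = self.mu(feats).view(B, self.n_components, self.pose_dim)
        sigma = self.log_sigma(feats).view(B, self.n_components, self.pose_dim).exp()
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, target):
    """Negative log-likelihood of target poses under the predicted mixture,
    the standard MDN training loss."""
    target = target.unsqueeze(1)  # (B, 1, pose_dim), broadcast over components
    comp_log_prob = (-0.5 * (((target - mu) / sigma) ** 2)
                     - sigma.log()
                     - 0.5 * torch.log(torch.tensor(2.0 * torch.pi))).sum(-1)
    return -torch.logsumexp(log_pi + comp_log_prob, dim=-1).mean()

# Usage: feats would come from a depth-image encoder (not shown here).
feats = torch.randn(8, 256)
head = PoseMDNHead()
log_pi, mu, sigma = head(feats)
loss = mdn_nll(log_pi, mu, sigma, torch.randn(8, 3))
```

Under this reading of the abstract, the silhouette-agreement prior would enter at inference time as an additional log-prior term added to each pose hypothesis's mixture log-probability before reranking; how the paper actually fuses the two terms is specified in the full text.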
URL
http://arxiv.org/abs/1905.12079