Abstract
3D scene understanding from images is a challenging problem which is encountered in robotics, augmented reality and autonomous driving scenarios. In this paper, we propose a novel approach to jointly infer the 3D rigid-body poses and shapes of vehicles from stereo images of road scenes. Unlike previous work that relies on geometric alignment of shapes with dense stereo reconstructions, our approach works directly on images and infers shape and pose efficiently through combined photometric and silhouette alignment of 3D shape priors with a stereo image. We use a shape prior that represents cars in a low-dimensional linear embedding of volumetric signed distance functions. For efficiently measuring the consistency with both alignment terms, we propose an adaptive sparse point selection scheme. In experiments, we demonstrate superior performance of our method in pose estimation and shape reconstruction over a state-of-the-art approach that uses geometric alignment with dense stereo reconstructions. Our approach can also boost the performance of deep-learning based approaches to 3D object detection as a refinement method. We demonstrate that our method significantly improves accuracy for several recent detection approaches.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1904.10097