Abstract
Structured representations such as scene graphs serve as an efficient and compact representation that can be used for downstream rendering or retrieval tasks. However, existing efforts to generate realistic images from scene graphs perform poorly on scene composition for cluttered or complex scenes. We propose two contributions to improve the scene composition. First, we enhance the scene graph representation with heuristic-based relations, which add minimal storage overhead. Second, we use extreme points representation to supervise the learning of the scene composition network. These methods achieve significantly higher performance over existing work (69.0% vs 51.2% in relation score metric). We additionally demonstrate how scene graphs can be used to retrieve pose-constrained image patches that are semantically similar to the source query. Improving structured scene graph representations for rendering or retrieval is an important step towards realistic image generation.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1904.09348