An Interpretable Model for Scene Graph Generation

2018-11-21

Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

arXiv_CV

Abstract

We propose an efficient and interpretable scene graph generator. We consider three types of features (visual, spatial, and semantic) and use a late fusion strategy so that each feature's contribution can be investigated explicitly. We study which factors of these features have the most impact on performance, visualize the learned visual features for relationships, and investigate the efficacy of our model. We won first place in the OpenImages Visual Relationship Detection Challenge on Kaggle, outperforming the 2nd-place entry by 5% (20% relative). We believe an accurate scene graph generator is a fundamental stepping stone for higher-level vision-language tasks such as image captioning and visual QA, since it provides a semantic, structured comprehension of an image that goes beyond pixels and objects.
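
The late fusion idea in the abstract can be illustrated with a minimal sketch. It assumes each of the three feature branches produces its own predicate scores and that fusion is a plain sum of those scores followed by a softmax; the abstract does not state the fusion rule, so the summation, the predicate names, the score values, and the helper names `late_fusion` and `ablate` are all hypothetical and not taken from the paper.

```python
import math

# Hypothetical predicate label set, for illustration only.
PREDICATES = ["on", "holds", "under"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(branch_logits):
    """Fuse per-branch predicate scores.

    Assumption: fusion is an element-wise sum of branch scores followed by
    a softmax. The source only says "late fusion"; the exact rule is not given.
    """
    branches = list(branch_logits.values())
    n = len(branches[0])
    fused = [sum(branch[i] for branch in branches) for i in range(n)]
    return softmax(fused)


def ablate(branch_logits, dropped):
    """Re-run fusion without one branch to inspect that branch's contribution."""
    kept = {name: v for name, v in branch_logits.items() if name != dropped}
    return late_fusion(kept)


# Made-up scores for one subject-object pair (not real model outputs).
logits = {
    "visual": [2.0, 0.5, -1.0],
    "spatial": [0.5, 0.0, 0.2],
    "semantic": [1.0, 1.5, -0.5],
}

full = late_fusion(logits)
no_semantic = ablate(logits, "semantic")

for name, p_full, p_abl in zip(PREDICATES, full, no_semantic):
    print(f"{name}: all branches={p_full:.3f}, without semantic={p_abl:.3f}")
```

Because the branches stay separate until the final step, dropping one and comparing the resulting distributions gives a direct, if coarse, view of what that branch contributes to each predicate.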

URL

https://arxiv.org/abs/1811.09543

PDF

https://arxiv.org/pdf/1811.09543