Unpaired Image Captioning via Scene Graph Alignments

2019-03-26

Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang

arXiv_CV

Abstract

Deep neural networks have achieved great success on the image captioning task. However, most of the existing models depend heavily on paired image-sentence datasets, which are very expensive to acquire in most real-world scenarios. In this paper, we propose a scene graph based approach for unpaired image captioning. Our method merely requires an image set, a sentence corpus, an image scene graph generator, and a sentence scene graph generator. The sentence corpus is used to teach the decoder how to generate meaningful sentences from a scene graph. To further encourage the generated captions to be semantically consistent with the image, we employ adversarial learning to align the visual scene graph to the textual scene graph. Experimental results show that our proposed model can generate quite promising results without using any image-caption training pairs, outperforming existing methods by a wide margin.
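As a rough illustration of the data the abstract refers to, a scene graph is commonly represented as a set of objects, their attributes, and pairwise relations. The sketch below is a minimal, hypothetical Python representation. The class name `SceneGraph`, its fields, and the `has_attribute` label are assumptions made for illustration and are not taken from the paper. It shows only how an image-derived graph and a sentence-derived graph could share one format. The decoder and the adversarial alignment described in the abstract are not implemented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class SceneGraph:
    """Hypothetical container for a scene graph (illustrative, not the paper's code)."""
    objects: List[str] = field(default_factory=list)
    attributes: Dict[str, List[str]] = field(default_factory=dict)
    relations: List[Triple] = field(default_factory=list)

    def triples(self) -> List[Triple]:
        """Flatten the graph into (subject, predicate, object) triples.

        Attribute triples come first, in dictionary insertion order,
        followed by the relation triples in list order.
        """
        out: List[Triple] = []
        for obj, attrs in self.attributes.items():
            for attr in attrs:
                # "has_attribute" is an assumed label chosen for this sketch.
                out.append((obj, "has_attribute", attr))
        out.extend(self.relations)
        return out


# A graph that an image scene graph generator might produce (made-up example).
image_graph = SceneGraph(
    objects=["man", "horse"],
    attributes={"horse": ["brown"]},
    relations=[("man", "riding", "horse")],
)

# A graph that a sentence scene graph generator might produce for
# "a man rides a brown horse" (made-up example).
sentence_graph = SceneGraph(
    objects=["man", "horse"],
    attributes={"horse": ["brown"]},
    relations=[("man", "ride", "horse")],
)

# Triples present in both graphs; the relation predicates differ
# ("riding" vs. "ride"), so only the attribute triple is shared.
shared = set(image_graph.triples()) & set(sentence_graph.triples())
```

In this toy case the two graphs describe the same scene but do not match exactly, which hints at why the abstract proposes learning an alignment between the visual and textual scene graphs rather than comparing them directly.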

URL

http://arxiv.org/abs/1903.10658

PDF

http://arxiv.org/pdf/1903.10658
