Joint Learning of Distributed Representations for Images and Texts

2015-04-28
Xiaodong He, Rupesh Srivastava, Jianfeng Gao, Li Deng

Abstract

This technical report provides extra details of the deep multimodal similarity model (DMSM), which was proposed in (Fang et al. 2015, arXiv:1411.4952). The model is trained by maximizing the global semantic similarity between images and their natural-language captions, using the public Microsoft COCO database, which consists of a large set of images and their corresponding captions. The learned representations attempt to capture the combination of various visual concepts and cues.
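The abstract says only that training "maximizes global semantic similarity" between an image and its caption. The sketch below shows one common way to turn that idea into a loss, assuming a DSSM-style setup: cosine similarity between an image vector and each candidate caption vector, a softmax over the gamma-scaled scores, and the negative log-probability of the matching caption. The plain lists stand in for the outputs of the image and text networks, and the names `caption_posterior`, `matching_loss`, and the default `gamma=10.0` are hypothetical illustrations, not the report's exact formulation.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def caption_posterior(
    image_vec: Sequence[float],
    caption_vecs: Sequence[Sequence[float]],
    gamma: float = 10.0,  # hypothetical smoothing factor
) -> List[float]:
    """Softmax over gamma-scaled similarities between one image and candidate captions.

    Assumption: caption_vecs[0] is the matching caption, the rest are negatives.
    """
    scores = [gamma * cosine_similarity(image_vec, c) for c in caption_vecs]
    top = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matching_loss(
    image_vec: Sequence[float],
    caption_vecs: Sequence[Sequence[float]],
    gamma: float = 10.0,
) -> float:
    """Negative log-probability of the matching caption (index 0)."""
    return -math.log(caption_posterior(image_vec, caption_vecs, gamma)[0])


if __name__ == "__main__":
    # Toy vectors standing in for network outputs (purely illustrative).
    image = [0.9, 0.1, 0.0]
    captions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(caption_posterior(image, captions))
    print(matching_loss(image, captions))
```

Minimizing this loss pushes the image vector toward its own caption and away from the negatives; here the image is already close to the first caption, so the loss is near zero.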

URL

https://arxiv.org/abs/1504.03083

PDF

https://arxiv.org/e-print/1504.03083