papers AI Learner

Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning

2016-07-01
Janarthanan Rajendran, Mitesh M. Khapra, Sarath Chandar, Balaraman Ravindran

Abstract

Recently there has been a lot of interest in learning common representations for multiple views of data. Typically, such common representations are learned using a parallel corpus between the two views (say, 1M images and their English captions). In this work, we address a real-world scenario where no direct parallel data is available between the two views of interest (say, V1 and V2), but parallel data is available between each of these views and a pivot view (V3). We propose a model for learning a common representation for V1, V2 and V3 using only the parallel data available between V1–V3 and V2–V3. The proposed model is generic and works even when there are n views of interest and only one pivot view acting as a bridge between them. We focus on two specific downstream applications: (i) transfer learning between languages L1, L2, …, Ln using a pivot language L, and (ii) cross-modal access between images and a language L1 using a pivot language L2. Our model achieves state-of-the-art performance in multilingual document classification on the publicly available multilingual TED corpus and promising results in multilingual multimodal retrieval on a new dataset created and released as a part of this work.
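The core idea, learning view-specific encoders whose hidden representations are tied together only through correlation with the shared pivot view, can be sketched in a few lines of numpy. This is a toy illustration, not the paper's implementation: the linear-tanh encoders, the per-dimension Pearson correlation loss, and all names and dimensions are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(X, W):
    # Linear encoder followed by tanh; one such encoder per view.
    return np.tanh(X @ W)

def correlation_loss(H1, H2):
    # Negative sum of per-dimension Pearson correlations between two
    # views' hidden representations (minimizing this maximizes correlation).
    H1c = H1 - H1.mean(axis=0)
    H2c = H2 - H2.mean(axis=0)
    num = (H1c * H2c).sum(axis=0)
    den = np.sqrt((H1c ** 2).sum(axis=0) * (H2c ** 2).sum(axis=0)) + 1e-8
    return -np.sum(num / den)

# Illustrative dimensions: d1, d2 for the views of interest,
# dp for the pivot view, k for the shared representation.
d1, d2, dp, k = 5, 4, 6, 3
W1 = rng.normal(size=(d1, k))   # encoder for V1
W2 = rng.normal(size=(d2, k))   # encoder for V2
Wp = rng.normal(size=(dp, k))   # encoder for the pivot V3

# Parallel data exists only between (V1, V3) and (V2, V3);
# V1 and V2 are never paired directly.
X1, P13 = rng.normal(size=(8, d1)), rng.normal(size=(8, dp))
X2, P23 = rng.normal(size=(8, d2)), rng.normal(size=(8, dp))

# Total objective: each view of interest is correlated with the pivot,
# so V1 and V2 end up aligned with each other transitively.
loss = correlation_loss(encode(X1, W1), encode(P13, Wp)) \
     + correlation_loss(encode(X2, W2), encode(P23, Wp))
```

In training, this loss would be minimized by gradient descent over W1, W2 and Wp; because both pairwise terms share the pivot encoder Wp, the representations of V1 and V2 become comparable without any direct V1–V2 parallel data.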


URL

https://arxiv.org/abs/1510.03519

PDF

https://arxiv.org/pdf/1510.03519

