MultiDEC: Multi-Modal Clustering of Image-Caption Pairs

Abstract

In this paper, we propose a method for clustering image-caption pairs by simultaneously learning image representations and text representations that are constrained to exhibit similar distributions. These image-caption pairs arise frequently in high-value applications where structured training data is expensive to produce but free-text descriptions are common. MultiDEC initializes parameters with stacked autoencoders, then iteratively minimizes the Kullback-Leibler divergence between the distribution of the images (and of the text) and a combined joint target distribution. We regularize by penalizing non-uniform distributions across clusters. The representations that minimize this objective produce clusters that outperform both single-view and multi-view techniques on large benchmark image-caption datasets.
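
The objective described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the Student's t soft assignment and the squared-and-normalized target are borrowed from the original DEC method, while the averaging of the two per-view targets into a joint target, the KL-to-uniform regularizer, and the names `soft_assign`, `sharpen`, `multiview_loss`, and `gamma` are all assumptions made for illustration. Training the encoders by gradient descent is omitted.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft cluster assignments via a Student's t kernel (as in DEC)."""
    # Squared Euclidean distance from every embedding to every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def sharpen(q):
    """DEC-style target: square q, divide by soft cluster size, renormalize."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q), summed over all entries."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def multiview_loss(z_img, z_txt, mu_img, mu_txt, gamma=0.1):
    """Sum of per-view KL terms against a joint target, plus a uniformity penalty."""
    q_img = soft_assign(z_img, mu_img)
    q_txt = soft_assign(z_txt, mu_txt)
    # ASSUMPTION: the joint target is the mean of the two sharpened views.
    p = 0.5 * (sharpen(q_img) + sharpen(q_txt))
    # ASSUMPTION: penalize non-uniform cluster usage with KL(f || uniform),
    # where f is the average target mass assigned to each cluster.
    f = p.mean(axis=0)
    uniform = np.full_like(f, 1.0 / f.size)
    return kl(p, q_img) + kl(p, q_txt) + gamma * kl(f, uniform)

# Toy usage: 20 image-caption pairs, 4-dim embeddings per view, 3 clusters.
rng = np.random.default_rng(0)
z_img, z_txt = rng.normal(size=(20, 4)), rng.normal(size=(20, 4))
mu_img, mu_txt = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
loss = multiview_loss(z_img, z_txt, mu_img, mu_txt)
```

In a full pipeline the embeddings would come from the pretrained autoencoders, and the loss would be minimized with respect to both the encoder weights and the centroids.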

URL

https://arxiv.org/abs/1901.01860

PDF

https://arxiv.org/pdf/1901.01860
