Mining Associated Text and Images with Dual-Wing Harmoniums

2012-07-04
Eric P. Xing, Rong Yan, Alexander G. Hauptmann

Abstract

We propose a multi-wing harmonium model for mining multimedia data that extends and improves on earlier models based on two-layer random fields, which capture bidirectional dependencies between hidden topic aspects and observed inputs. This model can be viewed as an undirected counterpart of two-layer directed models such as LDA for similar tasks, but it differs significantly in inference/learning cost tradeoffs, latent topic representations, and topic mixing mechanisms. In particular, our model facilitates efficient inference and robust topic mixing, and potentially provides high flexibility in modeling the latent topic spaces. A contrastive divergence algorithm and a variational algorithm are derived for learning. We specialize our model to a dual-wing harmonium for captioned images, incorporating a multivariate Poisson for word counts and a multivariate Gaussian for color histograms. We present empirical results on applications of this model to classification, retrieval, and image annotation on news video collections, and we report an extensive comparison with various existing models.
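
To make the structure described in the abstract more concrete, here is a minimal sketch of what one block-Gibbs step and one contrastive-divergence (CD-1) update might look like for a toy dual-wing harmonium. The abstract specifies only a Poisson wing for word counts and a Gaussian wing for color histograms. Everything else here is an assumption made for illustration and is not the authors' implementation: the unit-variance Gaussian hidden units, the parameter names `W`, `U`, `a`, `b`, the learning rate, and the clamp on the Poisson log-rate.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates in this toy."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def hidden_mean(words, colors, W, U):
    """Mean of each hidden topic unit given both observed wings.

    Assumption: hidden units are Gaussian with unit variance, so the
    conditional mean is a weighted sum of word counts and color features.
    """
    n_topics = len(W[0])
    return [
        sum(W[i][k] * words[i] for i in range(len(words)))
        + sum(U[j][k] * colors[j] for j in range(len(colors)))
        for k in range(n_topics)
    ]


def gibbs_step(words, colors, W, U, a, b, rng):
    """Sample hidden topics from the inputs, then resample both wings."""
    h = [rng.gauss(m, 1.0) for m in hidden_mean(words, colors, W, U)]
    new_words = []
    for i in range(len(words)):
        log_rate = a[i] + sum(W[i][k] * h[k] for k in range(len(h)))
        # Clamp is a numerical safeguard for this toy only (assumption).
        new_words.append(sample_poisson(math.exp(min(log_rate, 3.0)), rng))
    new_colors = [
        rng.gauss(b[j] + sum(U[j][k] * h[k] for k in range(len(h))), 1.0)
        for j in range(len(colors))
    ]
    return h, new_words, new_colors


def cd1_update(words, colors, W, U, a, b, rng, lr=0.01):
    """One CD-1 step: data statistics minus one-step reconstruction statistics."""
    h0 = hidden_mean(words, colors, W, U)
    _, w1, c1 = gibbs_step(words, colors, W, U, a, b, rng)
    h1 = hidden_mean(w1, c1, W, U)
    for i in range(len(words)):
        a[i] += lr * (words[i] - w1[i])
        for k in range(len(h0)):
            W[i][k] += lr * (words[i] * h0[k] - w1[i] * h1[k])
    for j in range(len(colors)):
        b[j] += lr * (colors[j] - c1[j])
        for k in range(len(h0)):
            U[j][k] += lr * (colors[j] * h0[k] - c1[j] * h1[k])


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[0.1, 0.0], [0.0, 0.2]]  # word-to-topic weights (hypothetical values)
    U = [[0.5, 0.0]]              # color-to-topic weights (hypothetical values)
    a, b = [0.0, 0.0], [0.0]      # wing biases
    cd1_update([2, 3], [1.0], W, U, a, b, rng)
    print(W, U)
```

The point of the sketch is the symmetry the abstract alludes to: both wings feed a shared set of hidden topic units, and each wing is resampled from those units using its own distribution family, so inference in either direction is a single pass.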

URL

https://arxiv.org/abs/1207.1423

PDF

https://arxiv.org/pdf/1207.1423

