BreakingNews: Article Annotation by Image and Text Processing

2016-03-23
Arnau Ramisa, Fei Yan, Francesc Moreno-Noguer, Krystian Mikolajczyk

Abstract

Building on recent deep neural network architectures, current approaches at the intersection of computer vision and natural language processing have achieved unprecedented breakthroughs in tasks such as automatic captioning and image retrieval. Most of these learning methods, though, rely on large training sets of images associated with human annotations that specifically describe the visual content. In this paper we propose to go a step further and explore the more complex cases where textual descriptions are only loosely related to the images. We focus on the particular domain of news articles, in which the textual content often expresses connotative and ambiguous relations that are only suggested by, but cannot be directly inferred from, the images. We introduce new deep learning methods that address source detection, popularity prediction, article illustration and geolocation of articles. We propose an adaptive CNN architecture that shares most of its structure across all the tasks and is suitable for multitask and transfer learning. Deep Canonical Correlation Analysis is deployed for article illustration, and a new loss function based on Great Circle Distance is proposed for geolocation. Furthermore, we present BreakingNews, a novel dataset with approximately 100K news articles including images, text and captions, enriched with heterogeneous metadata (such as GPS coordinates and popularity metrics). We show that this dataset is appropriate for exploring all of the aforementioned problems, for which we provide baseline performance using various deep learning architectures and different representations of the textual and visual features. We report very promising results and bring to light several limitations of the current state of the art in this domain, which we hope will help spur progress in the field.
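The abstract mentions a geolocation loss based on Great Circle Distance but does not give its formulation. As a rough illustration only, the sketch below computes the great-circle distance with the standard haversine formula and averages it over a batch. The function names, the kilometre units, the Earth radius constant and the choice of a plain mean are all assumptions made here, not the authors' definition.

```python
import math

# Mean Earth radius in kilometres (assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0


def great_circle_distance_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two (latitude, longitude) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to [0, 1] to guard against floating-point rounding before asin.
    a = min(1.0, max(0.0, a))
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_great_circle_loss(predicted, target):
    """Average great-circle distance over paired (lat, lon) predictions and targets.

    Hypothetical helper: the paper's actual loss may be weighted or
    normalised differently.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not predicted:
        raise ValueError("at least one coordinate pair is required")
    total = sum(
        great_circle_distance_km(p[0], p[1], t[0], t[1])
        for p, t in zip(predicted, target)
    )
    return total / len(predicted)
```

Unlike a squared error on raw latitude and longitude, a distance of this kind respects the wrap-around at the ±180° meridian, so two points on either side of it count as close rather than far apart.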

URL

https://arxiv.org/abs/1603.07141

PDF

https://arxiv.org/pdf/1603.07141

