
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

2015-06-22
Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler

Abstract

Books are a rich source of both fine-grained information (what a character, an object or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story). This paper aims to align books with their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets. To align movies and books we exploit a neural sentence embedding trained in an unsupervised way on a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for.
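
As a rough illustration of the clip-to-sentence matching step the abstract describes, the sketch below computes cosine similarities between movie-clip and book-sentence vectors that are assumed to already live in a shared embedding space (for example, skip-thought-style sentence vectors and projected clip features). The function name `cosine_similarity_matrix` and the random toy embeddings are hypothetical and not the authors' code; the paper's full pipeline additionally applies a context-aware CNN on top of such similarity scores.

```python
import numpy as np

def cosine_similarity_matrix(clip_embs, sent_embs):
    """Pairwise cosine similarity between movie-clip and book-sentence embeddings.

    clip_embs: array of shape (num_clips, dim)
    sent_embs: array of shape (num_sentences, dim)
    Returns an array of shape (num_clips, num_sentences).
    """
    clips = clip_embs / np.linalg.norm(clip_embs, axis=1, keepdims=True)
    sents = sent_embs / np.linalg.norm(sent_embs, axis=1, keepdims=True)
    return clips @ sents.T

# Toy example: 4 clips and 6 sentences embedded in a shared 300-d space.
# In practice these vectors would come from the trained video-text embedding.
rng = np.random.default_rng(0)
clip_embs = rng.standard_normal((4, 300))
sent_embs = rng.standard_normal((6, 300))

sim = cosine_similarity_matrix(clip_embs, sent_embs)
# For each clip, pick the best-matching sentence under raw cosine similarity.
best_sentence = sim.argmax(axis=1)
print(best_sentence)
```

In the paper, these per-pair similarities are not used in isolation: neighboring clips and sentences provide context, which is what the proposed context-aware CNN exploits to smooth and refine the alignment.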


URL

https://arxiv.org/abs/1506.06724

PDF

https://arxiv.org/pdf/1506.06724

