
Describing Multimedia Content using Attention-based Encoder-Decoder Networks

2015-07-04
Kyunghyun Cho, Aaron Courville, Yoshua Bengio

Abstract

Whereas deep neural networks were first mostly used for classification tasks, they are rapidly expanding in the realm of structured output problems, where the observed target is composed of multiple random variables that have a rich joint distribution, given the input. We focus in this paper on the case where the input also has a rich structure and the input and output structures are somehow related. We describe systems that learn to attend to different places in the input, for each element of the output, for a variety of tasks: machine translation, image caption generation, video clip description and speech recognition. All these systems are based on a shared set of building blocks: gated recurrent neural networks and convolutional neural networks, along with trained attention mechanisms. We report on experimental results with these systems, showing impressively good performance and the advantage of the attention mechanism.
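The "trained attention mechanism" the abstract refers to can be illustrated with a short sketch. The snippet below is a minimal NumPy implementation of additive (Bahdanau-style) attention: for one decoder step, each encoder annotation is scored against the decoder state, the scores are normalized with a softmax, and the result is a weighted context vector. All variable names, weight matrices, and dimensions here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Score each encoder annotation h_i against the current decoder
    state s_t (additive attention), then return the attention weights
    and the weighted context vector. Shapes are illustrative."""
    # score_i = v . tanh(W_dec @ s_t + W_enc @ h_i)
    scores = np.array([
        v @ np.tanh(W_dec @ decoder_state + W_enc @ h)
        for h in encoder_states
    ])
    weights = softmax(scores)           # one weight per input position
    context = weights @ encoder_states  # convex combination of annotations
    return weights, context

# Toy dimensions: 5 input positions, hidden size 4, attention size 3.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 4))   # encoder annotations h_1..h_5
dec = rng.standard_normal(4)        # current decoder state s_t
W_d = rng.standard_normal((3, 4))   # projects the decoder state
W_e = rng.standard_normal((3, 4))   # projects each encoder annotation
v = rng.standard_normal(3)          # scoring vector

w, c = additive_attention(dec, enc, W_d, W_e, v)
```

Because the weights form a distribution over input positions, the decoder can "attend to different places in the input" at each output step, which is the shared mechanism behind the translation, captioning, video-description, and speech-recognition systems the paper surveys.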

URL

https://arxiv.org/abs/1507.01053

PDF

https://arxiv.org/pdf/1507.01053

