Multimodal Attention for Neural Machine Translation

Abstract

The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce richer source representations than the fixed-length encodings of sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focuses on an image and its natural language description in order to generate a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves gains of up to 1.6 points in BLEU and METEOR over a textual NMT baseline.
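
The idea of a dedicated attention for each modality can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how attention scores are computed or how the two contexts are combined, so the dot-product scoring, the fusion by concatenation, the shared vector size, and all names (`attend`, `multimodal_context`, `decoder_state`, `source_words`, `image_regions`) are assumptions made for illustration, and the inputs are random placeholders rather than real encoder outputs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, annotations):
    """Dot-product attention (assumed scoring function, chosen for brevity).

    Scores each annotation vector against the query, normalizes the scores,
    and returns the weighted sum of annotations plus the weights themselves.
    """
    scores = [sum(q * a for q, a in zip(query, ann)) for ann in annotations]
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * ann[i] for w, ann in zip(weights, annotations))
               for i in range(dim)]
    return context, weights


def multimodal_context(query, text_annotations, image_annotations):
    """Run one attention per modality, then fuse by concatenation (assumed)."""
    text_ctx, text_w = attend(query, text_annotations)
    image_ctx, image_w = attend(query, image_annotations)
    return text_ctx + image_ctx, text_w, image_w


# Random placeholder inputs; a real system would use encoder and CNN features.
rng = random.Random(0)
dim = 4
decoder_state = [rng.uniform(-1, 1) for _ in range(dim)]
source_words = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
image_regions = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(9)]

context, text_w, image_w = multimodal_context(
    decoder_state, source_words, image_regions
)
print(len(context), round(sum(text_w), 6), round(sum(image_w), 6))
```

In this sketch each modality gets its own weight distribution, so the decoder can focus sharply on one source word while spreading its attention over several image regions, or the reverse. A single attention shared across both modalities would force the two sets of annotations to compete within one distribution.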

URL

https://arxiv.org/abs/1609.03976

PDF

https://arxiv.org/pdf/1609.03976
