
Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition

2019-01-22
Julian Salazar, Katrin Kirchhoff, Zhiheng Huang

Abstract

Self-attention has demonstrated great success in sequence-to-sequence tasks in natural language processing, with preliminary work applying it to end-to-end encoder-decoder approaches in speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free strategy for monotonic sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for CTC, and show it is tractable and competitive for speech recognition. On the Wall Street Journal and LibriSpeech datasets, SAN-CTC trains quickly and outperforms existing CTC models and most encoder-decoder models, attaining 4.7% CER in 1 day and 2.8% CER in 1 week, respectively, using the same architecture and one GPU. We motivate the architecture for speech, evaluate position and downsampling approaches, and explore how the label alphabet affects attention heads and performance.
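The idea the abstract describes, a stack of self-attention layers over acoustic features with a linear output layer trained end-to-end under the CTC loss, can be sketched compactly. The PyTorch snippet below is a minimal illustration under assumed hyperparameters (model dimension, head and layer counts, an arbitrary 32-symbol label alphabet with blank index 32); it is not the authors' exact configuration, and it omits the positional-encoding and downsampling choices the paper evaluates.

```python
import torch
import torch.nn as nn

class SelfAttentionCTC(nn.Module):
    # Minimal sketch: Transformer encoder layers over acoustic features,
    # with a linear layer producing per-frame log-probabilities for CTC.
    # All hyperparameters here are illustrative, not the paper's.
    def __init__(self, n_feats=80, d_model=256, n_heads=4,
                 n_layers=6, n_labels=32):
        super().__init__()
        self.proj = nn.Linear(n_feats, d_model)  # project features to model dim
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=1024, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, n_labels + 1)  # +1 for the CTC blank

    def forward(self, feats):  # feats: (batch, time, n_feats)
        # Positional information is omitted here for brevity; the paper
        # evaluates several positional approaches.
        h = self.encoder(self.proj(feats))
        return self.out(h).log_softmax(dim=-1)  # (batch, time, n_labels + 1)

model = SelfAttentionCTC()
ctc_loss = nn.CTCLoss(blank=32)  # blank index matches the extra output class
feats = torch.randn(2, 100, 80)  # dummy batch: 2 utterances, 100 frames
targets = torch.randint(0, 32, (2, 20))  # dummy label sequences
log_probs = model(feats).transpose(0, 1)  # CTCLoss expects (time, batch, classes)
loss = ctc_loss(log_probs, targets,
                input_lengths=torch.full((2,), 100),
                target_lengths=torch.full((2,), 20))
loss.backward()
```

At inference time, greedy decoding takes the per-frame argmax and collapses repeated symbols and blanks, which is what makes CTC alignment-free; beam search or the multitask and decoding frameworks the abstract mentions can replace this step.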

URL

http://arxiv.org/abs/1901.10055

PDF

http://arxiv.org/pdf/1901.10055

