
Audio-Linguistic Embeddings for Spoken Sentences

2019-02-20
Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei

Abstract

We propose spoken sentence embeddings which capture both acoustic and linguistic content. While existing works operate at the character, phoneme, or word level, our method learns long-term dependencies by modeling speech at the sentence level. Formulated as an audio-linguistic multitask learning problem, our encoder-decoder model simultaneously reconstructs acoustic and natural language features from audio. Our results show that spoken sentence embeddings outperform phoneme and word-level baselines on speech recognition and emotion recognition tasks. Ablation studies show that our embeddings can better model high-level acoustic concepts while retaining linguistic content. Overall, our work illustrates the viability of generic, multi-modal sentence embeddings for spoken language understanding.
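The abstract describes an encoder-decoder model trained with a multitask objective: one shared encoder produces a sentence-level embedding of the audio, and decoder heads reconstruct acoustic features while also predicting linguistic content. The paper's actual architecture and feature choices are not given here, so the following is only a minimal PyTorch sketch of that general idea; the mel-spectrogram input, GRU encoder, and bag-of-words linguistic target are all illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

class MultitaskSentenceEncoder(nn.Module):
    """Encode an utterance into one sentence embedding, then decode it
    into both acoustic and linguistic targets (illustrative sketch)."""

    def __init__(self, n_mels=80, embed_dim=256, vocab_size=5000):
        super().__init__()
        # Shared encoder: a recurrent net over acoustic frames.
        self.encoder = nn.GRU(n_mels, embed_dim, batch_first=True)
        # Head 1: reconstruct the acoustic features frame by frame.
        self.acoustic_head = nn.Linear(embed_dim, n_mels)
        # Head 2: predict linguistic content; here, a bag-of-words
        # distribution over the transcript vocabulary (an assumption).
        self.linguistic_head = nn.Linear(embed_dim, vocab_size)

    def forward(self, mel_frames):
        # mel_frames: (batch, time, n_mels)
        hidden_states, last_hidden = self.encoder(mel_frames)
        sentence_embedding = last_hidden[-1]                 # (batch, embed_dim)
        acoustic_recon = self.acoustic_head(hidden_states)   # (batch, time, n_mels)
        word_logits = self.linguistic_head(sentence_embedding)
        return sentence_embedding, acoustic_recon, word_logits

model = MultitaskSentenceEncoder()
mels = torch.randn(4, 300, 80)   # 4 utterances, 300 frames, 80 mel bins

emb, recon, logits = model(mels)

# Multitask loss: acoustic reconstruction plus linguistic prediction.
bow_targets = torch.randint(0, 2, (4, 5000)).float()  # dummy targets
loss = nn.functional.mse_loss(recon, mels) \
     + nn.functional.binary_cross_entropy_with_logits(logits, bow_targets)
```

Training against both losses forces the shared embedding to retain high-level acoustic information (useful for emotion recognition) alongside linguistic content (useful for speech recognition), which is the multitask trade-off the abstract highlights.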

URL

http://arxiv.org/abs/1902.07817

PDF

http://arxiv.org/pdf/1902.07817
