Attention Based Fully Convolutional Network for Speech Emotion Recognition

2019-05-02
Yuanyuan Zhang, Jun Du, Zirui Wang, Jianshu Zhang

Abstract

Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which makes it hard to distinguish; 2) in general, human emotion can only be detected at specific moments during a long utterance; 3) speech data with emotion labels is usually limited. In this paper, we present a novel attention-based fully convolutional network for speech emotion recognition. We employ a fully convolutional network because it can handle variable-length speech without requiring segmentation, so that critical information is not lost. The proposed attention mechanism makes our model aware of which time-frequency regions of the speech spectrogram are more emotion-relevant. Given the limited data, transfer learning is also adopted to improve accuracy. Notably, an obvious improvement is obtained with a model pre-trained on natural scene images. Validated on the publicly available IEMOCAP corpus, the proposed model outperforms state-of-the-art methods, with a weighted accuracy of 70.4% and an unweighted accuracy of 63.9%.
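
To make the idea of attention over time-frequency regions more concrete, the following is a minimal sketch of one common form of attention pooling. It is an illustration under stated assumptions, not the authors' implementation: the function name `attention_pool`, the parameters `W`, `b`, `u`, the dimensions, and the random inputs are all hypothetical. Each location of a convolutional feature map is treated as a feature vector, a score is computed per location, the scores are normalized with a softmax, and the weighted sum gives a fixed-length vector regardless of how many locations (that is, how long the utterance) there are.

```python
import math
import random


def attention_pool(features, W, b, u):
    """Pool a variable number of feature vectors into one fixed-length vector.

    features: list of L vectors (each of length C), one per time-frequency
              location of a feature map (hypothetical input layout).
    W, b, u:  hypothetical attention parameters (W is H x C, b and u have length H).
    Returns (context vector of length C, attention weights of length L).
    """
    scores = []
    for a in features:
        # Hidden representation of this location: tanh(W a + b)
        hidden = [
            math.tanh(sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i)
            for row, b_i in zip(W, b)
        ]
        # Scalar relevance score for this location: u . hidden
        scores.append(sum(u_i * h_i for u_i, h_i in zip(u, hidden)))

    # Numerically stable softmax over all locations
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the feature vectors
    num_channels = len(features[0])
    context = [
        sum(w * a[c] for w, a in zip(weights, features))
        for c in range(num_channels)
    ]
    return context, weights


# Illustrative values only: C feature channels, H hidden attention units.
rng = random.Random(0)
C, H = 4, 3
W = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(H)]
b = [0.0] * H
u = [rng.uniform(-1, 1) for _ in range(H)]

# Two "utterances" whose feature maps have different numbers of locations.
short_map = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(6)]
long_map = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(15)]

ctx_short, w_short = attention_pool(short_map, W, b, u)
ctx_long, w_long = attention_pool(long_map, W, b, u)

# Both context vectors have length C, whatever the input length.
print(len(ctx_short), len(ctx_long))
```

Because the pooled vector always has length `C`, a classifier placed after it does not need fixed-length input, which is consistent with the abstract's point that segmentation is unnecessary. The attention weights can also be inspected to see which locations contributed most to the pooled vector.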

URL

http://arxiv.org/abs/1806.01506

PDF

http://arxiv.org/pdf/1806.01506

