Abstract
Attention mechanisms have been widely applied to various sound-related tasks. In this work, we propose a Multi-Scale Time-Frequency Attention (MTFA) module for sound event detection. By generating an attention heatmap, MTFA enables the model to focus on discriminative components of the spectrogram along both the time and frequency axes. In addition, gathering information at multiple scales helps the model adapt better to the characteristics of different categories of target events. The proposed method is evaluated on Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge. To the best of our knowledge, our method outperforms all previous methods that do not use model ensembling on the development dataset, and achieves state-of-the-art performance on the evaluation dataset, reducing the error rate from 0.13 to 0.09. This demonstrates the effectiveness of MTFA in retrieving discriminative representations for sound event detection.
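The idea of a multi-scale attention heatmap over a spectrogram can be illustrated with a small NumPy sketch. This is not the MTFA architecture itself, which the abstract does not specify: the sketch has no learned parameters, and the function names, the pooling scales, and the use of average pooling with a sigmoid are all assumptions made for illustration. It only shows how information gathered at several time-frequency resolutions can be merged into one heatmap with values in (0, 1) that reweights every time-frequency bin.

```python
import numpy as np

def avg_pool2d(x, k):
    """Average-pool a (freq, time) map with a k x k window, cropping any remainder."""
    f2, t2 = x.shape[0] // k, x.shape[1] // k
    return x[:f2 * k, :t2 * k].reshape(f2, k, t2, k).mean(axis=(1, 3))

def upsample_nearest(x, shape):
    """Nearest-neighbour upsampling of a 2-D map back to a target (freq, time) shape."""
    fi = np.arange(shape[0]) * x.shape[0] // shape[0]
    ti = np.arange(shape[1]) * x.shape[1] // shape[1]
    return x[np.ix_(fi, ti)]

def multi_scale_tf_attention(spec, scales=(1, 2, 4)):
    """Hypothetical, parameter-free stand-in for a multi-scale attention module.

    spec   : 2-D array of shape (freq_bins, time_frames)
    scales : pooling window sizes (assumed values, not taken from the paper)
    Returns the heatmap and the reweighted spectrogram.
    """
    # Summarise the spectrogram at each scale, then bring every map back
    # to the original resolution so they can be combined bin by bin.
    maps = [upsample_nearest(avg_pool2d(spec, k), spec.shape) for k in scales]
    logits = np.mean(maps, axis=0)
    # Standardise before the sigmoid so the heatmap does not saturate.
    logits = (logits - logits.mean()) / (logits.std() + 1e-8)
    heatmap = 1.0 / (1.0 + np.exp(-logits))
    # Element-wise reweighting along both the time and frequency axes.
    return heatmap, spec * heatmap

# Usage with a random stand-in for a log-mel spectrogram (64 bins, 100 frames).
spec = np.random.default_rng(0).standard_normal((64, 100))
heatmap, attended = multi_scale_tf_attention(spec)
```

In a trained model the pooled summaries would be replaced by learned feature maps, so that the heatmap highlights regions that are discriminative for the target event rather than regions that are merely high in local average energy.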
URL
http://arxiv.org/abs/1904.00063