Acoustic scene classification using multi-layer temporal pooling based on convolutional neural network

2019-02-26
Liwen Zhang, Jiqing Han

Abstract

Temporal dynamics and discriminative information in audio signals are crucial for Acoustic Scene Classification (ASC). In this work, we propose a hierarchical temporal feature learning method called Multi-Layer Temporal Pooling (MLTP). Through recursive non-linear feature mappings and temporal pooling operations, MLTP can effectively capture the high-level temporal dynamics of an entire audio signal of arbitrary duration in an unsupervised way. Taking as input the patch-level discriminative features extracted by a simple pre-trained convolutional neural network (CNN), our method attempts to learn temporal features for the entire audio sample, which are then used directly to train the classifier. Experimental results show that our method significantly improves ASC performance. Without any data augmentation techniques or ensemble strategies, our method still achieves state-of-the-art performance with only one lightweight CNN and a single classifier.
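
The idea of recursive non-linear mapping followed by temporal pooling can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mapping or the pooling operator, so the sketch assumes a signed square root as the non-linear mapping and sliding-window mean pooling as the temporal pooling. All function names and the `num_layers`, `window`, and `stride` parameters are hypothetical. It shows only how a sequence of patch-level feature vectors of arbitrary length can be reduced, layer by layer and without labels, to one fixed-length vector for a classifier.

```python
import math


def nonlinear_map(vec):
    # Assumed non-linear mapping: signed square root of each component.
    return [math.copysign(math.sqrt(abs(v)), v) for v in vec]


def mean_pool(frames):
    # Average a list of equal-length vectors into a single vector.
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def pooling_layer(seq, window, stride):
    # One layer: map every frame, then pool over sliding windows in time.
    mapped = [nonlinear_map(f) for f in seq]
    if len(mapped) <= window:
        # Sequence is too short for a full window: pool everything at once.
        return [mean_pool(mapped)]
    last_start = len(mapped) - window
    return [mean_pool(mapped[s:s + window])
            for s in range(0, last_start + 1, stride)]


def multi_layer_temporal_pooling(patch_features, num_layers=2,
                                 window=4, stride=2):
    # patch_features: one feature vector per audio patch (e.g. CNN outputs).
    seq = [list(f) for f in patch_features]
    for _ in range(num_layers):
        seq = pooling_layer(seq, window, stride)
    # Final pooling over whatever remains gives a fixed-length vector.
    return mean_pool(seq)
```

For example, calling `multi_layer_temporal_pooling` on 10 patches and on 37 patches of 3-dimensional features returns a 3-dimensional vector in both cases, which is what allows a single classifier to be trained on recordings of different durations.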

URL

http://arxiv.org/abs/1902.10063

PDF

http://arxiv.org/pdf/1902.10063

