Abstract
We propose a disentangled feature for weakly supervised multiclass sound event detection (SED). It helps improve both the performance and the training efficiency of class-wise attention-based detection systems by introducing more class-wise prior information and by reducing redundant network weights. In this paper, we approach SED as a multiple instance learning (MIL) problem and solve it with a neural network framework that includes a class-wise attention pooling (cATP) module. To enable finer detection even when the training set contains only a small number of clips in which categories rarely co-occur, we optimize the high-level feature space of cATP-MIL by disentangling it, based on class-wise identifiable information in the training set, into multiple distinct subspaces. Experiments show that our approach achieves competitive performance on Task 4 of the DCASE 2018 challenge.
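To make the pooling step concrete, below is a minimal pure-Python sketch of a generic class-wise attention pooling layer of the kind commonly used in MIL-based weakly supervised SED. It is an illustrative assumption, not the authors' implementation. For each class c, frame-level probabilities are p_c(t) = sigmoid(w_c . h_t + b_c), attention weights a_c(t) are a softmax over time of v_c . h_t + d_c, and the clip-level probability is the attention-weighted sum of p_c(t). The optional `masks` argument is a hypothetical stand-in for the disentangled feature: each class sees only its own subspace of the high-level features. The abstract does not specify how the paper derives these subspaces, so the 0/1 masks here are assumed inputs.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def class_wise_attention_pooling(
    frames: List[Vector],                  # T frame embeddings h_t, each of size D
    cls_weights: List[Vector],             # per-class classifier weights w_c
    cls_bias: Vector,                      # per-class classifier biases b_c
    att_weights: List[Vector],             # per-class attention weights v_c
    att_bias: Vector,                      # per-class attention biases d_c
    masks: Optional[List[Vector]] = None,  # hypothetical per-class 0/1 subspace masks m_c
) -> Tuple[Vector, List[Vector]]:
    """Return (clip_probs[c], frame_probs[c][t]) for C classes."""
    clip_probs: Vector = []
    frame_probs: List[Vector] = []
    for c in range(len(cls_weights)):
        # Hypothetical disentangling step: keep only class c's feature subspace.
        if masks is not None:
            feats = [[h_d * m_d for h_d, m_d in zip(h, masks[c])] for h in frames]
        else:
            feats = frames
        # Frame-level (instance-level) probabilities for class c.
        p = [sigmoid(dot(cls_weights[c], h) + cls_bias[c]) for h in feats]
        # Class-specific attention over time, normalized to sum to 1.
        a = softmax([dot(att_weights[c], h) + att_bias[c] for h in feats])
        # Clip-level (bag-level) probability: attention-weighted average.
        clip_probs.append(sum(a_t * p_t for a_t, p_t in zip(a, p)))
        frame_probs.append(p)
    return clip_probs, frame_probs
```

Because the attention weights sum to 1, each clip-level probability is a convex combination of that class's frame-level probabilities. This lets the model train on clip labels only while still producing frame-level outputs for detection.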
URL
http://arxiv.org/abs/1905.10091