Abstract
Acoustic Scene Classification (ASC) and Sound Event Detection (SED) are two separate tasks in the field of computational sound scene analysis. In this work, we present a new dataset with both sound scene and sound event labels, and use it to demonstrate a novel method for jointly classifying sound scenes and recognizing sound events. We show that the joint approach makes learning more efficient, and that while sound event detection still leaves room for improvement, the SED results remain robust on a dataset whose sample distribution is skewed towards sound scenes. Because systems trained on different datasets cannot be compared directly, we make no claims of superiority over the state of the art. We show that the performance of the joint system is comparable to that of an isolated ASC system trained on the same dataset.
URL
http://arxiv.org/abs/1904.10408