Abstract
This paper presents a self-supervised deep neural network solution to speech denoising that eases the requirement that clean speech signals be available for network training. The approach trains a Fully Convolutional Neural Network to map a noisy speech signal to another noisy version of the speech signal. To show its effectiveness, four commonly used objective performance measures are used to compare the self-supervised approach with the commonly used fully-supervised approach, which assumes that clean speech signals are available for training. The measures are examined for three public domain datasets of speech signals and one public domain dataset of noise signals. The results indicate that the self-supervised approach outperforms the fully-supervised approach. This solution is better suited to field deployment than conventional deep learning-based solutions because, under realistic audio conditions, the only signals available for training are noisy speech signals, not clean speech signals.
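The difference between the two training setups described above lies in the target signal, which can be illustrated with a minimal sketch. The abstract does not say how the noisy pairs are constructed, so everything here is an assumption made for illustration: the helper names `mix_at_snr` and `make_training_pair`, additive mixing at a fixed signal-to-noise ratio, two independent noise recordings per utterance, and a synthetic sine wave standing in for speech. The network itself is not shown.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper: additive mixing at a fixed SNR is an assumption,
    not a procedure stated in the abstract.
    """
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to clean_power / 10^(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def make_training_pair(clean, noise_a, noise_b, snr_db, self_supervised=True):
    """Return (network_input, training_target) for one utterance.

    self_supervised=True  -> target is a second noisy version of the speech.
    self_supervised=False -> target is the clean speech (fully supervised).
    """
    noisy_input = mix_at_snr(clean, noise_a, snr_db)
    if self_supervised:
        target = mix_at_snr(clean, noise_b, snr_db)
    else:
        target = clean.copy()
    return noisy_input, target


# Synthetic stand-ins: a 220 Hz tone for speech, white noise for the noise set.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)
noise_a = rng.standard_normal(t.size)
noise_b = rng.standard_normal(t.size)

# Self-supervised pair: both input and target are noisy.
x_self, y_self = make_training_pair(clean, noise_a, noise_b, snr_db=5.0)

# Fully supervised pair: same input, clean target.
x_sup, y_sup = make_training_pair(clean, noise_a, noise_b, snr_db=5.0,
                                  self_supervised=False)
```

In this sketch only the target changes between the two setups; the network input is identical, so the same architecture and loss could be trained on either kind of pair.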
URL
http://arxiv.org/abs/1904.12069