Abstract
Audio is an important medium in people's daily lives, and hidden information can be embedded into audio for covert communication. Current audio information hiding techniques can be roughly classified into time domain-based and transform domain-based techniques. Time domain-based techniques have large hiding capacity but low imperceptibility. Transform domain-based techniques have better imperceptibility, but their hiding capacity is poor. This paper proposes a new audio information hiding technique that achieves both high hiding capacity and good imperceptibility. The proposed method takes the original audio signal as input and obtains the audio signal embedded with hidden information (called stego audio) through the training of our private automatic speech recognition (ASR) model. Without knowledge of the private model's internal parameters and structure, the hidden information cannot be extracted: it can be recovered by the private model but not by public models. To evaluate the security of the private model, we use four other ASR models to try to extract the hidden information from the stego audios. The experimental results show that the proposed technique has a high hiding capacity of 48 cps with good imperceptibility and high security. In addition, our proposed adversarial audio can be used to activate an intrinsic backdoor of DNN-based ASR models, which poses a serious threat to intelligent speakers.
URL
http://arxiv.org/abs/1904.03829