Abstract
Voice interfaces and assistants offered by various services have become increasingly sophisticated, driven by the growing availability of data. However, users’ audio data must be protected in line with data-protection regulations such as the GDPR and COPPA. To detect unauthorized use of audio data, we propose an audio auditor that lets users audit speech recognition models. Specifically, users can check whether their audio recordings were part of a model’s training dataset. In this paper, we focus on a DNN-HMM-based automatic speech recognition model trained on the TIMIT audio data. As a proof of concept, participant-level membership inference reaches a success rate of up to 90% with eight audio samples per user, yielding a working audio auditor.
URL
http://arxiv.org/abs/1905.07082