Abstract
The performances of the automatic speaker verification (ASV) systems degrade due to the reduction in the amount of speech used for enrollment and verification. Combining multiple systems based on different features and classifiers considerably reduces speaker verification error rate with short utterances. This work attempts to incorporate supplementary information during the system combination process. We use quality of the estimated model parameters as supplementary information. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics used during the i-vector extraction process. We have used the proposed quality measures as side information for combining ASV systems based on Gaussian mixture model-universal background model (GMM-UBM) and i-vector. Considerable improvement is found in performance metrics by the proposed system on NIST SRE corpora in short duration conditions. We have observed improvement over state-of-the-art i-vector system.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1901.10345