Structural Gaussian mixture models for efficient text-independent speaker verification
Published 2002 in Interspeech
ABSTRACT
Structural Gaussian mixture models (SGMMs) are proposed for efficient text-independent speaker verification. A structural background model (SBM) is constructed first by hierarchically clustering all Gaussian mixture components in a universal background model (UBM). In this way the acoustic space is partitioned into multiple regions at different levels of resolution. For each target speaker, an SGMM can be generated through multi-level maximum a posteriori (MAP) adaptation from the SBM. During testing, only a small subset of Gaussian mixture components is scored for each feature vector in order to reduce the computational cost significantly. Furthermore, the scores obtained in different layers of the tree-structured models are combined via a neural network for the final decision. Different configurations are compared in the experiments conducted on the telephony speech data used in the NIST speaker verification evaluation. The experimental results show that a computational reduction by a factor of 17 can be achieved, with the equal error rate (EER) reduced by [value lost in text extraction] compared with the baseline. The SGMM-SBM also shows some advantages over the recently proposed hash GMM.
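The pruned scoring idea in the abstract (score only a small subset of components per feature vector by descending a tree of clustered Gaussians) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' method: a hypothetical two-level tree built by hand rather than by hierarchical clustering of a UBM, diagonal covariances, equal weights, and a simple "pick the best parent" rule. Multi-level MAP adaptation and the neural-network score combination are not reproduced.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def build_tree(centers, offsets):
    """Hypothetical two-level tree: one broad parent Gaussian per region,
    with equally weighted leaf components placed around each parent."""
    n_leaves = len(centers) * len(offsets)
    tree = []
    for cx, cy in centers:
        leaves = [
            (1.0 / n_leaves, [cx + dx, cy + dy], [1.0, 1.0])
            for dx, dy in offsets
        ]
        tree.append({"mean": [cx, cy], "var": [2.0, 2.0], "leaves": leaves})
    return tree


def full_score(x, tree):
    """Baseline: evaluate every leaf component of the mixture.
    Returns (log-likelihood, number of Gaussians evaluated)."""
    terms = [
        math.log(w) + log_gauss(x, m, v)
        for node in tree
        for (w, m, v) in node["leaves"]
    ]
    return logsumexp(terms), len(terms)


def pruned_score(x, tree):
    """Evaluate the parents, then only the leaves under the best parent.
    Returns (approximate log-likelihood, number of Gaussians evaluated)."""
    parent_scores = [log_gauss(x, n["mean"], n["var"]) for n in tree]
    best = max(range(len(tree)), key=parent_scores.__getitem__)
    terms = [
        math.log(w) + log_gauss(x, m, v)
        for (w, m, v) in tree[best]["leaves"]
    ]
    return logsumexp(terms), len(tree) + len(terms)


# Illustrative layout: four well-separated regions, four leaves each.
CENTERS = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0), (8.0, 8.0)]
OFFSETS = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
TREE = build_tree(CENTERS, OFFSETS)

if __name__ == "__main__":
    frame = [0.5, -0.5]
    print(full_score(frame, TREE))
    print(pruned_score(frame, TREE))
```

In this toy layout the pruned path evaluates 8 Gaussians per frame instead of 16, and because the regions are well separated the pruned score stays very close to the full one. With overlapping regions the approximation would be looser, and real savings depend on the tree depth and branching factor.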
PUBLICATION RECORD
- Publication year
2002
- Venue
Interspeech
- Publication date
2002-09-16
- Fields of study
Computer Science
- Source metadata
Semantic Scholar