In this paper, we examine text-independent open-set speaker identification (OS-SI) in broadcast news. In particular, we investigate how the population of registered speakers affects OS-SI performance, a central issue in designing practical OS-SI systems. We amend the maximum mutual information (MMI)-based discriminative training scheme to facilitate its incorporation into OS-SI systems. We also improve the implementation so that the MMI-based approach can be applied to 2048-component Gaussian mixture models (GMMs). All systems are evaluated on the NIST RT-03, RT-04, and FBIS corpora, with a maximum of 82 registered speakers. Our study shows that MMI-based discriminative training yields a notable improvement, reducing the equal error rate (EER) by 15.9% relative to the GMM-MAP scheme.
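The open-set decision and the EER metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation protocol: the function names, the score values in the usage note, and the simplification of EER to a plain accept/reject trade-off (ignoring confusions among registered speakers) are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def open_set_decision(scores: Sequence[float], threshold: float) -> Optional[int]:
    """Pick the best-scoring registered speaker, or None if the score is too low.

    `scores[i]` is a hypothetical match score of the test segment against
    registered speaker i (for example, a log-likelihood ratio).
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best if scores[best] >= threshold else None


def equal_error_rate(target: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Approximate the EER by sweeping the threshold over all observed scores.

    Returns (eer, threshold) at the point where the false-accept rate and the
    false-reject rate are closest; eer is the mean of the two rates there.
    """
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(target) | set(impostor)):
        frr = sum(s < thr for s in target) / len(target)       # targets rejected
        far = sum(s >= thr for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - improved) / baseline
```

With made-up scores, `open_set_decision([-1.0, 2.5, 0.3], threshold=1.0)` selects speaker index 1, while raising the threshold to 3.0 rejects the segment as an unregistered speaker. A "relative" reduction such as the one reported in the abstract is measured as a fraction of the baseline EER, not as an absolute difference in percentage points.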
Open-set speaker identification in broadcast news
Chao Gao, G. Saikumar, Amit Srivastava, P. Natarajan
Published 2011 in IEEE International Conference on Acoustics, Speech, and Signal Processing
PUBLICATION RECORD
- Publication year
2011
- Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
- Publication date
2011-05-01
- Fields of study
Computer Science
- Source metadata
Semantic Scholar