TinyCAM: A Lightweight and Discriminative Network for Tibetan Speaker Verification

Published 2025 in 2025 International Conference on Algorithms, Data Mining, and Information Technology (ADMIT)

ABSTRACT

Speaker verification for minority languages faces considerable challenges, primarily due to the scarcity of training data and the structural complexities inherent to these languages. In particular, Tibetan speech is characterized by its rich plosive inventory and strong rhythmic patterns, which pose unique difficulties for robust speaker modeling. To address these issues, this paper proposes TinyCAM, an efficient and lightweight speaker verification framework tailored for such acoustic properties. Built upon the CAM++ architecture, TinyCAM introduces three structural enhancements aimed at improving feature representation and computational efficiency. First, the framework incorporates WDS-ResBlock, which fuses wavelet convolution with depthwise separable convolution to enable more effective multi-scale processing of local time-frequency details. Second, it employs a streamlined GS-TDNN module, which utilizes grouped and pointwise convolutions to capture diverse feature types and temporal dynamics with reduced complexity. Third, TinyCAM integrates the SLRT Layer, which combines low-rank compression with sparse constraints to minimize channel redundancy while enhancing information extraction. Extensive experiments and ablation studies conducted on a Tibetan speech dataset demonstrate that these architectural innovations not only improve recognition accuracy but also significantly reduce model size and computational overhead. Specifically, TinyCAM achieves an Equal Error Rate (EER) of 5.9905% and a minimum Detection Cost Function (minDCF) of 0.8287 with only 6.67 million parameters and 1.38 GFLOPs, reductions of 26.6% and 11.7%, respectively, compared to the original model. These results highlight TinyCAM's strong potential for practical deployment in resource-constrained scenarios, particularly in applications involving minority languages.
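
The abstract attributes the smaller model to depthwise separable convolution, grouped and pointwise convolution, and low-rank compression. As a rough illustration of why these building blocks reduce parameter counts, the sketch below compares weight counts for a single 1-D convolution layer. The channel width (512), kernel size (3), group count (4), and rank (64) are hypothetical values chosen for the arithmetic; they are not taken from the paper, and the helper functions are not the authors' implementation of WDS-ResBlock, GS-TDNN, or the SLRT Layer.

```python
def conv1d_params(c_in, c_out, kernel, groups=1):
    """Weight count of a bias-free 1-D convolution with channel grouping."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * kernel


def depthwise_separable_params(c_in, c_out, kernel):
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise conv."""
    depthwise = conv1d_params(c_in, c_in, kernel, groups=c_in)
    pointwise = conv1d_params(c_in, c_out, 1)
    return depthwise + pointwise


def low_rank_params(c_in, c_out, rank):
    """A c_in x c_out projection factorized into c_in x rank and rank x c_out."""
    return c_in * rank + rank * c_out


# Hypothetical layer sizes, assumed for illustration only.
C_IN, C_OUT, KERNEL, GROUPS, RANK = 512, 512, 3, 4, 64

standard = conv1d_params(C_IN, C_OUT, KERNEL)
grouped = conv1d_params(C_IN, C_OUT, KERNEL, groups=GROUPS)
separable = depthwise_separable_params(C_IN, C_OUT, KERNEL)
dense_projection = conv1d_params(C_IN, C_OUT, 1)
factorized = low_rank_params(C_IN, C_OUT, RANK)

print(f"standard conv:        {standard}")
print(f"grouped conv (g={GROUPS}):   {grouped}")
print(f"depthwise separable:  {separable}")
print(f"dense 1x1 projection: {dense_projection}")
print(f"low-rank (r={RANK}):      {factorized}")
```

Under these assumed sizes, grouping divides the weight count by the number of groups, the depthwise separable layer needs roughly a third of the standard layer's weights, and the rank-64 factorization uses a quarter of the dense projection's weights. The savings in a full network depend on how many layers are replaced and on the actual layer widths, which the abstract does not state.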

PUBLICATION RECORD

Publication year
2025
Venue
2025 International Conference on Algorithms, Data Mining, and Information Technology (ADMIT)
Publication date
2025-10-24
Fields of study
Not labeled
Identifiers
DOI 10.1109/ADMIT67050.2025.11336995
Source metadata
Semantic Scholar

REFERENCES

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification (2023)
Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification (2023)
CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking (2023)
A Survey of Speaker Recognition: Fundamental Theories, Recognition Methods and Opportunities (2021)
Densely Connected Time Delay Neural Network for Speaker Verification (2020)
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification (2020)
X-Vectors: Robust DNN Embeddings for Speaker Recognition (2018)
Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System (2018)
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks (2018)
Deep Residual Learning for Image Recognition (2015)
Vector Quantization Approach for Speaker Recognition (2013)
Front-End Factor Analysis for Speaker Verification (2011)
A Study of Interspeaker Variability in Speaker Verification (2008)
Support Vector Machines for Speaker and Language Recognition (2006)
Speaker Verification Using Adapted Gaussian Mixture Models (2000)
Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models (1995)
Cepstral Analysis Technique for Automatic Speaker Verification (1981)
Dynamic Programming Algorithm Optimization for Spoken Word Recognition (1978)
