Children's Speech Recognition through Discrete Token Enhancement

Published 2024 in Interspeech

ABSTRACT

Children's speech recognition is considered a low-resource task, mainly because of the lack of publicly available data. This scarcity has several causes, including expensive data collection and annotation processes and data privacy concerns. Transforming speech signals into discrete tokens that capture both linguistic and acoustic information, but do not carry sensitive information, could offer a solution to these privacy concerns. In this study, we investigate integrating discrete speech tokens as input to children's speech recognition systems without significantly degrading ASR performance. Additionally, we explore single-view and multi-view strategies for creating these discrete labels. Furthermore, we test the models' generalization capabilities on an unseen-domain dataset and on a dataset that differs in speaker nativity. Results show that discrete-token ASR for children achieves nearly equivalent performance with approximately 83% fewer parameters.
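The abstract does not spell out how the discrete tokens are produced. A common recipe in discrete-unit ASR, and an assumption here rather than this paper's confirmed pipeline, is to cluster frame-level self-supervised features (for example, from WavLM) with k-means and replace each frame by the index of its nearest centroid. The minimal pure-Python sketch below illustrates that idea on toy 2-D "features". The names `kmeans`, `quantize`, `deduplicate`, and `multi_view_frames` are hypothetical, the farthest-first initialisation is chosen only to keep the toy example deterministic, and concatenating time-aligned frames from two feature sources is just one illustrative reading of a "multi-view" strategy, not the authors' method.

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to `frame` (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(frame, centroids[j]))


def kmeans(frames: Sequence[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    if not 0 < k <= len(frames):
        raise ValueError("k must be between 1 and the number of frames")
    centroids = [list(frames[0])]
    while len(centroids) < k:
        # Pick the frame that is farthest from all centroids chosen so far.
        far = max(frames, key=lambda f: min(sq_dist(f, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            clusters[nearest(f, centroids)].append(list(f))
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def quantize(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Map each frame to the index of its nearest centroid (its discrete token)."""
    return [nearest(f, centroids) for f in frames]


def deduplicate(tokens: Sequence[int]) -> List[int]:
    """Collapse runs of identical consecutive tokens, e.g. 3 3 7 -> 3 7."""
    out: List[int] = []
    for t in tokens:
        if not out or out[-1] != t:
            out.append(t)
    return out


def multi_view_frames(view_a: Sequence[Vector], view_b: Sequence[Vector]) -> List[Vector]:
    """Hypothetical multi-view input: concatenate time-aligned frames from two sources."""
    if len(view_a) != len(view_b):
        raise ValueError("views must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(view_a, view_b)]


# Toy 2-D "features" with two obvious clusters (stand-ins for SSL frame features).
toy_frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9], [0.05, 0.05], [5.0, 5.0]]
codebook = kmeans(toy_frames, k=2)
tokens = quantize(toy_frames, codebook)
print(tokens)               # [0, 0, 1, 1, 0, 1]
print(deduplicate(tokens))  # [0, 1, 0, 1]
```

Collapsing consecutive repeats shortens the token sequence, which is one reason discrete-unit inputs can be cheaper to model than continuous features. Practical systems typically use high-dimensional features and hundreds or thousands of clusters rather than the two used in this toy example.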

PUBLICATION RECORD

Publication year
2024
Venue
Interspeech
Publication date
2024-06-19
Fields of study
Computer Science, Engineering, Education
Identifiers
DOI 10.48550/arXiv.2406.13431
arXiv 2406.13431
Source metadata
Semantic Scholar

REFERENCES

Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic
2024 · cited by this paper
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
2023 · cited by this paper
My Science Tutor (MyST)–a Large Corpus of Children’s Conversational Speech
2023 · cited by this paper
Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults
2023 · cited by this paper
Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism
2023 · cited by this paper
Developmental Articulatory and Acoustic Features for Six to Ten Year Old Children
2023 · cited by this paper
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
2023 · cited by this paper
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
2023 · cited by this paper
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
2022 · cited by this paper
A WAV2VEC2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition
2022 · cited by this paper
Audio Augmentation for Non-Native Children’s Speech Recognition through Discriminative Learning
2022 · cited by this paper
Robust Speech Recognition via Large-Scale Weak Supervision
2022 · cited by this paper
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition
2022 · cited by this paper
Towards Better Domain Adaptation for Self-Supervised Models: A Case Study of Child ASR
2022 · cited by this paper
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
2022 · cited by this paper
Low Resource German ASR with Untranscribed Data Spoken by Non-native Children - INTERSPEECH 2021 Shared Task SPAPL System
2021 · cited by this paper
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
2021 · cited by this paper
Analysis of Disfluency in Children's Speech
2020 · cited by this paper
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
2020 · cited by this paper
ESPnet: End-to-End Speech Processing Toolkit
2018 · cited by this paper
On the Difficulties of Automatic Speech Recognition for Kindergarten-Aged Children
2018 · cited by this paper
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
2018 · cited by this paper
Attention is All you Need
2017 · cited by this paper
A survey about databases of children's speech
2013 · cited by this paper
Automated speech scoring for non-native middle school students with multiple task types
2013 · cited by this paper
Why and How Our Automated Reading Tutor Listens
2012 · cited by this paper
Speech production variability in fricatives of children and adults: results of functional data analysis
2008 · cited by this paper
Stop-consonant voicing and intraoral pressure contours in women and children
2008 · cited by this paper
Vowel acoustic space development in children: a synthesis of acoustic and anatomic data
2007 · cited by this paper
Multi-view clustering
2004 · cited by this paper
Language and Disfluency in Nonstuttering Children's Conversational Speech
1999 · cited by this paper
Acoustics of children's speech: developmental changes of temporal and spectral parameters
1999 · cited by this paper
Analysis of children's speech: duration, pitch and formants
1997 · cited by this paper
Relationships between duration and temporal variability in children's speech
1992 · cited by this paper
Vector quantization in speech coding
1985 · cited by this paper

CITED BY

Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
2025 · cites this paper
Can Layer-Wise SSL Features Improve Zero-Shot ASR Performance for Children’s Speech?
2025 · cites this paper
From CHAT towards ASR: A Hybrid Pipeline for Constructing the HUKILC-CO Hungarian Child Speech Dataset
2025 · cites this paper
Comparing Unsupervised and Supervised Semantic Speech Tokens: A Case Study of Child ASR
2025 · cites this paper
Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic
2024 · cites this paper