User-friendly Automatic Transcription of Low-resource Languages: Plugging ESPnet into Elpis

Oliver Adams, Benjamin Galliot, Guillaume Wisniewski, Nicholas Lambourne, Ben Foley, Rahasya Sanders-Dwyer, Janet Wiles, Alexis Michaud, Severine Guillaume, L. Besacier, Christopher Cox, Katya Aplonova, Guillaume Jacques, N. Hill

Published 2020 in COMPUTEL

ABSTRACT

This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a web front-end originally designed to provide access to the Kaldi automatic speech recognition toolkit. The goal of this work is to make end-to-end speech recognition models available to language workers via a user-friendly graphical interface. Encouraging results are reported on (i) development of an ESPnet recipe for use in Elpis, with preliminary results on data sets previously used for training acoustic models with the Persephone toolkit along with a new data set that had not previously been used in speech recognition, and (ii) incorporating ESPnet into Elpis along with UI enhancements and a CUDA-supported Dockerfile.

PUBLICATION RECORD

Publication year: 2020
Venue: COMPUTEL
Publication date: 2020-12-14
Fields of study: Linguistics, Computer Science, Engineering
Identifiers: DOI 10.33011/COMPUTEL.V1I.969; arXiv 2101.03027

REFERENCES

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (2020)
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks (2020)
A Survey on Contextual Embeddings (2020)
Universal Phone Recognition with a Multilingual Allophone System (2020, influential reference)
Unsupervised Pretraining Transfers Well Across Languages (2020)
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment (2020)
Towards a Speech Recognizer for Komi, an Endangered and Low-Resource Uralic Language (2020)
Response from the editor to "toward open data policies in phonetics: what we can gain and how we can avoid pitfalls" (2020)
The Blind Spots of Digital Innovation Fetishism (2020)
Kaldi-Web: An Installation-Free, On-Device Speech Recognition System (2020)
Phonemic Transcription of Low-Resource Languages: To What Extent can Preprocessing be Automated? (2020)
Elpis, an Accessible Speech-to-Text Tool (2019)
Future Directions in Technological Support for Language Documentation (2019)
Massively Multilingual Adversarial Speech Recognition (2019)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2019)
PyTorch: An Imperative Style, High-Performance Deep Learning Library (2019)
Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian (2018)
Deep Contextualized Word Representations (2018)
Sequence-Based Multi-Lingual Low Resource Speech Recognition (2018)
ESPnet: End-to-End Speech Processing Toolkit (2018)
Improved training of end-to-end attention models for speech recognition (2018)
End-to-end Speech Recognition Using Lattice-free MMI (2018)
Adversarial Multilingual Training for Low-Resource Speech Recognition (2018)
Integrating automatic transcription into the language documentation workflow: Experiments with Na data and the Persephone toolkit (2018)
Transfer Learning of Language-independent End-to-end ASR with Language Model Fusion (2018)
Building Speech Recognition Systems for Language Documentation: The CoEDL Endangered Language Pipeline and Inference System (Elpis) (2018)
The PyTorch-Kaldi Speech Recognition Toolkit (2018)
Language Science Press business model: Evaluated version of the 2015 model (2018)
Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation (2018)
Language independent end-to-end architecture for joint language identification and speech recognition (2017)
Multilingual processing of speech via web services (2017)
Japhug (2017)
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models (2017)
Attention is All you Need (2017)
Multilingual Speech Recognition with a Single End-to-End Model (2017)
Building an ASR System for a Low-resource Language Through the Adaptation of a High-resource Language ASR System: Preliminary Results (2017)
Hybrid CTC/Attention Architecture for End-to-End Speech Recognition (2017)
Phonemic and Graphemic Multilingual CTC Based Speech Recognition (2017)
Endangered Language Documentation: Bootstrapping a Chatino Speech Corpus, Forced Aligner, ASR (2016)
Language Documentation meets Language Technology (2015)
Documenting and Researching Endangered Languages: The Pangloss Collection (2014)
Using out-of-language data to improve an under-resourced speech recognizer (2014)
Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages (2014)
GloVe: Global Vectors for Word Representation (2014)
Deep Speech: Scaling up end-to-end speech recognition (2014)
Automatic speech recognition for under-resourced languages: A survey (2014)
Finding a way into a family of tone languages: The story and methods of the Chatino Language Documentation Project (2014)
Distributed Representations of Words and Phrases and their Compositionality (2013)
Multilingual acoustic models using distributed deep neural networks (2013)
Multilingual MLP features for low-resource LVCSR systems (2012)
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups (2012)
ADADELTA: An Adaptive Learning Rate Method (2012)
Phonology, tone and the functions of tone in San Juan Quiahije Chatino (2011)
Cross-lingual portability of Chinese and English neural network features for French and German LVCSR (2011)
The Kaldi Speech Recognition Toolkit (2011)
Cross-lingual portability of MLP-based tandem features - a case study for English and Hungarian (2008)
Cross-Domain and Cross-Language Portability of Acoustic Features Estimated by Multilayer Perceptrons (2006)
Chapter 1 Language documentation: What is it and what is it good for? (2006)
First steps in fast acoustic modeling for a new target language: application to Vietnamese (2005)
Experiments on cross-language acoustic modeling (2001)
State-of-the-art speech recognition U.S. research and business update (1987)

CITED BY

Evaluating Speech Foundation Models for Automatic Speech Recognition in the Low-Resource Kanyen'kéha Language (2025)
Supporting SENCOTEN Language Documentation Efforts with Automatic Speech Recognition (2025)
Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages (2025)
Access Control Framework for Language Collections (2024)
Now You See Me, Now You Don’t: ‘Poverty of the Stimulus’ Problems and Arbitrary Correspondences in End-to-End Speech Models (2024)
Speech-to-text recognition for multilingual spoken data in language documentation (2023)
Application of Speech Processes for the Documentation of Kréyòl Gwadloupéyen (2023)
Learnings from Technological Interventions in a Low Resource Language: Enhancing Information Access in Gondi (2022)
Phoneme transcription of endangered languages: an evaluation of recent ASR architectures in the single speaker scenario (2022)
End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec (2021)
Keyword spotting for audiovisual archival search in Uralic languages (2021)
Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings (2021)
The Relevance of the Source Language in Transfer Learning for ASR (2021)