Neural Language Model Pruning for Automatic Speech Recognition
Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, M. Nußbaum-Thom, Youssef Oualil
Published 2023 in arXiv.org
ABSTRACT
We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely the criterion, the method, and the scheduler, analyzing their contributions in terms of accuracy and inference speed. To the best of our knowledge, such in-depth analyses of large-scale recognition systems have not been reported in the literature. In addition, we propose a variant of low-rank approximation suitable for incrementally compressing models and for delivering multiple models with varied target sizes. Among other results, we show that a) data-driven pruning outperforms magnitude-driven pruning in several scenarios; b) incremental pruning achieves higher accuracy than one-shot pruning, especially when targeting smaller sizes; and c) low-rank approximation presents the best trade-off between size reduction and inference speed-up for moderate compression.
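For illustration, the following is a minimal PyTorch sketch of the two compression primitives the abstract contrasts: magnitude pruning and low-rank (SVD) approximation of a single linear layer. It is not the paper's implementation; the layer dimensions, the 50% sparsity target, and the rank r = 64 are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for one Transformer projection; dimensions are illustrative.
layer = nn.Linear(512, 512)

# Magnitude pruning: zero the 50% of weights with the smallest |w|.
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")  # bake the pruning mask into the weight tensor

# Low-rank approximation of a (separate) dense layer via truncated SVD:
# W is approximated by (U_r S_r) V_r^T at target rank r.
dense = nn.Linear(512, 512)
r = 64  # illustrative target rank
U, S, Vh = torch.linalg.svd(dense.weight.data, full_matrices=False)
A = U[:, :r] * S[:r]  # (512, r): left factors scaled by singular values
B = Vh[:r, :]         # (r, 512)

# Replace the dense layer with two thin linear layers computing x B^T A^T + b.
low_rank = nn.Sequential(nn.Linear(512, r, bias=False), nn.Linear(r, 512))
low_rank[0].weight.data.copy_(B)
low_rank[1].weight.data.copy_(A)
low_rank[1].bias.data.copy_(dense.bias.data)
# Parameter count drops from 512*512 to 2*512*64, a 4x reduction.

An incremental scheduler, in the spirit the abstract describes, would repeat such compression steps toward progressively smaller target sizes, with retraining in between, rather than compressing to the final size in one shot.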
PUBLICATION RECORD
- Publication year: 2023
- Venue: arXiv.org
- Publication date: 2023-10-05
- Fields of study: Computer Science
- Source metadata: Semantic Scholar