Fast-ULCNet: A fast and ultra low complexity network for single-channel speech enhancement

Published 2026 in arXiv.org

ABSTRACT

Single-channel speech enhancement algorithms are often deployed on resource-constrained embedded devices, where low-latency, low-complexity designs are especially important. In recent years, researchers have proposed a wide variety of solutions to this problem. In particular, a recent deep learning model named ULCNet is among the state-of-the-art approaches in this domain. This paper proposes an adaptation of ULCNet that replaces its GRU layers with FastGRNNs to reduce both computational latency and complexity. The paper also presents empirical evidence that FastGRNN performance degrades on long audio signals during inference because of internal state drift, and proposes a novel approach based on a trainable complementary filter to mitigate it. The resulting model, Fast-ULCNet, performs on par with the original ULCNet architecture on a speech enhancement task, while reducing model size by more than half and decreasing latency by 34% on average.
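
To illustrate why swapping GRU layers for FastGRNNs shrinks a model, the following is a minimal NumPy sketch of a single FastGRNN cell as defined in the FastGRNN reference listed below, not the Fast-ULCNet implementation. The class name, layer sizes, initial values, and the single-bias GRU parameter count are assumptions chosen for illustration; the record above does not give the paper's actual layer configuration, and the trainable complementary filter is not sketched here because its details are not stated in the abstract.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class FastGRNNCell:
    """Illustrative FastGRNN cell (hypothetical sizes, not the paper's config).

    The gate z_t and the candidate state share one input matrix W and one
    recurrent matrix U; only the two bias vectors differ. Two scalars,
    zeta and nu (both kept in [0, 1]), weight the candidate state.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        self.U = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.b_z = np.ones(hidden_dim)   # gate bias
        self.b_h = np.zeros(hidden_dim)  # candidate bias
        self.zeta = 1.0                  # assumed initial value
        self.nu = 1e-4                   # assumed initial value

    def step(self, x, h_prev):
        # Shared pre-activation, reused by the gate and the candidate.
        pre = self.W @ x + self.U @ h_prev
        z = sigmoid(pre + self.b_z)
        h_tilde = np.tanh(pre + self.b_h)
        # h_t = (zeta * (1 - z_t) + nu) * h~_t + z_t * h_{t-1}
        return (self.zeta * (1.0 - z) + self.nu) * h_tilde + z * h_prev

    def num_params(self):
        return self.W.size + self.U.size + self.b_z.size + self.b_h.size + 2


def gru_param_count(input_dim, hidden_dim):
    # Textbook GRU with one bias vector per gate: three (W, U, b) sets.
    # Some frameworks use two bias vectors per gate, which adds more.
    return 3 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)


if __name__ == "__main__":
    cell = FastGRNNCell(input_dim=4, hidden_dim=8)
    h = np.zeros(8)
    for frame in np.ones((5, 4)):  # five dummy input frames
        h = cell.step(frame, h)
    print(cell.num_params(), gru_param_count(4, 8))
```

Because the gate and candidate reuse the same two matrices, the recurrent layer holds roughly a third of the weights of a GRU of equal width, which is consistent with the direction of the size reduction reported in the abstract. The state update is also a running blend of old state and new candidate, which is where drift over long sequences can accumulate.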

PUBLICATION RECORD

Publication year: 2026
Venue: arXiv.org
Publication date: 2026-01-21
Fields of study: Computer Science, Engineering
Identifiers: DOI 10.48550/arXiv.2601.14925; arXiv 2601.14925

REFERENCES

Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods (2023), cited by this paper
Low Complexity Speech Enhancement Network Based on Frame-Level Swin Transformer (2023), cited by this paper
Ultra Low Complexity Deep Learning Based Noise Suppression (2023), influential reference
Deepfilternet: A Low Complexity Speech Enhancement Framework for Full-Band Audio Based On Deep Filtering (2021), cited by this paper
Real-Time Denoising and Dereverberation wtih Tiny Recurrent U-Net (2021), cited by this paper
Dnsmos P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors (2021), cited by this paper
A consolidated view of loss functions for supervised deep learning-based speech enhancement (2020), cited by this paper
The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results (2020), cited by this paper
FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network (2018), influential reference
SDR – Half-baked or Well Done? (2018), cited by this paper
Complex Ratio Masking for Monaural Speech Separation (2016), cited by this paper
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches (2014), cited by this paper
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (2014), cited by this paper
