How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs

Published 2026 in Unknown venue

ABSTRACT

The intermediate layers of deep networks can be characterised as a Gaussian process; in particular, the Edge-of-Chaos (EoC) initialisation strategy prescribes the limiting covariance matrix of the Gaussian process. Here we show that the choice of variance of the Gaussian process, which is under-utilised, is important in the training of deep networks with sparsity-inducing activations, such as a shifted and clipped ReLU, $\text{CReLU}_{\tau,m}(x)=\min(\max(x-\tau,0),m)$. Specifically, initialisations leading to larger fixed Gaussian process variances allow for improved expressivity with activation sparsity as large as 90% in DNNs and CNNs, and generally improve the stability of the training process. Enabling full, or near full, accuracy at such high levels of sparsity in the hidden layers suggests a promising mechanism to reduce the energy consumption of machine learning models involving fully connected layers.
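
To make the activation in the abstract concrete, the following is a minimal sketch of $\text{CReLU}_{\tau,m}$ and of how its shift $\tau$ relates to activation sparsity when pre-activations are modelled as zero-mean Gaussians with variance $q$, as in the Gaussian process picture. The helper names (`crelu`, `tau_for_sparsity`, `empirical_sparsity`) and the values of `q` and `m` are illustrative assumptions and are not taken from the paper; this is not the authors' initialisation procedure.

```python
import numpy as np
from statistics import NormalDist


def crelu(x, tau, m):
    """Shifted and clipped ReLU: min(max(x - tau, 0), m)."""
    return np.minimum(np.maximum(x - tau, 0.0), m)


def tau_for_sparsity(s, q):
    """Shift tau such that P(z <= tau) = s when z ~ N(0, q).

    Assumption: pre-activations are exactly Gaussian with variance q.
    """
    return np.sqrt(q) * NormalDist().inv_cdf(s)


def empirical_sparsity(q, tau, m, n=200_000, seed=0):
    """Fraction of exactly-zero outputs of CReLU on n samples of N(0, q)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, np.sqrt(q), size=n)
    return float(np.mean(crelu(z, tau, m) == 0.0))


# Hypothetical variances; for each, pick tau targeting 90% sparsity.
for q in (0.5, 1.0, 2.0):
    tau = tau_for_sparsity(0.9, q)
    print(f"q={q}: tau={tau:.4f}, sparsity={empirical_sparsity(q, tau, m=1.0):.4f}")
```

Under this Gaussian assumption the shift needed for a fixed sparsity level scales with $\sqrt{q}$, so a larger variance pairs with a proportionally larger $\tau$. The sketch only measures sparsity; it says nothing about expressivity or training stability, which are the claims the abstract makes.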

PUBLICATION RECORD

Publication date
2026-02-05
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 2602.05779

REFERENCES

Better Rates for Private Linear Regression in the Proportional Regime via Aggressive Clipping
2025, cited by this paper
Slow Transition to Low-Dimensional Chaos in Heavy-Tailed Recurrent Neural Networks
2025, cited by this paper
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
2024, cited by this paper
Learning Neural Networks with Sparse Activations
2024, cited by this paper
To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
2024, cited by this paper
ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs
2024, cited by this paper
Deep Neural Network Initialization with Sparsity Inducing Activations
2024, influential reference
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
2024, cited by this paper
Sparsity-aware generalization theory for deep neural networks
2023, cited by this paper
Exact learning dynamics of deep linear networks with prior knowledge
2023, cited by this paper
Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness
2021, cited by this paper
Activation function design for deep networks: linearity and effective initialisation
2021, influential reference
Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better
2021, cited by this paper
Random Neural Networks in the Infinite Width Limit as Gaussian Processes
2021, cited by this paper
Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping
2021, cited by this paper
Neural Network Quantization for Efficient Inference: A Survey
2021, influential reference
What is the State of Neural Network Pruning?
2020, influential reference
Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity
2019, cited by this paper
Non-Gaussian processes and neural networks at finite widths
2019, cited by this paper
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
2018, cited by this paper
On Lazy Training in Differentiable Programming
2018, cited by this paper
Gaussian Process Behaviour in Wide Deep Neural Networks
2018, cited by this paper
The Emergence of Spectral Universality in Deep Networks
2018, influential reference
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
2018, influential reference
Deep Neural Networks as Gaussian Processes
2017, cited by this paper
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
2017, influential reference
Mean Field Residual Networks: On the Edge of Chaos
2017, cited by this paper
Accelerating Matrix Multiplication in Deep Learning by Using Low-Rank Approximation
2017, cited by this paper
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
2017, cited by this paper
Exponential expressivity in deep neural networks through transient chaos
2016, influential reference
Deep Information Propagation
2016, influential reference
On the difficulty of training recurrent neural networks
2012, cited by this paper
An Energy Budget for Signaling in the Grey Matter of the Brain
2001, cited by this paper
The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks
year unknown, influential reference
