Entropy and mutual information in models of deep neural networks

Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, N. Macris, Florent Krzakala, L. Zdeborová

Published 2018 in Neural Information Processing Systems

ABSTRACT

We examine a class of stochastic deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are threefold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally invariant. (ii) We extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layer networks with Gaussian random weights, using the recently introduced adaptive interpolation method. (iii) We propose an experimental framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive.
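
As a point of orientation only, the simplest stochastic layer for which a mutual information is available in closed form is a linear map with additive Gaussian noise. The sketch below is not the statistical physics formula of the paper; it assumes a standard Gaussian input X ~ N(0, I), a single layer T = W X + noise with noise ~ N(0, noise_var I), and uses the textbook identity I(X; T) = (1/2) log det(I + W Wᵀ / noise_var), in nats. The function names (`gaussian_matrix`, `logdet_spd`, `linear_gaussian_mi`) and the 1/cols variance scaling are illustrative assumptions, not taken from the paper.

```python
import math
import random


def gaussian_matrix(rows, cols, seed=0):
    """Random matrix with i.i.d. N(0, 1/cols) entries (illustrative scaling)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def logdet_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    # det(A) = prod(diag(L))^2, hence the factor 2 on the log scale.
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def linear_gaussian_mi(w, noise_var):
    """I(X; T) in nats for T = W X + noise, X ~ N(0, I), noise ~ N(0, noise_var I)."""
    m = len(w)
    # Build I + W W^T / noise_var entry by entry.
    gram = [
        [
            sum(wi * wj for wi, wj in zip(w[i], w[j])) / noise_var
            + (1.0 if i == j else 0.0)
            for j in range(m)
        ]
        for i in range(m)
    ]
    return 0.5 * logdet_spd(gram)


if __name__ == "__main__":
    w = gaussian_matrix(rows=4, cols=6, seed=1)
    for noise_var in (0.1, 1.0, 10.0):
        print(noise_var, linear_gaussian_mi(w, noise_var))
```

For nonlinear activations or several stacked layers no such closed form is available in general, which is the regime the abstract's tractable method is aimed at.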

PUBLICATION RECORD

Publication year
2018
Venue
Neural Information Processing Systems
Publication date
2018-05-24
Fields of study
Mathematics, Physics, Computer Science
Identifiers
DOI: 10.1088/1742-5468/ab3430; arXiv: 1805.09785
Source metadata
Semantic Scholar

REFERENCES

Journal of Statistical Mechanics: Theory and Experiment
2019cited by this paper
The adaptive interpolation method for proving replica formulas. Applications to the Curie–Weiss and Wigner spike models
2019cited by this paper
MINE: Mutual Information Neural Estimation
2018cited by this paper
The Mutual Information in Random Linear Estimation Beyond i.i.d. Matrices
2018cited by this paper
On the information bottleneck theory of deep learning
2018influential reference
The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference
2018cited by this paper
Harnessing neural networks: A random matrix approach
2017cited by this paper
Inference in Deep Networks in High Dimensions
2017cited by this paper
The stochastic interpolation method: A simple scheme to prove replica formulas in Bayesian inference
2017influential reference
InfoVAE: Information Maximizing Variational Autoencoders
2017cited by this paper
Emergence of invariance and disentangling in deep representations
2017cited by this paper
Nonlinear Information Bottleneck
2017influential reference
Multi-layer generalized linear estimation
2017influential reference
Additivity of information in multilayer networks via additive Gaussian noise transforms
2017influential reference
The layered structure of tensor estimation and its mutual information
2017cited by this paper
Opening the Black Box of Deep Neural Networks via Information
2017influential reference
High-dimensional dynamics of generalization error in neural networks
2017cited by this paper
Optimal errors and phase transitions in high-dimensional generalized linear models
2017cited by this paper
Estimating Mixture Entropy with Pairwise Distances
2017influential reference
Deep Variational Information Bottleneck
2017cited by this paper
Phase Transitions, Optimal Errors and Optimality of Message-Passing in Generalized Linear Models
2017cited by this paper
Relevant sparse codes with variational information bottleneck
2016cited by this paper
The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact
2016cited by this paper
On the Expressive Power of Deep Neural Networks
2016cited by this paper
Information Dropout: Learning Optimal Representations Through Noisy Computation
2016cited by this paper
Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula
2016cited by this paper
The mutual information in random linear estimation
2016cited by this paper
Deep Information Propagation
2016cited by this paper
Fundamental limits of symmetric low-rank matrix estimation
2016influential reference
Deep learning and the information bottleneck principle
2015cited by this paper
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
2015cited by this paper
Statistical physics of inference: thresholds and algorithms
2015influential reference
ACDC: A Structured Efficient Linear Layer
2015cited by this paper
Deep Fried Convnets
2014cited by this paper
Signal recovery using expectation consistent approximation for linear observations
2014cited by this paper
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
2014cited by this paper
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
2013cited by this paper
Concentration Inequalities: A Nonasymptotic Theory of Independence
2013cited by this paper
The Sherrington-Kirkpatrick Model
2013cited by this paper
High dimensional robust M-estimation: asymptotic variance via approximate message passing
2013cited by this paper
Support Recovery With Sparsely Sampled Free Random Matrices
2012cited by this paper
Support recovery with sparsely sampled free random matrices
2011cited by this paper
Statistical physics-based reconstruction in compressed sensing
2011cited by this paper
Hybrid Approximate Message Passing
2011influential reference
Statistical mechanics of compressed sensing.
2010cited by this paper
Generalized approximate message passing for estimation with random linear mixing
2010cited by this paper
Message-passing algorithms for compressed sensing
2009cited by this paper
A typical reconstruction limit for compressed sensing based on Lp-norm minimization
2009influential reference
Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing
2009cited by this paper
Learning from correlated patterns by simple perceptrons
2008influential reference
Efficient supervised learning in networks with binary synapses
2007cited by this paper
Perceptron capacity revisited: classification ability for correlated patterns
2007influential reference
Inference from correlated patterns: a unified theory for perceptron learning and linear vector channels
2007cited by this paper
Vector Precoding for Wireless MIMO Systems and its Replica Analysis
2007influential reference
Europhysics Letters PREPRINT
2005influential reference
Random Matrix Theory and Wireless Communications
2004cited by this paper
Spin glasses : a challenge for mathematicians : cavity and mean field models
2003cited by this paper
Information Bottleneck for Gaussian Variables
2003cited by this paper
Estimating mutual information.
2003cited by this paper
Role of the interaction matrix in mean-field spin glass models.
2002cited by this paper
A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors
2002cited by this paper
Statistical physics of spin glasses and information processing : an introduction
2001influential reference
Advanced mean field methods: theory and practice
2001cited by this paper
Information
2001cited by this paper
Tractable approximations for probabilistic models: the adaptive Thouless-Anderson-Palmer mean field approach.
2001influential reference
Statistical Mechanics of Learning
2001influential reference
The information bottleneck method
2000cited by this paper
Phase Transitions
1997influential reference
Information, physics, and computation
1996cited by this paper
Mean-field equations for spin models with orthogonal interaction matrices
1995cited by this paper
Replica field theory for deterministic models: II. A non-random spin glass with glassy behaviour
1994cited by this paper
Statistical mechanics of learning from examples.
1992cited by this paper
Three unfinished works on the optimal storage capacity of networks
1989cited by this paper
The space of interactions in neural networks: Gardner's computation with the cavity method
1989cited by this paper
The space of interactions in neural network models
1988influential reference
Optimal storage properties of neural network models
1988influential reference
Spin Glass Theory and Beyond
1987influential reference
Storing infinite numbers of patterns in a spin-glass model of neural networks.
1985cited by this paper
Solvable Model of a Spin-Glass
1975cited by this paper

CITED BY

Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information
2026cites this paper
Detecting Internal and External Simplification by Min–Max Potentiality Control for Interpreting Multi-layered Neural Networks
2026cites this paper
Learned Hallucination Detection in Black-Box LLMs using Token-level Entropy Production Rate
2025cites this paper
A Spin Glass Characterization of Neural Networks
2025cites this paper
Rethinking the Understanding Ability across LLMs through Mutual Information
2025cites this paper
Information-theoretic reduction of deep neural networks to linear models in the overparametrized proportional regime
2025cites this paper
ACMamba: Fast Unsupervised Anomaly Detection via An Asymmetrical Consensus State Space Model
2025cites this paper
Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy
2025cites this paper
Unifying Search and Recommendation with Dual-View Representation Learning in a Generative Paradigm
2025cites this paper
High-entropy Advantage in Neural Networks' Generalizability
2025cites this paper
Generalization of Knowledge Transfer With User Reviews for Cross-Domain Recommendation
2025cites this paper
GTCExplainer: Interpretable Graph Convolutional Networks for Molecular Activity Prediction
2025cites this paper
Estimating Time Series Foundation Model Transferability via In-Context Learning
2025cites this paper
AI-driven insights into B5G/6G MAC mechanisms: A comprehensive analysis
2025cites this paper
FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model
2025cites this paper
Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition
2025cites this paper
TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination
2025cites this paper
IBNorm: Information-Bottleneck Inspired Normalization for Representation Learning
2025cites this paper
LSTM-GRU-Based cGAN With Multi-Head Attention for One-Bit Channel Estimation in Multi-User Massive MIMO System
2025cites this paper
Frozen in the Middle: Hidden States Remain Unchanged Across Intermediate Layers of Language Models
2025cites this paper
Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
2025cites this paper
Interpreting Performance of Deep Neural Networks with Partial Information Decomposition
2025cites this paper
AI Enabled IoT Architecture for Remote Cardiac Monitoring: Deep Learning Driven Arrhythmia Detection and Telemedicine Deployment in Rural Sri Lanka
2025cites this paper
Nonlinear Anisotropic Diffusion-Based Channel Estimation in 5G Wireless Networks
2025cites this paper
Multi-Task Federated Split Learning Across Multi-Modal Data with Privacy Preservation
2025cites this paper
Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression
2024cites this paper
Asymptotics of Learning with Deep Structured (Random) Features
2024cites this paper
Fed-pilot: Optimizing LoRA Allocation for Efficient Federated Fine-Tuning with Heterogeneous Clients
2024cites this paper
Spiking Transformer with Spatial-Temporal Attention
2024cites this paper
Mind the information gap: How sampling and clustering impact the predictability of reach‐scale channel types in California (USA)
2024cites this paper
High-dimensional learning of narrow neural networks
2024cites this paper
Informative Subgraphs Aware Masked Auto-Encoder in Dynamic Graphs
2024cites this paper
HDnGAN: A Channel Estimation Method for Time-Varying mmWave Massive MIMO
2024cites this paper
Validity Matters: Uncertainty‐Guided Testing of Deep Neural Networks
2024cites this paper
A Survey on Information Bottleneck
2024cites this paper
D2NAS: Efficient Neural Architecture Search With Performance Improvement and Model Size Reduction for Diverse Tasks
2024cites this paper
Exploring Loss Landscapes through the Lens of Spin Glass Theory
2024cites this paper
Progress Measures for Grokking on Real-world Tasks
2024cites this paper
Progress Measures for Grokking on Real-world Datasets
2024cites this paper
GAMP or GOAMP/GVAMP Receiver in Generalized Linear Systems: Achievable Rate, Coding Principle, and Comparative Study
2024cites this paper
Entropic timescales of dynamic heterogeneity in supercooled liquid.
2023cites this paper
Channel Power Gain Estimation for Terahertz Vehicle-to-Infrastructure Networks
2023cites this paper
Carving Nature at Its Joints: A Comparison of CEMI Field Theory with Integrated Information Theory and Global Workspace Theory
2023cites this paper
IMoVR-Net: A robust interpretable network for multi-ocular lesion recognition from TAO facial images
2023cites this paper
Information Plane Analysis Visualization in Deep Learning via Transfer Entropy
2023cites this paper
Poverty improvement policies and household income: Evidence from China
2023cites this paper
Contradiction neutralization for interpreting multi-layered neural networks
2023cites this paper
A simple connection from loss flatness to compressed representations in neural networks
2023cites this paper
Deep learning for ECG Arrhythmia detection and classification: an overview of progress for period 2017–2023
2023cites this paper
Connecting NTK and NNGP: A Unified Theoretical Framework for Neural Network Learning Dynamics in the Kernel Regime
2023cites this paper
Precision fault prediction in motor bearings with feature selection and deep learning
2023cites this paper
SNIB: Improving Spike-Based Machine Learning Using Nonlinear Information Bottleneck
2023cites this paper
Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance
2023cites this paper
GiGaMAE: Generalizable Graph Masked Autoencoder via Collaborative Latent Space Reconstruction
2023cites this paper
Bounds on the rates of statistical divergences and mutual information via stochastic thermodynamics.
2023cites this paper
Fundamental limits of overparametrized shallow neural networks for supervised learning
2023cites this paper
Explicit mutual information for simple networks and neurons with lognormal activities.
2023cites this paper
An Information-Theoretic Approach to Semi-supervised Transfer Learning
2023cites this paper
Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression
2023cites this paper
High-Dimensional Smoothed Entropy Estimation via Dimensionality Reduction
2023cites this paper
LSTM-GRU Model-Based Channel Prediction for One-Bit Massive MIMO System
2023cites this paper
Neural-prior stochastic block model
2023cites this paper
Spatially heterogeneous learning by a deep student machine
2023cites this paper
Free Energy of Multi-Layer Generalized Linear Models
2023influential citation
Deterministic equivalent and error universality of deep random features learning
2023cites this paper
A study of uncertainty quantification in overparametrized high-dimensional models
2022cites this paper
Ising models of deep neural networks
2022cites this paper
k-Sliced Mutual Information: A Quantitative Study of Scalability with Dimension
2022cites this paper
Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions
2022cites this paper
A Random Matrix Network Model for the Network Teaching System of College Music Education Courses
2022cites this paper
Graph Neural Networks: Self-supervised Learning
2022cites this paper
Approximate Message Passing for Multi-Layer Estimation in Rotationally Invariant Models
2022cites this paper
Calibrating Cosmological Simulations with Implicit Likelihood Inference Using Galaxy Growth Observables
2022cites this paper
Disentangled Generative Models and Their Applications
2022cites this paper
On the generalization of learning algorithms that do not converge
2022cites this paper
A study of uncertainty quantification in overparametrized high-dimensional models
2022cites this paper
Feature selection based on a hybrid simplified particle swarm optimization algorithm with maximum separation and minimum redundancy
2022cites this paper
Mutual information based Bayesian graph neural network for few-shot learning
2022cites this paper
Soil liquefaction assessment by using hierarchical Gaussian Process model with integrated feature and instance based domain adaption for multiple data sources
2022cites this paper
The Compound Information Bottleneck Program
2022cites this paper
Bounding generalization error with input compression: An empirical study with infinite-width networks
2022cites this paper
Entropy of sharp restart
2022cites this paper
Monitoring Shortcut Learning using Mutual Information
2022cites this paper
Channel Estimation for Cell-Free Massive MIMO Using Conditional GAN
2022cites this paper
Bayes-optimal limits in structured PCA, and how to reach them
2022cites this paper
The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation?
2022cites this paper
On double-descent in uncertainty quantification in overparametrized models
2022cites this paper
The Compound Information Bottleneck Outlook
2022cites this paper
Unsupervised Learning of Geometric Sampling Invariant Representations for 3D Point Clouds
2021cites this paper
Adaptive Path Interpolation Method for Sparse Systems: Application to a Censored Block Model
2021cites this paper
Matrix inference and estimation in multi-layer models
2021cites this paper
Mutual Information of Neural Network Initialisations: Mean Field Approximations
2021cites this paper
Information Bottleneck Theory on Convolutional Neural Networks
2021cites this paper
More data or more parameters? Investigating the effect of data structure on generalization
2021cites this paper
Understanding Neural Networks with Logarithm Determinant Entropy Estimator
2021cites this paper
Information Bottleneck: Exact Analysis of (Quantized) Neural Networks
2021influential citation
Towards quantifying information flows: relative entropy in deep neural networks and the renormalization group
2021cites this paper
Ten key ICT challenges in the post-Shannon era
2021cites this paper
An information-theoretic framework for learning models of instance-independent label noise
2021cites this paper
Information Bottleneck Analysis by a Conditional Mutual Information Bound
2021cites this paper