On the Expressive Power of Deep Neural Networks

M. Raghu, Ben Poole, J. Kleinberg, S. Ganguli, Jascha Narain Sohl-Dickstein

Published 2016 in International Conference on Machine Learning

ABSTRACT

We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute. Our approach is based on an interrelated set of measures of expressivity, unified by the novel notion of trajectory length, which measures how the output of a network changes as the input sweeps along a one-dimensional path. Our findings can be summarized as follows: (1) The complexity of the computed function grows exponentially with depth. (2) All weights are not equal: trained networks are more sensitive to their lower (initial) layer weights. (3) Regularizing on trajectory length (trajectory regularization) is a simpler alternative to batch normalization, with the same performance.
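To make the notion of trajectory length concrete, the following is a minimal sketch, not the authors' implementation. It sweeps an input along a circle, pushes each point through a randomly initialised fully connected network, and sums the distances between consecutive images after each layer. The tanh nonlinearity, the Gaussian weight scale `sigma_w`, the omission of biases, and all function names are assumptions made for illustration; none of them is stated in the abstract.

```python
import math
import random


def random_layer(n_in, n_out, sigma_w, rng):
    # Assumed initialisation: weights i.i.d. N(0, sigma_w^2 / n_in), no biases.
    std = sigma_w / math.sqrt(n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def forward(layers, x):
    # Apply each weight matrix followed by tanh (an assumed nonlinearity).
    for w in layers:
        x = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return x


def trajectory_length(points):
    # Sum of Euclidean distances between consecutive points on a discretised path.
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def circle_path(dim, n_points):
    # A closed unit circle in the first two coordinates; the rest are zero.
    path = []
    for i in range(n_points + 1):
        t = 2.0 * math.pi * i / n_points
        path.append([math.cos(t), math.sin(t)] + [0.0] * (dim - 2))
    return path


def length_by_depth(width=32, depth=4, sigma_w=4.0, n_points=400, seed=0):
    # Returns the trajectory length of the input path, then of its image
    # after 1, 2, ..., depth layers of one random network.
    rng = random.Random(seed)
    layers = [random_layer(width, width, sigma_w, rng) for _ in range(depth)]
    path = circle_path(width, n_points)
    lengths = [trajectory_length(path)]
    for d in range(1, depth + 1):
        images = [forward(layers[:d], x) for x in path]
        lengths.append(trajectory_length(images))
    return lengths


if __name__ == "__main__":
    for d, length in enumerate(length_by_depth()):
        print(f"after {d} layers: trajectory length = {length:.2f}")
```

Comparing the printed lengths across layers gives a rough view of how the image of the path stretches with depth under these assumed settings; the discretisation only approximates the true curve length, so a finer `n_points` may be needed for deep or large-`sigma_w` networks.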

PUBLICATION RECORD

Publication year: 2016
Venue: International Conference on Machine Learning
Publication date: 2016-06-16
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1606.05336
Source metadata: Semantic Scholar

CITED BY

On the Topology of Neural Network Superlevel Sets
2026cites this paper
Mechanistic Interpretability of ReLU Neural Networks Through Piecewise-Affine Mapping
2026cites this paper
Continuous-Time Transformer-Based Channel Prediction With Non-Uniform Pilot Pattern
2026cites this paper
Representation Unlearning: Forgetting through Information Compression
2026cites this paper
Artificial intelligence for microbiology and microbiome research.
2026cites this paper
Effect of Convolutional Depth on Image Recognition Performance: VGG vs. ResNet vs. GoogLeNet
2026cites this paper
Mining Generalizable Activation Functions
2026cites this paper
Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry
2026cites this paper
Kolmogorov-Arnold Networks for Data-Driven, Physics-Informed, and Deep-Operator Learning: A Review, Synthesis, and New Analysis
2026cites this paper
Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training
2026cites this paper
All-Optical Deep Learning with Quantum Nonlinearity
2026cites this paper
Path Integral Optimiser: Global Optimisation via Neural Schrödinger-Föllmer Diffusion
2025cites this paper
Relating Piecewise Linear Kolmogorov Arnold Networks to ReLU Networks
2025cites this paper
Variable selection for nonparametric spatial additive autoregressive model via deep learning
2025cites this paper
Learning Koopman Observables via Kolmogorov-Arnold Networks for Power System Transient Analysis and Model Predictive Control
2025cites this paper
TetraSDF: Precise Mesh Extraction with Multi-resolution Tetrahedral Grid
2025cites this paper
Time-series forecasting with multiphoton quantum states and integrated photonics
2025influential citation
Complexity of One-Dimensional ReLU DNNs
2025cites this paper
Marching Neurons: Accurate Surface Extraction for Neural Implicit Shapes
2025cites this paper
Topological Signatures of ReLU Neural Network Activation Patterns
2025influential citation
Neuronal correlations shape the scaling behavior of memory capacity and nonlinear computational capability of reservoir recurrent neural networks
2025cites this paper
Bound on entanglement in neural quantum states
2025cites this paper
Simplex-FEM Networks (SiFEN): Learning A Triangulated Function Approximator
2025cites this paper
Multi-Scale Protein Structure Modelling with Geometric Graph U-Nets
2025cites this paper
The Interaction Bottleneck of Deep Neural Networks: Discovery, Proof, and Modulation
2025cites this paper
Schrodinger AI: A Unified Spectral-Dynamical Framework for Classification, Reasoning, and Operator-Based Generalization
2025cites this paper
Towards Higher Effective Rank in Parameter-efficient Fine-tuning using Khatri-Rao Product
2025cites this paper
A Quotient Homology Theory of Representation in Neural Networks
2025cites this paper
Few-shot learning for non-vitrified ice segmentation
2025cites this paper
Error Bound Analysis for the Regularized Loss of Deep Linear Neural Networks
2025cites this paper
Geometric learning for computational mechanics Part IV: Efficient mesh-based plasticity from a domain-specific foundation model
2025cites this paper
Machine learning prediction of biochar yield based on different classification methods
2025cites this paper
Generative Modeling of Weights: Generalization or Memorization?
2025cites this paper
The Geometry of ReLU Networks through the ReLU Transition Graph
2025cites this paper
A New Perspective To Understanding Multi-resolution Hash Encoding For Neural Fields
2025cites this paper
Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds
2025cites this paper
Physics Informed Neural Pose Estimation for Real-Time Shape Reconstruction of Soft Continuum Robots
2025cites this paper
Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
2025cites this paper
Quantifying Uncertainty in the Presence of Distribution Shifts
2025cites this paper
Unit-Centric Regularization for Efficient Deep Neural Networks
2025cites this paper
Deep Distillation Gradient Preconditioning for Inverse Problems
2025cites this paper
A Spin Glass Characterization of Neural Networks
2025cites this paper
Information processing driven by multicomponent surface condensates
2025cites this paper
Entanglement and optimization within autoregressive neural quantum states
2025cites this paper
GLAI: GreenLightningAI for Accelerated Training through Knowledge Decoupling
2025cites this paper
Designing ReLU Generative Networks to Enumerate Trees with a Given Tree Edit Distance
2025cites this paper
Topological Persistence of the Neural Embedding of the Archetypal Subspace
2025cites this paper
ViT3: Unlocking Test-Time Training in Vision
2025cites this paper
A Mutual Information-based Metric for Temporal Expressivity and Trainability Estimation in Quantum Policy Gradient Pipelines
2025cites this paper
Machine Learning-Aided Test Pattern Generation for VLSI Circuits: A Decision Tree Regressor Approach
2025cites this paper
Dual-Band Super-Resolution Channel Prediction in High-Mobility MIMO Systems
2025cites this paper
Scalable Bayesian Physics-Informed Kolmogorov-Arnold Networks
2025cites this paper
Data-driven Identification of Attractors Using Machine Learning
2025cites this paper
Robust real-time object detection and counting system for casting foundries
2025cites this paper
Interpretation and Understanding of Asphalt Crack Detection Deep Learning Models Using Integrated Gradient (I.G.) Maps
2025cites this paper
Photonic quantum convolutional neural networks with adaptive state injection
2025cites this paper
DFTQuake: Tripartite Fourier attention and dendrite network for real-time early prediction of earthquake magnitude and peak ground acceleration
2025cites this paper
Efficient deep neural network training via decreasing precision with layer capacity
2025cites this paper
The impact of allocation strategies in subset learning on the expressive power of neural networks
2025cites this paper
On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
2025cites this paper
On Space Folds of ReLU Neural Networks
2025influential citation
Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs
2025cites this paper
Explainable Bayesian deep learning through input-skip Latent Binary Bayesian Neural Networks
2025cites this paper
The Space Between: On Folding, Symmetries and Sampling
2025cites this paper
Why Prompt Design Matters and Works: A Complexity Analysis of Prompt Search Space in LLMs
2025cites this paper
Deep Radon Prior: A fully unsupervised framework for sparse-view CT reconstruction
2025cites this paper
ReLU Networks as Random Functions: Their Distribution in Probability Space
2025cites this paper
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
2025cites this paper
Nonlocal techniques for the analysis of deep ReLU neural network approximations
2025cites this paper
Hadamard Product in Deep Learning: Introduction, Advances and Challenges
2025cites this paper
Dendritic Computing with Multi-Gate Ferroelectric Field-Effect Transistors
2025cites this paper
Optimization over Trained (and Sparse) Neural Networks: A Surrogate within a Surrogate
2025cites this paper
Quantifying expressive power of time-series neural networks
2025cites this paper
L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers
2025cites this paper
The Computational Complexity of Counting Linear Regions in ReLU Neural Networks
2025cites this paper
Time to Spike? Understanding the Representational Power of Spiking Neural Networks in Discrete Time
2025cites this paper
Overcoming Multi-step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner
2025cites this paper
Comprehensive Attribute Encoding and Dynamic LSTM HyperModels for Outcome Oriented Predictive Business Process Monitoring
2025cites this paper
Research on performance prediction of a small sample centrifugal pump based on a pre-processing approach for imbalanced regression and key hyperparameters optimization
2025cites this paper
Gauge fixing for sequence-function relationships
2025cites this paper
Design of a CNN Accelerator for Multitask EEG Signal Classification Based on RISC-V
2025cites this paper
Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data
2025influential citation
The Vanishing Gradient Problem for Stiff Neural Differential Equations
2025cites this paper
Keep It Simple: Self-Adaptive Code Graph Simplification for Accurate Vulnerability Detection
2025cites this paper
JCCMTM: Joint channel-independent and channel-dependent strategy for masked multivariate time-series modeling
2025cites this paper
A Graph-Based Framework for Exploring Mathematical Patterns in Physics: A Proof of Concept
2025cites this paper
Discrete Functional Geometry of ReLU Networks via ReLU Transition Graphs
2025cites this paper
AdaPW: An adaptive point-weighting method for training physics-informed neural networks
2025cites this paper
Mixed-depth physics-informed neural network with nested activation mechanism in solving partial differential equations
2025cites this paper
On the Out-of-Distribution Backdoor Attack for Federated Learning
2025cites this paper
From Parameters to Behaviors: Unsupervised Compression of the Policy Space
2025cites this paper
Beyond Gaussian Initializations: Signal Preserving Weight Initialization for Odd-Sigmoid Activations
2025cites this paper
On residual network depth
2025cites this paper
Reproducibility of AI in Cephalometric Landmark Detection: A Preliminary Study
2025cites this paper
On the Upper Bounds of Number of Linear Regions and Generalization Error of Deep Convolutional Neural Networks
2025cites this paper
Distilling Dataset into Neural Field
2025cites this paper
Towards Contactless Data-Model Matching
2025influential citation
Dynamical Implicit Neural Representations
2025cites this paper
Neural Coherence : Find higher performance to out-of-distribution tasks from few samples
2025cites this paper
Structural network measures reveal the emergence of heavy-tailed degree distributions in lottery ticket multilayer perceptrons
2025cites this paper