Nearly Minimal Over-Parametrization of Shallow Neural Networks

Published 2019 in arXiv.org

ABSTRACT

A recent line of work has shown that an over-parametrized neural network can perfectly fit the training data, even though training is otherwise often an intractable nonconvex optimization problem. For (fully connected) shallow networks, the existing theory requires, in the best case, quadratic over-parametrization as a function of the number of training samples. This paper establishes that linear over-parametrization is sufficient to fit the training data, using a simple variant of (stochastic) gradient descent. Crucially, unlike several related works, the training considered in this paper is not limited to the lazy regime in the sense cautioned against in [1, 2]. Beyond shallow networks, the over-parametrization framework developed in this work is applicable to a variety of learning problems.
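As an illustration of the setting the abstract describes, the following is a minimal sketch of a shallow (one-hidden-layer) network whose width grows linearly with the number of training samples, trained to fit random labels. It is not the paper's algorithm: it uses plain full-batch gradient descent rather than the variant the paper analyzes, and the tanh activation, the unit-norm inputs, the initialization scales, and the names `width_factor`, `train`, and `gradient_step` are all assumptions made for this example. Both layers are updated, so the hidden weights are free to move away from their initialization.

```python
import math
import random


def init_network(width, dim, rng):
    """Random hidden weights W (width x dim) and output weights v (width)."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
         for _ in range(width)]
    v = [rng.gauss(0.0, 1.0 / math.sqrt(width)) for _ in range(width)]
    return W, v


def forward(W, v, x):
    """Return the hidden activations and the scalar output for input x."""
    h = [math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for w in W]
    return h, sum(vj * hj for vj, hj in zip(v, h))


def loss(W, v, X, y):
    """Mean of the half squared errors over the training set."""
    total = sum(0.5 * (forward(W, v, x)[1] - t) ** 2 for x, t in zip(X, y))
    return total / len(X)


def gradient_step(W, v, X, y, lr):
    """One full-batch gradient descent step on both layers (in place)."""
    n = len(X)
    grad_W = [[0.0] * len(W[0]) for _ in W]
    grad_v = [0.0] * len(v)
    for x, t in zip(X, y):
        h, out = forward(W, v, x)
        r = (out - t) / n  # residual, scaled by 1/n for the mean loss
        for j in range(len(v)):
            grad_v[j] += r * h[j]
            s = r * v[j] * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
            for i in range(len(x)):
                grad_W[j][i] += s * x[i]
    for j in range(len(v)):
        v[j] -= lr * grad_v[j]
        for i in range(len(W[j])):
            W[j][i] -= lr * grad_W[j][i]


def train(n=8, dim=4, width_factor=4, steps=500, lr=0.02, seed=0):
    """Fit n random +/-1 labels with a network of width width_factor * n."""
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        X.append([xi / norm for xi in x])  # unit-norm inputs (assumption)
    y = [rng.choice([-1.0, 1.0]) for _ in range(n)]
    W, v = init_network(width_factor * n, dim, rng)  # width linear in n
    history = [loss(W, v, X, y)]
    for _ in range(steps):
        gradient_step(W, v, X, y, lr)
        history.append(loss(W, v, X, y))
    return history


if __name__ == "__main__":
    history = train()
    print("initial loss:", history[0])
    print("final loss:  ", history[-1])
```

Raising `width_factor` or `steps` lets one observe informally how the training loss responds to more hidden units per sample. The sketch makes no claim about the constants or rates established in the paper.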

PUBLICATION RECORD

Publication year
2019
Venue
arXiv.org
Publication date
2019-10-09
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1910.03948
Source metadata
Semantic Scholar

REFERENCES

On the Power and Limitations of Random Features for Understanding Neural Networks
2019, influential reference
Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network
2019, cited by this paper
Fast and Provable ADMM for Learning with Generative Priors
2019, cited by this paper
Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks
2019, cited by this paper
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
2019, cited by this paper
Trainability of ReLU Networks and Data-dependent Initialization
2019, cited by this paper
Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes
2019, cited by this paper
Trainability and Data-dependent Initialization of Over-parameterized ReLU Neural Networks
2019, cited by this paper
An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints
2019, cited by this paper
An Improved Analysis of Training Over-parameterized Deep Neural Networks
2019, cited by this paper
Quadratic Suffices for Over-parametrization via Matrix Chernoff Bound
2019, cited by this paper
On Learning Over-parameterized Neural Networks: A Functional Approximation Prospective
2019, cited by this paper
Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks
2019, cited by this paper
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
2018, cited by this paper
On the Power of Over-parametrization in Neural Networks with Quadratic Activation
2018, cited by this paper
Memorization Precedes Generation: Learning Unsupervised GANs with Memory Networks
2018, cited by this paper
A mean field view of the landscape of two-layer neural networks
2018, cited by this paper
Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error
2018, cited by this paper
Learning One-hidden-layer ReLU Networks via Gradient Descent
2018, cited by this paper
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
2018, influential reference
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
2018, cited by this paper
Spurious Valleys in Two-layer Neural Network Optimization Landscapes
2018, cited by this paper
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
2018, cited by this paper
Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
2018, cited by this paper
A Convergence Theory for Deep Learning via Over-Parameterization
2018, influential reference
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
2018, cited by this paper
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
2018, cited by this paper
Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?
2018, influential reference
Mean field analysis of neural networks: A central limit theorem
2018, cited by this paper
Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys
2018, cited by this paper
Numerical Optimization
2018, cited by this paper
On Lazy Training in Differentiable Programming
2018, influential reference
SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data
2017, cited by this paper
Semi-supervised Learning with GANs: Manifold Invariance with Improved Inference
2017, cited by this paper
An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis
2017, cited by this paper
Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels
2017, cited by this paper
Gradient Descent Can Take Exponential Time to Escape Saddle Points
2017, cited by this paper
Spurious Local Minima are Common in Two-Layer ReLU Neural Networks
2017, cited by this paper
Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
2017, cited by this paper
Improved Training of Wasserstein GANs
2017, cited by this paper
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
2016, cited by this paper
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
2015, cited by this paper
Empirical Evaluation of Rectified Activations in Convolutional Network
2015, cited by this paper
Understanding Machine Learning From Theory to Algorithms 1st Edition Shwartz Solutions Manual
2015, cited by this paper
Introductory Lectures on Convex Optimization - A Basic Course
2014, cited by this paper
Understanding Machine Learning - From Theory to Algorithms
2014, cited by this paper
Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition
2013, cited by this paper
Introduction to the non-asymptotic analysis of random matrices
2010, cited by this paper
Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning
2008, cited by this paper
Random Features for Large-Scale Kernel Machines
2007, cited by this paper
Numerical Optimization (Springer Series in Operations Research and Financial Engineering)
2000, cited by this paper
Incorporating Second-Order Functional Knowledge for Better Option Pricing
2000, cited by this paper
Semianalytic and subanalytic sets
1988, cited by this paper

CITED BY

Convergence Analysis of Neural Networks
2019, cites this paper