Learning ReLUs via Gradient Descent

M. Soltanolkotabi

Published 2017 in Neural Information Processing Systems

ABSTRACT

In this paper we study the problem of learning Rectified Linear Units (ReLUs), which are functions of the form $x \mapsto \max(0, \langle w, x \rangle)$ with $w$ denoting the weight vector. We study this problem in the high-dimensional regime where the number of observations is smaller than the dimension of the weight vector. We assume that the weight vector belongs to some closed set (convex or nonconvex) which captures known side information about its structure. We focus on the realizable model where the inputs are chosen i.i.d.~from a Gaussian distribution and the labels are generated according to a planted weight vector. We show that projected gradient descent, when initialized at $0$, converges at a linear rate to the planted model with a number of samples that is optimal up to numerical constants. Our results on the convergence dynamics of these very shallow neural nets may provide some insights towards understanding the dynamics of deeper architectures.
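
As a concrete illustration of the setup described in the abstract, below is a minimal NumPy sketch of projected gradient descent for a single ReLU in the realizable Gaussian model. The planted weight vector is taken to be s-sparse and the projection is hard thresholding, which is just one example of a closed (here nonconvex) constraint set; the step size, iteration count, problem sizes, and the convention of setting the ReLU derivative at zero to 1/2 are ad hoc choices for illustration, not the paper's exact parameters.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def project_sparse(w, s):
        # Keep the s largest-magnitude entries: one example of projecting
        # onto a closed nonconvex set (here, the set of s-sparse vectors).
        out = w.copy()
        out[np.argsort(np.abs(w))[:-s]] = 0.0
        return out

    def projected_gd(X, y, s, step=1.0, iters=200):
        # Projected gradient descent on the squared loss, initialized at 0.
        # The ReLU "derivative" at 0 is taken to be 1/2 via (1 + sign(z))/2
        # so that the first step away from w = 0 is nonzero (an ad hoc
        # convention for this sketch).
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(iters):
            z = X @ w
            grad = X.T @ ((relu(z) - y) * (1.0 + np.sign(z)) / 2.0) / n
            w = project_sparse(w - step * grad, s)
        return w

    # Realizable model: i.i.d. Gaussian inputs, labels generated by a
    # planted s-sparse weight vector (hypothetical sizes for illustration).
    rng = np.random.default_rng(0)
    n, d, s = 200, 1000, 10
    w_star = np.zeros(d)
    w_star[:s] = rng.standard_normal(s)
    X = rng.standard_normal((n, d))
    y = relu(X @ w_star)

    w_hat = projected_gd(X, y, s)
    print("relative error:", np.linalg.norm(w_hat - w_star) / np.linalg.norm(w_star))

In this noiseless sketch the relative error should shrink geometrically with the iteration count, mirroring the linear convergence rate the paper establishes, provided the number of samples is large enough relative to the sparsity level.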

PUBLICATION RECORD

  • Publication year

    2017

  • Venue

    Neural Information Processing Systems

  • Publication date

    2017-05-10

  • Fields of study

    Mathematics, Computer Science

  • Source metadata

    Semantic Scholar
