Convergence diagnostics for stochastic gradient descent with constant step size

Published 2017 in arXiv.org

ABSTRACT

Many iterative procedures in stochastic optimization exhibit a transient phase followed by a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in that region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect this phase transition in the context of stochastic gradient descent with constant learning rate. We present theory and experiments suggesting that the region where the proposed diagnostic is activated coincides with the convergence region. For a class of loss functions, we derive a closed-form solution describing this region. Finally, we suggest an application to speed up convergence of stochastic gradient descent by halving the learning rate each time stationarity is detected. This leads to a new variant of stochastic gradient descent, which in many settings is comparable to the state of the art.
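
The halving scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not name the test statistic, so the code below assumes one common choice: a running sum of inner products of successive stochastic gradients, which tends to grow during the transient phase and drift negative once the iterates oscillate around the optimum. The function names `make_data` and `sgd_half`, the `burn_in` parameter, and the least-squares setting are all hypothetical choices made for illustration, not taken from the paper.

```python
import random


def make_data(n, theta_true, noise_sd, seed=0):
    """Synthetic linear-regression data: y = <theta_true, x> + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta_true]
        y = sum(t * xi for t, xi in zip(theta_true, x)) + rng.gauss(0.0, noise_sd)
        data.append((x, y))
    return data


def sgd_half(data, theta0, lr, n_iter, burn_in=100, seed=1):
    """Constant-step SGD on squared loss; halve lr whenever the assumed
    stationarity diagnostic fires. Returns (theta, final lr, number of halvings)."""
    rng = random.Random(seed)
    theta = list(theta0)
    prev_grad = None      # stochastic gradient from the previous step
    running_sum = 0.0     # assumed diagnostic: sum of <g_{n-1}, g_n>
    steps = 0             # steps taken since the last halving
    halvings = 0
    for _ in range(n_iter):
        x, y = data[rng.randrange(len(data))]
        resid = sum(t * xi for t, xi in zip(theta, x)) - y
        grad = [resid * xi for xi in x]          # gradient of 0.5 * resid**2
        theta = [t - lr * g for t, g in zip(theta, grad)]
        if prev_grad is not None:
            running_sum += sum(a * b for a, b in zip(prev_grad, grad))
        prev_grad = grad
        steps += 1
        # Declare stationarity once the running sum is negative after burn-in.
        if steps > burn_in and running_sum < 0.0:
            lr /= 2.0
            halvings += 1
            running_sum, steps, prev_grad = 0.0, 0, None
    return theta, lr, halvings


if __name__ == "__main__":
    data = make_data(2000, [1.0, -2.0], noise_sd=0.5)
    theta, lr, halvings = sgd_half(data, [0.0, 0.0], lr=0.1, n_iter=20000)
    print(theta, lr, halvings)
```

The burn-in guards against firing on the first few noisy inner products after each restart; its value here is arbitrary and would need tuning in practice.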

PUBLICATION RECORD

Publication year: 2017
Venue: arXiv.org
Publication date: 2017-10-17
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1710.06382
Source metadata: Semantic Scholar

REFERENCES

Gradient Diversity: a Key Ingredient for Scalable Distributed Learning (2017; cited by this paper)
Optimization Methods for Large-Scale Machine Learning (2016; cited by this paper)
Towards Stability and Optimality in Stochastic Gradient Descent (2015; cited by this paper)
Scalable estimation strategies based on stochastic approximations: classical results and new insights (2015; cited by this paper)
Statistical analysis of stochastic gradient methods for generalized linear models (2014; influential reference)
A Proximal Stochastic Gradient Method with Progressive Variance Reduction (2014; cited by this paper)
Convergence of Stochastic Proximal Gradient Algorithm (2014; cited by this paper)
Asymptotic and finite-sample properties of estimators based on stochastic gradients (2014; cited by this paper)
Stochastic approximation (2013; cited by this paper)
Stochastic Approximation approach to Stochastic Programming (2013; cited by this paper)
Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm (2013; cited by this paper)
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction (2013; influential reference)
Proximal Algorithms (2013; cited by this paper)
Stochastic Gradient Descent Tricks (2012; cited by this paper)
Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent (2011; influential reference)
Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning (2011; cited by this paper)
Implicit Online Learning (2010; cited by this paper)
Large-Scale Machine Learning with Stochastic Gradient Descent (2010; influential reference)
Solving large scale linear prediction problems using stochastic gradient descent algorithms (2004; cited by this paper)
Generalized Linear Models (2002; cited by this paper)
Beating the hold-out: bounds for K-fold and progressive cross-validation (1999; cited by this paper)
A Statistical Study on On-line Learning (1999; cited by this paper)
Accelerated Stochastic Approximation (1993; cited by this paper)
Gradient estimates for the performance of Markov chains and discrete event processes (1992; cited by this paper)
Adaptive Algorithms and Stochastic Approximations (1990; cited by this paper)
Non-asymptotic confidence bounds for stochastic approximation algorithms with constant step size (1990; influential reference)
Stepsize Rules, Stopping Times and their Implementation in Stochastic Quasigradient Algorithms (1988; cited by this paper)
Efficient Estimations from a Slowly Convergent Robbins-Monro Process (1988; cited by this paper)
Numerical techniques for stochastic optimization (1988; cited by this paper)
Stochastic Approximation (1969; cited by this paper)
Accelerated Stochastic Approximation (1958; cited by this paper)
A Stochastic Approximation Method (1951; cited by this paper)
The p-Norm Generalization of the LMS Algorithm for Adaptive Filtering (year unknown; cited by this paper)
Incremental Proximal Methods for Large Scale Convex Optimization (year unknown; cited by this paper)

CITED BY

Tight Analysis of Decentralized SGD: A Markov Chain Perspective (2026; cites this paper)
AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent (2025; influential citation)
Refined Analysis of Federated Averaging and Federated Richardson-Romberg (2024; cites this paper)
Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent (2024; cites this paper)
Stochastic variational inference for scalable non-stationary Gaussian process regression (2023; cites this paper)
Drill the Cork of Information Bottleneck by Inputting the Most Important Data (2021; cites this paper)
Path integral contour deformations for observables in SU(N) gauge theory (2021; cites this paper)
A Simulation-Based Prediction Framework for Stochastic System Dynamic Risk Management (2018; cites this paper)
HiGrad: Uncertainty Quantification for Online Learning and Stochastic Approximation (2018; influential citation)
Proceedings of the 2018 Winter Simulation Conference (2018; cites this paper)
Yes, but Did It Work?: Evaluating Variational Inference (2018; cites this paper)
Uncertainty Quantification for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent (2018; cites this paper)
On the information bottleneck theory of deep learning (2018; cites this paper)
Bridging the gap between constant step size stochastic gradient descent and Markov chains (2017; cites this paper)