Stochastic Subspace Descent

D. Kozak, Stephen Becker, A. Doostan, L. Tenorio

Published 2019 in arXiv: Optimization and Control

ABSTRACT

We present two stochastic descent algorithms that apply to unconstrained optimization and are particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained optimization and machine learning problems. The basic algorithm projects the gradient onto a random subspace at each iteration, similar to coordinate descent but without restricting directional derivatives to be along the axes. This algorithm is previously known but we provide new analysis. We also extend the popular SVRG method to this framework but without requiring that the objective function be written as a finite sum. We provide proofs of convergence for our methods under various convexity assumptions and show favorable results when compared to gradient descent and BFGS on non-convex problems from the machine learning and shape optimization literature. We also note that our analysis gives a proof that the iterates of SVRG and several other popular first-order stochastic methods, in their original formulation, converge almost surely to the optimum; to our knowledge, prior to this work the iterates of SVRG had only been known to converge in expectation.
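The basic iteration described in the abstract (project the gradient onto a random subspace, then step along the projected direction) can be illustrated with a minimal sketch. This is not the authors' code. The function names `random_subspace` and `subspace_descent`, the forward finite-difference estimate of the directional derivatives, the QR-based sampling of the subspace, the scaling by sqrt(d / ell), and the step size in the example are all assumptions made for illustration.

```python
import numpy as np


def random_subspace(d, ell, rng):
    """Return a d-by-ell matrix with orthogonal columns spanning a random subspace.

    Assumption: columns are scaled by sqrt(d / ell) so that P @ P.T averages
    to the identity over random draws.
    """
    G = rng.standard_normal((d, ell))
    Q, _ = np.linalg.qr(G)  # reduced QR: Q has shape (d, ell), orthonormal columns
    return np.sqrt(d / ell) * Q


def subspace_descent(f, x0, ell, step, iters, fd_eps=1e-6, seed=0):
    """Hypothetical sketch of descent along randomly projected gradients.

    Only function values of f are used: each iteration needs ell + 1
    evaluations to estimate the ell directional derivatives.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    for _ in range(iters):
        P = random_subspace(d, ell, rng)
        fx = f(x)
        # Forward differences approximate P.T @ grad f(x), one column at a time.
        g = np.array(
            [(f(x + fd_eps * P[:, j]) - fx) / fd_eps for j in range(ell)]
        )
        # Step along the projected gradient P @ (P.T @ grad f(x)).
        x = x - step * (P @ g)
    return x


# Example on a simple quadratic (illustrative choice of objective and step size).
f = lambda x: 0.5 * float(x @ x)
x_final = subspace_descent(f, np.ones(20), ell=4, step=4 / 20, iters=200)
print(f(x_final))
```

Unlike coordinate descent, the directions here are not aligned with the axes. Choosing `ell` trades the cost per iteration (number of function evaluations) against the progress made per step.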

PUBLICATION RECORD

Publication year
2019
Venue
arXiv: Optimization and Control
Publication date
2019-04-01
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1904.01145
Source metadata
Semantic Scholar

REFERENCES

Nonlinear Programming
2021influential reference
Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods
2018cited by this paper
SEGA: Variance Reduction via Gradient Sketching
2018cited by this paper
Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization
2018cited by this paper
Numerical Optimization
2018cited by this paper
Sampled Tikhonov regularization for large linear inverse problems
2018influential reference
Structured Evolution with Compact Architectures for Scalable Policy Optimization
2018cited by this paper
An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization
2018cited by this paper
Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches
2018cited by this paper
An Introduction to Data Analysis and Uncertainty Quantification for Inverse Problems
2017influential reference
Randomized Similar Triangles Method: A Unifying Framework for Accelerated Randomized Optimization Methods (Coordinate Descent, Directional Search, Derivative-Free Method)
2017cited by this paper
Randomized Truncated SVD Levenberg‐Marquardt Approach to Geothermal Natural State and History Matching
2017cited by this paper
Why Are Big Data Matrices Approximately Low Rank?
2017cited by this paper
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017cited by this paper
Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
2017cited by this paper
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
2016influential reference
Stochastic Block BFGS: Squeezing More Curvature out of Data
2016cited by this paper
A Multi-Batch L-BFGS Method for Machine Learning
2016cited by this paper
Barzilai-Borwein Step Size for Stochastic Gradient Descent
2016cited by this paper
Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization
2016cited by this paper
Practical Sketching Algorithms for Low-Rank Matrix Approximation
2016influential reference
Cyclic Coordinate-Update Algorithms for Fixed-Point Problems: Analysis and Applications
2016cited by this paper
Optimization Methods for Large-Scale Machine Learning
2016cited by this paper
Goal-Oriented Optimal Approximations of Bayesian Linear Inverse Problems
2016cited by this paper
Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition
2015cited by this paper
ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates
2015cited by this paper
Random Gradient-Free Minimization of Convex Functions
2015cited by this paper
A Linearly-Convergent Stochastic L-BFGS Algorithm
2015cited by this paper
A Universal Catalyst for First-Order Optimization
2015cited by this paper
Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection
2015cited by this paper
Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling
2015cited by this paper
Coordinate descent algorithms
2015influential reference
Randomized Derivative-Free Optimization of Noisy Convex Functions
2015cited by this paper
Randomized sketches of convex programs with sharp guarantees
2014cited by this paper
Introductory Lectures on Convex Optimization - A Basic Course
2014influential reference
Likelihood-informed dimension reduction for nonlinear inverse problems
2014cited by this paper
Finito: A faster, permutable incremental gradient method for big data problems
2014cited by this paper
Scalable and efficient algorithms for the propagation of uncertainty from data through inference to prediction for large-scale problems, with application to flow of the Antarctic ice sheet
2014cited by this paper
A Stochastic Quasi-Newton Method for Large-Scale Optimization
2014cited by this paper
SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
2014cited by this paper
RES: Regularized Stochastic BFGS Algorithm
2014cited by this paper
Optimal Low-rank Approximations of Bayesian Linear Inverse Problems
2014cited by this paper
Global convergence of online limited memory BFGS
2014cited by this paper
DANCo: An intrinsic dimensionality estimator exploiting angle and norm concentration
2014cited by this paper
Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming
2013cited by this paper
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
2013influential reference
Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations
2013cited by this paper
Optimization with First-Order Surrogate Functions
2013cited by this paper
Testing the Manifold Hypothesis
2013cited by this paper
A Computational Framework for Infinite-Dimensional Bayesian Inverse Problems, Part II: Stochastic Newton MCMC with Application to Ice Sheet Flow Inverse Problems
2013cited by this paper
A Computational Framework for Infinite-Dimensional Bayesian Inverse Problems Part I: The Linearized Case, with Application to Global Seismic Inversion
2013cited by this paper
An Effective Method for Parameter Estimation with PDE Constraints with Multiple Right-Hand Sides
2012cited by this paper
Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book
2012cited by this paper
Discrete Adjoint-Based Design for Unsteady Turbulent Flows on Dynamic Overset Unstructured Grids
2012cited by this paper
Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems
2012influential reference
Numerical methods for A-optimal designs with a sparsity constraint for ill-posed inverse problems
2012cited by this paper
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
2012cited by this paper
Linear and Nonlinear Inverse Problems with Practical Applications
2012cited by this paper
Optimization of Convex Functions with Random Pursuit
2011cited by this paper
Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
2011cited by this paper
Fast Algorithms for Bayesian Uncertainty Quantification in Large-Scale Linear Inverse Problems Based on Low-Rank Partial Hessian Approximations
2011cited by this paper
Randomized Hessian estimation and directional search
2011cited by this paper
Optimal Experimental Design for the Large‐Scale Nonlinear Ill‐Posed Problem of Impedance Imaging
2010cited by this paper
A Randomized Cutting Plane Method with Probabilistic Geometric Convergence
2010cited by this paper
Convex optimization
2010influential reference
Minimal Repetition Dynamic Checkpointing Algorithm for Unsteady Adjoint Calculation
2009cited by this paper
Introduction to Derivative-Free Optimization
2009cited by this paper
Variational Learning of Inducing Variables in Sparse Gaussian Processes
2009influential reference
Random Search Methods
2009influential reference
Lessons from the Netflix prize challenge
2007cited by this paper
A Stochastic Quasi-Newton Method for Online Convex Optimization
2007cited by this paper
How to generate random matrices from the classical compact groups
2006cited by this paper
Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent
2006cited by this paper
Parallel Lagrange-Newton-Krylov-Schur Methods for PDE-Constrained Optimization. Part II: The Lagrange-Newton Solver and Its Application to Optimal Control of Steady Viscous Flows
2005cited by this paper
Sparse Gaussian Processes using Pseudo-inputs
2005influential reference
Parallel Lagrange-Newton-Krylov-Schur Methods for PDE-Constrained Optimization. Part I: The Krylov-Schur Solver
2005cited by this paper
Γ and B
2004cited by this paper
Solving convex programs by random walks
2004cited by this paper
Maximum Likelihood Estimation of Intrinsic Dimension
2004influential reference
Gaussian Processes in Machine Learning
2003cited by this paper
Large-Scale PDE-Constrained Optimization: An Introduction
2003cited by this paper
Manifold Parzen Windows
2002influential reference
Completely Derandomized Self-Adaptation in Evolution Strategies
2001cited by this paper
Efficient reservoir history matching using subspace vectors
2001cited by this paper
Evaluating derivatives - principles and techniques of algorithmic differentiation, Second Edition
2000cited by this paper
Using the Nyström Method to Speed Up Kernel Machines
2000cited by this paper
A Course in Large Sample Theory
1996cited by this paper
Large-Scale Inverse Problems and Quantification of Uncertainty
1994cited by this paper
On the convergence of the coordinate descent method for convex differentiable minimization
1992cited by this paper
Numerical techniques for stochastic optimization
1988cited by this paper
Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion
1987cited by this paper
Perspectives in flow control and optimization
1987cited by this paper
Optimization by Simulated Annealing
1983cited by this paper
A method for solving the convex programming problem with convergence rate O(1/k^2)
1983cited by this paper
Minimization by Random Search Techniques
1981cited by this paper
On search directions for minimization algorithms
1973cited by this paper
Some methods of speeding up the convergence of iteration methods
1964cited by this paper
Minimizing Certain Convex Functions
1963cited by this paper
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization
year unknowncited by this paper

CITED BY

Quantum circuit design from a retraction-based Riemannian optimization framework
2026cites this paper
Parameter-Efficient Subspace Optimization for LLM Fine-Tuning
2025influential citation
On the convergence of stochastic variance reduced gradient for linear inverse problems
2025cites this paper
A Structured Proximal Stochastic Variance Reduced Zeroth-order Algorithm
2025cites this paper
Learning long range dependencies through time reversal symmetry breaking
2025cites this paper
Stochastic Subspace Descent Accelerated via Bi-fidelity Line Search
2025influential citation
A stochastic gradient descent algorithm with random search directions
2025influential citation
On Decentralized Learning with Stochastic Subspace Descent
2025influential citation
Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining
2025cites this paper
Second-order forward-mode optimization of recurrent neural networks for neuroscience
2024cites this paper
Cubic regularized subspace Newton for non-convex optimization
2024cites this paper
Augmenting Subspace Optimization Methods with Linear Bandits
2024cites this paper
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
2024cites this paper
Memory-Efficient LLM Training with Online Subspace Descent
2024cites this paper
Gradient Descent with Low-Rank Objective Functions
2023cites this paper
Low-Rank Gradient Descent
2023cites this paper
Gradient Descent for Low-Rank Functions
2022cites this paper
Global Solutions to Nonconvex Problems by Evolution of Hamilton-Jacobi PDEs
2022cites this paper
Randomised subspace methods for non-convex optimization, with applications to nonlinear least-squares
2022cites this paper
A Hamilton–Jacobi-based proximal operator
2022cites this paper
Zeroth-order optimization with orthogonal random directions
2021cites this paper
Global optimization using random embeddings
2021cites this paper
Communication-efficient Subspace Methods for High-dimensional Federated Learning
2021cites this paper
Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to Finite-Sum Problems
2020cites this paper
Optimization by moving ridge functions: derivative-free optimization for computationally intensive functions
2020cites this paper
Stochastic Subspace Cubic Newton Method
2020influential citation
Constrained global optimization of functions with low effective dimensionality using multiple random embeddings
2020cites this paper
Optimization for Supervised Machine Learning: Randomized Algorithms for Data and Parameters
2020influential citation
Intraday Load Forecasts with Uncertainty
2019cites this paper
TRACE: Tennessee Research and Creative Exchange
year unknowncites this paper