Neural Optimizer Search with Reinforcement Learning

Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le

Published 2017 in International Conference on Machine Learning

ABSTRACT

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.
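The abstract names PowerSign and AddSign but does not state their update equations. The sketch below uses the forms in which these two rules are commonly stated: PowerSign scales the gradient by base^(sign(g) * sign(m)) and AddSign scales it by (alpha + sign(g) * sign(m)), where m is a running average of the gradient. The function names, the default values (lr, base = e, alpha = 1, beta = 0.9), and the choice to update m before taking its sign are illustrative assumptions, not details taken from this record.

```python
import math


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def running_average(m, g, beta):
    """Exponential moving average of the gradient (assumed form)."""
    return [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]


def power_sign_step(w, g, m, lr=0.1, base=math.e, beta=0.9):
    """One PowerSign-style step on lists of floats; returns (new_w, new_m).

    The step grows by a factor of `base` when gradient and running average
    agree in sign, and shrinks by the same factor when they disagree.
    """
    new_m = running_average(m, g, beta)
    new_w = [wi - lr * base ** (sign(gi) * sign(mi)) * gi
             for wi, gi, mi in zip(w, g, new_m)]
    return new_w, new_m


def add_sign_step(w, g, m, lr=0.1, alpha=1.0, beta=0.9):
    """One AddSign-style step on lists of floats; returns (new_w, new_m).

    With alpha = 1 the step doubles when the signs agree and vanishes
    when they disagree.
    """
    new_m = running_average(m, g, beta)
    new_w = [wi - lr * (alpha + sign(gi) * sign(mi)) * gi
             for wi, gi, mi in zip(w, g, new_m)]
    return new_w, new_m


if __name__ == "__main__":
    # Toy problem (hypothetical): minimize f(w) = sum(w_i ** 2), gradient 2 * w.
    w, m = [1.0, -2.0], [0.0, 0.0]
    for _ in range(10):
        g = [2.0 * wi for wi in w]
        w, m = power_sign_step(w, g, m)
    print("loss after 10 PowerSign steps:", sum(wi * wi for wi in w))
```

Both rules use only the sign agreement between the current gradient and its running average, so they need one extra buffer per parameter, the same as SGD with momentum.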

PUBLICATION RECORD

Publication year
2017
Venue
International Conference on Machine Learning
Publication date
2017-08-06
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1709.07417
Source metadata
Semantic Scholar

REFERENCES

Learning to Optimize Neural Nets
2017, cited by this paper
Google Vizier: A Service for Black-Box Optimization
2017, cited by this paper
On the State of the Art of Evaluation in Neural Language Models
2017, cited by this paper
Proximal Policy Optimization Algorithms
2017, influential reference
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
2017, cited by this paper
Learning Transferable Architectures for Scalable Image Recognition
2017, influential reference
Learned Optimizers that Scale and Generalize
2017, cited by this paper
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
2016, influential reference
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
2016, influential reference
Distributed Second-Order Optimization using Kronecker-Factored Approximations
2016, cited by this paper
Understanding deep learning requires rethinking generalization
2016, cited by this paper
TensorFlow: A system for large-scale machine learning
2016, cited by this paper
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
2016, cited by this paper
Learning to Optimize
2016, cited by this paper
Using the Output Embedding to Improve Language Models
2016, cited by this paper
The evolution of a generalized neural learning rule
2016, cited by this paper
Wide Residual Networks
2016, influential reference
Designing Neural Network Architectures using Reinforcement Learning
2016, cited by this paper
Optimization as a Model for Few-Shot Learning
2016, cited by this paper
Learning to learn by gradient descent by gradient descent
2016, influential reference
SGDR: Stochastic Gradient Descent with Restarts
2016, influential reference
Neural Architecture Search with Reinforcement Learning
2016, influential reference
SGDR: Stochastic Gradient Descent with Warm Restarts
2016, cited by this paper
Trust Region Policy Optimization
2015, cited by this paper
Adding Gradient Noise Improves Learning for Very Deep Networks
2015, cited by this paper
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
2015, cited by this paper
Adam: A Method for Stochastic Optimization
2014, cited by this paper
Revisiting Natural Gradient for Deep Networks
2013, cited by this paper
ADADELTA: An Adaptive Learning Rate Method
2012, cited by this paper
Large Scale Distributed Deep Networks
2012, influential reference
No more pesky learning rates
2012, cited by this paper
Training Deep and Recurrent Networks with Hessian-Free Optimization
2012, cited by this paper
On the difficulty of training recurrent neural networks
2012, cited by this paper
Efficient BackProp
2012, cited by this paper
On optimization methods for deep learning
2011, cited by this paper
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
2011, influential reference
Recurrent neural network based language model
2010, cited by this paper
Deep learning via Hessian-free optimization
2010, cited by this paper
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
2004, cited by this paper
Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
2002, cited by this paper
Evolution and design of distributed learning rules
2000, cited by this paper
Long Short-Term Memory
1997, cited by this paper
Use of genetic programming for the search of a new learning rule for neural networks
1994, cited by this paper
Building a Large Annotated Corpus of English: The Penn Treebank
1993, cited by this paper
RPROP - A Fast Adaptive Learning Algorithm
1992, cited by this paper
Steps Towards 'Self-Referential' Neural Learning: A Thought Experiment; CU-CS-627-92
1992, cited by this paper
On the limited memory BFGS method for large scale optimization
1989, cited by this paper
Learning curves for stochastic gradient descent in linear feedforward networks
year unknown, cited by this paper

CITED BY

Artificial neural network simulation with numerical computation for targeted modulation of blood cells in hemodynamically stenosed anisotropic artery
2026, cites this paper
Neural network integrated physics driven deep learning model for the thermal response analysis of the pyramidal fin with internal heat generation
2026, cites this paper
Riemannian meta-optimization for transmit-receive joint design towards smeared spectrum jamming suppression
2026, cites this paper
Learning to Discover Iterative Spectral Algorithms
2026, cites this paper
An Efficient Neural Architecture Search Algorithm for AutoEncoder Optimization - A Systematic Literature Review
2025, cites this paper
Shape factor effect on the thermal variation of a wavy fin wetted by ternary hybrid nanofluid using an extended physics-informed Laguerre neural network
2025, cites this paper
Power of Generalized Smoothness in Stochastic Convex Optimization: First- and Zero-Order Algorithms
2025, cites this paper
MoE-TransDLD: A Transformer-Driven Mixture of Experts for Cyber-Attack Detection in Power Systems
2025, cites this paper
A Quasi-Newton Method for a Mean Variance Estimation Network
2025, cites this paper
An Adaptive Penalized Weighted Least Squared Approach for Detecting and Mitigating Cyberattacks on Dynamic State Estimation
2025, cites this paper
Prediction of Animal Vocal Emotions using Convolutional Neural Network
2025, cites this paper
Evolving Deep Learning Optimizers
2025, cites this paper
Efficient Eye-based Emotion Recognition via Neural Architecture Search of Time-to-First-Spike-Coded Spiking Neural Networks
2025, cites this paper
Let the Optimizers Optimize Themselves
2025, cites this paper
Differentiable Evolutionary Reinforcement Learning
2025, cites this paper
PCAN: A Pandemic-Compatible Attentive Neural Network for Retail Sales Forecasting
2025, cites this paper
Optimized Physics-Informed Neural Networks for Deciphering of External Source Pollutants in a Swirling Flow Induced by a Constant Torsional Motion
2025, cites this paper
Efficient End-to-End Learning for Decision-Making: A Meta-Optimization Approach
2025, cites this paper
Sequential Policy Gradient for Adaptive Hyperparameter Optimization
2025, cites this paper
On the Duality between Gradient Transformations and Adapters
2025, cites this paper
A Trainable Optimizer
2025, cites this paper
Deep Reinforcement Learning design of safe, stable and robust control for sloshing-affected space launch vehicles
2025, cites this paper
Learning to Reason from Feedback at Test-Time
2025, cites this paper
New perspectives for the intelligent rolling stock classification in railways: an artificial neural networks-based approach
2024, cites this paper
Ranking‐based architecture generation for surrogate‐assisted neural architecture search
2024, cites this paper
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
2024, cites this paper
Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution
2024, influential citation
Smart urban windcatcher: Conception of an AI-empowered wind-channeling system for real-time enhancement of urban wind environment
2024, cites this paper
A Novel Gradient Descent Optimizer based on Fractional Order Scheduler and its Application in Deep Neural Networks
2024, cites this paper
Neural Loss Function Evolution for Large-Scale Image Classifier Convolutional Neural Networks
2024, cites this paper
tnGPS: Discovering Unknown Tensor Network Structure Search Algorithms via Large Language Models (LLMs)
2024, cites this paper
Dynamic Memory Based Adaptive Optimization
2024, cites this paper
Decentralized Adaptive TD(λ) Learning With Linear Function Approximation: Nonasymptotic Analysis
2024, cites this paper
Worst Perception Scenario Search via Recurrent Neural Controller and K-Reciprocal Re-Ranking
2024, cites this paper
Guided Evolution with Binary Discriminators for ML Program Search
2024, cites this paper
Implantable Adaptive Cells: differentiable architecture search to improve the performance of any trained U-shaped network
2024, cites this paper
Closed-form Solutions: A New Perspective on Solving Differential Equations
2024, cites this paper
Mitigating Propagation of Cyber-Attacks in Wide-Area Measurement Systems
2024, cites this paper
Sentiment Analysis of Social Media Data on Ebola Outbreak Using Deep Learning Classifiers
2024, cites this paper
Data-Efficient Brain connectivity Analysis via Dual Meta-learning for Brain Disorder Detection
2024, cites this paper
Automatic time series forecasting model design based on pruning
2024, cites this paper
Multi-Obstacle Path Planning using Deep Reinforcement Learning
2024, cites this paper
Estimation method for karst carbon sinks on the basis of a concentration prediction model
2024, cites this paper
Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration
2024, cites this paper
Adaptive Non-uniform Timestep Sampling for Accelerating Diffusion Model Training
2024, cites this paper
DAR-LFC: A data-driven attack recovery mechanism for Load Frequency Control
2024, cites this paper
Lightweight Design and Optimization methods for DCNNs: Progress and Futures
2024, cites this paper
Sequential node search for faster neural architecture search
2024, cites this paper
The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol
2024, cites this paper
Narrowing the Focus: Learned Optimizers for Pretrained Models
2024, cites this paper
AutoSR: Automatic Sequential Recommendation System Design
2024, cites this paper
Analyzing the Characteristics of Gradient Descent and Non-Gradient Descent-Based Algorithms in Neural Network Learning
2024, cites this paper
Balanced quantum neural architecture search
2024, cites this paper
Deep Learning for Agile Malware Detection
2024, cites this paper
Intelligent algorithm-based cellular robot self-reconfiguration step planning research
2023, cites this paper
GraphNAS++: Distributed Architecture Search for Graph Neural Networks
2023, influential citation
Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM
2023, cites this paper
A Deep Reinforcement Learning Approach to Efficient Distributed Optimization
2023, cites this paper
Machine Learning-Based Prediction Models for Control Traffic in SDN Systems
2023, cites this paper
Is Scaling Learned Optimizers Worth It? Evaluating The Value of VeLO's 4000 TPU Months
2023, cites this paper
EQNAS: Evolutionary Quantum Neural Architecture Search for Image Classification
2023, cites this paper
How predictors affect the RL-based search strategy in Neural Architecture Search?
2023, cites this paper
An AI-driven Predictive Model for Pancreatic Cancer Patients Using Extreme Gradient Boosting
2023, cites this paper
Generalizable Learning Reconstruction for Accelerating MR Imaging via Federated Neural Architecture Search
2023, cites this paper
Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
2023, cites this paper
ZeroLess-DARTS: Improved Differentiable Architecture Search with Refined Search Operation and Early Stopping
2023, cites this paper
FedAutoMRI: Federated Neural Architecture Search for MR Image Reconstruction
2023, cites this paper
Fisher-Legendre (FishLeg) optimization of deep neural networks
2023, cites this paper
Learned Learning Rate Schedules for Deep Neural Network Training Using Reinforcement Learning
2023, cites this paper
Identifying effective trajectory predictions under the guidance of trajectory anomaly detection model
2023, cites this paper
Meta-SpikePropamine: learning to learn with synaptic plasticity in spiking neural networks
2023, cites this paper
DAC-MR: Data Augmentation Consistency Based Meta-Regularization for Meta-Learning
2023, cites this paper
Eight years of AutoML: categorisation, review and trends
2023, cites this paper
Learning to Optimize Quantum Neural Networks Without Gradients
2023, cites this paper
The reusability prior: comparing deep learning models without training
2023, cites this paper
Configure Your Federation: Hierarchical Attention-enhanced Meta-Learning Network for Personalized Federated Learning
2023, cites this paper
An Improved Tuna-YOLO Model Based on YOLO v3 for Real-Time Tuna Detection Considering Lightweight Deployment
2023, influential citation
Enhancing Machine Learning Model Performance with Hyper Parameter Optimization: A Comparative Study
2023, cites this paper
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
2023, cites this paper
Iterative Image Reconstruction Algorithm with Parameter Estimation by Neural Network for Computed Tomography
2023, cites this paper
propnet: Propagating 2D Annotation to 3D Segmentation for Gastric Tumors on CT Scans
2023, cites this paper
Fine-tuned CNN-based Sri Lankan Currency Note Detection Method for the Visually Impaired People Using Smartphones
2023, cites this paper
A feature-wise attention module based on the difference with surrounding features for convolutional neural networks
2023, cites this paper
Control Channel Isolation in SDN Virtualization: A Machine Learning Approach
2023, cites this paper
HGNAS++: Efficient Architecture Search for Heterogeneous Graph Neural Networks
2023, cites this paper
Identifying optimal architectures of physics-informed neural networks by evolutionary strategy
2023, cites this paper
Symbolic Discovery of Optimization Algorithms
2023, influential citation
Federated Automatic Differentiation
2023, cites this paper
B2Opt: Learning to Optimize Black-box Optimization with Little Budget
2023, cites this paper
An empirical study on the structure evolution of deep learning models: taking SAR image processing a case study
2023, cites this paper
AutoMSNet: Multi-Source Spatio-Temporal Network via Automatic Neural Architecture Search for Traffic Flow Prediction
2023, cites this paper
Interpretable Imitation Learning with Symbolic Rewards
2023, influential citation
Learning to Optimize in Model Predictive Control
2022, cites this paper
AGNAS: Attention-Guided Micro and Macro-Architecture Search
2022, cites this paper
Learning to learn online with neuromodulated synaptic plasticity in spiking neural networks
2022, cites this paper
NAS-CTR: Efficient Neural Architecture Search for Click-Through Rate Prediction
2022, cites this paper
EASNet: Searching Elastic and Accurate Network Architecture for Stereo Matching
2022, cites this paper
Finite Expression Method for Solving High-Dimensional Partial Differential Equations
2022, cites this paper
ATPFL: Automatic Trajectory Prediction Model Design under Federated Learning Framework
2022, influential citation
Deep Meta-learning in Recommendation Systems: A Survey
2022, cites this paper