ADADELTA: An Adaptive Learning Rate Method

Published 2012 in arXiv.org

ABSTRACT

We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first-order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities, and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large-scale voice dataset in a distributed cluster environment.
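
The abstract describes the method only in words. As a rough illustration, the per-dimension update can be sketched in plain Python as below. This is a minimal sketch of the commonly stated form of the ADADELTA rule, not code from the paper: the function names `adadelta_step` and `run_quadratic`, the toy quadratic objective, and the defaults `rho=0.95` and `eps=1e-6` are assumptions made for the example.

```python
import math


def adadelta_step(x, grad, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """Apply one ADADELTA-style update to every dimension independently.

    x, grad, avg_sq_grad and avg_sq_delta are equal-length lists of floats.
    Returns the updated (x, avg_sq_grad, avg_sq_delta) as new lists.
    """
    new_x, new_g2, new_d2 = [], [], []
    for xi, gi, g2, d2 in zip(x, grad, avg_sq_grad, avg_sq_delta):
        # Decaying average of squared gradients (first-order information only).
        g2 = rho * g2 + (1.0 - rho) * gi * gi
        # Step: ratio of RMS of past updates to RMS of gradients, times the gradient.
        delta = -math.sqrt(d2 + eps) / math.sqrt(g2 + eps) * gi
        # Decaying average of squared updates, used by the next step.
        d2 = rho * d2 + (1.0 - rho) * delta * delta
        new_x.append(xi + delta)
        new_g2.append(g2)
        new_d2.append(d2)
    return new_x, new_g2, new_d2


def run_quadratic(steps=1000):
    """Minimize the toy objective f(x) = 0.5 * sum(c_i * x_i**2) (hypothetical example)."""
    curvature = [1.0, 100.0]  # badly scaled dimensions on purpose
    x = [1.0, 1.0]
    g2 = [0.0, 0.0]
    d2 = [0.0, 0.0]

    def loss(p):
        return 0.5 * sum(c * v * v for c, v in zip(curvature, p))

    start = loss(x)
    for _ in range(steps):
        grad = [c * v for c, v in zip(curvature, x)]
        x, g2, d2 = adadelta_step(x, grad, g2, d2)
    return start, loss(x)
```

No learning rate appears anywhere in the sketch: the step size in each dimension comes from the two running averages, which is why the first step is nearly the same size whether the gradient is 1 or 100.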

PUBLICATION RECORD

Publication year
2012
Venue
arXiv.org
Publication date
2012-12-22
Fields of study
Computer Science
Identifiers
arXiv 1212.5701
Source metadata
Semantic Scholar

REFERENCES

Large Scale Distributed Deep Networks
2012, cited by this paper
No more pesky learning rates
2012, influential reference
Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
2012, cited by this paper
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
2012, cited by this paper
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
2011, cited by this paper
Improving the convergence of back-propagation learning with second-order methods
1989, influential reference
Learning representations by back-propagating errors
1986, cited by this paper
A Stochastic Approximation Method
1951, cited by this paper

CITED BY

Objective-Function Free Multi-Objective Optimization: Rate of Convergence and Performance of an Adagrad-like algorithm
2026, cites this paper
Impact of Optimizers on Transformer Models for Classification of Olive Fruit Disease
2026, cites this paper
A survey on abnormal behavior detection based intelligence information video surveillance system using optimized machine learning methods
2026, cites this paper
A Transfer Learning CNN approach for automated Plant Growth Temporal Labelling: Addressing Class-Based Variability Paradox and introducing novel metrics
2026, cites this paper
Optimization Algorithms With Superlinear Convergence Rate
2026, cites this paper
Stability and Generalization of Nonconvex Optimization with Heavy-Tailed Noise
2026, cites this paper
A Deep Multi-Modal Method for Patient Wound Healing Assessment
2026, cites this paper
OMCR: An Online Multivariate Forecaster for Cloud Resource Management
2026, cites this paper
Modeling gamma radiation intensity in monazite rich coastal environments using neural network.
2026, cites this paper
DSCA-HLAII: A dual-stream cross-attention model for predicting peptide–HLA class II interaction and presentation
2026, cites this paper
A self-explanatory deep learning-based soft sensor induced by a physical diffusion process and its application in an industrial process
2026, cites this paper
On the Convergence of HalpernSGD
2026, cites this paper
Gradient Regularized Natural Gradients
2026, cites this paper
Detecting network intrusions in cyber-physical systems using deep autoencoder-based dimensionality reduction approach and deep neural networks
2025, cites this paper
AIoT Fault Detection for Firefighting Pump Maintenance Services Based Metaheuristics and Combined Deep Learning Methodologies
2025, cites this paper
Multi-domain transfer generation of cavity defect data in asphalt pavements using 3D GPR and 3D forward modeling
2025, cites this paper
Enhanced Cybersecurity Entity Recognition Using DeBERTa, Transformer-CNN Hybrids, and BiLSTM-Softmax
2025, cites this paper
Evaluating EfficientNet Architectures for Pathology Detection in Endoscopic Gastrointestinal Tract Images
2025, cites this paper
An Approach to Finding a Robust Deep Learning Model
2025, cites this paper
Infrared Monocular Depth Estimation Based on Radiation Field Gradient Guidance and Semantic Priors in HSV Space
2025, influential citation
An incorporation of metaheuristic algorithm and two-stage deep learnings for fault classified framework for diesel generator maintenance
2025, cites this paper
Disentangled Representation Learning for Chinese Handwriting Recognition
2025, influential citation
A Langevin sampling algorithm inspired by the Adam optimizer
2025, cites this paper
Block Circulant Adapter for Large Language Models
2025, influential citation
FAD: Frequency Adaptation and Diversion for Cross-domain Few-shot Learning
2025, cites this paper
Learning by solving differential equations
2025, cites this paper
Input normalized stochastic gradient descent for language tasks
2025, cites this paper
Impact of Tuning Parameters in Deep Convolutional Neural Network Using a Crack Image Dataset
2025, cites this paper
Fractional-order Jacobian Matrix Differentiation and Its Application in Artificial Neural Networks
2025, cites this paper
Interior-Point Vanishing Problem in Semidefinite Relaxations for Neural Network Verification
2025, cites this paper
Optimizing Federated Learning: Addressing Key Challenges in Real-World Applications
2025, cites this paper
Compare different SG-Schemes based on large least square problems
2025, cites this paper
Attention-Mechanism-Based Neural Latent-Factorization-of-Tensors Model
2025, cites this paper
Simultaneous Speech and Eating Behavior Recognition Using Data Augmentation and Two-Stage Fine-Tuning
2025, influential citation
Adversarial Subspace Generation for Outlier Detection in High-Dimensional Data
2025, cites this paper
prunAdag: an adaptive pruning-aware gradient method
2025, cites this paper
Construction and optimization of asphalt pavement texture characterization model based on binocular vision and deep learning
2025, cites this paper
COVID-19 recognition from chest X-ray images by combining deep learning with transfer learning
2025, cites this paper
FLAD: Byzantine-Robust Federated Learning Based on Gradient Feature Anomaly Detection
2025, cites this paper
Dynamic learning rate adjustment using volatility in LSTM models for KLCI forecasting
2025, cites this paper
ADNeuroNet: a neuroevolution-based neural network algorithm for the diagnosis of neurodegenerative diseases
2025, cites this paper
Vector Copula Variational Inference and Dependent Block Posterior Approximations
2025, cites this paper
Weighted Fisher divergence for high-dimensional Gaussian variational inference
2025, cites this paper
Adaptive moment estimation optimization algorithm using projection gradient for deep learning
2025, cites this paper
Variational Bayes inference for simultaneous autoregressive models with missing data
2025, cites this paper
Onto Proximality in Non Negative Matrix Factorization for Recommender Systems
2025, cites this paper
A Hybrid Framework Combining Rule-Based and Deep Learning Approaches for Data-Driven Verdict Recommendations
2025, cites this paper
A low-dimensional recursive deep learning model for El Niño-Southern Oscillation simulation
2025, cites this paper
Deep Learning Paradigms for Multi-Dimensional Big Data Analytics: A Critical Assessment
2025, cites this paper
Attention-Based Deep Learning for Hybrid Beamforming in OFDM Systems With Phase Noise
2025, cites this paper
Typical machine learning datasets as low-depth quantum circuits
2025, cites this paper
Dynamic Domain Information Modulation Algorithm for Multi-domain Sentiment Analysis
2025, cites this paper
Dynamic and frequency responses of the FG nanopipe using deep neural network and nonlocal strain/stress gradient theory
2025, cites this paper
HessFormer: Hessians at Foundation Scale
2025, cites this paper
Optimizing Training Hyperparameters for Multilayer Perceptrons in Deep Learning
2025, cites this paper
Clustering single-cell data based on a deep embedded subspace model
2025, influential citation
Stepsize anything: A unified learning rate schedule for budgeted-iteration training
2025, cites this paper
Development and evaluation of a deep learning system for screening real-world multiple abnormal findings based on ultra-widefield fundus images
2025, cites this paper
A mixed attention-CTC for natural scene text recognition—experienced in Farsi with new presented natural and synthetic dataset
2025, influential citation
Rapid training of Hamiltonian graph networks without gradient descent
2025, cites this paper
Online Learning-guided Learning Rate Adaptation via Gradient Alignment
2025, cites this paper
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed
2025, cites this paper
JotlasNet: Joint Tensor Low-Rank and Attention-based Sparse Unrolling Network for Accelerating Dynamic MRI
2025, influential citation
scDMSC: Deep Multi-View Subspace Clustering for Single-Cell Multi-Omics Data
2025, influential citation
Cloud-based AIoT intelligent infrastructure for firefighting pump fault diagnosis-based hybrid CNN-GRU deep learning technique
2025, cites this paper
Unveiling the role of chromosome structure morphology on gene function through chromosome conformation analysis
2025, cites this paper
Novel deep learning model with fusion of multiple pipelines for stock market prediction
2025, cites this paper
Reinforcement Learning for Process Control: Review and Benchmark Problems
2025, cites this paper
Deep inference of simulated strong lenses in ground-based surveys
2025, cites this paper
A Hessian-informed hyperparameter optimization for differential learning rate
2025, cites this paper
Enhancing CNNs With Detail Feature Module for High‐Pixel Image Classification
2025, cites this paper
Sine and cosine based learning rate for gradient descent method
2025, cites this paper
Local Uncertainty Energy Transfer for Active Domain Adaptation
2025, influential citation
Empirical modeling and hybrid machine learning framework for nucleate pool boiling on microchannel structured surfaces
2025, cites this paper
A Zeroth-Order Adaptive Frank–Wolfe Algorithm for Resource Allocation in Internet of Things: Convergence Analysis
2025, cites this paper
First-passage approach to optimizing perturbations for improved training of machine learning models
2025, cites this paper
A Parameter-Free and Near-Optimal Zeroth-Order Algorithm for Stochastic Convex Optimization
2025, cites this paper
SemiHMER: Semi-supervised Handwritten Mathematical Expression Recognition using pseudo-labels
2025, cites this paper
Preconditioned inexact stochastic ADMM for deep models
2025, cites this paper
AMC: Adaptive Learning Rate Adjustment Based on Model Complexity
2025, cites this paper
Hybrid physics‐informed neural network with parametric identification for modeling bridge temperature distribution
2025, cites this paper
Data-Driven Pseudo-spectral Full Waveform Inversion via Deep Neural Networks
2025, cites this paper
An Analysis of First- and Quasi-Second-Order Optimization Algorithms in Variational Monte Carlo
2025, cites this paper
Innovative approaches in image processing: enhancing feature extraction and recognition capabilities
2025, cites this paper
Performance Analysis of Momentum of Adam Optimizer on YOLO-V8 Using Traffic Object Dataset
2025, cites this paper
Same accuracy, twice as fast: continuous training surpasses retraining from scratch
2025, cites this paper
An Improved Hybrid_Stacked Deep Neural Network (HDNN) Model for Enhanced Weather Forecasting
2025, cites this paper
IDInit: A Universal and Stable Initialization Method for Neural Network Training
2025, influential citation
Fed-PID: An Adaptive Learning Rate Scheduler for Federated Learning With PID Controllers
2025, cites this paper
Real-time sign language recognition using parallel multi-scale CNN to enhance inclusive education for deaf and hard of hearing students
2025, cites this paper
Learning Permutations in Monarch Factorization
2025, cites this paper
Cross-Domain Few-Shot Open-Set Keyword Spotting Using Keyword Adaptation and Prototype Reprojection
2025, cites this paper
Application and Analysis of Artificial Intelligence and Big Data Technology in Grassroots Governance of Smart Cities
2025, cites this paper
DRSC: Dual-Reweighted Siamese Contrastive Learning Network for Cross-Domain Rotating Machinery Fault Diagnosis With Multisource Domain Imbalanced Data
2025, cites this paper
Reference Point-Dependent Reinforcement Learning in Humans and Rats
2025, cites this paper
Distinct tumor-immune ecologies in NSCLC patients predict progression and define a clinical biomarker of therapy response
2025, cites this paper
Lightweight Chest X-Ray Classification for Pneumonia and Tuberculosis Using MobileNet with Explainable AI
2025, cites this paper
Taking AI-Based Side-Channel Attacks to a New Dimension
2025, cites this paper
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
2025, cites this paper
Toward AI-Enabled Approach for Urdu Text Recognition: A Legacy for Urdu Image Apprehension
2025, cites this paper