Stochastic Hyperparameter Optimization through Hypernetworks

Published 2018 in arXiv.org

ABSTRACT

Machine learning models are usually tuned by nesting optimization of model weights inside the optimization of hyperparameters. We give a method to collapse this nested optimization into joint stochastic optimization of both weights and hyperparameters. Our method trains a neural network to output approximately optimal weights as a function of hyperparameters. We show that our method converges to locally optimal weights and hyperparameters for sufficiently large hypernets. We compare this method to standard hyperparameter optimization strategies and demonstrate its effectiveness for tuning thousands of hyperparameters.
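
The joint optimization described in the abstract can be illustrated on a one-dimensional toy problem. The sketch below is not the authors' implementation. It assumes a hypothetical training loss (w - 2)^2 + exp(lam) * w^2, a validation loss (w - 1)^2, a linear "hypernetwork" w = phi[0] + phi[1] * lam, hand-picked step sizes, and a short warm-up phase before the hyperparameter starts moving. All names and constants are illustrative assumptions. Each iteration samples a hyperparameter near the current one, takes one gradient step on the hypernetwork using the training loss, and then takes one gradient step on the hyperparameter using the validation loss evaluated through the hypernetwork.

```python
import math
import random

A_TRAIN = 2.0  # assumed toy training target
B_VAL = 1.0    # assumed toy validation target


def train_grad_w(w, lam):
    # d/dw of L_train(w, lam) = (w - A_TRAIN)^2 + exp(lam) * w^2
    return 2.0 * (w - A_TRAIN) + 2.0 * math.exp(lam) * w


def val_grad_w(w):
    # d/dw of L_val(w) = (w - B_VAL)^2
    return 2.0 * (w - B_VAL)


def hypernet(phi, lam):
    # Linear map from the hyperparameter to a single model weight.
    return phi[0] + phi[1] * lam


def fit(steps=15000, warmup=3000, sigma=0.3, lr_phi=0.01, lr_lam=0.01, seed=0):
    rng = random.Random(seed)
    phi = [0.0, 0.0]  # hypernetwork parameters
    lam = 1.0         # log of the L2 penalty strength
    for step in range(warmup + steps):
        # Sample a hyperparameter near the current one and take one
        # stochastic gradient step on the hypernetwork (training loss).
        lam_hat = rng.gauss(lam, sigma)
        g = train_grad_w(hypernet(phi, lam_hat), lam_hat)
        phi[0] -= lr_phi * g            # dw/dphi[0] = 1
        phi[1] -= lr_phi * g * lam_hat  # dw/dphi[1] = lam_hat
        # After the warm-up (an assumption of this sketch), also step the
        # hyperparameter on the validation loss through the hypernetwork.
        if step >= warmup:
            lam -= lr_lam * val_grad_w(hypernet(phi, lam)) * phi[1]
    return lam, phi


lam, phi = fit()
w = hypernet(phi, lam)
print("lambda:", lam, "weight:", w)
```

For this toy problem the exact best response is w*(lam) = 2 / (1 + exp(lam)), so the validation loss is minimized at lam = 0 with w = 1. The run should settle close to those values, which gives a simple check that the hypernetwork is tracking the best response locally.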

PUBLICATION RECORD

Publication year
2018
Venue
arXiv.org
Publication date
2018-02-15
Fields of study
Mathematics, Computer Science
Identifiers
arXiv:1802.09419
Source metadata
Semantic Scholar

REFERENCES

Generative Adversarial Nets
2018, cited by this paper
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
2017, cited by this paper
SMASH: One-Shot Model Architecture Search through HyperNetworks
2017, cited by this paper
Gradient-based Regularization Parameter Selection for Problems With Nonsmooth Penalty Functions
2017, cited by this paper
Forward and Reverse Gradient-Based Hyperparameter Optimization
2017, cited by this paper
Hyperparameter optimization with approximate gradient
2016, cited by this paper
DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks
2016, cited by this paper
Efficient Hyperparameter Optimization and Infinitely Many Armed Bandits
2016, cited by this paper
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
2016, cited by this paper
Distilling Reverse-Mode Automatic Differentiation (DrMAD) for Optimizing Hyperparameters of Deep Neural Networks
2016, cited by this paper
HyperNetworks
2016, cited by this paper
Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters
2015, cited by this paper
Gradient-based Hyperparameter Optimization through Reversible Learning
2015, influential reference
Weight Uncertainty in Neural Networks
2015, cited by this paper
Adam: A Method for Stochastic Optimization
2014, influential reference
Freeze-Thaw Bayesian Optimization
2014, cited by this paper
Practical Bayesian Optimization of Machine Learning Algorithms
2012, cited by this paper
Random Search for Hyper-Parameter Optimization
2012, cited by this paper
Generic Methods for Optimization-Based Modeling
2012, cited by this paper
Stackelberg games for adversarial prediction problems
2011, cited by this paper
Rectified Linear Units Improve Restricted Boltzmann Machines
2010, cited by this paper
Practical Bayesian optimization
2008, cited by this paper
The Theory of Learning in Games
1998, cited by this paper
Gradient-based learning applied to document recognition
1998, cited by this paper
Approximation capabilities of multilayer feedforward networks
1991, cited by this paper

CITED BY

Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts
2026, cites this paper
Motion Attribution for Video Generation
2026, cites this paper
Optimizing Training Hyperparameters for Multilayer Perceptrons in Deep Learning
2025, cites this paper
Pareto Set Learning for Multi-Objective Reinforcement Learning
2025, cites this paper
Individualised Treatment Effects Estimation with Composite Treatments and Composite Outcomes
2025, cites this paper
HyperAdaLoRA: Accelerating LoRA Rank Allocation During Training via Hypernetworks without Sacrificing Performance
2025, cites this paper
Uncertainty quantification of neural network models of evolving processes via Langevin sampling
2025, cites this paper
Implicit Neural Representation For Accurate CFD Flow Field Prediction
2024, cites this paper
How Far Can a 1-Pixel Camera Go? Solving Vision Tasks Using Photoreceptors and Computationally Designed Visual Morphology
2024, cites this paper
TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining
2024, cites this paper
Scenario-Aware Learning Approaches to Adaptive Channel Estimation
2024, cites this paper
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
2024, cites this paper
Solving Vision Tasks with Simple Photoreceptors Instead of Cameras
2024, cites this paper
Improving Hyperparameter Optimization with Checkpointed Model Weights
2024, cites this paper
Fine-grained Analysis of Stability and Generalization for Stochastic Bilevel Optimization
2024, cites this paper
Convergence of Bayesian Bilevel Optimization
2024, cites this paper
JacNet: Learning Functions with Structured Jacobians
2024, cites this paper
Achieving Hierarchy-Free Approximation for Bilevel Programs With Equilibrium Constraints
2023, cites this paper
Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer Localisation
2023, cites this paper
Scale-Space Hypernetworks for Efficient Biomedical Imaging
2023, cites this paper
Magnitude Invariant Parametrizations Improve Hypernetwork Learning
2023, cites this paper
Amortized Inference for Gaussian Process Hyperparameters of Structured Kernels
2023, cites this paper
ATT3D: Amortized Text-to-3D Object Synthesis
2023, cites this paper
A brief review of hypernetworks in deep learning
2023, cites this paper
Dynamic Inter-treatment Information Sharing for Heterogeneous Treatment Effects Estimation
2023, cites this paper
Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
2023, cites this paper
Sensor drift compensation for gas mixture classification in batch experiments
2023, cites this paper
Non-Proportional Parametrizations for Stable Hypernetwork Learning
2023, cites this paper
Amortized Learning of Dynamic Feature Scaling for Image Segmentation
2023, cites this paper
Deep Learning Model Selection With Parametric Complexity Control
2023, cites this paper
Implicit Bilevel Optimization: Differentiating through Bilevel Optimization Programming
2023, cites this paper
DrasCLR: A Self-supervised Framework of Learning Disease-related and Anatomy-specific Representation for 3D Medical Images
2023, cites this paper
Dual Meta-Learning with Longitudinally Generalized Regularization for One-Shot Brain Tissue Segmentation Across the Human Lifespan
2023, cites this paper
ApproxED: Approximate Exploitability Descent via Learned Best Responses
2023, cites this paper
Dynamic Inter-treatment Information Sharing for Individualized Treatment Effects Estimation
2023, cites this paper
Online Algorithmic Recourse by Collective Action
2023, cites this paper
Using Large Language Models for Hyperparameter Optimization
2023, cites this paper
Learning physics-inspired regularization for medical image registration with hypernetworks
2023, cites this paper
HPFL: hyper-network guided personalized federated learning for multi-center tuberculosis chest x-ray diagnosis
2023, cites this paper
A gradient-based bilevel optimization approach for tuning regularization hyperparameters
2023, cites this paper
A Linear Programming Enhanced Genetic Algorithm for Hyperparameter Tuning in Machine Learning
2023, cites this paper
CVAE-H: Conditionalizing Variational Autoencoders via Hypernetworks and Trajectory Forecasting for Autonomous Driving
2022, cites this paper
Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve
2022, influential citation
CPMLHO: Hyperparameter Tuning via Cutting Plane and Mixed-Level Optimization
2022, cites this paper
Gradient-based Bi-level Optimization for Deep Learning: A Survey
2022, cites this paper
On Implicit Bias in Overparameterized Bilevel Optimization
2022, cites this paper
Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems
2022, cites this paper
Communication-Efficient Robust Federated Learning with Noisy Labels
2022, cites this paper
Learning the Space of Deep Models
2022, cites this paper
Bayesian Modeling and Uncertainty Quantification for Learning to Optimize: What, Why, and How
2022, cites this paper
A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method
2022, cites this paper
Task Selection for AutoML System Evaluation
2022, cites this paper
OCD: Learning to Overfit with Conditional Diffusion Models
2022, cites this paper
On Stability and Generalization of Bilevel Optimization Problem
2022, cites this paper
Hyperparameter Optimization Using Iterative Decision Tree (IDT)
2022, cites this paper
Beyond Backpropagation: Bilevel Optimization Through Implicit Differentiation and Equilibrium Propagation
2022, cites this paper
Calibrate Automated Graph Neural Network via Hyperparameter Uncertainty
2022, cites this paper
HMOE: Hypernetwork-based Mixture of Experts for Domain Generalization
2022, cites this paper
Hyper-Learning for Gradient-Based Batch Size Adaptation
2022, cites this paper
Layer-wised Model Aggregation for Personalized Federated Learning
2022, cites this paper
Beyond backpropagation: implicit gradients for bilevel optimization
2022, cites this paper
Local Stochastic Bilevel Optimization with Momentum-Based Variance Reduction
2022, cites this paper
Unified Implicit Neural Stylization
2022, cites this paper
Learning the Effect of Registration Hyperparameters with HyperMorph
2022, cites this paper
Learning adaptive hyper-guidance via proxy-based bilevel optimization for image enhancement
2022, cites this paper
Computing Multiple Image Reconstructions with a Single Hypernetwork
2022, cites this paper
Comparative Research of Hyper-Parameters Mathematical Optimization Algorithms for Automatic Machine Learning in New Generation Mobile Network
2022, cites this paper
Lyapunov Exponents for Diversity in Differentiable Games
2021, cites this paper
Cost-Efficient Online Hyperparameter Optimization
2021, influential citation
HyperMorph: Amortized Hyperparameter Learning for Image Registration
2021, cites this paper
Regularization-Agnostic Compressed Sensing MRI Reconstruction with Hypernetworks
2021, cites this paper
Investigating Bi-Level Optimization for Learning and Vision From a Unified Perspective: A Survey and Beyond
2021, cites this paper
Online hyperparameter optimization by real-time recurrent learning
2021, influential citation
A General Descent Aggregation Framework for Gradient-Based Bi-Level Optimization
2021, cites this paper
Complex Momentum for Learning in Games
2021, cites this paper
Personalized Federated Learning using Hypernetworks
2021, cites this paper
BaMBNet: A Blur-aware Multi-branch Network for Defocus Deblurring
2021, cites this paper
Complex Momentum for Optimization in Games
2021, cites this paper
Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures
2021, cites this paper
Stability and Generalization of Bilevel Programming in Hyperparameter Optimization
2021, cites this paper
Towards Adversarial Robustness via Transductive Learning
2021, cites this paper
EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization
2021, cites this paper
Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
2021, cites this paper
Using Bifurcations for Diversity in Differentiable Games
2021, cites this paper
Implicit Regularization in Overparameterized Bilevel Optimization
2021, cites this paper
HyperPlan: A Framework for Motion Planning Algorithm Selection and Parameter Optimization
2021, cites this paper
Towards a continuous forecasting mechanism of parking occupancy in urban environments
2021, cites this paper
An automatic hyperparameter optimization DNN model for precipitation prediction
2021, cites this paper
Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms
2021, cites this paper
Meta Internal Learning
2021, cites this paper
Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation
2021, cites this paper
Towards Evaluating the Robustness of Neural Networks Learned by Transduction
2021, cites this paper
Meta-Learning to Improve Pre-Training
2021, cites this paper
A Fully Single Loop Algorithm for Bilevel Optimization without Hessian Inverse
2021, influential citation
Data-driven integration of regularized mean-variance portfolios
2021, cites this paper
Efficient differentiable quadratic programming layers: an ADMM approach
2021, cites this paper
Object Pursuit: Building a Space of Objects via Discriminative Weight Generation
2021, cites this paper
BiGrad: Differentiating through Bilevel Optimization Programming
2021, cites this paper
Learning the Pareto Front with Hypernetworks
2021, cites this paper
Data-driven integration of norm-penalized mean-variance portfolios
2021, cites this paper