Practical Bayesian Optimization of Machine Learning Algorithms
Jasper Snoek, H. Larochelle, Ryan P. Adams
Published 2012 in Neural Information Processing Systems

ABSTRACT
The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
PUBLICATION RECORD
- Publication year
2012
- Venue
Neural Information Processing Systems
- Publication date
2012-06-13
- Fields of study
Mathematics, Computer Science
- Source metadata
Semantic Scholar
CONCEPTS
- bayesian optimization
A framework for optimizing the performance of learning algorithms by modeling generalization performance.
- convolutional neural networks
A class of deep neural networks used as a test case for evaluating optimization algorithms.
Aliases: CNNs
- gaussian process
A stochastic process used to model a learning algorithm's generalization performance as a sample.
Aliases: GP
- kernel
A function defining the covariance structure of the Gaussian process model used in optimization.
- latent dirichlet allocation
A generative statistical model used as a test case for evaluating optimization algorithms.
Aliases: LDA
- parallel experimentation
A method of leveraging multiple cores to run multiple learning algorithm experiments simultaneously.
- structured svms
A support vector machine variant used as a test case for evaluating optimization algorithms.
Aliases: Structured Support Vector Machines
- variable cost
The differing durations required to complete various learning algorithm experiments during tuning.
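The concepts above can be illustrated with a minimal sketch of GP-based Bayesian optimization. This is not the paper's implementation: the squared-exponential kernel with fixed hyperparameters, the 1-D toy objective, and the grid-based maximization of expected improvement are all simplifying assumptions made here for illustration.

```python
# Minimal sketch of Bayesian optimization with a Gaussian-process surrogate.
# Assumptions (not from the paper): squared-exponential kernel with fixed
# length scale and unit amplitude, a 1-D quadratic toy objective, and
# expected improvement maximized over a fixed grid.
import math
import numpy as np

def sqexp(a, b, length=0.3):
    """Squared-exponential covariance between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_star, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at query points."""
    K = sqexp(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = sqexp(x_obs, x_star)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    # Prior variance is 1 (unit-amplitude kernel); floor avoids divide-by-zero.
    var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 1e-12)
    return mu, var

def norm_cdf(z):
    return np.array([0.5 * (1.0 + math.erf(t / math.sqrt(2.0))) for t in z])

def expected_improvement(mu, var, best):
    """EI for minimization: expected reduction below the current best value."""
    sigma = np.sqrt(var)
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * norm_cdf(z) + pdf)

# Toy objective standing in for a learning algorithm's validation error.
f = lambda x: (x - 0.6) ** 2

grid = np.linspace(0.0, 1.0, 201)
x_obs = np.array([0.0, 0.35, 1.0])       # initial "experiments"
y_obs = f(x_obs)

for _ in range(10):
    mu, var = gp_posterior(x_obs, y_obs, grid)
    ei = expected_improvement(mu, var, y_obs.min())
    x_next = grid[np.argmax(ei)]          # most promising next experiment
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next))

best_x = x_obs[np.argmin(y_obs)]
```

Each loop iteration refits the surrogate and queries the point that maximizes expected improvement, trading off exploring high-variance regions against exploiting the current low-mean region; the paper's contributions extend this basic loop with kernel hyperparameter treatment, cost-aware acquisition, and parallel experiment scheduling.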