Generalization bounds for averaged classifiers

Published 2004 in Annals of Statistics

ABSTRACT

We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, that is, the hypothesis that minimizes the training error, our algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction of an algorithm that predicts with the best hypothesis. By allowing the algorithm to abstain from predicting on some examples, we show that the predictions it makes when it does not abstain are very reliable. Finally, we show that the probability that the algorithm abstains is comparable to the generalization error of the best hypothesis in the class.
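
The rule described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption: a finite hypothesis class, weights of the form exp(-eta * training error), a score equal to the scaled log-ratio of the weight voting +1 to the weight voting -1, and abstention when that score is within delta of zero. The names `averaged_predict`, `training_error`, `eta` and `delta` are hypothetical, and the threshold rules and sample are toy data.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def training_error(h, sample):
    """Fraction of labeled examples (x, y) that hypothesis h misclassifies."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)


def averaged_predict(hypotheses, sample, x, eta=5.0, delta=0.05):
    """Return +1 or -1, or 0 to abstain (eta and delta are illustrative)."""
    # Assumed weighting: log-weight of h is -eta times its training error.
    log_w = [-eta * training_error(h, sample) for h in hypotheses]
    pos = [lw for h, lw in zip(hypotheses, log_w) if h(x) == 1]
    neg = [lw for h, lw in zip(hypotheses, log_w) if h(x) != 1]
    if not neg:  # every hypothesis votes +1
        return 1
    if not pos:  # every hypothesis votes -1
        return -1
    # Scaled log-ratio of total weight voting +1 to total weight voting -1.
    score = (_logsumexp(pos) - _logsumexp(neg)) / eta
    if abs(score) <= delta:
        return 0  # the weighted vote is too close to call: abstain
    return 1 if score > 0 else -1


# Toy finite class: threshold rules on the real line (illustrative only).
hypotheses = [lambda x, t=i / 10: 1 if x >= t else -1 for i in range(11)]
sample = [(0.05, -1), (0.25, -1), (0.45, -1), (0.55, 1), (0.75, 1), (0.95, 1)]
```

On this toy data the rule predicts confidently far from the boundary, where all thresholds agree, and abstains near x = 0.5 once `delta` is large enough, which is the trade-off between reliability and abstention that the abstract describes.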

PUBLICATION RECORD

Publication year
2004
Venue
Annals of Statistics
Publication date
2004-08-01
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.1214/009053604000000058; arXiv math/0410092
Source metadata
Semantic Scholar

CITED BY

Knowing When to Answer: Adaptive Confidence Refinement for Reliable Audio-Visual Question Answering
2026, cites this paper
Enhancing PAC Learning of Half spaces Through Robust Optimization Techniques
2024, cites this paper
A Generalization Bound of Deep Neural Networks for Dependent Data
2023, cites this paper
Bagging Provides Assumption-free Stability
2023, cites this paper
Theory and algorithms for learning with rejection in binary classification
2023, cites this paper
Concentration inequalities for non-causal random fields
2022, cites this paper
Classification Under Partial Reject Options
2022, cites this paper
Using ensembles and distillation to optimize the deployment of deep learning models for the classification of electronic cancer pathology reports
2022, cites this paper
PAC-Bayesian Treatment Allocation Under Budget Constraints
2022, influential citation
Identifying regions of trusted predictions
2021, cites this paper
Boosting with Multiple Sources
2021, cites this paper
Classification Confidence Scores with Point-wise Guarantees
2021, cites this paper
Primal-dual for classification with rejection (PD-CR): a novel method for classification and feature selection—an application in metabolomics studies
2021, cites this paper
Exponential Savings in Agnostic Active Learning Through Abstention
2021, cites this paper
“In-Network Ensemble”: Deep Ensemble Learning with Diversified Knowledge Distillation
2021, cites this paper
Posterior concentration and fast convergence rates for generalized Bayesian learning
2020, cites this paper
Think Locally, Act Globally: Federated Learning with Local and Global Representations
2020, cites this paper
FedBoost: Communication-Efficient Algorithms for Federated Learning
2020, cites this paper
Deviation bound for non-causal machine learning
2020, cites this paper
Relaxing the i.i.d. assumption: Adaptively minimax optimal regret via root-entropic regularization
2020, cites this paper
Deep Gamblers: Learning to Abstain with Portfolio Theory
2019, cites this paper
Deviation inequalities for separately Lipschitz functionals of composition of random functions
2019, cites this paper
Identification of taxon through classification with partial reject options
2019, cites this paper
Fast classification rates without standard margin assumptions
2019, cites this paper
A novel multi-variate analysis method for searching particles in high energy physics
2017, cites this paper
Ensemble multi-label learning in supervised and semi-supervised settings. (Apprentissage multi-label ensembliste dans le context supervisé et semi-supervisé)
2017, cites this paper
ToPs: Ensemble Learning With Trees of Predictors
2017, cites this paper
Selective Classification for Deep Neural Networks
2017, cites this paper
Learning with Rejection
2016, cites this paper
Deep Extreme Feature Extraction: New MVA Method for Searching Particles in High Energy Physics
2016, cites this paper
The Extended Littlestone's Dimension for Learning with Mistakes and Abstentions
2016, cites this paper
Learning to Abstain from Binary Prediction
2016, cites this paper
The Utility of Abstaining in Binary Classification
2015, cites this paper
Multilabel Classification through Structured Output Learning - Methods and Applications
2015, cites this paper
Confidence Sets for Classification
2015, cites this paper
Developments in nonparametric regression methods with application to Raman spectroscopy analysis
2015, cites this paper
Uniform Convergence of Random Forests via Adaptive Concentration
2015, cites this paper
AF: Small: Collaborative Research: On-Line Learning Algorithms for Path Experts with Non-Additive Losses
2015, cites this paper
Agnostic Pointwise-Competitive Selective Classification
2015, influential citation
PAC-Bayes with Minimax for Confidence-Rated Transduction
2015, cites this paper
Deep Boosting
2014, cites this paper
Learning Ensembles of Structured Prediction Rules
2014, cites this paper
Beyond Disagreement-Based Agnostic Active Learning
2014, cites this paper
Combined Decision Making with Multiple Agents
2014, cites this paper
EFFECTIVE SAMPLING SCHEMES FOR BEHAVIOR DISCRIMINATION IN NONLINEAR SYSTEMS
2014, cites this paper
Distribution-independent Reliable Learning
2014, cites this paper
Ensemble Methods for Structured Prediction
2014, cites this paper
On-line Learning Approach to Ensemble Methods for Structured Prediction
2014, cites this paper
Probabilistic uncertainty quantification and experiment design for nonlinear models: Applications in systems biology
2014, cites this paper
Boosting Ensembles of Structured Prediction Rules
2014, cites this paper
Predicting network of drug-enzyme interaction based on machine learning method.
2014, cites this paper
Practical Ensemble Classification Error Bounds for Different Operating Points
2013, cites this paper
Prediction of Substrate-Enzyme-Product Interaction Based on Molecular Descriptors and Physicochemical Properties
2013, cites this paper
Improved Algorithms for Confidence-Rated Prediction with Error Guarantees
2013, cites this paper
Generalization and Robustness of Batched Weighted Average Algorithm with V-Geometrically Ergodic Markov Data
2013, influential citation
Appendix A: Probability Theory
2013, cites this paper
Theoretical foundations of selective prediction
2013, cites this paper
Active Learning via Perfect Selective Classification
2012, cites this paper
Limit theorems and inequalities via martingale methods
2012, cites this paper
Model calibration and automated trading agent for Euro futures
2012, cites this paper
Selective Prediction of Financial Trends with Hidden Markov Models
2011, cites this paper
Stochastic boosting algorithms
2011, cites this paper
Agnostic Selective Classification
2011, cites this paper
Building gene expression profile classifiers with a simple and efficient rejection option in R
2011, cites this paper
Automated trading with boosting and expert weighting
2010, cites this paper
Classification with guaranteed probability of error
2010, cites this paper
On the Foundations of Noise-free Selective Classification
2010, cites this paper
Prediction of small molecules' metabolic pathways based on functional group composition.
2009, cites this paper
A unifying framework for computational reinforcement learning theory
2009, cites this paper
Prediction of interaction between small molecule and enzyme using AdaBoost
2009, cites this paper
ENSEMBLING REGRESSION MODELS TO IMPROVE THEIR PREDICTIVITY: A CASE STUDY IN QSAR (QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIPS) WITH COMPUTATIONAL CHEMOMETRICS
2009, cites this paper
Machine Learning Techniques—Reductions Between Prediction Quality Metrics
2008, cites this paper
What Can We Learn Privately?
2008, cites this paper
Knows what it knows: a framework for self-aware learning
2008, cites this paper
Reduced bootstrap aggregating of learning algorithms
2008, cites this paper
A statistical approach to rule learning
2008, cites this paper
Chapter 5 – Hybrid systems
2007, cites this paper
Selection of Binary Variables and Classification by Boosting
2007, cites this paper
Classification with reject option
2006, cites this paper
A statistical approach to rule learning
2006, cites this paper
Building reliable metaclassifiers for text learning
2006, cites this paper
Statistische und Probabilistische Methoden der Modellwahl
2005, cites this paper
Machine Learning Based on Attribute Interactions
2005, cites this paper
Generic Object Recognition with Strangeness and Boosting Ph
2005, cites this paper
Incremental learning of ensemble classifiers on ECG data
2005, cites this paper
Theory of classification: a survey of some recent advances
2005, cites this paper
Universal Well-Calibrated Algorithm for On-Line Classification
2004, influential citation
A Universal Well-Calibrated Algorithm for On-line Classification
2004, influential citation
Selective Prediction with Hidden Markov Models (research thesis, Master of Science in Computer Science)
year unknown, cites this paper
Building Gene Expression Profile Classifiers with a Simple and Efficient Rejection Option in R (Politecnico di Torino institutional repository copy)
year unknown, cites this paper