Estimating the algorithmic variance of randomized ensembles via the bootstrap

Published 2019 in Annals of Statistics

ABSTRACT

Although bagging and random forests are among the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are few theoretical guarantees for deciding when an ensemble is "large enough", in the sense that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of "algorithmic variance" (i.e., the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests, and related methods in the context of classification. Specifically, suppose the training dataset is fixed, and let the random variable $Err_t$ denote the prediction error of a randomized ensemble of size $t$. Working under a "first-order model" for randomized ensembles, we prove that the centered law of $Err_t$ can be consistently approximated via the proposed method as $t\to\infty$. Moreover, the computational cost of the method is modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of $Err_t$ are negligible.
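The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' exact procedure, and it makes several assumptions that the abstract does not state: labels are binary, $Err_t$ is measured on a hold-out set, the ensemble predicts by majority vote, and the standard deviation of $Err_t$ is taken to decay like $1/\sqrt{t}$ for the extrapolation step. All function names are hypothetical, and the votes are simulated rather than taken from a trained ensemble.

```python
import numpy as np


def majority_vote_error(votes, labels):
    """Error rate of the majority vote over classifiers (binary 0/1 labels)."""
    # votes has shape (t, m): t classifiers, m hold-out points.
    # Ties are broken in favor of class 1 (a choice made for this sketch).
    ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
    return float(np.mean(ensemble_pred != labels))


def bootstrap_error_sd(votes, labels, n_boot=200, seed=0):
    """Bootstrap estimate of sd(Err_t), resampling classifiers, not data."""
    rng = np.random.default_rng(seed)
    t = votes.shape[0]
    errs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, t, size=t)  # draw t classifiers with replacement
        errs[b] = majority_vote_error(votes[idx], labels)
    return float(errs.std(ddof=1))


def extrapolate_sd(sd_t0, t0, t):
    """Rescale an sd estimate from size t0 to size t, ASSUMING 1/sqrt(t) decay."""
    return float(sd_t0 * np.sqrt(t0 / t))


def simulate_votes(t, m, seed=0):
    """Hypothetical stand-in for a trained ensemble's hold-out predictions."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=m)
    p_correct = rng.uniform(0.4, 0.9, size=m)  # per-point chance a classifier is right
    correct = rng.random((t, m)) < p_correct
    votes = np.where(correct, labels, 1 - labels)
    return votes, labels


votes, labels = simulate_votes(t=50, m=500)
sd_50 = bootstrap_error_sd(votes, labels)
print("estimated sd of Err_t at t=50:", sd_50)
print("extrapolated sd at t=500:", extrapolate_sd(sd_50, t0=50, t=500))
```

Because only the stored votes are resampled, no classifier is retrained. Under the assumed scaling, a small initial ensemble is enough to judge how large $t$ must be before the fluctuations of $Err_t$ fall below a chosen tolerance.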

PUBLICATION RECORD

Publication year: 2019
Venue: Annals of Statistics
Publication date: 2019-04-01
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.1214/18-AOS1707; arXiv 1907.08742

CITED BY

Random Forests as Statistical Procedures: Design, Variance, and Dependence (2026)
When do Random Forests work? (2025)
Centroid Decision Forest (2025)
Estimating the Accuracy of a Bagged Ensemble (2025)
Foundations and Innovations in Data Fusion and Ensemble Learning for Effective Consensus (2025)
A proactive approach for random forest (2025)
Möbius inversion and the iterated bootstrap (2024)
Extrapolated Cross-Validation for Randomized Ensembles (2023, influential citation)
Are ensembles getting better all the time? (2023)
Reconstructing high-resolution groundwater level data using a hybrid random forest model to quantify distributed groundwater changes in the Indus Basin (2023)
Corrected generalized cross-validation for finite ensembles of penalized estimators (2023)
Mapping Topsoil Total Nitrogen Using Random Forest and Modified Regression Kriging in Agricultural Areas of Central China (2023)
Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach (2022)
Robust Counterfactual Explanations for Random Forests (2022)
Bootstrapping the Error of Oja's Algorithm (2021)
EnPSO: An AutoML Technique for Generating Ensemble Recommender System (2021)
Disclosing Personal Names in Screen Names Predicts Better Final Achievement Levels in Massive Open Online Courses (2021)
Ellipse fitting by spatial averaging of random ensembles (2020)
Randomized numerical linear algebra: Foundations and algorithms (2020)
RaSE: Random Subspace Ensemble Classification (2020)
Bootstrapping the Operator Norm in High Dimensions: Error Estimation for Covariance Matrices and Sketching (2019)
Random projections: Data perturbation for classification problems (2019)
Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success (2019)
Measuring the Algorithmic Convergence of Randomized Ensembles: The Regression Setting (2019)
Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap (2018)
A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication (2017)