Confidence intervals for random forests: the jackknife and the infinitesimal jackknife

Published 2013 in Journal of Machine Learning Research

ABSTRACT

We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Θ(n^1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Θ(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times fewer bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.
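The following is a minimal sketch of how an IJ variance estimate for a bagged predictor might be computed; it is not the authors' code. It assumes the covariance form usually associated with Efron's IJ estimate for bagging: the variance is the sum over training points i of the squared covariance, across bootstrap replicates, between the count N_bi of point i in replicate b and the replicate prediction t_b. It also assumes a Monte Carlo bias correction of the form (n / B^2) * sum_b (t_b - t_bar)^2, which is one way a finite-B correction of the kind the abstract mentions can be written. The function name `bagged_ij_variance`, the toy base learner (the sample mean), and the seed are illustrative assumptions.

```python
import random


def bagged_ij_variance(data, statistic, B, seed=0):
    """Return (bagged estimate, raw IJ variance, bias-corrected IJ variance).

    Assumed forms (hypothetical helper, not taken from the paper's code):
      cov_i = (1/B) * sum_b (N_bi - 1) * (t_b - t_bar)
      v_ij  = sum_i cov_i ** 2
      v_ij_corrected = v_ij - (n / B**2) * sum_b (t_b - t_bar) ** 2
    """
    rng = random.Random(seed)
    n = len(data)
    counts = []  # counts[b][i]: how often observation i appears in replicate b
    preds = []   # preds[b]: base-learner output on replicate b
    for _ in range(B):
        c = [0] * n
        for _draw in range(n):
            c[rng.randrange(n)] += 1  # draw n indices with replacement
        counts.append(c)
        sample = [x for x, k in zip(data, c) for _rep in range(k)]
        preds.append(statistic(sample))

    t_bar = sum(preds) / B  # the bagged prediction
    centered = [t - t_bar for t in preds]

    # Covariance between inclusion counts and predictions, one per training point.
    # Each count has expectation 1 under the bootstrap, hence the "- 1".
    cov = [
        sum((counts[b][i] - 1) * centered[b] for b in range(B)) / B
        for i in range(n)
    ]
    v_ij = sum(c * c for c in cov)

    # Monte Carlo noise inflates v_ij when B is finite; subtract its estimate.
    mc_bias = n / B**2 * sum(d * d for d in centered)
    return t_bar, v_ij, v_ij - mc_bias


if __name__ == "__main__":
    toy = [float(i) for i in range(20)]
    est, raw, corrected = bagged_ij_variance(toy, lambda s: sum(s) / len(s), B=2000)
    print(est, raw, corrected)
```

For the sample mean, the large-B limit of this estimate works out to sum_i (x_i - x_bar)^2 / n^2, which gives a simple sanity check. With a real random forest, `statistic` would be replaced by fitting a tree on the replicate and predicting at a fixed test point.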

PUBLICATION RECORD

Publication year: 2013
Venue: Journal of Machine Learning Research
Publication date: 2013-11-18
Fields of study: Mathematics, Computer Science, Medicine
Identifiers: DOI 10.5555/2627435.2638587; arXiv 1311.4555; PMID 25580094; PMCID PMC4286302
Source metadata: Semantic Scholar, PubMed

REFERENCES

Estimation and Accuracy After Model Selection
2014, influential reference
Model selection, estimation, and bootstrap smoothing
2012, influential reference
Bootstrap-Based Variance Estimators for A Bagging Predictor.
2011, cited by this paper
Analysis of a Random Forests Model
2010, cited by this paper
Standard errors for bagged and random forest estimators
2009, cited by this paper
Consistency of Random Forests and Other Averaging Classifiers
2008, cited by this paper
Classification and Regression by randomForest
2007, influential reference
On bagging and nonlinear estimation
2007, cited by this paper
BMC Bioinformatics BioMed Central Methodology article
2007, cited by this paper
Bias in random forest variable importance measures: Illustrations, sources and a solution
2007, cited by this paper
Quantile Regression Forests
2006, cited by this paper
Random Forests and Adaptive Nearest Neighbors
2006, cited by this paper
Observations on Bagging
2006, cited by this paper
Extremely randomized trees
2006, cited by this paper
The Elements of Statistical Learning
2003, influential reference
Effects of bagging and bias correction on estimators defined by estimating equations
2003, cited by this paper
Modern Applied Statistics With S
2003, cited by this paper
Stochastic gradient boosting
2002, cited by this paper
Random Forests
2001, influential reference
Analyzing Bagging
2001, cited by this paper
Some Comments on Cp
2000, cited by this paper
An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization
2000, cited by this paper
Bagging for linear classifiers
1998, cited by this paper
Bagging Predictors
1996, cited by this paper
Jackknife-After-Bootstrap Standard Errors and Influence Functions
1992, influential reference
Compliance as an Explanatory Variable in Clinical Trials
1991, cited by this paper
Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients.
1989, cited by this paper
The Jackknife Estimate of Variance
1981, cited by this paper

CITED BY

Statistical frameworks for reliable machine learning predictions and inference
2026, cites this paper
A near-global dataset of dissolved organic carbon concentrations and yields in forested headwater streams
2026, cites this paper
Consistency of Honest Decision Trees and Random Forests
2026, cites this paper
E-QRGMM: Efficient Generative Metamodeling for Covariate-Dependent Uncertainty Quantification
2026, cites this paper
Hierarchical forecasting of COVID-19 cases in Africa using machine learning models.
2026, cites this paper
Empirical Likelihood for Random Forests and Ensembles
2025, cites this paper
Machine Learning for Econometrics
2025, cites this paper
Jackknife Variance Estimation for Hájek-Dominated Generalized U-Statistics
2025, cites this paper
How do Firms' Financial Conditions Influence the Transmission of Monetary Policy? A Non-Parametric Local Projection Approach
2025, cites this paper
Efficient Uncertainty Quantification of Bagging via the Cheap Bootstrap
2025, cites this paper
Active learning confidence measures for coupling strategies in digital twins integrating simulation and data-driven submodels
2025, cites this paper
Confidence intervals for Random Forest permutation importance with missing data
2025, cites this paper
Statistical Inference for Gradient Boosting Regression
2025, cites this paper
Multiscale Bootstrap Correction for Random Forest Voting: A Statistical Inference Approach to Stock Index Trend Prediction
2025, cites this paper
Uncertainty-aware Bayesian machine learning modelling of land cover classification
2025, cites this paper
Active and transfer learning with partially Bayesian neural networks for materials and chemicals
2025, cites this paper
Forecasting Dengue: Evaluating the Role of Hydroclimate Information in Subseasonal to Seasonal Prediction
2025, cites this paper
Forecasting the future development in quality and value of professional football players for applications in team management
2025, cites this paper
Climate land use and other drivers impacts on island ecosystem services: a global review
2025, cites this paper
Site-specific soil water characteristic curve prediction with extremely scarce data using data-driven hierarchical Bayesian model
2025, cites this paper
Comparing Traditional Methods and Modern Statistical Techniques for Tree Height Prediction
2025, cites this paper
Comparing pixel- and object-based approaches for classifying benthic habitats
2025, cites this paper
Foundations and Innovations in Data Fusion and Ensemble Learning for Effective Consensus
2025, cites this paper
On regression-adjusted imputation estimators of average treatment effects
2025, cites this paper
Spatial variability and relative influence of seasonal rainfall drivers in Ethiopia
2025, cites this paper
Satellite and eddy covariance analysis reveals short-lived evapotranspiration changes after fire in Mediterranean woodland
2025, cites this paper
Similarity-based Conformal Prediction using Random Forest Proximities
2025, cites this paper
Robust treatment assignment with uncertainty-aware causal forests: Joint optimization of accuracy and estimation uncertainty
2025, cites this paper
Random forests for individual treatment effect estimation with the R package ITERF
2025, cites this paper
Localized Uncertainty Quantification in Random Forests via Proximities
2025, cites this paper
OSMAC: A Dynamic SMAC for Data Streams
2025, cites this paper
A jackknife approach to estimate the prediction uncertainty from binary classifiers under right-censoring
2025, cites this paper
Simulations of Life History Variation for Demographic Inference From Population Genomic Data
2025, cites this paper
State of Health Estimation for Lithium-Ion Batteries Under Arbitrary Usage Using Data-Driven Multimodel Fusion
2024, cites this paper
Theoretical Limitations of Ensembles in the Age of Overparameterization
2024, cites this paper
On the uncertainty of real estate price predictions
2024, cites this paper
Exploratory subgroup identification in the heterogeneous Cox model: A relatively simple procedure
2024, cites this paper
Optimising stope design through economic and geotechnic assessments of predictions made at a meter scale resolution using the sites' reconciled data
2024, cites this paper
Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty
2024, cites this paper
Mapping surface sediment characteristics in enclosed shallow-marine environments using spatially balanced designs and the random forest algorithm
2024, cites this paper
Improve ROI with Causal Learning and Conformal Prediction
2024, cites this paper
Calibration and XGBoost reweighting to reduce coverage and non-response biases in overlapping panel surveys: application to the Healthcare and Social Survey
2024, cites this paper
Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers
2024, cites this paper
Multiple invasion routes have led to the pervasive introduction of earthworms in North America
2024, cites this paper
Multi-view uncertainty deep forest: An innovative deep forest equipped with uncertainty estimation for drug-induced liver injury prediction
2024, cites this paper
Efficient Near-Infrared Spectrum Detection in Nondestructive Wood Testing via Transfer Network Redesign
2024, cites this paper
Systematic Development of Framework for Validation and Performance Quantification of Additively Manufactured Replacement Parts for Structural Steel Applications
2024, cites this paper
Chinook salmon depth distributions on the continental shelf are shaped by interactions between location, season, and individual condition
2024, cites this paper
Ecological and climatic transferability of airborne lidar-driven aboveground biomass models in Piñon-Juniper woodlands
2024, cites this paper
Global distribution and environmental correlates of marine bioturbation.
2024, cites this paper
The Unfairness of ε-Fairness
2024, cites this paper
How Do Applied Researchers Use the Causal Forest? A Methodological Review
2024, cites this paper
Strontium isoscape of sub-Saharan Africa allows tracing origins of victims of the transatlantic slave trade
2024, cites this paper
Development of Two Methods for Estimating High-Dimensional Data in the Case of Multicollinearity and Outliers
2024, cites this paper
Daily PM2.5 and Seasonal-Trend Decomposition to Identify Extreme Air Pollution Events from 2001 to 2020 for Continental Australia Using a Random Forest Model
2024, cites this paper
Variable Importance Measures for Multivariate Random Forests
2024, cites this paper
Deciphering nitrogen concentrations in Metasequoia glyptostroboides: a novel approach using RGB images and machine learning
2024, cites this paper
Will the real populists please stand up? A machine learning index of party populism
2024, cites this paper
Quantile Regression using Random Forest Proximities
2024, cites this paper
How do firms’ financial conditions influence the transmission of monetary policy? A non-parametric local projection approach
2024, cites this paper
Model-Based Prediction for Small Domains Using Covariates: A Comparison of Four Methods
2024, cites this paper
Materials Informatics: An Algorithmic Design Rule
2023, cites this paper
Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value
2023, cites this paper
Spatial Prediction of Soil Organic Carbon Stock in the Moroccan High Atlas Using Machine Learning
2023, cites this paper
Overview of modern approaches for identifying and evaluating heterogeneous treatment effects from clinical data
2023, cites this paper
Using Forests in Multivariate Regression Discontinuity Designs
2023, cites this paper
Design of experiments and machine learning with application to industrial experiments
2023, cites this paper
Generalized Random Forests Using Fixed-Point Trees
2023, cites this paper
A predictive model for lung cancer screening nonadherence in a community setting health-care network
2023, cites this paper
Extrapolated Cross-Validation for Randomized Ensembles
2023, cites this paper
Multivariate prediction intervals for bagged models
2023, influential citation
A Comparison of Spatial and Nonspatial Methods in Statistical Modeling of NO2: Prediction Accuracy, Uncertainty Quantification, and Model Interpretation
2023, cites this paper
Probabilistic prediction by means of the propagation of response variable uncertainty through a Monte Carlo approach in regression random forest: Application to soil moisture regionalization
2023, cites this paper
An automatic purse-seine set type classification algorithm to inform tropical tuna management
2023, cites this paper
Supervised machine learning for theory building and testing: Opportunities in operations management
2023, cites this paper
Incorporating physics to overcome data scarcity in predictive modeling of protein function: a case study of BK channels
2023, cites this paper
Approximation trees: statistical reproducibility in model distillation
2023, cites this paper
Debiased lasso after sample splitting for estimation and inference in high-dimensional generalized linear models
2023, cites this paper
On the Gini-impurity Preservation For Privacy Random Forests
2023, cites this paper
Confidence intervals of survival predictions with neural networks trained on molecular data
2023, cites this paper
Practical battery State of Health estimation using data-driven multi-model fusion
2023, cites this paper
A Jackknife-Inspired Deep Learning Approach to Subject-Independent Classification of EEG
2023, cites this paper
Deep Regression Network with Prediction Confidence in Time Series Application for Asset Health Estimation
2023, cites this paper
Modern approaches for evaluating treatment effect heterogeneity from clinical trials and observational data
2023, cites this paper
Reduction of testing effort for fatigue tests: Application of Bayesian optimal experimental design
2023, cites this paper
Early prediction of battery life by learning from both time-series and histogram data
2023, cites this paper
Incorporating physics to overcome data scarcity in predictive modeling of protein function: A case study of BK channels
2023, cites this paper
Global patterns and drivers of phosphorus fractions in natural soils
2023, cites this paper
Intrinsic and extrinsic techniques for quantification uncertainty of an interpretable GRU deep learning model used to predict atmospheric total suspended particulates (TSP) in Zabol, Iran during the dusty period of 120-days wind.
2023, cites this paper
Global methane emissions from rivers and streams
2023, cites this paper
Conformal Meta-learners for Predictive Inference of Individual Treatment Effects
2023, cites this paper
Small Area Estimation with Random Forests and the LASSO
2023, cites this paper
Global patterns and drivers of phosphorus pools in natural soils
2023, cites this paper
Economic costs of the invasive Yellow-legged hornet on honey bees.
2023, cites this paper
Data driven design of alkali-activated concrete using sequential learning
2023, cites this paper
Investigation of ensemble methods in terms of statistics: TIMMS 2019 example
2023, cites this paper
A hybrid data mining framework for variable annuity portfolio valuation
2023, cites this paper
Medoid splits for efficient random forests in metric spaces
2023, cites this paper
Strategy-aware evaluation of treatment personalization
2023, cites this paper
EKOIST Journal of Econometrics and Statistics
2023, cites this paper