Analysis of purely random forests bias

Published 2014 in arXiv.org

ABSTRACT

Random forests are a very effective and commonly used statistical method, but their full theoretical analysis is still an open problem. As a first step, simplified models such as purely random forests have been introduced in order to shed light on the good performance of random forests. In this paper, we study the approximation error (the bias) of some purely random forest models in a regression framework, focusing in particular on the influence of the number of trees in the forest. Under some regularity assumptions on the regression function, we show that the bias of an infinite forest decreases at a faster rate (with respect to the size of each tree) than that of a single tree. As a consequence, infinite forests attain a strictly better risk rate (with respect to the sample size) than single trees. Furthermore, our results allow us to derive a minimum number of trees sufficient to reach the same rate as an infinite forest. As a by-product of our analysis, we also show a link between the bias of purely random forests and the bias of some kernel estimators.
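
The comparison between the bias of a single tree and the bias of a forest can be illustrated numerically. The sketch below is not taken from the paper: it assumes a one-dimensional, noise-free setting on [0, 1], a hypothetical regression function s(x) = sin(2πx), and trees whose k cells are delimited by k - 1 independent uniform split points. All function names are illustrative. Each tree predicts the exact mean of s over the cell containing x, so only the approximation error is measured.

```python
import bisect
import math
import random


def s(x):
    # Hypothetical smooth regression function (an assumption for this sketch).
    return math.sin(2 * math.pi * x)


def cell_mean(a, b):
    # Exact average of s over the cell [a, b], from the antiderivative of sin.
    return (math.cos(2 * math.pi * a) - math.cos(2 * math.pi * b)) / (
        2 * math.pi * (b - a)
    )


def random_partition(k, rng):
    # k - 1 uniform split points give a partition of [0, 1] into k cells.
    # The splits are drawn independently of any data ("purely random").
    return [0.0] + sorted(rng.random() for _ in range(k - 1)) + [1.0]


def tree_value(edges, x):
    # Noise-free tree prediction: mean of s over the cell containing x.
    i = bisect.bisect_right(edges, x) - 1
    i = min(max(i, 0), len(edges) - 2)
    return cell_mean(edges[i], edges[i + 1])


def bias_estimates(k, n_trees, n_points=200, seed=0):
    # Returns (average squared error of individual trees,
    #          squared error of the average of the n_trees trees),
    # both averaged over an evenly spaced grid of x values.
    rng = random.Random(seed)
    trees = [random_partition(k, rng) for _ in range(n_trees)]
    xs = [(j + 0.5) / n_points for j in range(n_points)]
    tree_bias = 0.0
    forest_bias = 0.0
    for x in xs:
        vals = [tree_value(edges, x) for edges in trees]
        tree_bias += sum((s(x) - v) ** 2 for v in vals) / n_trees
        forest_bias += (s(x) - sum(vals) / n_trees) ** 2
    return tree_bias / n_points, forest_bias / n_points


if __name__ == "__main__":
    for k in (4, 16, 64):
        t, f = bias_estimates(k=k, n_trees=200)
        print(f"k={k:3d}  single-tree bias={t:.3e}  forest bias={f:.3e}")
```

By Jensen's inequality the forest value can never exceed the single-tree value in this sketch; how much faster it shrinks as k grows depends on the partition model and on the smoothness of s, which is the question the paper addresses.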

PUBLICATION RECORD

Publication year
2014
Venue
arXiv.org
Publication date
2014-07-14
Fields of study
Mathematics, Computer Science, Environmental Science
Identifiers
arXiv 1407.3939
Source metadata
Semantic Scholar

REFERENCES

R: A language and environment for statistical computing.
2014, influential reference
Variance reduction in purely random forests
2012, influential reference
Classification and regression trees
2012, cited by this paper
On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification
2010, cited by this paper
Analysis of a Random Forests Model
2010, cited by this paper
A survey of cross-validation procedures for model selection
2009, cited by this paper
Consistency of Random Forests and Other Averaging Classifiers
2008, influential reference
Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning
2008, influential reference
V-fold cross-validation improved: V-fold penalization
2008, cited by this paper
Random Features for Large-Scale Kernel Machines
2007, influential reference
All of Nonparametric Statistics
2007, influential reference
Classification and Regression by randomForest
2007, cited by this paper
All of Nonparametric Statistics (Springer Texts in Statistics)
2006, influential reference
Random Forests and Adaptive Nearest Neighbors
2006, cited by this paper
Extremely randomized trees
2006, influential reference
Consistency for a Simple Model of Random Forests
2004, influential reference
A Distribution-Free Theory of Nonparametric Regression
2002, influential reference
Analyzing Bagging
2001, cited by this paper
Random Forests
2001, cited by this paper
Limiting the Number of Trees in Random Forests
2001, cited by this paper
Some Infinity Theory for Predictor Ensembles
2000, influential reference
Bagging Predictors
1996, cited by this paper

CITED BY

Consistency of Honest Decision Trees and Random Forests
2026, cites this paper
Early Stopping for Regression Trees
2025, cites this paper
Spatial properties of Bayesian unsupervised trees
2024, cites this paper
Statistical Advantages of Oblique Randomized Decision Trees and Forests
2024, cites this paper
Non-asymptotic Properties of Generalized Mondrian Forests in Statistical Learning
2024, cites this paper
Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants
2024, cites this paper
Minimax-optimal and Locally-adaptive Online Nonparametric Regression
2024, cites this paper
Indigenous Development of Water Quality Monitoring System for Urban Areas using IoT & ML
2023, cites this paper
On the Convergence of CART under Sufficient Impurity Decrease Condition
2023, cites this paper
Towards Convergence Rate Analysis of Random Forests for Classification
2022, cites this paper
Is interpolation benign for random forest regression?
2022, cites this paper
Application of novel hybrid model for land subsidence susceptibility mapping
2022, influential citation
Pollen-Based Holocene Thawing-History of Permafrost in Northern Asia and Its Potential Impacts on Climate Change
2022, cites this paper
Land-subsidence susceptibility mapping: assessment of an adaptive neuro-fuzzy inference system–genetic algorithm hybrid model
2022, cites this paper
Fault Handling in Industry 4.0: Definition, Process and Applications
2022, cites this paper
Random Forests Weighted Local Fréchet Regression with Theoretical Guarantee
2022, cites this paper
Is interpolation benign for random forests?
2022, cites this paper
Optimal causal decision trees ensemble for improved prediction and causal inference
2022, cites this paper
Random Forest Weighted Local Fréchet Regression with Random Objects
2022, influential citation
Statistica Sinica Preprint
2021, cites this paper
WildWood: A New Random Forest Algorithm
2021, influential citation
of the Bernoulli Society for Mathematical Statistics and Probability Volume Twenty Seven Number Four November 2021
2021, cites this paper
Minimax semi-supervised set-valued approach to multi-class classification
2021, cites this paper
Application of Machine Learning Algorithms in Predicting Pyrolytic Analysis Result
2021, cites this paper
Scaffolding Sets
2021, cites this paper
Minimax Rates for High-Dimensional Random Tessellation Forests
2021, cites this paper
Minimax Rates for STIT and Poisson Hyperplane Random Forests
2021, cites this paper
A Learning Theoretic Perspective on Local Explainability
2020, cites this paper
Censored Quantile Regression Forest
2020, cites this paper
Trees, forests, and impurity-based variable importance in regression
2020, cites this paper
Estimating heterogeneous treatment effects with right-censored data via causal survival forests
2020, influential citation
Stochastic geometry to generalize the Mondrian Process
2020, cites this paper
Bayesian Nonparametric Space Partitions: A Survey
2020, cites this paper
Online Binary Space Partitioning Forests
2020, cites this paper
Estimation and Inference with Trees and Forests in High Dimensions
2020, cites this paper
Smoothing and adaptation of shifted Pólya tree ensembles
2020, influential citation
Estimating heterogeneous treatment effects with right-censored data via causal survival forests
2020, cites this paper
Estimating the algorithmic variance of randomized ensembles via the bootstrap
2019, cites this paper
AMF: Aggregated Mondrian forests for online learning
2019, influential citation
Best Split Nodes for Regression Trees
2019, cites this paper
Plug-in methods in classification
2019, cites this paper
Measuring the Algorithmic Convergence of Randomized Ensembles: The Regression Setting
2019, cites this paper
Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification
2019, cites this paper
Fréchet random forests for metric space valued regression with non euclidean predictors
2019, cites this paper
Best-scored Random Forest Classification
2019, cites this paper
Minimax Optimal Rates for Mondrian Trees and Forests
2019, influential citation
Analyzing CART.
2019, cites this paper
Méthodes pour l’apprentissage de données massives
2018, cites this paper
Boulevard: Regularized Stochastic Gradient Boosted Trees and Their Limiting Distribution
2018, influential citation
Impact of subsampling and tree depth on random forests
2018, influential citation
Early Fault Detection of Aircraft Components Using Flight Sensor Data
2018, cites this paper
Minimax optimal rates for Mondrian trees and forests
2018, influential citation
When do random forests fail?
2018, influential citation
Complete Analysis of a Random Forest Model
2018, cites this paper
Tree-Based Survival Models and Precision Medicine
2018, cites this paper
On PAC-Bayesian bounds for random forests
2018, cites this paper
Forest fire forecasting using ensemble learning approaches
2018, cites this paper
New Kernels For Density and Regression Estimation via Randomized Histograms
2018, cites this paper
Sharp Analysis of a Simple Model for Random Forests
2018, cites this paper
Consistency of survival tree and forest models: splitting bias and correction
2017, cites this paper
Universal consistency and minimax rates for online Mondrian Forests
2017, influential citation
Tuning parameters in random forests
2017, cites this paper
Comments on: A random forest guided tour
2016, influential citation
Comments on: "A Random Forest Guided Tour" by G. Biau and E. Scornet
2016, cites this paper
Generalized random forests
2016, cites this paper
Impact of subsampling and pruning on random forests
2016, cites this paper
Arbres CART et Forêts aléatoires, Importance et sélection de variables
2016, cites this paper
Solving Heterogeneous Estimating Equations with Gradient Forests
2016, cites this paper
Calibrating random forests for probability estimation
2016, cites this paper
Uniform Convergence of Random Forests via Adaptive Concentration
2015, cites this paper
Random Forests and Kernel Methods
2015, cites this paper
Learning with random forests
2015, influential citation
On the Use of Harrell's C for Node Splitting in Random Survival Forests
2015, cites this paper
A random forest guided tour
2015, influential citation
Random Forests for Big Data
2015, cites this paper
Adaptive Concentration of Regression Trees, with Application to Random Forests
2015, cites this paper
On the use of Harrell's C for clinical risk prediction via random survival forests
2015, cites this paper
Measuring the Algorithmic Convergence of Random Forests via Bootstrap Extrapolation
2015, cites this paper