Stacked generalization: an introduction to super learning

Published 2017 in bioRxiv

ABSTRACT

Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has evolved into a family of related methods, among them the “Super Learner”. Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although Super Learner is relatively simple, its use by epidemiologists has been hampered by limited understanding of its conceptual and technical details. We work step-by-step through two examples to illustrate concepts and address common concerns.
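The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two candidate learners (`fit_mean`, `fit_line`), the simulated data, the interleaved fold assignment, and the grid search over a single convex weight are all assumptions chosen to keep the example self-contained. The structure is the point: obtain cross-validated predictions from each candidate, choose the weights that minimize a user-specified loss on those predictions, then refit the candidates on all the data.

```python
import random
from statistics import fmean


def fit_mean(xs, ys):
    """Candidate 1: ignore x and always predict the training mean of y."""
    m = fmean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: least-squares line y = a + b * x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def mse(preds, ys):
    """Objective to minimize; pass another loss to change what 'optimal' means."""
    return fmean([(p - y) ** 2 for p, y in zip(preds, ys)])


def cv_predictions(xs, ys, learners, v):
    """Predict every observation from models fit without that observation's fold."""
    n = len(xs)
    z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(v):
        train = [i for i in range(n) if i % v != fold]
        held_out = [i for i in range(n) if i % v == fold]
        for j, fit in enumerate(learners):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                z[i][j] = model(xs[i])
    return z


def super_learner(xs, ys, learners=(fit_mean, fit_line), v=5, loss=mse, grid=101):
    """Two-learner sketch: pick the convex weight that minimizes the CV loss."""
    z = cv_predictions(xs, ys, learners, v)
    learner_risks = [loss([row[j] for row in z], ys) for j in range(len(learners))]
    best_w, best_risk = 0.0, float("inf")
    for k in range(grid):  # assumption: grid search stands in for a real optimizer
        w = k / (grid - 1)
        risk = loss([w * row[0] + (1 - w) * row[1] for row in z], ys)
        if risk < best_risk:
            best_w, best_risk = w, risk
    weights = (best_w, 1.0 - best_w)
    models = [fit(xs, ys) for fit in learners]  # refit each candidate on all data

    def predict(x):
        return sum(wt * m(x) for wt, m in zip(weights, models))

    return weights, best_risk, learner_risks, predict


# Simulated data (an assumption for illustration only).
random.seed(0)
xs = [random.uniform(0, 1) for _ in range(100)]
ys = [2.0 * x + random.gauss(0, 0.3) for x in xs]

weights, cv_risk, learner_risks, predict = super_learner(xs, ys)
print("weights (mean, line):", weights)
print("CV risk of ensemble:", cv_risk, "vs candidates:", learner_risks)
```

Because the weight grid includes 0 and 1, the selected combination can never have a higher cross-validated loss than the best single candidate in the library. Libraries with more than two candidates need a proper constrained optimizer in place of the one-dimensional grid.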

PUBLICATION RECORD

Publication year: 2017
Venue: bioRxiv
Publication date: 2017-08-18
Fields of study: Medicine, Computer Science
Identifiers: DOI 10.1007/s10654-018-0390-z; PMID 29637384; PMCID PMC6089257
Source metadata: Semantic Scholar, PubMed

CITED BY

(2026) Selecting measures of visual function to classify diabetic retinopathy status: a cross-sectional study
(2026) Radiomics in Medical Imaging: Methods, Applications, and Challenges
(2026) Integrative ensemble and meta-learning frameworks for high-precision cardiovascular risk prediction
(2026) Digital Asset Analytics for DeFi Protocol Valuation: An Explainable Optuna-Tuned Super Learner Ensemble Framework
(2026) A Causal Inference Approach for Mediated Moderation with Multiple Mediators
(2026) Joint prediction of wind speed and direction along high-speed railway lines based on multi-layer stacking machine learning models
(2026) Effect of sea surface temperature in El Niño regions on dengue dynamics in Colombia: Evidence from causal machine learning
(2026) Neoadjuvant Radiation is Causally Linked to Increased Operative Time and Perioperative Blood Transfusion in Pancreatic Ductal Adenocarcinoma.
(2026) Antenatal prediction of small for gestational age at birth based on four birthweight standards using machine learning algorithms
(2026) Social and environmental disparities in mental health benefits from active transport in the UK: a causal machine learning analysis
(2026) Poor Neighborhoods, Bad Schools? A High-Dimensional Model of Place-Based Disparities in Academic Achievement
(2026) Benchmarking Mixture of Experts against Stacking ensembles for predicting permeability of leachate-contaminated soils
(2025) An unsupervised intelligent warning model for drilling kick risk based on multi-temporal feature coupling
(2025) Carbon Capture Using Metal Organic Frameworks (MOFs): Novel Custom Ensemble Learning Models for Prediction of CO2 Adsorption
(2025) Association of disability with COVID-19 outcomes in older adults: a prospective analysis of the US health and retirement study
(2025) Meta-learning based softmax average of convolutional neural networks using multi-layer perceptron for brain tumour classification
(2025) Capitalizing on natural language processing (NLP) to automate the evaluation of coach implementation fidelity in guided digital cognitive-behavioral therapy (GdCBT)
(2025) Building heat load forecasting with mechanism-based data fusion: a stacking ensemble approach for district heating systems
(2025) Do we need flexible machine-learning algorithms to assess the effect of long-term exposure to fine particulate matter on mortality?: An example from a Canadian national cohort
(2025) A BSMOTE-OOA-SuperLearner Hybrid Framework for Interpretable Prediction of Pillar Stability
(2025) Predicting the risk of acute kidney injury in patients with acute pancreatitis complicated by sepsis using a stacked ensemble machine learning model: a retrospective study based on the MIMIC database
(2025) High-precision multi-target prediction and interpretability analysis of biomass gasification via ensemble machine learning.
(2025) How Effective Are Machine Learning and Doubly Robust Estimators in Incorporating High‐Dimensional Proxies to Reduce Residual Confounding?
(2025) Variations of Environmental Niche Breadth, Range Sizes and Geographic Exclusion With Bat Species Richness
(2025) Fusion vs. Isolation: Evaluating the Performance of Multi-Sensor Integration for Meat Spoilage Prediction
(2025) How can media attention reveal ESG improvement opportunities? A multi-algorithm machine learning-based approach for Taiwan’s electronics industry
(2025) A multi-modal model integrating MRI habitat and clinicopathology to predict platinum sensitivity in patients with high-grade serous ovarian cancer: a diagnostic study.
(2025) Machine learning predictions for regioselectivity of hydroformylation reactions: leveraging limited data for high-precision results
(2025) Multi-model ensemble framework for analysis of psychopathic traits in heinous crime convicts
(2025) Machine learning approaches for EGFR mutation status prediction in NSCLC: an updated systematic review
(2025) Exploring novel super-learner-based machine learning ensembles for landslide susceptibility prediction with integrated uncertainty quantification
(2025) Leveraging Feature Transfer to Predict Medication Resistance and Secondary-Clinical Outcomes in Psychotic Disorders in Forensic Settings
(2025) Development and Clinical Validation of Lightweight, Multimodal Machine Learning Models for Smartphone-Based Cataract Detection and Classification
(2025) Estimation of black carbon concentration in China based on a dynamic weighted ensemble learning model
(2025) Integrating move analysis and sentence reconstruction in automated writing evaluation for L2 academic writers
(2025) Artificial Intelligence Outperforms a Nomogram for Osteoradionecrosis Prognostication Following Fibula Free Flap Reconstruction in Oral Cancer Patients.
(2025) Analysis and Optimization of Coagulation Efficiency for Brackish Water Reverse Osmosis Brine Based on Ensemble Approach
(2025) Integrating expert range maps and opportunistic occurrence records of marine fish species in range estimates.
(2025) Exploring how base model combination affects the results of a “stacking” ensemble machine learning model: An applied study on optimization of heteroatom doped carbon data
(2025) Stacked machine learning for accurate and interpretable prediction of MXenes’ work function
(2025) Targeted Maximum Likelihood Estimation for Causal Inference With Observational Data-The Example of Private Tutoring.
(2025) Alzheimer’s diagnosis from EEG with reliable probabilities: subject-wise, leakage-free evaluation and isotonic calibration
(2025) Predicting admission to and length of stay in intensive care units after general anesthesia: Time-dependent role of pre- and intraoperative data for clinical decision-making.
(2025) A Comprehensive Analysis of Microbial Community and Nitrogen Removal Rate Predictions in Three Anammox Systems
(2025) Combined explainable deep learning model to predict pediatric sleep apnea from ECG and SpO2
(2025, influential citation) PhosStack: Elevated Predictive Performance for Breast Cancer Classification
(2025) Stochastic treatment regimes in climate-health research: Reassessing malaria risk under warming scenarios in Colombia
(2025) CSI-Driven Indoor Positioning with a Two-Layer Stacking Fusion Model for Enhanced Accuracy
(2025) Radiomics and deep learning model based on X-ray imaging for the assisted diagnosis of early Legg-Calvé-Perthes disease
(2025) Earth and Rockfill Dams’ Seepage Prediction Using Artificial Intelligence Models: A Comprehensive Review Assessment, and Future Research Directions
(2025) Development and validation of a machine learning model for predicting venous thromboembolism complications following colorectal cancer surgery
(2025) Agent-based collaborative model for forecasting large-scale intermittent spare parts in smart manufacturing industry
(2025) Multi-Sensor Wearable-Based Sleep Stage Classification Using Federated Learning for Enhanced Privacy
(2025) Parameter inverse analysis of high rockfill dams considering material uncertainty based on the EJaya-SESM model
(2025) Electric Vehicle Charging Demand Forecasting: A Data-Driven Integrated Learning Approach
(2025) Nitrate Content in Open Field Spinach, Applicative Case for Hyperspectral Reflectance Data
(2025) External validation and update of the pediatric asthma risk score as a passive digital marker for childhood asthma using integrated electronic health records
(2025) Construction of a Prediction Model for Adverse Perinatal Outcomes in Foetal Growth Restriction Based on a Machine Learning Algorithm: A Retrospective Study
(2025) Early diagnosis of autism across developmental stages through scalable and interpretable ensemble model
(2025) Optimizing personalized screening intervals for clinical biomarkers using extended joint models
(2025) Targeted Learning for Optimal Patient Assignment to Psychotherapy.
(2025) Improving Distribution Prediction by Integrating Expert Range Maps and Opportunistic Occurrences: Evidence From Japanese Sea Cucumber
(2025) AdaptiveGS: an explainable genomic selection framework based on adaptive stacking ensemble machine learning
(2025) Predicting Stunted Growth in Two Year Old Bangladeshi Children via the Super Learner
(2025) DDNet: A Robust, and Reliable Hybrid Machine Learning Model for Effective Detection of Depression Among University Students
(2025) NDT of Closed-shell Oyster Freshness by Acoustic Vibration Signals
(2025) Algorithm Selection for Estimating Causal Effects: An example using the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers to Be.
(2025) Automated machine learning for classification and regression: A tutorial for psychologists
(2025) The super learner for time-to-event outcomes: A tutorial
(2025) Finding the Optimal Number of Splits and Repetitions in Double Cross‐Fitting Targeted Maximum Likelihood Estimators
(2025) An atmospheric correction method for Himawari-8 imagery based on a multi-layer stacking algorithm
(2025) Application of an improved LightGBM hybrid integration model combining gradient harmonization and Jacobian regularization for breast cancer diagnosis
(2025) Estimating the causal effects of exposure mixtures: a generalized propensity score method
(2025) Empirical tropospheric zenith wet delay models with strong generalization capability based on a robust machine learning fusion algorithm
(2025) Synthesizing Local Capacities, Multi-Source Remote Sensing and Meta-Learning to Optimize Forest Carbon Assessment in Data-Poor Regions
(2025) An automated software methodology for biomedical statistics, data pre-processing, and machine learning
(2025) Harnessing ensemble Machine learning models for improved salinity prediction in large river basin scales
(2025) Comparison of Parametric versus Machine-learning Multiple Imputation in Clinical Trials with Missing Continuous Outcomes
(2025) Machine learning-based strategies for improving healthcare data quality: an evaluation of accuracy, completeness, and reusability
(2025) Development of machine learning models for predicting non-remission in early RA highlights the robust predictive importance of the RAID score-evidence from the ARCTIC study
(2025) Research on Landslide Displacement Prediction Using Stacking-Based Machine Learning Fusion Model
(2025) Explainable ensemble learning for predicting pine wilt disease spread.
(2025) Machine learning-based prediction of optimal GFRP thickness for enhanced circular concrete column confinement
(2025) Machine learning models incorporating genotype and ancestry improve severe asthma risk prediction
(2025) Identification of Parkinson’s disease using MRI and genetic data from the PPMI cohort: an improved machine learning fusion approach
(2025) Predicting and Optimising Ship Fuel Consumption Using Data-Driven Models and a Proposed IGWO Algorithm for Speed Adjustment
(2025) Stacking-Based Solar-Induced Chlorophyll Fluorescence Downscaling for Soil EC Estimation
(2025) PND.heter.cluster: An R package for estimating cluster-specific treatment effects in partially nested designs
(2025) Ensemble Machine Learning with Limited Data: Feature Selection and Predicting Wheat Yield in Bangladesh
(2025) Environmental Chemicals as Modifiers of the Association between Age and Ovarian Reserve
(2025) MRI-based intra-tumoral ecological diversity features and temporal characteristics for predicting microvascular invasion in hepatocellular carcinoma
(2025) Calibration of Low-Cost LoRaWAN-Based IoT Air Quality Monitors Using the Super Learner Ensemble: A Case Study for Accurate Particulate Matter Measurement
(2024) A Hybrid Machine learning Techniques for Detection of Chronic Kidney Disease
(2024) Development of a prediction model for 30-day COVID-19 hospitalization and death in a national cohort of Veterans Health Administration patients–March 2022—April 2023
(2024) Intelligent Lost Circulation Monitoring Method Based on Data Augmentation and Temporal Models
(2024) Prediction of glass-forming ability based on multi-model fusion
(2024) Monitoring soil salinity in coastal wetlands with Sentinel-2 MSI data: Combining fractional-order derivatives and stacked machine learning models
(2024) Explainable Ensemble Learning Approaches for Predicting the Compression Index of Clays
(2024) An ensemble optimizer with a stacking ensemble surrogate model for identification of groundwater contamination source.
(2024) Development of a Real-Time NOx Prediction Soft Sensor Algorithm for Power Plants Based on a Hybrid Boost Integration Model