Automating Biomedical Data Science Through Tree-Based Pipeline Optimization

Randal S. Olson, R. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, L. C. Kidd, J. Moore

Published 2016 in EvoApplications

ABSTRACT

Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.

PUBLICATION RECORD

Publication year: 2016
Venue: EvoApplications
Publication date: 2016-01-28
Fields of study: Medicine, Computer Science
Identifiers: DOI 10.1007/978-3-319-31204-0_9; arXiv 1601.07925
Source metadata: Semantic Scholar

REFERENCES

Beyond Manual Tuning of Hyperparameters
2015, cited by this paper
Deep feature synthesis: Towards automating data science endeavors
2015, cited by this paper
A System‐Level Pathway‐Phenotype Association Analysis Using Synthetic Feature Random Forest
2014, cited by this paper
Genetic Analysis of Prostate Cancer Using Computational Evolution, Pareto-Optimization and Post-processing
2013, influential reference
Practical Bayesian Optimization of Machine Learning Algorithms
2012, cited by this paper
Caipirini: using gene sets to rank literature
2012, cited by this paper
Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection
2012, influential reference
GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures
2012, influential reference
DEAP: evolutionary algorithms made easy
2012, cited by this paper
Random Search for Hyper-Parameter Optimization
2012, cited by this paper
Computer-Automated Evolution of an X-Band Antenna for NASA's Space Technology 5 Mission
2011, cited by this paper
Scikit-learn: Machine Learning in Python
2011, cited by this paper
Spatially Uniform ReliefF (SURF) for computationally-efficient filtering of gene-gene interactions
2009, influential reference
A genetic programming approach to automated software repair
2009, cited by this paper
Database mining for selection of SNP markers useful in admixture mapping
2009, cited by this paper
Genetic programming for finite algebras
2008, cited by this paper
Multi-objective optimization using genetic algorithms: A tutorial
2006, cited by this paper
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
2004, cited by this paper
The Design of Innovation: Lessons from and for Competent Genetic Algorithms
2002, cited by this paper
A fast and elitist multiobjective genetic algorithm: NSGA-II
2002, cited by this paper
Genetic programming - An Introduction: On the Automatic Evolution of Computer Programs and Its Applications
1998, cited by this paper
Genetic Programming: An Introduction
1997, cited by this paper

CITED BY

Explaining AutoClustering: Uncovering Meta-Feature Contribution in AutoML for Clustering
2026, cites this paper
Leveraging automated machine learning to benchmark, deconstruct, and compare frailty indices for predicting adverse spinal surgery outcomes
2026, cites this paper
A review of neuroscience-inspired deep learning and genetic algorithms
2026, cites this paper
The role of optimizers in developing data-driven model for predicting lake water quality incorporating advanced water quality model
2025, cites this paper
Integrating Equation Coding with Residual Networks for Efficient ODE Approximation in Biological Research
2025, cites this paper
The transformative role of machine learning in advancing MOF membranes for gas separations
2025, cites this paper
Meta-Black-Box optimization for evolutionary algorithms: Review and perspective
2025, cites this paper
Strengthened grey wolf optimization algorithms for numerical optimization tasks and AutoML
2025, cites this paper
A Computational Framework for Estimating Days of Maintenance Delay of Naval Ships
2025, cites this paper
Exploring AI-Driven Machine Learning Approaches for Optimal Classification of Peri-Implantitis Based on Oral Microbiome Data: A Feasibility Study
2025, cites this paper
Skin Lesion Classification in Head and Neck Cancers Using Tissue Index Images Derived from Hyperspectral Imaging
2025, cites this paper
The tree-based pipeline optimization tool: Tackling biomedical research problems with genetic programming and automated machine learning
2025, cites this paper
AutoPDL: Automatic Prompt Optimization for LLM Agents
2025, cites this paper
Financial Fraud Detection with Altman Z-Score and Beneish M-Score via Random Forest: Verified by Borsa Istanbul Fines (2018–2022)
2025, cites this paper
Fitting analysis and study on measured data of eddy current sensor via AutoML
2025, cites this paper
Mirage search optimization: Application to path planning and engineering design problems
2025, cites this paper
Pre-Operative Anemia is an Unsuspecting Driver of Machine Learning Prediction of Adverse Outcomes after Lumbar Spinal Fusion.
2025, cites this paper
Artificial Intelligence-Powered Materials Science
2025, cites this paper
Optimizing Stroke Detection Using Evidential Networks and Uncertainty-Based Refinement
2025, cites this paper
Fake advertisements detection using automated multimodal learning: a case study for Vietnamese real estate data
2025, cites this paper
A literature review on automated machine learning
2025, influential citation
Machine learning-led semi-automated medium optimization reveals salt as key for flaviolin production in Pseudomonas putida
2025, cites this paper
Applications and Advances of Machine Learning in the Development of Solid-State Electrolytes for Lithium-Ion Batteries
2025, cites this paper
Visualising the Truth: A Composite Evaluation Framework for Score-Based Predictive Model Selection
2025, cites this paper
Comparative Analysis of Automated Machine Learning for Hyperparameter Optimization and Explainable Artificial Intelligence Models
2025, cites this paper
Pyrimidine: An algebra-inspired Programming framework for evolutionary algorithms
2025, cites this paper
Automated Selection of Time Series Forecasting Models for Financial Accounting Data: Synthetic Data Application
2025, cites this paper
AI‐Driven Advances in Sustainable Materials for Green Energy: From Innovation to Lifecycle Management
2025, cites this paper
Using Machine Learning to Detect Financial Statement Fraud: A Cross-Country Analysis Applied to Wirecard AG
2025, cites this paper
Training and Cross-Validating Machine Learning Pipelines with Limited Memory
2024, cites this paper
Machine learning assisted prediction of copper-based catalysts towards carbon dioxide electroreduction into carbon monoxide
2024, cites this paper
Development of an individualized risk calculator of treatment resistance in patients with first-episode psychosis (TRipCal) using automated machine learning: a 12-year follow-up study with clozapine prescription as a proxy indicator
2024, cites this paper
An Integrated Model for Predicting Student Achievement Efficiency Using Data Envelopment Analysis and Genetic Programming Approach
2024, cites this paper
Metabolomics Biomarker Discovery to Optimize Hepatocellular Carcinoma Diagnosis: Methodology Integrating AutoML and Explainable Artificial Intelligence
2024, cites this paper
Machine learning-assisted rapid determination for traditional Chinese Medicine Constitution
2024, cites this paper
Sorting Through ML Algorithms: A Call for Community Contributions
2024, cites this paper
GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables
2024, cites this paper
Introducing HoNCAML: Holistic No-Code Auto Machine Learning
2024, cites this paper
Auto-Machine-Learning Models for Standardized Precipitation Index Prediction in North–Central Mexico
2024, cites this paper
Low-cost quantum mechanical descriptors for data efficient skin sensitization QSAR models
2024, cites this paper
Problem-oriented AutoML in Clustering
2024, cites this paper
Good results from sensor data: Performance of machine learning algorithms for regression problems in chemical sensors
2024, cites this paper
Machine Learning Models for Low Back Pain Detection and Factor Identification: Insights from a 6-Year Nationwide Survey.
2024, cites this paper
Prediction of the synergistic effect of antimicrobial peptides and antimicrobial agents via supervised machine learning
2024, cites this paper
Non-invasive CT radiomic biomarkers predict microsatellite stability status in colorectal cancer: a multicenter validation study
2024, cites this paper
Study on design optimization of GFRP tubular column composite structure based on machine learning method
2024, cites this paper
Revised Empirical Relations Between Earthquake Source and Rupture Parameters by Regression and Machine Learning Algorithms
2023, cites this paper
Applying machine learning methods to quantify emotional experience in installation art
2023, cites this paper
Intelligent Lightning Hazard Warning System for a Wind Farm
2023, cites this paper
Predicting Cancer Prognostics from Tumour Transcriptomics Using an Auto Machine Learning Approach
2023, cites this paper
AssistML: an approach to manage, recommend and reuse ML solutions
2023, cites this paper
Analysis of Automated Forecasting Model Possibilities for Company Financial Accounting Data
2023, cites this paper
Bridging the gap between mechanistic biological models and machine learning surrogates
2023, cites this paper
The role of machine learning in carbon neutrality: catalyst property prediction, design, and synthesis for carbon dioxide reduction
2023, cites this paper
Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications
2023, cites this paper
Toward Sustainable Water Infrastructure: The State‐Of‐The‐Art for Modeling the Failure Probability of Water Pipes
2023, cites this paper
Integrating Molecular Simulations with Machine Learning Guides in the Design and Synthesis of [BMIM][BF4]/MOF Composites for CO2/N2 Separation
2023, cites this paper
Faster Convergence with Lexicase Selection in Tree-based Automated Machine Learning
2023, cites this paper
Application of Automated Machine Learning Pipeline for the Classification of Volcanic Time Series Data
2023, cites this paper
Machine Learning Based Estimation of Buildings’ Characteristics Employing Electrical and Chilled Water Consumption Data: Pipeline Optimization
2023, cites this paper
Exploring genetic influences on adverse outcome pathways using heuristic simulation and graph data science.
2023, cites this paper
Predicting biomass composition and operating conditions in fluidized bed biomass gasifiers: An automated machine learning approach combined with cooperative game theory
2023, cites this paper
A novel automated SuperLearner using a genetic algorithm-based hyperparameter optimization
2023, cites this paper
Automated quantitative trait locus analysis (AutoQTL)
2023, cites this paper
Facilitating “Omics” for Phenotype Classification Using a User-Friendly AI-Driven Platform: Application in Cancer Prognostics
2023, cites this paper
Response to comments on “Jaws 30”
2023, cites this paper
Cluster Analysis reveals Socioeconomic Disparities among Elective Spine Surgery Patients
2023, cites this paper
Cough Classification with Deep Derived Features using Audio Spectrogram Transformer
2022, cites this paper
Novel digital approaches to the assessment of problematic opioid use
2022, cites this paper
Bayesian AutoML for Databases via the InferenceQL Probabilistic Programming System
2022, cites this paper
DRAPE: optimizing private data release under adjustable privacy-utility equilibrium
2022, cites this paper
Genetic optimization of asteroid families’ membership
2022, cites this paper
Predicting Oxidation Behavior of Multi-Principal Element Alloys by Machine Learning Methods
2022, cites this paper
Predictive Modelling in Clinical Bioinformatics: Key Concepts for Startups
2022, cites this paper
Benchmarking AutoML algorithms on a collection of synthetic classification problems
2022, cites this paper
Método para a Classificação de Áreas Queimadas Baseado em Aprendizado de Máquina Automatizado
2022, cites this paper
Machine learning to predict the antimicrobial activity of cold atmospheric plasma-activated liquids
2022, cites this paper
Just Add Data: automated predictive modeling for knowledge discovery and feature selection
2022, cites this paper
Credit Scoring Active Telegram Channels Offering Stock Signals
2022, cites this paper
Review of the state of the art in autonomous artificial intelligence
2022, influential citation
Improving medical experts’ efficiency of misinformation detection: an exploratory study
2022, cites this paper
Multi-Objective Hyperparameter Optimization in Machine Learning—An Overview
2022, cites this paper
AutoML for estimating grass height from ETM+/OLI data from field measurements at a nature reserve
2022, cites this paper
Machine Learning Assisted Investigation of Defect Influence on the Mechanical Properties of Additively Manufactured Architected Materials
2022, cites this paper
Combining Machine Learning and Molecular Simulations to Unlock Gas Separation Potentials of MOF Membranes and MOF/Polymer MMMs
2022, cites this paper
Toward Automated Machine Learning-Based Hyperspectral Image Analysis in Crop Yield and Biomass Estimation
2022, cites this paper
IGWO-SS: Improved Grey Wolf Optimization Based on Synaptic Saliency for Fast Neural Architecture Search in Computer Vision
2022, cites this paper
Mining Robust Default Configurations for Resource-constrained AutoML
2022, cites this paper
A Survey of Open Source Automation Tools for Data Science Predictions
2022, cites this paper
Mood State Detection in Handwritten Tasks Using PCA–mFCBF and Automated Machine Learning
2022, cites this paper
Analysis of Heart Rate Variability and Game Performance in Normal and Cognitively Impaired Elderly Subjects Using Serious Games
2022, cites this paper
Hyperparameter Tuning
2022, cites this paper
Clinically Relevant Sound-Based Features in COVID-19 Identification: Robustness Assessment With a Data-Centric Machine Learning Pipeline
2022, cites this paper
A Systematic Method for Selecting Molecular Descriptors as Features When Training Models for Predicting Physiochemical Properties
2022, cites this paper
Multi-Objective Hyperparameter Optimization - An Overview
2022, cites this paper
Intelligent Estimation of Wind Farm Performance with Direct and Indirect ‘Point’ Forecasting Approaches Integrating Several NWP Models
2022, cites this paper
Comparison of algorithms for error prediction in manufacturing with AutoML and a cost-based metric
2022, cites this paper
Results from using an AutoML Tool for Error Analysis in Manufacturing
2022, cites this paper
From Platform to Knowledge Graph: Evolution of Laboratory Automation
2022, cites this paper