Optimizing sample size for supervised machine learning with bulk transcriptomic sequencing: a learning curve approach

Published 2024 in Briefings Bioinform.

ABSTRACT

Accurate sample classification using transcriptomics data is crucial for advancing personalized medicine. Achieving this goal necessitates determining a suitable sample size that ensures adequate classification accuracy without undue resource allocation. Current sample size calculation methods rely on assumptions and algorithms that may not align with supervised machine learning techniques for sample classification. Addressing this critical methodological gap, we present a novel computational approach that establishes the accuracy-versus-sample size relationship by employing a data augmentation strategy followed by fitting a learning curve. We comprehensively evaluated its performance for microRNA and RNA sequencing data, considering diverse data characteristics and algorithm configurations, based on a spectrum of evaluation metrics. To foster accessibility and reproducibility, the Python and R code for implementing our approach is available on GitHub. Its deployment will significantly facilitate the adoption of machine learning in transcriptomics studies and accelerate their translation into clinically useful classifiers for personalized treatment.
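
The learning-curve step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which functional form the authors fit, so an inverse power law, acc(n) = a - b * n^(-c), is assumed here; the function names, the grid-search fitting procedure, and the accuracy values are all hypothetical and are not taken from the authors' GitHub code. In this sketch the (sample size, accuracy) pairs would come from training a classifier on augmented data sets of increasing size.

```python
import math


def fit_inverse_power_law(sizes, accuracies, c_grid=None):
    """Fit acc(n) = a - b * n**(-c).

    Assumed fitting procedure (not the authors'): for each candidate c,
    the model is linear in x = n**(-c), so a and b follow from ordinary
    least squares; the c with the smallest squared error is kept.
    """
    if c_grid is None:
        c_grid = [k / 100 for k in range(5, 301)]  # c from 0.05 to 3.00
    m = len(sizes)
    y_mean = sum(accuracies) / m
    best = None
    for c in c_grid:
        x = [n ** (-c) for n in sizes]
        x_mean = sum(x) / m
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, accuracies))
        slope = sxy / sxx
        a = y_mean - slope * x_mean  # asymptotic accuracy as n grows
        b = -slope                   # size of the small-sample penalty
        sse = sum((a - b * xi - yi) ** 2 for xi, yi in zip(x, accuracies))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def required_sample_size(a, b, c, target):
    """Smallest n with fitted accuracy >= target, or None if unreachable."""
    if b <= 0 or target >= a:
        return None  # target is at or above the fitted asymptote
    return math.ceil((b / (a - target)) ** (1 / c))


if __name__ == "__main__":
    # Synthetic, made-up accuracies for illustration only (not study results).
    sizes = [20, 40, 80, 160, 320]
    accuracies = [0.9 - 0.8 * n ** (-0.5) for n in sizes]
    a, b, c = fit_inverse_power_law(sizes, accuracies)
    print(f"asymptote a={a:.3f}, b={b:.3f}, decay c={c:.2f}")
    print("n for 85% accuracy:", required_sample_size(a, b, c, 0.85))
```

Inverting the fitted curve gives the sample size at which a target accuracy is expected, and a target at or above the fitted asymptote is reported as unreachable rather than extrapolated.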

PUBLICATION RECORD

Publication year
2024
Venue
Briefings Bioinform.
Publication date
2024-09-10
Fields of study
Biology, Medicine, Computer Science, Mathematics
Identifiers
DOI: 10.1093/bib/bbaf097; arXiv: 2409.06180; PMID: 40072846; PMCID: 11899567
Source metadata
Semantic Scholar, PubMed
