Proportionality-based association metrics in count compositional data

Kevin McGregor, Nneka Okaeme, Reihane Khorasaniha, Simona Veniamin, J. Jovel, Richard Miller, R. Mahmood, M. Graham, Christine Bonner, C. Bernstein, D. Arnold, A. Bar-Or, Janace Hart, R. Marrie, Julia O’Mahony, E. Yeh, Yinshan Zhao, B. Banwell, E. Waubant, N. Knox, G. Van Domselaar, F. Zhu, A. Mirza, H. Tremlett, Heather Armstrong

Published 2023 in bioRxiv

ABSTRACT

Motivation: Compositional data comprise vectors that describe the constituent parts of a whole. Data arising from various -omics platforms such as 16S and RNA-sequencing are compositional in nature. However, correlations between features on raw counts have no meaningful interpretation. Metrics of proportionality were formulated to address this problem. However, there is an inherent bias that arises when calculating these metrics empirically on count-based measures due to variability in read depths.

Results: We quantify the bias introduced by empirically calculating proportionality-based association metrics in count data. Additionally, we propose a means of estimating these metrics within a logit-normal multinomial model in pursuit of more accurate estimates. The model-based estimates are shown to outperform empirical estimates in simulated data, and are additionally applied to a mouse embryonic stem-cell single-cell sequencing dataset as well as a pediatric-onset multiple sclerosis metagenomic dataset.

Availability and Implementation: An R package is available at https://CRAN.R-project.org/package=countprop.

Supplementary information: Supplementary data are available at Bioinformatics online.
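
To make the setting concrete, here is a minimal Python sketch of the kind of empirical (plug-in) proportionality metric the abstract refers to, computed on counts simulated from a logit-normal multinomial model with varying read depths. It is an illustration under stated assumptions, not the authors' estimator and not the countprop API: the function names `clr` and `empirical_rho`, the pseudocount of 0.5, the simulation settings, and the choice of the rho metric on centred log-ratio values are all assumptions made for this example.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of a samples-by-features count matrix.

    The pseudocount is an assumption of this sketch, used only to avoid log(0).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def empirical_rho(counts, pseudocount=0.5):
    """Plug-in proportionality metric rho for every pair of features.

    rho_ij = 1 - var(z_i - z_j) / (var(z_i) + var(z_j)), with z the CLR values.
    """
    z = clr(counts, pseudocount)
    cov = np.cov(z, rowvar=False)          # feature-by-feature covariance
    var = np.diag(cov)
    # var(z_i - z_j) = var_i + var_j - 2 * cov_ij
    vlr = var[:, None] + var[None, :] - 2.0 * cov
    return 1.0 - vlr / (var[:, None] + var[None, :])


# Simulate counts from a logit-normal multinomial model (hypothetical settings).
rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
latent_cov = 0.5 * np.eye(n_features - 1) + 0.5   # equicorrelated latent scale
latent = rng.multivariate_normal(np.zeros(n_features - 1), latent_cov, size=n_samples)
# Append a zero column for the reference category, then apply softmax.
logits = np.hstack([latent, np.zeros((n_samples, 1))])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
# Read depth varies across samples, the source of bias named in the abstract.
depths = rng.integers(500, 5000, size=n_samples)
counts = np.array([rng.multinomial(d, p) for d, p in zip(depths, probs)])

rho = empirical_rho(counts)
print(np.round(rho, 3))
```

Because the metric is computed directly from observed counts, sampling noise that depends on read depth feeds into every variance term. That is the bias the paper quantifies, and the motivation for estimating the metric from the fitted model instead.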

PUBLICATION RECORD

Publication year: 2023
Venue: bioRxiv
Publication date: 2023-08-24
Fields of study: Biology, Computer Science
DOI: 10.1101/2023.08.23.554468
Source metadata: Semantic Scholar
