ABSTRACT

Motivation: Compositional data comprise vectors that describe the constituent parts of a whole. Data arising from -omics platforms such as 16S rRNA gene sequencing and RNA sequencing are compositional in nature. However, correlations computed between features on raw counts have no meaningful interpretation. Metrics of proportionality were formulated to address this problem, but an inherent bias arises when these metrics are calculated empirically on count-based measures, because read depths vary across samples.

Results: We quantify the bias introduced by empirically calculating proportionality-based association metrics on count data. We also propose estimating these metrics within a logit-normal multinomial model to obtain more accurate estimates. The model-based estimates outperform empirical estimates on simulated data, and we apply them to a mouse embryonic stem cell single-cell sequencing dataset and to a pediatric-onset multiple sclerosis metagenomic dataset.

Availability and Implementation: An R package is available at https://CRAN.R-project.org/package=countprop.

Supplementary information: Supplementary data are available at Bioinformatics online.
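To make the read-depth bias concrete, the following Python sketch simulates a logit-normal multinomial model and computes one common proportionality metric, rho = 1 - var(clr_i - clr_j) / (var(clr_i) + var(clr_j)), both on the latent proportions and on counts drawn at two read depths. It is an illustration under stated assumptions, not the countprop implementation or the paper's estimator: the helper names (`clr`, `rho`, `latent_proportions`, `sample_counts`), the parameter values, the use of the last part as the additive log-ratio reference, and the 0.5 pseudocount for zero counts are all hypothetical choices made here.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform, applied row-wise (one composition per row)."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def rho(x, i, j):
    """Proportionality rho between parts i and j, computed across the rows of x."""
    z = clr(x)
    return 1.0 - np.var(z[:, i] - z[:, j]) / (np.var(z[:, i]) + np.var(z[:, j]))


def latent_proportions(n, mu, sigma, rng):
    """Hypothetical helper: draw Gaussian additive log-ratio coordinates
    (last part fixed as reference) and map them to proportions."""
    y = rng.multivariate_normal(mu, sigma, size=n)
    w = np.exp(np.column_stack([y, np.zeros(n)]))
    return w / w.sum(axis=1, keepdims=True)


def sample_counts(p, depth, rng):
    """Hypothetical helper: one multinomial draw of `depth` reads per composition."""
    return np.array([rng.multinomial(depth, row) for row in p])


rng = np.random.default_rng(0)
mu = np.array([1.0, 1.0, 0.0])
# Parts 0 and 1 have strongly correlated log-ratio coordinates, so they are
# close to proportional in the latent proportions.
sigma = np.array([[1.0, 0.95, 0.0],
                  [0.95, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

p = latent_proportions(2000, mu, sigma, rng)
rho_true = rho(p, 0, 1)

# Empirical estimates on counts; the 0.5 pseudocount (an assumption) avoids log(0).
rho_low = rho(sample_counts(p, 10, rng) + 0.5, 0, 1)
rho_high = rho(sample_counts(p, 100_000, rng) + 0.5, 0, 1)

print(f"latent: {rho_true:.3f}  depth 10: {rho_low:.3f}  depth 100000: {rho_high:.3f}")
```

Under this covariance, the population value of rho for parts 0 and 1 is about 0.85. Both empirical estimates are drawn from the same latent proportions, so only the sampling depth differs between them: the depth-10 estimate should fall well below the latent value, while the depth-100,000 estimate should lie close to it.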
