Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison

Published 2011 in PLoS ONE

ABSTRACT

Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate “average” of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome.
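The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. Each sample is assumed to be already summarized as one number per edge of the reference tree; the abstract does not say how those per-edge values are derived. The names `edge_pca`, `squash_cluster`, and `l1_distance` are hypothetical. The L1 distance is only a stand-in for the paper's "suitably defined distance". Weighting the average by the number of samples in each cluster is one possible reading of "appropriate average".

```python
import numpy as np


def edge_pca(edge_matrix, n_components=2):
    """PCA on a (samples x edges) matrix.

    Each returned axis holds one signed weight per tree edge, which is
    what would be drawn as edge thickness (magnitude) and color (sign).
    """
    X = np.asarray(edge_matrix, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are orthonormal principal axes over the edges.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]
    projections = centered @ axes.T
    explained_variance = singular_values[:n_components] ** 2 / (X.shape[0] - 1)
    return axes, projections, explained_variance


def l1_distance(p, q):
    """Stand-in distance (assumption); not the metric used in the paper."""
    return float(np.abs(p - q).sum())


def squash_cluster(samples, distance=l1_distance):
    """Agglomerative clustering in which a merged node is an averaged sample.

    Returns (tree, root_sample). A leaf is a sample index; an internal
    node is ((child_tree, edge_length), (child_tree, edge_length)), where
    edge_length is the distance between the node's averaged sample and
    the child's (averaged) sample.
    """
    # Each active cluster: (tree, averaged sample vector, number of samples).
    clusters = [(i, np.asarray(s, dtype=float), 1) for i, s in enumerate(samples)]
    while len(clusters) > 1:
        # Find the closest pair of current (averaged) samples.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = distance(clusters[a][1], clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        tree_a, vec_a, n_a = clusters[a]
        tree_b, vec_b, n_b = clusters[b]
        # "Squash" the two clusters into one count-weighted average (assumption).
        merged = (n_a * vec_a + n_b * vec_b) / (n_a + n_b)
        node = ((tree_a, distance(merged, vec_a)), (tree_b, distance(merged, vec_b)))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((node, merged, n_a + n_b))
    return clusters[0][0], clusters[0][1]


# Toy data: three samples described by values on three edges (made up).
toy = np.array([[1.0, 0.0, 0.0],
                [0.8, 0.2, 0.0],
                [0.0, 0.0, 1.0]])
axes, projections, variance = edge_pca(toy, n_components=2)
tree, root_sample = squash_cluster(toy)
```

In this sketch every internal node of the clustering tree carries an actual averaged sample, so an edge length can be read as a distance between two concrete objects. This is the contrast the abstract draws with UPGMA, where the corresponding quantity is an average of pairwise distances.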

PUBLICATION RECORD

Publication year: 2011
Venue: PLoS ONE
Publication date: 2011-07-25
Fields of study: Biology, Computer Science, Mathematics, Environmental Science, Medicine
Identifiers: DOI 10.1371/journal.pone.0056859; arXiv 1107.5095; PMID 23505415; PMCID 3594297
Source metadata: Semantic Scholar, PubMed
