Augmenting microbial phylogenomic signal with tailored marker gene sets

Henry Secaira-Morocho, Xiaofang Jiang, Qiyun Zhu

Published 2025 in Nature Communications

ABSTRACT

Phylogenetic marker genes are traditionally selected from a fixed collection of whole genomes representing major microbial phyla, covering only a small fraction of gene families. However, most microbial diversity resides in metagenome-assembled genomes (MAGs), which exhibit taxonomic imbalance and harbor gene families that do not fit the criteria for universal orthologs. To address these limitations, we introduce TMarSel, a software tool for automated, tailored marker selection, free from expert opinion, for deep microbial phylogenomics. TMarSel lets users select a variable number of markers and copies based on KEGG and EggNOG gene family annotations, enabling a systematic evaluation of the phylogenetic signal across the entire gene family pool. We show that an expanded marker set tailored to the input genomes improves the accuracy of phylogenetic trees across simulated and real-world datasets of whole genomes and MAGs compared with previous markers, even when MAGs lack a fraction of their open reading frames. The selected markers have functional annotations related to metabolism, cellular processes, and environmental information processing, in addition to replication, translation, and transcription. TMarSel provides flexibility in the number of markers, copies, and annotation databases while remaining robust against taxonomic imbalance and incomplete genomic data. Marker genes used in microbial phylogenomics are limited to fixed gene sets selected from complete genomes. TMarSel is a flexible yet robust method for selecting any number of markers from genomes or MAGs, mitigating the impact of taxonomic imbalance and incomplete genomic data on tree quality.
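The abstract does not spell out how TMarSel scores gene families. As a rough illustration of what choosing "a variable number of markers and copies" from annotations can involve, the hypothetical Python sketch below takes a per-genome table of gene-family copy numbers (for example, counts of KEGG or EggNOG families per genome), treats a family as usable in a genome only when its copy number lies between 1 and a cap, and greedily picks `k` families. The log-based gain, which favors families found in genomes that so far carry few markers, is an assumption chosen to show one way of dampening taxonomic imbalance; it is not TMarSel's published algorithm, and `select_markers`, `k`, `max_copies` and the family IDs are invented for this example.

```python
from math import log


def select_markers(copies, k, max_copies=1):
    """Hypothetical greedy marker picker (not TMarSel's actual method).

    copies: {genome_id: {family_id: copy_number}}
    A family is usable in a genome if it has 1..max_copies copies there.
    Returns up to k family IDs in the order they were selected.
    """
    # Usable families per genome after applying the copy-number cap.
    usable = {
        g: {f for f, n in fams.items() if 1 <= n <= max_copies}
        for g, fams in copies.items()
    }
    candidates = set().union(*usable.values())
    # Number of already-selected markers each genome carries.
    hits = {g: 0 for g in usable}
    selected = []

    def gain(fam):
        # Concave (log) gain: a genome's marginal contribution shrinks as it
        # accumulates markers, so poorly covered genomes weigh more.
        return sum(
            log(hits[g] + 2) - log(hits[g] + 1)
            for g in usable
            if fam in usable[g]
        )

    for _ in range(min(k, len(candidates))):
        # Iterate in sorted order so ties resolve to the smallest family ID.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
        for g in usable:
            if best in usable[g]:
                hits[g] += 1
    return selected


if __name__ == "__main__":
    # Toy table: genome -> {gene family: copy number}; IDs are placeholders.
    copies = {
        "G1": {"F1": 1, "F2": 1, "F3": 3},
        "G2": {"F1": 1, "F3": 1},
        "G3": {"F2": 1, "F4": 1},
    }
    print(select_markers(copies, k=2))  # ['F1', 'F2']
```

Raising `max_copies` admits multi-copy families (F3 in G1 here), trading single-copy orthology for more usable signal; any real pipeline would then also need a rule for which copies to keep in the alignment.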

