phylostratr: A framework for phylostratigraphy

Zebulun W. Arendsee, Jing Li, Urminder Singh, Arun S. Seetharam, K. Dorman, E. Wurtele

Published 2018 in bioRxiv

ABSTRACT

Motivation: The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. Currently, there are no general pipelines for this task. We present an R package, phylostratr, to fill this gap, making high-quality phylostratigraphic analysis accessible to non-specialists.

Results: Phylostratigraphic analysis entails searching for homologs within increasingly broad clades. The highest clade that contains all homologs of a gene is that gene’s phylostratum. We have created a general R-based framework, phylostratr, for estimating the phylostratum of every gene in a species. The program can fully automate an analysis: select species for a balanced representation of each phylostratum, retrieve the sequences from UniProt, build BLAST databases, run BLAST, infer homologs for each gene against each subject species, determine phylostrata, and return summaries and diagnostics. phylostratr allows extensive customization. A user may modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. phylostratr also offers proteome quality assessments, false-positive diagnostics, and checks for missing organelle genomes. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae.

Availability: phylostratr source code and vignettes are available on GitHub at https://github.com/arendsee/phylostratr

Contact: evewurtele@gmail.com
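The definition of a phylostratum given in the abstract can be illustrated with a minimal sketch. This is not phylostratr's API (the package is written in R, and none of its functions are reproduced here). The function name `phylostratum`, the simplified lineage, and the subject species table are all assumptions made for illustration, and homology inference is taken as already done.

```python
# Hypothetical, simplified lineage of the focal species, ordered from the
# oldest (broadest) clade to the youngest (the species itself).
FOCAL_LINEAGE = [
    "cellular organisms",
    "Eukaryota",
    "Viridiplantae",
    "Brassicaceae",
    "Arabidopsis thaliana",
]

# Hypothetical table: for each subject species, the youngest clade in
# FOCAL_LINEAGE that it shares with the focal species.
SUBJECT_SHARED_CLADE = {
    "Escherichia coli": "cellular organisms",
    "Saccharomyces cerevisiae": "Eukaryota",
    "Oryza sativa": "Viridiplantae",
    "Brassica rapa": "Brassicaceae",
}


def phylostratum(hit_species, lineage=FOCAL_LINEAGE, shared=SUBJECT_SHARED_CLADE):
    """Return the broadest clade spanned by a gene's inferred homologs.

    hit_species: names of subject species in which a homolog was inferred.
    A gene with no homologs in any subject species is assigned to the
    focal species itself (the youngest stratum).
    """
    rank = {clade: i for i, clade in enumerate(lineage)}
    oldest = len(lineage) - 1  # start at the focal species
    for species in hit_species:
        oldest = min(oldest, rank[shared[species]])
    return lineage[oldest]


if __name__ == "__main__":
    for hits in [set(), {"Brassica rapa"}, {"Brassica rapa", "Oryza sativa"}]:
        print(sorted(hits), "->", phylostratum(hits))
```

In this sketch a single hit in a distant species is enough to push a gene into an older stratum, which is why the false-positive diagnostics and the sensitivity of the homology classifier mentioned in the abstract matter.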
