Motivation: The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. Currently, there are no general pipelines for this task. We present an R package, phylostratr, to fill this gap, making high-quality phylostratigraphic analysis accessible to non-specialists.

Results: Phylostratigraphic analysis entails searching for homologs within increasingly broad clades. The highest clade that contains all homologs of a gene is that gene’s phylostratum. We have created a general R-based framework, phylostratr, for estimating the phylostratum of every gene in a species. The program can fully automate an analysis: select species for a balanced representation of each stratum, retrieve the sequences from UniProt, build BLAST databases, run BLAST, infer homologs for each gene against each subject species, determine phylostrata, and return summaries and diagnostics. phylostratr allows extensive customization. A user may: modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. phylostratr also offers proteome quality assessments, false-positive diagnostics, and checks for missing organelle genomes. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae.

Availability: phylostratr source code and vignettes are available on GitHub at https://github.com/arendsee/phylostratr

Contact: evewurtele@gmail.com
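The stratum-assignment step described in the Results can be illustrated with a minimal sketch. This is not phylostratr's API (the package is written in R). It is a toy Python illustration that assumes homology inference has already been done. The function name `assign_phylostratum`, the simplified five-level lineage, and the four subject species are all assumptions chosen for the example. Each subject species is labelled with the clade it shares with the focal species, and a gene is assigned to the broadest such clade among the species where a homolog was inferred.

```python
from typing import Dict, List, Set

# Simplified lineage of the focal species, ordered from the broadest clade
# to the focal species itself (illustrative; a real lineage has more levels).
LINEAGE: List[str] = [
    "cellular organisms",
    "Eukaryota",
    "Viridiplantae",
    "Brassicaceae",
    "Arabidopsis thaliana",
]

# For each subject species, the narrowest clade in LINEAGE that it shares
# with the focal species (toy selection of subject species).
STRATUM_OF_SPECIES: Dict[str, str] = {
    "Escherichia coli": "cellular organisms",
    "Saccharomyces cerevisiae": "Eukaryota",
    "Oryza sativa": "Viridiplantae",
    "Brassica rapa": "Brassicaceae",
}


def assign_phylostratum(
    lineage: List[str],
    stratum_of_species: Dict[str, str],
    species_with_homolog: Set[str],
) -> str:
    """Return the broadest clade in which a homolog of the gene was inferred.

    A gene with no inferred homolog in any subject species is assigned to
    the focal species (the last entry of the lineage).
    """
    rank = {clade: i for i, clade in enumerate(lineage)}
    best = len(lineage) - 1  # default: specific to the focal species
    for species in species_with_homolog:
        best = min(best, rank[stratum_of_species[species]])
    return lineage[best]


# Hypothetical gene with inferred homologs in yeast and Brassica rapa.
example = assign_phylostratum(
    LINEAGE, STRATUM_OF_SPECIES, {"Saccharomyces cerevisiae", "Brassica rapa"}
)
```

In this sketch the hypothetical gene is placed in Eukaryota, because yeast is the most distantly related species with a hit. The result depends entirely on the upstream homology calls, which is why the abstract stresses a tunable homology inference classifier and false-positive diagnostics.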
phylostratr: A framework for phylostratigraphy
Zebulun W. Arendsee, Jing Li, Urminder Singh, Arun S. Seetharam, K. Dorman, E. Wurtele
Published 2018 in bioRxiv
PUBLICATION RECORD
- Publication year: 2018
- Venue: bioRxiv
- Publication date: 2018-07-03
- Fields of study: Biology, Medicine, Computer Science
- Source metadata: Semantic Scholar, PubMed