Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER

Published 2005 in BMC Bioinformatics

ABSTRACT

Background: Profile hidden Markov model (HMM) techniques are among the most powerful methods for protein homology detection. Yet, the critical features for successful modelling are not fully known. In the present work we approached this by using two of the most popular HMM packages: SAM and HMMER. The programs' abilities to build models and score sequences were compared on a SCOP/Pfam based test set. The comparison was done separately for local and global HMM scoring.

Results: Using default settings, SAM was overall more sensitive. SAM's model estimation was superior, while HMMER's model scoring was more accurate. Critical features for model building were then analysed by comparing the two packages' algorithmic choices and parameters. The weighting between prior probabilities and multiple alignment counts held the primary explanation why SAM's model building was superior. Our analysis suggests that HMMER gives too much weight to the sequence counts. SAM's emission prior probabilities were also shown to be more sensitive. The relative sequence weighting schemes are different in the two packages but performed equivalently.

Conclusion: SAM model estimation was more sensitive, while HMMER model scoring was more accurate. By combining the best algorithmic features from both packages the accuracy was substantially improved compared to their default performance.
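The abstract attributes most of the difference in model building to how prior probabilities are weighed against multiple alignment counts. The following is a minimal sketch of that trade-off, not code from SAM or HMMER. It assumes a single Dirichlet prior over a toy four-letter alphabet, whereas the cited literature discusses Dirichlet mixtures over amino acids. The function name `emission_probs` and the parameters `prior_weight` and `count_scale` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def emission_probs(column, background, prior_weight, count_scale=1.0):
    """Posterior-mean emission estimate for one alignment column.

    column       -- residues observed in the column, gaps already removed
    background   -- dict mapping residue to prior probability (sums to 1)
    prior_weight -- total pseudocount mass given to the prior (hypothetical knob)
    count_scale  -- factor applied to the observed counts (hypothetical knob)
    """
    counts = Counter(column)
    total = count_scale * sum(counts.values()) + prior_weight
    return {
        residue: (count_scale * counts.get(residue, 0) + prior_weight * q) / total
        for residue, q in background.items()
    }


# Toy data: uniform prior over four letters, eight aligned residues.
toy_background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_column = "AAAAAAAC"

# Small prior weight: the estimate follows the observed counts closely.
print(emission_probs(toy_column, toy_background, prior_weight=1.0))

# Large prior weight: the estimate is pulled back towards the prior.
print(emission_probs(toy_column, toy_background, prior_weight=20.0))
```

Raising `prior_weight` relative to the counts (or, equivalently, lowering `count_scale`) moves the estimate towards the prior. This is the direction the abstract suggests when it says that too much weight on the sequence counts hurts sensitivity.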

PUBLICATION RECORD

Publication year: 2005
Venue: BMC Bioinformatics
Publication date: 2005-04-15
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1186/1471-2105-6-99; PMID 15831105; PMCID 1097716
Source metadata: Semantic Scholar, PubMed

REFERENCES

The Pfam protein families database
2007, cited by this paper
Protein homology detection by HMM-HMM comparison
2005, cited by this paper
SMART 4.0: towards genomic data integration
2004, cited by this paper
Improving profile HMM discrimination by adapting transition probabilities.
2004, cited by this paper
Transition Priors for Protein Hidden Markov Models: An Empirical Study towards Maximum Discrimination
2004, cited by this paper
Enhanced protein domain discovery using taxonomy
2004, cited by this paper
The Pfam protein families database
2004, cited by this paper
Detecting distant homologs using phylogenetic tree‐based HMMs
2003, cited by this paper
The TIGRFAMs database of protein families
2003, cited by this paper
Hidden Markov models that use predicted local structure for fold recognition: Alphabets of backbone geometry
2003, cited by this paper
The ASTRAL Compendium in 2004
2003, cited by this paper
COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance.
2003, cited by this paper
Enhanced protein domain discovery by using language modeling techniques from speech recognition
2003, cited by this paper
SATCHMO: Sequence Alignment and Tree Construction Using Hidden Markov Models
2003, cited by this paper
A comparison of profile hidden Markov model procedures for remote homology detection.
2002, influential reference
Within the twilight zone: a sensitive profile-profile comparison tool based on information theory.
2002, cited by this paper
What is the value added by human intervention in protein structure prediction?
2001, cited by this paper
Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure.
2001, cited by this paper
Identification of related proteins on family, superfamily and fold level.
2000, cited by this paper
Hidden Markov models that use predicted secondary structures for fold recognition
1999, cited by this paper
Weighting hidden Markov models for maximum discrimination
1998, cited by this paper
Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods.
1998, cited by this paper
Profile hidden Markov models
1998, cited by this paper
Hidden Markov models for detecting remote protein homologies
1998, cited by this paper
Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.
1998, cited by this paper
Scoring hidden Markov models
1997, cited by this paper
Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology
1996, cited by this paper
Hidden Markov models for sequence analysis: extension and analysis of the basic method
1996, cited by this paper
Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology.
1996, cited by this paper
Maximum Entropy Weighting of Aligned Sequences of Proteins or DNA
1995, cited by this paper
Tree-based maximal likelihood substitution matrices and hidden Markov models
1995, cited by this paper
Maximum Discrimination Hidden Markov Models of Sequence Consensus
1995, cited by this paper
SCOP: a structural classification of proteins database for the investigation of sequences and structures.
1995, influential reference
Volume changes in protein evolution.
1994, cited by this paper
Using Dirichlet Mixture Priors to Derive Hidden Markov Models for Protein Families
1993, cited by this paper

CITED BY

Genome-wide analysis of class III peroxidase gene family in Glycine max and functional roles in stress response
2025, cites this paper
A previously undescribed archaeal virus suppresses host immunity
2025, cites this paper
Polysaccharide degradation in an Antarctic bacterium: Discovery of glycoside hydrolases from remote regions of the sequence space.
2025, cites this paper
Theobroma cacao Virome: Exploring Public RNA-Seq Data for Viral Discovery and Surveillance
2025, influential citation
Xylan Degradation in the Halotolerant Bacterium Bacillus altitudinis relies on glycosidic hydrolases from families 11 and 30
2025, cites this paper
Evolutionary history and activity towards oligosaccharides and polysaccharides of GH3 glycosidases from an Antarctic marine bacterium.
2024, influential citation
A new archaeal virus that suppresses the transcription of host immunity genes
2024, cites this paper
The Virome of Cocoa Fermentation-Associated Microorganisms
2024, influential citation
Analysis of Enhanced Hidden Markov Models for Improved Stock Market Price Forecasting and Prediction
2024, cites this paper
Biogenesis of flavor-related linalool is diverged and genetically conserved in tree peony (Paeonia × suffruticosa)
2022, cites this paper
Structural basis for the calmodulin-mediated activation of eukaryotic elongation factor 2 kinase
2022, cites this paper
Assessing the composition of the plasma membrane of Leishmania (Leishmania) infantum and L. (L.) amazonensis using label-free proteomics.
2020, cites this paper
SMI-BLAST: a novel supervised search framework based on PSI-BLAST for protein remote homology detection
2020, cites this paper
Retapamulin-assisted ribosome profiling reveals the alternative bacterial proteome
2019, cites this paper
MultiDomainBenchmark: a multi-domain query and subject database suite
2019, cites this paper
Profile Comparer Extended: phylogeny of lytic polysaccharide monooxygenase families using profile hidden Markov model alignments
2019, cites this paper
Changes in Gene Expression and Metabolite Profiles in Platanus acerifolia Leaves in Response to Feeding Damage Caused by Corythucha ciliata
2019, cites this paper
Retapamulin-Assisted Ribosome Profiling Reveals the Alternative Bacterial Proteome
2019, cites this paper
Expression of lima bean terpene synthases in rice enhances recruitment of a beneficial enemy of a major rice pest.
2018, cites this paper
A comprehensive review and comparison of different computational methods for protein remote homology detection
2018, cites this paper
Bioinformatics approaches for assessing microbial communities in the surface ocean
2018, cites this paper
Programmed Ribosomal Frameshifting Generates a Copper Transporter and a Copper Chaperone from the Same Gene.
2017, cites this paper
Hidden Markov Model in Biological Sequence Analysis – A Systematic Review
2016, cites this paper
Open Peer Review
2016, cites this paper
Biological and medical physics, biomedical engineering
2014, cites this paper
A new subfamily LIP of the major intrinsic proteins
2014, influential citation
The Amborella Genome and the Evolution of Flowering Plants
2013, cites this paper
Concomitant prediction of function and fold at the domain level with GO-based profiles
2013, cites this paper
The Trypanosoma rangeli trypomastigote surfaceome reveals novel proteins and targets for specific diagnosis.
2013, cites this paper
Improved performance of sequence search approaches in remote homology detection
2013, cites this paper
De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins
2013, influential citation
Specificity and evolution of bacterial two-component signal transduction systems
2013, cites this paper
Nuevas aproximaciones computacionales para el estudio y la predicción funcional de dominios de proteínas (New computational approaches for the study and functional prediction of protein domains)
2013, cites this paper
Evolutionary characteristics of bacterial two-component systems.
2012, cites this paper
Evolutionary Analysis of the Protein Domain Distribution in Eukaryotes
2012, cites this paper
Evolution of the gene translation machinery and its applications to drug discovery
2012, cites this paper
Small Molecule Docking from Theoretical Structural Models
2012, cites this paper
Adaptive mutations that prevent crosstalk enable the expansion of paralogous signaling protein families.
2012, cites this paper
A machine learning approach to query time-series microarray data sets for functionally related genes using hidden markov models
2011, cites this paper
Towards a complete sequence homology concept: Limitations and Applications
2011, cites this paper
An expanded binding model for Cys2His2 zinc finger protein–DNA interfaces
2011, cites this paper
Multicofactor proteins: structure, prediction, function
2011, cites this paper
Not all transmembrane helices are born equal: Towards the extension of the sequence homology concept to membrane proteins
2011, cites this paper
Towards a complete sequence homology concept: Limitations and applications
2011, cites this paper
Classification of MHC I Proteins According to Their Ligand-Type Specificity
2011, cites this paper
Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures
2011, cites this paper
Genome-wide analysis of the heat shock transcription factors in Populus trichocarpa and Medicago truncatula
2011, influential citation
A genomics method to identify pathogenicity‐related proteins. Application to aminoacyl‐tRNA synthetase‐like proteins
2010, cites this paper
Recherche de domaines protéiques divergents à l'aide de modèles de Markov cachés : application à Plasmodium falciparum. (Seeking divergent protein domains with Hidden Markov Models: application to Plasmodium falciparum)
2010, cites this paper
Development of a computational framework for protein homology detection by incorporating realignment
2010, cites this paper
Prediction of prognostic biomarkers for Interferon-based therapy to Hepatitis C Virus patients: a metaanalysis of the NS5A protein in subtypes 1a, 1b, and 3a
2010, cites this paper
Riboswitch Detection Using Profile Hidden Markov Models
2009, cites this paper
Hidden Markov Models and their Applications in Biological Sequence Analysis
2009, cites this paper
Automated Protein Structure Classification: A Survey
2009, influential citation
Prediction of MHC-peptide binding: a systematic and comprehensive overview.
2009, cites this paper
RHYTHM—a server to predict the orientation of transmembrane helices in channels and membrane-coils
2009, cites this paper
Benchmarking homology detection procedures with low complexity filters
2009, cites this paper
Augmented training of hidden Markov models to recognize remote homologs via simulated evolution
2009, cites this paper
The effectiveness of position- and composition-specific gap costs for protein similarity searches
2008, influential citation
Fold recognition by concurrent use of solvent accessibility and residue depth
2007, cites this paper
Improving model construction of profile HMMs for remote homology detection through structural alignment
2007, cites this paper
HMM-ModE – Improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences
2007, cites this paper
A study of structural properties on profiles HMMs
2007, cites this paper
The SUPERFAMILY database in 2007: families and functions
2006, cites this paper
Sequence comparison and protein structure prediction.
2006, cites this paper
Comparative genomics in C. elegans, C. briggsae, and other Caenorhabditis species.
2006, cites this paper
Meningococcal Genetic Variation Mechanisms Viewed through Comparative Analysis of Serogroup C Strain FAM18
2006, cites this paper
Orthology and Protein Domain Architecture Evolution (Institutionen för Cell- och Molekylärbiologi, Karolinska Institutet)
2006, cites this paper
Orthology and protein domain architecture evolution
2006, cites this paper
C. elegans
2006, cites this paper
Recognition and Classification of Histones Using Support Vector Machine
2006, influential citation
A bioinformatician's view on the evolution of smell perception
2006, cites this paper
Statistical Limits to the Identification of Ion Channel Domains by Sequence Similarity
2006, influential citation
The hybrid digital tree and its applications to genomic sequence databases
2005, cites this paper