Hidden Markov model speed heuristic and iterative HMM search procedure

Steve L. Johnson, Sean R. Eddy, Elon Portugaly

Published 2010 in BMC Bioinformatics

ABSTRACT

Background: Profile hidden Markov models (profile-HMMs) are sensitive tools for remote protein homology detection, but the main scoring algorithms, Viterbi or Forward, require considerable time to search large sequence databases.

Results: We have designed a series of database filtering steps, HMMERHEAD, that are applied prior to the scoring algorithms, as implemented in the HMMER package, in an effort to reduce search time. Using this heuristic, we obtain a 20-fold decrease in Forward and a 6-fold decrease in Viterbi search time with a minimal loss in sensitivity relative to the unfiltered approaches. We then implemented an iterative profile-HMM search method, JackHMMER, which employs the HMMERHEAD heuristic. Due to our search heuristic, we eliminated the subdatabase creation that is common in current iterative profile-HMM approaches. On our benchmark, JackHMMER detects 14% more remote protein homologs than SAM's iterative method T2K.

Conclusions: Our search heuristic, HMMERHEAD, significantly reduces the time needed to score a profile-HMM against large sequence databases. This search heuristic allowed us to implement an iterative profile-HMM search method, JackHMMER, which detects significantly more remote protein homologs than SAM's T2K and NCBI's PSI-BLAST.
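The general idea of filtering a database before running a costly scoring algorithm can be illustrated with a minimal sketch. It does not reproduce HMMERHEAD's actual filter steps, which the abstract does not specify. The k-mer prefilter, the longest-common-substring scorer (a stand-in for Viterbi or Forward), and every function name and parameter below are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cheap_filter(query: str, target: str, k: int = 3, min_shared: int = 2) -> bool:
    """Hypothetical fast prefilter: pass targets sharing enough k-mers with the query."""
    return len(kmers(query, k) & kmers(target, k)) >= min_shared


def expensive_score(query: str, target: str) -> int:
    """Stand-in for a costly scorer (NOT Viterbi or Forward).

    Computes the longest common substring length with an
    O(len(query) * len(target)) dynamic program.
    """
    best = 0
    prev = [0] * (len(target) + 1)
    for qc in query:
        curr = [0] * (len(target) + 1)
        for j, tc in enumerate(target, start=1):
            if qc == tc:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def filtered_search(
    query: str, database: List[Tuple[str, str]], threshold: int
) -> Tuple[List[Tuple[str, int]], int]:
    """Score only the targets that survive the prefilter.

    Returns the hits at or above the threshold and the number of
    targets that reached the expensive scorer.
    """
    hits = []
    scored = 0
    for name, seq in database:
        if not cheap_filter(query, seq):
            continue  # rejected cheaply, never reaches the scorer
        scored += 1
        score = expensive_score(query, seq)
        if score >= threshold:
            hits.append((name, score))
    return hits, scored


# Toy database of (name, sequence) pairs, invented for this example.
toy_db = [
    ("a", "MKVLAAGIVG"),
    ("b", "TTTTTTTTTT"),
    ("c", "QQMKVLAAQQ"),
    ("d", "MKAAAAAAAA"),
]
toy_hits, toy_scored = filtered_search("MKVLAAGIVG", toy_db, threshold=5)
```

In this toy run only two of the four targets reach the expensive scorer. The trade-off is the one the abstract describes: a prefilter saves time in proportion to the fraction of the database it rejects, at the risk of discarding a true homolog that the full scorer would have found.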

PUBLICATION RECORD

Publication year: 2010
Venue: BMC Bioinformatics
Publication date: 2010-08-18
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1186/1471-2105-11-431; PMID 20718988; PMCID 2931519
External record: Semantic Scholar
Source metadata: Semantic Scholar, PubMed

CITED BY

A methylome-derived m6-dAMP trigger assembles a PUA-Cal-HAD immune filament that depletes dNTPs to abort phage infection (2026, cites this paper)
A comprehensive catalogue of receptor-binding domains in extracellular contractile injection systems (2026, cites this paper)
Enhancing protein structure prediction: evaluating the role of amino acid physicochemical features in homology search (2026, cites this paper)
Predicting TCR-pMHC Binding by Reinforcement Learning (2026, cites this paper)
Flavobacterium oryzagri sp. nov. and Flavobacterium oryzicola sp. nov., isolated from paddy soil. (2026, cites this paper)
Dynamics-aware Evolutionary Profiling Uncouples Structural Rigidity from Functional Motion to Enable Enhanced Variant Interpretation (2026, cites this paper)
Computational Discovery of NPA001937: A Novel Carotenoid Targeting Conserved Insect Proteins for Sustainable Pest Management. (2026, cites this paper)
The GT61 genes mediating xylan side-chains synthesis in the marine diatom Phaeodactylum tricornutum under circadian regulation. (2026, cites this paper)
Altering chemotaxis as a strategy to enhance the foraging range of motility-restricted bacteria (2026, cites this paper)
Rewriting protein alphabets with language models (2026, cites this paper)
Characterization of viral diversity in wild marmot blood from the Qinghai-Tibet Plateau. (2026, cites this paper)
Prophages of infant-derived Bifidobacterium longum subspecies employ antagonistic and synergistic strategies to persist in their host (2026, cites this paper)
Structural insights into the assembly and evolution of a complex bacterial flagellar motor. (2026, cites this paper)
Novel α-glucan phosphorylase coupled with thermotolerant yeast facilitated a high conversion rate from corn stover to artificial starch and microbial protein (2026, cites this paper)
Synergistic plant-microbe interactions drive the remediation of naphthenic acid fractional compounds in a constructed wetland mesocosm (2026, influential citation)
Mechanistic insights into SteAB regulation of cell wall hydrolase RipA in Mycobacterium tuberculosis (2026, cites this paper)
Molecular characterizations and expression profiles under temperature stress of transient receptor potential channels in Diaphorina citri. (2026, cites this paper)
Diversity and distribution of bacterial DNA polymerases. (2026, cites this paper)
Herbiconiux salviae sp. nov., isolated from the flower of Salvia splendens. (2026, cites this paper)
Beyond native sequence recovery: Improved modeling of the sequence-energy landscape of protein structures (2026, cites this paper)
Uncovering distinct protein conformations using coevolutionary information and AlphaFold (2026, cites this paper)
Protein Language Models in Directed Evolution (2026, cites this paper)
Evolution of Plant AIG1-like Proteins: Different Modes of Sequence Divergence and Their Contributions to Functional Diversification (2026, influential citation)
Distinct transcriptional programs control polyethylene glycol (PEG)-induced drought stress responses in oat (Avena sativa L.) shoot and roots. (2026, cites this paper)
Disentangling coevolutionary constraints for modeling protein conformational heterogeneity. (2026, cites this paper)
Convergent evolution of hexenal isomerases in Lepidoptera and plants. (2026, cites this paper)
Cryo-EM structure and polar assembly of the PS2 S-layer of Corynebacterium glutamicum (2025, cites this paper)
Accurate Biomolecular Structure Prediction in CASP16 With Optimized Inputs to State‐Of‐The‐Art Predictors (2025, cites this paper)
Lightweight MSA Design Advances Protein Folding From Evolutionary Embeddings (2025, cites this paper)
Structure Modeling Protocols for Protein Multimer and RNA in CASP16 With Enhanced MSAs, Model Ranking, and Deep Learning (2025, cites this paper)
Genomic diversity and functional insights of carbapenem-resistant Klebsiella pneumoniae revealed by centroid coding sequences analysis (2025, cites this paper)
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction (2025, cites this paper)
Systematic Analysis of Dof Gene Family in Prunus persica Unveils Candidate Regulators for Enhancing Cold Tolerance (2025, cites this paper)
Advancing protein evolution with inverse folding models integrating structural and evolutionary constraints. (2025, cites this paper)
SpoIIIL is a forespore factor required for efficient cell-cell signalling during Bacillus subtilis sporulation (2025, cites this paper)
Metagenomic Identification of Brominated Indole Biosynthetic Machinery from Cyanobacteria. (2025, cites this paper)
Genome-wide identification of GH3 β-glucosidase members reveals NtBGLC12 improves growth in tobacco plants. (2025, cites this paper)
Diversity, evolution, and gene regulation of endogenous retroviruses in eight important poultry species. (2025, cites this paper)
Cvm1 and its paralog Cvm2 function as a complex at vacuolar membrane contact sites. (2025, cites this paper)
Exploring ribosomally synthesized and post-translationally modified peptides through SPECO-based genome mining. (2025, cites this paper)
PepPCBench is a Comprehensive Benchmarking Framework for Protein-Peptide Complex Structure Prediction (2025, cites this paper)
SAND: a comprehensive annotation of class D β-lactamases using structural alignment-based numbering (2025, cites this paper)
Improving AlphaFold2‐ and AlphaFold3‐Based Protein Complex Structure Prediction With MULTICOM4 in CASP16 (2025, cites this paper)
FastConformation: A Standalone ML-Based Toolkit for Modeling and Analyzing Protein Conformational Ensembles at Scale (2025, cites this paper)
Computationally designed proteins mimic antibody immune evasion in viral evolution. (2025, cites this paper)
Cell division protein CdpA organises and anchors the midcell ring in haloarchaea (2025, cites this paper)
Modeling CAPRI Targets of Round 55 by Combining AlphaFold and Docking. (2025, cites this paper)
Boosting AlphaFold Protein Tertiary Structure Prediction through MSA Engineering and Extensive Model Sampling and Ranking in CASP16 (2025, cites this paper)
Genome-wide identification of NAC family and functional characterization of NtNAC236 in tobacco (Nicotiana tabacum). (2025, cites this paper)
Functional Annotation of Proteomes Using Protein Language Models: A High-Throughput Implementation of the ProtTrans Model. (2025, cites this paper)
ClusterEmbed: Efficient Protein Structure Prediction with Clustering and Embeddings (2025, cites this paper)
“Paraxenoviridae”, a putative family of globally distributed marine bacteriophages with double-stranded RNA genomes (2025, cites this paper)
HulaChrimson: A Chrimson-like cation channelrhodopsin discovered using freshwater metatranscriptomics from Lake Hula (2025, influential citation)
Full-Length Transcriptome Sequencing of Pinus massoniana Under Simulated Monochamus alternatus Feeding Highlights bHLH Transcription Factor Involved in Defense Response (2025, influential citation)
Evolution of endogenous retroviruses (ERVs) in the Bovinae subfamily. (2025, cites this paper)
Genomic and taxonomic characterization of Niallia pakistanensis sp. nov. NCCP-28T: a novel antibiotic-resistant and heavy-metal-tolerant bacterium isolated from the legume rhizosphere in Pakistan (2025, cites this paper)
Locality-aware pooling enhances protein language model performance across varied applications (2025, cites this paper)
Comparative transcriptome analysis of different tissues of Hylomecon japonica provides new insights into the biosynthesis pathway of triterpenoid saponins (2025, cites this paper)
Drought-induced transposon expression reveals complex drought response mechanisms in Brassica napus (2025, cites this paper)
The prototypic crAssphage is a linear phage-plasmid. (2025, cites this paper)
Sampling and ranking of protein conformations using machine learning techniques do not improve the quality of rigid protein-protein docking (2025, cites this paper)
The Cotton Gene Family Encoding Translationally Controlled Tumor Proteins and the Role of GhTCTP5 in Salt Tolerance. (2025, cites this paper)
PbrSYP71 regulates pollen tube growth by maintaining polar distribution of the endoplasmic reticulum in Pyrus. (2025, cites this paper)
Neural network conditioned to produce thermophilic protein sequences can increase thermal stability (2025, cites this paper)
Emerging frontiers in protein structure prediction following the AlphaFold revolution (2025, cites this paper)
Quaternary stabilization of a GH2 β‐galactosidase from the psychrophile A. ikkensis, a flexible and unstable dimeric enzyme (2025, cites this paper)
Apusomonad rhodopsins, a new family of ultraviolet to blue light absorbing rhodopsin channels (2025, cites this paper)
Genomic Characterization of BvMLO Genes in Sugar Beet Focusing on BvMLO2 BvMLO7 Responses to Cercospora beticola and Abiotic Stress (2025, cites this paper)
Haplotype-resolved genomes of Trichophyton mentagrophytes and Trichophyton tonsurans (2025, cites this paper)
In-cell discovery and characterization of a non-canonical bacterial protein translocation-folding complex (2025, cites this paper)
Origin and Evolution of Bacterial Periplasmic Force Transducers (2025, cites this paper)
Highly Optimized Simulation of Atomic Resolution Cell-Like Protein Environment. (2025, cites this paper)
“Paraxenoviridae”, a putative family of ubiquitous marine bacteriophages with double-stranded RNA genomes (2025, influential citation)
The architecture, assembly, and evolution of a complex flagellar motor (2025, cites this paper)
Unveiling the resistance: comparative genomic analysis of two novel cefiderocol-resistant Stenotrophomonas species from a Referral Hospital in Mexico City. (2025, cites this paper)
Improving AlphaFold2 and 3-based protein complex structure prediction with MULTICOM4 in CASP16 (2025, cites this paper)
deep-Sep: a deep learning-based method for fast and accurate prediction of selenoprotein genes in bacteria (2025, influential citation)
Boosting Protein-Protein Interaction Detection with AlphaFold Multimer and Transformers (2025, cites this paper)
Efficient C25-Hydroxylation of Vitamin D3 Utilizing an Artificial Self-Sufficient Whole-Cell Cytochrome P450 Biocatalyst. (2025, cites this paper)
Improving Stereochemical Limitations in Protein–Ligand Complex Structure Prediction (2025, cites this paper)
Comprehensive characterization of volatile terpenoids and terpene synthases in Lanxangia tsaoko (2025, cites this paper)
The aPBP-type cell wall synthase PBP1b plays a specialized role in fortifying the Escherichia coli division site against osmotic rupture (2025, cites this paper)
Preservation of Organic Matter Within Primary Fluid Inclusions in Late Middle Pleistocene Halite From the Mars‐Analog Qaidam Basin (2025, cites this paper)
PepPCBench is a Comprehensive Benchmark for Protein-Peptide Complex Structure Prediction with AlphaFold3 (2025, cites this paper)
Taxonomic distribution of SbmA/BacA and BacA-like antimicrobial peptide transporters suggests independent recruitment and convergent evolution in host–microbe interactions (2025, cites this paper)
A Benchmarking Platform for Assessing Protein Language Models on Function-related Prediction Tasks (2025, cites this paper)
Coevolution in human small Heat Shock Protein 1 is promoted by interactions between the Alpha-Crystallin domain and the disordered regions (2025, cites this paper)
Functional alignment of protein language models via reinforcement learning (2025, cites this paper)
How do bacterial extracellular Contractile Injection Systems bind target cells? A remarkable diversity of receptor binding domains (2025, cites this paper)
Organic carbon oxidation state shapes fermentative methanogenic microbiomes and controls greenhouse gas fluxes (2025, cites this paper)
Extreme multivalency and a composite short linear motif facilitate PCNA‐binding, localisation and abundance of p21 (CDKN1A) (2025, cites this paper)
Steering Generative Models with Experimental Data for Protein Fitness Optimization (2025, cites this paper)
Rethinking Text-based Protein Understanding: Retrieval or LLM? (2025, cites this paper)
High-quality reference genome and population analysis of allotetraploid Elymus sibiricus provide insight into genome origin and environmental adaptations to the Qinghai-Tibetan Plateau (2025, cites this paper)
Diverse Sulfuriferula spp. from sulfide mineral weathering environments oxidize ferrous iron and reduced inorganic sulfur compounds (2025, cites this paper)
Host-parasite coevolution leads to underwater respiratory adaptations in extreme diving insects, seal lice (Lepidophthirus macrorhini) (2025, cites this paper)
Transcriptomic analysis of non-model Drosophilidae reveals novel AMP candidates (2025, influential citation)
Protriever: End-to-End Differentiable Protein Homology Search for Fitness Prediction (2025, cites this paper)
Advancing in silico drug design with Bayesian refinement of AlphaFold models (2025, cites this paper)
In-depth analysis of 17,115 rice transcriptomes reveals extensive viral diversity in rice plants (2025, cites this paper)