Multiple alignment of protein sequences with repeats and rearrangements

Tu Minh Phuong, Chuong B. Do, Robert C. Edgar, S. Batzoglou

Published 2006 in Nucleic Acids Research

ABSTRACT

Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple sequence alignment algorithms fail to provide an accurate view of homology between related proteins, because they either assume that the input sequences are globally alignable or require locally alignable regions to appear in the same order in all sequences. In this paper, we present ProDA, a novel system for automated detection and alignment of homologous regions in collections of proteins with arbitrary domain architectures. Given an input set of unaligned sequences, ProDA identifies all homologous regions appearing in one or more sequences, and returns a collection of local multiple alignments for these regions. On a subset of the BAliBASE benchmarking suite containing curated alignments of proteins with complicated domain architectures, ProDA performs well in detecting conserved domain boundaries and clustering domain segments, achieving the highest accuracy to date for this task. We conclude that ProDA is a practical tool for automated alignment of protein sequences with repeats and rearrangements in their domain architecture.
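The abstract's point about collinearity can be made concrete with a generic example. A single global alignment, or any method that requires locally alignable regions to appear in the same order, pairs each residue of a domain with at most one position in another sequence. It therefore cannot report a domain that occurs twice in a second protein. The sketch below is not ProDA's algorithm. It is plain Smith-Waterman local alignment (the technique of "Identification of common molecular subsequences" in the reference list), rerun after masking each hit, in the spirit of simple repeat finders. The scoring constants, the masking loop, and the names `smith_waterman`, `find_repeats`, `min_score` and `max_hits` are hypothetical choices for illustration. Real tools use substitution matrices such as BLOSUM62, affine gap penalties and significance thresholds.

```python
from typing import List, Set, Tuple

# Toy scoring scheme (hypothetical): real protein aligners use a substitution
# matrix such as BLOSUM62 and affine gap penalties.
MATCH, MISMATCH, GAP = 2, -1, -2

Hit = Tuple[int, int, int, int, int]  # (score, a_start, a_end, b_start, b_end), ends exclusive


def smith_waterman(a: str, b: str, masked_a: Set[int]) -> Hit:
    """Best local alignment of b against a, with a linear gap penalty.

    Residues of `a` whose indices are in `masked_a` keep an all-zero DP row,
    so they cannot be part of the returned alignment.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best_score, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        if i - 1 in masked_a:
            continue  # row stays all zeros: no alignment may use a[i-1]
        for j in range(1, m + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best_score:
                best_score, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero cell marks the alignment start.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return best_score, i, best_i, j, best_j


def find_repeats(a: str, b: str, min_score: int = 8, max_hits: int = 5) -> List[Hit]:
    """Report several local matches of b in disjoint regions of a (e.g. domain repeats).

    Hypothetical greedy masking loop: after each hit, the aligned residues of `a`
    are masked and the search is rerun until the best score drops below `min_score`.
    """
    masked: Set[int] = set()
    hits: List[Hit] = []
    for _ in range(max_hits):
        hit = smith_waterman(a, b, masked)
        if hit[0] < min_score:
            break
        hits.append(hit)
        masked.update(range(hit[1], hit[2]))
    return hits


if __name__ == "__main__":
    domain = "WKDHRTYCMP"
    protein = "SS" + domain + "GGGGG" + domain + "SS"
    print(find_repeats(protein, domain))  # [(20, 2, 12, 0, 10), (20, 17, 27, 0, 10)]
```

Under the toy scores, the two copies of the 10-residue segment are reported as separate hits at positions 2 to 12 and 17 to 27 of the longer sequence, a relationship that one collinear alignment of the single-copy sequence cannot express. Greedy masking is a simplification: it handles one pair of sequences and does not cluster segments from many sequences into local multiple alignments, which is the task the abstract describes ProDA addressing.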

PUBLICATION RECORD

Publication year: 2006
Venue: Nucleic Acids Research
Publication date: 2006-11-01
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1093/nar/gkl511, PMID 17068081, PMCID 1635250
Source metadata: Semantic Scholar, PubMed

CLAIMS

No claims are published for this paper.

CONCEPTS

No concepts are published for this paper.

REFERENCES

Alignment
2009, cited by this paper
The Pfam protein families database
2007, influential reference
Progressive sequence alignment as a prerequisite to correct phylogenetic trees
2007, cited by this paper
Global multiple-sequence alignment with repeats
2006, influential reference
Fragment Assembly
2006, cited by this paper
Problems and Solutions in Biological Sequence Analysis
2006, cited by this paper
ProbCons: Probabilistic consistency-based multiple sequence alignment
2005, influential reference
DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment
2005, cited by this paper
Tracking repeats using significance and transitivity
2004, influential reference
The Pfam protein families database
2004, cited by this paper
MUSCLE: multiple sequence alignment with high accuracy and high throughput
2004, influential reference
A novel method for multiple alignment of sequences with repeated and shuffled elements
2004, influential reference
The ProDom database of protein domain families: more emphasis on 3D
2004, influential reference
Automatic prediction of protein domains from sequence information using a hybrid learning system
2004, cited by this paper
SATCHMO: Sequence Alignment and Tree Construction Using Hidden Markov Models
2003, cited by this paper
Multiple sequence alignment using partial order graphs
2002, cited by this paper
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
2002, cited by this paper
Protein domain identification and improved sequence similarity searching using PSI-BLAST
2002, cited by this paper
BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations
2001, influential reference
Domain combinations in archaeal, eubacterial and eukaryotic proteomes
2001, influential reference
Mocca: semi-automatic method for domain hunting
2001, cited by this paper
Rapid automatic detection and alignment of repeats in protein sequences
2000, influential reference
Multiple sequence alignment in phylogenetic analysis
2000, influential reference
Amino acid substitution matrices
2000, cited by this paper
Domain size distributions can predict domain boundaries
2000, influential reference
Identifying DNA and protein patterns with statistically significant alignments of multiple sequences
1999, cited by this paper
DIALIGN2: Improvement of the segment to segment approach to multiple sequence alignment
1999, influential reference
A comprehensive comparison of multiple sequence alignment programs
1999, cited by this paper
A fast algorithm for genome-wide analysis of proteins with repeated sequences
1999, cited by this paper
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
1998, influential reference
Pfam: multiple sequence alignments and HMM-profiles of protein domains
1998, cited by this paper
A symmetric-iterated multiple alignment of protein sequences
1998, cited by this paper
DIALIGN: finding local similarities by multiple sequence alignment
1998, influential reference
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
1997, cited by this paper
Multiple DNA and protein sequence alignment based on segment-to-segment comparison
1996, influential reference
Combining evolutionary information and neural networks to predict protein secondary structure
1994, cited by this paper
Modular arrangement of proteins as inferred from analysis of homology
1994, cited by this paper
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
1994, cited by this paper
Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymers
1994, cited by this paper
A method to recognize distant repeats in protein sequences
1993, cited by this paper
Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment
1993, cited by this paper
Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation
1993, influential reference
Amino acid substitution matrices from protein blocks
1992, cited by this paper
A workbench for multiple alignment construction and analysis
1991, cited by this paper
Basic local alignment search tool
1990, cited by this paper
A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons
1987, cited by this paper
Identification of common molecular subsequences
1981, cited by this paper
Nucleic Acid Research
1967, cited by this paper

CITED BY

Overview of the modern approach of sequence alignment algorithms
2025, cites this paper
Discovery and Analysis of Repeat and Low-Complexity Architectures in Proteins and Their Conserved Evolutionary Relationships Using Self-Homology Dot Plots
2025, cites this paper
Pre-trained Language Models for Decoding Protein Language: a Survey
2024, cites this paper
Reducing the Impact of Domain Rearrangement on Sequence Alignment and Phylogeny Reconstruction
2023, influential citation
Transformer-based deep learning for predicting protein properties in the life sciences
2023, cites this paper
Next-Generation DNA Barcoding for Fish Identification Using High-Throughput Sequencing in Tai Lake, China
2023, cites this paper
Dynamic Programming Algorithms for Discovery of Antibiotic Resistance in Microbial Genomes
2023, cites this paper
A benchmark study of sequence alignment methods for protein clustering
2018, cites this paper
Genetic distance between complex repeats
2018, cites this paper
Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation
2017, cites this paper
A Method of Multiple Protein Sequence Alignment Using a Hybrid Approach
2017, cites this paper
Multiple Sequence Alignment
2017, cites this paper
Statistical Approaches to Detecting and Analyzing Tandem Repeats in Genomic Sequences
2015, cites this paper
RNA motif discovery: a computational overview
2015, cites this paper
DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment
2015, cites this paper
GLProbs: Aligning Multiple Sequences Adaptively
2015, cites this paper
Is Sequence Alignment an Art or a Science?
2015, cites this paper
Distribution on Contingency of Alignment of Two Literal Sequences Under Constrains
2014, cites this paper
GLProbs: Aligning multiple sequences adaptively
2013, cites this paper
Large-Scale Multiple Sequence Alignment and Phylogeny Estimation
2013, cites this paper
Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony
2013, cites this paper
Fighting against uncertainty: an essential issue in bioinformatics
2013, cites this paper
Alignment of genomic sequences with intrinsic disorder and tandem repeats
2013, cites this paper
Models and Algorithms for Genome Evolution
2013, cites this paper
Graph-based modeling of tandem repeats improves global multiple sequence alignment
2013, cites this paper
Phylogenetic Analyses Uncover a Novel Clade of Transferrin in Nonmammalian Vertebrates
2012, cites this paper
Shape-based alignment of genomic landscapes in multi-scale resolution
2012, cites this paper
A Classification of Bioinformatics Algorithms from the Viewpoint of Maximizing Expected Accuracy (MEA)
2012, influential citation
Contribution to the analysis of protein sequences: similarity, clustering and alignment
2011, cites this paper
Probabilistic alignments with quality scores: an application to short-read mapping toward accurate SNP/indel detection
2011, cites this paper
Two Problems in Computational Genomics
2011, cites this paper
A Theoretical Model for Whole Genome Alignment
2011, cites this paper
progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement
2010, cites this paper
Algorithms in comparative genomics
2010, cites this paper
Computational approaches for protein function prediction: a combined strategy from multiple sequence alignment to molecular docking-based virtual screening
2010, cites this paper
The Construction and Use of Log-Odds Substitution Scores for Multiple Sequence Alignment
2010, cites this paper
Finding the balance between the mathematical and biological optima in multiple sequence alignment
2010, cites this paper
A framework for phylogenetic sequence alignment
2009, cites this paper
Protein Multiple Sequence Alignment
2009, cites this paper
Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement
2009, cites this paper
CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score
2009, cites this paper
A local multiple alignment method for detection of non-coding RNA sequences
2009, cites this paper
Protein multiple sequence alignment
2008, cites this paper
Functional Proteomics
2008, cites this paper
Multiple protein sequence alignment
2008, cites this paper
Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities
2008, cites this paper
The Basics of Protein Sequence Analysis
2008, cites this paper
Recent developments in the MAFFT multiple sequence alignment program
2008, cites this paper