Adaptive seeds tame genomic sequence comparison.

S. Kiełbasa, R. Wan, Kengo Sato, P. Horton, M. Frith

Published 2011 in Genome Research

ABSTRACT

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
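
The abstract does not spell out how rareness is measured, so the following is only a toy sketch of one plausible reading: from each query position, the match is lengthened until it occurs at most `max_hits` times in the reference. The function name `adaptive_seeds`, the parameter `max_hits`, and the example sequences are hypothetical, and the naive scan stands in for whatever index LAST actually uses.

```python
def adaptive_seeds(reference, query, max_hits):
    """For each query start, return the shortest match that occurs at most
    max_hits times in reference, as (start, length, reference_positions).

    Assumption: "rareness" means an occurrence count at or below max_hits.
    """
    seeds = []
    for start in range(len(query)):
        # Every reference position matches the empty string.
        hits = list(range(len(reference)))
        for length in range(1, len(query) - start + 1):
            base = query[start + length - 1]
            # Keep only the occurrences that survive one more base.
            hits = [p for p in hits
                    if p + length <= len(reference)
                    and reference[p + length - 1] == base]
            if not hits:
                break  # the match cannot be extended: no seed from this start
            if len(hits) <= max_hits:
                seeds.append((start, length, hits))
                break  # rare enough: stop lengthening
    return seeds


reference = "ACGTACGTTTGACA"  # made-up toy sequence
query = "CGTT"
print(adaptive_seeds(reference, query, max_hits=2))
print(adaptive_seeds(reference, query, max_hits=1))
```

With these toy inputs, `max_hits=2` yields three seeds of length 2, while `max_hits=1` yields seeds of length 4 and 3: a stricter rareness threshold forces longer matches. Under this reading each query position contributes at most `max_hits` matches, so the total number of matches is bounded by `max_hits` times the query length, which is consistent with the linear growth the abstract describes.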

PUBLICATION RECORD

Publication year: 2011
Venue: Genome Research
Publication date: 2011-03-01
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1101/gr.113985.110; PMID 21209072; PMCID PMC3044862
Source metadata: Semantic Scholar, PubMed

CITED BY

Transposable element dynamics in Arthropoda genomes and their impacts on the evolution of functional genes
2026cites this paper
Rumen microbiome biogeography and ventral epithelial architecture in three ruminant species.
2026cites this paper
Gene retroposition and functional diversification of retrocopies in the Rhus gall aphid Schlechtendalia chinensis.
2026cites this paper
Progress in Flax Genome Assembly from Nanopore Sequencing Data
2026cites this paper
Genome sequence of the ornamental plant Digitalis purpurea reveals the molecular basis of flower color and morphology variation
2026cites this paper
[Research progress on nanopore sequencing data alignment analysis methods and reference databases].
2026cites this paper
Incomplete lineage sorting shaped mixed traits during a colobine primate radiation
2026cites this paper
Scalable hierarchical protocol format inference via feature-heuristic message delimiter
2026cites this paper
A general and extensible algorithmic framework to biological sequence alignment across scales and applications
2026cites this paper
Metagenomic insights of antibiotic resistance genes in Laguna Lake, Phillipines through nanopore sequencing
2025cites this paper
A composition-matching algorithm, MatchIDR, identifies prion-like domains that localize to stress granules
2025cites this paper
A dirigent protein redirects extracellular terpenoid metabolism for defense against biotic challenges
2025cites this paper
Molecular characterization of arenavirus defective viral genomes reveals sequence features associated with their formation
2025cites this paper
Origin and stepwise evolution of vertebrate lungs
2025cites this paper
Probability-based sequence comparison finds pre-eutherian nuclear mitochondrial DNA segments in mammalian genomes
2025cites this paper
TripLexicon: prediction and analysis of gene regulatory RNA–DNA interactions
2025cites this paper
From benchmarking alignment of genome assemblies to IMGT annotation: the paradigm of the bovine Bos taurus T cell receptor (TRG) locus
2025cites this paper
Unraveling phylogenetic conflicts and adaptive evolution in Chiroptera using large-scale mitogenomes and nuclear genes
2025cites this paper
Metagenomics Reveal Dynamic Coastal Ocean Reservoir of Antibiotic Resistance Genes
2025cites this paper
CGC1, a new reference genome for Caenorhabditis elegans
2025cites this paper
Transposon Dynamics Drive Genome Evolution and Regulate Genetic Mechanisms of Agronomic Traits in Cotton
2025cites this paper
Allele Level Sequencing of Killer Cell Immunoglobulin‐Like Receptor Genes Using Oxford Nanopore Long Read Sequencing
2025cites this paper
Quantifying antibiotic resistome risks across environmental niches: the L-ARRAP for long-read metagenomic profiling
2025influential citation
The complete genome of a songbird
2025cites this paper
The Genome of Nothapodytes tomentosa and Characterization of Strictosidine Synthase Provide Insights into The Evolutionary Divergence of Camptothecin Biosynthesis
2025cites this paper
HAlign-G: rapid and low-memory multiple-genome aligner for large-scale closely related genomes
2025cites this paper
Pangeneric genome analyses reveal the evolution and diversity of the orchid genus Dendrobium
2025cites this paper
Lineage-specific expansion and functional divergence of β-keratin genes underlying shell evolution in turtles.
2025cites this paper
X-Mapper: fast and accurate sequence alignment via gapped x-mers
2025cites this paper
Benchmarking, detection, and genotyping of structural variants in a population of whole-genome assemblies using the SVGAP pipeline
2025cites this paper
A closed-loop method for precise genome size estimation using HiFi reads
2025cites this paper
Contributions of the Petabyte Scale Sequence Search Codeathon toward efforts to scale sequence-based searches on SRA
2025cites this paper
Whole Genome Duplication in the Genomics Era: The Hidden Gems in Invertebrates?
2025cites this paper
Genomic evidence for low genetic diversity but purging of strong deleterious variants in snow leopards
2025cites this paper
Pangenome analysis reveals structural variation associated with seed size and weight traits in peanut
2025influential citation
From reactants to products: computational methods for biosynthetic pathway design
2025cites this paper
Validation of a comprehensive long-read sequencing platform for broad clinical genetic diagnosis
2025cites this paper
Transcriptome Analysis of Genes Responsive to Nutrient Level Changes in the Marine Red Alga Pyropia yezoensis (Nori)
2025influential citation
Fast and Flexible Search for Homologous Biological Sequences with DECIPHER v3
2025cites this paper
Transcription factor binding divergence drives transcriptional and phenotypic variation in maize
2025cites this paper
FastGA: fast genome alignment
2025influential citation
Oxford Nanopore Third Generation Sequencing for Analysis of FMR1 5'UTR CGG Repeat Expansions.
2025cites this paper
Characterization of phosphorylation variants for identifying adaptive alleles in Zea.
2025cites this paper
High-quality draft genome sequence of Thermobifida halotolerans DSM 44931
2025cites this paper
Novel genomics insights into the molecular evolution of long-distance migratory mammals
2025cites this paper
A comprehensive benchmarking of adaptive sampling tools for nanopore sequencing
2025cites this paper
Parallelizing RNA-Seq Analysis with BioSkel: A FastFlow Based Prototype
2025cites this paper
Genomic Insights Into the Body Size Evolution in Mustelidae (Mammalia: Carnivora)
2025cites this paper
Genomic and genetic insights into speciation and pigment pattern diversification in Danio fishes
2025cites this paper
TIPP-SD: A New Method for Species Detection in Microbiomes
2025influential citation
Characterization of FLOWERING LOCUS T‐related genes and their putative gene regulatory network in semi‐winter Brassica napus cultivar Zhongshaung11
2025cites this paper
DREAM-Stellar: Parallel and space efficient exact local alignment
2025influential citation
Long‐read sequencing reveals genomic and epigenomic variation in the dark genome of human Alzheimer's disease
2025cites this paper
KegAlign: optimizing pairwise alignments with diagonal partitioning
2025cites this paper
Chromosome‐Level Genome Assembly of the Leafcutter Bee Megachile rotundata Reveals Its Ecological Adaptation and Pollination Biology
2025cites this paper
Global identification and functional characterization of Z‐DNA in rice
2025cites this paper
Haplotype‐resolved genome of a papeda provides insights into the geographical origin and evolution of Citrus
2025influential citation
Multi-level genomic convergence of secondary aquatic adaptation in marine mammals
2025cites this paper
Probability-Based Sequence Comparison Finds the Oldest Ever Nuclear Mitochondrial DNA Segments in Mammalian Genomes
2025cites this paper
Subgenome-informed statistical modeling of transcriptomes in 25 common wheat accessions reveals cis- and trans- regulation architectures.
2025cites this paper
Genome Assembly of a Living Fossil, the Atlantic Horseshoe Crab Limulus polyphemus, Reveals Lineage-Specific Whole-Genome Duplications, Transposable Element-Based Centromeres, and a ZW Sex Chromosome System
2025influential citation
Subcellular Enrichment Patterns of New Genes in Drosophila Evolution
2025cites this paper
Visualizing the transcription and replication of influenza A viral RNAs in cells by multiple direct RNA padlock probing and in situ sequencing (mudRapp-seq)
2025cites this paper
Evolution and genetic adaptation of fishes to the deep sea.
2025cites this paper
Epigenome and interactome profiling uncovers principles of distal regulation in the barley genome
2025cites this paper
Constraint of accessible chromatins maps regulatory loci involved in maize speciation and domestication
2025cites this paper
NEAR: neural embeddings for amino acid relationships
2025cites this paper
Integrative omics reveals mechanisms of biosynthesis and regulation of floral scent in Cymbidium tracyanum
2025cites this paper
Chromosome-level genome assembly of flathead asp (Pseudaspius leptocephalus)
2025cites this paper
SegMantX: A Novel Tool for Detecting DNA Duplications Uncovers Prevalent Duplications in Plasmids
2025cites this paper
Genome assemblies of Nuttall’s White-crowned sparrow (Zonotrichia leucophrys nuttalli) and Rufous-collared sparrow (Zonotrichia capensis)
2025cites this paper
Genomic signatures associated with the evolutionary loss of egg yolk in parasitoid wasps
2025cites this paper
Allelic variation and duplication of the dmrt1 were associated with sex chromosome turnover in three representative Scatophagidae fish species
2025cites this paper
Retroelement expansions underlie genome evolution in stingless bees
2025cites this paper
Giant endogenous viral elements in the genome of the model protist Euglena gracilis reveal past interactions with giant viruses
2025cites this paper
Genomic insights into the mechanisms of body size evolution in Serpentes
2025influential citation
Chromosome-level genome assembly of the synanthropic fly Chrysomya megacephala: insights into oviposition location
2025cites this paper
Genomes of critically endangered saola are shaped by population structure and purging
2025cites this paper
Global spatiotemporal patterns of demographic fluctuations in terrestrial vertebrates during the Late Pleistocene
2025cites this paper
Mesoscale eddies shape Prochlorococcus community structure and dynamics in the oligotrophic open ocean
2025cites this paper
Chromosome-scale genome assembly and annotation of two geographically distinct strains of malaria vector Anopheles albimanus
2025cites this paper
Lizards on a sky archipelago: Genomic approaches to the evolution of the mountain genus Iberolacerta
2025cites this paper
MitoDelta: identifying mitochondrial DNA deletions at cell-type resolution from single-cell RNA sequencing data
2025cites this paper
Bimodal centromeres in pentaploid dogroses shed light on their unique meiosis
2025cites this paper
Genomic infrastructure for cetacean research and conservation: reference genomes for eight families spanning the cetacean tree of life
2025cites this paper
Unveiling the Evolutionary History of European Vipers and Their Venoms From a Multi‐Omic Approach
2025cites this paper
Accurate, Scalable Structural Variant Genotyping in Complex Genomes at Population Scales
2025cites this paper
A CGG Repeat Expansion in CSNK1E Associated with Progressive Myoclonic Epilepsy with Incomplete Penetrance
2025cites this paper
Integrated metabolomic and transcriptomic analysis reveals the biosynthesis mechanism of dihydrochalcones in sweet tea (Lithocarpus litseifolius)
2025cites this paper
Convergent evolution through independent rearrangements in the primate amylase locus
2025cites this paper
A Survey on Sequence Alignment Algorithms and State-of-the-Art Aligners
2025cites this paper
Temporal genomic change in the Scandinavian Arctic fox (Vulpes lagopus)
2025cites this paper
NUMTsearcher: advancing detection and evolutionary insights of nuclear mitochondrial DNA segments across human, rabbit, and fish genomes.
2025cites this paper
CLASV: Rapid Lassa virus lineage assignment with random forest
2025cites this paper
MANUDB: database and application to retrieve and visualize mammalian NUMTs
2025cites this paper
De novo annotation reveals transcriptomic complexity across the hexaploid wheat pan-genome
2025cites this paper
Evolution of Large Polymorphic Inversions in a Panmictic Songbird
2025influential citation
AGNES: Adaptive Graph Neural Network and Dynamic Programming Hybrid Framework for Real-Time Nanopore Seed Chaining
2025influential citation
A deep learning model captures position-specific effects of plant regulatory sequences and suggests genes under complex regulation
2025influential citation
The genomic landscape of gene-level structural variations in Japanese and global soybean Glycine max cultivars
2025cites this paper