Readjoiner: a fast and memory efficient string graph-based sequence assembler

Published 2012 in BMC Bioinformatics

ABSTRACT

Ongoing improvements in the throughput of next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, which does not require dividing the reads into k-mers but does require fast algorithms for computing suffix-prefix matches among all pairs of reads. Here we present efficient methods for constructing a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process, so the graph is constructed efficiently from irreducible edges only. Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.
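
To make the two steps named in the abstract concrete, here is a minimal Python sketch of suffix-prefix matching by sorting and scanning, followed by removal of transitive edges. It is an illustration only, not Readjoiner's implementation: the function names `suffix_prefix_matches` and `irreducible_edges` are hypothetical, the reads are assumed to be error-free, of equal length, and free of containments, reverse complements are ignored, and transitive edges are removed after all overlaps are known rather than early in the process as the abstract describes.

```python
from bisect import bisect_left


def suffix_prefix_matches(reads, min_len):
    """Map (i, j) to the longest suffix of reads[i] that is a prefix of reads[j].

    Assumption (not from the paper): equal-length, containment-free reads.
    """
    # Sort the reads once; all reads sharing a prefix are then contiguous.
    order = sorted(range(len(reads)), key=lambda j: reads[j])
    sorted_reads = [reads[j] for j in order]
    overlaps = {}
    for i, read in enumerate(reads):
        for k in range(min_len, len(read)):  # proper suffixes only
            suffix = read[-k:]
            # Binary search for the first read >= suffix, then scan forward.
            pos = bisect_left(sorted_reads, suffix)
            while pos < len(sorted_reads) and sorted_reads[pos].startswith(suffix):
                j = order[pos]
                if j != i:
                    overlaps[(i, j)] = k  # k only grows, so the longest match wins
                pos += 1
    return overlaps


def irreducible_edges(reads, overlaps):
    """Drop every edge i->k that is implied by a path i->j->k."""
    # ext[i][j]: number of characters reads[j] adds beyond the end of reads[i].
    ext = {}
    for (i, j), olen in overlaps.items():
        ext.setdefault(i, {})[j] = len(reads[j]) - olen
    kept = {}
    for (i, k), olen in overlaps.items():
        ext_ik = ext[i][k]
        # The edge is transitive if the extensions along i->j->k add up to it.
        transitive = any(
            k in ext.get(j, {}) and ext_ij + ext[j][k] == ext_ik
            for j, ext_ij in ext[i].items()
            if j != k
        )
        if not transitive:
            kept[(i, k)] = olen
    return kept


# Three reads of length 6 taken at offsets 0, 2 and 4 of "GATTACAGCT".
reads = ["GATTAC", "TTACAG", "ACAGCT"]
overlaps = suffix_prefix_matches(reads, min_len=2)
print(sorted(overlaps.items()))
# [((0, 1), 4), ((0, 2), 2), ((1, 2), 4)]
print(sorted(irreducible_edges(reads, overlaps).items()))
# [((0, 1), 4), ((1, 2), 4)]
```

In this toy input the edge from read 0 to read 2 (overlap 2) is dropped because the path through read 1 spells the same layout, which leaves only the two irreducible edges.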

PUBLICATION RECORD

Publication year: 2012
Venue: BMC Bioinformatics
Publication date: 2012-05-06
Fields of study: Medicine, Computer Science
Identifiers: DOI 10.1186/1471-2105-13-82; PMID 22559072; PMCID 3507659
Source metadata: Semantic Scholar, PubMed
