Exploiting sparseness in de novo genome assembly

Chengxi Ye, Z. Ma, C. Cannon, Mihai Pop, Douglas W. Yu

Published 2012 in BMC Bioinformatics

ABSTRACT

The very large memory requirements for constructing assembly graphs in de novo genome assembly limit current algorithms to supercomputing environments. In this paper, we demonstrate that constructing a sparse assembly graph, which stores only a small fraction of the observed k-mers as nodes together with the links between these nodes, allows de novo assembly of even moderately sized genomes (~500 Mbp) on a typical laptop computer. We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, which uses a new sparse k-mer graph structure derived from the de Bruijn graph. We test SparseAssembler on both simulated and real data, achieving ~90% memory savings while retaining high assembly accuracy and without sacrificing speed compared with existing de novo assemblers.
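To make the sparse-graph idea concrete, here is a minimal Python sketch under stated assumptions; it is not SparseAssembler's implementation. The sampling rule (keep a k-mer as a node when its CRC32 hash is divisible by a sparseness parameter g) is a content-based stand-in chosen so that overlapping reads select the same k-mers; the paper's own selection strategy may differ. Reads are assumed error-free and single-stranded (no reverse complements, no coverage filtering), and the names is_sampled, build_sparse_graph, start_nodes, walk, k and g are hypothetical. Each edge stores the bases separating two consecutively sampled k-mers, so the skipped k-mers never become nodes, and a contig is spelled by concatenating edge labels.

```python
import random
import zlib
from collections import defaultdict


def is_sampled(kmer: str, g: int) -> bool:
    # Assumed content-based rule: keep a k-mer when its CRC32 hash is divisible
    # by g, so about 1 in g k-mers become nodes and every read picks the same
    # k-mers from the same genomic region. Not necessarily SparseAssembler's rule.
    return zlib.crc32(kmer.encode("ascii")) % g == 0


def build_sparse_graph(reads, k, g):
    """Return (nodes, edges) for a sparse k-mer graph.

    nodes: set of sampled k-mers.
    edges[u][v] = [label, count], where label is the sequence that follows u
    up to and including the last base of v, i.e. u + label ends with v.
    """
    nodes = set()
    edges = defaultdict(dict)
    for read in reads:
        prev = None  # position of the previous sampled k-mer in this read
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not is_sampled(kmer, g):
                continue
            nodes.add(kmer)
            if prev is not None:
                u = read[prev:prev + k]
                label = read[prev + k:i + k]  # the i - prev bases that extend u to kmer
                if kmer in edges[u]:
                    edges[u][kmer][1] += 1
                else:
                    edges[u][kmer] = [label, 1]
            prev = i
    return nodes, edges


def start_nodes(nodes, edges):
    """Sampled k-mers with no incoming edge: candidate contig starts."""
    has_incoming = {v for outs in edges.values() for v in outs}
    return sorted(n for n in nodes if n not in has_incoming)


def walk(edges, start):
    """Spell a contig from `start` by following unique out-edges until a
    branch, a dead end, or an already visited node."""
    contig, node, seen = start, start, {start}
    while len(edges.get(node, {})) == 1:
        nxt, (label, _count) = next(iter(edges[node].items()))
        if nxt in seen:
            break
        contig += label
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    k, g, read_len, step = 21, 8, 100, 5
    # Toy, error-free reads tiled along one strand of the genome.
    reads = [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]
    nodes, edges = build_sparse_graph(reads, k, g)
    n_kmers = len({genome[i:i + k] for i in range(len(genome) - k + 1)})
    print(f"nodes stored: {len(nodes)} of {n_kmers} distinct {k}-mers")
    starts = start_nodes(nodes, edges)
    longest = max((walk(edges, s) for s in starts), key=len, default="")
    print(f"longest contig: {len(longest)} bp, found in genome: {longest in genome}")
```

With g = 1 every k-mer is sampled and the structure reduces to an ordinary k-mer graph; larger g keeps roughly 1/g of the k-mers as nodes, at the cost of longer edge labels. That trade-off is the intuition behind the memory savings, although the ~90% figure above comes from the paper's experiments, not from this sketch.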

PUBLICATION RECORD

Publication year: 2012
Venue: BMC Bioinformatics
Publication date: 2012-04-19
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1186/1471-2105-13-S6-S1; PMID 22537038; PMCID 3369186
Source metadata: Semantic Scholar, PubMed

REFERENCES

GAGE: A critical evaluation of genome assemblies and assembly algorithms (2012)
Efficient de novo assembly of large genomes using compressed data structures (2012, influential reference)
A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies (2011)
Comparative studies of de novo assembly tools for next-generation sequencing technologies (2011)
Efficient counting of k-mers in DNA sequences using a bloom filter (2011)
FLASH: fast length adjustment of short reads to improve genome assemblies (2011)
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers (2011)
De novo assembly of human genomes with massively parallel short read sequencing (2010)
High-quality draft assemblies of mammalian genomes from massively parallel sequence data (2010)
Succinct data structures for assembling large genomes (2010)
ABySS: a parallel assembler for short read sequence data (2009)
Bioinformatics challenges of new sequencing technology (2008)
Velvet: algorithms for de novo short read assembly using de Bruijn graphs (2008)
Genome assembly forensics: finding the elusive mis-assembly (2008)
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies (2007)
SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing (2007)
Assembling millions of short DNA sequences using SSAKE (2006)
Toward an Individual Approach to Methadone Therapy of Heroin Addicts (2006)
The fragment assembly string graph (2005)
Versatile and open software for comparing large genomes (2004)
The Atlas genome assembly system (2004)
Fragment assembly with short reads (2004)
Reducing storage requirements for biological sequence comparison (2004)
The phusion assembler (2003)
ARACHNE: a whole-genome shotgun assembler (2002)
An Eulerian path approach to DNA fragment assembly (2001)
A whole-genome assembly of Drosophila (2000)
BIOINFORMATICS ORIGINAL PAPER (year unknown)

CITED BY

Gene expression profiling of the carbon pathways and virulence factors of Candida albicans in different carbon sources (2026)
Comparative analysis of the GATA transcription factors in seven Ipomoea species (2025)
Development and validation of a LAMP-based method for rapid and reliable detection of Xanthomonas albilineans, the causal agent of sugarcane leaf scald (2025)
PlastidHub: An integrated analysis platform for plastid phylogenomics and comparative genomics (2025)
Development of viability-quantitative PCR with propidium monoazide for assessment of white spot syndrome virus structural integrity and viability (2025)
Heterogeneous expression of BMP7, HIF-2α, Ki-67, and E-Cadherin in clear cell renal carcinoma: Prognostic implications based on tumor region (2025)
Characterization of a direct role of GnRHs in the control of spermiogenesis and steroidogenesis in the small-spotted catshark Scyliorhinus canicula (2025)
The fish pituitary directly responds to daylength and drives seasonality (2025)
eDNA confirms presence of critically endangered big-headed turtle to support community conservation (2025)
Development and validation of a carnitine cycle and transport disorders (CCD) panel: an ONT-compatible multi-gene diagnostic kit for newborn and selective screening (2025)
Transcription of genes involved in bleaching of a coral reef species Acropora downingi (Wallace, 1999) in response to high temperature (2025)
Genome-wide association study and candidate gene identification for fall armyworm resistance in maize (Zea mays L.) (2025)
Unraveling a novel missense mutation (c.A248C) in Wiskott-Aldrich syndrome gene by whole exome sequencing: Insights from dynamic simulation, molecular docking and in-silico studies (2025)
Reanalysis of Exome Sequencing Data in the Indian Undiagnosed Diseases Program: Improving Diagnostic Yield and Ending Diagnostic Odyssey (2025)
Sub-chronic nanoplastic toxicity in Etroplus suratensis (Pisces, Cichilidae): Insights into tissue accumulation, stress and metabolic disruption (2025)
Salicylic acid and methyl jasmonate activate key genes of plant-defense pathways conferring partial protection to Polystigma amygdalinum in a susceptible almond cultivar (2025)
Efficient De Novo Assembly and Recovery of Microbial Genomes from Complex Metagenomes Using a Reduced Set of k-mers (2025)
Whole genome sequencing, assembly and annotation of Curvularia geniculata strain SKG23 isolated from leaf spot disease of mentha (Mentha arvensis L.) (2025)
PrimeSpecPCR: Python toolkit for species-specific DNA primer design and specificity testing (2025)
Pleiotropic signaling of single-chain thyrostimulin (GPB5-GPA2) on homologous Glycoprotein Hormone Receptors (ScFSHR, ScLHR, ScTSHR) in the elasmobranch Scyliorhinus canicula reproduction (2025)
Neuronal degeneration, mitochondrial dysfunction, and disturbance of movements induced by rotenone in the ascidian Styela plicata (2025)
The olfactory receptor OR51E2 regulates prostate cancer aggressiveness and modulates STAT3 in prostate cancer cells and in xenograft tumors (2025)
IsoPrimer: a pipeline for designing isoform-aware primer pairs for comprehensive gene expression quantification (2025)
High-throughput approaches for the identification of ribosome heterogeneity (2025)
PHLDA1 silencing in IMR-32 human neuroblastoma cells results in ABCB1 overexpression, augments chemoresistance and leads to increased growth of tumors (2025)
Detection of Diaporthe cinerascens from latent and symptomatic fig tree cankers using species-specific primers (2025)
Effects of a dietary polyphenol on the growth, digestive physiology and nutritional quality of Australian hybrid abalone (Haliotis laevigata x H. rubra) under benign and stressful temperatures (2025)
Pharmacomodulation of G-quadruplexes in long non-coding RNAs dysregulated in colorectal cancer (2025)
Transcriptomic analysis of laser-capture microdissected tumors reveals RAD51AP1 as a tumor-specific marker associated progression from pancreatic intraepithelial neoplasia to invasive pancreatic cancer (2025)
Phylogenetic Analysis of Grapevine Fanleaf Virus, Grapevine Virus A, and Grapevine Leafroll-Associated Virus 3 in Kazakhstan (2025)
BMP and WNT signaling are involved in tracheal cartilage development in chicken embryos (2025)
Genome Assembly, Polishing, and Analysis of the Chytrid Batrachochytrium salamandrivorans (2025)
Identification of QTLs for aphid (Melanaphis sacchari) resistance in sorghum (Sorghum bicolor) based on BSA-seq and analysis of candidate genes (2025)
Sequencing, assembly and annotation of whole genome with microsatellites of Lasiodiplodia theobromae strain RAH19 isolated from plastic waste (2025)
Streamlining of Simple Sequence Repeat Data Mining Methodologies and Pipelines for Crop Scanning (2024)
Characterization of gonadotropins and their receptors in a chondrichthyan, Scyliorhinus canicula, fills a gap in the understanding of their coevolution (2024)
Effect of larval rearing temperature on steroidogenesis pathway development in winter flounder (Pseudopleuronectes americanus) early life history (2024)
From Grass to Yeast; Functional insights from heterologous expression of LfHKT2;1 in ion regulation (2024)
Screening Host Genomic Data for Wolbachia Infections (2024)
Comprehensive analysis of hairpin small RNAs involved in resistance and pathogenesis during wheat-Puccinia triticina interactions (2024)
High-resolution genetic map and SNP chip for molecular breeding in Panax ginseng, a tetraploid medicinal plant (2024)
Novel viruses discovered in metatranscriptomic analysis of farmed barramundi in Asia and Australia (2024)
Zooarchaeological and ancient DNA identification of a non-local gopher tortoise (Gopherus polyphemus) in New Orleans, Louisiana, USA (2024)
Development of a novel crAss-like phage detection method with a broad spectrum for microbial source tracking (2024)
Development of a novel triplex-PCR assay for the identification of feline hemoplasma species and survey of hemoplasma species in cats in Türkiye (2024)
Dental pulp mesenchymal stem cells-response to fibrin hydrogel reveals ITGA2 and MMPs expression (2024)
Who bit the boat? New DNA collection and genomic methods enable species identification in suspected shark-related incidents (2024)
Utilizing novel Escherichia coli-specific conserved signature proteins for enhanced monitoring of recreational water quality (2024)
Variation in the AA-NAT gene G203A is associated with Awassi and Hamdani sheep fertility (2024)
Mitogenomic recognition of incognito lineages in the mud spiny lobster Panulirus polyphagus (Herbst, 1793): A tale of unique genetic structuring and diversification (2024)
Genome assembly in the telomere-to-telomere era (2024)
Prolonged Heat Stress during Winter Diapause Alters the Expression of Stress-Response Genes in Ostrinia nubilalis (Hbn.) (2024)
Alfin-like (AL) transcription factor family in Oryza sativa L.: Genome-wide analysis and expression profiling under different stresses (2024)
Development of PCR-based assays for the detection of the evident and latent infection with Stilbocrea banihashemiana, the causal agent of fruit tree cankers (2024)
High-risk ARGs (HRA) Chip: A high-throughput qPCR-based array for assessment of high-risk ARGs from the environment (2024)
A novel Leifsonia xyli subsp. xyli quantitative LAMP-based diagnostic correlated with sugarcane ratoon stunting disease rating (2024)
The unknown unknown: A framework for assessing environmental DNA assay specificity against unsampled taxa (2024)
Genomic resources for the Yellowfin tuna Thunnus albacares (2024)
Transcriptome Profiling of a Soybean Mutant with Salt Tolerance Induced by Gamma-ray Irradiation (2024)
Effects of dried okra extract on lipid profile, renal function and some RAGE-related inflammatory genes expression in patients with diabetic nephropathy: A randomized controlled trial (2024)
Chromosome level assemblies of Nakaseomyces (Candida) bracarensis uncover two distinct clades and define its adhesin repertoire (2024)
The Statistics of Parametrized Syncmers in a Simple Mutation Process Without Spurious Matches (2024)
Replacement Nellore heifers receiving supplementation under different herbage allowance: effects on forage characteristics, performance, physiology, and reproduction (2024)
CDR identification, epitope mapping and binding affinity determination of novel monoclonal antibodies generated against human apolipoprotein B-100 (2024)
Brief guide to RT-qPCR (2024)
Long-read sequencing reveals the RNA isoform repertoire of neuropsychiatric risk genes in human brain (2024)
Early-Branching Cyanobacteria Grow Faster and Upregulate Superoxide Dismutase Activity Under a Simulated Early Earth Anoxic Atmosphere (2024)
N6-methyladenosine participates in mouse hippocampus neurodegeneration via PD-1/PD-L1 pathway (2023)
Tissue-specific transcriptional response of post-larval clownfish to ocean warming (2023)
6mA DNA Methylation on Genes in Plants Is Associated with Gene Complexity, Expression and Duplication (2023)
Exhaustive benchmarking of de novo assembly methods for eukaryotic genomes (2023, influential citation)
Density and Conservation Optimization of the Generalized Masked-Minimizer Sketching Scheme (2023)
Mesenchymal stem cells of Oravka chicken breed: promising path to biodiversity conservation (2023)
Primerdiffer: a python command-line module for large-scale primer design in haplotype genotyping (2023)
Population dynamics and insecticide resistance in Tuta absoluta (Lepidoptera: Gelechiidae), an invasive pest on tomato in Kenya (2023)
Creating and Using Minimizer Sketches in Computational Genomics (2023)
Suboptimal zinc supply affects the S-nitrosoglutathione reductase enzyme and nitric oxide signaling in Arabidopsis (2023)
Chronic stress decreases lactation performance (2023)
Transcriptome analysis reveals the mechanism underlying the tetracycline resistance of Lactiplantibacillus plantarum FZJTZ19M1 and FZJTZ29M8 (2023)
Antiemetic effects of sclareol, possibly through 5-HT3 and D2 receptor interaction pathways: In-vivo and in-silico studies (2023)
A novel de novo frameshift variant in the CHD2 gene related to intellectual and developmental disability, seizures and speech problems (2023)
rCRUX: A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R (2023)
To design, or not to design? Comparison of beetle ultraconserved element probe set utility based on phylogenetic distance, breadth, and method of probe design (2023)
Iron restriction increases the expression of a cytotoxic cysteine proteinase TvCP2 by a novel mechanism of tvcp2 mRNA alternative polyadenylation in Trichomonas vaginalis (2023)
Diverse methods of reducing and confirming false-positive results of loop-mediated isothermal amplification assays: A review (2023)
Detection of Tirmania pinoyi in Roots of Inoculated Cistaceae Plant Species by Nested Polymerase Chain Reaction (2023)
Validating duplex-PCR targeting ND2 for bovine and porcine detection in meat products (2023)
Design of a data set of qPCR primers for the early region of Human Papillomavirus oncogenic types 16 and 18 (2023)
Efficient large-scale screening of viral pathogens by fragment length identification of pooled nucleic acid samples (FLIPNAS) (2023)
Parasite manipulation of host phenotypes inferred from transcriptional analyses in a trematode-amphipod system (2023)
Performance of intron 7 of the β-fibrinogen gene for phylogenetic analysis: An example using gladiator frogs, Boana Gray, 1825 (Anura, Hylidae, Cophomantinae) (2023)
Genome assembly in the telomere-to-telomere era (2023)
A Review on Machine-Learning and Nature-Inspired Algorithms for Genome Assembly (2023)
Sympatric and allopatric Alcolapia soda lake cichlid species show similar levels of assortative mating (2023)
Mitochondrial Genome Sequence of Salvia officinalis (Lamiales: Lamiaceae) Suggests Diverse Genome Structures in Cogeneric Species and Finds the Stop Gain of Genes through RNA Editing Events (2023)
Positive effect of miR-2392 on fibroblast to cardiomyocyte-like cell fate transition: an in silico and in vitro study (2023)
Microspore embryogenesis induction by mannitol and TSA results in a complex regulation of epigenetic dynamics and gene expression in bread wheat (2023)
Molecular and phenotypical findings of a novel de novo SYNGAP1 gene variant in an 11-year-old Iranian boy with intellectual disability (2023)
Understanding of molecular basis of histological graded horn cancer by transcriptome profiling (2023)
Bringing up to date the toolkit for the catabolism of aromatic compounds in fungi: The unexpected 1,2,3,5-tetrahydroxybenzene central pathway (2023)