Darwin: A Hardware-acceleration Framework for Genomic Sequence Alignment

Yatish Turakhia, Kevin Zheng, G. Bejerano, W. Dally

Published 2017 in bioRxiv

ABSTRACT

Genomics is set to transform medicine and our understanding of life in fundamental ways. But the growth in genomics data has been overwhelming, far outpacing Moore’s Law. The advent of third-generation sequencing technologies is providing new insights into the genomic contribution to diseases with complex mutation events, but these technologies have prohibitively high computational costs. Over 1,300 CPU hours are required to align reads from a 54× coverage of the human genome to a reference (estimated using [1]), and over 15,600 CPU hours to assemble the reads de novo [2]. This paper proposes “Darwin”, a hardware-accelerated framework for genomic sequence alignment that, without sacrificing sensitivity, provides 125× and 15.6× speedups over the state-of-the-art software counterparts for reference-guided and de novo assembly of third-generation sequencing reads, respectively. For pairwise alignment of sequences, Darwin is over 39,000× more energy-efficient than software. Darwin uses (i) a novel filtration strategy, called D-SOFT, to reduce the search space for sequence alignment at high speed, and (ii) a hardware-accelerated version of GACT, a novel algorithm to generate near-optimal alignments of arbitrarily long genomic sequences using constant memory for trace-back. Darwin is adaptable, with tunable speed and sensitivity to match emerging sequencing technologies and to meet the requirements of genomic applications beyond read assembly.
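
The abstract describes GACT only as producing near-optimal alignments of arbitrarily long sequences with constant memory for trace-back. One way to picture how that can work is tiling: align a small fixed-size window, commit part of its trace-back path, then move the window to where that path stopped. The Python sketch below illustrates that idea only. The function names (`align_tile`, `tiled_align`), the `tile` and `overlap` parameters, the linear gap scoring, and the choice to start at the ends of both sequences are all assumptions made for this illustration; it should not be read as the authors' algorithm or hardware design.

```python
def align_tile(a, b, limit, match=1, mismatch=-1, gap=-1):
    """Align one tile and trace back from its bottom-right cell.

    Returns (ops, used_a, used_b). The ops run from the tile's end toward
    its start, and the trace-back stops at a tile edge or once `limit`
    bases of either sequence have been consumed.
    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at 0, so the path may enter the tile
    # anywhere along its top or left edge without penalty.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    ops, i, j = [], n, m
    while i > 0 and j > 0 and n - i < limit and m - j < limit:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            ops.append("M")  # match or mismatch: consumes a base of each
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            ops.append("D")  # base of `a` aligned to a gap
            i -= 1
        else:
            ops.append("I")  # base of `b` aligned to a gap
            j -= 1
    return ops, n - i, m - j


def tiled_align(a, b, tile=8, overlap=2):
    """Align a and b end to end, one tile at a time.

    Only one (tile + 1) x (tile + 1) score matrix exists at any moment,
    so trace-back memory does not grow with sequence length.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    i, j = len(a), len(b)
    ops = []
    while i > 0 and j > 0:
        ti, tj = max(0, i - tile), max(0, j - tile)
        tile_ops, used_a, used_b = align_tile(a[ti:i], b[tj:j], tile - overlap)
        ops.extend(tile_ops)
        # The next tile ends where this tile's committed path stopped.
        i -= used_a
        j -= used_b
    # Whatever remains at the start of one sequence is aligned to gaps.
    ops.extend("D" * i)
    ops.extend("I" * j)
    return "".join(reversed(ops))


if __name__ == "__main__":
    print(tiled_align("ACGTTGCAAGCTTAGC", "ACGTTGCAGCTTAGC"))
```

In this sketch, committing at most `tile - overlap` bases per tile leaves the region near each tile's entry edge to be recomputed by the next tile, which is the intuition for why a tiled path can stay close to the one a full-matrix alignment would find. Larger tiles cost more memory per step but make fewer local decisions.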

PUBLICATION RECORD

Publication year: 2017
Venue: bioRxiv
Publication date: 2017-01-24
Fields of study: Biology, Computer Science
Identifiers: DOI 10.1101/092171
Source metadata: Semantic Scholar

REFERENCES

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation
2016, cited by this paper
Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications
2016, cited by this paper
Graphicionado: A high-performance and energy-efficient accelerator for graph analytics
2016, cited by this paper
Fast and sensitive mapping of nanopore sequencing reads with GraphMap
2016, influential reference
Long-read sequence assembly of the gorilla genome
2016, cited by this paper
One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly.
2015, cited by this paper
A Novel High-Throughput Acceleration Engine for Read Alignment
2015, cited by this paper
Big Data: Astronomical or Genomical?
2015, cited by this paper
Optimal seed solver: optimizing seed selection in read mapping
2015, cited by this paper
MinION Analysis and Reference Consortium: Phase 1 data release and analysis
2015, cited by this paper
Excess of rare, inherited truncating mutations in autism
2015, cited by this paper
Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome
2015, cited by this paper
Assembling large genomes with single-molecule sequencing and locality-sensitive hashing
2014, cited by this paper
Efficient Local Alignment Discovery amongst Noisy Long Reads
2014, influential reference
Accelerating the Next Generation Long Read Mapping with the FPGA-Based System
2014, influential reference
From FastQ Data to High‐Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline
2013, cited by this paper
The hallmarks of aging.
2013, cited by this paper
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
2013, influential reference
PBSIM: PacBio reads simulator - toward accurate genome assembly
2013, cited by this paper
Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory
2012, cited by this paper
Hardware Acceleration of Short Read Mapping
2012, cited by this paper
Accelerating Millions of Short Reads Mapping on a Heterogeneous Architecture with FPGA Accelerator
2012, cited by this paper
Oxford Nanopore announcement sets sequencing sector abuzz
2012, cited by this paper
Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph.
2012, cited by this paper
Prediction of eye and skin color in diverse populations using seven SNPs.
2011, cited by this paper
A window into third-generation sequencing.
2010, cited by this paper
A comprehensive catalogue of somatic mutations from a human cancer genome
2010, cited by this paper
Fast and accurate long-read alignment with Burrows–Wheeler transform
2010, influential reference
Real-time DNA sequencing from single polymerase molecules.
2010, cited by this paper
A parallel FPGA design of the Smith-Waterman traceback
2010, influential reference
Synthesis of a Parallel Smith-Waterman Sequence Alignment Kernel into FPGA Hardware
2009, cited by this paper
SOAP2: an improved ultrafast tool for short read alignment
2009, influential reference
Compiler generated systolic arrays for wavefront algorithm acceleration on FPGAs
2008, cited by this paper
SeqAn An efficient, generic C++ library for sequence analysis
2008, cited by this paper
Mapping short DNA sequencing reads and calling variants using mapping quality scores.
2008, cited by this paper
Compressed indexing and local alignment of DNA
2008, cited by this paper
Improved pairwise alignment of genomic DNA
2007, cited by this paper
A Reconfigurable Accelerator for Smith–Waterman Algorithm
2007, cited by this paper
A Banded Smith-Waterman FPGA Accelerator for Mercury BLASTP
2007, cited by this paper
Scalable hardware accelerator for comparing DNA and protein sequences
2006, cited by this paper
Vector seeds: An extension to spaced seeds
2005, cited by this paper
Initial sequence of the chimpanzee genome and comparison with the human genome
2005, cited by this paper
YASS: enhancing the sensitivity of DNA similarity search
2005, cited by this paper
Reducing storage requirements for biological sequence comparison
2004, cited by this paper
gprof: a call graph execution profiler
2004, cited by this paper
On spaced seeds for similarity search
2004, cited by this paper
A Smith-Waterman Systolic Cell
2003, cited by this paper
BLAT--the BLAST-like alignment tool.
2002, influential reference
The path to personalized medicine.
2002, cited by this paper
A guided tour to approximate string matching
2001, cited by this paper
CACTI 3.0: an integrated cache timing, power, and area model
2001, cited by this paper
Initial sequencing and analysis of the human genome
2001, cited by this paper
Efficient large-scale sequence comparison by locality-sensitive hashing
2001, cited by this paper
High Speed Homology Search with FPGAs
2001, cited by this paper
A Greedy Algorithm for Aligning DNA Sequences
2000, influential reference
[Book review] "Algorithms on Strings, Trees, and Sequences"
2000, cited by this paper
A fast bit-vector algorithm for approximate string matching based on dynamic programming
1998, influential reference
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
1997, influential reference
Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology
1997, cited by this paper
Aligning two sequences within a specified diagonal band
1992, cited by this paper
Basic local alignment search tool. Journal of Molecular Biology
1990, cited by this paper
Basic local alignment search tool.
1990, influential reference
Sequence the Human Genome
1986, cited by this paper
An improved algorithm for matching biological sequences.
1982, cited by this paper
Identification of common molecular subsequences.
1981, cited by this paper
A linear space algorithm for computing maximal common subsequences
1975, cited by this paper
Supporting Online Material for: Worldwide Human Relationships Inferred from Genome-wide Patterns of Variation
year unknown, cited by this paper

CITED BY

SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework
2026, cites this paper
AGNES: Adaptive Graph Neural Network and Dynamic Programming Hybrid Framework for Real-Time Nanopore Seed Chaining
2025, cites this paper
Bancroft: Genomics Acceleration Beyond On-Device Memory
2025, cites this paper
Parallelization of the Banded Needleman & Wunsch Algorithm on UPMEM PiM Architecture for Long DNA Sequence Alignment
2024, cites this paper
Efficient Memory Layout for Pre-Alignment Filtering of Long DNA Reads Using Racetrack Memory
2024, cites this paper
WFA-FPGA: An efficient accelerator of the wavefront algorithm for short and long read genomics alignment
2023, cites this paper
ALPHA: A Novel Algorithm-Hardware Co-Design for Accelerating DNA Seed Location Filtering
2022, cites this paper
Efficient Memory Partitioning in Software Defined Hardware
2022, cites this paper
DNA Pre-Alignment Filter Using Processing Near Racetrack Memory
2022, cites this paper
An FPGA Accelerator of the Wavefront Algorithm for Genomics Pairwise Alignment
2021, cites this paper
Hardware acceleration of genomics data analysis: challenges and opportunities
2021, cites this paper
Metagenomic Analysis: A Pathway Toward Efficiency Using High-Performance Computing
2021, cites this paper
Recut: a Concurrent Framework for Sparse Reconstruction of Neuronal Morphology
2021, cites this paper
OrderLight: Lightweight Memory-Ordering Primitive for Efficient Fine-Grained PIM Computations
2021, cites this paper
Seed-and-Vote based In-Memory Accelerator for DNA Read Mapping
2020, cites this paper
High scoring segment selection for pairwise whole genome sequence alignment with the maximum scoring subsequence and GPUs
2020, cites this paper
Hardware Accelerators for Genomic Data Processing
2019, cites this paper
FPGA Accelerated INDEL Realignment in the Cloud
2019, cites this paper
Genome-wide effects of social status on DNA methylation in the brain of a cichlid fish, Astatotilapia burtoni
2019, cites this paper
GenCache: Leveraging In-Cache Operators for Efficient Sequence Alignment
2019, influential citation
MESGA: An MPSoC based embedded system solution for short read genome alignment
2018, cites this paper
Adaptively Banded Smith-Waterman Algorithm for Long Reads and Its Hardware Accelerator
2018, influential citation
Extreme Datacenter Specialization for Planet-Scale Computing: ASIC Clouds
2018, cites this paper
Azure Accelerated Networking: SmartNICs in the Public Cloud
2018, cites this paper
Scalable Systems and Algorithms for Genomic Variant Analysis
2017, cites this paper