RT Journal Article SR Electronic T1 Darwin: A Hardware-acceleration Framework for Genomic Sequence Alignment JF bioRxiv FD Cold Spring Harbor Laboratory SP 092171 DO 10.1101/092171 A1 Yatish Turakhia A1 Kevin Jie Zheng A1 Gill Bejerano A1 William J. Dally YR 2017 UL http://biorxiv.org/content/early/2017/01/24/092171.abstract AB Genomics is set to transform medicine and our understanding of life in fundamental ways. But the growth in genomics data has been overwhelming - far outpacing Moore’s Law. The advent of third generation sequencing technologies is providing new insights into genomic contribution to diseases with complex mutation events, but have prohibitively high computational costs. Over 1,300 CPU hours are required to align reads from a 54× coverage of the human genome to a reference (estimated using [1]), and over 15,600 CPU hours to assemble the reads de novo [2]. This paper proposes “Darwin” - a hardware-accelerated framework for genomic sequence alignment that, without sacrificing sensitivity, provides 125× and 15.6× speedup over the state-of-the-art software counterparts for reference-guided and de novo assembly of third generation sequencing reads, respectively. For pairwise alignment of sequences, Darwin is over 39,000× more energy-efficient than software. Darwin uses (i) a novel filtration strategy, called D-SOFT, to reduce the search space for sequence alignment at high speed, and (ii) a hardware-accelerated version of GACT, a novel algorithm to generate near-optimal alignments of arbitrarily long genomic sequences using constant memory for trace-back. Darwin is adaptable, with tunable speed and sensitivity to match emerging sequencing technologies and to meet the requirements of genomic applications beyond read assembly.