Chromosome-scale shotgun assembly using an in vitro method for long-range linkage

Richard E. Green (1, 2)

  1. Dovetail Genomics LLC, Santa Cruz, California 95060, USA
  2. Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
  3. UC Santa Cruz Genomics Institute and Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
  4. Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  5. Department of Energy, Joint Genome Institute, Walnut Creek, California 94598, USA

Corresponding author: ed{at}soe.ucsc.edu

6. These authors contributed equally to this work.

Abstract

Long-range and highly accurate de novo assembly from short-read data is one of the most pressing challenges in genomics. Recently, it has been shown that read pairs generated by proximity ligation of DNA in chromatin of living tissue can address this problem, dramatically increasing the scaffold contiguity of assemblies. Here, we describe a simpler approach (“Chicago”) based on in vitro reconstituted chromatin. We generated two Chicago data sets with human DNA and developed a statistical model and a new software pipeline (“HiRise”) that can identify poor quality joins and produce accurate, long-range sequence scaffolds. We used these to construct a highly accurate de novo assembly and scaffolding of a human genome with scaffold N50 of 20 Mbp. We also demonstrated the utility of Chicago for improving existing assemblies by reassembling and scaffolding the genome of the American alligator. With a single library and one lane of Illumina HiSeq sequencing, we increased the scaffold N50 of the American alligator from 508 kbp to 10 Mbp.
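The contiguity figures quoted above are scaffold N50 values. For readers unfamiliar with the metric, the following is a minimal sketch of the standard N50 definition: the largest length L such that scaffolds of length L or more together cover at least half of the total assembly. The function name `n50` and the toy lengths are illustrative assumptions; this is not code from the HiRise pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total,
    taken from longest to shortest, first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding on total / 2.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, arbitrary units): total is 32,
# and the two longest scaffolds (8 + 8 = 16) already cover half of it.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, it measures contiguity and not correctness, which is why the abstract reports accuracy separately from scaffold N50.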

Footnotes

  • [Supplemental material is available for this article.]

  • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.193474.115.

  • Freely available online through the Genome Research Open Access option.

  • Received April 23, 2015.
  • Accepted December 21, 2015.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
