RT Journal Article
SR Electronic
T1 Reconstructing the Gigabase Plant Genome of Solanum pennellii using Nanopore Sequencing
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 129148
DO 10.1101/129148
A1 Maximilian H.-W. Schmidt
A1 Alexander Vogel
A1 Alisandra K. Denton
A1 Benjamin Istace
A1 Alexandra Wormit
A1 Henri van de Geest
A1 Marie E. Bolger
A1 Saleh Alseekh
A1 Janina Maß
A1 Christian Pfaff
A1 Ulrich Schurr
A1 Roger Chetelat
A1 Florian Maumus
A1 Jean-Marc Aury
A1 Alisdair R. Fernie
A1 Dani Zamir
A1 Anthony M. Bolger
A1 Björn Usadel
YR 2017
UL http://biorxiv.org/content/early/2017/04/21/129148.abstract
AB Recent updates in sequencing technology have made it possible to obtain gigabases of sequence data from a single flowcell. Prior to this update, nanopore sequencing technology was mainly used to analyze and assemble microbial samples [1-3]. Here, we describe the generation of a comprehensive nanopore sequencing dataset with a median fragment size of 11,979 bp for the wild tomato species Solanum pennellii, which has an estimated genome size of ca. 1.0 to 1.1 Gbases. We describe its genome assembly to a contig N50 of 2.5 Mb using a pipeline comprising Canu [4] pre-processing and a subsequent assembly using SMARTdenovo. We show that the obtained nanopore-based de novo genome reconstruction is structurally highly similar to that of the reference S. pennellii LA716 genome [5] but has a high error rate caused mostly by deletions in homopolymers. After polishing the assembly with Illumina short read data, we obtained an error rate of <0.02% when assessed versus the same Illumina data. More importantly, however, we obtained a gene completeness of 96.53%, which even slightly surpasses that of the reference S. pennellii genome [5]. Taken together, our data indicate that such long read sequencing data can be used to affordably sequence and assemble Gbase-sized diploid plant genomes. Raw data is available at http://www.plabipd.de/portal/solanum-pennellii and has been deposited as PRJEB19787.