TY  - JOUR
T1  - W2RAP: a pipeline for high quality, robust assemblies of large complex genomes from short read data
JF  - bioRxiv
DO  - 10.1101/110999
SP  - 110999
AU  - Bernardo J. Clavijo
AU  - Gonzalo Garcia Accinelli
AU  - Jonathan Wright
AU  - Darren Heavens
AU  - Katie Barr
AU  - Luis Yanes
AU  - Federica Di-Palma
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/02/22/110999.abstract
N2  - Producing high-quality whole-genome shotgun de novo assemblies from plant and animal species with large and complex genomes using low-cost short read sequencing technologies remains a challenge. However, when the right sequencing data, with appropriate quality control, is assembled using approaches focused on robustness of the process rather than maximization of a single metric such as the usual contiguity estimators, good quality assemblies with informative value for comparative analyses can be produced. Here we present a complete method, described from data generation and QC all the way through scaffolding of complex genomes using Illumina short reads, and its application to plant and human datasets. We show how to use the w2rap pipeline following a metric-guided approach to produce cost-effective assemblies. The assemblies are highly accurate, provide good coverage of the genome and show good short range contiguity. Our pipeline has already enabled the rapid, cost-effective generation of de novo genome assemblies from large, polyploid crop species with a focus on comparative genomics. Availability: w2rap is available under the MIT license, with some subcomponents under GPL licenses. A ready-to-run Docker image with all software prerequisites and example data is also available. http://github.com/bioinfologics/w2rap http://github.com/bioinfologics/w2rap-contigger
ER  - 