TY  - JOUR
T1  - Tigmint: Correcting Assembly Errors Using Linked Reads From Large Molecules
JF  - bioRxiv
DO  - 10.1101/304253
SP  - 304253
AU  - Shaun D Jackman
AU  - (0000-0002-9275-5966)
AU  - Lauren Coombe
AU  - Justin Chu
AU  - Rene L Warren
AU  - Benjamin P Vandervalk
AU  - Sarah Yeo
AU  - Zhuyi Xue
AU  - Hamid Mohamadi
AU  - Joerg Bohlmann
AU  - Steven JM Jones
AU  - Inanc Birol
AU  - (0000-0003-0950-7839)
Y1  - 2018/01/01
UR  - http://biorxiv.org/content/early/2018/04/20/304253.abstract
N2  - Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity, so assembly errors are common. These misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembly. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet exists that makes use of the long-distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint for this purpose. To demonstrate the effectiveness of Tigmint, we corrected assemblies of a human genome using short reads assembled with ABySS 2.0 and other assemblers. Tigmint reduced the number of misassemblies identified by QUAST in the ABySS assembly by 216 (27%). While scaffolding with ARCS alone more than doubled the scaffold NGA50 of the assembly from 3 to 8 Mbp, the combination of Tigmint and ARCS improved the scaffold NGA50 of the assembly over five-fold to 16.4 Mbp. This notable improvement in contiguity highlights the utility of assembly correction in refining assemblies. We demonstrate its usefulness in correcting the assemblies of multiple tools, as well as in using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing. The source code of Tigmint is available for download from https://github.com/bcgsc/tigmint, and is distributed under the GNU GPL v3.0 license.
ER  - 