TY - JOUR
T1 - A comprehensive evaluation of long read error correction methods
JF - bioRxiv
DO - 10.1101/519330
SP - 519330
AU - Haowen Zhang
AU - Chirag Jain
AU - Srinivas Aluru
Y1 - 2019/01/01
UR - http://biorxiv.org/content/early/2019/01/13/519330.abstract
N2 - Motivation: Third-generation sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies that assess these tools use simulated data sets and are not sufficiently comprehensive in the range of software covered or the diversity of evaluation measures used. Results: In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment that includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and how error correction performance affects an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research. Availability: The source code is available at https://github.com/haowenz/LRECE. Contact: aluru{at}cc.gatech.edu
ER -