PT - JOURNAL ARTICLE
AU - Keith Mitchell
AU - Igor Mandric
AU - Jaqueline Brito
AU - Qiaozhen Wu
AU - Sergey Knyazev
AU - Sei Chang
AU - Lana S. Martin
AU - Aaron Karlsberg
AU - Ekaterina Gerasimov
AU - Russell Littman
AU - Brian L. Hill
AU - Nicholas C. Wu
AU - Harry Yang
AU - Kevin Hsieh
AU - Linus Chen
AU - Taylor Shabani
AU - German Shabanets
AU - Douglas Yao
AU - Ren Sun
AU - Jan Schroeder
AU - Eleazar Eskin
AU - Alex Zelikovsky
AU - Pavel Skums
AU - Mihai Pop
AU - Serghei Mangul
TI - Benchmarking of computational error-correction methods for next-generation sequencing data
AID - 10.1101/642843
DP - 2019 Jan 01
TA - bioRxiv
PG - 642843
4099 - http://biorxiv.org/content/early/2019/12/18/642843.short
4100 - http://biorxiv.org/content/early/2019/12/18/642843.full
AB - Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error-correction algorithms remains unknown. In this paper, we evaluate the ability of error-correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error-correction techniques across different domains of biology, including immunogenomics and virology. To obtain error-free data for benchmarking, we applied the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then performed a realistic evaluation of error-correction methods. In terms of accuracy, we found that method performance varies substantially across different types of datasets, with no single method performing best on all types of examined data. Finally, we identified techniques that offer a good balance between precision and sensitivity.