PT - JOURNAL ARTICLE
AU - Avantika Lal
AU - Michael Brown
AU - Rahul Mohan
AU - Joyjit Daw
AU - James Drake
AU - Johnny Israeli
TI - Improving long-read consensus sequencing accuracy with deep learning
AID - 10.1101/2021.06.28.450238
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.06.28.450238
4099 - http://biorxiv.org/content/early/2021/06/30/2021.06.28.450238.short
4100 - http://biorxiv.org/content/early/2021/06/30/2021.06.28.450238.full
AB - The PacBio HiFi sequencing technology combines less accurate, multi-read passes from the same molecule (subreads) to yield consensus sequencing reads that are both long (averaging 10-25 kb) and highly accurate. However, these reads can retain residual sequencing error, predominantly insertions or deletions at homopolymeric regions. Here, we train deep learning models to polish HiFi reads by recognizing and correcting sequencing errors. We show that our models are effective at reducing these errors by 25-40% in HiFi reads from human as well as E. coli genomes. Competing Interest Statement: Avantika Lal, Joyjit Daw, and Johnny Israeli are employees of NVIDIA Corporation. Michael Brown and James Drake are employees of Pacific Biosciences.