Abstract
Nanopore sequencing has introduced the ability to sequence long stretches of DNA, enabling the resolution of repeating segments, or paired SNPs across long stretches of DNA. Unfortunately significant error rates >15%, introduced through systematic and random noise inhibit downstream analysis. We propose a novel method, using unsupervised learning, to correct biologically amplified reads before downstream analysis proceeds. We also demonstrate that our method has performance comparable to existing techniques without limiting the detection of repeats, or the length of the input sequence.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.