TY  - JOUR
T1  - Haplotype-aware variant calling enables high accuracy in nanopore long-reads using deep neural networks
JF  - bioRxiv
DO  - 10.1101/2021.03.04.433952
SP  - 2021.03.04.433952
AU  - Kishwar Shafin
AU  - Trevor Pesout
AU  - Pi-Chuan Chang
AU  - Maria Nattestad
AU  - Alexey Kolesnikov
AU  - Sidharth Goel
AU  - Gunjan Baid
AU  - Jordan M. Eizenga
AU  - Karen H. Miga
AU  - Paolo Carnevali
AU  - Miten Jain
AU  - Andrew Carroll
AU  - Benedict Paten
Y1  - 2021/01/01
UR  - http://biorxiv.org/content/early/2021/03/05/2021.03.04.433952.abstract
N2  - Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data has demonstrated a long read length, but current interpretation methods for its novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single nucleotide variant identification method at the whole-genome scale and produces high-quality single nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with performance superior to the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio-HiFi-polished). Competing Interest Statement: The authors have declared no competing interest.
ER  - 