Abstract
Despite the increasing relevance of structural variants (SVs) in the development of many human diseases, progress in the discovery of novel pathological SVs remains impeded, partly because accurate and routine SV characterization in patients is challenging. The recent advent of third-generation sequencing (3GS) technologies promises better characterization of genomic aberrations by virtue of longer reads. However, the applications of 3GS are restricted by high sequencing error rates and low sequencing throughput. To overcome these limitations, we present NanoVar, an accurate, rapid and low-depth (4X) 3GS SV caller utilizing long reads generated by Oxford Nanopore Technologies. NanoVar employs split reads and hard-clipped reads for SV detection and utilizes a neural network classifier for true SV enrichment. In simulated data, NanoVar demonstrated the highest SV detection accuracy (F1 score = 0.91) among long-read SV callers using 12 gigabases (4X) of sequencing data. In patient samples, besides detecting genomic aberrations, NanoVar also uncovered many normal alternative sequences or alleles that were present in healthy individuals. The low sequencing depth requirement of NanoVar enables the use of Nanopore sequencing for accurate SV characterization at a lower sequencing cost, an approach compatible with clinical studies and large-scale SV-association research.
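
The two figures quoted above, 12 gigabases at 4X depth and an F1 score of 0.91, rest on simple arithmetic that can be sketched as follows. This is an illustrative sketch only and not part of NanoVar: the function names are hypothetical, the haploid human genome size of about 3.1 gigabases is an assumption, and the true-positive, false-positive and false-negative counts are made-up values chosen to show how an F1 of 0.91 can arise.

```python
# Illustrative arithmetic only; names and inputs are assumptions, not NanoVar code.

HUMAN_GENOME_BASES = 3.1e9  # assumed approximate haploid human genome size


def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for a set of SV calls."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 12 gigabases over a ~3.1 Gb genome gives a depth close to 4X.
depth = sequencing_depth(12e9, HUMAN_GENOME_BASES)

# Hypothetical counts: 91 correct calls, 9 false calls, 9 missed SVs.
example_f1 = f1_score(tp=91, fp=9, fn=9)

print(f"depth = {depth:.1f}X, F1 = {example_f1:.2f}")
```

Because F1 combines precision and recall, a caller can only score highly if it both avoids false calls and recovers most true SVs, which is why it is a natural single summary for comparing callers at a fixed depth.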