PT - JOURNAL ARTICLE
AU - Peter Edge
AU - Vikas Bansal
TI - Longshot: accurate variant calling in diploid genomes using single-molecule long read sequencing
AID - 10.1101/564443
DP - 2019 Jan 01
TA - bioRxiv
PG - 564443
4099 - http://biorxiv.org/content/early/2019/03/01/564443.short
4100 - http://biorxiv.org/content/early/2019/03/01/564443.full
AB - Short-read sequencing technologies such as Illumina enable the accurate detection of single nucleotide variants (SNVs) and short insertion/deletion variants in human genomes but are unable to provide information about haplotypes and variants in repetitive regions of the genome. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads (≥10 kb in length) that can potentially address these limitations of short reads. However, the high error rate of SMS reads makes it challenging to detect small-scale variants in diploid genomes. We introduce a variant calling method, Longshot, that leverages the haplotype information present in SMS reads to enable the accurate detection and phasing of single nucleotide variants in diploid genomes. Using whole-genome Pacific Biosciences data for multiple human individuals, we demonstrate that Longshot achieves very high accuracy for SNV detection (precision ≥0.992 and recall ≥0.96) that is significantly better than that of existing variant calling methods. Longshot can also call SNVs with good accuracy using whole-genome Oxford Nanopore data. Finally, we demonstrate that it enables the discovery of variants in duplicated regions of the genome that cannot be mapped using short reads. Longshot is freely available at https://github.com/pjedge/longshot.