Abstract
Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, a fast and accurate genotyper that models SVs using sequence graphs and SV annotations produced by a range of methods and technologies. We demonstrate the accuracy of Paragraph on whole-genome sequence data from a control sample for which both short-read and long-read sequencing data are available, and then apply it at scale to a cohort of 100 samples of diverse ancestry sequenced with short reads. Comparative analyses indicate that Paragraph is more accurate than other existing genotypers. The Paragraph software is open-source and available at https://github.com/Illumina/paragraph
Footnotes
schen6{at}illumina.com, pkrusche{at}gmail.com, edolzhenko{at}illumina.com, rsherman{at}jhu.edu, RPetrovski{at}illumina.com, fschlesinger{at}illumina.com, mkirsche{at}jhu.edu, DBentley{at}illumina.com, mschatz{at}cs.jhu.edu, fritz.sedlazeck{at}bcm.edu, meberle{at}illumina.com
List of abbreviations
- SV: structural variation
- bp: base pair
- TR: tandem repeat