RT Journal Article
SR Electronic
T1 A robust benchmark for germline structural variant detection
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 664623
DO 10.1101/664623
A1 Justin M. Zook
A1 Nancy F. Hansen
A1 Nathan D. Olson
A1 Lesley M. Chapman
A1 James C. Mullikin
A1 Chunlin Xiao
A1 Stephen Sherry
A1 Sergey Koren
A1 Adam M. Phillippy
A1 Paul C. Boutros
A1 Sayed Mohammad E. Sahraeian
A1 Vincent Huang
A1 Alexandre Rouette
A1 Noah Alexander
A1 Christopher E. Mason
A1 Iman Hajirasouliha
A1 Camir Ricketts
A1 Joyce Lee
A1 Rick Tearle
A1 Ian T. Fiddes
A1 Alvaro Martinez Barrio
A1 Jeremiah Wala
A1 Andrew Carroll
A1 Noushin Ghaffari
A1 Oscar L. Rodriguez
A1 Ali Bashir
A1 Shaun Jackman
A1 John J Farrell
A1 Aaron M Wenger
A1 Can Alkan
A1 Arda Soylev
A1 Michael C. Schatz
A1 Shilpa Garg
A1 George Church
A1 Tobias Marschall
A1 Ken Chen
A1 Xian Fan
A1 Adam C. English
A1 Jeffrey A. Rosenfeld
A1 Weichen Zhou
A1 Ryan E. Mills
A1 Jay M. Sage
A1 Jennifer R. Davis
A1 Michael D. Kaiser
A1 John S. Oliver
A1 Anthony P. Catalano
A1 Mark JP Chaisson
A1 Noah Spies
A1 Fritz J. Sedlazeck
A1 Marc Salit
A1 the Genome in a Bottle Consortium
YR 2019
UL http://biorxiv.org/content/early/2019/07/18/664623.abstract
AB New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs.
To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp) discovered by at least 2 technologies or 5 callsets, genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.66 Gbp and 9641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, there was strong support from multiple technologies for the benchmark SVs, with 90% of the Tier 1 SVs having support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.