A robust benchmark for germline structural variant detection

Justin M. Zook, Nancy F. Hansen, Nathan D. Olson, Lesley M. Chapman, James C. Mullikin, Chunlin Xiao, Stephen Sherry, Sergey Koren, Adam M. Phillippy, Paul C. Boutros, Sayed Mohammad E. Sahraeian, Vincent Huang, Alexandre Rouette, Noah Alexander, Christopher E. Mason, Iman Hajirasouliha, Camir Ricketts, Joyce Lee, Rick Tearle, Ian T. Fiddes, Alvaro Martinez Barrio, Jeremiah Wala, Andrew Carroll, Noushin Ghaffari, Oscar L. Rodriguez, Ali Bashir, Shaun Jackman, John J Farrell, Aaron M Wenger, Can Alkan, Arda Soylev, Michael C. Schatz, Shilpa Garg, George Church, Tobias Marschall, Ken Chen, Xian Fan, Adam C. English, Jeffrey A. Rosenfeld, Weichen Zhou, Ryan E. Mills, Jay M. Sage, Jennifer R. Davis, Michael D. Kaiser, John S. Oliver, Anthony P. Catalano, Mark JP Chaisson, Noah Spies, Fritz J. Sedlazeck, Marc Salit, the Genome in a Bottle Consortium
doi: https://doi.org/10.1101/664623
Affiliations
1 Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899 (Justin M. Zook, Nathan D. Olson, Lesley M. Chapman)
2 National Human Genome Research Institute, National Institutes of Health, 5625 Fishers Lane, Rockville, MD 20852 (Nancy F. Hansen, James C. Mullikin, Sergey Koren, Adam M. Phillippy)
3 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD, 20894 (Chunlin Xiao, Stephen Sherry)
4 Department of Human Genetics, University of California, Los Angeles (Paul C. Boutros)
5 Roche Sequencing Solutions, Belmont, CA, 94002, USA (Sayed Mohammad E. Sahraeian)
6 Ontario Institute for Cancer Research, 661 University Ave, Suite 510, Toronto, ON M5G 0A3 (Vincent Huang)
7 Charles-Bruneau Cancer Centre, Division of Hematology-oncology, CHU Sainte-Justine, Montreal, Canada (Alexandre Rouette)
8 Molecular Biology Institute, University of California, Los Angeles (Noah Alexander)
9 Weill Cornell Medicine, 1300 York Ave., New York, NY 10065 (Christopher E. Mason, Iman Hajirasouliha, Camir Ricketts)
10 Bionano Genomics, Inc. 9540 Towne Centre Drive, Ste. 100, San Diego, CA 92121 (Joyce Lee)
11 Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy SA 5371, Australia (Rick Tearle)
12 10x Genomics, Pleasanton, California 94566, USA (Ian T. Fiddes, Alvaro Martinez Barrio)
13 Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA 02142 (Jeremiah Wala)
14 Google, 1600 Amphitheater Pkwy, Mountain View, CA 94040 (Andrew Carroll)
15 Genomics and Bioinformatics, Texas A&M AgriLife Research, Texas A&M University, 1500 Research Parkway, Suite 250B, College Station, TX 77845; TAMU HPRC, Texas A&M University, Mail Stop 3361, College Station, TX 77843-3361 (Noushin Ghaffari)
16 Department of Genetics and Data Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029-5674 (Oscar L. Rodriguez, Ali Bashir)
17 BC Cancer Genome Sciences Centre, 100-570 W 7th Ave, Vancouver, BC, V5Z 4S6, Canada (Shaun Jackman)
18 Biomedical Genetics, Dept of Medicine, Boston University Medical School, 72 East Concord Street, Boston MA 02118 (John J Farrell)
19 Pacific Biosciences, Menlo Park, CA 94025, USA (Aaron M Wenger)
20 Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey (Can Alkan)
21 Department of Computer Engineering, Konya Food and Agriculture University, Konya 42080, Turkey (Arda Soylev)
22 Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, 21218 (Michael C. Schatz)
23 Department of Genetics, Harvard Medical School, Boston, MA (Shilpa Garg, George Church)
24 Saarland University and Max Planck Institute for Informatics, Saarland Informatics Campus E2.1, 66123 Saarbrücken, Germany (Tobias Marschall)
25 Department of Bioinformatics and Computational Biology, MD Anderson Cancer Center, Houston, TX, 77030 (Ken Chen)
26 Department of Computer Science, Rice University, Houston, TX, 77005 (Xian Fan)
27 Bioinformatics R&D, Spiral Genetics, Seattle WA 98104 (Adam C. English)
28 Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA; Department of Pathology, Robert Wood Johnson Medical School, New Brunswick, NJ, USA (Jeffrey A. Rosenfeld)
29 Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA (Weichen Zhou, Ryan E. Mills)
30 Nabsys 2.0, LLC, 60 Clifford St, Providence, RI 02903 (Jay M. Sage, Jennifer R. Davis, Michael D. Kaiser, John S. Oliver, Anthony P. Catalano)
31 Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408H, Los Angeles, CA, 90089 (Mark JP Chaisson)
32 Joint Initiative for Metrology in Biology, SLAC National Accelerator Lab, Stanford University, 435 Via Ortega, Stanford, CA 94305 (Noah Spies, Marc Salit)
33 Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston TX 77030 (Fritz J. Sedlazeck)

For correspondence: jzook@nist.gov (Justin M. Zook)

Abstract

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods into routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identifying both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio whose cells and DNA are broadly available, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12,745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp) that were discovered by at least 2 technologies or 5 callsets and genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, within which any extra calls are putative false positives, cover 2.66 Gbp and 9,641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, the benchmark SVs had strong support from multiple technologies: 90% of the Tier 1 SVs had support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing that it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
GIAB is working toward a new version of the benchmark set that will use new technologies and methods, such as PacBio Circular Consensus Sequencing and ultralong Oxford Nanopore sequencing, to extend into more difficult genome regions and to include more complex SVs such as inversions. We are also developing a robust integration process to make calls on GRCh37 and GRCh38 for all seven GIAB samples.
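The size and support thresholds stated in the abstract, and a trio Mendelian consistency check of the kind summarized by a Mendelian genotype error rate, can be illustrated with a minimal Python sketch. This is not the GIAB integration pipeline: the `CandidateSV` record, its fields, and all function names are hypothetical; the sketch ignores the "isolated", "sequence-resolved", and long-read genotyping requirements; and the Mendelian check assumes biallelic autosomal sites with VCF-style genotypes and no missing alleles.

```python
from dataclasses import dataclass, field

# Hypothetical record for one merged candidate SV (not the GIAB VCF schema).
@dataclass
class CandidateSV:
    svtype: str                                     # "INS" or "DEL"
    length: int                                     # absolute SV length in bp
    technologies: set = field(default_factory=set)  # supporting technologies
    callsets: set = field(default_factory=set)      # supporting callset names

MIN_SV_LEN = 50       # benchmark SVs are >= 50 bp
MIN_TECHNOLOGIES = 2  # discovered by at least 2 technologies ...
MIN_CALLSETS = 5      # ... or by at least 5 callsets

def passes_support_rule(sv: CandidateSV) -> bool:
    """Apply only the size and discovery-support thresholds from the abstract."""
    if sv.svtype not in ("INS", "DEL") or sv.length < MIN_SV_LEN:
        return False
    return (len(sv.technologies) >= MIN_TECHNOLOGIES
            or len(sv.callsets) >= MIN_CALLSETS)

def alleles(gt: str) -> tuple:
    """Parse a VCF-style genotype such as '0/1' or '0|1' into (0, 1)."""
    a, b = gt.replace("|", "/").split("/")
    return int(a), int(b)

def is_mendelian_consistent(child: str, father: str, mother: str) -> bool:
    """True if the child's genotype can take one allele from each parent."""
    c = alleles(child)
    f, m = set(alleles(father)), set(alleles(mother))
    return (c[0] in f and c[1] in m) or (c[1] in f and c[0] in m)

def mendelian_error_rate(trios) -> float:
    """Fraction of (child, father, mother) genotype triples that are inconsistent."""
    trios = list(trios)
    errors = sum(not is_mendelian_consistent(*t) for t in trios)
    return errors / len(trios) if trios else 0.0

if __name__ == "__main__":
    sv = CandidateSV("DEL", 320, {"Illumina", "PacBio"}, {"callset_a"})
    print(passes_support_rule(sv))  # True: two technologies
    print(mendelian_error_rate([("0/1", "0/0", "1/1"),
                                ("1/1", "0/0", "0/1")]))  # 0.5
```

In this toy input, a heterozygous child of homozygous-reference and homozygous-alternate parents is consistent, while a homozygous-alternate child with a homozygous-reference father is counted as a Mendelian error.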

Footnotes

  • ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.
Posted June 09, 2019.
Subject Area

  • Genomics