bioRxiv
New Results

SVCurator: A Crowdsourcing app to visualize evidence of structural variants for the human genome

Lesley M Chapman, Noah Spies, Patrick Pai, Chun Shen Lim, Andrew Carroll, Giuseppe Narzisi, Christopher M. Watson, Christos Proukakis, Wayne E. Clarke, Naoki Nariai, Eric Dawson, Garan Jones, Daniel Blankenberg, Christian Brueffer, Chunlin Xiao, Sree Rohit Raj Kolora, Noah Alexander, Paul Wolujewicz, Azza Ahmed, Graeme Smith, Saadlee Shehreen, Aaron M. Wenger, Marc Salit, Justin M. Zook
doi: https://doi.org/10.1101/581264
Lesley M Chapman
1Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
Noah Spies
1Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
2The Joint Initiative for Metrology in Biology, Stanford University, Stanford, CA, USA
26Departments of Genetics and Pathology, Stanford University, Stanford, CA
Patrick Pai
3University of Maryland, College Park
Chun Shen Lim
4Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
Andrew Carroll
5DNAnexus Inc., 1975 Mountain View STE 101, Mountain View, California, USA
Giuseppe Narzisi
6New York Genome Center, New York, NY 10013
Christopher M. Watson
7School of Medicine, University of Leeds, Saint James’s University Hospital, Leeds LS9 7TF, United Kingdom
8Yorkshire Regional Genetics Service, The Leeds Teaching Hospitals NHS Trust, Saint James’s University Hospital, Leeds LS9 7TF, United Kingdom
Christos Proukakis
9University College London, Institute of Neurology, London, UK
Wayne E. Clarke
6New York Genome Center, New York, NY 10013
Naoki Nariai
10Illumina, Inc., 5200 Illumina Way, San Diego, CA 92122
Eric Dawson
11Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
12Department of Genetics, University of Cambridge, Cambridge, UK
Garan Jones
13University of Exeter Medical School, Epidemiology and Public Health Group, Barrack Road, Exeter, Devon, EX2 5DW, UK
Daniel Blankenberg
14Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue / NE50, Cleveland, OH 44195
Christian Brueffer
15Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
Chunlin Xiao
16National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
Sree Rohit Raj Kolora
17German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
18Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
19Molecular Evolution and Systematics of Animals, Institute of Biology, University of Leipzig, Leipzig, Germany
Noah Alexander
20Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA
Paul Wolujewicz
21Weill Cornell, Belfer Research Building, 413 E. 69th St, New York, NY 10021
Azza Ahmed
22Center for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum and Department of Electrical and Electronic Engineering, Faculty of Engineering, University of Khartoum, Al Gamaa Avenue, PO Box 321, postal code 11111, Khartoum, Sudan
Graeme Smith
23Guy’s Hospital and St Thomas’s NHS Foundation Trust, Great Maze Pond, London, SE1 9RT
Saadlee Shehreen
24Department of Genetic Engineering & Biotechnology, University of Dhaka, Bangladesh
Aaron M. Wenger
25Pacific Biosciences, Menlo Park, California, 94025
Marc Salit
1Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
2The Joint Initiative for Metrology in Biology, Stanford University, Stanford, CA, USA
Justin M. Zook
1Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
For correspondence: justin.zook@nist.gov

Abstract

A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) has yet to be defined. In this study, we manually curated 1235 SVs that can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help curators manually review large indels and SVs within the human genome and report their genotype and size accuracy.

SVCurator is a Python Flask-based web platform that displays images from short-, long-, and linked-read sequencing data from the son in the GIAB Ashkenazi Jewish Trio [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. The crowdsourced results were highly concordant: 37 of the 61 curators had at least 78% concordance with a set of ‘expert’ curators, whose labels were 93% concordant with one another. This produced high-confidence labels for 935 events. When compared with the heuristic-based draft benchmark SV callset from GIAB, the SVCurator crowdsourced labels were 94.5% concordant. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
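The concordance figures above compare each curator's labels with those of the ‘expert’ curators. The sketch below shows one plausible way such a per-curator concordance could be computed and thresholded. It assumes that concordance is the fraction of SVs, labelled by both the curator and the expert reference, on which the labels match. The label strings (for example `"het"` and `"not_present"`), the dictionary layout, and the function names `concordance` and `curators_above` are hypothetical illustrations, not SVCurator's actual code or data model.

```python
from typing import Dict, List

# Hypothetical encoding (an assumption, not SVCurator's schema): each SV ID
# maps to a label string, e.g. a genotype such as "het" or "homvar".
Labels = Dict[str, str]


def concordance(curator: Labels, reference: Labels) -> float:
    """Fraction of SVs labelled in both sets on which the labels agree.

    Returns 0.0 when the two sets share no SVs (an assumed convention).
    """
    shared = curator.keys() & reference.keys()  # SVs reviewed by both
    if not shared:
        return 0.0
    matches = sum(1 for sv in shared if curator[sv] == reference[sv])
    return matches / len(shared)


def curators_above(curators: Dict[str, Labels], reference: Labels,
                   threshold: float) -> List[str]:
    """Sorted names of curators whose concordance meets the threshold."""
    return sorted(
        name for name, labels in curators.items()
        if concordance(labels, reference) >= threshold
    )


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the study.
    experts = {"sv1": "het", "sv2": "homvar", "sv3": "not_present", "sv4": "het"}
    crowd = {
        "curator_a": {"sv1": "het", "sv2": "homvar",
                      "sv3": "not_present", "sv4": "het"},
        "curator_b": {"sv1": "het", "sv2": "het", "sv3": "het", "sv4": "het"},
    }
    for name, labels in crowd.items():
        print(name, round(concordance(labels, experts), 2))
    print(curators_above(crowd, experts, threshold=0.78))
```

On the toy data, `curator_a` agrees on all four SVs (1.0), while `curator_b` agrees on two of four (0.5). As a result, only `curator_a` clears the illustrative 0.78 threshold. Restricting the comparison to SVs both parties labelled means curators who reviewed different subsets can still be compared.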

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted July 18, 2019.
Subject Area

  • Genomics