bioRxiv
New Results

Small variant benchmark from a complete assembly of X and Y chromosomes

Justin Wagner, Nathan D. Olson, Jennifer McDaniel, Lindsay Harris, Brendan J. Pinto, David Jáspez, Adrián Muñoz-Barrera, Luis A. Rubio-Rodríguez, José M. Lorenzo-Salazar, Carlos Flores, Sayed Mohammad Ebrahim Sahraeian, Giuseppe Narzisi, Marta Byrska-Bishop, Uday S Evani, Chunlin Xiao, Juniper A. Lake, Peter Fontana, Craig Greenberg, Donald Freed, Mohammed Faizal Eeman Mootor, Paul C. Boutros, Lisa Murray, Kishwar Shafin, Andrew Carroll, Fritz J Sedlazeck, Melissa Wilson, Justin M. Zook
doi: https://doi.org/10.1101/2023.10.31.564997
Justin Wagner
1National Institute of Standards and Technology, Material Measurement Laboratory, 100 Bureau Dr., Gaithersburg, MD 20899, USA
Nathan D. Olson
1National Institute of Standards and Technology, Material Measurement Laboratory, 100 Bureau Dr., Gaithersburg, MD 20899, USA
Jennifer McDaniel
1National Institute of Standards and Technology, Material Measurement Laboratory, 100 Bureau Dr., Gaithersburg, MD 20899, USA
Lindsay Harris
1National Institute of Standards and Technology, Material Measurement Laboratory, 100 Bureau Dr., Gaithersburg, MD 20899, USA
Brendan J. Pinto
2Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281 USA -- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI 53233 USA
David Jáspez
3Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
Adrián Muñoz-Barrera
3Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
Luis A. Rubio-Rodríguez
3Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
José M. Lorenzo-Salazar
3Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
Carlos Flores
3Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
4CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, Madrid, Spain; Research Unit, Hospital Universitario Ntra. Sra. de Candelaria, Santa Cruz de Tenerife, Spain; Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, Las Palmas de Gran Canaria, Spain
Sayed Mohammad Ebrahim Sahraeian
5Roche Sequencing Solutions, Santa Clara, CA, USA
Giuseppe Narzisi
6New York Genome Center, New York, NY 10013, USA
Marta Byrska-Bishop
6New York Genome Center, New York, NY 10013, USA
Uday S Evani
6New York Genome Center, New York, NY 10013, USA
Chunlin Xiao
7National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
Juniper A. Lake
8Pacific Biosciences, Menlo Park, CA, USA
Peter Fontana
9National Institute of Standards and Technology, Information Technology Laboratory, 100 Bureau Dr. Mailstop 8940, Gaithersburg, MD 20899, USA
Craig Greenberg
9National Institute of Standards and Technology, Information Technology Laboratory, 100 Bureau Dr. Mailstop 8940, Gaithersburg, MD 20899, USA
Donald Freed
10Sentieon Inc., San Jose, CA, USA
Mohammed Faizal Eeman Mootor
11Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
Paul C. Boutros
11Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
Lisa Murray
12Illumina, San Diego, CA, USA
Kishwar Shafin
13Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
Andrew Carroll
13Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
Fritz J Sedlazeck
14Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
Melissa Wilson
15Center for Evolution & Medicine and School of Life Sciences, Arizona State University, Tempe, AZ 85281 USA
Justin M. Zook
1National Institute of Standards and Technology, Material Measurement Laboratory, 100 Bureau Dr., Gaithersburg, MD 20899, USA
  • For correspondence: jzook@nist.gov

Abstract

The sex chromosomes contain complex, medically important genes, but differ from the autosomes in their ploidy and large repetitive regions. To evaluate variant detection on chromosomes X and Y, we created a 111,725-variant benchmark for the Genome in a Bottle HG002 reference material. We show how complete assemblies can expand benchmarks to difficult regions, but highlight remaining challenges in benchmarking complex gene conversions, copy number variable gene arrays, and human satellites.

Competing Interest Statement

JAL is an employee of PacBio. DF is an employee of Sentieon, Inc., and holds stock options as part of the standard compensation package. PCB sits on the Scientific Advisory Boards of Intersect Diagnostics Inc., Sage Bionetworks, and BioSymetrics Inc. LM is an employee and shareholder of Illumina Inc. KS and AC are employees of Google LLC and own Alphabet stock as part of the standard compensation package. FJS has support from ONT, Illumina, PacBio, and Genentech.

Footnotes

  • Shortening and minor updates to text

  • https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/chrXY_v1.0/

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.
Posted November 18, 2023.
Subject Area

  • Genomics