New Results

Identifying, understanding, and correcting technical biases on the sex chromosomes in next-generation sequencing data

Timothy H. Webster, Madeline Couse, Bruno M. Grande, Eric Karlins, Tanya N. Phung, Phillip A. Richmond, Whitney Whitford, Melissa A. Wilson Sayres
doi: https://doi.org/10.1101/346940
Timothy H. Webster
1School of Life Sciences, Arizona State University
Madeline Couse
2Child and Family Research Institute, University of British Columbia
Bruno M. Grande
3Department of Molecular Biology and Biochemistry, Simon Fraser University
Eric Karlins
4Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health
Tanya N. Phung
5Interdepartmental Program in Bioinformatics, UCLA
Phillip A. Richmond
6Centre for Molecular Medicine and Therapeutics, University of British Columbia
7BC Children’s Hospital
Whitney Whitford
8School of Biological Sciences, The University of Auckland
Melissa A. Wilson Sayres
1School of Life Sciences, Arizona State University
9Center for Evolution and Medicine, Arizona State University

Abstract

Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. This sequence homology can cause the mismapping of short sequencing reads derived from the sex chromosomes and affect variant calling and other downstream analyses. Understanding and correcting this problem is critical for medical genomics and population genomic inference. Here, we characterize how sequence homology can affect analyses on the sex chromosomes and present XYalign, a new tool that: (1) facilitates the inference of sex chromosome complement from next-generation sequencing data; (2) corrects erroneous read mapping on the sex chromosomes; and (3) tabulates and visualizes important metrics for quality control such as mapping quality, sequencing depth, and allele balance. We show how these metrics can be used to identify XX and XY individuals across diverse sequencing experiments, including low- and high-coverage whole-genome sequencing and exome sequencing. We also show that XYalign corrects mismapped reads on the sex chromosomes, resulting in more accurate variant calling. Finally, we discuss how the flexibility of the XYalign framework can be leveraged for other use cases, including the identification of aneuploidy on the autosomes. XYalign is available as open source software under the GNU General Public License (version 3).
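To make the idea of inferring sex chromosome complement from sequencing depth concrete, the following is a minimal sketch and not XYalign's actual implementation. It assumes per-window depth values have already been computed for each chromosome. The function names, the choice of chr19 as the autosomal reference, and both thresholds are hypothetical and chosen only for illustration.

```python
from statistics import mean


def relative_depth(depths, chrom, reference_chrom):
    """Mean depth of `chrom` divided by mean depth of an autosomal reference."""
    return mean(depths[chrom]) / mean(depths[reference_chrom])


def guess_complement(depths, x="chrX", y="chrY", autosome="chr19",
                     x_threshold=0.75, y_threshold=0.1):
    """Classify a sample as XX-like, XY-like, or ambiguous.

    `depths` maps chromosome names to lists of per-window depths.
    The thresholds are illustrative assumptions, not published values.
    """
    x_ratio = relative_depth(depths, x, autosome)
    y_ratio = relative_depth(depths, y, autosome)

    # Two X copies and no Y: X depth near the autosomal depth, Y near zero.
    if x_ratio >= x_threshold and y_ratio < y_threshold:
        return "XX-like"
    # One X and one Y: X depth near half the autosomal depth, Y clearly present.
    if x_ratio < x_threshold and y_ratio >= y_threshold:
        return "XY-like"
    # Anything else (for example, mismapping or aneuploidy) needs a closer look.
    return "ambiguous"


sample = {
    "chr19": [30, 31, 29],
    "chrX": [15, 16, 14],
    "chrY": [12, 13, 11],
}
print(guess_complement(sample))
```

In practice, homologous regions can inflate apparent Y depth in samples without a Y chromosome, which is one reason the abstract lists mapping quality and allele balance alongside depth rather than depth alone.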

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted July 18, 2018.

Subject Area

  • Bioinformatics