bioRxiv
New Results

Utility of long-read sequencing for All of Us

M. Mahmoud, Y. Huang, K. Garimella, P. A. Audano, W. Wan, N. Prasad, R. E. Handsaker, S. Hall, A. Pionzio, M. C. Schatz, M. E. Talkowski, E. E. Eichler, S. E. Levy, F. J. Sedlazeck
doi: https://doi.org/10.1101/2023.01.23.525236
M. Mahmoud
1 Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
2 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
Y. Huang
3 Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
K. Garimella
3 Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
P. A. Audano
4 The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
W. Wan
3 Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
N. Prasad
5 Discovery Life Sciences, Huntsville, AL 35806, USA
R. E. Handsaker
6 Department of Genetics, Harvard Medical School, Boston, MA, USA
S. Hall
5 Discovery Life Sciences, Huntsville, AL 35806, USA
A. Pionzio
5 Discovery Life Sciences, Huntsville, AL 35806, USA
M. C. Schatz
7 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
M. E. Talkowski
8 Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
9 Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
E. E. Eichler
10 Department of Genome Sciences, University of Washington, Seattle, WA, USA
11 Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
S. E. Levy
12 HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
F. J. Sedlazeck
1 Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
2 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
13 Department of Computer Science, Rice University, Houston, TX, USA
For correspondence: fritz.sedlazeck@gmail.com

Abstract

The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compared the performance of traditional short-read sequencing with that of long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples, representing eight datasets. Our analysis revealed substantial differences in the ability of these technologies to accurately sequence complex, medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also considered the advantages and challenges of using low-coverage sequencing to increase sample numbers in large cohort analyses. Our results show that PacBio HiFi reads were the most accurate for both small and large variants. Further, we present a cloud-based pipeline that optimizes single-nucleotide variant (SNV), small insertion and deletion (indel), and structural variant (SV) calling at scale for long-read analysis. These findings will lead to widespread improvements across AoU.

Competing Interest Statement

FS received support from Illumina, PacBio, Oxford Nanopore Technologies, and Genentech. SH received support from HudsonAlpha Institute for Biotechnology and Owens Cross Roads.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted January 24, 2023.
Subject Area: Bioinformatics