bioRxiv
New Results

Scaling accurate genetic variant discovery to tens of thousands of samples

Ryan Poplin, Valentin Ruano-Rubio, Mark A. DePristo, Tim J. Fennell, Mauricio O. Carneiro, Geraldine A. Van der Auwera, David E. Kling, Laura D. Gauthier, Ami Levy-Moonshine, David Roazen, Khalid Shakir, Joel Thibault, Sheila Chandran, Chris Whelan, Monkol Lek, Stacey Gabriel, Mark J Daly, Ben Neale, Daniel G. MacArthur, Eric Banks
doi: https://doi.org/10.1101/201178
Affiliations:
1 Broad Institute, 75 Ames Street, Cambridge, MA 02142 (all authors)
2 Massachusetts General Hospital, Boston, MA 02114 (Laura D. Gauthier, Monkol Lek, Mark J Daly, Ben Neale, Daniel G. MacArthur, Eric Banks)

Abstract

Comprehensive disease gene discovery in both common and rare diseases will require the efficient and accurate detection of all classes of genetic variation across tens to hundreds of thousands of human samples. We describe here a novel assembly-based approach to variant calling, the GATK HaplotypeCaller (HC) and Reference Confidence Model (RCM), which determines genotype likelihoods independently for each sample but performs joint calling across all samples within a project simultaneously. By calling over 90,000 samples from the Exome Aggregation Consortium (ExAC), we show that, in contrast to other algorithms, the HC-RCM scales efficiently to very large sample sizes without loss of accuracy, and that its indel calling accuracy is superior to that of other algorithms. More importantly, the HC-RCM produces a fully squared-off matrix of genotypes across all samples at every genomic position investigated. The HC-RCM is a novel, scalable, assembly-based algorithm with broad applications in population genetics and clinical studies.
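The split described above (likelihoods computed per sample, genotypes assigned jointly across the cohort) can be illustrated with a toy sketch. This is not the GATK implementation; it is a simplified stand-in under stated assumptions: biallelic diploid sites, per-sample phred-scaled genotype likelihoods (PL, where PL = -10·log10 P(data | G)) available for every sample at every site (the role the abstract gives the reference confidence model), and a Hardy-Weinberg prior whose allele frequency is estimated by EM over all samples. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Diploid genotypes at a biallelic site, in conventional PL order.
GENOTYPES = ("0/0", "0/1", "1/1")

# Hypothetical input type: per-sample PL triples for one site, keyed by sample name.
SitePLs = Dict[str, Sequence[int]]


def pl_to_likelihoods(pl: Sequence[int]) -> List[float]:
    """Convert phred-scaled PLs to relative likelihoods 10^(-PL/10)."""
    return [10 ** (-p / 10.0) for p in pl]


def hwe_priors(alt_freq: float) -> List[float]:
    """Hardy-Weinberg genotype priors for alternate-allele frequency q."""
    q = alt_freq
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]


def genotype_posteriors(likelihoods: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Normalize likelihood x prior into posterior genotype probabilities."""
    weighted = [lk * pr for lk, pr in zip(likelihoods, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


def estimate_alt_freq(all_likelihoods: List[List[float]], iterations: int = 50,
                      init: float = 0.01) -> float:
    """EM estimate of the alternate-allele frequency pooled over every sample."""
    q = init
    for _ in range(iterations):
        priors = hwe_priors(q)
        expected_alt = 0.0
        for lks in all_likelihoods:
            post = genotype_posteriors(lks, priors)
            expected_alt += post[1] + 2.0 * post[2]  # expected alt-allele count
        q = expected_alt / (2.0 * len(all_likelihoods))
        q = min(max(q, 1e-6), 1.0 - 1e-6)  # keep every prior strictly positive
    return q


def joint_genotype_site(site: SitePLs) -> Dict[str, Tuple[str, float]]:
    """Call every sample at one site using the cohort-wide allele-frequency prior."""
    samples = sorted(site)
    likelihoods = [pl_to_likelihoods(site[s]) for s in samples]
    priors = hwe_priors(estimate_alt_freq(likelihoods))
    calls = {}
    for sample, lks in zip(samples, likelihoods):
        post = genotype_posteriors(lks, priors)
        best = max(range(len(GENOTYPES)), key=lambda i: post[i])
        calls[sample] = (GENOTYPES[best], post[best])
    return calls


def genotype_matrix(sites: Dict[str, SitePLs]) -> Dict[str, Dict[str, str]]:
    """Return a squared-off sites x samples matrix: one call per sample per site."""
    all_samples = sorted({s for site in sites.values() for s in site})
    matrix = {}
    for site_id, site in sites.items():
        missing = [s for s in all_samples if s not in site]
        if missing:
            raise ValueError(f"{site_id}: no likelihoods for {missing}")
        calls = joint_genotype_site(site)
        matrix[site_id] = {s: calls[s][0] for s in all_samples}
    return matrix


if __name__ == "__main__":
    # Made-up PLs: at site_B, s2 has only reference-confidence evidence.
    sites = {
        "site_A": {"s1": (0, 30, 300), "s2": (0, 30, 300), "s3": (0, 30, 300)},
        "site_B": {"s1": (50, 0, 60), "s2": (0, 30, 300), "s3": (400, 40, 0)},
    }
    print(genotype_matrix(sites))
```

Because every sample carries likelihoods at every site, including sites where it shows no variant evidence, the output has a genotype in every cell. A sample without likelihoods at a site is treated as an error rather than silently left as a missing call.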

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted July 24, 2018.
Subject Area

  • Genomics