bioRxiv
New Results

Computationally efficient whole genome regression for quantitative and binary traits

Joelle Mbatchou, Leland Barnard, Joshua Backman, Anthony Marcketta, Jack A. Kosmicki, Andrey Ziyatdinov, Christian Benner, Colm O’Dushlaine, Mathew Barber, Boris Boutkov, Lukas Habegger, Manuel Ferreira, Aris Baras, Jeffrey Reid, Gonçalo Abecasis, Evan Maxwell, Jonathan Marchini
doi: https://doi.org/10.1101/2020.06.19.162354
Affiliation (all authors): Regeneron Genetics Center, Tarrytown, USA
For correspondence: jonathan.marchini@regeneron.com

Abstract

Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine learning method called REGENIE for fitting a whole genome regression model that is orders of magnitude faster than alternatives, while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. The method is applicable to both quantitative and binary phenotypes, including rare variant analysis of binary traits with unbalanced case-control ratios, for which we introduce a fast, approximate Firth logistic regression test. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach compared to several existing methods using quantitative and binary traits from the UK Biobank dataset with up to 407,746 individuals.
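The abstract states that only local segments of the genotype matrix need to be in memory at any time, but it does not spell out the procedure. The sketch below is one generic way to get that property and is not taken from the abstract: a stacked, block-wise ridge regression in which each block of SNP columns is fitted separately and the per-block predictions are then combined in a second, much smaller ridge regression. All function names, the block size, the penalty `lam`, and the toy data are assumptions made for illustration only.

```python
import random

def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def ridge_predict(block, y, lam):
    """Ridge-regress y on one block (rows = samples); return fitted values."""
    p = len(block[0])
    xtx = [[sum(row[j] * row[k] for row in block) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(block, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return [sum(bj * v for bj, v in zip(beta, row)) for row in block]

def iter_blocks(genotypes, block_size):
    """Yield one block of SNP columns at a time (stand-in for reading from disk)."""
    n_snps = len(genotypes[0])
    for start in range(0, n_snps, block_size):
        yield [row[start:start + block_size] for row in genotypes]

def blockwise_predictors(genotypes, y, block_size, lam):
    """Level 0: one vector of fitted values per block; only one block is held at a time."""
    return [ridge_predict(block, y, lam) for block in iter_blocks(genotypes, block_size)]

def stack_predictors(block_preds, y, lam):
    """Level 1: ridge regression of y on the (samples x blocks) predictor matrix."""
    level1 = [list(sample) for sample in zip(*block_preds)]
    return ridge_predict(level1, y, lam)

# Toy data (hypothetical): 50 samples, 6 SNPs coded 0/1/2, phenotype driven by block 0.
random.seed(0)
n_samples, n_snps = 50, 6
genotypes = [[float(random.randint(0, 2)) for _ in range(n_snps)]
             for _ in range(n_samples)]
phenotype = [row[0] - 0.5 * row[2] for row in genotypes]

block_preds = blockwise_predictors(genotypes, phenotype, block_size=3, lam=1e-9)
final_pred = stack_predictors(block_preds, phenotype, lam=1e-9)
```

In this toy setting the second-level regression works with a matrix that has one column per block rather than one per SNP, which is what keeps memory use low in such a scheme. Whether and how REGENIE chooses penalties, uses cross-validation, or handles covariates is described in the full text, not here.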

Competing Interest Statement

All of the authors are current employees and/or stockholders of Regeneron Pharmaceuticals.

Footnotes

  • Author list edit

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted June 22, 2020.
Subject Area

  • Genetics