New Results

Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies

Wei Zhou, Jonas B. Nielsen, Lars G. Fritsche, Rounak Dey, Maiken B. Elvestad, Brooke N. Wolford, Jonathon LeFaive, Peter VandeHaar, Sarah A Gagliano, Aliya Gifford, Lisa A. Bastarache, Wei-Qi Wei, Joshua C. Denny, Maoxuan Lin, Kristian Hveem, Hyun Min Kang, Goncalo R. Abecasis, Cristen J. Willer, Seunggeun Lee
doi: https://doi.org/10.1101/212357
Wei Zhou
Center for Statistical Genetics, University of Michigan, Ann Arbor;
Jonas B. Nielsen
Dept. of Internal Medicine, Division of Cardiology, University of Michigan, Ann Arbor;
Lars G. Fritsche
Center for Statistical Genetics, University of Michigan, Ann Arbor;
Rounak Dey
Center for Statistical Genetics & Dept. of Biostatistics, Univ. of Michigan, Ann Arbor;
Maiken B. Elvestad
K. G. Jebsen Center for Genetic Epidemiology, Dept. of Public Health, NTNU;
Brooke N. Wolford
Center for Statistical Genetics, University of Michigan, Ann Arbor;
Jonathon LeFaive
Center for Statistical Genetics & Dept. of Biostatistics, Univ. of Michigan, Ann Arbor;
Peter VandeHaar
Center for Statistical Genetics & Dept. of Biostatistics, Univ. of Michigan, Ann Arbor;
Sarah A Gagliano
Center for Statistical Genetics & Dept. of Biostatistics, Univ. of Michigan, Ann Arbor;
Aliya Gifford
Departments of Biomedical Informatics, Vanderbilt University, Nashville, TN;
Lisa A. Bastarache
Departments of Biomedical Informatics, Vanderbilt University, Nashville, TN;
Wei-Qi Wei
Departments of Biomedical Informatics, Vanderbilt University, Nashville, TN;
Joshua C. Denny
Depts. of Biomedical Informatics and Medicine, Vanderbilt University, Nashville, TN;
Maoxuan Lin
University of Michigan;
Kristian Hveem
Norwegian University of Science and Technology
Hyun Min Kang
Center for Statistical Genetics & Dept. of Biostatistics, Univ. of Michigan, Ann Arbor;
Goncalo R. Abecasis
Center for Statistical Genetics & Dept. of Biostatistics, Univ. of Michigan, Ann Arbor;
Cristen J. Willer
Dept. of Internal Medicine, Division of Cardiology, University of Michigan, Ann Arbor;
Seunggeun Lee
Center for Statistical Genetics & Dept. of Biostatistics, Univ. of Michigan, Ann Arbor;
  • For correspondence: leeshawn@umich.edu

Abstract

In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly in the analysis of phenotypes with unbalanced case-control ratios, producing inflated type I error rates. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation (SPA) to calibrate the distribution of score test statistics. This method, SAIGE, provides accurate p-values even when case-control ratios are extremely unbalanced. It uses state-of-the-art optimization strategies to reduce the computational time and memory cost of the generalized mixed model. The computational cost depends linearly on sample size, so the method is applicable to GWAS for thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data from 408,961 white British European-ancestry samples, we show that SAIGE can efficiently analyze large-sample data while controlling for unbalanced case-control ratios and sample relatedness.
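
To illustrate why a saddlepoint approximation helps when cases are rare, the following is a minimal sketch and not the SAIGE implementation. It assumes a plain logistic null model with known null risks, no random effects and no covariate adjustment of genotypes, and it uses a Barndorff-Nielsen-type tail formula. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math


def normal_upper_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cgf_derivs(t, g, mu):
    """K(t), K'(t), K''(t) for S = sum_i g_i * (Y_i - mu_i), Y_i ~ Bernoulli(mu_i)."""
    k0 = k1 = k2 = 0.0
    for gi, mi in zip(g, mu):
        e = math.exp(t * gi)
        denom = 1.0 - mi + mi * e
        p = mi * e / denom              # exponentially tilted success probability
        k0 += math.log(denom) - t * gi * mi
        k1 += gi * (p - mi)
        k2 += gi * gi * p * (1.0 - p)
    return k0, k1, k2


def spa_upper_tail(s, g, mu, tol=1e-10):
    """Saddlepoint approximation to P(S > s), for s > 0 inside the support of S."""
    # K'(t) is increasing with K'(0) = 0, so bracket the root of K'(t) = s and bisect.
    lo, hi = 0.0, 1.0
    while cgf_derivs(hi, g, mu)[1] < s:
        hi *= 2.0
        if hi > 64.0:
            raise ValueError("s is at or beyond the upper end of the support")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cgf_derivs(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    k0, _, k2 = cgf_derivs(t, g, mu)
    w = math.sqrt(2.0 * (t * s - k0))
    v = t * math.sqrt(k2)
    return normal_upper_tail(w + math.log(v / w) / w)


# Toy data (made up): 20 carriers of a variant, each with a null disease risk of 1%.
# Non-carriers have g = 0 and contribute nothing to S, so they are omitted.
g = [1.0] * 20
mu = [0.01] * 20
cases_among_carriers = 3
s = cases_among_carriers - sum(gi * mi for gi, mi in zip(g, mu))
var_s = sum(gi * gi * mi * (1.0 - mi) for gi, mi in zip(g, mu))

p_normal = normal_upper_tail(s / math.sqrt(var_s))
p_spa = spa_upper_tail(s, g, mu)
# Exact tail for this toy case: the number of cases among carriers is Binomial(20, 0.01).
p_exact = sum(math.comb(20, k) * 0.01**k * 0.99**(20 - k) for k in range(3, 21))
print(p_normal, p_spa, p_exact)
```

In this toy setting the normal approximation gives a tail probability many orders of magnitude smaller than the exact binomial value, while the saddlepoint value stays within an order of magnitude of it. That is the kind of miscalibration under unbalanced case-control ratios that the abstract describes.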

Footnotes

  • We have evaluated the numerical stability and convergence of the numerical and asymptotic approximations used to achieve computational scalability, and the details are now in the supplementary material. We have also added a more detailed derivation of the algorithm and a discussion of heritability estimation.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted April 24, 2018.
Subject Area

  • Genomics