  • Technical Report

Efficient Bayesian mixed-model analysis increases association power in large cohorts

Abstract

Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN²) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
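
To illustrate how an iteration can cost O(MN) rather than O(MN²), the sketch below solves a linear system involving a genetic relationship matrix without ever forming that N × N matrix. It uses conjugate gradient with matrix-vector products against the N × M genotype matrix. This is one standard technique and is an assumption here: the abstract does not say which iterative scheme BOLT-LMM uses. The function names, the variance ratio `delta` and the simulated data are all hypothetical.

```python
import numpy as np

def grm_matvec(X, v, delta):
    """Apply (X X^T / M + delta * I) to v using two O(MN) products.

    X is an N x M matrix of (assumed standardized) genotypes; the N x N
    relationship matrix X X^T / M is never built explicitly.
    """
    M = X.shape[1]
    return X @ (X.T @ v) / M + delta * v

def conjugate_gradient(X, y, delta, tol=1e-8, max_iter=200):
    """Solve (X X^T / M + delta * I) x = y; each iteration costs O(MN)."""
    x = np.zeros_like(y, dtype=float)
    r = y - grm_matvec(X, x, delta)   # initial residual (equals y here)
    p = r.copy()                      # initial search direction
    rs_old = r @ r
    y_norm = np.linalg.norm(y)
    for it in range(max_iter):
        Ap = grm_matvec(X, p, delta)
        alpha = rs_old / (p @ Ap)     # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * y_norm:
            break
        p = r + (rs_new / rs_old) * p # next conjugate direction
        rs_old = rs_new
    return x, it + 1

# Simulated toy data (hypothetical sizes, not from the paper).
rng = np.random.default_rng(0)
N, M = 200, 1000
X = rng.standard_normal((N, M))
y = rng.standard_normal(N)
delta = 1.0

x_cg, n_iter = conjugate_gradient(X, y, delta)

# Reference solution that does form the N x N matrix, for comparison only.
x_direct = np.linalg.solve(X @ X.T / M + delta * np.eye(N), y)
```

On this toy problem the iterative and direct solutions should agree to within the tolerance. Building the dense matrix costs O(MN²) time, whereas each conjugate gradient iteration touches X only twice.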

Figure 1: Computational performance of mixed-model association methods.
Figure 2: BOLT-LMM increases the power to detect associations in simulations.
Figure 3: BOLT-LMM increases the power to detect associations for WGHS phenotypes.

Acknowledgements

We are grateful to M. Lipson, S. Simmons, A. Gusev, K. Galinsky, J. Yang, P. Visscher, Z. Zhu and D. Gudbjartsson for helpful discussions. This research was supported by US National Institutes of Health grant R01 HG006399 and US National Institutes of Health fellowship F32 HG007805. H.K.F. was supported by the Fannie and John Hertz Foundation. The WGHS is supported by HL043851 and grants HL080467 from the National Heart, Lung, and Blood Institute and grant CA047988 from the National Cancer Institute, by the Donald W. Reynolds Foundation and by the Fondation Leducq, with collaborative scientific support and funding for genotyping provided by Amgen.

Author information

Contributions

P.-R.L., N.P. and A.L.P. designed experiments. P.-R.L. performed experiments. P.-R.L., G.T., B.K.B.-S., B.J.V., H.K.F. and A.L.P. analyzed data. D.I.C. and P.M.R. provided data. All authors wrote the manuscript.

Corresponding authors

Correspondence to Po-Ru Loh or Alkes L Price.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Tables 1–15 and Supplementary Note. (PDF 2591 kb)

About this article

Cite this article

Loh, PR., Tucker, G., Bulik-Sullivan, B. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet 47, 284–290 (2015). https://doi.org/10.1038/ng.3190

  • DOI: https://doi.org/10.1038/ng.3190
