Efficient Bayesian mixed-model analysis increases association power in large cohorts

Loh, Po-Ru; Tucker, George; Bulik-Sullivan, Brendan K; Vilhjálmsson, Bjarni J; Finucane, Hilary K; Salem, Rany M; Chasman, Daniel I; Ridker, Paul M; Neale, Benjamin M; Berger, Bonnie; Patterson, Nick; Price, Alkes L

doi:10.1038/ng.3190

Technical Report
Published: 02 February 2015

Po-Ru Loh^1,2,
George Tucker^1,3,4,
Brendan K Bulik-Sullivan^2,5,
Bjarni J Vilhjálmsson^1,2,
Hilary K Finucane^3,
Rany M Salem^2,6,
Daniel I Chasman^7,
Paul M Ridker^7,
Benjamin M Neale^2,5,
Bonnie Berger^3,4,
Nick Patterson^2 &
Alkes L Price^1,2,8

Nature Genetics volume 47, pages 284–290 (2015)

Abstract

Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require O(MN²) time (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN)-time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
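
To make the O(MN) versus O(MN²) distinction concrete, the following is a minimal sketch and not the BOLT-LMM implementation. It shows one standard way in which O(MN)-time iterations can arise: solving a linear system of the form (XXᵀ/M + δI)v = y by conjugate gradient, where each iteration touches the N×M genotype matrix X only through matrix-vector products and never forms the N×N relationship matrix. The function names, the Gaussian stand-in for normalized genotypes, the problem sizes and the value of δ are all illustrative assumptions.

```python
import numpy as np


def grm_matvec(X, v, delta):
    """Return (X X^T / M + delta * I) v using two O(MN) matrix-vector products.

    The N x N matrix X X^T / M is never formed explicitly.
    """
    M = X.shape[1]
    return X @ (X.T @ v) / M + delta * v


def conjugate_gradient(X, y, delta, tol=1e-8, max_iter=200):
    """Solve (X X^T / M + delta * I) v = y by conjugate gradient.

    Each iteration costs O(MN); the system matrix is symmetric positive
    definite whenever delta > 0, which is what conjugate gradient requires.
    """
    v = np.zeros_like(y, dtype=float)
    r = y.astype(float)          # residual y - A v for the initial guess v = 0
    p = r.copy()                 # initial search direction
    rs_old = r @ r
    y_norm = np.linalg.norm(y)
    for _ in range(max_iter):
        Ap = grm_matvec(X, p, delta)
        alpha = rs_old / (p @ Ap)
        v += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * y_norm:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return v


# Simulated data (assumption: standard normal entries stand in for
# normalized genotypes; sizes are tiny compared with a real cohort).
rng = np.random.default_rng(0)
N, M = 200, 1000
X = rng.standard_normal((N, M))
y = rng.standard_normal(N)
delta = 1.0

v_cg = conjugate_gradient(X, y, delta)

# Reference solution: forming the N x N matrix costs O(MN^2) time and
# O(N^2) memory, which is the step the iterative approach avoids.
K = X @ X.T / M + delta * np.eye(N)
v_direct = np.linalg.solve(K, y)
```

In this sketch the explicit matrix K is built only to check the iterative answer; at cohort scale that construction is the expensive step, whereas the iterative solve needs only X @ v and X.T @ v.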

**Figure 1: Computational performance of mixed-model association methods.**

**Figure 2: BOLT-LMM increases the power to detect associations in simulations.**

**Figure 3: BOLT-LMM increases the power to detect associations for WGHS phenotypes.**

Acknowledgements

We are grateful to M. Lipson, S. Simmons, A. Gusev, K. Galinsky, J. Yang, P. Visscher, Z. Zhu and D. Gudbjartsson for helpful discussions. This research was supported by US National Institutes of Health grant R01 HG006399 and US National Institutes of Health fellowship F32 HG007805. H.K.F. was supported by the Fannie and John Hertz Foundation. The WGHS is supported by grants HL043851 and HL080467 from the National Heart, Lung, and Blood Institute and grant CA047988 from the National Cancer Institute, by the Donald W. Reynolds Foundation and by the Fondation Leducq, with collaborative scientific support and funding for genotyping provided by Amgen.

Author information

Authors and Affiliations

1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Po-Ru Loh, George Tucker, Bjarni J Vilhjálmsson & Alkes L Price
2. Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
Po-Ru Loh, Brendan K Bulik-Sullivan, Bjarni J Vilhjálmsson, Rany M Salem, Benjamin M Neale, Nick Patterson & Alkes L Price
3. Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
George Tucker, Hilary K Finucane & Bonnie Berger
4. Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA
George Tucker & Bonnie Berger
5. Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
Brendan K Bulik-Sullivan & Benjamin M Neale
6. Department of Endocrinology, Children's Hospital Boston, Boston, Massachusetts, USA
Rany M Salem
7. Division of Preventive Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
Daniel I Chasman & Paul M Ridker
8. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Alkes L Price

Contributions

P.-R.L., N.P. and A.L.P. designed experiments. P.-R.L. performed experiments. P.-R.L., G.T., B.K.B.-S., B.J.V., H.K.F. and A.L.P. analyzed data. D.I.C. and P.M.R. provided data. All authors wrote the manuscript.

Corresponding authors

Correspondence to Po-Ru Loh or Alkes L Price.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Tables 1–15 and Supplementary Note. (PDF 2591 kb)

About this article

Cite this article

Loh, PR., Tucker, G., Bulik-Sullivan, B. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet 47, 284–290 (2015). https://doi.org/10.1038/ng.3190

Received: 22 July 2014
Accepted: 16 December 2014
Published: 02 February 2015
Issue Date: March 2015
DOI: https://doi.org/10.1038/ng.3190
