RT Journal Article
SR Electronic
T1 A General Framework for Variable Selection in Linear Mixed Models with Applications to Genetic Studies with Structured Populations
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 408484
DO 10.1101/408484
A1 Sahir R Bhatnagar
A1 Karim Oualkacha
A1 Yi Yang
A1 Marie Forest
A1 Celia MT Greenwood
YR 2018
UL http://biorxiv.org/content/early/2018/10/03/408484.abstract
AB Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed-effects models (LMMs) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed-effect predictors greatly exceeds the number of samples. False positives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects’ relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop a general penalized LMM framework called ggmix that selects variables and estimates their effects simultaneously, in one step, while accounting for between-individual correlations. Our method can accommodate several sparsity-inducing penalties such as the lasso, elastic net and group lasso, and also readily handles prior annotation information in the form of weights. We develop a blockwise coordinate descent algorithm that is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations, we show that ggmix leads to correct Type 1 error control and improved variance component estimation compared to the two-stage approach or principal component adjustment. ggmix is also robust to different kinship structures and heritability proportions. Our algorithms are available in an R package (https://github.com/greenwoodlab).