RT Journal Article
SR Electronic
T1 A Fast and Flexible Algorithm for Solving the Lasso in Large-scale and Ultrahigh-dimensional Problems
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 630079
DO 10.1101/630079
A1 Qian, Junyang
A1 Du, Wenfei
A1 Tanigawa, Yosuke
A1 Aguirre, Matthew
A1 Tibshirani, Robert
A1 Rivas, Manuel A.
A1 Hastie, Trevor
YR 2019
UL http://biorxiv.org/content/early/2019/05/07/630079.abstract
AB Since its first proposal in statistics (Tibshirani, 1996), the lasso has been an effective method for simultaneous variable selection and estimation. A number of packages have been developed to solve the lasso efficiently. However, as large datasets become more prevalent, many algorithms are constrained by computational efficiency or memory limits. In this paper, we propose a meta-algorithm, batch screening iterative lasso (BASIL), that can take advantage of any existing lasso solver to build a scalable lasso solution for large datasets. We also introduce snpnet, an R package that implements the proposed algorithm on top of glmnet (Friedman et al., 2010a) for the large-scale single nucleotide polymorphism (SNP) datasets that are widely studied in genetics. We demonstrate results on a large genotype-phenotype dataset from the UK Biobank, where we achieve state-of-the-art heritability estimation on quantitative and qualitative traits including height, body mass index, asthma and high cholesterol.