RT Journal Article
SR Electronic
T1 Computationally efficient whole genome regression for quantitative and binary traits
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.06.19.162354
DO 10.1101/2020.06.19.162354
A1 Joelle Mbatchou
A1 Leland Barnard
A1 Joshua Backman
A1 Anthony Marcketta
A1 Jack A. Kosmicki
A1 Andrey Ziyatdinov
A1 Christian Benner
A1 Colm O’Dushlaine
A1 Mathew Barber
A1 Boris Boutkov
A1 Lukas Habegger
A1 Manuel Ferreira
A1 Jeffrey Reid
A1 Gonçalo Abecasis
A1 Evan Maxwell
A1 Jonathan Marchini
YR 2020
UL http://biorxiv.org/content/early/2020/06/20/2020.06.19.162354.abstract
AB Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine learning method called REGENIE for fitting a whole genome regression model that is orders of magnitude faster than alternatives, while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes, and only requires local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives which must load genomewide matrices into memory. This results in substantial savings in compute time and memory usage. The method is applicable to both quantitative and binary phenotypes, including rare variant analysis of binary traits with unbalanced case-control ratios where we introduce a fast, approximate Firth logistic regression test. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach compared to several existing methods using quantitative and binary traits from the UK Biobank dataset with up to 407,746 individuals. Competing Interest Statement: All of the authors are current employees and/or stockholders of Regeneron Pharmaceuticals.