PT - JOURNAL ARTICLE
AU - Rounak Dey
AU - Ellen M. Schmidt
AU - Goncalo R. Abecasis
AU - Seunggeun Lee
TI - A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS
AID - 10.1101/109876
DP - 2017 Jan 01
TA - bioRxiv
PG - 109876
4099 - http://biorxiv.org/content/early/2017/04/06/109876.short
4100 - http://biorxiv.org/content/early/2017/04/06/109876.full
AB - The availability of electronic health record (EHR)-based phenotypes allows for genome-wide association analyses in thousands of traits, and has great potential to identify novel genetic variants associated with clinical phenotypes. We can interpret the phenome-wide association study (PheWAS) result for a single genetic variant by observing its association across a landscape of phenotypes. Because PheWAS can test thousands of binary phenotypes, most of which have unbalanced (case:control = 1:10) or often extremely unbalanced (case:control = 1:600) case-control ratios, existing methods cannot provide an accurate and scalable way to test for associations. Here we propose a computationally fast score test-based method that estimates the distribution of the test statistic using the saddlepoint approximation. Our method is roughly 100 times faster than the state-of-the-art Firth’s test. It can also adjust for covariates and control type I error rates even when the case-control ratio is extremely unbalanced. Through application to PheWAS data from the Michigan Genomics Initiative, we show that the proposed method can control type I error rates while replicating previously known association signals, even for traits with a very small number of cases and a large number of controls.
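
The abstract's core idea, a score test whose null distribution is estimated by a saddlepoint approximation, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the fitted null case probabilities `mu` are already available (for example from a logistic regression on covariates), treats the genotypes `g` as fixed weights with no covariate adjustment, uses the standard Barndorff-Nielsen tail formula, and finds the saddlepoint by plain bisection. All function names are hypothetical.

```python
import math

def _tilted(mu, x):
    """mu*e^x / (1 - mu + mu*e^x), written to avoid overflow (needs 0 < mu < 1)."""
    if x >= 0:
        return mu / (mu + (1.0 - mu) * math.exp(-x))
    e = math.exp(x)
    return mu * e / (1.0 - mu + mu * e)

def _log_mgf(mu, x):
    """log(1 - mu + mu*e^x), written to avoid overflow (needs 0 < mu < 1)."""
    if x >= 0:
        return x + math.log(mu + (1.0 - mu) * math.exp(-x))
    return math.log1p(mu * math.expm1(x))

def normal_upper_tail(g, mu, s):
    """Normal approximation to P(S > s); requires a nonzero null variance."""
    var = sum(gi * gi * mi * (1.0 - mi) for gi, mi in zip(g, mu))
    return 0.5 * math.erfc(s / math.sqrt(2.0 * var))

def spa_upper_tail(g, mu, s):
    """Saddlepoint approximation to P(S > s), s > 0, where
    S = sum_i g_i * (Y_i - mu_i) and Y_i ~ Bernoulli(mu_i) independently."""
    if s <= 0:
        raise ValueError("this sketch only handles s > 0")
    # Largest value S can take; beyond it the tail probability is exactly 0.
    s_max = sum(max(gi * (1.0 - mi), -gi * mi) for gi, mi in zip(g, mu))
    if s >= s_max:
        return 0.0

    def k1(t):
        # First derivative of the cumulant generating function K(t).
        return sum(gi * (_tilted(mi, gi * t) - mi) for gi, mi in zip(g, mu))

    # K'(t) is increasing and K'(0) = 0, so the root of K'(t) = s is positive.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        if k1(hi) >= s:
            break
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if k1(mid) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    k0 = sum(_log_mgf(mi, gi * t) - t * gi * mi for gi, mi in zip(g, mu))
    k2 = 0.0
    for gi, mi in zip(g, mu):
        p = _tilted(mi, gi * t)
        k2 += gi * gi * p * (1.0 - p)

    arg = t * s - k0
    if arg <= 0.0 or k2 <= 0.0:
        # Crude fallback for numerically degenerate cases.
        return normal_upper_tail(g, mu, s)
    w = math.sqrt(2.0 * arg)
    v = t * math.sqrt(k2)
    z = w + math.log(v / w) / w
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def spa_two_sided_pvalue(g, y, mu):
    """Two-sided p-value: P(S > |s|) + P(S < -|s|) at the observed score s."""
    s = sum(gi * (yi - mi) for gi, yi, mi in zip(g, y, mu))
    a = abs(s)
    if a == 0.0:
        return 1.0
    neg_g = [-gi for gi in g]  # P(S < -a) equals P(-S > a)
    return min(1.0, spa_upper_tail(g, mu, a) + spa_upper_tail(neg_g, mu, a))
```

With balanced case probabilities the saddlepoint and normal tails nearly coincide. With a small case probability the score statistic is skewed, and the normal tail can understate the upper-tail probability substantially, which is the situation the abstract describes for unbalanced case-control ratios.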