Abstract
Whole-genome and whole-exome sequencing studies are used to test the association of rare genetic variants with health traits. Many existing whole-genome sequencing (WGS) efforts now aggregate data from heterogeneous groups, e.g. combining sets of individuals of European and African ancestries. Here we investigate the statistical implications, for rare variant association testing with a binary trait, of combining heterogeneous studies, defined as studies with potentially different disease proportions and different frequencies of variant carriers. Using simulations, we compare the type 1 error control and power of the naïve Score test, the saddlepoint approximation to the score test (SPA test), and the BinomiRare test in a range of settings, focusing on low numbers of variant carriers. Taking into account test performance as well as computational considerations, we develop recommendations for association analysis of rare genetic variants. We show that the Score test is preferred when the case proportion in the sample is 50%. Otherwise, for very low numbers of carriers, BinomiRare is preferred due to its computational efficiency and type 1 error control. When there are at least 90 carriers in the combined sample, the SPA test generally controls the type 1 error and is preferred over BinomiRare due to its higher power and wider implementation in software packages. Finally, we recommend against sampling controls to obtain a more balanced case-control ratio, because sampling controls reduces power; appropriate analytic methods should be used instead.
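The recommendations above can be read as a simple decision rule. The following Python sketch is only an illustration of that reading, not the authors' implementation: the function name `recommend_test`, its parameters, and the tolerance `balance_tol` are hypothetical. The abstract does not define where "very low" carrier counts end, so treating every count below 90 as the BinomiRare regime is an assumption.

```python
def recommend_test(n_cases, n_controls, n_carriers,
                   spa_min_carriers=90, balance_tol=0.0):
    """Return the name of the test suggested by the abstract's guidance.

    Hypothetical helper: names, defaults, and the handling of carrier
    counts below `spa_min_carriers` are illustrative assumptions.
    """
    n_total = n_cases + n_controls
    if n_total <= 0:
        raise ValueError("sample must contain at least one individual")

    case_proportion = n_cases / n_total

    # Score test is preferred when the case proportion is 50%.
    if abs(case_proportion - 0.5) <= balance_tol:
        return "Score"

    # With at least 90 carriers in the combined sample, the SPA test
    # generally controls type 1 error and has higher power.
    if n_carriers >= spa_min_carriers:
        return "SPA"

    # Assumption: all smaller carrier counts are treated as "very low".
    return "BinomiRare"


if __name__ == "__main__":
    print(recommend_test(n_cases=500, n_controls=500, n_carriers=10))
    print(recommend_test(n_cases=100, n_controls=900, n_carriers=120))
    print(recommend_test(n_cases=100, n_controls=900, n_carriers=25))
```

Note that the rule takes the full, unsampled set of controls as input, in line with the recommendation not to subsample controls to balance the case-control ratio.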
Competing Interest Statement
The authors have declared no competing interest.