PT - JOURNAL ARTICLE
AU - MJ. Emond
AU - T. Eoin West
TI - Extreme Sampling for Genetic Rare Variant Association Analysis of Dichotomous Traits with Focus on Infectious Disease Susceptibility
AID - 10.1101/2021.12.02.470949
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.12.02.470949
4099 - http://biorxiv.org/content/early/2021/12/02/2021.12.02.470949.1.short
4100 - http://biorxiv.org/content/early/2021/12/02/2021.12.02.470949.1.full
AB - As genomic sequencing becomes more accurate and less costly, large cohorts and consortiums of cohorts are providing high power for rare variant association studies for many conditions. When large sample sizes are not attainable and the phenotype under study is continuous, an extreme phenotypes design can provide high statistical power with a small to moderate sample size. We extend the extreme phenotypes design to the dichotomous infectious disease outcome by sampling on extremes of the pathogenic exposure instead of sampling on extremes of phenotype. We use a likelihood ratio test (LRT) to test the significance of association between infection status and presence of susceptibility rare variants. More than 10 billion simulations are studied to assess the method. The method results in high sample enrichment for rare variants affecting susceptibility. Greater than 90% power to detect rare variant associations is attained in reasonable scenarios. The ordinary case-control design requires orders of magnitude more samples to achieve the same power. The Type I error rate of the LRT is accurate even for p-values < 10^-7. We find that erroneous exposure assessment can lead to power loss more severe than excluding the observations with errors. Nevertheless, careful sampling on exposure extremes can make a study feasible by providing adequate statistical power. Limitations of this method are not unique to this design, and the power is never less than that of the ordinary case-control design.
The method applies without modification to other dichotomous outcomes that have strong association with a continuous covariate.
Competing Interest Statement: The authors have declared no competing interest.
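The abstract names a likelihood ratio test of association between infection status and rare variant carrier status but does not give its form. As a rough illustration only, the sketch below applies a generic binomial LRT to a 2x2 table of carriers versus non-carriers. This is an assumption made for illustration, not the authors' test, which presumably also accounts for the exposure-based sampling. The function names and the example counts are hypothetical.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood of k events in n trials (constant term omitted)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def lrt_2x2(inf_carriers, n_carriers, inf_noncarriers, n_noncarriers):
    """Generic LRT: one shared infection probability vs. one per carrier group.

    Returns (statistic, p_value), using the asymptotic chi-square
    distribution with 1 degree of freedom.
    """
    n = n_carriers + n_noncarriers
    k = inf_carriers + inf_noncarriers

    # Null model: the same infection probability in both groups.
    ll_null = binom_loglik(k, n, k / n)

    # Alternative model: a separate infection probability for each group.
    ll_alt = (
        binom_loglik(inf_carriers, n_carriers, inf_carriers / n_carriers)
        + binom_loglik(inf_noncarriers, n_noncarriers,
                       inf_noncarriers / n_noncarriers)
    )

    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 9 of 10 carriers infected, 10 of 100 non-carriers.
    stat, p = lrt_2x2(9, 10, 10, 100)
    print(f"LRT statistic = {stat:.2f}, p-value = {p:.2e}")
```

The chi-square approximation used here is asymptotic, so with very sparse carrier counts it may not hold well; the abstract's claim about Type I error accuracy refers to the authors' own LRT and simulations, not to this sketch.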