PT - JOURNAL ARTICLE
AU - Bowen Jin
AU - John A. Capra
AU - Penelope Benchek
AU - Nicholas Wheeler
AU - Adam C. Naj
AU - Kara L. Hamilton-Nelson
AU - John J. Farrell
AU - Yuk Yee Leung
AU - Brian Kunkle
AU - Badri Vadarajan
AU - Gerard D. Schellenberg
AU - Richard Mayeux
AU - Li-san Wang
AU - Lindsay A. Farrer
AU - Margaret A. Pericak-Vance
AU - Eden R. Martin
AU - Jonathan L. Haines
AU - Dana C. Crawford
AU - William S. Bush
TI - An Association Test of the Spatial Distribution of Rare Missense Variants within Protein Structures Improves Statistical Power of Sequencing Studies
AID - 10.1101/2021.08.09.455695
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.08.09.455695
4099 - http://biorxiv.org/content/early/2021/08/10/2021.08.09.455695.short
4100 - http://biorxiv.org/content/early/2021/08/10/2021.08.09.455695.full
AB - In the Alzheimer’s Disease Sequencing Project Whole Exome Sequencing (ADSP WES) data, over 90% of variants are rare, and 50% of them are singletons. However, both single-variant tests and unit-based tests have limited statistical power to detect associations between rare variants and phenotypes. To best utilize rare variants and investigate their biological effect, we examine their association with phenotypes in the context of protein structure. We developed a protein structure-based approach, POKEMON (Protein Optimized Kernel Evaluation of Missense Nucleotides), which evaluates rare missense variants based on their spatial distribution within the protein rather than their allele frequency. The underlying hypothesis is that the three-dimensional spatial distribution of variants within a protein structure provides functional context and improves the power of association tests. POKEMON identified four candidate genes from the ADSP WES data, namely two known Alzheimer’s disease (AD) genes (TREM2 and SORL) and two novel genes (DUSP18 and CSF1R).
For the known AD genes, the signal from the spatial cluster remains stable even when known AD risk variants are excluded, indicating the presence of additional low-frequency risk variants within these genes. DUSP18 has a cluster of variants, shared primarily by case subjects, around the ligand-binding domain, and this cluster was further validated in a replication dataset with a larger sample size. POKEMON is an open-source tool available at https://github.com/bushlab-genomics/POKEMON. Competing Interest Statement: The authors have declared no competing interest.