PT  - JOURNAL ARTICLE
AU  - Ronald Yurko
AU  - Max G’Sell
AU  - Kathryn Roeder
AU  - Bernie Devlin
TI  - Application of post-selection inference to multi-omics data yields insights into the etiologies of human diseases
AID  - 10.1101/806471
DP  - 2019 Jan 01
TA  - bioRxiv
PG  - 806471
4099  - http://biorxiv.org/content/early/2019/10/16/806471.short
4100  - http://biorxiv.org/content/early/2019/10/16/806471.full
AB  - To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of post-selection inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive p-value thresholding (Lei & Fithian 2018) (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association p-values play the role of the primary data for AdaPT; SNPs are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: independent GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene-gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations, which is especially apparent when using gene expression information from the developing human prefrontal cortex (Werling et al. 2019), as compared to adult tissue samples from the GTEx Consortium. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.