ABSTRACT
Genetic signal detection in genome-wide association studies (GWAS) can be enhanced by pooling small signals from multiple Single Nucleotide Polymorphisms (SNPs), e.g. across genes and pathways. Because genes are believed to influence traits via gene expression, it is of interest to combine information from expression Quantitative Trait Loci (eQTLs) in a gene or in genes of the same pathway. Such methods, widely referred to as transcriptomic, already exist for gene analysis, e.g. our group’s Joint Effect on Phenotype of eQTLs associated with a Gene in Mixed cohorts (JEPEGMIX/JEPEGMIX2). However, because computing linkage disequilibrium (LD) across large regions has a computational burden that is quadratic in the number of SNPs, transcriptomic methods are not yet available for arbitrarily large pathways/gene sets. To overcome this obstacle, we propose JEPEGMIX2-pathways (JEPEGMIX2-P), which implements a novel transcriptomic pathway method with a desirable linear computational burden. It 1) automatically estimates the ethnic composition (weights) of the cohort using a very large and diverse reference panel (33K subjects, including ∼11K Han Chinese), 2) uses these weights and the reference panel to estimate the LD between gene transcriptomic statistics and 3) uses the estimated LD values along with GWAS summary statistics to rapidly test for association between the trait and the expression of genes, even in the largest pathways. To underline its potential for increasing the power to uncover genetic signals over state-of-the-art and commonly used non-transcriptomic (agnostic) methods, e.g. MAGMA, we applied JEPEGMIX2-P to summary statistics of several large meta-analyses from the Psychiatric Genomics Consortium (PGC). Surprisingly, most of the significant pathways it detected do not seem to be directly involved in the activity of the central nervous system.
While our work is only a first step toward the end goal of clinical translation, the PGC anorexia results suggest possible avenues for (personalized) treatment.
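Steps 1) and 2) can be illustrated with a minimal NumPy sketch. It is not the JEPEGMIX2-P implementation: the functions `project_to_simplex`, `estimate_mixture_weights` and `mixture_ld` are hypothetical, and the sketch assumes (i) that the ethnic weights are obtained by least squares of the cohort allele frequencies on the reference population frequencies, constrained to the simplex (non-negative weights summing to 1), and (ii) that the cohort LD is the weighted genotype correlation in the reference panel, with each population contributing a total mass equal to its weight.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def estimate_mixture_weights(cohort_freq, ref_freq, n_iter=5000):
    """Step 1 (hypothetical): w >= 0, sum(w) = 1, minimising ||ref_freq @ w - cohort_freq||^2.

    cohort_freq: (m,) cohort allele frequencies, e.g. as reported with GWAS summary statistics.
    ref_freq:    (m, K) frequencies of the same m SNPs in K reference populations.
    Solved by projected gradient descent with step 1/L (L = largest eigenvalue of the Hessian).
    """
    K = ref_freq.shape[1]
    w = np.full(K, 1.0 / K)
    step = 1.0 / np.linalg.eigvalsh(ref_freq.T @ ref_freq).max()
    for _ in range(n_iter):
        grad = ref_freq.T @ (ref_freq @ w - cohort_freq)
        w = project_to_simplex(w - step * grad)
    return w


def mixture_ld(genotypes, pop_labels, weights):
    """Step 2 (hypothetical): SNP correlation matrix for a cohort with ancestry proportions `weights`.

    genotypes:  (n_ref, m) allele counts (0/1/2) of reference-panel subjects.
    pop_labels: (n_ref,) integer population index of each subject.
    Each subject of population k gets weight w_k / n_k, so population k contributes total mass w_k;
    the weighted covariance then also captures LD induced by allele-frequency differences
    between populations. Assumes no SNP is monomorphic in the weighted panel.
    """
    counts = np.bincount(pop_labels, minlength=len(weights))
    subj_w = weights[pop_labels] / counts[pop_labels]
    centred = genotypes - subj_w @ genotypes  # subtract the weighted mean of each SNP
    cov = centred.T @ (centred * subj_w[:, None])
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Toy data: 200 SNPs, 3 reference populations with random allele frequencies.
rng = np.random.default_rng(0)
m, K = 200, 3
ref_freq = rng.uniform(0.05, 0.95, size=(m, K))
w_true = np.array([0.6, 0.3, 0.1])
w_hat = estimate_mixture_weights(ref_freq @ w_true, ref_freq)  # noiseless cohort frequencies

# 100 simulated reference subjects per population; LD for the first 5 SNPs.
labels = np.repeat(np.arange(K), 100)
geno = rng.binomial(2, ref_freq[:5, labels].T).astype(float)
R = mixture_ld(geno, labels, w_hat)
```

Weighting subjects rather than averaging per-population LD matrices is a deliberate choice in this sketch: a plain average of within-population correlations would miss the extra LD that arises purely from mixing populations with different allele frequencies.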
Author summary
By using summary statistics from genetic studies to infer the association between traits and gene expression, a biologically relevant measure, transcriptomic methods are a promising avenue for uncovering risk genes and pathways of genes for complex human diseases. While numerous such transcriptomic methods have been used to uncover a large number of gene-level signals, none of them has been successfully extended, because of the extreme computational burden, to detect signals at the pathway level, which is probably even more biologically relevant. In this paper we propose a novel transcriptomic pathway method that has a computational burden close to the minimum attainable and is applicable “as-is” to ethnically diverse studies. The proposed method adequately controls the false positive rate. Its application to psychiatric disorder studies unveils numerous new signals that were not detected by state-of-the-art non-transcriptomic (agnostic) methods.
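Why a pathway test can have linear cost can be illustrated with a generic sketch, again an assumption rather than the published JEPEGMIX2-P statistic. Here gene statistics are taken as standardized eQTL-weighted sums of SNP z-scores, the pathway statistic as the standardized sum of gene statistics, and LD between genes farther apart than a window is treated as zero, so covariances are accumulated only over neighbouring genes. `Gene`, `pathway_test` and the 1 Mb default window are hypothetical; the toy simulation shows how a false positive rate check under the null might look.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class Gene:
    """Hypothetical gene record: position (bp), indices of its eQTL SNPs, expression weights."""
    pos: int
    snps: np.ndarray
    weights: np.ndarray


def pathway_test(genes, snp_z, ld, window=1_000_000):
    """Return (pathway z, two-sided p) for a sum of standardized gene transcriptomic statistics.

    ld(idx_a, idx_b) returns the SNP-by-SNP LD block between two index sets.
    LD between genes more than `window` bp apart is treated as zero, so each gene is
    compared only with its neighbours: cost is linear in len(genes) for bounded gene density.
    """
    genes = sorted(genes, key=lambda g: g.pos)  # assumes a single chromosome
    sd = [math.sqrt(g.weights @ ld(g.snps, g.snps) @ g.weights) for g in genes]
    gene_z = [g.weights @ snp_z[g.snps] / s for g, s in zip(genes, sd)]
    var = float(len(genes))  # each standardized gene statistic has variance 1
    for i, gi in enumerate(genes):
        for j in range(i + 1, len(genes)):
            gj = genes[j]
            if gj.pos - gi.pos > window:
                break  # genes are sorted, so every later gene is even farther away
            cov = gi.weights @ ld(gi.snps, gj.snps) @ gj.weights
            var += 2.0 * cov / (sd[i] * sd[j])
    z = sum(gene_z) / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Toy example: 30 SNPs with AR(1) LD, 5 genes of 6 consecutive SNPs, 100 kb apart.
rng = np.random.default_rng(1)
m = 30
R = 0.5 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))


def ld(a, b):
    return R[np.ix_(a, b)]


genes = [Gene(pos=100_000 * g, snps=np.arange(6 * g, 6 * g + 6), weights=rng.normal(size=6))
         for g in range(5)]

# Under the null (SNP z-scores drawn from N(0, R)) about 5% of tests should have p < 0.05.
null_z = rng.multivariate_normal(np.zeros(m), R, size=2000)
rate = np.mean([pathway_test(genes, zs, ld)[1] < 0.05 for zs in null_z])
```

Treating LD as zero beyond a fixed window is a common approximation; with a bounded number of genes per window, the double loop visits O(G) gene pairs instead of the O(G²) pairs needed for the full gene-by-gene covariance.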