Abstract
Transcriptome-wide association studies (TWAS) integrate GWAS and expression quantitative trait locus (eQTL) datasets to discover candidate causal gene-trait associations. We integrate multi-tissue expression panels and summary GWAS for LDL cholesterol and Crohn’s disease to show that TWAS are highly vulnerable to discovering non-causal genes, because variants at a single GWAS hit locus are often eQTLs for multiple genes. TWAS exhibit acute instability when the tissue of the expression panel is changed: candidate causal genes that are TWAS hits in one tissue are usually no longer hits in another, due to lack of expression or strong eQTLs, even though non-causal genes at the same loci remain. Because of these vulnerabilities, it is invalid to use TWAS as a method for finding causal genes, though it can be used as a weighted burden test to identify trait-associated loci. More broadly, our results showcase limitations of using expression variation across individuals to determine causal genes at GWAS loci.
Introduction
Transcriptome-wide association studies (TWAS) are a recent family of methods that leverage expression reference panels (eQTL cohorts with expression and genotype data) to discover associations in GWAS datasets1,2. TWAS begin by building predictive models of gene expression from allele counts (typically using variants within a window of 500 kb or 1 Mb around the gene), then use these models to predict expression for each individual in the GWAS cohort and associate this predicted expression with the trait (Fig. 1).
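The predict-then-associate step can be illustrated with a minimal sketch. It assumes the expression weights have already been trained in a reference panel; the genotypes, weights, and trait values below are toy numbers invented for illustration, and the helper names are hypothetical rather than part of any TWAS software.

```python
import math

def predict_expression(genotypes, weights):
    """Predicted expression = weighted sum of allele counts for each individual."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Toy GWAS cohort: allele counts (0, 1, 2) at two cis variants for six individuals.
genotypes = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0], [2, 2]]
# Hypothetical weights from an expression model trained in a reference panel.
weights = [0.6, -0.3]
# Toy trait values for the same six individuals.
trait = [0.1, 1.2, 1.9, 0.4, 0.2, 1.0]

predicted = predict_expression(genotypes, weights)
r = pearson(predicted, trait)
# Association t statistic for a simple linear regression of trait on predicted expression.
t_stat = r * math.sqrt((len(trait) - 2) / (1 - r ** 2))
print(predicted, r, t_stat)
```

In practice the association is tested across a whole GWAS cohort or from summary statistics; the sketch only shows the order of operations.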
TWAS have garnered substantial interest within the human genetics community and have subsequently been conducted for a wide variety of traits and tissues3. A key reason for the appeal of TWAS is the promise that gene-disease associations represent likely causal genes, although both original TWAS papers1,2 are careful not to claim causality with absolute certainty.
Alternatively, TWAS can be interpreted as a weighted burden test. All existing TWAS methods use a linear expression model, which means that TWAS is equivalent to testing a linear combination of variants against the phenotype, where the weights of the linear combination have been chosen based on how much the variant is predicted to contribute to expression variation across individuals in the reference panel. The goal of a weighted burden test is to increase power relative to single-variant testing (GWAS).
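The burden-test view can be made concrete with a short sketch. It uses a common form of the summary-statistic test, z = wᵀz_GWAS / sqrt(wᵀRw), where R is the LD (correlation) matrix among the variants; we assume this form here for illustration, and all numbers are toy values.

```python
import math

def twas_z(weights, gwas_z, ld):
    """Weighted burden statistic: linear combination of GWAS z-scores,
    scaled by the standard deviation implied by the LD matrix."""
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)

# Toy locus with three variants (values are illustrative only).
weights = [0.5, 0.0, -0.2]   # hypothetical expression-model weights
gwas_z = [4.0, 3.5, -1.0]    # single-variant GWAS z-scores
ld = [[1.0, 0.8, 0.1],
      [0.8, 1.0, 0.0],
      [0.1, 0.0, 1.0]]

z = twas_z(weights, gwas_z, ld)
# With all weight on one variant, the statistic reduces to that variant's GWAS z-score.
z_single = twas_z([1.0, 0.0, 0.0], gwas_z, ld)
print(z, z_single)
```

The single-variant case shows why a gene whose model is dominated by a GWAS hit variant inherits that variant's association signal.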
Results
TWAS loci frequently contain multiple hit genes
It is well known that GWAS rarely identifies single variant-trait associations, but instead identifies blocks of associated variants in linkage disequilibrium (LD) with each other (Fig. 1a). Unexpectedly, TWAS also frequently identifies multiple hit genes per locus (Fig. 1b). We call this phenomenon co-regulation.
What is the cause of this phenomenon? To answer this question, we performed TWAS in two traits and two tissues with Fusion, using GWAS summary statistics for LDL cholesterol4 and Crohn’s disease5 and the 522 liver and 447 whole blood expression samples from the STARNET cohort6 (Fig. S2, Online Methods). We clumped hit genes within 2.5 Mb and found that while some loci contained only a single hit gene, many contained two, three, four or even up to eleven (Fig. S3).
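One plausible way to clump hit genes into loci is sketched below. The exact clumping rule is not specified in the text, so the greedy rule here (a gene joins the current locus if it lies within 2.5 Mb of the previous hit gene on the same chromosome) is an assumption, and the gene names and positions are hypothetical.

```python
def clump_hits(hits, window=2_500_000):
    """Group hit genes into loci.

    hits: list of (gene, chromosome, position) tuples.
    Returns a list of loci, each a list of gene names.
    """
    loci = []
    previous = None  # (chromosome, position) of the last gene seen
    for gene, chrom, pos in sorted(hits, key=lambda h: (h[1], h[2])):
        same_locus = (previous is not None
                      and chrom == previous[0]
                      and pos - previous[1] <= window)
        if same_locus:
            loci[-1].append(gene)
        else:
            loci.append([gene])
        previous = (chrom, pos)
    return loci

# Hypothetical hit genes with chromosome and position in base pairs.
hits = [
    ("geneA", "1", 1_000_000),
    ("geneB", "1", 2_000_000),
    ("geneC", "1", 9_000_000),
    ("geneD", "2", 1_500_000),
]
loci = clump_hits(hits)
genes_per_locus = [len(locus) for locus in loci]
print(loci, genes_per_locus)
```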
Correlated expression across individuals may lead to non-causal TWAS hit genes
The conventional way co-regulation is measured is by correlating the expression of a pair of genes across individuals in an expression cohort. Do genes that have correlated expression with a strong TWAS hit also tend to be TWAS hits (Fig. 5a)? We analyzed the locus containing the strongest hit gene across all four TWAS, SORT1 in LDL/liver (TWAS p < 1 × 10^−243; Fig. 2a). SORT1 has strong evidence of causality, though not without some controversy over the precise mechanism: in mouse models, overexpression of SORT1 in liver reduced plasma LDL levels and siRNA knockdown increased plasma LDL levels7, 8, though in other studies deletion of SORT1 counter-intuitively reduced, rather than increased, atherosclerosis in mice without affecting plasma LDL levels9, 10, 11.
The SORT1 locus contains 8 other TWAS hit genes besides SORT1, and their TWAS p values are highly related to their expression correlation with SORT1 (Spearman = 0.75; Fig. 2b). Given that SORT1 has strong evidence of causality, and that other genes at the locus lack strong literature evidence, the most parsimonious explanation is that most or all of the other genes are non-causal and are only hits due to their correlation with SORT1.
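A rank correlation of this kind can be computed as in the sketch below. We assume that hit strength is expressed as −log10(p) so that larger values mean stronger hits; the five data points are invented for illustration and are not the values in Fig. 2b.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy values for five neighbouring genes (illustrative only).
neg_log10_p = [40, 12, 25, 8, 60]              # TWAS hit strength
expr_corr = [0.7, 0.3, 0.2, 0.1, 0.9]          # expression correlation with the lead gene
rho = spearman(neg_log10_p, expr_corr)
print(rho)
```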
Correlated predicted expression is sufficient for non-causal hits even without correlated total expression
However, expression correlation is not the whole story: after all, TWAS tests for association with predicted expression, not total expression. Total expression includes both genetic and environmental components, and the genetic component of expression includes contributions from common cis eQTLs (the only component reliably detectable in current TWAS methods), rare cis eQTLs, and trans eQTLs. Predicted expression likely only represents a small component of the GWAS individuals’ total expression: a large-scale twin study12 found that common cis eQTLs explain only about 10% of genetic variation in gene expression.
While predicted expression correlations between genes at the same locus are often similar to total expression correlations, they are generally slightly higher, and sometimes substantially (Fig. 3a, Fig. S4). It is sensible for nearby genes to be more tightly co-regulated at the level of cis expression than at the level of total expression, since even if distinct trans and environmental effects act on the two genes, they do at least share the same cis sequence context.
Predicted expression correlation may lead to non-causal hits even for genes with low total expression correlation (Fig. 5b). For instance, SARS is the main outlier in Fig. 2b because, despite having a similar TWAS p value to SORT1, it has an unexpectedly low expression correlation of approximately 0.2; yet it is still a strong hit because of its high predicted expression correlation of approximately 0.9 (Fig. 3a).
Another example is the IRF2BP2 locus in LDL/liver (Fig. 3b), where RP4-781K5.7 is a likely non-causal hit due to predicted expression correlation with IRF2BP2, a gene encoding an inflammation-suppressing regulatory factor with strong evidence of causality from mouse models, at least at the level of atherosclerosis13. While there is almost no correlation in total expression between the two genes (Pearson = −0.02), IRF2BP2’s expression model includes a GWAS hit variant, rs556107, with a negative weight while RP4-781K5.7’s includes the same variant, as well as two other linked variants, with positive weights (Fig. 3c), resulting in almost perfectly anti-correlated predicted expression between the two genes (Pearson = −0.94).
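The way a shared variant with opposite-sign weights produces anti-correlated predicted expression can be sketched as follows. The sketch assumes standardized genotypes, so that the covariance between two genes' predicted expression is w_aᵀRw_b with R the LD correlation matrix; the weights and LD values are toy numbers, not those of the IRF2BP2 and RP4-781K5.7 models.

```python
import math

def quad(u, ld, v):
    """Bilinear form u' R v."""
    return sum(u[i] * ld[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def predicted_expression_corr(w_a, w_b, ld):
    """Correlation between two genes' predicted expression, given their
    model weights over the same variants and the LD matrix."""
    return quad(w_a, ld, w_b) / math.sqrt(quad(w_a, ld, w_a) * quad(w_b, ld, w_b))

# Three tightly linked variants (toy LD values).
ld = [[1.0, 0.9, 0.9],
      [0.9, 1.0, 0.9],
      [0.9, 0.9, 1.0]]
w_a = [-0.5, 0.0, 0.0]   # gene A: shared variant with a negative weight
w_b = [0.4, 0.2, 0.1]    # gene B: same variant plus two linked ones, positive weights

r = predicted_expression_corr(w_a, w_b, ld)
print(r)
```

Because the calculation depends only on the weights and LD, it says nothing about total expression, which can remain uncorrelated between the two genes.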
Shared GWAS variants can cause non-causal hits even without correlated predicted expression
More generally, pairs of genes may share GWAS variants in their models even if they have low predicted expression correlation, since other variants that are distinct between the models may “dilute” the correlation (Fig. 5c). For instance, at the NOD2 locus for Crohn’s/whole blood, NOD2 is a known causal gene14,15, but 4 other genes are also TWAS hits (Fig. 4a), none with strong evidence of causality (though rare variants in one gene, ADCY7, have been associated with ulcerative colitis but not Crohn’s16). The model for the strongest hit at the locus, BRD7, puts most of its weight on rs1872691, which is also the strongest GWAS variant in NOD2’s model (Fig. 4b). However, the NOD2 model puts most of its weight on two other variants, rs7202124 and rs1981760, which are slightly weaker GWAS hits. The result is that even though there is information sharing between the models, and BRD7 appears to be a non-causal hit because its model uses a variant that likely derives its GWAS signal from NOD2, the overall predicted expression correlation between the two genes is very low (-0.03), as is the total expression correlation (0.05).
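The dilution effect can be illustrated with a toy calculation. It assumes standardized genotypes, three mutually unlinked variants (identity LD matrix), and the summary-statistic form z = wᵀz_GWAS / sqrt(wᵀRw); the two genes, their weights, and the GWAS z-scores are hypothetical and do not reproduce the BRD7 or NOD2 models.

```python
import math

def quad(u, ld, v):
    """Bilinear form u' R v."""
    return sum(u[i] * ld[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def predicted_expression_corr(w_a, w_b, ld):
    return quad(w_a, ld, w_b) / math.sqrt(quad(w_a, ld, w_a) * quad(w_b, ld, w_b))

def twas_z(weights, gwas_z, ld):
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    return numerator / math.sqrt(quad(weights, ld, weights))

# Toy locus: variant 0 is the strongest GWAS hit; variants 1 and 2 are slightly weaker.
ld = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 1.0]]
gwas_z = [6.0, 5.0, 5.0]
w_gene_a = [1.0, 0.0, 0.0]   # nearly all weight on the shared variant
w_gene_b = [0.1, 0.7, 0.7]   # shares variant 0, but most weight lies elsewhere

corr_ab = predicted_expression_corr(w_gene_a, w_gene_b, ld)
z_a = twas_z(w_gene_a, gwas_z, ld)
z_b = twas_z(w_gene_b, gwas_z, ld)
print(corr_ab, z_a, z_b)
```

Both toy genes obtain large association statistics while their predicted expression is only weakly correlated, which is why predicted expression correlation alone is not a reliable indicator of shared signal.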
In the most general case, models need not even share the same GWAS variants for there to be non-causal hits (Fig. 5d). For instance, the other two variants in NOD2’s model are neither shared nor in strong LD with any of the variants in BRD7’s model (Fig. 4b). Under the assumption that NOD2 is the only causal gene at the locus, this suggests that these variants are GWAS hits because they (or variants in LD) regulate NOD2 as well as BRD7, but that this connection is missed by NOD2’s model, i.e. the expression modeling has a false negative. This type of scenario might occur even without any false negatives in the expression modeling, e.g. if the two NOD2 variants (or variants in LD) deleteriously affected the coding sequence of NOD2 as well as regulating BRD7.
Using expression from less related tissues substantially worsens the effects of co-regulation
So far, all our TWAS case studies have used expression from tissues with a clear mechanistic relationship to the trait: liver for LDL and whole blood for Crohn’s. What if we swap these tissues (liver for Crohn’s and whole blood for LDL), so that we are using tissues without a clear mechanistic relationship? It is well known that the architecture of eQTLs differs substantially across tissues: even among strong eQTLs in GTEx (p ~ 1 × 10^−10), one quarter switch which gene they are most significantly associated with across tissues17.
We manually curated causal genes from the literature at 9 LDL/liver and 4 Crohn’s/whole blood multi-hit TWAS loci and looked at how their hit strengths changed when swapping tissues (Fig. 6). Strikingly, almost every candidate causal gene (9 of 11 for LDL and 5 of 6 for Crohn’s) was no longer a hit in the “opposite” tissue, either because they were not sufficiently expressed (N = 4: PPARG, LPA, LPIN3, SLC22A4) or because they did not have sufficiently heritable cis expression, according to a likelihood ratio test, to be tested by Fusion (N = 10: SORT1, IRF2BP2, TNKS, FADS3, ALDH2, KPNB1, SLC22A5, IRF1, CARD9, STAT3).
Worse, 15 other genes at the same loci were still hits (8 in LDL/whole blood and 7 in Crohn’s/liver), and 5 were even strong hits with p < 1 × 10^−20. This suggests that the strategy of conducting TWAS in a tissue that is sub-optimal for the trait being examined (e.g. whole blood, lymphoblastoid cell lines), just because that tissue happens to have a large expression reference panel, is especially problematic because many hit loci may contain only non-causal genes and the causal gene may not even be included in the list of hits.
Discussion
We have shown that it is invalid to use TWAS as a method for finding causal genes, since it is highly vulnerable to non-causal gene-trait associations, intuitively because GWAS hits may be eQTLs for multiple genes. However, the ways in which co-regulation may lead to non-causal hits in TWAS are multi-faceted; co-regulation is hard to quantify, let alone correct for. The problem is particularly acute when using expression from tissues without a clear mechanistic relationship to the trait. It is still valid to use TWAS as a weighted burden test, where the goal is not to identify causal genes but merely discover associated loci.
Is it possible, despite the limitations of TWAS, to somehow perform statistical fine-mapping and determine the causal gene or genes? We believe that it is not, even in principle. This is because predicted expression only imperfectly captures cis expression, the component of expression driven by variants near the gene; there are sources of both variance and bias in the expression modeling. The main source of variance is the finite size of the reference panel, although this can be mitigated with resampling methods. More problematically, the choice of tissue, cell type composition and quantification of the expression panel can all introduce bias. We have shown that using a tissue with a less clear mechanistic relationship to the trait hinders the ability to detect most candidate causal genes. Yet diseases rarely act through a single tissue: different genes may be causal in different tissues, so even using a tissue where most genes are causal may introduce bias for the remaining genes that are causal in a different tissue. Furthermore, most expression panels are gathered for tissues, not cell types, and genes may only be causal for a single cell type within a tissue. There may be substantial cell type heterogeneity within and between samples (e.g. due to the presence of blood and immune cells), which can also introduce bias. It is impossible to quantify every source of bias.
In our case studies, we have generally assumed that the single gene with substantial evidence of causality is the sole causal gene at the locus, with some exceptions where there are multiple candidates and the causal gene or genes are under debate (FADS1-3, SLC22A4/5/IRF1). While this is the most parsimonious explanation, it is possible that some loci harbor multiple causal genes. Indeed, under an omnigenic model of complex traits18, every gene may be causal to some degree. Furthermore, the expression of other genes at the locus may causally contribute to the expression of the causal gene, merely by being actively transcribed, even if the gene is non-coding or its protein product has no causal role19.
The vulnerabilities we have identified in TWAS, co-regulation and tissue bias, also apply to other methods that integrate GWAS and expression data. Gene-trait association testing based on Mendelian Randomization (MR)20, 21, 22 is vulnerable to non-causal hits because co-regulation, as a form of pleiotropy, violates one of the core assumptions of MR23. While the HEIDI test20 is designed to correct MR in the case where the two genes have distinct, but linked, causal variants, it does not control for the case where the two genes share the same causal variant. GWAS-eQTL colocalization methods such as Sherlock24, coloc25, 26, QTLMatch27, eCaviar28, enloc29 and RTC30 are also vulnerable to this phenomenon. The more tightly a pair of genes is co-regulated in cis, the more difficult it becomes to distinguish causality based on GWAS and expression data alone. Our results underscore the need for computational and experimental methods that move beyond using expression variation across individuals to determine the causal genes at GWAS loci.
Methods
TWAS were performed with the Fusion software (https://github.com/gusevlab/fusion_twas/tree/9142723485b38610695cea4e7ebb508945ec006c), using default settings and also including polygenic risk score as a possible model during cross-validation in addition to BLUP, Lasso, and ElasticNet. Variants in the STARNET reference panel were filtered for quality control using PLINK31 with the options “--maf 1e-10 --hwe 1e-6 midp --geno”. STARNET expression was processed as described in the STARNET paper6, including probabilistic estimation of expression residuals32 (PEER) covariate correction. Because Fusion, to our knowledge, only supports training on PLINK version 1 hard-call genotype files and not genotype dosages, we trained expression models on only the variants both genotyped in STARNET and either genotyped or imputed in the GWAS, filtering out variants without matching strands between the GWAS and STARNET. Expression models were trained on all remaining variants within 500 kb of a gene’s TSS, using Ensembl v87 TSS annotations for hg19 33. Linkage disequilibrium and total and predicted expression correlations were calculated across individuals in STARNET. Code to replicate the post-TWAS analysis is available at https://github.com/Wainberg/Vulnerabilities_of_TWAS.
Author Contributions
M.W., M.R. and A.K. conceived of the study. M.W. performed analyses. N.S.-A., D.K. and D.G. provided intellectual input. R.E., A.R., T.Q., K.H. and J.B. provided assistance with the STARNET dataset. M.R. and A.K. supervised the study. M.W., M.R. and A.K. wrote the manuscript. All authors reviewed the manuscript.
Competing Financial Interests
The authors declare no competing financial interests.
Supplemental Information
Acknowledgements
We gratefully acknowledge Jonathan Pritchard and Hua Tang for helpful discussions. This work was funded in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) (grant PGSD3-476082-2015 to M.W.), Stanford Bio-X Bowes fellowship (to M.W.) and NIH grant 1DP2OD022870 (to A.K.).