Abstract
Knowledge of trans-acting expression quantitative trait loci (trans-eQTLs) regulating distant target genes can reveal biological mechanisms that link single nucleotide polymorphisms (SNPs) with complex traits. However, identifying trans-eQTLs is challenging because their effect sizes are typically small and simple regression of millions of SNPs against the expression of each gene imposes a severe multiple testing burden. Here we present Tejaas, an efficient method to discover trans-eQTLs using L2-regularized ‘reverse’ multiple regression of the gene expressions against each SNP. Tejaas aggregates evidence of small trans-effects from all distant target genes simultaneously while being robust against the strong correlation of the gene expressions. Tejaas, coupled with a novel k-nearest neighbors algorithm for unsupervised confounder correction, discovers 18 851 unique trans-eQTLs across 49 tissues from the GTEx (v8) data. They are enriched in several functional signatures, including mediation via proximal genes, chromatin accessibility and occurrence in enhancer and promoter regions. Several trans-eQTLs overlap with disease-associated SNPs and reveal underlying transcriptional regulation mechanisms. Tejaas is available at https://github.com/soedinglab/tejaas
Introduction
Over the last decade, genome-wide association studies (GWASs) have identified over 100 000 unique associations between single nucleotide polymorphisms (SNPs) and human traits [1, 2]. However, our understanding of the underlying mechanism through which SNPs influence the risk of complex, non-infectious diseases has not grown in proportion because more than 90% of the SNPs identified by GWAS do not reside in coding regions [3].
Several lines of evidence suggest the involvement of these SNPs in regulation of intermediate cellular phenotypes [4], including gene expression levels [5], chromatin accessibility [6], chromatin state [7] and protein abundance [8]. SNPs that are associated with the gene expression levels are called expression quantitative trait loci (eQTL). For example, non-coding SNPs lying in cell-type specific enhancer regions can alter the expression of target genes [9], which can then increase or decrease disease risk [10].
eQTLs that are proximal (< 1Mb) to the genes they regulate are called cis-eQTLs, and eQTLs that regulate distal genes located elsewhere in the genome are called trans-eQTLs. Heritability estimates from 856 female twins suggest that, on average, cis-eQTLs explain < 40% of the heritable variation in the gene expression of adipose tissue, lymphoblastoid cell line and skin tissue [11]. For African Americans, cis-eQTLs explain only 12 ± 3% of the heritable gene expression variation in the lymphoblastoid cell line [12]. The remaining heritability of gene expression levels is generally attributed to trans-eQTLs [11, 12].
Discovering trans-eQTLs is important not only for explaining the observed gene expression variations, but also for understanding the transcriptional regulation mechanisms, which can then shed light on the aetiology of complex diseases. For example, a recent ATAC-Seq study [13] identified a single SNP that alters the chromatin accessibility across multiple genomic loci including the BLK region, which is associated with multiple autoimmune diseases. In spite of such growing evidence of long-range regulation, the systematic discovery of trans-eQTLs [14, 15] has hardly advanced due to the enormous statistical challenges involved.
Discovery of eQTLs in silico is possible by analyzing paired genotype and gene expression data collected in parallel from many individuals. Simple linear regression is commonly employed on each SNP-gene pair to test for associations. Cis-eQTLs often have a large effect size and the number of association tests is limited to genes in the vicinity (generally < 1Mb) of each SNP. Therefore, relatively modest sample sizes enable their detection using a simple regression method. In contrast, identification of trans-eQTLs remains a major challenge because they (1) tend to have a smaller effect size, (2) impose a severe multiple testing burden due to the need to examine possible association between each gene and all SNPs across the genome, and (3) are frequently tissue- and context-specific. Therefore, several subjective constraints are imposed to reduce false positives while scanning for trans-eQTLs, for instance, working with a reduced set of SNPs that are associated with disease traits or that have known cis effects. Such subjective constraints might sacrifice the discovery of many true trans-eQTLs.
Scientists are now trying to use known biological signatures of trans-eQTLs to boost the power to detect them. For example, Rakitsch and Stegle [16] developed a two-stage gene-network linear mixed model (GNetLMM), which implicitly assumed that a trans-eQTL is linked to a target trans-eGene via an intermediate cis-associated gene. This property of trans-eQTLs allowed them to construct local, directed gene-regulatory networks and identify exogenous genes that account for hidden variation in the target trans-eGene. Conditioning the trans-eGene on the exogenous gene improved the power for the trans-eQTL association test. Hore et al. [17] assumed another biological signature: Trans-eQTLs create variation in the expression levels of gene networks across tissues. They decomposed the three-dimensional (individual, genes and tissues) array or tensor using a Variational Bayes (VB) approach with sparsity enforced by a spike-and-slab prior to obtain latent components that represent the major modes of variation in the data. They tested each latent component against genetic variation across the genome to discover underlying QTL effects. The VB optimization results in different latent components in separate runs and the authors ensured robustness by only considering latent components that are persistently found across multiple runs.
In this work, we rely on another commonly considered aspect of trans-eQTLs, that they regulate multiple genes simultaneously [18]. Instead of looking at each SNP-gene pair, we try to find SNPs which regulate tens to hundreds of genes. Earlier, Brynedal et al. [19] used cross-phenotype meta-analysis (CPMA) to find trans-eQTLs based on the same property. They evaluated the p-values for the pairwise linear association of a candidate SNP with all available gene expression levels. For the null SNPs with no trans effect, p-values will follow a uniform distribution and the −log p-values will follow an exponential distribution (with unit rate for the natural logarithm). A trans-eQTL will be associated with more genes than expected by chance and the distribution of p-values will be overdispersed near zero. The CPMA statistic estimates the overdispersion near zero. However, a major limitation of this approach arises due to strong correlations among the gene expression levels, which induce strong correlations among the p-values. This leads to overdispersion near zero by chance, increasing the false positive rate and diminishing the power of the method significantly. For example, let us consider a SNP that changes the expression of tens or hundreds of genes. With increasing strength of the gene expression correlation, the probability of finding similar associations to null SNPs by chance increases and the significance of the truly causal SNP decreases.
In order to circumvent the problem of correlation among the gene expression levels, we use multiple regression in the reverse direction by explaining the minor allele counts using the gene expression levels. Since available eQTL data generally have far fewer samples than expressed genes, we used an L2 regularizer (equivalent to a Gaussian prior) to limit model complexity. Our motivation is that multiple regression using a regularizer should help find the causal genes, with directly affected genes explaining away the effect of the genes which are merely correlated. There are two major benefits of our approach: (1) Although the effect sizes of the trans-eQTLs on individual genes are small, the signal is accumulated over many genes, making them easier to discover. (2) The multiple testing problem is reduced significantly because each SNP is tested only once instead of being tested against every gene separately. We note that the work of Brynedal et al. [19] also has the same benefits but suffers from the correlation among the single SNP-gene p-values.
Results
Methods overview
Tejaas (see URLs) is a new tool for discovering trans-eQTLs. It implements the Reverse Regression score (RR-score, qrr) for ranking trans-eQTLs and a non-linear KNN correction for removing confounding effects from the gene expression. We wanted to compare Tejaas with the CPMA statistic of Brynedal et al. [19] because we use the same underlying assumption. As no software is currently available for CPMA, we also implemented the Joint P-value Analysis (JPA-score, qjpa) within Tejaas as an alternative. Both the JPA-score and RR-score are summarized in Fig. 1 and briefly introduced in the ensuing paragraphs. For a detailed discussion, along with an explanation of software usage and the choice of model parameters, please refer to Supplementary Sec. 2.
JPA evaluates the distribution of p-values of the pairwise linear association of a candidate SNP with all available gene expression levels. The null SNPs (no trans-effect) will have a uniform distribution of p-values, while trans-eQTLs will be associated with more genes than expected by chance, leading to overdispersion near zero. We defined the JPA-score (qjpa) as a statistic which estimates whether the distribution of p-values is significantly overdispersed near zero.
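The idea of scoring overdispersion of p-values near zero can be illustrated with a short sketch. This is not the JPA-score as defined in Supplementary Sec. 2; it assumes a CPMA-style likelihood-ratio statistic that compares the fitted rate of −ln(p) against the unit rate expected under the null, and the function names `pairwise_pvalues` and `overdispersion_score` are hypothetical.

```python
import numpy as np
from scipy import stats


def pairwise_pvalues(x, Y):
    """Two-sided p-values for the linear association of one SNP with every gene.

    x : (N,) genotype vector; Y : (G, N) expression matrix.
    Uses the t-test on the Pearson correlation, equivalent to simple regression.
    """
    n = x.shape[0]
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=1, keepdims=True)
    r = (Yc @ xc) / (np.linalg.norm(Yc, axis=1) * np.linalg.norm(xc))
    r = np.clip(r, -0.999999, 0.999999)          # guard against division by zero
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return 2.0 * stats.t.sf(np.abs(t), df=n - 2)


def overdispersion_score(pvals):
    """Likelihood-ratio statistic for -ln(p) ~ Exp(rate) against rate = 1.

    Assumption of this sketch: a CPMA-style statistic, 2n(zbar - 1 - ln zbar),
    where zbar is the mean of -ln(p). It is zero when zbar = 1 and grows when
    the p-values pile up near zero (zbar > 1).
    """
    z = -np.log(np.clip(pvals, 1e-300, 1.0))
    zbar = z.mean()
    return float(2.0 * z.size * (zbar - 1.0 - np.log(zbar)))
```

The statistic treats the p-values as independent, which is exactly the assumption that correlated gene expression violates.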
Reverse Regression (RR) performs a multiple linear regression using expression levels of all genes to explain the genotype of a candidate SNP. In contrast to conventional methods, the direction of the regression is reversed, with the gene expressions as explanatory variables. In brief, let x denote the vector of scaled and centered minor allele counts of a SNP for N samples and Y be the G × N matrix of preprocessed expression levels for G genes. We model x with a normal distribution whose mean depends linearly on the gene expression through a vector of regression coefficients β: x ∼ N(Yᵀβ, σ²I), where σ² denotes the variance of x around this mean.
Generally, the number of explanatory variables (genes) is much larger than the number of samples (G ≫ N) in currently available eQTL data sets. To avoid overtraining, we introduce a normal prior on β, with mean 0 and variance γ²: β ∼ N(0, γ²I).
This L2 regularization pushes the effect size of non-target genes towards zero. Ideally, a spike-and-slab prior should work better than the current model but is analytically intractable and is too slow to approximate. We calculated the significance of the trans-eQTL model (β ≠ 0) compared to the null model (β = 0) using Bayes theorem to define the RR-score (qrr),
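A sketch of how such a score can arise is given here under assumptions not spelled out in this section: the noise variance of the genotype model is written σ² and treated as fixed, and the score is taken to be the log ratio of the marginal likelihoods of the two models. The exact definition of qrr used in Tejaas is given in Supplementary Sec. 2 and may differ in scaling or constant terms.

```latex
\begin{align}
x \mid \beta &\sim \mathcal{N}\!\left(Y^{\mathsf T}\beta,\ \sigma^2 I_N\right), \qquad
\beta \sim \mathcal{N}\!\left(0,\ \gamma^2 I_G\right) \\
x \mid \gamma &\sim \mathcal{N}\!\left(0,\ \sigma^2 I_N + \gamma^2\, Y^{\mathsf T} Y\right) \\
\ln \frac{p(x \mid Y, \gamma)}{p(x \mid \beta = 0)}
  &= \frac{1}{2\sigma^2}\, x^{\mathsf T} W x
   \;-\; \frac{1}{2}\,\ln\det\!\left(I_N + \frac{\gamma^2}{\sigma^2}\, Y^{\mathsf T} Y\right) \\
W &= Y^{\mathsf T}\left(Y Y^{\mathsf T} + \frac{\sigma^2}{\gamma^2}\, I_G\right)^{-1} Y
\end{align}
```

The third line follows from the Woodbury identity applied to the marginal covariance. The determinant term does not depend on x, so under these assumptions SNPs would be ranked by the quadratic form xᵀWx, with σ²/γ² acting as the ridge penalty: a small γ shrinks W towards zero, while a large γ lets W approach the projection onto the row space of Y.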
For each SNP, the null distribution of qrr can be obtained by randomly permuting the sample labels of the genotype multiple times. Note that this null distribution will depend on the minor allele frequency and preprocessing of the SNPs but it is computationally infeasible to obtain the null distribution empirically for each SNP independently. We could, however, analytically obtain the expectation and variance of qrr under this permuted null model. Assuming that the null distribution is Gaussian, we calculated a p-value to get the significance of any observed qrr.
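The scoring and the Gaussian approximation of the null can be sketched as follows. The sketch assumes that the score is the ridge quadratic form xᵀWx with W = Yᵀ(YYᵀ + (σ²/γ²)I)⁻¹Y, which is what a Gaussian prior with a fixed noise variance σ² implies up to constants, and it replaces the analytical expectation and variance described above with empirical moments from a modest number of permutations. The function names and the default `sigma=1.0` are hypothetical.

```python
import numpy as np
from scipy import stats


def shrinkage_basis(Y, gamma, sigma=1.0):
    """Eigenbasis of W = Y^T (Y Y^T + (sigma/gamma)^2 I)^-1 Y via one thin SVD.

    Y : (G, N) expression matrix. Returns U (N, K) and the shrinkage factors
    s_k^2 / (s_k^2 + sigma^2 / gamma^2), so that W = U diag(shrink) U^T.
    """
    lam = (sigma / gamma) ** 2
    U, s, _ = np.linalg.svd(Y.T, full_matrices=False)
    return U, s ** 2 / (s ** 2 + lam)


def rr_score(x, U, shrink):
    """Quadratic form x^T W x evaluated in the eigenbasis of W."""
    proj = U.T @ x
    return float(np.sum(shrink * proj ** 2))


def permutation_pvalue(x, Y, gamma, n_perm=200, seed=0):
    """Score one SNP and approximate its permutation null by a Gaussian.

    Stand-in for the analytical null moments: mean and standard deviation
    are estimated from n_perm random permutations of the genotype.
    """
    rng = np.random.default_rng(seed)
    U, shrink = shrinkage_basis(Y, gamma)
    q_obs = rr_score(x, U, shrink)
    q_null = np.array([rr_score(rng.permutation(x), U, shrink)
                       for _ in range(n_perm)])
    z = (q_obs - q_null.mean()) / q_null.std(ddof=1)
    return q_obs, float(stats.norm.sf(z))
```

Because the SVD of Y is computed once and reused, scoring an additional SNP or permutation costs only a matrix-vector product, which is why one test per SNP remains cheap even for G ≫ N.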
Our method requires the gene expression matrix Y to have full column rank. Any covariate correction method involving linear regression would also reduce the column rank of Y and cannot be used for preprocessing the gene expression for calculating qrr. Therefore, we developed an unsupervised non-linear correction using k-nearest neighbors, which we call KNN correction (Supplementary Sec. 3.2) to remove confounding effects.
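A minimal sketch of a nearest-neighbor correction of this kind is shown here. It assumes that neighbors are found by Euclidean distance between samples in the space of the leading principal components, and that each sample is corrected by subtracting the mean expression of its K nearest neighbors; the procedure in Supplementary Sec. 3.2 may differ in these details, and `knn_correct` with its `n_pcs` parameter is a hypothetical name.

```python
import numpy as np


def knn_correct(Y, k=30, n_pcs=30):
    """Subtract from each sample the mean expression of its k nearest samples.

    Y : (G, N) expression matrix (genes x samples). Returns a (G, N) matrix.
    Assumption of this sketch: neighbors are defined in the space of the
    first n_pcs principal components of the gene-centered expression.
    """
    Yc = Y - Y.mean(axis=1, keepdims=True)
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    n_pcs = min(n_pcs, s.size)
    coords = Vt[:n_pcs].T * s[:n_pcs]             # (N, n_pcs) sample coordinates

    # Squared Euclidean distances between all pairs of samples.
    sq = np.sum(coords ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * coords @ coords.T
    np.fill_diagonal(d2, np.inf)                  # a sample is not its own neighbor

    nbrs = np.argsort(d2, axis=1)[:, :k]          # (N, k) neighbor indices
    neighbor_mean = Y[:, nbrs].mean(axis=2)       # (G, N) mean over each sample's neighbors
    return Y - neighbor_mean
```

The correction uses no covariate labels, which is what makes it unsupervised: samples that share a batch or a hidden confounder tend to be neighbors, so their shared offset cancels in the subtraction.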
Simulation studies
We ran simulations to benchmark Tejaas against existing methods, to compare different preprocessing methods for removing confounders and to estimate the model parameters. Several software packages exist for finding trans-eQTLs using single SNP-gene regression and we used MatrixEQTL [20] as a representative of these methods. As an alternative for CPMA, we used our JPA implementation in Tejaas, henceforth referred to as JPA (~ CPMA).
For the simulations, we used the strategy of Hore et al. [17], the details of which are discussed in the Supplementary Sec. 4. In brief, we sampled I = 12 639 SNPs from the real genotype of the Genotype Tissue Expression (GTEx) project to retain the complexity of real data. We simulated the expression levels for G = 12 639 genes, containing non-genetic signals (background correlation and confounding factors) and genetic signals (cis and trans effects). The background correlation of the gene expression was obtained with the same covariance structure as that of the artery-aorta tissue of the GTEx project. The strengths of the confounder effects, cis effects and trans effects were obtained from Hore et al. [17], while we additionally introduced genotype principal components as confounders to simulate population substructure.
For every simulation, we randomly selected 800 SNPs to be cis-eQTLs, out of which 30 SNPs were also trans-eQTLs [17]. The trans-eQTLs regulated the nearest gene via cis effect. This cis target gene was considered a transcription factor (TF) and regulated multiple target genes downstream (excluding other TFs). Let Mtrans be the number of target genes regulated by each TF and |βgj| ~ Gamma (ψtrans, 0.02) be the effect size of the jth TF on the gth target gene.
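The effect-size draw for one TF can be sketched as follows. The sketch reads Gamma(ψtrans, 0.02) as a shape-scale parameterization, which is consistent with the mean 0.02 ψtrans quoted for Fig. 2b; the random sign of each effect and the function name `sample_trans_effects` are assumptions of this illustration, since only the magnitude |βgj| is specified here.

```python
import numpy as np


def sample_trans_effects(n_genes, m_trans, psi_trans, rng):
    """Draw the effects of one TF on m_trans randomly chosen target genes.

    Magnitudes follow Gamma(shape=psi_trans, scale=0.02); signs are chosen
    at random (an assumption of this sketch). Non-target genes get zero.
    """
    beta = np.zeros(n_genes)
    targets = rng.choice(n_genes, size=m_trans, replace=False)
    magnitude = rng.gamma(shape=psi_trans, scale=0.02, size=m_trans)
    sign = rng.choice([-1.0, 1.0], size=m_trans)
    beta[targets] = sign * magnitude
    return beta, targets
```

With ψtrans = 5, for example, the mean magnitude would be 0.02 × 5 = 0.1.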
In Fig. 2a, we show the results for different covariate correction strategies: (1) without any covariate correction (denoted as ‘None’), (2) the most commonly used confounder correction method using residuals after linear regression of the gene expression with known covariates (denoted as ‘CCLM’), and (3) KNN correction with 30 nearest neighbors. The GTEx consortium recommended using inverse normal transformation of the gene expression data before applying covariate correction. Hence, the CCLM correction was done on inverse normal transformed gene expression data. The KNN correction was applied directly on the gene expression data because we found that trans-eQTL signals are removed if KNN correction is applied on inverse normal transformed data (Supplementary Fig. S4). We then applied MatrixEQTL, JPA (∼CPMA) and Tejaas (qrr) to find trans-eQTLs from the corrected gene expressions. The ranking with qrr depends on the parameter γ and we set it empirically at γ = 0.2 (Supplementary Fig. S3). For Tejaas, we used the cis-masking option (Supplementary Sec. 2.8) in our software, i.e., removed all genes located within ±1Mb of each SNP to avoid the strong cis-eQTL signals. For each preprocessing option, we performed 20 simulation replicates. We compared the ranking of trans-eQTLs using the partial area under the ROC curve (pAUC) where the false positive rate (FPR) ≤ 0.1. This is because we are only interested in the top predictions.
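The partial area under the ROC curve can be computed directly from a ranking. The sketch below assumes that higher scores mean stronger evidence, ignores ties between scores, and returns the unnormalized area, so that a perfect ranking gives a value equal to the FPR cutoff; `partial_auc` is a hypothetical helper, not part of Tejaas.

```python
import numpy as np


def partial_auc(scores, labels, max_fpr=0.1):
    """Area under the ROC curve restricted to FPR <= max_fpr (unnormalized).

    scores : higher means more likely a true trans-eQTL.
    labels : 1 / True for true trans-eQTLs, 0 / False otherwise.
    """
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = labels[order]
    n_pos, n_neg = ranked.sum(), (~ranked).sum()

    # TPR reached at the moment each negative is passed in the ranking.
    tpr_at_neg = np.cumsum(ranked)[~ranked] / n_pos

    # Each negative moves the FPR by 1/n_neg; keep only the part below max_fpr.
    left = np.arange(n_neg) / n_neg
    width = np.clip(max_fpr - left, 0.0, 1.0 / n_neg)
    return float(np.sum(tpr_at_neg * width))
```

With `max_fpr=1.0` the function reduces to the ordinary AUC, which gives a simple sanity check on small examples.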
Our results show that the KNN correction is the most effective covariate correction for Tejaas. Unlike simulations, in real data we do not have exact knowledge of the confounders. Hence, it is encouraging to note that the KNN correction can remove the background noise in an unsupervised fashion. Covariate correction using linear regression (CCLM) is effective for traditional SNP-gene pair analysis (if the true covariates are known) but unfortunately it reduces the rank of the gene expression matrix and breaks down the Tejaas ranking (Supplementary Sec. 2.5 and Fig. S2).
In Fig. 2b, we compared different methods for discovering trans-eQTLs at different signal strengths. The varying signal strength was simulated by tuning (1) the number of target genes (Mtrans) of the TF linked to the trans-eQTL and (2) the effect size of the TF on the target genes, which is sampled from a Gamma (ψtrans, 0.02) distribution with mean 〈βgj〉 = 0.02 ψtrans. For discovering trans-eQTLs, Tejaas used KNN correction with K = 30 directly on the gene expression and qrr with γ = 0.2. For MatrixEQTL and JPA (~ CPMA), all known covariates introduced in the previous simulation steps were corrected out using CCLM on the inverse normal transformed gene expression. We compared the accuracy of the methods using the partial area under the ROC curve (pAUC) where the false positive rate (FPR) is ≤ 0.1. We find that JPA (~ CPMA) has slightly lower pAUC than MatrixEQTL, while Tejaas performs best with significantly higher pAUC at all values of Mtrans and 〈βgj〉, even without exact knowledge of covariates. At very low signals, for example with mean effect size of 0.1 and 50 target genes, the ranking performance of all methods is significantly reduced and we would need a larger sample size for efficient trans-eQTL discovery. However, Tejaas improves more rapidly than JPA (~ CPMA) or MatrixEQTL with increasing signal strength of the trans-eQTLs.
Genotype Tissue Expression trans-eQTL analysis
To illustrate Tejaas in a relevant data set, we analyzed trans-eQTLs across 49 human tissues using data from the Genotype Tissue Expression (GTEx) project [21–23]. The GTEx project aims to provide insights into mechanisms of gene regulation by collecting gene expression measurements from multiple tissues in human donors. The latest analysis on the GTEx v8 release yielded 143 trans-eQTLs across all tissues, with 121 linked to protein coding genes and 22 linked to lincRNA [24]. Of these trans-eQTLs, 47 trans-eQTLs were observed in testis alone.
We used the GTEx genotype and gene expression data provided in dbGaP (accession phs000424) for our analysis. Details of the preprocessing steps are discussed in Supplementary Sec. 5. In brief, we converted the gene expression read counts obtained from phASER to standardized TPMs (transcripts per million) for all the 49 tissues and used KNN correction with 30 nearest neighbors to remove confounders. We then estimated the optimal values of γ for each tissue and broadly classified the tissues into two groups: (a) 45 tissues analyzed with γ = 0.1 and (b) 4 tissues analyzed with γ = 0.006 (Supplementary Fig. S8). For each SNP, we removed all corresponding genes from the vicinity (± 1Mb) to avoid the relatively stronger cis-eQTL signals inflating qrr. We predicted all SNPs with p < 5 × 10−8 as trans-eQTLs for further analyses. To avoid double-counting trans-eQTLs that are in LD with one another, we pruned the list of trans-eQTLs by retaining only the best trans-eQTL (with lowest p-value) in each independent LD region defined by r2 > 0.5 in 200kb windows.
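The cis-masking and LD-pruning steps can be sketched as below. The sketch assumes that each gene is represented by a single coordinate, that r2 is the squared Pearson correlation of allele counts, and that pruning is greedy in order of increasing p-value; `cis_mask` and `ld_prune` are hypothetical helpers and not the Tejaas implementation.

```python
import numpy as np


def cis_mask(snp_chrom, snp_pos, gene_chrom, gene_pos, window=1_000_000):
    """Boolean mask of genes to keep for one SNP (True = outside +/- window)."""
    gene_chrom = np.asarray(gene_chrom)
    gene_pos = np.asarray(gene_pos)
    in_cis = (gene_chrom == snp_chrom) & (np.abs(gene_pos - snp_pos) <= window)
    return ~in_cis


def ld_prune(chrom, pos, pvals, genotypes, r2_max=0.5, window=200_000):
    """Greedy pruning: keep the lowest p-value SNP of each LD region.

    genotypes : (n_snps, n_samples) allele counts of the significant SNPs.
    A SNP is dropped if an already kept SNP on the same chromosome lies
    within `window` bp and has r^2 > r2_max with it.
    """
    kept = []
    for i in np.argsort(pvals):
        in_ld = False
        for j in kept:
            if chrom[i] == chrom[j] and abs(int(pos[i]) - int(pos[j])) <= window:
                r = np.corrcoef(genotypes[i], genotypes[j])[0, 1]
                if r ** 2 > r2_max:
                    in_ld = True
                    break
        if not in_ld:
            kept.append(int(i))
    return sorted(kept)
```

The double loop is quadratic in the number of significant SNPs per window, which is acceptable for a pruned hit list but would need a sorted sliding window for genome-wide use.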
We discovered 16 929 unique lead trans-eQTLs across all GTEx tissues except brain (Fig. 3a) and 1 922 unique lead trans-eQTLs in brain tissues. Consistent with our simulation results, Tejaas is able to discover more trans-eQTLs than traditional methods in GTEx. We find that the trans-eQTLs are tissue-specific, with 77.3% of the trans-eQTLs being discovered in single tissues (Fig. 3b). The number of trans-eQTLs discovered increases exponentially with the number of samples (Fig. 3c) for N > 250, indicating that larger studies would be able to discover more trans-eQTLs. In Fig. 3d, we show that the fraction of trans-eQTLs with a cis-effect varies proportionally with the total number of trans-eQTLs in each tissue, implying that a significant proportion of trans-eQTLs act via cis-eGenes. To illustrate the results of Tejaas at the single-tissue level, we show the quantile-quantile plot (Fig. 3e) and Manhattan plot (Fig. 3f) for two representative tissues, namely artery-aorta (ARTAORT) and EBV-transformed lymphocytes (LCL).
Functional enrichment analyses of trans-eQTLs
Enrichment of the newly discovered trans-eQTLs in functionally relevant regulatory annotation of the genome provides insight into the underlying biological mechanisms of the trans-eQTLs. Given the lack of experimental validation, the biological relevance of the trans-eQTLs suggested by their functional enrichment in several diverse, independent experiments is indicative of them being true positives. The enrichment of the functional features was measured in comparison to a random set of SNPs obtained by sampling from the GTEx genotype (Supplementary Sec. 5.6).
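An enrichment of this kind can be sketched as a fold change against randomly sampled SNP sets. The sketch assumes uniform sampling without replacement from all genotyped SNPs and an empirical one-sided p-value; any matching of the random sets (for example by allele frequency) described in Supplementary Sec. 5.6 is not reproduced, and `enrichment` is a hypothetical helper.

```python
import numpy as np


def enrichment(is_annotated, hit_idx, n_random=1000, seed=0):
    """Fold enrichment of an annotation among hits versus random SNP sets.

    is_annotated : (n_snps,) boolean, True if the SNP overlaps the annotation.
    hit_idx      : indices of the discovered trans-eQTLs.
    Returns (fold, empirical one-sided p-value).
    """
    rng = np.random.default_rng(seed)
    is_annotated = np.asarray(is_annotated, dtype=bool)
    hit_idx = np.asarray(hit_idx)
    observed = is_annotated[hit_idx].mean()

    # Annotation fraction in random SNP sets of the same size as the hit list.
    null = np.array([
        is_annotated[rng.choice(is_annotated.size, size=hit_idx.size,
                                replace=False)].mean()
        for _ in range(n_random)
    ])
    fold = observed / null.mean()
    p_value = (1 + np.sum(null >= observed)) / (1 + n_random)
    return float(fold), float(p_value)
```

The smallest p-value this procedure can report is 1 / (1 + n_random), so the number of random sets bounds the resolution of the test.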
A possible mechanism of trans-eQTLs involves mediation via cis-eQTLs, where the cis-eGene (for example, some known transcription factor) might regulate distant genes. Indeed, we observed a significant enrichment of trans-eQTLs being also cis-eQTLs to proximal genes in the same tissue (Fig. 4a), although our trans-eQTLs were discovered excluding all genes in the vicinity of that SNP. We also observed that the cis-mediator genes have a higher proportion of being protein-coding than the background distribution of GTEx cis-eGenes (Fig. 4d). For this analysis, the cis-eQTLs and their target genes (mediator genes for trans-eQTLs) were obtained from the GTEx portal. Although we rarely found significant enrichment of transcription factors (TFs) among the cis-mediator genes, trans-eQTLs are enriched in proximal locations (< 100Kb) of TFs (Fig. 4a).
Reporter assay QTLs (raQTLs) are SNPs that alter the activity of putative regulatory elements (enhancers and promoters), partially in a cell-type-specific manner. In Fig. 4a, we show the enrichment of the trans-eQTLs in two sets of raQTLs for two cell types, K562 and HepG2. The raQTL data was obtained from the survey of regulatory elements (SuRE) [26]. K562 is an erythroleukemia cell line with strong similarities to whole blood tissue and HepG2 cells are derived from hepatocellular carcinoma with similarities to liver tissue.
DNase I hypersensitive sites (DHSs) are accessible regions of the chromatin, often considered as markers in the genome for regulatory elements (promoters, enhancers, insulators and other control regions) and are functionally associated with transcriptional activity. We found that the trans-eQTLs occur within these regions more often than expected by chance, showing significant DHS enrichment in most tissues (Fig. 4b).
With well-powered trans-eQTL mapping by Tejaas, it also becomes possible to describe and disentangle tissue-specific enrichments. Using chromatin state predictions from a set of tissues from the Roadmap Epigenomics project [28], we show that the trans-eQTLs are enriched in enhancer, bivalent and repressed polycomb regions of their matched tissues (Fig. 4c). They are depleted in the inaccessible heterochromatin regions for most of the tissues while they show no enrichment or depletion in inactive quiescent regions.
Association with complex diseases
We investigated the overlap of the novel trans-eQTLs discovered by Tejaas with GWAS variants of complex traits to find transcriptional regulatory mechanisms through which SNPs affect complex diseases. We used the GWAS summary statistics from 87 traits harmonized and imputed to GTEx v8 variants with MAF > 0.01 using only European samples by Barbeira et al. [29]. These 87 traits were broadly classified into a range of disease categories. For example, the category “Immune” contained all studies related to immune diseases such as asthma or psoriasis.
We calculated enrichments for each tissue in each individual trait (Supplementary Sec. 6 and Fig. S13) and each disease category (Fig. 5). We considered all SNPs with imputed p < 10−7 to be a significant GWAS hit for the corresponding study or disease category. There are several tissue – disease category pairs that have a clear biological relationship. For example, trans-eQTLs discovered in whole blood are 1.3-fold enriched (p = 0.0014) in the disease category of “Blood”, which contains different studies investigating varying blood cell counts such as those of red blood cells and lymphocytes. Trans-eQTLs in whole blood and heart atrial appendage are 1.7 and 7-fold enriched in cardiometabolic traits, with p = 0.01 and p = 0.008 respectively. The cardiometabolic disease category includes studies on cholesterol levels, blood pressure and coronary artery disease, among others.
GWAS-associated trans-eQTLs can provide insight into previously unknown disease pathways. For example, three of our trans-eQTLs, rs7864322, rs4297160 and rs10983975 (all in chr9q22), discovered in thyroid tissue were found to be associated with hypothyroidism. These trans-eQTLs control the expression of the nearby PTCSC2 lncRNA, a thyroid-specific regulator, and the FOXE1 gene, which is known to play an important role in thyroid development. The distant target genes were enriched in the ‘thyroid hormone signaling’ pathway, indicating a possible disease mechanism. For instance, the DIO1 gene in chr1 targeted by rs4297160 plays an important role in the production of T3, which is the main mediator of thyroid action.
Discussion
We developed Tejaas to increase the power for detecting trans-eQTLs by using two key innovations: the reverse regression and the KNN correction. We created fast, parallel open-source software using these concepts, validated the method on semi-realistic synthetic data and demonstrated its usefulness on a substantive real data set from the GTEx consortium to discover trans-eQTLs with clear biological and statistical significance. A marginal analysis of single SNP-gene pairs or a method like CPMA would not have discovered those trans-eQTLs because of the low effect size of the trans-eQTLs on each single target gene and the strong correlated noise of the gene expression levels.
Tejaas complements other eQTL pipelines that focus on analyzing single SNP-gene pairs. Tejaas excels in discovering trans-eQTLs with multiple small effects by accumulating signals from many genes, which are regulated by that trans-eQTL, while other methods excel in discovering trans-eQTLs with a single large effect on a distant gene. Hence, we expect Tejaas and other existing methods to be complementary rather than overlapping.
Distinguishing causation from correlation is a long-standing and well-studied problem in statistics. In human genetics, the low number of samples compared to the explanatory variables (in our case, the number of genes) additionally requires controlling for sparsity. One widely accepted Bayesian approach is to use multiple regression with a sparsity-enforcing prior, for example the spike-and-slab prior, which has been previously used with success in different contexts such as fine-mapping in GWAS [30, 31]. Reverse regression controls for the correlation among the gene expression levels by using them as explanatory variables in a multiple regression setting. However, due to computational limitations, we had to use a normal prior which reduces model complexity but cannot enforce sparsity. In Tejaas, the standard deviation γ of the normal prior is not learnt from the data, but is set empirically. As expected, a prior with a high value of γ (> 0.2) would be too wide to reduce the model complexity and would lead to overtraining. A low value of γ (< 0.001), on the other hand, will be too restrictive for the model and lead to false signals even with chance correlations of a single gene with a genotype. We encourage future users to make an informed decision on the choice of γ for every gene expression profile, for example by first simulating a null set of qrr on a simulated genotype and calculating the non-Gaussian parameter as explained in the supplementary text for the GTEx gene expressions.
The current method can be improved by introducing sparsity-enforcing priors on the effect size of the genes. It will not only improve accuracy for finding trans-eQTLs but also remove the dependency on γ. Additionally, it will enable robust variable selection, giving a more refined selection of trans-eQTL target genes. This could replace the current two-stage procedure for finding target genes with single SNP-gene pairwise regression after the trans-eQTLs are discovered by reverse regression. In spite of such multiple anticipated benefits, it remains technically challenging to implement such a method for large data sets.
Although reverse regression proved to be a powerful approach for finding trans-eQTLs, a major impediment was that the gene expression could not be corrected for confounders with the standard approach of regressing the known covariates or hidden PEER factors [32] (Supplementary Sec. 3.1). Hence, we proposed the KNN correction, a simple but powerful method for unsupervised confounder correction. Indeed, it corrected for most of the known covariates in GTEx (Supplementary Fig. S7). We expect the KNN correction to become an important tool for confounder correction in future eQTL pipelines.
Robust identification of trans-eQTLs and underlying disease pathways is crucial to further our understanding of genetics and its implication in complex diseases. Alongside larger studies with more samples, this will inevitably require more powerful methods for analyses. Tejaas represents a major step towards this goal.
Author Contributions
J.S. conceptualized the problem, acquired funding, and supervised the project. J.S. designed the reverse regression with comments from S.B., F.L.S., A.K. and R.M.; S.B. and F.L.S. wrote the software with assistance from A.K. (JPA and KNN) and R.M. (RR). S.B. designed and performed the simulations. F.L.S. performed the GTEx preprocessing. F.L.S. and S.B. analyzed the GTEx data for discovering trans-eQTLs and assessing their functional enrichments. F.L.S. checked the contribution of known covariates in KNN and the effect of cross-mappable genes. K.E.D. analyzed the GWAS data. R.N. contributed to the initial phase of the project setup. S.B. wrote the original manuscript with assistance from F.L.S.; J.S., S.B. and F.L.S. reviewed and edited the manuscript.
Competing Interests
The authors declare no competing interests.
URLs
Tejaas: https://github.com/soedinglab/tejaas
GTEx trans-eQTLs: http://www.user.gwdg.de/~compbiol/tejaas/2020_03/gtex_v8_trans_eqtls_summary
Additional Information
Supplementary Text and Figures
Supporting information available for download.
Acknowledgments
This work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the e:Med research and funding concept (grants e:AtheroSysMed 01ZX1313A-2014). We thank Markus Scholz for helpful communications and alerting us to the problem of strong cis-eQTLs, which led us to use cis-masking. We thank Hae Kyung Im and Alvaro Barbeira for email communications and kindly providing us the GWAS summary statistics from 87 traits harmonized and imputed to GTEx v8 variants. We thank our colleagues, especially Eli Levy Karin, Christian Roth, Wanwan Ge, Salma Sohrabi-Jahromi, Milot Mirdita and Ruoshi Zhang for helpful discussions and feedback. We used the data generated by the GTEx Consortium for the trans-eQTL analysis (accession phs000424). We thank the participants of the GTEx Consortium as well as all the research staff who worked on the data collection.