Analyzing and Reconciling Colocalization and Transcriptome-wide Association Studies from the Perspective of Inferential Reproducibility

Transcriptome-wide association studies and colocalization analysis are popular computational approaches for integrating genetic association data from molecular and complex traits. They show the unique ability to go beyond variant-level genetic association evidence and implicate critical functional units, e.g., genes, in disease etiology. However, in practice, when the two approaches are applied to the same molecular and complex trait data, the inference results can be markedly different. This paper systematically investigates the inferential reproducibility between the two approaches through theoretical derivation, numerical experiments, and analyses of 4 complex trait GWAS and GTEx eQTL data. We identify two classes of inconsistent inference results. We find that the first class of inconsistent results may suggest an interesting biological phenomenon, i.e., horizontal pleiotropy; thus, the two approaches are truly complementary. The inconsistency in the second class can be understood and effectively reconciled. To this end, we propose a novel approach for locus-level colocalization analysis. We demonstrate that the joint TWAS and locus-level colocalization analysis improves specificity and sensitivity for implicating biological-relevant genes.


Introduction
With the rapid advancements of sequencing technologies, genetic association analyses have been routinely performed and made significant contributions to gain insights into the roles of genetic variants in complex diseases. Due to recent expansions in large-scale joint genotyping and phenotyping of molecular traits, integrative genetic association analysis has emerged as a tool to study the biological basis of complex diseases. Integrative analyses have the unique ability to interpret genetic associations beyond individual mutations and link complex diseases to functional genomic units, e.g., genes, metabolites, and proteins [1,2,3]. Discoveries from integrative genetic analyses have enabled the discovery of novel drug targets [4], hence improved treatments for diseases.
In this paper, our discussions focus on two prevailing types of integrative genetic association analyses: transcriptome-wide association studies (TWAS) and colocalization analysis. Both approaches are widely applied to integrating results from expression quantitative trait loci (eQTL) mapping and complex traits GWAS and show promises in implicating potential causal genes for complex diseases [5,6]. The analytical goal of TWAS is to test associations between a complex trait of interest and genetically predicted gene expression levels (that are constructed from eQTL information) [7,8,9,10]. More broadly, it connects to the causal inference framework of instrumental variable analysis: given an established TWAS association, the causal effect from a target gene to the complex trait can be estimated [10,11]. Nevertheless, our discussions focus on the testing stage, which we refer to as TWAS scan henceforth. Colocalization analysis aims to identify overlapping causal genetic variants for both molecular (e.g., gene expression) and complex traits [12,13,14,15]. A colocalized genetic variant in a particular cis-gene region implies that a single mutation is responsible for variations in both molecular and complex traits, thus establishing an intuitive link between the traits. A more detailed review of both approaches is provided in the Method section.
Our primary motivation is to investigate the consistency and inconsistency patterns between the inference results by TWAS scan and colocalization analysis in practical settings. Such patterns are examples of inferential reproducibility -one of the three modes defined in the lexicon of reproducibility by Goodman et al. [16]. Narrowly speaking, inferential reproducibility refers to the consistency of inference results when different analytical approaches are applied to the same dataset. In practice, both TWAS scan and colocalization analysis are often applied to analyze the same eQTL and GWAS data combinations. However, the implicated genes and the number of discoveries from the two analyses are often markedly different, making biological interpretation and design of follow-up studies challenging. Thus, a systematic investigation to dissect and understand these practical differences is warranted.
Under the settings of inferential reproducibility, the overlapping findings from all approaches are often considered conceptual replications for individual methods, with enhanced validity. Nevertheless, the emphasis of the inferential reproducibility analysis is typically placed on the difference sets. We focus on examining the implicated genes reported only by either colocalization or TWAS analysis in our specific application context. Unlike its method and results reproducibility counterparts, inferential reproducibility does not expect or even encourage all methods to yield identical results. On the contrary, the differences driven by different analytical and operational assumptions are largely anticipated [16,17]. The goal of the analysis is to quantify, understand, and interpret these differences properly. Our study aims to gain insights into how different analytical and operating assumptions of the computational procedures, combined with specific data characteristics, lead to the different implicated genes connecting molecular and complex traits. Specifically, we show that not all the different results in the integrative genetic association analysis are equivalent: in one scenario, they can be reconciled; in the other, they are truly complementary.
Based on our analysis of inferential reproducibility and better facilitate connecting the reconcilable set of genes implicated by the two integrative analysis approaches, we propose a novel locus-level colocalization analysis method derived from the same probabilistic modeling framework of fastENLOC. Compared to the existing locus-level colocalization methods in the literature, e.g., RTC [25], JLIM [26], the candidate loci selected for analysis are carefully constructed from the state-of-the-art Bayesian multi-SNP fine-mapping algorithms, and the inference results show much-improved resolution and specificity. Thus, it can serve as a complementary approach to variant-level colocalization, overcoming some of its intrinsic power limitations based on the currently available data [15]. This method is implemented in the software package, fastENLOC (v2.0), and freely available at https://github.com/xqwen/fastenloc/.

Comparing findings from TWAS scan and colocalization analysis
For this comparative analysis, we select four complex traits, including standing heights from the UK Biobank, coronary artery disease (CAD) status from CARDioGRAM Consortium, and highdensity lipoprotein (HDL) and low-density lipoprotein (LDL) measurements from Global Lipids Genetic Consortium (GLGC). These four traits are representative of a range of quantitative and discrete complex traits measured at organismal and molecular levels. We perform both TWAS scan and variant-level colocalization analyses for the selected GWAS traits using the multi-tissue eQTL data from the GTEx project. To enable direct comparisons, we apply the integrative approaches (PTWAS for TWAS scan and fastENLOC for colocalization analysis) utilizing the same set of cis-eQTL annotations derived from the multi-SNP fine-mapping analysis method, DAP-G [14].
Our comparison focuses on the gene-level quantification for each trait-tissue-gene combination.
The PTWAS scan analysis summarizes the evidence by a p-value testing the correlation between a gene's genetically "predicted" gene expression levels in a specific tissue and the complex trait of interest. In the colocalization analysis, fastENLOC reports a regional colocalization probability (RCP) for each independent eQTL signal of a gene in a specific tissue (also known as a signal cluster) and an independent GWAS hit for a trait of interest. We subsequently compute a gene-level variant colocalization probability (GRCP) by where the set G t represents the set of all eQTL signal clusters within the cis-region of a genetissue pair. The GRCP represents the probability that the gene of interest harbors at least one colocalized causal variant for a trait-tissue-gene combination.
The rank correlations between the -log10 PTWAS p-values and the corresponding GRCPs are quite modest, with mean = 0.223 ( median = 0.085) across all genes, tissues, and traits. Despite the correlations being significantly different from 0, the low levels of rank consistency seemingly indicate a high degree of discordance (in ranking important genes) between the two types of integrative analyses by the standards of inferential reproducibility.
Next, we inspect the overlapping between noteworthy genes implicated by the two approaches (i.e., the set of conceptually replicated genes). Following [1,5,18,6], we consider that a gene is noteworthy in the colcoalization analysis if its GRCP exceeds the probability threshold of 0.50 for a given trait-tissue pair. In the TWAS analysis, a gene is deemed noteworthy if it is rejected at the FDR 5% level in a trait-tissue pair (FDR controls are performed using the qvalue method based on PTWAS p-values). The full result of this analysis is summarized in Table 1.
Marginally, the TWAS analysis implicates many more noteworthy genes than the colocalization analysis (128,130 vs. 2,337) across 49 × 4 = 196 trait-tissue pairs. Among the two sets, 2054 genes overlap, corresponding to 88% of the noteworthy colocalization genes and 1.6% of the noteworthy TWAS genes, respectively. This finding suggests that most colocalization genes also show strong evidence of TWAS associations, whereas the vast majority of TWAS genes generally lack strong evidence for variant-level colocalizations.
To better understand the discrepancy between the TWAS and colocalization analysis, we further investigate the two difference sets of the noteworthy genes in greater detail.

Strong colocalization and weak TWAS signals
There are 283 trait-tissue-gene combinations that show strong variant-level colocalization but weak TWAS association evidence. A small subset of these findings can be attributed to the threshold effect of TWAS analysis. That is, if we re-define noteworthy TWAS genes by (slightly) relaxing the FDR control level, these combinations will be re-classified in the overlapping set.
However, the majority of the combinations in this set show compatibility with the null hypothesis of the TWAS scan, asserting that genetically predicted gene expression levels are uncorrelated with complex traits of interest. Upon further inspection, we find that most of these instances can be explained by the phenomenon known as "horizontal pleiotropy".
To illustrate, we take one In summary, we find that most instances in this difference set of implicated genes represent a scenario where the two integrative analysis approaches can be complementary. Additionally, most implicated genes in this set are unlikely direct causal genes to the complex traits, and their relevance to the complex traits could potentially be explained by the phenomenon of horizontal pleiotropy.

Strong TWAS and weak colocalization signals
Strong TWAS and weak colocalization signals account for most discrepancies between colocalization and TWAS analyses. This phenomenon is largely anticipated by different hypotheses employed by the two approaches. A straightforward analytical derivation shows that discovering a TWAS signal does not imply the existence of variant-level colocalization. Instead, the necessary conditions that drive TWAS signals are much relaxed and can be precisely summarized in the following proposition. Proof. Let sets E and G denote the collections of inferred eQTLs (of the target gene) and the causal GWAS hits, respectively. According to the linear model assumption, the genotypepredicted gene expression can be written aŝ Without loss of generality, assuming the complex trait of interest is quantitative and its genetic association can be described by the following linear model, i.e., A TWAS scan procedure examines the correlation betweenŷ e and y c and tests the null hypothesis, H 0 : Corr(ŷ e , y c ) = 0. It follows that Therefore, Corr(ŷ e , y c ) = 0 =⇒ ∃ a pair of (i, j), such that α jβi Cov(g i , g j ) = 0.
Note that this simple proposition states a necessary but not sufficient condition for the existence of TWAS scan signals. Specifically, the TWAS signal is driven by the sum of all non-zero α jβi Cov(g i , g j ) pairs. As illustrated in the previous section, it is statistically possible that multiple terms with different signs can cancel each other out. Additionally, the linearity of the prediction model assumption covers almost all popular TWAS approaches, but it can be relaxed to allow non-linear prediction functions. In the expanded prediction function family, Equation (2) becomes a first-order approximation.
The proposition also implies a direct connection between TWAS scan and variant-level colocalization analysis. By definition, a variant-level colocalization signal satisfies the condition α jβi Cov(g i , g j ) = 0 for some i = j. A colocalized genetic variant of both molecular and complex traits should also drive a TWAS scan signal in the absence of the cancellation phenomenon. This corollary explains our observation that most genes implicated by the colocalization analysis are also implicated by the TWAS scan.
Next, we consider the scenario that only a single term in (3) drives a TWAS signal by assuming the absence of allelic heterogeneity for the target gene. It becomes apparent that the strength of a TWAS signal reflects the joint effect of α j ,β i , and the LD between the two variants (i and j).
It further implies that even when the LD between a causal eQTL and a GWAS hit is weak, the strength of the resulting TWAS signal can be compensated by relatively strong genetic effects, α j and/or β i . This result seemingly explains a common pattern in practical TWAS scan results, where noteworthy signals tend to cluster around some of the strongest GWAS hits. Mancuso et al. [19] also discussed a similar pattern of clustered TWAS scan signals due to LD. However, our derivation does not need to assume any true causal relationship between genes and the complex trait of interest (i.e., the phenomenon can exist without any causal genes). We conduct simulations to demonstrate that this single GWAS signal (p-value = 2.4 × 10 −256 ) drives the entire cluster of TWAS signals at this locus. We utilize the real eQTL genotype and phenotype data from GTEx for 39 neighboring genes at this locus and independently simulate the phenotype data of a complex trait with rs2871960 being assumed the only causal SNP. The TWAS scan of the simulated data replicates the pattern observed in the real data, with a third of neighboring genes showing significant TWAS associations. To confirm that the sole GWAS association induces all TWAS signals, we repeat the analysis using the residuals of simulated complex trait phenotype by regressing out the genotypes of SNP rs2871960 ( Figure 3).
Although the LD hitch-hiking TWAS signals are not false positives from the perspective of statistical associations, the abundance of signals of these sorts should caution the biological interpretations. For example, it would be mistaken to regard all LD hitch-hiking TWAS signals as independent candidates of causal genes for the trait of interest. On this point, we are in full agreement with Mancuso et al. [19] that TWAS scan results need to be further processed, better understood, and carefully reported.
In summary, our statistical analysis reveals some main characteristics of TWAS scan signals, which do not require variant-level colocalization and tend to be correlated. These characteristics help explain the difference set of strong TWAS but weak colocalization signals. Regarding this difference set of implicated genes, we find that the two approaches can be reconciled by i) redefining the standard for colocalization (i.e., adjusting colocalization analysis); and ii) removing non-independent findings from TWAS scan reporting (i.e., adjusting TWAS scan analysis).

Locus-level colocalization: a reconciliation
To reconcile the results between TWAS scan and colocalization analysis, especially targeting the genes showing strong TWAS but weak variant-level colocalization signals, we propose a novel locus-level colocalization analysis method.
From the perspective of variant-level colocalization analysis, the lack of statistical power is a primary limiting factor in analyzing the currently available molecular and complex trait data.
Many authors [23,24] have shown that it is often difficult, if not impossible, to pinpoint the causal genetic associations with the limited sample sizes employed in current molecular QTL mapping studies. Furthermore, with limited sample size, the lead SNPs, whether quantified by Bayesian or frequentist approaches, are often not the true causal SNPs. In many cases, we are relatively certain of a genuine association signal among a group of tightly linked variants (e.g., variants within a signal cluster), however uncertain about the exact variants. Hukku et al. [15] show that such uncertainty causes a large class of false-negative findings (i.e., class II FNs) in variant-level colocalization analysis. In their simulation studies based on realistic settings, there are more class II FNs than the identified true findings. Acknowledging this fundamental limitation, we propose the locus-level colcoalization approach aiming to identify the co-existence of casual eQTLs and GWAS hits at an appropriately decreased genomic resolution. The key rationale is that even when locus-level colocalizations do not show strong evidence of variant-level colocalizations in the current data; they may very well be proven class II FNs by future experiments with improved statistical power.
From the perspective of the TWAS scan, the critical issue for reconciliation is to filter out redundant representations due to LD and report only independent and biologically relevant signals.
One possible solution is to require causal SNPs for molecular and complex traits colocalized at a small enough genomic region, such that not only is Cov(g i , g j ) in (3) automatically constrained but the interpretation of the TWAS signal becomes natural. This idea is not new. Many authors [18,15] have proposed to use variant-level colocalizations as a pre-requisite for following-up TWAS scan results. These proposals are also supported by a class of probabilistic generative models that connect TWAS scan and colocalization analysis ( see Eqn (20) in the Method section). Here, we relax the colocalization standard, considering the limited practical power in identifying variant-level colocalizations.
The key to the proposed approach lies in the definition of a genomic locus. As we hope that the variants within a locus represent causal variants for respective traits, a natural solution is to borrow the concept of credible sets/signal clusters from the literature of genetic fine-mapping analysis [20], [14]. The genetic variants classified into a signal cluster are in high LD. More importantly, members of the same signal cluster can be used to interpret the same underlying association signal almost interchangeably with only slight quantitative differences (in model likelihood/posterior probabilities) [20]. By these two properties, at most one variant within a signal cluster represents the true association signal. The candidate loci constructed from signal clusters are also practically small, typically including only a few SNPs in LD. For example, the fine-mapping of the GTEx whole blood eQTL by DAP-G yields 113,318 signal clusters with coverage probability >= 0.95 (i.e., the 95% Bayesian credible sets). On average, each signal cluster contains only 16 SNPs (median = 8) with the minimum pairwise R 2 > 0.5.
The computation of locus-level colocalization probability given pre-defined candidate genomic loci is similar to the analysis of variant-level colocalization. We provide the detailed derivation and description in the Method section. It is worth pointing out that the locus-level colocalization probability (LCP) is fundamentally different from the regional (variant) colocalization probability (RCP). RCP quantifies the probability of a genomic region containing a single colocalized variant.
In general, it follows that LCP ≥ RCP for any given loci. For each gene, we define a gene-level colocalization probability (GLCP) for the locus-level analysis, i.e., To interpret significant TWAS scan results using the locus-level colocalization analysis, we require that the GLCPs of candidate causal genes exceed a threshold (pre-defined or by FDR control).

Simulation study
We design and conduct simulation experiments to benchmark the proposed locus-level colocaliza- We analyze the simulated datasets using TWAS scan, variant-, and locus-level colocalization methods. We apply two TWAS scan approaches: PTWAS and the single-SNP TWAS method

Re-analysis of GTEx and GWAS data using locus-level colocalizations
Finally, we re-analyze the 4 complex traits GWAS and the GTEx eQTL data using the proposed locus-level colocalization approach. We again ensure that all examined methods utilize identical input information. Detailed comparisons between PTWAS, variant-, and locus-level colocalization analyses are provided in Table 1. Across 196 trait-tissue pairs, the locus-level colocalization identifies 7,516 genes with GLCP > 0.50, representing a 2.2 fold increase of discoveries than the variant-level colocalization analysis. A remarkable 83% of the high GLCP genes overlap with the significant PTWAS scan genes. We consider this set of 6,255 PTWAS genes filtered by locus-level colocalization analysis as high-priority genes for further validation.
We first inspect the set of 1,261 genes that show high GLCP but do not pass FDR control at 5% level in PTWAS scan analysis. Similar to the weak TWAS and strong colocalization signals implicated by variant-level colocalization analysis, the is set of genes also shows excessive heterogeneity in estimated gene-to-trait effects by multiple independent eQTL signals ( Figure   5), suggesting potential horizontal pleiotropy. Figure 5 also indicates an increase of genes whose PTWAS scan signals are close but do not pass the FDR control threshold.
In the set of 6,255 filtered PTWAS significant genes by the locus-level colocalization analysis, we find many new discoveries have documented biological relevance to the corresponding complex traits in the literature. We first investigate the Online Mendelian Inheritance in Man (OMIM) [21] database to find associated genes. For this investigation, we select phenotype OMIM IDs mapped to any of our four traits and subsequently identify all confirmed gene associations with those phenotypes. In total, we extract 12 validated genes from the OMIM across the 4 analyzed complex traits, with 6 in our result by the joint PTWAS and locus-level colocalization analysis (Supplementary Table S1). It is worth noting that the joint SNP-level colocalization and PTWAS analysis only identifies 1 out of 12 genes.
An illustrative example is gene LPL (Ensembl id: ENSG00000175445) with HDL. Within the adipose subcutaneous tissue, there is a very strong TWAS signal (p-value = 9.7 × 10 −135 ) and substantial evidence for a locus-level colocalization (GLCP = 0.97). Meanwhile, the GRCP for this gene is only 0.07, reinforcing the added utility of locus-level colocalization.
Additionally, we utilize one of the largest gene-disease association repositories, DisGeNET [22], to inspect the biological relevance of CAD-associated genes implicated by the proposed jointanalysis. DisGeNET comprehensively integrates and ranks multiple types of reliable gene-disease association evidence from a catalog of source databases. We pull out a list of 65 high-confident CAD-relevant genes, whose DisGenNET scores are all greater than the built-in default selection threshold of 0.3. In our integrative analyses, the PTWAS scan identifies 510 unique genes across 49 tissues. A subset of 172 unique genes passes the additional filtering of the locus-level colocalization analysis. We find that 23 of the 172 genes appear in the DisGenNET CAD gene list (Supplementary Table S2). In comparison, the remaining 338 genes that lack locus-level colocalization evidence only contribute 7 additional hits on the DisGenNET CAD list. A Fisher's exact test indicates that the enrichment of CAD-relevant genes implicated by the proposed jointanalysis is statistically highly significant in contrast to stand-alone TWAS scan analysis (p-value = 8.8 × 10 −7 ). Compared to the joint-analysis of TWAS and variant-level colocalization analysis (which flags 17 DisGenNET genes from 117 implicated unique genes), the level of signal enrichment is similar, but the confirmed discoveries by the newly proposed method represent a 35% increase.

Discussion
This paper systematically investigates two prevailing types of integrative genetic analysis methods, TWAS scan and colocalization analysis, focusing on understanding and reconciling their inferential differences in practical settings. From the perspective of inferential reproducibility, we identify multiple statistical and biological factors that yield different sets of implicated genes.
In one scenario (i.e., strong colocalization but weak TWAS signals), we find that most genes in the specific difference set show interesting characteristics requiring further biological investigations, indicating the two approaches are complementary. In the other scenario (i.e., strong TWAS but weak colocalization signals), we find that the differences can be effectively reconciled. Subsequently, we propose and implement a new locus-level colocalization analysis method to bridge the two types of analyses. We illustrate that the proposed joint-analysis approach can produce a rich list of biologically relevant "conceptual replications" for downstream investigations and validations.
Variant-level colocalization analysis utilizes a conceptually rigorous and superior standard to examine the overlapping of causal association signals at the finest resolution. It exhibits the highest specificity among the existing integrative analysis approaches. The most noticeable drawback is its limited sensitivity given currently available data [15]. The issue is originated from the difficulty in quantifying variant-level association evidence/uncertainty in the presence of LD [23,24]. While future data with more precise phenotyping (e.g., single-cell expression data) and/or larger sample size will certainly improve the power of variant-level colocalization analysis, the intrinsic difficulty due to complex LD patterns may not be fully resolved. We trade off the rigid conceptual standard of variant-level colocalizations for improved sensitivity in the proposed locus-level colocalization analysis. This is mostly motivated by a series of realistic simulation studies presented in [15] and this paper, where the ratios of class II false negatives versus reported findings (which contain few false-positive errors) are often strikingly high (i.e., ∼ 1).
We acknowledge that locus-level colocalization analyses are not new. For example, RTC [25] and JLIM [26] are two representatives of general-sense locus-level colocalization methods in the literature. However, we see at least two distinct advantages of the proposed approach over the existing methods. First, we define target loci through the fine-mapping output of signal clusters/credible sets, which harbor very limited but highly relevant candidate genetic variants.
As expected, the precise locus definition by the proposed approach results in high specificity, which is much comparable to variant-level colocalization analysis as shown in our real data analysis. Second, the proposed approach is based on the same Bayesian hierarchical model that explicitly estimates the enrichment of molecular eQTLs in GWAS hits and subsequently utilizes the enrichment information to boost the discovery of colocalized sites. This is a unique and novel feature of the proposed locus-level colocalization method.
The empirical comparison of TWAS scan and colocalization analyses helps us identify statistical factors that differentiate the two sets of results. One of this work's most important take-away messages is that TWAS scan results are unlikely independent and need further processing. At a certain level, PTWAS scan is analogous to the practice of single-variant testing in the common practice of genetic association analysis, where the community standard is not stopping at reporting significant individual variants but summarizing the testing results and grouping linked variants to flag independent causal variant-harboring loci.
Our proposal to apply locus-level colocalization to screen and filter TWAS scan results follows the similar strategy established in PhenomeXcan [18], which is proven effective in identifying biologically relevant potential causal genes. Given the increased sensitivity of locus-level versus variant-level colocalization analysis, the improvement in the performance of such a strategy is logically expected. We note that the software FOCUS utilizes an alternative and statistically elegant strategy (analogous to the multi-variant fine-mapping in genetic association analysis) to parse correlated TWAS scan findings and identify potential "causal" genes. This strategy can be highly effective if the assumption of a causal gene in the genomic region of interest is met.
In comparison, our proposed strategy does not require such an assumption but relies on the biological implication from colocalizations, hence offering some added robustness in inference. In practice, the two strategies can be complementary and applied simultaneously. Additionally, we note that many authors [8,9,10,11] have connected TWAS analysis to Mendelian randomization (MR) and instrumental variable (IV) analyses and point out the scan procedure is equivalent to the testing procedure in MR and IV analyses. We strongly agree that additional estimation and heterogeneity diagnosis procedures from the MR and IV analyses can be further applied to validate the causality of implicated genes in combination with the proposed joint-analysis procedure. We believe that the conceptual replications identified from the proposed joint-analysis forms a good set for such purpose. Finally, joint TWAS scan and colocalization analysis can be used to identify cases of horizontal pleiotropy by examining genes with strong colocalization but weak TWAS signals. Such findings can be critical to uncover the full molecular mechanisms underlying complex diseases and deserve attention from the community.
Much of this work is motivated by understanding the inferential reproducibility between TWAS scan and colocalization analysis. Unlike the other types of reproducibility, namely, methods and results reproducibility, inferential reproducibility does not prompt pursuing the consistency of outcomes from different analytical methods [16]. Instead, the primary aim is to identify analytical assumptions/factors driving inferential differences and understand the extent of inconsistency between methods. These factors are particularly important for practitioners to design analysis schemes with all available tools. As we have demonstrated in this paper, inferential reproducibility is fundamentally different from inferential errors/mistakes. It should be treated with care, and most importantly, in a context-dependent manner. 20

Overview of TWAS scan and variant-level colocalization analysis
This section provides a brief overview of TWAS scan and variant-level colocalization analyses.
There are multiple implementations for each integrative approach. Here, we emphasize the commonality among different implementations and refer the readers to the cited publications for their differences.

TWAS scan analysis
TWAS scan analysis aims to identify genes whose genetically predicted expression levels are Existing literature has explicitly connected TWAS scan to the testing procedure in instrumental variable analysis/Mendelian randomization [8,10,11]. Particularly, the prediction model (5) is viewed as a composite instrumental variable (i.e., a weighted sum of genotypes) and is shown more powerful than the methods considering single genetic variants as eligible instruments. Conversely, a prediction model using a single genetic variant can be straightforwardly derived following the principles of both supervised learning and instrumental variable analysis (Section 1 of the Supplementary Material). Because this approach is computationally trivial and theoretically more powerful than the existing SMR method, we use this approach to represent the singlevariant TWAS scan method in this paper.
Finally, we emphasize that the TWAS scan does not represent the full IV/MR analysis procedure, where gene-to-trait effect estimation and model checking is typically required. Nevertheless, under the high-dimensional setting of integrative genetic association analysis, these additional procedures can be subsequently applied on TWAS scan results [10].

Variant-level colocalization analysis
Variant-level colocalization analysis aims to identify the overlapping of causal eQTL and GWAS SNPs. The problem is challenging due to wide-spread linkage disequilibrium and the limited power in uncovering true causal associations given the currently available molecular and complex trait data. Most existing variant-level colocalization methods take the Bayesian probabilistic modeling approach to effectively account for the inevitable uncertainty in genetic association analysis [14,15,12,13]. They also take advantage of fine-mapping results obtained from individual analysis of each trait to achieve improved accuracy. The colocalization probabilities for individual SNPs (i.e., SNP-level colocalization probabilities or SCPs) can be unimpressive, especially when a few SNPs are in high LD. The aforementioned methods all report a regional-level colocalization probability (RCP) to represent the probability that a genomic region harbors a single colocalized variant.
Variant-level colcoalization analysis is known to have limited power given the currently available GWAS and molecular QTL data. There are two classes of false-negative (FN) errors commonly encountered in practice [15]. Specifically, the class I FNs refer to the cases where the genetic association analyses for individual traits fail to identify at least one genuine association at a colocalized site. The class II FNs are caused by inaccurate quantification of association evidence at the individual variant level: even both associations are identified for a locus, the probabilistic characterization of the assumed causal variants may lead to strong evidence against variant-level colocalization.

Evaluating inferential reproducibility between TWAS scan and colocalization analysis
In practice, both TWAS scan and colocalization analysis are applied to the same data, which forms the basis for evaluating their inferential reproducibility. In this paper, we select the TWAS scan method, PTWAS scan [10], and the variant-level colocalization method, fastENLOC [18,15], for this purpose. In our real data analysis, both PTWAS and fastENLOC utilize the same probabilistic eQTL annotations derived from the GTEx project (v8) [1] and the fine-mapping method, DAP-G [27]. The GWAS summary statistics for the 4 selected complex traits are publicly available and further harmonized to match variants genotyped in the eQTL dataset.
Our main motivation for these selections is to minimize the procedural differences in eQTL and GWAS data pre-processing. Thus, we can focus on the important analytical factors that lead to differences in inferential results.

Probabilistic colocalization analysis at locus level
In this section, we outline the analytical derivation and computational algorithm for evaluating locus-level colocalizations.

Defining genomic loci
Our implementation defines a locus by the intersected signal clusters from the target molecular and complex traits. A signal cluster is defined by a set of LD SNPs representing the same underlying association signal for a given trait. Intuitively, any member of a signal cluster can similarly represent an independent association signal in likelihood computation, but the true causal SNP may not be distinguishable. This construction of loci for colocalization analysis implies that each resulting locus harbors at most one causal SNP for the complex trait and at most one causal molecular QTL.

Computation of locus-level colocalization probability
We denote the genotype and phenotype data combinations for the molecular and the complex traits of interest by Y e and Y g , respectively. Consider p overlapping SNPs in a predefined locus. Let β i and α i denote the genetic effects of SNP i for the complex and molecular traits, respectively. The latent binary association status of all member SNPs in the locus for the complex trait is represented by a p-binary vector, γ, where γ i = 1(β i = 0). Similarly, we use p-vector d to denote the latent association status for the molecular trait, where d i = 1(α i = 0).
Let Γ k and D k denote the sets of configurations of γ and d values with exactly k independent association signals for the corresponding traits, respectively. By the construction of the genetic locus through fine-mapped signal clusters, it follows that The problem of locus-level colocalization then can be framed as evaluating the following posterior Assuming the molecular and complex trait data are collected from two non-overlapping sets of samples, it follows that Next, we show that both required probabilities can be deduced from the variant-level colocalization model detailed in [14,15].
It follows from the locus definition and (6) that where Pr(d i = 1 | Y e ) denotes the posterior inclusion probability (PIP) for member SNP i computed from the fine-mapping analysis of the molecular QTLs.
The computation of Pr(γ ∈ Γ 1 | d ∈ D 1 , Y g ) can be formulated as a problem of fine-mapping GWAS hits with informative molecular QTL priors [27]. More specifically, the variant-level priors, can be estimated by a multiple imputation (MI) procedure described in [14]. Thus, the prior required for the locus-level colocalization analysis is computed by considering all compatible configurations of γ and d values, i.e., Similarly, As in the variant-level colocalization analysis, the PIPs of GWAS SNPs are assumed available from Bayesian fine-mapping methods based on an exchangeable and eQTL non-informative prior, Pr(γ i = 1) := π, which can also be estimated by the average of all GWAS PIPs [14]. For this eQTL non-informative prior, the induced prior probabilities for the locus of interest are given by It again follows from the locus definition and (6) that Consequently, the marginal likelihood defined by the following Bayes factor, can be obtained by Put together, by (6), (10), (11), (15), and the Bayes theorem, Consequently, the desired locus-level colocalization probability, can be computed by (7).

Simulation details 4.3.1 Simulation to illustrate LD hitchhiking effect in TWAS scan
Our simulation scheme is informed by real data analysis. Firstly, we identify the SNP rs2871960 as one of the most significant genetic associations (z = −34.2) for standing height in the UK Biobank data. In the GTEx data, this SNP maps to the cis-region of a single gene, ZBTB38  Table S3.
In this simulation, we consider all 22,662 cis-SNPs of the 39 genes. The eQTL dataset is directly taken from the GTEx muscle skeletal tissue with 706 individuals. We note that rs2871960 is unlikely to be the true causal eQTL for ENSG00000177311 in the muscle skeletal tissue based on the fine-mapping result. Specifically, it does not fall into any signal cluster of the gene and the PIP = 2.75 × 10 −3 (despite its single-SNP testing p-value reaching 10 −11 ). Additionally, the SNP-level colocalization probability for this SNP is also quite low (2.72 × 10 −3 ). We simulate a complex trait (y h ) for the 706 GTEx individuals using their true genotypes (g) for SNP rs2871960 and the following simple linear regression model, That is, the only causal SNP for the complex trait is assumed to be rs2871960, with its genetic effect fixed at −1.5. Note that the genotypes for the gene expression and complex trait are perfectly matched. However, because the complex trait is independently simulated, the overall scheme fits the two-sample design for integrative analysis. The particular genetic effect value attempts to match the observed complex trait z-score in the UK Biobank with much-reduced sample size. To further illustrate that all significant TWAS findings from the simulated dataset are due to the LD hitchhiking effect, we regress out the genotypes of the causal SNP and treat the residuals as a new complex trait phenotype. Finally, we analyze both datasets by PTWAS scan and report the corresponding p-values for each examined gene.

Simulation to evaluate locus-level colocalization analysis
In this simulation, we assemble a genomic region with a "known" LD structure from the genotype data of 838 GTEx samples. We classify the artificial genomic region into 22 LD blocks, with each block containing 50 consecutive common SNPs from a unique chromosome. We intend to capture natural LD patterns within each block and minimize LD between blocks (as the genotypes are taken from distinct chromosomes). The LD structure of the assembled genomic region is shown in Supplementary Figures S1 and S2.
For each simulated dataset, we independently generate phenotype data for a molecular (y e ) and a complex trait (y c ) using the following linear regression models Note that the colocalization in (19) induces a structural (mean) equation between y e and y c , i.e., which is often used in TWAS simulations [8,9].

Analysis of GTEx eQTL and complex trait GWAS data
We use the multi-tissue eQTL data generated from the GTEx project (v8) in our real data analysis. The data are processed and analyzed by the GTEx consortium. The pre-processing and analysis protocols are documented in [1]. For evaluating inferential reproducibility, we particularly focus on the cis-eQTL fine-mapping results generated by the software package DAP [27], which are publicly available at the GTEx portal. These results are re-formatted to feed PTWAS scan and fastENLOC without further processing.
The summary statistics from the 4 selected complex trait GWAS ( HDL and LDL from the GLGC consortium; standing height from the UK Biobank; coronary artery disease from the CARDio-GRAM consortium) are also publicly available. We use the single SNP association z-scores for the 4 traits harmonized by the GTEx project [1]. The main purpose of the harmonization procedure is to match common SNPs between the SNPs interrogated in the GWAS and the GTEx project. More specifically, if a GTEx SNP is missing from the corresponding GWAS, its GWAS z-score is imputed using the software package impG. The harmonized z-scores are subsequently used in PTWAS scan and fastENLOC analyses.   Figure 1: Estimated gene-to-trait effects on CAD by two independent eQTL SNPs of gene TDRKH explain its strong colocalization but weak TWAS scan signals The eQTL fine-mapping analysis of TDRKH using GTEx Artery Tibal tissue samples identifies two independent strong eQTLs (SPIPs > 0.9) represented by the lead SNPs rs6667279 and rs1521185, respectively. The second eQTL signal represented by rs1521185 (highlighted in red) also shows strong variant-level colocalization, with RCP = 0.92. The figure shows estimated gene-to-trait effects using the two lead independent eQTL SNPs as instruments. Specifically, the first instrument shows increased gene expression is associated with increased CAD risk, while the colocalized instrument predicts the opposite. As a result, the genetically predicted gene expression by combining the two SNPs shows no evidence of association with CAD risk (PTWAS scan p-value = 0.98).  : Histogram of I 2 statistics for genes with strong variant-level colocalization but weak TWAS scan signals I 2 statistic represents the level of heterogeneity in estimated gene-to-trait effects from independent eQTLs. Large I 2 values (i.e., I 2 → 1) indicate that vertical pleiotropy is unlikely. The figure shows that genes with strong variant-level colocalization but weak TWAS scan signals are more likely to have high I 2 values than those with strong TWAS signals. That is, the phenomenon illustrated in Figure 1 can be common in this set of genes. The peak at I 2 = 0 for this set mostly represent the genes whose TWAS scan p-value approach but do not exceed the pre-defined FDR significance level. 10 60 q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q qq  Figure 3: Simulation illustrating LD hitch-hiking effects in TWAS scan Panel A shows an observed cluster of significant TWAS scan genes from UK Biobank standing height data. The signals are centered around Gene ZBTB38 (Ensembl ID: ENSG00000177311), whose position is labeled by the dotted vertical line. Each point on the figure represents a TWAS scan p-value of a neighboring gene. Panel B shows that a similar cluster pattern can be replicated from a simulated dataset, which is generated by assuming a single causal variant within ZBTB38. Panel C shows the significant TWAS scan cluster disappears once the genotypes of the sole causal SNP are regressed out from the simulated phenotype data. In all three panels, the horizontal red line indicates the nominal 0.05 significance level. The simulation experiment illustrates that the significant TWAS scan findings can be attributed to the LD hitchhiking effects.   : Histogram of I 2 statistics for genes with strong locus-level colocalization but weak TWAS scan signals The figure indicates that genes with strong locus-level colocalization but weak TWAS scan signals are more likely to have high I 2 values than those with strong TWAS signals. This phenomenon is similar to the genes with strong variant-level colocalization but weak TWAS scan signals, indicating both sets of genes are likely enriched with cases of horizontal pleiotropy.