Meta-imputation of transcriptome from genotypes across multiple datasets using summary-level data

Transcriptome-wide association studies (TWAS) can be used as a powerful method to identify and interpret the underlying biological mechanisms behind GWAS by mapping gene expression levels with phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variants identified from external resources, such as the Genotype-Tissue Expression (GTEx) Project. In this setting, a straightforward approach to impute expression levels of a specific tissue is to use the model trained from the same tissue type. When multiple tissues are available for the same subjects, it has been demonstrated that training imputation models from multiple tissue types improves the accuracy because of shared eQTLs between the tissues and an increase in effective sample size. However, existing joint-tissue methods require access to genotype and expression data across all tissues. Moreover, they cannot leverage the abundance of expression datasets across various tissues for non-overlapping individuals. Here, we explore the optimal way to combine imputed levels across training models from multiple tissues and datasets in a flexible manner using summary-level data. Our proposed method (SWAM) combines an arbitrary number of transcriptome imputation models to linearly optimize the imputation accuracy given a target tissue. By integrating models across tissues and/or individuals, SWAM can improve the accuracy of transcriptome imputation or the power of TWAS without having to access each individual-level dataset. To evaluate the accuracy of SWAM, we combined 49 tissue-specific gene expression imputation models from the GTEx Project as well as from a large eQTL study of Depression Susceptibility Genes and Networks (DGN) Project and tested imputation accuracy in GEUVADIS lymphoblastoid cell line samples. We also extend our meta-imputation method to meta-TWAS to leverage multiple tissues in TWAS analysis with summary-level statistics.
Our results capitalize on the importance of integrating multiple tissues to unravel regulatory impacts of genetic variants on complex traits.

Author Summary
The gene expression levels within a cell are affected by various factors, including DNA variation, cell type, cellular microenvironment, disease status, and other environmental factors surrounding the individual. The genetic component of gene expression is known to explain a substantial fraction of transcriptional variation among individuals and can be imputed from genotypes in a tissue-specific manner, by training from population-scale transcriptomic profiles designed to identify expression quantitative trait loci (eQTLs). Imputing gene expression levels has been shown to help understand the genetic basis of human disease through transcriptome-wide association analysis (TWAS) and Mendelian Randomization (MR). However, it has been unclear how to integrate multiple imputation models trained from individual datasets to maximize their accuracy without having to access individual genotypes and expression levels that are often protected for privacy concerns. We developed SWAM (Smartly Weighted Averaging across Multiple datasets), a meta-imputation framework which can accurately impute gene expression levels from genotypes by integrating multiple imputation models without requiring individual-level data. Our method examines the similarities and differences between resources and borrows the information most relevant to the tissue of interest. We demonstrate that SWAM outperforms existing single-tissue and multi-tissue imputation models and continues to increase accuracy when integrating additional imputation models.

[...] increases the specificity and interpretability in identifying GWAS signals driven by gene regulation. Imputed gene expression can be utilized in various contexts of association analysis [...]

[...] samples may not necessarily overlap. To evaluate the benefit of SWAM's ability for multi-dataset "meta-imputation", we integrated imputation models trained from GTEx v7 and v8, as well as 922 whole blood transcriptomes from Depression Gene Network (DGN).
The rationale to include GTEx v7 and v8 models is that the datasets are slightly different from v6 (for example, v7 has more samples in all tissues except for LCL, FCL, and whole blood) and integrating multiple training models from slightly different versions of datasets may improve the accuracy. The reason to include DGN whole blood is that the sample size is much larger than any individual GTEx tissue, so it may help further reduce the variance and MSE of the imputation model.

When applying SWAM to GTEx v6, v7, or v8 datasets individually, the numbers of significantly imputed genes at FDR < 0.05 were 3,040, 3,060, and 3,203, respectively (Figure 7B). However, when all datasets were combined, the number of imputable genes increased to 3,342. These results suggest that imputation across multiple datasets can help even when the datasets are highly overlapping. When we additionally integrated SWAM with the DGN whole blood model, which detected 2,390 imputable genes by itself, the number of imputable genes by the integrated SWAM model further increased to 3,413. Note that we needed individual-level data only for the reference tissue/data (GTEx v6 LCL in our experiment), so an arbitrary combination of imputation models, which consist of only summary-level data, can be seamlessly added to the meta-imputation framework of SWAM.

Overall, using all 49 GTEx v8 tissues in combination with the DGN whole blood model provided the highest number of imputable genes, with a 112.9% improvement over the corresponding GTEx v8 PrediXcan-LCL model (single tissue), and a 13.5% improvement over the GTEx v6 version of SWAM-LCL (multi-tissue) (Figure 7B). Regardless of the version of GTEx used, including the DGN whole blood model gives a substantial improvement in the number of imputable genes compared to not including it in the model.
Another interesting observation is that while PrediXcan-LCL (v6) appears to perform better than PrediXcan-LCL (v7), SWAM-LCL derived from v7 performs better than v6 SWAM-LCL. This may suggest that while GTEx v7 PrediXcan-LCL may not have had a significant improvement in eQTL detection compared to its predecessor, other tissues may have improved in more substantial ways. This is because the sample size for LCL in v7 decreased by 18 samples, whereas other non-blood tissues had substantial sample size gains of up to 89 individuals. Here, SWAM leverages the increase in quality from other tissues, which allows for better overall imputation regardless of the quality of the target tissue itself.

SWAM robustly captures both tissue-specific and cross-tissue regulatory components
The key component behind the robust performance of SWAM is that it learns how to distribute weights across multiple imputation models for each gene individually. If a gene shares eQTLs across many tissues, SWAM's weights will be distributed evenly across tissues and the model will behave similarly to the naïve average heuristic. For example, ERAP2 is a well-known gene with shared eQTL profiles across most tissues. In the GTEx (v6), ERAP2 can be reliably imputed with any of the 44 single-tissue imputation models from PrediXcan with $r^2$ > 0.77 or more eQTLs. As a result, the weights from SWAM are almost evenly distributed across the tissues, ranging from 0.018 to 0.027 (S2 Fig), and the accuracy of SWAM ($r^2$ = 0.795) is very similar to the accuracy of the naïve average ($r^2$ = 0.796).

On the other hand, when the imputation model from the reference tissue is not particularly good due to smaller sample size or other technical issues, SWAM can substantially improve accuracy by leveraging eQTL sharing from other tissues.
For example, the single-tissue imputation accuracy of GSTM1 is relatively low in LCL tissue ($r^2$ = 0.368) compared to the accuracy of the 38 other tissues in which a PrediXcan imputation model is available (average $r^2$ = 0.61). Using SWAM, the predictive $r^2$ increases to 0.741 by assigning positive weights to 31 tissues (S2 Fig).

Finally, for genes that are highly tissue-specific, SWAM's weights will be distributed similarly to the best tissue heuristic. For example, CTSK is expressed in most tissues, but has eQTLs in only 16 tissues (S2 Fig). SWAM assigns weights to 7 of these tissues, and substantially improves the predictive accuracy from $r^2$ = 0.111 to $r^2$ = 0.447.

Comparison of imputation models in the context of TWAS
We conducted TWAS analysis using SWAM, UTMOST, and PrediXcan models via [...]

We plotted transcriptome-wide signals for the LDL trait using the GTEx v6 liver model for PrediXcan, UTMOST and SWAM (Figure 8). One interesting signal gained from the SWAM analysis is the APOC1 gene, which is primarily expressed in the liver and has been implicated in playing a role in HDL and LDL/VLDL (very low-density lipoprotein) metabolism [31].

One potential shortcoming for both multi-tissue approaches (SWAM and UTMOST) appears to be that the number of unique signals (across all tissues) is lower than the number generated by PrediXcan's single tissue models. For example, SWAM produces 210 unique associations for the HDL trait, while we see 187 unique associations from UTMOST and 248 unique associations from PrediXcan. Similarly, MultiXcan detects 284 significant associations when scanning across all tissues (based on the PrediXcan models). It appears that while the multi-tissue methods can leverage information from other tissues to impute expression accurately, marginal association signals in TWAS are potentially lost using these approaches.
However, we found that a high number of these unique signals from the PrediXcan TWAS appeared only in one or two tissues (92.5% for HDL, 98.2% for LDL and 100% for T2D).

With all these various considerations, SWAM appears to improve TWAS power for a given tissue, although it may ultimately yield fewer signals compared to comprehensive tissue scans using PrediXcan or MultiXcan. While SWAM outperforms other methods in terms of imputation accuracy, there may not be a clear-cut winner in terms of performance in TWAS. The best approach to use will likely depend on the needs of the researcher, and each approach may provide different yet complementary insights into understanding the biological mechanisms from these association studies.

[...] traits. Imputation of gene expression in the context of transcriptome-wide association studies is a promising approach to understanding the connection between our genes and many traits. Yet, there are still many challenges that arise when performing association studies with imputed expression. Current tissue-specific imputation models are trained using data obtained from their respective tissues, which can vary greatly in data quality and sample size. As such, there is a great deal of variability among tissues in the imputation accuracy of tissue-specific gene expression levels. For example, PrediXcan was able to significantly impute only 2086 vagina-specific genes, while it discovered 8171 genes specific to the tibial nerve tissue. Furthermore, the imputation accuracy of significant genes within a tissue is also highly variable, with some genes such as ERAP2 having very high (>80% of variation explained by eQTLs) imputability and other genes (~1% of variation explained by eQTLs) with low imputability.
In this paper we developed SWAM, a method that determines the level of eQTL sharing between tissues and uses the shared information from other tissues to improve the imputation accuracy for the target tissue. By simultaneously examining the relatedness of multiple tissues, SWAM in essence increases the effective sample size of imputation models. Using GEUVADIS LCL data, we compared SWAM to single-tissue approaches. We found that our multi-tissue approach, in addition to increasing the number of significantly imputable genes for each tissue, also improved the overall imputation accuracy for genes that were already significantly imputable using PrediXcan. We improved the power of TWAS by running a SWAM-adapted version of MetaXcan for various traits, finding an increased number of significant transcriptome-trait associations, even when correcting for the larger number of genes imputed.

Although SWAM provides a substantial improvement in the number of significantly imputable genes for many tissues and generally increases power for TWAS, there are some shortcomings and caveats to consider with the approach. It is important to note that unlike PrediXcan, SWAM does not actually perform model training or eQTL discovery. Instead, it evaluates the efficacy of various single-tissue imputation models (in this case, the GTEx tissues) and assigns weights to the models based on their relatedness to the target tissue. Therefore, for SWAM to work, there must already be a database of imputation models that it can use to derive the multi-tissue weighting. Because we are utilizing existing imputation models, we acknowledge that there will be cases where the SWAM imputation accuracy could be similar to or worse than the single-tissue imputation, especially if the gene has shared eQTLs across many tissues or if the single-tissue imputation model was already performing well.
The improvement observed in our validations and TWAS is an overall trend, and as with any analysis, interpretation of any specific results should be approached with caution. Furthermore, the improvement for any given gene has an upper limit which is dependent on the pool of single tissue models available. There may be tissues that have very few relevant other tissues to draw information from. For any given gene within the target tissue, SWAM automatically sets the weights of non-relevant tissues to zero based on a threshold. However, for the purposes of our study, the threshold was tuned to be more lenient, allowing for more tissues to be included in the imputation of each gene's expression levels. A more lenient threshold will yield more genes, but a lower sensitivity to the target tissue. A stricter threshold will provide imputed expressions that are more specific to the target tissue but will provide imputation for fewer genes and may reduce imputation accuracy in some genes. Optimal tuning of this threshold may depend on the target tissue and the goals of the analysis. Further work could help determine the ideal way to tune these thresholds, perhaps using a different threshold depending on the gene and tissue in question.

Next, our empirical validation of imputation accuracy was tested on European [...]

To conclude, we propose a novel method for gene expression imputation, which extends already established single-tissue imputation models into a multi-tissue setting. By combining information from multiple models, we were able to increase overall tissue-specific imputation accuracy for many genes and increase power for transcriptome-wide association studies.

SWAM Notation and Framework
Our framework for SWAM is designed to find the optimal linear combination of imputed expression levels from multiple tissues and datasets. For simplicity, we will denote each (tissue, dataset) combination as a source. We assume there are $K$ imputation models from individual sources, with each model indexed as $k \in (1, \ldots, K)$. We also denote $t \in \{1, \ldots, K\}$ to represent [...]

Here we describe how SWAM calculates the optimal $w$, whose derivation is shown in the Supplementary Text. It is important to note that SWAM works ideally when the tissue type intended to be imputed matches the tissue type of the reference source. We define $y_t$ as the $n \times 1$ vector of individual-level expression measurements for the reference source, and as before, $X_t$ to be the corresponding $n \times m$ matrix of individual-level genotypes. The first step is to impute expression using each of the $K$ models using the reference genotypes. Thus, we obtain $K$ sets of imputed expressions, $\hat{y}_k = f(X_t \mid k)$, with each $\hat{y}_k$ being a single-source imputation for the $n$ samples in the reference data. The weights for SWAM are given by

$$w = (\Sigma + \lambda I)^{-1}\,\mathrm{cor}(\hat{Y}, y_t)$$

Here, the correlation matrix $\Sigma$ accounts for the similarity between the imputation models, and the vector $\mathrm{cor}(\hat{Y}, y_t)$, containing the entries $\rho_{k,t} = \mathrm{cor}(\hat{y}_k, y_t)$, accounts for the empirical similarity of imputed expressions from each model to the measured expressions in the reference source. When $k = t$, because $\rho_{t,t}$ will be prone to overfitting, we replace this value with a 5-fold cross-validated correlation instead, which is available from PrediXcan output. Finally, $\lambda$ acts to regularize the weights, providing numerical stability for the inversion of the covariance matrix. The calibration of $\lambda$ is further discussed in the Supplementary Text.

Simulations
Our simulation study sought to examine SWAM's ability to detect the correct shared components between related tissues across a wide spectrum of parameter settings. We compared SWAM with naïve average, best tissue and single tissue approaches.
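As a rough illustration of the weight calculation and of the baseline heuristics it is compared against, the following Python sketch computes regularized weights from a matrix of imputed expressions. It assumes the weights are obtained by solving a ridge-style linear system built from the correlation matrix between models and the vector of correlations with measured expression. The function names, the toy data, and the exact definitions of the naïve average (equal weights) and best tissue (all weight on the most correlated model) baselines are assumptions made for illustration, not the authors' implementation. The cross-validated replacement of the reference-source correlation is not reproduced here.

```python
import numpy as np


def swam_like_weights(imputed, measured, lam=3.0):
    """Regularized weights for combining K imputation models (illustrative).

    imputed  : (n, K) array, column k = expression imputed by model k
               from the reference genotypes.
    measured : (n,) array of measured expression in the reference source.
    lam      : regularization strength added to the diagonal (assumed form).
    """
    n_models = imputed.shape[1]
    # K x K correlation matrix between the single-source imputations.
    sigma = np.corrcoef(imputed, rowvar=False)
    # Correlation of each model's imputation with the measured expression.
    rho = np.array([np.corrcoef(imputed[:, k], measured)[0, 1]
                    for k in range(n_models)])
    # Solve (sigma + lam * I) w = rho rather than forming the inverse.
    weights = np.linalg.solve(sigma + lam * np.eye(n_models), rho)
    return weights, sigma, rho


def naive_average_weights(n_models):
    """Equal weight on every model (assumed definition of the baseline)."""
    return np.full(n_models, 1.0 / n_models)


def best_tissue_weights(rho):
    """All weight on the most correlated model (assumed definition)."""
    w = np.zeros_like(rho)
    w[np.argmax(rho)] = 1.0
    return w


# Toy data: one shared genetic signal captured with varying quality.
rng = np.random.default_rng(0)
n, K = 200, 4
signal = rng.normal(size=n)
measured = signal + rng.normal(scale=1.0, size=n)
quality = np.array([0.9, 0.6, 0.3, 0.0])  # last model is pure noise
imputed = signal[:, None] * quality + rng.normal(scale=0.5, size=(n, K))

w, sigma, rho = swam_like_weights(imputed, measured, lam=3.0)
combined = imputed @ w  # meta-imputed expression for the reference samples
print("weights:", np.round(w, 3))
print("naive  :", naive_average_weights(K))
print("best   :", best_tissue_weights(rho))
```

Solving the linear system instead of inverting the matrix is a numerical convenience only; a larger `lam` shrinks the weights and makes them depend less on the correlation structure between models.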
For each simulation, we independently generate individual-level genotypes and expression for multiple tissues. For the reference set, we simulated $X$, an $n \times m$ genotype matrix where $n$ is the number of individuals and $m$ the number of SNPs. In our simple simulation, we assume that each SNP is independent, with non-reference allele frequency (AF) distributed as Beta(1,3). The genotypes were simulated using a binomial distribution based on the AF. To simulate multi-tissue expressions, for each tissue $k \in (1, \ldots, K)$ we specify effect sizes $\beta_k$ to simulate expressions $y_k = X\beta_k + \epsilon_k$. For the reference tissue (i.e. $k = t$), we assume two causal SNPs with nonzero elements in $\beta_t$, where one SNP is expected to explain the tissue-specific heritability ($h_s^2$) for the reference tissue and the other SNP explains the cross-tissue heritability ($h_c^2$), summing up to the total heritability ($h^2 = h_s^2 + h_c^2$). Other tissues (i.e. $k \neq t$) were divided into "related tissues" and "independent tissues". For related tissues, $\beta_k$ had only one non-zero value, corresponding to the cross-tissue heritability ($h_c^2$). For independent tissues, all $\beta_k$ had zero values.

[...] [2] to build multi-tissue imputation models. To demonstrate the ability of SWAM to incorporate multiple datasets, we used the DGN [11] dataset as well as multiple versions of GTEx datasets.

Multi-tissue transcriptomic profiles and imputation models from the GTEx project
To build multi-tissue imputation models using SWAM, UTMOST, naïve average, and best tissue methods, we used single-tissue imputation models, individual-level genotypes, and expressions obtained from the GTEx consortium. Single-tissue imputation models were [...] tissue (e.g. EBV-transformed lymphocytes) which is deemed to be the closest to the validation data (e.g. GEUVADIS LCL), using GTEx version 6.

When evaluating multi-tissue imputation models within a single dataset, we used GTEx version 6.
When evaluating imputation models across multiple tissues and multiple datasets, we used various combinations of GTEx versions to evaluate the benefit of multiple imputation models trained from overlapping datasets. When training across different datasets, genes were matched by Ensembl ID, ignoring version numbers. In addition to training SWAM, we also used the single tissue PredictDB imputation models as a basis for comparison with our method.

Validation dataset from the GEUVADIS study
We used individual-level genotypes and expression levels from lymphoblastoid cell lines (LCL) from the GEUVADIS consortium only to evaluate various methods after imputing expression levels with models built from other datasets. Each imputation model was evaluated by applying the model to GEUVADIS genotypes to impute individual expression levels, and by calculating the correlation between the imputed and measured expressions. We focused on 344 European individuals for whom genotypes and normalized expressions (from RNA-seq) are available, with comparable linkage disequilibrium (LD) structure to GTEx and DGN datasets.
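The evaluation step can be sketched as follows. This minimal Python example assumes each imputation model is a linear set of SNP weights applied to genotype dosages, and it pairs the correlation with the Spearman statistic, one-sided t-based p-value, and Benjamini-Hochberg correction that this paper uses when counting significantly imputable genes. Array shapes, function names, and the simulated inputs are hypothetical.

```python
import numpy as np
from scipy import stats


def impute_expression(dosages, snp_weights):
    """Apply linear imputation models: (n, m) dosages x (m, g) weights."""
    return dosages @ snp_weights


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    g = len(pvals)
    order = np.argsort(pvals)
    scaled = pvals[order] * g / np.arange(1, g + 1)
    # Enforce monotonicity from the largest p-value downwards.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(g)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q


def evaluate_imputation(imputed, measured, fdr=0.05):
    """Per-gene Spearman correlation, one-sided p-value, and BH FDR call."""
    n, g = imputed.shape
    rho = np.empty(g)
    for j in range(g):
        rho[j], _ = stats.spearmanr(imputed[:, j], measured[:, j])
    # Convert correlations to t-statistics; clip to avoid division by zero.
    r = np.clip(rho, -1 + 1e-12, 1 - 1e-12)
    t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
    pvals = stats.t.sf(t_stat, df=n - 2)  # one-sided: positive correlation
    qvals = benjamini_hochberg(pvals)
    return rho, pvals, qvals, qvals < fdr


# Hypothetical toy data standing in for validation genotypes and expression.
rng = np.random.default_rng(1)
n, m, g = 344, 20, 50
dosages = rng.binomial(2, 0.3, size=(n, m)).astype(float)
snp_weights = rng.normal(scale=0.3, size=(m, g))
imputed = impute_expression(dosages, snp_weights)
# First 10 genes have a genetic component; the rest are pure noise.
measured = rng.normal(size=(n, g))
measured[:, :10] += imputed[:, :10]

rho, pvals, qvals, significant = evaluate_imputation(imputed, measured)
print("significantly imputable genes:", int(significant.sum()))
```

The one-sided test only rewards positive correlation, so a model whose imputed values are anti-correlated with measured expression is never counted as imputable.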

Imputation models from Depression Genes Network
We also downloaded the imputation model trained using the 922 whole blood transcriptomes from the Depression Genes Network (DGN) via PredictDB. DGN was evaluated as a single-tissue imputation model. It was also used in the evaluation of multi-dataset imputation models when DGN is combined with various versions of GTEx imputation models.

Imputation models from UTMOST
We compared our methods to UTMOST, another multi-tissue approach for expression imputation [24]. The UTMOST imputation models were jointly trained across 44 tissues from GTEx version 6 and were downloaded from their published online repository (https://github.com/Joker-Jerome/UTMOST). We applied the imputation model targeted for EBV-transformed lymphocytes when evaluating the imputation accuracy with the GEUVADIS LCL expression.

Evaluating imputation accuracy with GEUVADIS measured expression
We evaluated the accuracy of various imputation models by comparing imputed expressions from individual-level genotypes with the measured expression from GEUVADIS LCLs. Individual-level expression levels were imputed across 344 European GEUVADIS samples using various single-tissue and multi-tissue/multi-dataset methods to calculate the correlation with the normalized measured expression from GEUVADIS LCL. The correlation between imputed and measured expressions was calculated using Spearman correlation, and a one-sided p-value was evaluated by converting the correlation coefficients into t-statistics. Genes were considered "significantly imputable" if the Benjamini-Hochberg false discovery rate (FDR) was less than 0.05. This procedure was applied across all genes within each method, with the counts being tabulated.

Comparing single-tissue and multi-tissue imputation models within a single dataset.
With these results, we first focused on comparing the imputation accuracy of SWAM with other methods using GTEx v6. We compared SWAM-LCL (SWAM using GTEx EBV-transformed lymphocytes as reference), every single tissue imputation model from PredictDB, and best tissue methods. We focused on evaluation using GTEx v6 models where UTMOST models were available. We also focused on genes included in the Consensus Coding Sequence Project (CCDS) [36] to minimize the discrepancy between imputation models.

To keep a fair comparison with UTMOST and the single tissue methods, we restricted the set of genes to those that have at least one eQTL in any single tissue model from PredictDB and also in any UTMOST model across all reference tissues.

Evaluating multi-tissue imputation models across multiple datasets.
[...]

This figure demonstrates the training of the imputation model using the reference data.
The inputs required for SWAM are a set of reference genotypes with sample-matched measured expression, and the multiple imputation models to be included. The list of multiple imputation models must also include a model derived from the reference data, which can be built via PrediXcan. SWAM uses these models to impute tissue-specific expression levels from the reference genotypes. These imputed expression sets are then compared with the measured expression of the reference set. The weights are calculated based on the similarity between the measured and imputed expression and the covariance structure of tissues. For full details, see the methods section.

Figure 6 - Simulation study comparing SWAM with naïve average, best tissue and single tissue methods.
We ran each simulation 10,000 times, with the following default settings: 10 total tissues (1 target, 4 relevant, 5 irrelevant), 100 SNPs (2 per tissue), 10% genetic heritability, 50% shared heritability between relevant tissues. In addition, the sample size of the target tissue was 100 individuals, and the remaining tissues had 200 individuals. This was done to emphasize the importance of integrating information from other tissues when the quality of the target tissue model is limited. In panel (A), we varied the number of relevant tissues, from 0 to 10. Panel (B) shows the improvement when the total number of tissues is increased, with the number of irrelevant tissues fixed at 50% of the total. Panel (C) shows the effects of changing the shared heritability for the relevant tissues. We note here that each tissue has 2 causal SNPs: for the relevant tissues, 1 of these causal SNPs is shared with the target tissue while the other is independent of all simulated tissues. Panel (D) shows the performance of the approaches for different levels of genetic heritability. This simulation demonstrates the range of heritability in which we would expect to see the most improvement. Empirically, we do notice the same trend seen here, as SWAM performs similarly to the single tissue model when the cross-validated R-squared is high. Panel (E) shows the effects of target tissue sample size. The x-axis pertains to the sample size of the target tissue only, and all other tissues were fixed at 200 individuals. Finally, panel (F) shows the performance of the methods at different p-value thresholds, using the default simulation settings.

We used our LCL-targeted SWAM model to impute expression levels based on the genotypes of 344 European samples. We then calculated the concordance between imputed expression and measured LCL expression. We repeated this for all of the other methods mentioned here.
(A) shows the performance of SWAM against the single-tissue models from 44 tissue-specific PredictDB models derived from GTEx version 6. In (B), we derived various SWAM models using every combination of the following: 1) all GTEx v6 tissues, 2) all GTEx v7 tissues, 3) all GTEx v8 tissues, and 4) the Depression Gene Network (DGN) single tissue whole blood model from PredictDB. Here, we also included the UTMOST LCL model, naïve average and best tissue models, all derived from GTEx v6.

[...] To correct for this, we added a diagonal matrix, $\lambda I$, prior to inverting the matrix $\Sigma$, giving us the solution $w = (\Sigma + \lambda I)^{-1}\,\mathrm{cor}(\hat{Y}, y_t)$. To choose the correct value of $\lambda$, we tested the imputation accuracy of $w$ in our validation test set for a large range of $\lambda$. We found that imputation accuracy was low when $\lambda = 0$, likely due to overfitting and the amplification of noise. Larger values of $\lambda$ yielded better results but ignored the correlation structure between tissues. We found empirically that $\lambda = 3$ provided the best results (this value depends highly on the scale and normalization of the data).

Application of SWAM to other target tissues
Throughout our work we primarily used the LCL tissue from GTEx version 6 as our target tissue for application of SWAM. In addition to producing SWAM-LCL models, we also generated models targeting each of the 44 GTEx v6 tissues. Supplementary Figure 3 [...] Overall, we observe clustering that appears to separate the tissue types quite well. For example, brain tissues are primarily getting high weights from other brain tissues while receiving low weights from all other tissue types. This heatmap provides evidence of SWAM being able to capture tissue-specific signals.

The principle behind SWAM is that it considers the bias-variance tradeoff for each tissue, and assigns higher weights to tissues that reduce MSE. In this example, tissues such as Skeletal Muscle have a high sample size (and therefore lower variance) but may be biased as they are not the relevant tissue to the tissue of interest (in this case LCL). Other tissues such as Fibroblasts may have a lower sample size but compensate by having low bias (high relevance to tissue of interest) and will contribute more weight.

List of Supplementary Figures
(A) shows the ERAP2 gene, which had a single tissue $r^2$ = 0.801, while the SWAM model had $r^2$ = 0.795. (B) depicts a scenario where SWAM is able to leverage information from other tissues to make up for the relatively lower quality of the target tissue; here the single tissue model gave $r^2$ = 0.368 while SWAM increased the accuracy to $r^2$ = 0.741. (C) shows an example where the eQTLs are highly tissue specific. Here, SWAM improved the single tissue accuracy from $r^2$ = 0.111 to $r^2$ = 0.447.

We used SWAM to derive multi-tissue imputation models for all 44 GTEx v6 tissues. Each cell in this heatmap depicts the number of times each tissue contributed the highest weight to the target tissue. Here, the rows correspond to the target tissue and the columns correspond to the weight contribution of each tissue. For the sake of clarity, the diagonal values were not included as they were consistently much higher than the remaining elements of the matrix.

We also compared every PrediXcan model derived from GTEx version 7 and version 8 tissues, and tested imputation accuracy against GEUVADIS LCL measured expression levels.
Surprisingly, despite the increase in sample size, the LCL tissue from v8 performed worse than its version 7 counterpart. The number of tissues outperforming LCL in both v7 and v8 highlights the opportunity to leverage information from other tissues to improve imputation accuracy for under-powered tissues.