Polygenic Transcriptome Risk Scores 1 Can Translate Genetic Results 2 Between Species 3

Genome-wide association studies have demonstrated that most 35 traits are highly polygenic; however, translating these polygenic signals into 36 biological insights remains diﬃcult. A lack of satisfactory methods for 37 translating polygenic results across species has precluded the use of model 38 organisms to address this problem. Here we explore the use of polygenic 39 transcriptomic risk scores (PTRS) for translating polygenic results across species. 40 Unlike polygenic risk scores (PRS), which rely on SNPs for predicting traits, PTRS 41 use imputed gene expression for prediction, which allows cross-species 42 translation to orthologous genes. We ﬁrst developed RatXcan, which is a 43 framework for transcriptome-wide association studies (TWAS) in outbred rats. 44 Leveraging predicted transcriptome and genotype data from UK Biobank, and 45 the genetically trained gene expression models from RatXcan, we scored more 46 than 3,000 rats using a human-derived PTRS for height. Strikingly, we found that 47 human-derived height PTRS signiﬁcantly predicted body length in rats (P<0.013). 48 The genes included in the PTRS were enriched for biological pathways including 49 skeletal growth and metabolism and were over-represented in tissues including 50 pancreas and brain. This approach facilitates experimental studies in model 51 organisms that examine the polygenic basis of human complex traits and 52 provides an empirical metric by which to evaluate the suitability of speciﬁc 53 animal models and identify their shared biological underpinnings. 54


56
Over the last decade, genome-wide association studies (GWAS) have identified 57 numerous genetic loci that contribute to biomedically important traits [Visscher 58 et al., 2017]. GWAS have demonstrated that most traits have a highly polygenic 59 architecture, meaning that numerous genetic variants with individually small ef-60 fects confer risk [Loos, 2020]. However, translating these results into meaning- To address this problem, we sought to develop a novel method that allows 75 translation of polygenic signals from humans to other species and vice-versa. 76 This method focuses on gene expression, rather than SNPs, and builds on our 77 past work with polygenic transcriptomic risk scores (PTRS) [Liang et al., 2022]. 78 PTRS are premised on the regulatory nature of most GWAS loci [Maurano et al., 79 2012] and use genetically regulated gene expression (transcript abundance), in-80 stead of SNPs, as features for prediction. We recently showed that PTRS are use-81 ful for translating polygenic signals between different human ancestry groups 82 [Liang et al., 2022], supporting the view that the effects of genes on a phenotype 83 are conserved across ancestry groups. In the current project, we hypothesized 84 that the relationships between genes and phenotypes are conserved not only 85 between human ancestry groups, but also across species. Thus, we explored 86 whether PTRS trained using human data could predict similar traits in another 87 species by applying the PTRS to orthologous genes in the target species. We se-88 lected heterogeneous stock (HS) rats because they are a well characterized, out-89 bred mammalian population for which dense genotype, phenotype and gene ex- The workflow was divided into 4 stages: a) gene expression prediction training, b) gene-trait association, c) PTRS fitting in humans, d) PTRS prediction. a) In the gene expression prediction training stage, we used genotype (117,155 SNPs) and gene expression data (15,216 genes) from samples derived from 5 brain regions in 88 rats. The prediction weights (rat PredictDB weights) are stored in predictdb.org. Rats used in this stage constitute the training set. b) In the gene-trait association stage, we used genotype and phenotype data from the target set of 3,407 rats (no overlap with training set rats). Predicted gene expression (8,567 genes for which prediction was possible) was calculated for all the 3,407 target set rats, and gene-trait associations were tested using RatXcan (N=1,463-3,110). We queried human gene-level associations from PhenomeXcan to estimate enrichment levels with our rat findings. c) Human PTRS weights were fitted using elastic net regression of height on predicted whole blood gene expression levels (7,002 genes) in the UK Biobank (N=356,476). d) The human PTRS weights were used for complex trait prediction in rats. PTRS trained in humans were then used to predict the analogous height trait in our target rat set. Prediction performance of PTRS was calculated as the correlation (and partial correlation) between the predicted scores in rats and the observed traits. Analyses in rats are shown in blue and analyses in humans are shown in pink.

93
Experimental setup 94 To build a framework for translating genetic results between species, we followed 95 the experimental setup illustrated in Fig. 1. In the training stage (Fig. 1a), we inves-96 tigated the genetic architecture of gene expression and built prediction models 97 of gene expression in rats. We used genotype and transcriptome data from five 98 brain regions sampled from 88 rats, generated by the NIDA Center for GWAS 99 for Outbred rats (Fig. 1a). In the association stage (Fig. 1b), we used their geno-100 type data to predict the transcriptome in a non-overlapping target set of 3,407 101 rats and tested for association between the genetically predicted gene expres-102 sion and body length by adapting the PrediXcan software, which was originally 103 developed for use in humans [Gamazon et al., 2015], to rats ('RatXcan'). We also 104 examined fasting glucose, which served as a negative control. In the discovery 105 stage (Fig. 1c), we determined the human-derived PTRS weights for height us-106 ing data from 356,476 individuals of European-descent from UK Biobank. In the 107 final stage (Fig 1d), we used these human-derived weights in conjunction with 108 genetically predicted gene expression for rats in the target set. We assessed the 109 prediction performance by comparing the predictions from the PTRS to the true 110 body length (which is equivalent to human height) for each rat.

112
To inform the optimal prediction model training, we examined the genetic archi-113 tecture of gene expression in HS rats by quantifying heritability and polygenicity.

114
Unless otherwise specified, we show the results for nucleus accumbens core in 115 the main section and the remaining tissues in the supplement.  123 for all brain tissues tested (Table 1). Fig. 2a shows the heritability estimates for 124 gene expression in the nucleus accumbens core, while heritability estimates in 125 other tissues are shown in Fig. S1. In humans, we identified a similar heritability 126 distribution (Fig. 2b, Fig. S2) based on whole blood samples from GTEx.

127
Next, to evaluate the polygenicity of gene expression levels, we examined 128 whether predictors with more polygenic (i.e., many variants of small effects) or 129 more sparse (i.e., just a few larger effect variants) architecture correlated better 130 with observed expression. We fitted elastic net regression models using a range

147
Based on the relative performance across different elastic net mixing parameters, 148 we chose a value of 0.5, which yielded slightly less sparse predictors than lasso 149 but provided robustness to missing or low quality variants; this is the same value 150 that we have chosen in the past for humans datasets [Gamazon et al., 2015]. 151 We trained elastic net predictors for all genes in all 5 brain regions. The proce-152 dure yielded 8,244-8,856 genes across five brain tissues from the available 15,216 153 genes ( Table 1). The 10-fold cross-validated prediction performance ( 2 ) ranged 154 from 0 to 80% with a mean of 8.51% in the nucleus accumbens core. As shown in 155 Fig. 2a and b, mean prediction 2 was consistently lower than mean heritability, 156 as is expected since genetic prediction performance is restricted by its heritabil- On the x-axis, genes are ordered by their heritability estimates. 95% credible sets are shown in gray for each gene. Blue dots indicate the prediction performance (cross validated 2 between predicted and observed expression). b) cis heritability of gene expression levels in whole blood tissue in humans from GTEx. We show only the same 10,268 orthologous genes. On the x-axis, genes are ordered by their heritability estimates. 95% credible sets are shown in gray for each gene. Pink dots indicate the prediction performance (cross validated 2 between predicted and observed expression). c) Cross validated prediction performance in rats (Pearson correlation ) as a function of the elastic net parameter ranging from 0 to 1. d) Cross validated prediction performance in humans (Pearson correlation ) as a function of the elastic net parameter ranging from 0 to 1. suringly, prediction performance values followed the heritability curve, confirm-159 ing that genes with highly heritable expression tend to be better predicted than 160 genes with low heritability in both HS rats and humans ( Fig. 2a-b). Interestingly, 161 we identified better prediction performance in HS rats than in humans (Fig. S3), 162 despite heritability of gene expression being similar across species ( Fig. 2a-b). 163 In Fig. 3a-b, we show the prediction performance of the best predicted genes 164 in HS rats (Mgmt, 2 = 0.72) and humans (RPS26, 2 = 0.74). Across all genes, 165 we found that the prediction performance in HS rats was correlated with that of 166 humans ( = 0.061, = 8.03 * 10 −6 ; Fig. 3c). Furthermore, performance per gene 167 in different tissues was similar in both HS rats (Fig. 3d) and humans (Fig. 3e), 168 namely, genes that were well-predicted in one tissue were also well-predicted 169 in another tissue. Correlation of prediction performance across tissues ranged 170 from 58 to 84% in HS rats and 42 to 69% in humans.

171
Having established the similarity of the genetic architecture of gene expres-172 sion between rats and humans, we transitioned to the association stage.

174
To extend the PrediXcan/TWAS framework to rats, we developed RatXcan. We 175 used the predicted weights from the training stage to estimate the genetically reg-176 ulated expression in the target set of 3,407 densely genotyped HS rats. We then 177 tested the association between predicted expression and body length. 178 We identified 90 Bonferroni significant genes ( (0.05∕5388) = 9.28 × 10 −6 ) in 57 and is related to growth pathways, including epidermal growth factor.

184
To evaluate whether trait-associated genes identified in HS rats were more 185 significantly associated with the corresponding traits in humans, we performed 186 enrichment analysis. Specifically, we selected genes that were nominally asso-  Manhattan plot of the association between predicted gene expression and rat body length, which is analogous to human height. We label the genes whose human orthologs are at least nominally associated in human data ( < 0.01); Grey dotted line corresponds to the Bonferroni correction threshold of 0.05/5,388 of tests. Red dotted line corresponds to an arbitrary threshold of 1 × 10 −4 . Triangular points refer to genes that were significantly associated with body length at the Bonferroni threshold, where the direction of the triangle corresponds with the sign of the associated gene. b) Q-q plot of the p-values of the association between predicted gene expression levels in humans (phenomexcan.org). Pink dots correspond to all genes tested in humans. Blue dots correspond to the subset of genes that were nominally significantly associated with body length in rats ( < 0.05). c) Correlation between human-derived height PTRS and observed body length in rats for one of the 37 regularization parameters used in building the PTRS. Correlation coefficients for all 37 models are available in Fig. S5.
ings further encouraged us to test whether PTRS based on human studies could 199 predict the analogous trait in rats.  in Figure 2 because not all genes are currently predictable both in humans and 304 rats); increasing our knowledge of orthologous genes or identifying other strate-305 gies to address this limitation will further improve performance.

306
The ability to transfer polygenic signals to other species creates novel oppor-

387
Genotype and expression data in the training rat set 388 The rats used for this study are part of a large multi-site project focused on ge-

416
Genotype and phenotype data in the target rat set 417 We used genotype and phenotype data from 3,407 HS rats (i.e., target set) re-418 ported in Chitre et al. [2020]. We used phenotypic information on body length 419 (including tail), and fasting glucose. For each trait, sex, age, batch number and 420 site, were regressed out if they were significant and if they explained more than 421 2 % of the variance, as described in [Chitre et al., 2020]. 422 Querying human gene-trait association results

423
To retrieve analogous human gene-trait association results, we queried PhenomeX- were also limited to the same 10,268 as above.

440
Examining polygenicity versus sparsity of gene expression 441 To examine the polygenicity versus sparsity of gene expression in rats, we iden-442 tified the optimal elastic net mixing parameter , as described in Wheeler et al. 443 [2016]. Briefly, we compared the prediction performance of a range of elastic net 444 mixing parameters spanning from 0 to 1 (11 values from 0 to 1, with steps of 0.1).

445
If the optimal mixing parameter was closer to 0, corresponding to ridge regres-446 sion, we deemed gene expression trait to be polygenic. In contrast, if the optimal 447 mixing parameter was closer to 1, corresponding to lasso, then the gene expres-448 sion trait was considered to be more sparse. We also restricted the number of 449 genes in the pipeline to the 10,268 orthologous genes.

450
Training gene expression prediction in rats 451 To train prediction models for gene expression in rats, we used the training set 452 of 88 rats described above and followed the elastic net pipeline from predictdb.org.

453
Briefly, for each gene, we fitted an elastic net regression using the glmnet package we only included genes whose prediction performance was greater than 0.01 and 474 had a non-negative correlation coefficient, as these genes were considered well 475 predicted. 476 We tested the prediction performance of our elastic net model trained in nu-477 cleus accumbens core in an independent rat reference transcriptome set. We Whitney test as implemented in R to discern whether the distribution of the p-503 values for genes in humans was the same for the genes that were and were not 504 nominally significant in rats.

505
Calculating PTRS weights in the UK Biobank 506 We calculated human-derived height PTRS weights using elastic net with a mixing parameter of 0.5, as described in Liang et al. [2022]. We predicted expression levels in 356,476 UK Biobank unrelated participants of European descent using whole blood prediction models trained in GTEx. We used the prediction models trained with UTMOST based on grouped lasso, which borrows information across tissues to improve prediction performance [ To calculate human-derived height PTRS for body length in the target rats, we 517 used the predicted gene expression matrix calculated for the association stage.

518
For each rat, we multiplied the predicted expression with the corresponding human-519 derived weight for that gene. The aggregated effects of these weighted genes are 520 summarized in a single score, PTRS: We generated 37 PTRS models for height for a range of regularization param- Quantifying PTRS prediction performance 530 We calculated the Pearson correlation ( ) coefficient between height PTRS the 531 and analogous observed phenotype in rats. To facilitate comparison with pre-532 vious papers, we report partial̃ 2 . In rats, body length had alrady been been 533 adjusted for covariates,̃ 2 is equivalent to 2 . We verified that using Spearman 534 correlation did not change the substance of the results (data not shown). We refer to heritability (ℎ 2 , cis-heritability within 1Mb) as the proportion of variance explained (PVE). Across all brain tissues tested, heritability estimates were significantly correlated ( = [0.58 − 0.83], < 2.20 × 10 −16 ). Figure S2. Heritability of gene expression was correlated between rats and humans. We found a significant correlation ( = 0.07, = 4.34 × 10 −12 ) between heritability estimates in rats and humans. Confidence intervals are represented as gray bars. The gray line represents the null distribution. Figure S3. Prediction was greater in rat tissues than that in human GTEx tissues.
The number of predicted genes across all five rat tissues was greater than those in GTEx human tissues with similar sample size. To ensure fair comparison, we included the same subset of genes that were orthologous across all tested tissues. Nucleus Accumbens Core (NAcc) Infralimbic Cortex (IL) Lateral Habenula (LHb) Prelimibic Cortex (PL) Orbitofrontal Cortex (OFC) Figure S4. Tissue analysis revealed substantial enrichment in multiple relevant tissues, including heart, pancreas, muscle, liver, and central nervous system. Significantly enriched sets ( < 0.05) are highlighted in red.

Figure S5. Correlation between observed body length vs height PTRS.
Correlation between human-derived height PTRS and observed body length in rats for the 37 regularization parameters used in building the PTRS. Strikingly, human-derived height PTRS significantly predicted body length in rats; that is, the correlation between PTRS and observed rat body length was significant for all the elastic net regularization parameters that included at least 27 genes (maximum = 0.08, = 8.57 × 10 −6 ). (a) Distribution of correlation between weight-permuted height PTRS and observed body length in rats. All 37 model weights were permuted and the best performing model for each simulation was selected. Within each of the 1000 simulations, the permutation of weights across genes were consistent for all 37 models, mimicking the set of actual PTRS weights.
(b) Distribution of correlation between sign-flipped height PTRS and observed body length in rats. All 37 model weights were permuted and the best performing model for each simulation was selected. Within each of the 1,000 simulations, the permutation of weights across genes were consistent for all 37 models, mimicking the set of actual PTRS weights. Figure S7. Human derived PTRS weights did not predict observed fasting glucose levels in rats. Human-dervied height PTRS in rats was not correlated with observed fasting glucose levels in the target rat set ( = 0.008, = 7.09 × 10 −1 ), which served as a negative control.