Longevity and rejuvenation effects of cell reprogramming are decoupled from loss of somatic identity

Partial somatic cell reprogramming has been touted as a promising rejuvenation strategy. However, its association with mechanisms of aging and longevity at the molecular level remains unclear. We identified a robust transcriptomic signature of reprogramming in mouse and human cells that revealed co-regulation of genes associated with reprogramming and response to lifespan-extending interventions, including those related to DNA repair and inflammation. We found that age-related gene expression changes were reversed during reprogramming, as confirmed by transcriptomic aging clocks. The longevity and rejuvenation effects induced by reprogramming in the transcriptome were mainly independent of pluripotency gain. Decoupling of these processes allowed predicting interventions mimicking reprogramming-induced rejuvenation (RIR) without affecting somatic cell identity, including an anti-inflammatory compound osthol, ATG5 overexpression, and C6ORF223 knockout. Overall, we revealed specific molecular mechanisms associated with RIR at the gene expression level and developed tools for discovering interventions that support the rejuvenation effect of reprogramming without posing the risk of neoplasia.

Aging is associated with the buildup of molecular damage and a gradual loss of function, culmi-23 nating in chronic age-related diseases and ultimately death (1). Searching for safe and efficient 24 interventions that can slow down or partially reverse the aging process is a major challenge in 25 the aging field (2-6). In this regard, reprogramming of somatic cells into induced pluripotent 26 stem cells (iPSCs) has been proposed as a candidate longevity intervention due to its potential 27 to rejuvenate cells in a targeted way (7, 8). 28 Pluripotency can be achieved in vitro by the ectopic expression of four transcription factors: 29 nature, clustered together, pointing to the general similarity of this process across species. not significantly enriched in any functional terms, supporting high heterogeneity of RIR across 325 tissues. 326 To compare the rejuvenating trajectories of human and mouse cells subjected to OSKM treat-327 ment, we aggregated normalized tAge values across the datasets for each species and applied a 328 moving average smoothing approach (Fig. 3D). We observed a rapid decrease of transcriptomic 329 age of murine cells following the shape of exponential decay. On the other hand, rejuvena-330 tion of human cells followed a sigmoid curve. Although the transcriptomic age of cells from 331 both species was close to zero at the end of reprogramming, the RIR of human cells required 332 more time. Remarkably, this time difference was consistent with the duration of the repro-333 gramming process, lasting, on average, for 14 and 30 days for mouse and human cell lines, 334 respectively (36). 335 Reprogramming signature uncovers new geroprotective interventions 336 To identify treatments that induce reprogramming-associated rejuvenation at the gene expres-  These examples confirm that rejuvenation and loss of somatic identity associated with repro-step towards decomposing rejuvenation and pluripotency vectors that may lead to the safe and efficient reprogramming-induced rejuvenation. 496 Methods 497 Data collection 498 We collected publicly available cell reprogramming datasets containing more than three time 499 points across the reprogramming process (Table S1). ESC and iPSC states were excluded since 500 they were not corresponding to any particular time point of reprogramming. We used only 501 preprocessed data (e.g., read counts) provided in the GEO database by datasets' contributors. 502 Data preprocessing 503 To aggregate multiple datasets into a joint signature, we utilized an approach as in our earlier 504 work (27). It consists of several steps (Fig. 1A). First, each dataset was normalized using a 505 conventional normalization technique appropriate for the given data type. RLE normalization 506 followed by log transformation was applied for RNA-seq data. Log transformation of intensi-507 ties followed by scaling and quantile normalization was used for microarray data. Second, for 508 each gene changing its expression value with time, a linear regression model was constructed 509 using the limma package (68). Third, slope coefficients, their standard errors and related statis-  Deming regression was calculated 10 times with random initial sets of normalization coeffi-543 cients, and final coefficients were chosen from the run with the smallest cumulative regression 544 error. Among these 10 runs, the error minimum was the same for most runs, indicating that the 545 global minimum was achieved for each signature. 546 Then we used the rma.mv function in the metafor package (69) to construct intercept-only 547 multilevel mixed-effects model with nested random effects (71). As a response variable, we 548 used Deming-normalized slopes derived for each dataset. Since datasets originated from diverse 549 sources, we had to account for their heterogeneity across different experiments (i.e., different 550 GSE IDs) and within the same experiment (i.e., the same GSE ID), implying the multilevel 551 embedded structure of the model. Fixed effects were not considered within this model. The 552 final model can be described with the following formula: 553ŝ ij = µ + ζ (inGSE)ij + ζ (bwGSE)j + ϵ ij (1) whereŝ ij is an estimate of the true effect size s ij ; µ is an actual mean of the slopes' distribution; 554 term ij denotes that some effect size i is nested in cluster j; ζ (inGSE)ij is random term cor-555 responding to a within-GSE-ID heterogeneity; ζ (bwGSE)j is a random term corresponding to a 556 between-GSE-ID heterogeneity; ϵ ij is a sampling error of individual datasets, which can be es- To estimate the rejuvenation effect of a specific gene in a particular dataset, the following 594

Aggregated analysis of rejuvenation-inducing interventions based on CMAP
To identify treatments mimicking RIR at the gene expression level, we used CMAP query API 607 (51). As a query, genes upregulated in combined reprogramming signature but downregulated in 608 combined aging signature ('Up' subset), and genes downregulated in combined reprogramming 609 signature but upregulated in combined aging signature ('Down' subset) were used. We will 610 refer to these gene subsets as the RIR gene set.

611
The result of a CMAP query is essentially a list of perturbagens ordered by the score of as- scriptomic clocks to the gene expression profiles induced by these treatments as well as control samples. Specifically, we obtained quantile normalized data from the CMAP level 3 data pre-631 processing step. We downloaded treatment data and corresponding control data for a given 632 unique intervention-dosage-duration group indicator. After an additional data normalization 633 procedure (see "Prediction of transcriptomic age" section for details), we applied the mouse 634 and human transcriptomic clocks to the gene expression vectors. The obtained relative age 635 values were aggregated with a linear model of the following form: where Age is the relative tAge value, Cell Line is the name of corresponding cell line from 637 CMAP (factor variable), and T reatment is the binary variable, which indicates whether the 638 given relative tAge value is from the control or treatment subset. We fitted this model using  Among the identified interventions, we searched for those not inducing the expression of pluripotency-649 related genes. First, we obtained differential gene expression data from the CMAP level 5 data 650 preprocessing step for each of our top hits. Then, we performed gene set enrichment analysis 651 (GSEA) using fGSEA package (75) testing if the gene expression response induced by a cer-652 tain treatment is enriched for the set of pluripotency-associated genes obtained from (64). The calculated Normalized Enrichment Scores (NES) were then aggregated using simple averaging.

654
The statistical significance of enrichment across cell lines was assessed using t-test with the null 655 hypothesis that the mean of NES across cell types is equal to zero.     Table S1). Cells are colored based on Spearman's correlation coefficient. (C) Expression trajectory of top five upregulated and downregulated genes with the lowest BH-adjusted p-value according to the mouse reprogramming signature. Upregulated and downregulated genes are shown in red and blue, respectively. (D) Volcano plot of meta-slope values extracted from the signature and corresponding BH-adjusted p-values. Each dot represents a single gene. Pluripotency markers are highlighted in green. Significantly upregulated and downregulated genes are shown in red and blue, respectively. The horizontal dashed line represents the significance cut-off (BH adjusted p-value < 0.001), while vertical lines represent meta-slope cut-offs (|logFC| > 2.5).

Spearman correlation
(E) The overlap of significantly upregulated and downregulated genes between murine and human signatures. Only uniquely mapped orthologs according to Ensembl were considered for analysis. Numbers within cells demonstrate the observed numbers of overlapping orthologous genes, while color represents the difference between observed and expected number of genes in the corresponding cell. The p-value is calculated using Fisher's exact test.  thors.

888
Competing Interests The authors declare that they have no competing financial interests.

889
Data and materials availability: Additional data and materials are available online. Tables. S1 to S3 899