Identify Non-Mutational p53 Functional Deficiency in Human Cancers

An accurate assessment of TP53's functional status is critical for cancer genomic medicine. However, identifying tumors with non-mutational p53 inactivation, which is not detectable through DNA sequencing, remains a significant challenge. These undetected cases are often misclassified as p53-normal, leading to inaccurate prognosis and downstream association analyses. To address this issue, we built support vector machine (SVM) models to systematically reassess p53's functional status in TP53 wild-type (TP53WT) tumors from multiple TCGA cohorts. Cross-validation demonstrated the excellent performance of the SVM models, with a mean AUC of 0.9822, precision of 0.9747, and recall of 0.9784. Our study reveals that a large proportion (87-99%) of TP53WT tumors actually have compromised p53 function. Additional analyses showed that these genetically intact but functionally impaired tumors (termed predictively reduced function of p53, or TP53WT-pRF) exhibit genomic and pathophysiologic features akin to p53 mutant tumors: heightened genomic instability and elevated levels of hypoxia. Clinically, patients with TP53WT-pRF tumors experience significantly shorter overall survival or progression-free survival than those with TP53WT-pN (predictive normal function of p53) tumors, and these patients also display increased sensitivity to platinum-based chemotherapy and radiation therapy.

Tumor suppressor p53, encoded by the TP53 gene, is a transcription factor that plays critical roles in preventing tumorigenesis and tumor progression by inducing cell cycle arrest, cell death (apoptosis, ferroptosis, necrosis, and autophagy), senescence, and DNA damage repair (Liu & Gu, 2022; Mantovani, Collavin, & Del Sal, 2019; Vousden & Lane, 2007). TP53 is one of the most extensively studied genes, and its transcriptional targets have been well characterized, with many having been experimentally validated (Fischer, 2017). Meanwhile, it stands out as the most frequently mutated tumor suppressor gene, undergoing genetic inactivation in 20 different types of cancer (Jia & Zhao, 2019). Somatic mutation frequencies of TP53 exceed 50% in ovarian, esophageal, pancreatic, lung, colorectal, uterine, head and neck, oral (gingivobuccal), soft tissue (leiomyosarcoma), gastric, and biliary tract cancers (ICGC: https://dcc.icgc.org/genes/ENSG00000141510/mutations). The functional activities of p53 have been frequently associated with the response to chemo- and radiation therapy [...]

However, it is crucial to acknowledge that genetic alteration is not the sole mechanism responsible for disrupting or activating protein function. There is a growing recognition that p53 function can be [...]. These molecular mechanisms are widespread, but their heterogeneous nature presents a formidable challenge in characterizing them. Thus, an assessment that goes beyond TP53 mutation is necessary for a more comprehensive understanding of p53's functional status in tumors.

In this study, our objective is to re-assess the functional status of p53 in tumors with wild-type TP53. We hypothesized that both mutational and non-mutational inactivation of p53 could be reflected by the altered expression of genes regulated by p53.
Toward this end, we first defined cancer type-specific gene sets (encompassing both direct and indirect targets of p53), which can serve as indicators of p53's functional status. This was accomplished through a systematic literature review and differential expression analysis (DEA) of RNA-seq data. We then calculated the composite expression score (CES) derived from these genes using various algorithms, including GSVA (gene set variation analysis) [...] trained SVM models using CESs from non-cancerous normal tissues (referred to as the "NT" group, assumed to have normal p53 function) and tumor samples harboring TP53 truncating mutations (referred to as the "TM" group, assumed to have lost or reduced p53's tumor suppressor function). These SVM models were applied to tumors with wild-type TP53 (referred to as the "WT" group) and tumors with missense mutations (referred to as the "MM" group).

We evaluated the performance of our SVM models using a testing dataset and compared genome and chromosome instability measurements to examine the prediction results. As anticipated, nearly all samples with missense mutations were predicted to have reduced function (pRF). Interestingly, a significant proportion of TP53WT tumors were also predicted to have reduced p53 function (termed TP53WT-pRF). This suggests a functional deficiency in p53 despite the absence of genetic mutations. Further analyses revealed that TP53WT-pRF tumors exhibit distinct genomic characteristics compared to other TP53WT tumors predicted to be normal (termed TP53WT-pN). TP53WT-pRF tumors displayed significantly increased tumor mutation burden (TMB), copy number variation burden, aneuploidy score, and hypoxia score, consistent with p53's role as a regulator of DNA damage repair and the "guardian of the genome".
Based on our analyses, around 22-25% of TP53WT-pRF cases can be attributed to either false-negative results (where TP53 mutations are identified through RNA-seq data but go undetected in the WES assay) or amplification of MDM2 and MDM4.

Clinically, TP53WT-pRF patients show significantly reduced overall and disease-free survival rates compared to TP53WT-pN patients. Additionally, our data demonstrate that TP53WT-pRF tumors exhibit significantly heightened sensitivity to radio- and chemotherapy, offering valuable insights for personalized medicine. The increased sensitivity of TP53WT-pRF tumors to radiation therapy is further supported by patient-derived xenograft (PDX) models of glioblastoma.

Identification of cancer-specific gene sets reflecting p53's functional status

We identified a set of p53-regulated genes whose expression levels are statistically associated with p53 truncation for each cancer type, enabling their use as features for predicting the functional status of p53 in these cancers. First, we compiled a list of 147 highly reliable target genes, comprising both direct and indirect targets, from a comprehensive study conducted by Fischer (Fischer, 2017). Then, we examined the expression profiles of these 147 genes using RNA-seq data from the GTEx datasets, while also assessing p53 binding using the ReMap2020 ChIP-seq database (Cheneby et al., 2020) (Figure 1 a). These genes were found to be reliably expressed across various normal tissues, and their promoters displayed evidence of p53 binding (Figure 1-Source Data 1). To further validate these p53-regulated genes, we reanalyzed RNA-seq and ChIP-seq data obtained from MCF-7 breast cancer cells, which possess wild-type TP53, both before and after p53 activation. As expected, minimal or weak p53 binding was observed at the promoters of p53-regulated genes (such as ATF3 and BTG2) prior to p53 activation induced by gamma irradiation. However, upon p53 activation, significant p53 binding was observed, accompanied by consistent upregulation in the expression levels of these genes (Hafner et al., 2017) (Figure 1-figure supplement 1).

The transcriptional programs regulated by p53 vary by tissue ([...] expression analyses between NT and TM groups. This analysis was performed using TCGA RNA-seq data for the previously selected 147 genes in each cancer type (Figure 1 a). As a result, we identified a set of genes that were down-regulated (referred to as p53tru-DR genes) and a set of genes that were up-regulated (referred to as p53tru-UR genes) in the TM group for each specific cancer type.
It should be noted that p53tru-UR genes primarily represent genes whose expression is negatively regulated by normal p53, and the underlying mechanisms are likely indirect and not yet fully understood (Fischer, 2017). Finally, we identified 47, 55, 36, and 50 p53-regulated genes for BRCA, LUNG, ESCA, and COAD, respectively. Among them, 16 genes (ABCB1, ANLN, BIRC5, CCNB1, CCNB2, CDC20, CDC25C, CDK1, CKS2, ECT2, FAM13C, NEK2, PCNA, PLK1, PMAIP1, PRC1) were shared across all four cancer types, and it is likely that 14 of these genes are indirectly regulated by p53. Moreover, these 16 genes demonstrated significant enrichment in well-known p53-associated pathways, including cell cycle control and DNA damage response (Figure 1-figure supplement 2).

Composite expression of p53tru-DR and p53tru-UR genes

We analyzed the expression profiles of p53tru-DR and p53tru-UR genes across four distinct groups: NT, WT, MM, and TM. As anticipated, p53tru-DR and p53tru-UR genes exhibited opposite trends: the expression of p53tru-DR genes was significantly decreased in the TM group, whereas the expression of p53tru-UR genes was significantly increased in the TM group, consistent with the compromised p53 status in this group (Figure 1 b-d). Interestingly, similar expression patterns of these p53-regulated genes were observed between the MM and TM groups (Figure 1 b-d). This suggests impaired p53 function in the MM samples and indicates that missense/in-frame mutations and truncating mutations have a comparable impact on p53's cellular activity. This observation aligns with the fact that most missense mutations occur within the DNA binding domain, disrupting p53's ability to bind DNA and transactivate its downstream targets (Kato et al., 2003).
In the LUNG and BRCA cohorts, the WT group displayed an intermediate state between the NT and TM groups, suggesting that many TP53WT tumors have compromised p53 function despite their lack of genetic mutation (Figure 1 b-c). In the COAD cohort, the expression profile of the TP53WT tumors was almost indistinguishable from those of the mutant groups (MM + TM) (Figure 1 d). These findings collectively indicate a reduced tumor suppressor function of p53 in TP53WT samples.

Training and evaluating SVM model based on the expression of p53-regulated genes

Our hypothesis was that the inactivation of p53 could be predicted by examining the altered expression of its transcriptional targets, including both directly and indirectly regulated genes, regardless of the mutational status of the TP53 gene. To test this hypothesis, we calculated the composite expression score [...] CES not only provides a combined and stable measure of p53 activity but also reduces the dimensionality of the expression data and helps mitigate potential overfitting of the SVM model. Using the LUNG dataset as an example, we observed significant inverse correlations between CESs calculated from p53tru-DR and p53tru-UR genes. The Pearson's correlation coefficients were -0.996, -0.998, and -0.749 for GSVA, ssGSEA, and combined Z-score, respectively (Figure 2-figure supplement 1 a-c). These results indicated that both p53tru-DR and p53tru-UR genes could effectively reflect p53's functional status. In addition, we observed significant positive correlations (Pearson's correlation coefficients ranged from 0.874 to 0.996) among CESs calculated by different algorithms, indicating a high level of concordance among these algorithms (Figure 2-figure supplement 1 d-i). When comparing the CESs across the NT, WT, MM, and TM groups, we found that the CESs of the NT samples clustered tightly within a narrow range.
In contrast, CESs of the tumor samples (including WT, MM, and TM groups) showed a higher degree of dispersion, indicating increased heterogeneity in p53 activity among tumor samples (Figure 2-figure supplement 2). Consistent with the gene-level expression data (Figure 1 b), CESs of WT samples were intermediate between those of the NT and TM groups, while the CESs of the MM samples resembled those of the TM samples (Figure 2 a-j). These results further support the notion that the majority of WT samples exhibit compromised p53 function.

Next, we trained SVM models using seven CESs (GSVA, ssGSEA, and Z-score for p53tru-DR and p53tru-UR genes, respectively, and PCA for all genes) with the purpose of predicting p53's functional status in TP53WT and TP53MM tumors. To achieve the best performance, these models were independently trained and validated for four TCGA cancer types: LUNG, BRCA, COAD, and ESCA. To achieve an unbiased validation, we adopted the following strategy. Initially, we randomly assigned 75% of the NT and TM samples to the training set, while the remaining 25% were reserved as the testing set. Subsequently, within the training set, we identified the p53tru-UR and p53tru-DR genes and constructed the SVM models. The performance of these models was evaluated using the testing set, which notably excluded samples used for feature gene identification and SVM model construction. This validation process, termed hold-out validation, was iterated ten times, as depicted in Figure 1a. As exemplified by the LUNG cohort, the mean precision, recall, F1-[...] Figure 2-Source Data 2). However, it should be noted that building a pan-cancer SVM model is a compromise when individual cancer types lack sufficient samples to train their own SVM models.
Furthermore, the pan-cancer SVM model may be influenced by a few cancer types with exceptionally large sample sizes, potentially introducing bias.

Predicting p53 status using SVM models

We examined the predictive power of our SVM models by applying them to TP53WT and TP53MM tumors, which were deliberately excluded from the training and testing phases. To avoid potential confusion caused by the term "p53 loss of function (LoF)", which typically refers to p53 dysfunction due to mutations, we designated samples predicted to be similar to TP53TM samples as "reduced [...]

To assess the prognostic value of the SVM prediction, we analyzed the LUNG cohort by dividing TP53WT tumors into two subgroups: TP53WT-pRF, representing samples predicted to have reduced [...] pN (n = 20) and TP53WT-pRF (n = 310) were 77.0% and 30.2%, respectively. As reference points, the 10-year OS rates for TP53MM (n = 419) and TP53TM (n = 254) were 24.3% and 16.7%, respectively (Figure 3 b). After adjusting for demographic variables such as sex, age, and smoking status using Cox regression, lung cancer patients with TP53WT-pRF tumors exhibited a significant 61% decrease in survival compared to those with TP53WT-pN tumors (P = 0.016). In contrast, the overall survival of patients with TP53WT-pRF tumors did not differ significantly from that of patients with TP53MM or TP53TM tumors. In the BRCA cohort, 84% (71 out of 85) of the TP53WT-pN tumors are classified as Luminal-A subtype, but we did not observe a significant difference in progression-free survival (PFS) of patients with these tumors. The remaining 15% (13 out of 85) of TP53WT-pN cases are normal-like breast tumors. Within this subgroup, the TP53WT-pRF patients exhibited a significantly worse prognosis than the TP53WT-pN group (P = 0.017, log-rank test) (Figure 3-figure supplement 2 f).
Normal-like tumors were generally considered "artifacts" resulting from a high percentage of normal specimens or slow-growing basal-like tumors (Parker et al., 2009). However, our data suggest that PFS is significantly reduced when p53 function is compromised, supporting the classification of normal-like tumors as a distinct subtype rather than mere normal tissue contamination. In summary, in both lung and breast (normal-like) cancers, TP53WT-pRF tumors exhibit a significantly worse prognosis than TP53WT-pN tumors.

Considering the crucial role of p53 in DNA damage repair (Lane, 1992; Levine, 1997), we hypothesized that p53-defective tumors would accumulate more DNA damage than p53-normal tumors. To investigate this hypothesis, we compared measurements of genome and chromosome instability, including TMB, copy number variation burden, and aneuploidy score, between the TP53WT-pRF and TP53WT-pN groups. The analysis revealed significantly higher levels of [...] exhibited increased genomic instability, worse prognosis, and higher Buffa hypoxia score, resembling the p53 mutant tumors. These data not only functionally reaffirm our SVM predictions but also highlight the necessity to reconsider the determination of TP53 status based solely on DNA sequencing.

Increased sensitivity of TP53WT-pRF tumors to chemo- and radiation therapy

The impact of mutant p53 on the response to chemo- and radiation therapy remains controversial. Several studies have associated p53 mutations with reduced sensitivity to these therapies (Gurtner et [...]

When comparing chemo-sensitivity across groups with different p53 statuses in the LUNG cohort, we found that tumors in the NT group exhibited the highest RPS, while tumors in the TP53MM and TP53TM groups displayed the lowest RPS.
As expected, the RPS scores of TP53WT-pN tumors were close to those of the NT group and were significantly higher than those of the TP53WT-pRF tumors (P = 2.8×10^-9, two-sided Wilcoxon test). Similar results were observed in the BRCA cohort. Notably, four BRCA TP53MM tumors that were predicted to be p53 normal also had significantly higher RPS scores than the remaining TP53MM samples (P = 0.029, two-sided Wilcoxon test) (Figure 4 a-b). The lower RPS score was linked to increased mutagenesis, adverse clinical features, and inferior patient survival rates, but [...]

We assessed radiosensitivity by dividing the RSS gene signature into positive (i.e., genes positively correlated with radiosensitivity) and negative (i.e., genes negatively correlated with radiosensitivity) subsets. Our analysis focused on the TCGA BRCA cohort since the RSS signature was derived from breast cancer. When comparing TP53WT-pN tumors with TP53WT-pRF tumors using positive genes, we found a significant increase in RSS scores for TP53WT-pRF tumors (Figure 4 c), indicating heightened radiosensitivity. Meanwhile, when measured using negative genes, TP53WT-pRF tumors showed significantly decreased RSS scores, also suggesting increased radiosensitivity (Figure 4 d). On the other hand, TP53MM-pN tumors displayed significantly reduced radiosensitivity compared to TP53MM-pRF tumors (Figure 4 c-d).

Although there is an overlap of one gene (RAD51) from the RPS signature and four genes from the RSS signature with the p53tru-DR and p53tru-UR genes for LUNG or BRCA (Figure 4-figure supplement 1 a), we obtained consistent results even after excluding these overlapping genes (Figure 4-figure supplement 1 b-e). These findings collectively indicate that tumors predicted to have reduced p53 function, similar to p53 mutant tumors, show increased sensitivity to platinum-based chemotherapy and radiation therapy. This could be attributed to compromised DNA damage repair due to reduced p53 function, making tumor cells more susceptible to the effects of chemo- and radiation therapies.

We further confirmed this finding in vivo using 35 PDX models. We first defined 17 p53tru-DR and 19 p53tru-UR genes from the TCGA GBM cohort (Figure 1-Source Data 1). Due to limited sample [...] Data 2). To determine the RT responsiveness, we calculated the ratio of median survival days between the RT group and the placebo group, and PDX models with ratios >= 1.52 were considered as RT-[...] Out of 20 PDXs that were predicted to be RF, 15 responded to RT. In contrast, among the 15 PDXs predicted to be p53 normal, only 4 were responders. These results indicate a significant over-representation of RT responders within the pRF group (Fisher's exact test, P = 0.0068) (Figure 4 f and g). Although TP53 somatic mutations (determined from WES) were enriched in the pRF group (Figure 4 f), the association between TP53 genetic mutation and RT response did not reach statistical significance (Fisher's exact test, P = 0.13). In summary, these findings suggest that our in silico predictions effectively uncovered a significant association between p53 status and RT response, which would have been overlooked if solely examining the TP53 genetic status.
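The PDX comparison above can be checked with a short calculation. The following is a minimal sketch, not the authors' code: it applies the stated responder cut-off (ratio of median survival days >= 1.52) and then runs a two-sided Fisher's exact test on the 2x2 table implied by the text (15 of 20 pRF models responded; 4 of 15 predicted-normal models responded). The function names are hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption about how the reported P value was obtained.

```python
from math import comb


def is_rt_responder(median_days_rt, median_days_placebo, cutoff=1.52):
    """Hypothetical helper: responder if the RT/placebo median-survival ratio meets the cut-off."""
    return median_days_rt / median_days_placebo >= cutoff


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is at most as probable as the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Rows: predicted RF, predicted normal; columns: responder, non-responder
p_value = fisher_exact_two_sided(15, 5, 4, 11)
print(f"Fisher's exact test, two-sided P = {p_value:.4f}")
```

With these counts the sketch gives a P value that rounds to 0.0068, in line with the value reported in the text.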
Dissecting predicted TP53WT-pRF

Next, we sought to explore the potential factors and mechanisms that could account for the majority of TP53WT tumors being predicted as RF by the SVM model. We investigated RNA and protein expression, re-evaluated all the TP53 missense mutations (reported from WES) using RNA-seq data, and examined the alteration status of the p53 upstream regulators MDM2 and MDM4. We did not detect significant changes in p53 protein abundance between TP53WT-pN and TP53WT-pRF tumors. [...] that the impaired p53 function observed in the TP53WT-pRF group is unlikely to be attributed to reduced TP53 mRNA and protein abundances.

We then reassessed all the TP53 missense mutations, such as R249 and R273, in the TP53WT-pRF [...] Notably, these two mutations were also detectable in the WES data, albeit with much lower numbers of supporting reads (2 and 5 reads, respectively). This discrepancy explains why they were not identified by the TCGA somatic variant caller (Figure 5-figure supplement 2). The substantial increase in MAFs observed in the RNA-seq data is probably due to the preferential expression of the mutant alleles. Interestingly, when compared with the amino acid locations of missense mutations reported from WES data, the mutations identified through RNA-seq data were significantly enriched (P = 1.05×10^-6, two-sided Fisher exact test) at the p53 R249 position in both lung and breast cancers (Figure 5 a and b). A missense mutation at the R249 position is generally recognized as a structural mutation that destabilizes the p53 protein by altering its 3D structure (Joerger & Fersht, 2016). Furthermore, the mutant tumors rescued from the RNA-seq data exhibited similar genomic characteristics (tumor mutation burden, copy number [...] (Figure 5 d).

In this study, we employed a novel approach to measure p53 activity by defining p53-regulated genes and calculating their CESs as a surrogate. By training and cross-validating SVM models using CES data from the NT (p53-normal) and TM (p53-RF) groups, we demonstrated the accuracy and effectiveness of our in silico approach. Our comprehensive analysis revealed the widespread occurrence of non-mutational p53 inactivation in human malignancies. Moreover, our analyses showed that the predicted TP53WT-pRF tumors exhibited a level of genomic instability comparable to that of tumors harboring genetic TP53 mutations. This included a significant increase in the number of mutations, copy number alterations, and aneuploidy. Importantly, patients with TP53WT-pRF tumors showed considerably worse overall survival rates than those with TP53WT-pN tumors, highlighting the prognostic value of our prediction. Furthermore, when evaluated using clinically [...] (Figure 1-figure supplement 2 b). Second, we employed SVM models trained on normal tissues (NT samples) and tumor samples with truncating TP53 mutations (TM samples) that result in a shortened protein. Our hypothesis posits that p53's tumor suppressor function remains normal in NT samples, while it is compromised or significantly reduced in TM samples. However, it is essential to recognize that our training dataset is not without imperfections, given that the p53 status of these samples is inferred rather than definitively determined. To be more precise, the predicted outcomes should be termed NT-like or TM-like. Looking ahead, datasets that encompass paired RNA- and DNA-sequencing information, coupled with clearly defined p53 statuses, are needed to overcome the existing limitations and gain more conclusive insights. Meanwhile, TP53 missense mutations that were predicted to be functionally reduced (TP53MM-pRF) should be interpreted as the "reduced tumor suppressor function of p53".
Nevertheless, we could not rule out the possibility that the mutant p53 protein in the TP53MM-pRF samples has gained oncogenic functions. Additionally, although p53tru-UR genes are negatively regulated by p53 through indirect and unclear mechanisms, we retained these genes for downstream analyses since they can effectively reflect p53's functional status as p53tru-DR genes do. Our strategy allows us to identify tumors with compromised p53 function resulting from multiple mechanisms, and it is plausible that some cases predicted to have reduced p53 function may genuinely represent a loss of function (especially those in the MM group), while others may indicate dysregulation.

According to our analyses, approximately 22-25% of TP53WT tumors predicted as RF (TP53WT-pRF) can be attributed to false-negative results (i.e., TP53 mutations that failed to be detected by the WES assay) or amplification of MDM2 and MDM4. However, the underlying mechanisms for the remaining 75-78% of TP53WT-pRF are unknown and warrant further investigation. To gain more insight, we analyzed the DNA methylation data generated from the Infinium HumanMethylation450 BeadChip array. However, we did not observe significant differences in the DNA methylation patterns among all nine CpG sites within the TP53 gene (cg02087342, cg06365412, cg10792831, cg12041075, cg12041429, cg13468400, cg16397722, cg18198734, cg22949073) when comparing TP53WT-pRF and TP53WT-pN tumors. This suggests that p53 function may be compromised through other non-[...]. Unfortunately, due to the unavailability of matched data, we were unable to perform specific analyses in these areas.

In our analysis, we found no significant differences in tumor purities (measured by IHC (Aran, Sirota, [...]

Data collection

We downloaded somatic mutation data for TP53 from the LUNG (LUSC + LUAD), BRCA, COAD, ESCA, BLCA, HNSC, LIHC, STAD, and UCES datasets from cBioPortal (https://www.cbioportal.org/). We also obtained from cBioPortal the pre-calculated meta scores, including the fraction of genome altered (FGA), mutation count, aneuploidy score, and Buffa hypoxia score. To re-evaluate the p53 mutation status, TCGA WES and RNA-seq BAM files for BRCA and LUNG were downloaded from the GDC data portal (https://portal.gdc.cancer.gov/). Furthermore, we acquired TCGA level-3 RNA-seq data, demographic information, and survival data for selected cancers from the University of California Santa Cruz's Xena web server (https://xena.ucsc.edu/). To identify differentially expressed genes between the NT and TM groups, we used pre-calculated raw RNA-seq read counts. Log2-transformed FPKM (i.e., Fragments Per Kilobase of exon per Million mapped fragments) was used to calculate CESs, and TPM (i.e., Transcripts Per Million) was used to calculate GSVA scores to evaluate chemo- and radiotherapy sensitivities. ChIP-seq data (p53 binding peaks) were downloaded from the ReMap 2020 database (Cheneby et al., 2020) (https://remap.univ-amu.fr/). Gene expression data (TPM) of normal tissues were downloaded from the GTEx (release V8) data portal (https://gtexportal.org/home/).

Identification of p53-regulated genes across different cancer types

To identify p53-regulated genes that can reflect p53's functional status in each cancer type, we employed a stepwise approach. First, we compiled a set of 147 genes that have been experimentally validated as being regulated by p53. This set includes the 116 genes identified as directly activated targets of p53 (refer to Table 1 in (Fischer, 2017)) and 31 genes that are primarily repressed by p53 with evidence of indirect regulation (refer to Table 2 in (Fischer, 2017)).
We then analyzed the publicly available ChIP-seq data from ReMap2020 (Cheneby et al., 2020) to identify p53 peaks within the gene body or the promoter region, which is defined as +/-3 kb around the transcription start site (TSS). The basal expression level of p53-regulated genes in normal tissues was evaluated using the RNA-seq data of GTEx (Consortium, 2015). Specifically, we first calculated the median TPM across all samples in each tissue type and then calculated the mean of the median TPMs across all tissue types. If the mean of median TPMs of a p53-regulated gene was > 1 TPM, the gene was considered to be reliably expressed in GTEx normal tissues and thus was included in our downstream analyses.

Since p53 might regulate a specific set of genes in different cancer types (Fischer, 2017), and to determine genes whose expression could reflect p53's functional status in each cancer type, we performed differential expression analysis for the 147 genes in each cancer by comparing the NT group (considered to be p53 functionally normal) to the TM group (considered to be p53 functionally impaired). We employed the R package DESeq2 to perform differential analyses on the raw RNA-seq read counts (Love, Huber, & Anders, 2014). The significance threshold was set at an adjusted p-value ≤ 0.05 and |fold change| ≥ 2. By applying these criteria, we were able to identify p53-regulated genes specific to each TCGA cancer type. We defined "p53tru-DR" genes as those showing downregulation in p53-truncated samples compared to p53-normal samples. Conversely, "p53tru-UR" genes referred to genes showing upregulation in p53-truncated samples.

It should be noted that this approach aims to identify genes whose expression alterations are significantly associated with p53 functional status, encompassing both direct and indirect regulation.
Interestingly, we found that the indirectly regulated targets display a higher sensitivity to p53 truncation, with 25 of the 31 indirect targets demonstrating significant differential expression upon p53 truncation in at least one cancer type. In comparison, 71 of the 116 direct targets exhibit significant differential expression (Figure 1-Source Data 1). Thus, from a computational point of view, we incorporated both direct and indirect targets to construct cancer-type-specific SVM models.

The CES is a single enrichment score that provides an intuitive and stable measurement of p53 activity in each cancer. We employed four algorithms to calculate the CES, namely: (1) Gene Set Variation [...] learn.org/stable/) was used to perform PCA analyses.
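To make the feature-construction steps described in this section concrete, the following is a minimal sketch in plain Python, not the authors' pipeline (which used DESeq2 and GSVA in R). It illustrates the mean-of-medians expression filter (> 1 TPM), the labelling of p53tru-UR and p53tru-DR genes from differential expression results, and a combined Z-score as one possible CES. Several points are assumptions: the threshold "|fold change| ≥ 2" is read as a linear fold change of at least 2 in either direction (TM relative to NT), the combined Z-score is taken as the sum of per-gene z-scores divided by the square root of the number of genes, and all function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, median, pstdev


def reliably_expressed(tpm_by_tissue, min_tpm=1.0):
    """Mean-of-medians filter for one gene; tpm_by_tissue maps tissue -> TPM values."""
    per_tissue_median = [median(values) for values in tpm_by_tissue.values()]
    return mean(per_tissue_median) > min_tpm


def label_gene(fold_change, padj, fc_cut=2.0, padj_cut=0.05):
    """Label a gene from TM-vs-NT results (linear fold change, adjusted p-value)."""
    if padj > padj_cut:
        return None
    if fold_change >= fc_cut:
        return "p53tru-UR"  # up-regulated in truncating-mutation tumors
    if fold_change <= 1.0 / fc_cut:
        return "p53tru-DR"  # down-regulated in truncating-mutation tumors
    return None


def combined_zscore(expr, genes):
    """One score per sample: sum of per-gene z-scores / sqrt(number of genes used)."""
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    used = 0
    for gene in genes:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no information
        used += 1
        for i, x in enumerate(values):
            totals[i] += (x - mu) / sd
    if used == 0:
        return totals
    return [t / sqrt(used) for t in totals]


# Toy values for illustration only
print(reliably_expressed({"liver": [0.5, 2.0, 3.0], "lung": [1.0, 1.5, 4.0]}))
print(label_gene(0.3, 0.01), label_gene(4.0, 0.001), label_gene(1.5, 0.01))
print(combined_zscore({"G1": [1.0, 2.0, 3.0], "G2": [2.0, 4.0, 6.0]}, ["G1", "G2"]))
```

In this sketch a score would be computed separately for the p53tru-DR and p53tru-UR gene sets, so that each sample is summarized by a small number of values rather than by dozens of individual genes.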

Training and evaluating the performance of SVM models

Since the prediction of p53 status is a binary classification problem, we chose to use the SVM, a widely used supervised machine-learning method. SVM models with a linear kernel were trained using the CESs of the NT (normal tissue, coded as "0") and TM (truncating mutations, coded as "1") groups.
GridSearch was employed to pick the proper C and gamma parameters for the SVM model. We used TCGA data for both training and testing; no separate or external validation set was used. The number [...]

To better evaluate the performance of the SVM model of individual cancer types (e.g., the LUNG SVM model), our approach encompassed several steps. First, we employed the "train_test_split" function from the "sklearn.model_selection" module to partition the samples, comprising both NT and TM samples, into a training set (75% of the data) and a separate testing set (the remaining 25%). Subsequently, we performed the differential analyses for feature gene selection and trained the SVM models with samples exclusively from the training set. Finally, we used the testing set to evaluate the performance. We repeated this process ten times to mitigate the potential variability in the outcomes of cross-validation. In each iteration, a confusion matrix was made, and performance measurements (i.e., sensitivity/recall, precision, accuracy) were calculated. The Pan-cancer model was [...] Scikit-learn (www.scikit-learn.org) Python package for SVM modeling, ten-time hold-out validation, and five-fold cross validation. We used pandas (https://pandas.pydata.org/), numpy (https://numpy.org/), and scipy (https://www.scipy.org/) for data processing and numerical analyses.

[...] Python script. Samples with fewer than 10 mapped reads at a given genome site were excluded (n = 41), and a cut-off of minor allele frequency = 0.1 was employed to determine the genotype. A sample was considered TP53 mutated if it had at least five high-quality reads (Phred-scaled sequencing quality and mapping quality > 30) supporting the mutant allele. 32 and 57 TP53WT samples met this criterion in LUNG and BRCA samples, respectively. The Integrative Genomics Viewer (IGV) was used to inspect the variants and visualize the alignments manually.
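The read-level criteria in the preceding paragraph can be written as a small decision rule. This is a minimal sketch under stated assumptions rather than the script used in the study: the function name and its inputs are hypothetical, the alternate-allele count is assumed to include only reads that already pass the quality filters (base quality and mapping quality > 30), and the minor allele frequency cut-off and the five-read minimum are assumed to apply jointly.

```python
def call_tp53_site(total_reads, alt_reads, min_depth=10, min_maf=0.1, min_alt_reads=5):
    """Classify one sample at one TP53 site from RNA-seq read counts.

    total_reads: mapped reads covering the site
    alt_reads:   high-quality reads supporting the mutant allele
    Returns "excluded", "mutant", or "wild-type".
    """
    if total_reads < min_depth:
        return "excluded"  # too little coverage to genotype the site
    maf = alt_reads / total_reads
    if alt_reads >= min_alt_reads and maf >= min_maf:
        return "mutant"
    return "wild-type"


# Toy read counts for illustration only
for total, alt in [(7, 4), (46, 6), (100, 5), (24, 4)]:
    print(total, alt, call_tp53_site(total, alt))
```

Requiring both a minimum allele fraction and a minimum number of supporting reads guards against calling a mutation from one or two stray reads at a shallowly covered site, which is why low-coverage samples are set aside rather than labelled wild-type.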

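The read-support thresholds given above for calling TP53 mutations from RNA-seq can be written as a small decision rule. The sketch below is an assumption about how the three thresholds combine (depth filter first, then allele fraction and high-quality read support together); the class `SiteCounts` and the function `call_tp53_site` are hypothetical names, not part of the authors' script.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    depth: int         # all reads mapped at the genome site
    alt_reads: int     # reads carrying the mutant allele
    hq_alt_reads: int  # mutant reads with base quality and mapping quality > 30


def call_tp53_site(site, min_depth=10, min_alt_fraction=0.1, min_hq_alt=5):
    """Classify one site as 'excluded', 'mutated', or 'wild-type'."""
    if site.depth < min_depth:
        return "excluded"  # too few mapped reads to genotype
    alt_fraction = site.alt_reads / site.depth
    # Assumption: both the allele-fraction cut-off and the read support must hold.
    if alt_fraction >= min_alt_fraction and site.hq_alt_reads >= min_hq_alt:
        return "mutated"
    return "wild-type"


calls = [
    call_tp53_site(s)
    for s in (
        SiteCounts(depth=8, alt_reads=4, hq_alt_reads=4),
        SiteCounts(depth=60, alt_reads=12, hq_alt_reads=9),
        SiteCounts(depth=100, alt_reads=3, hq_alt_reads=3),
    )
]
```

For these three illustrative sites, the first is excluded for low depth, the second is called mutated, and the third stays wild-type because the mutant allele fraction is below 0.1.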
Investigation of two independent treatment-related signatures
Tumor chemotherapy sensitivity (i.e., RPS score) was estimated based on 4 genes involved in DNA [...]

Tumor radiation sensitivity signature (RSS) scores in breast cancer were calculated based on the 51-gene panel reported by Speers et al. (Speers et al., 2015), and the genes were divided into "positive" and "negative" groups according to the correlation between their expression level and radiation sensitivity. The R package GSVA (Hanzelmann et al., 2013) was used to calculate the RPS and RSS scores. Notably, according to the study of Pitroda et al., the original RPS was defined as the sum of the expression levels times -1 after log2 transformation and Robust Multi-array Average (RMA) normalization. In our study, RPS was represented by the negative of the GSVA score calculated from TPM.

[...] TGen Strexome V2 capture kits. Raw sequencing reads were aligned to both human (hg38) and mouse (mm10) reference genomes to remove any reads potentially originating from mouse tissue. Then, filtered WES and RNA-seq data were analyzed using Mayo's GenomeGPS and MAP-RSeq workflows, respectively. The raw sequencing data were submitted to the NCBI Sequence Read Archive with accession numbers PRJNA543854 and PRJNA548556 for WES and RNA-seq, respectively.
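As a worked illustration of the original RPS definition attributed to Pitroda et al. in this section (the negated sum of log2 expression over the signature genes), a minimal sketch follows. The gene names are hypothetical placeholders, the expression values are invented, and normalization is assumed to have been applied already. This shows only the original definition, not the negated GSVA score used in this study.

```python
import math

# Hypothetical placeholders for the four signature genes (not the real panel).
PANEL = ("GENE_A", "GENE_B", "GENE_C", "GENE_D")


def rps_original(expr, genes=PANEL):
    """Negated sum of log2 expression over the signature genes."""
    return -sum(math.log2(expr[g]) for g in genes)


# Invented, already-normalized linear-scale expression values for two samples.
low_expr = {"GENE_A": 64.0, "GENE_B": 32.0, "GENE_C": 64.0, "GENE_D": 128.0}
high_expr = {"GENE_A": 256.0, "GENE_B": 64.0, "GENE_C": 128.0, "GENE_D": 512.0}

score_low = rps_original(low_expr)    # -(6 + 5 + 6 + 7) = -24.0
score_high = rps_original(high_expr)  # -(8 + 6 + 7 + 9) = -30.0
```

Because of the sign flip, higher expression of the panel genes gives a lower score.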

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
We downloaded somatic mutation data for TP53 from the LUNG (LUSC + LUAD), BRCA, COAD, ESCA, BLCA, HNSC, LIHC, STAD, and UCEC datasets on cBioPortal (https://www.cbioportal.org/) along with pre-calculated meta scores including the fraction of genome altered (FGA), mutation count, aneuploidy score, and Buffa hypoxia score. TCGA WES and RNA-seq BAM files for BRCA and LUNG, which were used to re-evaluate p53 mutation status, were downloaded from the GDC data portal (https://portal.gdc.cancer.gov/). TCGA level-3 RNA-seq expression data of selected cancers and the corresponding demographic and survival data were downloaded from the University of California Santa Cruz's Xena web server (https://xena.ucsc.edu/). Pre-calculated raw RNA-seq read counts were used to identify differentially expressed genes between the NT and TM groups, log2-transformed FPKM (i.e., Fragments Per Kilobase of exon per Million mapped fragments) was used to calculate CESs, and TPM (i.e., Transcripts Per Million) was used to calculate GSVA scores to evaluate chemo- and radiotherapy sensitivities. ChIP-seq data (p53 binding peaks) were downloaded from the ReMap 2020 database (https://remap.univ-amu.fr/). Gene expression data (TPM) of normal tissues were downloaded from the GTEx (release V8) data portal (https://gtexportal.org/home/). All of the results generated in this study are included as supplementary data sets (Source Data).

Python source code of our p53 status prediction method is available from https://github.com/liguowang/epage.

Competing interests
The authors declare no competing interests.

Acknowledgements
The results shown in this study are in part based upon data generated by the TCGA Research Network (https://www.cancer.gov/tcga). We thank TCGA's specimen donors and research groups that make genomic data publicly available.

[...] group and the placebo group. PDX models with ratios >= 1.52 were considered as RT-responsive. (f) A heatmap showing the p53 functional status predicted by unsupervised clustering using composite expression (GSVA, ssGSEA, and PC1) of GBM-specific p53tru-DR and p53tru-UR genes in PDX data. RT response and TP53 mutation status of each sample are color coded on the right side of the heatmap. (g) A contingency table shows that pRF (predicted reduced function) samples are significantly associated with RT responsiveness.

[...] between CESs calculated from p53tru-DR and p53tru-UR genes for GSVA, ssGSEA, and Z-score. Significant positive correlations between different CESs for p53tru-DR genes (d-f) and between those for p53tru-UR genes (g-[...]

The performance metrics of the SVM model in TCGA LUNG and BRCA cohorts.

List of lung and breast cancer samples that have TP53 missense mutations detected from RNA-seq.