An Integrative Analysis of the Age-Associated Genomic, Transcriptomic and Epigenetic Landscape across Cancers

Age is the most important risk factor for cancer, as cancer incidence and mortality increase with age. However, how molecular alterations in tumours differ among patients of different ages remains largely unexplored. Here, using data from The Cancer Genome Atlas, we comprehensively characterised genomic, transcriptomic and epigenetic alterations in relation to patients' age across cancer types. We showed that tumours from older patients present an overall increase in genomic instability, somatic copy-number alterations (SCNAs) and somatic mutations. Age-associated SCNAs and mutations were identified in several cancer-driver genes across different cancer types. The largest age-related genomic differences were found in gliomas and endometrial cancer. We identified age-related global transcriptomic changes and demonstrated that these changes are, at least in part, controlled by age-associated DNA methylation changes. This study provides a comprehensive view of age-associated alterations in cancer and underscores age as an important factor to consider in cancer research and clinical practice.


Introduction
Age is the biggest risk factor for cancer, as cancer incidence and mortality rates increase exponentially with age in most cancer types 1. However, the relationship between ageing and the molecular determinants of cancer remains to be characterised. Cancer arises through the interplay between somatic mutation and selection, in a Darwinian-like process 2,3. Thus, apart from the accumulation of mutations with age 4-6, changes in the microenvironment during ageing could also play a role in carcinogenesis 2,7,8. We therefore hypothesise that, because the selective pressures exerted by the tissue environment change with age, tumours arising in patients of different ages might harbour different molecular landscapes; consequently, some molecular changes might be more or less common in older or younger patients.
Recently, several studies have investigated molecular differences in the cancer genome in relation to clinical factors, including gender 9,10 and race 11,12. These studies identified gender- and race-specific biomarkers and actionable target genes, and provided clues to the biology behind disparities in cancer incidence, aggressiveness and treatment outcome across patients from different backgrounds. Although the genomic alterations in childhood cancers and their differences from adult cancers have been systematically characterised 13,14, the age-related genomic landscape across adult cancers remains elusive. Age-associated molecular features have been reported in the cancer genome of several cancer types, for example, glioblastoma 15, prostate cancer 16 and breast cancer 17. However, these studies focused mainly on a single cancer type and on only some molecular data types.
Here, using data from The Cancer Genome Atlas (TCGA), we systematically investigated age-related differences in genomic instability, somatic copy-number alterations (SCNAs), somatic mutations, pathway alterations, gene expression, and DNA methylation across various cancer types. We show that, in general, genomic instability and mutation frequency increase with age. We identify several age-associated genomic alterations in cancers, particularly in low-grade glioma and endometrial carcinoma. Moreover, we demonstrate that age-related gene expression changes are controlled, at least in part, by age-related DNA methylation changes and that these changes are linked to numerous biological processes.

Association between age and genomic instability, loss of heterozygosity, and whole-genome duplication
To gain insight into the role of patient age in the somatic genetic profile of tumours, we evaluated associations between patient age and genomic features of tumours in TCGA data. Using multiple linear regression adjusting for gender, race, and cancer type, we found that genomic instability (GI) scores increase with age in pan-cancer data (adj. R-squared = 0.35, p-value = 5.98 × 10^-7) (Fig. 1a). We next applied simple linear regression to investigate the relationship between GI score and age within each cancer type; cancer types with a significant association (adj. p-value < 0.05) were further adjusted for clinical variables. We found a significant positive association between age and GI score in seven cancer types (adj. p-value < 0.05) (Fig. 1b, Supplementary Fig. 1a and Supplementary Table 2). The strongest positive associations were found in low-grade glioma, ovarian cancer, endometrial cancer, and sarcoma. This result indicates that the level of genomic instability increases with patient age in several cancer types.
Genomic loss of heterozygosity (LOH) refers to the irreversible loss of one parental allele, which causes allelic imbalance and primes the cell for a second defect in the remaining allele of the affected genes 18. To investigate whether there is an association between patient age and LOH, we quantified the percentage of the genome affected by LOH (percent genomic LOH). Using simple linear regression, we found a significant positive association between age and pan-cancer percent genomic LOH (p-value = 1.20 × 10^-21). However, this association was no longer significant in a multiple linear regression analysis adjusting for gender, race, and cancer type (adj. R-squared = 0.32, p-value = 0.289) (Fig. 1c), suggesting that the association may be cancer type-specific. We then performed linear regression between age and percent genomic LOH for each cancer type. Six cancer types showed a positive association between age and percent genomic LOH (adj. p-value < 0.05) […]
[…] simple linear regression to identify the association between age and overall SCNA scores. Cancer types that displayed a significant association were further adjusted for clinical variables. Consistent with the GI score results described above, the strongest positive associations between age and overall SCNA scores were found in low-grade glioma, ovarian and endometrial cancers. Other cancer types with a positive association between age and overall SCNA score were thyroid cancer and clear cell renal cell carcinoma (adj. p-value < 0.05). In contrast, lung adenocarcinoma was the only cancer type exhibiting a negative association between overall SCNA score and age (Fig. 2a, Supplementary Fig. 2a, and Supplementary Table 5). Because the different SCNA classes (focal- and chromosome/arm-level) may arise through different biological mechanisms 12,21, we analysed the association between age and focal-level and chromosome/arm-level SCNA scores separately.
Most cancer types that showed a significant relationship between age and overall SCNA score also showed an association between age and both chromosome/arm-level and focal-level SCNA scores (Fig. 2b-c, Supplementary Fig. 2b-c, and Supplementary Table 5). The only exception was cervical cancer, which showed a significant association between age and chromosome/arm-level SCNA score but not focal-level or overall SCNA scores.
We next identified the chromosomal arms that tend to be gained or lost more often with age in 25 cancer types with sufficient samples (at least 100 tumours, Table 1). We applied logistic regression to the significantly recurrent arm gains and losses identified by GISTIC2.0 for each cancer type. The significant associations between age and chromosomal arm gains and losses are shown in Fig. 2d and 2e, respectively (adj. p-value < 0.05) (Supplementary Fig. 3, Supplementary Table 6). Gains of chromosome 7p, 7q, 20p, and 20q significantly increased with age in several cancer types, including both types of glioma, low-grade glioma and glioblastoma. In contrast, gain of chromosome 10p decreased with age in gliomas (Fig. 2d and 2f). For arm losses, the occurrence of loss increased with age in 11 arms in endometrial cancer (Fig. 2e and 2g), consistent with higher genomic instability and LOH with age in this cancer type. Low-grade glioma and ovarian cancer, the two other cancer types with the strongest associations between age and SCNA scores, also exhibited significant increases or decreases in losses with age in multiple arms (Fig. 2e-f, Supplementary Fig. 3). We also observed that losses of chromosome 10p and 10q increased with age in gliomas. Recurrent loss of chromosome 10 together with gain of chromosome 7 is an important feature of IDH-wild-type (IDH-WT) gliomas 24.
IDH-WT gliomas are more common in older patients, whereas IDH-mutated gliomas are found predominantly in younger patients.
We further examined age-associated recurrent focal-level SCNAs. Using a similar logistic regression approach, we identified recurrent focal SCNAs associated with patient age in each cancer type. In total, we found 113 significant age-associated regions, comprising 67 gain regions across 10 cancer types and 46 loss regions across 9 cancer types (adj. p-value < 0.05) (Fig. 3a, Supplementary Table 7). In line with the arm-level results, the highest number of significant regions was found in endometrial cancer (23 gain and 25 loss regions), followed by ovarian cancer (13 gain and 2 loss regions) and low-grade glioma (9 gain and 5 loss regions) (Fig. 3b-c, Supplementary Fig. 4).
To further investigate the impact of these SCNAs, we assessed the Pearson correlation between SCNA level and gene expression in tumours with both types of data. In total, 81 genes from our list of known cancer driver genes (Supplementary Table 8) were located in at least one significant age-associated focal region in at least one cancer type and showed a significant correlation between SCNA level and gene expression (adj. p-value < 0.05) (Fig. 3d). For example, regions showing an increased frequency of gain with age in endometrial cancer included 1q22, which harbours RIT1 (OR = 1.0355, 95%CI = 1.0151-1.0571, adj. p-value = 0.0018) (Fig. 3c, e). RIT1, which encodes a Ras-related GTPase, has been reported to be highly amplified and associated with poor survival in endometrial cancer 25. An increase in RIT1 gain with age might therefore contribute to a poorer prognosis in older patients. Loss of 16p13.3 increased in older endometrial cancer patients (OR = 1.0335, 95%CI = 1.0048-1.0640, adj. p-value = 0.0328).
This region contains CREBBP, which encodes a p53 coactivator. Gain of 8q24.21, which harbours the oncogene MYC, decreased with patient age in low-grade glioma (OR = 0.9737, 95%CI = 0.9541-0.9927, adj. p-value = 0.0128) and ovarian cancer (OR = 0.9729, 95%CI = 0.9553-0.9904, adj. p-value = 0.0063) (Fig. 3d, e). In addition, in low-grade glioma, we found an increase in 9p21.3 loss with age (OR = 1.0332, 95%CI = 1.0174-1.0496, adj. p-value = 0.00017). This region contains the cell cycle regulator genes CDKN2A and CDKN2B (Fig. 3b, d, e). The full list of age-associated focal regions across cancer types and the correlations between SCNA status and gene expression are provided in Supplementary Table 7. Taken together, our analysis demonstrates associations between age and SCNA levels across cancer types. We also identified age-associated arms and focal- […]

Age-associated somatic mutations in cancer
The increase in mutational burden with age is well established 4-6. This age-related mutation accumulation is largely explained by a clock-like mutational process, the spontaneous deamination of 5-methylcytosine to thymine 5. As expected, we confirmed the correlation between age and mutation load (somatic non-silent SNVs and indels) in the pan-cancer cohort using multiple linear regression adjusting for gender, race, and cancer type (adj. R-squared = 0.53, p-value = 1.41 × 10^-37) (Supplementary Fig. 5a). In cancer-specific analyses, 18 cancer types exhibited a significant relationship between age and mutation load using linear regression […]
[…] (Fig. 4c). Therefore, the negative correlation between age and mutation load in endometrial cancer could be explained by the presence, in younger patients, of hypermutated tumours associated with MSI-H and POLE/POLD1 mutations. Previous studies of POLE and MSI-H subtypes in hypermutated endometrial tumours revealed that these subtypes are associated with a better prognosis than the copy-number high subtype 31-33. Together with our SCNA results, this suggests that younger patients with endometrial cancer (UCEC) are more likely to have POLE-mutant or MSI-H tumours with high mutation rates and better survival, whilst tumours from older patients are characterised by high SCNA levels and are generally associated with a worse prognosis. We extended the analysis of age and MSI-H status to other cancer types known to have a high prevalence of MSI-H tumours, including colon, rectal, and stomach cancers 26. Only in stomach cancer did we find an association between older age and the presence of MSI-H tumours (OR = 1.0392, 95%CI = 1.0091-1.0720, p-value = 0.01, Supplementary Fig. 6a). When we further examined the association between age and POLE and POLD1 mutations in cancers other than endometrial cancer, no significant association was observed (Supplementary Fig. 6b).
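Each odds ratio in this section comes from a logistic regression of a binary tumour feature (here, MSI-H status) on age. As a dependency-free sketch of how an OR per year of age and its Wald 95% CI arise, the Python below fits an unadjusted model, logit P(feature) = b0 + b1 × age, by Newton-Raphson on simulated data. It assumes age is measured in years; the function names are hypothetical, and models adjusted for gender, race, cancer type or other clinical variables would in practice more likely be fitted with a library such as statsmodels.

```python
import math
import random


def _score_and_information(b0, b1, ages, outcomes):
    """Log-likelihood gradient and 2x2 Fisher information for
    logit P(y = 1) = b0 + b1 * age."""
    g0 = g1 = 0.0
    i00 = i01 = i11 = 0.0
    for x, y in zip(ages, outcomes):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)


def fit_logistic_age(ages, outcomes, max_iter=50, tol=1e-10):
    """Unadjusted logistic regression on age.

    Returns (OR per unit of age, lower 95% CI, upper 95% CI) using a Wald interval.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = _score_and_information(b0, b1, ages, outcomes)
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: beta <- beta + I^-1 g, with the 2x2 inverse written out
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Var(b1) is the (1, 1) element of the inverse information at the estimate
    _, (i00, i01, i11) = _score_and_information(b0, b1, ages, outcomes)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return math.exp(b1), math.exp(b1 - 1.96 * se_b1), math.exp(b1 + 1.96 * se_b1)


# Simulated cohort (not TCGA data): log-odds of the feature rise by 0.05 per year
random.seed(1)
ages = [random.uniform(20, 90) for _ in range(5000)]
status = [int(random.random() < 1 / (1 + math.exp(-(-3 + 0.05 * a)))) for a in ages]
odds_ratio, ci_low, ci_high = fit_logistic_age(ages, status)
print(f"OR per year = {odds_ratio:.4f} (95% CI {ci_low:.4f}-{ci_high:.4f})")
```

With age in years, exp(b1) is the multiplicative change in the odds per additional year; for instance, the stomach cancer MSI-H estimate of 1.0392 corresponds to roughly 3.9% higher odds of an MSI-H tumour per year of age.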
Although the increase in mutation load with age in cancer is well studied 4,28, whether particular genes are preferentially mutated in older or younger patients across cancer types is largely unclear. To address this, we used logistic regression to identify genes that are more or less likely to be mutated with increasing age. To avoid potential bias caused by hypermutated tumours, we restricted the analysis to samples with < 1,000 non-silent mutations per exome (Table 1). We first investigated the association between age and pan-cancer gene-level mutations. Using multiple logistic regression correcting for gender, race, and cancer type, mutations in IDH1 (OR = 0.9619, 95%CI = 0.9510-0.9730, adj. p-value = 4.18 × 10^-10) and ATRX (OR = 0.9803, 95%CI = 0.9724-0.9881, adj. p-value = 9.85 × 10^-6) showed a negative association with age. In contrast, mutations in PIK3CA were more common in older individuals (OR = 1.0082, 95%CI = 1.0022-1.0143, adj. p-value = 4.18 × 10^-10) (Fig. 4d). We next identified genes whose mutations were associated with age in a cancer-specific manner in 24 cancer types with at least 100 samples (Table 1). Using logistic regression, we identified 35 age-associated gene mutations across 13 cancer types whose frequency increased or decreased with patient age (adj. p-value < […] 1.13 × 10^-12 and […], respectively). This observation is consistent with the fact that patients with IDH-mutant gliomas are younger than those with IDH-WT gliomas. Patients carrying IDH1 mutations generally have longer survival than those with IDH-WT tumours 34. Previous studies have also reported that IDH1 mutations often co-occur with ATRX and TP53 mutations, and that mutations in these three genes are more prevalent in gliomas without EGFR mutations 15,35. Indeed, we found that EGFR mutations were more common in older low-grade glioma patients (OR = 1.0865, 95%CI = 1.0525-1.1258, adj. p-value = 4.35 × 10^-7) (Fig. 4f).
Moreover, our SCNA analysis revealed an increase in EGFR gain with age in low-grade glioma but not in glioblastoma (Fig. 3d), suggesting that the age-associated genomic landscape differs between the two glioma types. Taken together with the SCNA results, gliomas from younger patients are associated with IDH1, ATRX, and TP53 mutations, lower SCNA levels, and longer survival. In contrast, gliomas from older patients are more likely to be IDH-WT, with EGFR mutations, chromosome 7 gain and chromosome 10 loss, CDKN2A deletion, and a worse prognosis.
As we identified numerous age-associated alterations in cancer driver genes at both the SCNA and somatic mutation levels, we asked whether age-associated patterns also exist in particular oncogenic signalling pathways. We used data from a previous TCGA study that comprehensively characterised 10 frequently altered signalling pathways in cancer 36. To keep this analysis comparable with our preceding analyses, we restricted it to the samples used in those analyses, yielding 8,055 samples across 33 cancer types (Table 1). Using logistic regression adjusting for gender, race and cancer type, we identified five of the 10 signalling pathways that showed a positive association with age (adj. p-value < 0.05), indicating that genes in these pathways are altered more frequently in older patients, concordant with the increase in overall mutations and SCNAs with age (Fig. 5a, Supplementary Table 11). The strongest associations were found for the cell cycle (OR = 1.0122, 95%CI = 1.0076-1.0168, adj. p-value = 1.40 × 10^-6) and Wnt signalling (OR = 1.0122, 95%CI = 1.0073-1.0172, adj. p-value = 6.39 × 10^-6) pathways. We next applied logistic regression to investigate cancer-specific associations between age and oncogenic signalling alterations in cancer types with at least 100 samples.
In total, we identified 28 significant associations across 15 cancer types (adj. p-value < 0.05) (Fig. 5b, Supplementary Table 11). Alterations in the Hippo and TP53 signalling pathways were significantly associated with age, either positively or negatively, in five cancer types. Consistent with the pan-cancer analysis, the cell cycle, Notch and Wnt signalling pathways each showed an increase in alterations with age in three cancer types. We found that alterations in the cell cycle pathway increased with age in low-grade glioma (OR = 1.0313, 95%CI = 1.0161-1.0467, adj. p-value = 0.00035). This was largely explained by the increase in CDKN2A and CDKN2B deletions with age, as well as by epigenetic silencing of CDKN2A in older patients (Fig. 5c). In contrast, TP53 pathway alterations were more pronounced in younger patients (OR = 0.9520, 95%CI = 0.9372-0.9670, adj. p-value = 2.63 × 10^-8), owing to mutations in the TP53 gene (Fig. 5c). In endometrial cancer, two pathways, Hippo (OR = 0.9681, 95%CI = 0.9459-0.9908, adj. p-value = 0.0126) and Wnt (OR = 0.9741, 95%CI = 0.9541-0.9946, adj. p-value = 0.0240), showed a negative association with age, which may be explained by the presence of hypermutated tumours in younger patients. Collectively, these results reveal age-related pathway alterations in several cancer types, highlighting differences in oncogenic pathways that might be important in cancer initiation and progression in an age-related manner.
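Because age enters these logistic models as a continuous variable, per-year ORs close to 1 can still imply sizeable differences across decades. Assuming age is measured in years, the OR for a k-year difference is OR^k, since the model is linear on the log-odds scale; the sketch below applies this to the low-grade glioma pathway estimates reported above. The helper name is hypothetical.

```python
import math


def rescale_odds_ratio(or_per_year: float, years: float) -> float:
    """Odds ratio for an age difference of `years`, given the OR per year.

    Logistic regression is linear on the log-odds scale, so
    log(OR_k) = k * log(OR_1), i.e. OR_k = OR_1 ** k. Confidence limits
    are rescaled the same way.
    """
    return math.exp(years * math.log(or_per_year))


# Per-year estimates (OR, lower 95% CI, upper 95% CI) for low-grade glioma, from the text above
estimates = {
    "cell cycle": (1.0313, 1.0161, 1.0467),
    "TP53": (0.9520, 0.9372, 0.9670),
}
per_decade = {
    pathway: tuple(rescale_odds_ratio(value, 10) for value in values)
    for pathway, values in estimates.items()
}
for pathway, (point, low, high) in per_decade.items():
    print(f"{pathway}: OR per decade = {point:.2f} (95% CI {low:.2f}-{high:.2f})")
```

For example, the TP53 pathway estimate corresponds to roughly 39% lower odds of alteration per additional decade of age (0.952^10 ≈ 0.61), while the cell cycle estimate corresponds to about 36% higher odds per decade (1.0313^10 ≈ 1.36).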

Age-associated gene expression and DNA methylation changes
Beyond the genomic differences with age, we investigated age-associated transcriptomic and epigenetic changes across cancers. We separately performed multiple linear regression analyses on gene expression and DNA methylation data for the 24 cancer types with at least 100 samples in both data types (Table 1). We noticed that, across all genes, the regression coefficient of age on gene expression was negatively correlated with the regression coefficient of age on methylation in all cancer types (Supplementary Fig. 8 […] < 0.05), the numbers of age-DEGs and age-DMGs were consistent for most cancer types (Fig. 6a). We next focused on the 10 cancer types with at least 150 age-DEGs and 150 age-DMGs: low-grade glioma, breast cancer, endometrial cancer, oesophageal cancer, papillary renal cell carcinoma, ovarian cancer, liver cancer, acute myeloid leukaemia, melanoma, and prostate cancer. We identified the overlap between age-DEGs and age-DMGs and found that most overlapping genes, ranging from 84% (37/44 genes) in ovarian cancer to 100% in acute myeloid leukaemia (57 genes) and prostate cancer (7 genes), showed inverse changes with age: increased methylation with decreased expression, or decreased methylation with increased expression (Fig. 6b-c, Supplementary Fig. 9, Supplementary Table 14). We further compared the correlation coefficient between methylation and expression across four groups of genes: (1) genes in the overlap between age-DMGs and age-DEGs (age-DMGs-DEGs), (2) age-DMGs only, (3) age-DEGs only, and (4) other genes. Age-DMGs-DEGs had the most negative correlation between DNA methylation and expression of all groups (Fig. 6d […]
We next performed Gene Set Enrichment Analysis (GSEA) to gain biological insight into the expression and methylation changes with age.
We identified various significantly enriched Gene Ontology (GO) terms across cancers (Fig. 6e, Supplementary Fig. 11, Supplementary […] which was altered more frequently in older breast cancer patients (Fig. 5b), showed a decrease in gene expression and an increase in methylation with age. Interestingly, in low-grade glioma, mitochondrial terms were enriched among genes more highly expressed in younger patients. Mitochondrial dysfunction is known to be important in glioma pathophysiology 37; thus, differing levels of mitochondrial aberration might contribute to the disparities in glioma aggressiveness between patients of different ages. We also identified numerous immune-related terms enriched across several cancer types, including oesophageal, papillary renal cell, liver, and prostate cancers (Supplementary Fig. 11, Supplementary Table 16). Previous studies have reported age-related changes in immune-related gene expression and immune cell abundance in cancers 38,39. In the present study, we have systematically characterised the transcriptome and methylome in relation to age across cancer types. Our results suggest that gene expression changes with age in cancer are controlled, at least in part, by DNA methylation. These changes reflect differences in biological pathways that might be important in tumour development.
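The overlap analysis of age-DEGs and age-DMGs described in this section can be sketched as follows. The sketch assumes that, for each gene, the regression coefficients of age on expression and on methylation are available and that the age-DEG and age-DMG sets have already been defined; the gene names, coefficients and function name are hypothetical.

```python
def inverse_concordance(expr_age_coef, meth_age_coef, age_degs, age_dmgs):
    """Count overlapping age-DEGs/age-DMGs whose age coefficients for
    expression and methylation have opposite signs.

    Returns (n_inverse, n_overlap, fraction_inverse).
    """
    overlap = set(age_degs) & set(age_dmgs)
    inverse = [g for g in overlap if expr_age_coef[g] * meth_age_coef[g] < 0]
    fraction = len(inverse) / len(overlap) if overlap else float("nan")
    return len(inverse), len(overlap), fraction


# Hypothetical age coefficients (change per year) for four genes
expr_coef = {"GENE_A": -0.020, "GENE_B": 0.030, "GENE_C": 0.010, "GENE_D": -0.050}
meth_coef = {"GENE_A": 0.004, "GENE_B": -0.002, "GENE_C": 0.003, "GENE_D": 0.001}
age_degs = {"GENE_A", "GENE_B", "GENE_C"}
age_dmgs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
result = inverse_concordance(expr_coef, meth_coef, age_degs, age_dmgs)
print(result)
```

Genes counted as inverse are those with increased methylation and decreased expression with age, or the reverse; their fraction of the overlap is the per-cancer quantity summarised above.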

Discussion
Although age is an important risk factor for cancer, how age affects the molecular landscape of cancer is not well understood. In this study, we provide a comprehensive overview of the age-associated molecular landscape in cancer, including genomic instability, LOH, WGD, SCNAs, somatic mutations, pathway alterations, gene expression, and DNA methylation. We confirmed the known increase in mutation load with age 4,5 and found increases in genomic instability, LOH and WGD with age in several cancer types. We identified several age-related pan-cancer and cancer-specific alterations. The largest age-related differences were evident in low-grade glioma and endometrial cancer.
Cancer develops through the accumulation of genetic and epigenetic alterations. Mutation accumulation with age is thought to be a cause of cancer, and a substantial proportion of mutations arise before cancer initiation 6. Age-associated mutation accumulation has been demonstrated in both cancer 4,5 and normal tissues 40-42, providing a better understanding of early events in carcinogenesis. Our results show that, in addition to mutations, SCNAs, LOH and WGD increase with age in several cancers, in particular low-grade glioma, endometrial and ovarian cancers. Recent evidence suggests that SCNA burden is a prognostic factor associated with both recurrence and death 43; thus, an increased SCNA level with age might contribute to poor prognosis in the elderly.
The negative association between age and mutations in IDH1 and ATRX in glioma reflects the difference in patient age at diagnosis between the IDH-mutant and IDH-WT subtypes. IDH-mutant tumours account for the majority of low-grade gliomas and have a favourable prognosis. IDH-WT low-grade gliomas, on the other hand, more closely resemble glioblastomas and have poorer survival.
In glioblastoma, although IDH-mutant tumours are a minority, they are also associated with younger age 44. The present study, together with others 34,45, therefore indicates that glioma has distinct age-associated subtypes. However, more research is needed to understand how age influences the evolution of glioma subtypes.
Our results highlighted substantial age-associated differences in the genome of […] copy-number high subtype is associated with poor survival. Therefore, endometrial cancer in younger patients is associated with POLE mutations, mismatch repair defects, high mutation load and better survival outcomes. In contrast, endometrial cancer in older patients is associated with extensive SCNAs and a worse prognosis. Importantly, beyond low-grade glioma and endometrial cancer, we demonstrate that other cancer types also present an age-associated genomic landscape in cancer driver genes and oncogenic signalling pathways. These results highlight the impact of age on the molecular profile of cancer.
These age-related differences in the molecular landscapes of various cancers raise the obvious question of what drives them. Accumulating evidence has underscored the importance of changes in the tissue environment with ageing in cancer initiation and […]
[…] Expression and methylation changes with age are linked to several biological processes, indicating that cancers from patients of different ages present different phenotypes. We also noticed that cancers of female reproductive organs, including breast, ovarian and endometrial cancers, are among those with the highest numbers of age-DEGs and age-DMGs. These cancers tend to have a higher mass-normalised cancer incidence, which may reflect evolutionary trade-offs involving selective pressures related to reproduction 49. Age-associated hormonal changes could also be responsible for these age-related expression differences in cancer 50.
A limitation of this analysis is that, although we included tumour purity in our linear model, it […]
[…] underlying mechanisms for age-associated genomic differences. Our study, however, has also characterised an age-related genomic profile in endometrial cancer. We have investigated cancer-specific associations between age and LOH, WGD and oncogenic signalling. Furthermore, we […]
[…] GI score was calculated as the percentage of the genome whose copy number does not match the baseline ploidy of the tumour: 2 for tumours without WGD and 4 for tumours that have undergone WGD. Simple linear regression was performed to identify the association between age and GI score. For pan-cancer analysis, multiple linear regression was used to adjust for gender, race, and cancer type. For cancer-specific analysis, multiple linear regression accounting for clinical variables was conducted on the cancer types that had a significant association between age and GI score in the simple linear regression analysis (adj. p-value < 0.05). The complete set of results is presented in Supplementary Table 2.
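The GI score definition in the methods above can be sketched as below. This is a minimal illustration, assuming each segment carries an integer total copy number (for example, major plus minor allele copy number from an allele-specific caller) and that the denominator is the total length of the profiled segments; the segment tuple layout and the function name `gi_score` are hypothetical, and the exact inputs behind the reported scores are not specified in this excerpt.

```python
from typing import Iterable, Tuple

# (chromosome, start, end, total copy number); coordinates are half-open [start, end)
Segment = Tuple[str, int, int, int]


def gi_score(segments: Iterable[Segment], wgd: bool) -> float:
    """Percent of the profiled genome whose total copy number differs from
    the baseline ploidy: 2 for non-WGD tumours, 4 for WGD tumours."""
    baseline = 4 if wgd else 2
    total_length = 0
    aberrant_length = 0
    for _chrom, start, end, total_cn in segments:
        length = end - start
        total_length += length
        if total_cn != baseline:
            aberrant_length += length
    if total_length == 0:
        raise ValueError("no segments supplied")
    return 100.0 * aberrant_length / total_length


segments = [("1", 0, 100, 2), ("1", 100, 150, 3), ("2", 0, 50, 1)]
print(gi_score(segments, wgd=False))  # 50.0: the 3-copy and 1-copy segments deviate from 2
print(gi_score(segments, wgd=True))   # 100.0: no segment matches the WGD baseline of 4
```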

Percentage genomic LOH quantification and analysis
To quantify percent genomic LOH for each tumour, we used allele-specific copy number profiles from ASCAT. X and Y chromosome regions were discarded from the analysis. LOH segments were defined as segments that harbour only one allele. Percent genomic LOH was defined as 100 × (total length of LOH segments) / (length of the genome).
Simple linear regression and multiple linear regression adjusting for gender, race, and cancer type were conducted to investigate the relationship between age and percent genomic LOH in the pan-cancer analysis. For cancer-specific analysis, simple linear regression was performed, followed by multiple linear regression accounting for clinical factors for cancers with a significant association in the simple linear regression analysis (adj. p-value < 0.05). The complete set of results is in Supplementary Table 3.

WGD analysis
WGD status for each tumour was determined from the fraction of the genome with LOH and tumour ploidy. To investigate the association between age and WGD across the pan-cancer dataset, we performed simple logistic regression and multiple logistic regression correcting for gender, race, and cancer type. For cancer-specific analysis, simple logistic regression was performed to assess the association between age and WGD in tumours from each cancer type. Cancer types with a significant association between age and WGD (adj. p-value < 0.05) were further analysed by multiple logistic regression accounting for clinical variables. The complete set of results is in Supplementary Table 4.
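A minimal sketch of the percent genomic LOH definition in this section, assuming ASCAT-style segments with major and minor allele copy numbers and taking "length of the genome" to mean the total length of the analysed autosomal segments (the text does not specify the denominator further). Treating homozygous deletions as non-LOH follows from "only one allele" but is an assumption; the function name is hypothetical.

```python
from typing import Iterable, Tuple

# (chromosome, start, end, major allele copy number, minor allele copy number)
AlleleSegment = Tuple[str, int, int, int, int]

SEX_CHROMOSOMES = {"X", "Y"}


def percent_genomic_loh(segments: Iterable[AlleleSegment]) -> float:
    """100 * (total length of LOH segments) / (total length of analysed segments).

    X and Y segments are skipped. A segment counts as LOH when exactly one
    allele is present (one allele-specific copy number is 0, the other > 0);
    homozygous deletions (0, 0) are therefore not counted as LOH.
    """
    total_length = 0
    loh_length = 0
    for chrom, start, end, n_major, n_minor in segments:
        name = chrom[3:] if chrom.lower().startswith("chr") else chrom
        if name.upper() in SEX_CHROMOSOMES:
            continue  # sex chromosomes are discarded
        length = end - start
        total_length += length
        if (n_major > 0) != (n_minor > 0):
            loh_length += length
    if total_length == 0:
        raise ValueError("no autosomal segments supplied")
    return 100.0 * loh_length / total_length


segments = [
    ("1", 0, 100, 1, 1),      # heterozygous: not LOH
    ("1", 100, 200, 2, 0),    # copy-neutral LOH
    ("chr2", 0, 100, 1, 0),   # LOH after a single-copy loss
    ("2", 100, 200, 0, 0),    # homozygous deletion: not counted as LOH
    ("X", 0, 500, 1, 0),      # excluded
]
print(percent_genomic_loh(segments))  # 50.0
```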

List of known cancer driver genes
We compiled a list of known cancer driver genes from (1) the 243 COSMIC classic genes from COSMIC database version 91 67 (downloaded on 1st July 2020), (2) the 260 significantly mutated genes from Lawrence et al. 68, and (3) the 299 cancer driver genes from the TCGA Pan-Cancer study 69. In total, we obtained 505 cancer genes and focused on mutations and focal-level SCNAs in these genes. The full list of cancer driver genes is available in Supplementary Table 8.
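Compiling the driver gene list amounts to a de-duplicated union of the three source lists, which overlap substantially (243 + 260 + 299 entries yield 505 unique genes). A minimal sketch with toy inputs follows; real inputs would also need symbol harmonisation (for example, mapping previous or alias symbols to current HGNC symbols) before the union, which this sketch does not attempt. The function name is hypothetical.

```python
def compile_driver_list(*gene_lists):
    """Union of several gene-symbol lists, de-duplicated and sorted.

    Symbols are only stripped of whitespace, not upper-cased, because some
    HGNC symbols (for example chromosome open reading frame names) contain
    lower-case letters.
    """
    genes = set()
    for gene_list in gene_lists:
        genes.update(symbol.strip() for symbol in gene_list if symbol.strip())
    return sorted(genes)


# Toy inputs standing in for the COSMIC, Lawrence et al. and TCGA Pan-Cancer lists
cosmic = ["TP53", "PIK3CA", "IDH1", "EGFR"]
lawrence = ["TP53", "PIK3CA", "ATRX"]
pancancer = ["IDH1", "ATRX", "CDKN2A", " EGFR "]
drivers = compile_driver_list(cosmic, lawrence, pancancer)
print(len(drivers), drivers)
```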

Recurrent SCNA analysis
Recurrent arm-level and focal-level SCNAs in each cancer type were identified using GISTIC2.0 22. Segmented files (nocnv_hg19.seg) from TCGA, together with the marker file and CNV file provided with GISTIC2.0, were used as input files. The parameters were set as follows: '-genegistic 1 -smallmem 1 -qvt 0.25 -ta 0.25 -td 0.25 -broad 1 -brlen 0.7 -conf 0.95 -armpeel 1 -savegene 1'. With these parameters, broad events were defined as alterations affecting more than 70% of a chromosome arm. The log2 ratio thresholds for copy number gains and deletions were 0.25 and -0.25, respectively. The confidence level was set to 0.95 and the q-value threshold to 0.25.
To investigate the association between age and arm-level SCNAs in each cancer type, simple logistic regression was performed for each chromosomal arm identified as a recurrent SCNA in that cancer type. Only cancer types with more than 100 samples were included in this analysis (Table 1). Arms with a significant association (adj. p-value < 0.05) were further adjusted for clinical variables using multiple logistic regression. The complete set of results is in Supplementary Table 6. Similarly, simple and multiple logistic regression were conducted on focal-level SCNAs in each cancer type. Regions overlapping centromeres or telomeres were removed from the analysis. The complete set of results is in Supplementary Table 7.
To confirm the impact of SCNAs on gene expression, we investigated the correlation between GISTIC2.0 score and RNA-seq-based gene expression (log2(normalised RSEM + 1)) for tumours with both types of data using Pearson correlation. A correlation was considered significant if its p-value, corrected for multiple-hypothesis testing using the Benjamini-Hochberg procedure, was < 0.05. The complete set of results is in Supplementary Table 7.
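The Benjamini-Hochberg adjustment behind the "adj. p-value" thresholds can be sketched in pure Python as below. In practice, the per-gene Pearson p-values could come from scipy.stats.pearsonr, and statsmodels.stats.multitest.multipletests with method="fdr_bh" gives the same adjustment; which set of tests was adjusted together (per cancer type or overall) is not specified here, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    The p-value with rank k (ascending) among m tests is scaled to p * m / k;
    a running minimum from the largest rank downwards keeps the adjusted
    values monotone, and starting that minimum at 1.0 caps them at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# e.g. raw p-values from per-gene SCNA-expression Pearson correlations
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print([round(a, 3) for a in adj], significant)
```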

SCNA score quantification and analysis
Previous studies have developed SCNA scores representing the SCNA level of a tumour 12,23. We applied the method described by Yuan et al. 12 to calculate SCNA scores. Using SCNA profiles from the GISTIC2.0 analysis, SCNA scores for each tumour were derived at three levels (chromosome-, arm-, and focal-level). For each tumour, the log2 copy number ratio of each focal event from GISTIC2.0 was converted into a score: 2 if log2 ratio ≥ 1; 1 if 0.25 ≤ log2 ratio < 1; 0 if -0.25 ≤ log2 ratio < 0.25; -1 if -1 ≤ log2 ratio < -0.25; and -2 if log2 ratio < -1. The absolute values of the scores of all focal events in a tumour were then summed into the tumour's focal score. Rank-based normalisation (rank / number of tumours in the cancer type) was then applied to the focal scores of all tumours within the same cancer type, yielding normalised focal-level SCNA scores. Tumours with high focal-level SCNAs therefore have focal-level SCNA scores close to 1, while tumours with low focal-level SCNAs have scores close to 0. For the arm- and chromosome-level SCNA scores, a similar procedure was applied to the broad-event log2 copy number ratios from GISTIC2.0. An event was considered chromosome-level if both arms had the same log2 ratio; otherwise, it was considered arm-level. As for the focal-level SCNA score, each arm- and chromosome-event log2 copy number ratio was converted into a score of 2, 1, 0, -1 or -2 using the thresholds described above. The absolute scores of all arm events and chromosome events in a tumour were then summed into an arm score and a chromosome score, respectively. Within each cancer type, rank-based normalisation was applied to the arm and chromosome scores of all tumours to derive normalised arm-level and chromosome-level SCNA scores, respectively.
An overall SCNA score for a tumour was defined as the sum of the focal-level, arm-level and chromosome-level SCNA scores. A chromosome/arm-level SCNA score for a tumour was defined as the sum of the chromosome-level and arm-level SCNA scores.

The association between age and the overall, chromosome/arm-level and focal-level SCNA scores for each cancer type was investigated using simple linear regression. Cancer types with a significant association (adj. p-value < 0.05) were then subjected to multiple linear regression analysis adjusting for the clinical variables. The complete set of results is included in Supplementary Table 5.
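How the level-specific scores combine, and how age can be related to them with or without covariates, is sketched below in plain Python. This is an illustration under assumptions, not the authors' code: the covariate coding (a single sex indicator) and all values are hypothetical, coefficients are obtained by solving the normal equations, and p-values are not computed (a statistics package such as statsmodels would normally be used and would report them). The function names are invented for this example.

```python
def combined_scores(focal, arm, chromosome):
    """Overall and chromosome/arm-level SCNA scores from normalised level scores."""
    overall = focal + arm + chromosome
    chromosome_arm = chromosome + arm
    return overall, chromosome_arm


def ols_coefficients(design, y):
    """Least-squares coefficients via the normal equations (X^T X) b = X^T y,
    solved with Gauss-Jordan elimination and partial pivoting.

    Each design row must start with 1.0 for the intercept."""
    k = len(design[0])
    aug = [
        [sum(row[i] * row[j] for row in design) for j in range(k)]
        + [sum(row[i] * v for row, v in zip(design, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, k + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical tumours: (age, male indicator, focal, arm, chromosome scores).
tumours = [
    (45, 0, 0.2, 0.4, 0.1),
    (52, 1, 0.5, 0.3, 0.6),
    (61, 0, 0.6, 0.7, 0.5),
    (68, 1, 0.9, 0.8, 0.7),
    (74, 0, 0.8, 1.0, 0.9),
]
overall = [combined_scores(f, a, c)[0] for _, _, f, a, c in tumours]
simple = ols_coefficients([[1.0, age] for age, *_ in tumours], overall)
adjusted = ols_coefficients([[1.0, age, male] for age, male, *_ in tumours], overall)
print("age slope, simple:", simple[1], "adjusted for sex:", adjusted[1])
```

Fitting the simple model first and refitting only the significant cancer types with extra design columns mirrors the two-stage procedure in the text.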

Analysis of age-associated somatic mutation in cancer genes
We obtained mutation data from the MAF file of the recent TCGA Multi-Center Mutation Calling in Multiple Cancers (MC3) project 53 . In the MC3 effort, variants were called using seven variant callers. We filtered the variants to keep only non-silent SNVs and indels located in gene bodies, retaining only "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation", "Splice_Site" and "Translation_Start_Site" in the "Variant_Classification" column. We focused only on mutations in genes from our compiled list of cancer driver genes. To prevent bias that might be caused by hypermutated tumours, we restricted the analysis to tumours with < 1,000 mutations per exome. For the pan-cancer analysis, multiple logistic regression accounting for gender, race and cancer type was performed to investigate the association between age and mutations in 20 cancer genes that are mutated in > 5% of samples (Supplementary Table 10). For the cancer-specific analysis, simple logistic regression was used to identify cancer genes whose mutation status is associated with the patient's age. Only genes mutated in > 5% of samples of a given cancer type were included in the analysis. Significant associations (adj. p-value < 0.05) were further investigated using multiple logistic regression accounting for clinical variables. The complete set of results is in Supplementary Table 10.
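The filtering steps above can be sketched using the standard MAF columns `Hugo_Symbol`, `Variant_Classification` and `Tumor_Sample_Barcode`. This is a hedged illustration, not the authors' pipeline: counting only non-silent mutations for the < 1,000 hypermutation cut-off, and using the tumours present in the MAF as the denominator of the > 5% frequency filter, are assumptions, and the function names and toy records are hypothetical. The per-tumour gene sets it returns give the 0/1 mutation status that a per-gene logistic regression on age would use as its outcome.

```python
from collections import Counter

NON_SILENT = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Splice_Site", "Translation_Start_Site",
}


def driver_mutation_sets(records, driver_genes, max_mutations=1000):
    """Map each non-hypermutated tumour to the set of driver genes carrying
    a non-silent mutation. `records` are dicts keyed by MAF column names."""
    records = list(records)
    non_silent = [r for r in records if r["Variant_Classification"] in NON_SILENT]
    burden = Counter(r["Tumor_Sample_Barcode"] for r in non_silent)
    kept = {r["Tumor_Sample_Barcode"] for r in records
            if burden[r["Tumor_Sample_Barcode"]] < max_mutations}
    mutated = {t: set() for t in kept}
    for r in non_silent:
        if r["Tumor_Sample_Barcode"] in mutated and r["Hugo_Symbol"] in driver_genes:
            mutated[r["Tumor_Sample_Barcode"]].add(r["Hugo_Symbol"])
    return mutated


def frequently_mutated(mutated, min_fraction=0.05):
    """Genes mutated in more than `min_fraction` of the retained tumours."""
    n = len(mutated)
    counts = Counter(g for genes in mutated.values() for g in genes)
    return sorted(g for g, c in counts.items() if c / n > min_fraction)


def rec(tumour, gene, cls):
    return {"Tumor_Sample_Barcode": tumour, "Hugo_Symbol": gene, "Variant_Classification": cls}


maf = [
    rec("T1", "TP53", "Missense_Mutation"), rec("T1", "KRAS", "Silent"),
    rec("T2", "TP53", "Nonsense_Mutation"), rec("T2", "PIK3CA", "Missense_Mutation"),
    rec("T3", "TP53", "Missense_Mutation"), rec("T3", "GENE_A", "Splice_Site"),
    rec("T3", "GENE_B", "Frame_Shift_Del"),
]
# A toy cut-off of 3 stands in for 1,000 so that T3 is treated as hypermutated.
sets = driver_mutation_sets(maf, {"TP53", "KRAS", "PIK3CA"}, max_mutations=3)
print(sets, frequently_mutated(sets))
```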

Analysis of mutational burden, MSI-H status, and POLE/POLD1 mutations
Mutational burden was defined as the total number of non-silent mutations in an exome. The mutational burden of each tumour was log-transformed before use in subsequent analyses. To investigate the relationship between age and mutational burden in the pan-cancer setting, multiple linear regression adjusting for gender, race and cancer type was conducted. For the cancer-specific analysis, simple linear regression was performed. Cancer types with a significant association between age and mutational burden in the simple linear regression analysis (adj. p-value < 0.05) were further examined using multiple linear regression accounting for clinical factors. The complete set of results is in Supplementary Table 9.
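A sketch of the burden calculation and the cancer-specific simple regression follows. The paper does not state the logarithm base, so log10 is an assumption here, as are the function names and toy data; a tumour with no non-silent mutations would not appear in a MAF-derived count and is not handled. The slope's t statistic is reported only as a summary; turning it into a p-value needs a t-distribution CDF, for example from SciPy.

```python
import math
from collections import Counter

NON_SILENT = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Splice_Site", "Translation_Start_Site",
}


def log_burden(records):
    """log10 of the number of non-silent mutations per tumour (base is an assumption)."""
    counts = Counter(
        r["Tumor_Sample_Barcode"] for r in records
        if r["Variant_Classification"] in NON_SILENT
    )
    return {t: math.log10(n) for t, n in counts.items()}


def simple_regression(x, y):
    """Slope of y on x with its standard error and t statistic (n - 2 df)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, se, t_stat


# Hypothetical tumours of one cancer type: age and number of non-silent mutations.
ages = {"T1": 40, "T2": 50, "T3": 60, "T4": 70}
counts = {"T1": 3, "T2": 5, "T3": 12, "T4": 40}
maf = [
    {"Tumor_Sample_Barcode": t, "Variant_Classification": "Missense_Mutation"}
    for t, n in counts.items() for _ in range(n)
]
burden = log_burden(maf)
tumours = sorted(burden)
slope, se, t_stat = simple_regression([ages[t] for t in tumours], [burden[t] for t in tumours])
print(f"log10 burden per year of age: {slope:.4f} (SE {se:.4f}, t = {t_stat:.2f})")
```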