Cancers adapt to their mutational load by buffering protein misfolding stress

In asexual populations that don’t undergo recombination, such as cancer, deleterious mutations are expected to accrue readily due to genome-wide linkage between mutations. Despite this mutational load of often thousands of deleterious mutations, many tumors thrive. How tumors survive the damaging consequences of this mutational load is not well understood. Here, we investigate the functional consequences of mutational load in 10,295 human tumors by quantifying their phenotypic response through changes in gene expression. Using a generalized linear mixed model (GLMM), we find that high mutational load tumors up-regulate proteostasis machinery related to the mitigation and prevention of protein misfolding. We replicate these expression responses in cancer cell lines and show that the viability of high mutational load cancer cells is strongly dependent on complexes that degrade and refold proteins. This indicates that upregulation of proteostasis machinery is causally important for high mutational burden tumors and uncovers new therapeutic vulnerabilities.

Statement of Significance

Cancers can successfully survive an accumulation of thousands of protein-damaging mutations. Here, we show that high mutational load tumors mitigate these damaging consequences by up-regulating complexes that buffer against protein misfolding stress, providing novel therapeutic vulnerabilities and suggesting that disruption of proteostasis is a hallmark of somatic evolution.

... small subset of these mutations drive tumor progression, the vast majority of remaining mutations, known as passengers, don't help and might hinder cancer growth. The role that passengers play in tumor progression has traditionally received little attention despite their abundance and variation across cancer types. The number of passengers in a tumor can vary by over four orders of magnitude, even within the same cancer type, from just a few to tens of thousands of point mutations [1].
Whether these passengers are neutral or damaging to tumors has long been a matter of debate [2-10]. Some have argued that passengers are functionally unimportant to tumors given that most non-synonymous mutations are not removed by negative selection in somatic tissues [2,3]. This is in direct contrast to the human germline, where non-synonymous mutations are functionally damaging to most genes [11] and signals of negative selection are pervasive [3]. The common explanation for why damaging protein-coding mutations are removed in the human germline but maintained in somatic tissues is that most genes are only important for multi-cellular function at the organismal level (e.g. during development), but not during somatic growth [2,12].

However, the notion that non-synonymous mutations are only selectively neutral in somatic tissues is surprising given their known functional consequences in the germline. Non-synonymous mutations are known to be damaging in the human germline due to their effects on protein folding and stability [13], which ought to be shared between somatic and germline evolution. An alternative explanation is that non-synonymous mutations are indeed damaging in somatic evolution, but negative selection is too inefficient at removing them due to linkage effects driven by the lack of recombination in somatic cells [10]. Without recombination to break apart combinations of mutations, selection must act on beneficial drivers and deleterious passengers that arise in the same genome together. This makes it less efficient for selection to individually favor beneficial drivers or remove deleterious passengers [14]. As a result, a substantial number of weakly damaging passengers can accrue in cancer due to inefficient negative selection over time.
In support of this model, tumors with very small numbers of passengers, where linkage effects are expected to be negligible, have recently been shown to exhibit signatures of negative selection and weed out damaging non-synonymous mutations [10]. In contrast, the remaining majority (>95%) of tumors, which contain much larger numbers of linked mutations, display patterns of inefficient negative selection. This provides evidence in favor of the inefficient selection model and implies that most tumors carry a correspondingly large deleterious mutational load.

If individual passengers are in fact substantially damaging in cancer, successful tumors with thousands of linked mutations must find ways to maintain their viability by mitigating this large mutational load. While paths to mitigation are difficult to predict for non-coding mutations, tumors with mutations in protein-coding genes are expected to minimize the damaging phenotypic effects of protein misfolding stress. Here, we investigate this hypothesis by analyzing tumor tissues with paired mutational and gene expression profiles to assess how the physiological state of cancer cells changes as they accumulate protein-coding mutations. Using a generalized linear mixed effects regression model (GLMM), we leverage variation across 10,295 tumors from 33 cancer types and find that complexes that re-fold proteins (chaperones), degrade proteins (proteasome) and splice mRNA (spliceosome) are up-regulated in high mutational load tumors. We validate these results by showing that similar physiological responses occur in high mutational load cancer cell lines as well. Finally, we establish a causal connection by showing that high mutational load cell lines are particularly sensitive when proteasome and chaperone function is disrupted through downregulation of expression via short-hairpin RNA (shRNA) knock-down or targeted therapies.
Collectively, these data indicate that the viability of high mutational load tumors is strongly dependent on the up-regulation of complexes that degrade and refold proteins, revealing a generic vulnerability of cancer that can potentially be therapeutically exploited.

Quantifying transcriptional response to mutational load in human tumors.

We first performed a genome-wide screen to systematically identify which genes are transcriptionally upregulated in response to mutational load in human tumors. To do so, we utilized publicly available whole-exome and gene expression data from 10,295 human tumors across 33 cancer types from The Cancer Genome Atlas (TCGA) [15,16]. We considered multiple classes of mutations to define mutational load and investigated their degree of collinearity, focusing on protein-coding regions since the use of whole-exome data limits the ability to accurately assess mutations in non-coding regions. We find that there is a high degree of collinearity among synonymous, non-synonymous and nonsense point mutations in protein-coding genes (R > 0.9) but weak collinearity between point mutations and copy number alterations (R < 0.05) (Supplemental Figure 1). Thus, we decided to focus on the aggregate effects of protein-coding mutations and for all analyses defined mutational load as log10 of the total number of point mutations in protein-coding genes. For simplicity, we used all mutations rather than focusing only on passenger mutations since identifying genuine drivers against a background of linked passenger events can be difficult, especially for tumors with many mutations.

Since gene expression can vary across tumors due to many factors, such as cancer type, tumor purity and other unknown factors, we utilized a generalized linear mixed model (GLMM) to measure the association of mutational load and gene expression while accounting for these potential confounders (Fig. 1A). Within the GLMM, tumor purity and mutational load were modeled as fixed effects whereas cancer type was modeled as a random effect since it varies across groups of patients and can be interpreted as repeated measurements across groups.
The following GLMM was applied separately to each gene,

$$Y \sim \beta_0 + \beta_1 X_1 + \beta_2 X_2 + v + \varepsilon$$

where $Y$ is a vector of normalized expression values across all tumors, $\beta_0$ is the fixed intercept, $\beta_1$ is the fixed slope for the predictor variable $X_1$ which is a vector of mutational load values for each tumor, $\beta_2$ is the fixed slope for the predictor variable $X_2$ which is a vector of the purity of each tumor, $v$ is the random intercept for each cancer type, and $\varepsilon$ is a Gaussian error term (Methods).
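The two model inputs described above, mutational load and normalized expression, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names, the example counts, the refusal of zero-mutation tumors and the use of the population standard deviation are all assumptions made for the sketch.

```python
import math
import statistics


def mutational_load(n_synonymous, n_nonsynonymous, n_nonsense):
    """Return log10 of the total number of protein-coding point mutations."""
    total = n_synonymous + n_nonsynonymous + n_nonsense
    if total <= 0:
        # Assumption: the text does not say how tumors without mutations
        # are handled, so this sketch simply refuses them.
        raise ValueError("log10 is undefined for a tumor with no mutations")
    return math.log10(total)


def zscore(values):
    """Z-score normalize one gene's expression values across tumors."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; this choice is an assumption
    if sd == 0:
        raise ValueError("cannot z-score a constant vector")
    return [(v - mean) / sd for v in values]


# Hypothetical tumors: (synonymous, non-synonymous, nonsense) counts.
tumors = [(3, 6, 1), (30, 65, 5), (300, 650, 50)]
loads = [mutational_load(*t) for t in tumors]

# Hypothetical expression values of one gene in the same three tumors.
normalized = zscore([2.0, 4.0, 6.0])
```

The three hypothetical tumors carry 10, 100 and 1,000 mutations, so their loads are spaced one unit apart on the log10 scale, which is the scale on which the regression slope is interpreted.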

Gene silencing through alternative splicing in high mutational load tumors.

We next investigated in detail how these protein complexes could mitigate the damaging effects of protein misfolding in high mutational load tumors by examining the role of the spliceosome in gene silencing. We hypothesized that the up-regulation of the spliceosome in high mutational load tumors prevents further protein misfolding by regulating pre-mRNA transcripts to be degraded rather than translated. The down-regulation of gene expression via alternative splicing events, such as intron retention, is one known mechanism to silence genes by funneling transcripts to mRNA decay pathways [22-24].

To test whether gene expression is down-regulated in high mutational load tumors through intron retention, we utilized previously called alternative splicing events in TCGA [25]. Alternative splicing events within this dataset were quantified through a metric called percent spliced in, or PSI. PSI is calculated as the number of reads that overlap the alternative splicing event (e.g. for intron retention, either at intronic regions or those at the boundary of exon to intron junctions) divided by the total number of reads that support and don't support the alternative splicing event. Thus, PSI estimates the probability of alternative splicing events only at specific exonic boundaries in the entire transcript population without requiring information on the complete underlying composition of each full-length transcript.

Using these alternative splicing calls, we reasoned that if a transcript contains an intron retention event and is downregulated in expression, the transcript is more likely to have been degraded by mRNA decay pathways. For all genes, we first quantified whether intron retention events were present based on a threshold value >80% PSI. For each gene with an intron retention event, we determined whether the same gene was under-expressed.
Each gene was counted as under-expressed if it was one standard deviation below the mean expression within the same cancer type. To control for mutations that might affect patterns of expression (i.e., expression quantitative trait loci or eQTL effects), alternative splicing events that contained a point mutation within the same gene were removed from the analysis (these represent only ~1% of intron retention events across all tumors; Methods). We find that relative to all transcripts with intron retention events, the number of transcripts that are under-expressed increases with tumor mutational load (Fig. 2A), suggesting that the degree of intron retention-driven mRNA decay is elevated in high mutational load tumors. This trend is robust to other PSI value thresholds (>50-90% PSI), to other alternative splicing events (e.g., exon skipping, mutually exclusive exons, etc.) and to not filtering for potential eQTL effects (Supplemental Figures 3 and 4).
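The counting procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names and the toy read counts are hypothetical, and the use of the sample standard deviation for the "one standard deviation below the mean" rule is an assumption, since the text does not specify it.

```python
import statistics


def percent_spliced_in(supporting_reads, non_supporting_reads):
    """PSI (%) = reads supporting the event / all reads covering the event."""
    total = supporting_reads + non_supporting_reads
    if total == 0:
        return None  # no coverage, so PSI is undefined
    return 100.0 * supporting_reads / total


def is_under_expressed(value, cohort_values):
    """True if value lies more than one SD below the cohort mean."""
    mean = statistics.fmean(cohort_values)
    sd = statistics.stdev(cohort_values)  # sample SD; an assumption
    return value < mean - sd


def under_expressed_fraction(events, psi_threshold=80.0):
    """Fraction of intron-retained genes that are under-expressed in one tumor.

    events: list of (psi, expression, expression_in_same_cancer_type).
    """
    retained = [e for e in events if e[0] is not None and e[0] > psi_threshold]
    if not retained:
        return None
    low = sum(is_under_expressed(expr, cohort) for _, expr, cohort in retained)
    return low / len(retained)


# Hypothetical expression of one gene across tumors of the same cancer type.
cohort = [8.0, 9.0, 10.0, 11.0, 12.0]

# Hypothetical events in a single tumor: (PSI, expression, cohort values).
events = [
    (percent_spliced_in(90, 10), 8.0, cohort),   # retained, under-expressed
    (percent_spliced_in(85, 15), 10.5, cohort),  # retained, normal expression
    (percent_spliced_in(20, 80), 7.0, cohort),   # below the PSI threshold
]
fraction = under_expressed_fraction(events)
```

Computing this fraction per tumor and relating it to mutational load gives the kind of trend the text attributes to Fig. 2A; changing `psi_threshold` corresponds to the robustness check over other PSI thresholds.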

We next investigated which genes are more likely to be silenced through mRNA decay between low and high mutational load tumors. For each intron retention event, we calculated whether PSI values were significantly different in low mutational load tumors (<10 total protein-coding mutations) compared to high mutational load tumors (>1000 total protein-coding mutations) using a t-test. This approach identified 606 and 201 genes that have more and fewer intron retention events in high mutational load tumors, respectively. Using gene set enrichment analysis, we find that cytoplasmic ribosomes contain more intron retention events in high mutational load tumors, potentially leading to their down-regulation through mRNA decay to prevent further protein misfolding (Fig. 2B). Genes that contain fewer intron retention events in high mutational load tumors, which are less likely to undergo mRNA decay, are primarily related to mRNA splicing.

Regulation of translation, protein folding and protein degradation in high mutational load tumors.

Next, we investigated in detail how the remaining proteostasis complexes that were significant in our genome-wide screen, which regulate protein synthesis, degradation and folding, could mitigate protein misfolding in high mutational load tumors. To do so, we expanded our gene sets to include other chaperone families, all ribosomal complexes and proteasomal subunits (Fig. 3A). Using the GLMM framework detailed above, we find that nearly all individual genes in chaperone families that participate in protein folding (HSP60, HSP70 and HSP90) or protein disaggregation (HSP100), or that have organelle-specific roles (ER and mitochondrial), are significantly up-regulated in response to mutational load. Interestingly, however, small heat shock proteins, which don't participate in protein folding or disaggregation, are significantly down-regulated in response to increased protein-coding mutations. The role of small heat shock proteins is primarily to hold unfolded proteins in a reversible state for re-folding or degradation by other chaperones [26], and thus they could possibly be down-regulated due to their inefficiency in mitigating protein misfolding.

We further examined differences in expression of different structural components of the proteasome, a large protein complex responsible for degradation of intracellular proteins. Consistent with the over-expression of chaperone families that mitigate protein misfolding, both the 19S regulatory particle (which recognizes and imports proteins for degradation) and the 20S core (which cleaves peptides) of the proteasome are up-regulated in response to mutational load in TCGA (Fig. 3A). In addition, we find that mitochondrial, but not cytoplasmic, ribosome complexes are up-regulated in high mutational load tumors. As previously reported in yeast [27] and human cells [28], mitochondrial ribosome biogenesis occurs under conditions of chronic protein misfolding as a mechanism of compartmentalization and degradation of proteins. In contrast, translation of proteins through cytosolic ribosome biogenesis has been previously characterized to be attenuated and slowed to prevent further protein misfolding [29]. This decrease in expression of cytoplasmic ribosomes is also consistent with observed patterns of alternative splicing coupled to mRNA decay pathways in high mutational load tumors (Fig. 2B).

Finally, we performed a jackknife re-sampling procedure to confirm that specific cancer types aren't driving patterns of association within the GLMM. This was achieved by removing each cancer type from the regression model one at a time, and re-calculating regression coefficients on the remaining set of samples. Overall, regression coefficients were stable across cancer types and trends were unchanged (Supplemental Figure 5). In addition, we performed linear regression within cancer types and found similar expression responses to mutational load across proteostasis complexes (Supplemental Figure 6). We also confirmed that patient age was not driving patterns of association of mutational load and gene expression within the GLMM (Supplemental Figure 7). Taken ...

... patterns seen in human tumors broadly replicate in cancer cell lines (Fig. 3). Similar to the expression analysis in TCGA, we also confirmed through a jackknife re-sampling procedure that specific cancer types aren't driving patterns of association within the GLM (Supplemental Figure 8). Finally, we further validated these trends by incorporating protein abundance estimates in CCLE, which contains the largest dataset available of RNA (n=1377) and protein (n=373) abundances that are harmonized across samples. We find similar patterns of expression and protein abundances in response to mutational load in CCLE within proteostasis complexes (Supplemental Figure 9). Overall, this indicates that the expression patterns observed are cell autonomous (i.e., independent of organismal effects such as the immune system, age or microenvironment) and consistent across high mutational load cancer cells.
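The leave-one-cancer-type-out jackknife used above for both the GLMM and the GLM can be sketched as follows. To stay self-contained, this sketch swaps in a plain least-squares slope for the paper's regression models; that substitution, the function names and the toy samples are assumptions for illustration only.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y ~ b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def jackknife_by_group(samples):
    """Refit the slope with each cancer type left out in turn.

    samples: list of (cancer_type, mutational_load, expression).
    Returns {left_out_cancer_type: slope estimated on the remaining samples}.
    """
    slopes = {}
    for left_out in sorted({s[0] for s in samples}):
        kept = [s for s in samples if s[0] != left_out]
        slopes[left_out] = ols_slope([s[1] for s in kept], [s[2] for s in kept])
    return slopes


# Hypothetical samples lying exactly on expression = 2 * load + 1.
# The cancer type labels are used only as illustrative group names.
samples = [
    ("BRCA", 1.0, 3.0), ("BRCA", 2.0, 5.0),
    ("LUAD", 2.5, 6.0), ("LUAD", 3.0, 7.0),
    ("SKCM", 3.5, 8.0), ("SKCM", 4.0, 9.0),
]
slopes = jackknife_by_group(samples)
```

If one cancer type were driving the association, the slope estimated with that type left out would differ markedly from the others; in this toy data every leave-one-out slope stays at 2, the stable pattern the text reports.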

Importantly, it also demonstrates that cancer cell lines are a reasonable model to causally interrogate these effects further through functional and pharmacological perturbation experiments.

To establish a causal relationship between the over-expression of proteostasis machinery and maintenance of cell viability under high mutational load, we utilized expression knock-down (shRNA) estimates from Project Achilles [32] for the same cancer cell lines as in CCLE. We sought to measure how mutational load impacts cell viability when protein complexes and gene families undergo a loss of function through expression knock-down. Since the shRNA screen was performed on an individual gene basis, we utilized a GLM framework that aggregates expression knock-down estimates of all genes within a given proteostasis gene family to jointly measure how mutational load impacts cell viability after loss of function. Specifically, we included an additional categorical variable of the gene name within each gene family to allow for a change in the intercept within each gene in the GLM when measuring the association of mutational load and cell viability after expression knock-down. In addition, we similarly evaluated whether specific cancer types were driving patterns of association within the GLM through jackknife re-sampling by cancer type (Fig. 4A).

Overall, we find that elevated mutational load is associated with decreased cell viability when the function of most chaperone gene families is disrupted through expression knock-down (Fig. 4A). However, only chaperones within the HSP100 family, which have the unique ability to rescue and reactivate existing protein aggregates in cooperation with other chaperone families [33], show a significant negative relationship between mutational load and cell viability across almost all cancer types. Similarly, we find specificity in the vulnerability that mutational load generates when the function of the proteasome and different ribosomal complexes are disrupted (Fig. 4A).
Mutational load significantly decreases cell viability only when the 19S regulatory particle of the proteasome is disrupted through expression knock-down, suggesting that targeting the protein import machinery of the proteasome is more effective than targeting the protein cleaving machinery in the 20S core. Finally, mutational load significantly increases cell viability when cytoplasmic ribosomes, which are already down-regulated in response to mutational load (Fig. 2B), undergo a loss of function through expression knock-down. Conversely, expression knock-down of mitochondrial ribosomes significantly decreases viability with increased mutational load in cell lines, which is also consistent with the patterns of expression observed.
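The GLM with a per-gene change in intercept described above can be illustrated with a small sketch. It relies on a standard least-squares identity: when each gene has its own intercept, the shared slope on mutational load equals the pooled within-gene slope, obtained by centering load and viability inside each gene before pooling. The function name, gene labels and viability values are hypothetical, and this is a sketch of the model structure rather than the authors' implementation.

```python
from collections import defaultdict


def shared_slope_with_gene_intercepts(records):
    """Estimate the common mutational-load slope with one intercept per gene.

    records: list of (gene, mutational_load, viability_after_knock_down).
    """
    by_gene = defaultdict(list)
    for gene, load, viability in records:
        by_gene[gene].append((load, viability))

    sxx = 0.0
    sxy = 0.0
    for pairs in by_gene.values():
        # Centering within a gene absorbs that gene's own intercept.
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        sxx += sum((x - mean_x) ** 2 for x, _ in pairs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


# Hypothetical knock-down data for two genes of one family in three cell
# lines: viability = -0.5 * load + a gene-specific offset.
records = [
    ("GENE_A", 1.0, 0.5), ("GENE_A", 2.0, 0.0), ("GENE_A", 3.0, -0.5),
    ("GENE_B", 1.0, -0.3), ("GENE_B", 2.0, -0.8), ("GENE_B", 3.0, -1.3),
]
slope = shared_slope_with_gene_intercepts(records)
```

A negative slope, as in this toy example, corresponds to the pattern reported for most chaperone families: viability after knock-down falls as mutational load rises, regardless of each gene's baseline effect.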

Since functional redundancy in the human genome can make expression knock-down estimates within individual genes noisy, we also examined how drugs targeting the function of whole complexes impact viability with mutational load across all cancer types and when removing individual cancer types through jackknife re-sampling. To do so, we utilized drug sensitivity screening data in project PRISM [34] within CCLE and used a simple GLM to measure the association of mutational load and cell viability after drug inhibition. We find that treatment with the majority of proteasome inhibitors (6/8) and ubiquitin-specific proteasome inhibitors (2/3), which target protein degradation complexes, is significantly associated with a decrease in cell viability in high mutational load cell lines. Similarly, most HSP90 inhibitors decrease cell viability with mutational load (8/10), although only a few drugs show a significant relationship. This variability in the efficacy of drugs with similar mechanisms of action likely reflects that the ability to disrupt the function of proteostasis machinery depends on the specific molecular affinity of a compound to its target and downstream effectors. While these are the only relevant proteostasis drugs in the PRISM dataset that are currently available, we anticipate that drugs targeting other chaperone machinery or splicing complexes could also target other potential vulnerabilities in high mutational load cancers. Collectively, these results indicate that elevated expression of protein degradation and folding machinery is causally related to the maintenance of viability in high mutational load cell lines, and likely in high mutational load tumors by extension.

Lastly, we find that most drugs in the PRISM database do not significantly decrease cell viability with mutational load (Fig. 5A), suggesting that high mutational load cancer cells are not generically vulnerable to all classes of drugs. Specifically, we find that drugs which inhibit transcription, cytoskeleton organization, protein degradation, chaperones or protein synthesis, or which promote apoptosis, are most effective at targeting high mutational load cancer cells, delineating additional potential therapeutic vulnerabilities in high mutational burden tumors (Fig. 5B).

Here, we test the hypothesis that cancer cells regulate their proteostasis machinery to mitigate the damaging effects of passenger mutations, which can destabilize and misfold proteins. Misfolded proteins can arise from non-synonymous or nonsense passengers which cause abnormal amino acid modifications or premature truncations in proteins. Even synonymous passengers, which are traditionally thought to be functionally silent, can lead to misfolding of proteins through changes in mRNA stability [35], translational pausing [36,37], and non-optimal codon usage [38,39]. As a result, protein misfolding can be damaging in cells not only due to a loss of function of the original protein, but also due to a gain in toxicity caused by the aggregation of aberrant peptides. It is intriguing to consider the possibility that the need to manage protein misfolding stress is a hallmark of somatic evolution in cancer.

To maintain viability by minimizing these cytotoxic effects, we find that high mutational load tumors, similar to yeast [40], bacteria [41,42], and viruses [43], up-regulate the expression of chaperones, which allow mutated proteins that would otherwise be misfolded to retain function. We find evidence suggesting that specific chaperone families that actively participate in protein re-folding (HSP60, HSP90 and HSP70) or disaggregation (HSP100) are up-regulated in response to mutational load, while other chaperone machinery that salvages proteins (small heat shock proteins) is downregulated. In addition, we find degradation of mutated proteins through up-regulation of the proteasome to be another possible strategy high mutational load tumors use to mitigate protein misfolding stress.

Finally, we find additional mechanisms that high mutational load tumors use to not just mitigate but also prevent protein misfolding. By utilizing post-transcriptional processes that couple alternative splicing with mRNA decay pathways known to occur in normal human tissues [22,44,45], high mutational load tumors appear to selectively prevent protein production by regulating certain pre-mRNA transcripts to be degraded rather than translated. We find evidence suggesting that the targets of this coordinated unproductive splicing are primarily related to cytoplasmic ribosomal gene expression that controls the translation of proteins, consistent with observations in other organisms [46-48]. Intriguingly, we find that while cytoplasmic ribosome expression is attenuated, mitochondrial ribosome biogenesis in human tumors is up-regulated in response to mutational load. This could both be another mechanism that high mutational load tumors use to compartmentalize and degrade proteins [27] and reflect the increased energetic demands of proteostasis maintenance [49].
The expression responses observed here are not only consistent with protein misfolding stress in other organisms, but also cross-validate in cancer cell lines, where we find similar expression responses to mutational load. This provides further evidence of a generic, cell-intrinsic phenomenon that cannot be explained by extrinsic organismal effects, such as aging, changes in the immune system or microenvironment. Furthermore, we move beyond correlations of gene expression responses to mutational load and establish a causal connection by demonstrating, through shRNA knock-down and drug profiling experiments, that mitigation of protein misfolding through protein degradation and re-folding is necessary for high mutational load cancer cells to maintain viability.

The results presented here have many implications. First, they suggest that while there is direct selection during somatic evolution for pathogenic drivers that allow cancer cells to continually proliferate, damaging passengers that destabilize proteins must also cause cancer cells to experience second-order indirect selection for alterations that allow tumors to overcome this proteostasis imbalance. This could occur through phenotypic plasticity, shifts in methylation and chromatin structure, or through compensatory point mutations and duplications, consistent with other studies [50,51]. Indeed, gene duplication, where one copy can still perform the required function while the other copy is non-functional, is another known mechanism that allows cells to maintain robustness to damaging mutations in many eukaryotic organisms [52,53]. In support of this, whole-genome duplication, which is common in cancer, has recently been shown as another potential mechanism that tumor cells could use to maintain robustness to deleterious passengers [54].
However, duplication events are also known to be ...

... of genes whose expression is up-regulated in response to mutational load in TCGA. For each gene, expression values across all patients were z-score normalized in all analyses to ensure fair comparisons across genes. Known covariates of tumor purity and cancer type were included in the GLMM. Tumor purity and mutational load were modeled as fixed effects, whereas cancer type was modeled as a random effect (i.e. random intercept) since it varies across groups of patients and can be interpreted as repeated measurements across groups. For all analyses, mutational load was defined as log10 of the number of synonymous, nonsynonymous and nonsense mutations per tumor. For each gene, the parameters used in the GLMM were as follows,

$$Y \sim \beta_0 + \beta_1 X_1 + \beta_2 X_2 + v + \varepsilon$$

where $Y$ is a vector of expression values of each tumor, $\beta_0$ is the fixed intercept, $\beta_1$ is the fixed slope for the predictor variable $X_1$ which is a vector of mutational load values for each tumor, $\beta_2$ is the fixed slope for the predictor variable $X_2$ which is a vector of the purity of each tumor, $v$ is the random intercept for each cancer type, and $\varepsilon$ is a Gaussian error term. To examine expression responses to mutational load within a given protein complex and cancer type, the same normalization procedures were applied as above within cancer types and a separate GLM for each cancer type was run as follows,

$$Y \sim \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$

where $Y$ is a vector of expression values of each tumor in a given cancer type, $\beta_0$ is the fixed intercept, $\beta_1$ is the fixed slope for the predictor variable $X_1$ which is a vector of mutational load values for each tumor, $\beta_2$ is the fixed slope for the predictor variable $X_2$ which is a vector of the purity of each tumor, $\beta_3$ is a change in the intercept for $X_3$ which is a categorical variable of individual genes within each proteostasis complex, and $\varepsilon$ is a Gaussian error term.
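For readers who prefer an observation-level statement, the gene-wise mixed model described above can be written for tumor $i$ in cancer type $j$ as in the block below. The index notation and the zero-mean normal distributions for the random intercept and the error term are conventional assumptions for a random-intercept model; the text itself only states that the random intercept varies by cancer type and that the error term is Gaussian.

```latex
% Random-intercept model for one gene: tumor i in cancer type j.
% sigma_v^2 and sigma^2 are assumed variance parameters, not values from the text.
y_{ij} = \beta_0 + \beta_1 x_{1,ij} + \beta_2 x_{2,ij} + v_j + \varepsilon_{ij},
\qquad
v_j \sim \mathcal{N}(0, \sigma_v^2),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $x_{1,ij}$ is the mutational load and $x_{2,ij}$ the purity of tumor $i$ in cancer type $j$, so all tumors of one cancer type share the same offset $v_j$ while $\beta_1$ is shared across cancer types.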
Unlike TCGA, samples within each cancer type in CCLE can be small and are unbalanced (i.e. some cancer types have <10 samples and others have >100 samples). In these cases, mixed effects models may not be able to estimate among-population variance accurately [31]. Thus, for all regression-based analyses in CCLE, a simple generalized linear model (GLM) was used instead. Cell viability values across all cell lines were z-score normalized by gene in all analyses to ensure fair comparisons across genes. To assess whether the same sets of genes are up-regulated in response to mutational load in CCLE using the GLM, a similar procedure to the GLMM was performed. A separate GLM was applied for each gene with the following parameters,

$$Y \sim \beta_0 + \beta_1 X_1 + \varepsilon$$

where $Y$ is a vector of normalized expression values of each cell line, $\beta_0$ is the fixed intercept, $\beta_1$ is the fixed slope for the predictor variable $X_1$ which is a vector of mutational load values for each tumor, and $\varepsilon$ is a Gaussian error term. To assess whether protein abundances are similarly up-regulated in response to mutational load in CCLE in proteostasis complexes, a separate GLM was applied to each gene with the following parameters,

$$Y \sim \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

where $Y$ is a vector of protein abundance values within each cell line, $\beta_0$ is the fixed intercept, $\beta_1$ is the fixed slope for the predictor variable $X_1$ which is a vector of mutational load values for each tumor, and $\varepsilon$ is a Gaussian error term.
A similar GLM framework as above was used to estimate the association of mutational load and cell viability after shRNA knock-down of individual genes in proteostasis complexes with the following parameters,

$$Y \sim \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$$

where $Y$ is a vector of normalized cell viability estimates after expression knock-down of an individual gene across all cell lines, $\beta_0$ is the fixed reference intercept, $\beta_1$ is the fixed slope for the predictor variable $X_1$, which is a vector of mutational load values for each cell line, $\beta_2$ is a change in the intercept for $X_2$, which is a categorical variable of individual genes within each proteostasis complex, and $\epsilon$ is a Gaussian error term. To estimate the association of mutational load and cell viability after pharmacologic inhibition of proteostasis machinery, the following GLM was applied to each relevant drug in PRISM:

$$Y \sim \beta_0 + \beta_1 X_1 + \epsilon$$

where $Y$ is a vector of normalized cell viability estimates after drug inhibition across all cell lines, $\beta_0$ is the fixed intercept, $\beta_1$ is the fixed slope for the predictor variable $X_1$, which is a vector of mutational load values for each cell line, and $\epsilon$ is a Gaussian error term.

[...] classes (e.g., copy number alterations, CNAs) were considered but not found to correlate with point mutations (Supplemental Figure 1). A jackknife re-sampling procedure was used for outlier analysis and to determine whether specific cancer types are driving patterns of association within the GLM and GLMM. Briefly, each cancer type was removed from the regression model one at a time, and regression coefficients were re-estimated. Overall, regression coefficients were fairly stable across cancer types and trends remained the same (Supplemental Figures 5 and 8).

Proteostasis gene sets. Genes for chaperone complexes were identified from ref. 76, and genes that are co-chaperones were not considered. Proteasome and ribosomal complexes were identified from CORUM 17.
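The leave-one-cancer-type-out jackknife described above can be sketched as follows. This is an illustrative example under stated assumptions, not the code used in this study. It treats the Gaussian GLM with a single predictor as ordinary least squares solved with `numpy.linalg.lstsq`, and it uses synthetic data with five hypothetical cancer types labeled A to E. The function names `fit_slope` and `jackknife_by_group` are introduced here for illustration only.

```python
import numpy as np


def fit_slope(load, y):
    """Gaussian GLM with identity link (i.e. OLS): y ~ b0 + b1 * load. Returns b1."""
    design = np.column_stack([np.ones_like(load), load])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def jackknife_by_group(load, y, groups):
    """Re-estimate the slope with each group (cancer type) removed in turn."""
    slopes = {}
    for g in np.unique(groups):
        keep = groups != g
        slopes[str(g)] = fit_slope(load[keep], y[keep])
    return slopes


# Synthetic example: 5 hypothetical cancer types with 40 cell lines each.
rng = np.random.default_rng(1)
groups = np.repeat(np.array(["A", "B", "C", "D", "E"]), 40)
load = rng.uniform(0.5, 4.5, size=groups.size)        # log10 mutation counts
y = 0.4 * load + rng.normal(0.0, 1.0, size=groups.size)
y = (y - y.mean()) / y.std()                           # z-score normalization

full_slope = fit_slope(load, y)
loo_slopes = jackknife_by_group(load, y, groups)

for cancer_type, slope in loo_slopes.items():
    print(f"without {cancer_type}: slope = {slope:.3f} (full data: {full_slope:.3f})")
```

If removing a single cancer type changed the sign or substantially changed the magnitude of the slope, that cancer type would be flagged as driving the association.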
Gene set enrichment analysis. All gene set enrichment analysis was performed using gprofiler2 with default parameters. For all sets of genes, significance was determined after correcting for multiple hypothesis testing (FDR < 0.05). For gene set enrichment analysis used to identify genes up-regulated in TCGA in response to mutational load, all terms in the CORUM database were reported, and enrichment terms in the KEGG database for diseases not related to cancer (e.g., 'Influenza A') were omitted from the main figures for clarity and space. For gene sets used to identify terms differentially spliced between high and low mutational load tumors, all terms in the CORUM and REACTOME databases were reported in the main figures. The full set of enrichment terms for all analyses is reported in Supplemental [...] Figure 4).

For each alternative splicing event in a gene, we quantified whether the expression of the same gene was under-expressed. Each gene was counted as under-expressed if it was one standard deviation below the mean expression within each cancer type. Genes that contained a point mutation within the same alternative splicing event were removed to control for eQTL effects. We note that intron retention events removed from this analysis represent only ~1% of intron retention events across all tumors, and similar trends are found when this filtering scheme is not applied (Supplemental Figure 3). In addition, we evaluated whether this trend is robust to other alternative splicing events (i.e., Alternate Donor Sites, Alternate Promoters, Alternate Terminators, Exon Skipping Events, Mutually Exclusive Exons; Supplemental Figure 4).

To investigate which genes are differentially spliced between low and high mutational load tumors for specific alternative splicing events (i.e., intron retention), a t-test was used to calculate whether PSI values were significantly different in tumors with < 10 protein-coding mutations compared to tumors with > 1000 protein-coding mutations. Each alternative splicing event within a gene was required to have less than 25% of missing PSI values and a mean difference between the two groups of >0.01 to be considered. This approach identified 606 and 201 significant genes that have more and fewer intron retention events in high mutational load tumors, respectively, after correcting for multiple hypothesis testing (FDR < 0.05).
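The event-level filters and the multiple-testing correction described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the code used in this study. The text does not specify how the FDR was controlled, so the Benjamini-Hochberg procedure is assumed here. The missing-value fraction is assumed to be computed over both groups combined, and the p-values are assumed to come from the per-event t-test. The function names and example values are hypothetical.

```python
import numpy as np


def passes_filters(psi_low, psi_high, max_missing=0.25, min_diff=0.01):
    """Apply the two inclusion criteria to one alternative splicing event.

    psi_low / psi_high: PSI values (NaN = missing) in low / high mutational
    load tumors. Assumption: missingness is pooled across both groups.
    """
    psi = np.concatenate([psi_low, psi_high])
    missing_fraction = np.mean(np.isnan(psi))
    mean_diff = abs(np.nanmean(psi_high) - np.nanmean(psi_low))
    return bool(missing_fraction < max_missing and mean_diff > min_diff)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


# Hypothetical PSI values for one intron retention event.
psi_low = np.array([0.10, 0.12, np.nan, 0.11, 0.09])    # tumors with < 10 mutations
psi_high = np.array([0.20, 0.22, 0.19, 0.21, 0.18])     # tumors with > 1000 mutations
keep = passes_filters(psi_low, psi_high)

# Hypothetical t-test p-values for four events that passed the filters.
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = qvals < 0.05
```

Events would be reported only if they pass both filters and have an adjusted p-value below 0.05.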

Drug category annotation and enrichment analysis. A separate GLM was run for each drug in the PRISM database to evaluate whether mutational load is associated with cell viability. All drugs showing a negative association between mutational load and viability were queried on PubMed based on their reported mechanism of action in PRISM and grouped into broad categories (Supplemental Table 1). Categories of drug mechanism of action were first chosen based on their role in metabolism and known hallmarks of cancer. Additional categories not directly related to known cancer-associated functional groups were made for drugs that could not otherwise be grouped (e.g., 'Ion Channel Regulation', 'Viral Replication Inhibitor'). Drugs with an ambiguous mechanism of action (e.g., 'cosmetic', 'coloring agent') were grouped into 'Other'. The abstracts of up to 10 associated papers were examined for evidence connecting drug mechanisms of action to 33 broad categories. In total, 700 drug mechanisms of action were grouped and annotated into 33 broad categories. These broad categories were used to assess whether high mutational load cancer cell lines are generically vulnerable to drugs or whether certain categories are more likely to contain drugs effective against high mutational load cell lines. To control for differences in the number of drugs within each category, 50 drugs were randomly sampled, and the fraction of drugs significantly associated with mutational load in each category was calculated 100 times to generate confidence intervals.

Code and software availability. All code used for analysis will be made publicly

These results further support the prediction that gene silencing is elevated in high mutational load tumors