Discovery of novel putative tumor suppressors from CRISPR screens reveals rewired lipid metabolism in AML cells

CRISPR knockout screens in hundreds of cancer cell lines have revealed a substantial number of context-specific essential genes that, when associated with a biomarker such as lineage or oncogenic mutation, offer candidate tumor-specific vulnerabilities for targeted therapies or novel drug development. Data-driven analysis of knockout fitness screens also yields many other functionally coherent modules that show emergent essentiality or, in rarer cases, the opposite phenotype of faster proliferation. We develop a systematic approach to classify these suppressors of proliferation, which are highly enriched for tumor suppressor genes, and define a network of 145 genes in 22 discrete modules. One surprising module contains several elements of the glycerolipid biosynthesis pathway and operates exclusively in a subset of AML lines, which we call Fatty Acid Synthesis/Tumor Suppressor (FASTS) cells. The proliferation suppressor activity of genes involved in the synthesis of saturated fatty acids, coupled with a more severe fitness phenotype for the desaturation pathway, suggests that these cells operate at the limit of their carrying capacity for saturated fatty acids, which we confirmed biochemically. Overexpression of genes in this module is associated with a survival advantage in an age-matched cohort of AML patients, suggesting the gene cluster driving an in vitro phenotype may be associated with a novel, clinically relevant subtype.


48
Gene knockouts are a fundamental tool for geneticists, and the discovery of CRISPR-based 49 genome editing 1 and its adaptation to gene knockout screens has revolutionized mammalian 50 functional genomics and cancer targeting 2-8 . Hundreds of CRISPR/Cas9 knockout screens in 51 cancer cell lines have revealed background-specific genetic vulnerabilities 9-13 , providing guidance 52 for tumor-specific therapies and the development of novel targeted agents. Although lineage and 53 mutation state are powerful predictors of context-dependent gene essentiality, variation in cell 54 growth medium and environment can also drive differences in cell state, particularly among 55 metabolic genes 14,15 , and targeted screening can reveal the genetic determinants of metabolic 56 pathway buffering 16,17 . 57

58
The presence and composition of metabolic and other functional modules in the cell can also be 59 inferred by integrative analysis of large numbers of screens. Correlated gene knockout fitness 60 profiles, measured across hundreds of screens, have been used to infer gene function and the 61 modular architecture of the human cell 18-21 . Data-driven analysis of correlation networks reveals 62 clusters of functionally related genes whose emergent essentiality in specific cell backgrounds is 63 often unexplained by the underlying lineage or mutational landscape 21 . Interestingly, in a recent 64 (removing dual-annotated tumor suppressors) with our previously-specified set of nonessential 148 genes as a true negative reference set 7,47 . Since there is no expectation for the presence of a 149 consistent set of PSG across cell lines, we analyzed each of the 808 cell lines from the Avana 150 2020Q4 data release independently 10,48,49 calculating gene-level scores on each cell line 151 individually and then combining all scores into a master list of 808 x 18k = 14.6 million gene-cell 152 line observations (Supplementary Table 1). Moreover, since there is also no expectation that all 153 COSMIC TSG would be detected cumulatively across all cell lines, we judged that traditional recall 154 metrics (e.g. percentage of the reference set recovered) were inappropriate. We therefore defined 155 recall as the total number of TSG-cell line observations. Using this evaluation scheme, the mixed 156 Z-score approach outperforms comparable methods by a substantial margin, classifying more 157 than 722 PS-cell line instances at a 10% false discovery rate (FDR) (Figure 1d). This is roughly 158 50% more putative PSG than the closest alternative, a nonparametric rank-based approach, at 159 the same FDR. BAGEL 41,42 , a supervised classifier of essential genes, performed worst at TSG, 160 and the raw mean LFC approach also fared poorly, highlighting the need for variance 161 normalization across experiments. We applied this 10% FDR threshold for all subsequent 162 analyses. 163 164 Common tumor suppressor genes PTEN and TP53 were observed in ~15% of cell lines ( Figure  165 1e), with other well-known TSG appearing less frequently. Among 309 COSMIC TSGs for which 166 we have fitness profiles (representing 1.7% of all 18k genes), we find that 116 (37.5%) of these 167 genes occur as proliferation suppressors at least once (Supplementary Table 2) and make up 168 24.4% of total proliferation suppressor calls (Supplementary Figure 2a-b), a 14-fold enrichment. 169 All of the known TSG hits come from just 504 of the 808 cell lines (62.4%) in which proliferation 170 suppressor hit calls were identified (Figure 1f), yet we did not observe a bias toward particular 171 tissues: in every lineage, most cell lines carried at least one PSG (Supplementary Figure 1g). 172 173 To further validate our approach, we compared the set of TSGs in our PSG hits to other molecular 174 profiling data. When identified as a proliferation suppressor, 53% of the 116 TSGs demonstrate 175 higher mean mRNA expression relative to backgrounds where the same TSG is not a proliferation 176 suppressor (Supplementary Table 2). Similarly, 96.6% of the 116 TSGs, when classified as a 177 proliferation suppressor, demonstrate lower frequency of nonsilent mutations compared to 178 backgrounds where the TSG is not a hit (Supplementary Table 2). These observations were not 179 restricted to COSMIC TSGs however, as this was the case for all PSG hit calls of genes against 180 non-PSG hit calls (Supplementary Figure 2c and 2d). Copy number comparisons did not 7 suggest major distinctions between PSG vs non-PSG calls (Supplementary Figure 2e), however 182 there did appear to be more variation in PSG observations, possible stemming from smaller 183 grouped sample sizes. Together, these observations confirm the reliability of our method to detect 184 genes whose knockout results in faster cell proliferation, and that, analogous to essential genes, 185 these genes must be expressed and must not harbor a loss-of-function mutation in order to elicit 186 this phenotype. 187 188 We attempted to corroborate our findings using a second CRISPR dataset of 342 cell line screens 189 from Behan et al. 13  Although known TSG act as PSG in only a subset of cell lines, we observed patterns of co-203 occurrence among functionally related genes. PTEN co-occurs with mTOR regulators NF2 50 (P < 204 3x10 -11 , Fisher's exact test) and the TSC1/TSC2 complex (P-values both < 7x10 -13 ) 51 , as well as 205 Programmed Cell Death 10 (PDCD10) 52 , a proposed tumor suppressor 7,53 (Figure 2a). The p53 206 regulatory cluster (TP53, CDKN1A, CHECK2, TP53BP1) also exhibited a strong co-occurrence 207 pattern that was independent of the mTOR regulatory cluster (Figure 2a). mTOR 54 and cell cycle 208 checkpoint genes 55,56 have been heavily linked to cancer development, given their roles in 209 controlling cell growth and proliferation, and thus have been the focus of studies characterizing 210 patient genomic profiles to identify common pathway alterations 57,58 . 211

212
The modularity of mTOR regulators and TP53 regulators demonstrates pathway-level 213 proliferation suppressor activity. This reflects the concept of genes with correlated fitness profiles 214 indicating the genes operate in the same biochemical pathway or biological process 19,21,59,60 . monounsaturated fatty acid species, and Sterol Regulatory Element Binding Transcription Factor 283 1 (SREBF1), the master regulatory factor for lipid homeostasis in cells. 284 285 Clustering the AML cells lines according to these high-dPCC genes reveals two distinct subsets 286 of cells. The FAS cluster and its correlates show strong proliferation suppressor phenotype in four 287 cell lines, NB4, MV411, MOLM13, and THP1. The remaining thirteen AML cell lines show 288 negligible to weakly essential phenotypes when these genes are knocked out. The anticorrelated 289 genes, including SCD and SREBF1, show heightened essentiality in these same cell lines. 290 Together these observed shifts in gene knockout fitness indicates that this subset of AML cells 291 has a substantial metabolic rewiring. Because these cells share a genetic signature among fatty 292 acid synthesis pathway genes that is consistent with tumor suppressors, we call these cell lines 293 Fatty Acid Synthesis/Tumor Suppressor (FASTS) cells (Figure 3e). 294 295

Cas12a-mediated Genetic Interaction Screens Confirm Rewired Lipid Metabolism 296 297
We sought to confirm whether gene knockout confers improved cell fitness, and to gather some 298 insight into why some AML cells show the FASTS phenotype and others do not. Genetic 299 interactions have provided a powerful platform for understanding cellular rewiring in model 300 organisms, and we sought to apply this approach to deciphering the FASTS phenotype. We 301 designed a CRISPR screen that measures the genetic interactions between eight selected "query 302 genes" and ~100 other genes ("array genes"). The query genes include FASN and ACACA, from 303 the cluster of proliferation-suppressor genes, as well as lipid homeostasis transcription factor 304 SREBF1, anticorrelated with the FAS cluster in the differential network analysis, and 305 uncharacterized gene c12orf49, previously implicated in lipid metabolism by coessentiality 21 and 306 a recent genetic interaction study 60 . Additional query genes include control tumor suppressor 307 genes TP53 and PTEN and control context-dependent essential genes GPX4 and PSTK (Figure  308 4a). The array genes include two to three genes each from several metabolic pathways, including 309 various branches of lipid biosynthesis, glycolysis and glutaminolysis, oxphos, peroxisomal and 310 mitochondrial fatty acid oxidation. We include the query genes in the array gene set (Figure 4a) 311 to test for screen artifacts and further add control essential and nonessential genes to measure 312 overall screen efficacy (Supplementary Table 4-5). 313

314
We used the enCas12a CRISPR endonuclease system to carry out multiplex gene knockouts 35 . 315 construction of specific guide pairs through pooled oligonucleotide synthesis (Figure 4b). The  317   library robustly measures single knockout fitness by pairing three Cas12a crRNA per target gene  318   each with five crRNA targeting nonessential genes 7,47 (n=15 constructs for single knockout  319 fitness), and efficiently assays double knockout fitness by measuring all guides targeting query-320 array gene pairs (n=9) (Figure 4c & Supplementary Table 5). Using this efficient design and the 321 endogenous multiplexing capability of enCas12a, we were able to synthesize a library targeting 322 800 gene pairs with a single 12k oligonucleotide array. 323

324
We screened one AML cell line from the FASTS subset, MOLM13, and a second one with no FAS 325 phenotype, NOMO1, collecting samples at 14 and 21 days after transduction with a five-day 326 puromycin selection (Supplementary Table 6-7). Importantly, by comparing the mean log fold 327 change of query gene knockouts in the "A" position vs. the same genes in the "B" position of the 328 dual knockout vector, we find no positional bias in the multiplex knockout constructs (Figure 4d and triglycerides, and further discovered CHP1 as a key regulatory factor for GPAT4 activity 67,68 . 378 Within hematological cancer cell lines, the coessentiality network is significantly restructured, with 379 the ACACA/FASN module correlated with SCD in most backgrounds (r = 0.35, P < 10 -18 ) but 380 strongly anticorrelated in 36 blood cancer cell lines (r = -0.52, P < 10 -3 , Figure 3e). The magnitude 381 of this change in correlation is ranked #8 out of 31 million gene pairs (see Methods). In contrast, 382 387 Collectively these observations make a strong prediction about the metabolic processing of 388 specific lipid species. Faster proliferation upon knockout of genes related to saturated fatty acid 389 processing, coupled with increased dependency on fatty acid desaturase gene SCD (Figure 5a), 390 suggests that these cells are at or near their carrying capacity for saturated fatty acids. To test 391 this prediction, we exposed three  Figure 7). Palmitate-induced lipotoxicity has been 398 studied in many contexts -and importantly, the role of GPAT4 and CHP1 in mediating lipotoxicity 399 was well described recently 67,68 -but, to our knowledge, this is the first instance of a genetic 400 signature that predicts liposensitivity. To explore whether the FASTS phenotype has clinical relevance, we compared our results with 405 patient survival information from public databases. Using genetic characterization data from 406 CCLE 69 , we did not find any lesion which segregated FASTS cells from other CD33+ AML cells 407 (Figure 6a), so no mutation is nominated to drive a FASTS phenotype in vivo. Instead, we 408 explored whether variation in gene expression was associated with patient outcomes. We 409 included genes in the core FASTS module as well as genes with strong genetic interactions with 410 ACACA/FASN in our screen (Figure 6a). To select an appropriate cohort for genomic analysis, 411 we first considered patient age. Although AML presents across every decade of life, patients from 412 whom FASTS cell lines were derived are all under 30 years of age (sources of other AML cells 413 ranged from 6 to 68 years; Figure 6b). With this in mind, we explored data from the TARGET-414 AML 71 project, which focuses on childhood cancers (Figure 6c). Using TARGET data, we 415 calculated hazard ratios using univariate Cox proportional-hazards modeling with continuous 416 mRNA expression values for our genes of interest as independent variables. We observed that 417 consistent with a tumor suppressor signature (Figure 6d), and that no other gene from our set 419 shows a negative HR. Indeed, when stratifying patients from the TARGET cohort with high 420 expression of GPAT4, CHP1, PCGF1, and GPI (Figure 6e), we observe significantly improved 421 survival (P-value = 0.001, Figure 6f). These findings are not replicated for GPAT4, CHP1, and 422 GPI in the TCGA 72 or OHSU 73 tumor genomics data sets, possibly because they sample older 423 cohorts (Polycomb group subunit PCGF1 is observed to have a HR < 1 within the OHSU cohort, 424 Supplementary Figure 8a). However, age is not generally associated with expression of genes 425 in the FAS cluster in either cell lines or tumor samples (Supplementary Figure 8). been mainly focused on essential gene phenotypes, there is still much clinically relevant biology 431 that can be uncovered by examining other phenotypes from a genetic screen. We establish a 432 methodology that can reliably identify the proliferation suppressor phenotype from whole-genome 433 CRISPR knockout genetic screens. This represents, to our knowledge, the first systematic study 434 of this phenotype in the more than 1,000 published screens 8,10,11,13,48 . 435

436
The activity of proliferation suppressor genes is inherently context-dependent, rendering global 437 classification difficult. As with context-dependent essential genes, the strongest signal is attained 438 when comparing knockout phenotype with underlying mutation state. phenotype, we lack the statistical power to discover associations in an unbiased way. However, 464 by narrowing our search to strong hits from the differential network analyses, we found a 465 significant survival advantage in a roughly age-matched cohort for GPAT4 and CHP1 466 overexpression. This finding points to a wholly novel tumor suppressor signature for our PSG 467 module, though significant further study is necessary to determine whether this gene expression 468 signature confers a similar in vivo metabolic rewiring and sensitivity to saturated lipids. 469

470
The combination of genetic, biochemical, and clinical support for the discovery of a novel tumor 471 suppressor module has several implications. First, it provides a clinical signature that warrants 472 further research as a prognostic marker as well as a potential therapeutic target. Second, it 473 demonstrates the power of differential network analysis, and in particular differential genetic 474 interaction networks, to dissect the rewiring of molecular pathways from modular phenotypes. 475 And finally, it suggests that there still may be much to learn from data-driven analyses of large-476 scale screen data, beyond the low-hanging fruit of lesion/vulnerability associations.   CHEK2  ATM  TP53  TP53BP1  USP28  NRP1  PTEN  TSC1  TSC2  KIRREL1  NF2  PDCD10  TAOK1  PTPN14  LATS2  AMOTL2 TAF5L  TADA1  TADA2B  SUPT20H  MED19  TAF6L  USP22   IFNAR2  TYK2  STAT2  IRF9  JAK1  IFNAR1  EIF2AK2   NF1  SPRED2  LZTR1  SPRED1   FASN  GPAT4  CHP1  GPI  ACACA  PCGF1 GPAT4  FASN  CHP1  CERS6  SQLE  PCGF1  ACACA  GPI  TMEM41A  SREBF2  CPT2  SREBF1  SCD  LDLR  ACLY  CRLS1  FABP5  NB4  MV411  MOLM13  THP1  MONOMAC1  OCIAML3  KO52  HEL9217  HEL  MUTZ8  KASUMI1  MOLM14  NOMO1  OCIAML2  SHI1  P31FUJ  F36P  EOL1  OCIM2  M07E  TF1  AML193  U937   TP53  NRAS  KRAS  FLT3  The fitness enhancement introduced by PSG knockout, relatively weak compared to severe 682 defects from essential gene knockout, often precludes detection in a shorter experiment. In the 683 example F5 cell line (Figure 1a)   All cell lines were routinely tested for mycoplasma contamination and were maintained without 814 antibiotics except during screens, when the media was supplemented with 1% 815 penicillin/streptomycin. Cell lines were kept in a 37 °C humidity-controlled incubator with 5.0% 816 carbon dioxide and were maintained in exponential phase growth by passaging every 2-3 days. 817 The following media conditions and doses of polybrene, puromycin, and blasticidin, respectively, 818 were used: 819 MOLM13: RPMI + 10% FBS; 8 μg mL -1 ; 4 μg mL -1 ; 8 μg mL -1 820 NOMO1: RPMI + 10% FBS; 8 μg mL -1 ; 1 μg mL -1 ; 8 μg mL To score genetic interactions we used a custom python package, gnt 117 , available on the python 863 package index. We use log-fold changes (LFCs) as inputs to the scoring pipeline. We define "0 as 864 the observed LFC of a guide pair , and 12 M as this pair's expected LFC. We then calculate the 865 residual "0 − 12 M to generate an interaction score. To define expected LFCs, 12 M we fit a linear 866 regression for each guide, , saying 867 where is the LFC of each guide individually and " and " are the fit slope and intercept for 869 guide (Supplementary Figure 6b). We refer to as the anchor guide and its pairs as target 870 guides. We then Z-score residuals within each anchor guide. This approach is similar to the one 871 taken by Horlbeck et al. 33 . 872 To aggregate interaction scores at the gene level, we sum the z-scored residuals,