Transcriptional silencing of ALDH2 in acute myeloid leukemia confers a dependency on Fanconi anemia proteins

Hundreds of genes become aberrantly silenced in acute myeloid leukemia (AML), with most of these epigenetic changes being of unknown functional consequence. Here, we demonstrate how gene silencing can lead to an acquired dependency on the DNA repair machinery in AML. We make this observation by profiling the essentiality of the ubiquitin conjugation and ligation machinery in cancer cell lines using domain-focused CRISPR screening, which revealed Fanconi anemia (FA) proteins UBE2T (an E2) and FANCL (an E3) as unique dependencies in AML. We demonstrate that these dependencies are due to a synthetic lethal interaction between FA proteins and Aldehyde Dehydrogenase 2 (ALDH2), which function in parallel pathways to counteract the genotoxic effects of endogenous aldehydes. We provide evidence that DNA hypermethylation and transcriptional silencing of ALDH2 occur in a recurrent manner in human AML patient samples, which is sufficient to confer FA pathway dependency in this disease. Taken together, our study suggests that targeting of the ubiquitination reaction catalyzed by FA proteins can eliminate ALDH2-deficient AML.


INTRODUCTION
Organisms have evolved redundant genetic pathways for carrying out essential cellular processes. Such redundancy often arises from gene duplication events, which produce paralogs that carry out overlapping functions (Kafri et al., 2009). Alternatively, non-homologous gene pairs can be redundant if they encode parallel pathways that regulate a shared cellular process (Innan and Kondrashov, 2010). For example, BRCA1 and PARP1 are biochemically distinct, yet function redundantly to repair DNA double-strand breaks (Bryant et al., 2005; Farmer et al., 2005). In this example of synthetic lethality, cells can tolerate a deficiency of either BRCA1 or PARP1 alone, yet combined loss of these two factors causes cell death.
While genetic redundancies support robustness during normal processes, cancer cells often lack such redundancies owing to genetic or epigenetic alterations (Cereda et al., 2016; O'Connor, 2015). This offers therapeutic opportunities, such as drugs that block PARP1 in cancers caused by a genetic deficiency of BRCA1 (Bryant et al., 2005; Farmer et al., 2005). Because of its therapeutic significance, the identification of synthetic lethal genetic interactions remains an important objective in the study of human cancer (Kaelin, 2005).
In recent years, there has been a resurgence of interest in the ubiquitination machinery as a source of targets for cancer therapy. Ubiquitin is an 8.6 kDa protein that is covalently attached to lysine side chains as a form of posttranslational regulation of protein stability and function (Komander and Rape, 2012). The ubiquitination cascade requires the consecutive action of three enzymes: a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), and a ubiquitin ligase (E3) (Grabbe et al., 2011). With ~40 E2 and over 600 putative E3 proteins encoded in the human genome, the ubiquitination machinery regulates many aspects of cell biology, including the cell cycle, DNA repair, and transcription (Senft et al., 2018). Moreover, the function of E2 and E3 proteins can be modulated with small molecules (Hoeller and Dikic, 2009; Skaar et al., 2014). For example, the E3 ligase MDM2 can be inhibited with small molecules to stabilize p53 and promote apoptosis of cancer cells (Hatakeyama, 2011). Small molecules have also been identified that alter the specificity of E3 ligases to trigger proteasome-mediated degradation of neo-substrates (Kronke et al., 2014; Lu et al., 2014). Despite the clear therapeutic potential of the ubiquitination machinery, few efforts to date have aimed to identify E2/E3 dependencies in cancer cells using genetic screens.
The Fanconi anemia (FA) pathway, composed of more than 20 proteins, repairs DNA interstrand crosslinks (ICLs) (Ceccaldi et al., 2016). A key step in the activation of the FA repair pathway is the mono-ubiquitination of FANCD2 and FANCI, which is performed by UBE2T/FANCT (an E2 conjugating enzyme) and FANCL (an E3 ligase) (Ceccaldi et al., 2016; Kottemann and Smogorzewska, 2013). Once ubiquitinated, the FANCD2/FANCI heterodimer encircles DNA (Alcón et al., 2020; Wang et al., 2020), which facilitates lesion processing by nucleases and completion of repair by homologous recombination (Ceccaldi et al., 2016; Wang and Smogorzewska, 2015). While DNA crosslinks can be caused by exogenous mutagens, emerging evidence suggests that endogenously produced aldehydes are an important source of this form of DNA damage (Langevin et al., 2011). Loss-of-function mutations in FA genes cause Fanconi anemia, a genetic disorder characterized by bone marrow failure and a predisposition to leukemia and aerodigestive tract squamous cell carcinoma (Kottemann and Smogorzewska, 2013). While FA protein deficiency promotes cancer in certain tissue contexts, the FA pathway remains intact in the majority of human cancers and therefore may perform an essential function in established tumors. To our knowledge, FA proteins have not yet been identified as cancer dependencies.
Here, we discovered an acquired dependency on FA proteins in a subset of AML that lacks expression of Aldehyde Dehydrogenase 2 (ALDH2). We show that epigenetic silencing of ALDH2 occurs in a recurrent manner in human AML and is sufficient to confer FA dependency in this disease. Our study suggests that blocking UBE2T/FANCL-mediated ubiquitination could selectively eliminate ALDH2-deficient AML cells while sparing the ALDH2-expressing normal cells present in the majority of human tissues.
Biolabs; Cat. No. E2611). The LRG2.1T vector was derived from a lentiviral U6-sgRNA-EFS-GFP expression vector (Addgene: #65656). The pooled plasmid library was deep-sequenced on a MiSeq instrument (Illumina) to verify the identity and representation of sgRNAs in the library. This confirmed that 100% of the designed sgRNAs were cloned into the LRG2.1T vector and that the abundance of >95% of individual sgRNA constructs was within 5-fold of the mean.
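The two library quality criteria above (every designed sgRNA detected, and >95% of constructs within 5-fold of the mean abundance) reduce to a short calculation. The sketch below is a hypothetical helper, not the authors' code; it assumes read counts have already been tallied per sgRNA, and it computes the mean over detected designed sgRNAs only, which is one of several reasonable choices.

```python
from statistics import mean
from typing import Dict, Set, Tuple


def library_qc(counts: Dict[str, int], designed: Set[str],
               fold: float = 5.0) -> Tuple[float, float]:
    """Return (fraction of designed sgRNAs detected,
    fraction of detected sgRNAs whose count lies within `fold`-fold of the mean)."""
    detected = [g for g in designed if counts.get(g, 0) > 0]
    if not detected:
        return 0.0, 0.0
    coverage = len(detected) / len(designed)
    # Assumption: the mean is taken over detected designed sgRNAs only.
    avg = mean(counts[g] for g in detected)
    within = sum(avg / fold <= counts[g] <= avg * fold for g in detected)
    return coverage, within / len(detected)
```

A library meeting the criteria stated in the text would return a coverage of 1.0 and a second value above 0.95.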

Virus production and transduction
Lentivirus was produced in HEK293T cells by transfecting plasmids together with the helper packaging plasmids VSVG and psPAX2 using polyethylenimine (PEI 25000; Polysciences; Cat. No. 23966-1) as the transfection reagent. HEK293T cells were plated in 10 cm culture dishes and transfected at ~80-90% confluency. Five plates of HEK293T cells were used to maintain library representation.
For one 10 cm dish of HEK293T cells, 10 µg of plasmid DNA, 5 µg of pVSVG, 7.5 µg of psPAX2, and 64 µL of 1 mg/mL PEI were mixed, incubated at room temperature for 20 min, and then added to the cells. The medium was replaced with fresh medium 6-8 h post-transfection. Lentivirus-containing medium was collected at 24, 48 and 72 h post-transfection and pooled. Virus was filtered through a 0.45 µm non-pyrogenic filter.
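The per-dish recipe above scales linearly with the number of dishes. As a minimal sketch (a hypothetical helper, not part of the original protocol), the function below multiplies the per-dish amounts for the five-dish library preparation and reports the PEI:DNA mass ratio implied by the recipe, treating 1 mg/mL PEI as 1 µg/µL.

```python
# Per-10 cm dish amounts taken from the protocol text above.
PER_DISH_DNA_UG = {"transfer_plasmid": 10.0, "pVSVG": 5.0, "psPAX2": 7.5}
PEI_UL_PER_DISH = 64.0      # µL of PEI stock per dish
PEI_STOCK_UG_PER_UL = 1.0   # 1 mg/mL is the same as 1 µg/µL


def scale_transfection(n_dishes: int) -> dict:
    """Total DNA (µg) and PEI stock (µL) needed for n_dishes 10 cm dishes."""
    if n_dishes < 1:
        raise ValueError("n_dishes must be at least 1")
    totals = {name: ug * n_dishes for name, ug in PER_DISH_DNA_UG.items()}
    totals["PEI_uL"] = PEI_UL_PER_DISH * n_dishes
    return totals


def pei_to_dna_mass_ratio() -> float:
    """PEI:DNA mass ratio implied by the per-dish recipe (64 µg PEI / 22.5 µg DNA)."""
    pei_ug = PEI_UL_PER_DISH * PEI_STOCK_UG_PER_UL
    return pei_ug / sum(PER_DISH_DNA_UG.values())
```

For five dishes, `scale_transfection(5)` gives 50 µg transfer plasmid, 25 µg pVSVG, 37.5 µg psPAX2 and 320 µL PEI stock; the implied PEI:DNA mass ratio is about 2.8:1.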
For shRNA knockdown experiments, retrovirus was produced in Plat-E cells transfected with retroviral DNA (MLS-E plasmid), VSVG, and Eco helper plasmids at a ratio of 10:1:1.5. The medium was replaced with fresh medium 6-8 h post-transfection. Retrovirus-containing supernatant was collected at 24, 48 and 72 h after transfection and pooled. Virus was filtered through a 0.45 µm non-pyrogenic filter.
For both lentiviral and retroviral infection, target cells were mixed with the corresponding volume of virus supplemented with 4 µg/mL polybrene and then centrifuged at 600 x g for 40 min at room temperature.
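The spin-infection step is specified as relative centrifugal force, whereas many centrifuges are set in rpm. The conversion depends on rotor radius, which the protocol does not state; the sketch below uses the standard empirical relation RCF = 1.118 × 10⁻⁵ × r(cm) × rpm², and the 15 cm radius in the usage note is a hypothetical value.

```python
import math

# Empirical relation between relative centrifugal force (x g), rotor radius in cm,
# and speed in rpm: RCF = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF at a given rotor radius."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))
```

With a hypothetical 15 cm radius, `rpm_from_rcf(600, 15.0)` is roughly 1,890 rpm; the radius for the actual rotor should be taken from its specification.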
If selection was needed to establish stable cell lines, the corresponding antibiotic (1 µg/mL puromycin or 1 mg/mL G418) was added 72 h post-infection.

Plasmid construction, sgRNA and shRNA cloning
For CRISPR screening, the optimized sgRNA lentiviral expression vector (LRG2.1T) and the lentiviral human codon-optimized Streptococcus pyogenes Cas9 vector (LentiV_Cas9_Puro, Addgene: 108100) were used. For the competition-based proliferation assays, sgRNAs were cloned into the LRG2.1T vector using the BsmBI restriction site. LRCherry2.1 was derived from LRG2.1T by replacing the GFP coding sequence with mCherry. All sgRNA and shRNA sequences used in the study are listed in Supplementary (Shi et al., 2015), with Quick ligation kit (NEB; Cat. No. M2200L). The ligated DNA was PCR-amplified with primers containing Illumina paired-end sequencing adaptors. The final libraries were quantified using an Agilent Bioanalyzer with the DNA 1000 kit (Agilent 5067-1504) and pooled in equimolar ratios for paired-end sequencing on the MiSeq platform (Illumina) with the MiSeq Reagent Kit V3 150-cycle (Illumina).
The sequencing data were demultiplexed and trimmed to retain only the sgRNA sequence. The sgRNA sequences were then mapped to a reference sgRNA library, and sequences without a match were discarded.
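The mapping step above amounts to exact matching of trimmed reads against the designed guide sequences. A minimal sketch, assuming demultiplexing and trimming have already produced one guide-length sequence per read and that the reference library is held as a hypothetical sequence-to-ID dictionary (this is not the authors' script):

```python
from typing import Dict, Iterable, Tuple


def count_sgrnas(trimmed_reads: Iterable[str],
                 library: Dict[str, str]) -> Tuple[Dict[str, int], int]:
    """Tally reads per sgRNA by exact sequence match.

    library maps sgRNA sequence -> sgRNA ID (hypothetical layout).
    Returns (counts per sgRNA ID, number of discarded reads).
    """
    counts = {guide_id: 0 for guide_id in library.values()}
    discarded = 0
    for read in trimmed_reads:
        guide_id = library.get(read.strip().upper())
        if guide_id is None:
            discarded += 1  # no exact match in the reference library
        else:
            counts[guide_id] += 1
    return counts, discarded
```

Exact matching is the strictest choice; some pipelines tolerate a single mismatch, but the text specifies that mismatched sequences were discarded.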
The read counts of each sgRNA were calculated, and the following analysis was performed with a custom Python script: sgRNAs with fewer than 50 reads at the initial time point were discarded; total read counts were normalized between samples; a pseudocount of one was assigned to sgRNAs with zero reads at the final time point; and the average log2 fold change in abundance of all sgRNAs targeting a given domain or gene was calculated. AML-specific dependency was scored by subtracting the average log2 fold change in non-AML cell lines from the average log2 fold change in AML cell lines, and genes were ranked by this score in ascending order. The E2/E3 CRISPR screening data are shown in Supplementary Table 3.
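The scoring steps listed above can be sketched as follows. This is a hypothetical re-implementation, not the authors' custom script. Two details the text leaves open are assumptions, marked in comments: normalization scales each line's final-time-point counts to the same total as its initial time point, and the pseudocount of one replaces a zero final count after that scaling. The most negative scores rank first, corresponding to the strongest AML-biased dependencies.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

# sgRNA ID -> (read count at initial time point, read count at final time point)
Counts = Dict[str, Tuple[int, int]]


def gene_log2fc(counts: Counts, guide_to_gene: Dict[str, str],
                min_initial: int = 50) -> Dict[str, float]:
    """Average sgRNA log2 fold change per gene (or domain) for one cell line."""
    # Discard sgRNAs with fewer than `min_initial` reads at the initial time point.
    kept = {g: c for g, c in counts.items() if c[0] >= min_initial}
    if not kept:
        return {}
    # Assumption: scale final counts so both time points have the same total.
    total_initial = sum(c[0] for c in kept.values())
    total_final = sum(c[1] for c in kept.values())
    scale = total_initial / total_final if total_final else 1.0
    per_gene: Dict[str, List[float]] = {}
    for guide, (initial, final) in kept.items():
        # Assumption: a pseudocount of 1 replaces a zero final count after scaling.
        final_norm = final * scale if final > 0 else 1.0
        per_gene.setdefault(guide_to_gene[guide], []).append(math.log2(final_norm / initial))
    return {gene: mean(values) for gene, values in per_gene.items()}


def aml_specific_scores(aml: List[Dict[str, float]],
                        non_aml: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Mean log2FC in AML lines minus mean log2FC in non-AML lines, ranked ascending."""
    # Assumption: only genes scored in every cell line are ranked.
    genes = set.intersection(*(set(d) for d in aml + non_aml))
    scores = {
        gene: mean(d[gene] for d in aml) - mean(d[gene] for d in non_aml)
        for gene in genes
    }
    return sorted(scores.items(), key=lambda item: item[1])
```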

Analysis of genetic dependencies and gene expression in DepMap and other data sets
Genetic dependency (CRISPR; Avana), RNA-seq gene expression (CCLE), and DNA methylation data (CCLE; promoter region 1 kb upstream of the TSS) for cancer cell lines were extracted from the DepMap Public Project Achilles 20Q2 database (http://depmap.org/portal/). RNA expression data and associated genomic information for AML and other tumor patient samples were extracted from The Cancer Genome Atlas via cBioPortal (https://www.cbioportal.org/). Pearson correlation coefficients were calculated in RStudio 1.2.5.
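The correlation analyses here, including the comparison of ALDH2 methylation with expression, use the Pearson coefficient, which the study computed in R. For readers working outside R, a minimal dependency-free Python equivalent of the coefficient alone (no p-value) is shown below; the function name is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

An inverse methylation-expression relationship, as reported for ALDH2, would appear as a negative coefficient.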

Competition-based cell proliferation assay
Cas9-expressing cell lines were lentivirally transduced with sgRNA vectors carrying a GFP (LRG2.1T) or mCherry (LRCherry2.1) reporter. The percentage of GFP-positive (or mCherry-positive) cells was measured on day 3 or day 4 as the initial time point using a Guava Easycyte HT instrument (Millipore), and then every two days (leukemia cell lines) or every three days (non-leukemia cell lines) over a time course. The GFP% (or mCherry%) at each time point was normalized to that at the initial time point. This relative change was used to assess the impact of individual sgRNAs on cellular proliferation: a decline reflects cells with a genetic knockout being outcompeted by non-transduced cells in the same culture.
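The normalization described above divides each reporter percentage by the value at the initial time point. A minimal sketch with a hypothetical function name; the day numbers and percentages in the usage note are invented for illustration.

```python
from typing import Dict


def relative_reporter_fraction(percent_by_day: Dict[int, float],
                               initial_day: int) -> Dict[int, float]:
    """Normalize reporter-positive percentage at each day to the initial time point.

    Values below 1 indicate that sgRNA-transduced (reporter-positive) cells are
    being outcompeted by non-transduced cells in the same culture.
    """
    baseline = percent_by_day[initial_day]
    if baseline <= 0:
        raise ValueError("initial reporter percentage must be positive")
    return {day: pct / baseline for day, pct in sorted(percent_by_day.items())}
```

For example, `relative_reporter_fraction({3: 50.0, 5: 40.0, 7: 25.0}, initial_day=3)` returns `{3: 1.0, 5: 0.8, 7: 0.5}`, the pattern expected for an sgRNA targeting an essential gene.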

Western blot analysis
Cells were collected and washed once with PBS. Cell pellets were resuspended in RIPA buffer (Thermo Scientific; Cat. No. 89901) and sonicated to fragment chromatin. Cell lysates were mixed with SDS loading buffer containing 2-mercaptoethanol and boiled at 95 °C for 5 min. Proteins were separated by SDS-PAGE (NuPAGE 4-12% Bis-Tris protein gels, Thermo Fisher) and transferred to a nitrocellulose membrane by wet transfer at 30 V overnight. Membranes were blocked with 5% nonfat milk in TBST and incubated with primary antibody (1:500 dilution, except the FLAG antibody at 1:1000) in 5% milk at room temperature for one hour. Membranes were then washed three times with TBST and incubated with secondary antibody for one hour at room temperature. After three further washes with TBST, membranes were incubated with chemiluminescent HRP substrate (Thermo Fisher; Cat. No. 34075). Primary antibodies used in this study included UBE2T

Immunofluorescence for phospho-H2AX foci
Cells were harvested four days after lentiviral spin-infection and spun onto slides using a Shandon Cytospin 2 centrifuge. Cells were washed once in PBS, fixed with 3.7% formaldehyde (Sigma) in PBS for 10 min at room temperature, washed twice with PBS, and permeabilized with 0.5% Triton X-100 (Sigma) in PBS for 10 min at room temperature. After two PBS washes, cells were blocked in 5% FBS/PBS for one hour at room temperature and incubated overnight at 4 °C with primary antibody (anti-phospho-H2AX, clone JBW301; Millipore, Cat. No. 05-636) at 1:1000 dilution. Cells were washed three times for 5 min with 5% FBS/PBS, incubated with secondary antibody (anti-mouse AF594; Invitrogen, Cat. No. A11005) at 1:2000 dilution for one hour at room temperature, washed three times for 5 min with 5% FBS/PBS and once with PBS, and mounted with DAPI Fluoromount-G® (SouthernBiotech). Images were acquired using a Zeiss Axio Observer A1 microscope and AxioVision 4.9.1 software, and data were analyzed with CellProfiler 3.1.8.

RNA-Seq library construction
Total RNA was extracted from each cell line using TRIzol reagent (Thermo Scientific; Cat. No. 15596018) according to the manufacturer's instructions. Briefly, 3 million cells were lysed with 1 mL of TRIzol, mixed with 200 µL chloroform, incubated for 10 min at room temperature, and centrifuged at 10,000 x g for 15 min at 4 °C. The aqueous phase was mixed with an equal volume of isopropanol and incubated for 10 min at room temperature. RNA was pelleted at 10,000 x g for 10 min at 4 °C, washed once with 75% ethanol, and dissolved in DEPC-treated water. RNA-seq libraries were constructed using the TruSeq sample preparation kit V2 (Illumina) according to the manufacturer's instructions. Briefly, polyA mRNA was selected from 2 µg of purified total RNA and fragmented with fragmentation enzyme. First-strand cDNA was synthesized using SuperScript II reverse transcriptase, followed by second-strand synthesis. Double-stranded cDNA was end-repaired, 3'-adenylated, ligated with indexed adaptors, and PCR-amplified. Library concentration was determined by NanoDrop, and the average concentration of the RNA-seq libraries ranged from 40 to 80 ng/µL.
Equimolar amounts of each RNA-seq library were pooled and sequenced on the NextSeq platform (Illumina) with single-end 75-base reads.
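Pooling equimolar amounts requires converting mass concentration (ng/µL, as measured here) to molarity, which depends on average library fragment length. The text does not report fragment size, so the 300 bp used below is a hypothetical value, the same length is assumed for every library, and 660 g/mol per base pair is the usual approximation for double-stranded DNA. Function names are illustrative.

```python
from typing import Dict

BP_MASS_G_PER_MOL = 660.0  # approximate mass of one double-stranded DNA base pair


def library_nM(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/µL to nM."""
    # 1 ng/µL = 1 mg/L; dividing by molar mass (g/mol) and rescaling to nmol/L
    # gives conc * 1e6 / (660 * fragment length).
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * avg_fragment_bp)


def pooling_volumes(concs_ng_per_ul: Dict[str, float], avg_fragment_bp: float,
                    fmol_each: float) -> Dict[str, float]:
    """Volume (µL) of each library contributing `fmol_each` femtomoles to the pool."""
    # 1 nM equals 1 fmol/µL, so volume = femtomoles / nM.
    return {name: fmol_each / library_nM(conc, avg_fragment_bp)
            for name, conc in concs_ng_per_ul.items()}
```

With libraries at 40 and 80 ng/µL (the range reported above) and a hypothetical 300 bp fragment size, contributing 1,000 fmol each requires about 4.95 µL and 2.48 µL, respectively.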

RNA-seq data analysis
Sequencing reads were mapped to the hg38 reference genome using STAR v2.5.2 with default parameters (Dobin et al., 2013). Read count tables were generated with HTSeq v0.6.1 using a custom GTF file containing protein-coding genes only. Differentially expressed genes were identified using DESeq2 with replicates and default parameters (Love et al., 2014). RPKMs (reads per kilobase per million mapped reads) were calculated using Cuffdiff v2.2.1 with default parameters (Trapnell et al., 2013). Genes with an RPKM greater than three in the control were considered expressed and used in subsequent analyses.
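Cuffdiff estimates expression with its own statistical model (including effective transcript length), so the textbook formula below only approximates what it reports. It is included to make the definition of RPKM and the "RPKM greater than three in the control" expression filter concrete; function names are hypothetical.

```python
from typing import Dict, Set


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped: int) -> Dict[str, float]:
    """Reads per kilobase of feature per million mapped reads.

    RPKM = count * 1e9 / (length_bp * total_mapped)
    """
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    return {gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_mapped)
            for gene in counts}


def expressed_genes(control_rpkm: Dict[str, float], threshold: float = 3.0) -> Set[str]:
    """Genes whose control RPKM exceeds the threshold."""
    return {gene for gene, value in control_rpkm.items() if value > threshold}
```

For instance, 500 reads on a 2 kb gene in a library of 10 million mapped reads gives an RPKM of 25, which passes the filter.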
Genes were ranked by their DESeq2 log2 fold change, and the ranked list was used as input for pre-ranked GSEA with all available signatures in the Molecular Signatures Database v5.2 (MSigDB).

Aldefluor assay
The Aldefluor assay was performed according to the instructions of the Aldefluor Kit (STEMCELL; Cat. No. 01700). Briefly, 0.5 × 10⁶ fresh cells were collected, washed once in PBS, and resuspended in Aldefluor Assay Buffer at 1 × 10⁶ cells/mL. As a negative control, 5 µL of DEAB reagent was added to a control sample of each cell line. Then 5 µL of activated Aldefluor reagent was added to the control and test samples, and the samples were mixed and incubated at 37 °C for 50 min. The cells were then collected, resuspended in 500 µL Aldefluor Assay Buffer, and analyzed by flow cytometry. Experiments were performed in triplicate.
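The protocol does not describe the gating strategy used to score ALDH activity. A common approach, assumed here, is to set the ALDH-positive gate from the DEAB-treated control so that only about 1% of control events fall above it, and then report the fraction of test events above that gate. The sketch below implements that assumption on per-event fluorescence values; the function names and the 99th-percentile cutoff are hypothetical.

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def aldh_positive_fraction(test: Sequence[float], deab_control: Sequence[float],
                           q: float = 99.0) -> float:
    """Fraction of test-sample events brighter than a gate set on the DEAB control.

    Assumption: the gate is the q-th percentile of DEAB-treated fluorescence,
    so roughly (100 - q)% of control events fall inside the ALDH-positive gate.
    """
    gate = nearest_rank_percentile(deab_control, q)
    return sum(value > gate for value in test) / len(test)
```

Under this scheme, an FA-dependent line with minimal ALDH activity would give a fraction close to that of its own DEAB control.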

Mitochondria fractionation
Mitochondrial fractionation was performed using the Mitochondria Isolation Kit for Cultured Cells according to the manufacturer's instructions (Thermo Scientific; Cat. No. 89874). Briefly, cell membranes were disrupted by three freeze-thaw cycles, and the mitochondrial fraction was collected by centrifugation.
The collected mitochondrial pellet was resuspended in RIPA buffer for further western blot analysis.

Nanopore sequencing
crRNA guides specific to the regions of interest (ROI) were designed following the guidelines in the Nanopore info sheet on targeted, amplification-free DNA sequencing using CRISPR/Cas (Version: ECI_S1014_v1_revA_11Dec2018). Guides were reconstituted to 100 µM in TE (pH 7.5) and pooled into an equimolar mix. For each sample, four identical reactions were prepared in parallel using 5 µg gDNA each. Ribonucleoprotein (RNP) complex assembly, genomic DNA dephosphorylation, and Cas9 cleavage were performed as described (Gilpatrick et al., 2019).
Affinity-based Cas9-Mediated Enrichment (ACME) using Invitrogen™ His-Tag  Real-time basecalling was performed with Guppy v3.2, and files were synced to our Isilon 400NL storage server for further processing on the shared CSHL HPCC. Nanopolish v0.13.2 (Simpson et al., 2017) was used to call methylation following the recommended workflow. Briefly, indexing was performed to match ONT fastq read IDs with the raw signal-level fast5 data. ONT reads were then aligned to the human reference genome (UCSC hg38) using minimap2 v2.17 (Li, 2018), and the resulting alignments were sorted with samtools v0.1.19 (Li et al., 2009). Nanopolish call-methylation was then used to detect methylated bases within the targeted regions, specifically 5-methylcytosine in a CpG context. The initial output file contained the position of each CpG dinucleotide in the reference genome and the methylation call for each read. A positive log_lik_ratio indicated support for methylation, and a cutoff value of 2.0 was applied. The helper script calculate_methylation.py was then used to calculate the frequency of methylation calls by genomic position.
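The thresholding and per-position frequency calculation described above can be sketched as follows. This is not the nanopolish helper script; it is a simplified illustration that assumes each call maps to a single CpG position and treats calls with an absolute log_lik_ratio below the 2.0 cutoff as ambiguous and excludes them. The tuple layout and example positions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Call = Tuple[str, int, float]  # (chromosome, CpG position, log_lik_ratio)


def methylation_frequency(calls: Iterable[Call],
                          cutoff: float = 2.0) -> Dict[Tuple[str, int], float]:
    """Fraction of confident per-read calls that are methylated at each CpG."""
    methylated = defaultdict(int)
    called = defaultdict(int)
    for chrom, pos, llr in calls:
        if abs(llr) < cutoff:
            continue  # ambiguous call: excluded from the frequency
        key = (chrom, pos)
        called[key] += 1
        if llr > 0:  # positive log-likelihood ratio supports methylation
            methylated[key] += 1
    return {key: methylated[key] / n for key, n in called.items()}
```

Positions with only ambiguous calls are absent from the output rather than reported as unmethylated.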

RT-qPCR analysis following 5-azacytidine treatment
For dose-response experiments, 5-azacytidine was added to cell cultures at different concentrations, and cells were collected after 36 h of treatment. For time-course experiments, cells were treated with 1 µM 5-azacytidine and collected at different time points. Total RNA was extracted using TRIzol reagent as described above. Between 1 and 2 µg of total RNA was treated with DNase I and reverse transcribed into cDNA using qScript cDNA SuperMix (Quantabio; Cat. No. 84033), followed by RT-qPCR analysis with SYBR Green PCR Master Mix (Thermo Fisher; Cat. No. 4309155) on a QuantStudio™ 7 Flex Real-Time PCR System. GAPDH was used as the reference gene. The primers used in this study are listed in Supplementary Table 2.
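The text names GAPDH as the reference gene but does not state the quantification model. Assuming the widely used comparative Ct (2^-ΔΔCt) method, which presumes near-100% amplification efficiency for both primer pairs, relative ALDH2 expression after 5-azacytidine treatment could be computed as below. The function name is hypothetical and the Ct values in the usage note are invented.

```python
from statistics import mean
from typing import Sequence


def ddct_fold_change(target_treated: Sequence[float], ref_treated: Sequence[float],
                     target_control: Sequence[float], ref_control: Sequence[float]) -> float:
    """Relative expression of a target gene by the comparative Ct (2^-ΔΔCt) method.

    Assumes ~100% amplification efficiency for both target and reference primers.
    """
    delta_ct_treated = mean(target_treated) - mean(ref_treated)
    delta_ct_control = mean(target_control) - mean(ref_control)
    return 2 ** -(delta_ct_treated - delta_ct_control)
```

For example, ALDH2 at Ct 28 versus GAPDH at Ct 18 in treated cells, and ALDH2 at Ct 31 versus GAPDH at Ct 18 in untreated cells, gives ΔΔCt = -3 and an 8-fold increase.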

Aligned enhanced reduced representation bisulfite sequencing (ERRBS)
myCpG files from AML (n=119) and normal (n=22) individuals were downloaded from GEO (accession number GSE98350) (Glass et al., 2017). After filtering and normalizing by coverage, a methylBase object containing the methylation information and locations of cytosines present in at least 5 samples per condition (meth.min=5) was generated using MethylKit (version 1.9.3) (Akalin et al., 2012) and R statistical software (version 3.5.1). Percent methylation for each CG in each donor was calculated using the MethylKit 'percMethylation' function. The Bedtools (Quinlan and Hall, 2010) intersect function was used to determine overlap with hg19 CpG islands (CpGi). The heatmap of percent methylation for the cytosines covered within the ALDH2 CpGi was plotted using ComplexHeatmap (Gu et al., 2016) with the complete clustering method and Euclidean distances, using light grey for NA values. For the boxplot of average percent methylation at the CpGi, the mean methylation of each sample was calculated in R using all CGs covered in that region. Data were plotted using ggplot2, and significance was calculated with the ggpubr function 'stat_compare_means' using the Student's t-test method. To correlate methylation with gene expression, processed expression data generated for the same samples using Affymetrix Human Genome U133 Plus 2.0 GeneChips were downloaded (Glass et al., 2017; Verhaak et al., 2009). The average percent methylation of the regions covered within the CpGi was plotted against ALDH2 RNA expression using ggplot2 and fitted with a linear model. Significance was determined using Pearson's correlation.
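The per-sample averaging step behind the boxplot (the mean over all covered CGs in the ALDH2 CpGi, with uncovered CGs left out) can be illustrated outside R as follows. This is a hypothetical sketch, analogous to R's `mean(x, na.rm = TRUE)` applied per sample, with `None` standing in for NA; sample names in the test are invented.

```python
from statistics import mean
from typing import Dict, List, Optional


def mean_cpgi_methylation(per_cpg: Dict[str, List[Optional[float]]]) -> Dict[str, float]:
    """Mean percent methylation per sample across CGs in one CpG island.

    None marks a CG not covered in that sample (NA in R); such CGs are skipped,
    and samples with no covered CG are omitted (an assumption).
    """
    result = {}
    for sample, values in per_cpg.items():
        covered = [v for v in values if v is not None]
        if covered:
            result[sample] = mean(covered)
    return result
```

The resulting per-sample means are what a two-group comparison (AML versus normal) or a methylation-expression correlation would take as input.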

Statistics
Error bars represent the mean ± standard error of the mean (SEM), and n refers to the number of biological replicates. Statistical significance was evaluated using GraphPad Prism with the tests indicated in the figure legends. For Kaplan-Meier survival curves, median overall survival was estimated from the curves, and statistical significance was assessed with the log-rank (Mantel-Cox) test.
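The study used GraphPad Prism for survival analysis. Purely to show how median overall survival is read from a Kaplan-Meier curve (the earliest time at which estimated survival falls to 0.5 or below), a minimal product-limit estimator is sketched below; the log-rank test itself is not reimplemented, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate as (time, survival) steps.

    events[i] is True for death and False for censoring.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects with the same time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= removed
    return steps


def median_survival(steps: List[Tuple[float, float]]) -> Optional[float]:
    """Earliest time at which estimated survival is 0.5 or lower (None if never reached)."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None
```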

A domain-focused CRISPR screen targeting the ubiquitination machinery identifies UBE2T and FANCL as unique AML dependencies
In this study, we pursued the identification of AML-specific dependencies on the ubiquitination machinery using domain-focused CRISPR sgRNA screening, a strategy for profiling the essentiality of protein domain families in cancer cell lines (Shi et al., 2015). Using an sgRNA design algorithm linked to protein domain annotation, we designed 6,079 sgRNAs targeting exons encoding 573 domains known to be involved in ubiquitin conjugation or ligation, which were cloned in a pooled manner into the LRG2.1T lentiviral vector (Figure 1A; Table S1). Using this sgRNA library, we performed negative selection "dropout" screening in twelve Cas9-expressing human cancer cell lines, comprising six AML and six solid tumor cell lines (Figure 1B). Many of the dependencies identified in these screens, such as ANAPC11, CDC16, and RBX1, were pan-essential across the twelve lines (Figure S1A). We ranked all ubiquitination-related genes by their degree of essentiality in AML versus solid tumor contexts, which nominated UBE2T and FANCL as AML-specific dependencies (Figure 1C and D). The known functions of UBE2T and FANCL as the E2 and E3 proteins, respectively, of the Fanconi anemia (FA) pathway suggested a unique requirement for this DNA repair function in AML (Ceccaldi et al., 2016; Kottemann and Smogorzewska, 2013).
We corroborated the biased essentiality of UBE2T/FANCL in blood cancers relative to other cancer types by analyzing data from Project Achilles (version 20Q2), in which genome-wide CRISPR essentiality screening was performed in 729 cancer cell lines (Figure 1E; Figure S1B) (Dempster et al., 2019; Meyers et al., 2017). These data further revealed that several other FA pathway genes, including FANCI, FANCB and FANCG, are also blood cancer-biased dependencies whose essentiality correlated with that of UBE2T/FANCL. These findings reinforce that blood cancer cells are hypersensitive to perturbation of FA proteins relative to other cancer types.

FA proteins are dependencies in a subset of AML cell lines under in vitro and in vivo conditions
To further validate the specificity of FA protein dependencies in AML, we performed sgRNA competition assays following inactivation of UBE2T, FANCL, and FANCD2 in 27 human cancer cell lines, including 14 leukemia, 5 pancreatic cancer, 4 lung cancer, and 4 sarcoma lines (Figure 2A and B; Figure S2A and B). Targeting any of these three FA genes suppressed the fitness of 9 human leukemia lines, including 3 generated by retroviral transduction of oncogenes into human hematopoietic stem and progenitor cells (Wei et al., 2008). In contrast, 5 other human leukemia lines and all of the solid tumor cell lines were less sensitive to targeting of FA genes (Figure 2A). As a positive control for this assay, targeting CDK1 arrested the growth of all cancer cell lines tested. Western blotting confirmed that the variable pattern of growth arrest following UBE2T targeting was not due to differences in genome editing efficiency (Figure 2B; Figure S2A).
As additional controls, we verified that the growth arrest caused by UBE2T or FANCL knockout in MOLM-13 cells was an on-target effect by rescuing it with a cDNA carrying silent mutations that abolish sgRNA recognition (Figure 2C; Figure S2C-F). Using this rescue assay, we investigated whether the catalytic function of UBE2T was essential for AML growth by comparing the wild-type cDNA with the C86A mutant, which abolishes ubiquitin conjugation activity (Alpi et al., 2008; Machida et al., 2006). Despite being expressed at a level similar to the wild-type protein, the C86A mutant of UBE2T was unable to support AML growth, suggesting that the UBE2T-FANCL ubiquitination cascade is essential in this context (Figure 2D; Figure S2G and H). To ensure that this dependency was not influenced by DNA damage caused by CRISPR-Cas9, we also validated FANCD2 dependency in AML using RNAi-based knockdown (Figure 2E; Figure S2I). In addition, inactivation of FANCD2 suppressed the growth of MOLM-13 cells propagated in vivo in immunodeficient mice (Figure 2F; Figure S2J).

Targeting of FA proteins in AML leads to cell cycle arrest and apoptosis in a p53-dependent manner
We next characterized the AML cell phenotype following inactivation of FA proteins in more depth. Flow cytometry analysis of BrdU incorporation and DNA content revealed an accumulation of UBE2T- and FANCD2-deficient MOLM-13 cells in G1 and G2/M phase, with a significant loss of cells in S phase (Figure 3A and B). Annexin V staining of UBE2T- or FANCD2-deficient MOLM-13 cells also revealed evidence of apoptosis (Figure 3C and D). Levels of γH2AX, a marker of DNA damage, were also increased following FA gene inactivation (Figure S3A). As apoptosis and cell cycle arrest are known downstream consequences of DNA damage-induced p53 activation, we investigated this pathway in FA-deficient MOLM-13 cells using RNA-seq. Inactivation of UBE2T, FANCL or FANCD2 led to significant upregulation of p53 target genes (Figure 3E; Figure S3B). In addition, we observed induction of pro-apoptotic genes and suppression of DNA replication genes following FA protein inactivation, in accord with the phenotypes described above (Figure 3E; Figure S3B). To evaluate the contribution of p53 to FA dependence in AML, we inactivated TP53 or its target gene CDKN1A by CRISPR genome editing in MOLM-13 or MV4-11 cells, which rendered the cells less sensitive to loss of UBE2T (Figure 3F and G; Figure S3C-F). These findings indicate that loss of FA proteins in AML triggers cell cycle arrest and apoptosis in a p53-dependent manner.

Low Aldefluor activity and ALDH1A1/ALDH2 expression in FA-dependent AML cell lines
We next hypothesized that an endogenous source of DNA damage might drive the elevated demand for FA proteins in AML. For example, excess accumulation of aldehydes can exacerbate the phenotypes of FA-deficient mice and humans (Garaycoechea et al., 2012; Hira et al., 2013; Langevin et al., 2011). This led us to investigate the status of aldehyde dehydrogenases (ALDHs), a family of 19 enzymes that oxidize diverse aldehydes into less toxic carboxylic acids in an NAD+- or NADP+-dependent manner (Langevin et al., 2011). We first made use of the Aldefluor assay, in which cells are treated with BODIPY-aminoacetaldehyde (BAAA), a cell-permeable fluorescent aldehyde that becomes trapped in cells following ALDH-mediated conversion to BODIPY-aminoacetate (BAA). After applying BAAA to a diverse collection of human AML cell lines, we observed that most FA-dependent lines exhibited minimal ALDH activity, with fluorescence levels similar to control cells treated with the pan-ALDH inhibitor N,N-diethylaminobenzaldehyde (DEAB). In contrast, ALDH activity was higher in most of the FA-dispensable AML lines (Figure 4A). One exception to this correlation was U937, which is FA-dispensable but lacks ALDH activity. However, U937 lacks a functional p53 pathway (Ghandi et al., 2019), which would be expected to alleviate FA dependence in this context.
An RNA-seq analysis revealed that among the 19 ALDH enzymes, only ALDH1A1 and ALDH2 were expressed at reduced levels in the FA-dependent lines compared with FA-dispensable lines (Figure 4B; Figure S4A). Western blotting confirmed that ALDH1A1 and ALDH2 proteins are present at higher levels in the FA-dispensable than in the FA-dependent AML lines (Figure 4B). Of note, while ALDH1A1 and ALDH2 are both capable of oxidizing BAAA, each enzyme is known to have distinct cellular functions (Chen et al., 2012, 2014; Fan et al., 2003; Fernandez et al., 2006; Kitagawa et al., 2000; Ohsawa et al., 2008; Tomita et al., 2016). ALDH1A1 localizes to the cytosol and oxidizes retinaldehyde into retinoic acid (Fan et al., 2003). In contrast, ALDH2 localizes to the mitochondria and oxidizes acetaldehyde (derived from exogenous ethanol) and 4-hydroxynonenal (4-HNE), an endogenous aldehyde derived from lipid peroxidation (Chen et al., 2014).

Re-expression of ALDH2, but not ALDH1A1, renders FA proteins dispensable for AML growth
Considering the inverse correlation between ALDH1A1/ALDH2 expression and FA dependence in AML lines, we next tested for a synthetic lethal genetic interaction in this context. We lentivirally transduced the FA-dependent MOLM-13 line with ALDH1A1 or ALDH2 cDNA and confirmed protein expression by western blotting (Figure 4C). Aldefluor assays revealed that the lentivirally expressed proteins were enzymatically active, with both ALDH1A1 and ALDH2 causing oxidation of BAAA (Figure 4D), consistent with prior findings (Garaycoechea et al., 2012; Nakahata et al., 2015). We next used CRISPR to target UBE2T in the ALDH1A1- or ALDH2-expressing MOLM-13 cells, followed by competition-based assays to track changes in cell fitness. Remarkably, the ALDH2-expressing MOLM-13 cells became resistant to growth arrest caused by UBE2T inactivation, whereas ALDH1A1-expressing cells remained sensitive to UBE2T targeting (Figure 4E). To evaluate the generality of this result, we expressed ALDH1A1 and ALDH2 in four other FA-dependent AML contexts, which likewise showed that ALDH2, but not ALDH1A1, alleviated the dependency on FA proteins (Figure 4E; Figure S4B and C). To address whether inactivation of ALDH2 is sufficient to confer FA dependence in AML, we made use of the murine AML cell line RN2, which was derived by retroviral transformation of hematopoietic stem and progenitor cells with the MLL-AF9 and NrasG12D cDNAs (Zuber et al., 2011). Notably, this cell line retains an intact p53 pathway and expresses ALDH2, but not ALDH1A1 (Figure S4D). We targeted ALDH2 using CRISPR in this cell line and confirmed loss of protein expression and loss of Aldefluor activity (Figure 4F and G). In competition-based cell fitness assays, ALDH2-deficient RN2 cells became hypersensitive to inactivation of FANCD2 relative to the parental cells (Figure 4H). These experiments suggest that loss of ALDH2 confers dependency on FA proteins in AML.
We next investigated whether the catalytic function of ALDH2 was needed to bypass FA pathway dependency by comparing the wild-type ALDH2 cDNA with the E268K and C302A alleles (Nene et al., 2017). These two mutant proteins were expressed normally but lacked any detectable Aldefluor activity, and neither was able to rescue the UBE2T dependency of MOLM-13 cells (Figure S4E-H). As an additional control, we considered whether ALDH1A1 might rescue the FA dependency if forced into the mitochondria using two different localization signals (Figure S5A). Despite effective mitochondrial targeting, confirmed by cell fractionation, and robust Aldefluor activity of these engineered proteins (Figure S5B and C), the mitochondrial forms of ALDH1A1 were unable to rescue UBE2T dependence (Figure S5D). Other ALDH enzymes (ALDH1A2, ALDH1B1, ALDH6A1, ALDH7A1, or ALDH1A3) were also tested in this assay but were unable to rescue UBE2T dependence when lentivirally expressed in MOLM-13 cells (Figure S5E-H). Taken together, these experiments suggest a unique capability of ALDH2 to detoxify a specific subset of endogenous aldehydes that drive FA protein dependency in AML.
Interestingly, restoration of ALDH2 expression in MOLM-13 cells had no detectable impact on cell proliferation in vitro (Figure S6A). In addition, RNA-seq analysis suggested that the overall transcriptome of MOLM-13 cells was largely unaffected by re-expression of ALDH2 (Figure S6B). These findings suggest that loss of ALDH2 in AML leads to a discrete defect in aldehyde detoxification, which is largely compensated for by the FA DNA repair pathway.

Silencing and hypermethylation of ALDH2 occur in a recurrent manner in human AML
We next investigated whether ALDH2 silencing is associated with DNA hypermethylation, which is known to be aberrantly distributed across the AML genome (Glass et al., 2017). Across AML cell lines profiled by the Cancer Cell Line Encyclopedia (CCLE) project (Barretina et al., 2012; Ghandi et al., 2019), we detected an inverse correlation between DNA methylation levels at the ALDH2 locus and ALDH2 expression (Figure 5A).
To confirm this finding, we analyzed DNA methylation at the ALDH2 promoter in MOLM-13 and MV4-11 cells using Nanopore sequencing (Simpson et al., 2017), which revealed dense hypermethylation in the vicinity of the ALDH2 promoter (Figure 5B). Importantly, this same region was hypomethylated in normal human hematopoietic stem and progenitor cells (Figure 5B) (Hodges et al., 2011). In addition, silencing of ALDH2 correlated with diminished histone acetylation at this genomic region (Figure 5C).
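Nanopore sequencing produces methylation calls on individual reads, and percent methylation at a CpG is the fraction of reads called methylated at that site. The sketch below assumes, hypothetically, that read-level calls have already been reduced to `(position, is_methylated)` pairs; that input format, the `min_coverage` filter and the promoter-window average are illustrative assumptions rather than the pipeline used in this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def percent_methylation(
    calls: Iterable[Tuple[int, bool]], min_coverage: int = 1
) -> Dict[int, float]:
    """Percent of reads called methylated at each CpG position.

    Positions covered by fewer than `min_coverage` reads are dropped.
    """
    methylated: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for position, is_methylated in calls:
        total[position] += 1
        if is_methylated:
            methylated[position] += 1
    return {
        pos: 100.0 * methylated[pos] / total[pos]
        for pos in sorted(total)
        if total[pos] >= min_coverage
    }


def window_mean(per_site: Dict[int, float], start: int, end: int) -> Optional[float]:
    """Average percent methylation over CpGs in [start, end]; None if none are covered."""
    values = [v for pos, v in per_site.items() if start <= pos <= end]
    return sum(values) / len(values) if values else None


# Hypothetical read-level calls at two CpGs (not study data).
calls = [(100, True), (100, True), (100, False), (100, True), (250, False), (250, True)]
sites = percent_methylation(calls)
print(sites)                       # {100: 75.0, 250: 50.0}
print(window_mean(sites, 0, 300))  # 62.5
```

Averaging per-site values over a window is one simple way to summarize "dense hypermethylation in the vicinity of the promoter"; weighting by coverage would be an equally reasonable alternative.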
Using the DNA methyltransferase inhibitor 5-azacytidine, we observed a time- and dose-dependent increase in ALDH2 expression in MOLM-13 and MV4-11 cells following compound exposure (Figure 5D and E; Figure S7A and B). These results suggest that silencing of ALDH2 in AML cell lines is associated with the acquisition of DNA hypermethylation.
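The inverse relationship between ALDH2 methylation and expression across CCLE AML lines (Figure 5A) can be summarized with a correlation coefficient. The text does not state which statistic was used, so the sketch below assumes Spearman's rank correlation, computed as the Pearson correlation of average ranks; the input values are invented for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: promoter methylation fraction vs. expression (not CCLE data).
methylation = [0.9, 0.8, 0.2, 0.1]
expression = [1.0, 2.0, 50.0, 80.0]
print(spearman(methylation, expression))  # -1.0
```

A rank-based statistic is a natural choice when expression spans orders of magnitude, because it is insensitive to the scale of the few highest-expressing lines.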
We next investigated whether silencing and hypermethylation of ALDH2 occur in human AML patient specimens. By analyzing RNA-seq data from diverse tumors included in The Cancer Genome Atlas (TCGA) pan-cancer analysis, we found that AML had greater variability in ALDH2 expression than most other human cancer types (Figure 5F). Within the TCGA AML samples, we designated ALDH2-low and ALDH2-high groups, distinguished by a ~100-fold difference in ALDH2 expression (Figure 5G). By comparison, expression of the housekeeping gene ACTB varies only 4-fold across TCGA AML samples (Figure S7C). In accord with the cell line observations, ALDH2-low AML samples in TCGA possessed elevated levels of DNA methylation at ALDH2 relative to ALDH2-high AML samples (Figure 5H). We further confirmed aberrant hypermethylation and silencing of ALDH2 in a subset of AML in an independent collection of patient samples (Glass et al., 2017), whereas normal bone marrow cells remained hypomethylated (Figure 5I; Figure S7D-E). Together, these findings suggest that DNA hypermethylation and silencing of ALDH2 occur in a recurrent manner in human AML.
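The ALDH2-low versus ALDH2-high comparison and the ACTB control both reduce to simple expression ratios. The text does not specify how the two groups were defined, so this sketch assumes, hypothetically, that they are the lowest and highest quarter of samples compared by group medians, and that the housekeeping-gene spread is the max/min ratio. All function names and example values are illustrative, not TCGA data.

```python
import statistics
from typing import List, Sequence, Tuple


def split_low_high(
    expression: Sequence[float], fraction: float = 0.25
) -> Tuple[List[float], List[float]]:
    """Lowest and highest `fraction` of samples (hypothetical grouping rule)."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ordered = sorted(expression)
    k = max(1, int(len(ordered) * fraction))
    return ordered[:k], ordered[-k:]


def fold_difference(low: Sequence[float], high: Sequence[float]) -> float:
    """Ratio of the high-group median to the low-group median."""
    low_median = statistics.median(low)
    if low_median <= 0:
        raise ValueError("low-group median must be positive (consider a pseudocount)")
    return statistics.median(high) / low_median


def fold_range(expression: Sequence[float]) -> float:
    """Max/min ratio across samples, e.g. for a housekeeping gene."""
    smallest = min(expression)
    if smallest <= 0:
        raise ValueError("values must be strictly positive")
    return max(expression) / smallest


# Invented expression values for illustration only (not TCGA data).
aldh2 = [1.0, 1.0, 5.0, 20.0, 40.0, 60.0, 100.0, 100.0]
low, high = split_low_high(aldh2)
print(fold_difference(low, high))         # 100.0
print(fold_range([200.0, 400.0, 800.0]))  # 4.0
```

Comparing a candidate gene's spread against a housekeeping gene such as ACTB, as the text does, helps separate genuine biological variability from technical variation in sequencing depth or normalization.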

DISCUSSION
Using a genetic screen focused on the ubiquitination machinery, we uncovered a role for FA proteins as dependencies in AML. We attribute this observation to the loss of ALDH2, an enzyme that oxidizes aldehydes in normal tissues but becomes epigenetically silenced in this disease context. We propose that silencing of ALDH2 in AML leads to an accumulation of endogenous aldehydes, which in turn leads to the formation of DNA crosslinks that necessitate repair by FA proteins. Upon inactivation of FA genes in ALDH2-deficient AML, the levels of aldehyde-induced DNA damage reach a threshold that triggers p53-mediated cell cycle arrest and programmed cell death. This study reinforces how aberrant gene silencing can disable redundant pathways and lead to acquired dependencies in cancer.
The synthetic lethal interaction between ALDH2 and FA genes is well supported by observations in mice, in which the combined deficiency of Aldh2 and Fancd2 leads to developmental defects, a predisposition to cancer, and hypersensitivity to exogenous aldehydes (Langevin et al., 2011). These phenotypes were thought to be uniquely present in normal hematopoietic stem and progenitor cells (Garaycoechea et al., 2012), but our study now shows that this genetic interaction extends to malignant myeloid cells. In normal mouse hematopoietic stem cells, aldehyde-induced DNA damage leads to the formation of double-stranded DNA breaks, which ultimately cause stem cell attrition, a phenotype that resembles the clinical presentation of germline FA deficiency in humans (Garaycoechea et al., 2018). Similar to our observations in AML, hematopoietic stem cells in Aldh2/Fancd2 compound-deficient mice are cleared through a p53-dependent mechanism (Garaycoechea et al., 2018). The redundant function of ALDH2 and FA proteins is also supported by evidence in humans, in which a combined hereditary deficiency of ALDH2 and FA genes correlates with an accelerated rate of bone marrow failure when compared to FA patients with intact ALDH2 function (Hira et al., 2013). Thus, our study builds upon prior work by revealing a clinical context in which the ALDH2/FA protein redundancy is disabled in humans and could be exploited to eliminate AML in vivo.
Several prior studies have applied Aldefluor assays to human clinical samples and observed diminished ALDH activity in AML when compared to normal hematopoietic stem and progenitor cells, in accord with our own findings (Gasparetto et al., 2017; Gerber et al., 2012; Hoang et al., 2015; Schuurhuis et al., 2013). Loss of ALDH1A1 expression was previously observed in AML, which correlated with favorable prognosis and was found to render cells hypersensitive to toxic ALDH substrates, such as arsenic trioxide (Gasparetto et al., 2017). Our study validates ALDH1A1 silencing as a recurrent event in AML cell lines; however, our functional experiments demonstrate that this event is unrelated to FA dependency in this disease. Collectively, our study and the work of Gasparetto et al. point to distinct functional consequences of losing ALDH1A1 versus ALDH2 in AML.
ALDH1A1 and ALDH2 are known to have overlapping aldehyde substrates; however, our study points to the existence of endogenous genotoxic aldehydes that are uniquely oxidized by ALDH2 in AML. While the high reactivity of aldehydes precludes an unbiased assessment of aldehyde species in AML cells, 4-hydroxynonenal (4-HNE) levels are known to be elevated in Aldh2-deficient mice and ALDH2-deficient humans (Guo et al., 2013; Ohsawa et al., 2008). 4-HNE is a byproduct of lipid peroxidation and can form toxic adducts with DNA and proteins in a variety of cellular pathologies (Czerwińska et al., 2014; Voulgaridou et al., 2011). Nevertheless, a key unanswered question in the field is the identity of the endogenous sources of genotoxic aldehydes and the therapeutic utility of their pharmacological modulation in disease.
Our experiments suggest that ALDH2 silencing has no measurable impact on AML cell fitness, owing to compensation via FA proteins. Why then is ALDH2 recurrently silenced in AML? We propose at least three possibilities. First, loss of ALDH2 expression might be under positive selection to confer a critical metabolic adaptation for the early in vivo expansion of an AML clone, while at later stages of AML progression (reflected by our cell lines) ALDH2 silencing is no longer relevant for cell proliferation. A second possibility is that loss of ALDH2 promotes the genetic evolution of AML by increasing the probability of acquiring aldehyde-induced genetic mutations. A third hypothesis is that ALDH2 is merely a gene susceptible to DNA hypermethylation, with epigenetic silencing occurring as a passenger event in this disease. Irrespective of these different scenarios, our study demonstrates how epigenetic silencing can lead to acquired dependencies in cancer.
Our study reveals a paradox: a germline deficiency of FA genes leads to an elevated risk of AML formation, while sporadic AML can acquire a dependency on FA proteins to sustain cell proliferation and survival. The contextual nature of this pathway is reminiscent of other DNA repair regulators, such as ATM, which act to protect normal tissues from cancer-causing somatic mutations, while inhibition of this pathway can hypersensitize transformed cells to DNA-damaging agents (Cremona and Behrens, 2014; Helleday et al., 2008; Sullivan et al., 2012). Considering the age-dependent onset of symptoms in FA patients, a possibility exists that acute and reversible inhibition of the FA pathway could have a therapeutic index in AML, as has been demonstrated for targeting of other DNA repair proteins (Santos et al., 2014). Therefore, our study provides justification for evaluating pharmacological inhibition of UBE2T/FANCL-mediated ubiquitination as a therapeutic approach for eliminating ALDH2-deficient AML.

ACKNOWLEDGMENTS
We thank James C. Mulloy for sharing genetically engineered human AML cell lines. This work was supported by Cold Spring Harbor Laboratory NCI Cancer Center Support grant 5P30CA045508.

CONFLICT OF INTEREST DISCLOSURES
C.R.V. has received consulting fees from Switch, Roivant Sciences, and C4 Therapeutics, has served on the scientific advisory board of KSQ Therapeutics and Syros Pharmaceuticals, and has received research funding from Boehringer-Ingelheim during the conduct of the study.
Negative sgRNA or TP53 sgRNAs were infected first, selected with neomycin, followed by transduction with the sgRNAs indicated at the bottom of the graph (linked with GFP). n=3. All bar graphs represent the mean ± SEM. All sgRNA experiments were performed in Cas9-expressing cell lines.
Heatmap of the percent methylation of the covered CGs within the ALDH2 promoter. Each column is one CG and each row is one patient. Rows are clustered using Euclidean distances and the complete clustering method; missing CGs are depicted in gray. Select patient mutational and cytogenetic data are plotted to the right of the heatmap. All bar graphs represent the mean ± SEM (n=3).
Competition-based proliferation assays in MOLM-13 cells following sequential sgRNA transduction.

SUPPLEMENTAL FIGURE LEGENDS
Negative sgRNA or CDKN1A sgRNAs were infected first, selected with neomycin, followed by transduction with the sgRNAs indicated at the bottom of the graph (linked with GFP). n=3.
[Figure residue: ubiquitin ligase family categories (cullin, U-box, HECT, APC subunits, DCAF, SOCS-box); Figure 2: GFP% (normalized to D4) for sgCDK1, sgRosa, sgFancd2 #1 and sgFancd2 #2; sgUBE2T #1 and #2 with sgUBE2T-resistant constructs; Aldefluor activity histograms (Count vs. Aldefluor activity) for sgNEG, sgCDK1, sgUBE2T #1 and sgUBE2T #2]