Biomolecular Condensation Drives Leukemia Caused by NUP98-Fusion Proteins

NUP98-fusion proteins cause acute myeloid leukemia via unknown molecular mechanisms. All NUP98-fusion proteins share an intrinsically disordered region (IDR) featuring >35 repeats of Phenylalanine-Glycine (FG) in the NUP98 N-terminus. Conversely, different C-terminal NUP98-fusion partners are often transcriptional and epigenetic regulators. Given these structural features, we hypothesized that mechanisms of oncogenic transformation by NUP98-fusion proteins are hard-wired in their protein interactomes. Affinity purification coupled to mass spectrometry of five distinct NUP98-fusion proteins revealed a conserved set of interactors that was highly enriched for proteins involved in biomolecular condensation. We developed biotinylated isoxazole-mediated condensome mass spectrometry (biCon-MS) to show that NUP98-fusion proteins alter the global composition of biomolecular condensates. In addition, an artificial FG-repeat-containing fusion protein was able to phenocopy the induction of leukemic gene expression as mediated by NUP98-KDM5A. Thus, we propose that IDR-containing fusion proteins have evolved to uniquely combine biomolecular condensation with gene control to induce cancer.
Keywords: AML, NUP98, fusion protein, AP-MS, LLPS, biCon-MS, condensate


Introduction
Cancer-associated chromosomal rearrangements often result in the expression of pathogenic fusion proteins. Leukemia features a particularly high frequency of fusion oncogenes 1, and the functional investigation of leukemia-associated fusion proteins has provided invaluable insights into the molecular mechanisms of cancer development 2.
The protein complexes around fusion oncoproteins play defining roles in shaping oncogenic gene expression patterns 3. Thus, the investigation of protein interactions is critical for uncovering actionable targets to devise more effective and better targeted cancer therapies.
Several studies have used affinity purification coupled to mass spectrometry (AP-MS) to identify novel key effectors of fusion-protein-related leukemia [4][5][6]. Yet, while the most common leukemia fusion proteins have been extensively characterized, functional understanding of the many rare fusions, which affect a significant number of patients and have limited treatment options, is lacking.
The N-terminal part of the Nucleoporin 98 (NUP98) gene (N-NUP98) is fused to over 30 different C-terminal partner loci in acute myeloid leukemia (AML) 7. While NUP98 rearrangements are rare (~2% of all AML) 7,8, they are more frequent in pediatric AML and are associated with a particularly bad prognosis 9.
The endogenous NUP98 protein is part of the nuclear pore complex (NPC), which mediates bidirectional transport of macromolecules between nucleus and cytoplasm 10,11. NUP98 is part of the group of FG Nucleoporins, which contain intrinsically disordered regions (IDRs) consisting of repeats of either Phe-Gly (FG) or Gly-Leu-Phe-Gly (GLFG) amino acid residues. GLFG repeats can serve as binding sites for RNA-binding proteins, like the mRNA export factor RAE1, thus mediating trafficking of RNA molecules through the NPC 7,12. While the majority of mature NUP98 protein is directly recruited to the NPC, NUP98 was also found to associate with the anaphase promoting complex (APC) 13 and to interact with chromatin, where it actively regulates gene expression in an NPC-independent fashion 14,15.
C-terminal fusion partners of N-NUP98 in AML are enriched for proteins with roles in transcriptional control and epigenetics. Several studies have addressed the functional contribution of different protein modules present in NUP98-fusions to leukemogenesis. For instance, the plant homeodomain (PHD) in NUP98-PHF23 16 and the RNA helicase motif in NUP98-DDX10 17 were required for leukemogenesis, demonstrating critical roles for the C-terminal fusion partner. Conversely, deletion of the N-terminal NUP98 moiety in NUP98-NSD1 also prevented myeloid progenitor immortalization and high HOX gene expression, highlighting the importance of the conserved IDR domain for oncogenic transformation by NUP98-fusion proteins 18. Yet, the exact molecular mechanisms of NUP98-fusion protein-induced leukemogenesis remain poorly understood, and it is not clear which of the oncogenic properties of NUP98-fusion proteins depend on molecular functions of endogenous NUP98 and which are governed by novel functions that are mediated by the fusion partner 19. We hypothesized that NUP98-fusion proteins interact with specific networks of cellular proteins, and that oncogenic functions of NUP98-fusion proteins are hard-wired in their protein interactomes.
Here we show by AP-MS-based interactome analysis that NUP98-fusion proteins do not act in the context of the NPC. Consistent with the specific nuclear localization pattern of NUP98-fusion proteins, the core NUP98-fusion interactome is highly enriched for proteins with known roles in liquid-liquid phase separation (LLPS) and the formation of biomolecular condensates.
To investigate global changes in biomolecular condensation we established biotinylated isoxazole-mediated condensome mass spectrometry (biCon-MS), a sensitive method to globally characterize the dose-dependent potential of IDR-containing cellular proteins to form precipitates in the presence of the chemical biotinylated isoxazole (b-isox). We show that biCon-MS greatly expands the cellular catalogue of proteins involved in LLPS beyond known protein complexes that act in biomolecular condensates. Furthermore, biCon-MS revealed that NUP98-fusion protein expression specifically alters the composition of cellular condensomes, indicating that NUP98-fusion-driven oncogenesis involves altered biomolecular condensation. In fact, an artificial FG-repeat-containing IDR-fragment fused to the C-terminal fusion partner KDM5A was capable of inducing leukemia-associated gene expression. Our data show that the biophysical properties of oncogenic fusion protein partners have the potential to specifically alter cellular biomolecular condensation to drive cancer-specific gene expression programs.
[...] 20, NUP98-KDM5A did not co-localize with the nuclear membrane but was present in intra-nuclear speckles (Figure 1b). To enable unambiguous annotation of protein complexes to endogenous NUP98 vs. NUP98-KDM5A, we purified endogenous NUP98-protein complexes from HL-60 cells using a highly specific anti-NUP98 antibody, while the NUP98-KDM5A interactome was isolated via the Strep-tag present in the fusion protein (Figure 1c). Enrichment of baits was confirmed by Western blot and functional purification of protein complexes was shown by co-precipitation of the known NUP98-binding partner RAE1 21 (Figure S1a). In line with its altered localization, NUP98-KDM5A did not co-precipitate with NUP98 (Figure S1a).

NUP98
Purified protein complexes were analyzed by liquid chromatography coupled to mass spectrometry (LC-MS/MS) using a one-dimensional gel-free approach, identifying 315 and 390 proteins in anti-NUP98 purifications and in Strep-tag-mediated purifications of NUP98-KDM5A, respectively. After stringent filtering for background and non-specific interactions 22, the NUP98 interactome featured 267 proteins (Table S1), and the NUP98-KDM5A interactome consisted of 227 proteins. The interactome of endogenous NUP98 recapitulated protein complexes that were previously reported 15 to interact with NUP98 with high affinity, including the NPC (26 proteins) 23, the APC (39 proteins) 24 and factors involved in nuclear transport (8 proteins) (Figure 1d, S1b). In contrast, NUP98-KDM5A mainly co-purified with distinct sets of RNA-binding protein complexes, including RNA helicases (13 proteins) and RNA-binding factors (55 proteins) (Figure 1d, S1b). Most importantly, only 19 of 497 proteins interacted with both endogenous NUP98 and NUP98-KDM5A. Among those was RAE1, which was reported to bind the N-terminus of NUP98 25 (Figure 1d). These data suggest that despite sharing the same N-terminal sequence, NUP98 and NUP98-KDM5A operate in largely non-overlapping cellular contexts, and that NUP98-KDM5A does not co-localize with the NPC.

Functional proteomic identification of conserved interactors of diverse NUP98-fusion proteins
Given the structural heterogeneity of the >30 NUP98-fusion partners found in AML, it was unclear whether potential effector mechanisms that are critical for NUP98-fusion-dependent leukemogenesis might converge on a shared set of conserved interaction partners. To characterize the conserved interactome of NUP98-fusion proteins, we chose four molecularly diverse oncoproteins in addition to NUP98-KDM5A. We based our selection of fusion proteins both on their abundance in the AML patient population and on the functional diversity of the endogenous partner proteins. The most frequent fusion partners of NUP98 are KDM5A (JARID1A), a histone 3 lysine 4 (H3K4) di- and tri-demethylase 26,27, and NSD1, a histone methyltransferase (HMT) for H3K36 and H4K20 28. NUP98-HOXA9 was chosen to represent the recurrent fusions of NUP98 to members of the HOX gene cluster. HOXA9 is a transcription factor that is highly expressed in hematopoietic stem/progenitor cells 29 [...]
Analysis of AP-MS data yielded 616 proteins engaging in over 6000 interactions. After subtracting proteins from lysates of non-transduced HL-60 cells and filtering for non-specific interactions, 501 differentially enriched proteins were retained as high-confidence interactors of NUP98-fusion proteins (Table S2). Each individual NUP98-fusion protein had between 19 and 61 exclusive interactions. In contrast, the majority of proteins in the network interacted with more than one NUP98-fusion protein (Figure 2c). Overall, 157 proteins were found in three or more NUP98-fusion interactomes, and a conserved set of 27 proteins was present in all five NUP98-fusion protein complexes (Figure 2c). This conservation of interaction partners indicates significant functional overlap among different NUP98-fusions, pointing to the presence of similar molecular mechanisms. The 157 conserved NUP98-fusion interactors were enriched in protein complexes involved in RNA splicing, ribosome biogenesis and transcriptional control (Figure 2d).
Further analysis using Gene Ontology (GO) annotation confirmed a significant enrichment for DNA- and RNA-related processes, including mRNA processing and transcription, which is in line with initial results from the NUP98-KDM5A interactome (Figure 2e).
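The grouping of interactors by the number of NUP98-fusion baits they co-purify with (three or more, or all five) can be illustrated with a minimal sketch. This is not the analysis code used in the study: the function names, the dictionary layout, and the placeholder prey names P1 to P4 are assumptions made purely for illustration, and the toy input contains no data from Table S2.

```python
from collections import Counter


def bait_counts(interactomes):
    """Count, for every prey protein, the number of baits it was found with."""
    counts = Counter()
    for preys in interactomes.values():
        counts.update(set(preys))  # set() guards against duplicate entries
    return counts


def conserved_interactors(interactomes, min_baits):
    """Return the preys detected with at least `min_baits` different baits."""
    counts = bait_counts(interactomes)
    return {prey for prey, n in counts.items() if n >= min_baits}


# Toy input with placeholder prey names (hypothetical, not data from this study).
toy = {
    "NUP98-KDM5A": {"P1", "P2", "P3"},
    "NUP98-NSD1": {"P1", "P2"},
    "NUP98-HOXA9": {"P1", "P4"},
}
core = conserved_interactors(toy, min_baits=len(toy))  # found with every bait
shared = conserved_interactors(toy, min_baits=2)       # found with >= 2 baits
```

With five filtered interactomes as input, `min_baits=3` and `min_baits=5` would correspond to the two conservation levels discussed above.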
Notably, this proteomic survey of different NUP98-fusion proteins does not show colocalization with the nuclear pore complex. The conserved core interactome of NUP98-fusion proteins is enriched in factors regulating mRNA metabolism and transcription, pointing to an involvement of NUP98-fusion proteins in these processes.

The NUP98-fusion protein interactome is enriched for proteins with roles in biomolecular condensation
Biomolecular condensates are membrane-less structures that govern biological processes through the dynamic compartmentalization of macromolecules 30. Their formation can be driven by multivalent, low-affinity interactions between IDR-containing proteins, which are able to undergo LLPS 31. Nuclear biomolecular condensates mediate important regulatory roles in transcription, splicing and chromatin organization 32. Several well-described factors with roles in biomolecular condensation were present in the conserved NUP98-fusion protein interactome, including the RNA-binding protein FUS 33, heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1) 31 and H/ACA ribonucleoprotein complex subunit 1 (GAR1) 34.
To evaluate a potential enrichment of IDR-containing proteins involved in biomolecular condensation among the NUP98-fusion protein interactome, we used an algorithm that was designed to predict phase separation properties of proteins in an unbiased fashion. The PScore classifies the potential of proteins to self-associate based on the abundance of pi-orbital-containing amino-acid residues, as pi-pi interactions are important features in phase separation 35. Binning of NUP98-fusion-interacting proteins based on their PScores (STAR methods) revealed a significant overrepresentation of LLPS-prone proteins (PScore >4) within functional categories that are enriched in the NUP98-fusion interactome, including RNA metabolism, RNA binding, RNA processing and transcription (Figure 3a). In line with this, the mean PScore of the 157 core interactors was significantly higher than that of size-matched lists obtained by random subsampling of the human proteome (Figure 3b). In contrast, the mean PScore of proteins assigned to the GO category "Nuclear Membrane" did not significantly deviate from that of the human proteome (Figure S3a), and the subsampling process did not alter the global distribution of PScores (Figure S3b).
Given the enrichment of LLPS-prone factors among the core NUP98-fusion interactome, we next aimed to investigate the biophysical properties of NUP98-fusion proteins with regard to biomolecular condensation. The chemical substance biotinylated isoxazole enables precipitation of IDR-containing proteins via the formation of microcrystals in solution (Figure 3c) 36,37. All NUP98-fusion proteins precipitated efficiently upon incubation with 100 µM b-isox (Figure 3d). This effect was likely mediated by the IDR domain in the NUP98 N-terminus, which was previously shown to undergo LLPS in vitro 38. Despite the absence of IDR domains, the known NUP98 interactor RAE1 was efficiently co-precipitated with NUP98-fusion proteins in this assay, while the highly structured heat shock cognate 71 kDa protein (HSC70), which does not interact with NUP98-fusion proteins, was insensitive to b-isox precipitation (Figure 3d).
Our data show that the interactome of NUP98-fusion proteins is enriched for proteins with biophysical properties that are predictive of biomolecular condensation via LLPS. Consistent with their speckled nuclear localization, we find that NUP98-fusion proteins and interacting proteins are efficiently precipitated by b-isox, indicating that this assay allows investigation of the condensation behavior of protein assemblies within complex cellular lysates. Taken together, our results indicate that NUP98-fusion proteins and their interactomes are involved in biomolecular condensation in AML cells.

biCon-MS globally charts the cellular condensome
Given the potential of the b-isox precipitation assay to capture protein complexes that are prone to biomolecular condensation, we attempted to leverage this effect to analyze the entirety of cellular proteins that are sensitive to b-isox in an unbiased fashion using mass spectrometry. We developed an experimental setup that allows the characterization of subsets of the cellular proteome that are dynamically integrated in b-isox precipitates in a dose-dependent manner (Figure 4a, S4a,b). This optimized method, which we termed biotinylated isoxazole-mediated condensome mass spectrometry (biCon-MS), efficiently reduces background noise caused by non-specific precipitation and increases the likelihood of true positive hits by eliminating proteins that do not display dose-dependent enrichment, as exemplified by Western blot analysis for the NUP98-interaction partner RAE1 (Figure 4b).
biCon-MS analysis of HL-60 lysates identified 931 proteins that exhibited dose-dependent precipitation behavior (Table S3). [...]
Together, these data show that biCon-MS analysis of cellular lysates represents a powerful approach to investigate the global composition of the cellular condensome, providing an unbiased and comprehensive map of cellular proteins that are involved in biomolecular condensation.

Expression of NUP98-fusion proteins dynamically alters the cellular condensome
As we found that NUP98-fusion proteins specifically interact with proteins capable of LLPS, we next aimed to investigate dynamic changes in the cellular condensome that result from the expression of oncogenic NUP98-KDM5A or NUP98-NSD1 fusion proteins by biCon-MS (Figure 5a). In addition to proteins that were consistently present in biCon-MS, such as FUS, TAF15 and EWSR1 (Figure S4c [...] (Figure 5f). In addition, important factors in leukemia development, such as RUNX2 and TET2, were also enriched in NUP98-fusion-dependent condensomes (Figure 5f).
In summary, these data show that NUP98-fusion proteins cause extensive restructuring of the cellular condensome, supporting the hypothesis that NUP98-fusion proteins are able to specifically reconstitute biomolecular condensates.

Discussion
In this study, we show that structurally different NUP98-fusion proteins localize to biomolecular condensates and that the structural determinants guiding biomolecular condensation as encoded in the NUP98 N-terminus are sufficient to evoke leukemia-specific gene expression in the context of oncogenic fusion proteins. Thus, we propose that alteration of biomolecular condensation mediated by IDR-containing oncogenic fusion proteins represents a novel mechanism of oncogenic transformation.
Our results provide the first comprehensive AP-MS-based interactome analysis of NUP98-fusion proteins vs. endogenous NUP98 in human cells. In line with previous reports, our data show that NUP98 is recruited to the nuclear pore complex 23 and the anaphase promoting complex 24. Furthermore, we confirm its important role in RNA transport via interaction with RAE1 25 and the ATP-dependent RNA helicase A (DHX9) 15 [...]
This is consistent with our AP-MS data, supporting roles of NUP98-fusion proteins in RNA metabolism and gene expression in the context of biomolecular condensates. Alternatively, the b-isox-mediated precipitation assay can be used to assess the propensity of proteins to form aggregates in the context of complex cellular lysates 36,37. The NUP98 N-terminus contains two large IDRs that consist of 38 di-amino acid repeats of phenylalanine-glycine (FG), and NUP98 is indeed able to form biomolecular condensates 38,48. Consistent with the biochemical and biophysical properties of the NUP98 N-terminus, all NUP98-fusion proteins were highly susceptible to b-isox precipitation. This is in line with previous observations showing that NUP98 can localize to so-called "GLFG bodies" whose formation was dependent on the NUP98 N-terminus 49.
To globally analyze the composition of biomolecular condensates in an unbiased manner and in a cellular context we developed biCon-MS. In this technique, cellular lysates are incubated with increasing concentrations of b-isox followed by MS-based identification of precipitated proteins. Beyond recovering the majority of proteins previously found to be sensitive to b-isox precipitation 36,37, biCon-MS efficiently recovered several proteins with well-described roles in LLPS, such as the FET protein family (FUS/EWSR1/TAF15) 50 [...]
[...] µM beta-mercaptoethanol in the presence of murine stem cell factor (mSCF, 150 ng/ml), murine interleukin 3 (mIL-3, 10 ng/ml) and murine interleukin 6 (mIL-6, 10 ng/ml) (all PeproTech).

Immunofluorescence analysis
Cells were spotted on glass histology slides using a Shandon Cytospin Centrifuge II and air-dried. Spots were fixed with 4% formaldehyde (Histofix, Roth, P087.6) for 10 minutes at 4 °C. Cells were permeabilized with 0.2% Triton X-100 in PBS for 10 min at room temperature, followed by 1 hour incubation with primary antibody in 2% BSA/0.2% Triton [...]
Images were acquired using a Zeiss LSM 880 Airyscan confocal laser scanning microscope with the Zeiss ZEN-black software. Post-processing of images was performed using ZEN-blue and ImageJ for brightness and contrast enhancement.

Live cell imaging
Cells were transiently transfected with 250 ng plasmid DNA and imaged after 72 h on a Zeiss LSM 880 confocal microscope. Nuclei were stained with 5 µg/ml Hoechst 33342 (Thermo Fisher Scientific, H1399) for 7 minutes. Post-processing of confocal z-slice images was performed using ZEN-blue software and ImageJ for contrast enhancement.

Affinity purification of protein complexes
Lysates were used at 2 mg/ml for immunoprecipitation (2 mg for western blot, 12 mg for mass spectrometry). 100 µl lysate was used as input sample. The remaining sample was [...] Eluates were subsequently used for western blot analysis or submitted to tryptic digestion and LC-MS/MS analysis.

Biotinylated isoxazole-mediated precipitation
The assay was performed as previously described 36

Protein network analysis
Protein networks were illustrated with Cytoscape 3.6.1 66

O/E ratios of binned PScores (Figure 3)
PScores of individual proteins found within each Gene Ontology term were grouped into 6 bins ranging from <0 to >4 (observed distribution). The same was done for the PScores of the human proteome 35 (expected distribution). Next, the ratio between observed and expected frequencies was calculated for each bin and illustrated as a heatmap. The heatmap was generated with the heatmap.2 function from the R package gplots (version 3.0.1) 68.
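The binning and ratio calculation described above can be sketched as follows. This is an illustrative Python re-implementation, not the R code used for Figure 3. The interior bin edges (0, 1, 2, 3, 4), the use of relative frequencies, and the function names are assumptions, since the text only states that six bins ranging from <0 to >4 were used.

```python
from bisect import bisect_right

# Assumed interior edges; they yield 6 bins: <0, 0-1, 1-2, 2-3, 3-4, >=4.
BIN_EDGES = [0, 1, 2, 3, 4]


def bin_fractions(scores, edges=BIN_EDGES):
    """Return the fraction of scores falling into each bin."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = [0] * (len(edges) + 1)
    for s in scores:
        # A score equal to an edge is assigned to the higher bin.
        counts[bisect_right(edges, s)] += 1
    return [c / len(scores) for c in counts]


def oe_ratios(observed_scores, expected_scores, edges=BIN_EDGES):
    """Observed/expected frequency ratio per bin (NaN if expected is 0)."""
    obs = bin_fractions(observed_scores, edges)
    exp = bin_fractions(expected_scores, edges)
    return [o / e if e > 0 else float("nan") for o, e in zip(obs, exp)]


# Toy scores (hypothetical values, not PScores from this study).
background = [-1, 0.5, 1.5, 2.5, 3.5, 5]   # one score per bin
query = [5, 5, 0.5]                         # skewed towards the top bin
ratios = oe_ratios(query, background)
```

One row of ratios would be computed per Gene Ontology term, with the human proteome as the expected distribution, and the resulting matrix passed to a heatmap routine.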

Group size dependent testing statistic (GSDTS)
GSDTS tests a list of values x against a larger universe of values y (length of x < length of y) by drawing z sub-samples, each of the same length as x, from the universe. Each sub-sampled list is checked against the lists already created to ensure that all lists are unique, thereby satisfying "sub-sampling without replacement".
The lists x and y as well as the number of sub-sampling repeats z need to be provided by the user. As a result, the sum, mean or median of all sub-sampling results is plotted as a histogram, and the respective sum, mean or median of list x is indicated by a red bar in the histogram. The script is implemented in R and available at: https://github.com/Edert/R-scripts
For the histograms shown, we applied GSDTS with a z of 10,000 to obtain 10,000 sub-sampled lists from the universe y, each of the same length as x. As universe y we chose the whole human proteome 35, and x was the query list. We then used the mean to plot the histogram and to indicate the tested list as a red line.
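The sub-sampling procedure can be sketched in a few lines. The following is an illustrative Python version written from the description above, not a port of the published R script. The function name, the seed parameter, and the choice to define uniqueness by the drawn positions in y (rather than by the drawn values) are assumptions.

```python
import math
import random
from statistics import mean


def gsdts(x, y, z, stat=mean, seed=None):
    """Compare stat(x) with stat() of z unique sub-samples of y of size len(x)."""
    n, m = len(x), len(y)
    if not 0 < n < m:
        raise ValueError("x must be non-empty and shorter than y")
    if z > math.comb(m, n):
        raise ValueError("z exceeds the number of distinct sub-samples")
    rng = random.Random(seed)
    seen = set()   # index tuples already drawn, to keep sub-samples unique
    null = []      # statistic of each accepted sub-sample
    while len(null) < z:
        # Draw len(x) positions of y without replacement.
        idx = tuple(sorted(rng.sample(range(m), n)))
        if idx in seen:
            continue  # reject a sub-sample that was drawn before
        seen.add(idx)
        null.append(stat([y[i] for i in idx]))
    return stat(x), null


# Toy example (hypothetical values): the three largest values of a universe.
universe = list(range(100))
observed, null = gsdts([97, 98, 99], universe, z=200, seed=1)
```

Plotting `null` as a histogram and marking `observed` on it would reproduce the kind of display described above; passing `statistics.median` or `sum` as `stat` covers the other two summary options.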

RNA-seq data analysis
The quality of the raw sequence files was checked with FastQC (version 0.11.4) 69. Based on these results, quality trimming, quality filtering and length filtering were applied using PRINSEQ-lite (version 0.20.4) 70. Remaining reads were aligned against the mouse reference genome (GRCm38/mm10) by STAR (version 2.5.0b) 71

Declaration of Interests
The authors declare no competing interests.
