EZH2 inhibition results in genome-wide PRC2 redistribution

Histone methyltransferase polycomb repressive complex 2 (PRC2) plays a critical role in cell fate determination, and its catalytic subunit EZH2 is a key oncogenic driver in GCB-DLBCL. EZH2 inhibition in some GCB-DLBCL cell models leads to a global loss of H3K27me3, the derepression of a subset of silenced PRC2 target genes, and ultimately cell death. Here we show that EZH2 inhibition causes global redistribution of PRC2 components. Focal PRC2 accumulation and H3K27me3 retention occur at many canonical PRC2 nucleation sites. H3K27me3 retention sites are also enriched in plasma/memory cell genes repressed by PRC2 activity in the germinal center. We see H3K27me3 retention at differentiation-related genes both in cell lines that are insensitive to killing by EZH2 inhibition, and in sensitive cell lines. Thus, loss of H3K27me3 on B cell differentiation genes is not sufficient to explain the differential effects of EZH2 inhibitors on sensitive and insensitive DLBCL cell lines.


Introduction
Research over the past several years has demonstrated that chromatin regulating factors are frequently targeted in tumorigenesis (Flavahan et al 2017;Morgan and Shilatifard 2015). Polycomb repressive complex 2 (PRC2) is one of the most investigated chromatin regulating factors. Both gain and loss of function mutations in PRC2 components have been implicated in cancer formation (Helin 2016). PRC2 is one of several Polycomb group (PcG) protein complexes and plays an important role in body plan formation during animal development by regulating the expression of homeobox genes at different body segments (Margueron and Reinberg 2011). PcG proteins are also critical in maintaining cellular memory of cell identity in later stages of development by repressing differentiation toward other cell types (Lee et al. 2006;Thornton et al. 2014). PRC2 is a transcription repressive factor composed of core components EZH2, SUZ12, and EED. At the molecular level, PRC2 catalyzes the mono-, di-, and tri-methylation of histone H3K27 (Margueron and Reinberg, 2011) and H3K27me3 is the hallmark of facultative heterochromatin and transcription silencing (Di Croce and Helin, 2013; Kirmizis et al., 2004;Margueron and Reinberg, 2011) . PRC2 represses transcription by methylating H3K27 at its target genes and induces the formation of repressive chromatin structure.
In a number of cancers, the EZH2 gene is often either amplified, overexpressed, or carries an activating mutation that alters its substrate specificity (Hock, 2012) ; cancer cells use the altered chromatin landscape and transcription output from the abnormal PRC2 activities to gain survival advantage. In particular, tumors from a high percentage of germinal center B cell-like diffuse large B cell lymphoma (GCB-DLBCL) patients carry a somatic mutation within EZH2 that promotes the conversion of H3K27me2 to H3K27me3 (Morin et al., 2010) . DLBCL is the most common type of non-Hodgkin's lymphoma, but it is heterogeneous; GCB-DLBCL is a molecularly defined subtype of DLBCL in which the tumor cells are thought to arise from germinal center B cells (Lenz et al., 2008) . EZH2 activity in the germinal center during normal B cell development temporarily suppresses genes whose activity is restored upon terminal differentiation and plasma/memory cell formation. It has been proposed that the EZH2 activating mutation in GCB-DLBCL leads to the sustained hypermethylation of those loci, forcing the terminal differentiation genes deeper into a repressive state, thus preventing both their reactivation and terminal B cell differentiation (Béguelin et al., 2013) .
In the face of accumulating pre-clinical evidence that EZH2 plays a critical role in the development of several cancer types including GCB-DLBCL and prostate cancer, several EZH2 inhibitors have been developed and several are currently in clinical trials in different disease indications (Gulati et al., 2018;Italiano et al., 2018;Yap et al., 2016) . EZH2 inhibition leads to a global reduction of H3K27me3 and eventually cell death in a dose and time dependent manner in many cancer cell models (Bradley et al., 2014;Garapaty-Rao et al., 2013;Girard et al., 2014;McCabe et al., 2012;Tiffen et al., 2015;Zeng et al., 2017) ; however, how EZH2 inhibition impacts the genome-wide distribution of H3K27me3 and PRC2 has not been fully investigated.
In this study we show that EZH2 inhibition does not reduce H3K27me3 levels equally across the genome in GCB-DLBCL cell models; instead H3K27me3 in small areas of the genome is more refractory to EZH2 inhibition, and PRC2 components are redistributed to these locations. In GCB-DLBCL models, the set of genes that accumulate PRC2 after EZH2 inhibition are enriched for those expressed during the terminal differentiation of germinal center B cells, suggesting that GCB-DLBCL cells attempt to repress terminal differentiation genes to stay in a proliferating state even when EZH2 is inhibited. However, as we see retention of repressive H3K27me3 at terminal differentiation genes in both EZH2 inhibitor insensitive and sensitive cell models, our data suggest that loss of H3K27me3 on differentiation genes is not sufficient to explain the cell killing effects of EZH2 inhibitors.

EZH2 inhibition leads to genome-wide loss of H3K27me3
We previously described the characterization of two small molecule inhibitors, CPI-360 and CPI-169, of EZH2 which showed time-and dose-dependent efficacy in Non-Hodgkin's Lymphoma models in vitro and in vivo (Bradley et al., 2014) . These EZH2 inhibitors demonstrated robust target engagement in cell models by decreasing global H3K27me3 and H3K27me2, but not H3K27me1 levels (Bradley et al., 2014) . However, potential effects of EZH2 inhibition on H3K27me3 and PRC2 distribution across the genome were not previously investigated. To assess these effects, we treated KARPAS-422, a DLBCL cell model that is sensitive to EZH2 inhibitor treatment, with DMSO control or EZH2 inhibitor for 4 and 8 days and carried out ChIP-seq campaigns for histone marks H3K27me3 and H3K4me3, and PRC2 components EZH2 and SUZ12.  Consistent with previous studies, H3K27me3 was globally reduced upon multi-day EZH2 inhibition while PRC2 components EZH2 and SUZ12 remained unchanged ( Figure 1A). Analysis of ChIP-seq data revealed that in DMSO-treated cells, H3K27me3 is abundant and occupies broad segments of the KARPAS-422 genome. EZH2 inhibition greatly reduces H3K27me3 levels but not PRC2 components EZH2 and SUZ12 on chromatin (see examples in Figure 1B). On the other hand, H3K4me3 levels remained steady both at bulk level ( Figure 1A) and at randomly selected genomic regions ( Figure S1). Because promoter-associated H3K27me3 is associated with repression of gene expression, we examined the ChIP-seq signal around transcription start sites (TSSs). The average ChIP-seq signal at TSSs was consistent with the Western blot observations: H3K27me3 signal showed marked reduction but EZH2, SUZ12 and H3K4me3 showed little change ( Figure 1C).

TSSs with high H3K27me3 are more resistant to EZH2 inhibitor treatment
Although the level of H3K27me3 was decreased at most loci, several distinct patterns of H3K27me3 distribution were observed around TSSs, as well as different levels of changes in response to EZH2 inhibition. To explore these patterns and changes, we performed k-means clustering of the H3K27me3 signal at TSS regions in both DMSO-and EZH2 inhibitor-treated cells ( Figure 2A). We found that although the level of H3K27me3 was reduced in all clusters, not all H3K27me3-marked TSSs responded to EZH2 inhibition the same way: cluster 1, for example, retained much higher levels of H3K27me3 after inhibitor treatment than all other clusters. Interestingly, cluster 1 also showed the highest average H3K27me3 signal at baseline, suggesting that retention of H3K27me3 preferentially occurs at high H3K27me3 sites.

Figure 2. Differential patterns of H3K27me3 at TSSs and H3K27me3 retention phenomenon
A. Heatmaps of the six H3K27me3 clusters at TSS regions (+/-1.5kb) reveal different patterns of mark changes upon EZH2 inhibition in KARPAS-422 cells. B. Average H3K27me3, EZH2, and H3K4me3 signal for each TSS cluster, with and without EZH2 inhibition, for the region +/-1.5kb of the TSS. TSSs in cluster 1 show increased EZH2 signal upon EZH2 inhibitor treatment. C. ChIP-seq signal for H3K27me3, EZH2, and SUZ12 are shown for representative retention and non-retention TSSs. D. Change in H3K27me3 vs change in EZH2 signal across all TSSs. Red indicates TSSs from cluster 1. Change is regularized log2 fold change. TSSs that accumulate EZH2 tend to retain more H3K27me3. E. Scatter plots of ChIP-seq signal downstream of each TSS for EZH2 and SUZ12, treated with DMSO or CPI-360. Red indicates TSSs in cluster 1. EZH2 and SUZ12 are strongly correlated both at baseline and after EZH2 inhibitor treatment.

PRC2 accumulates at H3K27me3-retaining TSSs and is lost from active TSSs upon EZH2 inhibition
To investigate the H3K27me3 retention phenomenon further, we determined the average H3K27me3, EZH2 and H3K4me3 levels within each H3K27me3 TSS cluster ( Figure 2B). H3K4me3 is highest in clusters 3 and 5, where H3K27me3 levels are relatively low downstream of the TSS, suggesting that these clusters comprise actively transcribing genes. H3K4me3 levels showed little change upon EZH2 inhibition, consistent with the average signals from aggregated TSSs ( Figure 1C), and the Western blot ( Figure 1A). Unexpectedly, EZH2 patterns changed significantly, with redistribution away from TSSs in clusters 3 and 5, and strong gain of signal in cluster 1 ( Figure 2B, middle column). The results suggest that the retention of H3K27me3 is associated with a marked accumulation of chromatin-bound EZH2. Examples of H3K27me3 retention and non-retention loci are shown in Figure 2C.
To address whether the correlation between EZH2 accumulation and H3K27me3 retention held true at individual transcript start sites, we plotted the H3K27me3 change upon EZH2 inhibition against the change in EZH2 signal for each TSS. The resulting scatter plots show that cluster 1 TSSs (red) contained most of the TSSs with an accumulation of EZH2 and the least reduction of H3K27me3 ( Figure 2D), while TSSs from other clusters showed greater H3K27me3 loss and no change or a loss of EZH2 ( Figure S2). Furthermore, a scatter plot of baseline H3K27me3 against relative change in EZH2 signal demonstrates that TSSs with high baseline H3K27me3 are more likely to accumulate EZH2, and most of these TSSs are within cluster 1 ( Figure S3). These results reiterate the tendency that upon EZH2 inhibition, EZH2 accumulation preferentially occurs at TSSs with higher H3K27me3 levels. H3K27me3 retention and EZH2 accumulation were confirmed with ChIP-qPCR at three retention loci (BHLHE41, KLHL29, and RUNX2) and two non-retention loci (ABAT1, MPEG1, Figure S4).
To determine whether EZH2 accumulates at TSSs on its own or as part of the PRC2 complex, SUZ12 signal intensities were plotted against EZH2 for each TSS ( Figure 2E). The occupancy levels of the two proteins at TSSs are highly correlated at both baseline (R=0.842) and upon EZH2 inhibition (R=0.961), and both proteins accumulate on cluster 1 TSSs upon EZH2 inhibition ( Figure 2E, right panel). The strong correlation between EZH2 and SUZ12 binding is consistent with a tight functional link between the two proteins and suggests that the PRC2 complex is redistributed throughout the genome in response to EZH2 inhibition.

Figure 3. EZH2 redistribution upon EZH2 inhibitor treatment and retention site analysis
A. Loss of H3K27me3 and redistribution of EZH2 at CpG islands and H3K27ac sites upon EZH2 inhibition. B. Emission matrix for ChromHMM-based definition of chromatin states to identify EZH2 retention C. Enrichment of retention state at genomic elements relative to average across the genome.
Fraction of each type of genomic element having retention state is shown in blue.

PRC2 preferentially redistributes to TSSs and CpG islands upon EZH2 inhibition
The striking accumulation of EZH2 and SUZ12 at a subset of TSSs upon EZH2 inhibition prompted us to investigate whether similar accumulation occurs at other genomic elements upon EZH2 inhibitor treatment. CpG islands and active enhancers were selected for the analysis due to the critical roles of these elements in transcription regulation. Active enhancers were identified as H3K27ac-positive regions. Interestingly, we saw a strong average increase of EZH2 signal at CpG islands and a strong loss of EZH2 at H3K27ac sites in EZH2 inhibitor treated cells ( Figure 3A, top panels). These results demonstrate that EZH2 accumulation occurs preferentially at a subset of genomic elements upon EZH2 inhibitor treatment. SUZ12 showed similar changes at these genomic elements ( Figure 3A, lower panels), suggesting a global redistribution of PRC2 complex upon EZH2 inhibitor treatment.
To investigate PRC2 accumulation and H3K27me3 retention genome-wide, the ChIP-seq datasets were analyzed with the chromatin state calling algorithm ChromHMM (Ernst and Kellis, 2012) . We defined retention sites as those genomic loci with a chromatin state having high probability of EZH2 and H3K27me3 in the EZH2 inhibitor treated sample ( Figure 3B). This definition identified 21,112 contiguous segments (144,720 200 bp segments) covering roughly 0.92% of the genome as being in the retention state in EZH2 inhibitor treated KARPAS-422 cells.
We tested for overlap of the retention state segments with various types of genomic elements, and confirmed that retention sites were highly enriched in CpG islands and TSSs relative to the rest of the genome ( Figure 3C). Gene bodies and regions marked with H3K27ac were not enriched, while exons showed slight enrichment. The presence of a TSS or CpG island, however, was not sufficient to establish a retention site: only 12.9% of CpG islands and 10.6% of TSSs overlapped at all with a retention site.

H3K27me3 retention sites are partially conserved across cancer cell models
The observed H3K27me3 retention and PRC2 redistribution in KARPAS-422 upon EZH2 inhibition raises the question of whether the retention of H3K27me3 and the redistribution of PRC2 is locus-specific, and whether it is common among cancer cell lines. To answer this question, we studied the distribution of H3K27me3 and EZH2 in 5 additional cell models: two DLBCL cell lines with activating EZH2 mutations (RL and SU-DHL-6) and three multiple myeloma cell lines (KMS28BM, KMS28PE, and RPMI8226) with wild type EZH2. ChromHMM analysis identified H3K27me3 retention sites upon EZH2 inhibition in all 5 cell models, suggesting H3K2.7me3 retention is a common phenomenon. Consistent with the observations in KARPAS-422 (which also has an activating EZH2 mutation), the retention sites in all cell lines are enriched in TSSs and CpG islands ( Figure 4A). Furthermore, many retention sites are conserved across cell lines, while others are found only in a single cell line (see examples in Figure S6A). To further examine the conservation of H3K27me3 retention and PRC2 accumulation across cell lines, we defined retention TSSs as TSSs within 2.5 kb of an H3K27me3 retention site. In KARPAS-422, 68% of retention TSSs are in cluster 1 and 22% are in cluster 2. Of the 5,379 retention TSSs observed in the three lymphoma cell lines, 436 are conserved, and 226 out of the 1,900 retention TSSs in the myeloma lines are common between the three cell lines ( Figure  4B, S7). Overall, 55 TSSs are conserved among all six cell lines ( Figure S6B). The conservation of retention sites across cell lines occurs more frequently than would be expected at random. Moreover, the better conserved retention sites showed higher enrichment over random background ( Figure S7C). Interestingly, however, fewer than 50% of the retention TSSs are conserved between KMS28BM and KMS28PE, two multiple myeloma cell lines derived from the same patient, suggesting that the formation of retention sites can also be dynamic as cancer cells evolve. MSIGDB analysis (Liberzon et al., 2011) of the 55 common retention TSSs showed a very strong overlap with PRC2 binding sites in embryonic stem cells ( Figure 4C), and the more conserved a retention TSS, the more likely that TSS is to be a PRC2 target sites in embryonic stem (ES) cells ( Figure 4D, S7B).

Many H3K27me3 retention TSSs are also marked with H3K4me3
Transcription start site regions can be monovalent -marked with either repressive H3K27me3 or activation-associated H3K4me3 -or in a bivalent state carrying both marks (Bernstein et al., 2006) Comparing H3K27me3 and H3K4me3 ChIP-seq signals in KARPAS-422, we found that some of the retention-site-containing TSSs are bivalent ( Figure 5A). To examine co-occurrence of H3K27me3 and H3K4me3, H3K4me3 signal was plotted against H3K27me3 signal immediately downstream of the TSS ( Figure 5B, retention genes in red): 34.8% of retention TSSs were bivalent, having average H3K4me3 and average H3K27me3 signals above our defined cutoffs.

Gene expression changes are not limited to H3K27me3-repressed genes
Using previously published RNA-seq data from KARPAS-422 cells treated with DMSO or EZH2 inhibitor (GSE62056, Bradley et al., 2014) , we examined gene expression levels for each cluster ( Figure S5A). Clusters most highly marked with H3K27me3 had the lowest gene expression, while clusters with the greatest H3K4me3 levels showed the highest gene expression.
EZH2 inhibition leads to up-and down-regulation of many genes, as has been reported previously (Bradley et al., 2014) . Up-regulation occurs in genes from all clusters ( Figure S5B and C). Down-regulation, on the other hand, is much more prevalent in the actively transcribed genes of clusters 3 and 5 ( Figure S5B and C), having high H3K4me3 and low H3K27me3. The top PRC2 recruitment genes are nearly all in cluster 1, but more of them are expressed, and more down-regulated by EZH2 inhibition, than other cluster 1 genes (data not shown). Bivalent (H3K27me3 + H3K4me3) genes are approximately three times more likely to be up-regulated than those marked only by only H3K4me3 or H3K27me3 (Table S4). Overall, the patterns of gene expression change show that the transcriptional effects of EZH2 inhibition are not limited to the upregulation of genes directly silenced by H3K27me3.

Germinal Center B-cell H3K4me3/H3K27me3 bivalent genes are highly enriched in retention genes
To explore the function of the retention genes, we performed gene set enrichment analysis against the MSIGDB C2 collection (Liberzon et al., 2011;Subramanian et al., 2005) , and identified conserved H3K27me3 and PRC2 component target genes in ES cells as enriched and conserved across all cell lines ( Figure 4C, D). Recent studies in GCB-DLBCL have identified additional gene sets that include GCB-specific H3K27me3 and bivalent targets, genes upregulated on EZH2 inhibition or knockdown, and genes upregulated on exit from the germinal center to form plasma/memory B cells (Béguelin et al., 2013) as well as H3K4me3/H3K27me3 bivalent genes in germinal center B cells (Béguelin et al., 2016) . We tested for enrichment of these categories among retention genes in different chromatin states ( Figure 5C, left), and identified high enrichment across multiple categories, with the highest enrichment for GCB-specific bivalent genes in our bivalent retained genes. Retained and bivalent retained genes also showed strong enrichment for plasma/memory cell signature genes and naive B cell bivalent genes, but not genes upregulated on EZH2 knockdown. The enrichment for GCB bivalent gene and plasma/memory cell signature genes is of particular interest, as it has been proposed that cells undergoing the germinal center reaction temporarily repress genes required for terminal differentiation by means of PRC2-mediated H3K27me3 addition to their promoters. Cancer cells harboring EZH2 activating mutations would hyper-repress these genes, thereby preventing the cells from exiting the proliferative state. The significant overlap between the KARPAS-422 bivalent retention genes and the GCB bivalent genes confirms the previously reported chromatin state at these genes in DLBCL, but also suggests that their hyper-repression in DLBCL is not readily lifted by EZH2 inhibition. Similarly, the enrichment of the plasma/memory cell signature amongst retention genes confirms previous reports of PRC2-mediated suppression of terminal differentiation in DLBCL, while suggesting a resistance to removal of the H3K27me3 mark in response to EZH2 inhibition.

Enrichment of GCB-bivalent and plasma/memory cell signature genes is conserved between sensitive and insensitive DLBCL models
As there is significant overlap between retention genes and the plasma/memory cell signature in KARPAS-422, we asked whether this is a common phenomenon in other DLBCL cell lines. RL is an EZH2-mutant carrying DLBCL cell line that is insensitive to EZH2 inhibition; interestingly, PRC2 redistribution and H3K27me3 retention were still observed in RL upon EZH2 inhibition, and about 65% (751/1156) of the retention genes in RL are also retention genes in KARPAS-422 ( Figure 4C). As in KARPAS-422, retention genes in RL show significant overlap with both the plasma/memory cell signature and with GCB bivalent genes ( Figure 5C, middle). In fact, genes that are retention genes in both RL and KARPAS-422 also show significant overlap with both the differentiation and GCB bivalent signatures. That both of these EZH2-mutant cell lines repress these genes with H3K27me3 is consistent with germinal cell biology and the model of hyper-repression in DLBCL. However, EZH2 inhibition kills KARPAS-422 cells but not RL cells. The similarity of PRC2 accumulation and H3K27me3 retention between these cell lines suggests that de-repression of H3K27me3-marked differentiation genes does not fully explain sensitivity to cell killing.

Retention of H3K27me3 and redistribution of PRC2
In this study we examined genome-wide changes in H3K27me3 and PRC2 upon EZH2 inhibitor treatment of cancer cell models. Consistent with previous data, we observed global loss of H3K27me3 upon inhibitor treatment; however, at a small subset of genomic loci a significant fraction of H3K27me3 is retained, accompanied by accumulation of PRC2. We also found that H3K27me3 retention and PRC2 accumulation tends to occur at genomic loci with high H3K27me3 level at baseline. Focal H3K27me3 retention has been reported in a variety of contexts, including EZH2 inhibition (Xu et al., 2015) , deletion of EZH2 in a mouse model, and up-regulation of NSD2 (Popovic et al., 2014) . Accumulation of PRC2 components has been previously reported upon NSD2 up-regulation (Popovic et al., 2014) , mutation of H3.3K27  , and epithelial-mesenchymal transition (Malouf et al., 2013) . Here, we report a striking level of focal PRC2 accumulation in the context of EZH2 inhibition.  (Oksuz et al., 2018) . The same mechanism of PRC2 recruitment to H3K27me3 and methylation of an adjacent histone tail could support maintenance of H3K27me3 domains in response to dilution with new nucleosomes during genome replication, or even removal of the histone mark by enzymatic demethylation. We postulate that during EZH2 inhibitor treatment, as the deposition of H3K27me3 is inhibited, these H3K27me3 domains can no longer be maintained across cell divisions. However, the H3K27me3-independent recruitment of PRC2 to the nucleation sites is still in effect. With the same amount of PRC2 available in the nucleus, and reduced binding sites available, PRC2 may accumulate on the remaining loci. This is similar to the mechanism proposed in a different context (high levels of the H3K27me3-competing H3K37me3 mark) by Popovic et al. (2014) : loss of accessibility to PRC2 binding to H3K27me3 would lead to an increase in nucleoplasmic free PRC2 concentration and greater recruitment to remaining binding sites. Our hypothesis is supported by the observation that many of the retention sites are found at CpG islands where PRC2 may be recruited by factors that may include histone-free linker DNA, MTF2, KDM2B-recruited PRC1 and JARID2 (Herz and Shilatifard, 2010;van Mierlo et al., 2019;Oksuz et al., 2018;Perino et al., 2019;Wang et al., 2017) . The presence of H3K27me3 at these PRC2 accumulation sites could be due to incomplete inhibition of EZH2 activity, combined with the greater levels of PRC2. As our EZH2 inhibitors are 100 fold more selective toward EZH2 than to EZH1 (Bradley et al., 2014) , it is also possible that the residual H3K27me3 in the cells after EZH2 inhibition at least in some part reflects the activities of EZH1-containing PRC2.
We observe a small peak of EZH2 or SUZ12 at TSSs in clusters 3 and 5, with actively transcribing genes. This is consistent with observations that PRC2 is recruited to nucleosome-free regions of chromatin (Wang et al., 2017) , and models that suggest PRC2 "samples" open chromatin throughout the genome, only writing H3K27me3 at inactive genes. Our observation of loss of that binding with EZH2 inhibition suggests that recruitment or residence time at active TSSs is impaired, either by loss of enzymatic function or by interference with a binding function at the active site.
Our observation that PRC2 accumulates preferentially at CpG islands upon EZH2 inhibition supports the hypothesis that these sites are the PRC2 nucleation sites in the genome. However, only 12.9% of the CpG islands are occupied by a retention site in EZH2 inhibitor treated KARPAS-422 cells, which may be explained by additional recruitment and retention requirements such as the state of DNA methylation and lack of gene transcription. Interestingly, our analysis showed that retention sites are better conserved among cell lines from the same tissue of origin, and are only moderately conserved between cell lines from different tissues. The observations point to a dynamic PRC2 nucleation site selection strategy; although PRC2 nucleates at CpG islands, different sets of CpG islands are utilized for PRC2 nucleation in different tissue types, reflecting the tissue-specific chromatin states and expression patterns that also influence recruitment.

Up-regulation of bivalent genes is independent of sensitivity to EZH2 inhibitor
Given that H3K27me3 is known to maintain gene repression, we might expect that EZH2 inhibition would lead to up-regulation of H3K27me3-marked genes. While we do see some up-regulation of H3K27me3-marked genes, there are many up-regulated genes that do not have H3K27me3, and many H3K27me3-marked genes that are not up-regulated --including some that are down-regulated. This broad range of up-regulation could be due to secondary effects of genes that are induced upon loss of H3K27me3, but could also be due to other mechanisms. We do see greater up-regulation of bivalent (H3K27me3 + H3K4me3) genes than genes having either mark alone, suggesting that some of the up-regulation is a direct effect of H3K27me3 reduction at poised genes. Down-regulation, on the other hand, is largely limited to genes that are not marked by H3K27me3, and is not explained by direct H3K27me3 targeting.
GCB-DLBCL often has activating mutations of EZH2, and/or high expression of EZH2. In the germinal center reaction required for normal immune function, there is rapid cell division accompanied by somatic hypermutation for antibody maturation; cells in the germinal center reaction are maintained in a proliferative state partly by PRC2-mediated repression of differentiation genes that lead to cellular differentiation into plasma and memory B cells (Béguelin et al., 2013 and references therein) . Many of these genes have bivalent promoters carrying both H3K4me3 and H3K27me3, reflecting the transient nature of their silencing. It has been proposed that the PRC2 activating mutation in GCB-DLBCL hyper-methylates the promoters of these differentiation genes, making the activation of these genes unlikely and trapping the cells in an indefinite proliferative state. EZH2 inhibitor treatment leads to partial depression of many of the differentiation genes; however, as both sensitive and insensitive GCB-DLBCL cell lines demonstrate similar retention of H3K27me3 at differentiation genes, loss of H3K27me3 leading to direct up-regulation of expression does not fully explain the variation in sensitivity to EZH2 inhibitors.

Author Contributions
The experimental work was conceived by PT and CCY. CCY designed and carried out the experimental work and conceived of the experimental/bioinformatics integration. Bioinformatic analyses and visualizations were designed and carried out by AJ, BB and GTK. Defining retention sites using chromatin state models was conceived by AJ. Work between Constellation and NUS was organised and coordinated by BB and GTK. CCY, AJ, GTK and BB wrote the initial drafts. All authors reviewed and edited the final manuscript.

Declaration of Interests
BB and PT are employees and shareholders of Constellation Pharmaceuticals. At the time this work was carried out, CCY was an employee and shareholder of Constellation Pharmaceuticals.
Normalization . For quantitative comparison of genome-wide mark patterns between DMSO and EZH2 inhibitor treated samples, Drosophila chromatin and Drosophila H2AV antibody (Active Motif 39715) were added to regular ChIP assays as described previously (Egan et al., 2016) . For visualization, the total ChIP-seq signal was scaled to equalize the Drosophila component across samples for a given antibody (Egan et al., 2016) . For data generated with fly chromatin spike-in, we created a combined human (hg19) and Drosophila (dm6) genome file from UCSC by combining the FASTA files [ http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit and http://hgdownload.cse.ucsc.edu/goldenPath/dm6/bigZips/dm6.2bit ]. Human chromosomes and Drosophila chromosomes were labeled differently to allow later separation. For ChIP-seq data generated without fly chromatin spike-in (H3K27me3 and EZH2 data for SU-DHL-6, KMS28BM, and KMS28PE, and EZH2 data for RPMI8226), we used normalization factors estimated based on western blot experiments. Only the human (hg19) genome file from UCSC was used for this data. Normalization factors for all ChIP-seq samples are listed in Table S2.
Mapping and coverage . FASTQ files containing 50-base-pair reads were mapped to either the combined human-fly genome or human genome using Bowtie 2 version 2.2.5 and compressed to BAM files using Samtools version 1.2; samtools was also used to sort and index the aligned read files. Duplicate reads were removed using the MarkDuplicates function in picard-tools, with lenient validation stringency. For reads mapped to the combined genome, mapped reads were then separated into reads mapped to human and fly chromosomes using the differential annotation used for the chromosomes. BamToBed from the Bedtools suite (version 2.24) converted the BAM files to intervals; bedtools' slop function along with chromosome sizes obtained from UCSC was used to extend the reads to the 200bp fragment size.
Visualizations . Normalized BEDGRAPH and BIGWIG files were generated using the normalization factors; details are in the supplemental methods. For viewing in IGV (Robinson et al., 2011;Thorvaldsdóttir et al., 2013) , BEDGRAPH files were converted to BIGWIG files using bedGraphToBigWig utility from UCSC. Within each IGV plot, y-axis scales are consistent across the genes for each mark. A heatmap for global mark levels was generated using ngsplot (Shen et al., 2014) . The program was modified to use custom normalization factors. Clustering was done using K means clustering. Average profiles were obtained by ngsplot, again modified to use custom normalization factors, based on the coverage from BAM files. Scatter plots of change of ChIP-seq signal at TSS used log fold change as described in the supplemental methods.

Model generation
Because the PRC2 components and H3K27me3 marks have broad and noisy signals, we chose to use chromatin states defined across multiple ChIP-seq tracks to assess chromatin state. We applied ChromHMM (version 1.0) (Ernst and Kellis, 2012) to the EZH2 and H3K27me3 tracks treated with DMSO or EZH2 inhibitor. Our code for running ChromHMM on our data is available upon request. H3K27me3 and EZH2 signal near TSSs showed patterns we were most interested in. Therefore we developed the ChromHMM models using ChIP-seq data within a region around TSSs starting 500 nucleotides upstream and going 2100 nucleotides into each gene. TSSs were defined using the transcription start site information from the hg19 refGene table from UCSC, created as described in supplemental methods. The txStart position was used for + strand genes, and the txEnd position was used for the -strand genes.
ChromHMM input files were generated from the BED files only containing isolated TSSs. From the original set of genes, we created BED files representing the range -500 to +2100 around each TSS. We filtered to keep only TSSs with non-overlapping regions; this step removed potentially confusing regions containing multiple TSSs. Minus stranded genes were flipped in order to minimize strand-bias. Each gene was taken as a separate chromosome so that segmentation by ChromHMM will occur at the same relative distance away from TSSs. These input files were binarized with BinarizeBed function then were used to generate a model using LearnModel function (with options -p 6 -s 123 -r 1000 -printposterior).

Model application
To apply the ChromHMM model genome-wide, we ran BinarizeBed on the full-genome ChIP-seq data. We then used MakeSegmentation (with option -printposterior) to apply the learned model to the genome-wide binarized data. The output segment file was filtered to keep only segments having posterior probability at least 0.8.

Retention site identification
Retention state was defined to be the state with high emission probability (at least 0.75) of either EZH2 or H3K27me3 mark upon EZH2 inhibition. All segments in the retention state were taken to be retention sites (loci). Retention genes were identified by finding genes with its -500 +2100 regions around TSS with retention sites.

Bivalent retention identification
To identify bivalent retention sites, the same pipeline was used with 8 input tracks (4 marks --H3K27me3, EZH2, SUZ12, and H3K4me3 --with and without EZH2 inhibition). Bivalent retention state(s) were defined as the state with high emission probability (at least 0.9) of both EZH2 and H3K4me3. All segments in the bivalent retention states were taken to be bivalent retention loci. Bivalent gene candidates were first identified by finding genes with its -500 +2100 regions around TSS with bivalent retention sites. This list of genes were compared to the retention genes list obtained earlier in the 2 marks analysis. The final list of bivalent genes were determined based on whether the bivalent gene candidate was also picked up by the earlier analysis as a retention gene. Emission probabilities of the retention states can be viewed in Table S1.
Finding overlap to enhancer sites, promoters, CpG islands Retention sites overlapping various genomic regions were identified by intersecting the respective files using bedtools intersect (Quinlan, 2014) . Fold enrichment was calculated based on the relative fraction of coverage by retention sites, using ChromHMM's OverlapEnrichment function.

RNA-seq data processing
KARPAS-422 cells treated with DMSO or EZH2 inhibitor were harvested and total RNA was isolated with Qiagen RNeasy Plus Mini kit (Qiagen 74136). Downstream sample processing, library generation and deep sequencing were carried out using the services of Ocean Ridge Biosciences (https://www.oceanridgebio.com/). RNA-seq data was processed in R/Bioconductor using library DESeq2. Gene features were obtained from TxDb.Hsapiens.UCSC.hg19.knownGene using exonsBy() with parameter by="tx". DESeq2 function summarizeOverlaps was run on the aligned BAM files using those exon features, Union mode, singleEnd=FALSE, ignore.strand=TRUE, and fragments=TRUE. DESeq was run on the dataset after specifying which samples were treated with EZH2i. The results were written to a CSV file for further analysis (Karpas_DESeq_output.R). Transcripts with base mean expression lower or equal to 9 were considered no expression. Transcripts with base mean expression higher than 9, but adjusted p-value higher than or equal to 0.1 were considered not differentially expressed. Transcripts with base mean expression higher than 9 and adjusted p-value lower than 0.1 were considered differentially expressed. Among them, if log2foldchange value was greater than 0, the transcripts were considered as upregulated. The rest were considered as downregulated.

Signature analysis
Web-based signature analysis was performed the MSigDB "Investigate Gene Sets" tool (Subramanian et al., 2005) at the Broad Institute, with default parameters and the C2 (curated gene sets) category (Liberzon et al., 2011) . In order to include additional gene sets, we downloaded the full gene sets from the Broad Institute, as well as supplemental data from the respective papers, and used the GeneOverlap (v.1.14.0) Bioconductor package (Shen, 2019) to calculate p values based on the Fisher's exact test, as well as an odds ratio for strength of association. When plotted as heatmaps, p values were converted to false discovery rate estimates based on the total number of tests, and plotted for tests with FDR < 0.1 and odds ratio >= 2. Figure S1 : Scatter plot of 10,000 randomly sampled 1kb regions throughout the genome, showing strong correlation between H3K4me3 signal regardless of treatment.   Violin plots of log2 fold change of gene expression. Clusters 1, 2 and 4 show more up-regulated genes than down-regulated genes, and clusters 3 and 5 show more down-regulated genes than up-regulated genes. Bottom: Pie charts showing fraction of genes that are up-or down-regulated in each cluster: no expression (white), no differential expression (gray), up-regulated (blue), down-regulated (red). The pie charts show that gene expression changes are not consistent within each cluster. (B) The table shows the number of genes in each pie chart. For numbers, see Table S3.   . The size of the intersection is shown as a vertical bar, and the identity of the intersecting sets is shown by the connected dots below the bars. Horizontal bars (left) show the total number of retention loci in each cell line. (B) Enrichment of H3K27me3 sites at retention loci were calculated for each intersection with a Fisher's exact test, followed by a Benjamini-Hochberg correction based on the number of intersections tested. Each bar is shown as the estimate of the Fisher's exact test (a form of enrichment over chance). Bars are shaded based on FDR < 0.05 (corrected for the number of distinct intersections tested). (C) Enrichment of intersections over chance. Chance intersections were simulated by taking random draws of the observed number of retention loci for six cell lines from a gene pool of the same size as the data. 100 such random gene sets were drawn, and intersection sizes calculated. The enrichment was calculated as (1 + observed intersection size)/(1 + max(chance intersection size)) to avoid dividing by zero.