A comprehensive single cell data analysis of in lymphoblastoid cells reveals the role of Super-enhancers in maintaining EBV latency

We probed the lifecycle of EBV on a cell-by-cell basis using single cell RNA sequencing (scRNA-seq) data from nine publicly available lymphoblastoid cell lines (LCL). While the majority of LCLs comprised cells containing EBV in the latent phase, two other clusters of cells were clearly evident and were distinguished by distinct expression of host and viral genes. Notably, both were high expressors of EBV LMP1/BNLF2 and BZLF1 compared to another cluster that expressed neither gene. The two novel clusters differed from each other in their expression of EBV lytic genes, including glycoprotein gene GP350. The first cluster, comprising GP350−LMP1hi cells, expressed high levels of HIF1A and was transcriptionally regulated by HIF1-α. Treatment of LCLs with Pevonedistat, a drug that enhances HIF1-α signaling, markedly induced this cluster. The second cluster, containing GP350+LMP1hi cells, expressed EBV lytic genes. Host genes that are controlled by super-enhancers (SEs), such as transcription factors MYC and IRF4, had the lowest expression in this cluster. Functionally, the expression of genes regulated by MYC and IRF4 in GP350+LMP1hi cells were lower compared to other cells. Indeed, induction of EBV lytic reactivation in EBV+ AKATA reduced the expression of these SE-regulated genes. Furthermore, CRISPR-mediated perturbation of the MYC or IRF4 SEs in LCLs induced the lytic EBV gene expression, suggesting that host SEs and/or SE target genes are required for maintenance of EBV latency. Collectively, our study revealed EBV associated heterogeneity among LCLs that may have functional consequence on host and viral biology. Importance Epstein-Barr virus (EBV) establishes a life-long latency program within host cells. As such, EBV immortalized lymphoblastoid cells (LCLs) often carry the latent EBV genome and only a small percentage of LCLs containing lytic EBV. However, the cellular programs that distinguish latent from lytic cells and the heterogeneity of cells in latent or lytic phases remains poorly explored. To explore these unknowns, we reanalyzed publicly available single cell RNA-seq data from nine LCLs. This approach permitted the simultaneous study of cells in both latent and lytic phases. We identified three cell populations with distinct lytic/latent activity and further characterized the transcriptomes of these cells. We also identified a new role of super-enhancers in regulating EBV lytic replication. Collectively, our studies revealed EBV associated heterogeneity among LCLs that contribute to EBV life cycle and biology.


Results 119
Single cell RNA-sequencing analyses resolve LCLs into three distinct populations 120 To better understand how EBV in infected cells spontaneously enter the lytic life cycle, 121 we analyzed publicly available single cell RNA-sequencing (scRNA-seq) data from nine LCLs 122 from several independent sources (see Methods) (33,(35)(36)(37)(38)(39). Briefly, we performed an unbiased 123 integrative analysis across all these LCLs after regressing for potential batch effects, doublets 124 and/or artifacts and known sources of heterogeneity, such as the stage of cell cycle using the 125 Seurat platform (40) (Figs. S1a-c-see methods). Unsupervised clustering of all 46,205 cells 126 according to expressions of both host and viral genes at three different resolutions yielded 127 several clusters (Fig. S1d). Further examination of these clusters based on the expression of 128 salient EBV genes, including GP350 and LMP1/BNLF2, and separation in UMAP space revealed 129 that these clusters fall into three major groups according to the status of EBV gene expression, 130 namely latent, early lytic and full lytic EBV cell clusters. The multiple clusters corresponding to 131 EBV in the latent state were recently examined thoroughly (33). Since our focus was mainly on 132 understanding the biology of EBV lytic life cycle, we combined all the latent cells into one 133 cluster, resulting in three major clusters (Fig. 1a). These were evident in all LCL datasets 134 examined (Figs. S1a-b). These clusters contained GP350 -LMP1 lo , GP350 -LMP1 hi and 135 GP350 + LMP1 hi cells, representing cells with EBV in the latent, early lytic and fully lytic states 136 (Figs. 1a, S1e). Approximately 50-100 host genes were differentially expressed in each cluster 137 compared to all other clusters (Figs. 1b, S1f and Table S1). Consistently, GP350 + LMP1 hi cells 138 was the cluster expressing the most EBV genes, while GP350 -LMP1 lo cells represented the 139 cluster showing the lowest expression of EBV genes (Figs. 1b-d). 140 GP350 -LMP1 lo cells comprised 94-98% of all LCLs across all the samples (Figs. 1a, 141 S1a-b). They displayed minimal expression of LMP1/BNLF2 and minimal or no expression of 142 EBV lytic genes, including GP350, BMRF1, BALF1 and BALF3 (Fig. 1c). Additionally, these 143 cells expressed latency genes, including EBNA1 and EBNA2, indicating that this cluster mainly 144 consisted of transformed cells that were in the EBV latent phase (Fig. 1d). This cluster was also 145 the highest expressor of genes from immunoglobulin heavy or light chains, indicating the mature 146 status of these transformed B cells (Figs. 1e, S1g). Consistently, nearly a quarter of cells in this 147 cluster expressed high levels of PRDM1, indicating that these cells might have entered 148 plasmacytic differentiation (41). 149 The two lytic clusters, GP350 -LMP1 hi and GP350 + LMP1 hi cells, respectively, each 150 accounted for 1-5% of all LCLs (Figs. 1a, S1b). The salient viral feature of both clusters was the 151 high expression of LMP1/BNLF2, a gene with well-established contribution to oncogenic human 152 B-cell transformation (42) and BZLF1. GP350 + LMP1 hi cells were the highest expressors of EBV 153 lytic genes, including GP350, BZLF1 and BMRF1, while GP350 -LMP1 hi cells express very few 154 lytic genes (Fig. 1c). Remarkably, these two clusters had distinct expressions of host genes (Figs. 155 1e, S1f). Consistent with previous reports (33, 34), GP350 + LMP1 hi cells highly expressed host 156 NFATC1, MIER2, SFN and SGK1 genes and were the highest expressors of host box-dependent 157 myc-interacting protein 1 (BIN1). Conversely, GP350 -LMP1 hi cells had the highest expression of 158 host genes HSPB1, ABCB10, MALAT1 and CD44 (Fig. 1e). 159 To obtain insights into the functional state of cells in each cluster, we performed geneset 160 enrichment analysis (GSEA), comparing the transcriptomes of cells in each cluster against those 161 of cells from the other two clusters (Fig. 1f), and querying enrichment of all 50 hallmark 162 genesets curated by the Molecular Signatures Database (MSigDB) (43). Genes differently 163 regulated in GP350 -LMP1 lo cells were significantly enriched in MYC targets, MTORC1 164 signaling and inflammatory response. Conversely, genes differently regulated in GP350 -LMP1 hi 165 cells were enriched in tumor necrosis factor alpha signaling, apoptosis and hypoxia (Fig. 1f). As 166 expected, genes differently regulated in GP350 + LMP1 hi cells were significantly depleted of 167 genesets from most hallmark pathways, including MYC targets, MTORC1 signaling and 168 interferon responses (Figs. 1f, S1h). This is consistent with the fact that fully lytic EBV 169 reactivation pauses transcription of most host genes and pathways, which is evidenced by 170 significantly reduced numbers of total host transcripts in lytic cells (Fig. S1i). 171 Collectively these analyses indicated that LCLs are predominantly comprised of three 172 distinct cell populations characterized by differences in expression of both host and viral genes, 173 notably cells containing EBV in the latent phase (GP350 -LMP1 lo ), cells containing virus in the 174 lytic phase (GP350 + LMP1 hi ) and cells that were in between lytic and latent phases (GP350 -175 LMP1 hi ). Furthermore, these data suggested that distinct functional states of individual LCL 176 clusters may be related to expression of genes encoded by EBV and the host cell. We next explored the transcriptional regulators of host gene expression. Our attention 182 was drawn to HIF1A because GP350 -LMP1 hi cells were high expressors of several genes 183 including HSPB1, MALAT1 and CD44 (Fig. 1e) that in other settings are known to be regulated 184 by hypoxia or HIF1-α (44-46) and because our GSEA analysis had also indicated that the 185 transcriptomes of these cells are highly enriched in the hypoxia gene set (Fig. 1f). HIF1-α is a 186 critically important TF that is tightly regulated by oxygen tension and transactivates many genes 187 essential for cellular responses and adaptation to hypoxia (47). To better characterize GP350 -188 LMP1 hi cells, we therefore quantified the mRNA expression of HIF1A, the gene that encodes 189 HIF1-α, and its classical direct target PDL1 (48) in all clusters. HIF1A and PDL1 were both 190 more highly expressed in GP350 -LMP1 hi cells compared to others (Fig. 2a). This was 191 specifically evident for HIF1A as its expression levels were significantly higher in GP350 -192 LMP1 hi cells (Figs. 2a, S2a). To determine whether the changes in HIF1A expression could have 193 any functional consequence, we next assessed the expression of HIF1-α target genes. We 194 sourced a public list (49) of HIF1-α-induced (n=110) and HIF1-α-repressed (n=77) genes from 195 MSigDB (50) and assessed the expression of these two sets in all 3 identified LCL clusters (Fig. 196 2b and Table S2). GP350 -LMP1 hi cells had the highest and lowest expressions among all 197 clusters for HIF1-α-induced and HIF1-α-repressed genes, expressed as the module score (51), 198 respectively ( Fig. 2b). We confirmed these findings using two additional independent lists of 199 HIF1-α-regulated genes (52) (Fig. S2b-c and Table S2). 200 We next predicted pharmaceutical agents that could induce the unique gene signatures of 201 cells in the GP350 -LMP1 hi cells, using methods established previously by our group (32,53). 202 Among the topmost significant drugs predicted to enhance host gene expression pattern of 203 GP350 -LMP1 hi cells was Pevonedistat (MLN4924) (Fig. 2c). Pevonedistat is a ubiquitin-204 activating enzyme E1 inhibitor that significantly stabilizes HIF1-α to potentiate its function (54). 205 Because the HIF1-α pathway was one of the main features of GP350 -LMP1 hi cells, we 206 hypothesized that enhancing HIF1-α signaling might preferentially induce this program. To test 207 this hypothesis, we treated LCLs with Pevonedistat and measured HIF1A, LMP1, PDL1 and 208 GP350. HIF1-α potentiation markedly induced mRNA expression of HIF1A, LMP1 and PDL1 209 ( Fig. 2d), but not GP350 or BZLF1 (Fig. 2e). To further substantiate these observations at the 210 single cell level, and confirm expression of protein, we treated three different LCLs with 211 Pevonedistat and performed flow cytometry. Pevonedistat reduced cell viability by nearly 30% 212 (Figs. 2f, S2e). The frequency of LMP1 + cells was significantly increased (Figs. 2g-h, S2f-g) 213 without increasing that of ZTA (Figs. 2i, S2h). The frequency of PDL1 + cells and PDL1 214 expression were also significantly increased upon treatment (Fig. 2j-k, S2i). We have also 215 performed dose titration of Pevonedistat and have observed dose dependent increase of LMP1 216 and PD-L1, but not BZLF1, in gated live cells (Fig. S3), suggesting that Pevonedistat 217 preferentially induce LMP1 + cells without increasing full lytic cycle. 218 219 GP350 -LMP1 lo LCLs have distinct MYC-dependent transcriptional programs 220 We next focused on transcriptional regulators of GP350 -LMP1 lo LCLs, the cluster 221 containing EBV in the latent phase. MYC-regulated genes were among the top affected pathways 222 when comparing transcriptomes of LCL clusters against each other (Fig. 1f) and box-dependent 223 myc-interacting protein 1 (BIN1) was one of the top host genes distinguishing GP350 + from 224 GP350cells (Fig. 1e). Moreover, MYC has been described as a key host factor repressing EBV 225 lytic reactivation (55). Thus, we further examined the role of MYC in shaping the distinct LCL 226 clusters. Because MYC is a transcription factor, we first determined the fraction of differently 227 expressed genes in each cluster directly bound by MYC. For this, we sourced a publicly 228 available ChIP-seq dataset for MYC in GM12878 (GSM822290, curated by ENCODE). Nearly 229 18-24% of genes differently expressed in each cluster were directly bound by MYC, with 230 GP350 + LMP1 hi cluster having the most number of MYC targets ( Fig. 3a and Table S2). This 231 was significantly higher than what would be expected by chance because only ~10% of all 232 human genes are bound by MYC in GM12878 (Fig. Sa). 233 The mRNA expression of MYC was significantly higher in GP350 -LMP1 lo LCLs than in 234 either of the other two clusters (Fig. 3b). To determine whether MYC is biologically active, we 235 looked for the signature of genes regulated by MYC. We curated a list of genes regulated by 236 MYC in GM12878 from a publicly available dataset (55) ( Table S2). Expression of MYC-237 induced genes was significantly higher (Fig. 3c, left panel) and MYC-repressed genes 238 significantly lower (Fig. 3c, right panel) in GP350 -LMP1 lo cells than in the other two clusters. 239 We also performed GSEA comparing transcriptomes of cells from each cluster against MYC 240 targets curated by MSigDB (43). Consistent with our earlier observation (Fig. 1f), genes that 241 were more highly expressed in GP350 -LMP1 lo cells were highly enriched in MYC targets ( Fig.  242 3d, left and right panels), while there was no significant difference between LMP1 hi clusters 243 ( Fig. 3d, middle panel). Collectively, these data indicated that MYC preferentially regulates a 244 subset of genes that are differently expressed in GP350 -LMP1 lo LCLs. 245 246 Super-enhancer-regulated genes are less highly expressed in GP350 + LMP1 hi LCLs 247 EBV-infected cells periodically enter the lytic phase to produce progeny viruses but in 248 EBV-immortalized lymphoblastoid cells EBV is mostly in the latent state. Earlier studies have 249 shown that a small percentage of these cells are lytic (30). However, due to the technical 250 challenges at the time, it was difficult to distinguish the cells in lytic phase from cells at latency 251 phase in a mixed population. The recent development of scRNA-seq techniques allows us to 252 capture the cells in lytic phase together with their transcriptome. 253 Nearly 10% of genes are regulated by multiple enhancers forming a complex 254 architecture known as "super-enhancers" (SEs). SE-regulated genes are critically important for 255 cell identity (27) and are associated with both Mendelian and polygenic diseases (56, 57) as well 256 as cancers (58). Enhancer-promoter interactions are the cornerstones of mammalian gene 257 regulation. We have previously shown that EBV episomes make reproducible contacts with the 258 human genome at SE loci (23). To explore whether EBV disrupts modes of gene regulation in 259 the three LCL subsets, we sourced a list of 257 annotated SE regulated genes from GM12878 260 (26) and determined whether these genes are differently expressed in the three identified LCL 261 clusters. Unexpectedly, expression of SE-regulated genes, summarized as the module score, was 262 significantly lower in GP350 + LMP1 hi cells, the cluster containing EBV in the lytic state, than in 263 the other two subsets (Fig. 4a). We observed similar results when we used an independent 264 curated set of 187 EBV-associated SEs (22) (Fig. S4a). Examples of such genes included MYC, 265 which contains one of the largest SEs in the genome (22), IRF4, RUNX3, PAX5, IKZF3 and 266 DUSP22 (Fig. 4b). These findings suggested that lytic EBV is associated with disruption of 267 expression of host genes regulated through SEs. To explore this possibility, we performed GSEA 268 analysis comparing the transcriptomes of GP350 + LMP1 + cells to cells from the other two 269 clusters for enrichment of all SE-regulated genes. This orthogonal approach also indicated that 270 genes less highly expressed in GP350 + LMP1 + cells compared to the cells in the other two 271 clusters were markedly enriched in SE-regulated genes (Fig. S4b). 272 We next assessed whether genes in these clusters were differently expressed when their 273 associated SEs physically interact with EBV episomes. To this end, we divided SE-regulated 274 genes into those that physically interact, or not, with EBV episomes and performed GSEA 275 analysis comparing GP350 + cells to the cells of other two GP350subsets. Genes that were more 276 highly expressed in GP350cells were significantly enriched in SEs that interact with EBV 277 episomes (Fig. 4c, left panel). This enrichment was less evident for SEs that do not interact with 278 EBV episomes (Fig. 4c, right panel). To determine the functional consequences of differential 279 expression of SE-regulated genes across LCL clusters, we focused on the transactivator IRF4 and 280 transcription factor RUNX3 for which we could source their direct targets from ChIP-seq 281 experiments and assess the expression of their targets. We noted that expression of both IRF4-282 and RUNX3-bound genes, summarized as the module score, was significantly lower in 283 GP350 + LMP1 hi cells (Figs. 4d, S4c), in which expression of both these TFs was also the lowest 284 ( Fig. 4b). 285 Since GP350 + LMP1 hi cells represented the cluster in which lytic reactivation of EBV was 286 apparent (Fig. 1e), we tested whether EBV reactivation affects the expression of SE containing 287 genes. To this end, we treated EBV + AKATA cells with either anti-IgG or carrier. Anti-IgG is a 288 potent inducer of EBV lytic reactivation in these cells (59). After stimulation, we measured 289 mRNA and/or protein expression of EBV lytic markers and the host SE-regulated gene MYC 290 (Figs. 4e-h). As expected, anti-IgG induced strong expression of the EBV lytic markers RTA, 291 ZTA and BMRF1 (Fig. 4e). In contrast, the expression of both MYC and IKZF3 were 292 significantly repressed following anti-IgG-treatment of cells (Fig. 4f). This effect was 293 specifically a predicate of EBV-reactivation since anti-IgG did not repress MYC or IKZF3 294 expression in EBV -AKATA cells (Fig. 4g). These observations were further confirmed by 295 immunoblots of ZTA, BMRF1 and MYC proteins (Fig. 4h). Consistently, a recent study has 296 shown that depletion of MYC reactivates the EBV lytic cycle (55). To test whether depletion of 297 IRF4 can similarly reactivate EBV lytic cycle, we reanalyzed RNA-seq from GM12878 LCLs 298 that were subjected to IRF4 deletion via the CRISPR/Cas9 system (29). In this setting, depletion 299 of IRF4 induced multiple EBV lytic genes, including GP350, RTA, ZTA and BMRF1 (Fig. S4d). 300 Consistently, a recent study has found that IRF4 knockdown in LCLs induces EBV lytic 301 reactivation in LCLs and lytically infected cells have increased NFATc1 and decreased IRF4 302 expression (34). Collectively, our data suggest that SE-regulated genes are less highly expressed 303 in GP350 + LMP1 hi LCLs, which show evidence of lytic EBV reactivation, and that experimental 304 induction of EBV lytic cycle also represses expression of these genes. 305 306

Disruption of super-enhancers in LCLs induces EBV lytic reactivation. 307
The reciprocal relationship between expression of SE-regulated genes in GP350 + LMP1 hi 308 LCLs and EBV reactivation suggested the possibility that SEs may be necessary for maintenance 309 of EBV in the latent phase. To test this possibility, we initially selected SEs near MYC and 310 IRF4/DUSP22 and performed CRISPR-mediated knockout or inactivation and then measured the 311 expression of EBV lytic markers. For these experiments, appropriate guide RNAs were situated 312 within the SEs at sites of maximal H3K27ac signal, a marker of active regions of the genome, 313 especially promoters and enhancers. The selected sites were bound by one or more viral 314 transcription factors (e.g. EBNA2, LP, 3A and 3C) and/or host NF-κB family members (e.g. 315 RelA, RelB, cRel, p50 and p52) and interacted within a topologically associated domain that 316 contained the SE (Figs. 5a, 5c). Dual sgRNAs targeting both sides of MYC SE (~525 kb 317 upstream) successfully deleted MYC SE from the genome (Fig. S5a), which led to a reduction of 318 MYC transcription and upregulation of EBV lytic genes, namely ZTA, RTA, BGLF5 and 319 BMRF1 expression (Fig. 5b). Similarly, inactivation of IRF4 SE by CRISPR-dcas9 tethered with 320 a repressor consisting of KRAB and the transcription repression domain of MeCP2 successfully 321 reduced IRF4 SE activity (Fig. S5b). This led to decrease in IRF4/DUSP22 mRNA expression 322 and a significant increase of EBV lytic gene expression two days post inactivation (Fig. 5d). 323 Deletion of both MYC and IRF4 genes have been previously shown to induce EBV lytic phase. 324 Unexpectedly, however, when we measured the expression of EBV lytic genes at earlier time 325 points, we observed that EBV lytic genes were significantly induced prior to decrease of IRF4 326 (Fig. S5c). This suggests the possibility that SE might be also necessary for the maintenance of 327 EBV latency. To further explore this possibility, we selected another SE in the same topological 328 associated domain of MIR155HG (Fig. 5e), using the same criteria as above and performed 329 CRISPR-mediated inactivation. The disruption of this SE did not significantly reduce the 330 expression of MIR155HG; however, it significantly increased the expression of EBV lytic genes 331 ( Fig. 5f), namely ZTA, RTA, BGLF5 and BMRF1. CRISPRi disruption of RUNX3 SE also had 332 similar activity (Fig. 5g, h). Collectively these data indicate that deletion of select host SEs leads 333 to lytic reactivation of EBV and, by extension, that host SEs, or their target genes, are necessary 334 for maintenance of EBV in the latent phase. 335

336
Discussion 337 338 LCLs have been instrumental for genetic and functional studies of human diseases over 339 the past several decades (60). We and others have previously analyzed large numbers of LCL 340 bulk RNA-seq data and found that EBV lytic gene expression correlates with cellular cancer-341 associated pathways, such as interferon-alpha, WNT and B cell receptor signaling (4, 30). 342 However, these data were generated from bulk populations of cells, which biases insights 343 towards those occurring in the largest sub-populations. While the majority of LCLs contain EBV 344 in the latent phase of its life cycle, a small fraction (<5%) demonstrate spontaneous EBV 345 reactivation, indicating that LCLs as a whole are heterogeneous. Important aspects of LCL 346 heterogeneity have recently been explored using single cell RNA-sequencing (33). This analysis 347 focused on heterogeneity within and across LCLs with respect to immunoglobulin isotypes, 348 which further associated with pathways involving activation and differentiation of B cells. We 349 have taken an integrative approach to combine the same data with several more datasets that are 350 generated across different conditions and eliminate batch and technical effects. This integrative 351 analysis provides a consistent representation of the data for downstream analyses and thus has 352 the potential to uncover previously undetected biology. Specifically, we found LCLs to have 353 higher heterogeneity in relation to the EBV status than previously appreciated. Specifically, we 354 identified three prominent clusters that were marked by the expression of the EBV genes GP350 355 and LMP1/BNLF2. Of note, the EBV genome has extensive numbers of overlapping genes such 356 as LMP1 and BNLF2a/b, making the quantification of their mRNA expression more challenging 357 (61). This challenge could be further exacerbated by the 3' mRNA capture bias in some of the 358 current single cell technologies. Nevertheless, we showed that these clusters have distinct 359 transcriptional programs and identified MYC and HIF1-α as transcriptional regulators of gene 360 expression. LCLs in the GP350 + cluster expressed SE-regulated genes at significantly lower 361 levels compared to cells in the other two clusters. Physical interactions between SE-containing 362 loci and EBV episomes marked genes in GP350 + LCLs that were particularly lowly expressed. 363 Indeed, in proof-of-principle experiments we found that experimental lytic reactivation of EBV 364 disrupted expression of SE-regulated genes and, conversely, that disruption of SEs induced EBV 365 lytic reactivation. For IRF4 or MIR155HG associated SEs, lytic reactivation after SE disruption 366 occurred prior to IRF4 downregulation, suggesting that these SEs themselves might be necessary 367 for the maintenance of EBV latency. However, further studies are needed to fully discern this 368

observation. 369
In the largest subset of LCLs, annotated as GP350 -LMP1 lo , EBV was clearly in the latent 370 phase. This cluster showed a host gene signature enriched in MYC-regulated targets. As an 371 oncogene, MYC is exquisitely carefully regulated by an archetypal SE. MYC itself binds to the 372 EBV genome origin of lytic replication and suppresses DNA looping to the promoter of the lytic 373 cycle initiator gene BZLF1 (55). MYC depletion reactivates the lytic cycle in different cells (55). 374 Consistent with this, when we deleted the MYC SE, MYC expression was decreased and EBV 375 lytic genes were induced, supporting the role of MYC as a repressor of EBV lytic activation. 376 Thus, it appears that both the MYC gene and its associated SE have a role in maintenance of 377

EBV latency. 378
Another GP350 negative cluster, characterized by high expression of LMP1/BNLF2, was 379 the highest expressor of several host genes including HSPB1, MALAT1 and CD44 that are known 380 features of cancer stem cells (62-64) and escape from apoptosis (65, 66). Interestingly, LMP1 381 alone induces CSC features in nasopharyngeal cell lines (67). However, such characteristics have 382 not previously been ascribed to LCLs and warrant further investigation. LMP1 is a known 383 oncogene and expressed in most EBV associated cancers (68) and has been previously associated 384 with synthesis of HIF1-α protein and its DNA binding activity (69). Here, we also found that 385 GP350 -LMP1 hi LCLs have a prominent HIF1-α signature and could be preferentially induced by 386 Pevonedistat. These cells also expressed higher frequencies of PDL1, which were markedly 387 increased upon Pevonedistat treatment. Interestingly, a recent study has found an association correlation between specific immunoglobulin isotype and cell differentiation markers (33). 400 However, these immunoglobin genes were not specifically characterized in lytic cells. We have 401 found that the mRNA expression of PRDM1 and a range of immunoglobulin genes in 402 GP350 + LMP1 hi cells was lower than in latent cells, which contrasts with previous reports about 403 the role of PRDM1 in EBV lytic reactivation (Fig. S1e). It is possible that PRDM1 is important 404 for initiation, but not maintenance, of the lytic cycle. Another possibility is that the transcription 405 factor activity, but not the overall expression level, of PRDM1 is important for lytic replication. 406 Further study is clearly required to delineate this relationship. 407 In summary, we performed integrative analysis of publicly available single cell RNA-seq 408 data from different LCLs to help resolve their heterogeneity. We identified a novel cluster of 409 cells that are between lytic and latent stage, marked by LMP1 and controlled by HIF1a. We also 410 found that the mRNA expression of super-enhancer target genes is inversely correlated with lytic 411 status of the cells and consistently CRISPR perturbation of super-enhancers increased the 412 expression of EBV lytic genes. Our studies revealed EBV associated heterogeneity among LCLs 413 that contribute to EBV life cycle and biology. GM12878, AKATA EBV positive and AKATA EBV negative cells were cultured in RPMI1640 436 (VWRL0105-0500) media supplemented with 10% fetal calf serum (Gibco or Hyclone), 437 100 unit/mL streptomycin and 100 mg/mL penicillin (Gibco or Life Technologies). HEK293T 438 cells purchased from ATCC were cultured in Dulbecco modified Eagle medium supplemented 439 with 10% fetal calf serum (Gibco), 100 unit/mL streptomycin and 100 mg/mL penicillin. All the 440 cells were maintained at 37 °C in a 5% CO 2 humidified chamber. Cells were routinely confirmed 441 to be mycoplasma negative according to PCR Mycoplasma Detection Kit (ABM Inc.) and were 442 used at low passage (<10) number but were not independently authenticated.

444
CRISPRi repression 445 Plasmid dCAS9-KRAB-MeCP2 (#110821) purchased from Addgene was packaged with 446 lentiviruses and used to transduce LCLs for 2 days followed by selection with 5ug/ml 447 blasticidin for another 5 days. The expression of dCAS9-KRAB-MeCP2 was verified by western 448 blot. sgRNAs targeting genomic sites of interest were designed with online software Benchling 449 (www.benchling.com). sgRNAs were annealed and cloned into LentiGuide-Puro vector 450 according to previously published protocol (76). LentiGuide-Puro containing sgRNAs were 451 packaged into lentiviruses and were used to transduce LCLs stably expressing dCAS9-KRAB-452 MeCP2. Cells were selected with 3ug/ml puromycin for 3 days and then allowed to grow for 453 another 2 days. The list of sgRNAs are provided in Single-cell RNA sequencing analysis 519 10x Genomics Cell Ranger 6.0.2 count (77) was used to align the raw sequencing reads to a 520 customized human (GRCh38) and EBV (NC_007605, obtained from (61)) hybrid reference 521 genome to generate barcode and UMI counts. Seurat (v4) (78) was applied for the downstream 522 analysis and visualization of the data as following: Genes that were expressed in less than 3 cells 523 were discarded. Cells with >20% of their unique molecular identifiers (UMIs) mapping to 524 mitochondrial genes or cells with <250 detected genes were discarded. Only cells with >80% 525 log10 (Genes per UMI) were retained. Cell cycle score for each cell was calculated 526 by Seurat function CellCycleScorin using human cell cycle genes. SCTransform was then 527 used to normalize the dataset using default parameters while regressing out 528 mitochondrial genes and cell cycle scores (S and G2M) and identify variable genes. Doublets 529 were removed by R package DoubletFinder (v2.0) (79). Then, the datasets were integrated based 530 on "anchors" identified among datasets (nfeatures = 2000, normalization.method = "SCT") prior 531 to perform linear dimensional reduction by Principal Component Analysis (PCA). The 532 top 50 PCs were included in a Uniform Manifold Approximation and Projection (UMAP) 533 dimensionality reduction. Clusters were identified on a shared nearest neighbor (SNN) graph the 534 top 50 PCs with the Louvain algorithm using three resolutions (i.e., 0.1, 0.3, 0.5). the clusters 535 corresponding to latent EBV life cycle were combined as one cluster for the downstream 536 analyses. Differential gene expression was determined by "FindMarkers" function on SCT 537 normalized expression values with the default Wilcox Rank Sum test either as one versus rest or 538 as a direct comparison with default parameters except logfc.threshold = 0. The cell annotation 539 was based on the EBV genes and top differentially expressed genes. Gene list module scores 540 were calculated with Seurat function AddModuleScore (51). This calculates the average scaled 541 expression levels of each gene list, subtracted by the expression of control feature sets (n= 542 100). All the displayed expression values on violin plots, feature plots and dot plots are SCT 543 normalized expression values. The IRF4 bound genes are sourced from the ChIP-Atlas "Target 544 Genes" database (80) with options: "hg38" as the genome and "+/-5kb" as distance from 545 TSS. Target genes with binding score not less than 500 in GM12878 cells are selected. All 546 genesets used in this study are provided in Table S2. 547 548 Gene set enrichment analysis (GSEA) 549 GSEA was performed using pre-ranked mode and "No Collapse" options. The pre-ranked gene 550 lists were ranked by the SCT normalized expression fold-change between comparison groups. 551 EBV-contacted and EBV-non-contacted genesets are curated from our previous study (23) and 552 provided in Table S2.