Mesenchymal-epithelial transition in lymph node 1 metastases of oral squamous cell carcinoma is 2 accompanied by ZEB1 expression 3

Oral squamous cell carcinoma (OSCC), an HPV-negative head and neck cancer, frequently 17 metastasizes to the regional lymph nodes but only occasionally beyond. Initial phases of 18 metastasis are associated with an epithelial-mesenchymal transition (EMT), the consolidation 19 phase is associated with mesenchymal-epithelial transition (MET). This dynamic is referred to 20 as epithelial-mesenchymal plasticity (EMP). While it is known that EMP is essential for cancer 21 cell invasion and metastatic spread, less is known about the heterogeneity of EMP states within 22 a tumor and even less about the heterogeneity between the primary and metastatic lesions. lymph node metastases. The lack of E-cadherin mRNA expression suggests this is a partial 41 MET. This study reveals that EMP enables different partial EMT and epithelial phenotypes of OSCC 44 cells, which are endowed with capabilities essential for the different stages of the metastatic 45 process, including maintenance of cellular integrity. During MET, ZEB1 appears to be 46 functionally active, indicating a more complex role of ZEB1 than mere induction of EMT.

a tumor and even less about the heterogeneity between the primary and metastatic lesions. 23

Methods 24
To capture heterogeneity of EMP states in OSCC, we performed single-cell RNA sequencing 25 (scRNAseq) of 5 primary tumors and 9 matching lymph node metastases and re-analyzed 26 publicly available scRNAseq data of 9 additional primary tumors. To account for possible bias 27 in cell type compositions by scRNAseq, these were also deconvoluted from bulk transcriptome 28 analyses. Protein expression of selected genes were confirmed by immunohistochemistry. 29

Results 30
From the 23 OSCC lesions the single cell transcriptome of a total of 7,263 carcinoma cells 31 was available for in-depth analyses. We initially focused on one lesion to avoid inter-patient 32 heterogeneity as a confounding factor and identified OSCC cells expressing genes 33 characteristic of different epithelial and partial EMT stages, such as keratins and SPRR1B 34  Figure 2). CNVs were inferred using the R package inferCNV version 1.6.0 with the not 217 normalized, filtered count matrix including all cells as input and algorithm run in "samples" 218 mode . Inferred CNVs of mitochondrial genes were excluded. Differential gene expression was 219 performed by calculating the log2 foldchanges between one cluster and all other cells from the 220 subset based on log-normalized data using NormalizeData function and scale factor of 10,000. 221 We filtered for genes with log2 foldchange greater than 0.25 and a minimum percentual 222 expression of at least 10 % within the cluster or all other cells. For calculating the cosine 223 similarity, we did not filter the log2 foldchanges. GSEA was performed using the "fgsea" R 224 package version 1.16.0, log2 foldchanges from differential gene expression and gene ontology 225 biological processes (GO:BP,C5 v7.1) as well as hallmark gene sets (H, v7.1) downloaded 226 . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. ; https://doi.org/10.1101/2022.02.03.478962 doi: bioRxiv preprint from MSigDB database (24-26). Gene sets were included if they have at least 15 genes or at 227 maximum 500 genes within the gene set using 10,000 permutations. 228 For deriving epithelial differentiating and pEMT gene signatures of patient 1 we calculate 229 foldchanges between the "epi" and "pEMT" cluster and included genes with log2 foldchanges 230 greater or lesser than 1, respectively, and with at least 10 % of either epi or pEMT cells 231 expressing that gene. Gene set variation analysis (GSVA) was performed using the R package 232 GSVA version 1.38.2 with default settings, i.e., gaussian kernel (27). As input, EMTome 233 signatures, the pEMT and epithelial differentiation 1 and 2 signature from Puram et al. (8) and 234 the three EMT and the epithelial senescence signatures from Kinker et al. were used (12). 235 Trajectories were inferred using SlingShot version 1.8.0 with malignant cell clusters as shown 236 in Figure 2A of patient 1 and the first 20 PCs (28). RNA velocity was inferred using the VeloCyto 237 python and R package (version 0.6) (29). Creation of the loom file was done using default 238 options and gene annotations as used for Cellranger processing. Genes were filtered based 239 on the minimum average expression magnitude with a threshold of 0.05 for spliced and 0.02 240 for unspliced reads. Velocity estimates were calculated using as distance the inverse 241 correlation coefficient of the PC embedding correlation matrix, the top/bottom 2 % quantiles for 242 gamma fit, 50 neighboring cells projecting 1 deltaT into the future and projected on the UMAP 243 using 200 neighbors and 50 grid points. 244 Transcription factor activity was inferred using the VIPER algorithm (version 1.24.0) and 245 regulons from the DoRothEA database (version 1.2.2) (30-32). Hierarchical clustering in the 246 heatmaps was performed using Euclidean distance and ward.D2 method unless otherwise 247 noted. Visualization was performed using ggplot2 version 3.3.3 and ComplexHeatmap version 248 2.6.2 (33, 34). 249

Reanalysis of HNSCC dataset from Kürten et al. 250
From the publicly available scRNAseq data set on primary HNSCC tumors we downloaded the 251 FASTQ files of CD45-negative and HPV-negative libraries from the sequencing read archive 252 (SRA) under accession ID SRP301444 (15). The data sets were analyzed the same as 253 . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. ; https://doi.org/10.1101/2022.02.03.478962 doi: bioRxiv preprint described above. HPV-negative samples were chosen for comparability to the OSCC dataset. 254 However, the HN07 tumor originated from the larynx, while all other samples indeed originated 255 from the oral cavity. We adjusted the cell identification thresholds based on our evaluation 256 criteria pooled for all libraries: cells were defined by having more than 175 genes expressed, 257 less than 10 % mitochondrial gene expression and more than 60 housekeeper genes 258 expressed (see Additional file 1). 259 . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. ; https://doi.org/10.1101/2022.02.03.478962 doi: bioRxiv preprint

Single-cell gene expression signatures of tumor cells from a single metastasis show 261
several predominant, but not necessarily exclusive, functional phenotypes. 262 To avoid inter-patient heterogeneity as a confounding factor, we first focused on the analysis 263 of a single OSCC metastasis to develop hypotheses which would be subsequently tested in 264 the entire cohort. For this, we chose a metachronous lymph node metastasis that was removed 265 one year after the primary tumor, because we assumed that consolidation processes are of serine and glycine metabolism into glycolysis and therefore fuel glycolysis with amino acids 303 (38); hence, these cells appear to be adapted to low-glucose conditions. 304

OSCC cells in lymph node metastases undergo mesenchymal-epithelial transition 305
We next focused on possible dynamics within the predominantly EMP-related cancer cell 306 clusters using a higher resolved shared-nearest neighbor (SNN) clustering. We defined 4 307 pEMT clusters (pEMT-1 to 4), 4 clusters of more epithelial differentiated cells (epi-1 to 4) and 308 one cluster with mixed phenotypes (mix) (Figure 2A). pEMT-1 is enriched for genes involved 309 in coagulation and play a role in angiogenesis as THBS1, CYR61 and F3 ( Figure 2B, 310 Supplementary Figure 4A). pEMT-2 and 3 showed higher expression of extracellular matrix 311 (ECM) remodeling genes, but pEMT-2 also higher expressed cytokeratin KRT15 and 312 chemokine CXCL14 while pEMT-3 higher expressed the serine protease inhibitor SERPINA1 313 . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. ; https://doi.org/10.1101/2022.02.03.478962 doi: bioRxiv preprint and podoplanin (PDPN) which mediates efficient ECM degradation by controlling invadopodia 314 (39). Of the more epithelial differentiated cell clusters, epi-2's expression profile is closest to 315 the pEMT cluster: higher expression of MMP1 and lower expression of SPRR1B and 316 S100A8/A9 than epi-1, 3 and 4 ( Figure 2B, E). Epi-3 showed higher expression of S100A7 and 317 KRTDAP, whereas epi-4 has higher expression of kallikreins (KLK6/7), prostate stem cell 318 antigen (PSCA) and adipogenesis regulatory factor (ADIRF). Both ADIRF and PSCA play a 319 role in prostate cancer and PSCA is also reported as highly expressed in mucosal tissue, but 320 less in HNSCC (40-42). 321 To gain a better understanding on the gene expression dynamics we estimated RNA velocity, 322 which predicts the short-term future development in gene expression of individual cells using 323 the ratio of spliced and non-spliced mRNA counts ( Figure 2C). This analysis revealed that

Intra-tumoral heterogeneity of OSCC is driven by EMP 338
To test whether our observation that OSCC cells in one lymph node metastasis undergo a 339 mesenchymal-epithelial transition is generally valid we extended our analyses by adding 5 340 . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. ; https://doi.org/10.1101/2022.02.03.478962 doi: bioRxiv preprint primary tumors and 8 matched lymph node metastases from 6 patients (Table 1). From the 341 publicly available scRNAseq data set on primary HNSCC tumors published by Kürten et al. 342 (15) we chose to include the 9 HPV-negative primary tumors, which were all but one originating 343 from the oral cavity (HN07 originated from the larynx) in our analysis. In total, we analyzed 344 7,263 cancer cells from 16 different patients ( Figure 3A). Importantly, the frequency of cancer 345 cells was unevenly distributed across samples, which could not be explained by differences in 346 tumor cell content across samples as determined by histopathology (Supplementary Figure 1). 347 Since the recovery of cells by scRNAseq is also low, the stability of epithelial tumor cell 348 assemblies, which may not be sufficiently broken up by dissociation protocols, likely interfered 349 with the generation of OSCC single-cell suspensions. In addition, as expected from the inter- to interpret due to differences in cell numbers, and patient-specific gene expression patterns. 360 To confirm the validity of our scRNAseq data, we examined the bulk transcriptome and 361 deconvoluted the respective cell types for our 18 OSCC samples (Supplementary Figure 6). 362 Using this approach, we observed a higher tumor and stroma content and less lymphocytes, 363 which was expected as scRNAseq is known to be biased towards leukocytes. Therefore, we 364 chose an alternative approach: to compare the EMP-related intrapatient heterogeneity of all 365 analyzed tumor cells between patients, we calculated the similarities of the differential gene 366 expression within each patient of all respective clusters of all patients. We first considered each 367 patient individually and performed clustering, annotation, inferCNV and differential expression 368 . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. patterns with predominantly immune-and metabolic-related clusters. In epithelial differentiated 376 cells, the most upregulated genes are S100A8 and S100A9, encoding calprotectin, and 377 SPRR1B, all members of the epidermal differentiation complex (44). Of note, hypoxia-and 378 stress-related heterogeneity is similar between patients suggesting a reactive response rather 379 than an aspect of tumor evolution. 380

EMT-related transcription factor ZEB1 is highly active in metastatic epithelial 381 differentiated OSCC cells 382
The transcription factors ZEB1/2, TWIST1/2, Snail (SNAI1) and Slug (SNAI2) are key for 383 regulation of EMP [7, 27]. While mRNA expression of SNAI2 within single OSCC cells was 384 recently reported, the others were not detected (8). Here, we confirm this observation as we 385 detected SNAI2 mRNA in almost half of the OSCC cells ( Figure 4A). However, detection of 386 low expressed genes such as transcription factors by scRNAseq, especially in 10X genomics 387 technology, becomes unreliable due to dropout effects. Also, the activity of transcription factors 388 is often not reflected by the dynamics of their mRNA expression alone, as their activity depends 389 additionally on protein stability and posttranslational modifications; for example, ZEB1 protein 390 is more stable than Snail (45). To circumvent this problem, we inferred the activity of these 391 transcription factors based on the mRNA expression profile of their target genes using the 392 algorithm VIPER with regulons defined by DoRothEA database (30-32). Using this approach, 393 we were able to detect high activities of ZEB1, ZEB2, Snail and Slug in OSCC cells of different 394 patients with varying EMP phenotype ( Figure 4B). Since, on the one hand, the observation that 395 epithelial differentiation in OSCC metastases is associated with higher activity of the EMT 396 . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. ; https://doi.org/10.1101/2022.02.03.478962 doi: bioRxiv preprint activator ZEB1 was unexpected and, on the other hand, the method used to derive 397 transcription factor activities appears to maybe overestimate the activity for transcriptional 398 repressors, we addressed the plausibility of this observation ( Figure 4C, Supplementary Figure  399 7). Consistent with the fact that one of the main functions of ZEB1 is the downregulation of E-400 cadherin (46) (CDH1), OSCC cells generally showed low expression of CDH1 ( Figure 4B). activity, but also the co-localization of cornifin-B and ZEB1 protein expression in OSCC lymph 470 node metastases. Thus, although ZEB1 activity is crucial for the induction of the pEMT state, 471 it does not seem to completely prevent partial epithelial differentiation. Remarkably, no relevant 472 differences in CDH1 expression were detected between the different EMP states in the 473 metastatic OSCC lesions. Therefore, the epithelial differentiated phenotypes we observed 474 most closely correspond to a partial epithelial differentiation analogous with the observed 475 pEMT phenotypes. We speculate that the driving force behind this EMP-associated 476 . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022.

Conclusions 481
Single cell transcriptomics reveals that heterogeneity within OSCC cells is dominated by EMP 482 differences resulting in distinct partial EMT and epithelial differentiated phenotypes. 483 Particularly the partial EMT phenotypes can be accompanied by features related to metabolic 484 adaptations, stress, and interaction with the immune system. These EMP phenotypes likely 485 endow capabilities that are essential for the different stages of the metastatic process, 486 including maintenance and cellular integrity. This could be a possible additional function of 487 ZEB1, as it is also expressed during progressive epithelial differentiation in OSCC metastases. 488 . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. ; https://doi.org/10.1101/2022.02.03.478962 doi: bioRxiv preprint . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. ; https://doi.org/10.1101/2022.02.03.478962 doi: bioRxiv preprint . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022.  CAV1  FST  MT1E  IFITM3  MT2A  MMP13  TXNDC17  MMP10  HIST1H4C  HMGB2  UBE2C  STMN1  CDKN3  CDK1  TYMS  RRM2  DUT  MKI67  S100A8  S100A7  KRTDAP  S100A9  SLPI  SPRR1B  KRT6B  FABP5  KRT16  SBSN  CXCL3  CXCL2  CXCL1  CXCL8  CCL20  IER3  TNFAIP3  NFKBIA  DUSP2  HSPA1A  NDRG1  SERPINE1  LAMB3  IGFBP3  TIMP3  P4HA1  SLC2A1  EGLN3  HSPG2  CA9  TRIB3  ASNS  ASS1  PSAT1  MTHFD2  SLC7A5  PHGDH  DDIT4  WARS  AREG  WFDC2  FOS  GLUL  CHCHD10  ISYNA1  SPRY1  CEBPD  HES4  JUN 4   THBS1  CYR61  F3  CAV1  FST  CXCL14  MMP10  MMP13  KRT15  MMP1  SERPINA1  MT2A  BGN  PDPN  TXNDC17  CDKN3  UBE2C  LGALS1  CENPW  HMGN2  HIST1H4C  HMGB2  UBE2C.1  STMN1  CDKN3.1  SPRR1B  S100A8  KRT16  S100A9  LYPD3  CTSC  FABP5  RHOV  LGALS7  MT1X  S100A7  KRTDAP  TMEM45A  PI3  SLPI  PSCA  ADIRF  KLK6 chr1  chr2  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chr10  chr11  chr12  chr13  chr14  chr15  chr16  chr17  chr18  chr19  chr20  chr21  chr22  . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is   Figure 2A and right side from patient HN01 depicted in E. We included only patients with more than 50 tumor cells.
. CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. . CC-BY 4.0 International license made available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is The copyright holder for this preprint this version posted October 2, 2022. ; https://doi.org/10.1101/2022.02.03.478962 doi: bioRxiv preprint