An international report on bacterial communities in esophageal squamous cell carcinoma

The incidence of esophageal squamous cell carcinoma (ESCC) is disproportionately high in the eastern corridor of Africa and parts of Asia. Emerging research has identified a potential association between poor oral health and ESCC. One proposed biological pathway linking poor oral health and ESCC involves the alteration of the microbiome. Thus, we performed an integrated analysis of four independent sequencing efforts of ESCC tumors from patients from high- and low-incidence regions of the world. Using whole genome sequencing (WGS) and RNA sequencing (RNAseq) of ESCC tumors and WGS of synchronous collections of saliva specimens from 61 patients in Tanzania, we identified a community of bacteria, including members of the genera Fusobacterium, Selenomonas, Prevotella, Streptococcus, Porphyromonas, Veillonella, and Campylobacter, present at high abundance in ESCC tumors. We then characterized the microbiome of 238 ESCC tumor specimens collected in two additional independent sequencing efforts consisting of patients from other high-ESCC incidence regions (Tanzania, Malawi, Kenya, Iran, China). This analysis revealed a similar tumor enrichment of the ESCC-associated bacterial community in these cancers. Because these genera are traditionally considered members of the oral microbiota, we explored if there is a relationship between the synchronous saliva and tumor microbiomes of ESCC patients in Tanzania. Comparative analyses revealed that paired saliva and tumor microbiomes are significantly similar with a specific enrichment of Fusobacterium and Prevotella in the tumor microbiome. Together, these data indicate that cancer-associated oral bacteria are associated with ESCC tumors at the time of diagnosis and support a model in which oral bacteria are present in high abundance in both saliva and tumors of ESCC patients. Longitudinal studies of the pre-diagnostic oral microbiome are needed to investigate whether these cross-sectional similarities reflect temporal associations.


INTRODUCTION
common-abundant bacterial genera between saliva-tumor pairs. Several genera 222 including Porphyromonas and Veillonella were at higher relative abundance in the 223 saliva, while Prevotella and Fusobacterium were enriched in the tumor microbiome 224 ( Figure 3C). Finally, the relative abundance of tumor-associated bacteria including 225 Fusobacterium, Prevotella,Selenomonas,Veillonella,Streptococcus,and 226 Campylobacter are strikingly similar between the microbiomes of tumor and oral pairs 227 ( Figure 3D). Altogether, these data support the hypothesis that there is an association 228 between the oral and tumor microbiome of ESCC patients in Tanzania. 229

DISCUSSION 231
This report provides an analysis of bacterial communities present in ESCC tumors from 232 nine countries from different regions of the world, analyzed in four independent 233 sequencing efforts. We found traditionally oral, cancer-associated, bacterial genera in 234 tumors from patients in Tanzania, Malawi, Kenya, China, and Iran. These results 235 provide evidence that these bacterial genera may be associated with ESCC in high-236 incidence regions. We also identified similar bacterial genera in ESCC tumors from low-237 incidence regions, although this finding is based on a small sample size and only one 238 sequencing cohort. Finally, in a sub-analysis of tumor and saliva pairs available from 239 Tanzania, we demonstrated that the synchronous collected saliva and tumor 240 microbiomes of ESCC patients are strikingly similar at the time of diagnosis; in 241 particular, we identified a specific correlation between the saliva and tumor relative 242 abundance of the bacterial genera Fusobacterium, Veillonella,Streptococcus,and 243 Porphyromonas, with Prevotella and Fusobacterium significantly enriched in the tumor 244 microbiome. 245 246 Many of the bacterial genera identified in this study have been previously implicated in 247 the carcinogenesis of gastrointestinal cancers. For example, studies have found that 248 oral microbiota including Fusobacterium, Prevotella,Selenomonas,Veillonella,249 Streptococcus, and Campylobacter can be used to distinguish individuals with colorectal 250 cancer from healthy controls (45), and that Fusobacterium nucleatum strains that 251 colonize the oral cavity and tumors of patients with colorectal cancer are identical in 252 some patients (46), raising the possibility that the oral cavity is a source of extra-oral 253 cancer microbiota. Our group has previously shown that Fusobacterium, Selenomonas,254 and Prevotella can be visualized invasively within colorectal tumors and liver 255 metastases (25). Fusobactium nucleatum has been previously identified in esophageal 256 cancers and is associated with shorter survival (47). Members of the genus 257 Porphyromonas have been previously observed invasively within ESCC tumors (29) 258 and have been reported to promote oral squamous cell carcinoma through a variety of 259 mechanisms (30,31). Campylobacter jejuni has been reported to promote 260 tumorigenesis in mice (32), and Streptococcus species have been identified in human 261 esophageal cancers (33). In addition, the striking association of Streptococcus bovis 262 with colorectal cancer has led to the recommendation that colonoscopy be performed 263 alternative hypotheses that warrant mention. For example, it is possible that the ESCC-277 associated bacterial genera simply represent common members of the esophageal 278 microbiome (50) and that the microbial populations we observed in these cancers are 279 not significantly different from those found in normal esophagus tissue. A limitation of 280 our study is a lack of normal esophageal tissue from ESCC cases or healthy controls in 281 these settings, which would allow us to address this possibility. Another possible 282 explanation is that ESCC tumors provide a favorable niche in which these bacteria are 283 sequestered and allowed to colonize due to the propensity of this disease to cause 284 malignant obstruction. Thus, it is plausible that ESCC-associated bacteria are not 285 necessarily promoting ESCC carcinogenesis but rather represent passengers resulting 286 from the sequestration of oral secretions proximal to an obstructing tumor. While the 287 previous association of these bacterial genera with other cancers is consistent with the 288 hypothesis that they influence carcinogenesis of ESCC, future studies are necessary to 289 identify which, if any, direct influences these bacterial genera have upon ESCC 290 carcinogenesis. Nevertheless, even if these bacterial genera do not have a role in 291 increasing ESCC risk, but arise at the time of disease onset, they may have an 292 important role to play as part of a non-invasive early-detection biomarker. Finally, a 293 concern of all microbiome analyses is that observed bacteria can be a consequence of 294 contamination at some step between tumor harvest and sequencing. While some TCGA 295 samples may be contaminated by Actinobacteria as previously noted, the presence of 296 Fusobacterium, Prevotella,Selenomonas,Veillonella,Streptococcus,and 297 Campylobacter in four independently collected cohorts indicates that these finding are 298 unlikely due to contamination. 299 300 While this study focused on the presence of bacteria with ESCC in high-incidence 301 regions, we found evidence of similar cancer-associated bacteria in tumors in patients 302 from low-incidence regions (USA, Ukraine, Vietnam, and Russia). A limitation of this 303 assessment is the small sample size (n=36) and reliance on a single TCGA cohort that 304 likely contains contaminants (42). Regardless, this finding does not exclude the 305 possibility that the microbiome could be a factor driving patterns of ESCC incidence. For 306 example, it is possible that the prevalence of ESCC-associated bacteria in people could 307 vary across regions, which in turn could drive these differing rates of ESCC incidence. 308 This is an important topic for future study. 309

310
We found that the structure of synchronous paired tumor and oral microbiomes were 311 strikingly similar. It is possible that this similarity is driven by transient contact of saliva 312 and its associated microbiome with the tumor (e.g., during swallowing or tumor 313 extraction). However, we found that only four of sixteen common-abundant bacterial 314 genera correlate in abundance between the tumor and oral microbiomes, suggesting 315 tumor-oral microbiome similarity is not driven exclusively by "in-trans" interactions 316 between the saliva and tumor. We also found that genera including Prevotella and 317 Fusobacterium are often specifically enriched in the tumor microbiome, supporting a 318 model where specific oral bacterial preferentially colonize the tumor. A caveat of this 319 study is that we infer oral bacterial populations from the saliva, despite diverse 320 communities of bacteria throughout the oral cavity (51). However, we do observe 321 Fusobacterium in the saliva despite its general association with periodontal plaques 322 (52), suggesting saliva is capable of detecting periodontal pathogens. Additionally, 323 because the samples studied here are from patients with late-stage disease, it is 324 possible that tumor-induced changes to upper-gastrointestinal physiology and 325 dysphagia symptom-induced major dietary changes could themselves alter the oral 326 microbiomes of these patients. The previous findings from the ESCCAPE studies in 327 Kenya and Tanzania (17,19) which found strong associations with dental staining (ORs 328 > 10) and for which photographic validation studies suggest that most dental staining 329 was not fluorosis, also point to a recent build-up of chromogenic bacteria. Studies of the 330 oral microbiome of patients at earlier stages of ESCC and in prospective studies are 331 necessary to address this possibility. We restricted our analysis to 21 tumor-oral pairs 332 that have a sufficient number of bacterial reads (at least 10,000). It is likely that 333 excluded samples are not molecularly distinct from included samples but that the 334 relatively low bacterial read counts in some tumors is simply reflective of low 335 sequencing depth. Bacterial abundance analyses and plotting were conducted in R (v3.5.1). To calculate 383 relative abundance at a phylogenetic level (e.g., phylum or genus), GATK-PathSeq 384 results were filtered for taxa at the level, and relative abundance was calculated for 385 each taxon as follows: (# of taxon reads)/(total # reads at the selected phylogenetic 386 level). The rows of all bacterial abundance heatmaps are arranged according to the 387 mean abundance across all samples. The sample order of relative abundance stacked 388 barplots were determined based on Fusobacterium genus relative abundance except 389 where noted. In Figure 2D, if any cohort contained more than 50 samples, 50 samples 390 were randomly selected for plotting. The distribution of relative abundances of genera of 391 interest in all samples can be found in Figure S2, where width of each violin represents 392 the relative distribution of observed bacterial relative abundance for all patients in each 393 patient cohort. 394 395 Jaccard distance between RNAseq and WGS data from each ESCC tumor was 396 calculated in R based on bacterial genera with at least 1% relative abundance. The 397 qualitative Jaccard index was used in this case because the comparison was between 398 DNA and RNA analytes which would not be expected to be quantitatively identical. Only tumor-saliva pairs from the MUHAS Tanzania cohort with at least 10,000 reads 402 mapped to the bacterial superkingdom were available for analysis. This resulted in a 403 total of 21 tumor-oral pairs. Bray-Curtis dissimilarity metrics between tumor-oral pairs 404 were calculated using the R package vegan (55). Figure 3A presents the Bray-Curtis 405 similarity (1 -Bray-Curtis dissimilarity), for each tumor-oral pair. 406

407
To determine the correlation between the relative abundance of specific genera 408 between tumor and saliva microbiomes, common-abundant genera that are at least 1% We thank Liu et al. (37)   Bacterial burden of ESCC tumors for each patient cohort. Units are bacterial reads per million human reads as determined by GATK-PathSeq analysis. Each dot represents one sample. Analyte type (RNA or DNA) and tumor type (ESCC or COAD) are indicated by color. C.
Shannon diversity of ESCC tumors for each patient cohort. Shannon diversity was determined for each sample at the genus level based on genera that are at least 1% relative abundance. Each dot represents one sample. Analyte type (RNA or DNA) and tumor type (ESCC or COAD) are indicated by color. D.
Heatmap describing the relative abundance of the ve top phyla sorted by average phylum relative abundance. Each column represents one sample. Rows represent the indicated phyla. Units are relative abundance. Samples from each cohort are WGS unless noted with "(RNA)", in which case they are RNAseq. Bacterial genera relative abundance of WGS data from the MUHAS Tanzania cohort. Each column represents a single sample. Samples are ordered by decreasing Fusobacterium relative abundance. Units are relative abundance of bacterial genus-mapping reads. Color indicates the genus, and seven genera are speci ed. Only patients with GATK-PathSeq analysis from both RNAseq and WGS tumor data are plotted (n=59). Columns are ordered by decreasing relative abundance of Fusobacterium genus reads. B.
Bacterial genera relative abundance of RNAseq data from the MUHAS Tanzania cohort. Each column represents a single sample. Here, column order is dictated according to the patient order in Figure 2A. Units are relative abundance of bacterial genus-mapping reads. Color indicates the genus, and seven genera are speci ed. Only patients with GATK-PathSeq analysis from both RNAseq and WGS tumor data are plotted (n=59). Samples are ordered in the same order as Figure 2A, which is by Fusobacterium genus relative abundance in the WGS data. C.
Jaccard index between RNAseq and WGS data of tumors from the MUHAS Tanzania cohort. For the "Paired by Sample" column, Jaccard indices were calculated only between the WGS and RNAseq data from the same tumor (n=59 comparisons). For the "Random Pairs" column, Jaccard indices were calculated between all possible WGS-RNAseq pairs independent of patient of origin to represent the expected random distribution of Jaccard indices (n=3,481 comparisons). Jaccard index was calculated from relative abundance at the genus level based on genera that are at least 1% relative abundance. The width of the violin represents the relative proportion of comparisons with each Jaccard index, and lines indicate 25th, 50th, and 75th percentiles. D.
Bacterial genera relative abundance of the remaining patient cohorts, including RNAseq and WGS data as indicated. Each column represents a single sample. Samples are ordered by decreasing Fusobacterium relative abundance within each patient cohort. Units are relative abundance of bacterial genus-mapping reads. Color indicates the genus, and seven genera are speci ed. Here, if there were more than 50 samples in a patient cohort, 50 samples were randomly selected for visualization. USA -United States, UA -Ukraine, RU -Russia.   Bray Curtis Similarity comparing tumor-saliva pairs from patients in the MUHAS Tanzania cohort. Analysis was restricted to the 21 tumor-saliva pairs that contained at least 10,000 bacterial reads. This analysis was conducted at the genus level and using relative abundance. For the "Paired by Patient" column, Bray Curtis Similarity was calculated only between the tumor and saliva WGS data from the same patient. For the "Random Pairs" column, Bray Curtis Similarity was calculated between all possible tumor-saliva pairs independent of patient of origin to represent the expected random distribution of Bray Curtis Similarity. (p=0.0003, Wilcoxon rank sum test). B.
Correlation between the relative abundance of common-abundant bacterial genera in paired saliva and tumor WGS data. Analysis was restricted to the 21 tumor-saliva pairs that contained at least 10,000 bacterial reads. Common-abundant bacterial genera are bacterial genera that are at least 1% abundance in at least 3 tumor-saliva pairs -16 bacterial genera made this cuto . Correlation represents a two-sided Pearson correlation. X-axis is the correlation coe cient, and Y axis is the correlation P-Value plotted on a log scale. C.
Enrichment of genera in the oral or tumor microbiome. Each row details one of the 16 common-abundant bacterial genera. Each row contains one data point per patient, for a total of 21 data points. The value of each point represents the di erence in the relative abundance of the speci ed genus in the tumor and oral microbiomes of one patient, with positive values indicating a genus is at higher relative abundance in a patient's tumor. For example, if a genus is at a relative abundance of 0.7 (70%) in the tumor and 0.3 (30%) in the saliva of a patient, the plotted value for that genus and that patient is 0.4. Curves represent the distribution of this relative abundance di erence across the tumor-oral pairs, with dots indicating individual tumor-oral pairs. Vertical red lines indicate quartiles. D.
Relative abundance barcharts of tumor-saliva pairs. Analysis was restricted to the 21 tumor-saliva pairs that contained at least 10,000 bacterial eads. Units are relative abundance of bacterial genus-mapping reads. Color indicates the genus, and seven genera are speci ed.  Boxplots indicating the number of GATK-PathSeq Human-mapped reads and GATK-PathSeq microbe-mapped reads for each patient cohort. Samples from each cohort are WGS unless noted with "(RNA)", in which case they are RNAseq. B.
Heatmap describing the relative abundance of the 15 top phyla sorted by average phylum relative abundance. Each column represents one sample. Rows represent the indicated phyla. Units are relative abundance. Samples from each cohort are WGS unless noted with "(RNA)", in which case they are RNAseq. C.
Heatmap describing the relative abundance of the 15 top genera sorted by average genera relative abundance. Each column represents one sample. Rows represent the indicated genera. Units are relative abundance. Samples from each cohort are WGS unless noted with "(RNA)", in which case they are RNAseq. D.
Boxplot representing the Shannon diversity of genera that fall within the phylum Actinobacteria for each patient in each cohort. Samples from each cohort are WGS unless noted with "(RNA)", in which case they are RNAseq.  Figure S2. Distribution of Fusobacterium, Selenomonas, Prevotella, Streptococcus, Porphyromonas, Veillonella, and Campylobacter relative abundance of genus reads for all samples in each study.
A. The distribution of the relative abundance of genus-mapping reads for seven selected genera in all studies. The width of each violin represents the proportion of samples which have the indicated relative abundance of each genus. In contrast to Figure 2D, which only plots up to 50 samples per study, this plot includes all patients. Samples from each study are WGS unless noted with "(RNA)", in which case they are RNAseq.