Abstract
Background Colorectal cancer is the second most deadly and third most common cancer in the world. Its development is heterogeneous, with multiple mechanisms of carcinogenesis. Two distinct mechanisms include the adenoma-carcinoma sequence and the serrated pathway. The gut microbiome has been identified as a key player in the adenoma-carcinoma sequence, but its role in serrated carcinogenesis is unclear. In this study, we characterized the gut microbiome of 140 polyp-free and polyp-bearing individuals using colon mucosa and fecal samples to determine if microbiome composition was associated with each of the two key pathways.
Results We discovered significant differences between colon mucosa and fecal samples, explaining 14% of the variation observed in the microbiome. Multiple mucosal samples were collected from each individual to investigate the gut microbiome for differences between polyp and healthy intestinal tissue, but no such differences were found. Colon mucosa sampling revealed that the microbiomes of individuals with tubular adenomas and serrated polyps were significantly different from each other and from those of polyp-free individuals, explaining 2-10% of the variance in the microbiome. Further analysis revealed differential abundances of Eggerthella lenta, Clostridium scindens, and three microbial genes across tubular adenoma, serrated polyp, and polyp-free cases.
Conclusion By directly sampling the colon mucosa and distinguishing between the different developmental pathways of colorectal cancer, this study helps characterize potential mechanistic targets and diagnostic biomarkers for serrated carcinogenesis. This research also provides insight into multiple microbiome sampling strategies by assessing each method’s practicality and effect on microbial community composition.
Introduction
Colorectal cancer (CRC) is the second most deadly and third most common cancer globally, accounting for over 900,000 deaths in 2020.1 The etiologies of CRC are multifactorial, with only 5-10% of cases being attributable to hereditary germline mutations.2 Significant risk factors include diets high in red meat and low in fiber, obesity, physical inactivity, drug and alcohol usage, and chronic bowel inflammation.3–6 Each of these factors is associated with compositional and functional changes in the collective community of bacteria, fungi, archaea, and viruses that inhabit the colon.7–10 Commonly referred to as the gut microbiome, this community of microorganisms has been identified as a potential regulator of CRC initiation and progression.
Colorectal polyp formation precedes cancer development and is influenced by various environmental factors and host genetics. Polyps most commonly progress into malignancy through the adenoma-carcinoma sequence.11 This pathway is characterized by chromosomal instability and mutations in the adenomatous polyposis coli (APC) gene, KRAS oncogene, and TP53 tumor suppressor gene.12 Alternatively, 15 to 30% of CRCs develop through the serrated pathway.13 This pathway is characterized by the epigenetic hypermethylation of gene promoters to produce a CpG island methylator phenotype.13 In addition to the epigenetic inactivation of tumor suppressor genes, BRAF or KRAS mutations are also common.13 The serrated pathway often results in the production of hyperplastic polyps (HPPs), traditional serrated adenomas (TSAs), and sessile serrated polyps (SSPs). Premalignant polyps from both pathways can be screened for and removed during colonoscopy to prevent CRC formation, but incomplete polyp resection or escaped detection can result in the development of interval cancers. Compared to other colorectal polyps, SSPs are disproportionately responsible for interval cancers, as their flat morphology makes them difficult to detect.14 Therefore, additional detection methods, such as SSP-specific biomarkers, would assist with CRC prevention.
One potential avenue for polyp-specific biomarker discovery is the gut microbiota. SSPs often overexpress mucin-forming genes, such as MUC6, MUC5AC, MUC17, and MUC2, producing a mucus cap that may harbor mucin-degrading bacteria and other microbes.15 Finding microbiome alterations in patients consistent with the presence of SSPs would enable gastroenterologists to personalize their technique and screening frequency for these higher-risk patients. Additionally, elucidating the microbiome alterations specific to the adenoma-carcinoma sequence and the serrated pathway would clarify how particular microbes, their metabolites, and dysbiosis may contribute to colorectal carcinogenesis.
Studies comparing the microbiomes of these two pathways with healthy controls have yet to discover differences between healthy individuals and those with serrated polyps.16–18 One limitation of existing CRC research is the dominant use of stool samples for assessing the microbiome, which have been shown to reflect the bacteria found in the lumen of the colon.19 Mucosal samples, on the other hand, reflect the bacteria that are adherent to the epithelium and thus may be more clinically relevant in CRC pathogenesis.20 To investigate this, and the role of the microbiome in the adenoma-carcinoma sequence and serrated pathway, we used multiple sampling techniques to obtain microbiome samples from colorectal polyps. Samples were collected during and after colonoscopy from healthy individuals or those with tubular adenomas (TAs), HPPs, or SSPs. When possible, samples from the same individual were collected from polyps and the healthy colon epithelium opposite from these polyps. Stool samples were also collected 4-6 weeks after colonoscopy. We used a combination of amplicon (16S and ITS) and shotgun sequencing to characterize the microbial communities of samples. The purpose of our work was to: 1) investigate the feasibility and utility of using two novel methods of microbiome sampling during colonoscopy; 2) elucidate hyperlocal differences in the microbiome of the colon to find polyp-specific biomarkers; and 3) elucidate changes in the microbiome specific to CRC precursors in the adenoma-carcinoma sequence versus the serrated pathway. Our key hypothesis was that there would be distinct differences between the microbiomes of individuals with serrated polyps versus TAs.
Methods and Materials
Subject Recruitment and Criteria
Individuals who presented for colonoscopy with indications of screening or a prior history of colorectal polyps were asked to participate in the study. Subjects who were pregnant, had taken antibiotics within 6 weeks of colonoscopy, or had known inflammatory bowel disease were excluded. In total, 140 individuals were recruited for this study.
Colonoscopy Preparation, Procedure, and Sample Collection
Before colonoscopy, subjects were asked to adhere to a clear liquid diet for 24 hours. Bowel cleansing was done using Miralax, or polyethylene glycol with electrolytes administered as a split dose, typically 12 and 5 hours before the procedure. Sample collection focused on two direct and two indirect microbiome sampling methods. The first direct sampling method involved brushing the mucosa of colon tissue during colonoscopy. Since mucosal brushes can potentially damage or agitate the intestine, we also employed a novel method of direct microbiome sampling in which colonoscopy washing fluid was sprayed directly onto the target mucosa and immediately re-suctioned into a storage vial. Participants with polyps had additional mucosal washing aspirates taken near the polyp, as well as brushings of the polyp and of the colon wall opposite the polyp. When more than one polyp was found, the largest polyp was targeted for mucosal brushing and aspirate sampling. The first indirect sampling method involved collecting an aliquot of the post-colonoscopy lavage fluid. This lavage fluid was from the entirety of the procedure, thus reflecting the microbiome throughout the colon. Sampling during colonoscopy resulted in a total of 1,685 samples, which were collected in sterile cryogenic tubes and placed on ice until the colonoscopy procedure was finished. Afterwards, the samples were stored at -80°C. Additional data collected included indication for procedure, age, sex, ethnicity, BMI, family history, and findings, including the size, shape, location, and pathology of all polyps found or removed.
Collection of Fecal Samples
For the second indirect sampling method, subjects were encouraged to send follow-up fecal samples four to six weeks post-colonoscopy. Subjects who complied were compensated $20 USD. Subjects were provided with a fecal collection kit, which contained collection equipment, prepaid shipping labels, and Zymo DNA/RNA shield preservation buffer (R1101). Samples were returned via the United States Postal Service. After arrival, samples were stored at -80°C. A total of 38 fecal samples were returned.
Polyp and Subject Type classification
Polyp biopsies collected during colonoscopy were sent to a pathologist for classification. This information was then recorded for the corresponding mucosal brush and aspirate samples. Polyp pathology reports were also used to broadly categorize samples by subject type. The three subject types were polyp-free subjects, TA-bearing subjects, and serrated polyp-bearing subjects, which included HPPs and SSPs. For example, if a sample was taken from healthy intestinal tissue of an individual who was found to have a TA, that sample and all others from the same individual would be included in the TA-bearing subject type. Three individuals had both a TA and an SSP and were classified as serrated polyp-bearing subjects.
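The subject-type rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name, the label strings, and the "other/unknown" fallback are hypothetical, and extending the serrated-takes-precedence rule from the stated TA plus SSP case to a TA plus HPP combination is an assumption.

```python
# Pathology labels treated as serrated polyps, per the classification text above.
SERRATED = {"HPP", "SSP"}

def classify_subject(polyp_types):
    """Assign one subject type from all polyp pathologies found in one individual.

    `polyp_types` is an iterable of pathology labels (hypothetical layout),
    e.g. ["TA", "SSP"]. An empty iterable means no polyps were found.
    """
    found = {p.upper() for p in polyp_types}
    if not found:
        return "polyp-free"
    # Individuals with both a TA and an SSP were classified as serrated
    # polyp-bearing; applying the same precedence to HPPs is an assumption.
    if found & SERRATED:
        return "serrated polyp-bearing"
    if "TA" in found:
        return "TA-bearing"
    # Anything else (e.g. unknown pathology) is left unclassified here.
    return "other/unknown"
```

Every sample from an individual would then inherit the label returned for that individual, including samples taken from healthy tissue.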
DNA extraction
Colonoscopy and fecal samples were thawed on ice for DNA extraction. For mucosal aspirates and lavage aliquot samples, 250 uL of fluid were taken from each sample and then extracted using ZymoBiomics DNA Miniprep Kit (D4300) according to the manufacturer’s protocol. For mucosal brushes, 750 uL of ZymoBIOMICS™ Lysis Solution was mixed with the brushes in their original sterile cryogenic tube and vortexed for 5 minutes to suspend the contents of the brush into solution. The solution was then transferred and extracted according to the manufacturer’s protocol. Fecal samples stored in Zymo DNA/RNA shield were thawed, mixed by vortexing, and 750 uL of fecal slurry was extracted according to the manufacturer’s protocol.
Amplicon Library preparation and sequencing
To amplify the V4 region of the bacterial 16S rRNA gene, 0.5 ng of DNA template was combined with 9.5 uL of PCR grade water, 12.5 uL of 1x AccuStart II PCR ToughMix (QuantaBio), 1 ug of BSA, 0.2 uM of 926R reverse 16S primer, and 0.2 uM of barcoded, Illumina adapter sequence-tagged forward primer 515F in 25 uL reactions. Each sample was amplified with the following program: 94°C for 3 min; 30 cycles of 94°C for 45 sec, 55°C for 30 sec, and 72°C for 20 sec; and a final extension at 72°C for 10 min.
The ITS2 region of the fungal 18S rRNA gene was amplified using the methods described by Looby et al.51 Briefly, 0.5 ng of DNA template was combined with 9.5 uL of PCR grade water, 12.5 uL of 1x AccuStart II PCR ToughMix (QuantaBio), 1 ug of BSA, 0.3 uM of ITS9, and 0.3 uM of a staggered, barcoded reverse ITS4 primer for a total reaction volume of 25 uL. Samples were PCR amplified at 94°C for 5 min, 35 cycles of 95°C for 45 sec, 50°C for 1 min, 72°C for 90 sec, and a final extension step of 72°C for 10 min.
Both purified amplicon libraries were quantified using the Qubit dsDNA HS Assay Kit and pooled at equimolar concentrations. The pooled amplicon library was cleaned and concentrated using Agencourt AMPure XP beads (Beckman-Coulter, A63880) according to the manufacturer’s protocol. Equimolar PhiX was added at 10% final volume to the amplicon library and sequenced on the Illumina MiSeq platform, yielding 300 bp paired-end sequences. An average of 41,578 +/- 35,920 reads per sample was obtained for 16S amplicons, while an average of 22,252 +/- 17,000 reads per sample was obtained for ITS amplicons.
Shotgun Library preparation and sequencing
Shotgun sequencing libraries were prepared using the Illumina Nextera Flex kit, using an adapted protocol as described by Bruinsma et al.52 Here, a maximum of 5 uL or 50 ng (whichever was reached first) of DNA from each sample was tagmented by Nextera bead-linked transposomes for 15 min at 55°C. Next, 1.25 uL of 1 uM forward and 1.25 uL of 1 uM reverse barcodes were added to each sample and annealed via PCR using KAPA HiFi DNA Polymerase (Roche Life Science). Samples were then pooled and cleaned of smaller fragments using the included sample purification beads according to the manufacturer’s protocol. The pooled sample libraries were quantified using Quant-iT PicoGreen dsDNA (P7589) and run on a Bioanalyzer to check fragment size. Lastly, libraries were packaged on dry ice and shipped overnight to Novogene Corporation Inc. (Sacramento, CA) to be sequenced using Illumina’s HiSeq 4000 for 150 bp paired-end sequencing. An average of 1,267,359 +/- 690,384 reads per sample was obtained.
OTU Table generation
For amplicon data, sequences were processed using QIIME2 (version 2019.1).53 Index sequences were extracted from reads, and reads were demultiplexed and quality filtered using DADA2 in QIIME2.54 Reads were filtered to remove chimeric sequences and resolved into amplicon sequence variants (ASVs). Taxonomy was assigned with the Greengenes database (May 2013) for bacteria.55
For sequences produced by shotgun metagenomic sequencing, we first removed sequencing adapters, barcodes, and sequencing artifacts, followed by demultiplexing using BBMap.56 After demultiplexing, sequences were quality filtered so that the minimum quality score did not fall below 28 using PRINSEQ++.57 Removal of human-derived reads was performed in Bowtie2 by aligning samples to a reference human genome, hg38.58 Next, we used IGGSearch on the lenient preset to characterize the taxonomy of our quality-filtered sequences to generate an operational taxonomic unit (OTU) table.59
OTU Table analysis
Analysis of OTUs was performed using R 3.6.3. A sequencing control was included for each library to remove potential contaminants before analysis (ZymoBIOMICS #D6305). OTUs that appeared in the control but were not part of the microbial community standard were removed from the OTU table. To account for read-depth variability, 16S and ITS OTU counts were rarefied to 3,000 and 1,000 reads, respectively.60 For shotgun sequence data, 1,500 marker gene reads from each sample were randomly subsampled ten times and then averaged.
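To make the read-depth normalization concrete, the following Python sketch shows rarefaction to a fixed depth and the repeated-subsampling average described for the shotgun data. It is an illustration under stated assumptions rather than the analysis code, which was run in R: the function names and the dict-based sample layout are hypothetical, sampling is assumed to be without replacement, and samples below the target depth are assumed to be dropped.

```python
import random
from collections import Counter

def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a dict of OTU -> read count.

    Returns None when the sample has fewer than `depth` reads (assumed to be
    excluded from downstream analysis).
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per read, then sample reads at random.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, depth)))

def mean_subsample(counts, depth, repeats, rng):
    """Average OTU counts over `repeats` independent subsamples of `depth` reads."""
    draws = [rarefy(counts, depth, rng) for _ in range(repeats)]
    if draws[0] is None:
        return None
    return {otu: sum(d.get(otu, 0) for d in draws) / repeats for otu in counts}
```

With these helpers, `rarefy(sample, 3000, rng)` mirrors the 16S depth and `mean_subsample(sample, 1500, 10, rng)` mirrors the ten-fold shotgun subsampling, where `rng` is a seeded `random.Random` instance for reproducibility.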
Alpha diversity for each OTU table was calculated using the diversity and specnumber functions in the Vegan package (v2.5-6).61 Significance of alpha diversity scores was tested using a linear mixed-effects model, adjusted for repeated measurements and sample plate effects, using the nlme package (v3.1-148).62 The adonis function in Vegan was used to generate Bray-Curtis matrices and perform PERMANOVA analyses from OTU tables. Non-metric multidimensional scaling was performed using the metaMDS function in Vegan. SparCC was used for correlation analysis of OTUs using the fastspar and MakeBootstraps.py functions.63 Pseudo-p1 values were calculated for sparCC correlations using the PseudoPvals.py script. Plotting for all analyses was done using ggplot2 (v3.3.0).64
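For readers unfamiliar with these metrics, the quantities involved can be written out directly. The Python sketch below is illustrative only and is not the analysis code, which used the Vegan functions named above. It uses the natural logarithm for Shannon diversity, counts taxa with non-zero abundance for richness, and uses the standard Bray-Curtis formula; the function names are hypothetical.

```python
import math

def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def richness(counts):
    """Number of taxa observed at least once in the sample."""
    return sum(1 for c in counts if c > 0)

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i).

    `x` and `y` are abundance vectors over the same ordered set of taxa.
    """
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    return numerator / (sum(x) + sum(y))
```

A dissimilarity of 0 means two samples have identical abundance profiles, and 1 means they share no taxa.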
To identify differentially abundant microbes, we utilized Random Forest and ANCOM (v2.1).65 Random Forest classification was performed using the rfPermute package (v2.1.81) in R to identify the top variables of importance for the healthy versus TA-bearing, healthy versus serrated polyp-bearing, and TA-bearing versus serrated polyp-bearing host comparisons. Significance testing for differentially abundant microbes was performed using Dunn’s Kruskal-Wallis on the top variables of importance. Microbial abundances were averaged for each subject to avoid repeated measurements, and p-values were adjusted for false discovery rate. Random Forest out-of-bag accuracy was assessed by plotting Random Forest proximity and confidence scores, using the proximityPlot and plotConfMat functions within rfPermute. ANCOM v2.1 was performed on OTU tables to identify differentially abundant microbes between sample types while accounting for multiple samples per individual.
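The per-subject averaging and the false discovery rate adjustment can be sketched as follows. This is a hedged illustration: the text does not name the FDR procedure, so the Benjamini-Hochberg method used here is an assumption, and the function names and the (subject, abundance) input layout are hypothetical.

```python
from collections import defaultdict

def mean_by_subject(samples):
    """Collapse repeated samples into one mean abundance per subject.

    `samples` is a list of (subject_id, abundance) pairs (hypothetical layout).
    """
    grouped = defaultdict(list)
    for subject, value in samples:
        grouped[subject].append(value)
    return {s: sum(v) / len(v) for s, v in grouped.items()}

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method, see lead-in)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Averaging first gives each subject a single value per microbe, so the rank-based test that follows does not treat repeated samples from one person as independent observations.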
Functional Microbiome analysis
Functional annotation of shotgun sequences was performed by cross-assembling samples into contigs using MEGAHIT (v1.1.1).66 Contigs larger than 2,500 bp had open reading frames (ORFs) called by Prodigal (v2.6.3).67 The resulting ORFs were functionally annotated using eggNOG mapper v2, using the eggNOG v5.0 database.68 Individual samples were aligned to annotated ORFs using Bowtie2 v2.3.5.1 to obtain per-sample ORF abundances. Per-sample ORF abundances were compiled into a single ORF abundance table using the pileup.sh script from BBMap v38.79. Gene abundances were normalized using MicrobeCensus, and principal coordinate analysis was then performed using the cmdscale function from vegan.69 KEGG BRITE annotations were used to visualize the relative abundances of functional hierarchies. DESeq2 v1.2.6 was used to identify differentially abundant microbial genes using the DESeq function in R.70 Proximal and distal mucosal aspirates were processed through DESeq2 separately to avoid repeated measurements. Differentially abundant genes that were not present in at least 10% of subjects were removed from the analysis. The taxonomy of differentially abundant genes was resolved by aligning their protein sequences to the NCBI nr database using blastp.
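The 10% prevalence filter mentioned above can be illustrated with a short Python sketch. It is not the authors' implementation: the function name and nested-dict layout are hypothetical, and treating a gene as "present" in a subject when its count is greater than zero is an assumption.

```python
def prevalent_genes(abundance, min_fraction=0.10):
    """Return genes detected in at least `min_fraction` of subjects.

    `abundance` maps gene -> {subject_id: count} (hypothetical layout).
    A gene counts as present in a subject when its count is > 0 (assumption).
    """
    # The denominator is every subject that appears anywhere in the table.
    subjects = {s for per_subject in abundance.values() for s in per_subject}
    n_subjects = len(subjects)
    kept = []
    for gene, per_subject in abundance.items():
        n_present = sum(1 for count in per_subject.values() if count > 0)
        if n_present / n_subjects >= min_fraction:
            kept.append(gene)
    return kept
```

Filtering on prevalence keeps the differential abundance results from being driven by genes detected in only one or two individuals.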
Results
Microbiomes of Mucosal and Lavage Samples are similar to each other but different from those in Feces
To determine whether microbiomes varied between sample types, mucosal brushes, mucosal aspirates, and lavage aliquots were collected from 140 individuals during colonoscopy; 38 of these individuals provided a fecal sample 4-6 weeks after their procedure (Figure 1). Using 16S amplicon sequencing on a subset of individuals, we observed no significant differences in Shannon diversity or richness across mucosal brushes, mucosal aspirates, and lavage aliquots (linear mixed effects model (LME): p > 0.05, Figure 2A). PERMANOVA analysis of Bray-Curtis dissimilarities revealed that the individual explained the greatest amount of variation in microbiome composition (R2 = 0.56, p = 0.001; Supplemental table 1). This analysis found no significant differences in the microbiomes associated with mucosal brushes, mucosal aspirates, and lavage aliquots from within the same individual (R2 = 0.12, p = 0.49; Supplemental table 1). The lack of significance was consistent with no discernible clusters based on sample type (Figure 2B). We were unable to identify any microbes whose abundances significantly differed across the three sampling methods, except for three ASVs: one from the Gemellaceae family and two Streptococcus spp., whose abundances were all higher in mucosal aspirates compared to mucosal brushes (ANCOM2: FDR < 0.05; Supplemental figure 1).
Figure 1. A total of 140 different individuals were recruited for this study, including 50 healthy/polyp-free individuals, 45 with tubular adenomas, and 33 with serrated polyps (HPP, TSA, or SSP). For the remaining 12 individuals, 11 had unknown pathology and one had an adenocarcinoma. Multiple samples were taken from each subject during colonoscopy. This included mucosal brushes (orange), mucosal aspirates (yellow), and lavage aliquots (purple). Fecal samples (brown) were collected from participants four to six weeks post-colonoscopy. DNA extraction and sequencing produced two sample sets. The first sample set had a median age of 60, a median BMI of 25, and was 57% male and 43% female. The second sample set had a median age of 65, a median BMI of 25, and was 50% male, 40% female, and 10% unknown/other. Some individuals appeared in both sample sets.
Figure 2. A and C) Shannon diversity and richness estimates across mucosal aspirates (yellow), mucosal brushes (orange), lavage aliquots (purple), and fecal samples (brown). The first sample set was sequenced using 16S sequencing (A), while the second sample set was sequenced using shotgun sequencing (C). B and D) Non-metric multidimensional scaling of Bray-Curtis dissimilarities in the first (B) and second (D) sample sets. Each point corresponds to a single sample with multiple samples per individual. The individual of origin is denoted numerically within each point. The number of samples per sample type and subject category are annotated parenthetically. Significant comparisons (p < 0.05) are denoted by an asterisk (*).
ITS2 sequencing was also performed on the same subset of samples to investigate the effect of sampling method on the fungal microbiome. We observed no differences in Shannon diversity or richness across mucosal brushes, mucosal aspirates, and lavage aliquots (LME: p > 0.05, Supplemental figure 2). Like 16S amplicon data, PERMANOVA analysis of Bray-Curtis dissimilarities showed that the individual explained the greatest amount of variation in fungal community composition (R2 = 0.28, p < 0.001), with no significant associations between fungal community composition and our three sampling methods (R2 = 0.38, p = 0.122; Supplemental table 3).
Following the collection of fecal samples, we performed shotgun sequencing on a second set of samples, representing 105 individuals. Mucosal brushes were excluded from the second sample set because a pilot shotgun sequencing run revealed these samples contained a large percentage of human-derived reads (Supplemental figure 3). Based on estimates of Shannon diversity and species richness, the microbiomes in fecal samples were significantly more diverse than those in the mucosal aspirates (LME: p = 0.007 and p = 0.002, respectively) and marginally more diverse than those in lavage aliquots (LME: p = 0.053 and p = 0.047, respectively; Figure 2C). With respect to microbial beta diversity, the individual explained the greatest amount of variation in microbiome composition (PERMANOVA: R2 = 0.75, p < 0.001; Supplemental table 4). In addition, fecal samples had microbial communities that were distinct when compared to mucosal aspirates and lavage aliquots from the same individual, with sampling method explaining 14% of variation in the microbiome (PERMANOVA: p = 0.001; Figure 2D). Fecal samples had a mean relative abundance of 63% for Firmicutes, 27% for Bacteroidetes, 3.5% for Actinobacteria, and 4.5% for Proteobacteria. Mucosal aspirates and lavage aliquots were more similar and had a mean relative abundance of 73% and 75% for Firmicutes, 15% and 11% for Bacteroidetes, 4.7% and 5.2% for Actinobacteria, and 4.0% and 6.6% for Proteobacteria, respectively (Supplemental figure 4). Differential abundance analysis of microbes revealed 44 OTUs that were significantly different between fecal samples and mucosal aspirates (ANCOM2: FDR < 0.05; Supplemental table 5). Six OTUs were differentially abundant between fecal samples and lavage aliquots (Supplemental table 6), and no OTUs were significantly different between mucosal aspirates and lavage aliquots (ANCOM2: FDR > 0.05).
The Microbiomes of Polyps and Opposite Wall Healthy Tissue are similar within Individuals
To identify any polyp-specific microbial biomarkers, 14 mucosal brush samples were collected from polyps and opposite wall healthy tissue and sequenced as part of the first sample set (Figure 3A). Based on 16S sequencing, we observed no significant differences in Shannon diversity or richness between polyp and opposite wall healthy tissue from within the same individual (Figure 3B). Similarly, there were no significant differences in beta-diversity across polyp and opposite wall healthy tissue pairs (PERMANOVA: R2 = 0.18, p = 0.62; Figure 3C; Supplemental table 7). We were unable to identify any differentially abundant microbes between polyp and opposite wall tissue brushes. Microbiomes were mostly individualistic, with subject origin explaining 53% of the variance in microbiome composition (PERMANOVA: p = 0.005; Figure 3D; Supplemental table 7). We detected significant associations between microbiome composition and colon side (right/proximal versus left/distal), with 16% of the variance in microbiome composition explained by the colon side (PERMANOVA: p = 0.003; Supplemental table 7). Significant associations within the microbiome were observed when both polyp and opposite wall tissue pairs were categorized by subject type, explaining approximately 10% of variance (PERMANOVA: p = 0.03; Supplemental table 7).
Figure 3. A) An illustration of the sampling strategy used to characterize the hyperlocal differences in the microbial community of polyps (red) and opposite wall healthy tissue (green). Brush samples were sequenced as part of the first sample set using 16S sequencing. B) Shannon diversity and richness estimates of mucosal brushes from polyp and opposite wall healthy tissue. C) Non-metric multidimensional scaling of Bray-Curtis dissimilarities of polyp and opposite wall healthy tissue brushes. The individual of origin is denoted numerically within each point. The shape of each point denotes the right (proximal) and left (distal) side of the colon. D) The relative abundance of the top ten microbial genera across all samples. Samples are grouped by each individual and labeled by polyp type, where tubular adenoma = TA, hyperplastic polyp = HPP, and sessile serrated polyp = SSP.
Tubular Adenoma-bearing, Serrated Polyp-bearing, and Healthy Individuals have distinct Microbiomes
We next analyzed all samples from the first and second sample sets to ask whether the type of polyp an individual had (subject type) was significantly associated with microbial diversity and composition. In both 16S and shotgun data, we observed no significant differences between polyp-free, TA-bearing, and serrated polyp-bearing samples based on Shannon diversity or richness estimates (LME: p > 0.05; Supplemental figure 5). In ITS data, we observed significantly increased Shannon diversity, but not richness, in samples from polyp-free individuals when compared to those from TA-bearing individuals (LME: p = 0.03; Supplemental figure 5). Beta diversity analysis of 16S and ITS data from the first sample set demonstrated that subject type explained 5% and 2% of the variance associated with the microbiome, respectively (16S PERMANOVA: p = 0.001; Supplemental table 1 and ITS PERMANOVA: p = 0.09; Supplemental table 3). A similar result was observed in second sample set mucosal aspirates. We found significant associations between the microbiome and an individual’s polyp type, explaining 2% of the variance observed (PERMANOVA: p = 0.001; Supplemental table 4). This association between microbiome composition and polyp type was only observed in samples obtained directly from the mucosa and not in lavage aliquots (PERMANOVA: p > 0.05; Supplemental table 8) or fecal samples (PERMANOVA: p > 0.05; Supplemental table 9).
Plotting the microbial relative abundances from shotgun mucosal aspirates revealed elevations of Lachnospiraceae and depletions of Bacteroidaceae in TA-bearing individuals (Figure 4A). Random Forest classification of shotgun sequences from healthy and TA-associated mucosal aspirates resulted in an out-of-bag accuracy of approximately 86%, suggesting distinct microbial compositions between the two groups (Figure 4B). The top OTUs of importance for this Random Forest classification included Ruthenibacterium sp., Tyzzerella HGM12567, Ruminococcus gnavus, Coprococcus comes, and Clostridium scindens (Figure 4E). Differential abundance analysis revealed an increased abundance of C. scindens in TA mucosal aspirates when compared to serrated polyps (Dunn’s Kruskal-Wallis: p-adjusted = 0.005) and healthy controls (Dunn’s Kruskal-Wallis: p-adjusted = 0.016). Tyzzerella HGM12567 was also significantly enriched in TA mucosal aspirates when compared to healthy aspirates (Dunn’s Kruskal-Wallis: p-adjusted = 0.01), but not serrated polyp aspirates (Dunn’s Kruskal-Wallis: p-adjusted = 0.07).
Figure 4. A) The top seven most abundant microbial families across all samples from the second sample set. The number of samples per sampling method and subject type is annotated parenthetically. B-D) Non-metric multi-dimensional scaling of random forest proximity scores per sample using only mucosal aspirates from the second sample set. Central points are colored by the known subject type. The outer circles represent predicted subject type as determined by the Random Forest. The out-of-bag accuracy percentage for each Random Forest classification is embedded within each graph. E-F) The relative abundance of the top five variables of importance in second sample set mucosal aspirates for healthy vs. TA (E) and healthy vs. serrated (F) Random Forest comparisons. Significant comparisons (p-adjusted < 0.05) are denoted by an asterisk (*). G-H) The top ten most significantly correlated microbes for C. scindens (G) and E. lenta (H) as determined by sparCC correlation across all second sample set samples (pseudo-p1 < 0.05).
Despite more similar microbial compositions, Random Forest classification of healthy and serrated polyp mucosal aspirates resulted in an out-of-bag accuracy of approximately 87% (Figure 4C). OTUs of importance for this Random Forest classification included Eggerthella lenta, Dorea longicatena, Anaerostipes hadrus, Ruminococcus obeum, and DTU089 HGM12731 (Figure 4F). Differential abundance analysis revealed a significant depletion of E. lenta in shotgun mucosal aspirates from serrated polyp-bearing individuals compared to healthy (Dunn’s Kruskal-Wallis: q = 0.004) and TA-bearing individuals (Dunn’s Kruskal-Wallis: q = 0.004). Lower abundances of E. lenta were also observed in serrated polyp mucosal aspirates from the first sample set, although the differences were not statistically significant (Supplemental figure 6).
To determine if the microbiome could be used to distinguish mucosal aspirates from TAs and serrated polyps, another Random Forest classification was performed, which produced an out-of-bag accuracy of approximately 85% (Figure 4D). The top five variables of importance were E. lenta, C. scindens, Ruthenibacterium sp., Sellimonas sp., and UBA7182 HGM12585. SparCC correlation analysis demonstrated that E. lenta and C. scindens were significantly and most strongly positively correlated with one another (SparCC: pseudo-p1 < 0.05; Figure 4G and H).
Microbiome Functional Potential is distinct across Sampling Methods and Subject Types
The functional characteristics of the shotgun sequence data were next explored using KEGG and eggNOG annotations. Based on KEGG BRITE hierarchies, we observed a high degree of functional conservation across metagenomes, spanning both sample and subject types (Supplemental figure 7). A principal coordinate analysis of individual gene abundances revealed a distinct cluster of fecal samples when compared to mucosal aspirates and lavage aliquots. PERMANOVA analysis of gene abundances confirmed an association between functional microbiome capacity and sampling methods, significantly explaining 10.9% of the observed variance (PERMANOVA: p = 0.001; Supplemental table 10). In contrast, the individual of origin explained approximately 75% of the observed variance in the functional microbiome (PERMANOVA: p = 0.001; Supplemental table 10).
Subject type was also significantly associated with functional gene composition, explaining 1.2% of the observed variance (PERMANOVA: p = 0.001; Supplemental table 10). Differential abundance analysis revealed three microbial genes that varied between polyp-bearing cases and polyp-free controls. The gene sdaA, which encodes an L-serine dehydratase, was depleted in serrated mucosal aspirates when compared to polyp-free controls (Dunn’s Kruskal-Wallis: p-adjusted < 0.001) and TA mucosal aspirates (Dunn’s Kruskal-Wallis: p-adjusted < 0.001, Figure 5B). Taxonomic characterization of sdaA showed that it belonged to E. lenta (Blastp: E-value = 0.0, per. identity = 99.81%). Next, an alpha-galactosidase in the glycoside hydrolase (GH) family 31 was found to be depleted in serrated mucosal aspirates when compared to polyp-free mucosal aspirates (Dunn’s Kruskal-Wallis: p-adjusted = 0.035), but not TA mucosal aspirates (Dunn’s Kruskal-Wallis: p-adjusted = 0.22, Figure 5C). Lastly, the transporter-encoding tonB was enriched in polyp-free mucosal aspirates when compared to TA mucosal aspirates (Dunn’s Kruskal-Wallis: p-adjusted = 0.038), but not serrated mucosal aspirates (Dunn’s Kruskal-Wallis: p-adjusted = 0.065; Figure 5D). Both GH31 (Blastp: E-value = 0.0, per. identity = 99.92%) and tonB (Blastp: E-value = 0.0, per. identity = 100%) belonged to Phocaeicola massiliensis.
A) Principal coordinate analysis of gene Bray-Curtis dissimilarities across second sample set mucosal aspirates, lavage aliquots, and fecal samples. B-D) The abundance of DESeq2 normalized reads across each subject type in second sample set mucosal aspirates for the genes sdaA (B), GH31 (C), and tonB (D). The number of samples per sampling method and subject type are annotated parenthetically. Significant comparisons (p-adjusted < 0.05) are denoted by an asterisk (*).
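For readers unfamiliar with DESeq2-style normalization, the sketch below illustrates the median-of-ratios idea behind the normalized read counts shown in panels B-D. It is a simplified, standard-library illustration with invented counts and a hypothetical function name (`size_factors`). It does not reproduce the DESeq2 package itself or the exact settings used in this study.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of samples, each a list of raw counts per gene.
    Genes with a zero count in any sample are excluded from the reference.
    """
    n_genes = len(counts[0])
    log_geo_means = []
    for g in range(n_genes):
        column = [sample[g] for sample in counts]
        if all(c > 0 for c in column):
            # Log of the geometric mean of this gene across samples.
            log_geo_means.append(sum(math.log(c) for c in column) / len(column))
        else:
            log_geo_means.append(None)

    factors = []
    for sample in counts:
        log_ratios = [math.log(sample[g]) - log_geo_means[g]
                      for g in range(n_genes)
                      if log_geo_means[g] is not None]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical raw counts: the second sample was sequenced twice as deeply.
raw_counts = [
    [10, 20, 30, 0],
    [20, 40, 60, 5],
]
factors = size_factors(raw_counts)
normalized = [[c / f for c in sample] for sample, f in zip(raw_counts, factors)]
```

After dividing by the size factors, the first three genes have equal normalized counts in both toy samples, which shows how the procedure removes differences that stem from sequencing depth alone.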
Discussion
Sampling Method and Microbiome Characterization
In this study we used direct and indirect methods to sample the colon to characterize the microbiomes of healthy and colorectal polyp-bearing individuals. Using amplicon sequencing, we did not observe significant differences in the diversity or composition of microbiomes in samples obtained directly from mucosal brushes and mucosal aspirates. In contrast, fecal samples were significantly more diverse and compositionally distinct when compared to mucosal aspirates.
Due to their ease of collection, fecal samples are commonly used to study the human microbiome in the context of CRC. Fecal samples poorly represent the microbiota adherent to the colon mucosa, and instead capture those found in the intestinal lumen.19, 20 Additionally, they do not provide information on the biogeography of the microbiome within the gastrointestinal tract.21, 22 These limitations suggest stool samples are not ideal for studying the microbiota adherent to colorectal polyps, which have less robust microbial signatures of dysbiosis than adenocarcinomas and carcinomas, and therefore would benefit from more direct sampling methods.
Using stool samples, Peters et al. found significant associations between the microbiota and distal conventional adenoma cases, but not proximal, hypothesizing that stool samples were a poor proxy for measuring the proximal microbiota.16 Through direct sampling methods, we observed significant associations between the microbial composition and both proximal and distal TA cases using mucosal aspirates, supporting this hypothesis. Peters et al. also did not find significant differences in microbial diversity or composition between healthy controls and serrated polyps, which predominantly develop in the proximal colon.16 Using mucosal aspirates, but not fecal samples, we found compositional differences between serrated polyp cases and healthy controls. Together, these data suggest that mucosal brushes or aspirates, but not fecal samples, are sensitive enough to study the microbiome of colorectal polyps found within the proximal colon. These results contradict a study published by Yoon et al., who did not find significant compositional differences in the mucosa-associated gut microbiota among polyp-free, conventional adenoma, SSP, and CRC-bearing individuals.17 The authors did note, however, that this result was likely driven by the small sample size of the study, with only 6 samples per group, and 24 samples total. In contrast, with a larger number of samples, we found significant compositional differences in the mucosa-associated microbiome that could be used to distinguish individuals with and without polyps.
Compared to mucosal brushes, mucosal aspirates had a lower risk of damaging the host epithelium, provided larger collection volumes for downstream sample processing, and resulted in a lower proportion of human derived reads with shotgun sequencing. Because of these advantages, we recommend using mucosal aspirates rather than mucosal brushing for characterizing the microbiomes of colorectal polyps.
Hyperlocal Microbiome Comparisons
Although the sample size was limited, direct sampling of polyp mucosa with brushes revealed no differences in the hyperlocal microbiome of polyp tissue versus opposite wall healthy tissue. As a result, we were unable to identify microbial biomarkers specific to SSPs or other polyp types. One factor which could have disrupted the potential hyperlocal differences in the gut microbiota is the colonoscopy preparation and lavage. As part of the preparation, individuals were advised to adhere to a low fiber, clear liquid diet 24 hours prior to colonoscopy. Dietary fiber is important in maintaining the longitudinal and lateral organization of the gut microbiota within the colon, as mice on a low fiber diet show disrupted microbial organization.19 Changes in diet can rapidly shift the composition of the gut microbiome, often within 24 hours, in both humans and mice.7, 23, 24 A laxative-based cleansing and colonoscopy rinse was also performed, potentially obscuring the hyperlocal organization further. Nevertheless, significant compositional differences between the microbiomes of samples taken from the proximal and distal colon were observed, suggesting that broad microbial organization remained present in the gut after colonoscopy preparation and lavage.
Microbiome Signatures of the CRC Carcinogenesis Pathways
Compositional differences were observed in the microbiome across TA-bearing, serrated polyp-bearing, and polyp-free individuals using mucosal aspirate sampling, but not fecal sampling. Based on mucosal aspirates, the microbiomes of serrated polyp-bearing individuals more closely resembled those of polyp-free controls rather than those from TA cases. Despite this, we still found significant differences in the microbial composition of serrated and polyp-free mucosal aspirates. These findings may suggest that the gut microbiome functions differently across the adenoma-carcinoma sequence and the serrated pathway. In the adenoma-carcinoma sequence, the gut microbiome exists in, and potentially contributes to, an inflammatory environment known to promote CRC development. For example, enterotoxigenic Bacteroides fragilis produces a metalloprotease that causes oxidative DNA damage and cleaves the tumor suppressor protein, E-cadherin.25–27 Another CRC-associated microbe, pks+ E. coli, synthesizes colibactin, which induces double-stranded DNA breaks.28, 29
Although we did not find any significant increases in the relative abundances of these CRC-associated microbes in our data, we did see an increased abundance of C. scindens and Tyzzerella using mucosal aspirates in TA-bearing individuals. No dietary information was collected in this study, but it is possible that the increased abundance of C. scindens was influenced by a high fat diet. High fat diets raise primary bile acid concentrations above normal physiological levels.30 Gut microbes, like C. scindens, can 7α-dehydroxylate excess primary bile acids not absorbed by the small intestine into carcinogenic secondary bile acids.31, 32 High concentrations of bile acids can cause oxidative stress, nitrosative stress, DNA damage, apoptosis, and mutations.30 Secondary bile acids also act as farnesoid X receptor antagonists, resulting in enhanced Wnt signaling during the adenoma-carcinoma sequence.33 With respect to Tyzzerella, this microbe has been found to be elevated in APC min+/− mice, which develop colitis-associated CRC, when compared to wild-type mice.34 We also observed an increasing trend of R. gnavus in TA-associated mucosal aspirates, which has been previously associated with CRC and inflammatory bowel disease.35, 36
Another CRC-associated microbe is Fusobacterium nucleatum, which can activate Wnt signaling by binding to host E-cadherin using its FadA adhesin to facilitate CRC development in the adenoma-carcinoma sequence.37 As reviewed in DeDecker et al., F. nucleatum has also been implicated in the serrated pathway through its association with serrated pathway lesions and features, such as mismatch repair deficiency, MLH1 methylation, CpG island methylator phenotype, and high microsatellite instability.38 A study by Rezasoltani et al. quantified the fecal abundance of F. nucleatum and seven other CRC-associated microbes across TA, villous/tubulovillous, HPP, SSP, and polyp-free individuals using qPCR.18 Elevated levels of F. nucleatum, E. faecalis, S. bovis, enterotoxigenic B. fragilis, and Porphyromonas spp. were observed in TA and villous/tubulovillous groups, but not polyp-free, HPP, and SSP cases.18 Here, we did not find differences in F. nucleatum abundances across HPPs, SSPs, TAs, and polyp-free controls. Instead, we found that E. lenta was depleted in mucosal aspirates from serrated polyp-bearing individuals.
Within the colon, E. lenta metabolizes inert plant lignans into bioactive enterolignans, such as enterolactone and enterodiol.39 These enterolignans have anti-proliferative and anti-inflammatory effects, and help modulate estrogen signaling, lipid metabolism, and bile acid regulation.40 They have also been associated with reduced cancer risk.40, 41 Interestingly, the sdaA gene, which was found to be depleted in serrated mucosal aspirates, was identified as belonging to E. lenta. This gene encodes an enzyme that catalyzes the conversion of L-serine into pyruvate and ammonia during gluconeogenesis. It is not clear whether this enzyme is involved in enterolignan production in E. lenta, but sdaA in Campylobacter jejuni is required for avian gut colonization.42 Further investigation is needed to determine whether sdaA is necessary for E. lenta colonization in the human gut.
Another microbial gene that was depleted in serrated mucosal aspirates was an alpha-galactosidase in the glycoside hydrolase family 31 from Phocaeicola massiliensis, formerly named Bacteroides massiliensis. These enzymes are carried by microbes to digest the glycosidic linkages that join plant fibers. P. massiliensis has been shown to utilize starch and porcine mucin O-linked glycans as sole carbon sources.43 Diets rich in plant fiber, like starch, have been associated with decreased CRC risk.6, 44 Fiber is fermented by the intestinal microbiota to produce short chain fatty acids, including acetate, butyrate, and propionate. Butyrate is the primary energy source for colonocytes and has anti-inflammatory and anti-tumor properties.45–47 Butyrate is also involved in the epigenetic regulation of gene expression as a histone deacetylase inhibitor.48 In the serrated pathway, the gene SLC5A8, which mediates short chain fatty acid uptake into colonic epithelial cells, is frequently inhibited via promoter methylation, suggesting that dietary fiber may be required for proper cellular epigenetic regulation.49 Taken together, we hypothesize that decreased abundances of E. lenta, sdaA, and GH31 in serrated polyp samples result from low dietary fiber consumption, which leads to subsequent epigenetic modifications within colonocytes to promote serrated carcinogenesis. Nevertheless, more research is needed to elucidate the role of the gut microbiome during serrated carcinogenesis.
Another gene, tonB from P. massiliensis, was enriched in polyp-free controls. TonB transporters allow microbes to actively transport essential micronutrients, such as iron. Diets high in red meat are rich in iron and have been associated with increased CRC risk.6 It is possible that our polyp-free controls consumed less red meat, thus necessitating more tonB transporters. This finding contrasts with a recent metaproteomic analysis of stool samples from healthy individuals and those with CRC, which found an increase of tonB in the CRC group.50
Conclusion
The complex and individualistic nature of the human gut microbiome has made it difficult to mechanistically link the microbiome with CRC carcinogenesis. By describing the association between the gut microbiota and serrated polyp development, our study aims to elucidate mechanistic targets for the epigenetic-based serrated pathway to CRC. In addition, our data underscore the importance of distinguishing between different pathways of colorectal carcinogenesis when investigating the gut microbiome. Finally, transitioning future microbiome studies to direct sampling methods will enable the discovery of previously unassociated microbes and novel mechanistic targets, as demonstrated here.
Declarations
Author contributions
Katrine L. Whiteson and William E. Karnes devised the study design with support from Lauren DeDecker. Subject recruitment was performed by William E. Karnes and Zachary Lu. Sample collection was performed by William E. Karnes, Lauren DeDecker, and Zachary Lu with guidance from Katrine L. Whiteson. Julio Avelar-Barragan, Bretton Coppedge, and Zachary Lu processed samples for data acquisition. Julio Avelar-Barragan performed the data analysis and wrote the manuscript with guidance from Katrine L. Whiteson.
Ethics approval
This study was approved by the Institutional Review Board (IRB) of the University of California, Irvine (HS# 2017-3869).
Funding details
This study was funded by institutional research grant #IRG-16-187-13 from the American Cancer Society.
Disclosure of interest
The authors declare no competing interests or conflicts of interest.
Data availability statement
Bash and R code for data processing and analysis is available on GitHub at: https://github.com/Javelarb/ACS_polyp_study. Additional data and materials are available upon reasonable request.
Data deposition
Sequencing data is available on the Sequence Read Archive under the BioProject ID, PRJNA745329.
Abbreviations
- CRC: Colorectal cancer
- APC: Adenomatous polyposis coli
- HPP: Hyperplastic polyp
- TSA: Traditional serrated adenoma
- SSP: Sessile serrated polyp
- TA: Tubular adenoma
- OTU: Operational taxonomic unit
- ASV: Amplicon sequence variant
- ORF: Open reading frame
- LME: Linear mixed effects model
- GH: Glycoside hydrolase
Acknowledgements
We would like to thank Claudia Weihe and Jennifer B.H. Martiny for allowing us to borrow laboratory equipment and giving insightful feedback, Andrew Oliver and Jason A. Rothman for their bioinformatic expertise, Clark Hendrickson for his assistance with sample preparation, and Heather Maughan for her helpful edits and suggestions.