Abstract
Long-term colonization of the gut microbiome by carbapenemase-producing Enterobacteriaceae (CPE) is a growing area of public health concern as it can lead to community transmission and rapid increase in cases of life-threatening CPE infections. Leveraging the observation that many subjects are decolonized without interventions within a year, we used longitudinal shotgun metagenomics (up to 12 timepoints) for detailed characterization of ecological and evolutionary dynamics in the gut microbiome of a cohort of CPE-colonized subjects and family members (n=46; 361 samples). Subjects who underwent decolonization exhibited a distinct ecological shift marked by recovery of microbial diversity, key commensals and anti-inflammatory pathways. In addition, colonization was marked by elevated but unstable Enterobacteriaceae abundances, which exhibited distinct strain-level dynamics for different species (Escherichia coli and Klebsiella pneumoniae). Finally, comparative analysis with whole genome sequencing data from CPE isolates (n=159) helped identify sub-strain variation in key functional genes and the presence of highly similar E. coli and K. pneumoniae strains with variable resistance profiles and plasmid sharing. These results provide an enhanced view into how colonization by multi-drug resistant bacteria associates with altered gut ecology and can enable transfer of resistance genes, even in the absence of overt infection and antibiotic usage.
Introduction
The global dissemination of antibiotic resistance genes among pathogenic bacteria is a major public health problem that, if left unaddressed, would lead to reduced efficacy of current treatment options, elevated treatment costs, and increased mortality1. A particular area of concern is the spread of carbapenemase-producing Enterobacteriaceae (CPE)2,3, with their ability to degrade carbapenems often acquired in gram-negative bacteria from plasmids with carbapenemase genes4,5, thus rapidly endangering the utility of these antibiotics of last resort6,7. In addition to causing life-threatening infections, asymptomatic colonization of CPE in the human gut is increasingly common8,9, creating a reservoir for transmission of antibiotic resistance10. While prior CPE studies have focused on epidemiology3,11 and molecular aspects7,12,13, the natural history of gut colonization, including ecological and evolutionary changes linked to antibiotic resistance transmission or CPE decolonization, remains unexplored.
In recent years, studies into host-microbiome-pathogen interactions have provided important insights into pathogenesis14, immune response15 and treatment avenues16 for various viral and microbial pathogens. These studies typically leverage metagenomic approaches to track microbial community composition over time and understand ecological responses to overt infection16,17. As microbial populations often have rapid turnover, whole-genome sequencing of pathogenic isolates has been used to study intra-host evolution during chronic infections, identifying key enzymes for host adaptation and colonization18,19. Alternatively, deep shotgun metagenomic sequencing can simultaneously reveal nucleotide level variation for many bacterial species of interest20,21, shedding light on strain-level dynamics in the community. This approach has been used to study stable microbiomes in healthy individuals as well as dynamic changes during fecal microbiota transplantation22. Asymptomatic gut colonization of CPE strains presents a unique opportunity to study an intermediate phenomenon i.e. strain competition with commensals, and associated ecological and evolutionary adaptations, in the absence of an overt infection or disease.
Here we conducted longitudinal gut microbial analysis for a cohort of index subjects (n=29, CPE colonized at recruitment) and their family members (n=17, not CPE colonized) with up to 12 time points over the duration of a year, to obtain multiscale23 (microbiome composition, strains and gene-level) characterization of ecological and evolutionary changes during CPE colonization. Based on deep shotgun metagenomic sequencing of stool DNA, we observed distinct ecological shifts marked by recovery of diversity and key commensals in association with CPE decolonization. CPE colonization was marked by elevated but unstable Enterobacteriaceae abundances, which exhibited specific dynamics at the strain-level for different species (Escherichia coli and Klebsiella pneumoniae). Comparative analysis with whole genome sequencing data from CPE isolates (n=159) helped identify the presence of highly similar E. coli and K. pneumoniae strains with variable resistance profiles and plasmid sharing. These results provide an enhanced view into how colonization by multi-drug resistant bacteria associates with altered gut ecology and can enable transfer of resistance genes, even in the absence of overt infection and antibiotic usage.
Results
CPE colonization is associated with ecological shifts that are resolved during recovery
Leveraging the observation that CPE carriage in hospital patients can be resolved within 3 months, with 98.5% probability within a year for our cohort24 (though other cohorts have reported longer durations25,26), we tracked gut microbiome composition in this cohort of individuals for a year to understand ecological changes associated with decolonization (up to 12 timepoints, 361 samples in total; Table 1, Supplementary File 1). Specifically, stool samples were obtained from hospital patients who screened positive for CPE carriage (n=29, index subjects), as well as their non-CPE-colonized family members (n=17, serving as home environment-matched controls) and characterized via deep shotgun metagenomic sequencing (>50 million Illumina 2×100bp reads, on average; Methods). Principal coordinates analysis with average-linkage clustering based on taxonomic profiling of the data showed that there are multiple distinct community configurations (I, II, III, IV), where CPE positive samples (based on stool culture and qPCR24) were less commonly seen in configurations I and II, and more commonly seen in configurations III and IV (Figure 1a, Supplementary Figure 1, Supplementary File 2). This statistically significant shift of CPE positive samples along PCoA1 (Wilcoxon rank-sum p-value<1.4×10−8, Supplementary Figure 2a) is defined by a gradient of relative abundances that are most strongly correlated for the genera Escherichia (negative i.e. more abundant in configuration IV samples) and Bacteroides (positive; Supplementary Figure 2b). A similar shift was observed when comparing taxonomic profiles for configuration IV versus configuration I microbiomes (Supplementary File 3). Interestingly, while configuration IV has no microbiomes from family members, a few CPE negative samples from index subjects also cluster here.
Grouping timepoints based on their proximity in time to CPE clearance highlighted that, while CPE positive samples have the lowest average diversity27, there is a gradual increase in diversity around the time of decolonization and post decolonization, with diversity reaching the higher levels seen in family members after 2 months (Figure 1b). This pattern was seen even after accounting for potential confounding factors including antibiotic usage, hospitalization status, multiple timepoints for an individual, gender and ethnicity in a linear mixed-effects model (Supplementary Figure 3; Methods). We investigated if colonization of Enterobacteriaceae species alone could explain these changes by computationally subtracting all of them from taxonomic profiles and recomputing diversity metrics. We noted that both genus-level richness and Shannon diversity consistently preserved the trend of increasing during and after decolonization (Supplementary Figure 4), suggesting that these observations are not simply explained by CPE colonization and instead point to a more pronounced shift in the microbiome.
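The subtraction check described above can be sketched as follows. This is a minimal illustration in Python, assuming that a taxonomic profile is a dictionary of genus-level relative abundances. The function names, the toy profile and the list of Enterobacteriaceae genera are hypothetical and are not taken from the study, which computed diversity with the vegan library in R.

```python
import math

def shannon(profile):
    """Shannon diversity (natural log) of a relative-abundance profile."""
    total = sum(profile.values())
    return -sum((v / total) * math.log(v / total)
                for v in profile.values() if v > 0)

def subtract_taxa(profile, excluded):
    """Drop the excluded taxa and renormalize the remainder to sum to 1."""
    kept = {t: v for t, v in profile.items() if t not in excluded and v > 0}
    total = sum(kept.values())
    if total == 0:
        return {}  # nothing left after subtraction
    return {t: v / total for t, v in kept.items()}

# Hypothetical genus-level profile (relative abundances), for illustration only.
profile = {
    "Escherichia": 0.50,
    "Klebsiella": 0.10,
    "Bacteroides": 0.20,
    "Faecalibacterium": 0.20,
}
# Assumed list of genera to subtract; a real analysis would use the full family.
enterobacteriaceae = {"Escherichia", "Klebsiella"}

residual = subtract_taxa(profile, enterobacteriaceae)
richness = len(residual)        # genus-level richness after subtraction
diversity = shannon(residual)   # Shannon diversity after subtraction
```

Renormalizing after subtraction keeps the residual profile compositional, so diversity values remain comparable between samples with very different Enterobacteriaceae loads.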
The temporal shifts in diversity during CPE colonization were also reflected in terms of overall similarity among microbiomes, with Bray-Curtis distances (genus-level) to family members being highest in CPE positive samples, gradually reducing during and post de-colonization towards baseline values seen among family members (Supplementary Figure 5). These results highlight the ecological shift associated with CPE colonization that largely resolves post decolonization, but might have residual effects in some individuals.
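As an illustration of the distance comparison above, the following Python sketch computes the Bray-Curtis dissimilarity between two genus-level profiles and averages it over a set of reference profiles. The helper names and toy profiles are hypothetical; this is a sketch of the standard Bray-Curtis formula rather than the exact procedure used in the study.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0

def mean_distance_to_reference(sample, references):
    """Average dissimilarity of one sample to a set of reference profiles."""
    return sum(bray_curtis(sample, r) for r in references) / len(references)

# Hypothetical profiles, for illustration only.
sample = {"Escherichia": 0.6, "Bacteroides": 0.4}
family_profiles = [
    {"Bacteroides": 0.7, "Faecalibacterium": 0.3},
    {"Escherichia": 0.6, "Bacteroides": 0.4},
]

d_first = bray_curtis(sample, family_profiles[0])
d_mean = mean_distance_to_reference(sample, family_profiles)
```

A value of 0 indicates identical profiles and a value of 1 indicates no shared taxa.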
To further probe into key bacterial species associated with CPE colonization we conducted differential abundance analysis based on CPE status (Methods, Supplementary File 2). While most Enterobacteriaceae species were not differentially abundant, Klebsiella pneumoniae3,7 had one of the strongest associations with CPE positive status (Figure 1c). In addition, only one other species (Bifidobacterium breve) was significantly enriched in CPE positive samples, while 7 other species were significantly depleted relative to CPE negative samples. These included several important commensal species that are known to help reduce gut inflammation through diverse pathways, including Bacteroides dorei (by decreasing gut microbial lipopolysaccharide production28), Faecalibacterium prausnitzii (through butyrate production29) and other Bifidobacterium species (bifidum and pseudocatenulatum, via inhibition of NF-κB activation30), and may thus play a role in suppressing Enterobacteriaceae growth and CPE colonization31.
Pathway analysis based on differential abundance as a function of CPE status provided further supporting evidence that key inflammatory pathways (e.g. sulfate reduction) are enriched during CPE colonization, indicating that they may play a role in the process (Supplementary Figure 6, Supplementary File 4). In addition, pathways related to aerobic respiration and oxidative phosphorylation (e.g. pentose phosphate pathway) were also more abundant during CPE colonization, consistent with a model of oxygenation of the gut as proposed by Andreas Baumler and Sebastian Winter32. In particular, these results were recapitulated after removal of Enterobacteriaceae species from functional profiles, highlighting that they are not directly explained by CPE colonization and have substantial contributions from other species as well (Supplementary Figure 7). Microaerophilic niches for Enterobacteriaceae species due to antibiotic treatment could provide another potential explanation33, as antibiotic usage was common in this study (before ∼25% of sampled timepoints, Supplementary File 1). As expected, while antibiotic resistance and carbapenemase genes were enriched in gut microbiomes for CPE positive timepoints, no significant differences were observed between index subjects and family members at other timepoints (Supplementary Figure 8).
While Enterobacteriaceae species were enriched overall in CPE positive samples relative to CPE negative samples (Wilcoxon rank-sum p-value<3.5×10−5), index subjects at CPE negative timepoints also showed significantly enriched relative abundances compared to family members (Wilcoxon rank-sum p-value=0.01, Supplementary Figure 9). In addition, the composition of Enterobacteriaceae species varied across individuals with Escherichia coli and Klebsiella pneumoniae being the most common species, but other Escherichia, Klebsiella, Enterobacter and Proteus species also being moderately abundant across some individuals and timepoints (Supplementary Figure 9). Of note, while several Enterobacteriaceae species exhibited high abundance across individuals, these did not necessarily correspond to the CPE species colonizing a subject (e.g. subject 0505-T in timepoints 1-3). In addition, we observed rapid shifts in Enterobacteriaceae profiles (e.g. in 0457-T and 0512-T at timepoint 6) and overall higher variation in Enterobacteriaceae abundances across timepoints in index subjects (Wilcoxon rank-sum p-value<0.05; Figure 1d, Supplementary Figure 9). Together these results indicate that CPE colonization may be maintained by an altered, dynamic pro-inflammatory microenvironment that supports Enterobacteriaceae species, which is resolved in association with recovery of microbiome diversity and function34.
Distinct strain-level dynamics of Enterobacteriaceae species in the gut microbiomes of index patients and family members
We next analyzed the deep shotgun metagenomic sequencing data at a higher resolution looking for within-species strain-level dynamics across individuals for the two most prevalent Enterobacteriaceae species (E. coli and K. pneumoniae). Read mapping to reference genomes was used to call high-confidence single-nucleotide variants, and modes in allele frequency distributions were used to infer the number of strains present using a classical approach in population genetics20,35 (Methods, Supplementary Figure 10). For 53% of the samples (63% for E. coli, 38% for K. pneumoniae) where a species was confidently detected (relative abundance >0.1%), read coverage was sufficient to identify strain variation (one, two or multiple strains, otherwise classified as low coverage; Figure 2a, 2b, Supplementary Figure 11). Overall, as expected for a gut commensal36,37, E. coli was found at comparable frequencies in index subjects (86%) and family members (90%), and was also more frequently detected in gut microbiome samples overall relative to K. pneumoniae (Fisher’s exact p-value<5×10−20, Figure 2a). K. pneumoniae was, however, more frequently found in index subjects (70%) relative to family members (39%), consistent with the hypothesis that a distinct pro-inflammatory environment might be facilitating colonization in these individuals (Fisher’s exact p-value<2×10−9, Figure 2b).
In terms of strain variations, of the samples that were assigned a classification, we noted that E. coli was frequently observed as a single distinct strain in the gut microbiome of index patients (44%) while family members more often had multiple strains (44%; Supplementary Figure 11), suggesting that a single clone may often dominate in a pro-inflammatory environment. A few individuals also maintained a single strain state over the course of several months (up to a year, e.g. 0505-T, 1667-T, 0506-T) indicating that this can be a stable state for some individuals (Figure 2a). For K. pneumoniae, despite being detected more sporadically in index subjects and family members, the multi-strain state was the more common observation (49%), consistent with the hypothesis that even in a pro-inflammatory environment no distinct clone will typically outcompete others38 (Figure 2b, Supplementary Figure 11). Overall, in agreement with our previous observations (Figure 1d, Supplementary Figure 9), we noted that strain compositions were highly variable for these Enterobacteriaceae species over time.
Capturing transition frequencies between various strain compositions as a first-order Markov model (maximum likelihood with Laplace smoothing), we noted distinct patterns for E. coli and K. pneumoniae, as well as between index subjects and family members (Figure 2c, Methods). For example, E. coli colonization is more likely to stay in a single strain state for index patients (69%), relative to family members (57%), as well as relative to single strain K. pneumoniae colonization (52%, Figure 2c). Also, when E. coli is not detected, this state is more likely to be maintained in index subjects (47%) than in family members (22%, Figure 2c). Overall the Markov model predicts that E. coli in index subjects and family members tend to be in the one strain state (41% for subjects, 39% for family members). In contrast, K. pneumoniae frequently converges to the not detected state in subjects (42%) and in family members (75%). Grouping various classes of detection and strain status in different ways, we then tested if index subjects and family members show different transition probabilities in E. coli or K. pneumoniae (Figure 2d). For E. coli, transition probabilities were not significantly different between index subjects and family members (Fisher’s exact p-value>0.05, Figure 2d). In contrast, driven by the stark detected/not detected patterns seen for K. pneumoniae, index subjects had significantly different transition probabilities compared to family members for various groupings that involve the not detected state (“All classifications”, “Detected vs not detected” and “Fixed vs variable strains”, Fisher’s exact p-value<10−2, Figure 2d). These results further highlight the differences in strain-level dynamics for Enterobacteriaceae species in the potentially pro-inflammatory gut microbiome milieu of index subjects.
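The Markov modeling step above can be sketched as follows: a minimal Python illustration of maximum likelihood estimation of a first-order transition matrix with Laplace (add-one) smoothing, followed by power iteration to obtain long-run state frequencies. We assume here that the long-run percentages reported above correspond to the stationary distribution of the fitted chain. The state labels, toy sequences and function names are hypothetical.

```python
def transition_matrix(sequences, states, pseudocount=1.0):
    """Row-normalized transition counts with Laplace smoothing."""
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    counts = [[pseudocount] * n for _ in range(n)]
    for seq in sequences:
        # Count each consecutive pair of classifications within a subject.
        for prev, nxt in zip(seq, seq[1:]):
            counts[idx[prev]][idx[nxt]] += 1
    return [[c / sum(row) for c in row] for row in counts]

def stationary(P, iterations=1000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical strain classifications per subject, ordered by timepoint.
states = ["not detected", "one strain", "multiple strains"]
sequences = [
    ["one strain", "one strain", "one strain", "multiple strains"],
    ["not detected", "one strain", "one strain"],
]

P = transition_matrix(sequences, states)
pi = stationary(P)
```

The pseudocount ensures that every transition has non-zero probability, which also guarantees that the power iteration converges to a unique stationary distribution.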
Sub-strain variation and plasmid sharing in Enterobacteriaceae species in relation to CPE decolonization
Samples that were determined to have a single strain can nevertheless exhibit sub-strain variation in relation to this genomic background, similar to quasi-species diversity in viral populations. Characterizing the distribution of such intra-host variations across genes can help identify adaptive changes that may be important for CPE colonization, similar to recent studies with mouse models and strain isolates39,40. To analyze this standing variation in Enterobacteriaceae species, we identified low-frequency (<50%) single-nucleotide variants in single-strain timepoints (30,155 and 13,061 SNVs for E. coli and K. pneumoniae, respectively), and analyzed them for protein function altering changes to identify potential adaptive changes in the genomes of Enterobacteriaceae strains during gut colonization (Supplementary File 5, Methods). In total we found 5,919 and 1,787 putative function altering changes in E. coli and K. pneumoniae, respectively, including several in key polysaccharide utilization and virulence genes (e.g. lacZ, lacY, ECIAI39_4258 [Putative invasin/intimin protein]), similar to genes described in isolate sequencing studies as undergoing selection for colonization of the human gut41,42 (Table 2). In particular, we visualized function-altering SNVs in genes implicated in polysaccharide utilization, where adaptive mutations can reflect pressures to make use of polysaccharides derived from the host diet, to identify several structural motifs that might be key to their function (Supplementary Figure 12). Consistent with the fact that we are studying low-frequency SNVs, we noted that most regions bear signatures of purifying selection for these SNVs (dN/dS<0.5, Supplementary Figure 12), though overall the identified genes were significantly enriched relative to the genome-wide average for non-synonymous SNVs (Table 2).
To study these variations further in relation to CPE decolonization, the time-series information was used to cluster SNVs that co-vary (Methods). Interestingly, in some subjects multiple clusters were revealed by this analysis, indicating that there were distinct sub-strain lineages that differed by a few hundred SNVs genome-wide (e.g. 1674-T, Figure 3a, b; Supplementary Figure 13-15). In particular for subject 1674-T, we noted that both E. coli and K. pneumoniae have a dominant cluster during CPE positive timepoints (V00–V03) that match the SNV signature seen in the genomes for E. coli and K. pneumoniae CPE isolates for this individual (Shared and Cluster 1 SNVs, Figure 3c, d, Supplementary Figure 15). In contrast, the sub-dominant cluster (Cluster 2, Figure 3c, d, Supplementary Figure 15; likely representing a sub-lineage of Cluster 1) has a SNV signature that is not seen in the CPE isolates and is still detected in the post-decolonization timepoint (V05, based on stool PCR testing), indicating that these sub-strain lineages may be discordant for CPE status despite their overall genomic similarity. For both E. coli and K. pneumoniae, we noted that decolonization coincides with the appearance of a distinct strain with >1,000 SNVs distinguishing them from the CPE strains (V05 unique, Figure 3c, d, Supplementary Figure 15; V05 classified as two-strain timepoint). Interestingly, despite these shared patterns within E. coli and K. pneumoniae strains, we noted that they exhibited dissimilar trends in terms of overall relative abundance, with the abundance of E. coli being reduced leading up to the decolonization timepoint (V05) while K. pneumoniae abundance peaks at this point (Figure 3e, f). In general, while a few dense trajectories of co-varying SNVs were detected in other individuals, many SNVs varied independent of these clusters (Supplementary Figure 13, 14). 
Overall, these results suggest that Enterobacteriaceae species may share patterns of sub-strain dynamics in relation to CPE decolonization, despite having species-specific ecological properties.
Leveraging the availability of multiple timepoints across subjects, we identified SNVs whose population frequencies varied notably over time (>30%). These were then overlapped across subjects to identify SNVs that have this property recurrently (Table 3, Supplementary File 6), identifying a range of polysaccharide utilization (lacZ, lacI, treA), pyruvate metabolism (pflB, pykF) and protein synthesis (dnaK, 30S and 50S ribosomal subunits) genes that have been implicated in adaptive evolution under nutrient limitation43, antibiotic44 and environmental stress45,46 conditions. In particular, several genes were common to the lists for E. coli and K. pneumoniae (srmB, pnp, nlpI, pheT), suggesting that similar selection constraints might be acting on strains for both species. The lacZ gene was highlighted as having the most recurrent, frequency-varying SNVs in this analysis (n=12), with all SNVs occurring in surface-exposed regions (Figure 3g). Comparing the accessible surface area (ASA) of protein residues between variant and other sites revealed that variant residues are significantly more exposed to the solvent (mean=73.1Å²) than other residues (mean=34.1Å², Welch’s t-test p-value <0.01). In addition, 6 SNVs occurred in the activating interface, a region near the amino-terminus of lacZ that is required for tetramerization47, indicating that they may influence lacZ function via complex formation dynamics.
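The recurrence analysis above can be sketched as follows. This minimal Python illustration assumes that "varied notably over time" means that the range (maximum minus minimum) of an SNV's allele frequency across timepoints exceeds 30%, and that an SNV is recurrent if this holds in at least two subjects. Both thresholds as coded here, the subject labels, the SNV keys and the function names are hypothetical.

```python
from collections import Counter

def varying_snvs(freqs_by_snv, min_range=0.30):
    """SNVs whose allele frequency range across timepoints exceeds min_range."""
    return {snv for snv, freqs in freqs_by_snv.items()
            if max(freqs) - min(freqs) > min_range}

def recurrent_snvs(subjects, min_subjects=2, min_range=0.30):
    """Count, per SNV, the subjects in which it is frequency-varying."""
    counts = Counter()
    for freqs_by_snv in subjects.values():
        counts.update(varying_snvs(freqs_by_snv, min_range))
    return {snv: n for snv, n in counts.items() if n >= min_subjects}

# Hypothetical allele frequencies per subject, keyed by (gene, position).
subjects = {
    "S1": {("lacZ", 100): [0.05, 0.45, 0.10], ("pykF", 20): [0.10, 0.15]},
    "S2": {("lacZ", 100): [0.00, 0.50], ("pykF", 20): [0.20, 0.60]},
}

recurrent = recurrent_snvs(subjects)
```

In this toy input only the lacZ variant passes the threshold in both subjects, so it is the only SNV reported as recurrent.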
Among the genetic features prominent in CPE strains seen in subject 1674-T, we noted that variants in polysaccharide utilization genes were common as discussed previously (Figure 3g). In addition, we analyzed plasmid sequences across timepoints and identified two important plasmids that were shared between E. coli and K. pneumoniae CPE strains (Figure 3h, Supplementary Figure 16, Methods). These included the pKPC2 plasmid that was recently identified in hypervirulent, carbapenem-resistant Klebsiella pneumoniae isolates from Singapore and harbors blaKPC-2, a carbapenemase gene that was the basis of CPE designation for these isolates24. In addition, the pMS6192B plasmid was shared between all E. coli isolates and the Klebsiella pneumoniae isolate from the first visit (V00, Figure 3h). The shared plasmids have a total sequence length of >140kbp and no SNVs distinguishing the two species, indicating that they have a recent common source. Plasmid transfer experiments with pKPC2 between E. coli and K. pneumoniae strains suggest moderate conjugation frequency under in vitro conditions (∼0.1%, Methods). In addition, half of the plasmid-bearing clones (3/6) were observed to have a SNV in pKPC2 after 300 generations, placing an upper bound of 5 months on the divergence time of plasmid-bearing isolates that have no SNVs (Binomial p-value <0.05).
Discussion
The availability of metagenomic data from up to 12 timepoints over the period of a year in this longitudinal study allowed us to examine long-term dynamics, enabling comparison of microbiome configurations before and after CPE decolonization in a subject-matched fashion to reveal microbiome shifts associated with decolonization. This analysis revealed ecological shifts that cannot be explained solely by the loss of CPE strains (e.g. increase in species richness), and the specific taxonomic and functional changes observed point to the role of inflammation in maintaining an Enterobacteriaceae-favorable gut environment in index subjects (e.g. Pantoea species; Supplementary Figure 2, Supplementary File 3). In addition, our data indicate that this configuration may be unstable in many individuals, opening up the possibility that interventions that reduce gut inflammation directly or via the action of probiotics could reduce Enterobacteriaceae abundances and promote CPE decolonization.
In particular, gut inflammation has been known to create a niche for enterics such as Salmonella48, where some species can use sulfate, nitrate and tetrathionate as the terminal electron acceptor for anaerobic respiration (e.g. E. coli). The enriched pathways in CPE colonized subjects are marked by menaquinol biosynthesis, glycolysis and respiration (TCA cycle), even after computationally subtracting out the contribution of Enterobacteriaceae, indicating that the gut environment in this group is qualitatively different in oxygenation. In addition, fucose and rhamnose degradation, as well as 1,2-propanediol degradation, are enriched in CPE colonization, potentially serving as carbon sources for Enterobacteriaceae such as K. pneumoniae, which can demonstrate competitive fitness in the gut with oxygen as terminal electron acceptor under such conditions38. The enrichment of the pentose phosphate pathway could indicate a need for reducing equivalents in the form of NADPH to maintain redox conditions, or for nucleic acid precursors to fuel growth. Overall, the shift in microbial pathways in CPE colonized subjects appears to be largely independent of Enterobacteriaceae species, while favoring their growth. Further work is needed to understand if this shift is primarily established by gut inflammation (e.g. as seen in colitis49, potentially through direct measurement of protein biomarkers such as Calprotectin) or if a diverse set of factors play a strong role in an individual-specific manner (e.g. antibiotics for some subjects33). In particular, while the reduction in microbial diversity during CPE colonization could not be solely attributed in this study to factors such as antibiotic usage or hospitalization at a timepoint, these could be delayed effects, and a more controlled study design would be needed to explore this further.
An alternative strategy to promote CPE decolonization could be based on the introduction of species that were relatively depleted in the colonized state (e.g. Faecalibacterium prausnitzii or Bifidobacterium bifidum), either in the form of probiotic formulations or through the use of fecal microbiota transplants50. Matching donors to recipients to supplement missing species or to promote further instability in Enterobacteriaceae abundances based on ecological models could be a promising avenue to explore here, similar to studies for Clostridioides difficile51,52. The observed differences in colonization dynamics for Enterobacteriaceae species (E. coli and K. pneumoniae) suggest that CPE decolonization strategies might also have to be species-specific. For example, the presence of multiple K. pneumoniae strains in index subjects is consistent with the hypothesis that they are not well-adapted for gut colonization but are instead opportunistically exploiting a niche. Decolonization of K. pneumoniae strains may therefore require elimination of conditions that favor this niche such as inflammation or availability of specific sugars. On the other hand, the presence of a single strain of E. coli in many index subjects supports a model where gut adapted strains have acquired antimicrobial resistance cassettes, and plasmid targeting strategies might be better suited in this case. Interestingly, data from our cohorts suggests that human gut microbiomes can harbor multiple strains of commensal species such as E. coli (in contrast to observations in mouse studies53,54), even among non-CPE colonized family members55,56 (Figure 2a). Further studies using high-throughput culturing and single-cell sequencing could help accurately reconstruct strain genomes and unravel the factors that determine niche competition57.
Understanding the factors that support gut colonization by CPE species can provide another avenue to identify targets for intervention. As we show here, the analysis of high-coverage metagenomic data to identify sub-strain variations with functional impact can provide promising hypotheses based on in vivo evolution, similar to the quasi-species analysis of viruses58,59, or mutagenesis-based experiments60. Furthermore, identification and isolation of sub-strain lineages with distinct advantages in colonizing the host or avoiding decolonization (e.g. as may be the case for cluster 2 in 1674-T), can help narrow down the genetic features that need to be investigated in vitro. Finally, the role of the gut microbiome as a reservoir for AMR determinants, and plasmid sharing across Enterobacteriaceae species is of particular concern. While we cannot definitively conclude that the data for index subject 1674-T represents an example of plasmid transfer, these observations and the isolated strains serve as important resources to guide further investigations into plasmid transmission and CPE decolonization.
Methods
Sample collection and CPE classification
A prospective cohort study involving CPE carriers was conducted from October 2016 to February 2018. Study participants were recruited from two tertiary healthcare centers in Singapore. CPE carriers were identified by routine collection of rectal swab samples for clinical care and infection prevention and control measures, in accordance with local infection control policies. The study received ethics approval from the Singapore National Healthcare Group Domain Specific Review Board 74 (NHG DSRB Reference: 2016/00364) prior to commencement. Stool samples were first collected weekly for four weeks, then monthly for five months, and finally once every two months for six months. In addition to the CPE-colonized subjects, stool samples from a number of family members were also obtained to provide a control dataset. Samples obtained from index subjects were classified as either CPE positive or CPE negative, based on whether CPE genes (including blaNDM-1, blaKPC, blaOXA-48, blaIMI-1, and blaIMP) were positively identified from Enterobacteriaceae isolates found to be resistant to either meropenem or ertapenem24 (Supplementary Table 1). The presence of CPE negative samples was used to detect CPE clearance and samples were further classified based on the amount of time elapsed since clearance i.e. before clearance, within two months post-clearance, and more than two months post-clearance. Due to the focus on household transmission and CPE carriage, dietary information was not collected in the clinical study.
Isolate sequencing and assembly
DNA for all CPE isolates obtained from stool samples in this study (all subjects, all timepoints) was collected from Tan Tock Seng Hospital (TTSH) and transferred to the Genome Institute of Singapore (GIS) for whole genome sequencing. Library preparation was performed using the NEBNext Ultra DNA Library Prep Kit for Illumina, and 2×151 base-pair sequencing was performed using the Illumina HiSeq 4000. Raw FASTQ reads were processed using in-house pipelines at GIS for de novo assembly with the Velvet assembler61 (v1.2.10), parameters optimized by Velvet Optimiser (k-mer length ranging from 81 to 127), contig scaffolding with Opera62 (v1.4.1), and finishing with FinIS63 (v0.3).
Shotgun metagenomic sequencing
DNA from 361 stool samples was extracted using the PowerSoil DNA Isolation Kit (12888, MoBio Laboratories) with modifications to the manufacturer’s protocol. Specifically, to avoid spin filter clogging, we extended the centrifugation to twice the original duration, and solutions C2, C3 and C4 were doubled in volume. DNA was eluted in 80µL of Solution C6. Concentration of DNA was determined by Qubit dsDNA BR assay (Q32853, Thermo Fisher Scientific). For library construction, 50ng of DNA was re-suspended in a total volume of 50µL, and was sheared using Adaptive Focused Acoustics (Covaris) with the following parameters: duty factor of 30%, peak incident power (PIP) of 450, 200 cycles per burst, and treatment time of 240s. Sheared DNA was cleaned up with 1.5× Agencourt AMPure XP beads (A63882, Beckman Coulter). Gene Read DNA Library I Core Kit (180434, Qiagen) was used for end-repair, A-addition and adapter ligation. Custom barcode adapters were used for cost considerations (HPLC purified, double stranded, 1st strand: 5’ P-GATCGGAAGAGCACACGTCT; 2nd strand: 5’ ACACTCTTTCCCTACACGACGCTCTTCCGATCT) in replacement of Gene Read Adapter I Set for library preparation. The library was cleaned up twice using 1.5× Agencourt AMPure XP beads (A63882, Beckman Coulter). Enrichment was carried out with indexed-primers according to an adapted protocol from Multiplexing Sample Preparation Oligonucleotide kit (Illumina). We pooled the enriched libraries in equimolar amounts and sequenced them on an Illumina HiSeq 2500 sequencing instrument at GIS to generate 2×101 base-pair reads, which yielded around 17.7 billion paired-end reads in total and 49 million paired-end reads on average per library.
Taxonomic and functional profiling
Read quality trimming was performed using famas (https://github.com/andreas-wilm/famas, v0.10, --no-order-check), and microbial reads were identified by mapping and filtering out reads aligned to the human reference genome (hg19) using bwa-mem64 (v0.7.9a, default parameters; >90% microbial reads on average). Taxonomic profiling was done using MetaPhlAn65 (v2.0, default parameters, filtering taxa with relative abundance<0.1%) and functional profiles were obtained with HUMAnN66 (v2.0, default parameters). As a sanity check, we confirmed that species and genus-level taxonomic profiles were not dominated by taxa that are commonly attributed to reagent or laboratory contamination67 (Supplementary File 2). Average-linkage hierarchical clustering of taxonomic profiles was used to group samples, with the number of clusters determined using the Akaike information criterion (AIC). Sample α-diversity was computed using the Shannon diversity index with the vegan library in R. Differential abundance analysis was performed using LEfSe68 (v1.0.8), as a non-parametric and conservative approach to identify significantly varying taxa and functions across groups69. These results were further validated using Songbird70 (v1.0.3; --epochs 10000 --differential-prior 0.5) analysis with Bonferroni-corrected p-value<0.05. Abundances of antibiotic resistance genes (ARGs) in the metagenomes were computed using a direct read mapping approach implemented in SRST271 with default parameters and the CARD_v3.0.8_SRST2 database72.
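As an illustration of the abundance filter and α-diversity calculation described above, the following is a minimal Python sketch rather than the pipeline used in the study (which relied on MetaPhlAn and the vegan library in R). The function names and the example profile are hypothetical; the natural logarithm is assumed, and abundances are normalized to proportions inside the function.

```python
import math

def filter_taxa(profile, min_rel_abundance=0.1):
    """Keep taxa whose relative abundance (in percent) is at least the cutoff."""
    return {taxon: a for taxon, a in profile.items() if a >= min_rel_abundance}

def shannon(profile):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(profile.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for abundance in profile.values():
        if abundance > 0:
            p = abundance / total  # convert to a proportion
            h -= p * math.log(p)
    return h

# Hypothetical genus-level profile (percent relative abundance)
example = {"Bacteroides": 60.0, "Escherichia": 39.95, "Rare_genus": 0.05}
kept = filter_taxa(example)
diversity = shannon(kept)
```

A profile with two equally abundant taxa gives H = ln 2, and a profile with a single taxon gives H = 0, which provides a quick check on the implementation.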
Linear mixed-effects modeling
Linear mixed effects modeling was conducted using the lmer function from the lme4 package in R. For each model, genus-level Shannon diversity was set as the response variable, with colonization status as the fixed effect and potential confounders (e.g. antibiotic usage since last visit, hospitalization status, individual subjects, gender and ethnicity; Supplementary File 1) as random effect covariates. Residual Shannon diversity values were derived for visualization by subtracting the intercept terms corresponding to random effects.
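One way to write down a model of this form is given below. The notation is ours and is an assumption rather than the authors' specification: i indexes samples, H_i is the genus-level Shannon diversity, C_i is the colonization status, and g_k(i) is the level of the k-th confounder (K in total) for sample i, each entering as a random intercept. In lme4 formula syntax, a random intercept for a grouping factor is written as (1 | factor).

```latex
H_{i} = \beta_0 + \beta_1 C_{i} + \sum_{k=1}^{K} u^{(k)}_{g_k(i)} + \varepsilon_{i},
\qquad u^{(k)}_{g} \sim \mathcal{N}(0, \sigma_k^2),
\quad \varepsilon_{i} \sim \mathcal{N}(0, \sigma^2)

% Residual diversity used for visualization: the observed value
% minus the estimated random-effect intercepts
\tilde{H}_{i} = H_{i} - \sum_{k=1}^{K} \hat{u}^{(k)}_{g_k(i)}
```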
Single-nucleotide variant analysis
Genome assemblies were aligned to their respective reference genomes using nucmer (v3.23, -maxmatch -nosimplify) and consensus SNVs were called using the show-snps utility in MUMmer73. References for E. coli (NC_011750) and K. pneumoniae (NC_016845.1) were selected to minimize median distance from isolate genomes. Metagenomic SNVs (consensus and low-frequency) were identified based on read mapping to the E. coli and K. pneumoniae references using bwa-mem64 (v0.7.10a; soft-clipped reads and reads with >3 or >4 mismatches for K. pneumoniae and E. coli, respectively, were filtered out to avoid mis-mapped reads) and variant calling with LoFreq74 (v1.2.1; default parameters). Note that this stringent mapping approach retains only reads with >96% identity to the reference, and thus will typically exclude reads mis-mapped from other genomes. Additionally, genomic regions with frequent ambiguous mappings were identified based on isolate sequencing data and metagenome data from samples without target species as determined from taxonomic classification (>5× coverage with E. coli reads on K. pneumoniae genome or vice versa). Calls in these regions that match positions where variants were called between isolate reads and reference sequence (allele frequency > 95%) were excluded from downstream analysis. The validity of this pipeline was confirmed by noting that very few K. pneumoniae SNVs (median=2, mean=5.5) were called genome-wide when analysing metagenomes where taxonomic profiling detected few K. pneumoniae reads (10 samples with 107-288 reads). Note that SNVs from such “low coverage” samples are also excluded from further analysis in this study as defined below. To assess the impact of a shared, but potentially divergent, reference on SNV calling, reads were also mapped onto CPE isolate genomes (where available) to call SNVs and compute concordance.
Isolate genome-based SNVs were translated to the common reference coordinate system using the UCSC liftOver tool75 with a chain file generated using flo76 (-fastMap -tileSize=12 -minIdentity=90).
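The read-level filter used for metagenomic SNV calling can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code used in the study: the function names are hypothetical, each alignment is represented only by its CIGAR string and edit distance (the SAM NM tag), and NM is used as a stand-in for the mismatch count although it also counts inserted and deleted bases.

```python
import re

# Maximum mismatches allowed per read, as stated in the text
MAX_MISMATCHES = {"K. pneumoniae": 3, "E. coli": 4}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def aligned_query_length(cigar):
    """Number of read bases taking part in the alignment (M, =, X, I)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XI")

def identity(cigar, nm):
    """Approximate identity of the read to the reference."""
    return 1.0 - nm / aligned_query_length(cigar)

def keep_read(cigar, nm, species):
    """Discard soft-clipped reads and reads above the mismatch limit."""
    if "S" in cigar:
        return False
    return nm <= MAX_MISMATCHES[species]

# A full-length 101 bp read with 4 mismatches: kept for E. coli,
# rejected for K. pneumoniae
full_length_identity = identity("101M", 4)
```

For a 101 bp read, 4 mismatches correspond to 97/101 ≈ 96.04% identity, which is consistent with the >96% identity stated above.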
Strain analysis
Metagenomic coverage of samples for E. coli and K. pneumoniae was determined from bwa-mem read mappings using genomeCoverageBed77 (v2.25.0). Samples with relative abundance too low for confident identification (<0.1%) were designated as “not detected”, while samples with low median read coverage (<8) were designated as “low coverage”. Of the remaining samples, those with >90% of the SNVs at or above an allele frequency of 0.9 were designated as “one strain”, exhibiting a unimodal distribution as is classically expected in the single haplotype setting20 (Supplementary Figure 10a). A k-means clustering approach (based on allele frequency values < 0.98, k=2) was used on the other samples to identify “two strains” cases (silhouette score > 0.8, indicating good concordance with 2 clusters for a bimodal distribution) and “multiple strains” cases where there may be more than 2 clusters (Supplementary Figure 10a). Note that this analysis was only used to determine strain “states” (Figure 2), and the corresponding clusters were not used for downstream haplotype analysis. To confirm metagenomic SNV calling quality and strain designations, “one strain” cases were compared to SNVs from corresponding isolates (where available) and noted to have high precision for both E. coli and K. pneumoniae (>98%, Supplementary Figure 10b). A first-order Markov model of the transition frequencies between the strain compositions was estimated using the markovchain package in R78 (maximum likelihood estimator with Laplace smoothing parameter = 1).
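The threshold-based part of the strain designation and the smoothed transition estimate can be illustrated with the short Python sketch below. It is not the study's implementation (which used the markovchain package in R): the function names are hypothetical, the k-means and silhouette step is omitted (such samples are simply returned as "mixed"), and Laplace smoothing is assumed to mean additive smoothing, i.e. adding the smoothing parameter to every transition count before normalizing each row.

```python
from collections import Counter
from statistics import median

def triage_sample(rel_abundance_pct, coverages, allele_freqs):
    """Apply the abundance, coverage and allele-frequency thresholds in order."""
    if rel_abundance_pct < 0.1:
        return "not detected"
    if median(coverages) < 8:
        return "low coverage"
    high = sum(1 for af in allele_freqs if af >= 0.9)
    if allele_freqs and high / len(allele_freqs) > 0.9:
        return "one strain"
    return "mixed"  # would go on to k-means clustering (k=2) in the text

def transition_matrix(sequences, states, laplace=1.0):
    """First-order transition probabilities with additive smoothing."""
    counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states) + laplace * len(states)
        matrix[a] = {b: (counts[(a, b)] + laplace) / row_total for b in states}
    return matrix

# Hypothetical state sequence for one subject across four timepoints
states = ["one strain", "two strains"]
P = transition_matrix(
    [["one strain", "one strain", "one strain", "two strains"]], states
)
```

With smoothing, states that are never observed as a starting point still receive a valid (uniform) row, so every row of the matrix sums to 1.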
Sub-strain analysis
SNVs with mean allele frequency >0.9 across timepoints were identified as likely fixed across all strains in a sample. Non-fixed SNVs from “one strain” cases were further annotated for their impact on protein function using SnpEff79 (v4.3). The ratio of the rate of non-synonymous (dN) to synonymous (dS) mutations was calculated using the Biopython module Bio.codonalign.codonseq with the ‘NG86’ method.
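The two calculations in this paragraph can be sketched in Python as follows. This is an illustration only: the function names and inputs are hypothetical, and the dN/dS helper shows just the final step of the Nei-Gojobori (NG86) approach, namely the Jukes-Cantor correction applied to the proportions of non-synonymous and synonymous differences. It assumes that the numbers of sites and differences have already been counted, a step that the Biopython implementation used in the study performs on codon alignments.

```python
import math
from statistics import mean

def split_fixed(trajectories, threshold=0.9):
    """Separate SNVs with mean allele frequency above the threshold."""
    fixed = {snv for snv, freqs in trajectories.items() if mean(freqs) > threshold}
    return fixed, set(trajectories) - fixed

def jukes_cantor(p):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) for a proportion p."""
    if p >= 0.75:
        raise ValueError("proportion of differences too large for correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of corrected non-synonymous to synonymous distances."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    return dn / ds if ds > 0 else float("nan")

# Hypothetical allele frequencies of two SNVs over three timepoints
fixed, non_fixed = split_fixed({"snv_a": [0.95, 1.0, 0.92],
                                "snv_b": [0.3, 0.6, 0.9]})
```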
Leveraging the availability of multiple “one strain” timepoints in some individuals, non-fixed SNV trajectories were clustered to identify co-varying SNVs that may belong to a common sub-strain background. Specifically, the DBSCAN algorithm in R80 was used to cluster SNV trajectories in selected individuals with multiple “one strain” timepoints (ε=0.2, minPts=2n as recommended) and identified clusters were visualized as a sanity check.
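To make the trajectory clustering step concrete, a compact pure-Python DBSCAN is sketched below. It is not the R implementation used in the study. Several choices here are assumptions: Euclidean distance between allele-frequency trajectories, a neighbourhood count that includes the point itself, and the reading of "2n" as twice the number of timepoints (dimensions). The example trajectories are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point; -1 marks noise, 0, 1, ... mark clusters."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        # All points within eps of point i (including i itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # core point: expand the cluster
                queue.extend(nb_j)
    return labels

# Hypothetical SNV trajectories over 2 timepoints, so min_pts = 2 * 2 = 4
trajectories = [
    (0.10, 0.20), (0.12, 0.22), (0.11, 0.18), (0.09, 0.21),
    (0.60, 0.70), (0.62, 0.68), (0.61, 0.72), (0.59, 0.71),
    (0.90, 0.10),
]
labels = dbscan(trajectories, eps=0.2, min_pts=2 * len(trajectories[0]))
```

In this toy example the first four SNVs rise together and form one candidate sub-strain cluster, the next four form a second, and the last SNV is left as noise.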
Plasmid analysis
A Mash screen search approach was used with PLSDB81 to obtain a list of plasmids that are potentially present in the CPE isolate genomes. The union of all such plasmid sequences was then aligned with isolate genome assemblies to identify plasmid hits with >85% coverage at 95% identity (only alignments >500bp). Plasmid hits were clustered into groups using hierarchical clustering at 95% identity (hclust function in R, average linkage based on Mash distance82), with the longest plasmid serving as a representative. Only plasmids longer than 10kbp are included in the figure to avoid spurious/redundant matches to shorter plasmids.
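The coverage criterion for plasmid hits can be illustrated with the following Python sketch, which merges overlapping alignment intervals on a plasmid before computing the covered fraction. It is an assumption-laden example rather than the study's code: the function names are hypothetical, alignments are given as half-open (start, end, percent identity) tuples in plasmid coordinates, and "at 95% identity" is read as an identity of at least 95%.

```python
def plasmid_coverage(alignments, plasmid_length, min_identity=95.0, min_length=500):
    """Fraction of the plasmid covered by alignments passing both filters."""
    intervals = sorted(
        (start, end)
        for start, end, ident in alignments
        if ident >= min_identity and end - start > min_length
    )
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / plasmid_length

def is_plasmid_hit(alignments, plasmid_length, min_coverage=0.85):
    return plasmid_coverage(alignments, plasmid_length) > min_coverage

# Hypothetical alignments against a 10 kbp plasmid
hits = [(0, 5000, 99.0), (4000, 9000, 96.0), (9000, 9400, 99.0), (9500, 10000, 90.0)]
coverage = plasmid_coverage(hits, 10000)
```

Here the 400 bp alignment and the 90% identity alignment are ignored, and the two remaining alignments overlap by 1 kbp, so the merged coverage is 9,000 of 10,000 bp.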
Plasmid conjugation assay
Donor E. coli (MG1655) harbouring the pKPC2 plasmid with a kanamycin selection cassette and recipient K. pneumoniae (ATCC13883) strains were streaked on selective LBA and incubated overnight at 37°C. Bacterial colonies were resuspended in LB (1 mL), diluted to OD600 = 0.5, mixed in a 1:1 ratio and spotted (20 µL) onto a 0.22 µm nitrocellulose membrane (Sartorius) placed on top of LBA. After 4 hours of incubation at 37°C, the bacterial mixture was resuspended in 2 mL of PBS, serially diluted and plated on LBA with appropriate antibiotic selection. Kanamycin (50 µg/mL) and fosfomycin (40 µg/mL) were used for selection of transconjugants. Plates were incubated at 37°C overnight and colonies were enumerated. Conjugation frequency was calculated as the total number of transconjugants per total number of recipients.
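The conjugation frequency calculation can be written out as a short Python example. The colony counts, dilution factors and plated volume below are hypothetical values chosen for illustration, as the text does not report them, and the helper names are ours.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Back-calculate CFU/mL from a plate count at a given dilution."""
    return colonies * dilution_factor / plated_volume_ml

def conjugation_frequency(transconjugant_cfu, recipient_cfu):
    """Transconjugants per recipient."""
    return transconjugant_cfu / recipient_cfu

# Hypothetical counts: 42 transconjugant colonies at a 1:100 dilution and
# 84 recipient colonies at a 1:1,000,000 dilution, 0.1 mL plated in each case
transconjugants = cfu_per_ml(42, 100, 0.1)
recipients = cfu_per_ml(84, 1_000_000, 0.1)
frequency = conjugation_frequency(transconjugants, recipients)
```

With these illustrative numbers the frequency is 5 × 10^-5 transconjugants per recipient.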
Code and data availability
Source code for scripts used to analyze the data is available in a GitHub project at https://github.com/CSB5/CPE-microbiome. Isolate and shotgun metagenomic sequencing data are available from the European Nucleotide Archive (ENA – https://www.ebi.ac.uk/ena/browser/home) under project accession number PRJEB49334.
Footnotes
* Joint First Authors