Abstract
Preterm birth (PTB) is the leading cause of neonatal morbidity and mortality. The vaginal microbiome has been associated with PTB, yet the mechanisms underlying this association are not fully understood. Understanding microbial genetic adaptations to selective pressures, especially those related to the host, may yield new insights into these associations. To this end, we analyzed metagenomic data from 705 vaginal samples collected longitudinally during pregnancy from 40 women who delivered preterm spontaneously and 135 term controls from the Multi-Omic Microbiome Study-Pregnancy Initiative (MOMS-PI1). We find that the vaginal microbiome of pregnancies that ended preterm exhibits unique genetic profiles. It is more genetically diverse at the species level and harbors a higher richness and diversity of antimicrobial resistance genes, likely promoted by transduction. Interestingly, we find that Gardnerella species, a group of central vaginal pathobionts, are driving this higher genetic diversity, particularly during the first half of the pregnancy. We further present evidence that Gardnerella spp. undergoes more frequent recombination and stronger purifying selection in genes involved in lipid metabolism. Overall, our results reveal novel associations between the vaginal microbiome and PTB using population genetics analyses, and suggest that evolutionary processes acting on the vaginal microbiome may play a vital role in adverse pregnancy outcomes such as preterm birth.
Introduction
Preterm birth (PTB), childbirth at <37 weeks of gestation, is the leading cause of neonatal morbidity and mortality2. Each year, approximately 15 million infants are born preterm globally, over 500,000 of them in the US3. Preterm infants are at a high risk of respiratory, gastrointestinal and neurodevelopmental complications4. While a number of maternal, fetal, and environmental factors have been associated with PTB2,5,6, its etiopathology remains largely unknown, and early diagnosis and effective therapeutics are still lacking.
Over the past decades, growing evidence has pointed to potential involvement of the vaginal microbiome in PTB1,7–10. This involvement has so far been mostly characterized as an ecological process, meaning changes in microbial abundances and vaginal community states. For instance, increased richness and diversity of microbial communities and the presence of particular community state types (CSTs) have been repeatedly associated with PTB1,9,11–15. In addition, vaginal microbiomes of women who delivered preterm appear to be less stable during pregnancy, with some studies reporting a significant decrease in the richness and diversity of these microbial communities during pregnancy1,12.
Multiple endogenous factors, such as hormonal changes, nutrient availability and microbial interactions, and exogenous factors, such as genital infections, antibiotic treatment and exposure to xenobiotics, could trigger ecological processes and alter the vaginal microbial composition16,17. These factors may also act as selective pressures that affect genetic variation in the microbial populations that make up the vaginal microbiome. Such adaptive evolution in the vaginal environment, even during pregnancy, is highly plausible given the high mutation rates, short generation times, and large population sizes of microbes18. It is further supported by observations of rapid adaptation to environmental changes in other human-associated microbial ecosystems19–21. The way in which vaginal microbes respond to various selective pressures may, in turn, affect the host, including pregnancy outcomes. Therefore, a comprehensive investigation of the genetic diversity of the vaginal microbiome at the population level, which we term “microdiversity”, and the underlying evolutionary forces that shape it, holds promise for a better understanding of the etiopathology of PTB.
Here, we performed an in-depth population genetics analysis and characterized the population structure of the vaginal microbiome along pregnancy and in the context of preterm birth. We used metagenomic sequencing data from 705 vaginal samples collected longitudinally during pregnancy as part of the Multi-Omic Microbiome Study-Pregnancy Initiative (MOMS-PI1). Our analyses include samples from 40 women who subsequently experienced spontaneous preterm birth (sPTB) and 135 women who had a term birth (TB). We show that the vaginal microbiome of pregnancies that ended preterm exhibits higher nucleotide diversity at the species level and higher antimicrobial resistance potential. We find that this higher nucleotide diversity is driven by Gardnerella spp., a group of central vaginal pathobionts, especially during the first half of pregnancy, and suggest that this may be related to optimization of growth rates in this taxon. We further identify a strong association between evolutionary signatures and sPTB in Gardnerella spp., including more frequent homologous recombination and stronger purifying selection. Overall, our results show novel associations between the vaginal microbiome and sPTB at the population genetics level, and suggest that evolutionary processes acting on the vaginal microbiome may play a critical role in sPTB, and potentially also in other adverse pregnancy outcomes.
Results
The phylogenetic composition of the vaginal microbiome associates with sPTB
We assembled a total of 1,078 metagenome-assembled genomes (MAGs) with at least medium quality22 (>50% completeness, <10% contamination; Supplementary Table 1; Methods) from previously published1 metagenomic reads generated from 705 vaginal samples1. These samples were collected from 175 women visiting maternity clinics in Virginia and Washington at various time points along pregnancy1. We clustered these MAGs into 157 species-level phylogroups at the level of 95% average nucleotide identity (ANI), which roughly corresponds to the species level23, and selected the most complete MAG with the least contamination as the representative for each phylogroup. These representative MAGs were 86±14% (mean±SD) complete and 1.1±1.8% contaminated, with 93 (59% of 157) of them estimated to have high quality22 (>90% completeness and <5% contamination; Supplementary Table 1). Taxonomic assignment of these representative MAGs (Methods) revealed that the phylogroups represent at least 8 phyla, with genome size (adjusted by completeness) ranging from 0.6 to 7.4 Mbp and GC content ranging from 25.3% to 69.7% (Fig. 1a, Supplementary Table 1). Actinobacteria had the most phylogroups detected in the samples, followed by Firmicutes and Bacteroidetes (Fig. 1a).
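The quality tiers and the choice of a representative MAG described above can be sketched as follows. This is a minimal illustration that uses only the completeness and contamination thresholds quoted in the text; the function names, the tuple layout, and the tie-breaking rule (completeness first, then contamination) are assumptions made for the example, not the study's pipeline.

```python
def mag_quality(completeness, contamination):
    """Classify a MAG using the completeness/contamination thresholds quoted in the text (both in percent)."""
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness > 50 and contamination < 10:
        return "medium"
    return "low"


def pick_representative(mags):
    """Pick one MAG per phylogroup (hypothetical helper).

    `mags` is a list of (name, completeness, contamination) tuples.
    Assumed rule: highest completeness wins, ties broken by lowest contamination.
    """
    return min(mags, key=lambda m: (-m[1], m[2]))[0]
```

For instance, `pick_representative([("a", 80, 2), ("b", 95, 3), ("c", 95, 1)])` selects `"c"` under the assumed tie-breaking rule.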
Of note, 12 of these species-level phylogroups (PG042-PG053) were assigned to Gardnerella vaginalis according to CheckM24, supporting the existence of multiple genotypes at the species level within the ‘species’ G. vaginalis25,26. To better resolve the classification of these G. vaginalis phylogroups, we compared the average nucleotide identity (ANI) for the representative MAGs of these phylogroups against updated reference genomes for Gardnerella, including G. vaginalis, G. piotti, G. leopoldii, G. swidsinskii, and nine species that remain to be characterized (gs2-3 and gs7-13)25; gs2-3 and gs7-13 correspond to groups 2-3 and 7-13 shown in Fig. 1 in ref. 25. The ANI analysis shows that PG043 represents G. vaginalis, PG044 represents G. swidsinskii, PG042 represents G. piotti, and PG046, PG049, PG051 and PG053 represent G. gs7, G. gs8, G. gs13 and G. gs12, respectively (Supplementary Fig. 1). The remaining phylogroups (PG045, PG047, PG048, PG050, and PG052) do not cluster with any reference species and may represent novel species of Gardnerella. Here, we refer to phylogroups PG042-PG053 as Gardnerella spp.
To understand if the temporal dynamics of the vaginal microbiome are associated with sPTB, we employed a revised version of compositional tensor factorization (CTF)27 to assess temporal changes to the composition of the microbiome during pregnancy. This analysis shows a significant separation of women by pregnancy outcomes (PERMANOVA F = 8.0492; P = 0.002; Fig. 1b) based on the dynamics of their microbiome composition over time, specifically observed for component 1 in the CTF analysis (Mann-Whitney P = 0.015; Fig. 1c). We further found that the top features contributing to this difference belong to Lactobacillus helveticus (PG081), Lactobacillus crispatus (PG080), Lactobacillus gasseri (PG079), and Lactobacillus jensenii (PG076 and PG077), which are associated with TB, and Megasphaera genomosp. (PG061), Gardnerella spp. (PG047, PG050, PG052), and Atopobium vaginae (PG041), which are associated with PTB (Fig. 1d); these species were previously found to be associated with pregnancy outcomes1,9,28. These results suggest that the vaginal microbiome has a different temporal trajectory during pregnancies ending preterm, consistent with previous findings1,7, and with Gardnerella as an important factor. Overall, our results demonstrate that de novo metagenomic analysis replicates and expands previous findings with respect to associations between the composition of the vaginal microbiome and sPTB.
Next, we sought to examine the diversity of microbial strains detected within species, and its association with sPTB. We found strains of M. genomosp. showed significantly higher ANI between women who delivered preterm compared to a null distribution calculated based on ANIs from any two randomly selected women (Permutation P = 0.002, adjusted P < 0.05; Methods), a relationship not observed between women who delivered at term (P = 0.208; Supplementary Fig. 2). This result indicates that M. genomosp. were more closely related than expected by chance across women who delivered preterm. It suggests that sPTB-associated vaginal conditions across women may be more conserved, harboring a group of significantly closely related M. genomosp. strains, compared to TB-associated vaginal conditions.
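One way to build such a null distribution is sketched below: the mean pairwise ANI within a group of women is compared with the mean pairwise ANI of randomly drawn groups of the same size. This is only an illustration under stated assumptions. The data layout (a dictionary keyed by `frozenset` pairs of subject IDs), the function names, the number of permutations, and the one-sided test direction are all hypothetical, and the procedure in Methods may differ.

```python
import itertools
import random


def mean_pairwise(ani, members):
    """Mean ANI over all unordered pairs of subjects in `members`."""
    pairs = list(itertools.combinations(members, 2))
    return sum(ani[frozenset(p)] for p in pairs) / len(pairs)


def permutation_p(ani, group, everyone, n_perm=999, seed=0):
    """One-sided permutation P value for 'the group has higher ANI than random women'.

    `ani` maps frozenset({subject_a, subject_b}) to the ANI between their strains.
    """
    rng = random.Random(seed)
    observed = mean_pairwise(ani, group)
    hits = 0
    for _ in range(n_perm):
        sampled = rng.sample(everyone, len(group))  # random women, same group size
        if mean_pairwise(ani, sampled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```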
Gardnerella species have higher microdiversity in the first half of pregnancies that ended preterm
Human microbes can adapt to host-induced environmental changes (e.g., diet, antibiotics) through genetic variations29. Therefore, the microbial populations of the same species in different hosts can have a different genetic structure, which provides them a competitive advantage. These genetic differences, in turn, may be related to the phenotype of the host. To understand the genetic structure of microbial populations in the vaginal environment and its association with pregnancy outcomes, we calculated the nucleotide diversity for each identified phylogroup. Overall, vaginal microbial populations had a significantly higher genome-wide nucleotide diversity in sPTB than in TB (median along pregnancy; Mann-Whitney P = 0.0073; Fig. 2a). Stratifying by phylogroups, we found that this difference was mainly driven by Gardnerella spp. (P = 0.017; Fig. 2b). G. piotti (PG042), G. swidsinskii (PG044), G. gs13 (PG051), and a potentially novel Gardnerella sp. (PG045), along with a phylogroup of Atopobium vaginae, a suspected vaginal pathobiont30, showed significantly higher genome-wide nucleotide diversity in sPTB (P < 0.05, adjusted P < 0.1 for all; Supplementary Fig. 3). These results imply that microbial populations composed of more diverse strains from the same species, and particularly Gardnerella spp., are growing in the vaginal environment associated with sPTB.
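For readers unfamiliar with the metric, nucleotide diversity can be illustrated with one common per-site definition: the probability that two reads drawn without replacement at a position carry different bases, averaged over covered positions. The sketch below assumes this definition and a toy input of pileup columns given as strings; the function names are hypothetical and the study's own calculation may differ in details such as coverage filters.

```python
from collections import Counter


def site_nucleotide_diversity(bases):
    """Probability that two reads sampled without replacement at one site differ."""
    counts = Counter(bases)
    n = sum(counts.values())
    if n < 2:
        return 0.0  # diversity is not meaningful with fewer than two reads
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))


def mean_nucleotide_diversity(pileup):
    """Average per-site diversity over sites covered by at least two reads."""
    values = [site_nucleotide_diversity(col) for col in pileup if len(col) >= 2]
    return sum(values) / len(values) if values else 0.0
```

A site covered only by identical bases (`"AAAA"`) contributes 0, while an evenly split site (`"AATT"`) contributes 2/3 under this definition.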
To understand how the nucleotide diversity of Gardnerella spp. changes over time during pregnancy, we analyzed temporal trajectories of term and preterm pregnancies. To this end, we pooled the data from all women in each group, binned pregnancy weeks and used splines to smooth the temporal curves (Methods). We found a significant difference between the temporal trajectories of Gardnerella spp. nucleotide diversity in pregnancies ending at term and preterm (permutation test31 P < 0.001, Wilcoxon signed-rank P < 0.003; Methods; Fig. 2c).
Specifically, we found that the nucleotide diversity of Gardnerella spp. increased at the beginning of pregnancies that ended preterm, with a peak at around gestational week 13, and then dropped to its initial value at around gestational week 20 (Fig. 2c). In comparison, nucleotide diversity of Gardnerella spp. in TB remained relatively stable (Fig. 2c). Given that gestational week 20 is the middle of a full-term pregnancy, we subsequently analyzed samples with respect to two time periods: the first half (gestational weeks 0-19) and the second half of pregnancy (gestational weeks 20-37; 37 was chosen to ensure a similar time range for both sPTB and TB). As expected, the nucleotide diversity of Gardnerella spp. in sPTB was significantly higher than in TB in the first half of pregnancy (median along first half; Mann-Whitney U P = 0.0091; Fig. 2d), but not in the second half (P = 0.71; Fig. 2e). We further found that nucleotide diversity had a significantly stronger correlation with synonymous mutations than with nonsynonymous mutations across Gardnerella spp. (paired t-test P = 0.0011; Supplementary Fig. 4), suggesting a more important role of purifying selection in shaping genomic diversity. Overall, these results suggest that genetic diversity of Gardnerella spp. in the first half of pregnancy is important to birth outcomes, and could perhaps be used as a biomarker for early diagnosis of sPTB.
To understand if any particular genes are driving the association between sPTB and the microdiversity of Gardnerella spp. in the first half of pregnancy, we further analyzed nucleotide diversity at the gene level for these species. We identified 21 and 47 genes (out of 825 and 531) in G. swidsinskii (PG044) and G. vaginalis (PG043), respectively, that showed significantly different nucleotide diversity between sPTB and TB (median along the first half of pregnancy; Mann-Whitney P < 0.05, adjusted P < 0.1 for all). These genes included one gene encoding the putative tail-component of bacteriophage HK97-gp10 (P = 0.0012) and one gene encoding putative AbiEii toxin, Type IV toxin–antitoxin system (P = 5×10^-4), which might be involved in the interaction with maternal health32,33. To further identify what functions were related to these associations, we then performed functional enrichment analysis (Methods) using the eggNOG functional annotation of genes (Supplementary Table 2). We found that the KEGG pathway ‘drug metabolism - other enzymes’ (ko00983) was significantly enriched among genes from G. swidsinskii (PG044) that had significantly higher microdiversity (P < 0.05, adjusted P < 0.1; Fig. 2f). This result suggests that the more diverse gene pool in G. swidsinskii (PG044) detected in sPTB may be associated with adaptation to drugs present in the environment. This may be consistent with our recent finding that xenobiotics detected in the vaginal environment are strongly associated with sPTB34.
To verify that the higher nucleotide diversity we observed in sPTB pregnancies was not caused by sampling or sequencing bias, we compared the read count and quality of MAGs obtained from sPTB and TB samples. If this higher diversity is the result of a higher read count in sPTB samples or more complete MAGs, we would expect read count and MAG completeness to be higher in sPTB samples. Instead, we found the completeness and contamination of MAGs assembled from sPTB samples were not significantly different from TB (Mann-Whitney P = 0.71 and 0.73, respectively; Supplementary Fig. 5a,b). Next, we assessed the correlation between the number of reads mapped to each phylogroup and its genome-wide diversity. If a higher diversity is caused by more reads mapped to the MAG representing the phylogroup, we would expect a positive correlation between these two measurements. However, only 3 phylogroups (PG064, a Dialister sp.; PG102, a Peptoniphilus sp.; and PG122, a Bradyrhizobium sp.) had a significant positive correlation between read counts and nucleotide diversity (Spearman ρ = 0.70, 0.54, and 0.42, respectively; non-adjusted P = 0.035, 0.024, and 0.00015, respectively). In 98% of phylogroups, we did not observe a statistically significant positive correlation (median [IQR] Spearman correlation of -0.067 [-0.19, -0.13]). None of the four Gardnerella spp. phylogroups that showed significantly higher nucleotide diversity in sPTB pregnancies in Supplementary Fig. 3 were significantly positively correlated (Spearman ρ = -0.00, -0.12, -0.02, and -0.35 and P = 0.96, 0.091, 0.77, and 0.00045 for PG042, PG044, PG045, and PG051, respectively; Supplementary Fig. 5c).
Finally, as we observed a significantly higher read count in sPTB samples (Mann-Whitney P = 0.0004 and 0.061 for samples and subjects; Supplementary Fig. 5d and 5e, respectively), we subsampled an identical number of reads (10^5) from each sample, retaining 75% of samples, and repeated our analyses of nucleotide diversity. As with the first analysis (Fig. 2), nucleotide diversity was significantly higher in sPTB pregnancies across all phylogroups, and particularly in Gardnerella spp. (Mann-Whitney P = 0.015 and P = 0.0043; Supplementary Fig. 5f and 5g, respectively). Similarly, in the first half of pregnancy, the nucleotide diversity of Gardnerella spp. was significantly higher in sPTB (P = 0.026; Supplementary Fig. 5h), while in the second half, there was no significant difference (P = 0.22; Supplementary Fig. 5i). Overall, these results confirm that the sPTB-associated nucleotide diversity we observed was not biased by technical artifacts.
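The subsampling step can be sketched as drawing a fixed number of reads without replacement from each sample and dropping samples that do not reach that depth, which is consistent with only a fraction of samples being retained. The function name, the seed handling, and the convention of returning `None` for shallow samples are assumptions made for illustration.

```python
import random


def subsample_reads(reads, depth, seed=0):
    """Draw `depth` reads without replacement; return None if the sample is too shallow."""
    if len(reads) < depth:
        return None  # sample excluded from the rarefied analysis
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(reads, depth)
```

In practice, `reads` would be the list of read records for one sample and `depth` the common depth used for all samples.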
Evolutionary forces acting on Gardnerella species are associated with pregnancy outcomes
Adaptation should increase the fitness of an organism, its ability to survive and reproduce in a given environment. To better understand if the Gardnerella spp. populations with higher genetic diversity grow better in the vaginal environment associated with sPTB, we inferred fitness using two measures: relative abundance and growth rate. Indeed, we found that nucleotide diversity in these species was positively correlated with relative abundance (Spearman ρ = 0.35, P = 0.0013; Fig. 3a). This correlation was not observed in other phylogroups (Supplementary Fig. 6a). L. crispatus (PG080) and L. iners (PG086) even showed a significantly negative correlation (ρ = -0.39 and -0.32, P = 0.026 and 0.0014, respectively; Supplementary Fig. 6b,c). We additionally used gRodon35 to predict the maximum growth rate of microbes based on codon usage bias in highly expressed genes encoding ribosomal proteins. We found that in the first half of pregnancy, Gardnerella spp. had a somewhat higher, albeit not statistically significant, maximum growth rate in sPTB pregnancies (Mann-Whitney P = 0.057; Fig. 3b), while in the second half of pregnancy, the difference was diminished (P = 0.15; Fig. 3c). These results suggest that the sPTB-associated genetic diversity observed in Gardnerella spp. may be related to the optimization for faster growth in the sPTB-associated vaginal environment.
Microbial population structure is influenced by various evolutionary processes including selection and homologous recombination36. Competence, a mechanism of horizontal gene transfer which involves homologous recombination, has been identified in Gardnerella spp.37 To better interpret the significant differences we observed in the microdiversity patterns of Gardnerella spp. between sPTB and TB, we quantified the degree of homologous recombination using the normalized coefficient of linkage disequilibrium between alleles at two loci, D’. A value of D’ closer to 0 indicates a higher degree of recombination38. Interestingly, we found that the median D’ of Gardnerella spp. was significantly smaller in sPTB pregnancies in both the first (Mann-Whitney P = 0.041; Fig. 3d) and second halves of pregnancy (P = 0.013; Fig. 3e), and the same was also observed for the D’ of three specific Gardnerella spp., G. piotti (PG042), G. gs7 (PG046), and PG047, in the first half of pregnancy (P < 0.05, adjusted P < 0.1 for all; Supplementary Fig. 7a). This result suggests that Gardnerella spp. tend to have more frequent recombination in women who delivered preterm during both halves of pregnancy.
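As a reminder of how the normalized linkage disequilibrium coefficient behaves, the textbook definition for two biallelic loci can be sketched as follows: D is the difference between the observed haplotype frequency and the product of the allele frequencies, and D’ divides |D| by its maximum attainable value given those allele frequencies. The function name and the haplotype-count input are assumptions for this example; read-based estimates from metagenomes involve additional filtering not shown here.

```python
def d_prime(n_AB, n_Ab, n_aB, n_ab):
    """Normalized linkage disequilibrium |D'| from counts of the four two-locus haplotypes.

    Returns None when either locus is monomorphic, because D' is undefined there.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        return None
    return abs(d) / d_max
```

Two perfectly linked loci (only the AB and ab haplotypes present) give a value of 1, whereas haplotypes present in the proportions expected under free recombination give 0.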
Next, we quantified the degree of selection using dN/dS in this taxon (Methods). This measure is the ratio of the rate of non-synonymous substitutions to the rate of synonymous substitutions, and hence offers insight into the type of selection, with values close to zero indicating purifying selection, and values higher than one indicating positive selection39. dN/dS is calculated in relation to the reference, and can therefore detect selection on mutations that have already been fixed within the population40. Consistent with the gut and ocean microbiomes41–43, purifying selection is predominant across all genes of the vaginal microbiome (dN/dS << 1; median [IQR] dN/dS of 0.17 [0.10, 0.29]; Supplementary Fig. 7b). While the median dN/dS of all Gardnerella spp. genes was not significantly different between sPTB and TB pregnancies (Mann-Whitney U P = 0.48), we detected some differences when examining high-level functions (COG categories44) within each half of pregnancy. In the first half, the median dN/dS of Gardnerella spp. genes was somewhat lower in sPTB pregnancies for inorganic ion transport and metabolism, lipid transport and metabolism, secondary metabolite biosynthesis, transport and catabolism, and cell wall/membrane/envelope biogenesis, though this was not statistically significant after adjusting for multiple testing (COG categories “P”, “I”, “Q”, and “M”, respectively; Mann-Whitney P < 0.05, adjusted P > 0.1 for all; Supplementary Fig. 7c). In the second half of pregnancy, the median dN/dS was lower in sPTB pregnancies for lipid transport and metabolism and for cell motility (COG categories “I” and “N”; Mann-Whitney P = 0.0040 and P = 0.04, adjusted P = 0.07 and 0.40, respectively; Fig. 3f), although only the former remained significant after adjusting for multiple testing. Our results suggest that Gardnerella spp. genes involved in lipid transport and metabolism may undergo stronger purifying selection in sPTB. As purifying selection maintains the fitness of organisms by constantly sweeping away deleterious mutations and conserving functions, Gardnerella spp. may benefit from this stronger purifying selection targeting lipid functioning when growing in the sPTB-associated vaginal environment during pregnancy.
sPTB-associated vaginal microbiomes have a higher antibiotic-resistance potential
Antibiotics are widely used during pregnancy, sometimes even topically in the vagina45. This exposure may promote antimicrobial resistance (AMR). To assess if antibiotic-resistance potential in the vaginal microbiome is associated with sPTB, we subsampled an identical number of reads (10^5) from each sample and mapped them to the Comprehensive Antibiotic Resistance Database46. The total number of reads mapped to AMR reference genes was significantly higher in the first half of sPTB pregnancies (Mann-Whitney U P = 0.015; Fig. 4a), but not in the second half (P = 0.76; Fig. 4b). In addition, to assess the difference of specific AMR genes between the vaginal microbiomes of sPTB and TB, we identified AMR genes in the genomic assemblies. A significantly higher median count and Shannon-Wiener diversity of AMR genes were detected in vaginal microbes sampled in the first half of pregnancies that ended preterm (3 times higher on average; Mann-Whitney U P = 0.011 and P = 0.0078; Fig. 4c and 4e, respectively), yet this difference was not detected in the second half (P = 0.16 for both; Fig. 4d and 4f, respectively). Exploring the source of these genes, we found a significantly higher median fraction of phage-borne AMR genes in the microbiomes of women who delivered preterm (P = 0.031; Fig. 4g), suggesting transduction may promote the higher median count and diversity of AMR genes observed in the first half of sPTB pregnancies (Fig. 4c,e). Among the 9 AMR gene categories that had genes present in at least 10% of women, phenicol and aminoglycoside resistance genes showed a significantly higher median fraction in the sPTB microbiome (P = 0.041 and P = 0.032, respectively; Fig. 4h). These results suggest a unique antibiotic resistance profile associated with the first half of sPTB pregnancies, potentially indicative of usage of specific antibiotics. Indeed, we detected a somewhat higher richness of AMR genes along the first half of pregnancy in women who used antibiotics in the 6 months before pregnancy than in those who did not (Mann-Whitney U P = 0.079; Fig. 4i). This is also consistent with our observation that genes with sPTB-associated nucleotide diversity were enriched for drug metabolism in G. swidsinskii (PG044) (Fig. 2f).
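The two summary statistics used for AMR genes, richness and Shannon-Wiener diversity, can be illustrated with their standard definitions. The sketch below assumes a list of per-gene counts for one sample and uses the natural logarithm; the function names and the choice of log base are assumptions, since the text does not specify them.

```python
import math


def amr_richness(counts):
    """Number of distinct AMR genes observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon_diversity(counts):
    """Shannon-Wiener index H = -sum(p_i * ln p_i) over genes with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

A sample with a single AMR gene has a diversity of 0 regardless of its count, and a sample with two equally abundant genes has a diversity of ln 2.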
To check if the strong association between sPTB and the AMR potential of the vaginal microbiome is contributed by a particular phylogroup, we performed a similar analysis for each phylogroup. We found, however, that none of them showed a significant difference in the median count and diversity of AMR genes between sPTB and TB (Mann-Whitney U P > 0.05 for all). This result suggests that the higher AMR potential associated with sPTB may be a property of the vaginal microbiome as an ecosystem. However, this lack of association could also be driven by underestimation of AMR genes due to the limitation of MAG binning methods in recovering mobile genetic elements47.
Discussion
Microbial genomes can exhibit large variations even within the same species, as a result of adaptation to various environments48. Associations between the vaginal microbiome and preterm birth have been widely reported7,8,12,49. However, there is still much left to explore regarding potential mechanisms underlying host-microbiome interactions in this context. Here, by leveraging publicly available metagenomic data1, we provide a population genetic view of the vaginal microbiome during pregnancy. We identify a number of novel microbial features including population nucleotide diversity, selection metrics, and antibiotic resistance potential that are associated with sPTB. Interestingly, we find that the higher population nucleotide diversity is driven by Gardnerella spp. during the first half of pregnancy. This taxon appears to undergo more intense changes in population structure, driven by recombination and purifying selection, in pregnancies that ended preterm. We also show evidence that this sPTB-associated genetic pattern of Gardnerella spp. may be related to optimization of growth rates in vaginal conditions linked to sPTB. Our results are indicative of adaptation of the vaginal microbiota to the host, which in turn may influence pregnancy outcomes.
Our findings regarding a relationship between ecological processes in the pregnancy vaginal microbiome and subsequent preterm birth are consistent with previous studies1,7,11,12,14,50,51. We add to these previous studies by exploring an additional layer of microbial variability associated with sPTB: microbial genetic diversity. It is known that genomic variation within species can result in phenotypic diversity and adaptations to different environments48. These adaptations, in turn, can affect host phenotypes such as disease outcomes52. Such associations between microbial genomic variation and host phenotypes have been reported in the gut microbiome42,53,54. Our study suggests that this phenomenon also occurs in the vaginal ecosystem, and that it may be associated with pregnancy outcomes. Nevertheless, the associations between microbial genetic diversity and pregnancy outcomes we detect might also be a consequence of a different process that acts on both variables, and while we find this unlikely, this should be determined by future studies.
Interestingly, we found that the association of genetic diversity and sPTB was largely driven by Gardnerella spp., a group of species commonly associated with BV50,55,56. A number of studies reported a higher abundance of these species in sPTB pregnancies1,9,50,57,58. We show that Gardnerella spp. populations with more genetically diverse strains may also be associated with sPTB. In addition, we found that this taxon has the capacity to grow 1.5 times faster in pregnancies that ended preterm, consistent with an overall higher relative transcriptional rate of G. vaginalis which was previously reported1. These more genetically diverse strains appear to have adapted to the vaginal environment associated with sPTB, exhibiting higher fitness. Notably, the higher genetic diversity associated with sPTB in Gardnerella spp. was detected during the first half of the pregnancy (<20 gestational weeks) rather than the second half. Most potential biomarkers of sPTB (e.g., serum alpha-fetoprotein59) have so far been identified using samples from the second trimester of pregnancy (gestational weeks 14-27). Our results suggest that high-resolution analysis of microbiome samples from even earlier stages of pregnancy (<week 20) may yield informative biomarkers of pregnancy outcomes.
As in the human gut microbiome19–21, we show evidence that adaptive evolution also occurs in the vaginal microbiome. Several environmental factors affecting the vaginal ecosystem, such as pH, neutrophil levels, and xenobiotics, have been reported to be associated with sPTB34,60. These environmental factors may act as selective stressors that lead to different evolutionary patterns in the vaginal microbiome. Indeed, we detected more frequent homologous recombination and stronger purifying selection within Gardnerella spp. during pregnancies that ended preterm. Homologous recombination is a critical mechanism speeding adaptation by increasing fixation probability of beneficial mutations61 and reducing clonal interference (i.e., competition between beneficial mutations) in bacteria62. Purifying selection also contributes to adaptation by sweeping away deleterious mutations and conserving functions, such as in oligotrophic nutrient conditions41,63. Notably, we found that sPTB-associated purifying selection is particularly strong on genes involved in lipid transport and metabolism. This is consistent with previous identification of lipid metabolites (e.g., monoacylglycerols and sphingolipids) as signatures of sPTB34,64,65. Whether this stronger purifying selection targeting lipid transport and metabolism in pregnancies that ended preterm leads to changes in the concentrations of lipid metabolites, however, requires further experimental testing. As both recombination and purifying selection can reduce genetic diversity, sPTB-associated recombination and purifying selection along pregnancy may explain the higher nucleotide diversity of Gardnerella spp. in sPTB in the first half of pregnancy compared to the second half.
Antibiotics are common selective stresses acting on the human microbiome29 and have been associated with preterm birth66. We detected a higher count and diversity of AMR genes associated with sPTB, which our analysis suggests to be facilitated by prophages in preterm vaginal microbiomes. While multiple phages (e.g., Siphoviridae, Myoviridae, and Microviridae) have been detected in the vagina of pregnant women, their association with sPTB is rarely studied67. Our results imply a potentially important role of bacteria-phage interactions in pregnancy outcomes via transfer of AMR genes. We also found that genes related to phenicol and aminoglycoside resistance were more abundant in vaginal microbiomes during pregnancies that ended preterm. While both antibiotic classes have been frequently used to treat gynecologic infection for decades68, and some phenicols (e.g., chloramphenicol) are thought to be safe for use during pregnancy69, aminoglycosides are teratogenic. Previous studies reported that exposure to antibiotics could change the composition of the vaginal microbiome45,70, indicating an ecological effect. In comparison, our results may suggest adaptation of the vaginal microbiome to more frequent antibiotic use in women who delivered preterm, leading to an enrichment of AMR genes as well as higher nucleotide diversity in Gardnerella spp. genes encoding enzymes for drug metabolism. While this hypothesis requires further study, it is supported by the fact that a higher proportion of women who delivered preterm (31%) had used antibiotics in the 6 months before pregnancy than women who delivered at term (23%).
Despite its findings, our study is limited by low sequencing depth (median bacterial read count < 5 × 10^5) and inconsistent sampling frequency during pregnancy (1 to 8 samples per pregnancy, with an average of 4). These limitations lead to high sparsity in the features analyzed, preventing a more in-depth temporal and predictive analysis of the link between population genetics of the vaginal microbiome and sPTB. Our results warrant a high-resolution investigation of the vaginal metagenome, with frequent sampling and high sequencing depth.
In summary, through in-depth population genomic analyses, our study identified novel genetic and functional associations between the vaginal microbiome and preterm birth. We revealed evidence of microbial genetic adaptation to the host environment linked to preterm birth and highlighted the importance of microbial evolutionary processes to adverse pregnancy outcomes, particularly in Gardnerella spp. Future investigation of the pressures driving the sPTB-associated microbial adaptation is warranted to fully understand the molecular mechanisms underlying preterm birth.
Methods
Sample selection and metagenomic data
Metagenomic sequencing data1 from 135 vaginal samples collected longitudinally during pregnancy from 40 women who eventually delivered preterm spontaneously (sPTB), the majority of whom identified as Black, and from 570 vaginal samples from 135 women who delivered at term (TB) were obtained from dbGaP (study no. 20280; accession ID phs001523.v1.p1). We used the same definitions as Fettweis et al. 20191: spontaneous preterm birth is defined as live birth between 23 and 37 gestational weeks without medical indication, and term birth is defined as live birth at or after 39 gestational weeks. To check for potential confounders of the vaginal microbiome-sPTB associations, we calculated a propensity score71 for each subject based on income, age, and race using a logistic regression model. The propensity scores of sPTB and TB subjects had similar distributions (Kolmogorov–Smirnov test P = 0.21; Supplementary Fig. 8), suggesting that the microbiome-sPTB associations we detect are unlikely to be confounded by income, age, or race in this study.
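The distribution comparison described above can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated and takes them as two plain lists; the function names `ks_statistic` and `ks_permutation_pvalue` are hypothetical. The P value here comes from a label permutation, which is a swapped-in alternative to the analytical Kolmogorov–Smirnov P value and is not necessarily how the reported P = 0.21 was obtained.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distribution functions of a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def ks_permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Permutation P value (an assumption of this sketch): how often a random
    relabelling of subjects gives a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A large P value from such a test indicates no detectable difference between the two score distributions, which is the pattern reported for the sPTB and TB groups.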
Metagenomic assembly, genomic binning, genome annotation, and relative abundance
Our analysis follows the accepted standards used in refs. 72–75, using the ATLAS pipeline76. Bases with quality scores <25, reads shorter than 50 bp, and sequencing adapters were removed using Trimmomatic v.0.3977. Reads mapping to human and PhiX genome sequences were removed with Bowtie2 v.2.3.5.178. Assembly and binning were done with ATLAS: filtered reads were assembled using metaSPAdes v.3.15.279, and contigs were binned into metagenome-assembled genomes (MAGs) using MetaBAT2 v.2.14.080 with a minimum contig length of 1,500 bp. Quality, GC content, genome size, and taxonomy of MAGs were estimated using CheckM v.1.0.924. MAGs were de-replicated using dRep v.3.2.081 with an average nucleotide identity (ANI) threshold of 0.95, a minimum completeness of 50%, and a maximum genome contamination of 10%. The MAG with the highest dRep score within each 95% ANI cluster, termed here a phylogroup, was selected as the representative MAG for that phylogroup. Genes were predicted using Prodigal v.2.6.382 and annotated using EggNOG v.5.083. Filtered reads were mapped to representative MAGs using Bowtie2 v.2.3.5.184. The relative abundance of each representative MAG was calculated by dividing the number of reads that mapped to that MAG, corrected for genome size and completeness, by the total number of reads in each sample.
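The relative abundance calculation in the last sentence can be sketched as follows. The text does not spell out the exact form of the correction, so this is only one plausible reading and every detail of it is an assumption: mapped reads are scaled up by 1/completeness, expressed per megabase of MAG length, and then divided by the sample's total read count. The function name `relative_abundance` and its parameters are hypothetical.

```python
def relative_abundance(mapped_reads, genome_size_bp, completeness, total_reads):
    """Size- and completeness-corrected share of a sample's reads for one MAG.

    Assumed correction (not confirmed by the source):
      reads / completeness -> estimated reads from the complete genome
      ... / (size in Mb)   -> reads per megabase
      ... / total_reads    -> share of the sample
    """
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be a fraction in (0, 1]")
    if genome_size_bp <= 0 or total_reads <= 0:
        raise ValueError("genome size and total reads must be positive")
    corrected_reads = mapped_reads / completeness
    reads_per_mb = corrected_reads / (genome_size_bp / 1_000_000)
    return reads_per_mb / total_reads


# Example: 1,000 reads mapped to a 2 Mb MAG that is 50% complete,
# in a sample of 100,000 filtered reads.
example = relative_abundance(1_000, 2_000_000, 0.5, 100_000)
```

Values computed this way are comparable between MAGs within a sample but do not sum to one unless they are renormalized.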
Tensor factorization
To characterize and compare the dynamics of the vaginal microbiome in term and preterm pregnancies, we used a revised version of compositional tensor factorization27.
Phylogeny
Amino acid (AA) sequences of 120 marker genes were called and aligned for representative MAGs using GTDB-Tk v.1.5.185. MAGs with <60% of AA in the alignment were excluded from phylogenetic tree construction. The best evolutionary model, LG+G+I (the Le and Gascuel model + gamma distribution + invariant sites), was identified using ProtTest3 v.3.4.286, and the tree was constructed with RAxML v.8.2.1287 using 500 bootstraps. The tree was rooted at the midpoint and visualized in iTOL v.6.388.
Microdiversity profiling, growth rate estimation, and antimicrobial resistance genes
Population microdiversity metrics, including genome-wide nucleotide diversity, gene-wide nucleotide diversity, linkage disequilibrium (D′), and dN/dS, were calculated with inStrain v.1.0.040, with the 157 representative MAGs as the reference database. Maximum growth rate was estimated for each MAG using gRodon35. Antimicrobial resistance (AMR) genes were detected in assemblies and MAGs using PathoFact v.1.089 with default parameters.
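To make the nucleotide diversity metric concrete, the sketch below computes a per-site value as one minus the sum of squared base frequencies and averages it over sites. This definition is an assumption based on how inStrain describes the metric; the function names are hypothetical, and the code ignores the coverage and quality filters that a real profiling run applies.

```python
def site_nucleotide_diversity(base_counts):
    """Diversity at one position: 1 - sum of squared base frequencies.

    base_counts maps each base ("A", "C", "G", "T") to its read count.
    """
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("position has no coverage")
    return 1.0 - sum((count / total) ** 2 for count in base_counts.values())


def mean_nucleotide_diversity(sites):
    """Average per-site diversity over a gene or genome (list of count dicts)."""
    if not sites:
        raise ValueError("no sites given")
    return sum(site_nucleotide_diversity(s) for s in sites) / len(sites)


# A fixed site contributes 0; a site split evenly between two bases contributes 0.5.
gene_sites = [{"A": 10, "C": 0, "G": 0, "T": 0}, {"A": 5, "C": 5, "G": 0, "T": 0}]
gene_pi = mean_nucleotide_diversity(gene_sites)
```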
Functional enrichment analysis
To identify COG/KEGG pathways that were enriched among genes showing a significant difference in nucleotide diversity between sPTB and TB, the frequency of each COG/KEGG category was first calculated from the significant genes (observed frequency). Then, the frequency of each COG/KEGG category was calculated from an identical number of genes randomly selected from all genes (expected frequency). This random selection was repeated 10,000 times. The null hypothesis was that the observed frequency of a category is not larger than expected. For each category, the probability P of the null hypothesis was calculated as $P = |[x_i \in X : x_i \geq k]| / 10000$, where $[\ldots]$ denotes a multiset, $X = (x_1, x_2, \ldots, x_n)$ is the list of expected values, and $k$ is the observed value.
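The resampling procedure above can be sketched in a few lines. The function name `enrichment_pvalue`, the gene identifiers, and the gene-to-category mapping are hypothetical. The sketch compares counts rather than frequencies, which is equivalent here because every random draw has the same number of genes as the significant set.

```python
import random


def enrichment_pvalue(significant_genes, all_genes, gene_to_category,
                      category, n_iter=10_000, seed=0):
    """Fraction of random gene sets whose count for `category` is at least
    the count observed among the significant genes."""
    rng = random.Random(seed)
    n = len(significant_genes)
    # k: observed number of significant genes annotated with the category.
    k = sum(1 for g in significant_genes if gene_to_category.get(g) == category)
    hits = 0
    for _ in range(n_iter):
        # Draw the same number of genes, without replacement, from all genes.
        sample = rng.sample(all_genes, n)
        x = sum(1 for g in sample if gene_to_category.get(g) == category)
        if x >= k:
            hits += 1
    return hits / n_iter


# Toy data: 100 genes, the first 10 annotated with COG category "I"
# (lipid transport and metabolism), the rest with a placeholder category.
genes = [f"gene{i}" for i in range(100)]
annotation = {g: ("I" if i < 10 else "other") for i, g in enumerate(genes)}
p_lipid = enrichment_pvalue(genes[:10], genes, annotation, "I", n_iter=2000)
```

In this toy case all significant genes fall in category I, so almost no random draw matches the observed count and the estimated P is close to zero.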
Statistical analysis
The number of samples available differed between women. In our analyses, we therefore summarized each woman by the median across her samples over the whole pregnancy (or over its first or second half). The Benjamini-Hochberg (BH) false-discovery rate (FDR) procedure90 was used to correct for multiple testing. Adjusted P < 0.1 was used as the significance cutoff.
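For reference, the BH adjustment can be written compactly. This is a minimal sketch of the standard step-up procedure, with a hypothetical function name `benjamini_hochberg`; it is not the code used in the study.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
# Apply the cutoff stated in the text (adjusted P < 0.1).
significant = [a < 0.1 for a in adj]
```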
Temporal analysis
To generate the trajectories representing the change in nucleotide diversity over time in term and preterm deliveries, we pooled the temporal data of Gardnerella spp. from all women in each group (term and preterm). When we had more than one observation per gestational week, we took the median value across samples. We then binned the temporal data into bins of 3 weeks, except for the first bin, which spanned weeks 1-7, and took the median of each bin as a summary. To smooth the binned data, we applied splines, which are functions defined piecewise by polynomials. To compare the temporal trajectories of preterm and term deliveries, we performed a permutation test, in which we generated a null distribution of Euclidean distances by shuffling these trajectories 10^4 times and compared it to the Euclidean distance in the original data31.
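The binning and permutation steps can be sketched as follows, under several stated assumptions: observations are (gestational week, value) pairs, shuffling is done by permuting group labels across pooled observations, bins present in only one group are ignored, and the per-week median and spline smoothing steps are omitted for brevity. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from statistics import median


def bin_index(week):
    """Bin 0 covers weeks 1-7; later bins cover 3 weeks each (8-10, 11-13, ...)."""
    return 0 if week <= 7 else 1 + (week - 8) // 3


def binned_trajectory(observations):
    """Map each bin to the median value of the observations falling in it."""
    by_bin = defaultdict(list)
    for week, value in observations:
        by_bin[bin_index(week)].append(value)
    return {b: median(values) for b, values in by_bin.items()}


def trajectory_distance(obs_a, obs_b):
    """Euclidean distance between two binned trajectories over shared bins."""
    ta, tb = binned_trajectory(obs_a), binned_trajectory(obs_b)
    shared = set(ta) & set(tb)
    return math.sqrt(sum((ta[b] - tb[b]) ** 2 for b in shared))


def permutation_pvalue(obs_a, obs_b, n_perm=10_000, seed=0):
    """Compare the observed distance with distances after shuffling labels."""
    rng = random.Random(seed)
    observed = trajectory_distance(obs_a, obs_b)
    pooled = list(obs_a) + list(obs_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = trajectory_distance(pooled[:len(obs_a)], pooled[len(obs_a):])
        if shuffled >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A small P value from `permutation_pvalue` indicates that the two groups' trajectories are further apart than expected if group membership were unrelated to nucleotide diversity.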
Data availability
The dataset used is available from dbGaP (phs001523).
Author contributions
J.L. and T.K. designed the study. J.L. and L.S. analyzed the data with input from T.K., M.S., and G.A.B. J.L. wrote the manuscript with input from L.S., T.K., M.S., B.Z. and G.A.B. G.A.B. assisted with data access and acquisition.
Competing interests
G.A.B. is a member of the Scientific Advisory Board of Juno, LTD., a startup biotech firm focused on using the vaginal microbiome to address issues of women’s gynecologic and reproductive health. Juno had no involvement in the current study. Other authors declare no competing interests.
Supplementary figures
Supplementary tables
Supplementary Table 1 Genome assembly features of representative MAGs for phylogroups and taxonomy.
Supplementary Table 2 eggNOG functional annotation of genes.
Acknowledgements
We thank members of the Korem lab for useful discussions. This study was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) of the National Institutes of Health under award number R01HD106017, the Program for Mathematical Genomics at Columbia University, and the CIFAR Azrieli Global Scholarship in the Humans & the Microbiome Program (T.K.). The dataset used was obtained from dbGaP (phs001523), using data provided by Gregory A. Buck, Ph.D. and colleagues and supported by NICHD (U54 HD080784) (G.A.B).