Abstract
Saccharibacteria (formerly TM7) have reduced genomes and a small cell size, and appear to have a parasitic lifestyle dependent on a bacterial host. Although at least 6 major clades of Saccharibacteria inhabit the human oral cavity, cultured isolates and complete genomes of oral Saccharibacteria have previously been limited to clade G1. In this study, nanopore sequencing was used to obtain three complete genome sequences from clade G6. Phylogenetic analysis suggested the presence of at least 3-5 distinct species within G6, with two discrete taxa represented by the 3 complete genomes. G6 Saccharibacteria were highly divergent from the better-studied clade G1 and had the smallest genomes and lowest GC-content of all Saccharibacteria. Pangenome analysis showed that although 97% of the shared pan-Saccharibacteria core genes and 89% of the G1-specific core genes had putative functions, only 50% of the 244 G6-specific core genes had putative functions, highlighting the novelty of this group. Compared to G1, G6 encoded divergent metabolic pathways. G6 genomes lacked an F1F0 ATPase, the non-oxidative pentose phosphate pathway, and several genes involved in nucleotide metabolism, all of which were core genes for G1. G6 genomes were also unique compared to G1 in that they encoded lactate dehydrogenase, adenylate cyclase, limited glycerolipid metabolism, a homolog of a lipoarabinomannan biosynthesis enzyme, and the means to degrade starch. These differences at key metabolic steps suggest a distinct lifestyle and ecological niche for clade G6, possibly with alternative hosts and/or host-dependencies, which would have significant ecological, evolutionary, and likely pathogenic implications.
Importance
Saccharibacteria are ultrasmall, parasitic bacteria that are common members of the oral microbiota and have been increasingly linked to disease and inflammation. However, the lifestyles of Saccharibacteria and their impact on human health remain poorly understood, especially for the 5 clades (G2-G6) with no complete genomes or cultured isolates. Obtaining complete genomes is particularly important for Saccharibacteria because they lack many of the “essential” core genes used to assess draft genome completeness, and few reference genomes exist outside of clade G1. In this study, complete genomes of 3 G6 strains, representing two candidate species, were obtained and analyzed. The G6 genomes were highly divergent from G1 and enigmatic, with 50% of the G6 core genes having no putative functions. The substantial differences in encoded functional pathways suggest a distinct lifestyle and ecological niche, probably with alternative hosts and/or host-dependencies, which would have major implications for ecology, evolution, and pathogenesis.
Observation
Saccharibacteria (formerly TM7) have an ultrasmall cell size and reduced genomes, and are thought to be obligate epibionts dependent on physically associated host species (1-3). Common constituents of the oral microbiota, Saccharibacteria have been increasingly linked to inflammation and disease (4-6). Saccharibacteria comprise at least 6 distinct clades (G1-G6) (7, 8); however, all currently available human-associated complete genomes and cultured isolates belong to clade G1, leaving clades G2-G6 poorly understood. Several recent publications have provided the first draft genomes from clades G3, G5, and G6 (4, 8-11). Obtaining complete genomes is particularly important for Saccharibacteria because they lack many of the “essential” single-copy core genes typically used to estimate genome completeness, and complete reference genomes are lacking outside of the G1 clade.
A recent short-read-based oral microbiome study provided 21 Saccharibacteria draft genomes from clades G1, G3, and G6 (4), several of which were high quality (high N50, relatively contiguous, low predicted contamination). Therefore, the same saliva samples that had produced these draft genomes were subjected to nanopore sequencing, followed by long-read and/or hybrid assembly, to improve the genomes. This resulted in 3 complete, circular G6 genomes: JB001 (662,051 bp), JB002 (639,751 bp), and JB003 (663,165 bp). Table 1 summarizes the genomes improved during this study, and the Supplemental Methods contain a full description of the DNA extraction, sequencing, assembly, and analysis methods. These methods are a modified version of a previously reported protocol (Baker 2021, in-press). Although the improved G1 and G3 “near complete” genomes are useful in their own right, they are still incomplete and/or may contain contamination; therefore, the 3 complete G6 genomes are the focus of this report, and the near complete genomes are briefly discussed in the Supplemental Methods.
Phylogenetic analysis of concatenated protein sequences was performed in Anvi’o (12) and included the 8 improved/completed genomes from this study, all 26 complete Saccharibacteria genomes available on NCBI (as of 1 April 2021), and 90 Saccharibacteria draft genomes from 5 recent studies (Table S1). JB001, JB002, and JB003 were indeed members of Saccharibacteria clade G6 (Figure 1A, Figure S1), and they represent the only human-associated, complete Saccharibacteria genomes outside of clade G1. Notably, G6 had the smallest genomes and the lowest GC-content of all Saccharibacteria (Figure 1A). Average nucleotide identity (ANI) between the G6 genomes was calculated in Anvi’o and suggested that there are at least 3-5 distinct species within the clade (Figure 1B; a cutoff of 95% ANI is frequently used to delineate species (13, 14)). JB001, JB003, JCVI_1_bin.12, and G6_32_bin_33_unicycler appear to be the same species, sharing ≥95% ANI despite originating from different human subjects and being assembled independently (Figure 1B). JB002 and T-C-M-Bin-00022 shared over 98% ANI, likely representing a second distinct species, while CMJM-G6-HOT-870 and T-C-M-Bin-00011 shared ∼98% ANI and likely form an additional G6 species (Figure 1B). CLC Genomics Workbench was used to perform whole-genome alignment of JB001, JB002, JB003, and the G1 reference strain TM7x (Figure 1C). JB001 and JB003 were completely syntenic, and there were moderate differences between JB001/JB003 and JB002, whereas TM7x and the G6 Saccharibacteria have undergone many genomic rearrangements and gene gains/losses since their last common ancestor (Figure 1C).
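To make the 95% ANI species cutoff concrete, the sketch below groups genomes into putative species by linking any pair at or above the threshold (single-linkage clustering via union-find). This is an illustrative sketch only, not Anvi’o’s implementation; the genome IDs (A-E) and ANI values are hypothetical placeholders, not data from this study.

```python
from itertools import combinations


def group_by_ani(genomes, ani, cutoff=95.0):
    """Group genomes into putative species by single-linkage on pairwise ANI.

    genomes: list of genome IDs
    ani: dict mapping frozenset({a, b}) -> percent ANI (0-100)
    cutoff: ANI (%) treated as the species boundary (95% is a common convention)
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Walk up to the cluster representative, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # Pairs without an ANI value are treated as below the cutoff.
        if ani.get(frozenset((a, b)), 0.0) >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genomes and ANI values, for illustration only.
genomes = ["A", "B", "C", "D", "E"]
ani = {
    frozenset(("A", "B")): 99.1,
    frozenset(("A", "C")): 96.0,
    frozenset(("B", "C")): 95.8,
    frozenset(("D", "E")): 98.2,
    frozenset(("A", "D")): 78.5,
}
print(group_by_ani(genomes, ani))  # [['A', 'B', 'C'], ['D', 'E']]
```

Single linkage is the simplest choice here, but it can chain genomes whose pairwise ANI falls below the cutoff through an intermediate genome, which is one reason species counts from ANI are often given as a range.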
To examine functional and metabolic differences between the G6 clade and the better-understood G1 clade, pangenome analysis was performed in Anvi’o (15) on the 3 complete G6 genomes and 4 diverse complete G1 genomes (Figure 2, Table S3). This identified 223 “pan-Saccharibacteria Core Genes” appearing in all genomes, as well as 94 “G1 Core Genes” and 244 “G6 Core Genes” (Figure 2A). While 97% of the pan-Saccharibacteria Core Genes and 89% of the G1 Core Genes had known COG functions and pathways, only 50% of the G6 Core Genes did (Figure 2A), highlighting the enigmatic nature of this clade. The lower number of G1 core genes likely reflects the greater known diversity within the G1 clade and among the G1 genomes analyzed here (8, 9), which results in less conservation across the G1 pangenome. A larger pangenome analysis of all 11 G6 genomes and 14 diverse G1 genomes is available in Figure S2 and Table S4. This analysis generated similar results, but it includes draft genomes that may be incomplete and/or contain contamination. Figure 2B shows a complete metabolic network of the known KEGG pathways identified in the three sets of core genes from Figure 2A. Both G1 and G6 genomes encode partial cell wall metabolism, glycolysis (missing phosphofructokinase), and arginine biosynthesis pathways, and do not encode fatty acid metabolism, a TCA cycle, or amino acid metabolism other than arginine biosynthesis (Figure 2B). Notable enzymes and pathways present in G6 genomes but absent in G1 include maltase-glucoamylase (to metabolize starch), fructose-bisphosphate aldolase (a glycolytic step), adenylate cyclase, lactate dehydrogenase, partial lipoarabinomannan (LAM) biosynthesis, and partial glycerolipid metabolism.
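The core-gene categories above can be illustrated with a minimal sketch that sorts gene clusters by their presence/absence across genomes and then computes the percentage with a functional annotation. It assumes a definition not stated in the text: a group-specific core cluster is present in every genome of that group and absent from every genome of the other group. The gene-cluster and genome IDs and the toy annotation set are hypothetical, and this is not Anvi’o’s own binning procedure.

```python
def classify_clusters(presence, group_of):
    """Sort gene clusters into pan-core and group-specific core categories.

    presence: dict cluster_id -> set of genome IDs carrying that cluster
    group_of: dict genome_id -> group label (e.g. "G1" or "G6")
    Returns dict category -> sorted list of cluster IDs.
    """
    groups = {}
    for genome, label in group_of.items():
        groups.setdefault(label, set()).add(genome)
    all_genomes = set(group_of)

    categories = {"pan_core": []}
    for label in groups:
        categories[f"{label}_core"] = []

    for cluster, genomes in presence.items():
        if genomes >= all_genomes:
            # Present in every genome analyzed.
            categories["pan_core"].append(cluster)
            continue
        for label, members in groups.items():
            others = all_genomes - members
            # Assumed definition: in every group member and in no non-member.
            if genomes >= members and not (genomes & others):
                categories[f"{label}_core"].append(cluster)
    return {k: sorted(v) for k, v in categories.items()}


def percent_annotated(clusters, annotated):
    """Percentage of clusters that have a functional (e.g. COG) annotation."""
    if not clusters:
        return 0.0
    return 100.0 * sum(c in annotated for c in clusters) / len(clusters)


# Hypothetical toy pangenome: 2 G1 genomes and 3 G6 genomes.
group_of = {"g1a": "G1", "g1b": "G1", "g6a": "G6", "g6b": "G6", "g6c": "G6"}
presence = {
    "c1": {"g1a", "g1b", "g6a", "g6b", "g6c"},  # pan-core
    "c2": {"g1a", "g1b"},                       # G1 core
    "c3": {"g6a", "g6b", "g6c"},                # G6 core
    "c4": {"g6a", "g6b", "g6c"},                # G6 core
    "c5": {"g1a", "g6a"},                       # accessory
}
annotated = {"c1", "c2", "c3"}

result = classify_clusters(presence, group_of)
for category, clusters in result.items():
    print(category, clusters, percent_annotated(clusters, annotated))
```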
Conversely, G1 genomes encode the non-oxidative phase of the pentose phosphate pathway, an F1F0 ATPase, alpha-galactosidase, and several steps in nucleotide metabolism, none of which were present in the G6 genomes (Figure 2B). Most differences between JB001 and JB002 involved genes of unknown function; therefore, differences in the encoded KEGG pathways were minor (Figure S3). No CRISPR system components were predicted in the G6 genomes examined. Although it is not known how Saccharibacteria obtain needed metabolites from the host, a type IV pilus-like system, which is generally well conserved across the group, has been proposed as a candidate mechanism (8, 9) and was present in the G6 genomes here. The species-level clade that included JB001 and JB003 encoded a ∼10,000 bp putative prophage element, which was flanked by homologs of the PinE invertase and contained a T4SS VirD4 homolog and 4 hypothetical proteins, all with ∼95% homology to a similar region in Streptococcus salivarius.
Taken together, these analyses indicate that Saccharibacteria clade G6 is highly divergent from clade G1 and may have a different lifestyle, host, and host-dependencies. This is in line with the recent hypothesis that G6 reside on the tongue (G6 are referred to as ‘T2’ in reference 9) and have a long history of association with animal hosts, while G1 reside in dental plaque and were a much more recent acquisition from the environment (8, 9). Interestingly, the species-level clade containing JB002 (the most reduced Saccharibacteria genome, with only 615 genes) was the only Saccharibacteria group that resided both on the tongue and in dental plaque (9). Although all cultured Saccharibacteria isolates are epibionts of Actinomyces spp., they are all G1 strains. Because G6 reside in a different environment, they may have distinct host species, possibly Streptococcus, given the putative prophage region homologous to S. salivarius. G6 have likely fallen into the ‘unknown’ taxonomic bucket in most past microbiome studies; thus, the role of G6 in human health remains to be elucidated. The high percentage of genes with unknown functions adds further to the obscurity of this clade. Overall, this article highlights an urgent need to study Saccharibacteria, since almost nothing is known about the lifestyle, host, or ecological impact of clade G6, and even less is understood about clades G2, G3, G4, and G5.
Data availability
The complete genome sequences of JB001, JB002, and JB003 have been deposited in GenBank under accession numbers CP072208, CP076101, and CP076102. The BioProject accession for this project is PRJNA624185. The short reads used to generate the assemblies are available in the SRA database under accession numbers SRX4318838, SRX4318837, and SRX4318835. The long reads used to generate the assemblies are available in the SRA database under accession numbers SRX10387815, SRX11020560, and SRX11020561.
Acknowledgements
I thank Karrie Goglin-Almeida, Jelena Jablanovic, and Kara Riggsbee for performing the library preparation and sequencing, and Jeffrey S. McLean for helpful discussions. This research was supported by NIH/NIDCR K99-DE029228.
Footnotes
Updated NCBI accession numbers and tables have been added.