Genome sequencing of 196 Treponema pallidum strains from six continents reveals additional variability in vaccine candidate genes and dominance of Nichols clade strains in Madagascar

Despite its persistent susceptibility to penicillin, Treponema pallidum (T. pallidum) subsp. pallidum still causes millions of cases of syphilis worldwide each year, resulting in significant morbidity and mortality and underscoring the urgency of developing an effective vaccine to curtail the spread of infection. Several technical challenges, including the absence of an in vitro culture system until very recently, have hampered efforts to catalog the diversity of strains collected worldwide. Here, we provide near-complete genomes from 196 T. pallidum strains, including 191 T. pallidum subsp. pallidum, sequenced directly from patient samples collected in 8 countries across 6 continents. Maximum-likelihood phylogenetic analysis revealed that samples from most sites belonged predominantly to the SS14 clade. However, 99% (84/85) of the samples from Madagascar fell into two of the five distinct Nichols subclades. Although recombination was uncommon in the evolution of modern circulating strains, we found multiple putative recombination events between T. pallidum subsp. pallidum and subsp. endemicum that shaped the genomes of several subclades. Temporal analysis dated the most recent common ancestor of the Nichols and SS14 clades to 1717 (95% HPD: 1543-1869), in agreement with other recent studies. Rates of SNP accumulation varied significantly among subclades, particularly among Nichols subclades, and were associated in the Nichols A subclade with a C394F substitution in TP0380, an ERCC3-like DNA repair helicase. Our data highlight the role of variation in genes encoding putative surface-exposed outer membrane proteins in defining separate lineages, and provide a critical resource for the design of broadly protective syphilis vaccines targeting surface antigens.

Author Summary

Each year, millions of new cases of venereal and congenital syphilis, caused by the bacterium Treponema pallidum (T. pallidum) subsp. pallidum, are diagnosed worldwide, resulting in significant morbidity and mortality. Alongside endemic circulation of syphilis in low-income countries, disease resurgence in high-income nations has underscored the need for a vaccine. Because of past technical limitations in culturing and sequencing the organism, the extent of genetic diversity among modern T. pallidum subsp. pallidum strains remains poorly understood, hampering development of a broadly protective vaccine. In this study, we obtained 196 near-complete T. pallidum genomes directly from clinical swabs collected in eight countries across six continents. Of these, 191 were identified as T. pallidum subsp. pallidum, including 90 Nichols clade genomes. Bayesian analysis revealed a high degree of variation in mutation rate among subclades. Interestingly, a Nichols subclade with a particularly high mutation rate harbors a non-synonymous mutation in a putative DNA repair helicase. By coupling sequencing data with protein structure prediction, we identified multiple novel amino acid variants in several proteins previously identified as potential vaccine candidates. Our data help inform current efforts to develop a broadly protective syphilis vaccine.


We analyzed the temporal signal present among TPA strains by regressing root-to-tip distances in the SNP-only maximum-likelihood tree against collection date (Figure 3A). The left panel shows this calculation performed on a tree that included 11 highly passaged laboratory strains (eight in Nichols clade [...] SNPs during routine passage of the laboratory strains for decades between collection and sequencing. Laboratory strains were therefore excluded from further dating analysis.
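
The root-to-tip regression can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's pipeline: it assumes root-to-tip distances have already been extracted from the maximum-likelihood tree and paired with collection years, and the function name `root_to_tip_regression` and the toy values are hypothetical. The slope approximates the rate of SNP accumulation, the x-intercept gives a rough date for the root, Pearson's r summarizes the strength of the temporal signal, and the per-tip residuals are returned so they can be compared across groups.

```python
import math


def root_to_tip_regression(years, distances):
    """Ordinary least-squares fit of root-to-tip distance (y) on collection year (x).

    Hypothetical helper for illustration. Returns
    (slope, intercept, x_intercept, pearson_r, residuals).
    """
    if len(years) != len(distances) or len(years) < 3:
        raise ValueError("need at least three paired observations")
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in distances)
    if sxx == 0 or syy == 0:
        raise ValueError("years and distances must both vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    slope = sxy / sxx                     # distance units per year
    intercept = mean_y - slope * mean_x
    # Year at which the fitted distance reaches zero (undefined for a flat line).
    x_intercept = -intercept / slope if slope else math.nan
    pearson_r = sxy / math.sqrt(sxx * syy)
    residuals = [y - (intercept + slope * x) for x, y in zip(years, distances)]
    return slope, intercept, x_intercept, pearson_r, residuals


# Toy, perfectly clock-like data (hypothetical values, not from the study).
rate, _, root_year, r, _ = root_to_tip_regression(
    [2000, 2005, 2010, 2015], [0.1, 0.2, 0.3, 0.4]
)
print(f"rate={rate:.3f} per year, root~{root_year:.0f}, r={r:.2f}")
```

In this framing, a strain that kept accumulating SNPs during decades of laboratory passage after its recorded collection date would tend to sit above the fitted line, which is one way to see why passaged laboratory strains distort the temporal signal.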

We were curious as to why the Pearson correlation coefficients of the SS14 and Nichols [...] hypothesized that this may be due to differences inherent to the polyphyletic structure of both clades. We tested this by plotting the regression residuals by subclade and found significant differences between groups (Figure 3B, p < 2 × 10⁻¹⁶, ANOVA), suggesting that rates of SNP accumulation may differ across the TPA phylogeny. We therefore proceeded to Bayesian ancestral reconstruction and dating of clinical specimens with BEAST 2 (38), using an uncorrelated relaxed clock with a starting rate of 3.6 × 10⁻⁴ (24,39) as a prior model to account for differences in mutation rates among branches of the tree. Figure 3C shows the dated Bayesian phylogeny, with branches colored to reflect the rate [...] Nichols clade to 1893 (1839-1940), and the SS14 clade to 1921 (1868-1964), and found that the [...]

Host immune pressure drives mutation in the same putative antigens in SS14 and Nichols

Observed differences in the accumulation of SNPs among subclades may reflect sampling bias or bottlenecks, or they may reflect differences in the underlying biology. To examine the functional differences that define each subclade (including loci identified as [...] putative ORFs were altered between the SS14 and Nichols ancestral nodes, with a total of 134 non-synonymous mutation events. We defined a mutation event as a single amino acid change, insertion/deletion, or frameshift. We did not separately account for putative recombination events because we did not attempt to formally characterize recombination donors, and therefore could not disentangle the effects of recombination from those of selective pressure driving increased mutation.
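
The mutation-event definition above can be made concrete with a short Python sketch that compares an ancestral and a derived coding sequence. It is an illustrative simplification, not the procedure used in this study, and all names in it are hypothetical. It assumes both sequences come from the same nucleotide alignment in the reference reading frame, with gaps written as '-'. Each maximal gap run counts as one event (an in-frame insertion/deletion when its length is a multiple of three, otherwise a frameshift), and each gap-free codon whose translation under the standard genetic code differs counts as one amino acid change.

```python
BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def gap_runs(seq):
    """Yield the length of every maximal run of '-' characters in seq."""
    run = 0
    for ch in seq:
        if ch == "-":
            run += 1
        elif run:
            yield run
            run = 0
    if run:
        yield run


def count_mutation_events(ancestral, derived):
    """Count amino acid changes, in-frame indels and frameshifts between two
    aligned coding sequences (uppercase DNA, gaps as '-'). Hypothetical helper."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must come from the same alignment")
    # Drop columns that are gaps in both sequences (gaps owed to other taxa).
    pairs = [(a, d) for a, d in zip(ancestral, derived) if not (a == "-" and d == "-")]
    ancestral = "".join(a for a, _ in pairs)
    derived = "".join(d for _, d in pairs)

    events = {"aa_change": 0, "indel": 0, "frameshift": 0}
    # A gap in the derived sequence is a deletion; in the ancestral one, an insertion.
    for seq in (ancestral, derived):
        for length in gap_runs(seq):
            events["indel" if length % 3 == 0 else "frameshift"] += 1
    # Codons are read in alignment columns; codons downstream of a frameshift
    # are NOT re-translated in the new frame (a deliberate simplification).
    for start in range(0, len(ancestral) - 2, 3):
        anc, der = ancestral[start:start + 3], derived[start:start + 3]
        if "-" in anc or "-" in der:
            continue
        # Codons with ambiguous bases translate to "X" on both sides.
        if CODON_TABLE.get(anc, "X") != CODON_TABLE.get(der, "X"):
            events["aa_change"] += 1
    return events


print(count_mutation_events("ATGTGTAAA", "ATGTTTAAA"))
```

For example, a cysteine-to-phenylalanine change such as C394F requires a G-to-T substitution at the second codon position (TGT/TGC to TTT/TTC), which this sketch records as a single amino acid change. Because codons downstream of a frameshift are not re-translated, the sketch undercounts amino acid changes in frameshifted genes.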

We next attempted to define functional changes between the SS14 and Nichols clades by examining overrepresentation of altered loci in categories annotated by structural similarity (41).

We used the Protein Data Bank (PDB) annotation of the highest-scoring structural model, with a confidence cutoff of 75%, which allowed 798 coding sequences (CDSs) to be assigned to a total of 62 unique PDB categories. We then performed Fisher's exact tests to test for category [...] altered. However, because these annotations are based on structural similarity rather than known function, testing for overrepresentation of structural annotations likely does not fully capture the functional differences between any two clades.
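
A minimal Python sketch of this category test follows. It assumes each CDS has a single best-scoring structural model summarized as a (category, confidence) pair, applies the 75% cutoff, builds a 2x2 table of altered versus unaltered CDSs inside and outside one category, and computes a one-sided Fisher's exact p-value from the hypergeometric upper tail. The input format, the choice to use only annotated CDSs as the background, the category names, and all function names are assumptions for illustration. For a table laid out as [[a, b], [c, d]], `scipy.stats.fisher_exact(table, alternative="greater")` should return the same one-sided p-value.

```python
from math import comb


def assign_categories(best_hits, cutoff=0.75):
    """Keep CDSs whose best structural model passes the confidence cutoff.

    best_hits: {cds_id: (pdb_category, confidence between 0 and 1)}
    Returns {cds_id: pdb_category}.
    """
    return {cds: cat for cds, (cat, conf) in best_hits.items() if conf >= cutoff}


def contingency_table(category, assignments, altered):
    """2x2 counts (a, b, c, d) for one category, with annotated CDSs as background.

                      in category   not in category
        altered            a               b
        not altered        c               d
    """
    background = set(assignments)
    in_cat = {cds for cds, cat in assignments.items() if cat == category}
    altered = set(altered) & background   # ignore CDSs without an annotation
    a = len(in_cat & altered)
    b = len(altered - in_cat)
    c = len(in_cat - altered)
    d = len(background - in_cat - altered)
    return a, b, c, d


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(X >= a) under the hypergeometric null."""
    n_total = a + b + c + d
    n_category = a + c
    n_altered = a + b
    upper = min(n_category, n_altered)
    tail = sum(
        comb(n_category, k) * comb(n_total - n_category, n_altered - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_altered)


# Toy data (hypothetical identifiers and categories).
hits = {"cds1": ("porin-like", 0.90), "cds2": ("porin-like", 0.80),
        "cds3": ("kinase-like", 0.95), "cds4": ("kinase-like", 0.50)}
assignments = assign_categories(hits)     # cds4 falls below the cutoff
table = contingency_table("porin-like", assignments, {"cds1", "cds2"})
print(table, fisher_greater(*table))
```

With one test per category, a multiple-testing correction would usually be applied to the resulting p-values before calling any category enriched.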

Because functional annotation of T. pallidum proteins is still hampered by the absence of a reverse genetics system, we next focused on alterations in proteins known or suspected to interact with the host immune system. We included proteins that reacted with pooled sera from individuals with known syphilis infection (42,43) or that are otherwise known to be surface-exposed (Supporting Information 3 - Antigens), and again performed overrepresentation tests (Figure 4B).
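
One simple way to summarize how much amino acid variability falls in this antigen list is sketched below in Python. It assumes that non-synonymous mutation events have already been assigned to branches of the tree (one protein identifier per event) and that each node has been labeled with its subclade; the data layout, function names, and toy identifiers are hypothetical and do not reproduce the study's analysis. The sketch takes the union of seroreactive and surface-exposed proteins as the antigen list, reports the fraction of events on each branch that hit an antigen, and lists antigens mutated independently in more than one subclade.

```python
from collections import defaultdict


def build_antigen_list(seroreactive, surface_exposed):
    """Antigen list = proteins reactive with syphilis sera OR known to be surface-exposed."""
    return set(seroreactive) | set(surface_exposed)


def antigen_share_by_node(events_by_node, antigens):
    """Fraction of non-synonymous events on each branch that fall in antigens.

    events_by_node: {node_id: list of protein ids, one entry per mutation event}
    Branches without events are skipped.
    """
    shares = {}
    for node, proteins in events_by_node.items():
        if proteins:
            shares[node] = sum(p in antigens for p in proteins) / len(proteins)
    return shares


def antigens_recurrent_across_subclades(events_by_node, node_subclade, antigens):
    """Map each antigen mutated in more than one subclade to the sorted subclade labels."""
    subclades = defaultdict(set)
    for node, proteins in events_by_node.items():
        for protein in set(proteins) & antigens:
            subclades[protein].add(node_subclade[node])
    return {p: sorted(s) for p, s in subclades.items() if len(s) > 1}


# Toy data (hypothetical identifiers).
antigens = build_antigen_list({"p1"}, {"p2"})
events = {"n1": ["p1", "p3", "p3", "p4"], "n2": ["p1", "p2"], "n3": ["p2"]}
shares = antigen_share_by_node(events, antigens)
print(shares, "branches above 30%:", sum(s > 0.30 for s in shares.values()))
print(antigens_recurrent_across_subclades(
    events, {"n1": "subA", "n2": "subB", "n3": "subB"}, antigens))
```

Counting recurrence by subclade rather than by node means that two hits on branches within the same subclade (p2 in the toy data) are not treated as independent recurrences.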

Along branches with more than 10 altered proteins, only two nodes (N015, Nichols C, and N005, [...] antigenic proteins represents more than 30% of the amino acid variability in more than half of [...] that become mutated relative to their parent node in multiple subclades, representing separate events (Figure 4D). Furthermore, among antigens that were mutated relative to the parent node in more than one subclade, none was exclusive to either the SS14 or the Nichols clade. These data suggest that interaction with the host immune system drives a large proportion of the evolution [...]

However, although antigens are enriched for non-synonymous mutations relative to the [...] T. pallidum pathogenicity and immune interaction. When examining proteins whose mutation was unique to a single clade (Figure 4D), we found a C394F mutation in the ERCC3-like DNA [...] rate of SNP accumulation than any other subclade (Figure 3C). It is plausible that mutation of this helicase compromises DNA repair and contributes to a more rapid rate of evolution within this clade. [...] and TolC (TP0966, TP0967) (45) and reviewed in (46). TP0136 is a lipoprotein that appears to [...] vaccine candidates in rabbits, with TP0136 delaying ulceration but not providing full protection [...]

Accordingly, for the five most frequently mutated putative outer membrane antigens, we developed models that highlight the positions predicted to undergo the most structural change [...] bonds. Furthermore, it allows "tuning" of the structural effect of a mutation on each atom, with [...] terminal β-barrel. Consistent with previous studies (34,45,57,59), we found that extracellular [...] polymorphic residues, rendering the entire exposed surface of the protein variable due to strain- [...]

In contrast to TP0326, whose structure and function have been studied extensively, less is known about TP0548, a predicted homolog of the E. coli fatty acid transporter FadL.
We predict the structure to be a 14-stranded β-barrel with periplasmic C-terminal α-helices, [...] conformation of the flexible loop rather than true structural variation (Supplementary Figures 4D and 5D, arrows). Rather, in TP0966, the polymorphic charged [...]

Finally, we generated a structural model of TP0136 and found it to adopt a 7-bladed [...] repeats are a unique structural feature of TP0136; the β-propeller fold of TP0136 allows these intrinsically disordered regions to form unstructured loops between β-strands. Unsurprisingly, [...] and gaining actionable insights into T. pallidum evolution that inform vaccine design.

With these goals, we generated 196 near-complete T. pallidum genomes from diverse [...] The remaining two Italian strains, collected in Turin and Bologna, belonged to two distinct Nichols subclades, one of which clustered with three Japanese syphilis strains in Nichols [...] circulating in the regions hardest hit by the modern pandemic.

Our temporal analysis generally agreed with previous estimates of mutation rate (39,63), even though we used a relaxed, rather than fixed, clock model to determine whether the rate of mutation differed along different branches of the T. pallidum subsp. [...] biological differences contributing to the phenotype. Indeed, we found significant differences in the rates of mutation among subclades (Figure 3C). The Nichols A subclade was particularly interesting to us, given its high median rate of mutation along branches within the subclade with [...]

By definition, an effective syphilis vaccine needs to protect against most strains circulating where the vaccine is administered. Our work further supports the conclusion that the majority of non-synonymous mutations defining T. pallidum subsp. pallidum subclades are in proteins putatively located in the outer membrane or known to react with serum from syphilis patients (Figure 4) (42,43). These data, along with recent structural modeling of T. pallidum outer [...] (Supplementary Figures 2-6). Given the paucity of T. pallidum outer membrane proteins and the extensive mutation of predicted epitopes, a multivalent vaccination strategy may elicit a polyclonal humoral response capable of neutralizing a wider array of strains; our laboratory is currently pursuing this strategy.

Finally, an important caveat to these data is that, because of their extensive recombination and duplication, we excluded arguably the most important T. pallidum proteins that interact with the host immune system, the Tpr family (14,28). Although this approach has been used before to [...] well as to prevent mistakes due to improper resolution of their repetitive elements during de novo assembly (39), an understanding of how the tpr genes evolve and influence host immunity is critical to developing an efficacious vaccine against T. pallidum. Accordingly, we are currently [...] The data presented in this study represent a step forward toward developing a successful [...]

Spirochete. mBio. 2018 Jun 12;9(3).
Jan;2(1):16245.
proteome-wide structural modeling of Treponema pallidum subspecies pallidum, the
issue):W244-248.
Biotechnol. 2019 Apr;37(4):420-3.
3;45(W1):W24-9.