Ancient bacterial genomes reveal a formerly unknown diversity of Treponema pallidum strains in early modern Europe

Sexually transmitted (venereal) syphilis marked European history with a devastating epidemic at the end of the 15th century, and is currently re-emerging globally. Together with non-venereal treponemal diseases, like bejel and yaws, found in subtropical and tropical regions, it poses a prevailing health threat worldwide. The origins and spread of treponemal diseases remain unresolved, including syphilis’ potential introduction into Europe from the Americas. Here, we present the first genetic data from archaeological human remains reflecting a previously unknown diversity of Treponema pallidum in historical Europe. Our study demonstrates that a variety of strains related to both venereal syphilis and yaws were already present in Northern Europe in the early modern period. We also discovered a previously unknown T. pallidum lineage recovered as a sister group to yaws and bejel. These findings imply a more complex pattern of geographical prevalence and etiology of early treponemal epidemics than previously understood.


Introduction
Treponemal infections, namely yaws, bejel (endemic syphilis), and most notoriously, the sexually transmitted syphilis, represent a reoccurring, global threat to human health. Venereal syphilis, caused by Treponema pallidum ssp. pallidum (TPA) infects the worldwide human population with millions of new cases every year 1 ,2 . The two endemic treponemal subspecies closely related to T. pallidum ssp. pallidum are T. pallidum ssp. pertenue, (TPE) and T. pallidum ssp. endemicum (TEN). TPE is common in the tropical regions of the world, where it causes yaws and a form of treponematosis in non-human primates. TEN is the causative agent of bejel, which is mostly found in hot and arid environments. Both of these treponematoses are usually milder in manifestations than syphilis, and have a lower incidence on the population level, yet their transmission rates have prevailed over the recent years 3,4 . Resistance against second-line antibiotics has recently developed in T. pallidum ssp. pallidum 5 whereas penicillin treatment still remains effective 1,6 . Sexually transmitted syphilis progresses slowly, while afflicting considerable damage to bone, internal organs and the nervous system 7,8 . The endemic types of T. pallidum (TPE, TEN) are transmitted and mainly manifested through skin lesions, but can also affect bones and joints in a comparable way to the venereal form 4,6 . TPA frequently transmits congenitally, resulting in various disorders for both mother and child during pregnancy, birth and infancy 7,8 . For TPE and TEN infections today, this form of transmission is atypical, if not entirely unprecedented 9,10 . Although the three T. pallidum subspecies can be separated by genetic distinctions 11,12 , their clinical symptoms in skeletal material are difficult to distinguish 6 .

4
The re-emergence of syphilis is a reminder of the formidable threat it may represent with its continuous adaptations 5,13 . Devastating outbreaks of syphilis have been documented in historical times. Early medical reports from the late 15 th century portray the most well-known epidemic, a rapid and Europe-wide spread of venereal syphilis in the wake of the 1495 Italian war 14,15 . These statements also describe it having gradually changed into a milder, more chronic disease in the subsequent decades, similar in manifestation to the cases of modern day syphilis 16,17 . These events coincide with historical expeditions, and have ignited a long-persisting hypothesis suggesting that syphilis was introduced to Europe by Columbus and his crew upon their return from the New World in 1493 18,19 . The alternative multiregional hypothesis contradicts this assumption, and presumes a pre-Columbian prevalence of syphilis on the European continent, potentially as a result of prehistoric spread of the disease through African and Asian routes [20][21][22] . Skeletal evidence from human remains carrying pathological marks characteristic for treponematoses and dated prior to Columbus' return from the New World, have been reported 23,24 . However, to date no genetic evidence exists that could confirm the existence of pre-Columbian syphilis in Europe 18,25 . Potential syphilis infections are mentioned in the literature since Medieval times, but these diagnoses may have been confounded by symptomatic similarities with other diseases, and delusively called 'venereal' or 'hereditary' leprosy 17,25 . Misdiagnoses occurred until recently, due to the challenges in recognizing T. pallidum infections from other diseases and its subspecies from each other 26 . In the presentday, the treponematoses are largely segregated in medical diagnostics either with traditional serological tests or with the more recently developed multi-locus sequence typing (MLST) schemes [27][28][29][30] . Before the modern genetic classifications were introduced, supporters of the 'unitarian hypothesis' claimed that all treponematoses were in fact one and the same disease 31,32 . Although this theory was justified in questioning the geographical distribution and clinical symptoms as the main means of categorization, the understanding of phylogenetic cladality between the treponemal subspecies has since disputed its more general principle. Genomic studies have found that the TPA and TPE strains, although clearly separated phylogenetically, remain extremely similar, and their geographical origin and time of emergence have proved complicated to confirm 11,15,33,34 . A recent genetic study on modern lineages of treponematoses supported a common ancestor of all current TPA strains in the 1700's 34 , whereas the more general diversification of T. pallidum into subspecies has been addressed in previous studies and assumed to have happened in prehistoric times 19,35 . Since modern genomes reflect the evolutionary situation of their isolation time, the mutation rate estimates drawn from them can be biased by natural selection yet to happen 36 . Past lineages may also reveal lost variation unrepresented by the currently known pathogen strains available for research 37 . For these reasons, reconstructed ancient bacterial genomes have an unprecedented potential to illuminate their species' unresolved divergence times and origin. Several historically significant pathogens have been successfully reassembled for investigation and their reconstructed genomes have greatly contributed to our understanding of the evolution and spread of reemerging infectious diseases [38][39][40] . Ancient DNA (aDNA) studies concerning treponematoses have so far remained scarce for both biological and methodological reasons. The treponemal spirochetes survive poorly outside their host organism and are present in extremely low quantities during late stage infections, often evading detection even in living patients 41 . The final, tertiary stage produces the most notable alterations to the skeleton in response to the human immune system, making these treponemal cases most frequently recognized, but less likely to yield genetic evidence, due to the clearance or latency of the pathogen 42,43 . Most notably, the bones that likely contain a large amount of treponemal agents belong to congenitally infected neonates. These fragile remains rarely survive and are, even when present, often overlooked in the archaeological record 44,45 . Previously, it would not have been feasible to use samples with low bacterial loads to detect pathogen DNA; however, recent advances in target-enrichment, high-throughput sequencing and sensitive screening methods for aDNA have aided overcoming this issue 46 . Currently, the technical advancements, together with careful selection of samples affected by treponemal pathogenesis, are enabling genomic studies on this elusive pathogen for the first time. The means to recover T. pallidum from historic human remains were recently established in a study on perinatal and infant individuals from colonial Mexico, in which two ancient genomes of T. pallidum ssp. pallidum and one ancient T. pallidum ssp. pertenue genome were described 45 . However, attempts to retrieve T. pallidum DNA from historical adult individuals have so far been unsuccessful.
Here, we analyze ancient bacterial genomes from four novel historical T. pallidum strains retrieved with target enrichment from pathological human remains, including adult and subadult individuals, originating in central and northern Europe. The newly reconstructed ancient genomes represent a variety of T. pallidum subspecies including a formerly unknown form of treponematosis phylogenetically basal to both bejel and yaws lineages. For the first time, treponemal genomes dated temporally close the New World contact have been retrieved from European samples, including closely related strains to the endemic types today mostly restricted to the tropics and subtropics.

Geographical origins and osteological analyses of samples
For this study, remains from nine individuals were included: five from the Crypt of the Holy Spirit in Turku, Finland, one from the Dome churchyard in Porvoo, Finland, one each from St. Jacob's cemetery and from St. George's cemetery in Tartu, Estonia, and finally, one from Gertrude's Infirmary in Kampen, the Netherlands (Supplementary Table 1 All four positive samples were radiocarbon dated (in Klaus-Tschira-Archäometrie-Zentrum am Curt-Engelhorn-Zentrum, Mannheim, Germany, in the Laboratory of Chronology, Finnish Natural History Museum, Helsinki, and in the AMS laboratory, ETH Zürich). For PD28, the calibrated dates range from 1666 to 1950 CE. The church cemetery, however, was replaced in 1789 CE, indicating the last possible burial time for the individual 49 . CHS119 and SJ219 both show AMS results dating the samples starting from the 15 th century CE. For CHS119 the upper limit of dating ranges to the 17th century, whereas for SJ219, two independent laboratory estimates confirmed an upper limit within the 15 th century CE. The disarticulate bone KM14-7 is C 14 dated to a range from the late 15 th to early 17 th century CE 50 . Marine and freshwater reservoir effects can cause an offset in C 14 ages between contemporaneous remains of humans or animals mainly relying on terrestrial food sources, and those principally using food sources from aquatic environments 51 . Calibration corrections for the affected radiocarbon results are feasible, but require a local baseline of isotopic signature which is currently unavailable for many regions in the world. Alternatively, other available carbon-based materials from the grave can be used to confirm the dating of an individual in uncertain cases. The putatively pre-Columbian samples CHS119 and SJ219 went through additional attestment of the dating procedures: A fragment of the wooden coffin of the individual SJ219 was used for additional dating analysis and the reservoir effect corrections were produced for the sample CHS119 52,53 .
The resulting estimates, however, gave both individuals a date range upper limit reaching 17 th century CE. For more details on radiocarbon dating and reservoir effect correction, see Supplementary Note 2.
Genome reconstruction and authenticity estimation of ancient DNA A screening procedure using MALT 54 and Megan extension for visualizing 55 Table 3 from an earlier study 45 and three of our historical European genomes, namely PD28, CHS119 and SJ219 (Table 2). Sample KM14-7 was excluded from the recombination analysis due to its sporadic placement in the Maximum-Likelihood (ML) tree topologies, which were derived for entire genomes and for each gene individually. Congruence between the complete genome alignments and gene trees was tested after evaluating the corresponding phylogenetic signal for each gene. For 40 loci, the phylogenetic signal and incongruence was significant. For those cases, we further verified the presence of at least three consecutive SNPs supporting a recombination event. Twelve loci passed this test and also correspond to those found as recombinant loci in a more extensive study of modern T. pallidum genomes (n=75) 76 . Two of the recombining genes identified in Arora et al. 22 were also confirmed in association with the ancient European genomes. Of our ancient genomes, PD28 was possibly involved in one recombinant event of the TP0136 gene as a putative recipient, along with the Nichols clade, with the TPE/TEN clade, CHS119 and colonial Mexican 133 genomes as putative donors. The same possibility was observed in the recombination event detected in the TP0179 gene, although only with TPE/TEN clade and 133 as presumptive donors. One putative recombination event concerning the TP0865 gene was identified between the TPE/TEN clade, including the CHS119 and 133 genomes and the SEA86, NE20 and SEA81-4 lineages. Finally, there is another recombination event concerning the TP0558 gene, with the TPE/TEN clade and CHS119 genome as potential donors and the SS14 clade, MexicoA, 94A and 914B from colonial Mexico, and PD28 genomes as recipients. Other assumed events involving recombination events between the modern strains and the previously published ancient genomes from the New World are listed in Supplementary   Table 5 Median estimates and 95% HPD intervals are given in Supplementary Tables 6.B and 6.C. TMRCA (time for the most recent common ancestor) calculated for the whole T. pallidum family is placed far in the prehistoric era, at least 2500 BCE. However, time-dependency of molecular rates (TDMR) may lead to underestimating deep divergence times when mutation rates are inferred from genomes collected within a relatively restricted time period 77,78 . Applying a model accounting for TMDR may be possible, but would require an inclusion of genomes sampled over wide and distinct time-periods 79,80 . The latest common ancestor of the venereal syphilis strains was placed between the 10 th and 15 th century CE. The divergence of TPE and TEN (yaws and bejel) was dated between the 9 th century BCE and the 10 th century CE, while the most recent common ancestor of TPE was placed between the 14 th and 16 th century CE. Among the TPA strains the TMRCA of the Nichols clade (13 th to 17 th century CE) was clearly older than that of the SS14 clade (18 th to 20 th century CE). Due to the inclusion of 4 historical genomes, the above divergence times are substantially older than the times reported in Arora et al (2016) 34 .
Similarly, the estimated mean molecular clock rate (median estimate 1.037 × 10 −7 s/s/y, 95% HPD 6.856 × 10 −8 -1.447 × 10 −7 s/s/y) is slower than the clock rates reported in recent studies 5,34 for either T. pallidum as a whole or for TPA strains exclusively. Nonetheless, the 95% HPD of the mean clock rate overlaps with the estimates previously reported 5 . For more information, see Supplementary Figure 5).
Molecular clock dating allows us to refine the sampling date estimates of three of the four historical genomes (Figure 4b). The posterior distributions of the sampling dates of PD28 and CHS119 place most of the weight on more recent dates, while that of 133 favours an older sampling date. This is especially pronounced for CHS119 and 133 with the 95% HPD interval not including any dates older than 1526 CE for CHS119 or younger than 1773 CE for 133. On the other hand, for SJ219, the 95% HPD of the sampling date spans nearly the entire range defined by radiocarbon dating, making it impossible to exclude a pre-Columbian sampling date (posterior probability 0.26) (Supplementary Table 6. C).

Early emergence of syphilis in Europe
In this study, four ancient Treponema genomes were retrieved from human skeletal remains dating to early modern Europe, providing unprecedented insights into the first reported epidemics of syphilis at the end of the 15 th century. Two of our ancient genomes, PD28 and SJ219, were identified as TPA strains, the causative agent of syphilis, representing the first molecularly identified specimen of T. p. pallidum from historical Europe. These genomes fall within the modern variety of the TPA strains. They form a sister clade to the modern TPA branch relatively basal to all lineages, although their precise position regarding the two major clades, Nichols and SS14, could not be recovered with high confidence. The TPA-carrying sample PD28 was placed in the early modern period by combined analyses of archaeological context and C 14 dating. Two independent radiocarbon analyses were performed on the sample SJ219

Yaws in Europe
Of our four historical genomes, two fall outside the variation of TPA. One of them, the genome CHS119 from Finland, clusters with the present TPE group (causative agent of yaws). Although the direct radiocarbon dating places the sample in the 15 th -16 th century, a full confidence of the exact date cannot be gained, due to the potential marine reservoir effect 52,53 . This sample provides the first evidence of yaws infections in historical Northern Europe, far from the tropical environment in which present-day yaws is typically found. Strikingly, the contemporaneous genome KM14-7 from the Netherlands falls basal to both the bejel-causing lineage and all known strains causing yaws, unveiling a previously unidentified lineage of T.

pallidum.
Due to the endogenous DNA coverage retrieved for the KM14-7 sample, the inclusion of this genome in the recombination and time-calibrated phylogenetic analysis was limited. Despite this, the ML tree topology with the KM14-7 in the basal position to TPE and TEN clades could be further confirmed with a closer nucleotide level inspection. The lineage shows genetic similarities to both currently existing syphilis (13 unique SNPs) and yaws (25 unique SNPs), but it represented a distinct form from both, and had apparently diverged from their common root before the cluster consisting of yaws and bejel today, that we dated to at least 1000 years BP.
Altogether, these different ancient treponemal genomes from northern and central Europe point to an early existing variety of T. pallidum in the Old World. Their existence does not refute the potential introduction of new strains of treponemes from the New World in the wake of the European expeditions, but provides credibility to an endemic origin of the 15 th century epidemics.
Whereas recombination events between the three modern day treponemal subspecies are deemed rare 76,86 , such events were observed across the subspecies in our study. These recombination events presumably happened in the past, before the geographical niches were acquired by the TPA, TPE, and TEN agents 34,45,76 . The historical cases of syphilis and yaws in an overlapping area provides a plausible opportunity for recombination. The potential recombination events observed in this study involved both the lineages present in the modern day variation and the ancient genomes PD28 and CHS119 from Europe and 94A and 914B genomes from Mexico 45 .
Overall, our observations point towards recombination events happening in the direction of syphilis-causing clades from the yaws and bejel -causing ancestors. These recombinations between the clades further support a geographically close common history of the TPA and TPE lineages, which cannot be concluded from the geographical distribution of modern day lineages. belonged to carried genetic similarities to both currently existing syphilis and yaws, yet it appears to be a distinct form from both. The pathogenesis of this agent may have resembled the endemic types of treponematoses, since the majority of its recovered SNPs are shared with the yaws and bejel lineages. It has been suggested that yaws or its ancestors represent the original form of treponematosis that appeared and spread around the world thousands of years ago, was re-introduced to the Iberian Peninsula via the Central and Western African slave trade, some 50 years before Columbus' travels, and eventually gave rise to the venereal syphilis 35,87 . It is indeed possible that the more severe venereal form in the Old World developed from a mild endemic type of disease, enhanced by genetic recombination events or in response to a competition between the various existing pathogens 31,84 . Likewise, recombination events may have occurred between the endemic European strains and novel lineages introduced at the wake of the New World contact, precipitating the epidemic events at that time. While cladality between the different subspecies clearly exists in both the past and modern day, it now seems likely that recombination has interconnected these clades in the past, and that the genetic differences do not necessarily define the treponemal pathogenesis observed in the archaeological remains.
Since diagnostic signs in skeletal remains are hard to distinguish between yaws and venereal syphilis, and so far undiscovered early treponematoses may have existed simultaneously, only further genetic studies on samples originating from all continents can properly address hypotheses about the direction of spread and order of the epidemic events. Presumably many past treponemal lineages remain unknown today and, once revealed, will prove pivotal in uncovering the relationship between treponemal strains and in dating their emergence.

Outlook and implications on sampling strategies
Retrieving treponeme's DNA from skeletal material is highly challenging, and the feasibility of the effort has been strictly questioned before the recently published colonial Mexican genomes 42,43,45 . Here, four out of the nine included individuals yielded a sufficient amount of treponemal aDNA for in-depth genomic analyses (Supplementary Table 1. a). While the previously published Mexican genomes were obtained from neonates and infants only 45 , we were able to recover T. pallidum DNA also from subadult (KM14-7, SJ219) and adult (CHS119) individuals, including one with only a tentative diagnosis of the disease (SJ219). Using bone tissue directly involved with ongoing inflammation or possessing an ample blood flow, such as an active lesion (KM14-7) and dental pulp (CHS119), probably facilitated the successful sampling, although in the case of the neonate (PD28), even a petrous bone proved highly successful, yielding an entire genome of historical TPA strain up to 136-fold. This first retrieval of pathogen DNA from a petrous bone was likely due to a systemic condition related to an early congenital infection with an extremely high bacterial load 88 .
Notably, one of the colonial Mexican infants from a previous study 45 likely suffered from yaws infection, as well as one of the Finnish individuals (CHS119) in this study. These cases lend credibility to the notion that the different treponemal agents cause essentially similar skeletal alterations, and are highly adaptable to environmental circumstances 33,89 . We therefore propose that the geographical separation criteria between the treponemal diseases should be used with caution, especially when it comes to earlier forms of treponematoses and their diagnostic manifestations in the archaeological record. Overall, the reconstruction of novel treponemal genomes from these various ancient sources further proves the feasibility of retrieving treponemal aDNA from skeletal material and raises the hope of achieving progress with the prevailing cases of advanced and latent infections. Improving methodologies targeted for samples with low bacterial load and genomic coverage may soon aid in recovering positive aDNA results from putative cases of treponematoses from early-to pre-historic contexts, thereby illuminating the most persistent quandaries of the field, such as the ultimate origin of venereal syphilis.

Sample processing
Documentation and UV-irradiation of the bone material for decontamination, as well as laboratory procedures for sampling, DNA extraction, library preparation, and library indexing were all conducted in facilities dedicated to ancient DNA work at the University of Tübingen, with necessary precautions taken including protective clothing and minimum contaminationrisk working methods.

Sampling and DNA extraction
Before extracting DNA from the samples, all surfaces were irradiated with ultraviolet light (UVirradiated) to minimize potential contamination from modern DNA. DNA extraction was performed according to a well-established extraction protocol for ancient DNA 56 . For DNA extraction, 30-120 mg of bone powder was used per sample. The bone powder was obtained by drilling bone tissue using a dental drill and dental drill bits. For different individuals, variable amounts of extracts were produced. During each extraction one positive control (ancient cave bear bone powder sample) and one negative control were included for every ten samples.

Library preparation
In this study, double-stranded (ds) and single-stranded (ss) DNA-libraries were produced. All DNA library preparation procedures that were applied in this study are described in the following paragraphs. The whole-genome capture was performed as described above using the same array enrichment strategy. In addition to the blocking oligonucleotides for double-stranded libraries, specific blocking oligonucleotides 4, 6, 8, and 11 57 were used for single-stranded libraries. The wholegenome enrichment for treponemal DNA was produced in three rounds of array capture and a maximum of two libraries from different individuals were pooled for each array. Enrichment pools were diluted to 10 nMol/L for sequencing.

In-solution capture for KM14-7
An additional in-solution capture procedure was performed for sample KM14-7 to obtain higher coverage. Genome-wide enrichment of single-stranded libraries was performed through custom target enrichment kits (Arbor Bioscience). RNA baits with a length of 60 nucleotides and a 4bp tiling density were designed based on three reference genomes (Nichols, GenBank: CP004010.

Read Processing, Mapping and Variant Calling.
The capture data from the sequencing runs were merged sample-wise and data processing was performed using EAGER version 1.92.37 (Efficient Ancient GEnome Reconstruction) 62

Genomic dataset and multisequence alignment
We constituted a genomic dataset representative of the extant diversity of T. pallidum and including the three previously published ancient genomes from Mexico (Supplementary Table   2). Raw sequencing data was gathered for strains that were high-throughput sequenced. The previously exposed procedure was then applied to obtain vcf files for each genome. We then used MultiVCFAnalyzer 97 to produce alignments with the following parameters: bases were called if covered by at least two reads with a mapping quality of 30 and a consensus of at least 90% (with the one-read-exception rule implemented in MultiVCFAnalyzer). The resulting alignment was realigned with already assembled genomes (isolates BosniaA, CDC2, Chicago, Fribourg, Gauthier, MexicoA, PT_SIF1002, SS14, and SEA81_4_1), using AliView version 1.21 98 .

SNP quality assessment
Calling SNPs from ancient bacterial DNA data is challenging due to DNA damage, potential environmental contamination and low genome coverage which may lead to the recovery artifactual genetic variation in reconstructed DNA sequences. This can interfere with all subsequent analyses and, in particular, lead to artificially long branches in phylogenetic trees and impede time-calibrated analyses. Artifactual SNPs resulting from environmental contamination shared between several samples may also lead to biases in inferred phylogenetic tree topologies or generate misleading evidence of genetic recombination.
In order to filter artifactual SNPs, we used the SNPEvaluation tool 99 as proposed by Keller and colleagues 100 . More specifically, for all newly generated ancient genomes, as well as for all previously published genomes for which the mean sequencing coverage was below 20, we reviewed any unique SNP and any SNP shared by less than 6 genomes that had at least one of the following features in a 50-bp window around the SNP: (i) some positions were not covered, (ii) the reference was supported by at least one read or (iii) the coverage changed depending on the mapping stringency (i.e. we compared the initial alignments with "low-stringency" alignments produced with bwa parameter n=0.01). Any SNP supported by less than four reads was excluded (i.e. N was called at that position) if at least one read supported the reference or if the SNP was "damage-like" (i.e resulting from a C to T or G to A substitution). Furthermore, the specificity of the reads supporting the SNPs was verified by mapping them against the full GenBank database with BLAST 101 . Any SNP supported by reads mapping equally or better to other organisms than T. pallidum was excluded. Since many of these false SNPs likely arising from non-specific mapping were located in tRNAs, we excluded all tRNAs from the alignments.
The list of excluded positions was written in a gff file (Supplementary Data 1) and removed from the full alignment generated by MultiVCFAnalyzer still contains excluded SNPs, which we removed using an in-house bash script.
Phylogenetic and recombination analysis.

KM14-7 SNP analysis
In the phylogenetic tree, KM14-7 was placed basal to TPE and TEN clades. Although bootstrap support was very high, we decided to further evaluate the authenticity of this remarkable position because this genome contained a large fraction of missing data. We investigated genomic positions for which the ancestral variant of TPE/TEN and TPA was likely different. Our rationale was that if KM14-7's position on the branch connecting TPE/TEN and TPA was authentic, the genome should contain a significant number of both TPE/TEN and TPA-like variants. In practice, we looked at positions (i) resolved in KM14-7, (ii) for which the majority variant was differing between TPE/TEN and TPA clades, but (iii) shared by more than 90% of the (modern) genomes within each clade. We then looked at the proportion of TPE/TEN and TPA-like variants in KM14-7 and compared that with all other genomes. Because KM14-7 was not included in the recombination analysis, we did not trim the recombining region for this analysis in order to avoid a bias. Finally, we also produced ML trees based on positions resolved in KM14-7 (141 SNPs). The resulting trees corresponded to the previously observed topology, with the KM14-7 recovered basal to TPE/TEN ( Supplementary Figures 2 and 3). The strength of the molecular clock signal in the dataset was investigated by regressing the root-to-tip genetic distance (measured in substitutions/site) of genomes against their sampling dates 106,107 (Figure 3.A). Root-to-tip genetic distances were calculated on a midpoint-rooted ML tree estimated in RAxML v. 8.2.11 108 using the same procedure described above (Figure 3.B).
Sampling dates of historical sequences were fixed to the middle of the date range defined by radiocarbon dating (Supplementary Table 6.A). To assess the significance of the correlation we permuted the sampling dates across genomes and used the Pearson correlation coefficient as a test statistic 109 ,106 (Figure 3.C). We performed 1,000 replicates and calculated the p-value as the proportion of replicates with a correlation coefficient greater than or equal to the truth (using the unpermuted sampling dates).
Divergence times and substitution rates were estimated using BEAST v. 2.6 110  To confirm clock-like evolution we performed a Bayesian date randomization test 107,109,115 (DRT) by permuting sampling dates across genomes and repeating the analysis. We performed 50 replicates and assessed significance by comparing the molecular clock rate estimates of the replicates to those estimated under the true sampling dates. As in the permutation test for the root-to-tip regression analysis above, we fixed the sampling dates of historical genomes to the middle of the date range defined by radiocarbon dating.
MCMC chains were run for 50 million steps and parameters and trees sampled every 5000 steps. Convergence was assessed in Tracer 116 after discarding 30% of the chains as burn-in and Treeannotator was used to compute MCC trees of the resulting posterior tree distributions.
Results were visualized in R using ggplots2 117,118 , ggtree 119 and custom scripts.

Virulence analysis
Virulence factors represented by the four ancient European genomes were assessed through a gene presence/absence analysis as described by Valtueña and colleagues 81 . A set of 60 sequences previously associated with putative virulence functions 45,82,83,85 were examined based on the annotated Nichols reference genome (NC_021490.2) and without preliminary quality filtering. The coverage over each gene was calculated using genomeCoverageBed in BEDTools version 2.250 120 . The heatmap visualization of the gene-by-gene coverage of reads was created using ggplot2 package in R 117,118 ).