A phylogeography of the second plague pandemic revealed through the analysis of historical Y. pestis genomes

37 The second plague pandemic (14 18 century AD), caused by the bacterium Yersinia pestis, is 38 infamous for its initial wave, the Black Death (1346-1353 AD), and its repeated scourges in 39 Europe and the vicinity until the Early Modern Era. Here, we report 32 ancient Y. pestis 40 genomes spanning the 14 to 17 century AD through the analysis of human remains from nine 41 European archaeological sites. Our data support an initial entry of the bacterium from Eastern 42 Europe and the absence of genetic diversity during the Black Death as well as low diversity 43 during local outbreaks thereafter. Moreover, analysis of post-Black Death genomes shows the 44 diversification of a Y. pestis lineage into multiple genetically distinct clades that may have given 45 rise to more than one disease reservoir in, or close to, Europe. Finally, we show the loss of a 46 genomic region that includes virulence-associated genes in strains associated with late stages of 47 the second plague pandemic (17 18 century AD). This deletion could not be detected in 48 extant strains within our modern dataset, though it was identified in a today-extinct lineage 49 associated with the first plague pandemic (6 8 century AD), suggesting convergent evolution 50 during both pandemic events. 51 52 Introduction 53 One of the most devastating pandemics of human history was the second plague pandemic, 54 which began with the infamous Black Death (BD, 1346-1353 AD) and subsequently continued 55 with recurrent outbreaks in Europe until the 18 century AD. Its causative agent, Yersinia 56 pestis, is a highly virulent bacterium that causes bubonic, pneumonic, and septicaemic plague 57 and today is maintained among wild rodent populations in Eastern Europe, Asia, Africa and the 58 Americas. The source of the second pandemic, as well as the route that plague followed prior 59 to its entry into Europe, remains hypothetical. Genetic characterisation of BD genomes analysed 60 alongside modern diversity has associated its initiation with a star-like diversification of Y. pestis 61 lineages, and has led to the proposal of an East Asian origin, as modern strains that are directly 62 ancestral to this event have been identified in China near historical trade routes. However, the 63 fact that until today the majority of Y. pestis strains have been isolated from China and to a 64 lesser extent from Central Asia−which is also known to harbour a variety of disease 65 foci−creates a sampling bias in the data and questions the accuracy of such conclusions. The 66 view of an East Asian origin for the BD is also supported by certain historical interpretations 67 suggesting that the bacterium travelled westward either via the Mongol army or via Silk Road 68 traders during the early 14 century. Such claims, however, can not yet be verified given the 69 . CC-BY 4.0 International license It is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint . http://dx.doi.org/10.1101/481242 doi: bioRxiv preprint first posted online Nov. 30, 2018;


Introduction
One of the most devastating pandemics of human history was the second plague pandemic, which began with the infamous Black Death (BD, 1346-1353 AD) and subsequently continued with recurrent outbreaks in Europe until the 18 th century AD 1,2 .Its causative agent, Yersinia pestis 3 , is a highly virulent bacterium that causes bubonic, pneumonic, and septicaemic plague and today is maintained among wild rodent populations in Eastern Europe, Asia, Africa and the Americas 4-6 .The source of the second pandemic, as well as the route that plague followed prior to its entry into Europe, remains hypothetical.Genetic characterisation of BD genomes analysed alongside modern diversity has associated its initiation with a star-like diversification of Y. pestis lineages 7 , and has led to the proposal of an East Asian origin, as modern strains that are directly ancestral to this event have been identified in China near historical trade routes 7,8 .However, the fact that until today the majority of Y. pestis strains have been isolated from China 7 and to a lesser extent from Central Asia−which is also known to harbour a variety of disease foci 9,10 −creates a sampling bias in the data and questions the accuracy of such conclusions.The view of an East Asian origin for the BD is also supported by certain historical interpretations suggesting that the bacterium travelled westward either via the Mongol army or via Silk Road traders during the early 14 th century 11 .Such claims, however, can not yet be verified given the scarcity of documentary sources from China describing precise disease symptoms 12 , as well as the lack of molecular evidence from Central and East Asia dating to the early 14 th century.
The first clearly documented outbreaks of the second pandemic occurred in 1346 in the Lower Volga and Black Sea regions 1,13 .From there, the bacterium dispersed through the rest of Europe within the next five years, causing reductions in the human population estimated to be as high as 60% 1 .At present, the only ancient genetic evidence from Eastern Europe is from the city of Bolgar in the Middle Volga region and stems from the time period shortly after the BD (1362-1400 AD) 14 , during which Y. pestis seems to have expanded eastward to later become a precursor of the third plague pandemic that spread worldwide from China during the 19 th century [14][15][16] .Thus, it has been difficult to genetically assess whether this region also contributed to the disease's introduction to Europe, since supporting genomic evidence from the earliest phases of the second pandemic have thus far been elusive.
After the BD, plague was a common scourge in Europe as evidenced by the thousands of recorded epidemics it supposedly caused between 1353 and the late 18 th century 2,17 .Whether these were caused by multiple introductions of the disease from an Asian source or rather by its local persistence in Europe is an on-going subject of scientific debate 14,[18][19][20] .While data from climatic proxies are interpreted to support the former hypothesis 18 , genetic evidence currently favours the latter 14,19 .Analysis of Y. pestis genomes from 16 th -century Ellwangen, Germany 14 and the Great Plague of Marseille in France (L'Observance, 1720-22 AD) 20 revealed the existence of a previously unidentified Y. pestis lineage that was responsible for both of these post-Black Death (post-BD) outbreaks.This lineage descends from the strain associated with the BD thus far identified in both London and Barcelona, and, therefore, likely represents plagues' legacy and persistence in or around Europe after 1353.The limited number of published ancient Y. pestis genomes 8,14,20 , however, challenges our ability to construct hypotheses of whether this lineage was solely responsible for the numerous post-BD outbreaks in Europe and whether it derived from a single or multiple reservoirs.
Here, we take steps to overcome these limitations by greatly expanding the number of available Y. pestis genomes from multiple time periods and locations, in order to gain additional knowledge on the early stages of the second pandemic, and to study the genetic diversity of the bacterium present in Europe after the BD.We analyse material recovered from nine archaeological sites located in Russia, Germany, Switzerland, England and France from which we reconstruct 32 Y. pestis genomes.Our data supports the entrance of Y. pestis into Europe through the East during the initial wave of the pandemic and consistently demonstrates an absence of genetic diversity in the bacterium during the BD.In addition, our genomic analysis of post-BD outbreaks from Central and Western Europe suggests the local diversification of a single plague lineage between the 14 th and 18 th centuries that may have resided in more than one reservoir.

Y. pestis qPCR screening
Human archaeological material was screened for the presence of the Y. pestis-specific gene, pla, located on the pPCP1 plasmid, using a previously described qPCR approach 21 .A total of 181 teeth were retrieved and tested from archaeological sites in the cities of Nabburg (n=12), Manching-Pichl 19 (n=28), Starnberg (n=3), Landsberg am Lech (n=10) and Brandenburg an der Havel 19 (n=3) in Germany; Stans (n=32) in Switzerland, London (n=40) in England, Toulouse (n=39) in France and the city of Laishevo (n=10) in the Volga region of Russia (Figure 1 and Supplementary Information).Extracts from 41 teeth across all sites tested positive for pla (Supplementary table 1).All extraction negative controls were free of amplification products.Amplification products from putatively positive individuals were not sequenced, as the presence of Y. pestis was subsequently assessed through whole-genome capture and high-throughput Illumina sequencing.

Y. pestis capture and whole genome reconstruction
We prepared UDG-treated libraries 22,23 from putatively positive extracts and used a Y. pestis insolution capture approach 24 combined with high-throughput sequencing for the retrieval of 1,299,105 to 79,055,317 raw reads per sequenced library.All data were mapped against the Y. pestis CO92 reference genome (NC_003143.1) 3 .This resulted in 86,278 to 3,822,030 unique mapping reads yielding 1.1 to 80.1-fold coverage across 32 individuals that span the time transect between the 14 th and 17 th century in Europe (Supplementary table 2).More specifically, we could retrieve three Y. pestis genomes from Nabburg, two from Manching-Pichl 19 , one from Starnberg, one from Landsberg am Lech, two from Brandenburg an der Havel 19 (all from Germany), 15 from Stans (Switzerland), five from London (England), one from Toulouse (France) and two from Laishevo (Russia).Of those, 23 isolates showed at least 50% of the reference genome covered at 5-fold (Supplementary table 2), which allowed for their confident inclusion in phylogenetic analysis.In addition, we nearly tripled the genomic coverage of the published "549_O" isolate from Ellwangen, Germany, (now reaching 14.1-fold) which was previously processed by array-based capture (using immobilized probes) 14 (Supplementary table 2).

Y. pestis phylogenetic reconstruction
To infer genetic relationships between the new and previously published Y. pestis isolates we constructed phylogenies using the maximum likelihood method, allowing for up to 3% missing data (97% partial deletion) to accommodate lower coverage genomes.As a reference dataset we used 233 modern isolates 7,9,10,25,26 , which represent most of the published Y. pestis genetic diversity.In addition, we included all previously published second pandemic isolates (n=12) 8,14,20 , a 6 th -century AD isolate from Germany 27 , a 2 nd -to 3 rd -century AD isolate from the Tian Shan mountains in Kyrgyzstan 28 , as well as three Bronze Age isolates from the Altai and Volga regions 29,30 (Supplementary fig.1).
All newly reconstructed genomes appear on Branch 1 and are closely related to the previously published second pandemic isolates from Europe (Figure 2), thus confirming their authenticity.
In addition, they seem to represent a diverse group of strains that were present across Europe between the 14 th and 18 th century AD (Supplementary data 1).A number of genomes were excluded from the phylogeny (NAB005, BRA003, STN011 and STN004) as they showed an excess amount of heterozygous SNP calls in comparison to other contemporaneous isolates from the same archaeological contexts (Supplementary fig.2).Such calls potentially arise from enrichment of non-target DNA stemming from closely related organisms, an effect frequently identified in ancient metagenomic datasets 27,31,32 that can result in false branch elongation (Supplementary fig. 3) and misinterpretation of phylogeographic inferences.
Our phylogenetic reconstruction shows that the LAI009 isolate from Laishevo is ancestral to the BD isolates from Central, Western and Southern Europe, as well as to the previously published late 14 th -century isolates from London (6330) 8 and Bolgar City 14 (Figure 2).This genome possesses only one derived SNP distinguishing it from the N07 polytomy that gives rise to Branches 1 to 4 (Figure 2; Supplementary data 1) 7 .Since all other second pandemic genomes share an additional derived SNP on Branch 1, we interpret LAI009 as the most ancestral thus far identified form of the strain that entered Europe during the initial wave of the second pandemic.
Regarding the Central and Western European genomes, NAB003 from Nabburg does not show differences compared to previously published BD genomes from London and Barcelona 8,14 , whereas TRP002 from Toulouse, which dates to 1347-1350 based on archaeological evidence, is seemingly distinct from other isolates and forms its own unique branch (Figure 2; Supplementary data 1).Although this isolate shows indications of high heterozygosity (Supplemental fig.2) and suffers from comparatively lower coverage, we nevertheless considered it further in our analysis, as it is the sole genome available from this site.To investigate whether its branch arose from true unique variants or contaminant reads that resulted in erroneous SNP calling, the genome alignment was evaluated in greater detail.After visual inspection, all eight SNPs unique to TRP002 appear in regions of the genome where reads from diverse sources seem to be mapping (Supplementary fig.4).In addition, these variants exhibit all previously defined characteristics of problematic SNP assignments in low coverage genomes 27 and, therefore, were considered to be of exogenous origin.We, hence, conclude that apart from LAI009 all other reconstructed genomes that date to the Black Death period have identical genotypes.
We find a number of genomes grouping with the previously described 'post-BD' lineage together with published strains from Ellwangen (ELW098/549_O), Germany (1486-1630) 14 , and Marseille, France (1720-1722) 20 , which are descended from the European BD isolates (Figure 2; Supplementary data 1).Here, we identify the earliest evidence of this lineage in a 14 th -century isolate from Manching-Pichl (MAN) 19 (see Supplementary Information), which is followed by the more derived 15 th -to 17 th -century isolates from Landsberg (LBG), Stans (STN) and Starnberg (STA), as well as the 17 th -century Brandenburg an der Havel (BRA) 19 and London (BED), all of which provide further evidence for plague's continuous presence in Europe after the BD.Of note, we retrieved eight nearly identical genomes from Stans (STN, maximum one SNP difference in two of eight genomes, mean SNP distance d=0), and together with the four identical genomes from 17 th -century London (BED) (d=0), the five previously published nearly identical genomes from Marseille (OBS, maximum one SNP difference in one of five genomes, d=0), and the seven identical BD isolates from various regions in Europe (d=0), our results demonstrate low genetic diversity of the bacterium within local outbreaks and/or major epidemics of the second pandemic.In addition, we find that this "post-BD lineage" gave rise to (at least) two distinct clades within Europe, with the Ellwangen isolate being positioned closest to an apparent population split (Figure 2).From this divergence, one clade gives rise to the strains associated with outbreaks in Germany and Switzerland, and the second encompasses strains from 17 th -century London (BED) and 18 th -century Marseille (OBS).Notably, these two clades show dissimilar rates of substitution accumulation.For example, the mean SNP distance between the Ellwangen genome (ELW098/549_O) and the London (BED) genomes (d=45) is double than that observed between Ellwangen and Brandenburg (BRA, d=22), despite an assumption of them being contemporaneous (early 17 th century AD) based on their archaeological dating (Figure 2; Supplementary Information; Supplementary table 1).

Analysis of substitution rate variation in Y. pestis
We used the Bayesian framework BEAST v1.8 in order to make an assessment of substitution rate variations across the genealogy of Branch 1 (n=76), which contains all second pandemic Y. pestis genomes, using available calibration points in our modern and ancient datasets (Supplementary data 2).Previous studies have demonstrated that overdispersion among Y. pestis branch lengths is unlikely a result of natural selection, and have rather suggested a link between rate acceleration and geographic expansion of certain lineages during epidemic spread 7,16 .Our analysis suggests an over 40-fold difference between the fastest and slowest mutation rates observed in Branch 1 (Figure 3).In particular, we observe the fastest rates in three internal branches (Figure 3).The first spans the genetic distance between the strains from Ellwangen (549_O) and London (BED), and supports the conflicting branch lengths of BED and BRA strains described earlier (Figure 3 and Supplementary data 3).The second is the branch leading to the 1.ANT strains isolated from Africa (Congo and Uganda) (Figure 3 and Supplementary data 4).The broad history of 1.ANT and the time period associated with its establishment in Africa are unknown, though an introduction from Eurasia has been hypothesized 14 .The third, which displays the fastest rate within the entire Branch 1, is the branch leading to 1.ORI isolates (Figure 3 and Supplementary data 5), which is associated with the global spread of Y. pestis via maritime routes during the third plague pandemic (1894 -1950s) 15,16 .Our results, therefore, support the idea of faster substitution rates during epidemic spread, here particularly noticeable for lineages known to have expanded over wide geographic areas.

Virulence factor analysis
To investigate the virulence profiles of all newly reconstructed genomes, we analysed the presence or absence of virulence-associated genes located on the Y. pestis chromosome (Figure 4a) and plasmids (Supplementary fig.5) 33,34 , in comparison to representatives of published ancient and modern strains.We find that the genetic profiles of some of the previously characterised historical strains are influenced by the capture design used for their retrieval.Specifically, the second pandemic genomes 'Bolgar 2370' and 'Barcelona 3031' 14 , and the first pandemic genome 'Altenerding 2148' 27 seem to lack coverage in certain Y. pestis-specific regions, since Y. pseudotuberculosis was used as a probe-design template for enrichment of these genomes (Figure 4a).Regarding the newly reconstructed strains, we find that most possess all analysed virulence determinants with the exception of the New Churchyard (BED) and Marseille (OBS) strains that lack the magnesium transporter genes mgtB and mgtC (Figure 4).

Magnesium transporters are considered vital for bacterial intracellular survival under low Mg 2+
conditions 35 , such as those encountered within macrophage phagosomes.Specifically for Y. pestis, mgtB disruption has been associated with a decreased ability for macrophage invasion resulting in its attenuated virulence in mice 36 .Both mgtB and mgtC are present in all 233 modern Y. pestis genomes used in our comparative dataset.We explored these gene deletions in greater detail using BWA-MEM and identified them as part of a 49 kb missing region within the BED and OBS genomes (1,879,467 -1,928,869 on CO92) (Figure 4b) flanked by an IS100 element immediately following its downstream end, which is consistent with previously characterised disruptions or losses of Y. pestis genomic regions via insertion elements 37 .Apart from mgtB and mgtC, this region encompasses a set of 34 additional genes that code for both characterised and hypothetical proteins, most of which seem to be associated with phenotypic characteristics that appear inactivated in Y. pestis such as motility and chemotaxis as well as few genes associated with metabolism, structure synthesis and environmental stress response (Supplementary table 3).
In addition, the clade encompassing this deletion is associated with some of the late outbreaks of the second plague pandemic, i.e. during the 17 th century in London, England, (BED) (see Supplementary Information) and during the 18 th -century Plague of Marseille, in France, (OBS -1720-1722) 20 , which was one of the last major epidemics that occurred in continental Europe 38 .
Intriguingly, a nearly identical genomic deletion (45 kb), also including the mgtB and mgtC virulence-associated genes, was recently identified in an ancient isolate from France (LVC) 39 sequenced from a victim of the first plague pandemic (6 th -8 th centuries AD) 39 .This genome is described elsewhere and dates within a wide temporal interval (550-650 AD), though based on the existing genomic data it seems to be the most derived isolate on the first pandemic lineage sequenced to date 39 .Therefore, the identified deletion, which includes potential virulence factors, seems to be specific to derived isolates of two lineages within the Y. pestis phylogeny that seem to be today extinct.

Discussion
A series of studies have sufficiently demonstrated the preservation of Y. pestis in ancient human remains from a wide temporal transect 8,14,20,24,27,29,40 .In this study, this has given a great opportunity for the extensive sampling of multiple European epidemic burials from the time period between the 14 th and 17 th century, in order to gain a more complete picture of Y. pestis' genetic history during the second plague pandemic.As a result, we nearly quadruple the amount of genomic data available from that time period (Figure 1 and Supplementary tables 1, 2) and their integration in existing datasets reveals key aspects regarding the initiation and progression of the second plague pandemic in Europe.
Based on historical sources alone, it has been difficult to determine the time at which Y. pestis first reached different parts of western Russia 13 .A commonly accepted view dates its arrival in the southwest, particularly in cities of Astrakhan and Sarai, in 1346 1,41 with subsequent spread into southern Europe from the Crimean peninsula.On the other hand, the dispersal of plague into northwestern Russia (i.e. in the cities of Pskov and Novgorod 13,41 ) may have followed an alternative route, occurring at the end of the BD, between 1351-1353, via the Baltic Sea 1,13,41 .Such a notion of plague's expansion from northern Europe towards the East is also supported by published ancient genomic data from the late 14 th -century Middle Volga region of Russia 14 , though other employable scenarios may come to light with incorporation of additional genomic and historical data.Importantly, through analysis of our new strain from Laishevo (LAI009), which is phylogenetically ancestral to all second pandemic strains sequenced to date (Figure 2), we provide evidence for the bacterium's presence in the same region, ~2,000 km northeast of the Crimean peninsula, prior to reaching Southern Europe in 1347 -1348 (here represented by strains from Barcelona and Toulouse) 1 .These results lead us to propose that the N07 derived SNP previously termed "p1" 14 (Figure 2, Supplementary fig. 1) that is common to all other second pandemic strains, was likely acquired within Europe during the onset of the BD.In addition, given the proximity of the LAI009 genome to the N07 node often associated with the initiation of the BD (Figure 2, Supplementary fig. 1) 7 and the apparent East Asian sampling bias of modern isolates 7 , additional data will be necessary for an accurate re-evaluation of the geographic origin of Branch 1.
Such insights of low genomic diversity during the initial pandemic wave (BD) become valuable when attempting to reconstruct the spread of plague after 1353.Previous research based on climatic proxies 18 and PCR data 42 have proposed multiple introductory waves of Y. pestis into Europe as the main source for the post-BD outbreaks recorded until the 18 th century.Here, using previously published 8,14,20 and new whole genome data from 15 archaeological sites, we identify that all genomes associated with post-BD outbreaks in Europe derived from a single ancestral strain that was present in Southern, Central and Western Europe during the BD.We, therefore, interpret our data as supporting a single entry of Y. pestis during the BD, though additional interpretations may be revealed through the discovery of un-sampled diversity in Western Eurasia.Subsequent to its entry, we observe the formation of two sister lineages (Figure 2).The first lineage is responsible for the bacterium's possible eastward expansion after the BD.It contains strains from late-14 th -century London (6330) 8 and Bolgar City (2370) 14 , as well as extant strains from Africa (1.ANT) 43 , and, most importantly, a worldwide set of isolates associated with the third pandemic (1.ORI, 19 th -20 th centuries) 7,15,16 (Figure 2).The second, here termed the "post-BD lineage" is characterised by a profound genomic diversity identified within Europe that seems to have been restricted to the second pandemic, as no modern descendants have been identified for this lineage to date.It is represented by historical genomes isolated from 14 th -to 18 th -century Germany (MAN, STA, ELW, LBG, and BRA), Switzerland (STN), England (BED) and France (OBS) (Figure 2), suggesting that it persisted in Europe and caused infections over a wide geographic range.The fact that this lineage has no identified modern descendants is likely related to the disappearance of plague from Europe in the 18 th century, possibly due to extinction of local reservoirs, as previously suggested 14 .
We find that the "post-BD lineage" gave rise to (at least) two distinct clades within Europe that separate the strains identified in Central Europe during the 15 th -17 th centuries, and those identified in 17 th -to 18 th -century England and France.Their distinction is corroborated not only by their genetic and geographic separation (Figure 2), but also by potential differences in their virulence profiles (Figure 4a) and substitution rates (Figure 3).The clade that exhibits a slower substitution rate is mainly represented by temporally and genetically closely related isolates from Germany and Switzerland (Figure 2), which could indicate endemic circulation of the bacterium in that region.Such an observation may be compatible with the hypothesis of an Alpine rodent reservoir facilitating the spread of plague in Central Europe after the BD 44 , although a possible sampling bias should be noted since the majority of our data points derive primarily from this region.On the other hand, the clade that exhibits a faster substitution rate (Figure 3) seems to have had a wider geographic distribution.Given that both Marseille and London were among the main maritime trade centres in Europe during that time it is plausible that introduction of the disease in these areas occurred via ships 45 , although sources favouring local epidemic eruptions also exist 46 .Published research has demonstrated that transmission of Y. pestis via steamships during the 19 th -century third pandemic played a significant role in initial introduction of the bacterium to several regions worldwide, such as in Madagascar where it persists until today 15,16,47,48 .Although the possibility of maritime introductions of plague into London and Marseille during the second pandemic vastly expand the location of its potential geographic source(s), the phylogenetic positioning of the BED and OBS genomes within the "post-BD lineage" and in relation to other second pandemic isolates suggests it likely arose within Europe or the vicinity.
We identified a 49 kb deletion within both BED and OBS genomes (Figure 4b), which results in the loss of two virulence-associated genes, mgtB and mgtC (Figure 4a).This deletion could not be identified in any other second pandemic or modern strain within our dataset.The inferred virulence importance of mgtB and mgtC genes is associated with intracellular survival of Y. pestis within macrophages 36,49 .Their co-expression has been shown to affect the virulence exerted by other pathogenic enterobacteria under laboratory conditions 50,51 and both genes have been proposed as potential drug targets 36,52 .Moreover, the function of mgtB was shown to be temperature-dependent, being active at 37°C but not at 20°C 53 , which suggests that its loss may primarily affect warm-blooded hosts.Intriguingly, a 45 kb deletion within the same region was identified in a genome likely associated with a late outbreak of the first plague pandemic (6 th -8 th century AD) 39 , which sets it as a candidate for convergent evolution and raises a question regarding its functional importance.Given that all the genomes that contain this deletion were isolated from historical plague victims, it is plausible it may not have affected the bacterium's virulence, particularly since genome decay is a well known mode of evolution in Y. pestis 54,55 .
Nevertheless, since both lineages that harbour it are thought to be today extinct, a full functional characterisation will be of importance to evaluate its potential significance in survival across different mammalian hosts and arthropod vectors, which may have affected the bacterium's maintenance in Europe during the first and second pandemics.
The second plague pandemic has arguably caused the highest levels of mortality of the three recorded plague pandemics 1,56 .It serves as a classic historical example of rapid infectious disease emergence in Europe, long-term local persistence and eventual extinction for reasons that are currently not understood.We have shown that extensive sampling of ancient Y. pestis genomic data can provide direct molecular evidence on the genealogical relationships of strains present in Europe during that time.In addition, we provide relevant information regarding the initiation and progression of the second pandemic and suggest that a single source reservoir may be insufficient to explain the breadth of epidemics and Y. pestis' genetic diversity in Europe during the 400-year course of the pandemic.Although certain key regions in Europe remain under-sampled for ancient DNA, namely the Eastern Mediterranean, Scandinavia and the Baltics, vast amounts of high-quality genomic data are becoming increasingly available.Their integration into disease modelling efforts, which consider vector transmission dynamics 57,58 , climatic 18,59,60 and epidemiological data 61 , as well as a critical re-evaluation of historical records 62 will become increasingly important for better understanding the second plague pandemic.
was performed where libraries were split into multiple PCR reactions based on their initial quantification 65 , in order to ensure maximal amplification efficiency.Every reaction was assigned a maximum input of 2×10 10 DNA molecules.A unique index combination (index primer containing a unique 8 bp identifier) was assigned to every library, and a 10-cycle amplification reaction was used to attach index combinations to DNA library molecules using Pfu Turbo Cx Hotstart DNA Polymerase (Agilent).PCR products were purified using the MinElute DNA purification kit (Qiagen), and eluted in TET (10mM Tris-HCl, 1mM EDTA pH 8.0, 0.05% Tween20).After indexing all libraries were amplified using Herculase II Fusion DNA Polymerase (Agilent) to a concentration of 200-300 ng/µl, in order to achieve 1-2 µg of DNA in a total of 7 µl.Products were again purified using the MinElute DNA purification kit (Qiagen), and eluted in TET (10mM Tris-HCl, 1 mM EDTA pH 8.0, 0.05% Tween20).Insolution Y. pestis capture was then performed as described previously 24 , where the following genomes were used as templates for probe design: CO92 chromosome (NC_003143.1),CO92 plasmid pMT1 (NC_003134.1),CO92 plasmid pCD1 (NC_003131.1),KIM 10 chromosome (NC_004088.1),Pestoides F chromosome (NC_009381.1)and Y. pseudotuberculosis IP 32953 chromosome (NC_006155.1).DNA captures were carried out on 96-well plates.Each sample was either captured in its individual well, or pooled with maximum one more sample from the same site.Blanks with non-overlapping index combinations were captured together.

SNP calling and phylogenetic analysis
SNP calling was performed using the UnifiedGenotyper of the Genome Analysis Toolkit (GATK) 69 .Our newly reconstructed genomes were analysed alongside previously published Y. pestis genomes, which included a modern-day dataset of 233 genomes 7,9,10,25,26,43 , one Justinianic strain (Altenerding) 27 , three Bronze Age strains 29 , 12 previously published historical genomes from the second plague pandemic and a Y. pseudotuberculosis strain (IP32953) which was used as outgroup.A vfc file was produced for every genome using the 'EMIT_ALL_SITES' option, which generated a call for every position present in the reference genome.Furthermore, we used the custom java tool MultiVCFAnalyzer v0.85 32 (https://github.com/alexherbig/MultiVCFAnalyzer)to produce a SNP table of variant positions across all genomes analysed, using the following parameters: for homozygous alleles, a SNP would be called when covered at least 3-fold with a minimum genotyping quality of 30 and for heterozygous alleles a variant would be called when 90% of reads would support it.In cases where none of the parameters would be met an "N" would be inserted in the respective genomic position.In addition, we omitted previously defined noncore regions, as well as annotated repetitive elements, homoplasies, tRNAs, rRNAs, and tmRNAs from our SNP analysis 7,16 .In the present dataset, a total of 7,251 variant positions were identified.Subsequently, the annotation as well as the effect of each SNP was determined through the program SnpEff 70 .
For the phylogenetic analysis, we used a SNP alignment produced by MultiVCFAnalyzer v0.85 to construct phylogenetic trees using the maximum likelihood (ML) and maximum parsimony (MP) methods.Up to 3% missing data was included in the analysis (97% partial deletion), resulting in a total amount of 6,353 SNPs used for phylogenetic reconstruction.The MP phylogeny was produced in MEGA7 71 in order to make a first assessment of genome topologies.
The ML phylogenies were reconstructed with the program RAxML (version 8.2.9) 72 using the Generalised Time Reversible (GTR) 73 model with four gamma rate categories and 1000 bootstrap replicates to assess tree topology support.

Heterozygosity estimates
Heterozygous variant calls in all genomes were investigated given the disparity of branch lengths observed in isolates from the same archaeological sites (see Supplementary figure 2).
Our approach takes into account the "haploid" nature of prokaryotic genomes, suggesting that "heterozygous" SNPs could either arise as a result of mixed infections or from erroneous mapping of DNA reads that belong to closely related bacterial contaminants.We subsampled all newly reconstructed genomes to ~5-fold coverage, and performed SNP calling using GATK 69 , as described above.We then compiled a SNP table of all variant positions across our dataset using MultiVCFAnalyzer v0.85 32 accounting for all heterozygous positions.Histograms of allele frequencies for all SNPs with <100% read support were constructed with R v3.4.1 74 using representative genomes from all sites.

Estimates of substitution rate variation in Y. pestis
In order to calculate the substitution rate variation across Y. pestis isolates associated with the second pandemic, we first assessed the temporal signal across Branch 1 that includes all genomes from both the second and third plague pandemics.For this, we computed a maximum likelihood phylogeny in RaxML 72 using all Branch 1 genomes (modern + ancient n=76), with the exception of the low coverage TRP002 genome that showed possible environmental contamination in its SNP calls.In addition, we used the strain 2.MED KIM10 (Branch 2) as outgroup for rooting the tree.Variant positions across this set of genomes were used for the analysis, allowing for up to 3% missing data (545 SNPs).We used TempEst v1.5 (http://tree.bio.ed.ac.uk/software/tempest/) for calculating the root-to-tip regression in relation to specimen or sampling ages.The calculated correlation coefficient (R) and R 2 values were 0.53 and 0.28 respectively, which permitted the proceeding with molecular dating analysis.
The Bayesian framework BEASTv1.8 75 was used to assess the substitution rate variation across the Y. pestis Branch 1 using the 2.MED KIM10 as outgroup.Our BEAUti set-up took into consideration all archaeological, radiocarbon, and sampling dates of both ancient and modern genomes (Supplementary data 2) that were used as calibration points for the Bayesian phylogeny.Divergence dates for each node in the tree were estimated as years before the present, where the year 2005 was set as the present since it represents the most recently isolated modern Y. pestis strain on Branch 1 (1.ORI MG05).All genomes within this Branch 1 were constrained as a monophyletic clade.The Generalised Time Reversible (GTR) 73 substitution model (6 gamma rate categories) and a lognormal relaxed clock (clock rate tested and strict clock rejected in MEGA7 71 ) were used to set up two separate analyses using the Coalescent Constant Size 76 and Coalescent Bayesian Skyline 77 demographic models.For each analysis, three independent chains of 50,000,000 states each were carried out and then combined using LogCombiner to ensure run convergence, with 10% burn-in.In addition, we estimated marginal likelihoods to determine the best-fitted demographic model for our dataset using path sampling and stepping stone sampling (PS/SS) implemented in BEAST v1.8 75 .For this, each of the described runs was carried out for an additional 50,000,000 states (500,000 states divided into 100 steps) using an alpha parameter of 0.3, which determined the Coalescent Bayesian Skyline model as better fit for the current dataset.The results produced by the run considering this demographic model were then viewed in Tracer v1.6 (http://tree.bio.ed.ac.uk/software/tracer/) to ensure all relevant expected sample sizes (ESS) were > 200.We used TreeAnnotator 75 , to produce a maximum clade credibility (MCC) phylogeny using the best-fit model with 10% burnin, which resulted in the processing of 13,503 trees.The MCC tree was viewed and modified in FigTree v1.4 (http://tree.bio.ed.ac.uk/software/figtree/)where branch lengths were represented as a function of age and mean rates were used to colour individual branches.

Virulence factor presence/absence and deletion analysis
In order to investigate the virulence profiles of the newly reconstructed second pandemic genomes, the highest quality (coverage) genome from every site (LAI009, NAB003, TRP002, MAN008, STA001, LBG002, STN014, BRA001, BED030) was used for comparison with each other and with previously published representatives of ancient (London ES, Bolgar 2370, Barcelona 3031, Ellwangen 549_O, OBS137, RISE509, RT5, Altenerding 2148) and modern (1.ORI-CO92, 0.PE2-PESTF, 0.PE4-Microtus 91001) Y. pestis isolates.The newly reconstructed TRP002 and the published London 6330 genomes were excluded from this analysis due to their low coverage.All other listed genomes were re-mapped against the CO92 chromosomal reference genome (NC_003143.1)using a mapping quality (-q) filter of 0. The coverage across 80 chromosomal and 43 plasmid virulence-associated genes that were previously define 33 , was calculated using bedtools 78 .The results are plotted in the form of a heatmap of using the ggplot2 79 package in R version 3.4.1 80 and can be viewed in Figure 4.In addition, we used BWA-MEM 81 to explore the precise coordinates of observed gene or region losses in all affected genomes using default parameters.For a visualisation of the identified deletion we computed average coverages across 2,000-bp windows in representative Y. pestis genomes from all analysed time periods of the second pandemic, and subsequently used the program Circos 82 to produce coverage plots of a 20-fold maximum coverage for all genomes (LAI009, London BD 8124/8291/11972, Bolgar 2370, MAN008, STA001, ELW098, LBG002, STN014, BRA002, BED030, OBS137 and the reference genomes CO92).

Data availability
Raw sequencing data of the deep-sequenced genomes will become available on the European Nucleotide Archive under project accession number PRJEB29990 upon publication.Other data supporting the findings of the study are available in this article and its Supplementary Information files, or from the corresponding authors upon request.The figure presents a maximum clade credibility (MCC) phylogenetic tree generated using BEAST v1.8 75 (rooted with 2.MED KIM10 -outgroup not shown).The tree was viewed in FigTree v1.4, and modified so that branch colours represent mean substitution rates (substitution/site/year).The tree depicts the substitution rate variation across Branch 1 of the Y. pestis phylogeny, which ranges from 2.12E-7 (highest) to 5.10E-9 (lowest) substitutions/site/year (see rate key).The isolates used for this analysis overlap with the ones used for the SNP and maximum likelihood phylogenetic analysis (see Supplementary figure 1), with the exception of the TRP002 genome since its SNP calls were likely affected by a possible environmental contamination.The mean substitution rate across Branch 1 (including 2.MED KIM10) was calculated to 2.67E-8 substitutions/site/year. Lengths of branches are scaled to represent sample ages, and the depicted Branch 1 sequences are estimated to represent 756 years (95% HPD: 671 -887) of Y. pestis evolution.The time scale is shown in years before the present (BP), where present denotes the most recently isolated modern Y. pestis strain (year 2005).

Figure 1 -Figure 2 -
Figure 1 -Archaeological site locations (a) Archaeological/radiocarbon dates of previously published and new second plague pandemic isolates (see Supplementary Information).(b) Map showing geographic locations of the archaeological sites from which second pandemic (14 th -to 18 th -century AD) Y. pestis genomes have been reconstructed.The number (n) of whole genomes obtained from each site is shown in brackets.Locations of previously published genomes appear in triangles, whereas genomes that are newly described in this study appear in circles.

Figure 3 -
Figure 3 -Substitution rate variation across the Y. pestis phylogeny

Figure 4 -
Figure 4 -Presence/absence analysis of virulence-associated genes and chromosomal coverage assessment (a) A comparison of virulence genetic profiles was performed across newly reconstructed and previously published second pandemic genomes (in red, orange, green and blue).An assessment of the presence or absence of 80 previously defined 33 chromosomal virulence-associated genes was used for this characterisation.Published Y. pestis representative strains from the Bronze Age period 29,30 (RISE509 and RT5), from the first pandemic 27 (6 th -century Altenerding 2148), from modern-day isolates (0.PE2, 0.PE4 & 1.ORI) 7 , as well as a Y. pseudotuberculosis (IP32953) strain 55 , are also shown for comparative purposes.The colour scale ranges from 0 (not covered -yellow) to 1 (entirely covered -blue) according to the relative proportion of gene/locus covered.The heatmap was plotted in R version 3.4.1 80 using the ggplot2 package 83 .Refer to Supplementary fig. 5 for presence/absence of virulence genes across the pMT1, pPCP1 and pCD1 plasmids.(b) Chromosomal coverage plots made with the Circos 82 software.The plots were constructed to a maximum coverage of 20-fold, and the average coverage was calculated over 2,000-bp windows.Genomes are shown in order according to their age from oldest (innermost circle) to youngest (outermost circle) as follows: LAI009, London BD 8124/8291/11972, Bolgar 2370, MAN008, STA001, ELW098/549_O, LBG002, STN014, BRA002, BED030, OBS137 and the reference genome CO92.
sa E p sa F p sa A p sa B p sa C Y a d B C p h o Q p h o P m g tC m g tB in v h ig h R ip A h m sH h m sF h m sR h

Table 1 -
Post-capture sequencing statistics of all new Yersinia pestis genomes that passed quality tests for inclusion in phylogenetic analysis a Dates based on radiocarbon dating of collagen b Dates based on archaeological context information