Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico

Indigenous populations of the Americas experienced high mortality rates during the early contact period as a result of infectious diseases, many of which were introduced by Europeans. Most of the pathogenic agents that caused these outbreaks remain unknown. Through the introduction of a new metagenomic analysis tool called MALT, applied here to search for traces of ancient pathogen DNA, we were able to identify Salmonella enterica in individuals buried in an early contact era epidemic cemetery at Teposcolula-Yucundaa, Oaxaca in southern Mexico. This cemetery is linked, based on historical and archaeological evidence, to the 1545–1550 ce epidemic that affected large parts of Mexico. Locally, this epidemic was known as ‘cocoliztli’, the pathogenic cause of which has been debated for more than a century. Here, we present genome-wide data from ten individuals for Salmonella enterica subsp. enterica serovar Paratyphi C, a bacterial cause of enteric fever. We propose that S. Paratyphi C be considered a strong candidate for the epidemic population decline during the 1545 cocoliztli outbreak at Teposcolula-Yucundaa. Ancient DNA from victims of a sixteenth-century disease in Mexico suggests that Salmonella enterica Paratyphi C (enteric fever) was responsible for a devastating epidemic that closely followed European presence in the region.

I nfectious diseases introduced to the New World following European contact led to successive outbreaks in many regions of the Americas that continued well into the nineteenth century. These often caused high mortality and, therefore, contributed a central, and often underappreciated, influence on the demographic collapse of many indigenous populations [1][2][3][4] . Population declines linked to regionally specific epidemics are estimated to have reached as high as 95% 3 , and their genetic impact based on recent population-based studies of ancient and modern human exome and mitochondrial data attests to their scale 5,6 . One hypothesis posits that the increased susceptibility of New World populations to Old World diseases facilitated European conquest, whereby rapidly disseminating diseases severely weakened indigenous populations 2 , in some cases even before European presence in the region 2,7 . Wellcharacterized infections, such as smallpox, measles, mumps and influenza, are known causes of later contact era outbreaks; however, the diseases that are responsible for many early contact period New World epidemics remain unknown and have been the subject of scientific debate for more than a century [2][3][4]7,8 .
Morphological changes in skeletal remains 9 and ethnohistorical accounts 10 are often explored as sources for understanding population health in the past, although both provide only limited resolution and have generated speculative and, at times, conflicting hypotheses about the diseases introduced to New World populations 2,3,7,11,12 . Most infectious diseases do not leave characteristic markers on the skeleton due to their short periods of infectivity, the death of the victim in the acute phase before skeletal changes formed, or a lack of osteological involvement 9 . Although historical descriptions of infectious disease symptoms can be detailed, they are subject to cultural biases, are affected by translational inaccuracies, lack a foundation in germ theory and describe historical forms of a condition that may differ from modern manifestations 8,11 . In addition, differential diagnosis based on symptoms alone can be unreliable even in modern contexts, as many infectious diseases have similar clinical presentations.
Genome-wide studies of ancient pathogens have proven instrumental in both identifying and characterizing past human infectious diseases. These studies have largely been restricted to skeletal collections where individuals display physical changes consistent with particular infections [13][14][15] , a historical context that links a specific pathogen to a known epidemiological event 16 or an organism that was identified via targeted molecular screening without prior indication of its presence 17 . Recent attempts to circumvent these limitations have concentrated on broad-spectrum molecular approaches focused on pathogen detection via fluorescence-hybridization-based microarray technology 18 , identification via DNA enrichment of certain microbial regions 19 or computational screening of non-enriched sequence data against human microbiome data sets 20 . These approaches offer improvements, but remain biased in the bacterial taxa used for species-level assignments. As archaeological material is expected to harbour an abundance of bacteria that stem from the depositional context, omission of environmental taxa in species assignments can lead to false-positive identifications. Additional techniques for authenticating ancient DNA have been developed 21,22 , including the identification of characteristic damage patterns caused by the deamination of cytosines 23 , methods that evaluate evenness of coverage of aligned reads across a reference genome, or length distributions that consider the degree of fragmentation, where ancient molecules are expected to be shorter than those from modern contaminants 24 .

Nature ecology & evolutioN
A typical next-generation sequencing data set from an ancient sample comprises millions of DNA-sequencing reads, which make taxonomic assignment and screening based on sequence alignments computationally challenging. The gold-standard tool for alignmentbased analyses is the Basic Local Alignment Search Tool (BLAST) 25 , owing to its sensitivity and statistical model. However, the computational time and power that BLAST requires to analyse a typical metagenomic data set is often prohibitive.
Here, we introduce the MEGAN alignment tool (MALT), a program for the fast alignment and analysis of metagenomic DNAsequencing data. MALT contains the same taxonomic binning algorithm, that is, the naive lowest common ancestor (LCA) algorithm (for reviews, see 26,27 ), implemented in the interactive metagenomics analysis software MEGAN 28 . Like BLAST, MALT computes 'local' alignments between highly conserved segments of reads and references. MALT can also calculate 'semi-global' alignments where reads are aligned end-to-end. In comparison to protein alignments or local DNA alignments, semi-global DNA alignments are more suitable for assessing various quality and authenticity criteria that are commonly applied in the field of paleogenetics.
We applied our MALT screening pipeline (Supplementary Figs. 1 and 2) using a database of all complete bacterial genomes available in National Center for Biotechnology Information (NCBI) RefSeq to non-enriched DNA sequence data from the pulp chamber of teeth collected from indigenous individuals excavated at the site of Teposcolula-Yucundaa, located in the highland Mixteca Alta region of Oaxaca, Mexico 29,30 . The site contains both pre-contact and contact era burials, including the earliest identified contact era epidemic burial ground in Mexico 30,31 ( Fig. 1 and Supplementary Methods 1). This is the only known cemetery historically linked to the cocoliztli epidemic of 1545-1550 ce 30 , described as one of the principal epidemiological events responsible for the cataclysmic population decline of sixteenth century Mesoamerica 7,32 . This outbreak affected large areas of central Mexico and Guatemala, spreading perhaps as far south as Peru 7,30 . Through the MALT screening approach, we were able to identify ancient Salmonella enterica DNA in the sequence data generated from this archaeological material, to the exclusion of DNA stemming from the complex environmental background. Although the pathogenic cause of the cocoliztli epidemic is ambiguous based on ethnohistorical evidence 7,8,30 , we report the molecular evidence of microbial infection with genome-wide data from S. enterica subsp. enterica serovar Paratyphi C (a known cause of enteric fever in humans) isolated from ten epidemic-associated contact era burials.

Results
The individuals included in this investigation were excavated from the contact era epidemic cemetery located in the Grand Plaza (administrative square) (n = 24) and the pre-contact churchyard cemetery (n = 5) at Teposcolula-Yucundaa between 2004 and 2010 30 (Fig. 1 30 . In addition, oxygen isotope analysis identified them as local inhabitants 30 . Thirteen individuals included in this study were previously radiocarbon dated 31 , yielding dates that support archaeological evidence that the Grand Plaza (n = 10) and the churchyard (n = 3) contain contact and pre-contact era burials, respectively (Supplementary Table 1). The Grand Plaza is estimated to contain > 800 individuals, most interred in graves that contain multiple individuals. The excavated individuals contribute to a demographic profile consistent with an epidemic event 29,30 .
Tooth samples were processed according to protocols designed for ancient DNA work (Supplementary Methods 2). An aggregate soil sample from the two burial grounds was analysed in parallel to gain an overview of environmental bacteria that may have infiltrated our samples. Pre-processed sequencing data of ~1 million paired-end reads per tooth were analysed with MALT using a curated reference database of 6,247 complete bacterial genomes,   Table 4). Subsequently, the sequencing data for all samples were mapped to the human genome (hg19), revealing a similar level of damage in the human reads for Tepos_10, Tepos_14 and Tepos_35, thus providing further support for the ancient origin of the S. enterica reads (Supplementary Methods 4 and Supplementary Table 5). An additional seven individuals from the Grand Plaza cemetery and one negative control harboured low numbers of assigned S. enterica reads ranging from 4 to 51 (Supplementary Table 2). These were considered as potential weak-positive samples. One negative control was found to contain 15 reads assigned to S. enterica, and a further four contained one or two reads, as did nine sample libraries, seven of which were not included in downstream analyses. The soil library and the remaining sample libraries were void of S. enterica reads (Fig. 2 Table  2). An additional MALT screen for traces of viral DNA revealed one notable taxonomic hit to Salmonella phage Vi II-E1, which is a phage associated with Salmonella serovars, including S. Paratyphi C, that produce the Vi capsule antigen 33 (Supplementary Methods 3  and Supplementary Table 3).

, Supplementary Methods 3 and Supplementary
To further authenticate and elucidate our findings, we performed whole-genome targeted array and in-solution hybridization capture 34,35 , using probes designed to encompass modern S. enterica genome diversity (Supplementary Methods 5-7 and Supplementary Table 6). All five pre-contact samples, the soil sample, one postcontact sample putatively negative for S. enterica based on our MALT screening, all negative controls, and both uracil DNA glycosylase (UDG)-treated (DNA damage removed) and non-UDGtreated libraries from the ten putatively positive samples (Tepos_10, Tepos_11, Tepos_20, Tepos_14, Tepos_34, Tepos_35, Tepos_36, Tepos_37, Tepos_38 and Tepos_41) were included in the capture (Supplementary Methods 6 and 7).
Mapping and genotyping of the captured Illumina-sequenced reads were performed using the S. Paratyphi C reference genome (NC_012125.1) (Supplementary Methods 6-8 and Supplementary Table 7). Capture of S. enterica DNA was successful for the ten

Fig. 2 | MAlT analysis and pathogen screening of shotgun data.
Shotgun data were analysed with MALT using a database constructed from all bacterial genomes available through NCBI RefSeq (December 2016). MALT results were visualized using MEGAN6 (ref. 28 ). The bar chart was constructed from the MEGAN6 output and is based on the per cent reads assigned to bacterial species when using a 95% identity filter. Reads assigned to S. enterica are coloured red regardless. Other taxa to which ≥ 3% reads, per sample, were assigned are colour-coded depending on whether they are 'environmental' or 'human oral microbiome' bacteria. Remaining taxa are sorted into two categories: 'other environmental' or 'other microbiome' (Supplementary Methods 3). Samples from the post-contact Grand Plaza epidemic cemetery containing S. enterica reads, pre-contact era samples from the churchyard cemetery and the soil sample are illustrated. In addition, a sample negative for S. enterica from the Grand Plaza cemetery (Tepos_27) is shown. Samples whose names are coloured in red are from the Grand Plaza and those in blue are from the churchyard cemetery. The percentage of reads in the shotgun data assigned by MALT per sample is indicated at the top of each column. Only taxa with ≥ 4 reads assigned are visualized.

Nature ecology & evolutioN
positive samples, yielding a minimum of 33,327 unique reads per UDG-treated library. The remaining bone samples, soil sample and negative controls were determined to be negative for ancient S. enterica DNA, with the exception of one negative control (EB2-091013) that had probably become cross-contaminated during processing (see Supplementary Methods 8 and Supplementary Table 7). Five complete genomes were constructed for Tepos_10, Tepos_14, Tepos_20, Tepos_35 and Tepos_37, covering 95%, 97%, 67%, 98% and 74% of the reference at a minimum of 3-fold coverage and yielding an average genomic coverage of 33-, 36-, 4.6-, 96-and 5.5-fold, respectively (Table 1). Artificial reads generated in silico for 23 complete genomes included in the probe design were also mapped to the S. Paratyphi C RKS4594 reference (Supplementary Methods 8  and Supplementary Table 6), and phylogenetic comparison revealed that the five ancient genomes clustered with S. Paratyphi C (Fig. 3 Table 8), the most common bacterial cause of enteric fever in humans today. This result excludes the possibility of a reference bias. Despite all five ancient genomes being contemporaneous, the Tepos_10 genome was observed to contain many more derived positions. An investigation of heterozygous variant calls showed that Tepos_10 has a much higher number of heterozygous sites. We believe that this is best explained by the presence of genetically similar non-target DNA that co-enriched in the capture for this sample alone. Based on the pattern of allele frequencies, this genome was excluded from downstream analyses (Supplementary Methods 9 and Supplementary Fig. 7). Tepos_20 and Tepos_37 were also excluded owing to their genomic coverage of < 6-fold, which allowed more reliable single-nucleotide polymorphism (SNP) calling at a minimum of 5-fold coverage for Tepos_14 and Tepos_35. Subsequent phylogenetic tree construction with 1,000 bootstrap replicates revealed 100% support and branch shortening for the Tepos_14 and Tepos_35 genomes in all phylogenies, supporting their ancient origin ( Fig. 3 and Supplementary Fig. 8).
SNP analysis for the ancient genomes together with the refer-  Table 9). Of these, 130 unique SNPs are shared between Tepos_14 and Tepos_35, supporting their close relationship and shared ancestry. The ydiD gene, which is involved in the breakdown of fatty acids 36 , and the tsr gene, which is related to the chemotaxic response system 37

Nature ecology & evolutioN
A region of the pil operon consisting of five genes, pilS, pilU, pilT, pilV and rci, was found in our ancient genomes and was absent in the S. Paratyphi C RKS4594 genome 38 (Supplementary Methods 12  and Supplementary Table 12). This region is located in Salmonella pathogenicity island 7 (SPI-7), and encodes a type IVB pili 39,40 . The version of pilV in our ancient genomes is thought to facilitate bacterial self-aggregation, a phenomenon that potentially aids in invasion of host tissues 39,40 (for details, see Supplementary Methods 12). A further presence/absence analysis was performed to evaluate additional virulence factors. These results are summarized in Supplementary Fig. 9 and Supplementary Methods 13 (See also Supplementary Table 13).
The S. Paratyphi C RKS4594 strain harbours a virulence plasmid, pSPCV, which was included in our capture design. It is present at 10-fold to 224-fold average coverage for the five genomes (Supplementary Methods 14 and Supplementary Tables 14 and 15).
Non-UDG capture reads mapped to the S. Paratyphi C genome (NC_012125.1) for Tepos_11, Tepos_34, Tepos_36, Tepos_38 and Tepos_41, that is, those that did not yield full genomes, had damage patterns that are characteristic of ancient DNA (Supplementary Fig. 3).
To further verify these reads as true ancient S. Paratyphi C reads, we investigated 45 SNPs unique to Tepos_14 and Tepos_35, which are included in our phylogenetic analysis (Supplementary Methods 11 and  Supplementary Table 11). Of the 45 positions, between 6 and 29 were identified at minimum 1-fold in these lower coverage genomes. All of these were in agreement with the unique SNPs present in the highcoverage ancient genomes, thus confirming their shared ancestry.

Discussion
Interpretations of ethnohistorical documents have suggested some form of typhus or enteric (typhoid/paratyphoid) fever (from the Spanish 'tabardillo' , 'tabardete' and 'tifus mortal'), viral haemorrhagic fever, measles or pneumonic plague as potential causes of the cocoliztli epidemic of 1545 ce (for refs, see Supplementary Discussion 1). These diseases present symptoms similar to those that were recorded in the cocoliztli outbreak, such as red spots on the skin, bleeding from various body orifices and vomiting (Supplementary Discussion 1 and Supplementary Fig. 10). Given the non-specific nature of these symptoms, additional sources of data are needed to clarify which disease (or diseases) was circulating. Previous investigation of sequencing data generated from the Teposcolula-Yucundaa material did not identify DNA traces of ancient pathogens; however, S. enterica was not considered as a candidate 41 . Here, we have isolated genome-wide data of ancient S. Paratyphi C from ten Mixtec individuals buried in the Grand Plaza epidemic cemetery at Teposcolula-Yucundaa, indicating that enteric fever was circulating in the indigenous population during the cocoliztli epidemic of 1545-1550 ce. As demonstrated here, MALT offers a sensitive approach for screening non-enriched sequence data in search for unknown candidate bacterial pathogens that were involved in past disease outbreaks, even to the exclusion of a dominant environmental microbial background. Most importantly, it offers the advantage of extensive genome-level screening without the need to specify a target organism, thus avoiding ascertainment biases that are common to other screening approaches. Fast metagenomic profiling tools that are based on k-mer matching, such as Kraken 42 , or specific diagnostic marker regions, such as MetaPhlan2 (ref. 43 ), have limitations in ancient DNA applications. Complete alignments are needed to authenticate candidate taxonomic assignments, and a small number of marker regions might not provide sufficient resolution for identification, as target DNA is often present in low amounts. Our focus on only bacterial and DNA viral taxa limits our resolution in identifying other infectious agents that may have been present in the population during the Teposcolula-Yucundaa epidemic.

Nature ecology & evolutioN
Although our discussions here have focused on a single pathogenic organism, the potential of its having acted synergistically with other pathogens circulating during the epidemic must be considered. The concept of syndemics and the complex biosocial factors that influence infectious disease transmission and severity are well documented in both modern and historical contexts 44,45 . We are currently limited to the detection of bacterial pathogens and DNA viruses included in the NCBI genomic database, although the resolution offered by MALT analyses will increase as this database grows. We have not investigated the presence of RNA viruses, as methods for RNA retrieval from archaeological tissues are underdeveloped and not supported by our current protocols 46 .
We confidently exclude an environmental organism as the source for our ancient genomes on the basis that: (1) S. Paratyphi C is restricted to humans, (2) it is not known to freely inhabit soil (our soil sample was negative for Salmonella during screening and after capture), (3) the deamination patterns observed for the ancient human and S. Paratyphi C reads are characteristic of authentic ancient DNA, and (4) the ancient S. Paratyphi C genomes display expected branch shortening in all constructed phylogenies. Moreover, we recovered all ancient genomic data from the pulp chambers of teeth collected in situ, increasing the likelihood of having identified a bacterium that was present in the individual's blood at the time of death. S. enterica introduction via post-burial disturbance is unlikely because the graves in the Grand Plaza were dug directly into the thickly paved floor at the site, and historical records indicate that Teposcolula-Yucundaa was abandoned shortly after the epidemic ended in 1552 ce 29,30 .
S. Paratyphi C is one of > 2,600 identified S. enterica serovars distinguished by their antigenic formula 47 . Only four serovars (S. Typhi and S. Paratyphi A, B, C), all of which cause enteric fever, are restricted to the human host 47 . Today, S. Typhi and S. Paratyphi A cause the majority of reported cases 48 . S. Paratyphi C is rarely reported 38,48 . Infected individuals shed bacteria long after the termination of symptoms 47 , and in the case of S. Typhi infection, 1-6% of individuals become asymptomatic carriers 49 . Following the hypothesis that this disease was introduced through European contact, it is conceivable that asymptomatic European carriers who withstood the cross-Atlantic voyage could have introduced S. Paratyphi C to Mesoamerican populations in the sixteenth century. First-hand descriptions of the 1545 cocoliztli epidemic suggest that both European and Mixtec individuals were susceptible to the disease 7,50 , with one estimate of a 60-90% population decline in New Spain during this period 7 .
The additional SPI-7 genes detected through insertion and deletion (indel) analysis are reported to vary in presence/absence among modern S. Paratyphi C strains 38,40 , and are suspected to cause increased virulence when the inverted repeats in pilV allow the Rci recombinase to shuffle between its two protein states (Supplementary Methods 12). This may support an increased capacity for our ancient strains to cause an epidemic outbreak. However, the overall mechanisms through which S. Paratyphi causes enteric fever remain unclear. The non-synonymous SNPs in the ydiD and tsr genes may signify adaptive processes, and comparison with a greater number of S. Paratyphi C genomes may clarify this 51 .
Today, S. Paratyphi C is rare in Europe and the Americas, with more cases identified across Africa and Asia 52,53 . Based on multilocus sequence typing (MLST) data from modern S. Paratyphi C strains, no clear phylogeographical pattern has been observed 52 . However, the presence of a 1200 ce S. Paratyphi C genome in Norway indicates its presence in Europe in the pre-contact era 51 , which would be necessary for it to be considered an Old World disease. However, based on the small number of pre-contact individuals that we have screened, we cannot exclude the presence of S. Paratyphi C at Teposcolula-Yucundaa prior to European arrival. A local origin for the cocoliztli disease has been proposed elsewhere 54 .
Historical accounts offer little perspective on its origin as neither the indigenous population nor the European colonizers had a pre-existing name for the disease 7,8,30 . Spanish colonial documents refer to it as "pujamiento de sangre" ('full bloodiness'), whereas the indigenous Aztec population of Central Mexico called it cocoliztli, a generic term meaning 'pestilence' in Nahuatl 7,8 (see Supplementary  Discussion 1).
Little is known about the past severity and worldwide incidence of enteric fever, which was first determined to be distinct from typhus in the mid-nineteenth century 55 . Enteric fevers are regarded as major health threats worldwide 48 , causing an estimated ~27 million illnesses in 2000, the majority of which were attributed to S. Typhi 56 . Owing to the rarity of S. Paratyphi C diagnoses, mortality rates are not established for this particular serovar. Today, outbreaks predominantly occur in developing countries. S. Typhi and S. Paratyphi are commonly transmitted through the faecaloral route via ingestion of contaminated food or water 57 . Changes imposed under Spanish rule, such as forced relocations under the policy of 'congregación' , altered living arrangements, and new subsistence farming practices 29,30 compounded by drought conditions 32 could have disrupted existing hygiene measures, facilitating S. Paratyphi C transmission.
Our study represents a first step towards a molecular understanding of disease exchange in contact era Mexico. The 1545 cocoliztli epidemic is regarded as one of the most devastating epidemics in New World history 7,32 . Our findings contribute to the debate concerning the causative agent of this epidemic at Teposcolula-Yucundaa, where we propose that S. Paratyphi C be considered. We introduced MALT, a novel fast alignment and taxonomic assignment method. Its application to the identification of ancient S. enterica DNA within a complex background of environmental microbial contaminants speaks to the suitability of this approach, and its resolution will improve as the number of available reference genomes increases. This method may be eminently useful for studies wishing to identify pathogenic agents involved in ancient and modern disease, particularly in cases for which candidate organisms are not known a priori.

Methods
The MALT algorithm. MALT is based on the seed-and-extend paradigm and consists of two programs: malt-build and malt-run.
First, malt-build is used to construct an index for the given database of reference sequences. To do so, malt-build determines all occurrences of spaced seeds [58][59][60] in the reference sequences and places them into a hash table 61 .
Following this, malt-run is used to align a set of query sequences against the reference database. To this end, the program generates a list of spaced seeds for each query and then looks them up in the reference hash table, which is kept in the main memory. Using the x-drop extension heuristic 25 , a high-scoring ungapped alignment that is anchored at the seed is computed and is used to decide whether a full alignment should be constructed. Local or semi-global alignments are computed using a banded implementation 62 of the Smith-Waterman 63 or Needleman-Wunsch 64 algorithms, respectively. The program then computes the bit-score and the expected value (E-value) of the alignment and decides whether to keep or discard the alignment depending on user-specified thresholds for the bitscore, the E-value or the per cent identity. The application of malt-run is illustrated in Supplementary Fig. 1.
The MALT screening pipeline. To use MALT in ancient DNA contexts to screen for bacterial DNA and to assess the taxonomic composition of ancient bacterial communities, we applied the following workflow ( Supplementary Fig. 2). First, we used malt-build to construct a MALT index on all complete bacterial genomes in GenBank 65 . This was done only once, and is rebuilt only when the target database requires updating. We align reads to the reference database using malt-run in semi-global mode. MALT generates output in RMA format and in SAM format. The RMA format can be used for interactive analysis of taxonomic composition in MEGAN 28 , and the SAM format can be used for alignment-based assessment of damage patterns and other authenticity criteria.
Sample provenience. The site of Teposcolula-Yucundaa is situated on a mountain ridge in the Mixteca Alta region of Oaxaca, Mexico. Prior archaeological excavation at this site revealed a large epidemic cemetery located in the Grand

Nature ecology & evolutioN
Plaza-the town's administrative centre-and an additional cemetery in the churchyard ( Fig. 1 and Supplementary Methods 1). Twenty-four teeth were collected from individuals buried in the Grand Plaza cemetery and five from individuals buried in the churchyard cemetery (Supplementary Methods 2 and  Supplementary Table 1). Soil samples were also collected from both cemetery sites (Supplementary Methods 2). DNA extraction and library preparation. DNA extracts and double-stranded indexed libraries that are compatible with Illumina sequencing were generated using methods tailor-made for ancient DNA [66][67][68] . This work was carried out in dedicated ancient DNA cleanroom facilities at the University of Tübingen and Harvard University (Supplementary Methods 2).
Screening with MALT. Amplified libraries were shotgun sequenced. The reads were adapter clipped and merged before being analysed with MALT, and the results were visualized in MEGAN6 (ref. 28 ) (Supplementary Methods 2 and 3). Two MALT runs were executed: the first using all complete bacterial genomes that were available through NCBI RefSeq (December 2016), and the second using the full NCBI Nucleotide database (ftp://ftp-trace.ncbi.nih.gov/blast/db/FASTA/) as reference to screen for viral DNA (Fig. 2 Runtime comparison. The programs MALT (version 0.3.8) and BLAST 25 (version 2.6.0+ ) were applied to the shotgun screening data of Tepos_35, consisting of 952,511 reads. For both programs, the DNA alignment mode (BLASTn) was chosen. The maximal E-value was set to 1.0. The maximal number of alignments for each query was set to 100. The minimal per cent identity was set to 95. The number of threads was set to 16. The alignment type of MALT was set to 'local' to be comparable to BLAST. The total amount of random acess memory (RAM) required by MALT during this run was 252.7 GB.
For MALT, the runtime was measured excluding the initial loading of our reference database, which happens only once when screening multiple samples. The loading of the database takes 27.27 min. Including taxonomic binning, the application of MALT to our complete shotgun screening data took 123.36 min. As a comparison, processing only the screening data of a single sample (Tepos_35) with BLASTn took 1,420.58 min without any taxonomic analysis. Processing of this sample alone with MALT, including taxonomic binning, took 6.48 min, which constitutes a 200-fold improvement in terms of computation time.
The computations were performed on a Dell PowerEdge R820 with four Intel Xeon E5-4620 2.2 GHz central processing units (CPUs) and 768 GB of RAM.
Probe design and whole-genome capture. Array probes were designed based on 67 publicly available S. enterica chromosomes/assemblies and 45 associated plasmid sequences (Supplementary Methods 5 and Supplementary Table 6). Extracts from samples that were deemed to be positive for S. Paratyphi C were converted into additional rich UDG-treated libraries 69 for whole-genome capture (Supplementary Methods 6 and 7). Pre-contact and post-contact samples were serially captured using our custom probe design, according to two established methods 35,70 . The eluate from both array and in-solution capture was sequenced to a sufficient depth to allow high-coverage genome reconstruction ( Supplementary Methods 6 and 7).
Sequence data processing, initial phylogenetic assessment and authenticity. The sequence data were adapter clipped and quality filtered before being mapped to the S. Paratyphi C reference (NC_012125.1) (Supplementary Methods 8 and  Supplementary Table 7). Deamination patterns for the DNA were generated to assess the authenticity of the ancient S. Paratyphi C DNA using mapDamage 23 (Supplementary Methods 8, Supplementary Table 7 and Supplementary Fig.  3). Artificial read data were generated for a data set of 23 genomes that were selected for comparative phylogenetic analysis; this data were also mapped to the S. Paratyphi C reference (Supplementary Methods 8). SNP calling was carried out with Genome Analysis Toolkit (GATK) using a quality score of ≥ 30 for the five S. Paratyphi C genomes and the artificial read data set. A neighbour-joining tree was constructed using MEGA6 (ref. 71 ), based on homozygous SNPs called at a minimum of 3-fold coverage where at least 90% of reads are in agreement (Supplementary Methods 8 and Supplementary Fig. 4). To exclude a reference bias in the ascertainment of the phylogenetic positioning of the five ancient genomes (Tepos_10, Tepos_14 Tepos_20, Tepos_35 and Tepos_37), mapping, SNP calling and tree construction were repeated for the S. Typhi CT18 reference (NC_003198.1) (Supplementary Methods 8, Supplementary Table 8 and Supplementary Fig. 5).
SNP typing and phylogenetic analysis. Homozygous SNPs were called from the complete data set (5 ancient and 23 modern) based on our criteria using a tool called MultiVCFAnalyzer (Supplementary Methods 9). Repetitive and highly conserved regions of the S. Paratyphi C genome (NC_012125.1) were excluded from SNP calling to avoid spurious mapping reads. Maximum parsimony 71 and maximum likelihood 72 trees were made that included the five genomes ( Fig. 3 and Supplementary Fig. 6). Heterozygous positions were also called, and their allele frequency distributions plotted using R 73 (Supplementary Methods 9 and Supplementary Fig. 7). SNP calling and phylogenetic tree construction were repeated, excluding the Tepos_10, Tepos_20 and Tepos_37 genomes (Fig. 3, Supplementary Methods 10 and Supplementary Fig. 8).
The five weak-positive samples-Tepos_11, Tepos_34, Tepos_36, Tepos_38 and Tepos_41-that did not yield enough data for genome reconstruction were investigated for 46 SNPs unique to the ancient genomes to verify that the captured reads for these samples are true ancient S. Paratyphi C reads (Supplementary  Methods 11 and Supplementary Table 11).
SNP effect and indel analyses. SNP effect analysis was carried out for the two ancient genomes (Tepos_14 and Tepos_35) alongside the modern data set (see Supplementary Methods 10). SNPs unique to the ancient genomes, pseudogenes and homoplastic positions were investigated (Supplementary Methods 10 and Supplementary Tables 9 and 10). Indels, ≥ 700 base pairs, in the two ancient genomes were identified through two approaches. Deletions were visually detected by mapping the ancient data to the S. Paratyphi C reference using a mapping quality threshold of 0 and manually viewing the genome alignment in the Integrative Genomics Viewer (IGV) (Supplementary Methods 12). To detect insertions, or regions present in the ancient genomes that are missing the modern reference, the ancient data were mapped to concatenated reference pairs. One reference was in all cases, the S. Paratyphi C RKS4594 reference (NC_012125.1), and the other was one of four S. enterica genomes of interest. A mapping quality threshold of 37 was used, thus allowing only regions unique to one or the other genome in the pair to map (Supplementary Methods 12 and  Supplementary Table 12).
Virulence factor analysis. Forty-three effector genes identified within S. enterica subsp. enterica 74 were investigated using the BEDTools suite 75 . The percentage of each gene that was covered at least onefold in the ancient and modern genomes in our data set was plotted using the ggplot2package 76 in R 73 (Supplementary Methods 13 and Supplementary Fig. 9).
Plasmid analysis. The ancient data were mapped to the S. Paratyphi C virulence plasmid, pSPCV. SNP effect analysis was carried out in comparison to three other similar plasmid references (Supplementary Methods 14 and Supplementary  Tables 14 and 15).

Data exclusions
Describe any data exclusions.
No data were excluded from the analyses. Three genomes (Tepos_10, Tepos_20, Tepos_37) were excluded partway through analyses due to the lower quality or lower coverage of these genomes, the reasons for which are explained in the main text and supplementary information.

Replication
Describe whether the experimental findings were reliably reproduced. Experimental findings were reproduced in that both non-UDG and UDG libraries from the same DNA extracts, for each of the ten positive individuals from the epidemic cemetery, yielded Salmonella enterica Paratyphi C genomic DNA after capture.

Randomization
Describe how samples/organisms/participants were allocated into experimental groups.
Samples were allocated to experimental groups based on the cemetery site from which the individuals were excavated from.

Blinding
Describe whether the investigators were blinded to group allocation during data collection and/or analysis.
Blinding was not necessary to this study as neither human participants or animals were used.