The origins of haplotype 58 (H58) Salmonella enterica serovar Typhi

Antimicrobial resistance (AMR) poses a serious threat to the clinical management of typhoid fever. AMR in Salmonella Typhi (S. Typhi) is associated with the H58 lineage, which arose comparatively recently before becoming globally disseminated. To better understand when and how this lineage emerged and became dominant, we performed detailed phylogenetic and phylodynamic analyses on contemporary genome sequences from S. Typhi isolated in the period spanning the emergence. Our dataset, which contains the earliest described H58 S. Typhi, indicates that the prototype H58 organisms were multi-drug resistant (MDR). These organisms emerged spontaneously in India in 1987 and became radially distributed throughout South Asia and then globally in the ensuing years. These early organisms were associated with a single long branch carrying mutations associated with increased bile tolerance, suggesting that the first H58 organism was generated during chronic carriage. The subsequent use of fluoroquinolones led to several independent mutations in gyrA. The ability of H58 to acquire and maintain AMR genes continues to pose a threat, as extensively drug-resistant (XDR; MDR plus resistance to ciprofloxacin and third-generation cephalosporins) variants have emerged recently in this lineage. Understanding where and how H58 S. Typhi originated and became successful is key to understanding how AMR drives successful lineages of bacterial pathogens. Additionally, these data can inform optimal targeting of typhoid conjugate vaccines (TCVs) to reduce both the potential for emergence and the impact of new drug-resistant variants. Emphasis should also be placed on the prospective identification and treatment of chronic carriers to prevent the emergence of new drug-resistant variants with the ability to spread efficiently.


Salmonella enterica serovar Typhi (S. Typhi) is the etiologic agent of typhoid fever, a disease associated with an estimated 10.9 million new infections and 116,800 deaths annually. 1 The disease classically presents as an undifferentiated fever and can progress to more severe manifestations or even death. 2 Typhoid fever necessitates antimicrobial therapy, as the associated mortality rate in the pre-antimicrobial era ranged from 10-30%; 3 presently, typhoid has a case fatality rate (CFR) of <1% when treated with effective antimicrobials. 4 S. Typhi is spread via the faecal-oral route, typically through the ingestion of contaminated food or water. 2 Consequently, a high prevalence of typhoid fever was historically associated with urban slums in South Asia with poor sanitation. 5 Recent multicentre surveillance studies have demonstrated that typhoid fever is also a major problem in both urban and rural areas in sub-Saharan Africa. 6-8

Given the importance of antimicrobials for the management and control of typhoid, antimicrobial resistance (AMR) in S. Typhi has the potential to be a major public health issue. Indeed, AMR in S. Typhi first appeared in the 1950s with the emergence of resistance against the most widely used drug, chloramphenicol. 9 Multi-drug resistant typhoid (MDR; resistance to all three first-line antimicrobials: chloramphenicol, trimethoprim-sulfamethoxazole, and ampicillin) was first identified in the 1970s and became common in the early 1990s. 10,11 MDR in S. Typhi is frequently conferred by self-transmissible IncH1 plasmids carrying a suite of resistance genes, including resistance determinants for chloramphenicol (catA1 or cmlA), ampicillin (blaTEM-1D, blaOXA-7), and co-trimoxazole (at least one dfrA gene and at least one sul gene).
12 Lower efficacy of first-line antimicrobials led to the increased use of fluoroquinolones, but decreased fluoroquinolone susceptibility became apparent in the mid-1990s and was widespread in South and Southeast Asia by the early 2000s. 13,14 Inevitably, as treatment options have become limited, third-generation cephalosporins and azithromycin have been used more widely for the effective treatment of typhoid fever. 15-17 However, newly circulating extensively drug-resistant variants of S. Typhi (XDR; MDR plus resistance to fluoroquinolones and third-generation cephalosporins) have left azithromycin as the only feasible oral antimicrobial for the treatment of typhoid fever across South Asia. 18 We are arguably at a tipping point, as azithromycin-resistant S. Typhi has since been reported in Bangladesh, Pakistan, Nepal, and India, threatening the efficacy of the remaining common oral antimicrobials for typhoid treatment. 19-22 If an XDR organism were to acquire azithromycin resistance (a single base pair mutation), this would lead to what Hooda and colleagues have referred to as pan-oral drug-resistant (PoDR) S. Typhi, which would require inpatient intravenous treatment. 23 This would come at substantial additional cost to patients and their families, and place additional strain on already overburdened health systems. 24-26

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted October 3, 2022. doi: https://doi.org/10.1101/2022.10.03.510628 (bioRxiv preprint)

In contrast to many other Gram-negative bacteria, S. Typhi is human-restricted and has limited genetic diversity that can be described by a comparatively straightforward phylogenetic structure. 27 Therefore, the phylogeny and evolution of S.
Typhi provide a model for how AMR emerges, spreads, and becomes maintained in a human pathogen. AMR phenotypes in S. Typhi are typically dominated by a single lineage: H58 (genotype 4.3.1 and its sublineages), which was the 58th S. Typhi haplotype to be described in the original genome-wide typing system. 28 This highly successful lineage is commonly associated with MDR phenotypes and decreased fluoroquinolone susceptibility. 14
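The introduction relies on two isolate-level labels: membership of H58 (genotype 4.3.1 and its sublineages) and the nested resistance categories MDR, XDR, and PoDR as defined above. The Python sketch below encodes those definitions as stated in the text. The drug-class strings, the function names, and the use of gene presence as an MDR proxy are illustrative assumptions rather than part of the study's pipeline; gene presence alone does not guarantee phenotypic resistance.

```python
from __future__ import annotations

# H58 is genotype 4.3.1 plus all of its sublineages (e.g. 4.3.1.1, 4.3.1.2).
H58 = (4, 3, 1)

# Hypothetical drug-class labels; not a standard vocabulary.
FIRST_LINE = {"chloramphenicol", "ampicillin", "co-trimoxazole"}
XDR_EXTRA = {"fluoroquinolone", "third-generation cephalosporin"}


def is_h58(genotype: str) -> bool:
    """True for '4.3.1' or any dotted sublineage beneath it."""
    levels = tuple(int(part) for part in genotype.split("."))
    # Compare whole levels so that, e.g., a hypothetical '4.3.10' is not matched.
    return levels[: len(H58)] == H58


def has_mdr_gene_profile(genes: set[str]) -> bool:
    """Gene-content proxy for MDR using the determinants listed in the text."""
    chloramphenicol = bool(genes & {"catA1", "cmlA"})
    ampicillin = bool(genes & {"blaTEM-1D", "blaOXA-7"})
    # Co-trimoxazole: at least one dfrA gene AND at least one sul gene.
    cotrimoxazole = any(g.startswith("dfrA") for g in genes) and any(
        g.startswith("sul") for g in genes
    )
    return chloramphenicol and ampicillin and cotrimoxazole


def resistance_category(resistant_to: set[str]) -> str:
    """Nested phenotypic categories: MDR, then XDR, then PoDR."""
    if not FIRST_LINE <= resistant_to:
        return "non-MDR"
    if not XDR_EXTRA <= resistant_to:
        return "MDR"
    if "azithromycin" not in resistant_to:
        return "XDR"
    # XDR plus azithromycin resistance leaves no common oral option.
    return "PoDR"


if __name__ == "__main__":
    print(is_h58("4.3.1.2"), is_h58("4.1"))
    print(resistance_category(FIRST_LINE | XDR_EXTRA))
```

Comparing genotype levels as integer tuples, rather than testing a string prefix, keeps the lineage check faithful to the hierarchical genotype scheme.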

Previous phylogeographic analysis suggested that H58 emerged initially in Asia between 1985 and 1992 and then disseminated rapidly to become the dominant clade in Asia and subsequently in East Africa. 14 H58 is currently subdivided into three distinct lineages: lineage I (4.3.1.1) and lineage II (4.3.1.2), which were first identified in a pediatric study conducted in Kathmandu, 29

The main questions that we aimed to address with this study were: i) when and where did H58 S. Typhi first emerge; ii) can we better resolve the evolutionary events that led to the long branch observed for H58 S. Typhi; and iii) how quickly did this lineage spread, and why? To investigate the origins of S. Typhi H58, we therefore analysed data from the United Kingdom Health Security Agency (UKHSA, formerly Public Health England) on stored S. Typhi organisms isolated between 1980 and 1995 from the blood cultures of travellers returning to the UK from overseas.

The database was queried and organisms were selected from the following three categories: i) 126 S. Typhi with the E1 phage type (which is considered to be associated with H58) 12

Typhi organisms meeting these criteria were randomly selected, revived, subjected to DNA extraction, and whole-genome sequenced. Ultimately, our dataset was composed of 463 novel sequences generated as a component of this study and 305 existing sequences 30,33,34 known to belong to the H58

To contextualise these isolates and understand the evolutionary events leading to this clone, we selected H58 and nearest-neighbour (genotypes 4.1 and 4.2) S. Typhi organisms (n=305) that were already available in the public domain from previous studies 30,33,34 (Table S2), and generated a phylogenetic tree combining these isolates with early H58 and nearest-neighbour isolates from our unpublished dataset (n=117). Our combined H58 and nearest-neighbour dataset (n=422, Figure 2), which included both published data and our contemporary data, represented 17 countries. 30

the earliest organism was isolated in 1983 in India, followed by two additional Indian isolates (1988) that were also classified as H58. However, we observed that there were no more recent isolates

Our data support the hypothesis that H58 S. Typhi was successful specifically because of the acquisition and maintenance of an MDR plasmid. This selection meant that later gyrA mutations were more likely to occur in this lineage, given its dominance (and therefore higher rates of replication, leading to additional opportunities for mutations to occur) and its assumed frequent fluoroquinolone exposure, given its existing MDR phenotype. Based on the results of our mapping, we undertook further genetic analysis to identify non-synonymous SNPs unique to the early H58 isolates, as well as SNPs unique to the early H58 isolates that were MDR. The aim was to explain the origin of the long branch illustrated in Figure 1a, to infer why this lineage was so globally successful, and to identify genetic elements that may stabilise an MDR IncH1 plasmid.
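The search for SNPs "unique" to a group of isolates is, at its core, a set comparison. The Python sketch below assumes that "unique to the early H58 isolates" means present in every early H58 isolate (that is, on the shared long branch) and absent from every comparison isolate; the `Snp` record and the function name are hypothetical, and the study's actual criteria may differ.

```python
from __future__ import annotations

from typing import NamedTuple


class Snp(NamedTuple):
    """A variant relative to the reference; fields are illustrative."""
    position: int         # coordinate on the reference genome
    alt: str              # alternative allele
    nonsynonymous: bool   # True if the change alters the encoded amino acid


def unique_nonsynonymous(
    target: dict[str, set[Snp]], comparison: dict[str, set[Snp]]
) -> set[Snp]:
    """Non-synonymous SNPs in every target isolate and in no comparison isolate."""
    if not target:
        raise ValueError("target group must contain at least one isolate")
    # Present in every target isolate, i.e. on the branch leading to the group.
    shared = set.intersection(*(set(snps) for snps in target.values()))
    # Seen in any comparison isolate, so not unique to the target group.
    background: set[Snp] = set().union(*comparison.values())
    return {snp for snp in shared - background if snp.nonsynonymous}
```

Under the same assumption, the MDR-specific comparison would pass the early MDR H58 isolates as `target` and the remaining early H58 plus 4.1/4.2 isolates as `comparison`.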

We identified 16 unique non-synonymous SNPs that were exclusive to the early H58 isolates compared with precursor 4.1 and 4.2 organisms; the majority were in genes associated with central metabolism and outer membrane structures, and one was associated with pathogenicity (Table 1). Within the early H58 isolates that were also MDR, we identified an additional 23 unique non-synonymous SNPs, most of which were in genes encoding proteins predicted to function in metabolism, degradation of small molecules, membrane/surface structures, regulation, pathogenicity/adaptation, and information transfer (Table 2). We additionally identified mutations in a gene (t2518/STY0376) encoding a hypothetical protein with an EAL (diguanylate

The accumulation of mutations signified by the observed long branch is uncommon in S. Typhi and has two feasible explanations. The first is that the progenitor organism was a hypermutator, and that a key mutation in mutS was responsible for generating a large amount of genetic diversity in a short time frame. 42 However, no such informative SNPs were observed in the early H58 isolates, although any such mutations may have reverted. The second, and more likely, explanation is that the organism was in an environment that created an atypical selective pressure, inducing mutations that facilitated its ability to become exposed to, and then accept, an MDR plasmid. Our previous data on S. Typhi carriage in the gallbladder showed that this environment creates an atypical selective pressure and stimulates mutations in metabolism and outer membrane structures. 31,43 This genetic variation was associated with organisms being located on signature long branches; our observations here are comparable. We suggest that H58 S. Typhi became successful owing to its early ability to accept and stabilise a large MDR plasmid, which probably occurred whilst in the gallbladder; this one-off event and onward transmission then created this successful lineage. Therefore, we speculate that gallbladder carriage

(Table S1).

Read alignment and SNP analysis

FastQC and FASTX-Toolkit were used to check the quality of raw reads. 58,59 Six samples were excluded from the analysis: one was determined not to be Salmonella, one appeared to comprise multiple genotypes, and four were located on long branches and were concluded to be contaminated. Paired-end reads for the remaining 464 samples were mapped to the S.
Typhi CT18 reference genome (accession number: AL513382) 60

SNP calls. SNPs that did not meet predefined criteria (a minimum phred quality score of 30 and a minimum depth of coverage of 5) were filtered out. 63 Mapping was considered to have failed when <50% of total reads mapped to the reference genome. Two isolates were excluded from further analysis after mapping failed, owing to a depth of coverage of less than 10 (the RedDog pipeline default). A concatenation of core SNPs (those present in >95% of all genomes) was generated and filtered to exclude all SNPs in phage regions or repetitive sequences of the CT18 reference genome, as defined previously (Table S2). 60

75 To further test the temporal signal, the TipDatingBeast R package was used to randomly reassign the sampling dates of the sequences 20 times, creating date-randomised datasets. BEAST analyses were conducted for these randomised datasets, and the mean rates were compared between runs. The data were considered to have sufficient temporal signal if the 95% credible intervals of the mean rates of the date-randomised datasets did not overlap with that of the original dataset. 76

TipDatingBeast results (analysis incorporated n=345 isolates and an alignment of 724 non-recombinant SNPs), demonstrating no overlap between the original mean rates of mutation and the mean rates from date randomisation.

Tables

Table S1. Organism-level data and metadata for historical UKHSA S. Typhi isolates.
Table S2. Organism-level data and metadata for published H58 and nearest neighbours S. Typhi isolates.
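The SNP-filtering thresholds and the date-randomisation criterion described in the read alignment and temporal signal methods can be sketched in Python. This is a hedged illustration, not RedDog or TipDatingBeast (an R package), and it calls neither; the BEAST runs are not reproduced, so `has_temporal_signal` simply takes the 95% credible intervals as inputs. All function names are hypothetical, and reading "present in >95% of genomes" as a site with a passing call in more than 95% of genomes is an interpretation.

```python
from __future__ import annotations

import random
from collections import Counter

# Thresholds stated in the methods text.
MIN_PHRED = 30             # minimum phred quality score for a SNP call
MIN_DEPTH = 5              # minimum depth of coverage for a SNP call
MIN_MAPPED_FRACTION = 0.5  # mapping "fails" when <50% of reads map
CORE_FRACTION = 0.95       # core sites are called in >95% of genomes


def passes_call_filters(phred: float, depth: int) -> bool:
    return phred >= MIN_PHRED and depth >= MIN_DEPTH


def mapping_failed(mapped_reads: int, total_reads: int) -> bool:
    return mapped_reads / total_reads < MIN_MAPPED_FRACTION


def core_sites(called: dict[str, set[int]], masked: list[tuple[int, int]]) -> list[int]:
    """Sites called in >95% of genomes, outside masked (phage/repeat) intervals."""
    n_genomes = len(called)
    counts = Counter(pos for sites in called.values() for pos in sites)
    return sorted(
        pos
        for pos, n in counts.items()
        if n / n_genomes > CORE_FRACTION
        and not any(start <= pos <= end for start, end in masked)
    )


def randomize_dates(dates: list[float], replicates: int = 20, seed: int = 0) -> list[list[float]]:
    """Permute sampling dates among sequences, once per replicate."""
    rng = random.Random(seed)
    shuffled_sets = []
    for _ in range(replicates):
        shuffled = list(dates)
        rng.shuffle(shuffled)
        shuffled_sets.append(shuffled)
    return shuffled_sets


def has_temporal_signal(
    original_ci: tuple[float, float], randomized_cis: list[tuple[float, float]]
) -> bool:
    """True if the original 95% interval overlaps none of the randomised intervals."""
    lo, hi = original_ci
    return all(r_hi < lo or r_lo > hi for r_lo, r_hi in randomized_cis)
```

Permuting the observed dates, rather than drawing new ones, keeps the overall date distribution fixed while breaking the link between each sequence and its sampling time, which is what the overlap test relies on.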

Supplementary