A global Corynebacterium diphtheriae genomic framework sheds light on current diphtheria reemergence

Background Diphtheria, caused by Corynebacterium diphtheriae, reemerges in Europe since 2022. Genomic sequencing can inform on transmission routes and genotypes of concern, but currently, no standard approach exists to detect clinically important genomic features and to interpret emergence in the global C. diphtheriae population framework. Methods We developed the bioinformatics pipeline DIPHTOSCAN (available at https://gitlab.pasteur.fr/BEBP/diphtoscan) to extract from genomes of Corynebacteria of the diphtheriae species complex, medically relevant features including tox gene presence and disruption. We analyzed 101 human C. diphtheriae isolates collected in 2022 in metropolitan and overseas France (France-2022). To define the population background of this emergence, we sequenced 379 additional isolates (mainly from France, 2018-2021) and collated 870 publicly-available genomes. Results The France-2022 isolates comprised 45 tox-positive (44 toxigenic) isolates, mostly imported, belonging to 10 sublineages (<500 distinct core genes). The global dataset comprised 245 sublineages and 33.9% tox- positive genomes, with DIPHTOSCAN predicting non-toxigenicity in 16.0% of these. 12% of the global isolates, and 43.6% of France-2022 ones, were multidrug resistant. Convergence of toxigenicity with penicillin and erythromycin resistance was observed in 2 isolates from France-2022. Phylogenetic lineages Gravis and Mitis contrasted strikingly in their pathogenicity-associated genes. Conclusions This work provides a bioinformatics tool and global population framework to analyze C. diphtheriae genomes, revealing important heterogeneities in virulence and resistance features. Emerging genotypes combining toxigenicity and first-line antimicrobial resistance represent novel threats. Genomic epidemiology studies of C. diphtheriae should be intensified globally to improve understanding of reemergence and spatial spread.


82
Although early 1930s literature suggested a higher virulence of isolates of biovar Gravis (McLeod, 1943; 83 Barksdale, 1970), it is unknown whether this historical observation applies to extant diphtheria cases, as 84 recent Gravis isolates are more rarely tox-positive than those of biovar Mitis (Hennart et (Bolt et al., 2010), and its core-genome MLST (cgMLST) extension and 97 associated nomenclature of sublineages and genomic clusters (Guglielmini et al., 2021).

98
Here, we aimed to provide insights into the France 2022 diphtheria emergence by reporting on its 99 epidemiology and by placing the involved isolates in the global genomic context of C. diphtheriae 100 populations. We introduce DIPHTOSCAN

120
In addition, a total of 1,249 comparative genomes were included (Table S1) Table S3.

155
The presence of the diphtheria toxin tox gene was determined by real-time PCR assay (Badell et   Pasteur database was used for the Corynebacterium cgMLST typing (Guglielmini et al., 2021). To facilitate 174 the use of these resources and avoid redundancy in the curation of the two independent genomic libraries, 175 a merging of the databases was decided in agreement with PubMLST administrators. In order to merge the 176 data available in the two databases, we proceeded as per BIGSdb dual design: isolates genomes and 177 provenance data were imported into the "isolates" database, whereas allelic definitions of MLST were 178 imported into the "seqdef" database.

179
Regarding the isolates database, we first downloaded Oxford's PubMLST C. diphtheriae database.

180
To avoid isolate entries duplication, we identified common isolates between the two databases, and filtered 181 duplicate isolates before import into the Pasteur database. In total, 684 out of 934 (73%) isolates from the 182 Oxford database were imported. To facilitate the tracing of isolates and their possible previous existence in 183 Oxford's database, isolates identification numbers (BIGSdb-Pasteur ID number) of isolates from the Oxford 184 database were numbered from 1,520 to 2,003. We also collated them into a public project collection called 185 "Oxford" (project ID 13).

186
Regarding the sequence and profiles definition database, we imported MLST alleles and profiles 187 into an initially void MLST scheme container within the BIGSdb-Pasteur database. MLST analysis was 188 performed on all isolates of the BIGSdb-Pasteur database, including the ones imported from Oxford, which 189 were therefore assigned the same MLST genotype as previously in the Oxford database.

207
Genomes were classified using the single-linkage cluster-profile.pl function of BIGSdb into genomic 208 clusters (25 mismatch threshold) and sublineages (500 mismatches). Sublineages were attributed numbers 209 7 by using an ST inheritance rule (Hennart et al., 2022), which was applied from SL1 to SL744, after which the 210 numbers are attributed consecutively with no reference to MLST identifiers, starting at 10,000 (see column 211 'SL' in Table S1).

213
Phylogenetic analysis based on a core genome 214 Panaroo v1.2.3 was used to generate from the assembled genomic sequences, a core genome used 215 to construct a multiple sequence alignment (cg-MSA). The genome sequences were first annotated using 216 prokka v1.14.5 with default parameters, resulting in GFF files. Protein-coding gene clusters were defined 217 with a threshold of 70% amino acid identity, and core genes were concatenated into a cg-MSA when present 218 in 95% of genomes. IQtree version 2 was used to build a phylogenetic tree based on the cg-MSA, with the 219 best fitting model TVM+F+R5. The tree was constructed from 1,948 core genome loci, for a total alignment

377
Besides the above four frequent sublineages, six additional tox-positive sublineages were isolated: 378 three isolates of sublineage SL486 associated with Senegal and Tunisia; two SL852 isolates associated with 12 Mali; and one SL466 isolate associated with travel from Afghanistan and one SL464 isolate associated with 380 Thailand. SL91 comprised one non-toxigenic, tox-gene bearing (NTTB) isolate, and SL830 comprised 2 381 isolates: one tox-positive and one tox-negative.

382
Besides, there were 31 tox-negative sublineages, which were typically isolated once or twice only; a 383 notable exception was SL297, which comprised six tox-negative isolates associated with travel from Egypt, 384 Senegal, and Mali (Figure 1).  (Figure 3). Sublineages appeared to be grouped according to biovars Gravis 420 (and its spuA marker gene) and Mitis as previously noted (Hennart et al., 2020), as they formed two main 421 lineages named Gravis (green branches) and Mitis (purple), defined by the presence of the spuA gene (Table   422   S1). cgMLST-defined sublineages were highly concordant with the phylogeny and often comprised more 423 than one 7-gene ST (Figure 3; Table S1). The frequent tox-positive sublineages SL377 and SL384 were 424 phylogenetically related within lineage Gravis (Figure 3), suggesting they share ancestrally-acquired genetic 425 features.

426
We placed within this population background, the France-2022 isolates (Figure S6), which appeared to 427 be dispersed in multiple branches of the global phylogeny. The isolates previously collected by the French 428 reference laboratory appeared even more diverse and largely dispersed across the global phylogenetic 429 diversity of C. diphtheriae (Figure S6), indicating that a large fraction of the global diversity has been 430 sampled by the French surveillance system.

431
Ribotyping was previously used as a classification and nomenclature system of C. diphtheriae strains 432 (Grimont et al., 2004;Mokrousov, 2009). The 71 ribotype reference strains sequenced herein or previously 433 (Hennart et al., 2020) were placed in the global phylogeny (Figure S7), showing that these strains are highly Elek-positives, 195 (98.5%) were predicted to be toxigenic by 466 DIPHTOSCAN, whereas 1 was predicted to be non-toxigenic and for two isolates the tox gene was not 467 detected. Of the Elek-negative isolates, 11 (50.0%) were predicted as non-toxigenic by DIPHTOSCAN. Thus, 468 tox detection by DIPHTOSCAN was both sensitive and specific, whereas toxigenicity prediction was highly 469 sensitive but not highly specific, likely due to unexplained non-toxigenicity in isolates with a full-length toxin 470 gene. However, we did not observe non-toxigenicity-associated variation in the promoter region of the tox 471 gene, nor on the DtxR protein sequence.   Table S1).

512
Antimicrobial resistance genes were dispersed across the global C. diphtheriae phylogenetic tree 513 (Figure 3). The distribution of resistance at the sublineage level showed that just above half of the 514 sublineages (128; 52.0%) comprised at least one strain with at least one resistance genomic feature (Table   515   S1). The two sublineages with the most resistant strains were SL8 (the main sublineage involved in the ex-516 USSR outbreak; 46 strains) and SL377 (17 strains) (Figure 2). 19 sublineages carried at least one multidrug 517 resistant isolate, and SL377 and SL405 were the most frequent of these (Figure 2).

531
Antimicrobial susceptibility phenotypes were determined for the France-2022 dataset, and were 532 highly concordant with the presence of resistance features (Table S4). Resistance to penicillin and 533 macrolides was associated with pbp2m and ermX, respectively, although some ermX-carrying isolates 534 remained susceptible to erythromycin (Table S4).

535
We included in DIPHTOSCAN a search for integrons, which may harbor multiple resistance genes in 536 C. diphtheriae (Barraud et al., 2011;Arcari et al., 2023). In the global dataset, we identified 45 (4.6%) isolates 537 carrying integrons (including integrase-less ones, i.e., CALINs) (Table S1), which were highly dispersed in the 538 phylogeny (not shown). In France-2022, we found the presence of complete integrons in 9 isolates and 539 integrase-less integrons in 9 additional isolates (18; 17.8%). These structures were strongly associated with 540 antimicrobial resistance, particularly to trimethoprim and sulfonamides (Figure 1; Table S1). The presence within the same isolates of multidrug resistance and toxigenicity could cause 551 particularly threatening infections. We therefore explored the co-occurrence of these two genotypes 552 (Figure 2). In the global dataset, 57 (5.8%) isolates were both multidrug resistant and tox-positive. The 553 majority of these isolates belonged to a few sublineages (Figure 2), including SL377, which comprised 9 tox-554 positive multidrug resistant isolates mostly from India (and also observed in France-2022). Eight convergent 555 isolates of SL301 were also observed from India, Austria and Syria. SL453 had three tox-positive multidrug 556 resistant isolates, which were isolated in Spain and France with links to Afghanistan (Arcari et al., 2023). In 557 metropolitan France, there were 22 tox-positive isolates that were multidrug resistant (21.8%), with SL377 558 and SL696 being predominant among these (Table S1, Figure 1).

559
Regarding resistance genes to first-line treatments, there was not a single isolate carrying at the 560 same time tox, pbp2m and ermX in the global dataset (Table S1). However, in France-2022, SL852 isolates 561 (from two patients with travel history from Mali) were tox-positive and carried pbp2m and ermX.

562
Furthermore, they carried other resistance genes including cmx, sul1, dfrA1, and in addition tet33 and 563 aadA15 for isolate FRC1688. This latter isolate only lacked resistance features to quinolones and rifampicin.

564
No other isolate of this particularly concerning sublineage (SL852) was found in the global dataset.

567
Biovars represent an early attempt to discriminate among C. diphtheriae strains (Anderson et al., 1931) 568 and are still commonly reported. We found that lineages Mitis and Gravis, defined genetically based on the 569 19 presence of the spuA gene probably involved in starch utilization, correspond to two distinct parts of the 570 phylogenetic tree (Figure 3) as previously reported (Hennart et al., 2020;Guglielmini et al., 2021). Note that 571 the match between lineage and spuA or biovar phenotype is not absolute, as a few isolates within the Gravis 572 branch were spuA-negative (in particular SL625, SL130, SL102, and SL377) and 42 (5.1%) isolates of the Mitis 573 lineage were spuA-positive. Among the France-2022 isolates, for which biovars were in addition determined 574 phenotypically, the two biovars were also phylogenetically distinct (Figure 1) Table S2 for 580 pathogenesis involvement evidence). These include genes involved in iron and heme acquisition, fimbriae 581 biosynthesis and assembly, and other adhesins (Ott et al., 2022).

582
Screening for these genes in the global dataset revealed highly heterogeneous patterns of presence 583 and phylogenetic distribution (Table S1; Figure S10). We found that a number of virulence factors are highly 584 conserved within C. diphtheriae; for example, DIP1546 was present in all genomes except in 28_DSM43988, 585 and DIP0733, DIP1281, DIP1621, and DIP1880 were fully conserved ( Table S1). The corynebactin transport 586 (ciuA-D) gene cluster was present in all genomes, with one exception, whereas the corynebactin synthesis 587 (ciuEFG) locus was absent or incomplete in only 5.4% of genomes (n=29 Mitis, n=25 Gravis); of these, 33 588 lacked the ciuE gene, which is essential for siderophore synthesis. One of the genomes lacking ciuE 589 corresponds to the vaccine strain PW8, which is defective for corynebactin synthesis (Russell & Holmes, 590 1985). The heme-acquisition genes hmuTUV were also largely conserved (921 genomes; 94.4%).

591
In contrast, some genes were infrequent: DIP2014, a gene encoding for a BigA-like adhesin, was 592 detected in only a few sublineages of the Gravis branch (133 isolates), and the DIP0543 (also known as 593 nanH, coding for a sialidase) was present in only a few sublineages distributed across the phylogeny (not 594 shown).

595
Remarkably, we uncovered a sharp divide between lineages Gravis and Mitis in terms of iron 596 metabolism-associated genes, fimbriae gene clusters and other genes ( Figure S10). The putative 597 siderophore synthesis and transport operon irp2ABCDEFI-irp2JKLMN was strongly associated with the Mitis 598 lineage: 513 out of 567 (90.5%) Mitis isolates were irp2-positive, whereas only 1 of 406 Gravis isolates was 599 irp2-positive. The iron transport cluster irp1ABCD was also mainly present in the Mitis lineage. Differently, 600 the htaA gene, which is part of the same gene cluster as hmuTUV and codes for a membrane protein that 601 binds hemoglobin, was absent or truncated in most genomes from the Mitis branch (92.1%), whereas it was 602 largely conserved in the Gravis branch (99.8% htaA-positive). Similar to htaA, genes chtA and chtB, which 603 have sequence and functional similarity to htaA and htaB, were also strongly associated with the Gravis

639
However, so far, our understanding of diphtheria reemergence has been hindered by a lack of background 640 knowledge on the population diversity of C. diphtheriae, its sublineages of concern and the epidemiology 641 of their local or global dissemination. Here, we report on a sharp increase in tox-positive C. diphtheriae in 642 France in 2022, and developed a bioinformatics pipeline, DIPHTOSCAN, which enables to harmonize the way 643 genomic diversity and genetic features of medical concern are detected, reported and interpreted. We 644 illustrate how this novel tool provides clinically-relevant genomic profiling and evolutionary understanding 645