Genomic surveillance framework and global population structure for Klebsiella pneumoniae

K. pneumoniae is a leading cause of antimicrobial-resistant (AMR) healthcare-associated infections, neonatal sepsis and community-acquired liver abscess, and is associated with chronic intestinal diseases. Its diversity and complex population structure pose challenges for analysis and interpretation of K. pneumoniae genome data. Here we introduce Kleborate, a tool for analysing genomes of K. pneumoniae and its associated species complex, which consolidates interrogation of key features of proven clinical importance. Kleborate provides a framework to support genomic surveillance and epidemiology in research, clinical and public health settings. To demonstrate its utility we apply Kleborate to analyse publicly available Klebsiella genomes, including clinical isolates from a pan-European study of carbapenemase-producing Klebsiella, highlighting global trends in AMR and virulence as examples of what could be achieved by applying this genomic framework within more systematic genomic surveillance efforts. We also demonstrate the application of Kleborate to detect and type K. pneumoniae from gut metagenomes.


Virulence and AMR scores 183
Genomes are scored according to the clinical risk associated with the AMR and 184 virulence loci that are detected (see Methods). Here we take advantage of the 185 structured distribution of AMR and virulence determinants within the K. pneumoniae 186 population 14 to reduce the genotyping data to simple numerical summary scores that 187 reflect the accumulation of loci contributing to clinically relevant AMR or 188 hypervirulence: virulence scores range from 0 to 5, depending on the presence of key 189 loci associated with increasing risk (yersiniabactin < colibactin < aerobactin); 190 resistance scores range from 0 to 3, based on detection of genotypes warranting 191 escalation of antimicrobial therapy (ESBL < carbapenemase < carbapenemase plus 192 colistin resistance, see Table 1). These simple numerical scores facilitate downstream 193 analyses including trend detection. For example, analysis of a non-redundant subset of 194 9,705 publicly available K. pneumoniae genomes (see below, Table S2) showed 195 increasing AMR and virulence scores over time (barplots in Figure 1A-B). The 196 virulence and resistance scores were correlated not only with the prevalence of 197 individual components that contribute to the scores, but also with other components 198 that are co-distributed in the population (lines in Figure 1A-B). For example, the 199 AMR, virulence or convergence of both traits; such as specific K. pneumoniae 208 lineages or specimen types (see below). 209 210 Rapid genotyping of clinical isolates from a large-scale surveillance study 211 We applied Kleborate to analyse all K. pneumoniae clinical isolate genomes deposited 212 in RefSeq by the EuSCAPE surveillance study (927 carbapenem-non-susceptible, 697 213 carbapenem-susceptible; see Table S2) 33 . Kleborate rapidly and accurately 214 reproduced key findings from the original study, which were originally derived from 215 multi-step analyses comprising five independent tools and four independent databases 216 (each from a different public repository, one with additional manual curation): (i) 217 70.2% of carbapenem-non-susceptible genomes (n=651/927) carried carbapenemases, 218 mainly KPC-3, OXA-48, KPC-2 and NDM-1; (ii) these were dominated by a few 219 major clones, ST11, ST15, ST45, ST101, ST258, and ST512; (iii) individual countries 220 were associated with specific carbapenemase/clone combinations (see Figure 2A) location and year of isolation (see Methods). However, we cannot fully correct for 296 the sampling biases inherent in the public genome data and even after subsampling, 297 the 30 most common STs accounted for 63.4% of genomes (n³50 genomes each, 298 n=6,151 total; see Figure S4). Figure 5 shows the distribution of AMR and virulence 299 scores amongst non-redundant genomes from these 30 common K. pneumoniae STs 300 (n>50 per ST), each of which displays high rates of AMR and/or virulence. 301 302 AMR determinants 303 SHV β-lactamases conferring intrinsic resistance to the penicillins were detected in 304 85.9% of the 9,705 non-redundant K. pneumoniae genomes (ESBL forms of SHV 305 were detected in 10.0%). Acquired AMR was widespread (77.1% of genomes had at 306 least one gene or mutation conferring acquired AMR detected) and 71.6% of genomes 307 were predicted to be MDR (acquired resistance to ≥3 drug classes 48 ), a much higher 308 rate than is reported in most geographical regions 3,49-51 , reflecting the bias within 309 public genome collections. The majority of genomes had a non-zero resistance score, reflecting presence of ESBL and/or carbapenemase genes: 22.3%, 37.1% and 5.9% 311 genomes had resistance scores of 1, 2 and 3 respectively. Mean resistance scores 312 increased through time ( Figure 1B). This trend could be an artefact of sampling bias 313 towards the selective sequencing of AMR isolates, however it is consistent with the 314 increasing AMR rates reported in surveillance studies globally 52-54 . 315 316 Comparatively higher prevalence of acquired AMR genes was observed in some STs 317 ( Figure S4) Figure S5A-B), highlighting their mobile nature. The 323 notable exception was CTX-M-65, which appeared to be largely clone specific, 324 detected in only 9 STs and ST11 accounting for 96.7% of these genomes. 325

326
Colistin resistance determinants were detected in 8.7% of the non-redundant K. 327 pneumoniae genomes. These were mostly nonsense mutations in MgrB or PmrB 328 (83.5%) rather than acquisition of an mcr gene (15.8%, and an additional 6 genomes 329 with both acquired mcr and truncated MgrB/PmrB). The rate of detection ranged from 330 0-25.2% for the 30 most common STs, and was highest amongst ST512, ST437, 331 ST147, ST16 and ST258 (Figure S5C), each of which are also associated with high 332 rates of carbapenem-resistance. Porin mutations were detected in 37.9% of genomes 333 (34.0% OmpK35, 20.2% OmpK36, 16.3% both). High prevalence of specific porin 334 defects have been reported previously in some clones 41,42 , and this was reflected in 335 our analysis of ST258 and its derivative ST512. We observed OmpK35 truncations in 99.9% of non-redundant ST258 genomes (with or without truncations or substitutions 337 in OmpK36), and truncations in OmpK35 and/or OmpK36 in all ST512 (99.4% with 338 OmpK35 truncations, 94.4% with the OmpK36GD mutation, see Figure S5D). 339 340

Virulence loci 341
The prevalence of acquired siderophores and colibactin loci amongst non-redundant 342 K. pneumoniae genomes was 44.4% ybt, 7.5% clb, 11.2% iuc and 7.0% iro. The loci 343 were found across diverse K. pneumoniae STs (391 STs with ybt, 56 with clb, 144 344 with iuc, 108 with iro) but were rarely detected in other Klebsiella species (with the 345 exception of ybt among the K. oxytoca species complex, see Figure 4) indicating 346 frequent mobilisation within K. pneumoniae but not between species (Table S6, 347 Figure S6). Mean virulence scores increased through time ( Figure 1A). Figure 5B (Table S6). Figure S7A shows the frequency of iuc lineages 360 in K. pneumoniae STs with ≥20 non-redundant genomes and at least one genome 361 harbouring iuc. There were four STs for which >60% genomes harboured iuc, and only a single iuc lineage was detected in each (iuc1 in ST23, ST65, ST86; iuc2A in 363 ST82), consistent with long-term persistence of a specific virulence plasmid in these 364 well-known hypervirulent clones. In contrast, iuc was less frequent among other STs, 365 several of which were associated with multiple iuc lineages (e.g. ST231, ST25, 366 ST35), consistent with more recent and/or transient virulence plasmid acquisitions 367 (mostly iuc1, followed by iuc3 and iuc5).  Table S8). 416

417
The most common virulence plasmid, KpVP-1 (iuc1 ± iro1), accounted for 54% of 418 virulence plasmid acquisition events (n=45 acquisitions), while iuc3 plasmids, the E. 419 coli derived iuc5 (±iro5) and iuc/iro unknown (i.e. novel or divergent iuc/iro loci) 420 accounted for 21%, 11% and 14%, respectively (Figure 7). AMR acquisitions by 421 hypervirulent clones involved the ESBL/carbapenemase genes that are most common 422 in the general K. pneumoniae population: KPC-2 (26%), OXA-232 (17%) and CTX-423 M-15 (18%). The majority of convergence events (87%) were associated with just a 424 small number of genomes (i.e. n≤3); however, five events were associated with >20 425 genomes in the complete dataset, which may indicate clonal expansion and 426 dissemination of the corresponding convergent strains locally and/or between 427 countries. One such event corresponded to the ST11-KPC + KpVP-1 deletion variant 428 strain that was originally reported in 2017 20 and has since been recognized as widely 429 distributed in China 20-24 . The complete public genome set (i.e. counting redundant 430 genomes) included 148 genomes corresponding to this specific ST11 convergence 431 event mostly from China but also from France (n=2). Notably though, this was only 432 one of 50 convergence events that we detected in China, including 8 involving 433 acquisition of iuc1 or iuc5 by ST11 (see Table S8, and interactive tree at Overall, convergent genomes were detected originating from most geographical 443 regions for which genome data was available, but some regions had many more 444 events than others (Figure 7, Table S8). This uneven distribution may stem from a 445 skew in the number of genomes available per region (e.g. due to variation in 446 accessibility or application of genome sequencing). Nevertheless, the number of 447 convergent genomes in the eastern, southeastern and southern parts of Asia were 448 noticeably high, driven by the frequency of convergence events detected in China 449 (n=50 events) and Thailand (n=26 events) as well as putative clonal expansions of 450 these strains as discussed above (Figure 7). Of  Another strength of our approach is the rich data output by Kleborate, which 507 facilitates in-depth investigation of population structure, AMR and virulence 508 epidemiology. This allows rapid exploration and understanding of: (i) hypervirulence-509 associated loci and the molecular drivers of their dissemination (Figures S4 and S7); 510 (ii) molecular mechanisms of complex AMR phenotypes e.g. carbapenem resistance 511 ( Figure 3); (iii) AMR and virulence trends (Figures 1, 5 and 6); (iv) emerging 512 convergent AMR-virulent strains so that they can be targeted for surveillance and 513 infection control (Figure 7); (v) overrepresented STs and genotypes, which may be 514 indicative of transmission clusters that should be targeted for further investigation (as antigen epidemiology, which can inform the design of novel vaccines and 517 therapeutics (Figure 2B-C). Notably Kleborate can also yield useful genotyping 518 results from metagenomics data (Figure S8), which is gradually being adopted for 519 clinical and surveillance applications relevant to K. pneumoniae. User interpretation 520 of Kleborate's extensive data output can be guided by the accompanying web-based 521 visualization app, Kleborate-Viz. Through this app, many of the analyses and plots 522 presented in this manuscript can be rapidly replicated, and further explored in an 523 interactive manner. 524

525
Kleborate is designed to facilitate detection and tracking of clinically relevant AMR 526 and virulence traits from genome data, and analysis of public data not only identified 527 specific clones and genes associated with one or the other of these traits (Figures 5,  528   6), but also 601 genomes in which the two converge (carrying iuc+ virulence 529 plasmids and ESBL and/or carbapenemase genes; Figure 7). We estimated at least 530 173 unique AMR-hypervirulence convergence events; the majority were detected 531 within a single isolate (n=119 events), but many others appear to be associated with 532 local outbreaks or larger-scale spread and apparently across multiple countries (Table  533   S8 (Table S2).  Table S8, assignment of genomes to events in Table S2). Circles are scaled by the number of total genomes linked to the event and colored to indicate whether convergence is