Genomic epidemiology of a complex, multi-species plasmid-borne blaKPC carbapenemase outbreak in Enterobacterales in the UK, 2009-2014

Carbapenem resistance in Enterobacterales is a public health threat. Klebsiella pneumoniae carbapenemase (encoded by alleles of the blaKPC family) is one of the commonest transmissible carbapenem resistance mechanisms worldwide. The dissemination of blaKPC has historically been associated with distinct K. pneumoniae lineages (clonal group 258 [CG258]), a particular plasmid family (pKpQIL), and a composite transposon (Tn4401). In the UK, blaKPC has caused a large-scale, persistent outbreak focused on hospitals in North-West England. This outbreak has evolved to be polyclonal and poly-species, but the genetic mechanisms underpinning this evolution have not been elucidated in detail. This study used short-read whole genome sequencing of 604 blaKPC-positive isolates (Illumina) and long-read assembly (PacBio)/polishing (Illumina) of 21 isolates for characterisation. We observed the dissemination of blaKPC (predominantly blaKPC-2; 573/604 [95%] isolates) across eight species and more than 100 known sequence types. Although there was some variation at the transposon level (mostly Tn4401a, 584/604 [97%] isolates; predominantly with ATTGA-ATTGA target site duplications, 465/604 [77%] isolates), blaKPC spread appears to have been supported by highly fluid, modular exchange of larger genetic segments amongst plasmid populations dominated by IncFIB (580/604 isolates), IncFII (545/604 isolates) and IncR replicons (252/604 isolates). The subset of reconstructed plasmid sequences also highlighted modular exchange amongst non-blaKPC and blaKPC plasmids, and the common presence of multiple replicons within blaKPC plasmid structures (>60%). The substantial genomic plasticity observed has important implications for our understanding of the epidemiology of transmissible carbapenem resistance in Enterobacterales, for the implementation of adequate surveillance approaches, and for control.
IMPORTANCE Antimicrobial resistance is a major threat to the management of infections, and resistance to carbapenems, one of the “last line” antibiotics available for managing drug-resistant infections, is a significant problem. This study used large-scale whole genome sequencing over a five-year period in the UK to highlight the complexity of genetic structures facilitating the spread of an important carbapenem resistance gene (blaKPC) amongst a number of bacterial species that cause disease in humans. In contrast to a recent pan-European study from 2012-2013(1), which demonstrated the major role of spread of clonal blaKPC-Klebsiella pneumoniae lineages in continental Europe, our study highlights the substantial plasticity in genetic mechanisms underpinning the dissemination of blaKPC. This genetic flux has important implications for: the surveillance of drug resistance (i.e. making surveillance more difficult); detection of outbreaks and tracking hospital transmission; generalizability of surveillance findings over time and for different regions; and for the implementation and evaluation of control interventions.


Antimicrobial resistance (AMR) in Enterobacterales is a critical public health threat.

Carbapenem resistance is of particular concern, and outbreaks involving multiple [...]

In addition to short-read data, to resolve genetic structures fully we obtained long- [...] from other UK locations). These included the two presumed earliest blaKPC isolates from both CMFT and UHSM, as well as isolates sharing the same species/ST but with different plasmid replicon combinations or from North West regional versus national locations, same-species isolates with different STs, and isolates of different species.

[...] short-read). These two assemblies were excluded, leaving 21 assemblies for further analysis (Table S1).

[...] intermediates (21)), and a short linear blaKPC contig (~18kb). We observed blaKPC in multiple plasmid backgrounds (Fig.3), including a majority of [...] between STs and species (Fig.4).

In addition to their plasticity, part of the success of these blaKPC plasmids may also be [...]

[...] short-read data is sub-optimal, we compared all short-read sequences with our [...] were shared across a median (IQR) of 3 (1-6) STs, with pKpQIL-like plasmids being most widespread across species/STs (7 species, 75 STs), and clearly playing a major [...] dataset included those fully resolved by long-read sequencing performed within this study, some of which were seen in ≥5% of study isolates (e.g. pKPC-trace75 [a non-typeable replicon]), and in non-North-West settings, likely reflecting recombination [...]

We present the largest WGS-based analysis of blaKPC-positive isolates (n=604) to our knowledge, focused on assessing genetic diversity around the carbapenemase gene [...] sampling frame from UK regional and national collections, over five years. blaKPC remains one of the three most common carbapenemases observed in the UK, [...]

Our study provides an interesting context in which to consider the findings of a [...] "nearest-neighbours" in their data, the EuSCAPE team found 51% of blaKPC-K. pneumoniae were most closely related to another isolate from the same hospital.
However, instead of clonal expansion, in our study we found rapid dissemination of mobile backgrounds supporting blaKPC-2, similar to observations from sequencing of other polyclonal blaKPC outbreaks reported elsewhere, including the US(6, 27).

Tn4401a, associated with high levels of blaKPC expression(28), has been previously predominantly seen in K. pneumoniae, and in isolates from the US, Israel and Italy, [...] ATTGA motif into CMFT/North-West England and subsequent horizontal spread.

Notably, as in EuSCAPE, 46/72 (64%) singleton isolates we sampled from UK hospitals were also CG258, but our detailed sampling within a region reflected a very different molecular epidemiology. Although the EuSCAPE study is large and impressive, its breadth may have been limiting in understanding regional diversity; for example, the subset of blaKPC-K. pneumoniae from the UK that were analysed in EuSCAPE consisted of 11 isolates submitted from six centres (https://microreact.org/project/EuSCAPE_UK). The focus was also more on analysing species-specific clonal relationships, with no analysis of other species or MGEs.

Although in our study diversification occurred at all genetic levels (Tn4401+TSSs, [...] harbouring resistance genes (Fig.4). This was also shown to be relevant in a previous large-scale analysis of AMR gene outbreaks.

There are several limitations to our study. The reconstructed genomes generated using long-read PacBio data remained incomplete (49% of all contigs uncircularised).

Improvements in long-read technology and assembly approaches will likely overcome this(30). Our short-read and long-read datasets were generated from the same frozen stocks of isolates, but from separate sub-cultures (because we used the short-read data to inform selection for long-read sequencing); ideally they would have been generated [...] selection, and this may have led to short plasmid sequences (<15kb) being lost. Our [...]

In conclusion, our large analysis highlights the difficulty and complexity of these outbreaks once important AMR genes have "escaped" the genetic confines of [...] molecular epidemiology. It also demonstrates that regional differences in AMR gene [...]

Study isolates and setting

We sequenced archived carbapenem-resistant Enterobacterales isolates from two [...] sequenced as part of regional and national surveillance undertaken by Public Health [...]

Ethical approval was not required as only bacterial isolates were sequenced, and their [...]

[...] Genomic tip 100/G kit (Qiagen, Netherlands). DNA extracts were initially sheared to an average length of 15kb using g-tubes, as specified by the manufacturer (Covaris).

Sheared DNA was used in SMRTbell library preparation, as recommended by the [...] the PacBio RSII sequencing system.

[...] were annotated using PROKKA (version 1.11)(40); annotations were used to determine genes known to encode toxin-antitoxin systems, heavy metal resistance, and anti-restriction mechanisms.

[...] representative of the known target site signature sequence indicative of Tn4401 [...]

[...] pairwise similarity between any two plasmid sequences p_i and p_j. The similarity was defined as a function of their lengths l_i, l_j, and the aligned bases l_ij, l_ji as reported by: [...]

The score was designed to penalise differences in length of the compared sequences, i.e. to make sequences of different lengths proportionately more different. The [...] with sparse similarity matrix and uneven cluster size and cluster number(46), and resulted in 34 clusters of 1-43 plasmids per cluster (Table S3). The largest cluster was [...] database used for blaKPC plasmid typing in this study.

Subsequently, blaKPC plasmid typing for each study isolate sequence was performed as follows: (1) assembled sequences for each isolate were BLASTed (BLASTn) against KPC-pDB; (2) any >1kb contig with >90% nucleotide identity and >80% total [...]

[...] figures 1, 2, 5 and S1 were generated using ggplot2 in R (version 1.1.463). Figure 4 was generated using the GenomeDiagram package(47) in Biopython(48).

[...] isolates by microbiology and clinical teams from contributing UK hospitals, and from Martin Cormican and the contributing laboratories in Dublin, Republic of Ireland. We [...] Medical Microbiology consortium (Oxford).

Contemporaneous investigation by CMFT, UHSM and PHE was undertaken as part [...] England. NS is funded by a PHE/University of Oxford Academic Clinical Lectureship. TEAP, DWC and ASW are NIHR Senior Investigators.

[...] plasmids). Plasmids from isolates from the wider UK collection are denoted with a "*".
[...] were re-orientated to start at IncFII for the purposes of alignment visualization (this also includes incomplete sequences, for which the exact structure and order may [...] shown. Shading between sequences denotes regions of homology, with light pink [...] order of sequences is adjusted to highlight genetic overlap between sequences, but not to imply any specific direct exchange events.

[...] and date. Dots are coloured by location of isolate collection, as defined in Methods.

Dots represent estimated copy number for single isolates; boxplots represent median estimated blaKPC copy number +/-1.58*IQR/sqrt(n). For species assignations, "Eclo" = Enterobacter cloacae, "Ecol" = Escherichia coli, "Ente" = Enterobacter spp., [...] pneumoniae, "Raou" = Raoultella ornithinolytica, "Serr" = Serratia marcescens.

[...] read (PacBio) assemblies and reconstructed plasmid structures.
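
The interval quoted in the copy-number figure legend (median +/-1.58*IQR/sqrt(n)) can be illustrated with a minimal sketch. This is not the authors' code: the function name `notch_interval`, the choice of quartile convention (Python's "inclusive" method), and the example copy-number values are all assumptions made for illustration, and plotting software may compute quartiles slightly differently.

```python
import math
import statistics


def notch_interval(values):
    """Return (lower, median, upper) for median +/- 1.58 * IQR / sqrt(n).

    Assumption: quartiles use the 'inclusive' interpolation method of
    statistics.quantiles; other tools may use a different convention.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate an IQR")
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med = statistics.median(values)
    half_width = 1.58 * (q3 - q1) / math.sqrt(n)
    return med - half_width, med, med + half_width


if __name__ == "__main__":
    # Hypothetical per-isolate copy-number estimates (not study data).
    copy_numbers = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0]
    lower, med, upper = notch_interval(copy_numbers)
    print(f"median={med:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

Because the half-width scales with 1/sqrt(n), groups with few isolates give wide intervals, which is worth bearing in mind when comparing species or locations with very different sample sizes.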