Genomic and antigenic diversity of carried Klebsiella pneumoniae isolates mirrors that of invasive isolates in Blantyre, Malawi

Abstract
Klebsiella pneumoniae is an antimicrobial resistance (AMR)-associated pathogen of global importance, and polyvalent vaccines targeting K. pneumoniae O-antigens are in development. Genomes from sub-Saharan Africa (sSA) are underrepresented in global sequencing efforts. We therefore carried out a genomic analysis of extended-spectrum beta-lactamase (ESBL)-producing K. pneumoniae complex isolates colonising adults in Blantyre, Malawi, placed these isolates in a global genomic context, and compared colonising to invasive isolates from the main public hospital in Blantyre. A total of 203 isolates from stool and rectal swabs from adults were whole-genome sequenced and compared to a publicly available multicountry collection of 484 K. pneumoniae genomes sampled to cover maximum diversity of the species, together with 150 previously sequenced Malawian and 66 Kenyan isolates from blood or sterile sites. We inferred phylogenetic relationships and analysed the diversity of genetic loci linked to AMR, virulence, capsule (K-) and LPS O-antigen (O-types). We find that the diversity of Malawian Klebsiella isolates is representative of the species' population structure, but with local success and expansion of sequence types (STs) ST14, ST15, ST340 and ST307. Siderophore and hypermucoidy genes were more frequent in invasive than in carriage isolates (present in 13% vs 1%, p < 0.001) but were still lacking in most invasive isolates. The population structure and distribution of O-antigen types were similar in Malawian invasive and carriage isolates, with O4 being more common in Malawian isolates (14%) than in previously published studies (2-5%). We conclude that host factors, pathogen opportunity or alternate virulence loci not linked to invasive disease elsewhere are likely to be the major determinants of invasive disease in Malawi. Distinct ST and O-type distributions in Malawi highlight the need for geographically aware sampling to robustly define secular trends in Klebsiella diversity. Colonising and invasive isolates in Blantyre are similar and hence O-typing of colonising Klebsiella isolates may be a rapid and cost-effective approach to describe global diversity and guide vaccine development.

Data Summary
All data and code to replicate this analysis are available as the blantyreESBL v1.0.0 R package (https://doi.org/10.5281/zenodo.5554082), hosted at https://github.com/joelewis101/blantyreESBL. Reads from all isolates sequenced as part of this study have been deposited in the European Nucleotide Archive, and accession numbers (as well as accession numbers of publicly available genomes used in this analysis) are provided in the R package.

Introduction
Klebsiella pneumoniae is a highly prevalent human gut coloniser [1] and opportunistic pathogen [2] which is often associated with antimicrobial resistance (AMR) and has been identified by the World Health Organisation as a global priority AMR pathogen [3]. In low- and middle-income countries (LMICs) such as the nations of sub-Saharan Africa (sSA), AMR K. pneumoniae presents a significant therapeutic challenge. Malawi is a low-income country in South-East Africa, where 91% of K. pneumoniae infections are now resistant to third-generation cephalosporins (3GC) [4], largely mediated through production of extended-spectrum beta-lactamases (ESBLs). In this and many other LMICs, 3GC are first-line antimicrobials for severe febrile illness and alternatives with activity against ESBL-producers are often unavailable, rendering ESBL K. pneumoniae infections de facto untreatable with locally available antimicrobials.
Whole genome sequencing (WGS) has provided significant insight into the population structure of K. pneumoniae, which we now understand is a species complex encompassing several subspecies [5]. Despite this diversity, WGS has shown that the global spread of AMR is linked to clonal expansion of AMR-associated high-risk clones [2], and genomic loci associated with virulence [6] (including the hypermucoid phenotype [7]) have been identified. Historically, antimicrobial resistance and virulence were associated with different K. pneumoniae populations, but convergence of AMR genes in hypervirulent lineages is increasingly described, especially in South and South-East Asia [8], resulting in community-acquired, widely disseminated or deep-seated infections in otherwise healthy individuals that are difficult to treat [9].
In response, K. pneumoniae vaccines are in development [10], targeting LPS O-antigens, which can be predicted via a sequence-based typing scheme [11,12]. Analyses of large-scale genome collections have provided important insights into their distribution and diversity across the species complex, which is essential to focus efforts on the clinically most relevant types [11,13]. However, whilst an initial 'global' collection [5] represented a milestone in K. pneumoniae genomics, was designed to cover the species' diversity in a multi-country effort, and provided our first insight into its genomic plasticity, it is restricted to isolates from 12 countries and notably lacks any sSA representatives. Follow-up studies in recent years have also often focused on clinical studies in high-income countries (HICs) [14,15], and genomes from sSA are drastically under-represented in genome datasets. There is an urgent need to investigate the genomic epidemiology of K. pneumoniae in this setting to assess whether conclusions from largely HIC collections are valid for LMICs, a crucial requirement for a vaccine to be effective in these settings where, arguably, it may have the most benefit.
In addition, though colonisation with K. pneumoniae is thought to precede infection in many cases [1], sequencing efforts have largely focused on invasive isolates. There is some evidence from elsewhere in the world that colonising and invasive isolates differ [2]; understanding this difference in sSA could help to define the determinants of infection in this setting. We therefore present the results of a genomic analysis of K. pneumoniae from a study of colonisation with ESBL Enterobacterales in Blantyre, Malawi, with three aims: to describe the population structure, serotype diversity, AMR and virulence determinants of colonising K. pneumoniae in this setting; to compare colonising to previously sequenced Malawian invasive isolates; and to relate these data to observations made in other parts of the world.

Methods
The isolates analysed in this study were colonising isolates selectively cultured from stool and/or rectal swabs collected from adults in Blantyre, Malawi, as part of a study of longitudinal carriage of ESBL-producing Enterobacterales, as previously [...]

ARIBA v.2.14.6 [27] was used to identify AMR-associated genes using the SRST2 curated version of the ARG-ANNOT database [28] and to call SNPs in the quinolone-resistance determining regions (QRDR) gyrA, gyrB, parC and parE, using the wild-type genes from Escherichia coli K-12 substr. MG1655 (NC_000913.3) as reference. Quinolone resistance was assumed to be conferred by QRDR mutations recorded in the Comprehensive Antibiotic Resistance Database (CARD) [29] as causing quinolone resistance in Enterobacterales. Beta-lactamases were considered to be extended spectrum based on the phenotypic classifications at https://ftp.ncbi.nlm.nih.gov/pathogen/betalactamases/Allele.tab. We explored clustering of AMR genes using hierarchically clustered heatmaps of Jaccard distances of AMR gene presence, computed with the base dist and hclust functions in R and visualised with the pheatmap package v1.0.2. ARIBA was also used to determine multilocus sequence type (ST) as defined by the 7-gene scheme [30] [...]

[...] which were used to infer a phylogeny for Malawian isolates as described above, using the same nucleotide substitution model. To build a global phylogeny we included all Malawian isolates plus 66 genomes from Kenya [34] plus 288 genomes from the multi-country collection [5], again using the same methods. In total, following QC, this analysis included 687 genomes; the pan-genome as constructed with Roary comprised 49,385 genes, of which 2754 were core; these formed a concatenated pseudosequence of 0.95 Mb with 200,622 variable sites, which were used to infer a phylogeny as above; ST and ESBL presence or absence were also inferred as [...] Figure 1).
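The Jaccard distance used in the AMR gene clustering step described above can be illustrated with a short sketch. The analysis itself was done in R with dist and hclust; the Python below is only an analogue of the distance calculation, not the authors' code. The gene names are taken from the text, but the isolate identifiers, the gene memberships and the helper names (jaccard_distance, distance_matrix) are invented for illustration.

```python
from itertools import combinations

# Hypothetical presence/absence data: gene -> set of isolates carrying it.
# Gene names appear in the text; isolate IDs and memberships are invented.
gene_presence = {
    "strA": {"iso1", "iso2", "iso3"},
    "strB": {"iso1", "iso2", "iso3"},
    "blaCTX-M-15": {"iso1", "iso2", "iso4"},
    "qnrS": {"iso5"},
}


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|; defined here as 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(presence):
    """Build a symmetric gene-by-gene Jaccard distance lookup."""
    genes = sorted(presence)
    dist = {(g, g): 0.0 for g in genes}
    for g1, g2 in combinations(genes, 2):
        d = jaccard_distance(presence[g1], presence[g2])
        dist[(g1, g2)] = dist[(g2, g1)] = d
    return genes, dist


genes, dist = distance_matrix(gene_presence)

# The pair with the smallest distance is the pair an agglomerative
# (hierarchical) clustering would merge first.
closest = min(combinations(genes, 2), key=lambda pair: dist[pair])
print(closest, dist[closest])
```

Genes that always occur in the same isolates have a distance of 0 and are merged first by a hierarchical clustering, while genes that never co-occur have a distance of 1. A full distance matrix of this kind is what a clustered heatmap would then order and display.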
O-types were more likely to be encoded by multiple STs than K-types: each O-type was associated with a median of six (IQR 2.5-11.8) STs.

[...] and qnrS (n=40) were also identified. No known genes conferring carbapenem resistance were identified. Some AMR genes clustered together (e.g. strA with strB, blaCTX-M-15 with blaTEM-1 and sulII, and blaSHV-11 with aadA1-pm and blaOXA-10, Supplementary Figure 2), and some of these AMR-gene clusters were lineage-associated (e.g. the blaSHV-11 aadA1-pm blaOXA-10 cluster with ST340, Supplementary Figure 3).

[...] Tables 3 and 4). Both of these K-types were strongly associated with invasive isolates (9/10 KL62 and 9/9 KL43 were invasive) and were found in multiple STs: KL43 was present in ST372 (6 isolates), ST106 (2 isolates) and ST276 (1 isolate), and KL62 was present in ST644 (4 isolates), ST48 (3 isolates), ST348 (2 isolates), and ST4 and ST432 (one isolate each). All of these isolates contained O-type 1 or 2.
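The summary of STs per O-type reported above (a median with an interquartile range) can be reproduced in outline as follows. This is a minimal sketch on invented data, not the study's isolates or code: the (O-type, ST) pairs are hypothetical, and the choice of the "inclusive" quantile method is an assumption, made because it interpolates between data points in the same way as the default of R's quantile function.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical (O-type, ST) pairs, one per isolate; all values are invented.
isolates = [
    ("O1", "ST14"), ("O1", "ST15"), ("O1", "ST307"), ("O1", "ST14"),
    ("O2", "ST340"), ("O2", "ST15"),
    ("O4", "ST307"),
    ("O3", "ST48"), ("O3", "ST372"), ("O3", "ST106"),
    ("O3", "ST276"), ("O3", "ST4"),
]

# Collect the distinct STs observed for each O-type.
sts_per_otype = defaultdict(set)
for o_type, st in isolates:
    sts_per_otype[o_type].add(st)

# Number of distinct STs per O-type, then median and quartiles of those counts.
counts = sorted(len(sts) for sts in sts_per_otype.values())
q1, _, q3 = quantiles(counts, n=4, method="inclusive")
print(counts, median(counts), q1, q3)
```

Because the quartiles are interpolated, non-integer bounds such as those in the reported IQR arise naturally even though the underlying counts are whole numbers.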

Discussion
Malawi, and sSA in general, is an area of the world that is undersampled in current [...] from diverse settings to guide vaccine development. We found that the O-type distribution for Malawian ESBL-producing K. pneumoniae carriage isolates was similar to that of invasive isolates, suggesting that stool or rectal swab sampling with selective culture could be a cost-effective way to rapidly expand understanding of worldwide O-type distributions to guide vaccine development. This finding must be confirmed in further sites before such a strategy could be adopted.
There are limitations to our study. Most importantly, our sampling scheme is not unbiased. ESBL-producing carriage isolates were selected for, and one of the Malawian studies providing invasive context genomes was an investigation of a K. pneumoniae outbreak on the QECH neonatal unit. This is likely to have introduced bias into the collection of Malawian genomes, especially against classically hypervirulent but antimicrobial-susceptible lineages. All Malawian genomes are from a single centre, which enables us to compare the population structure of carriage and clinical isolates but may limit generalisation to other settings in sSA. Multiple samples were cultured from single individuals and so were not independent, which could introduce bias; however, most individuals were colonised by different strains at