Title. Genome wide association studies define minimal set of loci for prediction of penicillin and tetracycline susceptibility in Neisseria gonorrhoeae

Background. While Neisseria gonorrhoeae poses an urgent public health threat because of increasing antimicrobial resistance, much of the circulating population remains susceptible to historical treatment regimens. Point-of-care diagnostics that report susceptibility could allow for reintroduction of these regimens, but development of such diagnostics has been limited to ciprofloxacin, for which susceptibility can be predicted from a single locus. Methods. We assembled a dataset of 12,045 N. gonorrhoeae genomes with phenotypic resistance data for tetracycline (n = 3,611) and penicillin (n = 6,935). Using conditional genome wide association studies (GWAS), we sought to define genetic variants associated with susceptibility to penicillin and tetracycline. We evaluated the sensitivity and specificity of these variants for predicting susceptibility and non-resistance in our collection of gonococcal genomes. Findings. In our conditional penicillin GWAS, the presence of a genetic variant defined by a non-mosaic penA allele without an insertion at codon 345 was significantly associated with penicillin susceptibility and had the highest negative effect size of significant variants ( p = 5.0 x 10 -14 , β = -2.5). In combination with the absence of bla TEM , this variant predicted penicillin susceptibility with high specificity (99.8%) and moderate sensitivity (36.7%). For tetracycline, the wild type allele at rpsJ codon 57, encoding valine, was significantly associated with tetracycline susceptibility ( p = 5.6 x 10 -16 , β = -1.61) after conditioning on the presence of tetM . The combination of rpsJ codon 57 allele and tetM absence predicted tetracycline susceptibility with high specificity (97.2%) and sensitivity (88.7%). predict


Introduction.
Gonorrhea, caused by infection with Neisseria gonorrhoeae, is the second most reported notifiable infection in the United States at a rate of 188.4 cases per 100,000 people in 2019 1 and increasing antibiotic resistance has made it an urgent public health threat. 2 Treatment is empiric, and resistance has limited the recommended treatment in the US to ceftriaxone, an extended spectrum cephalosporin (ESC). 3 Despite the emergence of multidrug resistant strains, 4,5 a large fraction of clinical isolates remain susceptible to multiple antibiotics. 1 Data from the Gonococcal Isolate Surveillance Project (GISP), the US Centers for Disease Control and Prevention's sentinel surveillance system for antibiotic resistance in N. gonorrhoeae, reported that in 2019 44.5% of clinical isolates were not resistant to any tested antibiotics, 64.6% were non-resistant to ciprofloxacin (MIC < 1 µg/mL), 72.2% were non-resistant to tetracycline (MIC < 2 µg/mL), and 87.2% were non-resistant to penicillin (MIC < 2 µg/mL). 1 Point-of-care diagnostics that inform on antibiotic susceptibility may help forestall the emergence and spread of resistance by enabling a shift from empiric to tailored treatment and expanding the number of antibiotics used to treat N. gonorrhoeae infections. [6][7][8] The observation that ciprofloxacin susceptibility can be predicted with high specificity and sensitivity based on gyrA codon 91 has led to the development of molecular tests that query this locus; the SpeeDx ResistancePlus GC, for example, was recently approved for clinical use in Europe and granted 'breakthrough designation' by the FDA. [9][10][11] However, extension of this sequence-based approach to other antibiotics has been stymied, as they lack single locus determinants of susceptibility and resistance.
Penicillin (PCN) and tetracycline (TET) were the recommended therapies for gonorrhea until the 1980s, when the prevalence of high level resistance increased enough to prompt a switch in the empiric treatment regimen. 12,13 Resistance to PCN and TET can be both chromosomal and plasmid mediated. Chromosomally-encoded resistance arises from mutations modifying the antibiotic targets-rpsJ for TET resistance and penA and ponA for PCN-and mutations in the porin porB and in the efflux pump mtr operon. 14 The plasmid-borne beta lactamase bla TEM confers high level PCN resistance and the ribosome protection protein tetM confers TET resistance. Despite previously being first-line gonorrhea treatments for decades, molecular diagnostics for PCN and TET susceptibility have been less commonly studied. Proposed diagnostics or targets of molecular surveillance for PCN susceptibility have focused on bla TEM, 15,16 which performs poorly in the setting of chromosomally-encoded resistance, porB, 17 which neglects important target modifying mutations in penA, or resistance-associated penA alleles 18 rather than susceptibility-associated alleles. Similarly, assays targeting tetM have been developed, but they have not incorporated chromosomally-encoded tetracycline resistance. 16 While there are multiple pathways to resistance for each drug, 14 the key goal for sequencebased diagnostics is to predict susceptibility-rather than resistance-with high specificity. Therefore, here we sought to identify a concise set of loci that are associated with PCN and TET susceptibility using genome wide association studies (GWAS) and evaluate their predictive performance in gonococcal clinical isolates.

Methods.
Global and validation dataset assembly. We collected publicly available whole genome sequencing (WGS) data (n = 12,045) and PCN (n = 6,935) and TET MICs (n = 5,727) from clinical N. gonorrhoeae isolates.  Genomes were assembled and resistance-associated alleles were called as previously described. 42 Specifically, alleles at known resistance loci were identified from variant calls after mapping to a reference genome for rpsJ codon 57, encoding the RpsJ V57M mutation associated with tetracycline resistance, and BLASTN 43 of de novo assemblies for accessory gene content using the sequences NG_068038.1 for bla TEM and MG874353.1 for tetM. penA alleles were typed according to the NG-STAR database, 44 and the presence of insertions in penA were determined based on de novo assemblies. For 2116 isolates, TET MICs were reported as ≤ 4 µg/mL or ≤ 8 µg/mL. These were excluded them from phenotype-based analyses, since we could not classify them as susceptible or resistant. Additionally, Our final global dataset included 12,049 genotyped isolates, 3,611 with TET MICs and 6,935 with PCN MICs. To validate our results, we additionally assembled 1479 genomes from CDC's 2018 Gonococcal Isolate Surveillance Program (GISP) collection, 45 representing the first five viable isolates collected each month from urethral specimens at sentinel surveillance sites across the United States.
Genome wide association studies. To identify variants associated with PCN and TET susceptibility, we performed conditional GWAS 46 incorporating the presence of high effect size plasmid-mediated resistance. The GWAS employed a linear mixed model and were run using Pyseer v 1.2.0 47 with default allele frequency filters on 124,070 unitigs (unique sequences representing SNPs, insertions, deletions, and changes in gene content) for the PCN GWAS and 158,087 unitigs for the TET GWAS generated from GATB v 1.3.0. Most datasets reported PCN MICs within the range of 0.06 -32 μ g/mL. Isolates with PCN MICs reported as '>4' or '>2' were not included in the GWAS analysis since the precise MIC was unknown; the final PCN GWAS dataset size was 6220 isolates after excluding isolates with missing genotypic or phenotypic data. Similarly isolates with imprecise TET MICs were excluded (e.g. "≤ 4", "≤ 8"); the final dataset size for the TET GWAS was 3,453 isolates after excluding isolates with missing genotypic or phenotypic data. The GWAS incorporated isolate dataset of origin, country of origin, and presence of plasmid-encoded resistance determinants (bla TEM, tetM) as fixed effect covariates. We used a recombination-corrected phylogeny constructed with Gubbins v 2.3.4 48 to generate a similarity matrix, which was included as a random effect to correct for population structure. Unitigs from the GWAS were mapped and annotated using Pyseer annotate_hits_pyseer with the WHO_N genome 49 (GCA_900087725.2), which encodes the β lactamase-encoding plasmid (African pblaTEM-1) and the tetM-encoding plasmid (Dutch), as the reference. The output was used to generate Manhattan plots in R v 4.0.3 with ggplot2. 50 Sensitivity, specificity, and validation. We evaluated the sensitivity and specificity of resistance alleles to predict PCN and TET susceptibility using CLSI breakpoints for susceptibility (PCN MIC ≤ 0.06 µg/mL, TET MIC ≤ 0.25 µg/mL) and non-resistance (PCN MIC < 2 µg/mL, TET MIC < 2 µg/mL) in both the global and validation datasets. We additionally used isolate metadata from the 2018 GISP collection to estimate the prevalence of isolates with susceptibility associated genotypes across patient groups (sexual behavior, race/ethnicity). Statistical analyses were perfomed in R v 4.0.3 using infer v 0.5.4 (https://infer.tidymodels.org/). . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 3, 2021. ;

Results.
Since plasmid-encoded resistance determinants are known to contribute to high level resistance for both PCN and TET, we used conditional GWAS to identify additional variants contributing to susceptibility. Here, we focused on significant variants associated with increased susceptibility (i.e., negative effect size, β ). We found that a unitig (penA_01) corresponding to non-mosaic penA alleles without the insertion at codon 345 was significantly associated with PCN susceptibility (Supplementary Figure 1A, p = 5.0 x 10 -14 , β = -2.5). After conditioning on the presence of tetM, we found that a unitig corresponding to the wild-type allele at rpsJ codon 57, encoding valine, was significantly associated with TET susceptibility (Supplementary Figure 1B, p = 5.6 x 10 -16 , β = -1.61). Significant unitigs also mapped to porB and a loss of function variant in mtrC for both antibiotics; however, effect sizes were lower than unitigs mapping to antibiotic targets.
We used the presence of penA_01 combined with the absence of bla TEM to predict PCN susceptibility in our global dataset ( Figure 1A). We found that this susceptibility-associated genotype predicted PCN susceptibility and non-resistance with high specificity but low sensitivity (Table 1). For TET susceptibility prediction, we identified isolates with the wild-type allele at rpsJ codon 57 combined with the absence of tetM ( Figure 1B). This combination predicted TET susceptibility and non-resistance with high specificity and moderate sensitivity (Table 1).
Since PCN and TET MICs were not reported for all isolates, we identified these mutations in our full genomic dataset: 2.1% had the PCN susceptibility-associated genotype, and 15.9% of isolates had the TET susceptibility-associated genotype. The prevalence of these genotypes varied across genomic epidemiology studies (Supplementary Table 1).
To validate our observations in a relatively unbiased dataset from the United States, we assembled a recently published collection of N. gonorrhoeae genomes from CDC's Gonococcal Isolate Surveillance Program 45 ; in this collection, isolates were not selected for sequencing based on their susceptibility phenotypes. First, we verified that the penA sequence identified in the GWAS (penA_01) also identified isolates with non-mosaic penA alleles without the 345 insertion in the validation dataset. In this dataset, 100% of isolates with the penA_01 encoded non-mosaic penA alleles without the insertion when the full length penA allele was examined.
We also calculated sensitivity and specificity for prediction of PCN and TET susceptibility and non-resistance in the GISP collection ( Figure 1, Table 1). Similar to results from the global collection, specificity was high for both antibiotics and CLSI cutoffs. Sensitivity increased for PCN prediction and decreased for TET prediction, reflecting different proportions of isolates falling into the susceptible and intermediate MIC categories in the global and validation datasets.
Since PCN and TET MICs were not reported for all isolates, we identified these mutations in our full genomic dataset: 2.1% had the PCN susceptibility-associated genotype, and 15.9% of isolates had the TET susceptibility-associated genotype. The prevalence of these genotypes varied across genomic epidemiology studies (Supplementary Table 1).
To validate our observations in a relatively unbiased dataset from the United States, we assembled a recently published collection of N. gonorrhoeae genomes from CDC's Gonococcal Isolate Surveillance Program 45 ; in this collection, isolates were not selected for sequencing based on their susceptibility phenotypes. First, we verified that the penA sequence identified in . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 3, 2021. ; the GWAS (penA_01) also identified isolates with non-mosaic penA alleles without the 345 insertion in the validation dataset. In this dataset, 100% of isolates with the penA_01 encoded non-mosaic penA alleles without the insertion when the full length penA allele was examined.

Figure 1.
Penicillin and tetracycline MICs are lower in isolates with susceptibility associated genotypes in global and validation datasets. Dashed lines indicate CLSI breakpoints for susceptibility and resistance. A) Penicillin MICs of isolates with penA_01 and without blaTEM (S, dark blue) compared to isolates with one or more of these determinants (NS, light blue). B) Tetracycline MICs of isolates with wild-type rpsJ (57V) and without tetM (S, dark purple) compared to isolates with one or more of these determinants (NS, light purple).
We also calculated sensitivity and specificity for prediction of PCN and TET susceptibility and non-resistance in the GISP collection ( Figure 1, Table 1). Similar to results from the global collection, specificity was high for both antibiotics and CLSI cutoffs. Sensitivity increased for PCN prediction and decreased for TET prediction, reflecting different proportions of isolates falling into the susceptible and intermediate MIC categories in the global and validation datasets.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 3, 2021. ; In addition to antimicrobial resistance phenotypes, GISP reports information on patient characteristics for each isolate collected. To analyze the utility of these genotypic markers in different patient populations, we calculated the prevalence of the susceptibility-associated genotypes across patient groups. Susceptible genotypes were more common among men who have sex with women (MSW) compared to men who have sex with men (MSM) and men who have sex with men and women (MSMW) for both PCN (χ 2 test, p = 0.0035) and TET (χ 2 test, p < 0.0001). The prevalence of the PCN susceptibility-associated genotype was 5.2%, 1.5%, and 2.2%, in MSW, MSM, and MSMW, respectively. For TET, the susceptibility-associated genotype was 20.6% in MSW, 9.6% in MSM, and 9.9% in MSMW. Additionally, the susceptibilityassociated genotypes varied across race and ethnicity groups and were enriched in samples from Black men; however, prevalence of susceptibility-associated genotypes were not significantly different between race and ethnicity groups when MSM and MSW were considered separately (Supplementary Table 2).

Discussion.
Here, we have used conditional GWAS incorporating known, high effect size variants 46 to identify targets for diagnostics addressing both plasmid and chromosomally mediated PCN and TET resistance. We found that the combination of penA_01, representing non-mosaic penA without an insertion at codon 345, and the absence of bla TEM predicts PCN susceptibility and that the combination of rpsJ codon 57 and the absence of tetM predicts TET susceptibility. These loci defined the most susceptible isolates in our dataset and predicted susceptibility (PCN MIC ≤ 0.06 µg/mL, TET MIC ≤ 0.25 µg/mL) with high specificity and moderate sensitivity in both our global dataset and an unbiased collection from the United States. The addition of other loci (e.g., mtr and porB) may be needed to increase sensitivity for the higher cutoff (MIC < 2 µg/mL) but with as yet unclear impact on specificity.
These loci could additionally be used for culture-free molecular epidemiology and surveillance, as WGS directly from patient samples is not currently routine. Typing schemes such as NG-STAR 44 targeting resistance determinants have been developed; however, these schemes have not focused on loci specific to penicillin and tetracycline resistance.
Utility of a diagnostic or molecular surveillance targeting these loci may vary in different patient populations. For example, the prevalence of susceptibility associated genotypes varied across genomic epidemiology studies included in our global dataset, reflecting both enrichment of antibiotic resistant isolates in some studies as well as variable selection pressure from antibiotic use in different regions. WGS data from N. gonorrhoeae isolated in the United States, Europe, . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 3, 2021. ; and Australia make up the majority of available genomic data, and the composition of the N. gonorrhoeae population in other regions is unknown. Similar to other studies of the association between N. gonorrhoeae antibiotic resistance and patient demographics, prevalence of these susceptibility-associated genotypes vary across patient groups defined by sexual behavior and race/ethnicity in isolates collected by GISP. 34,37,39,51 In the United States, a diagnostic for PCN and TET susceptibility may be most useful in populations with higher prevalence of infection with susceptible isolates, such as MSW and women.
The alleles identified here from genomic analyses are promising targets for the development of POC molecular diagnostics for N. gonorrhoeae susceptibility to penicillin and tetracycline. Diagnostics that evaluate as few as two loci per drug could allow for the reintroduction into clinical use of these gonococcal treatment regimens.