Abstract
Background Antimicrobial-resistant (AMR) Neisseria gonorrhoeae is an urgent threat to public health, as strains resistant to at least one of the two last-line antibiotics used in empiric therapy of gonorrhoea, ceftriaxone and azithromycin, have spread internationally. With new treatment options not yet available, this has prompted a call for collaborative action on global surveillance of this sexually transmitted pathogen. Whole genome sequencing (WGS) data can be used to identify new AMR clones, outbreaks and transmission networks, and to inform the development of point-of-care tests for antimicrobial susceptibility, novel antimicrobials and vaccines. Community-driven tools that provide easy access to, and analysis of, genomic and epidemiological data are the way forward for public health surveillance.
Methods Here we present a public health focussed scheme for genomic epidemiology of N. gonorrhoeae using Pathogenwatch (https://pathogen.watch/ngonorrhoeae), which enables the processing of raw or assembled genomic data. We implement backwards compatibility with MLST, NG-MAST and NG-STAR typing schemes as well as an exhaustive library of genetic AMR determinants associated with resistance to eight antibiotics. A collection of over 12,000 N. gonorrhoeae genome sequences from public archives has been quality-checked, assembled and made public together with available metadata for contextualization.
Results An international advisory group of experts in epidemiology, public health, genetics and genomics of N. gonorrhoeae was convened to identify public health needs in the field and inform on the utility of current and future analytics in the platform, including a customised library of genetic AMR determinants. After uploading genome data, this platform automatically provides typing information, detects genetic determinants of AMR for eight antibiotics including azithromycin and the extended-spectrum cephalosporins ceftriaxone and cefixime, and infers resistance based on the specific combination of mechanisms. Furthermore, genomes are contextualised with globally available genomic data to aid epidemiological investigation.
Conclusions The N. gonorrhoeae scheme in Pathogenwatch provides customized bioinformatic pipelines, guided by expert opinion, that can be adopted by public health agencies and departments with little expertise in bioinformatics and by lower-resourced settings with an internet connection but limited computational infrastructure. The advisory group will assess and identify ongoing public health needs in the field of gonococcal AMR in order to further enhance the utility of the platform with modified or new analytic methods.
Background
Antimicrobial resistance (AMR) is an urgent threat to public health. Neisseria gonorrhoeae, the strictly human pathogen causing the sexually transmitted infection (STI) gonorrhoea, has developed or acquired resistance to the last-line antibiotics used in empiric therapy, and has thus become one of the major global priorities in tackling AMR. In 2017, due to the increase in AMR infections and the absence of an effective vaccine, the World Health Organization (WHO) included N. gonorrhoeae as a high priority pathogen in need of research and development of new antimicrobials and ideally a vaccine (1). In 2019, the Centers for Disease Control and Prevention (CDC) again included the gonococcus on the list of urgent threats in the United States (2). The most recent WHO estimates, from 2016, indicate an annual global incidence of 87 million cases of gonorrhoea among adults (3, 4). Untreated infections can lead to complications, including an increased risk of HIV acquisition and transmission. In women, long-term infections can cause infertility, pelvic inflammatory disease, ectopic pregnancy, miscarriage or premature labour (5). Infections during pregnancy can be transmitted to newborns at birth, causing eye damage that can permanently affect vision (6).
Resistance in N. gonorrhoeae has rapidly emerged to every antimicrobial recommended for treatment, including penicillins, tetracyclines, fluoroquinolones, macrolides and the extended-spectrum cephalosporins (ESCs) (5-7). The current recommended treatment in many countries is dual therapy with injectable ceftriaxone plus oral azithromycin, although reports of decreased susceptibility to ceftriaxone and of azithromycin resistance have increased globally (7, 8). One case of dual treatment failure was reported in the United Kingdom (UK) in 2016 (9). Additionally, in 2018 a gonococcal strain combining ceftriaxone resistance with high-level azithromycin resistance was detected in both the UK and Australia (10, 11). A ceftriaxone-resistant clone (FC428) has been transmitted internationally, raising concerns about the long-term effectiveness of the current treatment in the absence of an available alternative (12). In some countries, such as Japan, China and, since 2019, the UK, a single 1 gram dose of ceftriaxone is recommended because of the increasing incidence of azithromycin resistance in N. gonorrhoeae and in other STI pathogens such as Mycoplasma genitalium (13). Extensive investigations have been ongoing for years to uncover the genetic mechanisms that explain most of the observed susceptibility patterns for the main classes of antimicrobials in N. gonorrhoeae. For ciprofloxacin, nearly all resistant strains carry the GyrA S91F amino acid alteration (14-16); however, predicting resistance from genomic data is not as straightforward for other antibiotics. Known resistance mechanisms often involve additive or suppressive effects as well as epistatic interactions that together explain only part of the observed phenotypic resistance.
For example, there is good evidence that many mosaic structures of the penA gene are associated with decreased susceptibility to ESCs (17, 18); however, mosaics do not explain all cases of ESC resistance, especially for ceftriaxone, and some mosaic penA alleles do not cause decreased susceptibility or resistance to this antibiotic (17-20). In addition, variants that overexpress the MtrCDE efflux pump, mutations in porB that reduce drug influx and non-mosaic mutations in penicillin-binding proteins also contribute to decreased susceptibility to ESCs (21). Furthermore, mutations in the rpoB and rpoD genes, encoding subunits of the RNA polymerase, have recently been associated with resistance to ESCs in clinical N. gonorrhoeae isolates (22). Mutations in the 23S rRNA gene (A2045G and C2597T in N. gonorrhoeae nomenclature, using coordinates from the WHO 2016 reference panel (23); A2059G and C2611T in Escherichia coli numbering) are frequently associated with azithromycin resistance, as are variants in mtrR or its promoter that increase the expression of the MtrCDE efflux pump (5). Recently, epistatic interactions between a mosaic mtr promoter region and a mosaic mtrD gene have also been reported to increase the expression of this pump, contributing to macrolide resistance (24, 25). Mutations in rplD have also been associated with reduced susceptibility to azithromycin (26), whereas loss-of-function mutations in mtrC have been linked to increased susceptibility to several antibiotics, including azithromycin (27).
A myriad of methods, from phenotypic to DNA-based techniques, have been used to discriminate among strains of N. gonorrhoeae (28), but whole genome sequencing (WGS) provides the complete genome information of a bacterial strain. In many settings, amplifying all loci of the different typing schemes by nucleic acid amplification and traditional Sanger sequencing can cost more than WGS of one bacterial genome. With WGS and the appropriate bioinformatic tools and pipelines, multiple genetic AMR mechanisms as well as virulence and typing regions can be targeted simultaneously. WGS also offers a significant improvement in resolution and accuracy over traditional molecular epidemiology and typing methods, allowing genome-wide comparisons of strains that can identify AMR clones, outbreaks, transmission networks, national and international spread, and known and novel resistance mechanisms, as well as inform the development of point-of-care tests for antimicrobial susceptibility, novel antimicrobials and vaccines (29, 30). However, implementing WGS for genomic surveillance poses practical challenges, especially for low- and middle-income countries (LMICs), because of the major investment needed to acquire and maintain the required infrastructure. The cost of sequencing is decreasing very rapidly in well-resourced settings, especially in large sequencing centres, but it is still prohibitive for routine surveillance in many others.
WGS produces a very high volume of data that needs to be pre-processed and analysed using bioinformatics. Bioinformatics expertise is not always readily available in laboratory and public health settings, and there are currently no international standards or proficiency trials defining which algorithms to use to process WGS data. There are several open-source tools specialised in each step of the pipeline, as well as proprietary software containing workflows that simplify the analyses; however, the latter are less customizable and may not be affordable for all (31, 32). Choosing the best algorithms and parameters when analysing genomic data is not straightforward, as it requires good knowledge of the pathogen under study and its genome diversity. Multiple databases containing genetic determinants of AMR for bacterial pathogens are available (31, 32); however, identifying which one is most complete for a particular organism frequently requires an extensive literature search. Public-access, web-based, species-specific tools and AMR databases revised and curated by experts would be the most approachable option for both well-resourced settings and LMICs with a reliable internet connection. Importantly, though, the full benefits of using WGS for both molecular epidemiology and AMR prediction can only be achieved if the WGS data are linked to phenotypic data for the gonococcal isolates and, as far as feasible, to epidemiological data for the patients.
Here, we present a public health focussed system to facilitate genomic epidemiology of N. gonorrhoeae within Pathogenwatch (https://pathogen.watch/ngonorrhoeae), which includes the latest analytics for typing, detection of genetic AMR determinants and prediction of AMR from N. gonorrhoeae genome data, linked to metadata where available, as well as a collection of over 12,000 gonococcal genomes from public archives for contextualization. We formed an advisory group including experts in the field of N. gonorrhoeae epidemiology, public health, AMR, genetics and genomics to consult on the development and design of the tool, such as the analytics and genetic AMR mechanisms to include, in order to adapt the platform for ongoing public health needs. We present this scheme as a community-steered model for genomic surveillance of other pathogens.
Methods
Generation of the N. gonorrhoeae core genome library
Pathogenwatch implements a library of core genome sequences for several supported organisms. For N. gonorrhoeae, a core gene set was built from the 14 reference genomes that constitute the 2016 WHO reference strain panel (23) using the pangenome analysis tool Roary (33), as described in Harris et al. (2018) (16). Briefly, the minimum percentage identity for blastp was set to 97%, and the resulting core genes were aligned individually using MAFFT. Genes with a percentage identity above 99% were then post-processed as described in (34). Overlapping genes were merged into pseudocontigs, and clusters representing paralogs or fragment matches were removed. For each cluster, the representative sequence was selected as the longest compared with a consensus obtained from the cluster alignment. The final core gene set contains 1,542 sequences spanning a total of 1,470,119 nucleotides. A BLAST database was constructed from these core segments and used to profile new assemblies.
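Merging overlapping genes into pseudocontigs amounts to merging overlapping coordinate intervals. A minimal sketch, assuming each core gene is given as a (start, end) pair of 1-based inclusive coordinates on a single reference sequence; the function name and coordinate convention are hypothetical and not taken from the Pathogenwatch code:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive coordinates (assumed)


def merge_into_pseudocontigs(genes: List[Interval]) -> List[Interval]:
    """Merge overlapping gene intervals into non-overlapping pseudocontigs.

    Hypothetical helper: assumes all intervals lie on one reference sequence.
    """
    merged: List[Interval] = []
    for start, end in sorted(genes):
        # Overlapping: this gene starts at or before the end of the current pseudocontig
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    genes = [(100, 400), (350, 900), (1200, 1500), (1499, 1600), (2000, 2300)]
    print(merge_into_pseudocontigs(genes))  # [(100, 900), (1200, 1600), (2000, 2300)]
```

Sorting by start coordinate first means a single left-to-right pass is enough to find every overlap.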
Profiling new assemblies
New genome assemblies can be uploaded by a user (drag and drop) or calculated from high-throughput short read data directly within Pathogenwatch using SPAdes (35) as described in (36).
A taxonomy assignment step for species identification is performed on the uploaded assemblies using Speciator (37). New assemblies are then queried against a species-specific BLAST database using blastn. For N. gonorrhoeae, each core locus must match over at least 80% of its length to be considered present. Further filtering steps remove loci that can be problematic for tree building, such as paralogs or loci with an unusually large number of variant sites compared with the substitution rate estimated for the rest of the genome, as described in (38). The overall substitution rate is calculated as the total number of differences in the core library divided by the total number of nucleotides. Indels are ignored to minimise the noise that could be caused by assembly or sequencing errors. The expected number of substitutions per locus is obtained by multiplying this substitution rate by the length of the representative sequence.
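The substitution-rate filter can be sketched as follows. The overall rate (total differences divided by total nucleotides) and the expected count (rate multiplied by locus length) follow the description above; the `max_ratio` cut-off that defines an "unusually large" number of variant sites is a hypothetical placeholder, since the actual criterion is described in (38).

```python
from typing import Dict, List


def flag_hypervariable_loci(
    differences: Dict[str, int],
    lengths: Dict[str, int],
    max_ratio: float = 5.0,  # hypothetical cut-off, not the value used in Pathogenwatch
) -> List[str]:
    """Return loci whose observed substitutions exceed max_ratio times the expected count.

    differences: substitutions per locus (indels already excluded)
    lengths: length in nucleotides of each locus' representative sequence
    """
    # Overall rate = total differences in the library / total nucleotides in the library
    rate = sum(differences.values()) / sum(lengths.values())
    flagged = []
    for locus, length in lengths.items():
        expected = rate * length  # expected substitutions for a locus of this length
        observed = differences.get(locus, 0)
        if expected > 0 and observed / expected > max_ratio:
            flagged.append(locus)
    return flagged


if __name__ == "__main__":
    diffs = {"locusA": 2, "locusB": 3, "locusC": 45}
    lens = {"locusA": 1000, "locusB": 1500, "locusC": 500}
    print(flag_hypervariable_loci(diffs, lens))  # ['locusC']
```

In the example the overall rate is 50/3000, so the 500 nt locus is expected to carry about 8.3 substitutions; observing 45 puts it well above the illustrative cut-off.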
The number of substitutions observed for each locus between the new assembly and the reference sequence is scaled to the total number of nucleotides that match the core library, creating a pairwise score that is stored in a distance matrix and used for tree construction, as described in (39).
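One plausible reading of this pairwise score is the number of substitutions between two genomes divided by the number of core nucleotides compared. The sketch below assumes each genome is represented by aligned per-locus core sequences of equal length, and it skips gaps and ambiguous bases in line with the treatment of indels above. The representation and function names are hypothetical, and the exact scaling used in (39) may differ.

```python
from itertools import combinations
from typing import Dict, Tuple

Profile = Dict[str, str]  # locus -> aligned core sequence (same length across genomes, assumed)


def pairwise_score(a: Profile, b: Profile) -> float:
    """Substitutions between two genomes divided by the number of compared core nucleotides."""
    substitutions = 0
    compared = 0
    for locus in a.keys() & b.keys():  # loci present in both genomes
        for x, y in zip(a[locus], b[locus]):
            if x in "ACGT" and y in "ACGT":  # skip gap characters and ambiguous bases
                compared += 1
                substitutions += x != y
    return substitutions / compared if compared else float("nan")


def distance_matrix(genomes: Dict[str, Profile]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise distance matrix stored as a dict keyed by genome-name pairs."""
    matrix: Dict[Tuple[str, str], float] = {}
    for (n1, p1), (n2, p2) in combinations(genomes.items(), 2):
        matrix[(n1, n2)] = matrix[(n2, n1)] = pairwise_score(p1, p2)
    return matrix


if __name__ == "__main__":
    g1 = {"l1": "ACGTACGTAC", "l2": "GGGG"}
    g2 = {"l1": "ACGTACGTAA", "l2": "GG-G"}
    print(distance_matrix({"g1": g1, "g2": g2}))  # 1 substitution over 13 compared bases
```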
Algorithms for sequence typing and cgMLST clustering
Alleles and sequence types (STs) were obtained from PubMLST (42, 43) for Multi-Locus Sequence Typing (MLST) (40) and core genome MLST (cgMLST; N. gonorrhoeae cgMLST v1.0) (41), from (45) for N. gonorrhoeae Multi-Antigen Sequence Typing (NG-MAST) (44), and from (47) for N. gonorrhoeae Sequence Typing for Antimicrobial Resistance (NG-STAR) (46). A search tool implemented as part of Pathogenwatch is used to make the assignments for MLST, cgMLST and NG-STAR, while NGMASTER (48) is used for NG-MAST. Briefly, exact matches to known alleles are searched for, and novel sequences are assigned a unique identifier. The combination of alleles is used to assign an ST as described in (49). Databases are regularly updated, and users should submit novel alleles and STs to the corresponding schemes for designation.
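The assignment logic described above can be sketched as follows: an exact match to a known allele, a provisional identifier for a novel sequence, then a lookup of the allele combination. The toy loci, allele tables, ST number and the `novel_<locus>_<n>` identifier format are hypothetical and do not reproduce the Pathogenwatch search tool or NGMASTER.

```python
from typing import Dict, List, Optional, Tuple


class TypingScheme:
    """Toy exact-match allele caller and ST lookup (hypothetical tables, not PubMLST data)."""

    def __init__(self, loci: List[str], alleles: Dict[str, Dict[str, int]],
                 profiles: Dict[Tuple[int, ...], int]):
        self.loci = loci
        self.alleles = alleles    # locus -> {allele sequence: allele number}
        self.profiles = profiles  # tuple of allele numbers (in locus order) -> ST
        self._novel: Dict[str, str] = {}  # "locus:sequence" -> provisional identifier

    def call_allele(self, locus: str, seq: str) -> str:
        known = self.alleles[locus].get(seq)
        if known is not None:
            return str(known)  # exact match to a known allele
        # Novel sequence: reuse or create a provisional identifier
        key = f"{locus}:{seq}"
        if key not in self._novel:
            self._novel[key] = f"novel_{locus}_{len(self._novel) + 1}"
        return self._novel[key]

    def assign_st(self, seqs: Dict[str, str]) -> Optional[int]:
        """Return the ST, or None if an allele is novel or the combination is undesignated."""
        calls = [self.call_allele(locus, seqs[locus]) for locus in self.loci]
        if not all(c.isdigit() for c in calls):
            return None
        return self.profiles.get(tuple(int(c) for c in calls))


if __name__ == "__main__":
    scheme = TypingScheme(
        loci=["locus1", "locus2"],
        alleles={"locus1": {"AAA": 1, "AAT": 2}, "locus2": {"CCC": 5}},
        profiles={(1, 5): 101},
    )
    print(scheme.assign_st({"locus1": "AAA", "locus2": "CCC"}))  # 101
    print(scheme.assign_st({"locus1": "AAT", "locus2": "CCC"}))  # None (new combination)
```

Returning None for both novel alleles and new combinations of known alleles mirrors the need to submit such profiles to the scheme curators for designation.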
cgMLST typing information is used for clustering individual genomes with others in the Pathogenwatch database as described in (50). Users can select the clustering threshold (i.e. number of loci with differing alleles) and a network graph is calculated within individual genome reports.
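Threshold-based clustering can be sketched as single-linkage clustering on the allele-difference network: two genomes are linked when they differ at no more than the chosen number of loci, and clusters are the connected components of the resulting graph. Ignoring loci that are missing in either genome is an assumption made here; the method actually used is described in (50).

```python
from typing import Dict, List, Optional, Set

AlleleProfile = Dict[str, Optional[str]]  # locus -> allele id, None if missing


def allele_differences(a: AlleleProfile, b: AlleleProfile) -> int:
    """Count loci with differing alleles, ignoring loci missing in either genome (assumption)."""
    return sum(
        1
        for locus in a.keys() & b.keys()
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def cluster(profiles: Dict[str, AlleleProfile], threshold: int) -> List[Set[str]]:
    """Single-linkage clusters: genomes are linked if they differ at <= threshold loci."""
    names = list(profiles)
    # Build the network graph as an adjacency list
    edges: Dict[str, Set[str]] = {n: set() for n in names}
    for i, n1 in enumerate(names):
        for n2 in names[i + 1:]:
            if allele_differences(profiles[n1], profiles[n2]) <= threshold:
                edges[n1].add(n2)
                edges[n2].add(n1)
    # Connected components of the graph are the clusters (iterative depth-first search)
    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in names:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    profiles = {
        "A": {"l1": "1", "l2": "1", "l3": "1"},
        "B": {"l1": "1", "l2": "2", "l3": "1"},
        "C": {"l1": "2", "l2": "2", "l3": "1"},
        "D": {"l1": "3", "l2": "3", "l3": "3"},
    }
    print(cluster(profiles, threshold=1))  # A-B-C chained together, D alone
```

With single linkage, A and C end up in the same cluster even though they differ at two loci, because each is within one locus of B.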
AMR library and detection of genetic AMR determinants
Genes and point mutations (single nucleotide polymorphisms (SNPs) and indels) were detected using PAARSNP v2.4.9 (51), which also provides a prediction of the AMR phenotype inferred from the combination of identified mechanisms. Genetic determinants described in the literature as involved in AMR in N. gonorrhoeae were collated into a library in TOML format (version 0.0.11). A test dataset of 3,987 isolates from 13 studies (16, 19, 23, 52-61) (Additional file 1: Table S1) with minimum inhibitory concentration (MIC) information for six antibiotics (benzylpenicillin, tetracycline, ciprofloxacin, cefixime, ceftriaxone and azithromycin) was used to benchmark and curate this library. A validation benchmark was subsequently run on a dataset of 1,607 isolates from 3 other publications (62-64) with MIC information for the same six antibiotics plus spectinomycin (Additional file 1: Table S1). EUCAST clinical breakpoints v9.0 (65) were used for the categorical interpretation of MICs as S (susceptible), I (intermediate resistance/decreased susceptibility) or R (resistant) (SIR) for all antibiotics except azithromycin, for which the epidemiological cut-off (ECOFF) was used. From the benchmark analyses, sensitivity, specificity and positive/negative predictive values (PPV/NPV) were obtained for each AMR mechanism implemented in the library and, overall, for each antibiotic. Confidence intervals for these statistics were calculated using the epi.tests function in the epiR R package v1.0-14 (66). Individual or combined AMR mechanisms with a PPV below 15% were discarded from the library to optimise the overall predictive values. Visual representations of the observed MIC ranges for each antibiotic across the combinations of genetic AMR mechanisms observed in the test dataset were used to identify and assess combinations of mechanisms with an additive or suppressive effect on AMR, and these combinations were included in the library.
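The categorical interpretation and the PPV-based pruning can be sketched as below. The tetracycline breakpoints in the usage example (S ≤0.5 mg/L, R >1 mg/L) are assumed EUCAST v9.0 values and should be checked against the official tables; the azithromycin cut-off of 1 mg/L is the ECOFF referred to in the Results. Mechanism names and counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List


def categorise(mic: float, s_max: float, r_above: float) -> str:
    """EUCAST-style call (mg/L): S if MIC <= s_max, R if MIC > r_above, otherwise I."""
    if mic <= s_max:
        return "S"
    if mic > r_above:
        return "R"
    return "I"


def above_azithromycin_ecoff(mic: float, ecoff: float = 1.0) -> bool:
    """Azithromycin: an MIC above the ECOFF (1 mg/L) is treated as the resistant category."""
    return mic > ecoff


@dataclass
class MechanismStats:
    name: str
    tp: int  # isolates carrying the mechanism with a positive (resistant) phenotype
    fp: int  # isolates carrying the mechanism with a negative phenotype

    @property
    def ppv(self) -> float:
        total = self.tp + self.fp
        return self.tp / total if total else 0.0


def keep_mechanisms(stats: List[MechanismStats], min_ppv: float = 0.15) -> List[str]:
    """Keep mechanisms (or combinations) whose PPV is at least 15%."""
    return [s.name for s in stats if s.ppv >= min_ppv]


if __name__ == "__main__":
    # Tetracycline with assumed EUCAST v9.0 breakpoints: S <= 0.5 mg/L, R > 1 mg/L
    print([categorise(m, 0.5, 1.0) for m in (0.25, 1.0, 2.0)])  # ['S', 'I', 'R']
    stats = [MechanismStats("mechanism_A", 90, 10), MechanismStats("mechanism_B", 1, 9)]
    print(keep_mechanisms(stats))  # ['mechanism_A']
```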
As part of the quality assessment of the AMR library, we ran the 2016 WHO N. gonorrhoeae reference genome panel (n=14) through Pathogenwatch and compared the detected genetic AMR mechanisms with those published in the original study (23). For the WHO U strain, a discrepancy in a parC mutation was further investigated by mapping the original raw Illumina data (European Nucleotide Archive (ENA) run accession ERR449479) to the reference genome assembly (ENA genome accession LT592159.1) and visualizing the alignment in Artemis (67).
In short-read assemblies, the four copies of the 23S rDNA gene are collapsed into one, thus the detection of the A2045G and C2597T mutations is dependent on the consensus bases resulting from the number of mutated copies (57, 60, 68).
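Why detection depends on the number of mutated copies can be illustrated with an idealised model in which reads are drawn evenly from the four 23S rDNA copies and the collapsed locus reports the majority base. This is not a description of how SPAdes resolves repeats; the tie rule (returning N for a 2/2 split) is an assumption.

```python
from collections import Counter
from typing import List


def collapsed_23s_base(copy_bases: List[str]) -> str:
    """Idealised consensus base at one 23S rDNA position from its four genomic copies.

    Assumes reads are sampled evenly from all copies and the collapsed assembly
    reports the majority base; an even split returns 'N' (assumption).
    """
    counts = Counter(copy_bases).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "N"
    return counts[0][0]


if __name__ == "__main__":
    # A2045G: wild-type A, mutant G; vary the number of mutated copies from 0 to 4
    for mutated in range(5):
        copies = ["G"] * mutated + ["A"] * (4 - mutated)
        print(mutated, collapsed_23s_base(copies))  # 0 A, 1 A, 2 N, 3 G, 4 G
```

Under this model, isolates with only one mutated copy would be reported as wild type, which is one reason why genotypic azithromycin predictions can miss low-level resistance.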
Quality check and assembly of public sequencing data
Public N. gonorrhoeae genomes with geolocation data were obtained from the ENA in November 2019. This list was complemented by an exhaustive literature search for N. gonorrhoeae genomics studies whose metadata were not submitted to the ENA but were instead made available as supplementary information in the corresponding publications. Raw paired-end short-read data from a list of 12,192 isolates were processed with the GHRU assembly pipeline v1.5.4 (69). This pipeline runs a Nextflow workflow that quality-checks (QC) paired-end short-read fastq files before and after filtering and trimming, assembles the data and quality-checks the resulting assembly. Within this pipeline, QC of short reads was performed using FastQC v0.11.8 (70). Trimming was done with Trimmomatic v0.38 (71) by removing bases from the start and end of reads if they were below a Phred score of 25, and by scanning with a sliding window of size 4 and cutting once the average quality within the window fell below a Phred score of 20. Only reads longer than a third of the original minimum read length were kept for further analyses. After trimming, reads were corrected using the kmer-based approach implemented in Lighter v1.1.1 (72), with a kmer length of 32 bp and at most one correction allowed within a 20 bp window. ConFindr v0.7.2 was used to assess intra- and inter-species contamination (73). Mash v2.1 (74) was applied to estimate genome size using a kmer size of 32 bp, and Seqtk v1.3 (75) to downsample fastq files if the depth of coverage was above 100x. Flash v1.2.11 (76) was used to merge reads with a minimum overlap length of 20 bp and a maximum overlap of 100 bp to facilitate the subsequent assembly. SPAdes v3.12 (35) was used for genome assembly with the --careful option selected to reduce the number of mismatches and short indels, and with a range of kmer lengths depending on the minimum read length.
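The trimming settings can be illustrated with a simplified reimplementation of Trimmomatic's LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps, applied in that order. This is an approximation for explanation only: the step order used by the GHRU pipeline is assumed, and Trimmomatic's own sliding-window step may handle the bases inside the failing window differently.

```python
from typing import List, Optional


def trim_read(quals: List[int], leading: int = 25, trailing: int = 25,
              window: int = 4, window_min: float = 20, min_len: int = 0) -> Optional[range]:
    """Simplified LEADING/TRAILING/SLIDINGWINDOW/MINLEN trimming on Phred scores.

    Returns the range of kept base indices, or None if the read is discarded.
    Approximation of the Trimmomatic steps, not its actual implementation.
    """
    start, end = 0, len(quals)
    # LEADING: drop bases from the 5' end while below the threshold
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: drop bases from the 3' end while below the threshold
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan 5' to 3' and cut at the first window whose mean quality is too low
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < window_min:
            end = i
            break
    # MINLEN: discard reads that became empty or too short
    if end - start < max(min_len, 1):
        return None
    return range(start, end)


if __name__ == "__main__":
    quals = [10, 30, 30, 30, 30, 30, 30, 15, 15, 15, 15, 30]
    print(trim_read(quals))  # range(1, 6)
```

Following the Methods, `min_len` would be set to a third of the minimum raw read length (for example `read_len // 3`; the rounding is an assumption).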
The final assemblies were quality-checked using Quast v5.0.2 (77) and run through the species identification tool Bactinspector (78). QC conditions were assessed and summarised using Qualifyr (79).
Fastq files with poor quality in which the trimming step discarded all reads from either one or both pairs were excluded from the analyses. Assemblies with an N50 below 25,000 bp, a number of contigs above 300, a total assembly length above 2.5 Mb or a percentage of contamination above 5% were also excluded.
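The post-assembly exclusion criteria can be expressed as a small check. N50 is computed in the usual way, as the length of the contig at which the cumulative length of contigs sorted from longest to shortest first reaches half of the assembly length. Here 2.5 Mb is taken as 2,500,000 bp, and the contamination percentage is assumed to be supplied by an upstream tool such as ConFindr.

```python
from typing import Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L contain at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


def qc_failures(contig_lengths: List[int], contamination_pct: float) -> List[str]:
    """Return the post-assembly exclusion criteria from the Methods that an assembly meets."""
    failures = []
    if n50(contig_lengths) < 25_000:
        failures.append("N50 < 25,000 bp")
    if len(contig_lengths) > 300:
        failures.append("more than 300 contigs")
    if sum(contig_lengths) > 2_500_000:
        failures.append("total length > 2.5 Mb")
    if contamination_pct > 5:
        failures.append("contamination > 5%")
    return failures


if __name__ == "__main__":
    contigs = [500_000, 400_000, 300_000, 200_000, 100_000, 50_000, 50_000, 30_000, 20_000, 10_000]
    print(n50(contigs), qc_failures(contigs, contamination_pct=1.0))  # 400000 []
```

An empty list means the assembly passes; any listed condition leads to exclusion.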
Metadata for public genomes
Geolocation data (mainly country), collection dates (day, month and year when available), ENA project accession and associated PubMed ID were obtained from the ENA API for all the genomes in the pipeline (80). An extensive manual literature search was performed to identify the publications containing the selected genomes. To make the published study datasets as complete as possible, additional genomes were downloaded and added to the dataset. Metadata for the final set were completed with the information contained in supplementary tables of the corresponding publications, including MIC data. Submission date was used instead of collection date when the latter was not available; however, this occurred in only a few cases (<0.5%).
Results
Upload and analyse N. gonorrhoeae genome data
Data can be uploaded in the form of assemblies or raw data (fastq format) into Pathogenwatch, which allows users to run different analytics on genomic data simultaneously (Figure 1). If raw data is provided, an assembly is calculated before running the analyses. These analytics include four typing schemes for N. gonorrhoeae as well as a genotypic AMR prediction using a customized AMR library that includes known genetic mechanisms of resistance for 8 antimicrobials: ceftriaxone, cefixime, azithromycin, ciprofloxacin, spectinomycin, tetracycline, benzylpenicillin and sulfonamides. Statistics on the quality of the assemblies are also provided in the form of matches to the core genome, total genome length, N50, number of contigs, number of non-ATCG bases and GC content (Additional file 2: Figure S1).
Genomes from one or multiple studies can be grouped into collections (Figure 2 and Additional file 2: Figure S2), and the genomic data are automatically processed by comparison with a core N. gonorrhoeae genome built from the WHO reference strain genomes (16, 23). The result is a phylogenetic tree, inferred using the Neighbour-Joining algorithm on core SNPs, that represents the genetic relationships among the isolates in the collection. Metadata can be uploaded at the same time as the genome data, and if the collection location coordinates of an isolate are provided, they are plotted on a map (Additional file 2: Figure S1). If the date or year of isolation is also provided, it is represented on a timeline. The three panels of the main collection layout (the tree, the map and a table or timeline) are functionally integrated, so filters and selections made by the user update all of them simultaneously. Users can also easily switch between the metadata and the results of the main analytics: typing, genome assembly statistics, genotypic AMR prediction, AMR-associated SNPs, AMR-associated genes and the timeline (Additional file 2: Figure S1). A video demonstrating the usage and main features of Pathogenwatch is available (81).
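The Neighbour-Joining step can be illustrated with the textbook algorithm applied to a small distance matrix: at each iteration the pair minimising the Q-criterion is joined, branch lengths to the new internal node are computed, and distances from that node to the remaining taxa are updated. This is a generic sketch, not the Pathogenwatch implementation; the function name, the Newick output and the example distances are hypothetical, and negative branch lengths are not corrected.

```python
from typing import List


def neighbour_joining(names: List[str], d: List[List[float]]) -> str:
    """Return an unrooted Newick string built by textbook neighbour joining."""
    nodes = list(names)
    dist = {(a, b): d[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(dist[(a, b)] for b in nodes) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(a, b) - r(a) - r(b)
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * dist[p] - totals[p[0]] - totals[p[1]],
        )
        # Branch lengths from the new internal node to a and b
        la = dist[(a, b)] / 2 + (totals[a] - totals[b]) / (2 * (n - 2))
        lb = dist[(a, b)] - la
        new = f"({a}:{la:g},{b}:{lb:g})"
        # Distances from the new node to every remaining node
        for c in nodes:
            if c not in (a, b):
                dc = (dist[(a, c)] + dist[(b, c)] - dist[(a, b)]) / 2
                dist[(new, c)] = dist[(c, new)] = dc
        dist[(new, new)] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    x, y = nodes
    return f"({x},{y}:{dist[(x, y)]:g});"


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    d = [[0, 3, 5, 3], [3, 0, 6, 4], [5, 6, 0, 4], [3, 4, 4, 0]]
    print(neighbour_joining(names, d))  # ((A:1,B:2),(C:3,D:1):1);
```

Because the example distances are additive (they come from a tree), neighbour joining recovers both the topology and the branch lengths exactly.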
Sequence typing schemes: cgMLST, MLST, NG-MAST and NG-STAR
Pathogenwatch implements four sequence typing schemes for N. gonorrhoeae: cgMLST (41), MLST (40), NG-MAST (44) and NG-STAR (46) (Table 1). Each scheme is based on a set of loci to which individual allele numbers are assigned using an existing database of allele sequences. A unique ST is generated from the combination of allele numbers to represent each isolate. The cgMLST scheme includes 1,649 loci from the N. gonorrhoeae cgMLST v1.0 scheme in PubMLST (43) and is used for clustering individual genomes with others in the database based on allele differences (Additional file 2: Figure S3). The MLST scheme, also hosted in PubMLST, includes fragments of 7 housekeeping genes that are relatively conserved and slowly evolving in the Neisseria genus. NG-MAST includes internal fragments of two highly polymorphic and rapidly evolving outer membrane protein genes, porB and tbpB. NG-STAR was developed more recently with the aims of standardizing the nomenclature associated with AMR determinants and providing a typing scheme that distinguishes among lineages with different AMR mechanisms. It includes 7 genes associated with resistance to β-lactams, macrolides and fluoroquinolones (Table 1).
Library of genetic AMR mechanisms: test and validation
We compiled the genetic AMR mechanisms reported for N. gonorrhoeae up to the writing of this manuscript into the AMR library in Pathogenwatch (Table 2).
This list was benchmarked using a test dataset of 3,987 N. gonorrhoeae isolates from 13 different studies containing MIC information for at least some of the following six antibiotics: ceftriaxone, cefixime, azithromycin, ciprofloxacin, benzylpenicillin and tetracycline (Additional file 1: Table S1). EUCAST clinical breakpoints were applied for five of these antimicrobials; the exception was azithromycin, for which adopting an ECOFF (MIC >1 mg/L) rather than a clinical resistance breakpoint is now recommended to distinguish isolates with azithromycin resistance determinants (108, 109). Visualizing the range of MICs for each combination of genetic AMR mechanisms observed in the isolates of the benchmark test dataset (Figure 3a-b and Additional file 2: Figures S4-S9) revealed combinations with an additive effect on AMR. These combinations were included in the AMR library to improve the accuracy of the genotypic prediction. For example, rpsJ V57M and some mtrR-associated mutations individually cause decreased susceptibility or intermediate resistance to tetracycline (MICs between 0.5 and 1 mg/L); however, combinations of these variants can increase MICs above the EUCAST resistance breakpoint for tetracycline (MICs >1 mg/L) (Additional file 2: Figure S8). This is the case for the combination of rpsJ V57M with the mtrR promoter −57delA mutation (N=681 isolates, 94.9% positive predictive value, PPV) or with mtrR promoter −57delA and mtrR G45D (N=83 isolates, 93.9% PPV). Several combinations of penA, ponA1, mtrR and porB1b mutations were observed to increase the benzylpenicillin MIC above the resistance breakpoint in most cases (Additional file 2: Figure S9). This is the case for the porB1b mutations combined with mtrR A39T (N=31 isolates, 100% PPV), with the mtrR promoter −57delA deletion (N=286 isolates, 96.5% PPV) or with the mtrR promoter −57delA and ponA1 L421P (N=269 isolates, 96.3% PPV).
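Combination rules of this kind can be sketched as a table in which each rule lists mechanisms that must all be present, with the prediction taken as the most severe matching phenotype. The rule names and structure are hypothetical and do not follow the PAARSNP TOML schema; the tetracycline rules mirror the examples in the text, and suppressive modifiers such as a disrupted mtrC would need additional logic.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (mechanisms that must all be present, phenotype)

# Hypothetical tetracycline rules mirroring the combinations described in the text
TETRACYCLINE_RULES: List[Rule] = [
    (frozenset({"rpsJ_V57M"}), "I"),
    (frozenset({"rpsJ_V57M", "mtrR_promoter_-57delA"}), "R"),
    (frozenset({"rpsJ_V57M", "mtrR_promoter_-57delA", "mtrR_G45D"}), "R"),
]

SEVERITY = {"S": 0, "I": 1, "R": 2}


def predict(detected: Set[str], rules: List[Rule]) -> str:
    """Return the most severe phenotype among rules whose mechanisms are all detected."""
    best = "S"
    for mechanisms, phenotype in rules:
        if mechanisms <= detected and SEVERITY[phenotype] > SEVERITY[best]:
            best = phenotype
    return best


if __name__ == "__main__":
    print(predict({"rpsJ_V57M"}, TETRACYCLINE_RULES))  # I
    print(predict({"rpsJ_V57M", "mtrR_promoter_-57delA"}, TETRACYCLINE_RULES))  # R
```

Encoding combinations as explicit rules, rather than summing per-mechanism effects, keeps each additive interaction traceable to the benchmark evidence behind it.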
Despite mosaic penA not being a main driver of resistance to penicillins, a combination of the porB1b mutations with the three main mosaic penA mutations (G545S, I312M and V316T) was also observed to produce a resistant phenotype in all cases (N=17 isolates, 100% PPV). A recent publication showed that loss-of-function mutations in mtrC increased susceptibility to azithromycin and are associated with isolates from the cervical environment (27). We included the presence of a disrupted mtrC as a modifier of antimicrobial susceptibility in the presence of an mtr mosaic, as it did not show a significant effect in the presence of 23S rDNA A2045G and C2597T mutations.
Results from the benchmark (Additional file 1: Table S2) show sensitivity values (true positive rates, TP/(TP+FN); TP=true positives, FN=false negatives) above 96% for tetracycline (99.2%), benzylpenicillin (98.1%), ciprofloxacin (97.1%) and cefixime (96.1%), followed by azithromycin (71.6%) and ceftriaxone (33.3%). These results reflect the complexity of the resistance mechanisms for azithromycin and ceftriaxone, for which the known genetic determinants explain only part of the antimicrobial susceptibility. However, specificity values (true negative rates, TN/(TN+FP); TN=true negatives, FP=false positives) for these two antibiotics, as well as for ciprofloxacin, were above 99% (Additional file 1: Table S2), supporting a role in AMR for the genetic mechanisms included in the library. The specificity for cefixime was lower, at nearly 90%, mainly due to the high number of isolates with an MIC below the breakpoint but carrying the three mutations characterising a mosaic penA allele (G545S, I312M and V316T; TP=367, FP=323, PPV=53.2%; Additional file 1: Table S3). Benzylpenicillin and tetracycline showed specificity values of 77.3% and 61.3%, respectively. For benzylpenicillin, all the mechanisms included in the library showed a PPV above 94%. For tetracycline, a considerable number of false positives are caused mainly by the presence of rpsJ V57M, for which PPV=83.8% (TP=1083, FP=209; Additional file 1: Table S3). However, this mutation was kept in the AMR library because it can cause intermediate resistance to tetracycline on its own (Additional file 2: Figure S8).
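The statistics above follow directly from confusion-matrix counts. The short sketch below reproduces two of the quoted per-mechanism PPVs, which depend only on TP and FP: 1083/(1083+209) for rpsJ V57M, and 367/(367+323) for the mosaic penA triple, where the 323 isolates are those below the breakpoint that carry the mutations. The function names are illustrative only.

```python
from typing import Dict


def ratio(num: int, den: int) -> float:
    """Safe division returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def diagnostic_stats(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    print(f"rpsJ V57M, tetracycline PPV: {ratio(1083, 1083 + 209):.1%}")  # 83.8%
    print(f"penA G545S+I312M+V316T, cefixime PPV: {ratio(367, 367 + 323):.1%}")  # 53.2%
```

Confidence intervals, which the study obtained with epiR's epi.tests, are not computed in this sketch.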
Results from the benchmark analysis on the 3,987-isolate dataset were used to curate and optimize the AMR library. Thus, to validate it objectively, the benchmark analysis was also run on a combination of three different collections (N=1,607, Additional file 1: Table S1) with available MIC information for seven antibiotics including spectinomycin (Additional file 1: Table S4) (63, 64, 110). Comparison of the test and validation benchmark runs showed that sensitivity values for the six overlapping antibiotics were very similar, with the validation set performing even better for azithromycin and ceftriaxone (Figure 3c). In terms of specificity, both datasets performed equally well for all antibiotics except benzylpenicillin, for which specificity drops in the validation dataset. This is due to the penA_ins346D mutation (TP=1125, FP=83) and the blaTEM genes (TP=525, FP=36), which, despite showing false positives, have a PPV above 93% (Additional file 1: Table S4). In general, discrepancies between the test and validation datasets can be explained by particular mechanisms that on their own show high predictive values and affect antibiotics for which we do not currently understand all the factors involved in resistance, such as azithromycin and the ESCs (Additional file 1: Table S4).
An additional quality assessment of the AMR library was performed using the 14 N. gonorrhoeae reference genomes from the WHO 2016 panel (23), which were uploaded into Pathogenwatch. All the genetic AMR determinants described as present in these isolates and implemented in the Pathogenwatch AMR library were detected (Additional file 1: Table S5). Only one discrepancy was found compared with the original publication: the WHO U strain was reported as carrying a parC S87W mutation, but mapping the original Illumina data from this isolate to the final genome assembly revealed that this strain carries a wild-type allele (Additional file 2: Figure S10). MLST and NG-MAST types were the same as those reported in the original publication (note that NG-STAR was not available at that time), and the mutant porA gene was found in WHO U as previously described. This mutant porA has nearly 95% nucleotide identity to N. meningitidis porA and 89% to N. gonorrhoeae porA, and it is included in the screening because it has previously been shown to cause false-negative results in some molecular detection tests for N. gonorrhoeae (111).
Over 12,000 public genomes available
All N. gonorrhoeae short-read sequencing raw data with geolocation data (at minimum country, and preferably also year) and associated with a scientific publication were downloaded from the ENA. This collection was expanded after an exhaustive literature search for studies that did not upload geolocation data to the ENA but released it as part of scientific publications. Over 12,000 genomes were assessed for sequencing data quality and contamination, assembled using a common pipeline and thresholds, and quality-checked after assembly (Additional file 3). Data for 11,461 isolates were successfully assembled and passed all quality cut-offs, giving a total of 12,515 isolates after including the previously available Euro-GASP 2013 dataset (16). New assemblies were uploaded and made public on Pathogenwatch, which, at the time of submission of this manuscript, constitutes the largest repository of curated N. gonorrhoeae genomic data with associated metadata, typing and AMR information. The updated data span 27 different publications (19, 44, 48, 52-55, 57-59, 61-64, 110, 112-125) and are organized into individual collections associated with the different studies (Additional file 1: Table S6). Available metadata were added for the genomes from these publications, while basic metadata fields (country, year/date and ENA project number) were kept for the others.
The N. gonorrhoeae public data available on Pathogenwatch span nearly a century (1928-2018) and almost 70 different countries (Additional file 2: Figure S11). However, sequencing efforts are unevenly distributed around the world: over 90% of the published isolates were collected in only 10 countries, led by the United Kingdom (N=3,476), the United States (N=2,774) and Australia (N=2,388) (Additional file 1: Table S7, Figure 4). A total of 554 MLST, 1,670 NG-MAST and 1,769 NG-STAR distinct STs were found in the whole dataset, a considerable number of which were new profiles arising from previously undetected alleles or new combinations of known alleles (N=92 new MLST STs, N=769 new NG-STAR STs and N=2,289 isolates with new NG-MAST porB and/or tbpB alleles). These new alleles and profiles were submitted to the corresponding scheme servers.
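How an isolate ends up as a known ST, a new ST, or a carrier of a new allele can be sketched for MLST, whose Neisseria scheme uses seven housekeeping loci. This is a simplified sketch under assumptions: allele calling (exact matching of each locus against the allele database) is assumed to have been done already, with `None` marking a locus without an exact match, and the profile table in the usage example is hypothetical rather than taken from PubMLST.

```python
from typing import Dict, Optional, Tuple

# The seven housekeeping loci of the Neisseria MLST scheme.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")


def assign_st(
    alleles: Dict[str, Optional[int]],
    profiles: Dict[Tuple[int, ...], int],
) -> str:
    """Classify one isolate's MLST allele calls against a profile table.

    `alleles` maps each locus to an allele number, or to None when no exact
    match to a known allele was found (a candidate new allele).
    """
    calls = [alleles.get(locus) for locus in LOCI]
    new_loci = [locus for locus, call in zip(LOCI, calls) if call is None]
    if new_loci:
        return "new allele(s): " + ", ".join(new_loci)
    st = profiles.get(tuple(calls))
    if st is None:
        return "new ST (new combination of known alleles)"
    return f"ST{st}"


if __name__ == "__main__":
    # Hypothetical profile table containing a single, made-up ST.
    table = {(1, 1, 1, 1, 1, 1, 1): 100}
    known = dict.fromkeys(LOCI, 1)
    print(assign_st(known, table))                   # ST100
    print(assign_st({**known, "pgm": 2}, table))     # new ST (new combination of known alleles)
    print(assign_st({**known, "adk": None}, table))  # new allele(s): adk
```

The same logic carries over to NG-STAR (seven AMR-associated loci) and NG-MAST (porB and tbpB), with a different locus list and profile table.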
Genomic studies are often biased towards AMR isolates, and this is reflected in the most abundant STs for the three typing schemes in the public data. Isolates of MLST ST1901, ST9363 and ST7363, which carry resistance mechanisms to almost every antibiotic included in the study, represent over 25% of the data (Figure 5). Isolates of MLST ST1901 and ST7363 are almost always resistant to tetracycline, sulfonamides, benzylpenicillin and ciprofloxacin, and nearly 50% of isolates of these two types harbour resistance mechanisms to cefixime. Ciprofloxacin resistance is not widespread among ST9363 isolates, whereas azithromycin resistance reaches nearly 50% of isolates of this ST (Figure 5). NG-STAR ST63 (carrying the non-mosaic penA-2 allele and the penA A517G and mtrR A39T mutations, as described in (47)) is the most represented NG-STAR ST in the dataset; it carries resistance mechanisms to tetracycline, sulfonamides and benzylpenicillin, but is largely susceptible to spectinomycin, ciprofloxacin, the ESCs cefixime and ceftriaxone, and azithromycin. NG-STAR ST90 isolates, conversely, are largely resistant to cefixime, ciprofloxacin and benzylpenicillin, as they carry the key resistance mutations in mosaic penA-34 as well as in the mtrR promoter, porB1b, ponA, gyrA and parC (as described in (47)). NG-MAST ST1407 is commonly associated with MLST ST1901 and is the second most represented NG-MAST ST in the dataset after NG-MAST ST2992, which mainly harbours resistance to tetracycline, benzylpenicillin and sulfonamides (Figure 5).
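The per-ST summaries in Figure 5 boil down to the proportion of isolates in each ST with a resistance call for each antibiotic. A minimal sketch of that aggregation follows; the record layout, the ST labels (`ST-A`, `ST-B`) and the example records are hypothetical and are not drawn from the dataset. Antibiotic codes follow the abbreviations used in this article.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# (sequence type, set of antibiotic codes with a predicted-resistance call)
Record = Tuple[str, Set[str]]


def resistance_prevalence(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Fraction of isolates in each ST predicted resistant to each antibiotic."""
    totals: Counter = Counter()
    resistant: Dict[str, Counter] = defaultdict(Counter)
    for st, drugs in records:
        totals[st] += 1
        resistant[st].update(drugs)
    return {
        st: {drug: n / totals[st] for drug, n in resistant[st].items()}
        for st in totals
    }


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        ("ST-A", {"TET", "PEN", "CIP", "CFM"}),
        ("ST-A", {"TET", "PEN", "CIP"}),
        ("ST-B", {"TET", "AZM"}),
        ("ST-B", {"TET"}),
    ]
    prev = resistance_prevalence(records)
    print(prev["ST-A"]["CFM"])             # 0.5
    print(prev["ST-B"].get("CIP", 0.0))    # 0.0
```

An antibiotic missing from an ST's dictionary had no resistant isolates in that ST, hence the `.get(drug, 0.0)` lookup.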
Data sharing and privacy
Sequencing data and metadata files uploaded by a user are kept within that user's private account. Genomes can be grouped into collections, which can be toggled between private and accessible to collaborators via a URL. Collection URLs include a 12-letter random string to protect them against brute-force searching. Setting a collection to ‘off-line mode’ allows users to work in challenging network conditions, which may be beneficial in LMICs, as all data are held within the browser. Users can also integrate private and potentially confidential metadata into the display locally, within the browser on their own machine, without uploading it to the Pathogenwatch servers.
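The protection offered by a 12-letter random string depends on the alphabet and on the source of randomness. The sketch below assumes the string is drawn uniformly from upper- and lower-case ASCII letters with a cryptographically secure generator (Python's `secrets` module); the alphabet and generator actually used by Pathogenwatch are not stated here, and the function names are hypothetical. Under that assumption a token carries 12 × log2(52) ≈ 68.4 bits of entropy (about 56.4 bits if only lower-case letters were used), well beyond practical brute-force enumeration of URLs over the web.

```python
import math
import secrets
import string

# Assumption: upper- and lower-case ASCII letters (52 symbols).
ALPHABET = string.ascii_letters
TOKEN_LENGTH = 12


def collection_token(length: int = TOKEN_LENGTH) -> str:
    """Random token drawn with a cryptographically secure generator."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a token drawn uniformly at random."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(collection_token())
    print(f"{entropy_bits(len(ALPHABET), TOKEN_LENGTH):.1f} bits")  # 68.4 bits
```

`secrets` is preferred over `random` here because `random` is not designed for security-sensitive tokens.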
Discussion
We present a public health focussed N. gonorrhoeae framework, supported by an expert group, within Pathogenwatch, an open access platform for genomic surveillance that can be adapted to any public health or microbiology laboratory. Little bioinformatics expertise is required, and users can upload either raw short-read data or assembled genomes. In both cases, users are encouraged to upload high-quality data in the form of quality-checked reads and/or quality-checked assemblies. Recent benchmark analyses provide specific recommendations for long-read or hybrid data (126) as well as for short-read-only data (35, 127). On upload, several analyses are run on the genomes, and the results for the three main typing schemes (MLST, NG-MAST and NG-STAR), the detected genetic determinants of AMR and a prediction of phenotypic resistance based on these mechanisms are obtained simultaneously. The library of AMR determinants for N. gonorrhoeae in Pathogenwatch has been revised and extended to include the latest mechanisms and epistatic interactions with experimental evidence of decreasing susceptibility or increasing resistance to at least one of eight antibiotics (Tables 2 and 3). A benchmark analysis on test and validation datasets revealed sensitivity and/or specificity values >90% for most of the tested antibiotics (Additional file 1: Table S2).
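The prediction and benchmarking steps can be illustrated together: resistance to an antibiotic is called when every determinant of at least one rule is present (single-determinant rules for mechanisms sufficient on their own, multi-determinant rules for combinations), and the calls are then compared with phenotypic categories to obtain sensitivity, specificity, PPV and NPV. This is a minimal sketch, not the Pathogenwatch implementation: the rule set and determinant names (`det_A`, `det_B`, `det_C`) are hypothetical placeholders, and the observed phenotypes are assumed to have been derived beforehand from MICs and a breakpoint or ECOFF.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Sequence

# Hypothetical rules for ONE antibiotic. A one-element rule stands for a
# mechanism sufficient on its own; larger rules stand for combinations.
RULES: List[FrozenSet[str]] = [
    frozenset({"det_A"}),
    frozenset({"det_B", "det_C"}),
]


def predict_resistant(found: FrozenSet[str], rules: Sequence[FrozenSet[str]] = RULES) -> bool:
    """True if all determinants of any rule are among those found in a genome."""
    return any(rule <= found for rule in rules)


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(num: int, den: int) -> float:
    return num / den if den else math.nan


def benchmark(predicted: Iterable[bool], observed: Iterable[bool]) -> Metrics:
    """Compare genotypic calls with phenotypic calls (True = resistant)."""
    tp = fp = tn = fn = 0
    for pred, obs in zip(predicted, observed):
        if pred and obs:
            tp += 1
        elif pred:        # predicted resistant, phenotypically susceptible
            fp += 1
        elif obs:         # predicted susceptible, phenotypically resistant
            fn += 1
        else:
            tn += 1
    return Metrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


if __name__ == "__main__":
    print(predict_resistant(frozenset({"det_B"})))           # False
    print(predict_resistant(frozenset({"det_B", "det_C"})))  # True
    predicted = [True] * 9 + [False] + [True] * 2 + [False] * 8
    observed = [True] * 10 + [False] * 10
    print(benchmark(predicted, observed))
```

With 9 of 10 resistant and 8 of 10 susceptible isolates classified correctly, `benchmark` gives a sensitivity of 0.9, a specificity of 0.8, a PPV of 9/11 and an NPV of 8/9.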
The continuous increase in reports of AMR N. gonorrhoeae isolates worldwide led in 2017 to a call for international collaborative action towards a global surveillance scheme. This was part of the WHO global health sector strategy on STIs (2016-2021), which set the goal of ending STI epidemics as a public health concern by 2030 (7, 8). Several programmes are currently in place at global, regional or national levels to monitor gonorrhoea AMR trends and emerging resistance and to refine treatment guidelines and public health policies. Examples include the WHO Global Gonococcal Antimicrobial Surveillance Programme (WHO GASP) (8), Euro-GASP in Europe (6, 16, 128), the Gonococcal Isolate Surveillance Project (GISP) in the United States (129), the Canadian Gonococcal Antimicrobial Surveillance Programme (130), the Australian Gonococcal Surveillance Programme (AGSP) (131) and the Gonococcal Resistance to Antimicrobials Surveillance Programme (GRASP) in England and Wales (132). The WHO, in collaboration with the CDC, has recently started an Enhanced GASP (EGASP) (133) in some sentinel countries such as the Philippines and Thailand (134), aimed at collecting standardized and quality-assured epidemiological, clinical, microbiological and AMR data. In addition to these programmes, the WHO launched the Global AMR Surveillance System (GLASS) in 2015 to foster national surveillance systems and enable standardized, comparable and validated AMR data on priority human bacterial pathogens (135). Efforts are now underway to link GASP to GLASS. However, gonococcal AMR surveillance is still suboptimal or even lacking in many locations, especially LMICs in parts of Asia, Central and Latin America, Eastern Europe and Africa, which worryingly have the highest incidence of gonorrhoea (3).
In LMICs, antimicrobials are often available without prescription and access to optimal treatment is limited. Laboratory diagnosis is frequently not possible because of limited or non-existent quality-assured laboratories, a shortage of microbiological and bioinformatics expertise and training, and the insufficient availability and exorbitant prices of some reagents, on top of a lack of funding. Altogether, these factors compromise infection control.
High-throughput sequencing approaches have proved invaluable, compared with traditional molecular methods, for identifying AMR clones of bacterial pathogens, outbreaks, transmission networks, and national and international spread, among other applications (29, 30). Genomic surveillance efforts to capture the local and international spread of N. gonorrhoeae have produced several publications within the last decade involving high-throughput sequence data from thousands of isolates from many locations across the world. Analysing these data requires expertise in bioinformatics, genomics, genetics, AMR, phylogenetics, epidemiology, etc., which is not always available. For lower-resourced settings, initiatives such as the NIHR Global Health Research Unit on Genomic Surveillance of Antimicrobial Resistance (136) are essential to build genomic surveillance capacity and provide the necessary microbiology and bioinformatics training for quality-assured genomic surveillance of AMR.
One of the strengths of genomic epidemiology is the ability to compare new genomes with existing data at a broader geographical level, which can show, for example, whether new cases are part of a single clonal expansion or result from multiple introductions from outside a specific location. Currently, over 12,000 N. gonorrhoeae isolates have been sequenced using high-throughput approaches and publicly deposited in the ENA linked to a scientific publication. We have quality-checked and assembled these data using a common pipeline and made them available through Pathogenwatch, with the aim of representing as much of the genomic diversity of this pathogen as possible to serve as a background for new analyses. These public genomes are associated with at least 27 different scientific publications and have been organized in Pathogenwatch as individual collections (Additional file 1: Table S6).
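One way to use a public background for contextualisation is to find the background genomes closest to a new isolate. The sketch below assumes a cgMLST-style comparison: each genome is a vector of allele numbers over the same core loci, the distance is the number of loci with different alleles (ignoring missing calls), and genomes within a chosen threshold are reported. The function names, threshold and profiles are hypothetical, and this illustrates the general idea rather than the method Pathogenwatch uses.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One allele number per core locus, in a fixed locus order; None = missing call.
AlleleProfile = Sequence[Optional[int]]


def allele_distance(a: AlleleProfile, b: AlleleProfile) -> int:
    """Number of loci with different alleles, skipping loci missing in either."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1 for x, y in zip(a, b) if x is not None and y is not None and x != y
    )


def closest_genomes(
    query: AlleleProfile,
    background: Dict[str, AlleleProfile],
    max_distance: int,
) -> List[Tuple[str, int]]:
    """Background genomes within `max_distance` of the query, closest first."""
    hits = [(name, allele_distance(query, prof)) for name, prof in background.items()]
    return sorted(
        (hit for hit in hits if hit[1] <= max_distance),
        key=lambda hit: (hit[1], hit[0]),
    )


if __name__ == "__main__":
    # Hypothetical four-locus profiles; real cgMLST schemes use far more loci.
    background = {
        "public_1": [1, 2, 3, 4],
        "public_2": [1, 2, 9, 9],
        "public_3": [1, None, 3, 5],
    }
    print(closest_genomes([1, 2, 3, 5], background, max_distance=1))
    # [('public_3', 0), ('public_1', 1)]
```

Skipping missing loci keeps incomplete assemblies comparable, at the cost of slightly underestimating their distances.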
In this study, we have gathered an advisory group of N. gonorrhoeae experts in fields such as AMR, microbiology, genetics, genomics, epidemiology and public health, who will review and discuss the current and future analytics to be included to address the global public health needs of the community. We suggest this strategy as a model for other pathogens in this and other genomic surveillance platforms, so that end users, who in some cases may have limited computational experience, can be confident that the analytics and databases underlying this tool are appropriate, and can access all the results provided by Pathogenwatch simply by uploading their data via a web browser. We are aware that this is a constantly moving field and that the analytics will be expanded and updated in the future. These updates will be discussed within the advisory group to ensure that they are useful in the field and that results are reported in a way that serves different profiles (microbiologists, epidemiologists, public health professionals, etc.).
Future analytics under discussion include the automatic submission of new MLST, NG-STAR and NG-MAST alleles and STs to the corresponding servers and the automatic submission of data to public archives such as the ENA. A separate library to automatically screen targets of potential interest for vaccine design (137-139), as well as targets of new antibiotics in phase II or III clinical trials (e.g. zoliflodacin (140) or gepotidacin (141)), could also be a useful addition to the scheme. Regarding AMR, new methods for predicting phenotype from genetic data are continuously being reported (56, 142, 143), especially those based on machine learning algorithms (144), and will be considered for future versions of the platform.
Conclusions
In summary, we present a genomic surveillance platform adapted to N. gonorrhoeae, one of the main public health priorities in the control of AMR infections, in which decisions on existing and updated databases and analytics, as well as on how results are reported, will be discussed with an advisory board of experts in different public health areas. This will allow scientists from both higher- and lower-resourced settings, with different capacities for high-throughput sequencing, bioinformatics and data interpretation, to use a reproducible and quality-assured platform in which to analyse and contextualise genomic data arising from investigations of treatment failures, outbreaks, transmission chains and networks at different regional scales. This open access and reproducible platform is a further step towards an international collaborative effort in which countries can keep ownership of their data in line with national STI and AMR surveillance and control programmes, while aligning with global strategies for joint action against AMR N. gonorrhoeae.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and materials
The assemblies included in the current version of the N. gonorrhoeae Pathogenwatch scheme and used for the AMR benchmark analyses were generated from raw sequencing data stored in the ENA. Project accession numbers are included in Additional file 1: Tables S1 and S6. The generated assemblies can be downloaded from Pathogenwatch. The AMR library can be accessed from: https://gitlab.com/cgps/pathogenwatch/amr-libraries/-/blob/master/485.toml. The code to reproduce the figures and analyses in this manuscript can be found at https://gitlab.com/cgps/pathogenwatch/publications.
Competing interests
The authors declare that they have no competing interests.
Funding
Pathogenwatch is developed with support from Li Ka Shing Foundation (Big Data Institute, University of Oxford) and Wellcome (099202). LSB and DMA are supported by the Li Ka Shing Foundation (Big Data Institute, University of Oxford) and the Centre for Genomic Pathogen Surveillance (CGPS, http://pathogensurveillance.net). DMA and SA are supported by the National Institute for Health Research (UK) Global Health Research Unit on Genomic Surveillance of AMR (16_136_111). MJC's department receives funding from the European Centre for Disease Prevention and Control and the National Institute for Health Research (Health Protection Research Unit) for gonococcal whole-genome sequencing. YHG was supported by the NIH/NIAID grant R01 AI132606. KCM is supported by the NSF GRFP grant number DGE1745303. TDM is supported by the National Institute of Allergy and Infectious Diseases at the National Institutes of Health [1 F32 AI145157-01]. WMS is a recipient of a Senior Research Career Scientist Award from the Biomedical Laboratory Research and Development Service of the Department of Veterans Affairs. Work on antibiotic resistance in his laboratory is supported by NIH grants R37 AI-021150 and R01 AI-147609. The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the Department of Veterans Affairs, The National Institutes of Health or the United States Government. The findings and conclusions in this article are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention. The WHO Collaborating Centre for Gonorrhoea and other STIs directed by MU receives funding from the European Centre for Disease Prevention and Control and the World Health Organization. This publication made use of the Neisseria Multi-Locus Sequence Typing website (https://pubmlst.org/neisseria/) sited at the University of Oxford (43) and funded by Wellcome and the European Union.
Authors’ contributions
DMA conceived the Pathogenwatch application. CY, RG, KA, BT, AU and DMA developed the Pathogenwatch application. LSB and DMA contributed to the conception and design of the work. CY and LSB generated, updated and benchmarked the N. gonorrhoeae AMR library. BT, CY, AU and LSB obtained, quality-checked and reassembled the raw data from the ENA. LSB revised the assembled data, obtained all metadata available from the corresponding scientific publications and created collections. LSB drafted the manuscript. LSB, DMA, CY, SA, KCM, TDM, MJC, YHG, IM, BHR, WMS, GS, KT, TW and MU contributed to the acquisition, interpretation and discussion of the data. LSB and CY analysed the data. All authors read and approved the final manuscript.
Acknowledgements
We would like to thank MJC, YHG, IM, BHR, WMS, GS, KT, TW and MU for their support on the development of the N. gonorrhoeae Pathogenwatch scheme and the creation of the N. gonorrhoeae Pathogenwatch Scientific Steering Group.
Footnotes
^ Current members of the N. gonorrhoeae Pathogenwatch Scientific Steering Group.
Author order updated/fixed.
List of abbreviations
- AGSP: Australian Gonococcal Surveillance Programme
- AMR: Antimicrobial Resistance
- AZM: Azithromycin
- CDC: Centers for Disease Control and Prevention
- CFM: Cefixime
- cgMLST: Core Genome Multi-Locus Sequence Typing
- CIP: Ciprofloxacin
- CRO: Ceftriaxone
- ECOFF: Epidemiological Cut-Off
- EGASP: Enhanced Gonococcal Antimicrobial Surveillance Programme
- ENA: European Nucleotide Archive
- ESCs: Extended Spectrum Cephalosporins
- EUCAST: European Committee on Antimicrobial Susceptibility Testing
- Euro-GASP: European Gonococcal Antimicrobial Surveillance Programme
- FN: False Negative
- FP: False Positive
- GASP: Global Gonococcal Antimicrobial Surveillance Programme
- GISP: Gonococcal Isolate Surveillance Project
- GRASP: Gonococcal Resistance to Antimicrobials Surveillance Programme
- HIV: Human Immunodeficiency Virus
- LMICs: Low- and Middle-Income Countries
- MIC: Minimum Inhibitory Concentration
- MLST: Multi-Locus Sequence Typing
- NG-MAST: N. gonorrhoeae Multi-Antigen Sequence Typing
- NG-STAR: N. gonorrhoeae Sequence Typing for Antimicrobial Resistance
- NPV: Negative Predictive Value
- PEN: Benzylpenicillin
- PPV: Positive Predictive Value
- SNPs: Single Nucleotide Polymorphisms
- ST: Sequence Type
- STI: Sexually Transmitted Infection
- TET: Tetracycline
- TN: True Negative
- TP: True Positive
- UK: United Kingdom
- WGS: Whole Genome Sequencing
- WHO: World Health Organization
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.
- 110.
- 111.
- 112.
- 113.
- 114.
- 115.
- 116.
- 117.
- 118.
- 119.
- 120.
- 121.
- 122.
- 123.
- 124.
- 125.
- 126.
- 127.
- 128.
- 129.
- 130.
- 131.
- 132.
- 133.
- 134.
- 135.
- 136.
- 137.
- 138.
- 139.
- 140.
- 141.
- 142.
- 143.
- 144.