ABSTRACT
A Gram-negative rod from the Yersinia genus was isolated from a clinical case of yersiniosis in the United Kingdom. Long-read sequencing data from an Oxford Nanopore Technologies (ONT) MinION, in conjunction with Illumina HiSeq reads, were used to generate a finished-quality genome of this strain. Overall genome relatedness indices (OGRI) showed that the strain represents a novel species within Yersinia, despite biochemical similarities to Yersinia enterocolitica. The 16S ribosomal RNA gene accessions are MN434982-MN434987 and the accession number for the complete and closed chromosome is CP043727. The type strain is CFS3336T (=NCTC 14382T =LMG Accession under process).
INTRODUCTION AND BACKGROUND
The majority of species within the Yersinia genus are considered non-pathogenic and are found broadly within the environment. Pathogenic members of Yersinia have been shown to have evolved independently following the acquisition of virulence genes in select lineages (1). Yersinia pestis, arguably one of the most historically serious zoonotic pathogens reported (2), is the agent of bubonic, pneumonic, and septicaemic plague (3). Two other species, Yersinia enterocolitica and Yersinia pseudotuberculosis, are the aetiological agents of the human gastrointestinal infection yersiniosis (4). As detailed in a 2016 report by the European Food Safety Authority, yersiniosis is the third most commonly reported zoonosis in Europe (5). Yersinia lends its name to the Yersiniaceae family within the order Enterobacterales (6) and comprises 19 species: the three aforementioned pathogenic species and Y. aldovae, Y. aleksiciae, Y. bercovieri, Y. entomophaga, Y. frederiksenii, Y. hibernica, Y. intermedia, Y. kristensenii, Y. massiliensis, Y. mollaretii, Y. nurmii, Y. pekkanenii, Y. rohdei, Y. ruckeri, Y. similis, and Y. wautersii. Two subspecies have additionally been described: Y. enterocolitica subsp. palearctica and the recently characterised Y. kristensenii subsp. rochesterensis (7).
At a local NHS frontline hospital laboratory, diarrhoeic stool samples were tested with the GI PCR screening test (Fast-Track Diagnostics Bacterial gastroenteritis panel FTD-14.1-64 supplied by Launch Diagnostics). Specimens that tested positive for Yersinia by PCR were then cultured on cefsulodin irgasan (triclosan) novobiocin (CIN) agar at 28 °C for 48 hours. Isolates were referred to the Gastrointestinal Bacteria Reference Unit (GBRU) at Public Health England (PHE) for further speciation and characterisation. As per statutory reporting requirements for infectious diseases, the laboratory reported confirmed cases (PCR and/or culture positive) to PHE’s Second Generation Surveillance System (SGSS).
One such suspected Yersinia isolate, denoted NCTC 14382T, was isolated from an adult human female in the United Kingdom after travel to the Canary Islands in 2018 (8). Identification of Yersinia to the species level by traditional biochemical methods is difficult because of heterogeneous biochemical phenotypes (9). All Yersinia isolates received at the GBRU are therefore routinely sequenced on an Illumina HiSeq 2500, and species are assigned bioinformatically by comparing the k-mers (18-mers) of each isolate against a reference database of known species (8). The closest match for this isolate by number of shared k-mers in the database was Yersinia enterocolitica. Whole genome sequencing data characterised this isolate as ST333 under the multi-locus sequence typing (MLST) scheme developed by Hall et al. (8,10). A previous phylogenetic analysis showed that the isolate did not cluster with Y. enterocolitica and was instead located on a distinct branch (8). This study aims to resolve the taxonomic placement of this isolate as a new species within Yersinia with the support of genomic and biochemical data. As NCTC 14382T was associated with travel to the Canary Islands, the name Yersinia canariae sp. nov. is proposed.
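The k-mer comparison described above can be illustrated with a minimal Python sketch. It is not the GBRU pipeline: the function names, the toy sequences in the usage note, and the choice to ignore reverse complements and sequencing errors are all simplifying assumptions made here for illustration. Only the k-mer length of 18 comes from the text.

```python
K = 18  # k-mer length stated in the text


def kmers(seq, k=K):
    """Return the set of all k-mers in a DNA sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def closest_reference(query, references, k=K):
    """Rank references by the number of distinct k-mers shared with the query.

    `references` maps a species label to a reference sequence (hypothetical
    input format). Returns (best_label, shared_counts). Ties go to the
    label that appears first in `references`.
    """
    query_kmers = kmers(query, k)
    shared = {
        label: len(query_kmers & kmers(ref, k))
        for label, ref in references.items()
    }
    best = max(shared, key=shared.get)
    return best, shared
```

With toy inputs and a short k, `closest_reference("ACGTACGGTCA", {"sp_A": "ACGTACGGTCA", "sp_B": "TTTTTTTTTTT"}, k=4)` selects `sp_A`. As the isolate in this study shows, the closest k-mer match only names the nearest species in the database; it cannot by itself show that an isolate belongs to that species.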
BIOCHEMICAL TESTS
API 20E strips were used to determine the phenotypes of Y. canariae NCTC 14382T and two closely related Yersinia species through a gallery of biochemical tests. The API 20E system is suitable for the identification of Yersinia species (11), but its results depend on incubation temperature (12,13). Biological triplicates of the test Yersinia strains were assayed with API 20E test strips according to the manufacturer’s instructions and incubated at both 28 °C and 37 °C for 24 hours (Table 1). Incubation of NCTC 14382T at 28 °C revealed that this strain was capable of fermenting almost all carbon sources tested, similar to Y. enterocolitica 8081 (Fig. 2). In contrast to Y. enterocolitica, Y. canariae NCTC 14382T was negative for ornithine decarboxylase (ODC) activity at 28 °C. API 20E strips incubated at 37 °C produced more contrasting phenotypes, notably the loss of inositol fermentation by NCTC 14382T when compared to Y. enterocolitica (Fig. 3). At 37 °C, Y. canariae NCTC 14382T was positive for ortho-nitrophenyl-β-D-galactopyranoside (ONPG) hydrolysis, in contrast to the other two strains tested.
GENOME FEATURES
Y. canariae NCTC 14382T was previously sequenced on an Illumina HiSeq 2500 at Public Health England using the Nextera XP library preparation kit following a retrospective study on yersiniosis isolates cultured from patients between April 2004 and March 2018 (8). To generate a finished-quality genome, NCTC 14382T was grown in Luria-Bertani (LB) broth (Sigma) for 18 h at 25 °C. Genomic DNA was extracted with a Wizard Genomic DNA Purification Kit (Promega) following the manufacturer’s recommended protocol. DNA was sequenced on an ONT MinION with an R9.4 flow cell (FLO-MIN106) for approximately 16 hours.
For ONT MinION data, the run metrics were inspected using NanoPlot (version 1.0) (14) before raw FAST5 files were base-called to FASTQ files using Guppy (version 3.2.2) with the high accuracy model. Adapters were trimmed from the raw reads by Porechop (version 0.2.4) using default parameters for SQK-RAD004 before the genome was de novo assembled with Flye (version 2.5) (15,16). The best assembly parameters were empirically determined to include the option flags “meta” and “plasmid”, with coverage reduced to 30X for initial contig assembly based on a predicted genome size of ~4.73 Mbp as informed by de novo assembly of short-read Illumina data (17). This produced a single contiguous chromosome, for which the final consensus sequence was determined following four iterative rounds of long-read polishing with Racon (version 1.4.3) (18) using the high accuracy base-called reads produced by Guppy that had previously been adapter-trimmed by Porechop. A final round of consensus sequence correction was performed with the same long-read data using Medaka (version 0.8.2). Lastly, short-read Illumina data were aligned using minimap2 (version 2.17) (19), producing BAM files that were sorted and indexed with Samtools (version 1.9) (20), before four iterative rounds of short-read polishing with Pilon (version 1.23) (21).
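The order of the assembly and polishing stages can be made explicit with a short Python sketch that only builds the command lines, without running anything. This is an illustrative reconstruction, not the authors' script: all file names are hypothetical, the `pilon` wrapper name assumes a packaged install (otherwise `java -jar pilon.jar`), and the Flye option spellings `--meta`, `--plasmids`, and `--asm-coverage` are assumed to correspond to the flags named in the text. Any parameter not stated in the text is left at what is presumed to be a tool default.

```python
def assembly_plan(long_reads="ont_trimmed.fastq",
                  short_r1="illumina_R1.fastq.gz",
                  short_r2="illumina_R2.fastq.gz",
                  racon_rounds=4, pilon_rounds=4):
    """Return (commands, final_fasta) for the stages described in the text.

    Commands are plain strings for inspection; nothing is executed.
    All file names are hypothetical placeholders.
    """
    commands = [
        # De novo assembly with reduced coverage for the initial contigs.
        f"flye --nano-raw {long_reads} --genome-size 4.73m --out-dir flye "
        "--meta --plasmids --asm-coverage 30"
    ]
    draft = "flye/assembly.fasta"

    # Long-read polishing: each round maps reads to the current draft.
    for i in range(1, racon_rounds + 1):
        paf, out = f"racon_{i}.paf", f"racon_{i}.fasta"
        commands.append(f"minimap2 -x map-ont {draft} {long_reads} > {paf}")
        commands.append(f"racon {long_reads} {paf} {draft} > {out}")
        draft = out

    # One round of consensus correction with the same long reads.
    commands.append(f"medaka_consensus -i {long_reads} -d {draft} -o medaka")
    draft = "medaka/consensus.fasta"

    # Short-read polishing: align, sort, index, then correct with Pilon.
    for i in range(1, pilon_rounds + 1):
        bam = f"pilon_{i}.bam"
        commands.append(
            f"minimap2 -ax sr {draft} {short_r1} {short_r2} "
            f"| samtools sort -o {bam} -"
        )
        commands.append(f"samtools index {bam}")
        commands.append(f"pilon --genome {draft} --frags {bam} --output pilon_{i}")
        draft = f"pilon_{i}.fasta"

    return commands, draft
```

Calling `assembly_plan()` yields 22 commands in order and names `pilon_4.fasta` as the final consensus. The point of the sketch is that every polishing round takes the output of the previous round as its input, so the reads are always re-aligned to the most recent draft.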
After assembly, a circular finished-quality chromosome of 4,710,154 bp was generated, with no plasmid detected. Annotation by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) identified 4,370 genes, of which 4,132 were coding. Eight copies of the 5S rRNA gene, 7 copies of the 16S rRNA gene, 7 copies of the 23S rRNA gene, 81 tRNA genes, and 6 non-coding RNA genes were present, giving a total of 109 RNA genes.
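As a quick consistency check on the annotation figures above, the RNA gene tally and the coding fraction can be recomputed in a few lines of Python. The counts are taken directly from the text; the variable names are chosen here for illustration.

```python
# Gene counts as reported in the PGAP annotation summary in the text.
rna_genes = {"5S rRNA": 8, "16S rRNA": 7, "23S rRNA": 7, "tRNA": 81, "ncRNA": 6}
total_genes = 4370
coding_genes = 4132

total_rna = sum(rna_genes.values())
coding_percent = round(100 * coding_genes / total_genes, 1)

print(total_rna)       # 109, matching the reported RNA gene total
print(coding_percent)  # 94.6 (percent of annotated genes that are coding)
```

Coding and RNA genes together account for 4,241 of the 4,370 annotated genes; the text does not break down the remaining 129.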
As the genome had been corrected by multiple rounds of long-read-aware polishing, the most common 16S rRNA gene allele (Accession: MN434982) was extracted from the genome for analysis. The full-length 16S rRNA gene was queried against the EzBioCloud 16S database as previously recommended by Chun et al. (Table 2) (22,23). Against most of the type strains of Yersinia species, NCTC 14382T showed 16S rRNA gene similarity of at least 98.7%, so overall genome relatedness indices (OGRI) were calculated.
The average nucleotide identity (ANI) of NCTC 14382T was determined by FastANI (v1.2) against the type strain genome sequences of all other Yersinia species (24). Y. canariae NCTC 14382T was most closely related to Y. hibernica, Y. enterocolitica subsp. enterocolitica, Y. enterocolitica subsp. palearctica, Y. kristensenii subsp. rochesterensis, and Y. kristensenii subsp. kristensenii based on ANI values (Table 3). However, the ANI values for NCTC 14382T against the type strains of all other Yersinia species fall below the 95% ANI species threshold, suggesting taxonomic placement of NCTC 14382T into a novel species. The digital DNA-DNA hybridization (dDDH) values were calculated with the Type (Strain) Genome Server hosted by the Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ) (25). Using the recommended formula d4 (25,26) with the BLAST+ local alignment tool, the in silico DNA-DNA hybridization values provided additional OGRI support for NCTC 14382T as a novel species (Table 3).
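The ANI decision rule described above is simple enough to sketch in Python. The parser assumes FastANI's whitespace-delimited five-column output (query, reference, ANI, mapped fragments, total fragments) and file paths without spaces. The function names are hypothetical, and the values in the usage note are made-up placeholders, not the values in Table 3.

```python
ANI_SPECIES_THRESHOLD = 95.0  # percent, the species cutoff used in the text


def parse_fastani(text):
    """Map each reference genome to its ANI value from FastANI-style output.

    Assumes five whitespace-separated columns per line: query, reference,
    ANI, mapped fragments, total fragments. Paths must not contain spaces.
    """
    results = {}
    for line in text.strip().splitlines():
        _query, reference, ani, _mapped, _total = line.split()
        results[reference] = float(ani)
    return results


def below_species_threshold(ani_by_reference, threshold=ANI_SPECIES_THRESHOLD):
    """True if the query is below the threshold against every reference."""
    return all(ani < threshold for ani in ani_by_reference.values())
```

For placeholder input such as `"query.fasta\tref_A.fasta\t93.1\t1200\t1500"`, `parse_fastani` returns `{"ref_A.fasta": 93.1}` and `below_species_threshold` returns `True`. Note that FastANI typically reports only genome pairs with ANI above roughly 80%, so a reference that is missing from the output is also below the threshold.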
A phylogenetic tree based on conserved core sequences used in a whole genome alignment was generated by Parsnp as previously described (27). The resulting phylogenetic tree was visualized in EvolView2 (28) and showed that Y. canariae clustered distinctly from other Yersinia species and was most closely related to the newly described Y. hibernica (27) (Fig. 1).
On the basis of the biochemical profile, phylogenetic relationships, and OGRI data of isolate NCTC 14382T, the evidence for a new species within Yersinia is conclusive, and the name Yersinia canariae sp. nov. is proposed.
Description of Yersinia canariae sp. nov.
Yersinia canariae (ca.na’ri.ae. N.L. gen. n. canariae, of the Canary Islands (Canariae insulae), as the type strain was associated with travel to the Canary Islands).
Cells grow aerobically at 25-37 °C on LB agar, producing colonies 1.5-2.0 mm in diameter after 24 hours. At 28 °C, cells are positive for ortho-nitrophenyl-β-D-galactopyranoside hydrolysis, urease activity, indole production, and fermentation of D-glucose, D-mannitol, inositol, D-sorbitol, D-sucrose, amygdalin, and L-arabinose. At 28 °C, cells are negative for utilisation of L-arginine, L-lysine, L-ornithine, and trisodium citrate; for H2S production; for tryptophan deaminase and gelatinase activity; and for fermentation of L-rhamnose and D-melibiose. In contrast to growth at 28 °C, cells do not ferment inositol at 37 °C. The DNA G+C content is 47.2% and the chromosomal length of the type strain is 4,710,154 bp.
The type strain, CFS3336T (=NCTC 14382T =LMG Accession under process), was isolated in the United Kingdom from a yersiniosis case associated with travel to the Canary Islands. The complete genome of NCTC 14382T has been deposited into GenBank (accession number, CP043727).