Abstract
The Escherichia/Shigella genus comprises four Escherichia species, four Shigella species, and five lineages currently not assigned to any species, termed ‘Escherichia cryptic clades’ (numbered I, II, III, IV and VI). Correct identification of Escherichia cryptic clades is strongly hindered by the indeterminate taxonomy of this genus. Furthermore, little is known about the cryptic clades because reports of genomic data are scarce. Hence, we searched public databases for whole-genome sequences of Escherichia cryptic clades and characterized them. Following a genomic analysis of the Escherichia/Shigella genus, we also describe a new Escherichia species, Escherichia ruysiae sp. nov. (type strain OPT1704T = NCCB 100732T = NCTC 14359T), and provide a closed genome assembly based on Illumina and Oxford Nanopore Technologies sequencing. We screened 79,911 Sequence Read Archive Escherichia records and detected 357 cryptic clade strains (0.44%). Based on average nucleotide identity, these strains should be grouped into seven distinct species: 1) E. coli, Shigella spp. and clade I; 2) Clade II; 3) Escherichia ruysiae sp. nov. (formerly clades III and IV); 4) E. marmotae (formerly clade V); 5) Clade VI; 6) E. albertii and 7) E. fergusonii. Notably, half of the clade I strains carried genes encoding Shiga toxin, and strains encoding extended-spectrum β-lactamases (ESBLs) or carbapenemases were also found.
In conclusion, we provide an improved overview of the Escherichia/Shigella genus and advance our understanding of Escherichia cryptic clades.
Importance
Correct definition and identification of bacterial species is essential for clinical and research purposes. Groups of Escherichia strains, termed “Escherichia cryptic clades”, have not been assigned to species, which causes misidentification of these strains. The significance of our research is threefold. First, we detect 357 cryptic clade strains, many more than previously known. These genomes can serve as a resource for other researchers. Second, we show how these cryptic clades should be assigned to existing or newly defined species. This could improve identification of cryptic clade strains and Escherichia species. Finally, we characterize the genomes in detail, revealing virulence genes encoded in the cryptic clade I genomes.
Footnotes
† Members listed in the appendix.
Extra information was added in the discussion regarding concordance between our findings and those of the Genome Taxonomy Database (GTDB).