Genomic Sequencing of Bacillus cereus Sensu Lato Strains Isolated from Meat and Poultry Products in South Africa Enables Inter- and Intranational Surveillance and Source Tracking

ABSTRACT Members of the Bacillus cereus sensu lato species complex, also known as the B. cereus group, vary in their ability to cause illness but are frequently isolated from foods, including meat products; however, food safety surveillance efforts that use whole-genome sequencing (WGS) often neglect these potential pathogens. Here, we evaluate the surveillance and source tracking potential of WGS as applied to B. cereus sensu lato by (i) using WGS to characterize B. cereus sensu lato strains isolated during routine surveillance of meat products across South Africa (n = 25) and (ii) comparing the genomes sequenced here to all publicly available, high-quality B. cereus sensu lato genomes (n = 2,887 total genomes). Strains sequenced here were collected from meat products obtained from (i) retail outlets, processing plants, and butcheries across six South African provinces (n = 23) and (ii) imports held at port of entry (n = 2). The 25 strains sequenced here were partitioned into 15 lineages via in silico seven-gene multilocus sequence typing (MLST). While none of the South African B. cereus sensu lato strains sequenced here were identical to publicly available genomes, six MLST lineages contained multiple strains sequenced in this study, which were identical or nearly identical at the whole-genome scale (≤3 core single nucleotide polymorphisms). Five MLST lineages contained (nearly) identical genomes collected from two or three South African provinces; one MLST lineage contained nearly identical genomes from two countries (South Africa and the Netherlands), indicating that B. cereus sensu lato can spread intra- and internationally via foodstuffs. IMPORTANCE Nationwide foodborne pathogen surveillance programs that use high-resolution genomic methods have been shown to provide vast public health and economic benefits. However, Bacillus cereus sensu lato is often overlooked during large-scale routine WGS efforts. Thus, to our knowledge, no studies to date have evaluated the potential utility of WGS for B. cereus sensu lato surveillance and source tracking in foodstuffs. In this preliminary proof-of-concept study, we applied WGS to B. cereus sensu lato strains collected via South Africa’s national surveillance program of domestic and imported meat products, and we provide strong evidence that B. cereus sensu lato can be disseminated intra- and internationally via the agro-food supply chain. Our results showcase that WGS has the potential to be used for source tracking of B. cereus sensu lato in foods, although future WGS and metadata collection efforts are needed to ensure that B. cereus sensu lato surveillance initiatives are on par with those of other foodborne pathogens.

here (Table 1). One such sequence typing framework relies on the pantoate-b-alanine ligase gene (panC) to assign B. cereus sensu lato strains to one of seven or more major phylogenetic groups, which have been proposed to conceptually serve as "species" (31,32). Using the adjusted eight-group panC typing approach implemented in BTyper3 (33), the 25 B. cereus sensu lato strains sequenced here encompassed four panC phylogenetic groups (i.e., "species"; Fig. 2 and Table 1). The majority (n = 18 of 25, 72.0%) of the strains sequenced here were assigned to panC group IV ( Fig. 2 and Table 1). The remaining strains were assigned to panC groups III, II, and V (n = 5, 1, and 1 strains, representing 20.0%, 4.0%, and 4.0% of isolates sequenced here, respectively; Fig. 2 and Table 1).
Using the Genome Taxonomy Database (GTDB) taxonomy, the 25 B. cereus sensu lato strains sequenced here encompassed eight genomospecies ( Fig. 2 and Table 1). The 18 panC group IV strains sequenced here encompassed four GTDB genomospecies, while the five panC group III strains spanned two GTDB genomospecies ( Fig. 2 and Table 1). The panC group II and group V strains (n = 1 each) were each assigned to separate GTDB genomospecies ( Fig. 2 and Table 1).
Using a standardized genomospecies-subspecies-biovar (GSB) nomenclatural framework proposed for B. cereus sensu lato in 2020 (34) (referred to here as the "2020 GSB" framework), the 25 strains sequenced here encompassed three genomospecies ( Fig. 2 and Tables 1 and 2). All 18 panC group IV strains were assigned to the B. cereus sensu stricto genomospecies ( Fig. 2 and Tables 1 and 2); genes encoding insecticidal toxins (referred to here as "Bt toxin-encoding genes") were detected within all group IV B. cereus sensu stricto genomes (using BtToxin_scanner2's "old" gene detection approach), meaning that these 18 strains were predicted to belong to the Thuringiensis biovar (i.e., B. cereus sensu stricto bv. Thuringiensis; Table 2). All five panC group III and the single panC group II strain(s) were assigned to genomospecies B. mosaicus within the 2020 GSB framework ( Fig. 2 and Tables 1 and 2). One panC group III B. mosaicus strain (i.e., strain S66) was assigned to PubMLST sequence type 26 (ST26) and possessed cereulide (emetic toxin) synthetase-encoding cesABCD and was thus assigned to the cereus subspecies and biovar Emeticus (i.e., B. mosaicus subsp. cereus bv. Emeticus; Table 2). The lone panC group V strain sequenced here was assigned to the B. toyonensis genomospecies; Bt toxin-encoding genes were detected in this genome (via BtToxin_scanner2's "old" gene detection Two strains were isolated from meat products imported from the Netherlands (NL). approach), and thus this strain was predicted to belong to biovar Thuringiensis (i.e., B. toyonensis bv. Thuringiensis; Fig. 2 and Tables 1 and 2).
Using a rapid, average nucleotide identity (ANI)-based pseudo-gene flow unit (GFU) assignment scheme, which attempts to assign B. cereus sensu lato genomes to taxonomic units that mimic B. cereus sensu lato "species" previously delineated using recent gene flow (33), the 25 strains sequenced here were assigned to six pseudo-GFUs ( Fig. 2 and Table S1). The 18 panC group IV and five panC group III strains sequenced here each spanned two pseudo-GFUs ( Fig. 2 and Table S1). The panC group II and group V strains (n = 1 each) were each assigned to separate pseudo-GFUs, respectively ( Fig. 2 and Table S1).
As mentioned above, considerable phenotypic diversity was predicted among strains sequenced here; one strain harbored cereulide synthetase-encoding genes, and 19 strains possessed Bt toxin-encoding genes (detected using BtToxin_scanner2's "old" gene detection approach; Table 2), although the toxin production and insecticidal capabilities of the strains sequenced here were not evaluated in vitro or in vivo. No anthrax toxin-or capsule-encoding genes were identified within the genomes of the isolates sequenced here (Table 2).
Overall, regardless of whether the panC, GTDB, 2020 GSB, or pseudo-GFU assignment frameworks were used, B. cereus sensu lato strains isolated from meat and poultry products in South Africa were considerably diverse and represented multiple genomospecies ( Fig. 2; Tables 1 and 2; Table S1). Additionally, using PubMLST's seven-gene MLST scheme for B. cereus, 17 of 25 strains (68.0%) encompassed 11 STs, with eight strains (32.0%) assigned to FIG 2 Maximum likelihood (ML) phylogeny constructed using amino acid sequences derived from the 25 B. cereus sensu lato isolate genomes sequenced in this study (tip labels colored by geographic origin) and type strain/species representative genomes of 23 published and effective B. cereus sensu lato species (gray tip labels). The heat map to the right of the phylogeny showcases species assignments obtained within the following taxonomic frameworks (from left to right): (i) genome taxonomy database (GTDB) release 05-RS95 and GTDB-Tk (GTDB R95), (ii) pseudo-gene flow units (GFUs) assigned using BTyper3 (pseudo-GFU), (iii) genomospecies of the 2020 standardized B. cereus sensu lato genomospecies-subspecies-biovar (GSB) framework and BTyper3 (2020 GSB), and (iv) panC group (I to VIII) assigned using BTyper3 (panC group). The phylogeny was constructed using IQ-TREE using core orthologs identified among all genomes via OrthoFinder as input. Branch lengths are reported in substitutions per site. Branch labels correspond to branch support percentages obtained using 1,000 replicates of the ultrafast bootstrap approximation (selected for readability). The type strain genome of effective B. cereus sensu lato species "B. manliponensis" (the most distant recognized member of B. cereus sensu lato) was used to root the phylogeny. Heat map legends for all four taxonomies are colored by their order of appearance in the heat map, from top to bottom; white heatmap cells denote genomes that could not be assigned to a taxonomic unit within a given taxonomic framework.   Sequence type (ST) assigned using the PubMLST seven-gene multilocus sequence typing (MLST) scheme for B. cereus and BTyper3; NA, not available.
g Species, subspecies (where applicable), and biovar (where applicable) assigned using the 2020 genomospecies-subspecies-biovar (GSB) nomenclatural framework for B. cereus sensu lato and BTyper3; multiple taxonomic labels are listed for strains that can be referenced using shorted subspecies and/or biovar notation (separated by a semicolon).
h Biovar Thuringiensis was assigned to genomes in which Bt insecticidal toxin-encoding genes were detected using BtToxin_scanner2's "old" gene detection approach and default settings (which is less conservative than BTyper3's Bt toxin gene detection approach).
i Despite GTDB assigning a species label of "B. anthracis," these strains cannot cause anthrax illness nor do they belong to the classic "B. anthracis" lineage most commonly responsible for anthrax illness (34). j The gene was present when the minimum coverage threshold was lowered to 0%.
South African Bacillus cereus Genomic Sequencing Microbiology Spectrum unknown STs (Table 1 and Table S1). Due to the considerable genomic diversity observed among isolates sequenced here, major lineages represented by strains sequenced in this study are discussed individually in detail below, largely within the context of panC groups and/or MLST STs, as these frameworks are well established (31,32,35,36) and likely the most interpretable to readers. Several B. cereus sensu lato lineages within panC group IV are distributed across multiple South African provinces. Within panC group IV, the 18 strains sequenced here were partitioned into 10 lineages using MLST (referred to here as "MLST lineages"; Table 3). Based on (i) whole-genome phylogenies, (ii) pairwise core single nucleotide polymorphisms (SNPs) identified within MLST lineages, and (iii) ANI values calculated within and between MLST lineages, 4 of 10 panC group IV MLST lineages contained South African strains sequenced in this study, which were highly similar to at least one other strain at the whole-genome level ( Fig. 3 and 4 and Table 3). One of these lineages (denoted in Table 3 as lineage IVA) was composed of three South African strains sequenced in this study (S65, S85, and S87), which were assigned to GTDB's "B. bombysepticus" genomospecies and belonged to an unknown ST ( Fig. 3 and 4 and Table 3). Despite all three genomes being nearly identical (pairwise core SNP distance of 0, pairwise ANI of .99.99; Fig. 3 and Table 3), the three strains were isolated from (i) three different establishments and provinces (a processing plant in Limpopo, a retail outlet in Gauteng, and a processing plant in Free State) and (ii) two different types of meat products (two strains from beef wors and one from a poultry frankfurter; Fig. 4 and Table 1).
Similar results were observed for panC group IV ST2721 (lineage IVB in Table 3), which was also assigned to GTDB's "B. bombysepticus" genomospecies and contained three strains sequenced in this study (S56, S79, and New_S84; Fig. 3 Table 3). The three ST2721 genomes were nearly identical (pairwise core SNP distance of 0, pairwise ANI of .99.99; Table 3), despite the fact that the strains originated from three different meat products (one from each of beef wors, beef biltong, and processed beef patties) obtained from three different establishments (from a processing plant in North West province, a retail outlet in North West province, and a retail outlet in Free State, respectively; Fig. 4 and Table 1).

and 4 and
A third panC group IV lineage of unknown ST, which was assigned to GTDB's B. cereus species (lineage IVD), contained two isolates sequenced in this study (S81 and S88; Fig. 3 and 4 and Table 3). One strain (S81) had been isolated from beef wors in a processing plant in the North West province; the other strain (S88) was from RTE beef biltong in a retail outlet in Limpopo ( Fig. 4 and Table 1). Both strains were highly similar on a genomic scale (.99.99 ANI) and differed by three core SNPs ( Fig. 3 and Table 3). For reference, in a previous point source foodborne outbreak caused by B. cereus sensu lato (37), outbreak isolates could differ by up to seven core SNPs (using the same SNP-calling methodology used here) (38).
A fourth panC group IV lineage, ST2668, contained three strains sequenced in this study (S53, S63, and S70), which were assigned to GTDB's "B. thuringiensis_S" genomospecies (lineage IVH; Fig. 3 Table 3). The three strains sequenced here differed by, at most, a single core SNP (Table 3), even though all had been isolated from different meat products (processed beef patties, beef biltong, and processed beef mince) obtained from different establishments (i.e., from retail outlets in each of Gauteng, Free State, and Limpopo, respectively; Fig. 4 and Table 1). One publicly available genome associated with a B. cereus sensu lato strain isolated from grass in KarieDeshe, Israel (39), was additionally assigned to ST2668 (NCBI RefSeq accession number GCF_005217245.1); however, this strain was not closely related to the three nearly identical South African ST2668 strains sequenced in this study ( Table 3).

and 4 and
The remaining six MLST lineages (i.e., lineages IVC, IVE, IVF, IVG, IVI, and IVJ corresponding to an unknown ST, ST177, ST2289, ST24, ST1697, and ST1578, respectively; Table 3) contained isolates sequenced in this study, which were not closely related to any other strains at the whole-genome level ( Fig. 3 and Table 3). Lineages IVC, IVF, and IVI were singleton lineages, which each contained one genome sequenced in this   k Despite GTDB assigning a species label of "B. anthracis," these strains cannot cause anthrax illness nor do they belong to the classic "B. anthracis" lineage most commonly responsible for anthrax illness (34).
South African Bacillus cereus Genomic Sequencing Microbiology Spectrum study (S67, S86, and S78, respectively), and shared 98.90 to 99.35 ANI with their closest publicly available neighbors ( Fig. 3 and Table 3). Two lineages (IVG and IVJ) each contained multiple genomes, but only one genome sequenced in this study (i.e., S77 and S51, respectively; Fig. 3 and Table 3); for both lineages, the South African genome sequenced here was not identical to any publicly available genomes ($94 core SNP distance; Table 3). The remaining lineage, lineage IVE (ST177), contained multiple genomes as well as multiple genomes sequenced in this study ( Fig. 3 and Table 3). Notably, the two ST177 strains sequenced in this study (S55 and S80), which were assigned to this lineage, were not identical and were isolated from beef wors from processing plants in the Limpopo and North West provinces, respectively (  Tip label colors and clade labels denote species assigned using GTDB-Tk ("GTDB Species"). The heat map to the right of the phylogeny denotes (i) whether a strain was sequenced in this study (dark pink) or not (light pink; "Study") and (ii) multilocus sequence typing (MLST) sequence types (STs) associated with strains sequenced in this study, where applicable (colored), or not (gray; "ST"). MLST lineages discussed in Table 3 are annotated to the right of the heat map ("MLST Lineage"). MLST lineages with numerical superscripts contain two or more strains sequenced in this study, which were highly similar on a genomic scale; these lineages are depicted in Fig. 4. Branch lengths are reported in substitutions per site.
A panC group III B. cereus sensu lato lineage with a novel sequence type was detected in beef and poultry products from two South African provinces. Two panC group III B. cereus sensu lato strains sequenced here were assigned to an unknown ST (S58 and S64; Fig. 5 and Table 1). Despite having been isolated from different products (S58 from a raw chicken thigh and S64 from RTE sausage emulsion) from different establishments (retail outlets in Mpumalanga and Western Cape, respectively), the strains were nearly identical (pairwise core SNP distance of 0, pairwise ANI of .99.84; Fig. 4 and 5 and Table 3), indicating that these two strains may share a source.
A panC group III B. cereus sensu lato strain with cereulide synthetase-encoding genes belongs to the "high-risk" ST26 lineage. One panC group III B. cereus sensu lato strain (S66) was assigned to GTDB's B. paranthracis genomospecies and possessed cereulide synthetase-encoding genes ( Fig. 5 and Table 2). This strain, which had been isolated from processed beef mince from a processing plant in Mpumalanga, was a member of ST26 ( Fig. 5 and Table 2), the ST to which most cereulide-producing B. cereus sensu lato strains belong (although it should be noted that ST26 strains may be capable of producing enterotoxins and causing diarrheal illness regardless of whether they produce cereulide or not) (38,40,41). While members of ST26 are comparatively closely related (.99.52 pairwise ANI), WGS was able to distinguish the South African strain sequenced here from closely related ST26 strains (pairwise core SNP distance of $85 relative to publicly available genomes after removing recombination; Fig. 5 and Table 3).
A panC group III B. cereus sensu lato lineage assigned to ST2413 shows evidence of intercontinental dissemination. Two panC group III B. cereus sensu lato strains sequenced in this study (S59 and S62) were assigned to ST2413 within GTDB's B. paranthracis species (Fig. 5 and Table 3). Unlike the ST26 strain, which was also assigned to GTDB's B. paranthracis species, neither ST2413 strain possessed cereulide synthetase- FIG 4 B. cereus sensu lato multilocus sequence typing (MLST) lineages that contained two or more strains sequenced in this study, which were identical or nearly identical at the whole-genome scale (pairwise core single nucleotide polymorphism [SNP] differences of #3). Lineage names and sequence types (STs), where applicable, are shown in the top right corner. Geographic and source origins of each strain are displayed in the respective map. Strains affiliated with the Netherlands (n = 2) were isolated within South African borders; however, the poultry products from which they were isolated were confirmed to originate from the Netherlands.
South African Bacillus cereus Genomic Sequencing Microbiology Spectrum encoding genes ( Fig. 5 and Table 2). Both strains were isolated from raw chicken; however, S59 was isolated at port of entry from a chicken quarter leg that had been imported from the Netherlands, and S62 was isolated from a chicken thigh sold at a retail outlet in Mpumalanga ( Fig. 4 and Table 1). Notably, these strains were nearly identical on a genomic scale (pairwise core SNP distance of 1, pairwise ANI of 100.0; Fig. 5 and Table 3), despite originating from different continents (i.e., Europe and Africa; Fig. 4 and Table 1).
A panC group II B. cereus sensu lato strain assigned to ST794 is most closely related to a food-associated strain responsible for diarrheal illness in Norway. One panC group II B. cereus sensu lato strain was sequenced in this study (S57) and was assigned to ST794 within GTDB's B. wiedmannii species (Fig. 5 and Table 1). Strain S57 had been isolated from RTE beef biltong sold at a retail outlet in Free State (Table 1). FIG 5 Maximum likelihood phylogeny constructed using core SNPs identified among orthologous gene clusters of 597 B. mosaicus (as defined within the 2020 genomospecies-subspecies-biovar [GSB] taxonomy). The phylogeny was rooted using panC group IV B. cereus strain ATCC 14579 as an outgroup (NCBI RefSeq accession number GCF_006094295.1; omitted for readability). Tip label colors and clade labels denote species assigned using GTDB-Tk ("GTDB Species"). The heat map to the right of the phylogeny denotes (i) whether a strain possesses anthrax toxin-encoding genes cya, lef, and pagA (dark orange) or not (light orange; "Anthrax"); (ii) whether a strain possesses cereulide synthetase-encoding cesABCD (dark purple) or not (light purple; "Emetic"); (iii) whether a strain belongs to panC group II (blue) or III (yellow; "panC"); (iv) whether a strain was sequenced in this study (dark pink) or not (light pink; "Study"); and (v) multilocus sequence typing (MLST) sequence types (STs) associated with strains sequenced in this study, where applicable (colored), or not (gray; "ST"). MLST lineages discussed in Table 3 are annotated to the right of the heatmap ("MLST Lineage"). MLST lineages with numerical superscripts contain two or more strains sequenced in this study, which were highly similar on a genomic scale; these lineages are depicted in Fig. 4. Branch lengths are reported in substitutions per site.

South African Bacillus cereus Genomic Sequencing
Microbiology Spectrum Compared to the two publicly available ST794 genomes, S57 shared greater than 99.9 ANI with both publicly available genomes and differed by 31 and 100 core SNPs (relative to genomes with NCBI RefSeq assembly accession numbers GCF_900094845.1 and GCF_007671965.1, respectively; Fig. 5 and Table 3). Notably, beef biltong-associated strain S57 sequenced here was most closely related to strain NVH 0674-98, a psychrotolerant strain that had been isolated in Norway from mashed swedes, which were reportedly responsible for diarrheal foodborne illness (42). The other ST794 strain, DE0555, was an environmental isolate collected in 2018 from Durham, NC, in the United States (NCBI BioSample accession number SAMN11792715).
A panC group V B. cereus sensu lato strain from South African mixed-meat wors most closely resembles a plant-associated strain from the United States. One panC group V B. cereus sensu lato strain was sequenced in this study (S72) and was assigned to ST223 within GTDB's B. toyonensis species (Fig. 6 and Tables 1 to 3). S72 had been isolated in a butchery in Gauteng from a processed wors composed of a mix of beef, FIG 6 Maximum likelihood phylogeny constructed using core SNPs identified among orthologous gene clusters of 219 panC group V B. toyonensis genomes. The phylogeny was rooted using panC group IV B. cereus strain ATCC 14579 as an outgroup (NCBI RefSeq accession number GCF_006094295.1; omitted for readability). The heat map to the right of the phylogeny denotes (i) whether a strain was sequenced in this study (dark pink) or not (light pink; "Study") and (ii) multilocus sequence typing (MLST) sequence types (STs) associated with strains sequenced in this study, where applicable (colored), or not (gray; "ST"). MLST lineages discussed in Table 3 are annotated to the right of the heat map. Branch lengths are reported in substitutions per site. pork, and lamb (Table 1). Strain S72 was most closely related to a publicly available genome, strain AFS092321 (NCBI RefSeq accession number GCF_002552615.1), which had been isolated in 2014 from a tree leaf in North Carolina, United States (NCBI BioSample accession SAMN07598612) (43); the two strains shared greater than 99.9 ANI and differed by 23 core SNPs ( Fig. 6 and Table 3).

DISCUSSION
B. cereus sensu lato lineages can be disseminated inter-and intranationally via the food supply chain. The movement of commodities (e.g., foods, animals, animal products, agricultural products, and consumer products) through inter-and intranational trade can contribute to the global, regional, and local dissemination of microorganisms, including pathogens (44)(45)(46)(47)(48). The international agro-food trade specifically plays an increasingly pivotal role in providing food supplies to communities around the globe but can contribute to the dissemination of foodborne pathogens (47,48). Consequently, high-resolution technologies, such as WGS, are being used increasingly to monitor the spread of pathogens along the food supply chain (49,50).
Using WGS, we identified six South African B. cereus sensu lato lineages, which showcased evidence of interregional dissemination ( Fig. 4 and Table 3). Notably, one B. cereus sensu lato lineage showed evidence of intercontinental spread between Europe and Africa. A B. cereus sensu lato ST2413 strain isolated from raw chicken sold in retail outlets in Mpumalanga, South Africa, was identical to a ST2413 strain isolated from chicken imported from the Netherlands and tested for B. cereus sensu lato at port of entry. We may hypothesize that the raw chicken sold in Mpumalanga originated from the Netherlands, as the Netherlands was the second-largest exporter of chicken meat products to South Africa in 2014 to 2016 (i.e., the time frame in which the strains sequenced here were collected) (51); however, this is merely a hypothesis, as we were unable to confirm this with the retail outlet, and no publicly available B. cereus sensu lato genomes from the Netherlands or elsewhere were closely related to the two ST2413 genomes collected here. Regardless, all B. cereus sensu lato strains isolated from imported meat products for this study were collected at port of entry. Imported food products are routinely inspected for the presence of foodborne pathogens before their entry into the country via the South African government's national foodborne bacterial pathogen surveillance program. Thus, there is strong evidence that the B. cereus sensu lato strains collected from imported meat and poultry products and sequenced here originated from outside South Africa.
We additionally identified five B. cereus sensu lato lineages, which showed evidence of interprovincial spread within South Africa: four panC group IV lineages and one panC group III lineage were each composed of (nearly) identical strains, which were isolated from meat products in two or more South African provinces ( Fig. 4 and Table 3). Thus, it is likely that strains within each lineage shared a common source; however, a lack of additional metadata and genomes prevents confirmation of this. Overall, these results showcase the utility of WGS for B. cereus sensu lato source tracking and surveillance, although future studies relying on additional genomes with detailed metadata are needed.
Nomenclatural frameworks that incorporate both genomic and phenotypic data can improve strain-level B. cereus sensu lato risk assessment. While B. cereus sensu lato strain isolation and culturing protocols (e.g., choice of medium and growth temperature) may preferentially select for or against some B. cereus sensu lato lineages (37,(52)(53)(54), multiple B. cereus sensu lato species were identified among strains isolated from South African meat products, regardless of the taxonomic framework used. B. cereus sensu lato species delineation is notoriously challenging, and numerous B. cereus sensu lato taxonomic frameworks have been proposed (29). Phenotypic traits historically used for B. cereus sensu lato species assignment (e.g., motility, colony morphology, and ability to cause illness) have long been known to be inconsistent with genome evolution (29,31,32). Taxonomies that rely solely on genomic data, however, may lead to incorrect assumptions of a strain's pathogenic potential, particularly when taxonomic labels have deep roots in medicine or industry (29,34).
For example, within some ANI-based taxonomic frameworks (e.g., GTDB), the B. anthracis genomospecies includes B. cereus sensu lato strains that, historically, would be referred to as "B. cereus" or "group III B. cereus"; these strains possess phenotypic characteristics associated with "B. cereus" (e.g., as outlined in the United States Food and Drug Administration's Bacteriological Analytical Manual) (53,54), and, like "B. cereus," they are incapable of causing anthrax (34). These nonanthrax-causing group III B. cereus sensu lato strains have been isolated from diverse environments, including foods (e.g., milk, egg whites, and spices), consumer products (e.g., baby wipes), and soil, indicating that these organisms are not uncommon in environmental and industrial settings (34). Thus, as WGS becomes more popular in clinical and industrial settings, it is possible that professionals who rely solely on increasingly popular genomic methods for taxonomic delineation (e.g., GTDB, ANI-based comparisons to species type strains) may incorrectly assume that these strains can cause anthrax due to the "B. anthracis" species labels that some taxonomic classification programs produce (29,34). Here, during routine surveillance of meat products in South Africa, we identified two panC group III B. cereus sensu lato strains that did not possess anthrax toxin-or capsule-encoding genes and did not belong to the classic "clonal" B. anthracis lineage associated with anthrax disease (34,55). These strains would be classified as "B. cereus" or "group III B. cereus" using standard microbiological assays (53,54); however, these strains were assigned to the "B. anthracis" genomospecies using GTDB and similar ANI-based methods (Table 1). While it is possible for B. anthracis strains to lose virulence plasmids during storage (56), the isolates sequenced here were not members of any anthrax toxin gene-harboring lineages (Fig. 5). Thus, referring to these strains as "B. anthracis" would be misleading as they cannot cause anthrax, and this potential miscommunication could have disastrous public health and industrial consequences.
We additionally isolated three B. cereus sensu lato strains from beef and poultry products, which were assigned to the "B. paranthracis" genomospecies via GTDB and similar ANI-based methods ( Table 1). As noted previously, "B. paranthracis" was proposed as a "novel" species in 2017 (57) but was later found to encompass the wellknown foodborne pathogen "emetic B. cereus" within its genomospecies boundary (29,37,38). One of the "B. paranthracis" strains isolated here indeed possessed cereulide synthetase-encoding genes and belonged to ST26 (Table 2), the ST that encompasses most B. cereus sensu lato strains capable of producing emetic toxin (38). This strain thus likely poses a food safety threat and would most likely be referred to as "emetic B. cereus" in clinical or industrial settings. Referring to this strain as "B. paranthracis" could be misleading to researchers, clinicians, and other professionals who are not well versed and up to date in B. cereus sensu lato taxonomy (29,34,38). However, not all "B. paranthracis" strains are capable of producing emetic toxin. Here, two ST2413 strains isolated from poultry were assigned to the "B. paranthracis" genomospecies but did not possess cereulide synthetase-encoding genes ( Table 2), indicating that these strains cannot cause emetic intoxication. Thus, differentiating potentially emetic from nonemetic strains is critical for informing public health and food safety efforts.
Recently, we proposed a standardized nomenclatural framework for B. cereus sensu lato (i.e., the 2020 GSB framework), which can use genomic, genetic, and/or phenotypic information for taxonomic classification (33,34). Importantly, the 2020 GSB framework relies on a standardized collection of biovars (i.e., biovars Anthracis, Emeticus, and Thuringiensis), which can be applied to individual B. cereus sensu lato strains to convey phenotypes of clinical and/or industrial importance (i.e., ability to produce anthrax, emetic, and insecticidal toxins, respectively) (33,34). Within this framework, the absence of the Anthracis biovar term denotes that B. cereus sensu lato strains sequenced here cannot produce anthrax toxin, while the presence/absence of the Emeticus biovar term differentiates cereulide-producing "B. paranthracis" from noncereulide-producing strains ( Table 2). While the 2020 GSB framework provides a standardized set of B. cereus sensu lato genomospecies (Tables 1 and 2) (33, 34), researchers and other professionals may prefer to use more established names for lineages (e.g., obtained via MLST, panC group assignment); biovar terms can thus be appended to B. cereus sensu lato lineage names (e.g., the ST26 strain sequenced here, which possesses cereulide synthetase, can be referred to as "B. cereus sensu lato ST26 biovar Emeticus"). Overall, standardized taxonomic frameworks that can incorporate both genomic/genetic and phenotypic information may improve strain-level risk evaluation of B. cereus sensu lato.
WGS may improve B. cereus sensu lato surveillance, traceback investigations, and source tracking in the future. Here, we showed that WGS may conceptually be used for B. cereus sensu lato surveillance and source tracking; however, our study has numerous limitations. First and foremost, we are limited by the relatively small number of isolates that we were able to sequence in this study. While WGS is being increasingly used for foodborne pathogen surveillance in Africa (58), the vast majority of WGSbased foodborne pathogen surveillance efforts are concentrated in world regions with lower burdens of foodborne illness (e.g., the United States and Europe) (18,58). Thus, future WGS efforts are needed to gain further insight into the B. cereus sensu lato lineages circulating within the South African food supply chain.
Second, our study is limited by a lack of publicly available (i) genomic data and (ii) corresponding metadata associated with B. cereus sensu lato strains. For example, we identified two identical B. cereus sensu lato ST2413 strains, which were present in both Dutch and South African raw poultry. However, due to a lack of additional publicly available ST2413 B. cereus sensu lato genomes, we were unable to gain additional insights into exactly where this lineage originated. WGS has been shown to improve surveillance and source tracking efforts for numerous foodborne pathogens, including Escherichia coli, Salmonella enterica, and Listeria monocytogenes (50,59). While the amount of publicly available WGS data derived from members of B. cereus sensu lato is increasing (34), efforts to sequence the genomes of food-associated B. cereus sensu lato strains are lagging relative to other foodborne pathogens. Only 2,664 assembled genomes from all B. cereus sensu lato species that met the quality thresholds used in this study were available in NCBI's RefSeq Assembly database (60, 61) during the time this study was conducted (note that the NCBI GenBank Assembly database did not have any large multi-isolate B. cereus sensu lato projects at this time; accessed 20 March 2021). This can be contrasted with other foodborne pathogens for which source tracking and surveillance have proven to be successful (50,59), as the numbers of publicly available genomes from these organisms are orders of magnitude greater than the number of genomes from all B. cereus sensu lato species combined (e.g., Salmonella enterica and Listeria monocytogenes, each single species with more than 100 and 10 times as many publicly available genomes, respectively; NCBI GenBank Assembly database, accessed 25 February 2022). Thus, future B. cereus sensu lato surveillance and WGS initiatives in clinical, industrial, and environmental settings are needed to improve B. cereus sensu lato source tracking and traceback efforts. Furthermore, it is essential that data and metadata obtained in such future initiatives are made publicly available, as international sharing of WGS data can decrease both the amount of time required to solve foodborne outbreaks and the public health burden caused by foodborne pathogens (62).
Stemming from the lack of B. cereus sensu lato WGS data and metadata, a final limitation of our study is that there are very few existing studies that have used WGS to characterize B. cereus sensu lato isolates known to come from a single source (e.g., from point-source outbreaks and clusters). Consequently, metrics that are used to determine whether two B. cereus sensu lato genomes are "identical" or derived from a common source (e.g., pairwise SNP cutoffs, core genome MLST allelic differences, and whole-genome phylogenetic topology) (63,64) are sparse and only available for select lineages (e.g., emetic ST26, B. anthracis) (37,38,65). Therefore, future B. cereus sensu lato source tracking and surveillance efforts will benefit greatly not only from more extensive WGS efforts but also improved epidemiological surveillance of illness caused by members of B. cereus sensu lato (29), as more data linking strains to illness are needed.
Overall, the proof-of-concept study detailed here highlights the benefits of WGS for B. cereus sensu lato surveillance and source tracking, even among closely related lineages, and future studies will benefit from increasingly available publicly available WGS data and metadata.

MATERIALS AND METHODS
Isolate selection and whole-genome sequencing. A subset of 34 isolates were selected from a total of 79 B. cereus sensu lato isolates from our previous study (26) using simple random sampling without replacement (66) via random numbers generated in Microsoft Excel. Culturing and genomic DNA extraction was performed as described previously (26) using the High Pure PCR template preparation kit (Roche, Germany). WGS of selected isolates was performed at the Biotechnology Platform, Agricultural Research Council, Onderstepoort, South Africa. DNA libraries were prepared using TruSeq and Nextera DNA library preparation kits (Illumina, San Diego, CA, USA), followed by sequencing on HiSeq and MiSeq instruments (Illumina, San Diego, CA, USA).
Data preprocessing and quality control. Paired-end reads associated with each of the 34 isolates were supplied as input to Trimmomatic v0.38 (67), which was used to remove Illumina adapters and leading and trailing low-quality/ambiguous bases (LEADING:3 and TRAILING:3, respectively); reads with average per base quality scores of ,15 within a 4-bp sliding window (SLIDINGWINDOW:4:15) were additionally cut, and reads with lengths of ,36 bp were removed. FastQC v0.11.5 (https://www.bioinformatics .babraham.ac.uk/projects/fastqc/) was used to assess the quality of the resulting trimmed paired-end reads.
SKESA v2.4.0 (68) was used to assemble each genome using the trimmed paired-end reads as input and default settings. SPAdes v3.13.1 (69) was additionally used to assemble each genome in "careful" mode using trimmed paired-end reads as input. QUAST v4.5 (70) was used to assess the quality of each resulting assembly, and CheckM v1. 1.3 (71) was used to evaluate genome contamination/completeness. MultiQC v1.8 (72) was used to assess genome quality in aggregate. Assemblies produced using SKESA were used in all subsequent steps, as they were of higher quality based on metrics produced by QUAST (e.g., N 50 , number of contigs). Genomes with (i) ,95% completeness (via CheckM), (ii) .5% contamination (via CheckM), and/or (iii) an N 50 of ,20 kbp were considered to be of low quality and were excluded (n = 4), yielding a preliminary set of 30 genomes used in subsequent analyses.
Phylogenomic comparison of South African B. cereus sensu lato genomes to B. cereus sensu lato species type strains. Prokka v1.14.6 (78) was used to annotate each of the 25 B. cereus sensu lato genomes sequenced in this study. Protein-coding sequences derived from the type strain genomes of each of the 23 validly published and effective B. cereus sensu lato species (accessed 28 August 2021) were downloaded from NCBI's RefSeq Assembly database (see Table 1  The resulting amino acid sequence alignment was supplied as input to IQ-TREE v1.5.4 (85), which was used to construct a maximum likelihood (ML) phylogeny, using 1,000 replicates of the ultrafast bootstrap approximation (86) plus the optimal amino acid substitution model selected using ModelFinder (i.e., the general matrix model with empirical amino acid frequencies and the FreeRate model with six categories; JTT1F1R6) (87)(88)(89)(90). The resulting phylogeny was rooted using effective species "B. manliponensis" (i.e., the most distant recognized member of B. cereus sensu lato) (91) and annotated using the bactaxR package (34) in R v4.1.2 (92).
Acquisition of genomes from a study of B. thuringiensis outbreaks. All sequencing reads associated with isolates from a previous study of outbreaks caused by B. thuringiensis in France (108) were downloaded from NCBI's Sequence Read Archive (SRA) database using the SRA Toolkit v 2.8.2 (109,110). Genomic data for all 171 isolates were preprocessed, assembled, and taxonomically classified as described above, with genomes assembled using SKESA used in subsequent steps (see Data preprocessing and quality control and In silico typing and taxonomic characterization). Four genomes did not meet the quality thresholds used in this study (see Acquisition of publicly available B. cereus sensu lato genomes) and were thus excluded, yielding 167 genomes from the study that were used in subsequent analyses (Table S3).
Acquisition of genomes derived from strains isolated in conjunction with a previous outbreak caused by emetic B. cereus sensu lato. The genomes of 33 B. cereus sensu lato strains isolated in conjunction with a 2016 emetic outbreak in New York State (United States) were downloaded, preprocessed, and assembled as described previously (37). The quality of each of the 33 genomes was assessed as described above (see Data preprocessing and quality control), and all genomes underwent taxonomic classification and typing as described above (see In silico typing and taxonomic characterization). Two genomes did not meet the quality thresholds used in this study (see Acquisition of publicly available B. cereus sensu lato genomes) and were thus excluded, yielding 31 genomes from the study that were used in subsequent analyses (Table S4).
Within-group phylogeny construction. The 25 B. cereus sensu lato strains sequenced here spanned four major phylogenetic groups based on their panC sequence (i.e., panC groups II, III, IV, and V; Table 1). Thus, phylogenies were constructed using all genomes assigned to each of the following major lineages: (i) panC group IV (Fig. 3), (ii) panC groups II and III, excluding B. luti (Fig. 5), and (iii) panC group V (Fig. 6), which are equivalent to the (i) B. cereus sensu stricto, (ii) B. mosaicus, and (iii) B. toyonensis genomospecies within the 2020 GSB taxonomic framework (33), respectively (panC group II and III genomes were aggregated due to the fact that these lineages are closely related and polyphyletic; Fig. 5).
For each of the three major lineages, Prokka was used to annotate each genome; the resulting general feature format (GFF) files associated with each genome were supplied as input to Panaroo v1.2.8 (111), which was used to partition genes into core-and pan-genome orthologous gene clusters using the following parameters (all other parameters were set to their default values): (i) "strict" mode (-cleanmode strict), (ii) core genome alignment using MAFFT (-a core -aligner mafft), and (iii) a core genome sample threshold of 95% (-core_threshold 0.95). The resulting core genome (nucleotide) alignment was queried using snp-sites v2.5.1 (112), which was used to identify (i) core SNPs and (ii) constant sites among all genomes in the major lineage. The resulting core SNP alignment was supplied as input to IQ-TREE v1.5.4, which was used to construct an ML phylogeny using the general time-reversible (GTR) nucleotide substitution model (113), 1,000 replicates of the ultrafast bootstrap approximation (86), and an ascertainment bias correction obtained using constant sites output by snp-sites.
For each of the three major lineages, all aforementioned steps were repeated, with the addition of an outgroup genome. For the panC group IV phylogeny, panC group III B. anthracis strain Ames Ancestor was used as an outgroup (NCBI RefSeq accession number GCF_000008445.1). For the panC groups II/III and panC group V phylogenies, panC group IV B. cereus strain ATCC 14579 was used as an outgroup (NCBI RefSeq accession number GCF_006094295.1). Additionally, only genomes with detailed metadata (i.e., a reported year of isolation, isolation source, and geographic location) were included in this analysis (Tables S1 to S4). The resulting phylogenies were annotated using the bactaxR package in R.
Delineation of MLST lineages and identification of closely related and "identical" genomes. FastANI v1.31 was used to calculate ANI values between each of the 25 B. cereus sensu lato genomes sequenced in this study (i.e., as a query genome), and all genomes assigned to the panC group of the query genome (panC groups II and III were aggregated); genomes were then grouped into lineages based on STs assigned using seven-gene MLST (see In silico typing and taxonomic characterization). For each of the resulting MLST lineages, FastANI was used to calculate pairwise ANI values between all genomes within the MLST lineage (Table 3).
For each MLST lineage, Snippy v4.6.0 (https://github.com/tseemann/snippy) was used to identify core SNPs among all genomes assigned to the respective MLST lineage using (i) a genome sequenced in this study as a reference genome (Table 3), (ii) paired-end reads associated with each genome as input (for the genomes sequenced in this study as well as the genomes from the Bonis et al. study and the New York State outbreak study) (37,108) and/or assembled genomes as input (for NCBI genomes), and (iii) default settings. For MLST lineages with more than four genomes (i.e., ST24, ST26, ST177, ST223, and ST1578), Gubbins v3.1.3 (114) was used to remove recombination, and core SNPs were identified within the resulting filtered alignment using snp-sites. For all MLST lineages, pairwise core SNP distances were calculated within the MLST lineage (i) among all genomes, (ii) among genomes sequenced in this study, and (iii) between genomes sequenced in this study and publicly available genomes (Table 3) using the dist.gene function in the ape package (115,116) in R.
Data availability. Paired-end Illumina reads associated with the 25 B. cereus sensu lato isolates sequenced in this study have been deposited in NCBI's SRA database under BioProject accession number PRJNA798224. Metadata and quality information for all genomes queried in this study are available in Table S1 (the 25 isolates sequenced in this study) and Tables S2 to S4 (all publicly available  genomes).

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, XLSX file, 0.8 MB.

ACKNOWLEDGMENTS
We acknowledge Gauteng Department of Agriculture and Rural Development (GDRAD) for funding this project and for the use of data for this study. We also acknowledge our collaborators (including EMBL), and Agricultural Research Council-Onderstepoort Veterinary Research (ARC-OVR) for providing all research facilities.