Persistent cross-species transmission systems dominate Shiga toxin-producing Escherichia coli O157:H7 epidemiology in a high incidence region: a genomic epidemiology study

Background Several areas of the world suffer notably high incidence of Shiga toxin-producing Escherichia coli, among them Alberta, Canada. We assessed the role of persistent cross-species transmission systems in Alberta’s E. coli O157:H7 epidemiology. Methods We sequenced and assembled 229 E. coli O157:H7 isolates originating from collocated cattle (n=108) and human (n=121) populations from 2007-2015 in Alberta. We constructed a timed phylogeny using BEAST2 using a structured coalescent model. We then extended the tree with human isolates through 2019 (n=432) to assess the long-term disease impact of local persistent lineages. Shiga toxin gene (stx) profile was determined for all isolates. Results During 2007 to 2015, we estimated 107 (95% HPD 101, 111) human lineages arose from cattle lineages, and 31 (95% HPD 22, 43) from other human lineages; i.e., 77.5% of human lineages arose from cattle lineages. We identified 11 persistent lineages local to Alberta, which were associated with 36.4% (95% CI 27.8%, 45.6%) of human isolates. Of 115 isolates in local persistent lineages, 6.1% carried only stx2a and the rest stx1a/stx2a. During the later period, six local persistent lineages continued to be associated with human illness, including 74.7% (95% CI 68.3%, 80.3%) of reported cases in 2018 and 2019. The stx profile of isolates in local persistent lineages shifted from the earlier period, with 51.2% encoding only stx2a. Conclusions Our study identified multiple locally evolving lineages transmitted between cattle and humans persistently associated with E. coli O157:H7 illnesses for up to 13 years. Of concern, there was a dramatic shift in the local persistent lineages toward strains with the more virulent stx2a-only profile. We hypothesize that the large proportion of disease associated with local transmission systems is a principal cause of Alberta’s high E. coli O157:H7 incidence.


INTRODUCTION
Several areas around the globe experience exceptionally high incidence of Shiga toxin-producing Escherichia coli (STEC), including the virulent serotype E. coli O157:H7. These include Scotland, 1 Ireland, 2 Argentina, 3 and the Canadian province of Alberta. 4 All are home to large populations of agricultural ruminants, STEC's primary reservoir. However, there are many regions with similar ruminant populations where STEC incidence is unremarkable. What differentiates high risk regions is unclear. Moreover, with systematic STEC surveillance only conducted in limited parts of the world, 5 there may be unidentified regions with exceptionally high disease burden.
STEC infections can arise from local reservoirs, transmitted through food, water, direct animal contact, or contact with contaminated environmental matrices. The most common reservoirs include domesticated ruminants such as cattle, sheep, and goats. While STEC has been isolated from a variety of other animal species and outbreaks have been linked to species such as deer 6 and swine, 7 it is unclear what roles they play as maintenance or intermediate hosts. STEC infections can be imported through food items traded nationally and internationally, as has been seen with E. coli O157:H7 outbreaks in romaine lettuce from the United States. 8 Secondary transmission is believed to cause approximately 15% of cases, but the pathogen is not believed to be sustained long-term through person-to-person transmission alone. 9,10 The mix of STEC infection sources in a region directly influences the public health measures needed to control disease burden. 2][13][14][15] These studies suggest an important role for local reservoirs in STEC epidemiology. A comprehensive understanding of STEC's disease ecology would enable more effective investigations into potential local transmission systems and ultimately their control. Here, we take a phylodynamic, genomic epidemiology approach to more precisely discern the role of the cattle reservoir in the dynamics of E. coli O157:H7 human infections. We focus on the high incidence region of Alberta, Canada to provide insight into characteristics that make the pathogen particularly prominent in such regions.

STUDY DESIGN AND POPULATION
We conducted a multi-host genomic epidemiology study in Alberta, Canada. 7][18][19] To select both cattle and human isolates, we block randomized by year to ensure representation across the period. We define isolates as single bacterial species obtained from culture. We sampled 123 E. coli O157 cattle isolates from 4,660 available. 7][18][19] Samples were taken from fecal pats, rectal grabs, and hide swabs from cattle in feedlots and fecal samples from transport trucks. We sampled 123 of 1,148 E. coli O157 isolates collected from cases reported to the provincial health authority (Alberta Health) during the corresponding time period (Supplemental Information).
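In addition to the 246 isolates for the primary analysis, we contextualized our findings with three additional sets of E. coli O157:H7 isolates (Figure 1): 445 from Alberta Health from 2009 to 2019 and already sequenced as part of other public health activities; 152 from the U.S. from 1999 to 2016; and 54 from elsewhere around the world between 2007 and 2015. The additional Alberta Health isolates were sequenced by the National Microbiology Laboratory (NML)-Public Health Agency of Canada (Winnipeg, Manitoba, Canada) as part of PulseNet Canada activities. Isolates sequenced by the NML for 2018 and 2019 constituted the majority of reported E. coli O157:H7 cases for those years (217 of 247; 87.9%). U.S. isolates were considered separately from other global isolates, as the U.S. is Alberta's most frequent international trade partner, with both processed beef and live cattle crossing the border. U.S. isolates from 1999 to 2009 and global isolates were identified from previous literature, 20 and U.S. isolates from 2010 to 2016 were randomly selected from E. coli O157:H7 sequences available through the U.S. CDC's PulseNet BioProject PRJNA218110.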
This study was approved by the University of Calgary Conjoint Health Research Ethics Board, #REB19-0510. A waiver of consent was granted, and all case data were deidentified.

WHOLE GENOME SEQUENCING, ASSEMBLY, AND INITIAL PHYLOGENY
The 246 isolates for the primary analysis were sequenced using Illumina NovaSeq 6000 and assembled into contigs using the Unicycler v0.4.9 pipeline, as described previously (BioProject PRJNA870153). 21 Raw read FASTQ files were obtained from Alberta Health for the additional 445 isolates sequenced by the NML and from NCBI for the 152 U.S. and 54 global sequences. We used the SRA Toolkit v3.0.0 to download sequences for U.S. and global isolates using their BioSample (i.e., SAMN) numbers. The corresponding FASTQ files could not be obtained for 6 U.S. and 7 global isolates we had selected (Figure 1). PopPUNK v2.5.0 was used to cluster Alberta isolates and identify any outside the O157:H7 genomic cluster (Supplemental Figure S1). 22 For assembling and quality checking (QC) all sequences, we used the Bactopia v3.0.0 pipeline. 23 This pipeline performed an initial QC step on the reads using FastQC v0.12.1, which trimmed adapters and read ends with quality lower than 6 and discarded reads after trimming with overall quality scores lower than 10. None of the isolates were eliminated during this step for low read quality. We used the Shovill v1.1.0 assembler within the Bactopia pipeline to de novo assemble the Unicycler contigs for the primary analysis and raw reads from the supplementary datasets. Bactopia generated a quality report on the assemblies, which we assessed based on number of contigs (<500), genome size (≥5.1 Mb), N50 (>30,000), and L50 (≤50). Low-quality assemblies were removed. This included 1 U.S. sequence, for which 2 FASTQ files had been attached to a single BioSample identifier; the other sequence for the isolate passed all quality checks and remained in the analysis. Additionally, 16 sequences from the primary analysis dataset and 4 from the extended Alberta data had a total length <5.1 Mb. These sequences corresponded exactly to those identified by the PopPUNK analysis to be outside the primary E. coli O157:H7 genomic cluster. Finally, although all isolates were believed to be of cattle or clinical origin during initial selection, detailed metadata review identified 1 isolate of environmental origin in the primary analysis dataset and 8 that had been isolated from food items in the extended Alberta data. These were excluded. We used STECFinder v1.1.0 24 to determine Shiga toxin gene (stx) profile and confirm the E. coli O157:H7 serotype using the wzy or wzx O157 O-antigen genes and detection of the H7 H-antigen. After processing, we had 229 isolates (121 human, 108 cattle) in our primary sample, 432 additional Alberta Health isolates, 146 U.S. isolates, and 47 global isolates (Figure 1, Supplemental Data File).
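The assembly thresholds stated above (contigs <500, genome size ≥5.1 Mb, N50 >30,000, L50 ≤50) can be illustrated with a minimal Python sketch. This is not Bactopia's own report logic; the function names `n50_l50` and `passes_assembly_qc` are hypothetical, and the input is assumed to be a plain list of contig lengths in base pairs.

```python
from typing import List, Tuple


def n50_l50(contig_lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50): the contig length at which the running total of
    length-sorted contigs reaches half the assembly, and how many contigs
    were needed to get there."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


def passes_assembly_qc(contig_lengths: List[int]) -> bool:
    """Apply the four thresholds quoted in the text (hypothetical helper)."""
    n50, l50 = n50_l50(contig_lengths)
    return (
        len(contig_lengths) < 500               # number of contigs
        and sum(contig_lengths) >= 5_100_000    # genome size >= 5.1 Mb
        and n50 > 30_000                        # N50
        and l50 <= 50                           # L50
    )
```

Under this sketch, an assembly whose total length falls below 5.1 Mb fails regardless of its contiguity, which mirrors how the short assemblies described above were removed.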
Bactopia's Snippy workflow, which incorporates Snippy v4.6.0, Gubbins v3.3.0, and IQTree v2.2.2.7, followed by SNP-Sites v2.5.1, was used to generate a core genome SNP alignment with recombinant blocks removed. The maximum likelihood phylogeny of the core genome SNP alignment generated by IQTree was visualized in Microreact v251. The number of core SNPs between isolates was calculated using PairSNP v0.3.1. Clade was determined based on the presence of at least one defining SNP for the clade as published previously. 25 Isolates were identified to the subclade level [e.g., G(vi)] when both clade and subclade SNPs were present and to the clade level (e.g., G) when only clade SNPs were present.
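To make the pairwise core SNP count concrete, the following Python sketch counts differing positions between aligned sequences. It is an illustration only and not PairSNP's implementation; the function names are hypothetical, and the choice to skip positions where either sequence has a gap or ambiguous base is an assumption.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ.
    Assumption: positions with a gap or ambiguous base in either
    sequence are ignored."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_snps(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of isolates."""
    return {
        (i, j): snp_distance(alignment[i], alignment[j])
        for i, j in combinations(sorted(alignment), 2)
    }
```

A distance table of this kind is what thresholds such as "within 5 SNPs" or "<30 core SNPs" are applied to elsewhere in the analysis.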

PHYLODYNAMIC AND STATISTICAL ANALYSES
For our primary analysis, we created a timed phylogeny, a phylogenetic tree on the scale of time, in BEAST2 v2.6.7 using the structured coalescent model in the Mascot v3.0.0 package with demes for cattle and humans (Supplemental Table S1). The analysis was run using four different seeds to confirm that all converged to the same solution, and tree files were combined before generating a maximum clade credibility (MCC) tree. State transitions between cattle and human isolates over the entirety of the tree, with their 95% highest posterior density (HPD) intervals, were also calculated from the combined tree files. We determined the influence of the prior assumptions on the analysis (Supplemental Table S1) with a run that sampled from the prior distribution (Supplemental Figure S2, Supplemental Information).
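For readers unfamiliar with HPD intervals, a 95% HPD interval is the narrowest interval containing 95% of the posterior samples. The Python sketch below shows one common way to compute it from a list of samples; it is not the BEAST2 or Mascot code, and the function name `hpd_interval` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Return the shortest interval that contains at least `mass`
    of the posterior samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must cover; the small tolerance
    # guards against floating-point overshoot in mass * n.
    window = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of that size over the sorted samples and keep the narrowest.
    best_start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[best_start], ordered[best_start + window - 1]
```

Applied to the per-tree counts of cattle-to-human transitions from combined posterior tree files, a calculation of this kind would yield an interval of the form reported in this study.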
Local persistent lineages (LPLs) were identified based on the following criteria: 1) a single lineage of the MCC tree with a most recent common ancestor (MRCA) with ≥95% posterior probability; 2) all isolates <30 core SNPs from one another; 3) contained at least 1 cattle isolate; 4) contained ≥5 isolates; and 5) the isolates were collected at sampling events (for cattle) or reported (for humans) over a period of at least 1 year. From non-LPL isolates, we estimated the number of local transient isolates vs. imported isolates. For the 121 human E. coli O157:H7 isolates in the primary sample, we determined what portion belonged to LPLs and what portion were likely to be from local transient E. coli O157:H7 populations vs. imported.
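The five LPL criteria can be expressed as a single predicate, sketched below in Python. The `Isolate` class and the function name are hypothetical, the MRCA posterior and maximum pairwise SNP distance are assumed to have been computed beforehand, and "at least 1 year" is interpreted here as 365 days or more, which is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Isolate:
    host: str        # "cattle" or "human"
    collected: date  # sampling date (cattle) or report date (human)


def is_local_persistent_lineage(
    isolates: List[Isolate],
    mrca_posterior: float,
    max_pairwise_snps: int,
) -> bool:
    """Check the five criteria listed in the text for one candidate lineage."""
    if not isolates:
        return False
    dates = [iso.collected for iso in isolates]
    span_days = (max(dates) - min(dates)).days
    return (
        mrca_posterior >= 0.95                                # 1) MRCA support
        and max_pairwise_snps < 30                            # 2) all isolates <30 core SNPs apart
        and any(iso.host == "cattle" for iso in isolates)     # 3) at least 1 cattle isolate
        and len(isolates) >= 5                                # 4) at least 5 isolates
        and span_days >= 365                                  # 5) observed over at least 1 year (assumed 365 days)
    )
```

Requiring a cattle isolate and a minimum time span together is what distinguishes a persistent local lineage from a single outbreak or a one-off importation.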
Human isolates within the LPLs were enumerated (n=44). The 77 human isolates outside LPLs included 58 clade G(vi) isolates and 19 non-G(vi) isolates. Based on the MCC tree from the primary analysis, none of the non-G(vi) human isolates was likely to have been closely related to an isolate from the Alberta cattle population, suggesting that all 19 were imported. As a proportion of all non-LPL human isolates, these 19 constituted 24.7%. While it may be possible that all clade G(vi) isolates were part of a local evolving lineage, it is also possible that exchange of both cattle and food from other locations was causing the regular importation of clade G(vi) strains and infections. Thus, we used the proportion of non-LPL human isolates outside the G(vi) clade to estimate the proportion of non-LPL human isolates within the G(vi) clade that were imported; i.e., 58 × 24.7% ≈ 14. We then conducted a similar exercise for cattle isolates.
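The importation arithmetic for human isolates can be reproduced with the short Python sketch below. The function name and dictionary keys are hypothetical; the counts (19, 58, and 121) are taken from the text.

```python
def estimate_imported(non_lpl_outside_clade: int,
                      non_lpl_within_clade: int,
                      total_isolates: int) -> dict:
    """Apply the proportion of non-LPL isolates outside clade G(vi)
    (all assumed imported) to the non-LPL isolates within clade G(vi)."""
    total_non_lpl = non_lpl_outside_clade + non_lpl_within_clade
    proportion_outside = non_lpl_outside_clade / total_non_lpl
    imported_within = round(non_lpl_within_clade * proportion_outside)
    imported_total = non_lpl_outside_clade + imported_within
    return {
        "proportion_outside": proportion_outside,
        "imported_within_clade": imported_within,
        "imported_total": imported_total,
        "imported_share": imported_total / total_isolates,
    }


human = estimate_imported(non_lpl_outside_clade=19,
                          non_lpl_within_clade=58,
                          total_isolates=121)
```

With these inputs, 19 of 77 non-LPL isolates (24.7%) fall outside G(vi), about 14 of the 58 G(vi) isolates are estimated as imported, and the combined 33 of 121 corresponds to roughly 27% of human isolates.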
To contextualize our results in terms of ongoing human disease burden, we created a timed phylogeny using a constant, unstructured coalescent model of the 229 Alberta isolates from the primary analysis and the additional Alberta Health isolates. Outbreaks were down-sampled to avoid biasing the tree by randomly selecting 1 to 2 isolates per outbreak; as such, only 230 of the 432 additional isolates were included in the analysis (Supplemental Table S1). We identified LPLs as above, and leveraged the near-complete sequencing of isolates from 2018 and 2019 to calculate the proportion of reported human cases associated with LPLs.
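Outbreak down-sampling of the kind described above could be sketched as follows in Python. This is an illustration under stated assumptions rather than the procedure actually used: the function name, the `(isolate_id, outbreak_id)` input format, and the rule for choosing between 1 and 2 isolates (a random draw per outbreak) are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def downsample_outbreaks(
    isolates: List[Tuple[str, Optional[str]]],
    max_per_outbreak: int = 2,
    seed: int = 0,
) -> List[str]:
    """Keep all sporadic isolates (outbreak_id is None) and a random
    1..max_per_outbreak isolates from each outbreak."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_outbreak: Dict[str, List[str]] = defaultdict(list)
    kept: List[str] = []
    for isolate_id, outbreak_id in isolates:
        if outbreak_id is None:
            kept.append(isolate_id)
        else:
            by_outbreak[outbreak_id].append(isolate_id)
    for outbreak_id in sorted(by_outbreak):
        members = by_outbreak[outbreak_id]
        # Assumption: the number retained (1 or 2) is itself drawn at random.
        k = min(len(members), rng.randint(1, max_per_outbreak))
        kept.extend(rng.sample(members, k))
    return kept
```

Capping each outbreak at a small number of isolates keeps near-identical genomes from dominating the coalescent analysis.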
We created a timed phylogeny of Alberta isolates and U.S. isolates from 1996 to 2016 to test whether the LPLs or Alberta's dominant E. coli O157:H7 clade (G) were linked to U.S. ancestors (Supplemental Table S1). We also created a timed phylogeny of temporally overlapping Alberta, U.S., and global isolates from 2007 to 2015, excluding clades A and B, which were too limited to make meaningful comparisons. All BEAST2 analyses were run for 100,000,000 Markov chain Monte Carlo iterations or until all parameters converged with effective sample sizes >200, whichever was longer. Exact binomial 95% confidence intervals (CIs) were computed for proportions.
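An exact binomial (Clopper-Pearson) interval can be computed without external libraries, as in the Python sketch below. The statistical software used for the study is not stated here, so this is only an illustration; the function names are hypothetical, and the bounds are found by bisection on the binomial tail probabilities.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(func: Callable[[float], float], target: float) -> float:
    """Bisection on [0, 1] for a function that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for x successes in n trials."""
    # Lower bound: P(X >= x | p) = alpha / 2
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: P(X <= x | p) = alpha / 2, i.e. P(X > x | p) = 1 - alpha / 2
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


lpl_low, lpl_high = clopper_pearson(162, 217)  # LPL-associated cases, 2018-2019
```

For 162 of 217 cases, this calculation should give bounds close to the 68.3% and 80.3% reported in the Results.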

THE MAJORITY OF CLINICAL CASES EVOLVED FROM LOCAL CATTLE LINEAGES
In our primary sample of 121 human and 108 cattle isolates from Alberta from 2007 to 2015, SNP distances were comparable between species (Figure 3a). Among sampled human cases, 19 (15.7%; 95% CI 9.7%, 23.4%) were within 5 SNPs of a sampled cattle strain.
The phylogeny generated by our primary structured coalescent analysis indicated cattle were the primary reservoir, with a high probability that the hosts at nodes along the tree's backbone were cattle (Figure 3b). The root was estimated at 1812 (95% HPD 1748, 1870). The most recent common ancestor (MRCA) of clade G(vi) strains in Alberta was inferred to be a cattle strain, dated to 1971 (95% HPD 1961, 1980). With our assumption of a relaxed molecular clock, the mean clock rate for the core genome was estimated at 1.00 × 10⁻⁴ (95% HPD 8.45 × 10⁻⁵, 1.18 × 10⁻⁴) substitutions/site/year. The effective population size, Ne, of the human E. coli O157:H7 population was estimated as 913 (95% HPD 620, 1232), and for cattle as 49 (95% HPD 32, 67).
To understand long-term persistence, we expanded the phylogeny with additional Alberta Health isolates from 2009 to 2019 (Supplemental Table S1). Six of the 11 LPLs identified in our primary analysis continued to cause disease during the 2016 to 2019 period (Figure 4a). With most of the cases reported during 2018 and 2019 sequenced, we were able to estimate the proportion of reported E. coli O157:H7 associated with LPLs. Of 217 sequenced cases reported during these two years, 162 (74.7%; 95% CI 68.3%, 80.3%) arose from Alberta's LPLs. The stx profile of LPL isolates shifted as compared to the primary analysis, with 83 (51.2%) of the LPL isolates encoding only stx2a and the rest stx1a/stx2a (Figure 4b). Among the 55 non-LPL isolates during 2018-2019, the stx2c-only profile emerged with 16 (29.1%) isolates, and stx2a-only was found in only 6 (10.9%) cases.
All 5 large (≥10 cases) sequenced outbreaks in Alberta during the study period were within clade G(vi). LPLs gave rise to 3 large outbreaks, accounting for 117 cases, including 83 from an extended outbreak by a single strain, defined as isolates within 5 SNPs of one another, during 2018 and 2019 (Figure 4a). The two large outbreaks that did not arise from LPLs both occurred in 2014 and were responsible for 164 cases.

INTERNATIONAL IMPORTATION DOES NOT EXPLAIN ALBERTA'S CURRENT DISEASE BURDEN
Only 2 U.S. isolates coincided with Alberta LPLs, specifically G(vi)-AB LPL 9 in 2014 and G(vi)-AB LPL 11 in 2015 (Supplemental Figure S3). Isolates in these two LPLs from Alberta dated to 2007 and 2009, respectively, and were identified multiple times up to and including during the 2018-2019 period (Figure 4a). There was no evidence of early U.S. ancestors of LPLs.
No LPL contained a global isolate. Based on migration events calculated from the tree, we estimated that 15.4% of combined human and cattle Alberta lineages were imported (Supplemental Table S2). Sequences from outside North America were separated from Alberta sequences by a median of 325 (IQR 288-349) SNPs. Including U.S. and global isolates in the phylogeny did not alter the LPLs identified, though some minor rearrangement of the tree was observed (Supplemental Figure S3).

DISCUSSION
Focusing on a region that experiences an especially high incidence of STEC, we conducted a deep genomic epidemiologic analysis of E. coli O157:H7's multi-host disease dynamics. Our study identified multiple locally evolving lineages transmitted between cattle and humans. These were persistently associated with E. coli O157:H7 illnesses over periods of up to 13 years. Of clinical importance, there was a dramatic shift in the stx profile of the strains arising from local persistent lineages toward strains carrying only stx2a, which has been associated with increased progression to hemolytic uremic syndrome (HUS). 26 We hypothesize that the large proportion of cases associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence.
Our study has provided quantitative estimates of cattle-to-human migration in a high incidence region, the first such estimates of which we are aware. 2][13][14][15] We showed that 77% of strains infecting humans arose from cattle lineages. These transitions can be seen as a combination of the historic evolution of E. coli O157:H7 from cattle in the rare clades and the infection of humans from local cattle or cattle-related reservoirs in clade G(vi). While our findings indicate the majority of human cases arose from cattle lineages, transmission may have involved intermediate hosts or environmental reservoirs several steps removed from the cattle reservoir. However, our analysis demonstrates that local cattle remain an integral part of the transmission system for the vast majority of cases, even when they may not be the immediate source of infection.
The cattle-human transitions we estimated were based on structured coalescent theory, 27 which we used throughout our analyses. This approach is similar to phylogeographic methods that have previously been applied to E. coli O157:H7. 20 We inferred the full backbone of the Alberta E. coli O157:H7 phylogeny as arising from cattle, consistent with the postulated global spread of the pathogen via ruminants. 20 Our estimate of the origin of the serotype, at 1812 (95% HPD 1748, 1870), was somewhat earlier than previous estimates, but consistent with global (1890; 95% HPD 1845, 1925) 20 and United Kingdom (1840; 95% HPD 1817, 1855) 28 studies that used comparable methods. Our dating of Alberta's G(vi) clade to 1971 (95% HPD 1961, 1980) also corresponds to proposed migrations of clade G into Canada from the U.S. in 1965-1977. 20 Our study thus adds to the growing body of work on the larger history of E. coli O157:H7, providing an in-depth examination of the G(vi) clade.
Our identification of the 11 local persistent lineages (LPLs) is significant in demonstrating that the majority of Alberta's reported E. coli O157:H7 illnesses are of local origin. Our definition ensured that every LPL had an Alberta cattle strain and at least 5 isolates separated by >1 year, making the importation of the isolates in a lineage highly unlikely. Further supporting the evolution of the LPLs within Alberta, all 11 LPLs were in clade G(vi), several were phylogenetically related with MRCAs dating to the late 1990s, and few non-Alberta isolates fell within LPLs. The two U.S. isolates associated with Alberta LPLs may reflect Alberta cattle that were slaughtered in the U.S. Thus, we are confident that the identified LPLs represent locally evolving lineages and potential persistent sources of disease.
Based on our LPL analysis, we estimated only 27% of human and 10% of cattle E. coli O157:H7 isolates were imported. This was consistent with the overall importation estimate of 15% for all Alberta lineages from our global structured coalescent analysis. While these estimates may appear low given the recent focus on row crops and other produce as potential vehicles of disease, 8 26% of sporadic STEC infections have been attributed to animal contact and the farm environment, with a further 19% to pink or raw meat. 10 Similarly, 24% of E. coli O157 outbreaks in the U.S. were attributed to beef, animal contact, water, or other environmental sources. 9 In Alberta, these are all inherently local exposures, given that 90% of beef consumed in Alberta is produced and/or processed there. Even person-to-person transmission, responsible for 15% of sporadic cases and 16% of outbreaks, 9,10 includes secondary transmission from cases infected from local sources, which may explain our estimate of 23% for person-to-person transmission.
To our knowledge, our study provides the first comprehensive determination of local vs. imported status for E. coli O157:H7 cases. Similar studies in regions of both high and moderate incidence would provide further insight into the role of localization on E. coli O157:H7 incidence.
Of the 11 lineages we identified as LPLs during the 2007-2015 period, 6 were also identified in the 2016-2019 period. During the initial period, 36% of human cases were linked to an LPL, and 6.1% carried only stx2a. The risk of HUS increases in strains of STEC carrying only stx2a, relative to stx1a/stx2a, 26 meaning the earlier LPL population had few of the high-virulence strains. In 2018 and 2019, the 6 long-term LPLs were associated with both greater incidence and greater virulence, encompassing 75% of human cases with more than half of LPL isolates carrying only stx2a. The cause of this shift remains unclear, though shifts toward greater virulence in E. coli O157:H7 populations have been seen elsewhere. 29 The growth and diversity of G(vi)-AB LPL 1 and G(vi)-AB LPL 6 in the later period suggest these lineages were in stable reservoirs or adapted easily to new niches. Identifying these reservoirs could yield substantial insights into disease prevention, given the significant portion of illnesses caused by persistent strains.
We developed a novel measure of persistence for use in this study, specifically for the purposes of identifying lineages that pose an ongoing threat to public health in a specific region.
Persistence has been defined variably in the literature, for example as shedding of the same strain for at least 4 months. 30 Most recently, the U.S. CDC has identified the first Recurring, Emergent, and Persistent (REP) STEC strain, REPEXH01, which has been detected since 2017 in over 600 cases. REPEXH01 strains are within 21 allele differences of one another (https://www.cdc.gov/ncezid/dfwed/outbreak-response/rep-strains/repexh01.html). Given that we used high resolution SNP analysis rather than MLST, we used a difference of <30 SNPs to define persistent lineages. Supporting the persistence we have observed, the REPEXH01 strain is also an E. coli O157:H7 strain; however, O157:H7 was defined as sporadic in a German study using the 4-month shedding definition, which may be due to ecological differences. 30 Understanding microbial drivers of persistence is an active field of research, with early findings suggesting a correlation of STEC persistence to the accessory genome and traits such as biofilm formation and nutrient metabolism. 30,31 Our approach to studying persistence was specifically designed for longitudinal sampling in high-incidence regions and may be useful for others attempting to identify sources that disproportionately contribute to disease burden.
Our analysis was limited to only cattle and humans. However, small ruminants (e.g., sheep, goats) have also been identified as important STEC reservoirs, 12,15,25 and Alberta has experienced outbreaks linked to swine. 7 Had isolates from a wider range of potential reservoirs been available, we would have been able to elucidate more clearly the roles that various hosts and common sources of infection play in local transmission. This may help explain the 3 human-to-cattle predicted transmissions, which could be erroneous. We also limited our analysis only to E. coli O157:H7 despite the growing importance of non-O157 STEC, as historical multi-species collections of non-O157 isolates are lacking. As serogroups differ meaningfully in exposures, 32 our results may not be generalizable beyond the O157 serogroup. Finally, we were not able to estimate the impact of strain migration between Alberta and the rest of Canada, because metadata for publicly-available E. coli O157:H7 sequences from Canada was limited, such that we could not be sure they were from outside Alberta.
E. coli O157:H7 infections are a pressing public health problem in many high incidence regions around the world including Alberta, where a recent childcare outbreak caused >300 illnesses. In the majority of sporadic cases, and even many outbreaks, 9 the source of infection is unknown, making it critical to understand the disease ecology of E. coli O157:H7 at a system level. Here we have identified a high proportion of human cases arising from cattle lineages and a low proportion of imported cases. Local transmission systems, including intermediate hosts and environmental reservoirs, need to be elucidated to develop management strategies that reduce the risk of STEC infection. In Alberta, local transmission is dominated by a single clade, and over the extended study period, persistent lineages caused an increasing proportion of disease. The local lineages with long-term persistence are of particular concern because of their increasing virulence, yet they also present opportunity as larger, more stable targets for reservoir identification and control.
(blue) and humans (orange), with rings labeled with the SNP distance between isolates. Cattle isolates were highly related with 53% of cattle isolates within 5 SNPs of another cattle isolate and 83% within 15 SNPs (A, top). Human isolates showed a bimodal distribution in their relationship to cattle isolates, with 87% within 52 SNPs of a cattle isolate and the remainder 185-396 SNPs apart (A, bottom).
The maximum clade credibility tree for the structured coalescent analysis of cattle and human isolates (B) was colored by inferred host, cattle (blue) or human (orange). The majority of ancestral nodes inferred as cattle suggests cattle as the primary reservoir. The root was estimated at 1812 (95% HPD 1748, 1870). Eleven local persistent lineages (LPLs) were identified, all in the G(vi) clade and labeled G(vi)-AB LPL 1 through 11 (yellow and gray coloration highlights LPLs but has no other meaning). These accounted for 44 human (36.4%) and 71 cattle (65.7%) isolates. The structured coalescent model estimated 107 cattle-to-human state transitions between branches, compared to only 31 human-to-human transitions, inferring cattle as the origin of 77.5% of human lineages (C).

STEC CASE DEFINITION
Alberta Health defines a confirmed case of Shiga toxin-producing E. coli (STEC), including E. coli O157:H7, as STEC isolation or Shiga toxin antigen or nucleic acid detection. Clinical illness, which may include diarrhea, bloody diarrhea, abdominal cramps, hemolytic uremic syndrome, thrombocytopenia purpura, or pulmonary edema, may or may not be present. 33

SAMPLING FROM THE PRIOR DISTRIBUTION
Results in our Bayesian phylodynamic analyses are drawn from posterior distributions, which are influenced by both the data and the prior information we have about the system (Supplemental Table S1). In order to confirm that our primary results were not overly influenced by our prior assumptions, we conducted an analysis in which the sampling draws were made from the prior distribution, as opposed to the posterior distribution. We graphed these results against the sampling draws made from the posterior distributions from the four runs conducted for our primary analysis (each performed with a different random seed). The comparison shows that the draws from the prior distribution differ markedly from the draws from the posterior distributions for the model's key parameters (Supplemental Figure S2). From this, we concluded that our prior assumptions were not overly influencing the results of the primary analysis.
identified through metadata review as an environmental (non-human, non-cattle) isolate. Cluster 1 included the Sakai and EDL933 reference strains. Clusters 826 and 827 were novel clusters. Isolates outside of Cluster 1 were excluded from all subsequent analyses.
): 445 from Alberta Health from 2009 to 2019 and already sequenced as part of other public health activities; 152 from the U.S. from 1999 to 2016; and 54 from elsewhere around the world between 2007 and 2015.The additional Alberta Health isolates were sequenced by the National Microbiology Laboratory (NML)-Public Health Agency of Canada (Winnipeg, Manitoba, Canada) as part of PulseNet Canada activities.

Figure 2. Maximum likelihood core SNP tree of the 854 E. coli O157:H7 isolates referenced in the study. This includes 229

Figure 3. Relationship of randomly selected E. coli O157:H7 strains isolated from 121 reported human cases and 108 beef

Table 1. Distribution of study isolates by geographic source, clade, and Shiga toxin gene (stx) profile