Monitoring the Antimicrobial Resistance Dynamics of Salmonella enterica in Healthy Dairy Cattle Populations at the Individual Farm Level Using Whole-Genome Sequencing

Livestock represent a possible reservoir for facilitating the transmission of the zoonotic foodborne pathogen Salmonella enterica to humans; there is also concern that strains can acquire resistance to antimicrobials in the farm environment. Here, we use whole-genome sequencing (WGS) to characterize Salmonella strains (n = 128) isolated from healthy dairy cattle and their associated environments on 13 New York State farms to assess the diversity and microevolution of this important pathogen at the level of the individual herd. Additionally, the accuracy and concordance of multiple in silico tools are assessed, including: (i) two in silico serotyping tools, (ii) combinations of five antimicrobial resistance (AMR) determinant detection tools and one to five AMR determinant databases, and (iii) one antimicrobial minimum inhibitory concentration (MIC) prediction tool. For the isolates sequenced here, in silico serotyping methods outperformed traditional serotyping and resolved all un-typable and/or ambiguous serotype assignments. Serotypes assigned in silico showed greater congruency with the Salmonella whole-genome phylogeny than traditional serotype assignments, and in silico methods showed high concordance (99% agreement). In silico AMR determinant detection methods additionally showed a high degree of concordance, regardless of the pipeline or database used (≥98% agreement between susceptible/resistant assignments for all pipeline/database combinations). For AMR detection methods that relied exclusively on nucleotide BLAST, accuracy could be maximized by using a range of minimum nucleotide identity and coverage thresholds, with thresholds of 75% nucleotide identity and 50-60% coverage adequate for most pipeline/database combinations. In silico characterization of the microevolution and AMR dynamics of each of six serotype groups (S. Anatum, Cerro, Kentucky, Meleagridis, Newport, Typhimurium/Typhimurium variant Copenhagen) revealed that some lineages were strongly associated with individual farms, while others were distributed across multiple farms. Numerous AMR determinant acquisition and loss events were identified, including the recent acquisition of cephalosporin resistance-conferring blaCMY- and blaCTX-M-type beta-lactamases. The results presented here provide high-resolution insight into the temporal dynamics of AMR Salmonella at the scale of the individual farm and highlight both the strengths and limitations of WGS in tracking zoonotic pathogens and their associated AMR determinants at the livestock-human interface.

Introduction
The foodborne pathogen Salmonella enterica is estimated to be responsible for 1.2 million illnesses and 450 deaths each year in the U.S. alone (Scallan et al., 2011). Despite the fact that over 2,600 Salmonella serotypes have been described (Issenhuth-Jeanjean et al., 2014), fewer than 100 of these serotypes are responsible for the majority of human infections (Centers for Disease Control and Prevention, 2020). In line with this, some Salmonella serotypes may share strong associations with a specific host, an extreme example of which can be seen in the human-restricted nature of Salmonella Typhi (Uzzau et al., 2000; Boore et al., 2015). Other serotypes, while not confined exclusively to infection of a single host, may be adapted to a given reservoir; for example, Salmonella Choleraesuis, while largely adapted to swine, occasionally infects humans (Uzzau et al., 2000; Chiu et al., 2004).
Cattle are a potential reservoir from which humans can acquire salmonellosis, and infected animals can shed Salmonella at irregular intervals for varying periods of time, regardless of whether they express clinical signs of bovine salmonellosis or not (Cummings et al., 2010b; Davidson et al., 2018; Holschbach and Peek, 2018). The bovine reservoir boasts its own repertoire of serotypes that can infect humans, with bovine-associated Salmonella serotype Dublin, known for its rare but frequently invasive infections in humans, being arguably the most noteworthy (Taylor et al., 1982; Uzzau and/or epidemic lineages (e.g., S. Typhimurium DT104).
In this study, 128 non-typhoidal Salmonella enterica strains isolated from repeated sampling on 13 New York State dairy cattle farms between 2007 and 2009 were characterized using WGS. All strains were isolated from apparently healthy, subclinically infected bovine hosts, as well as the associated farm environment (Rodriguez-Rivera et al., 2014). Using WGS, we characterized the microevolution of these persistent lineages within each herd, as well as the temporal acquisition and loss of AMR determinants among them. In addition to offering insight into the genomics of Salmonella isolated from healthy bovine populations at the individual herd/farm level, we evaluate the accuracy and concordance of multiple in silico serotyping and AMR prediction tools. Finally, we provide an in-depth, critical analysis of the strengths and limitations of the methods used here, and we offer guidance to researchers who wish to employ WGS for herd-level pathogen monitoring.

Materials and Methods

Isolate Selection
Salmonella enterica isolates (n = 128) obtained from one of 13 dairy farms in New York State were selected to undergo WGS for this study (Supplementary Table S1). All strains were isolated from farms that had undergone surveillance for Salmonella for a period of at least 12 months as described previously (Cummings et samples from healthy, subclinically infected dairy cows (referred to hereafter as "bovine" isolates), or (ii) farm environmental swabs (referred to hereafter as "farm environmental" isolates) (Cummings et al., 2010a). All isolates underwent serotyping, phenotypic antimicrobial susceptibility testing, and pulsed-field gel electrophoresis (PFGE) as described previously (Rodriguez-Rivera et al., 2014).

Whole-Genome Sequencing and Data Pre-Processing
Genomic DNA extraction and sequencing library preparation were performed as described previously (Carroll et al., 2017b), and the genomes of all 128 Salmonella isolates were sequenced using an Illumina HiSeq platform and 2 x 250 bp paired-end reads. Illumina sequencing adapters and low-quality bases were trimmed using Trimmomatic version 0.33 (using default parameters for Nextera paired-end reads) (Bolger et al., 2014), and FastQC version 0.11.9 (Andrews, 2019) was used to confirm adapter removal and assess read quality. SPAdes version 3.8.0 (Bankevich et al., 2012) was used to assemble genomes de novo (using the "careful" option and k-mer sizes of 21, 33, 55, 77, 99, and 127), and QUAST version 4.5 (Gurevich et al., 2013) Table S1). In cases where a discrepancy existed between the traditional serotype designation and one or more of the in silico methods, the serotype assigned using two out of the three methods was selected as the final serotype to be reported (e.g., when assigning strain names to isolates in the manuscript, for phylogeny annotation). To confirm that all serotype assignments were reasonable, a phylogeny was constructed using core single nucleotide polymorphisms (SNPs) detected in all Salmonella genomes in this study (see section "Reference-Free SNP Identification and Phylogeny Construction" below).
This is a provisional file, not the final typeset article
were unavailable for two isolates (BOV_KENT_16_04-03-08_R8-0967 and ENV_MELA_01_01-10-08_R8-0165; Supplementary Table S1), both isolates had previously been categorized as pan-susceptible to all 15 antimicrobials (a classification that was maintained here, as all in silico methods correctly classified these isolates as pan-susceptible). Supplemental as azithromycin was not among the 15 antimicrobials used here for phenotypic testing.
The ability of PATRIC3 to predict amikacin resistance was also not evaluated, as amikacin is not among the antimicrobials queried by PATRIC3. A confusion matrix was constructed as described above, using predicted SIR classifications derived from predicted MIC values produced by PATRIC3 and NARMS breakpoints. Additionally, the deviation of raw MIC predictions produced by PATRIC3 ($\mathrm{MIC}_{\mathrm{PATRIC3}}$) from "true" raw MIC values obtained using phenotypic testing ($\mathrm{MIC}_{\mathrm{Phenotypic}}$), in number of dilution factors ($\Delta_{\text{dilution factors}}$), was assessed using the following equation:
$$\Delta_{\text{dilution factors}} = \frac{\ln\left(\mathrm{MIC}_{\mathrm{PATRIC3}} / \mathrm{MIC}_{\mathrm{Phenotypic}}\right)}{\ln(2)}$$
where ln corresponds to the natural logarithm. For example, if PATRIC3 predicted an MIC value of 8 and the "true" MIC value obtained with phenotypic testing was 2, then ln(8/2)/ln(2) = 2; this means that the PATRIC3 prediction of 8 is +2 dilution factors away from the "true" MIC of 2 (as the dilutions used for MIC testing are two-fold serial dilutions, e.g., 2 µg/mL, 4 µg/mL, 8 µg/mL).

Re-testing of Isolates with Highly Incongruent AMR Phenotypes
Several (n = 21) isolates possessed a phenotypic AMR SIR profile that was deemed to be highly incongruent with its predicted in silico AMR profile, regardless of the in silico pipeline/database used (Supplementary Table S4). For example, S. Cerro isolate BOV_CERO_35_10-02-08_R8-2685 was resistant to nine antimicrobials but did not harbor any known acquired AMR genes (Supplementary Table S4). Similarly, S. Newport isolate ENV_NEWP_62_03-05-09_R8-3442 was itself pan-susceptible but harbored multiple acquired AMR genes (e.g., blaCMY-2, floR, sul2, tetA), which conferred multidrug resistance in closely related S. Newport isolates (Supplementary Table S4).
To address these incongruencies, 21 selected Salmonella isolates underwent phenotypic antimicrobial susceptibility re-testing (conducted September 16, 2020) as described above (see section "Prediction of Phenotypic Susceptible-Intermediate-Resistant Classifications Using In Silico Methods"), with the exception of amikacin and kanamycin, as the contemporary panel did not include these antimicrobials (Supplementary Table S4). Kanamycin testing was conducted separately using a gradient diffusion assay ( (Revell, 2012), and reshape2 (Wickham, 2007).
Potential clustering based on AMR gene presence/absence was additionally assessed for the same three grouping factors (serotype, farm, and isolation source), using the presence and absence of AMR determinants detected by AMRFinderPlus as input (i.e., AMR and stress response determinants identified using the "plus" option in AMRFinderPlus). All steps were performed as described above, and a Bonferroni correction was used to correct for multiple comparisons.
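The Bonferroni correction mentioned above can be illustrated with a minimal sketch. It assumes that the family of comparisons consists of the three grouping factors (serotype, farm, and isolation source); the raw P-values below are invented for illustration and are not results from this study, and the function name is hypothetical.

```python
from typing import Dict


def bonferroni_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Multiply each raw P-value by the number of tests, capping at 1.0."""
    n_tests = len(p_values)
    return {name: min(1.0, p * n_tests) for name, p in p_values.items()}


# Hypothetical raw P-values for the three grouping factors (not study data).
raw_p = {"serotype": 0.001, "farm": 0.004, "isolation source": 0.30}
adjusted_p = bonferroni_adjust(raw_p)

for factor, p in adjusted_p.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{factor}: adjusted P = {p:.3f} ({verdict})")
```

With three tests, a raw P-value must fall below roughly 0.0167 to remain below 0.05 after adjustment.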

Reference-Based Core SNP Identification Within Serotypes
For each individual serotype, core SNPs were identified among genomes assigned to that serotype using a reference-based approach. For each serotype, Snippy version 4.3.6 (https://github.com/tseemann/snippy) (Seemann, 2019b) was used to identify core SNPs among all representatives assigned to the serotype, using the trimmed Illumina paired-end reads of each genome as input (see section "Whole-Genome Sequencing and Data Pre-Processing" above), one of six high-quality assembled genomes from isolates in this study as a reference genome (Supplementary Table S1), and the following dependencies: within the full alignment that resulted, and the filtered alignment produced by Gubbins was queried using snp-sites to produce an alignment of core SNPs for each serotype.
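As a conceptual illustration of what a core SNP alignment contains, the sketch below keeps only the alignment columns that vary among genomes and in which every genome has an unambiguous base. This is an assumption about what "core" means here, and the code is not the snp-sites implementation; the function name and the toy sequences are hypothetical.

```python
from typing import Dict


def core_snp_columns(alignment: Dict[str, str]) -> Dict[str, str]:
    """Return only variable columns in which every sequence has A, C, G, or T."""
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    keep = []
    for i in range(lengths.pop()):
        column = {alignment[name][i].upper() for name in names}
        # Core SNP: no gaps or ambiguous bases, and more than one allele.
        if column <= set("ACGT") and len(column) > 1:
            keep.append(i)
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignment (hypothetical sequences, not study data).
toy_alignment = {
    "reference": "ACGTACGTAC",
    "isolate_1": "ACGTACGAAC",
    "isolate_2": "ACTTACGANC",
}
core = core_snp_columns(toy_alignment)
print(core)
```

In the toy alignment, the column containing an ambiguous base (N) is excluded, so only two columns are retained.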

Construction of Within-Serotype Phylogenies
For each serotype, IQ-TREE version 1.6.10 was used to construct an ML phylogeny, using core SNPs detected among all isolates assigned to the serotype as input (see "Reference-Based Core SNP Identification Within Serotypes" section above), the optimal ascertainment bias-aware nucleotide substitution model selected using ModelFinder, and 1,000 replicates of the UltraFast bootstrap approximation. The temporal structure of each resulting ML phylogeny was assessed using the R² value produced by the best-fitting root in TempEst version 1.5.1 (Supplementary Table S5). Ten independent BEAST 2 runs were then performed using the optimal model for each serotype, using chain lengths of at least 100 million generations, sampling every 10,000 generations. Tracer version 1.7.1 (Rambaut et al., 2018) was used to ensure adequate mixing of each independent run. The resulting log and tree files were aggregated using LogCombiner-2, and TreeAnnotator-2 (Heled and Bouckaert, 2013) was used to produce a maximum clade credibility tree using 10% burn-in. The phylogenies were annotated using R and the following packages: ggplot2, ggtree, and phylobase. Bayesian skyline plots for all serotype groups analyzed using BEAST 2 are also available (Supplementary Figure S1); however, due to the limited number of available isolates surveyed among each serotype and the short temporal range queried here, potential changes in effective population sizes may not be robust and are thus not discussed in this study. Table S1). In addition to undergoing traditional serotyping in a laboratory setting, all isolates were assigned serotypes in silico using both (i) SISTR and (ii)

Genomic AMR determinants of bovine-associated Salmonella are serotype-associated
Based on the presence and absence of pan-genome elements among all 128 Salmonella isolates sequenced here, the Salmonella pan-genome was more similar within serotype and within farm than between serotype and between farm, respectively (PERMANOVA and ANOSIM P < 0.05 after a Bonferroni correction; Figure 4 and Figure 4 and Table 2).
Based on the presence and absence of AMR and stress response determinants detected among all 128 Salmonella genomes, isolates were more similar within serotype than between serotypes (PERMANOVA and ANOSIM P < 0.05 and PERMDISP2 P > 0.05 after a Bonferroni correction; Figure 4 and Table 2). Additionally, isolates were more similar within farm than between farm based on their AMR and stress response gene presence/absence profiles (PERMANOVA and ANOSIM P < 0.05; Figure 4 and Table 2), although significant, potentially confounding dispersion differences among farms were present (PERMDISP2 P < 0.05; Table 2). As was the case with the pan-genome in its entirety, subclinical bovine Salmonella isolates did not significantly differ from farm environmental isolates based on AMR and stress response gene presence/absence (PERMANOVA, ANOSIM, and PERMDISP2 P > 0.05 after a Bonferroni correction; Figure 4 and Figure 5 and Table 3). Notably, the S. Anatum lineages circulating on each farm were distinct at a genomic level, with isolates from each farm forming a separate clade (posterior probability [PP] = 1 for each; Figure 5). The two farm-associated lineages were predicted to share a common ancestor circa 1836 (node age 1836.28 using median node heights; Figure 5) Figure 5 and Supplementary Figure S5). All Farm 39 S. Anatum isolates possessed identical AMR/stress response gene profiles, and all isolates were pan-susceptible except for a single isolate that was resistant to ampicillin (Figure 5). All Farm 39 S. Anatum isolates additionally harbored ColRNAI plasmids; a single isolate additionally harbored an IncI1 plasmid that appeared to harbor no AMR genes (Figure 5). Figure 6, Table 3, Supplementary Figure S6, and Supplementary Table S5). While IncI1 and ColRNAI plasmid replicons were detected in all 13 S. Cerro isolates, only one isolate was not pan-susceptible (Figure 6). Notably, the isolate from Farm 35 (BOV_CERO_35_10-02-08_R8-2685) was classified as resistant to nine antimicrobials using phenotypic methods (i.e., amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, chloramphenicol, streptomycin, sulfisoxazole, and tetracycline); based on the most parsimonious explanation for AMR acquisition, this lineage acquired AMR after July 2008 (node age 2008.51, CA node height 95% HPD interval 2008.14-2008.75; Figure 6 and Supplementary Figure S6). However, no genomic determinants known to confer resistance to these antimicrobials were detected in the genome of the MDR isolate (Figure 6), and the MDR phenotype was confirmed in a second, independent phenotypic AMR test (Supplementary Table S4 Figure 7, Table 3, Supplementary Figure S7, and Supplementary Table S5). Two farms harbored a total of three S. Kentucky isolates, which were not pan-susceptible (two isolates from Farm 16 and one from Farm 17; Figure 7). Farm 17 harbored a tetracycline-resistant isolate (ENV_KENT_17_03-11-08_R8-0815), which possessed an IncI1 plasmid and tetC (Figure 7) Figure 7 and Supplementary Figure S7). No corresponding genes that may encode for reduced chloramphenicol susceptibility were identified in these two isolates.
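Node ages in this section are reported as decimal years (e.g., node age 2008.51, interpreted above as July 2008). A minimal sketch of that conversion follows; it assumes that the fractional part denotes the elapsed fraction of the calendar year, which may differ slightly from the convention used by the dating software, and the function name is hypothetical.

```python
from datetime import datetime


def decimal_year_to_date(decimal_year: float) -> datetime:
    """Convert a decimal year (e.g., 2008.51) to an approximate calendar date."""
    year = int(decimal_year)
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    # Scale the length of this specific year (365 or 366 days) by the fraction.
    return start + (decimal_year - year) * (end - start)


# Node ages quoted in the text above.
for node_age in (2008.51, 2008.14, 2008.75, 1836.28):
    print(node_age, "->", decimal_year_to_date(node_age).strftime("%B %Y"))
```

Under this assumption, the HPD interval bounds 2008.14 and 2008.75 fall in February and October 2008, respectively.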

A clonal S. Meleagridis lineage is distributed across two New York State dairy farms and encompasses isolates carrying blaCTX-M-1
Nineteen S. Meleagridis isolates encompassing two PFGE types (Supplementary Table S1) were isolated from two dairy farms (13 and six isolates from Farms 01 and 11, respectively) and were highly clonal: isolates differed by fewer than ten core SNPs and evolved from a common ancestor that existed circa May/ Table S1) were isolated from one of three farms (four, five, and seven isolates from Farms 17, 35, and 62, respectively); all isolates were resistant to amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, streptomycin, sulfisoxazole, and tetracycline (Figure 9). All S. Newport genomes harbored IncA/C2 and ColRNAI plasmids, as well as streptomycin resistance genes APH(3'')-Ib and APH(6)-Id (i.e., strAB), beta-lactamase blaCMY-2, sulfonamide resistance gene sul2, and tetracycline resistance gene tetA (Figure 9). Notably, the S. Newport lineage circulating on each farm formed one of three separate clades (PP = 0.99-1.0) that evolved from a common ancestor that existed circa March 08_R8-2688 did not possess floR and was chloramphenicol-susceptible (Figure 9).

Each of four major lineages composed of S. Typhimurium and its O5-Copenhagen variant is associated with one of three New York State dairy farms
Twenty-seven bovine and farm environmental S. Typhimurium and S. Typhimurium Copenhagen isolates that encompassed five PFGE types (Supplementary Table S1) were isolated from three dairy farms (1, 10, and 16 strains isolated from Farms 17, 22, and 25, respectively). All isolates queried here shared a common ancestor that existed circa 1936 (node age 1935.62, CA node height 95% HPD interval 1864.84-1991.86; Figure 10, Table 3, Supplementary Figure S10, and Supplementary Table S5). Notably, the S. Typhimurium Copenhagen variant was polyphyletic (Figure 10), regardless of whether traditional or in silico (i.e., SeqSero2) methods had been used for serotype variant assignment. Additionally, the S. Typhimurium/S. Typhimurium Copenhagen isolates sequenced here showcased the most diverse AMR phenotypic profiles and AMR gene presence/absence profiles (Figures 4 and 10).
Isolates from Farm 25 were partitioned into two clades: one containing S. Typhimurium isolates, and one containing S. Typhimurium Copenhagen isolates (based on SeqSero2's in silico serotype assignments; Figure 10 Using a WGS-based approach applied to serially sampled Salmonella strains isolated over a short time frame (i.e., less than two years), the study detailed here reveals that sporadic acquisition and loss of acquired AMR genes can occur within closely related populations over a short timescale.
and introduction on farms, as shown in this study. For example, for one farm (i.e., Farm 25), two Salmonella Typhimurium clonal groups were present (i.e., one representing Typhimurium and one representing Typhimurium Copenhagen), each of which shared a common ancestor dated circa 2007. WGS data can be used to identify time frames in which Salmonella lineages may have emerged on a given farm or in a given region, which could help pinpoint root causes (e.g., changes in management practices that occurred around the predicted time of emergence).
While the characterization of additional, larger strain sets from more geographically diverse farms is essential, our data suggest that specific Salmonella clones may persist on a given farm. This suggests that WGS databases covering isolates from a large number of farms could be used to develop initial hypotheses about farm sources of Salmonella strains. While such applications are tempting, it is crucial that these types of data are only used for initial hypothesis generation; rigorous, critical epidemiological investigations are essential before any conclusions regarding strain source are drawn.

In silico serotyping of bovine-associated Salmonella can outperform traditional serotyping
Well into the genomic era, serotyping remains a vital microbiological assay that allows Salmonella isolates to be classified into meaningful evolutionary units. Serotype assignments are used to facilitate outbreak investigations and surveillance efforts, construct salmonellosis risk assessment frameworks, and inform food safety and public health policy and decision-making efforts (Yoshida et Disease Control and Prevention, 2020).
In this study, serotypes assigned using traditional phenotypic methods were compared to serotypes assigned using two in silico methods (i.e., SISTR and SeqSero2). Notably, both in silico serotyping approaches outperformed traditional Salmonella serotyping for this data set. Serotypes assigned using SISTR's cgMLST approach and/or SeqSero2 were congruent with the Salmonella whole-genome phylogeny and were able to resolve all un-typable, ambiguous, and incorrectly assigned serotypes (Supplementary Table S1). It is essential to note that the data set queried here is far too small and, thus, inadequate to formally benchmark these tools. Furthermore, all serotypes studied here were among the ten most frequently reported serotypes of Salmonella recommended SISTR as the optimal contemporary tool for routine in silico Salmonella serotyping based on overall accuracy; however, they additionally report that the raw read mapping approach implemented in SeqSero2 (i.e., "allele mode") outperforms SISTR for prediction of monophasic variants. Banerji et al. (2020) did not assess the performance of SISTR on their data set, as it requires assembled genomes and not raw reads (another potential drawback if a high-quality assembly is not available or obtainable for an isolate of interest); however, they found that both SeqSero and MLST approaches misidentified monophasic variants, particularly among the important monophasic S. Typhimurium lineage.
Among the bovine-associated Salmonella strains sequenced here, we observed a combination of S. Typhimurium strains that possessed the O5 epitope, and those that did not (i.e., S. Typhimurium Copenhagen). Importantly, SISTR was unable to differentiate S. (BOV_CERO_35_10-02-08_R8-2685) was reported to be phenotypically resistant to nine antimicrobials but did not possess any acquired AMR determinants known to produce this phenotypic AMR profile. A recent case study in which WGS and phenotypic methods were used to characterize Salmonella isolates from raw chicken identified numerous AMR genotype/phenotype discrepancies resulting in both false negative and false positive predictions for in silico methods (Zwe et al., 2020). In this case study, the authors attributed these discrepancies to heteroresistant Salmonella subpopulations (i.e., a subpopulation of bacteria that exhibits a range of susceptibility to a particular antimicrobial). The possibility that several heteroresistant Salmonella populations were characterized here cannot be discounted, as isolates underwent phenotypic AMR characterization and WGS separately (i.e., years apart). Other biological phenomena, such as plasmid loss during storage or culturing, or unknown/undetected resistance genes or mutations, could also contribute to discrepancies (Hendriksen et al., 2018). However, it is also possible that one or more incongruent isolates were mislabeled and/or mishandled during AMR phenotyping, genomic DNA extraction, and/or WGS. While removal of these isolates from the data set would increase overall prediction accuracy, the high congruency between AMR genotyping methods would be unaffected.

a Statistics were calculated using the confusionMatrix function in the caret package in R, with resistant ("R") phenotypes/genotypes treated as the "positive" result and susceptible ("S") phenotypes/genotypes treated as the "negative" result; see Supplementary Table S3 for an extended version of this table.
b Adjusted using a Bonferroni correction.
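The positive/negative convention in footnote a can be made concrete with a short sketch. It is written in Python rather than with the caret package, it assumes that only "R" and "S" calls are compared (how intermediate calls were handled is not restated here), and the function name and example calls are hypothetical rather than study data.

```python
from typing import Dict, Sequence


def sir_confusion_stats(phenotype: Sequence[str],
                        prediction: Sequence[str]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy with "R" as the positive class."""
    if len(phenotype) != len(prediction):
        raise ValueError("phenotype and prediction must have the same length")
    pairs = list(zip(phenotype, prediction))
    tp = sum(1 for truth, pred in pairs if truth == "R" and pred == "R")
    tn = sum(1 for truth, pred in pairs if truth == "S" and pred == "S")
    fp = sum(1 for truth, pred in pairs if truth == "S" and pred == "R")
    fn = sum(1 for truth, pred in pairs if truth == "R" and pred == "S")

    def ratio(numerator: int, denominator: int) -> float:
        # Report NaN rather than failing when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # resistant isolates called resistant
        "specificity": ratio(tn, tn + fp),  # susceptible isolates called susceptible
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical calls for eight isolate/antimicrobial combinations.
observed = ["R", "R", "R", "S", "S", "S", "S", "S"]
predicted = ["R", "R", "S", "S", "S", "S", "S", "R"]
stats = sir_confusion_stats(observed, predicted)
print(stats)
```

Treating "R" as the positive class means that sensitivity reflects how often phenotypic resistance is recovered by the in silico prediction, while specificity reflects how often phenotypic susceptibility is recovered.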