Detection of ESKAPE pathogens and Clostridioides difficile in Simulated Skin Transmission Events with Metagenomic and Metatranscriptomic Sequencing

Background: Antimicrobial resistance is a significant global threat, posing major public health risks and economic costs to healthcare systems. Bacterial cultures are typically used to diagnose healthcare-acquired infections (HAI); however, culture-dependent methods provide limited presence/absence information and are not applicable to all pathogens. Next generation sequencing (NGS) has the capacity to detect a wide variety of pathogens, virulence elements, and antimicrobial resistance (AMR) signatures in healthcare settings without the need for culturing, but few research studies have explored how NGS could be used to detect viable human pathogen transmission events under different HAI-relevant scenarios.

Methods: The objective of this project was to assess the capability of NGS-based methods to detect the direct and indirect transmission of high priority healthcare-related pathogens. DNA was extracted and sequenced from a previously published study exploring pathogen transfer with simulated skin containing background microorganisms, which allowed for complementary culture and metagenomic analysis comparisons. RNA was also isolated from an additional set of samples to evaluate metatranscriptomic analysis methods at different concentrations.

Results: Using various analysis methods and custom reference databases, both pathogenic and non-pathogenic members of the microbial community were taxonomically identified. Virulence and AMR genes known to reside within the community were also routinely detected. Ultimately, pathogen abundance within the overall microbial community played the largest role in successful taxonomic classification and gene identification.

Conclusions: These results illustrate the utility of metagenomic analysis in clinical settings or for epidemiological studies, but also highlight the limits associated with the detection and characterization of pathogens at low abundance in a microbial community.

difficile (Slimings and Riley, 2014), these pathogens are the leading causes of nosocomial infections (Boucher et al., 2009; Santajit and Indrawattana, 2016). Culture-based methods within clinical laboratories are typically utilized to identify and track HAI transmission, such as the nosocomial infections caused by ESKAPE pathogens and C. difficile (ESKAPE+C) (Didelot et al., 2012), but cultures have multiple drawbacks. Dead or unculturable pathogens will be overlooked by culture-dependent methods, even though usable biochemical signatures (e.g., DNA) persist. Culturing is primarily a method for identifying viable pathogens amenable to growth under certain conditions, aiming to confirm the presence of known pathogens at the species level. Once a putative pathogen species has been identified, multiple rounds of culturing and biochemical assays may be necessary to further characterize pathogens at the strain level or to identify antibiotic resistance activity.

Metagenomic and metatranscriptomic analyses of samples collected in a healthcare setting provide compelling alternatives to traditional culture-based pathogen identification. These analyses do not require pathogen viability or culturability; instead, collected cells are lysed and the nucleic acids are collected for sequencing. These approaches permit species or even strain level identifications of pathogens present within a sample without multiple rounds of culture analysis. Perhaps most importantly, sequencing approaches can provide valuable insights into gene content and expression, identifying components of the resistome and elements contributing to virulence in a clinical sample. Previous studies have evaluated the relationship between culture and metagenomic analysis, highlighting both successes and challenges for this technology (Didelot et al., 2012).
Challenges of unbiased metagenomic or metatranscriptomic sequencing methods include complexities in developing standardized analysis protocols and databases, and pathogen concentrations falling below the limit of detection in relation to other organisms in the sample. In the current study, we constructed customized databases based on the known mock microbial community genome and gene content to explore the impact of different ESKAPE+C concentration levels and simulated HAI transfer scenarios on pathogen detection from metagenomic and metatranscriptomic sequence data.

Our research expands upon previously published data from a study establishing an in vitro method to model ESKAPE+C transmission using a synthetic skin surrogate (Weber et al., 2020). This prior study enabled the investigation of both direct (skin-to-skin) and indirect (skin-to-fomite-to-skin) pathogen transmission scenarios using VITRO-SKIN® N-19 to mimic human skin, including a simulated commensal skin flora (Figure 1). The commensal skin flora was included on both the pre-transfer and post-transfer coupon to simulate pathogen transfer from skin containing a mix of pathogen and commensal organisms to a second piece of skin containing only the existing commensal community. Different

distance was used to identify the closest available reference genome with the trimmed read data (Ondov et al., 2016). SPAdes assemblies were generated from the high-quality genome sequence data for each isolate (

reporting and analysis. Bowtie2 was used to map reads to a custom database of genes present within the isolates.
Metagenomic assembly was performed with metaSPAdes (DNA) (Nurk et al., 2017)

successfully collected from all direct and indirect scenarios previously described (Weber et al., 2020), the indirect wash scenarios were not sequenced in this study because it was anticipated that they would fall well below the sequencing limit of detection (Figure 1).

For the direct contact scenario, the primary VITRO-SKIN® coupon was inoculated with a mix of pathogen and background bacteria, representing a contaminated patient hand (step 1).

Taxonomic Detection of Pathogens in Metagenomes
Metagenomic analysis successfully identified the commensal and pathogenic organisms present in the high spike-in (~10^6 CFU/mL) direct contact scenarios. Taxonomic analysis included the use of read mapping and containment estimations with a custom reference genome database that consisted of the specific strains used in this study (Supplementary Tables 3 and 4). As shown in Figure 2, a Mash Screen identity value of 0.90 served as a reasonable threshold for bacterial genome detection in the metagenomes of this study. The indirect and low spike-in scenarios did not yield enough sequence coverage of the bacterial genomes to produce a positive detection event (Figure 3). Ultimately, pathogen abundance within the overall microbial community played the largest role in successful taxonomic classification.
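The thresholding step described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: it assumes the tab-separated output of `mash screen` (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment), and the function name `detected_genomes` and the constant name are hypothetical.

```python
import csv
import io

# Identity cutoff reported in the text as a reasonable detection threshold.
IDENTITY_THRESHOLD = 0.90


def detected_genomes(screen_tsv, threshold=IDENTITY_THRESHOLD):
    """Return (query_id, identity) pairs at or above the identity threshold.

    `screen_tsv` is assumed to hold tab-separated `mash screen` output with
    identity in column 1 and the reference (query) ID in column 5.
    """
    hits = []
    for row in csv.reader(io.StringIO(screen_tsv), delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        identity = float(row[0])
        if identity >= threshold:
            hits.append((row[4], identity))
    # Strongest containment first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Reading a results file and passing its contents to `detected_genomes` would list the reference genomes called as present; references scoring below 0.90 are treated as not detected, which matches the observation that some true positives can fall below the cutoff.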

A Mash Screen identity threshold of 0.90 served as a reasonable cutoff for bacterial genome detection in the metagenomes of this study with a custom database of the expected genome strains. While some true positives (green) fell below the 0.90 threshold, none of the false positives (red) were detected above it. Two or three ESKAPE+C pathogens were cultured together in three different mixes before sequencing, and because the Mash Screen custom database only contained the ESKAPE+C and background organisms, the false positives shown here represent pathogens that were not part of the particular mix that was sequenced.

the direct, high-spike-in scenarios (x axes). Notably, C. difficile (black) was difficult to detect with metagenomics even in high-spike-in scenarios, which may be due to insufficient yield of DNA from its endospore state.

Impact of Simulated Handwashing on DNA Yield
ESKAPE+C pathogens were cultured together in three different mixes based on their media and growth requirements (Weber et al., 2020). Samples were split between the culturing and metagenomics sequencing experiments, and the results were compared downstream. Two or three pathogens were included in each cultured mix, and eight background microorganisms were included along with the pathogens before each metagenome was sequenced (Figure 1). Therefore, the reads within a metagenome that did not map to an ESKAPE+C reference genome in Table 1 and Figure 4 had originated from either 1) another pathogen in the mix or 2) one of the background microorganisms. The CFU/mL values calculated from the culture data were compared to the percentage of metagenomic reads mapped to one of the ESKAPE+C reference genomes. Pathogens were only detected from metagenomics data in the high spike-in direct contact scenarios (Figure 3), such as direct transfer events with and without handwashing. There was no observable correlation between CFU/mL and reads mapped to the reference genomes (Supplementary Figure 3).

While the simulated handwashing events on VITRO-SKIN® decreased the number of viable pathogen cells, they resulted in higher overall pathogen detection rates within metagenomes, relative to the non-handwashing scenarios, in the high spike-in (~10^6 CFU/mL) direct contact scenarios (

was only visible in the high spike-in scenarios (~10^6 CFU per 9 cm^2 coupon), as pathogens spiked in at low amounts (<10^4 CFU per 9 cm^2 coupon) fell below the limit of detection with metagenomic sequencing. Culture data still picked up some signal in these low spike-in instances (1-904 pathogen CFU/mL after transfer for direct contact scenarios and 1-72 pathogen CFU/mL for indirect contact scenarios).
Metagenomics detected pathogens for only the high spike-in direct no wash scenarios (average of ~2,447 pathogen CFU/mL after transfer), while culture data detected pathogens in all high spike-in scenarios (95-42,400 pathogen CFU/mL after transfer for direct contact scenarios and 1-1,600 pathogen CFU/mL for indirect contact scenarios).

Detection of Antimicrobial Resistance and Virulence Genes in Metagenomes
To evaluate rates of gene detection and coverage, metagenomic reads derived from contact scenarios were mapped to the custom database of genes that were annotated within the assembled genomes of ESKAPE+C pathogens and background organisms (Figure 5, Supplementary Figures 4-7). As in the ESKAPE+C genome-level analyses, pathogen gene signals decreased in the indirect contact scenarios compared to the direct contact scenarios (Figure 6). The high-inoculum wash scenarios yielded more reads mapped to genes than the no-wash scenarios, and no pathogen genes were detected in the low-inoculum scenarios. As in the genome-level analyses, E. cloacae was an exception, with the no-handwash scenarios showing a larger maximum DNA yield than the handwash scenarios.

Antimicrobial resistance (AMR) genes detected within the contact scenarios included several that encode proteins specific for antibiotic inactivation, including aminoglycoside resistance genes aac(6'), -ach(2"), and aadD from S. aureus, aac(6')-li and ant (6)

increased DNA yield when read mapping at the genome-level (Table 1), but the impact of handwashing appears relatively larger when viewed at the AMR gene level. Although completely shared genes (100% length and identity) across multiple species were removed from analysis, it is possible that partially shared genes accounted for the increased number of mapped short reads at the AMR gene level.

differences in the loss of signal between high inoculum handwash and no handwash events (Figure 7). The detection of signal from C. difficile with no wash was negligible, while the signals from P. aeruginosa and A. baumannii were successfully retained in no wash events. As in the genome mapping results, P. aeruginosa showed a stronger signal without handwashing than with it, unlike the majority of the other ESKAPE+C pathogens, which increased in DNA yield after the simulated handwashing events. The greater detection and difference in loss of signal from different organisms suggest that metagenomic methods are significantly affected by the physicochemical features of the organisms and the circumstances employed before DNA extraction.

were simulated with and without handwash. ABRicate detection of gene coverage demonstrates pathogen-specific differences in the loss of signal between handwash and no wash events.
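The gene-level coverage discussed in this section can be illustrated with a short breadth-of-coverage calculation. This is a sketch under stated assumptions rather than the study's implementation: alignments are assumed to be given as 0-based, half-open (start, end) positions on the gene, and the function name is hypothetical.

```python
def breadth_of_coverage(gene_length, alignments):
    """Percent of a gene's bases covered by at least one aligned read.

    `alignments` is an iterable of (start, end) pairs, 0-based and half-open,
    giving where each read aligns on the gene (an assumed input format).
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(alignments):
        # Clip each alignment to the gene boundaries.
        start, end = max(0, start), min(gene_length, end)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent read: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / gene_length
```

Overlapping reads are merged before counting, so deep coverage of one region does not inflate the breadth value; a result at or near 100 corresponds to the full-length gene detections described in the text.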

Detection of ESKAPE+C Pathogens in Titrated Metatranscriptome Experiment
Mapping of metatranscriptomic reads derived from RNA extracted from different concentrations of ESKAPE+C pathogens spiked into a constant concentration of background organisms demonstrated a loss of signal as ESKAPE+C bacterial concentration decreased (Figure 8). The relationship of mapped bases and mean coverage to the estimated CFU of ESKAPE+C pathogens spiked into the background organisms was dependent on the organism. As a general trend, gene signals were detected only for bacteria spiked in at the highest level (~10^6 CFU/mL) (Figure 9). Only seven of the AMR and virulence genes that were detected in the contact scenarios were detected in the extracted RNA from pathogen spike-ins (Supplementary Figure 8), suggesting low expression of these genes. Low gene expression is not unexpected, given the absence of selection pressure and of recent contact with a host organism. Genes specific to a given microorganism varied in their response: some showed a positive linear relationship with the CFU spike-in level, while others remained consistent as estimated CFU increased (Supplementary Figure 9). Interestingly, metatranscriptomic analysis of the pathogen spike-in detected specific genes from C. difficile, while the C. difficile signal was absent for genome detection methods (Figure 10). The detection of mRNA related to sporulation (i.e., small acid-soluble protein, spore coat protein) in C. difficile within spike-ins indicates that specific gene transcripts can potentially be attributed to a genus or species within a mixture, depending on the specificity of the gene or partial sequence to a taxonomic level.

were detected, which were kept at a constant level of 10^6 CFU/mL in all samples. C. difficile and A. baumannii were not detected in metatranscriptomes at any tested level.
to a genus or species within a mixture, depending on the specificity of a gene or partial sequence to a taxonomical level. It also makes sense that sporulation genes would be expressed, since C. difficile endospores were used in the experiments.
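The per-gene trends described in this section (mean coverage against spike-in level) can be explored with a simple least-squares fit. The sketch below is illustrative only: the fitting approach, the choice of log10 CFU as the x variable, and the function names are assumptions, not methods reported in the study.

```python
def mean_coverage(mapped_bases, gene_length):
    """Mean depth of coverage: mapped bases divided by gene length."""
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    return mapped_bases / gene_length


def least_squares(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Here x could be log10(estimated CFU/mL) per titration level and y the
    mean coverage of one gene (an assumed pairing, for illustration).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx
```

A clearly positive slope for a gene would correspond to the linear relationship with spike-in level noted above, while a slope near zero would correspond to genes whose signal stayed consistent as estimated CFU increased.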

Discussion
Culturing of bacteria is commonly performed to identify infectious pathogens (Nekkab et

It was promising that genes were detected within microbial mixtures despite the lack of detection with culturing techniques or genome-level taxonomic calls. This suggests that functionally informative genes (e.g., virulence, AMR) could have lower limits of detection than culturing or standard taxonomic identification methods. A. baumannii was at or near the limit of detection for the culturing method (left of dotted line in Supplementary Figure 5), while specific genes from this species were detected by sequencing. Compared to whole genome techniques, the assembly and annotation of organism-specific genes led to greater retention of pathogen signal, especially from the no handwash scenarios. The comparison of ABRicate annotations of assembled contiguous sequences with Bowtie2-mapped bases demonstrates that while alignment of short reads allowed for more sensitive detection of genes, the annotation of assembled reads provided better precision in gene identifications. High precision was especially evident when at or near 100% coverage was achieved for a species-specific gene within a contiguous sequence.

Bacterial nucleic acids from skin and nonsterile specimens may result in a stronger signal than that of the pathogen (Gu et al., 2019), and in this study that was simulated by a constant level of "background" bacterial species. The lack of signal from ESKAPE+C pathogen sequences within the indirect scenarios using unbiased NGS methods compared to selective culturing techniques could be attributed to the lack of selection for pathogen signal over the more abundant background microorganism signal, highlighting a key challenge of NGS in clinical settings with nonsterile specimens and complex sample types.
Enrichment techniques to amplify the pathogen signal before sequencing could have improved the limits of detection for ESKAPE+C in all scenarios, but enrichment techniques often come at the price of biased sequencing in search of known targets. Such methods have limited applicability to emerging or novel pathogens, as well as situations when the infectious agent is unknown to the physician and fails standard clinical tests.

6 Conclusions

Metagenomic and metatranscriptomic analyses promise an unbiased approach to pathogen species-level detection and functional gene characterization within clinical samples; however, the limitations of this technology must be fully evaluated before traditional culturing methods can be supplemented or replaced by this new methodology. This study makes significant progress toward this goal, capitalizing on a large, well-curated data set from a previously published study and generating complementary NGS analyses for comparison. In doing so, we illustrate both the strengths of this type of analysis, such as the ability to identify pathogens and characterize elements of virulence or the resistome in a given sample, and the limitations of unbiased sequencing, predominantly highlighted by low sensitivity when pathogens are present at low abundance within a complex mixture. Variability in the loss of signal from different bacterial species also supports the view that laboratory and bioinformatics methods are impacted by the intrinsic nature of the organisms, as it relates to nucleic acid extraction and the uniqueness of genome content. These results will inform and aid the healthcare and epidemiological community as they evaluate the appropriate scenarios in which to utilize metagenomic analysis.

7 Abbreviations