A prospective role for the rumen in generating antibiotic resistance

Antibiotics were a revolutionary discovery of the 20th century, but the ability of bacteria to spread the genetic determinants of resistance via horizontal gene transfer (HGT) has quickly endangered their use1. Indeed, there is a global network of microbial gene exchange, the analysis of which has revealed particularly frequent transfer of resistance determinants between farm animals and human-associated bacteria2. Here, we leverage the recent release of a rumen microbial genome reference set and show that the wide-spread resistance gene cluster aadE-sat4-aphA-3 is harboured in ruminal Bacteroidetes. While this cluster appears to have been recently transferred between commensal bacteria in the rumen and many diverse animal and human pathogens, comparative analysis suggests that the cluster stabilized in the pathogens. Then, focusing on streptomycin resistance, it was found that homologues from the rumen span much of the known diversity of aminoglycoside O-nucleotidyltransferases (AadEs) and that distinct variants of the enzyme are present in a single rumen bacterial genome. Notably, a second variant of AadE has also been recently transferred, albeit more often as a single gene, throughout a different set of animal and human associated bacteria. By examining the synteny of AadE orthologues in various bacterial genomes and analyzing corresponding gene trees in an environmental context, we speculate that the ruminant associated microbiome has a salient role in the emergence of specific resistance variants and clusters. In light of the recent literature on the evolutionary origin of antibiotic resistance, we further suggest that the rumen provides a possible route of dissemination of resistance genes from soil resistomes, throughout the farm, and to human pathogens3.

Antibiotics were a revolutionary discovery of the 20 th century, but the ability of bacteria to 31 spread the genetic determinants of resistance via horizontal gene transfer (HGT) has quickly 32 endangered their use 1 . Indeed, there is a global network of microbial gene exchange, the analysis 33 of which has revealed particularly frequent transfer of resistance determinants between farm 34 animals and human-associated bacteria 2 . Here, we leverage the recent release of a rumen 35 microbial genome reference set and show that the wide-spread resistance gene cluster aadE-sat4-36 aphA-3 is harboured in ruminal Bacteroidetes. While this cluster appears to have been recently 37 transferred between commensal bacteria in the rumen and many diverse animal and human 38 pathogens, comparative analysis suggests that the cluster stabilized in the pathogens. Then, 39 focusing on streptomycin resistance, it was found that homologues from the rumen span much of 40 the known diversity of aminoglycoside O-nucleotidyltransferases (AadEs) and that distinct 41 variants of the enzyme are present in a single rumen bacterial genome. Notably, a second variant 42 of AadE has also been recently transferred, albeit more often as a single gene, throughout a 43 different set of animal and human associated bacteria. By examining the synteny of AadE 44 orthologues in various bacterial genomes and analyzing corresponding gene trees in an 45 environmental context, we speculate that the ruminant associated microbiome has a salient role 46 in the emergence of specific resistance variants and clusters. In light of the recent literature on 47 the evolutionary origin of antibiotic resistance, we further suggest that the rumen provides a 48 possible route of dissemination of resistance genes from soil resistomes, throughout the farm, 49 and to human pathogens 3 . 50 51 MAIN TEXT 52 53 Since the introduction of antibiotics in 1937, the emergence and spread of antibiotic resistance 54 determinants (ARDs) has become one of the largest threats to human health 1,4 . In fact, the history 55 of antibiotic use is concurrent with the history of increasing antibiotic resistance (AR), which 56 now vastly outpaces new antibiotic discovery 5 . Considering the lack of new antimicrobial 57 compounds entering the clinic, there have been many renewed calls for efforts to discover new 58 compounds or to return to modifying well-understood classes of antibiotics, such as the 59 aminoglycosides 5-8 . In addition to restricting the use of current and future antibiotics, there is a 60 need to better understand the extensive evolutionary history of specific ARDs and their routes of 61 dissemination 3,9-11 . In doing so, attempting to re-trace evolutionary events involving ARDs and 62 resistance clusters will be essential to move from the metagenomic description of AR reservoirs 63 to identifying particular sources where AR variants emerge, assemble into clusters, and 64 subsequently transfer to human pathogens 12,13 . Fortunately, the ability to carry out such analysis 65 is constantly improving with the number of publicly available genome sequences 14 . Recently, 66 several high-quality datasets containing hundreds of bacterial and archaeal genomes from the 67 rumen microbiome have been published, such as the Hungate1000 collection 15,16 .

69
In order to search for ARDs that may form clusters in commensal rumen bacteria, we first 70 collected prokaryotic genome sequences from cultured organisms and combined them with a set 71 of metagenome assembled genomes (MAGs), all sourced from the rumen 15,16 . This led to a total 72 of 1585 genomes (453 genomes from cultured organisms and 1133 MAGs). Predicted open 73 reading frames (ORFs) from this dataset were then compared to the comprehensive antibiotic 74 resistance database (CARD) and it was noticed that 2 characterized ARDs from the antibiotic 75 inactivation category, AadE and AphA-3 from Streptococcus oralis and Campylobacter coli, 76 respectively, each shared 100% amino acid identity with an ORF from three different genomes in 77 the rumen dataset 17 . In all three genomes, these two ORFs were proximal on the same contig, 78 indicating that they may be organized in a cluster (Table S1). Two of the genomes derive from 79 different species of Bacteroides from cows in the US, while the third came from a MAG 80 classified as Prevotella sampled from a cow in Scotland. When compared at the nucleotide level, 81 the three contigs identified from the rumen bacterial genomes shared a region of approximately 82 ~10kB at 100% nucleotide identity, which upon further annotation, was found to contain the 83 well-known aminoglycoside-streptothricin AR cluster aadE-sat4-aphA-3 18 . This cluster was 84 originally identified as the transposon Tn5405 in Staphylococcus aureus and the genes aadE, 85 sat4, and aphA-3 encode for an aminoglycoside O-nucleotidyltransferase, a streptothricin N-86 acetyltransferase and a aminoglycoside O-phophotransferase, respectively ( Figure 1) 19,20 . The 87 Tn5405 sequence itself is also among those conserved at 100% nucleotide identity and to date, 88 the cluster has been observed across a wide range of human and animal pathogens 18-27 . When 89 compared to the NCBI non-redundant nucleotide database, it was found that a highly-conserved 90 region that spanned ~6kB of the ~10kB region was present in a diverse set of pathogens ( Figure  91 1A, Table S2). The segment of this ~6kB which contained the aadE-sat4-aphA-3 cluster ranged 92 from 99.8-100% nucleotide identity in 32 unique sequences as compared to the rumen sourced 93 contigs, while the flanking regions ranged from 89.9-100% ( Figure 1A). Interestingly, the 94 regions that were missing in the pathogens as compared to the rumen bacterial genomes 95 contained only annotated transposases, including a transposase located between aphA-3 and sat4 96 in the cluster, indicating that the cluster has stabilized in the pathogens ( Figure 1B, Table S3) 28 . 97 It is worth noting that the only example found where the cluster was not shared as a whole was in 98 Bacteroides fragilis, a common reservoir of AR and an opportunistic pathogen, where aphA-3 99 appears to have recombined into a different multi-drug resistance cluster, CTnHyb 29,30 . Further, 100 B. fragilis was the only non-rumen sequence found with an additional highly conserved region 101 and is the most closely related organism phylogenetically to the three genomes sourced from the 102 rumen. Taken together, the version of aadE-sat4-aphA-3 identified in rumen Bacteroides is 103 highly-conserved in diverse human pathogens, was therefore likely recently horizontally 104 transferred and the loss of transposases, only observed in the pathogenic isolates, implies 105 stabilization of the cluster outside of the rumen. We then sought to gain more evolutionary 106 insight into the individual ARDs within the cluster.

108
Since genes are the units of evolution and proliferation for mobile traits, we attempted to analyze 109 the evolutionary history of a single enzyme within the cluster. We focused on AadE (also known 110 as ANT(6)), an enzyme characterized to be involved in streptomycin resistance, as it is known to 111 have diverse homologues with the same activity and streptomycin resistance has been long 112 observed in the rumen 31,32 . For instance, in 1966, a range of rumen isolates were screened against 113 various antibiotics and the only compound that demonstrated resistance in all cases was 114 streptomycin 32 . We used 1354 homologues of AadE from the NCBI non-redundant (nr) protein 115 sequences database to build a gene tree ( Figure 2) 33,34 . The majority of the homologues (78%) 116 came from Firmicutes, where AadEs likely originated, followed by the Bacteroidetes (13%) 31 . 117 The taxonomic origin of the remaining sequences was diverse and interestingly, despite the fact 118 that only 7% of the sequences derive from the rumen microbiome, they span much of the 119 diversity represented in the NCBI nr database ( Figure 2). This indicates that the rumen has been 120 exposed to a large and diverse gene pool with respect to sequences homologous to AadE. Then, 121 we noticed that a sub-clade (clade 7) contained both the AadE from the aadE-sat4-aphA-3 122 cluster, as well as a homologous variant from the same rumen Bacteroides genome (Bacteroides 123 thetaiotaomicron nale-zl-c202 (Hungate collection 4309680)) ( Figure 2). These two variants 124 were annotated as ANT(6)-Ia (AadE-Ia) and ANT(6)-Ib (AadE-Ib), respectively. As these two 125 enzymes are thought to have the same activity, we were interested to see how the horizontal 126 transfer of aadE-Ib compared with that of aadE-Ia 31 . To do so, we carried out the same type of 127 analysis as shown in Figure 1, but instead analyzed the aadE-Ib containing contig from the B. 128 thetaiotaomicron nale-zl-c202 ( Figure 3, Table S2). In this case, aadE-Ib was widely distributed 129 in pathogens and commensal bacteria, albeit with lower nucleotide identities as compared to 130 aadE-Ia (81.3-100%) and seems to be transferred alone or with a different aminoglycoside O-131 nucleotidyltransferase (aad9 or ANT(9)) ( Figure 3, Table S3). Considering that the most closely 132 related sequences to aadE-Ib are not as conserved and not exclusively found in pathogens, this 133 gene is likely not under as strong of selection as aadE-Ia. It is however recombining in context 134 with other ARDs. For example, it was found to recombine near Tet(O) in C. coli SX8, a gene 135 which is also highly conserved in several ruminal bacteria at the nucleotide level ( Figure 3B, 136 Figure S1A). When looking at further syntenic regions, AadE-Ib was often found in context of 137 AadE-Ia and the aadE-sat4-aphA-3 cluster. We therefore were interested to further compare 138 AadE-Ia and AadE-Ib across many environments and bacterial genomes and better understand 139 how these two variants may have emerged.

141
By building a gene tree with all protein sequences within clade 7, shown in Figure 2, we 142 observed four clear sub groups that each corresponded to a different annotated version of AadE 143 ( Figure 4A). Outside of bacteria from the rumen or pathogens, the two groups representing 144 AadE-Ia and AadE-Ib contained sequences that were mostly sourced from various animal or 145 human intestinal samples ( Figure 4A). Moreover, the sequences from the rumen tended to span 146 these two groups, whereas the sequences from the two more deeply branching sub groups, 147 containing ANT(6)-Id (AadE-Id) and ANT(6)-Ic (AadE-Ic), were mostly sourced from diverse 148 environmental samples, such as plants and soil ( Figure 4A, Table S4). This may not be surprising 149 in light of several genomic analyses of the transfer of horizontal resistance, which have pointed 150 to the gut as an interconnection between soil and clinical pathogens or found that farm animal 151 microbiomes are enriched for transfer events with human-associated bacteria 2,11 . This does 152 however point more specifically to the rumen as a link between the environment and the human 153 or animal intestinal tract. Two questions that then arise are: why did these two variants, AadE-Ia 154 and AadE-Ib, emerge evolutionarily and why are they both often present in a single genome (e.g.

155
B. thetaiotaomicron nale-zl-c202)? Especially considering that the characterized versions have 156 the same activity 31 . When looking at those genomes that were selected for sharing high 157 nucleotide identity with the aadE-Ia or aadE-Ib from B. thetaiotaomicron nale-zl-c202, several 158 of them were also found to have both or multiple copies of AadE ( Figure 4B and Figure 4C). By 159 comparing their identity and synteny, it is clear that several AadEs have arisen via gene 160 duplication events and often remained in context of each other ( Figure 4B and Figure C). 161 Interestingly, outside of the aadE-sat4-aphA-3, the genes found in context of the two AadEs are 162 mostly streptomycin or aminoglycoside modifying enzymes ( Figure 4B and Figure C). Other 163 ARDs in context include tetracycline and lincosidamide resistance genes, which are also heavily 164 represented in the rumen and act on compounds produced by Streptomyces (Table S1, Table S5, 165 Figure S1). It is interesting to note, although often observed with other ARDs, that AadE-Ia and 166 AadE-Ib further recombine into clusters with genes which would theoretically yield the same 167 resistance phenotype. A logical suggestion is that aminoglycoside producing bacteria from soil 168 are also the sources of AR, and that these genes may have served modifying roles outside of 169 resistance to the toxicity of the compounds 35-37 . Altogether, it is possible that recombining 170 variants of aadE from the environment further duplicated, potentially including the events that 171 spawned AadE-Ia and AadE-Ib, adapted, and refined their syntenic context in the rumen. During 172 the process, there were likely many subsequent transfer events, often with commensal bacteria of 173 the intestinal tract of humans and other animals.

175
In terms of food-producing animals, aminoglycosides accounted for 3.5% of the total sales of 176 antimicrobials in 2015 and are most frequently used to treat infections 38 . Considering the 177 diversity of homologues of aminoglycoside inactivating or modifying enzymes and that cattle are 178 not directly fed aminoglycosides, it is worth considering that the rumen is also exposed to the 179 compounds and an ARD gene pool via natural sources. Soil, for example, is a well characterized 180 reservoir of antibiotic producing organisms and ARDs, which long predate the use of antibiotics, 181 and aminoglycosides have particularly high sorption in soils 8,9,39-41 . Additionally, Streptomyces 182 are often isolated from agricultural soils, including in the case of the discovery to streptomycin 42 , 183 as well as from feed sources such as hay directly 43 . The rumen takes in enormous amount of feed 184 and in various ways, it has been shown to provide favourable conditions for genetic exchange 44-185 46 . Considering that ecology shapes gene exchange, it is reasonable to assume that the rumen, a 186 100-200L anaerobic bioreactor constantly interfacing with the feed containing a diversity of 187 antibiotic related compounds and the microorganisms producing them, provides an opportunity 188 for a ARD gene pool to exchange and adapt within an animal associated microbiome and 189 environment. While streptomycin is not regularly detected in feed, other compounds produced by 190 Streptomyces, which are easier to detect, such as chloramphenicol, are found regularly 47 . 191 Ultimately, a wide range of aminoglycoside modifying enzymes sourced from soils or sediments 192 may be transferred to and refined the rumen, especially in terms of genetic synteny, before being 193 spread throughout the farm and potentially strongly selected or co-selected for when treating an 194 animal infection or when a field is contaminated with antibiotics ( Figure S2) 48 . In terms of 195 spreading throughout the farm, the humans, whose associated microbes show 25 fold more HGT 196 as compared to non-human isolates, in contact with the animals are the most obvious conduit 2 . It 197 was however also interesting to find a common dog pathogen (Staphylococcus 198 pseudointermedius) in the analysis which contained the highly conserved aadE-sat4-aphA-3 199 cluster ( Figure 1A). Overall, we observed recent horizontal transfer events of ARDs between 200 ruminal bacteria, farm animals, pets and pathogens infecting humans, whose history of assembly 201 points towards the rumen as the source. Therefore, while only one of many sources of AR, the 202 rumen should be considered an environment with high potential for generating clusters of ARDs 203 and providing a central link to other reservoirs, especially on the farm, before going on to create 204 problems in the clinic. If further evidence corroborates this suggestion, antibiotic discovery 205 efforts could focus on antibiotic compounds from organisms that evolved in environments with 206 little or no connection to agricultural feed. from NCBI to three rumen contigs (see Table S1) compared to a Prevotella sp. metagenome 212 assembled genome (MAG)(RUG782) contig. B. Gene diagram comparison between 2 rumen 213 sourced and 2 pathogen bacterial genomes to show conservation of genetic synteny from a few 214 selected examples. Gene numbering maps to annotations in Table S3 and grey connections  215 between genes represent homologues. from NCBI to a single rumen contig (see Table S1) compared to a B. thetaiotaomicron nale-zl-226 c202 genome (Hungate collection 4309680) contig. B. Gene diagram comparison between a 227 rumen sourced and 2 pathogenic organism genomes to show conservation of genetic synteny 228 from a few selected examples. Gene numbering maps to annotations in Table S3 and grey  229 connections between genes represent homologues. 230 231 Figure 4. A. A maximum likelihood tree using all sequences falling within clade 7 (Figure 2). 232 Ultrafast bootstrap values are shown and sequences in bold are from the Hungate 1000 233 collection. Clades are labelled based on containing specific variants of AadE. The outgroup used 234 was a randomly selected sequence taken from clade 22 in Figure 2. B and C. Each point around 235 the circle is an antibiotic resistance determined (ARD) coloured by the contig containing it. A 236 contig is representated if the genome was used in Figure 1 or Figure 3 and contained two or more 237 AadE. An ARD is shown if it is annotated as AadE-Ia or AadE-Ib or another annotated ARD that 238 is syntenic with one of the AadE variants (within a resistance cluster AadE-Ia, part of the aadE-sat4-aphA-3 cluster or annotated to act on aminoglycosides. Other 243 ARDs present in the resistance cassettes are shown in Table S5. Comparative analysis of aadE-sat4-aphA-3 and aadE-Ib 258 259 Genome sequences from the Hungate1000 project, including those listed as previously 260 published, were combined with MAGs from Steward et al. 15,16 . Using a concatenated fasta file 261 containing all genome and MAG nucleotide sequences, ORFs were predicted using prodigal 262 v2.6.3 49 . The resulting ORFs were then blasted against the CARD using local blastp v2.9.0+ 50 . 263 The 3 contigs (Hungate collection 4309689_79 and 43809680_52, MAG RUG782_1) which 264 coded for the top 6 blast hits, in terms of bitscore, were then blasted against the NCBI nucleotide 265 collection (nr/nt) using web-based blastn and the full-length sequence for each of the top 50 hits 266 was downloaded 33,34 . After removing identical sequences, a total of 54 sequences were used in 267 the downstream analysis, the accession numbers and descriptions for which are listed in Table  268 S2. Each of the downloaded sequences was then blasted against the rumen sourced Prevotella sp. 269 contig (RUG782_1) using local blastn 2.9.0+ 50 . A sequence was displayed in Figure 1A if the 270 total combined length of alignments was over 4000 bp for each query and the percent identity of 271 the alignment was over 80%. For the gene diagrams displayed in Figure 1B, annotations from up 272 to 10 of the top blastp hits from the NCBI non-redundant protein sequences database are listed in 273  Table S3 33,34 . The same was done for Figure 3A, except starting with the contig from B. 274 thetaiotoamicron nale-zl-c202 genome (43809680_59) containing aadE-Ib. Again, top 50 hits 275 from the NCBI nucleotide collection (nr/nt) were downloaded (Table S2) and subsequently 276 blasted against the rumen B. thetaiotoamicron contig (43809680_59). Here, a sequence was 277 displayed in Figure 3A if the total combined length of alignments was over 1000 bp and the 278 percent identity of the alignments were over 80%.

280
Phylogenetic and syntenic analysis of AadE 281 282 The predicted ORF for AadE from the 3 selected rumen contigs (Hungate collection 283 4309689_79 and 43809680_52, MAG RUG782_1), being identical ORFs, was blasted against 284 the NCBI non-redundant protein sequences database 33,34 . All hits with an e-value below 1e-4 285 were downloaded. Sequences were further eliminated if the length was below 250 bp or above 286 350 bp and an initial alignment was then made using MUSCLE (including the following flags: -287 maxiters 3 -diags -sv -distance1 kbit20_3) 51 . This alignment was inspected using Geneious 288 v9.1.8, trimmed to between position 64 and 609, and further refined using the default setting 289 from MUSCLE, while allowing for up to 50 iterations 51,52 . The phylogenetic tree shown in 290 Figure 2 was subsequently constructed FastTree on the default settings 53 . In terms of 291 visualization, clades were collapsed whose average branch length to the leaves was below 1.5 292 using the interactive tree of life (iTOL) online tool 54 . The resulting tree is down in Figure 2.

294
The tree shown in Figure 4A was constructed using the sequences extracted from clade 7 in 295 Figure 2, with the addition of any homologues of AadE-Ia or AadE-Ib (>200 amino acids and 296 >60% identity to the two versions from B. thetaiotoamicron nale-zl-c202 when compared using 297 local blastp 2.9.0+) from the genomes used in Figure 1 and 3, if they contained multiple copies 298 of the homologues 50 . The sequences were aligned using MUSCLE with 50 iterations, inspected 299 using Geneious v9.1.8, and trimmed to between positions 25 and 305. Moreover, truncated 300 proteins were removed, resulting in an alignment of 156 sequences, which was again refined 301 using MUSCLE. This was then used as the input file for IQ-TREE using the standard settings 302 with the following flags: -m TEST -bb 1000 -alrt 1000. An Le Gascuel (LG) model was selected 303 using Gamma with 4 categories for the rate of heterogeneity 55-57 . The resulting tree, along with 304 ultrafast bootstrap values, was visualized using iTOL.

306
To analyze synteny, any ORFs annotated as ARDs surrounding the AadE-Ia or AadE-Ib 307 homologues (within maximum ~50kB) that were taken from the genomes used in Figure 1 and 3 308 are shown in Figure 4B and C. These were compared to AadE-Ib ( Figure 4B) AadE-Ia ( Figure  309 4C) or using local blastp 2.9.0+ 50 . The annotation based on the top blastp hits from the NCBI 310 non-redundant protein sequences database are listed in Table S5