Prophages and plasmids can display opposite trends in the types of accessory genes they carry

Mobile genetic elements (MGEs), such as phages and plasmids, often possess accessory genes encoding bacterial functions, facilitating bacterial evolution. Are there rules governing the arsenal of accessory genes MGEs carry? If such rules exist, they might be reflected in the types of accessory genes different MGEs carry. To test this hypothesis, we compare prophages and plasmids with respect to the frequencies at which they carry antibiotic resistance genes (ARGs) and virulence factor genes (VFGs) in the genomes of 21 pathogenic bacterial species using public databases. Our results indicate that prophages tend to carry VFGs more frequently than ARGs in three species, whereas plasmids tend to carry ARGs more frequently than VFGs in nine species, relative to genomic backgrounds. In Escherichia coli, where this prophage–plasmid disparity is detected, prophage-borne VFGs encode a much narrower range of functions than do plasmid-borne VFGs, typically involved in damaging host cells or modulating host immunity. In the species where the above disparity is not detected, ARGs and VFGs are barely found in prophages and plasmids. These results indicate that MGEs can differentiate in the types of accessory genes they carry depending on their infection strategies, suggesting a rule governing horizontal gene transfer mediated by MGEs.

MGEs can be thought of as parasites of bacteria. Thus, horizontal gene transfer (HGT) mediated by MGEs can be regarded as the genetic manipulation of hosts by parasites [11]. Given that MGEs are self-interested evolving entities, MGEs are expected to possess accessory genes that advantage themselves [12,13]. For example, plasmids are considered to gain selective advantages from ARGs by improving the survival of their bacterial hosts in heterogeneous environments [13][14][15][16]. It has also been hypothesized that phages gain selective advantages from VFGs by modifying environments in which their bacterial hosts live [17].
What evolutionary rules govern the arsenal of accessory genes carried by MGEs [12,13]? Such rules, if they exist, might reflect the different infection strategies of MGEs. For example, phages typically lyse host cells to transmit to other cells, whereas plasmids do not. Consequently, phages might not suffer substantial disadvantage even if they lack genes that improve the survival of bacteria, such as ARGs, because if their hosts are in danger, they can enter a lytic replication cycle to abandon their hosts and seek new ones [18][19][20][21][22]. By contrast, plasmids cannot typically use such evacuative strategies, and are hence likely to hinge critically on genes improving host survival. Thus, to understand the rules governing HGT mediated by MGEs, it is beneficial to investigate whether different MGEs carry different types of accessory genes.
To address the above question, we consider an ongoing debate about phage-borne ARGs. While it is well established that plasmids frequently carry ARGs [1][2][3], how frequently phages carry ARGs is controversial [23]. Phages mediate HGT through multiple mechanisms, among which specialized transduction is the most similar to HGT mediated by plasmids [24] (see Discussion for the other mechanisms). In specialized transduction, phages transfer genes carried in their genomes. Therefore, specialized transduction is strictly coupled with the infectious transmission of phage genomes, the coupling that is also entailed in plasmid conjugation [24]. Laboratory experiments have demonstrated that phages are capable of transferring ARGs to bacteria through specialized transduction [25]. However, the specialized transduction of ARGs in nature has been scarcely documented [3,26]. While metagenomic studies have detected ARGs in viral fractions of environmental DNA samples [27][28][29][30], other studies provide evidence suggesting that the detection of ARGs was due to the contamination of bacterial DNA in the viral fractions [31,32]. Genomics studies have predicted a number of prophages (i.e. phage genomes inserted into bacterial chromosomes as a consequence of specialized transduction) carrying ARGs in the genomes of Acinetobacter baumannii [33], Klebsiella pneumoniae and Pseudomonas aeruginosa [34] (see also [35]). Also, a previous study has isolated 29 phages from wastewater, of which 15 carry ARGs, suggesting that phages frequently possess ARGs [36]. However, these results appear at odds with a recent comprehensive analysis of phage genomes in public databases, which shows that ARGs are carried by only 0.3% of phages [37]. Taken together, the existing studies present mixed messages about the frequency at which phages carry ARGs.
To investigate how frequently phages carry ARGs, here we compare the distributions of ARGs and VFGs between the prophages and plasmids of pathogenic bacteria by comprehensively analysing public databases. We consider prophages instead of phages to compare different MGEs belonging to the same bacterial genomes. Our approach is designed to mitigate two issues that we consider to be involved in the computational analyses of ARGs encoded in prophages and that were not taken into account in previous studies [33][34][35]. First, the misidentification of prophages can cause systematic biases in the number of prophage-borne ARGs. For example, nonprophage regions can be misidentified as prophages, causing an overestimation of the number of prophage-borne ARGs. Conversely, a true prophage can be missed, which leads to an underestimation of the number of prophage-borne ARGs. To avoid these biases due to prophage prediction, we compare the number of prophage-borne ARGs to that of prophage-borne VFGs, where both numbers are expected to be biased by common factors so that the biases cancel out. The second issue involved in the analysis of prophage-borne ARGs is a sampling bias in bacterial genomes, which can cause overestimation of the numbers of ARGs and VFGs owing to the double-counting of orthologous genes. The degree to which this bias occurs can differ between ARGs and VFGs. To correct this bias, we cluster all genes into putative orthologous groups based on sequence similarity and synteny conservation and count the numbers of putative orthologous groups of ARGs and VFGs (OGARGs and OGVFGs, respectively). Furthermore, to investigate a potential differentiation between prophages and plasmids, we analyse the distributions of ARGs and VFGs in plasmids. Finally, we examine whether prophages and plasmids also differ in the functional categories of VFGs they carry.
Our results suggest that prophages are biased towards carrying VFGs, whereas plasmids are biased towards carrying ARGs in subsets of the examined species. However, such biases were not detected in many species, where both ARGs and VFGs are hardly encoded in prophages and plasmids. Moreover, we found that prophage-borne VFGs are more functionally specific than plasmid-borne VFGs in Escherichia coli. Taken together, these results indicate that prophages and plasmids can differ in the types of accessory genes they carry.

Methods
Our method is sketched in figure S1 in the electronic supplementary material.

(a) Data acquisition
Three VFG databases, namely, VFDB (3685 genes in set A), Victors (5085 genes) and PATRIC_VF (1293 genes), were downloaded from the respective websites in December 2020 [38][39][40]. The VFG entries in Victors were refined by removing those carried by non-bacterial pathogens or lacking NCBI protein GIs (4575 out of 5085 remained). Some VFG entries in Victors were missing protein sequences, which were downloaded from GenBank based on their protein GIs [41]. All VFG entries in the three databases were pooled and clustered to remove redundancy with CD-HIT with the protein sequence identity threshold of 1.0 [42], resulting in a combined VFG database of 7218 entries.
The genome assemblies with the 'Complete' status were downloaded from RefSeq in September 2021 with the following criteria [43]: a species had at least 60 complete genomes in RefSeq and at least 70 VFGs in the combined VFG database. These criteria resulted in 21 species of bacterial pathogens spanning three phyla, Actinobacteria, Firmicutes and Proteobacteria (table 1; the complete list of 7175 genomes analysed in this study is in electronic supplementary material, table S1).

(b) Prophage prediction
To predict prophages in the bacterial genomes, VIBRANT (v. 1.2.1) was used for the following reasons. First, VIBRANT has comparatively high performance, as reported by a recent benchmark [44]. Second, it is a stand-alone tool, so it can be run on local computers. Third, its algorithm is based on the similarity search of known phage proteins rather than predicting prophages based on nucleotide signatures. VIBRANT was run against the genomic sequences with default parameters.

(d) ARG prediction
To predict ARGs in the bacterial genomes, AMRFinderPlus (v. 3.10.5) was run against translated coding sequences with the core subset of the database (v. 2021-09-11.1) and, if possible, the organism option enabled [45]. The genes predicted by AMRFinderPlus as ARGs (i.e. 'element subtype AMR') and not annotated as pseudo-genes in RefSeq were considered as ARGs (those predicted as 'element subtype POINT', which contain point mutations associated with AR, were excluded).

(e) VFG prediction
To predict VFGs in the bacterial genomes, every entry in the combined VFG database (VFG database entry, for short) was queried against the translated coding sequences of every bacterial genome with BLASTP with an E-value threshold of 1 × 10⁻⁹ [46]. A gene in a bacterial genome (bacterial gene, for short) could match multiple VFG database entries, in which case the VFG database entry with the highest bit-score was selected as the best match. A bacterial gene was considered as encoding a VF if it met the following additional criteria: (i) it was not annotated as a pseudo-gene in RefSeq [43]; (ii) the BLASTP alignment between the bacterial gene and its best-match VFG database entry had at least 80% sequence identity and covered at least 80% of both the bacterial gene and the best-match VFG database entry; (iii) the species of the genome in which the bacterial gene resides was identical to the species for which experimental evidence for the best-match VFG database entry is available [38][39][40].
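The selection of a best match followed by criteria (i) to (iii) can be illustrated with a short sketch. This is not the pipeline used in the study: the `Hit` record, the function names and the dictionary `entry_species` are hypothetical, and the sketch assumes that coverage values have already been computed from the BLASTP alignments and that hits with an E-value above the threshold are simply discarded.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str         # bacterial gene identifier (hypothetical field names)
    entry: str        # VFG database entry identifier
    evalue: float
    bitscore: float
    identity: float   # per cent identity over the alignment
    gene_cov: float   # fraction of the bacterial gene covered by the alignment
    entry_cov: float  # fraction of the database entry covered by the alignment


def best_match_per_gene(hits, max_evalue=1e-9):
    """Keep, for each bacterial gene, the hit with the highest bit-score."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:  # assumed handling of the E-value threshold
            continue
        current = best.get(hit.gene)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.gene] = hit
    return best


def predict_vfgs(hits, pseudogenes, genome_species, entry_species):
    """Return the bacterial genes whose best match passes criteria (i) to (iii)."""
    vfgs = set()
    for gene, hit in best_match_per_gene(hits).items():
        if gene in pseudogenes:                              # criterion (i)
            continue
        if hit.identity < 80.0:                              # criterion (ii), identity
            continue
        if hit.gene_cov < 0.8 or hit.entry_cov < 0.8:        # criterion (ii), coverage
            continue
        if entry_species.get(hit.entry) != genome_species:   # criterion (iii)
            continue
        vfgs.add(gene)
    return vfgs
```

Note that the criteria are applied only to the best match of each gene, so a gene whose best match fails a criterion is not rescued by a lower-scoring match.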

(f ) Orthology prediction
To avoid double-counting orthologous genes in closely related strains, all genes within a species were clustered into putative orthologous groups based on protein sequence similarity and synteny conservation, as follows. First, preliminary orthologous pairs of genes were identified between every pair of genomes within each species through all-against-all sequence similarity searches using ProteinOrtho v. 6.0.25 (with DIAMOND v. 2.0.6 [47]; E-value cut-off of 1 × 10⁻⁵; minimum coverage of best alignments of 75%; minimum per cent identity of best alignments of 25%; minimum reciprocal similarity of 0.95) [48]. ProteinOrtho defines a preliminary orthologous pair of genes as a reciprocal nearly best hit (RNBH), as follows. A nearly best hit (NBH) of a gene queried against a target genome is defined as a hit whose bit-score is at least f times that of the best hit. The value of f was 0.95, which is the default value of ProteinOrtho. If two genes are NBHs of each other, they form an RNBH [48].
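The RNBH definition can be made concrete with a minimal sketch. It is an illustration of the definition only and does not reproduce ProteinOrtho's implementation; the function names and the nested-dictionary input format (query gene to target gene to bit-score) are assumptions made for this example.

```python
def nearly_best_hits(scores, f=0.95):
    """Map each query gene to the targets whose bit-score is at least f times the best.

    `scores` maps query gene -> {target gene: bit-score} for one target genome.
    """
    nbh = {}
    for query, targets in scores.items():
        if not targets:
            continue
        best = max(targets.values())
        nbh[query] = {t for t, s in targets.items() if s >= f * best}
    return nbh


def reciprocal_nearly_best_hits(a_vs_b, b_vs_a, f=0.95):
    """Return pairs (x, y) such that y is an NBH of x and x is an NBH of y."""
    nbh_ab = nearly_best_hits(a_vs_b, f)
    nbh_ba = nearly_best_hits(b_vs_a, f)
    return {
        (x, y)
        for x, targets in nbh_ab.items()
        for y in targets
        if x in nbh_ba.get(y, set())
    }
```

Because more than one hit can lie within a factor f of the best hit, a single gene can participate in several RNBHs, which is why a subsequent pruning step is useful.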
Table 1. Numbers of predicted prophages per genome (s.d., standard deviation), numbers of orthologous groups of antibiotic resistance genes (OGARGs), and those of virulence factor genes (OGVFGs) in genomes (total), predicted plasmidic prophages (pl.) and predicted chromosomal prophages (ch.). Numbers in brackets are for OGARGs in prophages containing at least one phage-structure gene.
RNBHs obtained with ProteinOrtho were pruned based on synteny conservation with an in-house script (electronic supplementary material), as follows. Let x and y be a pair of genes forming an RNBH, and let X and Y be the genomic neighbourhoods of x and y, respectively, where X is defined as a set of 21 genes consisting of 10 genes upstream of x, 10 genes downstream of x, and x itself, and Y is likewise defined in terms of y (all contigs were assumed to be circular, and the orientation of genes was ignored). Let N_x and N_y be the numbers of genes in X and Y that form RNBHs with at least one gene in Y and X, respectively (note that a single gene in one genome can form RNBHs with multiple genes in another genome owing to tandem duplication). If both N_x and N_y were greater than 10 (i.e. a majority of the genes in X form RNBHs with the genes in Y, and vice versa), the RNBH formed by x and y was kept; otherwise, it was discarded [49]. Finally, the pruned RNBHs were clustered into putative orthologous groups with the spectral clustering algorithm implemented in ProteinOrtho v. 6.0.25 (minimum algebraic connectivity of 0.1; exact step 3; minimum number of species of 0; purity of 1 × 10⁻⁷) [48].
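The synteny-based pruning rule can be sketched as follows. This is a hedged illustration and not the in-house script itself: the function names and input format are hypothetical, gene identifiers are assumed to be unique across both genomes, each contig is given as an ordered list of genes, and every contig is treated as circular, as stated above.

```python
def neighbourhood(contig, index, flank=10):
    """Genes within `flank` positions of contig[index] on a circular contig."""
    n = len(contig)
    return {contig[(index + k) % n] for k in range(-flank, flank + 1)}


def prune_by_synteny(rnbh, contigs_a, contigs_b, flank=10):
    """Keep RNBH pairs (x, y) whose neighbourhoods share a majority of RNBHs.

    `rnbh` is a set of (x, y) pairs; `contigs_a` and `contigs_b` are lists of
    contigs, each contig being an ordered list of gene identifiers.
    """
    # Record where every gene sits: (its contig, its index on that contig).
    position = {}
    for contig in list(contigs_a) + list(contigs_b):
        for i, gene in enumerate(contig):
            position[gene] = (contig, i)

    # For every gene, the set of genes with which it forms an RNBH.
    partners = {}
    for x, y in rnbh:
        partners.setdefault(x, set()).add(y)
        partners.setdefault(y, set()).add(x)

    def shared(block, other_block):
        # Number of genes in `block` with at least one RNBH partner in `other_block`.
        return sum(1 for g in block if partners.get(g, set()) & other_block)

    kept = set()
    for x, y in rnbh:
        block_x = neighbourhood(*position[x], flank)
        block_y = neighbourhood(*position[y], flank)
        # With flank = 10 the neighbourhood has 21 genes, so "> 10" is a majority.
        if shared(block_x, block_y) > flank and shared(block_y, block_x) > flank:
            kept.add((x, y))
    return kept
```

Using sets for the neighbourhoods reflects the fact that gene orientation and order within the window are ignored; only co-occurrence within the window matters.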

(g) Classification of orthologous gene groups
A gene (VFG or ARG) was considered to be encoded in a prophage if the entire gene was included within a genomic region predicted as a prophage. Similarly, a gene was considered to be encoded in a plasmid if the gene resided in a plasmid contig. Moreover, genes in prophages residing in plasmids (plasmidic prophages, for short) were distinguished from those in prophages residing in chromosomes (chromosomal prophages) for two reasons. First, it was ambiguous whether genes in plasmidic prophages should be regarded as encoded by plasmids, prophages or both. Second, plasmidic prophages potentially represent a distinct class of MGEs called phage-plasmids [50].
An orthologous group of genes was considered to be encoded in chromosomal prophages, plasmidic prophages, or plasmids if the majority of the genes belonging to the group were encoded in the respective genomic contexts (the cases of ties were ignored).
An orthologous group of genes was considered an ARG or VFG if the majority of the genes belonging to the group were predicted as ARGs or VFGs, respectively. The majority rule was used because a subset of genes in OGARG or OGVFG could be predicted as non-ARGs or non-VFGs, respectively, owing to sequence divergence. However, for most orthologous groups of ARGs and VFGs, all genes in a group were predicted as either ARGs or VFGs. Moreover, no orthologous group contained both ARGs and VFGs.
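The two majority rules above can be summarized in a short sketch. It assumes that 'majority' means strictly more than half of the genes in a group and that each gene carries exactly one type label and one context label; the label strings and function names are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the label carried by more than half of `labels`, or None otherwise.

    Ties (and groups without a strict majority) yield None and are thus ignored.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if 2 * count > len(labels) else None


def classify_group(genes):
    """Classify one orthologous group.

    `genes` is a list of (gene_type, context) tuples, one per gene, where
    gene_type is e.g. "ARG", "VFG" or "other", and context is e.g.
    "chromosomal_prophage", "plasmidic_prophage", "plasmid" or "chromosome".
    """
    gene_type = majority_label([t for t, _ in genes])
    context = majority_label([c for _, c in genes])
    return gene_type, context
```

For instance, a group of two plasmid-borne genes predicted as ARGs and one chromosomal gene not predicted as an ARG would be counted as a plasmid-borne OGARG under these assumptions.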

(h) Functional classification of prophage- and plasmid-borne virulence factor genes
To classify the functions of VFGs, gene symbols (i.e. abbreviated gene names) were assigned to prophage-borne and plasmid-borne OGVFGs in E. coli, as follows. All OGVFGs in E. coli genomes were collectively associated with 862 best-match VFG database entries (see 'VFG prediction' section of Methods). Each of these entries was looked up in the respective VFG database (VFDB, Victors, or PATRIC_VF) to find a gene symbol assigned to it [38][39][40]. When a gene symbol was missing, a gene symbol was manually determined by querying the protein sequence of a VFG database entry against VFDB, the NCBI Conserved Domain Database (CDD), and the NCBI NR database [38,51,52] (electronic supplementary material, table S2). Subsequently, these gene symbols were also assigned to OGVFGs that are neither prophage-borne nor plasmid-borne. The 232 gene symbols were manually categorized by their functions based on information obtained from VFDB, NCBI CDD and UniProt [38,51,53] (electronic supplementary material, table S3). While most of the functional categories were adopted from VFDB [38], a new category, 'phage-related', was added, and the two categories, 'exoenzymes' and 'enzymes', were merged into 'enzymes'. Six gene symbols were annotated to have multiple functions and were therefore placed under multiple categories.

Results

(a) Prophage prediction
To examine the distribution of ARGs in prophages, we computationally predicted prophages using VIBRANT [54] and ARGs using AMRFinderPlus [45] in the genomes of 21 pathogenic bacterial species downloaded from the RefSeq database [43] (Methods; electronic supplementary material, tables S1 and S4). To avoid double-counting orthologous ARGs in closely related strains, we clustered all genes within a species into putative orthologous groups based on sequence similarity and synteny conservation (Methods). We then counted the number of orthologous groups of ARGs (OGARGs) encoded in the predicted prophages, distinguishing between prophages residing in bacterial chromosomes and prophages residing in plasmids (chromosomal prophage and plasmidic prophage, respectively, for short). This distinction was made because it was ambiguous whether ARGs in plasmidic prophages should be regarded as encoded in prophages, plasmids, or both (plasmidic prophages potentially represent phage-plasmids, which are a separate class of MGEs from typical phages and plasmids [50]). The result shows that one to nine prophage-borne OGARGs were detected in ten of the 21 examined species (table 1; electronic supplementary material, table S5).
To probe the precision of the above prediction, we manually examined whether predicted prophages carrying ARGs contained at least one phage-structure gene (e.g. phage baseplate, capsid, portal, tail, tail fibre, tail sheath, tail assembly, head-tail connector and tail tape measure) according to the RefSeq annotation. Although this criterion cannot perfectly distinguish true prophages from false ones, it splits the predicted prophages into a set enriched with true prophages and a set enriched with false prophages, allowing us to probe the precision of the prophage prediction. The result of the examination shows that 24 out of 29 (i.e. 83% of) chromosomal prophages carrying ARGs contained phage-structure genes, whereas seven out of 18 (i.e. 39% of) plasmidic prophages carrying ARGs contained phage-structure genes (electronic supplementary material, table S6). This result means that 28 out of 33 (i.e. 85% of) OGARGs in chromosomal prophages are in prophages carrying phage-structure genes, whereas nine out of 21 (i.e. 43% of) OGARGs in plasmidic prophages are in prophages carrying phage-structure genes (table 1; note that the number of prophages carrying ARGs is smaller than that of prophage-borne ARGs because one prophage can carry multiple ARGs). This result suggests that the precision of VIBRANT is acceptable for chromosomal prophages.
royalsocietypublishing.org/journal/rspb Proc. R. Soc. B 290: 20231088
During the manual examination, we noted that a subset of plasmidic prophages carrying ARGs also carried genes encoding integron integrases according to the RefSeq annotation (electronic supplementary material, table S6). Moreover, these prophages have a higher frequency of lacking phage-structure genes than those without integron integrase genes (IIGs, for short). Specifically, seven out of ten (i.e. 70% of) prophages carrying both ARGs and IIGs lack phage-structure genes, whereas nine out of 37 (i.e. 24% of) prophages carrying ARGs and lacking IIGs lack phage-structure genes. VIBRANT predicted that the genes annotated by RefSeq as IIGs matched phage integrase genes, which are a typical component of phage genomes. However, the RefSeq annotation implies that proteins encoded by these genes are more similar to integron integrases than phage integrases because the RefSeq annotation considers a broader set of protein families than does VIBRANT [43,54]. Therefore, the above result suggests that about half of the prophages carrying ARGs and lacking phage-structure genes might have arisen from the misidentification of integrons as prophages (see also [55]).

(b) Prophages carry significantly more virulence factor genes than antibiotic resistance genes relative to genomic backgrounds in three species
To compare the frequencies of ARGs to those of VFGs in chromosomal prophages, we used BLASTP [46] to search bacterial genomes for VFGs collected from VFDB [38], Victors [39] and PATRIC_VF [40] (Methods). We then counted the number of orthologous groups of VFGs (OGVFGs, for short) encoded in the predicted prophages (table 1; electronic supplementary material, table S7). The number of OGVFGs cannot directly be compared to that of OGARGs because we have no a priori reason to expect that bacteria possess an equal number of VFGs and ARGs. Thus, we instead compared the relative frequencies of OGVFGs and OGARGs against the genomic background (see electronic supplementary material, figure S1 for an illustration of the procedure). Specifically, we performed binomial tests under the null hypothesis that the relative frequencies of OGARGs and OGVFGs in the chromosomal prophages of each species are the same as those of all OGARGs and OGVFGs in the genomes of the respective species. In this test, we included prophages carrying ARGs and lacking phage-structure genes for fairness because prophages carrying VFGs were too numerous to be manually examined (however, we found that none of the prophages carrying VFGs carried IIGs, which suggests that the prophages carrying VFGs do not contain false positives arising from integron misidentification). Also, we corrected p-values using the Holm-Bonferroni method to control the family-wise error rate of all the statistical tests conducted in this study, unless otherwise stated [56,57]. The results of the tests indicate that the relative frequencies of OGARGs and OGVFGs in chromosomal prophages are significantly different from those of respective genomic backgrounds in the following three species (figure 1): E. coli (Gammaproteobacteria), S. enterica (Gammaproteobacteria) and S. aureus (Firmicutes).
In all these species, chromosomal prophages carry VFGs more frequently than ARGs relative to genomic backgrounds. The remaining 18 species, where significant biases were not detected, can be grouped into four categories: the one species where many prophage-borne VFGs and no prophage-borne ARGs were detected, but a bias towards VFGs was not statistically significant [...]
Therefore, if the prophage prediction tool missed prophages carrying ARGs or VFGs, those prophages are likely to be distinct from the currently known phages. Taken together, the above results suggest that prophages tend to carry ARGs less frequently than VFGs in the three species in which they carry a sufficient number of ARGs or VFGs. However, in many bacterial species, prophages either are hardly present or carry few ARGs and VFGs, resulting in sample sizes that are too small to detect any biases. One possible exception is S. pyogenes, where prophages carry many more VFGs than ARGs, although the bias is not statistically significant. Additionally, we tested the robustness of our results by repeating the aforementioned analysis for E. coli using PHASTER [58] as described in electronic supplementary material, text S1.
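The binomial test against the genomic background, followed by the Holm-Bonferroni correction, can be sketched in a few lines. The sketch makes assumptions not stated in the text: the test is taken to be two-sided and exact, with the p-value computed as the total probability of outcomes no more likely than the observed one, and all function names and the counts in the usage lines are hypothetical rather than values from this study.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def binomial_test_two_sided(k, n, p):
    """Exact two-sided p-value for k successes in n trials with success rate p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    observed = binom_pmf(k, n, p)
    # Sum the probabilities of all outcomes that are no more likely than the
    # observed one; the small tolerance guards against rounding errors.
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= observed * (1 + 1e-7):
            total += prob
    return min(1.0, total)


def holm_bonferroni(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical counts for one species (not taken from table 1).
genome_ogargs, genome_ogvfgs = 100, 900      # genomic background
prophage_ogargs, prophage_ogvfgs = 2, 60     # chromosomal prophages
p_null = genome_ogargs / (genome_ogargs + genome_ogvfgs)
p_value = binomial_test_two_sided(
    prophage_ogargs, prophage_ogargs + prophage_ogvfgs, p_null
)
```

In this framing, each species contributes one p-value, and the list of p-values from all tests is passed to `holm_bonferroni` before comparison with the significance level.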

(c) Plasmids carry more antibiotic resistance genes than virulence factor genes relative to genomic backgrounds in nine species
To compare prophages to plasmids, we counted the number of OGARGs and OGVFGs encoded in plasmids [...], and the one species where plasmids carry more VFGs than ARGs, but a bias was not significant (B. anthracis). Taken together, the above results suggest that plasmids tend to carry ARGs more frequently than VFGs in the nine species in which plasmids carry a sufficient number of VFGs or ARGs. However, in many species, plasmids either are hardly present or barely carry VFGs and ARGs, providing sample sizes that are too small to detect any biases. One possible exception is B. anthracis, where plasmids carry many more VFGs than ARGs, although the bias is not statistically significant.
[...] table S8). This result suggests that the range of functions encoded by prophage-borne VFGs is less diverse than that encoded by plasmid-borne VFGs, as described in more detail below.
Nearly half (22 out of 47) of the gene symbols assigned to prophage-borne VFGs (prophage-borne VFG symbols, for short) are categorized as the effectors of type III secretion system (T3SS), the proteins secreted by the T3SS apparatus (figure 3). The T3SS effectors are involved in the destruction of host cells or the modulation of host immune systems [59]. The other prophage-borne VFG symbols include those encoding exotoxins (e.g. Shiga toxins), immune modulation and phage-related components (e.g. phage tails) (figure 3). Phage-related components are represented by the largest number of prophage-borne OGVFGs (930 out of 1580) (electronic supplementary material, table S8). Despite their abundance, the specific function of these genes is largely unknown, although some may be involved in intestinal colonization [60,61]. In addition, two prophage-borne VFG symbols are involved in adherence to host cells (figure 3). However, only one of them (porcine attaching-effacing associated protein, paa) is frequently located in prophages, whereas the other (E. coli common pilus chaperone, yagV/ecpE) is rarely located in prophages (electronic supplementary material, table S3). Taken together, the above results indicate that many prophage-borne VFGs are involved in causing direct damage to host cells or suppressing host immune systems, and a relatively small number of them are involved in host attachment.
By contrast, the gene symbols assigned to plasmid-borne VFGs (plasmid-borne VFG symbols) include a much wider range of functions, including T3SS effectors and apparatus, exotoxins, immune modulation, adherence to and invasion of host cells, Type V secretion systems, enzymes (including exoenzymes), metal uptake, gene regulation, antiaggregation (dispersin), biofilm formation and motility, among others. The above results indicate that plasmid-borne VFGs [...]
Table 2. Numbers of genetically distinct plasmids per genome (s.d., standard deviation), and numbers of orthologous groups of antibiotic resistance genes (OGARGs) and those of virulence factor genes (OGVFGs) in genomes (total) and plasmids.
Notwithstanding their comparatively small functional diversity, prophage-borne VFGs appear to play quantitatively no less important roles than those played by plasmid-borne VFGs in the virulence of E. coli. This is because the total number of prophage-borne OGVFGs is comparable to that of plasmid-borne OGVFGs (1580 versus 2342 OGVFGs; electronic supplementary material, table S8). Moreover, the total number of prophage-borne VFGs without clustering of orthologous genes is greater than that of plasmid-borne VFGs (10 965 versus 8020 VFGs; electronic supplementary material, table S8).
Another notable pattern observed in the above analysis is that almost all gene symbols (220 out of 232) are associated with either prophage-borne or plasmid-borne VFGs, but not both (electronic supplementary material, table S3). This result means that prophages and plasmids carry distinct [...]
Taken together, the above results indicate that prophages and plasmids differ, not only in the functional diversity of VFGs they carry, but also in the specific types of VFG they carry within the same functional categories.

Discussion
The results presented above indicate that in the species where prophages and plasmids carry a sufficient number of VFGs or ARGs, they display opposite trends: prophages are biased towards carrying VFGs, whereas plasmids are biased towards carrying ARGs, relative to the genomic backgrounds. In the other species, plasmids or prophages either barely carry ARGs and VFGs or are nearly absent. Lastly, the comparison between prophage-borne and plasmid-borne VFGs shows that prophage-borne VFGs are functionally less diverse than plasmid-borne VFGs in E. coli. Taken together, these findings indicate that prophages and plasmids can carry different types of bacterial accessory genes.
Based on the results presented above, we formulate the following hypothesis to be tested in future work (figure 4). Temperate phages do not gain sufficient selective advantage from carrying ARGs because if their hosts are in danger, they can abandon their hosts and seek new hosts by undergoing lytic replication [18][19][20][21][22]. By contrast, both phages and plasmids can benefit from VFGs because VFGs can accelerate bacterial replication by making bacteria exploit their hosts more aggressively [17]. In addition, our results suggest that plasmids benefit from a wider range of virulence functions than phages do in E. coli (figure 3; electronic supplementary material, table S8).
Besides the above hypothesis, the difference between prophages and plasmids might also be explained by the interplay between the scope of natural selection and the host range of MGEs (an idea suggested by an anonymous reviewer). Antibiotic resistance might be beneficial to all bacteria in a community, causing community-wide selection. By contrast, virulence factors might be beneficial only to a subset of bacteria depending on their life-history traits [62], resulting in strain-specific selection. Interestingly, plasmids typically have broader host ranges than those of phages [63,64]. This alignment between the scope of selection and the host range of MGEs might contribute to the observed difference between prophages and plasmids in the distribution of VFGs and ARGs.
Figure 4. Schematic drawing of hypothesis formulated based on our results. (a) Plasmids substantially benefit from carrying ARGs because they cannot abandon their hosts and thus need to minimize death of bacteria. By contrast, temperate phages do not suffer substantial disadvantage even if they lack ARGs (i.e. do not critically depend on ARGs) because if their hosts are in danger, they can abandon their hosts and seek new hosts by entering lytic cycles. (b) Both plasmids and phages benefit from VFGs because VFGs can enhance growth of bacteria by causing bacteria to exploit their hosts more aggressively.
The results described in this study, however, do not necessarily mean that plasmids rarely carry VFGs or that prophages always carry VFGs. For example, in K. pneumoniae, plasmids carry ARGs more frequently than VFGs relative to the genomic background; nevertheless, plasmids carry a large number of VFGs (96 out of 372 orthologous groups), whereas prophages carry none (tables 1 and 2). A similar, yet less striking, pattern is also seen in E. faecalis. Given that many prophages are predicted in these species (table 1 and figure 1), the absence of prophage-borne VFGs is unlikely to be merely due to the limited sensitivity of the prophage detection tool.
Heterogeneity in the distributions of VFGs suggests that plasmids and prophages play variable roles in the pathogenicity of different bacterial species. For example, prophages carry many distinct VFGs in E. coli (1590 OGVFGs), S. enterica (64 OGVFGs), S. aureus (28 OGVFGs) and S. pyogenes (48 OGVFGs), whereas they hardly carry any VFGs in A. baumannii (0 OGVFGs), B. anthracis (0 OGVFGs), E. faecalis (0 OGVFGs) and K. pneumoniae (0 OGVFGs), even though prophages are frequently found in the genomes of all these species (figure 1 and table 1). Similar heterogeneity is observed for plasmids: plasmids carry many distinct VFGs in B. anthracis (14 OGVFGs), E. faecalis (24 OGVFGs), E. coli (2452 OGVFGs), K. pneumoniae (96 OGVFGs) and S. enterica (46 OGVFGs), whereas they hardly carry any VFGs in A. baumannii (0 OGVFGs), even though plasmids are frequently found in the genomes of all these species (figure 2 and table 2). This heterogeneity might be related to species' different life-history traits [62], although further research is needed to explore this relationship. The heterogeneity of VFG distributions is in contrast with the situation for ARGs. In the species where many distinct ARGs are detected (e.g. A. baumannii, E. coli, K. pneumoniae, P. aeruginosa, S. enterica and S. aureus), sizable fractions of these ARGs (greater than 20%, or greater than 40% if P. aeruginosa is excluded) are carried by plasmids (figure 2 and table 2), a result that corroborates the notion that plasmids play a major role in the dissemination of ARGs [1][2][3].
The absolute numbers of prophage-borne ARGs reported in this study need to be interpreted with caution because the prophage prediction tool has limited sensitivity and precision. In particular, limited sensitivity implies that prophages carrying ARGs might have been missed, so the results do not necessarily indicate that prophages are truly devoid of ARGs in many bacterial species. However, the conclusion of this study does not directly depend on the absolute numbers of prophage-borne ARGs because it is based on the comparison between the relative frequencies of prophage-borne ARGs and VFGs and those of plasmid-borne ARGs and VFGs. This comparison hinges on the assumption that the predictions of prophages carrying ARGs and of prophages carrying VFGs are biased by common factors, so that the biases cancel out.
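The cancellation assumption can be illustrated with a minimal sketch. It is not the statistic used in this study: the function name, the counts and the sensitivities below are all hypothetical, chosen only to show that a detection sensitivity shared by ARG-carrying and VFG-carrying prophages leaves their ratio unchanged, whereas unequal sensitivities distort it.

```python
from fractions import Fraction


def observed_arg_to_vfg_ratio(true_args, true_vfgs, sens_arg, sens_vfg):
    """Ratio of detected prophage-borne ARGs to detected prophage-borne VFGs.

    All arguments are hypothetical illustration values:
    true_args, true_vfgs -- true numbers of prophage-borne ARGs and VFGs
    sens_arg, sens_vfg   -- fraction of ARG-carrying (VFG-carrying)
                            prophages that the prediction tool detects
    """
    detected_args = Fraction(true_args) * Fraction(sens_arg)
    detected_vfgs = Fraction(true_vfgs) * Fraction(sens_vfg)
    return detected_args / detected_vfgs


# Hypothetical true counts: 20 prophage-borne ARGs and 80 prophage-borne VFGs.
true_ratio = Fraction(20, 80)

# Shared sensitivity (both 1/2): the bias cancels and the true ratio is recovered.
shared = observed_arg_to_vfg_ratio(20, 80, Fraction(1, 2), Fraction(1, 2))

# Unequal sensitivities (1/4 for ARG carriers, 1/2 for VFG carriers):
# the observed ratio is halved relative to the true ratio.
unequal = observed_arg_to_vfg_ratio(20, 80, Fraction(1, 4), Fraction(1, 2))

print(true_ratio, shared, unequal)  # prints: 1/4 1/4 1/8
```

The sketch covers only a multiplicative sensitivity bias. Other sources of error, such as imprecise prophage boundaries, need not cancel in the same way.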
To consider the question investigated in this study from another angle, we additionally examined the densities of nonclustered VFGs and ARGs per nucleotide in prophages, plasmids and genomes as described in electronic supplementary material, text S3, and figures S4 and S5.
Regarding the limitations of prophage prediction tools, it is pertinent to discuss a discrepancy between our result and that of Kondo et al. [34] with respect to P. aeruginosa. While we found no prophage-borne ARGs in P. aeruginosa, Kondo et al. [34] report that more than 10% of P. aeruginosa genomes possess prophage-borne ARGs. The important difference between the two studies is the tool used for prophage prediction: Kondo et al. [34] used PHASTER [58], whereas we used VIBRANT [54]. To investigate the cause of the discrepancy, we manually examined the 11 P. aeruginosa prophages described in Kondo et al. [34] that carry ARGs and are predicted as 'intact' by PHASTER. We restricted the examination to these because prophages predicted as 'incomplete' or 'questionable' by PHASTER are less likely to be true than those predicted as 'intact'. We found that the examined prophages could be grouped into two categories (electronic supplementary material, table S9). In the first group (five prophages), both VIBRANT and PHASTER predicted prophages in almost the same genomic locations. However, PHASTER predicted longer genomic regions that included ARGs as prophages, whereas VIBRANT predicted shorter regions excluding ARGs. We do not know which prophage boundaries are more accurate. In the second group (six prophages), prophages were predicted only by PHASTER. These prophages, however, contained no phage-structure genes. Although they contained phage-related genes, such as integrase, transposase and protease, these genes are not exclusively associated with phages. Moreover, PHASTER annotated tellurium resistance proteins, TerD, as virion structural proteins in three prophages in the second group, which is likely to be an error. These findings suggest that the prophages in the second group could be false positives. Taken together, the above results suggest that the frequency of prophage-borne ARGs in P. aeruginosa is potentially underestimated in our study and overestimated in Kondo et al. [34] owing to the limitations of the prophage prediction tools.
In interpreting the results obtained in this study, we assumed that ARGs and VFGs found within prophages were carried by phage genomes. However, these genes could have been inserted into pre-existing inactivated prophages (i.e. inserted after lysogeny). Although this possibility cannot be completely excluded, the following evidence suggests that not all ARGs and VFGs are inserted into pre-existing inactivated prophages. A previous study has shown that ARGs and VFGs are found in the genomes of temperate phages (which are thus not prophages) and that these genes are hardly found in the genomes of virulent phages [37]. This result would not be expected if all ARGs and VFGs were inserted into pre-existing inactivated prophages. More importantly, we have no a priori expectation that VFGs are more likely than ARGs to be inserted into pre-existing prophages. In the absence of such an expectation, our results are likely to be robust to post hoc insertions of ARGs and VFGs.
The absence of ARGs from phage genomes does not necessarily mean that phages do not mediate the horizontal transfer of ARGs. Phages mediate HGT through three known mechanisms: specialized, generalized and lateral transduction [24,65,66]. In specialized transduction (the focus of this study), a transferred gene constitutes a part of a phage genome [24]. By contrast, in generalized and lateral transduction, a transferred gene is originally encoded in bacterial DNA, which is encapsulated into phage particles and subsequently transferred to other cells [24,65,66]. Thus, phages can still mediate the horizontal transfer of ARGs even if their genomes do not contain ARGs.
In conclusion, the results presented above suggest that MGEs can differ in the functional categories of accessory genes they carry depending on their strategies of infection.
Data accessibility. The data are provided in electronic supplementary material [67].