Abstract
Plasmids are key drivers of bacterial evolution by transferring genes between cells via conjugation. Yet, half of the plasmids lack all protein coding genes for this process. We sought to solve this conundrum by identifying conjugative origins of transfer across thousands of plasmids and chromosomes of Escherichia coli and Staphylococcus aureus. We found that plasmids carrying these sequences are very abundant and have the highest densities of antimicrobial resistance genes. They are hyper-parasites that directly hijack conjugative or mobilizable elements, but not both. These functional dependencies explain the co-occurrence of each type of plasmid in cells and illuminate the evolutionary relationships between the elements. We systematically characterized the genetic traits of plasmids in relation to conjugation and alternative mechanisms of transfer, and can now confidently propose a putative mechanism of transfer for ca. 90% of them. The few exceptions could be passively mobilized by other processes. We conclude there is no conundrum concerning plasmid mobility.
Introduction
Plasmids are extra-chromosomal DNA molecules and are key drivers of horizontal gene transfer between bacteria1, contributing to the spread of antimicrobial resistance, virulence factors, and metabolic traits2. They are horizontally transmitted by several processes3. Some plasmids can be transferred passively, i.e. without dedicated genetic determinants encoded in the plasmid, by natural transformation4, in vesicles5, or by transducing bacteriophages (phages)6. Some plasmids are also phages, phage-plasmids (P-P), and transfer by producing their own viral particles where they package their DNA7. Yet, conjugation is widely regarded as the major mechanism of plasmid transfer8.
Conjugation involves the recognition by the relaxase (MOB) of a small DNA sequence in the plasmid, the origin of transfer (oriT)9. The relaxase cleaves the oriT at the nic site and binds covalently to the single-stranded DNA. This nucleoprotein complex, named relaxosome, interacts with a type 4 coupling protein that connects it to the mating pair formation (MPF), including a Type 4 Secretion System (T4SS) that transfers the nucleoprotein complex to another cell10. Once the relaxosome has been transferred, the relaxase catalyzes the DNA ligation of the plasmid in the recipient cell to produce a circular single stranded molecule that is replicated by the replication machinery of the recipient cell9. At the end of conjugation there is one copy of the plasmid in each cell. Some conjugative elements remain in cells as plasmids whereas others integrate the chromosome as integrative conjugative elements (ICEs)11. The conjugation machineries of ICEs and plasmids are very similar and have intermingled evolutionary histories12.
Plasmids or integrative elements encoding the three functional elements - oriT, relaxase and MPF - may conjugate autonomously between bacteria. They are called conjugative8. However, plasmids encoding the MPF represent only ~1/4 of all plasmids. Those lacking an MPF but encoding a relaxase and oriT are called mobilizable. In this case, the relaxase interacts with the plasmid oriT, and the resulting nucleoprotein complex is transported by the MPF of a conjugative element co-occurring in the donor cell. Plasmids encoding a relaxase but lacking a complete MPF are as numerous as the conjugative plasmids8. This means that half of all plasmids lack a relaxase and an MPF. We will refer to them as pMOBless plasmids hereinafter. Even though pMOBless lack all proteins required for conjugation, there is epidemiological evidence that some of them transfer between cells13–15. The mobility of pMOBless may occur by several mechanisms: (1) they may have an oriT and be mobilized by a relaxase and an MPF encoded in-trans by a conjugative plasmid16; (2) they may interact with a relaxase of a mobilizable plasmid, and the nucleoprotein complex further interacts with an MPF of a third plasmid17; (3) or they may transfer using other mechanisms, e.g. conjugation through a rolling circle replication protein18, co-integration with a conjugative plasmid19, or the alternative transfer mechanisms mentioned above. Similar mechanisms could be used by integrative elements lacking a complete MPF, commonly named integrative mobilizable elements (IMEs)20.
The observation over a decade ago that slightly more than half of all plasmids lack genes for relaxases was paradoxical, because genetic mobility is thought to be necessary for plasmid maintenance in populations21,22. Of note, some pMOBless with an oriT (pOriT hereinafter) were shown to be mobilized by a conjugative plasmid decades ago17. Yet, the scarcity of available oriT sequences has precluded the systematic identification of these plasmids. Recently, pioneering studies on Staphylococcus aureus, a species that has unusually few conjugative plasmids and few types of oriT, showed that 50% of the pMOBless can be mobilized since they carry oriTs similar to those of pWBG74923 or pSK4124. Subsequent studies with three additional oriTs suggested that oriT-based mobilization is common in this species25,26. Whether this is true for other species, including those with numerous conjugative plasmids, is not known. Unfortunately, most oriTs remain unknown, precluding their systematic study across bacteria. Here, we focused on S. aureus, for which plasmid diversity is low and well-characterized, and Escherichia coli, the best described species of bacteria and one with numerous well-known plasmid families27. These two species are of particular importance because they are responsible for the greatest number of deaths associated with antimicrobial resistance in the world28, a trait that is spread by plasmids29. We first complement previous studies and test if ICEs could be involved in the mobilization of pOriTs in S. aureus. We also test if the same approach can be extended to E. coli. The confirmation that we can identify homologs of experimentally verified oriTs in the plasmids of these species paved the way to answer some outstanding questions. We do not know how these plasmids contribute to the spread of functions across bacteria. We do not know the functional dependencies associated with pOriTs, i.e. if they tend to be associated with one single conjugative plasmid or if they often require a third plasmid encoding a relaxase. We do not know how these plasmids arose in natural history. We also do not know how the existence of pOriTs affects the patterns of co-occurrence of plasmids in cells. Finally, we would like to know how many plasmids remain without a hypothetical mechanism of transfer once pOriT plasmids and phage-plasmids are accounted for. By tackling these questions, this study contributes to unraveling the mechanisms allowing plasmid mobility, while giving new insights into the mobility and evolution of oriT-bearing plasmids.
Results
E. coli and S. aureus have distinct plasmid repertoires
We analyzed the complete genomes available in RefSeq of E. coli (n=1,585) and S. aureus (n=581) to characterize the size and diversity of their plasmids. E. coli isolates carry almost three times more plasmids per genome than S. aureus isolates (t(2068.9)=20.65; p<2.2e-16) (Fig 1A). Moreover, E. coli plasmids tend to be larger (Kolmogorov-Smirnov test, D=0.586, p<2.2e-16) (Fig 1B) and to have a higher GC% than S. aureus plasmids (t(1074.7)=191.23, p<2.2e-16) (Fig S1). They are also more diverse in terms of gene repertoires. E. coli plasmids encode on average four times more gene families than those of S. aureus (t(2817.9)=43.129, p<2.2e-16) (Fig S1). The plasmid pangenome of E. coli (11,530 gene families) is much larger than that of S. aureus (ca. 1,000), a trend that was confirmed when comparing similar sample sizes (455 plasmids) (Fig 1C). Overall, plasmids contribute many genes to the species pangenomes. This is particularly striking in E. coli, where the plasmid pangenome is more than double the average size of a strain genome30.
We characterized the plasmids in terms of the protein coding genes involved in conjugation: pCONJ encode an MPF and a relaxase, pMOB encode a relaxase but lack a complete MPF, and pMOBless lack a relaxase. In E. coli ~35% of the plasmids are pCONJ, ~25% pMOB, and ~40% pMOBless (Fig 1D). These values are close to previously published ones across Bacteria8. In contrast, only 4% of the S. aureus plasmids were classed as pCONJ, 18% as pMOB, and 77% as pMOBless. Hence, S. aureus seems to be a more atypical bacterium, where conjugative plasmids are rare. We then tested the hypothesis that ICEs could compensate for the paucity of conjugative plasmids in the species. We searched the chromosomes for loci associated with ICEs (encoding MPF and relaxase) and IMEs (encoding a relaxase), and found that 46% of the chromosomes of S. aureus encode MPF systems (Fig 1E). In contrast, conjugative systems were identified in only ~7% of E. coli chromosomes. Interestingly, many genomes in both species have either conjugative plasmids or ICEs, but rarely both. The integration of these analyses provides a more nuanced view of the differences between the species in terms of the fraction of genomes containing a conjugative element: ~52% of E. coli and ~47% of S. aureus (Fig 1F). While the precise delimitation of ICEs and IMEs is difficult and precludes systematic comparisons between elements in terms of gene content, these results suggest that the existence of ICEs could explain the mobility of some pMOBless, especially in S. aureus. In summary, the two species show different patterns in terms of the mobility of plasmids and integrative elements, but both still contain many plasmids lacking relaxases.
oriTs are frequent in plasmids of E. coli and S. aureus
To unveil the mechanisms of mobilization of the many plasmids lacking a relaxase, we searched for oriTs. To do so, we collected 51 oriTs from the ‘oriT database’31 and added 40 new ones from the literature (Table S3). Most of these 91 experimentally validated oriTs (mean size ~131 bp) were originally identified and verified in plasmids of γ-Proteobacteria (n=44) and Bacilli (n=22) (Fig S2). We used this collection to search for origins of transfer in the 1,585 E. coli and 581 S. aureus genomes by sequence similarity (see Methods). We identified 2,831 putative oriTs in 2,626 plasmids, almost all of which are located in intergenic regions (Fig S3). Even though E. coli has more diverse plasmids and more types of oriTs (n=37) than S. aureus (n=7), oriTs were found at similar frequencies in the plasmids of the two species (ca. 70%) (Fig 2A). We also identified 336 oriTs in 282 chromosomes. These chromosomal oriTs were much more abundant in S. aureus (25% of the genomes) than in E. coli (9%), in line with the higher frequency of ICEs in the former (Fig 2A). Although many oriTs were identified in both types of replicons, a given family tends to be present either in plasmids or in chromosomes (Fig 2B). Of note, none of the oriTs was identified in both species.
Most oriT-encoding plasmids have just one oriT (~88% in E. coli, ~85% in S. aureus), although a few can have up to 5 (Fig S3). As expected, plasmids with multiple oriTs tend to encode multiple relaxases (r(3868)=0.32, p<2.2e-16) (Fig S3). To study the plasmid size and the co-occurrence of oriTs and relaxases, we retrieved the families of oriTs identified in more than 10 plasmids. The oriTs of a given family are usually associated with plasmids of a specific size range, i.e., they tend to be associated with either small or large plasmids (Fig 2C). Yet, in a few cases, the families associated with large plasmids also include a few much smaller ones. Finally, the oriTs of a given family tend to be in plasmids with the same class of relaxases (Fig 2D). All things considered, the identification of oriTs in most plasmids, usually in a single copy, the strict association between the oriT and the MOB, and their identification in plasmids of homogeneous size, suggest that most oriTs we identified are true positives.
oriT-MOBless plasmids are abundant and usual carriers of antimicrobial resistance genes
We identified at least one oriT in more than 80% of pCONJ and pMOB (Fig 2E). Hence, the oriTs in our collection have homologous sequences in a very large fraction of the oriTs used by the conjugative plasmids of these species. Importantly, we found an oriT in 790 pMOBless. Hereinafter, we will refer to these oriT-carrying pMOBless as pOriT. pOriTs constitute 65% of S. aureus plasmids lacking relaxases and more than 40% of those of E. coli. These results should be interpreted with caution. We cannot ascertain the functionality of all these oriTs, even if they are homologous to experimentally verified sequences. More importantly, our analysis may still be missing oriTs, since even a few pCONJ lack an identifiable oriT. Despite these limitations, most plasmids have one and only one identifiable oriT, suggesting that we have identified most of them. If so, around half of the plasmids lacking relaxases are mobilizable by conjugation.
Due to the importance of E. coli and S. aureus as multidrug resistant pathogens28, we enquired into the role of their different plasmids in the spread of antimicrobial resistance genes (ARG). It has previously been found that conjugative plasmids tend to carry more ARGs than the other plasmids29. This is the case of pCONJ in E. coli (~64% of the genes) but not in S. aureus, where pOriTs carry most of these genes (~76%) (Fig 3A). Furthermore, the number of ARGs per kilobase is highest in pOriT in both species (Fig 3B). Interestingly, the plasmids with the fewest ARGs, and the lowest ARG density, are those lacking both a relaxase and an oriT (presumably non-transmissible, pNT). These results show that plasmids lacking relaxases can be split in two categories, where those with an oriT have an important role in the spread of antibiotic resistance.
pOriTs exploit either conjugative or mobilizable plasmids
The identification of homologous oriTs allows testing for functional dependencies between plasmids. We have previously proposed that relaxases of pMOB evolve to interact with multiple types of MPF encoded in pCONJ, whereas those of pCONJ co-evolve with the MPF to optimize their mutual interaction32,33. These differences might require the presence of different oriTs in pMOB and pCONJ, as previously suggested26. In our dataset, many families of oriTs are present in either pCONJ or pMOB, but few are present in both (Fig 4A). The exceptions tend to correspond to “pCONJ-like oriTs” (oriTs typical of pCONJ) that were found in large pMOB plasmids. We hypothesized that these might be decayed conjugative plasmids (pdCONJ)34. These elements have some MPF genes, but not enough to be functional, and seem to have been recently derived from pCONJ by gene deletion34. Hence, we split the pMOB into those encoding at least two MPF genes (pdCONJ) and the others. The pdCONJ indeed account for 80% of the mobilizable plasmids with pCONJ-like oriTs. In contrast, pdCONJ do not have “pMOB-like oriTs” (oriTs typical of pMOB) (Fig 4A). After this analysis, only three oriTs remained in a significant fraction of both pCONJ and pMOB (excluding pdCONJ): oriTpKL1, oriTpWBG749, and oriTpSK41. We then enquired into the possibility that ICEs or IMEs show similar trends. Since we do not know the limits of these elements, we cannot properly assign them an oriT. Yet, we can analyze whether certain oriTs are present in chromosomes encoding an ICE and/or an IME. Our results showed that, indeed, oriTs tend to be associated with either ICEs or IMEs (Fig S4). We conclude that conjugative and mobilizable elements tend to use different oriTs.
A plasmid encoding only an oriT may either use the relaxase and MPF of a conjugative plasmid (if carrying a pCONJ-like oriT), or the relaxase of a mobilizable plasmid which in turn must use an MPF of a conjugative one (if carrying a pMOB-like oriT). In the first case, the pOriT could be regarded as a parasite of the conjugative plasmid, if its activity affects the fitness of the latter, whereas in the second case it is a hyper-parasite (a parasite of a parasite). One could expect that the most efficient strategy for a pOriT would be to take advantage of a unique plasmid rather than relying on the interplay between two other elements. However, since pMOB are often able to interact with multiple pCONJ, a pMOB-like oriT might allow a pOriT to have a higher chance of transfer under certain circumstances. Since the oriTs of pOriTs are homologous to those of conjugative or mobilizable elements (Fig 4B), we could infer the relations of dependence between pOriT and the other plasmids. We focused on E. coli plasmids for this particular analysis because they have a much wider diversity of oriTs for both pMOB and pCONJ. Interestingly, the frequency of pOriTs in E. coli with a pCONJ-like oriT (~56%) or a pMOB-like one (35%) is very close to the relative frequency of each of these types of plasmids in the species (Figure 4C). Hence, the relative frequency of each type of pOriT matches the relative frequency of the hijacked plasmids.
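The dependency logic described above can be illustrated with a minimal Python sketch. It is not part of the analysis pipeline. The function names and the string labels are hypothetical, and the sketch assumes that any co-resident pCONJ or pMOB is compatible with the pOriT, which ignores the matching between specific oriT and relaxase families and the possible contribution of ICEs or IMEs.

```python
# Illustrative sketch only: names and labels are hypothetical, and
# compatibility between oriT and relaxase families is not modelled.

def required_helpers(orit_class: str) -> tuple:
    """Return the helper plasmid types a pOriT would need in the same cell."""
    if orit_class == "pCONJ-like":
        # Relaxase and MPF both come from a conjugative plasmid.
        return ("pCONJ",)
    if orit_class == "pMOB-like":
        # Relaxase from a mobilizable plasmid, MPF from a conjugative one.
        return ("pMOB", "pCONJ")
    raise ValueError(f"unknown oriT class: {orit_class!r}")


def can_transfer(orit_class: str, cohabiting_types: set) -> bool:
    """True if every required helper type is present in the cell."""
    return all(h in cohabiting_types for h in required_helpers(orit_class))
```

Under these assumptions, a pOriT with a pMOB-like oriT in a cell that only carries a pCONJ would not be predicted to transfer, whereas one with a pCONJ-like oriT would.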
pOriT may originate from both conjugative and mobilizable plasmids
Given the large number of pOriTs, we enquired on their evolutionary origin. It was recently suggested that pMOBless may have derived from conjugative or mobilizable plasmids by gene deletion34. Since pOriTs have either a pCONJ-like or a pMOB-like oriT, we thought they might have emerged by gene deletion in ancestral pCONJ or pMOB while maintaining the oriT. To evaluate this hypothesis, we grouped the 3,869 plasmids into Plasmid Taxonomic Units (PTUs)27 and analyzed their mobility and oriT. Most plasmids in a PTU have the same type of mobility, reflecting the short evolutionary distances between plasmids in the same PTU. But even when they do not, they tend to have oriTs of the same family (Fig S5), suggesting that oriT family is more conserved than the mobility type.
To test the possibility that some pOriTs originated from conjugative plasmids, we selected two PTUs and explored the relation between the pOriTs and pCONJ within a PTU. We analyzed the PTU-Fe (IncF/MOBF/MPFF) (Fig 5) and the PTU-C (IncA/C2/MOBH/MPFF) (Fig S6). Most of the plasmids in these PTUs are pCONJ with a pCONJ-like oriT (oriTF and oriTpVCR94deltaX, respectively). Yet, both include a few other types of plasmids (e.g. pMOB, pOriT) that tend to be smaller than their pCONJ counterparts (PTU-Fe: F(481)=8.808, p=7.21e-07; PTU-C: F(37)=35.69, p=2.32e-09) while encoding the usual oriT of their PTU (Fig 5, Fig S6). This supports the idea that these replicons derived from conjugative plasmids by gene deletion. To further test this idea, we analyzed pairs of pCONJ/pOriT within the PTUs having similar gene repertoires (wGRR>0.75, see Methods). This analysis suggests that these pOriTs were generated by staggered degradation of the MPF system in pCONJ (Fig 5, Fig S6). Crucially, the derived replicons are likely to be capable of in-trans conjugation because they maintain their ancestral oriT.
We then selected two PTUs with a majority of pMOB (E1, E22) and analyzed them as above (Fig 6, Fig S7). Both include ColE1-like plasmids (ColRNAI/Col440I), associated with the MOBP and the pMOB-like family oriTColE1-like. As before, these PTUs include other types of plasmids, notably pOriTs and pNTs. The latter tend to be smaller (PTU-E1: F(200)=90.33, p<2e-16; PTU-E22: F(35)=827.18, p=7.53e-08), again suggesting that they arose by deletion of the relaxases in ancestral pMOBs. As expected, most of the closely related pMOB/pOriT pairs have homologous oriTs, and their alignments further suggest that small pOriTs arise by the loss of the relaxase in pMOB plasmids (Fig 6, Fig S7). Interestingly, we identified a change of the oriT from one family to another in a subgroup of plasmids of the PTU-E1 (Fig 6). This subgroup of plasmids has the oriTpCERC7, an origin of transfer related to the pCONJ-like oriTR6435. This finding suggests that, through recombination events, a family of pMOBless with pMOB-like oriTs can acquire an oriT typical of conjugative plasmids. Overall, these results show at the micro-evolutionary scale how pOriTs can derive by gene deletion from other types of plasmids.
Most plasmids may be mobilized by known mechanisms of transfer
Our results suggest that ~80% of E. coli and >70% of S. aureus plasmids use an oriT to transfer by conjugation. To this, one may add other genetic elements that spur plasmid transfer (Fig 7A). Notably, some rolling-circle replication proteins (RC-Rep) act as replicative relaxases36. They interact with the MPF system of a conjugative element and trigger plasmid conjugation in an oriT-independent manner37. We searched for these proteins to test if this alternative pathway could be involved in the mobilization of plasmids lacking oriT and classical relaxases. We identified 225 homologs of RC-Rep proteins in 208 plasmids. These plasmids are frequent in S. aureus (~30%), but rare in E. coli (1.9%). As expected, there is an overrepresentation of RC-Rep in non-oriT pMOBless (χ2(4)=103.12, p<2.2e-16) (Fig S8). The unexpected abundance of RC-Rep in plasmids lacking an oriT suggests that such proteins could mediate the mobility of many plasmids in S. aureus.
Some plasmids can be transferred within viral particles. The propensity of a plasmid to be transduced cannot be predicted from its sequence. However, ca. 6% of the plasmids are also phages (phage-plasmids, P-Ps)7, and encode the functions for viral particles, virion assembly, packaging, and cell lysis. We identified 222 P-Ps in E. coli and 1 in S. aureus, which is consistent with the reported uneven distribution of P-Ps across bacteria7. P-Ps correspond to a third of the pMOBless without oriT in E. coli (n=216/702). In agreement with the idea that P-Ps provide an alternative mechanism of plasmid transfer, only six P-Ps encode conjugation-related elements (Fig S9). The latter are much larger (~175 kb) than the remaining P-Ps (~90 kb), and might be the result of co-integration events or assembly artifacts (Fig S9).
At the end of these analyses, we could assign a putative mechanism of mobility to most plasmids in each species. In E. coli, 80% of the plasmids were classed as conjugative or mobilizable by conjugation, and ~7% as P-Ps. In S. aureus, 90% were classed as conjugative or mobilizable by some type of conjugation and only 1 is a P-P. Hence, when one accounts for MPF, relaxases, RC-Rep, oriT, and P-Ps, few plasmids lack a hypothetical mechanism of transfer, i.e. few remain putatively non-transmissible (pNT) (Fig 7A): 13.7% in E. coli and 10.4% in S. aureus. We enquired into the possible mechanisms of mobility of the remaining plasmids. Around 50% of the E. coli pNTs are related to the large plasmid pO157 (PTU-E5) (Fig S10). These are well-known non-transmissible plasmids that have disseminated in E. coli O157:H739. The mechanisms of mobility of the few remaining plasmids (if any) remain unknown.
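The assignment of a putative mechanism of mobility from the presence or absence of these genetic elements can be summarized as a small decision procedure. The Python sketch below is a simplified illustration, not the pipeline used in this study. The class and field names are hypothetical, and the order in which the categories are tested (conjugation-related elements before RC-Rep, and RC-Rep before phage-plasmid status) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlasmidFeatures:
    """Hypothetical presence/absence summary for one plasmid."""
    has_mpf: bool = False           # complete mating pair formation system
    has_relaxase: bool = False      # classical relaxase (MOB)
    has_orit: bool = False          # homolog of a verified oriT
    has_rc_rep: bool = False        # rolling-circle replication protein
    is_phage_plasmid: bool = False  # predicted phage-plasmid (P-P)


def mobility_class(p: PlasmidFeatures) -> str:
    """Assign one putative mobility category (assumed precedence order)."""
    if p.has_mpf and p.has_relaxase:
        return "pCONJ"   # conjugates autonomously
    if p.has_relaxase:
        return "pMOB"    # needs the MPF of a conjugative element
    if p.has_orit:
        return "pOriT"   # needs a relaxase and an MPF in trans
    if p.has_rc_rep:
        return "pRCR"    # RC-Rep may act as a replicative relaxase
    if p.is_phage_plasmid:
        return "P-P"     # transfers within its own viral particles
    return "pNT"         # no known element: putatively non-transmissible
```

For example, `mobility_class(PlasmidFeatures(has_orit=True))` returns `"pOriT"`, and a plasmid with none of the five features is labelled `"pNT"`.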
The distribution of the size of plasmids is bi-modal and associated with their type of mobility (Fig 7B). The mode associated with the largest plasmids is characteristic of pCONJ, but also found among certain pMOB and pOriT in both species. For the latter, we observed a shift of the peak to lower values of plasmid size. Similarly, the mode of the smaller plasmids is characteristically associated with pMOB, but is also found among pRCR and pOriT, with a shift of the peak to lower values of plasmid size. These small downwards shifts observed among pOriT and other plasmids are consistent with our hypothesis that they often originate from pCONJ or pMOB by gene deletion (Fig S11). The patterns for pNT are less clear. In E. coli they are shaped by the many large pO157-like plasmids, whereas in S. aureus they seem to follow the trends of pOriT, suggesting that maybe some oriT remain to be uncovered in the species.
Mobilization explains patterns of plasmid co-existence
The dependence of certain plasmids, e.g. pOriT, on others, notably pCONJ, for conjugative transfer means that the type of mobility of plasmids may affect the patterns of their co-occurrence in cells. We can now test this hypothesis by analyzing which plasmids tend to co-occur with others. The number of plasmids per genome is much more variable (and on average higher) in E. coli than in S. aureus. Hence, we concentrated on the E. coli data for this analysis. We identified the most common patterns of occurrence among the 1,207 plasmid-bearing E. coli genomes, focusing on pCONJ, pMOB, pOriT and pNT (Fig 8A). The most common pattern is the presence of only conjugative plasmids in the cell. The second and fourth most frequent patterns are a pair of pCONJ-pMOB and the triplet pCONJ-pMOB-pOriT. Interestingly, the third most frequent pattern is the single presence of MOBless pNTs, in contrast to the much rarer event of having single pOriTs in the cell. This further reinforces the idea that while MOBless pNTs are non-transmissible and vertically transmitted with their host cells, pOriTs co-transfer with co-existing elements within the cell.
If the pMOB and pOriT require a pCONJ to transfer between cells, one would expect that the frequency of each type of plasmids would vary with the number of plasmids per genome. Notably, genomes with few plasmids would tend to have more pCONJ and those with many plasmids would have progressively a larger fraction of other types of plasmids. Indeed, the frequency of pCONJ in E. coli is highest in genomes with a single plasmid and constantly decreases with the increase in the number of plasmids (Fig 8B). As expected, pMOB and pOriT show the inverse trend. These plasmids are rarely found alone in the genome and become increasingly frequent when cells contain more and more plasmids. The frequency of these plasmids is very high (75%) in genomes with more than 10 different plasmids. Hence, the relative frequency of each type of plasmid varies with the number of plasmids in the cell.
We showed above that some pOriTs may only require a pCONJ (since they have a pCONJ-like oriT), whereas others may require a pCONJ and a pMOB to transfer (pMOB-like oriT). The latter might be found preferentially in genomes with more plasmids, since they require a combination of two compatible plasmids to transfer. Indeed, while pCONJ-like pOriTs reach a frequency plateau in genomes with ≥7 plasmids, pMOB-like pOriTs increase steeply in frequency up to 10 plasmids/genome (Fig 8C). All these findings suggest that the functional dependencies of certain plasmids relative to others do shape the co-occurrence of plasmids in populations.
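The kind of tabulation behind these comparisons, the relative frequency of each plasmid type as a function of the number of plasmids per genome, can be sketched as follows. This is an illustrative Python example under stated assumptions: the function name and the input layout (one list of mobility labels per genome) are hypothetical, and the genomes in the usage note are toy data, not results.

```python
from collections import Counter, defaultdict


def type_frequency_by_plasmid_count(genomes: dict) -> dict:
    """Map plasmid count per genome to the relative frequency of each type.

    `genomes` maps a genome identifier to the list of mobility labels of
    its plasmids (e.g. "pCONJ", "pMOB", "pOriT", "pNT").
    """
    pooled = defaultdict(Counter)
    for labels in genomes.values():
        if labels:  # skip plasmid-free genomes
            pooled[len(labels)].update(labels)

    frequencies = {}
    for n_plasmids in sorted(pooled):
        counts = pooled[n_plasmids]
        total = sum(counts.values())
        frequencies[n_plasmids] = {t: c / total for t, c in counts.items()}
    return frequencies
```

With toy input such as `{"g1": ["pCONJ"], "g2": ["pCONJ", "pMOB"]}`, the function returns the type frequencies for genomes with one plasmid and with two plasmids separately. Labels can be refined (for instance, separate labels for pOriTs with pCONJ-like and pMOB-like oriTs) without changing the code.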
Discussion
To understand how so many plasmids could lack relaxases and still be present across distant strains, we searched for homologs of experimentally verified oriT, the only genetic element a plasmid needs in-cis for conjugation. The search for homologs of oriTs could result in misidentifications, but our observations suggest that most of the oriTs that we identified are correct. (1) While most plasmids have an oriT, most chromosomes lack them, in spite of their much longer sequences. (2) At least one oriT has been identified in most plasmids that were expected to have it (pMOB or pCONJ). (3) There are no cross matches between E. coli and S. aureus oriTs. (4) There are almost no cross matches between pCONJs and pMOBs, allowing the identification of pCONJ-like and pMOB-like oriTs. (5) Most plasmids have one single oriT, and the others often have multiple relaxases, seem to be plasmid co-integrates, or have been already described40. (6) Almost all oriTs identified are located in non-coding regions. (7) There is a strict association between the oriTs and their associated relaxase family. (8) The oriTs were not found where they were not expected, e.g. in phage-plasmids that rely on alternative mechanisms rather than conjugation38, or in pO157-like plasmids, which are known to be non-conjugative39. Finally, previous work in S. aureus validated the identification of oriTs in plasmids25. These results suggest that we identified most oriTs (#1, #2, #5, #6), that false positives are probably rare (#1, #3, #4, #6, #8), and that associations between oriT and relaxases are reliable (#4, #5, #7). Hence, our oriT screening seems accurate. Yet, it is likely that some oriTs remain to be identified, since some pCONJ and pMOB lack known oriTs (Fig 2E, Fig S12). Further work will be needed to identify these novel oriTs across bacterial species. That will require extensive computational analysis and experimental validation of representative oriTs.
The observation that pOriTs usually have oriTs from either pCONJ or pMOB, suggests that these elements have evolved to either hijack the relaxase of a conjugative or a mobilizable plasmid. The latter require a pCONJ themselves resulting in a complex succession of ecological dependencies (see below). These two types of pOriT could have arisen by gene deletion of pCONJ and pMOB, in which case the pOriT would have lost the genes encoding the relaxase (and the MPF in pCONJ) while keeping the ancestral oriT. This is consistent with the emergence of novel pOriTs in closely related plasmids within PTUs. More complex scenarios are also possible, e.g. the translocation of an oriT to a plasmid lacking one. The hypothesis of frequent pOriT genesis by gene deletion from pMOB or pCONJ is further supported by the analysis of the distribution of pOriT size which has two modes, each slightly smaller than the modes of pMOB and pCONJ (Fig 7B, Fig S11). We have proposed that a fraction of pMOB derived recently from pCONJ34. Our present results further suggest that a part of pOriT originated from either pCONJ or pMOB.
Why would plasmids evolve towards less autonomous mobilization, i.e. to depend on other plasmids for mobility? The oriT is a small non-coding sequence that may have little impact on bacterial fitness. In contrast, MPF systems and relaxases are costly and may hamper the successful vertical transmission of the plasmid41,42. This is why the genetic components of conjugative plasmids are usually repressed43 and occasionally lost44. Hence, the loss of protein-coding genes for conjugation may decrease horizontal transfer but increase the success of vertical transmission. In contrast, the loss of oriTs precludes horizontal transmission by conjugation without providing significant advantages for vertical transmission. Hence, the conditions that favor loss of conjugation-related protein coding genes may not favor the loss of oriT.
The decrease in horizontal transmission that accompanies the loss of protein-coding genes for conjugation, and thus the emergence of a pOriT, depends on how frequently the latter co-occurs with a compatible pCONJ (and possibly also a pMOB). We observed that the frequency of pOriT with pCONJ-like and pMOB-like oriTs was directly proportional to the frequency of the “helper” plasmids. The dependence of pOriT on the presence of other plasmids in the cell might suggest that pOriTs should evolve to have a pCONJ-like oriT and dispense with the requirement for a pMOB. Notwithstanding, pMOBs are frequent and can often be mobilized by many different pCONJ32,33. We speculate that pOriT with pMOB-like oriTs have an advantage in certain cases over those with pCONJ-like oriTs in that pMOB may hijack many different pCONJ. In genomes with many plasmids, the right pMOB/pCONJ combinations might not be rare and allow the transfer of the pOriT. Furthermore, if the mobilization of a pOriT and/or pMOB entails the co-transfer of the helper pCONJ, as has been suggested45, the pOriT will find in this novel host cell all the plasmids that are required for its subsequent mobility.
Independently of the reasons leading to the high frequency of the different pOriTs, their requirements for conjugation seem to shape plasmid distribution in cells. Large and small plasmids were previously found to co-occur more often than expected in bacteria46. Since large plasmids are often pCONJ and smaller ones are typically pMOB or pOriT, this fits our observations of co-occurrence of the different types of plasmids. Interestingly, pMOBs and pOriTs were particularly abundant in genomes bearing many plasmids, where the chances of finding a helper pCONJ are high. In contrast, pCONJ, which conjugate autonomously, are the most common plasmids in cells having one or a few elements. The simplest mechanism to explain these results is that these plasmids often arrive at the cell together, i.e. in the same mating event. But additional interactions may also contribute to further stabilize the presence of these plasmids in cells. For example, the cost of carrying small plasmids was lower in a Pseudomonas strain already carrying a large plasmid46.
Our results suggest that the majority of plasmids are able to conjugate autonomously or by recruitment of functions from other plasmids. Considering classical and RCR-mediated conjugation, around 90% of S. aureus plasmids have the genetic elements needed to be horizontally transferred via conjugation. Nevertheless, alternative mechanisms of plasmid mobility have been recently described. Among E. coli plasmids, 7% are phage-plasmids that can transfer within their own viral particles. In S. aureus, phage-plasmids are rare, but plasmids can be transduced by phages and their satellites47. Phages and satellites can transduce pieces of DNA of approximately the size of their own genomes. The size of the genomes of temperate phages matches the largest mode of the sizes of pMOBless and the size of the satellite genomes matches the smallest mode of these plasmids. It was proposed that plasmids were selected to have sizes compatible with transduction by phages and satellites, which would account for the bi-modal distribution of plasmid sizes (Fig 7B)47. If correct, transduction by phages and their satellites would explain the enigmatic bi-modality of plasmid sizes, while gene deletions causing the transitions from pCONJ or pMOB to pOriT would explain why the latter tend to follow the size distribution of the former.
In summary, 9 out of 10 plasmids bear identifiable genetic elements that may mediate their horizontal transfer, most of them by conjugation. Only ~10% of plasmids lack known genetic elements associated with horizontal transfer. Such plasmids may still occasionally be transferred through alternative mechanisms that leave little trace in the plasmid sequence, such as transformation or transduction. With this work, we provide strong evidence suggesting that there is no conundrum regarding plasmid mobility, and provide new insights into alternative mechanisms of plasmid transfer.
Methods
Genome data
From all the complete genomes available in the NCBI non-redundant RefSeq database in March 2021 (22,255 genomes, 21,520 plasmids), we retrieved those of the species Escherichia coli and Staphylococcus aureus. This resulted in a set of 1,585 genomes of Escherichia coli and 582 genomes of Staphylococcus aureus, including 3,409 and 462 plasmids, respectively. The accession numbers and further information on the plasmids are available in Supplementary Table 1. The information on the chromosomes and the relevant data is available in Supplementary Table 2.
Collection of the oriT database and its identification in the complete genomes
We built a collection of experimentally validated origins of transfer. First, we retrieved the 52 oriTs with the status ‘experimental’ from the oriT database published by Li and collaborators31. We expanded this collection by consulting the literature, using “oriT” as a query in the PubMed database (as available in September 2021). Among the 708 entries, we screened for experimentally validated oriTs not included in the aforementioned database. This resulted in the retrieval of 47 additional oriTs. However, 1 oriT from the published database and 7 oriTs from the literature were discarded from the collection because only the nic-site sequence was available. This resulted in a final dataset of 91 origins of transfer. Information on this collection is available in Supplementary Table 3.
We used BLAST, version 2.9.0+, to identify oriTs48. The complete genomes of E. coli and S. aureus were indexed with makeblastdb. Then, we used blastn to search for occurrences of each of the 91 oriTs (query) against the database of complete genomes. Due to the short length of the origins of transfer, blastn was used with the option -task blastn-short and an E-value threshold of 0.01 following the developer’s instructions. In cases in which two different oriTs were identified in the same region of a plasmid (overlapping), the oriT hit with the best E-value was retrieved.
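The overlap rule described above can be illustrated with a minimal Python sketch. It is not the authors' script: the `Hit` record, the function name `best_nonoverlapping`, and the greedy best-first strategy are assumptions made for illustration, and the E-value cutoff is applied here as "at most 0.01". Coordinates are assumed to be normalized so that start ≤ end (BLAST reports minus-strand hits with start > end).

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One blastn hit of an oriT query on a replicon (hypothetical record)."""
    replicon: str   # accession of the plasmid or chromosome
    start: int      # start of the hit on the replicon (start <= end assumed)
    end: int        # inclusive end of the hit on the replicon
    orit: str       # name of the query oriT
    evalue: float   # BLAST E-value of the hit


def best_nonoverlapping(hits, max_evalue=0.01):
    """Among overlapping hits on the same replicon, keep the best E-value."""
    kept = []
    significant = [h for h in hits if h.evalue <= max_evalue]
    # Visit hits from best to worst E-value, so that a hit already kept
    # always has a better (or equal) E-value than any later hit it overlaps.
    for hit in sorted(significant, key=lambda h: h.evalue):
        overlaps = any(
            k.replicon == hit.replicon
            and hit.start <= k.end
            and k.start <= hit.end
            for k in kept
        )
        if not overlaps:
            kept.append(hit)
    # Report the retained hits in genomic order.
    return sorted(kept, key=lambda h: (h.replicon, h.start))
```

With this greedy choice, a weaker hit is discarded as soon as it overlaps any retained hit; other tie-breaking rules are possible and the source does not specify one.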
During this screening, we identified an exceptional case of a ~50 kb plasmid with 23 identical oriTs. This plasmid (NZ_CP019265.1) was discarded from further analysis as we considered it to be a sequencing artifact.
Characterization of conjugative systems and relaxases and plasmid classification on the mobility
We used the module CONJscan of MacSyFinder, version 2.049 to identify all the complete MPF systems. The individual hidden Markov model (HMM) hits that were not associated with MPFs deemed complete were used to identify incomplete MPF systems.
Relaxases were identified using HMMER version 3.3.250, and the HMM profiles employed by the software MOBscan51. We used the tool hmmsearch (default options) to screen for relaxases in all the proteins annotated in the dataset and kept the 2,195 significant hits with >50% coverage on the profile. A careful analysis of the results revealed that this version of the RefSeq annotations sometimes missed genes encoding relaxases, especially when these genes overlapped others (Fig S13). To correct for this artifact, we introduced a preliminary step of re-annotation to ensure a coherent annotation of the genes throughout all the genomes, which was then used to identify the MPF and the relaxases. For the annotation, we used the software Prodigal, version 2.6.352, with the recommended mode for plasmids and viruses to identify all open reading frames. Hits were then identified as mentioned above. When two different profiles matched the same protein, we kept the one with the lowest E-value.
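The two filtering rules stated above (more than 50% coverage of the profile, and the lowest E-value when several profiles match the same protein) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the dictionary keys (`protein`, `profile`, `hmm_from`, `hmm_to`, `profile_length`, `evalue`) and the function name are hypothetical, hits are assumed to have been parsed from hmmsearch output beforehand, and the significance cutoff is assumed to have been applied already.

```python
def best_relaxase_hits(hits, min_coverage=0.5):
    """Return {protein: best hit} after the profile-coverage filter.

    Each hit is a dict with the (hypothetical) keys 'protein', 'profile',
    'hmm_from', 'hmm_to', 'profile_length' and 'evalue'.
    """
    best = {}
    for hit in hits:
        # Fraction of the HMM profile spanned by the alignment.
        aligned = hit["hmm_to"] - hit["hmm_from"] + 1
        coverage = aligned / hit["profile_length"]
        if coverage <= min_coverage:
            continue  # keep only hits covering more than 50% of the profile
        current = best.get(hit["protein"])
        # When two profiles match the same protein, keep the lowest E-value.
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["protein"]] = hit
    return best
```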
Following the previous characterization, plasmids were classified into different mobility categories depending on their composition in terms of oriT, relaxase, and MPF genes. Plasmids encoding a putatively complete MPF system (including a relaxase) were considered to be conjugative (pCONJ). Plasmids encoding relaxases and lacking a complete MPF system were classified as mobilizable (pMOB). The remaining plasmids were classified as pMOBless, and were split into different categories: pOriTs when they had an oriT, phage-plasmids (P-Ps) when they were phage-related elements (see below), or presumably non-transmissible (pNTs) otherwise. In addition, some plasmids were classified as decayed conjugative plasmids (pdCONJ). These plasmids encode two or more MPF genes, but not enough to form a complete MPF system. Therefore, pdCONJ show a close evolutionary relationship with conjugative plasmids34, but are considered pMOB, pOriT or pNT in terms of mobility (Fig S14). Similarly, the loci encoding presumably complete MPF systems in chromosomes were classed as ICE (Integrative and Conjugative Element), even though we often do not know the precise limits of the element. Chromosomal genes encoding relaxases that were distant from genes encoding MPFs (> 60 genes) were classed as IME (Integrative and Mobilizable Element).
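The classification rules for plasmids can be summarized as a small decision function. The sketch below is only an illustration of the rules as worded above: the function and argument names are hypothetical, a complete MPF system is assumed to already include a relaxase, and the precedence of pOriT over P-P for pMOBless plasmids follows the order in which the categories are listed, which is an assumption.

```python
def classify_plasmid(has_complete_mpf, has_relaxase, has_orit,
                     is_phage_plasmid=False, n_mpf_genes=0):
    """Return (mobility_category, is_decayed_conjugative) for one plasmid."""
    if has_complete_mpf:
        # Complete MPF system (assumed to include a relaxase): conjugative.
        mobility = "pCONJ"
    elif has_relaxase:
        # Relaxase without a complete MPF system: mobilizable.
        mobility = "pMOB"
    elif has_orit:
        # pMOBless plasmid carrying an oriT.
        mobility = "pOriT"
    elif is_phage_plasmid:
        # pMOBless phage-related element.
        mobility = "P-P"
    else:
        # Presumably non-transmissible.
        mobility = "pNT"
    # pdCONJ: two or more MPF genes, but no complete MPF system.
    is_decayed = (not has_complete_mpf) and n_mpf_genes >= 2
    return mobility, is_decayed
```

Returning the pdCONJ flag separately reflects the statement that pdCONJ are an evolutionary label layered on top of the mobility category rather than a category of their own.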
Identification of Rolling Circle Replication Proteins
For the identification of Rolling Circle Replication (RC-Rep) proteins involved in plasmid conjugation, we first retrieved the RC-Rep of the Staphylococcus aureus plasmid pC194 (NC_002013.1), a pNT plasmid known to be mobilized through in trans conjugation36. We used its Pfam profile53, Rep_1 (PF01446), to look for related RC-Rep proteins in all the plasmids of E. coli and S. aureus using the HMMER tool hmmsearch (default options, E-value < 0.001), version 3.3.250.
Identification of phage-plasmids
For the identification of phage-plasmids (P-Ps), we retrieved the E. coli and S. aureus P-Ps recently identified54. The database used in the cited work corresponds to the same RefSeq database (retrieved in March 2021). In this way, we were able to identify 222 P-Ps among the 3,409 E. coli plasmids and 1 P-P among the 482 S. aureus plasmids.
Analysis of the pangenome of E. coli and S. aureus plasmids
The pangenome of the plasmid-encoded genes of E. coli and S. aureus was identified using the module pangenome of the software PanACoTa, version 1.3.155. Briefly, gene families were built with MMseqs2, version 13.45111, with an identity threshold of 80%. This is the typical threshold for the determination of the E. coli pangenome30. In this way, the 227,428 plasmid-encoded proteins in E. coli were grouped into 11,530 gene families. In S. aureus, the 7,902 proteins were grouped into 1,010 gene families. Some plasmids were not used in the analysis because their annotations lacked protein-coding genes: 32 of the 3,409 plasmids in E. coli (0.94%) and 20 of the 482 in S. aureus (4.15%). Rarefaction curves were computed with the R package vegan, version 2.5-657. The latter package was additionally employed to extrapolate the plasmid pangenome of S. aureus to the same sample size as E. coli following an Arrhenius model. Additionally, the Gleason model and Gitay model were used to extrapolate the rarefaction curves of the pangenome for S. aureus (Fig S15). Rarefaction curves were plotted with sample sizes increasing by a step of 100 plasmids.
Determination of sequence similarity between plasmids
We assessed sequence similarity for all pairs of the 3,869 plasmids using two different approaches.
To analyze very closely related plasmids, we classified them based on their average nucleotide identity (ANI) into the existing catalogue of Plasmid Taxonomic Units (PTUs)27. The clustering was performed using COPLA58, version 1.0 (default parameters).
To analyze more distantly related plasmids, we assessed the gene relatedness within and between PTUs, using the weighted Gene Repertoire Relatedness (wGRR)59. For this, we searched for sequence similarity between all the proteins identified in the plasmids using MMseqs2 (version 9-d36de)56, retrieving the hits with E-value < 10^-4 and coverage > 50%. Best bi-directional hits (BBH) between pairs of plasmids were used to calculate the wGRR as previously described59:

$$\mathrm{wGRR}(A, B) = \frac{\sum_{i=1}^{P} \mathrm{id}(A_i, B_i)}{\min(A, B)}$$

where A_i and B_i are the ith BBH pair of P total pairs; id(A_i, B_i) is the identity between the BBH pair; and min(A, B) is the number of genes encoded in the smallest plasmid of the pair. Thus, the wGRR value varies between 0 (no BBH between the plasmids) and 1 (all genes of the smallest plasmid have an identical homolog in the larger one). The wGRR values were used to identify related plasmids between and within PTUs, setting the threshold at wGRR > 0.75 as previously described34. For this purpose, only plasmid pairs with wGRR > 0.75 were retrieved for visualizations, i.e. at least 75% of the genes encoded in the smallest plasmid are shared between the pair.
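As a worked illustration of the wGRR definition (the sum of the identities of the BBH pairs divided by the number of genes in the smallest plasmid of the pair), a minimal Python sketch follows. The function name and its arguments are hypothetical, and identities are assumed to be expressed as fractions between 0 and 1 rather than percentages.

```python
def wgrr(bbh_identities, n_genes_a, n_genes_b):
    """Weighted Gene Repertoire Relatedness between two plasmids.

    bbh_identities: identities (0 to 1) of the best bi-directional hit pairs.
    n_genes_a, n_genes_b: number of genes encoded in each plasmid.
    """
    smallest = min(n_genes_a, n_genes_b)
    if smallest <= 0:
        raise ValueError("both plasmids must encode at least one gene")
    if len(bbh_identities) > smallest:
        # Each gene of the smallest plasmid takes part in at most one BBH.
        raise ValueError("more BBH pairs than genes in the smallest plasmid")
    return sum(bbh_identities) / smallest


def are_related(bbh_identities, n_genes_a, n_genes_b, threshold=0.75):
    """Apply the wGRR > 0.75 threshold used to call two plasmids related."""
    return wgrr(bbh_identities, n_genes_a, n_genes_b) > threshold
```

For instance, a 4-gene plasmid sharing three BBH with a 10-gene plasmid, with identities 1.0, 0.9 and 0.8, gives wGRR = 2.7 / 4 = 0.675, which falls below the 0.75 threshold.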
Clustering of the oriTs
We clustered the oriTs in families, by searching for sequence similarity between all pairs of oriTs in the reference dataset using blastn48 (Fig S16). BLAST was used with the option -task blastn-short and an E-value threshold of 0.01. Only matches with >80% identity and >70% coverage of the smallest oriT were kept for the clustering analysis. The clustering was performed with the hierarchical method available in the R package pheatmap, version 1.0.12 (default options)60. The clusters were named after well-known oriTs contained in the cluster: F-like, R6K-like, R64-like, ColE1-like, RP4-like and R46-like. The association of each oriT to their oriT family is available in the Supplementary Table 3 and Supplementary Figure 16.
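The pair filter described above (>80% identity and >70% coverage of the smallest oriT) can be sketched in Python as follows. Note the substitution: the hierarchical clustering of the source was done with pheatmap in R, whereas this sketch simply groups oriTs connected by at least one retained match (connected components), which is a simplification and not the authors' method. All names are hypothetical, and coverage is approximated here as alignment length divided by the length of the smallest oriT.

```python
def passes_filter(identity, aln_len, len_a, len_b,
                  min_identity=80.0, min_coverage=0.70):
    """Keep a match with >80% identity and >70% coverage of the smallest oriT."""
    coverage = aln_len / min(len_a, len_b)
    return identity > min_identity and coverage > min_coverage


def group_orits(lengths, matches):
    """Group oriTs linked by retained matches (connected components).

    lengths: dict mapping oriT name to its length in bp.
    matches: iterable of (orit_a, orit_b, percent_identity, alignment_length).
    """
    parent = {name: name for name in lengths}

    def find(x):
        # Find the representative of x, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, aln_len in matches:
        if a != b and passes_filter(identity, aln_len, lengths[a], lengths[b]):
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in lengths:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())
```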
Determination of antimicrobial resistance genes
For the identification of antimicrobial resistance genes encoded in the plasmid dataset, we used AMRFinderPlus61, version 3.10, with the default options. This tool combines BLASTP and HMMER to identify the 6,189 resistance determinants available in the NCBI Pathogen Detection Reference Gene Catalog (April 2022). The latter results from the curated merging of several widely used databases, including CARD62 and ResFinder63, among others61.
Statistical analysis
Except where explicitly stated, all statistical analyses were done with R, version 3.5.2. Additionally, all visualizations were performed with the R package ggplot264, version 3.3.5, occasionally supported by the R packages ggsignif65, version 0.6.0 and ggridges66, version 0.5.3. For the construction and visualization of the networks, we used the R package igraph67, version 1.2.4.1 and the software Gephi 0.9.268, respectively.
Competing interests
The authors declare no competing interests.
Acknowledgements
We would like to thank Eugen Pfeifer for providing the wGRR and PTU data, and Fernando de la Cruz and Maria Pilar Garcillán Barcia for discussions over the years on plasmid mobility. We also thank the Microbial Evolutionary Genomics Unit for scientific discussions. This work was supported by the INCEPTION project [PIA/ANR-16-CONV-0005], the Fédération pour la Recherche Médicale [Equipe FRM/EQU201903007835], the Labex IBEID [ANR-10-LABX-62-IBEID], and HORIZON-MSCA-2021-PF-01-01 EvoPlas-101062386 to Manuel Ares-Arroyo.