Mosaic Plasmids are Abundant and Unevenly Distributed Across Prokaryotic Taxa

Mosaic plasmids, plasmids composed of genetic elements from distinct sources, are associated with the spread of antibiotic resistance genes. Transposons are considered the primary mechanism for mosaic plasmid formation, though other mechanisms have been observed in specific instances. The frequency with which mosaic plasmids have been described suggests they may play an important role in plasmid population dynamics. Our survey of the confirmed plasmid sequences available from complete and draft genomes in the RefSeq database shows that 46% of them fit a strict definition of mosaic. Mosaic plasmids are also not evenly distributed over the taxa represented in the database. Plasmids from some genera, including Piscirickettsia and Yersinia, are almost all mosaic, while plasmids from other genera, including Borrelia, are rarely mosaic. While some mosaic plasmids share identical regions with hundreds of others, the median mosaic plasmid shares identical regions with only 8 other plasmids. When considering only plasmids from finished genomes (51.6% of the total), mosaic plasmids have significantly higher proportions of transposases and antibiotic resistance genes. However, only 56.6% of mosaic fragments (DNA fragments shared between mosaic plasmids) contain a recognizable transposase, and only 1.2% of mosaic fragments are flanked by inverted repeats. Mosaic fragments associated with the IS26 transposase are 3.8-fold more abundant than any other sequence shared between mosaic plasmids in the database, though this is at least partly due to overrepresentation of Enterobacteriaceae plasmids. Mosaic plasmids are a complex trait of some plasmid populations, only partly explained by transposition. Though antibiotic resistance genes led to the identification of many mosaic plasmids, mosaic plasmids are a broad phenomenon encompassing many more traits than antibiotic resistance alone.
Further research will be required to determine the influence of ecology, host repair mechanisms, conjugation, and plasmid host range on the formation and influence of mosaic plasmids.

Author Summary

Plasmids are extrachromosomal genetic entities that are found in many prokaryotes. They serve as flexible storage for genes, and individual cells can make substantial changes to their characteristics by acquiring, losing, or modifying a plasmid. In some pathogenic bacteria, such as Escherichia coli, antibiotic resistance genes are known to spread primarily on plasmids. By analyzing a database of 8,592 plasmid sequences, we determined that many of these plasmids have exchanged genes with each other, becoming mosaics of genes from different sources. We next separated these plasmids into groups based on the organism they were isolated from and found that different groups had different fractions of mosaic plasmids. This result was unexpected and suggests that the mechanisms and selective pressures causing mosaic plasmids do not act evenly across all species. It also suggests that plasmids may provide different levels of potential variation to different species. This work uncovers a previously unrecognized pattern in plasmids across prokaryotes that could lead to new insights into the evolutionary role that plasmids play.

functional knowledge of the genes involved to determine the level of identity necessary to declare genes homologous. For these reasons, synteny is not well suited to large studies across highly diverse sets of plasmids.

Parametric measures of gene composition can be applied on a large scale and are less affected by database biases. They are, however, prone to error, since GC content and dinucleotide profiles vary even among genes that have been vertically inherited for many generations. This is particularly true for comparisons involving smaller genetic elements (since fewer genes are available to establish a proper baseline), which increases the false-positive rate. Parametric methods also suffer from false negatives, missing mosaic events in which plasmids from similar genetic backgrounds have exchanged DNA.
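To make the baseline problem concrete, here is a minimal sketch of one parametric measure, GC content, scanned in non-overlapping windows. The window size, the deviation threshold, and the function names are illustrative assumptions, not a screen used in this study; the point is only that the baseline is computed from the same sequence being scanned.

```python
def gc_fraction(seq):
    """Return the fraction of G/C bases in seq (case-insensitive); 0.0 if empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_windows(seq, window=1000, threshold=0.1):
    """Return (start, gc) for each non-overlapping window whose GC fraction
    differs from the whole-sequence baseline by more than threshold.

    Illustrative assumptions: fixed window size, absolute-difference threshold,
    and a baseline taken from the sequence itself."""
    baseline = gc_fraction(seq)
    flagged = []
    # Trailing bases that do not fill a complete window are ignored.
    for start in range(0, len(seq) - window + 1, window):
        gc = gc_fraction(seq[start:start + window])
        if abs(gc - baseline) > threshold:
            flagged.append((start, gc))
    return flagged


toy = "ATGC" * 750 + "GGCC" * 250  # three 50%-GC windows, one 100%-GC window
print(flag_atypical_windows(toy, window=1000, threshold=0.2))
```

In this toy input the GC-rich final window pulls the baseline from 0.5 up to 0.625. With only four windows, a single atypical region noticeably distorts the reference it is compared against, which is the small-element problem described above.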

Sequence identity can be specific enough to confidently predict that a DNA sequence has transferred between genetic contexts (15), but it can only identify transfer events that happened recently (i.e., with few or no subsequent point mutations). Sequence identity comparisons thus have low false-positive rates but high false-negative rates, and are most effective at identifying transfer in populations where DNA transfer is frequent relative to the base mutation rate.
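The trade-off can be quantified with a back-of-the-envelope model. Assuming, as a simplification, that each site diverges independently with the same per-site probability after a transfer event, the chance that a region is still 100% identical falls off geometrically with its length. The function name and the independence assumption are illustrative, not part of this study's methods.

```python
def prob_fully_identical(length_bp, divergence_per_site):
    """Probability that two copies of a region of length_bp show no differences,
    assuming each site differs independently with probability divergence_per_site."""
    return (1.0 - divergence_per_site) ** length_bp


# 500 bp is the minimum mosaic-fragment length used in this study.
for d in (0.001, 0.01):
    print(d, prob_fully_identical(500, d))
```

Under this model, a 500 bp region remains fully identical about 61% of the time at 0.1% per-site divergence, but less than 1% of the time at 1% divergence, which is why identity-based detection is biased toward recent transfers.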

In the research described below, we use a quantitative, sequence-identity-based definition of 'mosaic' to study mosaic plasmids across 8,592 plasmid sequences from the RefSeq plasmid database maintained by the National Center for Biotechnology Information (hereafter referred to as the NCBI plasmid database). Using this conservative, universal definition, we compare the fraction of plasmids that are mosaic across different plasmid populations. By extending the concept of mosaic plasmids to the population level, we can begin to ask questions about the forces that favor or disfavor mosaic plasmids, and about the implications of mosaic plasmids for important prokaryotic traits such as the spread of antibiotic resistance. Our initial hypothesis was that mosaic plasmids would be found across many prokaryotic clades, both because of the fundamental mechanisms involved and because they have previously been observed in species from the Enterobacteriaceae (5, 7), Rhizobium (12), and Borrelia (1) clades. We also hypothesized that mosaic plasmids would be most abundant in the Enterobacteriaceae, where mosaic plasmids have been most frequently observed and contribute to high levels of antibiotic resistance (6).

Abundance of mosaic plasmids. For our analyses, we used all 8,592 finished or draft prokaryotic plasmid sequences in the NCBI plasmid database (Table 1), except where noted otherwise. We define a plasmid as mosaic if it contains a region of 500 bp or longer with 100% identity to the sequence of another plasmid (called a mosaic fragment), and if those two plasmids have less than 93.90% global sequence identity (see Methods section for details). For the purposes of this study, we will refer to plasmids that pass our criteria as "mosaic" and those that do not pass as "non-mosaic".  Figure 1A, only mosaic fragments over 5 kb are shown). Most mosaic plasmids share identity with other plasmids over less than 30% of their full length (median = 26.93%), though over 600 mosaic plasmids share identity over more than 95% of their length (Figure 1B). Of the links between mosaic plasmids, 82.71% are between plasmids isolated from different species (Figure 1C). Each mosaic plasmid can have between 1 and several thousand links (median = 12, Figure 1D Figure 2A) and many had exchanged mosaic fragments with plasmids from other incompatibility groups (Figure 2B).
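The mosaic criterion can be sketched as a filter over precomputed pairwise comparisons. Only the two thresholds (500 bp and 93.90%) come from the definition above; the SharedRegion record, the function names, and the assumption that shared regions and global identities have already been produced by an alignment step are hypothetical. The helper fraction_covered illustrates one way to compute the fraction of a plasmid's length that is shared, merging overlapping regions so that no base is counted twice.

```python
from dataclasses import dataclass

# Thresholds taken from the mosaic definition in the text.
MIN_FRAGMENT_BP = 500        # shared region must be at least this long (bp)
MAX_GLOBAL_IDENTITY = 93.90  # pair must have global identity strictly below this (%)


@dataclass(frozen=True)
class SharedRegion:
    """Hypothetical input record: a 100%-identical region shared by two plasmids.
    Coordinates are 0-based, end-exclusive, on plasmid_a (treated as linear)."""
    plasmid_a: str
    plasmid_b: str
    start: int
    end: int


def mosaic_links(regions, global_identity):
    """Return the set of plasmid pairs (sorted tuples) that meet both criteria.
    global_identity maps frozenset({a, b}) to percent global identity."""
    links = set()
    for r in regions:
        if r.end - r.start < MIN_FRAGMENT_BP:
            continue  # too short to count as a mosaic fragment
        pair = frozenset((r.plasmid_a, r.plasmid_b))
        if global_identity[pair] < MAX_GLOBAL_IDENTITY:
            links.add(tuple(sorted(pair)))
    return links


def is_mosaic(plasmid, links):
    """A plasmid is mosaic if it takes part in at least one qualifying link."""
    return any(plasmid in pair for pair in links)


def fraction_covered(intervals, length):
    """Fraction of a plasmid of the given length covered by (start, end)
    intervals, merging overlapping intervals first."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length
```

Because the pair-level identity check excludes near-identical plasmids, two closely related plasmids sharing a long identical region do not by themselves make either one mosaic.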

We again created a network to visualize the sharing of large (> 5 kbp) mosaic fragments between plasmids from the same or different incompatibility groups (Figure 2C). This visualization shows that some of the mosaic fragments shared over  We identified resistance genes using the Resfams HMM database (19), and calculated the proportion of genes on each plasmid that confer antibiotic resistance, again restricting the analysis to plasmid sequences containing at least ten genes (Figure 4B). We found that mosaic plasmids (median = 0.0% of genes) contain a significantly higher proportion of antibiotic resistance genes than non-mosaic plasmids (median = 0.0% of genes), with p = 6 × 10^-186 (two-sided Mann-Whitney U test), supporting our hypothesis.
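Both medians are 0.0%, which can look contradictory. The Mann-Whitney U test compares the full rank distributions of the two groups rather than their medians, so it can detect that mosaic plasmids fall more often in the nonzero tail even when most plasmids in both groups carry no resistance genes. The sketch below is a self-contained version using the normal approximation with a tie-corrected variance; the (n_genes, n_resistance_genes) input format and the function names are hypothetical, and the text does not state whether exact or asymptotic p-values were used. In practice, an existing implementation such as scipy.stats.mannwhitneyu would likely be preferred.

```python
import math


def resistance_proportions(plasmids, min_genes=10):
    """Hypothetical input: (n_genes, n_resistance_genes) per plasmid.
    Returns the resistance-gene proportion for plasmids with >= min_genes genes."""
    return [n_res / n_genes for n_genes, n_res in plasmids if n_genes >= min_genes]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected variance,
    no continuity correction). Assumes both samples are non-empty.
    Returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values pooled[i..j]; each gets the average rank.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all values tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))  # two-sided p from |z|
```

Ties matter here: when most proportions are exactly zero, the tie correction shrinks the variance of U, and omitting it would make the test conservative.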

Analysis of highly shared mosaic fragments. We next turned to the identity of the mosaic fragments shared between plasmids, starting by counting the occurrences of each mosaic fragment in the full NCBI plasmid database. For this analysis, mosaic fragments must be longer than 500 bp and present in more than one plasmid. To focus on the mosaic fragments that are shared most frequently, in cases where one mosaic fragment was a subsequence of another, the larger mosaic fragment was reduced to the subsequence and combined with the smaller mosaic fragment. The identified fragments range in size from 501 bp to 55,524 bp, with a median length of 1,146 bp. We enumerated the occurrences of each mosaic fragment in the NCBI plasmid database, counting each location separately when a mosaic fragment is present multiple times on the same plasmid (Figure 5A). The clear majority of mosaic fragments occur in fewer than 250 locations in the NCBI plasmid database, but there are a few highly abundant outliers (Table S2). Only 35.22% of mosaic fragments contain any part of a functional transposase sequence, and these fragments have an occurrence distribution similar to that of the mosaic fragments without transposases (Figure 5B). This result may be an artifact of our requirement that sequences be 100% identical to be considered the same mosaic fragment; for instance, if a mosaic fragment moves as part of a transposon and the transposase acquires one or more point mutations while the adjacent sequences do not, the mutated transposase would not be considered part of the same mosaic fragment. If we relax the definition to allow 99%-identical sequences to be considered mosaic fragments, the percentage including some transposase sequence increases only slightly, to 35.69% (Figure S2).
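The two preprocessing steps described above, collapsing nested fragments and counting occurrences, can be sketched with plain string operations. This is a minimal illustration under stated assumptions, not the study's pipeline: fragments are exact sequences, only the given strand is searched (a real analysis would probably also search the reverse complement), plasmids are treated as linear, and overlapping occurrences are counted. The function names are hypothetical, and a naive substring scan like this would be far too slow for a database of this size, where an indexed search would be needed.

```python
def collapse_nested(fragments, min_length=501):
    """Keep fragments longer than 500 bp (by default), then represent any
    fragment that contains a shorter kept fragment by that shorter fragment.
    Returns the representatives as a sorted list."""
    candidates = sorted({f for f in fragments if len(f) >= min_length}, key=len)
    reps = []
    for frag in candidates:
        # Shorter fragments are processed first, so a kept fragment found
        # inside frag absorbs it (frag is reduced to that subsequence).
        if not any(rep in frag for rep in reps):
            reps.append(frag)
    return sorted(reps)


def count_occurrences(fragment, plasmid_seqs):
    """Count every (possibly overlapping) occurrence of fragment in each
    plasmid, so repeated copies on the same plasmid are counted separately."""
    total = 0
    for seq in plasmid_seqs:
        pos = seq.find(fragment)
        while pos != -1:
            total += 1
            pos = seq.find(fragment, pos + 1)
    return total
```

Collapsing toward the shortest shared subsequence pools counts from every larger fragment that contains it, which is what pushes the most widely shared cores, such as transposase-containing segments, to the top of the occurrence ranking.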
Examining specific sequences revealed that the six mosaic fragments with between 1,300 and 1,400 occurrences all contain part of the IS26 transposase (in one fragment, the transposase is annotated as incomplete). These mosaic fragments often co-occur, but each occurs separately in at least one location in the NCBI plasmid database (information on the genetic context of these fragments is given in Table S2).