Evolutionary arms-races between host and parasitic RNA replicators in an artificial cell-like system

A crucial problem for primitive replicators before the origin of life is the appearance of parasitic or selfish replicators, which destabilize molecular cooperation and prevent the development of complexity. To date, theoretical and experimental studies have indicated that spatial structures, such as cell-like compartments, support sustainable replication of primitive replicators even in the presence of parasites. However, it is still a mystery how these host and parasitic replicators can evolve when they undergo long-term co-replication. Here, we investigated the coevolutionary process between an artificial RNA replicator (host RNA) and spontaneously appearing parasitic replicator (parasitic RNA) in artificial cell-like compartments. We performed a long-term replication experiment and found that the population dynamics of the host and parasitic RNAs gradually changed and novel types of hosts and parasites continued to appear. Competitive replication assays confirmed that parasite-resistant evolution of the host RNA and counter-adaptive evolution of the parasitic RNA occurred one after another. These results demonstrated that evolutionary arms races occur in this simple molecular system and generate a continuously-evolving molecular ecosystem. This coevolutionary process between host and parasitic molecules might play an important role in the open-ended evolution and development of complexity for primitive self-replication systems.


Introduction
Parasitic entities are ubiquitous in the biosphere. Viruses, for example, are the most prosperous biological replicators outnumbering cellular organisms and imposing ceaseless selection pressures on host organisms [1][2][3][4]. The competition between parasite infectivity and host resistance, often referred to as "arms race", can accelerate evolution and even allow the development of a sophisticated biological system, such as adaptive immunity [5][6][7]. Accumulating evidence highlights the key roles of parasites, including mobile genetic elements, plasmids, and viruses, in diversification of life and the major evolutionary transitions [8][9][10][11][12][13][14].
Understanding evolutionary processes between host and parasite is also crucial to understand the origin of life [15][16][17][18] and to construct artificial cells capable of evolution [19][20][21]. The emergence of parasitic replicators are believed to be inevitable and crucial in an evolving replicator system [6] as experimentally demonstrated in an artificial RNA replication system we have reported [22]. Theoretical studies have predicted that spatial structures, such as cell-like compartments, can allow survival of host replicators in the presence of parasites [23][24][25][26]. Recent experimental studies using the RNA replication system confirmed this; compartmentalization actually supported the survival of functional RNA replicators in the presence of spontaneously emerging parasitic RNA replicators [27][28][29]. Although these theoretical and experimental advances are revealing how simple molecular replicators survive under the threats of parasites, their coevolutionary process still remains to be elucidated.
In a previous study, we constructed an RNA replication system consisting of an artificial genomic RNA, which encodes the catalytic subunit of an RNA replicase originated from bacteriophage Q, and a reconstituted translation system of Escherichia coli encapsulated in water-in-oil droplets. In this system, the artificial genomic RNA (host RNA) replicates through the translation of the self-encoded replicase subunit. During replication, a deletion mutant of the host RNA (parasitic RNA), which lost the encoded replicase subunit gene, spontaneously appears and replicates by freeriding the replicase provided from the host RNA. In this replication system, host and parasitic RNAs compete for the host-provided RNA replicase in water-in-oil droplets and undergo repeated error-prone replication and natural selection processes [30]. In another study, we reported that the host and parasitic RNA replicators showed oscillating population dynamics, and a new type of the host RNA acquiring a certain level of parasite-resistance appeared through 43 rounds of replication experiment [27]. However, we have not observed the host-parasite coevolutionary process, e.g. counter-adaptative evolution of the parasitic RNA., in that short period of time.
In this study, to investigate co-evolutionary dynamics of the host and parasitic RNA replicators, we extended the replication experiment from 43 to 120 rounds. Oscillating population dynamics of the host and parasitic RNA replicators changed gradually, and novel types of host and parasitic RNAs continue to appear during the experiment. Competitive replication assays confirmed that parasite-resistant evolution of the host RNA and the counter-adaptive evolution of the parasitic RNAs occurred one after another, i.e. evolutionary arms-races occurred even in this simple molecular replication system. replicase by associating with EF-Tu and EF-Ts subunits in the translation system, while the parasitic RNA lacks the intact gene. The host RNA replicates using the self-provided replicase, whereas the parasitic RNA replicates relying on the host-provided replicase.
In this system, parasitic RNAs spontaneously emerges from the host RNA by deleting the internal replicase gene plausibly through nonhomologous recombination [27,32]. The parasitic RNAs reported previously have similar sizes (~200 nt). We refer to this size of parasitic RNA species as "parasite-α". The parasite-α replicates much faster than the original host RNA (~2040 nt) due to its smaller size, and thus inhibits the host replication through competition for the replicase. The replication with Qβ replicase is error-prone, about 1.0×10 -5 per base [33], and mutations are randomly introduced into the host and parasitic RNAs during the replication reaction.
Long-term replication experiment. In the previous study and also in this study, we performed a long-term replication experiment of the host and parasitic RNAs. The replication reaction was performed in water-in-oil droplets through repeating a fusion-division cycle with the supply of new droplets containing the translation system (Fig.1B). The single round of the experiment consists of four steps: 1) incubation, 2) partial removal, 3) dilution, and 4) mixing. 1) In the incubation process, the water-in-oil droplets were incubated at 37 °C for 5 h to induce the internal translation and RNA replication reactions. Note that we started with 1 nM of the host RNA without the parasite-α, but the parasite-α was detected within two rounds in most of the cases. 2) In the partial removal process, we randomly removed 80% of the water-in-oil droplets and 3) substituted with new water-in-oil droplets containing the cell-free translation system in the dilution process (i.e., 5-fold dilution). 4) In the mixing process, droplets were vigorously mixed with a homogenizer to induce random fusion and division among droplets to allow the mixing of RNAs and other components among droplets. We repeated this cycle for 43 rounds (215 h) in the previous study [27] and continued for additional 77 rounds (385 h) in this study.
Population dynamics of the host and parasitic RNAs. We measured the concentrations of the host and parasitic RNAs after every incubation process ( Fig.2A). The host RNA was measured using quantitative PCR after reverse transcription (RT-qPCR) and the parasitic RNA was measured from the band intensity after polyacrylamide gel electrophoresis (Supplementary Figure 1). In some rounds (7)(8)(9)(10)(11)(12)(16)(17)(18)(19)(20)(21)(22), and 75-84), the parasitic RNAs were under the detection limit (less than ~30 nM). Note that, in the previous study [27], we measured the parasitic RNA concentration from the replication kinetics using a purified Q replicase and the detection limit was lower, but this method cannot be employed here because it cannot distinguish different types of parasitic RNAs appeared in this study.
The population dynamics of the host and parasitic RNAs gradually changed throughout rounds ( Fig.2A). In the early stage (round 1 to ~35), the host RNA and the parasite-α showed relatively regular oscillation pattern caused by competition between the host and parasitic RNAs in compartments as reported previously [27]. In this regime, the concentrations of the parasite-α were higher than those of the host RNA where detected. In the middle stage (round ~35 to ~75), the concentrations of the host RNA increased and the oscillation pattern became irregular, suggesting that the host RNA acquired parasite resistance, partially confirmed in the previous study [27]. In the later stage (round ~95 to ~116), the concentrations of the host and the parasite-α further increased and the oscillation pattern became further unclear. In this regime, we observed the appearance of new parasitic RNA species of different sizes and classified them as parasites-β (~1000 nt, green squares) and -γ (~500 nt, purple diamonds) according to their sizes. We termed these new RNAs "parasites" because each clone of these RNAs does not replicate alone (Supplementary Figure 2). These continuously changing population dynamics can be caused by successive adaptation processes between host and parasitic RNA species.
Sequence analysis revealed that four major RNA species of different sizes existed in the long-term replication experiment, consistent with the band positions observed in the polyacrylamide gels: the host (~2040 nt), the parasite-α (~220 nt), the parasite-β (~1070 nt) and the parasite-γ (~510 nt). The sequences of all the parasitic RNA species shared a high degree of similarity with those of host RNAs but lacked a large part of the replicase subunit gene (Fig.2B). The parasite-α lacks the entire gene. The parasite-β lacks approximately 3' half of the gene and the parasite- further lost approximately 1/4 of the remaining 5' region of the gene. Both the parasite-β and -γ retain a part of the M-site sequence, one of the recognition sites for Qβ replicase, in the middle of the gene [34][35].
All four RNA species (host, parasite-α, -β, and -γ) contain various mutations that accumulated during the long-term replication experiment. We next investigated dominant genotypes of each RNA species at each round. Although the RNAs replication by Q replicase is error-prone and introduces many random mutations to produce quasi-species for each genotype, we focused on the consensus sequences that consist of mutations commonly found in multiple RNAs. We first identified 74 dominant (i.e., common) mutations that have accumulated more than 10% of the population in any sequenced rounds in any RNA species, including 60 base substitutions, 4 insertions, and 10 deletions in total (Supplementary Table 2). Next, we measured the frequencies of all genotypes composed of the combination of these 74 dominant mutations in each round for each RNA species. All the genotypes and their frequencies are shown in the supplementary data (Genotype frequency.xlsx).
We next investigated the relationships of the detected genotypes. To visualize evolutionary trajectories, we calculated Hamming distances between all combinations of the top-90 genotypes of all the host and parasitic RNA species in each round, and then plotted them in a single two-dimensional (2D) map by using Principle Coordinate Analysis (Fig. 2C-E). Note that we assigned zero distance for the large deletions between host and parasitic RNAs. A point represents each genotype, and the colors of points represent the rounds of their appearance, consistent with the colors of the markers in Fig. 2A The population of the parasite-α represented a distinct cluster from host RNA populations (Fig.2D) and most of the genotypes are connected with lines (i.e., one Hamming distance apart). The parasite-α did not show clear directionality throughout the long-term replication experiment (round-by-round data is shown in Supplementary  Figure 3). We also identified 18 unique mutations specific to the parasite-α (Supplementary Table 2), not found in the corresponding region of the host sequence, indicating that most of the new parasite-α genotypes were not newly generated from evolving host RNAs but the parasite-α underwent independent evolution.
The populations of the parasite-β and -γ formed distinct clusters (Fig. 2E) and most of the genotypes were closely related (connected with one Hamming distant lines) within each species. Sequences of some parasite-β and -γ were perfectly matched with some dominant host RNAs in the same rounds as connected with broken blue lines, suggesting that these parasitic RNAs originated from these host RNAs. This sequence similarity of the parasite-β and -γ to the coexisting host RNAs suggest that newly emerged parasite species possess similar phenotypic traits of evolved host species.
Competitive replication assay of the host and parasitic RNAs. The continuous appearance of new host genotypes and new parasite species can be a consequence of the coevolution between the hosts and the parasites to adapt to each other. To test this possibility, we performed a series of competitive replication assays by using three representative host and parasitic RNAs. As the representatives, we chose the most dominant genotypes at rounds 0, 99, and 115 for host RNA (Host-0, Host-99, and Host-115, respectively). For the parasite-α, -β, andγ, we chose the most dominant genotype for each at round 13, 99, and 115 (Parasite-α13, Parasite-β99, Parasite-γ115), respectively (sequences are available in Supplementary Text). We mixed a pair of the host and parasitic RNAs according to their order of appearance at an equivalent molar and performed competitive replication. The concentrations of replicated RNAs were measured by sequence-specific RT-qPCR (Fig. 3A). In the first pair (Host-0 vs Parasite-α13), Host-0 hardly replicated (less than 2-fold) and Parasite-α13 predominantly replicated (~200-fold), indicating that Parasite-α13 severely inhibits the original host replication, whereas in the second pair (Host-99 vs Parasite-α13), Host-99, efficiently replicated (~700-fold) with negligible replication of Parasite-α13, indicating that Host-99 acquired the resistance to Parasite-α13. In the third pair (Host-99 vs Parasite-β99), Host-99 still replicated efficiently (~1000-fold) but Parasite-β99, also replicated up to approximately 20-fold, indicating that Parasite-β99 acquired the ability to co-replicate with Host-99. In the fourth pair (Host-115 vs Parasite-β99), Host-115 repressed the replication of Parasite-β99 to less than 2-fold, indicating that Host-115 acquired the ability to evade co-replication of Parasite-β99. In the final pair (Host-115 vs Parasite-γ115), Parasite-γ115 again acquired the ability to replicate up to approximately 20-fold with Host-115. These results demonstrated that successive counter-adaptive evolution (i.e., evolutionary arms-races) occurred among the host and parasitic RNAs as schematically illustrated in Fig. 3B.

Discussion
The emergence of parasitic replicators is a large obstacle for a simple molecular replication system in the early stages of life forms. Here, we investigated the coevolutionary process of simple host and parasitic RNA replicators for the first time and found that the new types of host and parasitic RNAs continuously appeared one after another to produce diversified molecular ecosystem through evolutionary arms-races. This process stands in sharp contrast to the previous unidirectional and rapidly slowing down evolution in the absence of parasitic RNAs [30]. The results of this study implies that the existence of evolving parasites provides dynamically changing selection pressure and allows the host RNAs to continuously evolve and diversify, consistent with the consequence of coevolution between natural host and parasite organisms [8][9][10][11][12][13][14] and also with the theoretical study of primitive replicators [36]. This study demonstrated that even a simple molecular replicator has the ability to evolve continuously and diversify through coevolutionary processes with parasitic replicators. Coevolution and evolutionary arms-races with parasitic entities might play a key role in open-ended evolution of primitive self-replicators before the origin of life.
The molecular mechanism for the resistance of the evolved host RNAs to the parasitic RNAs and the following adaptation by parasites (Fig. 3) still remains unclear. One of the possible mechanisms is the change of sequence specificity of the encoded replicase. We previously reported that the replicase encoded in a host RNA at round 43 does not replicate a parasite-α [27] and here we observed that Host-99 and Host-115 with a different set of non-synonymous mutations (Supplementary Table 3) exhibited distinct parasite resistance in the competitive replication assays (Fig.3). The parasite resistance possibly comes from the specificity of the evolved replicases to additional recognition sites only existing in host sequences, which might explain why new parasitic RNAs (parasites-β and -γ) harbor a part of the host sequences. A probable recognition site is a branched stemloop in the M-site sequence, an important but not indispensable recognition site for Qβ replicase [34][35] and included both in parasite-β and -γ sequences (Fig. 2B). Although further investigation is required, these parasites-β and -γ were able to adapt to the parasite-α-resistant host RNA by acquiring the important part of the M-site from the evolved host sequence.
Coevolution with parasitic entities has been considered as one of the fundamental processes to allow the highly diverse living world. To understand coevolutionary processes in nature, many theoretical and experimental studies have been reported (reviewed in [5,[37][38][39][40]). A famous phenomenon in host-parasite coevolution is Red Queen dynamics, in which host and parasite populations oscillate due to persistent replacement of dominant hosts and parasites [41][42][43]. Similar oscillation dynamics were observed in our experiment ( Fig. 2A) with a remarkable feature of damped-oscillation like behavior. A study on Daphnia and its parasite [44] also reported damped long-term host-parasite Red Queen coevolutionary dynamics and suggested that the increased host diversity as a consequence of coevolution could decrease fluctuations in hostparasite Red Queen dynamics. Likewise, damped fluctuations and coexistence of the host-parasite population in the later stage of the replication experiment here might be partially attributed to the increased diversity ( Fig.  2C-E, Supplementary Figure 3), which allows competition among various types of host and parasitic species to average the population dynamics.
A remaining interesting question is what happens at the end of this evolutionary arms race between the host and parasitic RNAs. Coevolution has been proposed as a driving force for the evolution of complexity [14,36]. Also, a theoretical study predicted that the mutualism between hosts and parasites can appear during coevolution [45]. The RNA replication system used here provides a useful model system to experimentally demonstrate the possible contribution of host-parasite coevolution to the development of complexity in primitive life-forms.

Methods
A long-term replication experiment. In this study, we additionally performed 77 rounds of replication using the RNA population at round 43 of the previous experiment by the same method [27]. In this method, initially, 10 μL of the reconstituted E. coli translation system [31] containing 1 nM of the original host RNA (Host-0, the round 128 clone in [30] was mixed with 1 mL of the buffer-saturated oil prepared as described previously [30] using a homogenizer (POLYTRON PT-1300D; KINEMATICA) at 16,000 rpm for 1 min on ice. The water-inoil droplet was incubated at 37 C for 5 h to induce the protein translation reaction and the RNA replication reaction. For the next round of RNA replication, a fraction of the water-in-oil droplets (200 μL) was transferred and mixed with the new buffer-saturated oil (800 mL) and the new translation system (10 μL) using the homogenizer at 16,000 rpm for 1 min on ice, then incubated at 37 C for 5 h. The average diameter of the waterin-oil droplets was approximately 2 μm [27]. After the incubation step in each round, RNA concentrations were measured as described below. The composition of the translation system is described in the previous study [46].

Measurement of host RNA concentrations.
The water-in-oil droplets after the incubation step were diluted 10000-fold with 1 mM EDTA (pH 8.0) and subjected to RT-qPCR (PrimeScript One Step RT-PCR Kit (TaKaRa)) with primer 1 and 2 after heating at 95 C for 5 min. These primers specifically bind to the host RNA. To draw a standard curve in RT-qPCR, dilution series of the water-in-oil droplets containing the original host RNA diluted 10000-fold with 1 mM EDTA were used.

Measurement of parasitic RNA concentrations.
To determine the concentrations of the parasitic RNAs that appeared during the long-term replication experiment ( Fig. 2A), polyacrylamide gel electrophoresis was performed followed by quantification of the fluorescent intensities of the parasitic RNA bands using ImageJ. The water phases were collected from the water-in-oil droplets after the incubation step at each round and RNAs were purified with spin columns (RNeasy, QIAGEN). The purified RNA samples and dilution series of the standard parasitic RNA (S222 RNA [47]) were subjected to 8% polyacrylamide gel electrophoresis with 0.1% SDS in TBE buffer (pH 8.4) containing Tris(hydroxymethyl)aminomethane (100 mM), boric acid (90 mM), and EDTA (1 mM), followed by staining with SYBR Green II (Takara). The fluorescent intensities of the parasitic RNA bands were quantified and the concentrations were determined based on the standard curve drawn with the dilution series of the standard parasitic RNA bands.
Sequence analysis. The RNA mixture of round-13, 24, 33, 39, 43, 50, 53, 60, 65, 72, 86, 91, 94, 99, 104, 110, and 115 in the long-term replication experiment were purified with spin columns (RNeasy, QIAGEN). The purified RNAs were reverse-transcribed using PrimeScript reverse transcriptase (Takara) and primer 3 and then PCR-amplified using primer 3 and 4. The PCR products were subjected to agarose gel electrophoresis and the bands corresponding to the host cDNA and the parasitic cDNA were separately extracted by using E-gel CloneWell (Thermo Fisher Scientific). The host and the parasite-β and -γ were sequenced using PacBio RS Ⅱ with C4/P6 chemistry (Pacific Biosciences), and the parasite-α was sequenced using MiSeq (Illumina). To reduce read errors in the PacBio RS Ⅱ sequencing, we used circular consensus sequencing (CCS) reads at least comprising five and ten reads for the host and the parasites, respectively. All the sequence reads were subjected to sequence alignment with a reference sequence (the original host sequence) separately for each molecular species (i.e. the host, the parasite-α, -β, and -γ) using MAFFT v7.294b with FFT-NS-2 algorithm [48]. Frequencies of mutations were calculated for each round sample and 74 dominant mutations were identified (Supplementary Table 2). These 74 dominant mutations are located in 72 sites (i.e., a few mutations were introduced in the same sites). In the following analysis, we focused on only the genotypes consist of these 72 sites with 74 mutations.
Mapping dominant genotypes on a two-dimensional space. From the genotypes consisting of the 72 sites with the 74 mutations, the top-90 most dominant genotypes were identified for each host and parasitic species at each round. Hamming distances between all the pairs of the genotypes were calculated and a square distance matrix D, whose i,j-th component dij represents the square of the Hamming distance between the i-th genotype and the j-th genotype, was constructed. Using principal coordinate analysis on the square distance matrix D, the positions of each genotype were determined. The matrix D was transformed into a kernel matrix K = -1/2CDC, where C is the centering matrix. Let λk and ek ≡ (ek1, ek2, …, ekM) denote the k-th eigenvalue and the k-th normalized eigenvector, where λ1 > λ2 > … > λM and | ek | = 1 for all k and M is the dimension of the kernel matrix K. The eigenvalues and eigenvectors of the kernel matrix K were calculated and the i-th genotype was plotted in the two-dimensional space with a coordinate described as follows: Competitive replication assay of host and parasitic RNAs. The six plasmids, each containing T7 promoter and following cDNA sequences of Host-0, Host-99, Host-115, Parasite-α13, Parasite-β99, and Parasite-γ115, were constructed using the gene synthesis service of Eurofins Genomics. Each RNA were synthesized by in vitro transcription with T7 RNA polymerase (TaKaRa) from the plasmids digested with SmaⅠ according to a previous study [49]. We mixed 10 nM of each host and parasitic RNAs in the cell-free translation system and incubated 37 °C for 3 h. The concentrations of the host and the parasitic RNAs were measured by RT-qPCR (PrimeScript One Step RT-PCR Kit (TaKaRa)) with sequence-specific primers (Supplementary text).