Abstract
Double-stranded (ds) RNAs play essential roles in many processes of cell metabolism. Knowledge of the three-dimensional (3D) structure, stability and flexibility of dsRNAs in salt solutions is important for understanding their biological functions. In this work, we further developed our previously proposed coarse-grained model to predict the 3D structure, stability and flexibility of dsRNAs in monovalent and divalent ion solutions by incorporating an implicit structure-based electrostatic potential. The model made reliable predictions of 3D structures from sequence for an extensive set of dsRNAs (≤78 nucleotides) with or without bulge/internal loops, and including the structure-based electrostatic potential together with the corresponding ion conditions improves the predicted 3D structures of dsRNAs in ion solutions. Furthermore, the model made good predictions of the thermal stability of extensive dsRNAs over a wide range of monovalent/divalent ion concentrations, and our analyses show that the thermal unfolding pathway of a dsRNA depends on its length as well as its sequence. In addition, the model was employed to examine the salt-dependent flexibility of a dsRNA helix, and the calculated salt-dependent persistence lengths are in good accordance with experiments.
Introduction
RNAs play a pervasive role in gene regulation and expression. In addition to single-stranded (ss) RNAs such as mRNAs and tRNAs, double-stranded (ds) RNAs are widespread in cells and are involved in a variety of biological functions (1–3). For example, small noncoding dsRNAs can play a critical role in mediating neuronal differentiation (4); dsRNA segments of specific lengths can inhibit the translation of mRNA molecules into proteins by binding to mRNAs (5,6); and dsRNAs longer than 30 base pairs (bp) can be key activators of the innate immune response against viral infections (7). Generally, dsRNAs realize their biological functions by partially melting or changing their conformations (2–9). Furthermore, the inter-chain interactions that stabilize dsRNA structures are very sensitive to the environment (e.g., temperature and ion conditions) (10–14). Thus, a full understanding of dsRNA-mediated biology requires knowledge of the three-dimensional (3D) structures, structural stability and flexibility of dsRNAs in ion solutions.
The 3D structures of RNAs, including dsRNAs, can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy and cryo-electron microscopy. However, it is still technically challenging and expensive to derive RNA 3D structures experimentally at high resolution, and the number of RNA structures deposited in the Protein Data Bank (PDB) remains limited (15). Therefore, as complementary methods, several computational models have been developed in recent years to predict RNA 3D structures in silico (16–23). Fragment assembly models (24–31) such as the MC-Fold/MC-Sym pipeline (24), 3dRNA (25–27), RNAComposer (28) and Vfold3D (29,30) can rapidly and successfully predict the 3D structures of RNAs, even large ones; however, these methods generally rely on given secondary structures and on the limited set of known RNA 3D structures deposited in the PDB. Although the fragment assembly method FARNA (31) can predict RNA 3D structures from sequence, it is efficient only for small RNAs because of its full-atomic resolution. In parallel, several coarse-grained (CG) models (32–40) such as iFold (41), SimRNA (42), HiRE-RNA (43) and RACER (44,45) have been proposed to predict the 3D structures of medium-length RNAs from their sequences, based on knowledge-based statistical potentials and/or empirical parameters. However, these existing 3D structure prediction models seldom make quantitative predictions of the thermodynamic stability and flexibility of RNAs.
Meanwhile, several models have been employed to predict the thermodynamics of RNAs. Vfold2D/VfoldThermal (29,30), which incorporates thermodynamic parameters, can make reliable predictions of the free energy landscape of RNAs, including pseudoknots, at the secondary structure level. The model proposed by Denesyuk and Thirumalai (46,47) predicts the thermodynamics of small RNAs well, but such a structure-based (Gō-like) model cannot predict RNA 3D structures solely from sequence. Although other models such as iFold (41), HiRE-RNA (43), oxRNA (48) and NARES-2P (49,50) can give melting curves of RNAs, these models still lack extensive experimental validation.
Furthermore, RNAs are highly charged polyanionic molecules, and RNA structure and stability are generally sensitive to solution ion conditions, especially to multivalent ions such as Mg2+ (8,10–14). The role of ions in RNA structure and stability, especially that of Mg2+, which is generally beyond mean-field descriptions (51,52), is seldom considered in existing 3D structure prediction models. To predict the 3D structures and stability of RNAs in ion solutions, we previously developed a CG model with an implicit electrostatic potential (53,54), and the model was validated by reliable predictions of the 3D structures and stability of RNA hairpins and pseudoknots, as well as of ion effects on their stability. However, the current version of the model is applicable only to ssRNAs, and its implicit electrostatic potential is independent of RNA structure. Thus, the model cannot make reliable predictions of the 3D structure and stability of dsRNAs and may not capture in detail the effect of ions on RNA 3D structures.
In this work, we further developed our previous three-bead CG model for ssRNAs to predict the 3D structures and stability of dsRNAs from their sequences. Furthermore, in the new version of the model, an implicit structure-based electrostatic potential is introduced to capture the effect of ions such as Mg2+ on the 3D structures and stability of dsRNAs. Compared with extensive experimental data, the present model predicts the 3D structures, stability and flexibility of various dsRNAs with high accuracy, and it captures well the effects of monovalent/divalent ions on the stability and flexibility of dsRNAs. Additionally, our analyses show that the thermal unfolding pathway of a dsRNA depends not only on its length but also on its sequence.
Model and methods
Coarse-grained structure model and energy function
To reduce complexity, our CG model represents each nucleotide by three beads: a phosphate bead (P), a sugar ring bead (C) and a base bead (N) (53,54). The P and C beads are placed at the P and C4′ atom positions, and the N bead is placed at the N9 atom position for purines or at N1 for pyrimidines; see Fig. 1. The three beads are treated as van der Waals spheres with radii of 1.9 Å, 1.7 Å and 2.2 Å, respectively (53,54).
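The three-bead mapping can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name `coarse_grain` and the input format (a dict from PDB atom names to coordinates) are hypothetical choices made for the example.

```python
# Hypothetical helper (not the authors' code): map one all-atom nucleotide
# onto the three CG beads P, C and N described in the text.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}

# van der Waals radii of the beads in Å (values from the text)
BEAD_RADII = {"P": 1.9, "C": 1.7, "N": 2.2}


def coarse_grain(residue_name, atoms):
    """Return a dict of bead name -> (x, y, z) for one nucleotide.

    `atoms` maps PDB atom names such as "P", "C4'", "N9" or "N1"
    to coordinate tuples.
    """
    if residue_name in PURINES:
        base_atom = "N9"  # purines: base bead at N9
    elif residue_name in PYRIMIDINES:
        base_atom = "N1"  # pyrimidines: base bead at N1
    else:
        raise ValueError(f"unsupported residue {residue_name!r}")

    beads = {"C": atoms["C4'"], "N": atoms[base_atom]}
    # The 5'-terminal nucleotide of a PDB chain often has no phosphate
    if "P" in atoms:
        beads["P"] = atoms["P"]
    return beads
```

Applying `coarse_grain` residue by residue to a parsed PDB chain yields the bead coordinates against which RMSDs are later computed.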
The potential energy of a CG dsRNA is composed of two parts (53,54):
$$U = U_{\text{bonded}} + U_{\text{nonbonded}}. \tag{1}$$
The bonded potential Ubonded represents the energy associated with pseudo-covalent bonds between contiguous CG beads within any single chain, which includes bond length energy Ub, bond angle energy Ua and dihedral angle energy Ud:
$$U_{\text{bonded}} = U_b + U_a + U_d. \tag{2}$$
The initial parameters of these potentials were derived from a statistical analysis of the RNA 3D structures available in the PDB (http://www.rcsb.org/pdb/home/home.do), and two parameter sets, Parahelical and Paranonhelical, were provided for stems and for single strands/loops, respectively. Note that only Paranonhelical is used in the folding process, whereas both Parahelical and Paranonhelical are used in structure refinement (53,54). The nonbonded potential Unonbonded in Eq. 1 includes the following five components:
$$U_{\text{nonbonded}} = U_{bp} + U_{bs} + U_{cs} + U_{exc} + U_{el}. \tag{3}$$
Ubp is the base-pairing interaction for Watson-Crick (G-C and A-U) and wobble (G-U) base pairs. Ubs and Ucs are the sequence-dependent base stacking and coaxial stacking interactions between two neighbouring base pairs and between two neighbouring stems, respectively. The strengths of Ubs and Ucs were derived from a combined analysis of available thermodynamic parameters and Monte Carlo (MC) simulations (53,54). Uexc represents the excluded volume interaction between two CG beads and is modelled by a purely repulsive Lennard-Jones potential.
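A purely repulsive Lennard-Jones term of the kind used for Uexc can be sketched as follows. The assumptions here are not taken from the text: the contact distance d0 is the sum of the two bead radii, the form is the WCA-style ε[(d0/r)^12 − 2(d0/r)^6] + ε (algebraically ε[(d0/r)^6 − 1]^2) truncated at d0, and `epsilon` is a placeholder energy scale in arbitrary units rather than the model's fitted value.

```python
# Illustrative sketch of a purely repulsive (WCA-type) Lennard-Jones term.
# Assumptions: contact distance = sum of bead radii; `epsilon` is a
# hypothetical energy scale in arbitrary units, not a model parameter.
BEAD_RADII = {"P": 1.9, "C": 1.7, "N": 2.2}  # Å


def excluded_volume(r, bead_a, bead_b, epsilon=1.0):
    """Repulsive LJ energy between two beads separated by r (Å)."""
    d0 = BEAD_RADII[bead_a] + BEAD_RADII[bead_b]
    if r >= d0:
        return 0.0  # no interaction beyond contact
    x6 = (d0 / r) ** 6
    # LJ with its minimum (-epsilon) at d0, shifted up by epsilon:
    # epsilon * (x6**2 - 2*x6) + epsilon == epsilon * (x6 - 1)**2
    return epsilon * (x6 - 1.0) ** 2
```

Because the shifted potential and its slope both vanish at d0, beads feel no force once they are out of contact, which is the usual reason for choosing this truncation.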
The last term Uel in Eq. 3 is a structure-based electrostatic energy of the RNA, which is newly refined to better capture the contribution of monovalent and divalent ions to RNA 3D structures. The electrostatic potential is treated as a combination of the Debye-Hückel approximation and the counterion condensation (CC) theory (51–54):
$$U_{el} = \sum_{i<j} \frac{Q_i Q_j e^2}{4\pi\epsilon_0 \epsilon\, r_{ij}}\, e^{-r_{ij}/l_D}, \tag{4}$$
where rij is the distance between the i-th and j-th P beads, each of which carries a unit negative charge (-e). lD is the Debye length of the ion solution, ϵ0 is the permittivity of vacuum, and ε is the effective temperature-dependent dielectric constant of water (53,54). The reduced negative charge Qi on the i-th P bead is given by
$$Q_i = 1 - f_i, \tag{5}$$
where fi is the fraction of ion neutralization. Unlike our previous model, which assumed a uniform distribution of bound ions along the RNA chain, the present model makes fi dependent on the RNA 3D structure, with contributions from both monovalent and divalent ions:
$$f_i = x f_i^{1} + (1 - x) f_i^{2}. \tag{6}$$
Here, x and 1 − x represent the fractional contributions of monovalent and divalent ions, which can be derived from the tightly bound ion (TBI) model (54–59). $f_i^{v}$ is the binding fraction of v-valent ions on the i-th P bead and is given by Eq. 7, where N is the number of P beads and $\bar{f}^{v}$ represents the average charge neutralization fraction of v-valent ions; the CC theory gives (51–54)
$$\bar{f}^{v} = 1 - \frac{b}{v\, l_B},$$
where b is the average charge spacing on the RNA backbone and lB is the Bjerrum length. ϕi in Eq. 7 is the electrostatic potential at the i-th P bead and can be approximately calculated by Eq. 8.
Eqs. 5–8 show that the structure-based reduced charge fraction Qi needs to be obtained through an iteration process; see more details in Supporting Material.
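The iteration for Qi is a fixed-point problem: the charges determine the potentials ϕi, which determine the binding fractions, which in turn determine new charges. A generic sketch of such a loop is shown below. The callable `update` is hypothetical and merely stands in for Eqs. 5–8; the convergence tolerance and iteration cap are illustrative assumptions, not values from the model.

```python
def iterate_charges(q0, update, tol=1e-3, max_iter=50):
    """Fixed-point iteration Q <- update(Q) until max |dQ_i| < tol.

    `update` is a hypothetical callable standing in for Eqs. 5-8: given the
    current reduced charges, it should recompute the potentials phi_i, the
    binding fractions and hence new charges Q_i.
    Returns (converged charges, number of iterations used).
    """
    q = list(q0)
    for n_iter in range(1, max_iter + 1):
        q_new = list(update(q))
        if max(abs(a - b) for a, b in zip(q_new, q)) < tol:
            return q_new, n_iter
        q = q_new
    raise RuntimeError("reduced charges did not converge")


# Toy update with a known fixed point Q_i = 0.2 (purely illustrative)
charges, n_used = iterate_charges([1.0, 1.0], lambda q: [0.5 * (x + 0.2) for x in q])
```

In the toy case the error halves at each step, so ten iterations are needed from Q = 1; starting from the previous step's converged charges, as a simulation naturally does, shortens the loop considerably.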
The detailed descriptions on the CG energy function as well as the parameters for the potentials in Eqs. 1–3 can be found in the Supporting Material.
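The Debye-Hückel and CC ingredients can be made concrete with the sketch below. Several assumptions are not stated in the text: ε(T) uses an empirical cubic fit often quoted for water (T in °C); the ionic strength assumes NaCl and MgCl2 as the salts; the charge spacing b is left as an input; and energies are returned in units of kBT, using e²/(4πϵ0εr)/kBT = lB/r. The structure-dependent binding fractions of Eq. 7 are not reproduced.

```python
import math

E_CHARGE = 1.602176634e-19  # C
EPS0 = 8.8541878128e-12     # F/m
KB = 1.380649e-23           # J/K
N_A = 6.02214076e23         # 1/mol


def water_dielectric(temp_c):
    """Assumed empirical cubic fit for the dielectric constant of water."""
    return 87.740 - 0.4008 * temp_c + 9.398e-4 * temp_c**2 - 1.410e-6 * temp_c**3


def bjerrum_length(temp_c):
    """l_B = e^2 / (4 pi eps0 eps k_B T), in Å."""
    t = temp_c + 273.15
    lb_m = E_CHARGE**2 / (4 * math.pi * EPS0 * water_dielectric(temp_c) * KB * t)
    return lb_m * 1e10


def debye_length(temp_c, na_molar, mg_molar=0.0):
    """Debye length in Å, assuming NaCl and MgCl2 as the salts."""
    ionic_strength = na_molar + 3.0 * mg_molar  # mol/L
    t = temp_c + 273.15
    eps = water_dielectric(temp_c)
    ld_m = math.sqrt(EPS0 * eps * KB * t / (2 * N_A * E_CHARGE**2 * ionic_strength * 1e3))
    return ld_m * 1e10


def cc_fraction(b, l_b, valence):
    """Average CC neutralization fraction 1 - b/(v l_B), floored at 0."""
    return max(0.0, 1.0 - b / (valence * l_b))


def reduced_charge(f1_i, f2_i, x):
    """Q_i = 1 - f_i with f_i = x f_i^1 + (1 - x) f_i^2."""
    return 1.0 - (x * f1_i + (1.0 - x) * f2_i)


def electrostatic_energy(coords, q, temp_c, na_molar, mg_molar=0.0):
    """Debye-Hueckel energy of the P beads in units of k_B T.

    coords: list of (x, y, z) in Å; q: reduced charges Q_i.
    Each pair contributes Q_i Q_j (l_B / r_ij) exp(-r_ij / l_D).
    """
    l_b = bjerrum_length(temp_c)
    l_d = debye_length(temp_c, na_molar, mg_molar)
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            energy += q[i] * q[j] * l_b / r * math.exp(-r / l_d)
    return energy
```

At 25°C this gives lB ≈ 7.2 Å and lD ≈ 9.6 Å in 0.1 M NaCl; note that the CC fraction for divalent ions exceeds that for monovalent ions at the same b, which is one ingredient of the efficient neutralization by Mg2+ discussed later.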
Simulation algorithm
To effectively avoid trapping in local energy minima, an MC simulated annealing algorithm is used to sample conformations of a dsRNA at given monovalent/divalent ion conditions. Based on the sequence of a dsRNA, two random initial CG chains are generated and placed separately in a cubic box whose size is determined by the strand concentration. Generally, the simulation of a dsRNA system at a given ion condition proceeds from a high temperature (e.g., 110°C) down to the target temperature (e.g., room or body temperature). At each temperature, conformations of the dsRNA are sampled by intra-strand pivot moves and inter-strand translations/rotations using the Metropolis algorithm until the system reaches equilibrium. In this process, the newly refined electrostatic potential Uel (Eq. 4) is included, and Uel can only be obtained after an iterative calculation of Qi. In practice, Uel is updated every 20 MC steps, and Qi generally converges within ∼4 iterations. Thus, the additional computational cost of the newly refined Uel is negligible compared with the total simulation cost. The equilibrium conformations of the system at each temperature are saved to obtain the 3D structures and structural properties of the dsRNA at that temperature.
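The annealing schedule with Metropolis acceptance can be sketched generically as below. This is not the model's move set: pivot moves and strand translations/rotations are abstracted into a hypothetical `propose` callable, the optional `refresh` hook stands in for the periodic update of Uel every 20 steps, and the toy one-dimensional energy is purely illustrative.

```python
import math
import random

KB_KCAL = 0.0019872  # Boltzmann constant in kcal/(mol K)


def metropolis_accept(delta_e, kt, rng):
    """Metropolis criterion: accept downhill moves, else with exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def anneal(state, energy, propose, temps_c, steps_per_temp, rng,
           refresh=None, refresh_every=20):
    """Generic MC simulated annealing from high to low temperature.

    energy(state) -> float (kcal/mol); propose(state, rng) -> trial state.
    refresh(state), if given, is called every `refresh_every` steps, e.g. to
    re-iterate the reduced charges Q_i before the energy is re-evaluated.
    """
    e = energy(state)
    step = 0
    for t_c in temps_c:
        kt = KB_KCAL * (t_c + 273.15)
        for _ in range(steps_per_temp):
            step += 1
            if refresh is not None and step % refresh_every == 0:
                refresh(state)
                e = energy(state)
            trial = propose(state, rng)
            e_trial = energy(trial)
            if metropolis_accept(e_trial - e, kt, rng):
                state, e = trial, e_trial
    return state, e


# Toy usage: a 1D "energy landscape" with its minimum at x = 3
rng = random.Random(0)
x_final, e_final = anneal(
    10.0,
    lambda s: (s - 3.0) ** 2,
    lambda s, r: s + r.gauss(0.0, 0.5),
    temps_c=range(110, 24, -5),  # 110 °C down to 25 °C
    steps_per_temp=2000,
    rng=rng,
)
```

Cooling gradually lets the walker cross barriers while kT is large and then settle near the minimum; in the real model, conformations would also be saved at each temperature for the melting analysis.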
Calculation of melting temperature
The stability of dsRNAs generally depends on strand concentration because of the translational entropy of the melted ssRNA chains (60). However, at low strand concentrations (e.g., 0.1 mM, as in experiments), a very long simulation time is generally required for the dsRNA system to reach equilibrium. To make our calculations efficient, the simulations are performed at relatively high strand concentrations (e.g., ∼10 mM for dsRNAs of ≤10 bp and ∼1 mM for dsRNAs of >20 nt) (43,61). Based on the equilibrium conformations at each temperature T, the fraction Φ(T) of the unfolded state, defined as two completely dissociated single strands, can be obtained. Since the small simulated system (two strands in a simulation box) can cause a significant finite-size effect (62), the predicted Φ(T) needs to be corrected to the fraction fh(T) of the unfolded state at the high bulk strand concentration through Eq. 9 (62), where a = 1 and 2 for non-self-complementary and self-complementary sequences, respectively (62). Afterwards, from fh(T) at the high strand concentration, the fraction of the unfolded state f(T) at an experimental strand concentration Cs (e.g., ∼0.1 mM) can be calculated through Eq. 10 (63,64). Finally, f(T) is fitted to a two-state model to obtain the melting temperature Tm (53,54,63):
$$f(T) = \frac{1}{1 + e^{(T_m - T)/dT}}, \tag{11}$$
where dT is an adjustable parameter. More details about the calculation of the melting temperature are given in the Supporting Material. For long dsRNAs whose unfolding may be a non-two-state transition, we still used the above formulas to estimate their melting temperatures, as in the related experiments (65).
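The final two-state fit can be illustrated with a short sketch. It assumes the common sigmoidal form f(T) = 1/[1 + e^((Tm − T)/dT)] for the unfolded fraction and estimates Tm and dT by linear regression on the logit ln[f/(1 − f)] = (T − Tm)/dT. This linearization is a swapped-in convenience; the authors may use a nonlinear least-squares fit instead. The finite-size and concentration corrections are not reproduced.

```python
import math


def two_state(temp, t_m, d_t):
    """Unfolded fraction f(T) for a two-state transition (temperatures in °C)."""
    return 1.0 / (1.0 + math.exp((t_m - temp) / d_t))


def fit_two_state(temps, f_unfolded, lo=0.02, hi=0.98):
    """Estimate (Tm, dT) by linear regression of ln[f/(1-f)] on T.

    For the two-state form, ln[f/(1-f)] = (T - Tm)/dT, so the slope is 1/dT
    and the intercept is -Tm/dT. Points outside (lo, hi) are dropped because
    their logits are dominated by sampling noise.
    """
    pts = [(t, math.log(f / (1.0 - f)))
           for t, f in zip(temps, f_unfolded) if lo < f < hi]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the transition")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return -intercept / slope, 1.0 / slope


# Synthetic check: a noise-free curve with Tm = 19.5 °C and dT = 4 °C
temps = list(range(0, 42, 2))
curve = [two_state(t, 19.5, 4.0) for t in temps]
t_m, d_t = fit_two_state(temps, curve)
```

On noise-free synthetic data the regression recovers the input parameters exactly; with simulated fractions, the trimming of points near 0 and 1 keeps poorly sampled tails from dominating the fit.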
Results and discussion
In this section, the present model was first employed to predict the 3D structures of extensive dsRNAs in monovalent/divalent ion solutions. Afterwards, the model was used to predict the stability of extensive dsRNAs and the effects of monovalent/divalent ions, and further to analyze the thermal unfolding pathways of various dsRNAs. Finally, the model was employed to examine the salt-dependent flexibility of a dsRNA helix. Our predictions and analyses were extensively compared with available experimental data and existing models.
Structure predictions for dsRNAs in ion solutions
Two sets of available dsRNAs (20∼78 nucleotides (nts)) were used for 3D structure prediction in this work. One set includes 16 dsRNAs whose structures were determined by X-ray crystallography (the X-ray set), and the other contains 10 dsRNAs whose structures were determined by NMR in ion solutions (the NMR set). The PDB codes and descriptions of the dsRNAs in the two sets are shown in Tables 1 and S3, respectively. In the following, we first predicted the structures of the 26 dsRNAs in the X-ray and NMR sets at high salt concentration (1 M Na+). Afterwards, we predicted the 3D structures of the dsRNAs in the NMR set under their respective ion conditions.
Structure predictions for dsRNAs at 1 M [Na+]
For the 26 dsRNAs in the X-ray and NMR sets, the 3D structures were predicted from sequence at a strand concentration of 0.1 mM and a high salt concentration (1 M Na+), without considering possible ion-specific effects. In the following, we use a paradigm dsRNA (PDB code: 2jxq; Table 1) to illustrate the structure prediction process of the present model, which is shown in Fig. 1C. First, the energy of the system decreases as the temperature is lowered (from 100°C to room temperature), and the dsRNA folds from an initial random configuration (e.g., structure a in Fig. 1D) into native-like structures (e.g., structure c in Fig. 1D). Second, a further structure refinement (∼1.2 × 10^7 MC steps) is performed at the target temperature (e.g., room temperature), in which the last predicted structure from the annealing process is taken as input and the Paranonhelical parameters of the bonded potentials are replaced by Parahelical for the base-paired regions to better capture the geometry of helical stems (53,54). Finally, an ensemble of refined 3D structures (∼10000 structures) is obtained over the last ∼1 × 10^6 MC steps, and these structures are evaluated by the root-mean-square deviation (RMSD) calculated over all beads in the predicted structures relative to the corresponding atoms in the native PDB structures (66). As shown in Fig. 1C and Fig. 2, for the dsRNA with PDB code 2jxq, the mean RMSD (averaged over the refined structures) and the minimum RMSD (of the structure closest to the native one) are 2.1 Å and 0.8 Å, respectively.
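The mean and minimum RMSD of a refined ensemble can be computed as sketched below. The text does not say how predicted and native structures are superposed; this sketch assumes optimal rigid-body superposition with the Kabsch algorithm, and the function names are hypothetical. Inputs are (n_beads, 3) arrays of matched bead/atom coordinates.

```python
import numpy as np


def kabsch_rmsd(pred, ref):
    """RMSD between matched point sets after optimal rotation/translation."""
    p = np.asarray(pred, dtype=float)
    q = np.asarray(ref, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q  # rotate each row of p onto q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def ensemble_rmsd(structures, ref):
    """Return (mean RMSD, minimum RMSD) of an ensemble against a reference."""
    values = [kabsch_rmsd(s, ref) for s in structures]
    return float(np.mean(values)), float(np.min(values))
```

The determinant check keeps the transformation a proper rotation, so a mirror-image prediction is not mistakenly reported as a perfect match.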
Following the above process, the 3D structures of 26 dsRNAs, including 15 dsRNAs with bulge/internal loops, were predicted by the present model with an overall mean RMSD of ∼3.3 Å and an overall minimum RMSD of ∼1.8 Å; see Figs. 2 and 3 and Table S1 in the Supporting Material. This shows that the present model, with coaxial and base stacking, can reliably capture the 3D shapes of various dsRNAs, including those with bulge/internal loops. For the 13 dsRNAs with internal loops, the overall mean RMSD is ∼4.2 Å, somewhat larger than that (∼3.3 Å) of all 26 dsRNAs. This is because large internal loops generally contain non-canonical base pairs, which are ignored in the present model. For example, the dsRNA with PDB code 3wbm contains two internal loops with several non-canonical base pairs that keep the native helix more continuous than the predicted one; see Fig. 2C.
Structure predictions for dsRNAs in various ion solutions
Since RNA structures can be strongly influenced by ions (10–14), we introduced the structure-based electrostatic potential in the present model to improve 3D structure prediction for dsRNAs under the respective ion conditions. We first examined the structure-based electrostatic potential through the charge neutralization fractions fNa and fMg of Na+ and Mg2+ along an example dsRNA (PDB code: 2gm0) in mixed Na+/Mg2+ solutions. As shown in Fig. 3B, fNa and fMg depend on the Na+/Mg2+ concentrations as well as on the dsRNA structure: (i) as Mg2+ concentration increases, fMg increases and fNa decreases, owing to the competition between Mg2+ and Na+ in binding to the RNA and the lower binding entropy penalty for Mg2+ at higher Mg2+ concentration (10–14,55–59); (ii) fNa and fMg are larger at bent regions and smaller at the two ends, which is attributed to the higher P-bead charge density at bent regions and the lower P-bead charge density at the two ends of the dsRNA. Therefore, the newly refined electrostatic potential (Eqs. 4–8) can capture structure-dependent ion binding and the competitive binding of Na+ and Mg2+ ions to a dsRNA.
To examine whether including the implicit structure-based electrostatic potential (Eqs. 4–8) and the corresponding ion conditions can improve 3D structure prediction for dsRNAs, we further predicted the 3D structures of the 10 dsRNAs in the NMR set under their respective experimental ion conditions. For these 10 dsRNAs, the overall mean RMSD between the structures predicted in ion solutions and the native structures is ∼3.7 Å, noticeably smaller than that (∼4.3 Å) of the predictions at 1 M [Na+]; see Table 1. This suggests that including the structure-based electrostatic potential and the corresponding ion conditions in the model improves predictions of the 3D structures of dsRNAs in ion solutions. Furthermore, Table 1 shows that the improvement is more pronounced for dsRNAs with internal loops and for ion conditions containing Mg2+; e.g., the mean RMSDs of the dsRNAs 2gm0 and 1tut decrease from 7.8 Å and 3.9 Å to 6.1 Å and 3.2 Å, respectively. This indicates that the newly refined electrostatic potential can effectively account for RNA structural information and the effects of ions such as Mg2+.
Comparisons with other models
To further examine the present model, we made extensive comparisons with three existing RNA structure prediction models: FARNA (31), RACER (44,45) and the MC-Fold/MC-Sym pipeline (24). FARNA is a high-resolution fragment-assembly model for small RNAs (31). As shown in Fig. 4A, the average mean RMSD (∼3.9 Å) of the present model is very slightly smaller than that (∼4.1 Å) of FARNA. We then compared our model with RACER, a recently developed CG model (44). As shown in Fig. 4B, the mean RMSD of our predictions (∼2.6 Å) is slightly smaller than that (∼3.2 Å) of RACER. Furthermore, we made an extensive comparison with the MC-Fold/MC-Sym pipeline (24), a well-established RNA 2D/3D structure prediction model with a web server (http://www.major.iric.ca/MC-Pipeline). We used the MC-Fold/MC-Sym web server to predict the structures of all the dsRNAs involved in our structure prediction and compared its top-1 predicted structure with the present model. As shown in Fig. 4C, the average mean RMSD (∼3.3 Å) of the present model is slightly smaller than that (∼3.8 Å) of the top-1 structures from the MC-Fold/MC-Sym pipeline. Therefore, the present model is reliable in predicting the 3D structures of dsRNAs. Beyond 3D structure prediction, the present model can also predict the stability and flexibility of dsRNAs in ion solutions.
Stability of dsRNAs in ion solutions
Stability of dsRNAs with various sequences
As described in Model and methods, the melting curve and melting temperature Tm of a dsRNA at a given strand concentration can be calculated by the present model. For example, for the sequence (CGCG)2, the melting curve at a high strand concentration of 10 mM can be predicted from the fractions of the unfolded state at different temperatures, and the melting curve and Tm at the low (experimental) strand concentration of 0.1 mM can then be calculated through Eqs. 9–11; see Figs. 5A and B. As shown there, the predicted Tm of (CGCG)2 at a strand concentration of 0.1 mM is ∼19.5°C, in good agreement with the corresponding experimental value (∼19.3°C). We further predicted the thermodynamic stability of 22 dsRNAs (4∼14 bp) with various complementary sequences; see Table 2. Here, the dsRNAs are assumed to be in 1 M [Na+], so as to examine solely the sequence-dependent stability of the various dsRNAs and to compare with extensive experimental data (64,65,67–69). As shown in Table 2, the Tm's predicted by the present model agree well with the corresponding experimental data, with a mean deviation of ∼1.3°C and maximum deviations <2.5°C. This agreement indicates that the sequence-dependent base pairing and base stacking interactions in the present model capture well the stability of dsRNAs of diverse sequences and lengths (64,65,67–69).
Thermally unfolding pathways of dsRNAs
Since intermediate states of RNAs can be important to their functions (1–3,10,70,71), we further analyzed the thermal unfolding pathways of different dsRNAs. To distinguish the possible states of dsRNAs at different temperatures in our simulations, the conformations of a dsRNA were divided into an unfolded state (U, two dissociated single strands), a hairpin state (H, with at least one hairpin), a folded helix state (F, with all base pairs formed except for the two terminal ones) and a partially folded helix state (P, all conformations other than the U, F and H states).
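The state assignment can be expressed as a small classification function. This is a sketch under stated assumptions: the input representation (sets of formed base-pair tuples) is hypothetical; a hairpin anywhere is taken to take precedence over the other states; and F is interpreted as "every non-terminal native pair is formed", with the two terminal pairs allowed to be open.

```python
def classify_state(native_pairs, inter_pairs, hairpin_pairs):
    """Assign one of the states U, H, F, P to a conformation.

    native_pairs:  ordered list of the duplex's inter-strand base pairs,
                   from one helix end to the other.
    inter_pairs:   inter-strand base pairs formed in this conformation.
    hairpin_pairs: intra-strand base pairs formed in this conformation.
    """
    if hairpin_pairs:
        return "H"  # at least one hairpin (assumed to take precedence)
    if not inter_pairs:
        return "U"  # two dissociated single strands
    core = set(native_pairs[1:-1])  # all pairs except the two terminal ones
    if core <= set(inter_pairs):
        return "F"  # folded helix
    return "P"  # anything else: partially folded helix
```

Counting these labels over the saved equilibrium conformations at each temperature gives state fractions of the kind plotted in Fig. 6.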
As shown in Fig. 6, the unfolding pathways of dsRNAs depend on their length as well as their sequence. For short sequences (≤6 nt), dsRNAs undergo standard two-state melting transitions with almost no intermediate states such as P and H states; see Figs. 6A and B, consistent with previous experiments (65). As the chain length increases to ∼8 bp, the P state begins to appear and becomes visible around Tm; see Figs. 6C and D. Figs. 6C and D also show that the unfolding pathways of dsRNAs with the same chain length but different sequences are slightly different; e.g., the fraction of the P state of (AACUAGUU)2, which has terminal A-U base pairs, is slightly higher than that of (CCAUAUGG)2, which has terminal G-C base pairs. This is because the less stable terminal A-U base pairs induce a more notable P state than the more stable terminal G-C base pairs; see Fig. S1 in the Supporting Material.
For dsRNAs with more than 10 base pairs, the thermal unfolding pathways become more complex, since their single strands may fold into hairpin structures. As shown in Figs. 6E and F for (CCUUGAUAUCAAGG)2 and (AAAAAAAUUUUUUU)2, the fractions of the F state are near unity at low temperature. As temperature increases, the dsRNAs begin to melt, and H states form from the melted ssRNAs, with maximum fractions near 0.2 and 0.5 for the two dsRNAs, respectively. At higher temperatures, the dsRNAs become almost completely melted into the U state. Notably, as shown in Fig. 6F, the unfolding pathway of (AAAAAAAUUUUUUU)2 predicted by the present model is very close to the corresponding experiments (65,68), suggesting that the present model captures the melting pathways of dsRNAs well. The difference between the unfolding pathways of the two dsRNAs is attributed to their different sequences, i.e., their G-C content, especially at the two ends. Specifically, the state fractions follow the order F > U ≳ H > P for (CCUUGAUAUCAAGG)2 at ∼70°C, whereas for (AAAAAAAUUUUUUU)2 at ∼45°C the order becomes H > F ≳ U > P. To reveal what determines this order, we calculated the stability of the states with Mfold (72) and found that the order of state fractions agrees with that of state stability. For example, the formation free energies of the F, H and P states are ∼-0.5 kcal/mol, ∼0.2 kcal/mol and ∼1.5 kcal/mol, respectively, for (CCUUGAUAUCAAGG)2 at ∼70°C. For (AAAAAAAUUUUUUU)2 at ∼45°C, the formation free energies of the H, F and P states are ∼-0.8 kcal/mol, ∼-0.1 kcal/mol and ∼0.6 kcal/mol, respectively. This indicates that the unfolding pathway of a dsRNA is determined by the relative stability of its possible states.
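The link between state stability and state fractions can be illustrated with simple Boltzmann weighting. This sketch rests on assumptions not stated in the text: the quoted formation free energies are taken relative to the U state (ΔG_U = 0), each state is treated as one effective species, and the strand-concentration dependence of bimolecular states is ignored. Under these assumptions the quoted Mfold values reproduce the reported orderings.

```python
import math

R_KCAL = 0.0019872  # gas constant, kcal/(mol K)


def state_populations(delta_g, temp_c):
    """Boltzmann populations from formation free energies (kcal/mol).

    Assumption: free energies are relative to the unfolded state U
    (Delta G_U = 0) and each state is a single effective species.
    """
    rt = R_KCAL * (temp_c + 273.15)
    weights = {"U": 1.0}
    weights.update({s: math.exp(-g / rt) for s, g in delta_g.items()})
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def order_states(pops):
    """State labels sorted from most to least populated."""
    return sorted(pops, key=pops.get, reverse=True)


# Free energies quoted in the text
pops_gc = state_populations({"F": -0.5, "H": 0.2, "P": 1.5}, 70.0)  # (CCUUGAUAUCAAGG)2
pops_au = state_populations({"H": -0.8, "F": -0.1, "P": 0.6}, 45.0)  # (AAAAAAAUUUUUUU)2
```

Here `order_states(pops_gc)` gives F, U, H, P and `order_states(pops_au)` gives H, F, U, P, matching the orders seen in the simulations, although the absolute fractions from this crude estimate should not be compared directly with Fig. 6.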
Although the unfolding of long dsRNAs can be a non-two-state transition, as in experiments (65) we can still estimate their melting temperatures by taking the completely dissociated strands as the U state (65); see Eq. 11 in Model and methods.
Stability of dsRNAs with bulge/internal loops
Beyond the fully complementary dsRNAs shown above, the stability of 8 other dsRNAs with bulge/internal loops was examined with the present model. As shown in Table 3, for the dsRNAs with single/double bulge loops of different lengths (sequences 1–6) and the dsRNAs with internal loops (sequences 7 and 8), the mean deviation between the predicted Tm's and the experimental data (73–79) is ∼2.6°C, indicating that the present model with the coaxial stacking potential can roughly estimate the stability of dsRNAs with bulge/internal loops. However, such predictions, especially for dsRNAs with long bulge/internal loops, are not as precise as those for dsRNAs without loops. Detailed comparisons with experimental data show that the predicted Tm's for dsRNAs with a 1-nt bulge loop are slightly higher than the experimental values, whereas the present model underestimates the stability of dsRNAs with longer bulge loops. This may suggest that the coaxial stacking potential Ucs in the present model slightly overestimates the coaxial interaction strength while underestimating the coaxial interaction range. For the dsRNAs with internal loops (e.g., AA/AA or AAA/AAA), the present model underestimates their stability, which may be attributed to the neglect of non-canonical base pairs in the present model (79).
Effects of monovalent and divalent ions
The thermal stability of RNA molecules is generally sensitive to ionic conditions (10–14,55–59). In particular, Mg2+ efficiently neutralizes the negative charges on RNA molecules and generally plays an important role in RNA folding (14,57,59,80–83). However, most existing structure prediction models cannot quantitatively predict the stability of dsRNAs in ion solutions, especially in the presence of Mg2+. Here, we employed the present model to examine the stability of dsRNAs over a wide range of monovalent and divalent ion concentrations.
First, we examined the effect of monovalent ions on the stability of dsRNAs. As shown in Fig. 7A, for 5 dsRNAs of different sequences and lengths, the melting temperatures Tm's predicted by the present model agree well with the experimental data (64,65,67–69,83), with a mean deviation <2°C over a wide range of [Na+]. As [Na+] increases from 10 mM to 1000 mM, the Tm's of the dsRNAs increase markedly, which is attributed to the lower ion-binding entropy penalty and the stronger ion neutralization upon base pair formation at high [Na+]; see also Fig. S1 in the Supporting Material. Furthermore, Fig. 7A shows that the [Na+] dependence of Tm is stronger for longer dsRNAs. This is because base pair formation in longer dsRNAs causes a larger build-up of negative charge and consequently stronger [Na+]-dependent ion binding.
Second, we examined the stability of dsRNAs in mixed monovalent/divalent ion solutions. As shown in Fig. 7B, for 3 different dsRNAs, the predicted Tm's are in good accordance with the experimental data over a wide range of [Mg2+] (83). Fig. 7B also shows three regimes in the Tm-[Mg2+] curves: (i) at low [Mg2+] (relative to [Na+]), the stability of the dsRNAs is dominated by the background Na+, and the Tm's are almost the same as those in the corresponding pure Na+ solution; (ii) as [Mg2+] increases, Mg2+ ions begin to play a role and Tm increases correspondingly; (iii) when [Mg2+] becomes very high (relative to [Na+]), the stability is dominated by Mg2+. Furthermore, Mg2+ is very efficient in stabilizing dsRNAs. Even with a background of 110 mM Na+, ∼1 mM Mg2+ begins to enhance the stability of dsRNAs, and 10 mM Mg2+ (plus 110 mM background Na+) can achieve stability similar to that in 1 M Na+; see sequences CCUUGAUAUCAAGG and CCAUAUGG in Figs. 7A and B. This is attributed to the high ionic charge of Mg2+ and its consequently efficient role in stabilizing dsRNAs (57,59,64,80–83).
Flexibility of dsRNA in ion solutions
dsRNAs are generally rather flexible in ion solutions owing to their polymeric nature, and this flexibility is highly important for their biological functions. Additionally, dsRNA flexibility depends strongly on solution ion conditions (84–86). In this section, we employed the present model to examine the flexibility of a 40-bp RNA helix in ion solutions. The sequence of one strand is 5’-CGACUCUACGGAAGGGCAUCCUUCGGGCAUCACUACGCGC-3’, with 57% CG content in its central 30-bp segment, and the other strand is fully complementary to it; this sequence was chosen following previous studies (85). First, we predicted the 3D structure of the dsRNA helix from the sequence at 25°C; afterwards, we performed further simulations of the helix at various ion conditions starting from the predicted structure. Sufficient equilibrium conformations at each ion condition were used to analyze the salt-dependent flexibility of the dsRNA helix; see Fig. S2 in the Supporting Material.
Structure fluctuation of a dsRNA in ion solutions
We first examined the structural fluctuation of the dsRNA helix by calculating the end-to-end distance, the RMSD variance and the root-mean-square fluctuation (RMSF) at different [Na+]'s (84). As [Na+] increases, the end-to-end distance of the dsRNA helix decreases, e.g., from ∼125 Å at 10 mM [Na+] to ∼90 Å at 1 M [Na+], while the variance of the end-to-end distance increases; see Figs. 8A and B. This indicates more strongly bent conformations and larger bending fluctuations of the dsRNA helix at higher ion concentrations, which is attributed to stronger ion neutralization of the P-bead charges and the consequently reduced electrostatic repulsion opposing bending (80–88). The RMSD variance of the dsRNA helix at different [Na+]'s, calculated with respect to the conformation-averaged reference structure, also indicates that the dsRNA helix becomes more flexible as [Na+] increases; see Fig. 8C. To examine local structural fluctuation, we further calculated the RMSF of the center of each base pair of the dsRNA helix at different [Na+]'s. As shown in Fig. 8D, the RMSF increases as [Na+] increases from 0.01 M to 1 M, because stronger ion binding and charge neutralization of the P beads allow larger fluctuations of the base pairs along the helix (88). Additionally, end effects contribute an extra increase in RMSF at the two helical ends (88).
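The end-to-end and RMSF statistics can be computed from a trajectory of base-pair centres as sketched below. Assumptions: the input is a hypothetical array of shape (n_frames, n_bp, 3), and for RMSF the frames have already been superposed onto a common reference (for example with a Kabsch fit); the text does not specify these details.

```python
import numpy as np


def end_to_end_stats(frames):
    """Mean and variance of the end-to-end distance over frames.

    frames: (n_frames, n_bp, 3) array of base-pair centre coordinates.
    """
    traj = np.asarray(frames, dtype=float)
    d = np.linalg.norm(traj[:, -1, :] - traj[:, 0, :], axis=1)
    return float(d.mean()), float(d.var())


def rmsf(frames):
    """Per-base-pair RMSF about the frame-averaged structure.

    Assumes the frames are already superposed onto a common reference.
    """
    traj = np.asarray(frames, dtype=float)
    mean_structure = traj.mean(axis=0)
    sq_dev = ((traj - mean_structure) ** 2).sum(axis=2)  # (n_frames, n_bp)
    return np.sqrt(sq_dev.mean(axis=0))
```

The end-to-end distance needs no superposition because it is rotation-invariant, whereas RMSF compares absolute positions and is only meaningful after the frames are aligned.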
Persistence length of a dsRNA helix in ion solutions
Generally, the flexibility of a polymer can be described by its persistence length lp (89), which can be calculated by (90):
$$\langle \cos\theta_i \rangle = \langle \mathbf{u}_1 \cdot \mathbf{u}_i \rangle = e^{-(i-1)b/l_p}, \tag{12}$$
where $\mathbf{u}_1$ and $\mathbf{u}_i$ are the first and i-th bond direction vectors, respectively, $\theta_i$ is the angle between them, and $\langle \cdots \rangle$ denotes the ensemble average. b in Eq. 12 is the average bond length. According to Eq. 12, lp of the dsRNA helix can be obtained by modeling the helix as a bead chain composed of the central beads of the base pairs; see Fig. 8E. To avoid the end effect (88), the first and last 5 base pairs were excluded from the calculation, and each bond vector was taken to span 5 consecutive base pairs (85).
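The persistence-length procedure described above can be sketched in code. This is a sketch under stated assumptions, not the authors' implementation: it assumes Eq. 12 has the standard worm-like-chain form ⟨u1·ui⟩ = exp(-(i-1)b/lp), it takes every 5th base-pair center of the trimmed helix as a coarse bead, and it solves for lp by a least-squares fit of ln⟨cos θi⟩ against (i-1)b through the origin. That fitting choice and all function names are hypothetical; lp comes out in the same length unit as the input coordinates.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _norm(v: Vec) -> float:
    return math.sqrt(sum(c * c for c in v))


def bond_vectors(centers: Sequence[Vec], trim: int = 5, step: int = 5) -> List[Vec]:
    """Coarse bond vectors, each spanning `step` base pairs, after dropping
    `trim` base pairs at each helix end to avoid the end effect."""
    core = list(centers[trim:len(centers) - trim])
    nodes = core[::step]
    return [tuple(b - a for a, b in zip(nodes[k], nodes[k + 1]))
            for k in range(len(nodes) - 1)]


def mean_cosines(conformations: Sequence[Sequence[Vec]], trim: int = 5,
                 step: int = 5) -> Tuple[List[float], float]:
    """Ensemble averages <cos theta_i> = <u_1 . u_i> and the mean bond length b."""
    sums: List[float] = []
    lengths: List[float] = []
    for centers in conformations:
        bonds = bond_vectors(centers, trim, step)
        norms = [_norm(v) for v in bonds]
        lengths.extend(norms)
        units = [tuple(c / n for c in v) for v, n in zip(bonds, norms)]
        cosines = [sum(x * y for x, y in zip(units[0], u)) for u in units]
        sums = cosines if not sums else [s + c for s, c in zip(sums, cosines)]
    n_conf = len(conformations)
    return [s / n_conf for s in sums], sum(lengths) / len(lengths)


def fit_persistence_length(mean_cos: Sequence[float], b: float) -> float:
    """Fit ln<cos theta_i> = -(i-1) b / l_p by least squares through the origin.

    Requires every <cos theta_i> > 0, which holds for a stiff, short helix.
    """
    xs = [i * b for i in range(len(mean_cos))]  # (i-1)*b for i = 1, 2, ...
    ys = [math.log(c) for c in mean_cos]
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    if sxy >= 0.0:  # no measurable decay, e.g. a perfectly straight helix
        return math.inf
    return -sxx / sxy
```

For a 40-bp helix this gives 30 trimmed base pairs, 6 coarse beads and 5 bond vectors per conformation. Because the helix is much shorter than lp, the decay of ⟨cos θi⟩ is small, so many equilibrium conformations are needed for a stable fit.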
As shown in Fig. 8F, the persistence lengths of the 40-bp dsRNA helix at different [Na+]’s predicted by the present model are in quantitative agreement with the corresponding experimental data (87): the deviation of lp between prediction and experiment is less than ∼2 nm over the wide range of [Na+]. As [Na+] increases from 0.01 M to 1 M, lp of the dsRNA helix decreases from ∼70 nm to ∼50 nm. This is because more bound ions neutralize the negative charges of the P beads more strongly, which weakens the electrostatic repulsion opposing bending along the strands and thus makes the helix more flexible at high [Na+].
Conclusions
Knowledge of the 3D structures and thermodynamic properties of dsRNAs is crucial for understanding their biological functions. In this work, we have further developed our previous CG model by introducing a structure-based electrostatic potential, and employed the model to predict the 3D structures, stability and flexibility of dsRNAs in monovalent/divalent ion solutions. Our predictions were extensively compared with experimental data, and the following conclusions have been obtained:
The present model predicts well the 3D structures of a wide variety of dsRNAs with/without bulge/internal loops in monovalent/divalent ion solutions from their sequences, with an overall mean RMSD <3.5 Å, and the inclusion of the structure-based electrostatic potential and the corresponding experimental ion conditions generally improves the structure predictions, yielding smaller RMSDs for dsRNAs in ion solutions.
The present model makes good predictions of the stability of dsRNAs with diverse sequences over wide ranges of monovalent/divalent ion concentrations, with a mean deviation <2°C, and our analyses show that the thermal unfolding pathway of a dsRNA depends on its length as well as its sequence.
The present model captures well the salt-dependent flexibility of dsRNAs, and the predicted salt-dependent persistence lengths are in good accordance with experiments.
Although our predictions agree well with extensive experimental data on the 3D structure, stability and flexibility of dsRNAs, the present model still has several limitations. First, although the structure-based electrostatic potential efficiently captures the effects of monovalent/divalent ions on the structure, stability and flexibility of dsRNAs, the model has not been tested on RNAs with more complex structures, and it cannot account for the detailed ion distribution or specific ion binding around an RNA. Further development of the present model may need to treat ions through a combined implicit-explicit approach (52). Second, the model only includes canonical (A-U and G-C) and wobble (G-U) base pairs and ignores non-canonical base pairing, which could affect the predicted structure and stability of dsRNAs with internal loops. Including non-canonical base pairs would improve the prediction accuracy of the present model (79). Finally, the present model is a CG model, so all-atom structures still need to be rebuilt from the predicted CG ones. Nevertheless, the present model predicts well the 3D structures, stability and flexibility of dsRNAs over wide ranges of monovalent/divalent ion concentrations, and can serve as a good basis for developing a predictive model with higher accuracy.
Acknowledgements
We are grateful to Professors Shi-Jie Chen (University of Missouri), Xiangyun Qiu (The George Washington University), Jian Zhang (Nanjing University) and Wenbing Zhang (Wuhan University) for valuable discussions. This work was supported by grants from the National Natural Science Foundation of China (11575128, 11605125, and 11774272). Parts of the numerical calculations in this work were performed on the supercomputing system in the Supercomputing Center of Wuhan University.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.