Free Energy Landscape of RNA Binding Dynamics in Start Codon Recognition by Eukaryotic Ribosomal Pre-Initiation Complex

Specific interaction between the start codon, 5’-AUG-3’, and the anticodon, 5’-CAU-3’, ensures accurate initiation of translation. Recent studies show that several near-cognate start codons (e.g. GUG and CUG) can play a role in initiating translation in eukaryotes. However, the mechanism allowing initiation through mismatched base-pairs at the ribosomal decoding site is still unclear at an atomic level. In this work, we propose an extended simulation-based method to evaluate free energy profiles, through computing the distance between each base-pair of the triplet interactions (d1, d2 and d3) involved in recognition of start codons in eukaryotic translation pre-initiation complex. Our method provides not only the free energy penalty (ΔΔG) for mismatched start codons relative to the AUG start codon, but also the preferred pathways of transitions between bound and unbound states, which has not been described by previous studies. To verify the method, the binding dynamics of cognate (AUG) and near-cognate start codons (CUG and GUG) were simulated. Evaluated free energy profiles agree with experimentally observed changes in initiation frequencies from respective codons. This work proposes for the first time how a G:U mismatch at the first position of codon (GUG)-anticodon base-pairs destabilizes the accommodation in the initiating eukaryotic ribosome and how initiation at a CUG codon is nearly as strong as, or sometimes stronger than, that at a GUG codon. Our method is expected to be applied to study the affinity changes for various mismatched base-pairs.


Introduction
The translation reaction, or mRNA-dependent protein synthesis, is catalyzed by the ribosome, the macromolecular ribonucleoprotein complex [1,2]. During eukaryotic initiation, the ribosome dissociates into the large (60S) and small (40S) subunits, and the latter binds the methionyl initiator tRNA (Met-tRNA Met i ) and mRNA with the help of eukaryotic initiation factors (eIFs) [3,4]. Met-tRNA Met i is recruited by eIF2, a heterotrimeric factor that binds the tRNA in a manner dependent on GTP binding. The resulting ternary complex (TC) binds the 40S subunit in the context of multifactor complex (MFC) with eIFs 1, 3 and 5, forming the 43S preinitiation complex (PIC) [5]. The mRNA is bound by eIF4F through its 5' cap, which then is recruited to the 40S subunit through eIF3 in mammals and eIF5 in yeast [6]. The 48S pre-initiation complex (PIC) thus formed scans for the start codon in the process called scanning. The PIC is primed for scanning by the eIF5-catalyzed GTP hydrolysis for eIF2; the products, GDP and Pi, stay bound to eIF2 during scanning. The start codon base-pairing with the Met-tRNA Met i anticodon in the P-site allows the 40S subunit to stall at the start codon and then join the 60S subunit after most bound eIFs are released in conjunction with Pi release from eIF2 [7,8]. The resulting 80S initiation complex accepts an amino-acyl tRNA in the A-site to begin the translation elongation cycle.
The fidelity of start codon recognition is regulated by eIF1A and eIF1 that bind the 40S subunit A-site and P-site, respectively [9,10]. Essentially, these factors regulate the conformational changes of the PIC. Thus, the N-terminal tail of eIF1A interacts with the codon-anticodon base-pairs to stabilize its closed conformation for initiation [11]. By contrast, eIF1 stabilizes its open, scanning-competent conformation by physically impeding the P-site accommodation of the mismatched codon-anticodon base-bairs [5]. Upon start codon selection by the PIC, eIF1 is released to permanently stabilize the closed state [12,13]. The PIC-bound eIFs directly interact with eIF1 to control the balance of its binding and release, thereby keeping the level of initiation accuracy appropriate for eukaryotic cell function [6,[14][15][16]; for review, see [17].
Despite the aforementioned mechanisms to ensure the accurate initiation of translation, most eukaryotes tested allow translation initiation from near-cognate start codons (start codons with one-base substitution in the AUG codon), for example, CUG and GUG, at a low frequency [18][19][20][21]. However, not all the near-cognate start codons can initiate translation at an equal rate. Among the most puzzling is the observation that GUG serves as a poor initiation site, even though the G residue in the 1st position can potentially wobble-base-pair with the U residue in the anticodon. In agreement with a wobble base-pairing, GUG serves as a normal initiation site in prokaryotes (Bacteria and Archaea), which possess fewer initiation factors than in eukaryotes [17]. Moreover, CUG is considered as the strongest near-cognate start codon in eukaryotes, even though it is not a start codon in prokaryotes [17]. To solve these conundrums, it is crucial to probe the stability of codon-anticodon interactions in the P-site. As it is difficult to experimentally measure these interactions at an atomic resolution, computational analysis offers an effective solution [22,23]. Specifically, molecular interactions are represented by the free energy profiles, which in this case depend on the nucleotide constitutions of codon and anticodon, and thereby determine the frequency of the bound state. Thus, computational estimation of their free energy profiles provide an insight into the mechanism of translation initiation by various start codons.
Previous molecular dynamics (MD) simulation studies estimated energy gap between AUG and mismatched codons by computing free energy scores of interaction between these codons by free energy perturbation (FEP) [24]. The work identified CUG as the strongest near-cognate start codon. Moreover, it demonstrated eIF1's contribution in discriminating from near-cognate start codons. However, there are mainly two caveats in the FEP approach. First, except for the conformation of AUG found within the PIC structure, the geometry of near-cognate start codon was conjectured merely through substituting a base with the corresponding base in the AUG. Second, information on the binding process was missing. As single-strands of RNA are flexible [25], their structural changes may be diverse. Although the binding free energy is independent of the transition path and thus could be calculated by the FEP method, it is crucial to estimate structural changes involved in the binding process in order to understand the selective binding mechanism.
In this study, we employed adaptive biasing force (ABF) method [26][27][28] in order to overcome these problems. This method explores codon-anticodon binding free energy by using systematic reaction coordinates. By using distances of each triplet base-pair as the reaction coordinates, we generated the free energy profiles of base-pairing interactions between Met-tRNA Met i anticodon and cognate (AUG) or near-cognate (CUG or GUG) start codons. Our results provide structural insights related to a strong penalty for placing GUG codon in the P-site and a permissiveness of CUG as a potential start codon in eukaryotes.

Simulation Procedure
We referred to previous study of start codon recognition in eukaryotic translation initiation using all-atom molecular dynamics (MD) simulations [24]. To reconstruct codon-anticodon interaction in solution, we employed open pre-initiation complex (PIC) structure (PDB ID: 3J81) determined by CryoEM [11]. To reduce computational cost for MD simulation, we extract atoms within 25 Å from N1 atom in the middle base of anticodon in tRNA molecule [24]. Then, nucleotides were edited to reconstruct PIC models involving target codons in our study (Tab. 1). When editing the nucleotide (e.g. AUG → GUG), first, atoms except N1 and N9 in base group, sugar group, and phosphate group were deleted. Then, coordinates of missing atoms were inferred. All histidine residues were configured as ϵ-protonated. These molecules were soaked into a 36 Å radius water sphere ( Fig. 1), neutralized by K + , and added 150 mM KCl. TIP3P water model was employed. VMD [29] was used to infer missing atom coordinates, solvate the model, and visualize the structure throughout the study. All simulations were carried out using NAMD (version 2.13 multi-core) [30]. CHARMM36 force-field (July 2019 update) was used [31,32]. Multilevel summation method (MSM) electrostatics [33] was employed. Cutoff at 12 Å (with switching from 10 Å) was applied to non-bonded interactions. Temperature and pressure were set at 310 K and 1 atm, respectively; Langevin thermostat (damping coefficient: 5/ps) and Langevin-piston barostat were adopted. All C1' (nucleotide) and C α (amino acid) atoms farther than 22 Å from the center of the system (i.e. water sphere) were restrained at their initial positions, and water molecules crossing the boundary of water sphere (radius 36 Å) were restrained. Harmonic potential functions with spring constant 10 pN/Å were adopted as the restraint of molecules. After energy minimization (10,000 steps), the system was equilibrated for 10 ns, and then simulated for 1 µs (time-step: 2 fs); the biasing force was applied only after collecting 200 samples in the bin. Each model (Tab. 1) was simulated five times.

Multi Dimensional Adaptive Biasing Force (ABF) Method
We performed adaptive biasing force (ABF) molecular dynamics method [26][27][28] to evaluate multi-dimensional free energy profiles in terms of d 1 , d 2 , and d 3 , which were defined as the distances of the 1st, 2nd, and 3rd base-pairs in Å, respectively (Tab. 1). Specifically, these d i were evaluated as distance between the centers of hydrogen donor and acceptor atoms of codon and anticodon (Table 1 shows  The free energy profile G(d 1 , d 2 , d 3 ) with respect to three variables d 1 to d 3 was obtained through the analysis of ABF results. The probability of Gibbs free energy for each state P (d 1 , d 2 , d 3 ) obeys Eq. 1: was averaged over the simulation trials for each model, and then G(d 1 , d 2 , d 3 ) was evaluated as Eq. 2: Then, to evaluate the free energy difference between the codon-anticodon bound and unbound states, we defined the free energy scores G bound and G unbound , and their gap ∆G binding as Eq. 3: Here we defined the bound and unbound states as ∀i : 4.0 ≤ d i ≤ 6.0 and ∀i : 7.0 ≤ d i ≤ 9.0, respectively (Fig. 2). The distance range for the bound state corresponds to the codon-anticodon (AUG-CAU) structure [11], and that for the unbound state is based on a previous research that described unbound conformation of the complex [34]. G bound and G unbound were hence weighted average of  1 , d 2 ).
were defined in the same way.

Reconstruction of Averaged Structure
Typical structures, or atomic coordinates corresponding to a specific reaction coordinate (d 1 , d 2 , d 3 ), were obtained by averaging the sampled atomic coordinates as follows. Here, we assumed that the reaction coordinate ( Then, each atomic coordinate was averaged over all the snapshots (sampled at 10 ps intervals) corresponding to the representative reaction coordinate (d 1 ,d 2 ,d 3 ).  We evaluated the free energy scores of bound and unbound states (G bound and G unbound ; see Eq. 3) averaged over five simulation trials (Fig. S1), and presented their difference ∆G binding in Fig. 4. In the case of the cognate start codon AUG, ∆G binding must be negative to stabilize the initiation of translation, and was indeed ∼ −4 k B T . In contrast, the GUG codon, less frequently used as a start codon [17,20,21], showed a positive ∆G binding ∼ 2 k B T . For the CUG codon, which is considered as a stronger start codon than GUG [17,20,21], ∆G binding showed an intermediate value ∼ 1 k B T . Thus, ∆G binding accounts for observed initiation rates from the respective start codons. Note that ∆G binding could alternatively be obtained from individual simulation trajectories; assuming each trial as a sample, we confirmed the significance of ∆G binding between AUG and GUG by Welch's t-test (p = 0.044). Conformational changes inferred from the free energy landscape (Fig. 5). The transition path R • n (• is AUG or GUG) is shown by black arrows.

Free Energy Landscape for the Binding Process
To analyze the base-pair binding dynamics in detail, we constructed projected profiles G (d 1 , d 2 ), G(d 1 , d 3 ), and G(d 2 , d 3 ) from G(d 1 , d 2 , d 3 ) (see Materials and Methods), as shown in Fig. 5. We inferred the transition dynamics from the free energy profiles; Fig. 6 shows the suggested paths and their schematics.
In the case of the AUG start codon, the following transitions were expected in the AUG-CAU dynamics in equilibrium (Fig. 6). Starting from the bound state, d 3 shows large fluctuations while d 1 and d 2 show small ones (R AUG 3 ). d 1 and d 2 are bistable (bound and unbound), and once the 3rd G:C base-pair is broken (large d 3 ), the 2nd U:A base-pair may become unbound (R AUG 2 to the large d 2 state). Only after that, the 1st A:U base-pair dissociates (R AUG 1 to the large d 1 state); this is expected to occur less frequently due to the higher barrier than those for R AUG 2 and R AUG 3 .
Starting from the unbound state and reversing the process above, the AUG codon should bind to the CAU anticodon from the side of the 1st A:U base-pair, followed by the 2nd and then 3rd base-pairs (Fig. 6). This result suggests that the recognition of the 1st A:U base-pair is very important for the accurate start codon recognition, in agreement with the role and location of eIF1 in the P-site [11,12] (see Fig. 8 below).
In the case of the GUG codon, d 1 and d 2 show the transition (R GUG 1 ) between two distinct (metastable) states (both bound and both almost unbound) as shown by  G(d 1 , d 2 ) in Fig. 6, while d 3 is mostly high. Binding of the 3rd base-pair (transition to lower d 3 ) is possible but less frequent, and simultaneous binding of the 1st and 3rd base-pairs is rare (Fig. 5). As expected, the affinity of the 1st base-pair (wobble G:U) is lower than the case of AUG. This result is consistent with infrequent GUG initiation observed in the previous works [17,20,21].
In the case of the CUG codon, many meta-stable states were observed as shown in Fig. 5. Transition paths seem to be more complicated than the AUG and GUG cases. Although concurrent binding of the 2nd and 3rd base-pairs (lower d 2 and d 3 ) is possible, the 1st base-pair cannot form simultaneously with these other base-pairs (Fig. 5), which makes the CUG pairing unstable compared to AUG base-pairing. Overall, however, the binding free energy ∆G binding is lower for CUG than for GUG (Fig. 4) (see below). Note that, technically, the rugged free energy landscape (Fig. 5) demanded more computational cost for the ABF sampling, as suggested by the slow convergence shown in Fig. 3. To visualize the bound structures and consider the mechanism underlying the abovementioned results (Figs. 5 and 6), we evaluated the averaged structure of the bound state for each model (see Eq. 6 in Materials and Methods). The averaged structures and schematics of the codon and anticodon are shown in Fig. 7.

Binding Dynamics from the Structural Views
In the case of AUG, the averaged bound-state structure is ordered and tightly bound. It is reasonable, as it is the correct start codon, and the binding free energy is negative (Fig. 4). Note that eIF1 and eIF1A molecules (shown in red and blue in Fig.  7, respectively) are present near the AUG-CAU base-pair. It was experimentally suggested that these proteins contribute to the accurate start codon recognition [5,[9][10][11][12][13] (see Fig. 8).
By contrast, the structure of the GUG-CAU base-pair is disordered (Fig. 7). The mismatched bases (the 1st G:U) avoid each other (rather than forming a wobble base-pair) and the uracil in tRNA tilts toward the 2nd U:A base-pair. The directions of the 2nd and 3rd base-pairs were consequently affected, resulting in the unstable bound state (Fig. 4). Although the projected free energy profile G(d 1 , d 2 ) (Fig. 5) suggests cooperative binding of the 1st and 2nd base-pairs (Fig. 7), the 3rd base-pair is mostly separate, which may prevent the recognition of the GUG start codon.
In the case of CUG, the structure is relatively ordered (Fig. 7). Although the 1st C:U base-pair is mismatched, cytosine is smaller than guanine and adenine (purine bases), which may mitigate steric hindrance at the 1st position. As shown in Fig. 5, many meta-stable conformations are possible, which we propose to be attributed to combinations of bound and unbound conformations of the base-pairs. It is therefore reasoned that some near-bound states can occasionally allow translation initiation at this codon.

Discussion
The binding free energy ∆G binding evaluated by our method was qualitatively consistent with experimental observation, as described in Fig. 4. We compared the AUG start codon and two near-cognate start codons (Tab. 1). Assuming that G unbound is common for all the models, i.e. the free energy of the unbound state is independent of the codon, G bound and ∆G binding are equivalent. The difference, or penalty, of binding free energy (∆∆G) induced by AUG → GUG and AUG → CUG substitution was ≃ 6 k B T (3.6 kcal/mol) and ≃ 5 k B T (3.0 kcal/mol), respectively. This result is largely consistent with another computational approach using the free energy perturbation (FEP) [24].
In contrast to the previous work, however, our ABF-based approach provided not merely the binding free energy but information on the nucleic acid binding dynamics represented by the free energy landscape (Figs. 5 and 6). The free energy profiles shown in Fig. 5 suggested an unexpected stability of the 1st A:U base-pair, compared to the 3rd G:C base-pair. According to the free energy profile of AUG binding, dissociation of the triplet base-pairs starts at the 3rd G:C (Fig. 6, right column). In the open PIC model that is suggested to occur during the scanning process prior to start codon recognition [11], the tRNA is not perpendicularly attached to the mRNA, in contrast to the P-site tRNA positioning during the elongation phase. This conformation appears to allow the 5'-side (i.e. cytosine side) of the anticodon to curve away from the start codon, suggesting a stretching force towards the tRNA side ( Fig.  8(a)). We propose that this stretching decreases the affinity of the 3rd G:C base-pair during the scanning process ( Fig. 8(a)). In contrast, the affinity of the 1st A:U base-pair is likely to be increased by interaction with eIF1, so is that of the 2nd U:A base-pair by eIF1A, as proposed previously [5,[9][10][11][12][13] (Fig. 8(a)). In strong agreement with the role of eIF1 in stabilizing the 1st A:U base-pair, our averaged simulation structure indeed positions Asn-34 and Arg-36 in its proximity ( Fig. 8(b)). In fact, the residues Asn-34:Gly-35:Arg-36, termed β-hairpin loop 1, is absolutely conserved from yeast to human. Mutations altering Asn-34 and Arg-36 display significant increase in UUG initiation [35], in agreement with their crucial role in maintaining open scanning-competent PIC conformation.
The free energy landscape of GUG-anticodon base-pairs (Fig. 5) and its averaged simulation structure in the P-site (Figs. 7 and 8(b)) also suggest that the same structural restriction in turn prevents G:U pairing at the 1st position, that otherwise occurs frequently in its free form. The disordered 3rd G:C base-pair seen with the GUG structure appears to be consistent with this idea (Fig. 7). Since we did not observe a strong disorder in CUG-anticodon structure (Fig. 7), we propose that the near-cognate start codon usage characteristic of eukaryotic initiation is mostly explained by a strong perturbation on GUG accommodation in the P-site due to steric restriction imposed by eIF1 β-hairpin loop. In agreement with this thesis, the level of CUG initiation is just equivalent to that of GUG initiation in yeast S.cerevisiae [19] (and our personal observations), although the former is significantly stronger than the latter in various distinct contexts in human cells [21].
The adaptive biasing force (ABF) method adopted in this study was previously used to evaluate free energy profiles of molecular dynamics in other biomolecular systems [26][27][28]. Among many extended ensemble simulation methods, we chose ABF to mitigate the difficulties in studying nucleic acid (DNA and RNA) interactions. An example of such difficulty is encountered when high temperature is applied in replica exchange molecular dynamics [36] that causes complete separation of nucleic acid strands, which cannot revert to the original structure within the simulation timescale.
To overcome this problem, extended simulation methods without destroying the structure should be sought, and the ABF can offer a solution. We hope that further development of this approach can also contribute to the improvement of analysis and prediction of RNA structural dynamics [37,38].

Conclusion
In this study, we proposed a computational method to obtain multi-dimensional free energy profiles for codon-anticodon base-pairing by adaptive biasing force (ABF) molecular dynamics [26][27][28]. This reaction-coordinate-based analysis method provided the equilibrium profiles of base-pair binding dynamics depending on ribonucleotide sequence -start codon-anticodon base-pairing in this case. Our method successfully detected the changes in the free energy landscape (Figs. 5 and 6) induced by site-specific nucleotide substitution in the start codon (e.g. AUG → GUG) and offered a mechanistic explanation for how such changes led to perturbation in initiation frequency from the altered start codons.

Author Contributions
TK, KA, and YT conceived and designed the research, analyzed the data, and wrote the paper. TK and YT performed the numerical simulations.