Folding Rate of Protein and RNA Studied from Quantum Folding Theory

Starting from the assumption that protein and RNA folding is a quantum transition between molecular conformations, we deduced a folding-rate formula and studied the dependence of the folding rate on chain length (torsion number) and on temperature. The chain-length dependence of the folding rate was tested on 65 two-state proteins and 27 RNA molecules. The success of this comparative study of protein and RNA folding points to the possible existence of a common quantum mechanism in the conformational change of biomolecules. The predicted temperature dependence of the folding rate has also been tested successfully for proteins; its test for RNA awaits further experiments.


INTRODUCTION
In a recent work, Garbuzynskiy et al. reported that measured protein folding rates fall within a narrow triangle (called the Golden triangle) [1]. At the same time, Hyeon et al. reported that RNA folding rates are determined by chain length [2]. Proteins and RNA are both biological macromolecules; they may obey the same dynamical laws, and a unifying folding mechanism is expected [3]. We have proposed a quantum theory of protein folding [4]. Following the idea that the conformational change of a biomolecule is essentially a quantum transition between conformational states, we make a comparative study of two-state protein folding and RNA folding and give a unified approach to the folding dynamics of both molecules.
For a macromolecule consisting of n atoms there are 3n coordinates if each atom is regarded as a point. Apart from the 6 translational and rotational degrees of freedom, 3n − 6 coordinates describe the molecular shape, and these shape coordinates are the main variables responsible for conformational change. It has been proved that bond lengths, bond angles and torsion (dihedral) angles form a complete set for describing the molecular shape. Compared with the chemical bond energy (typically several electron volts), the vibrational energy of bond lengths and bond angles (0.03 to 0.4 eV) and other forms of biological energy, the torsion vibration energy (about 0.003 to 0.03 eV) is the lowest; the torsion angles therefore constitute the slow variables of the molecular biological system.
bioRxiv preprint first posted online Jul. 1, 2015; doi: http://dx.doi.org/10.1101/021782. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
Moreover, unlike the stretching and bending potentials, the torsion potential generally has several minima with respect to the angle coordinate, corresponding to several stable conformations. Based on the idea that the molecular conformation is defined by the torsion state and that folding/unfolding is essentially a quantum transition between such states, we obtained, through adiabatic elimination of the fast variables, a set of fundamental equations describing the rate of conformational transition of a macromolecule [4]. Using these equations we successfully explained the non-Arrhenius temperature dependence of the folding rate (the logarithm of the folding rate is not a linearly decreasing function of 1/T) for each protein. Moreover, a statistical investigation of the folding rates of 65 two-state proteins shows that the fundamental equations are consistent with experimental data [5].
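As a rough numerical check on the energy hierarchy described above, the sketch below expresses the quoted energy ranges in units of the thermal energy $k_BT$. The comparison with $k_BT$ and the choice T = 298.15 K (about 25 °C, the temperature of the protein dataset) are our own assumptions for illustration; the energy ranges are the ones stated in the text, and the names are hypothetical.

```python
# Compare the energy ranges quoted in the text with the thermal energy k_B*T.
# ASSUMPTION: T = 298.15 K is chosen only for illustration.
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K (CODATA 2018)

# (lower, upper) bounds in eV, as quoted in the text
ENERGY_RANGES_EV = {
    "bond stretching/bending": (0.03, 0.4),
    "torsion vibration": (0.003, 0.03),
}


def in_units_of_kT(energy_ev: float, temperature_k: float = 298.15) -> float:
    """Express an energy given in eV as a multiple of k_B*T."""
    return energy_ev / (K_B_EV_PER_K * temperature_k)


if __name__ == "__main__":
    for mode, (low, high) in ENERGY_RANGES_EV.items():
        print(f"{mode}: {in_units_of_kT(low):.2f} to {in_units_of_kT(high):.2f} k_B*T")
```

With these numbers the torsion vibrations span roughly 0.1 to 1.2 $k_BT$, whereas bond stretching and bending reach about 16 $k_BT$, which is consistent with treating the torsions as the slow, thermally accessible variables.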
In this article, based on the rate equation deduced from quantum transition theory, we give a unified investigation of protein folding and RNA folding. First, we deduce a relation between folding rate and chain length. We then test the relation on a dataset of 65 two-state proteins and use it to analyze the folding-rate data of 27 RNA molecules. We also give the relation between folding rate and temperature and discuss its experimental implications.

MATERIALS AND METHODS
Datasets
Recently, Garbuzynskiy and coworkers collected folding-rate data for 69 two-state proteins [1]. Of these, the folding rates of 65 two-state proteins were measured at around 25 °C; they constitute the dataset we use to compare theoretical and experimental results (Table S1 in the Supplementary data). Hyeon and Thirumalai collected the folding rates of 27 RNA molecules [2], which constitute our second dataset. In addition, temperature-dependence data of the folding rate for 16 proteins are given in Table S2 in the Supplementary data.

Theoretical model
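$$H_{tor} = -\sum_j \frac{\hbar^2}{2I_j}\frac{\partial^2}{\partial\theta_j^2} + U_{tor}(\{\theta\}) \qquad (2)$$
where $I_j$ denotes the moment of inertia of the j-th torsion and the torsion potential $U_{tor}$ is a function of the set of torsion angles $\{\theta\} = \{\theta_j\}$. Its form depends on the solvent environment of the molecule.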
where W is the rate of conformational transition at a given temperature T and solvent condition, $I'_V$ is the slow-variable factor and $I'_E$ the fast-variable factor, N is the number of torsion modes participating coherently in a quantum transition, $I_j$ denotes the moment of inertia of the atomic group of the j-th torsion mode ($I_0$ denotes its average hereafter), ω and ω′ are the initial and final torsion-potential frequencies $\omega_j$ and $\omega'_j$ averaged over the N torsion modes, δθ is the averaged angular shift between the initial and final torsion potentials, ΔG is the free-energy decrease per molecule between the initial and final states, M is the number of torsion angles correlated with the fast variables, and $a^2$ is the square of the matrix element of the fast-variable Hamiltonian operator (more accurately, of its change with torsion angle), averaged over the M modes.

To obtain quantitative results, one must first calculate the number of torsion modes N, which describes the degree of coherence of the multi-torsion transition in folding. For two-state protein folding, we assume that N can be obtained by counting all main-chain and side-chain dihedral angles on the polypeptide chain, except those of tail residues that do not belong to any contact. A contact is defined as a pair of residues at least four residues apart in the primary sequence whose spatial distance is no greater than 0.65 nm. Each residue in the contact-bearing fragment contributes 2 main-chain dihedral angles, and each residue other than alanine and glycine contributes 1 to 4 additional side-chain dihedral angles (Table S3 in the Supplementary data). For RNA folding, we assume that the quantum transition occurs between a compact (yet disordered) intermediate and the folded state [8], or between the primary and secondary structures of the molecule [9]. The torsion number can then be estimated from the chain length. Following IUB/IUPAC conventions there are 7 torsion angles per nucleotide, namely α, β, γ, δ, ε, ζ and χ, many of which have more than one favorable conformation (potential minimum). If each nucleotide has q torsion angles with multiple potential minima, the torsion number is N = qL, where L is the chain length of the RNA.
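The counting rules for N can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each residue is represented by a single point (for example its Cα atom) with coordinates in nm, the side-chain torsion counts are standard χ-angle counts standing in for the paper's Table S3, and "tail residues that do not belong to any contact" is read as the residues before the first and after the last contact-forming residue. All function names are hypothetical.

```python
import numpy as np

# ASSUMPTION: standard side-chain chi-angle counts per residue type. The paper's
# own per-residue values are in its Table S3, which is not reproduced here.
SIDE_CHAIN_TORSIONS = {
    "A": 0, "G": 0,
    "S": 1, "C": 1, "T": 1, "V": 1,
    "P": 2, "I": 2, "L": 2, "D": 2, "N": 2, "H": 2, "F": 2, "Y": 2, "W": 2,
    "M": 3, "E": 3, "Q": 3,
    "K": 4, "R": 4,
}


def contact_span(coords_nm, min_separation=4, cutoff_nm=0.65):
    """Return (first, last) indices of residues involved in any contact, or None.

    A contact is a residue pair at least `min_separation` apart in sequence whose
    spatial distance is <= cutoff_nm (one point per residue is an assumption).
    """
    xyz = np.asarray(coords_nm, dtype=float)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    i, j = np.triu_indices(len(xyz), k=min_separation)  # all pairs with j - i >= 4
    hit = dist[i, j] <= cutoff_nm
    if not hit.any():
        return None
    involved = np.concatenate([i[hit], j[hit]])
    return int(involved.min()), int(involved.max())


def protein_torsion_number(sequence, coords_nm):
    """N = sum of (2 backbone + side-chain torsions) over the contact-bearing fragment."""
    span = contact_span(coords_nm)
    if span is None:
        return 0
    first, last = span  # tail residues outside this span are excluded
    return sum(2 + SIDE_CHAIN_TORSIONS[aa] for aa in sequence[first:last + 1])


def rna_torsion_number(chain_length, q):
    """N = q * L, with q (at most 7) multi-minimum torsions per nucleotide."""
    if not 0 < q <= 7:
        raise ValueError("q must lie in (0, 7]")
    return q * chain_length
```

For example, an 8-residue chain whose only contact joins residues 2 and 6 contributes only residues 2 to 6 to N; the terminal residues on either side are dropped as non-contact tails.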

Obtaining a relation between the fast-variable factor $I'_E$ and the torsion number N
For protein (and RNA) folding, or any other macromolecular conformational change that involves no chemical reaction or electronic transition, the fast variables include only the bond lengths and bond angles of the macromolecule. In this case an approximate relation between the fast-variable factor $I'_E$ and the torsion number N can be deduced; inserting it into Eq (7) gives Eq (11), which expresses ln W in terms of N and ΔG.

The relationship of ln W with N given by Eq (11) can be tested by statistical analysis of the folding rates $k_f$ of 65 two-state proteins. We found that the theoretical logarithmic rate ln W agrees well with the experimental ln $k_f$ (Fig 1), with a correlation coefficient R above 0.78.

Next, consider the free-energy combination y and the torsion-number variable x that occur in rate equation (7) or (11). The linear regression between y and x is y = A + Bx, where A and B are two statistical parameters describing the free-energy distribution in the dataset. We test this linear relation on the protein folding dataset. Because the accurate ρ-value of each protein is unknown, the relation is first tested with a single-ρ fit (assuming one ρ-value for all proteins to deduce a linear regression); the fitting results are then compared to find the best-fit ρ-value and the corresponding free-energy parameters A and B. The statistical results (correlation R and parameters A and B) for the two-state protein dataset are listed in Table 1. From Table 1, the correlation R between y and x is close to 0.8 for ρ = 0.065 to 0.075 and reaches its maximum, R = 0.7966, at ρ = 0.069. Thus the single-ρ fit gives the best-fit statistical relation of free energy for two-state proteins as y = 4.306 + 541.1x (Figure 2). Taking the variation of ρ among different proteins into account would further improve the linear regression between the free-energy combination y and the torsion number x.
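The single-ρ fit described above is a one-dimensional scan: for each trial ρ, compute y for every protein, regress y on x, and keep the ρ with the largest R. The sketch below assumes a user-supplied function `y_of_rho` (a hypothetical hook) because the exact free-energy combination from Eq (11) is not reproduced here; only the scan and the least-squares regression are shown.

```python
import numpy as np


def single_rho_fit(x, y_of_rho, rhos):
    """Scan one rho shared by all proteins and keep the best linear fit.

    y_of_rho(rho) must return the free-energy combination y_i of every protein
    at that rho (HYPOTHETICAL hook; its form follows Eq (11) of the paper).
    Returns (best_rho, A, B, R) for the regression y = A + B x.
    """
    x = np.asarray(x, dtype=float)
    best = None
    for rho in rhos:
        y = np.asarray(y_of_rho(rho), dtype=float)
        B, A = np.polyfit(x, y, 1)       # slope first, then intercept
        R = np.corrcoef(x, y)[0, 1]      # Pearson correlation coefficient
        if best is None or R > best[3]:
            best = (float(rho), float(A), float(B), float(R))
    return best
```

In a toy test where y is exactly linear in x only at ρ = 0.069 and is perturbed elsewhere, the scan returns ρ = 0.069 with A and B equal to the generating values.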
Regarding the relationship between the free energy ΔG and the torsion number N, two statistical analyses have been reported in the literature.

LgBL σ +
(L is the length of the polypeptide chain) [1]. Using statistics on the same dataset of 65 two-state proteins, we showed that the correlation R between the free energy and the N-related quantity is 0.67 for the former relation and 0.69 for the latter [5], both lower than the correlation shown in Figure 2.

$$y = A + Bx, \qquad A = 4.306, \quad B = 541.1 \qquad (13)$$
Here const is an N-independent constant that nevertheless depends on the molecular shape (through the factor f it contains).
The quantum folding theory of proteins is, in principle, applicable to each step of the conformational transition of an RNA molecule. Although recent experiments have revealed multiple stages in RNA collapse, the final search for the native structure within compact intermediates appears to be a common step of the folding process, and it exhibits strong cooperativity of helix assembly [3][7]. We therefore calculate the transition from the intermediate to the native fold. Because the collapse transition preceding formation of the intermediate is fast, generally taking less time than the subsequent search for the native fold [7], the calculated rate can be compared directly with the experimental total rate. Moreover, for RNA folding the const term in Eq (14) is a true constant if the variation of the structure-related shape parameter f can be neglected within the dataset considered. Substituting N = qL (L is the chain length of the RNA) into Eq (14) gives Eq (15). Eq (15), deduced from quantum folding theory, predicts the relation between folding rate and chain length: the rate W increases with L, reaches a maximum at $L_{max} = B'/D$, and then decreases as the power law $L^{-D}$. In a recent work, Hyeon and Thirumalai [2] showed that chain length determines the folding rates of RNA; they obtained a good statistical relation between folding rate and chain length L on a dataset of 27 RNA sequences. Their best-fit result is

$$\ln W_H = 14.3 - 1.15\,L^{0.46} \qquad (16)$$

Both Eq (15) and Eq (16) relate the RNA folding rate to chain length. Figure 3 compares the theoretical folding rates ln W or ln $W_H$ with the experimental folding rates ln $k_f$ for the 27-RNA dataset. Eq (15) fits the experimental RNA folding rates as well as Eq (16) does: with the best-fit values of B′ and D, the correlation between ln W (calculated from Eq 15) and ln $k_f$ is R = 0.9729 (Fig 3a).
The errors of the two models are compared in Table 2. According to Eq (15), the folding rate decreases with increasing L as the power law $L^{-D}$, in contrast to the fit in literature [2].
Table 2 lists the errors for the eight RNAs in the 27-RNA dataset whose lengths exceed 120 nucleotides. In both models, the error for the first RNA, the hairpin ribozyme, is normalized to zero.
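The error normalization used for Table 2 can be written in a few lines. In this sketch we use natural logarithms throughout and read the "log" of Eq (16) as ln; subtracting the first entry's error removes each model's additive constant. The function names are hypothetical.

```python
import numpy as np


def ln_w_hyeon(L):
    """Eq (16), the fit of Hyeon and Thirumalai [2]: ln W_H = 14.3 - 1.15 L**0.46."""
    return 14.3 - 1.15 * np.asarray(L, dtype=float) ** 0.46


def normalized_errors(ln_w_model, ln_k_exp):
    """Er_i = ln W_i - ln k_i, shifted so that the first entry's error is zero.

    The shift absorbs the model's additive constant; the caller decides which
    RNA comes first (the hairpin ribozyme in Table 2).
    """
    er = np.asarray(ln_w_model, dtype=float) - np.asarray(ln_k_exp, dtype=float)
    return er - er[0]
```

Applying `normalized_errors` to Eq (15) and to `ln_w_hyeon` with the same experimental rates puts the two models on a common footing, so that their errors for the long RNAs can be compared directly.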
Apart from the additive constant, the RNA folding-rate equation, Eq (15), has two independent parameters, B′ and D. As Fig 3a shows, the best-fit value on the 27-RNA dataset is $D_f = 5.619$, close to the value D = 5.5 predicted by a general theory of quantum folding; at the same time, the best-fit value of B′ is $B'_f = 61.63$. The value of $B'_f$ derived from RNA folding can be compared with the value of B from protein folding (Eq (14)). Note that B, or B′ = B/q, represents the contribution of the squared free-energy term to the logarithmic folding rate. The RNA folding free energy is typically 2 to 4 kcal/mol [7,17,18], while the folding free energy of most proteins in the 65-protein dataset lies between 1 and 4.6 kcal/mol (Table S1). On the other hand, the statistical analysis of protein data shows that the ρ-value varies from about 0.03 (for fast folders) to 0.1 [5].
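A minimal sketch of fitting Eq (15) is given below. It assumes Eq (15) has the form ln W = C − B′/L − D ln L; we infer this form because it is the simplest one with the two stated properties (a maximum at $L_{max} = B'/D$ and an $L^{-D}$ tail), so it should be checked against the paper's own equation. Because this model is linear in C, B′ and D, ordinary least squares suffices. The function names are hypothetical.

```python
import numpy as np


def ln_w_eq15(L, C, B_prime, D):
    """ASSUMED form of Eq (15): ln W = C - B'/L - D ln L."""
    L = np.asarray(L, dtype=float)
    return C - B_prime / L - D * np.log(L)


def fit_eq15(L, ln_k):
    """Least-squares estimate of (C, B', D); the model is linear in all three."""
    L = np.asarray(L, dtype=float)
    design = np.column_stack([np.ones_like(L), -1.0 / L, -np.log(L)])
    (C, B_prime, D), *_ = np.linalg.lstsq(design, np.asarray(ln_k, dtype=float), rcond=None)
    return float(C), float(B_prime), float(D)


def l_max(B_prime, D):
    """Chain length of maximal rate, from d(ln W)/dL = B'/L**2 - D/L = 0."""
    return B_prime / D


if __name__ == "__main__":
    # Best-fit values reported in the text for the 27-RNA dataset.
    print(f"L_max = {l_max(61.63, 5.619):.2f} nucleotides")
```

With the reported best-fit values, $L_{max} = 61.63/5.619 \approx 11.0$ nucleotides, so under this assumed form almost every RNA in the dataset lies on the decreasing, power-law side of the curve.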
The 27-RNA dataset [2] contains experimental folding rates measured for different processes. The dataset is inhomogeneous: it includes several subsets, for example one of total folding rates for some RNAs and another of secondary-structure formation rates for other RNAs. When using Eq (15), one should keep in mind the parameter differences among subsets. On theoretical grounds, D should be close to 5.5 in every subset. If the variation of B′ (and of the additive constant) among subsets can be neglected, Eq (15) with a single set of parameters B′, D and additive constant can be used to fit the experimental folding rates. This explains why the simple formula Eq (15) fits the experiments so well. However, differences in the free-energy parameters B (and A) and in the structure-related parameter f among subsets do exist. For example, when the G of the tetraloop hairpin UUCG is replaced by 8-bromoguanosine (G*), gcUUCG*gc folds 4.1-fold faster than gcUUCGgc [8]. Both samples are in the dataset; they have the same chain length L = 8 but different rates $k_f$. In the present model, the difference comes from the variation of the free-energy change ΔG and of the structure-related parameter f between the two samples (see Eq (11)). This is the origin of the error in fitting RNA folding rates with Eq (15).

Obtaining a law for the temperature dependence of the folding rate and testing it on the protein dataset
The free-energy decrease ΔG in protein folding depends linearly on temperature T (Figure S2 in the Supplementary data). Inserting this linear relation into Eq (7) gives the temperature dependence of the transition rate, Eq (17), which contains two slope parameters, S and R, and implies a non-Arrhenius rate-temperature relationship. The relation was tested on the 16 two-state proteins for which temperature-dependence data are available (Table S2 in the Supplementary data); the statistical analyses were made in [5] (see Figure S1 and Table S4 in the Supplementary data). Figure 4 gives an example. The strong curvature of the Arrhenius plot is due to the R term in Eq (17), which comes from the squared free energy $(\Delta G)^2$ in ln W. The good agreement between theory and experiment supports the concept of quantum folding. Moreover, in this model the universal non-Arrhenius characteristics of the folding rate are described by the two slope parameters S and R, and these parameters are related to the known folding dynamics. All parameters of the torsion potential defined in this theory (such as the torsion frequencies ω and ω′, the averaged angular shift δθ, and the energy gap ΔE between the initial and final torsion-potential minima) can be determined, and they can be calculated consistently with each other for all proteins studied. Furthermore, in this theory the folding and unfolding rates are correlated with each other without any further input [5][19].
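To illustrate how a term in $1/T^2$ bends an Arrhenius plot, the sketch below fits ln k = C + S/T − R/T² to rate data. This reduced form is our own assumption about how the slope parameters S and R enter Eq (17), not a reproduction of that equation, and the function name is hypothetical; with R = 0 the fit collapses to a straight Arrhenius line.

```python
import numpy as np


def fit_quadratic_arrhenius(T_kelvin, ln_k):
    """Fit ln k = C + S/T - R/T**2 (an ASSUMED reduced form of Eq (17)).

    Returns (C, S, R). A nonzero R makes the plot of ln k against 1/T curved.
    """
    u = 1.0 / np.asarray(T_kelvin, dtype=float)
    # np.polyfit returns coefficients from the highest power down: a2*u**2 + a1*u + a0
    a2, a1, a0 = np.polyfit(u, np.asarray(ln_k, dtype=float), 2)
    return float(a0), float(a1), float(-a2)
```

The curvature only becomes measurable when 1/T spans a wide enough range, which is why the text asks for RNA measurements over a larger temperature interval.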
For RNA, the temperature dependence of the folding rate of yeast $\mathrm{tRNA}^{\mathrm{Phe}}$ has been observed [20]: ln $k_f$ was measured versus 1/T between 28.5 °C and 34.8 °C. The Arrhenius plot is a straight line in this temperature interval, but with a large standard deviation at the low-temperature end. Experiments on protein folding show that the strong curvature of the ln $k_f$ versus 1/T relation becomes apparent only over a temperature interval of several tens of degrees. We expect that more accurate measurements over a sufficiently wide temperature interval will reveal the non-Arrhenius character of the temperature dependence of the RNA folding rate.
… chemical reaction, ΔG = 0, due to the different bias samplings in frequency space of $\{\omega_j\}$ and $\{\omega'_j\}$, $\omega \neq \omega'$.
We have studied the folding of two-state proteins and RNA molecules from the viewpoint of quantum transition. Unlike two-state protein folding, RNA folding is a multi-stage process. Both the folding from a compact intermediate to the native fold and the formation of secondary structure can be studied with quantum folding theory. However, studying the collapse transition of an RNA molecule as a whole requires further development of the quantum model. The problem is comparable to multi-state protein folding. For a multi-state protein, one may assume that folding is a joint process of several quantum transitions in different domains, with some time delays between these transitions [21]. A similar idea might be introduced in the study of the total folding rate of RNA molecules.

Conclusion
A formula for the protein and RNA folding rate as a function of torsion number (chain length) is deduced from the quantum folding theory of macromolecules. The theoretical prediction agrees with experimental data on two-state protein and RNA folding. A law for the temperature dependence of the folding rate is also deduced, which can explain the non-Arrhenius behavior of protein folding. What encourages us is that the partial success of this simple unified theory for protein and RNA folding reveals the existence of a common quantum mechanism in the conformational transition of biomolecules.
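Suppose the quantum state of a macromolecule is described by a wave function M(θ, x), where {θ} denotes the torsion angles of the molecule and {x} the set of fast variables, including the stretching-bending coordinates and the frontier electrons of the molecule. The wave function M(θ, x) satisfies

$$\left[H_{tor}\left(\theta,\frac{\partial}{\partial\theta}\right)+H_{fv}\left(\theta,x;\frac{\partial}{\partial x}\right)\right]M(\theta,x)=E\,M(\theta,x) \qquad (1)$$

where $H_{fv}$ is the fast-variable Hamiltonian. Because the fast variables change more quickly than the torsion angles, the adiabatic approximation can be used. In this approximation the wave function is expressed as

$$M(\theta,x)=\psi(\theta)\,\varphi(\theta,x) \qquad (3)$$

and the two factors satisfy a pair of coupled equations, of which the fast-variable one reads

$$H_{fv}\left(\theta,x;\frac{\partial}{\partial x}\right)\varphi_\alpha(\theta,x)=\varepsilon_\alpha(\theta)\,\varphi_\alpha(\theta,x)$$

… The equations up to Eq (8) are the basic equations for conformational transition.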

3 N
… assumed to be a constant and normalized in the volume V. Since the energy and the volume V depend on the size of the molecule, one may assume that the energy $\varepsilon_\alpha^{(0)}$ and U are proportional to the number of interacting pairs (namely $N^2$) and that V is proportional to N. However, because only a small fraction of the interacting pairs is correlated with a given … the molecular structure. For example, a high helix content increases the integral. It has been pointed out that a protein rich in α helices may have a rather oblong or oblate ellipsoidal, rather than spheroidal, shape, and such a protein has a higher folding rate [1][5]. Therefore, apart from the N-dependent factor, there is another, structure-related factor in $a_{\alpha\alpha'}(j)$, which is N-independent. Assuming that M is proportional to N, one obtains

$$M a^2 = c\,f\,N^{-5} \qquad (10)$$

where f is a structure-related shape parameter. This means that the fast-variable factor $I'_E$ is inversely proportional to $N^5$. Inserting Eq (10) into Eq (7), we obtain the relation of the logarithmic rate to N and ΔG.

Figure 1
Figure 1 Comparison of theoretical folding rates ln W with experimental folding rates ln $k_f$ for 65 proteins. The experimental rates $k_f$ are taken from Table S1. The straight line is the linear regression between ln W and ln $k_f$. In calculating the theoretical ln W with Eq (11), the shape parameter f is taken as follows: f = 81 for $(L_\alpha - L_\beta)/L \ge 0.6$, f = 25 for $0.3 \le (L_\alpha - L_\beta)/L < 0.6$, and f = 1 for $(L_\alpha - L_\beta)/L < 0.3$ ($L_\alpha$ and $L_\beta$ are the numbers of residues in α helices and β sheets, respectively, and L is the total number of folded residues). Our experience shows that the statistical result is insensitive to the choice of f in the intermediate region. The figure is plotted for ρ = 0.097 (correlation R = 0.7818, slope of the regression line 1.109). Essentially the same results are obtained for any ρ between 0.06 and 0.1; for example, R = 0.7537 and slope = 1.044 for ρ = 0.069, and R = 0.7396 and slope = 0.997 for ρ = 0.06.
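The piecewise rule for the shape parameter f used in Figure 1 translates directly into code. The thresholds and values are those of the caption; the function name is hypothetical.

```python
def shape_parameter_f(n_helix, n_sheet, n_folded):
    """Shape parameter f used for Fig 1, chosen from (L_alpha - L_beta) / L.

    n_helix, n_sheet: residues in alpha helices and beta sheets (L_alpha, L_beta);
    n_folded: total number of folded residues (L).
    """
    if n_folded <= 0:
        raise ValueError("n_folded must be positive")
    h = (n_helix - n_sheet) / n_folded
    if h >= 0.6:
        return 81
    if h >= 0.3:
        return 25
    return 1
```

Assuming $I'_E \propto M a^2 \propto f$, as Eq (10) suggests, switching from f = 1 to f = 81 at fixed N and ΔG raises ln W by ln 81 ≈ 4.39, which shows why helix-rich proteins are predicted to fold faster.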

Figure 2
Figure 2 Statistical relation of free energy for two-state proteins. Experimental data are taken from the 65-protein set (Table S1). Five proteins in the set that were denatured by temperature have been omitted from the statistics.
literatures.One was based on the assumption of linear relation existing between ΔG and N , GaNb ∆=− ( 0 b ≠ ) [5].Another was based on the assumed relation of ΔG vs (

folding rate is an important peculiarity of the present theory.We shall use the statistical relation of the free-energy combination versus N in the following studies on RNA folding.
Meanwhile, the correlation between ln $W_H$ (calculated from Eq 16) and ln $k_f$ is R = 0.9752 (Fig 3b). However, in Fig 3b the slope of the regression line is 1.03 and the line is offset from the origin by −0.36, whereas in Fig 3a the slope is 1.0001, very close to 1, and the offset is only −0.0012. The reason is that, although the two equations fit the experimental data with the same overall accuracy, for large L the errors $Er = \log W - \log k_f$ calculated from Eq (15) are clearly lower than the errors $Er_H = \log W_H - \log k_f$ calculated from Eq (16).
At large L, Eq (15) therefore yields a long tail in the W-L curve rather than a short tail such as exp(−λL).

Figure 3
Figure 3 Comparison of experimental folding rates ln $k_f$ with theoretical folding rates ln W (Fig 3a) or ln $W_H$ (Fig 3b) for 27 RNA molecules. Experimental rates are taken from Table 1 of reference [2].
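The regression diagnostics quoted for Figure 3 (slope, offset from the origin, and correlation R) can be computed as below. Regressing ln W on ln $k_f$, rather than the reverse, is our assumption about the figure's axes; the function name is hypothetical.

```python
import numpy as np


def regression_diagnostics(ln_k_exp, ln_w_theory):
    """Slope, intercept and Pearson R of ln W regressed on ln k_f.

    A slope near 1 and an intercept near 0 mean the model tracks the data
    without systematic bias, which is how Fig 3a and Fig 3b are compared.
    """
    x = np.asarray(ln_k_exp, dtype=float)
    y = np.asarray(ln_w_theory, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(intercept), float(r)
```

Two models can share nearly the same R while differing in slope and intercept. This is the situation described for Eq (15) and Eq (16), and it is why the text reports all three numbers.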
Although both ΔG and ρ change with N, the variation of $\rho(\Delta G)^2$ with N within a given dataset may be weaker. It is plausible to assume that the mean $\rho(\Delta G)^2$ for proteins differs from that for RNA by a factor of no more than 1.5 to 2. The B-value for RNA, $B_{RNA}$, would then be smaller than $B_{protein} = 541.1$, taking a value in the range 270 to 360. Comparison with $B'_f = 61.63$ gives q = 4.4 to 5.8, consistent with the theoretical upper limit q = 7 for RNA molecules.
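The estimate of q above is simple arithmetic: $B_{RNA} = B_{protein}/r$ with r between 1.5 and 2, and $q = B_{RNA}/B'_f$. The sketch reproduces it with the values from the text; the names are hypothetical.

```python
B_PROTEIN = 541.1    # best-fit B for the 65 two-state proteins
B_PRIME_RNA = 61.63  # best-fit B' = B_RNA / q for the 27 RNAs


def q_range(ratio_low=1.5, ratio_high=2.0):
    """Range of q implied by B_RNA = B_protein / ratio and q = B_RNA / B'."""
    b_rna_high = B_PROTEIN / ratio_low   # about 360
    b_rna_low = B_PROTEIN / ratio_high   # about 270
    return b_rna_low / B_PRIME_RNA, b_rna_high / B_PRIME_RNA
```

It gives q ≈ 4.39 to 5.85. The text's 4.4 to 5.8 comes from first rounding $B_{RNA}$ to the range 270 to 360.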

Figure 4
Figure 4 Model fits to the overall folding rate $k_f$ versus temperature (1000/T) for protein FBP28 (PDB code 1E0L). Experimental logarithmic folding rates are shown as "o", and solid lines are the theoretical model fits ($k_f$ in s$^{-1}$, T in kelvin). Experimental rates are taken from [22].
Here α denotes the quantum number of the fast-variable wave function φ, and (k, n) refer to the conformational state (indicating around which minimum the wave function is localized) and the vibrational state of the torsion wave function ψ, respectively.

Table 1
Free-energy parameters determined by linear regression. A and B: free-energy parameters; R: correlation coefficient. R reaches its maximum at ρ = 0.069.
Testing the relation between the folding rate and the torsion number N on the RNA folding dataset
Using Eqs (11) and (13), we obtain an approximate expression for the transition rate ln W as a function of N, Eq (14).

Table 2
Errors of RNA folding rates in two theoretical models compared with experimental data