## Abstract

RNA molecules are highly dynamic systems characterized by a complex interplay between sequence, structure, dynamics, and function. Molecular simulations can potentially provide powerful insights into the nature of these relationships. The analysis of structures and molecular trajectories of nucleic acids can be non-trivial because it requires processing very high-dimensional data that are not easy to visualize and interpret. Here we introduce Barnaba, a Python library aimed at facilitating the analysis of nucleic acid structures and molecular simulations. The software consists of a variety of analysis tools that allow the user to i) calculate distances between three-dimensional structures using different metrics, ii) back-calculate experimental data from three-dimensional structures, iii) perform cluster analysis and dimensionality reduction, iv) search for three-dimensional motifs in PDB structures and trajectories and v) construct elastic network models (ENM) for nucleic acids and nucleic acid-protein complexes.

In addition, Barnaba makes it possible to calculate torsion angles and pucker conformations, and to detect base-pairing/base-stacking interactions. Barnaba produces graphics that conveniently visualize both extended secondary structure and dynamics for a set of molecular conformations. Barnaba is available as a command-line tool as well as a library, and supports a variety of file formats such as PDB, dcd and xtc files. Source code, documentation and examples are freely available at https://github.com/srnas/barnaba under the GNU GPLv3 license.

## Introduction

Despite their simple four-letter alphabet, RNA molecules can adopt amazingly complex three-dimensional architectures. RNA structure is often described in terms of a few simple degrees of freedom such as backbone torsion angles, sugar puckering, base-base interactions, and helical parameters **Dickerson (1989)**; **Richardson et al. (2008)**. Given a known three-dimensional structure, the calculation of these properties can be performed using available tools such as MC-annotate **Gendron et al. (2001)**, 3DNA **Lu and Olson (2008)**, fr3D **Sarver et al. (2008)** or DSSR **Lu et al. (2015)**. These software packages make it possible to calculate a variety of structural properties, but are less suitable for analyzing and comparing large numbers of structures.

The lack of large-scale analysis tools is critical when considering that many RNA molecules are not static, but highly dynamic entities, and multiple conformations are required to describe their properties. In molecular dynamics (MD) simulations **Šponer et al. (2018)**, for example, it is often necessary to analyze several hundreds of thousands of structures. The analysis and comparison of results from structure-prediction algorithms pose similar challenges **Dawson and Bujnicki (2016)**; **Miao et al. (2017)**. In order to rationalize and generate scientific insights, it is therefore fundamental to employ specific analysis and visualization tools that can handle such high-dimensional data. This need has long been recognized in the field of protein simulations, leading to the development of several software packages for the analysis of MD trajectories **Michaud-Agrawal et al. (2011)**; **McGibbon et al. (2015)**; **Tiberti et al. (2015)**. While these packages can in principle be used to analyze generic simulations, they do not support the calculation of nucleic-acid-specific quantities out of the box. Notable exceptions are CPPTRAJ **Roe and Cheatham III (2013)** and the driver tool in PLUMED **Tribello et al. (2014)**, which support the calculation of nucleic acid structural properties, among other features.

A limited number of software packages have been developed with the main purpose of analyzing simulations of nucleic acids. Curves+ **Lavery et al. (2009)** calculates parameters in DNA/RNA double helices as well as backbone torsion angles. do_x3dna **Kumar and Grubmüller (2015)** extends the capability of the 3DNA package to analyze a few selected quantities from GROMACS **Abraham et al. (2015)** MD trajectories. The detection of hydrogen bonds/stacking in simulations and the identification of motifs such as helices, junctions, loops, etc. can be performed using the Motif Identifier for Nucleic acids Trajectory (MINT) software **Górska et al. (2015)**.

Here we present Barnaba, a Python library to analyze nucleic acid structures and trajectories. The library contains routines to calculate various structural parameters (e.g. distances, torsion angles, base-pair and base-stacking detection), to perform dimensionality reduction and clustering, to back-calculate experimental quantities from structures and to construct elastic network models. Barnaba utilizes the capabilities of MDTraj **McGibbon et al. (2015)** for reading/writing trajectory files, and thus supports many different formats, including PDB, dcd, xtc, and trr.

In this paper we show the capabilities of Barnaba by analyzing a long MD simulation of an RNA stem-loop structure. We first calculate distances from a reference frame. Second, we consider a subset of dihedral angles and compare ^{3}*J* scalar couplings calculated from simulations with nuclear magnetic resonance (NMR) data. We then perform a cluster analysis of the trajectory, identifying a number of clusters that are visualized using a dynamic secondary structure representation. Finally, we search for structural motifs similar to cluster centroids in the entire Protein Data Bank (PDB). In addition, we show how to construct an elastic network model (ENM) of RNA molecules and protein-nucleic acid complexes with Barnaba, and how to use it to estimate RNA local fluctuations.

## Results

We present the different features of Barnaba by analyzing a 180 μs long simulation of an RNA 14-mer with sequence GGCACUUCGGUGCC performed by **Tan et al. (2018)** using a simulated tempering protocol, where the temperature is used as a dynamic variable to enhance sampling. Experimentally, this sequence is known to form an A-form stem composed of 5 consecutive Watson-Crick base pairs, capped by a UUCG tetraloop (Fig. 1A).

### RMSD, eRMSD calculation and detection of base-base interactions

First, we calculate the distance of each frame in the simulation from the reference experimental structure (PDB code 2KOC **Nozinovic et al. (2010)**). Fig. 1B shows the time series of heavy-atom root mean squared distance (RMSD) after optimal superposition **Kabsch (1976)**. During this simulation, multiple folding events occur: in line with previous analyses **Tan et al. (2018)** we thus observe both structures close to the reference as well as unfolded/misfolded ones.

We identify the base-base interactions in each frame using the annotation functionality in Barnaba (see Methods). Structures where the stem is completely formed together with the native trans sugar-Watson (tSW) interaction between U6-G9 in the loop are shown in red. Blue points indicate structures in which all base pairs in the stem, but not in the loop, are present. All the other structures are colored in gray. From the histogram in Fig. 1B it can be seen that RMSD < 0.23 nm roughly corresponds to native-like structures. A second sharp peak around 0.3 nm corresponds to structures in which only the stem is correctly formed. All other conformations have RMSD larger than 0.6 nm.

One of the features of Barnaba is the possibility to calculate the eRMSD **Bottaro et al. (2014)**. The eRMSD only considers the relative arrangements between nucleobases in a molecule, and quantifies the differences in the interaction network between two structures. In this perspective, eRMSD is similar to the Interaction Network Fidelity **Parisien et al. (2009)** that quantifies the discrepancy in the set of base-pairs and base-stacking interactions. The eRMSD, however, is a continuous, symmetric, positive definite metric distance that satisfies the triangle inequality. Additionally, it does not require detection of the interactions (annotation) and is hence particularly well suited for analyzing MD trajectories and unstructured RNA molecules. Fig. 1C shows the eRMSD from native for the UUCG simulation. We notice that, similarly to the RMSD case, the histogram displays three main peaks. In this case the correspondence between peaks and structures can be readily identified: when eRMSD < 0.7 the native stem and loop are formed; if 0.7 < eRMSD < 1.3, the stem is formed but the loop is in a non-native configuration. Other structures typically have eRMSD > 1.3. We observe that the separation between the two main peaks (native structure, red, and native stem, blue) is sharper in Fig. 1C, confirming that eRMSD is more suitable than RMSD to distinguish structures with different base pairings **Bottaro et al. (2014)**.

Note that a significant number of low-RMSD/eRMSD structures lack one or more native base-pair interactions, and are therefore shown in gray. This is because the detection of base-base interactions critically depends on a set of geometrical parameters (e.g. distance, base-base orientation, etc.) that were calibrated on high-resolution structures. The criteria used in Barnaba (as well as the ones employed in other annotation tools) may not always be accurate when considering intermediate states and partially formed interactions that are often observed in molecular simulations **Lemieux and Major (2002)**.

### Torsion angle and ^{3}*J* scalar coupling calculations

Another important class of structural parameters is torsion angles. Similarly to other software, Barnaba contains routines to calculate backbone torsion angles (*α*,*β*,*γ*,*δ*,*ϵ*,*ζ*), the glycosidic angle *χ*, and the pseudorotation sugar parameters ** Altona and Sundaralingam (1972)**.

In Fig. 2, left panels, we plot the probability distributions of four angles (*β*, *γ*, *δ* and *ϵ*) for three different residues: U6, U7, and G9. We can see from the distribution of *γ* angles that U6 and U7 mainly populate the *gauche*^{+} rotameric state (0° < *γ* ≤ 120°), while G9 significantly populates the *trans* state as well (120° < *γ* ≤ 240°). Different rotameric states can also be seen from the distribution of *δ* angles (C2′/C3′-endo) and of *ϵ*, which is related to BI/BII states. Here, we consider the same trajectory of the UUCG tetraloop described above after removing all the unfolded structures, i.e. structures with eRMSD from native larger than 1.5 (≈ 6000 out of 20000), because below we compare with experiments performed under conditions where unfolded structures are absent.
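As an illustration of how rotamer populations can be read off a set of torsion angles, the following minimal sketch bins *γ* values into the intervals quoted above. It is not Barnaba code: the function names are hypothetical, and labelling the remaining interval (240° to 360°) as *gauche*^{-} is an assumption based on the usual convention rather than something stated in the text.

```python
from collections import Counter

def gamma_rotamer(gamma_deg):
    """Assign a gamma torsion angle (in degrees) to a rotameric state."""
    g = gamma_deg % 360.0  # map negative angles, e.g. -60 -> 300
    if 0.0 < g <= 120.0:
        return "gauche+"
    if 120.0 < g <= 240.0:
        return "trans"
    # Assumption: everything else is labelled gauche-.
    return "gauche-"

def rotamer_populations(gammas_deg):
    """Fraction of frames found in each rotameric state."""
    counts = Counter(gamma_rotamer(g) for g in gammas_deg)
    n = len(gammas_deg)
    return {s: counts[s] / n for s in ("gauche+", "trans", "gauche-")}

# Made-up angles for four frames, for illustration only.
example = rotamer_populations([60.0, 50.0, 180.0, -70.0])
```

The same binning applied per residue gives the kind of qualitative statement made above for U6, U7 and G9.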

In this example we chose these specific torsion angles because their distribution is related to available ^{3}*J* coupling experimental data from nuclear magnetic resonance (NMR) spectroscopy. The magnitude of a ^{3}*J* coupling depends on the distance between atoms connected by three bonds, and thus on the corresponding dihedral angle distribution. The dependence between angle *θ* and coupling ^{3}*J* can be calculated via Karplus equations

$$^{3}J = A\cos^{2}(\theta + \phi) + B\cos(\theta + \phi) + C,$$

where *A*, *B*, *C* are empirical parameters. Couplings corresponding to different angles can be calculated with Barnaba: H1′-H2′, H2′-H3′, H3′-H4′ (sugar conformation), H5′-P, H5″-P, C4′-P (*β*), H4′-H5′, H4′-H5″ (*γ*), H3′-P(+1), C4′-P(+1) (*ϵ*), H1′-C8/C6, and H1′-C4/C2 (*χ*). The complete list of Karplus parameters is reported in the Methods section, and may be changed within Barnaba.
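The Karplus relation is simple enough to sketch directly. The snippet below is an illustration only, not the Barnaba routine: the function names are hypothetical and the parameters `A`, `B`, `C` are made-up numbers rather than a published parameter set. It also shows why the coupling must be averaged over frames, instead of being evaluated at the average angle.

```python
import math

def karplus(theta_deg, A, B, C, phi_deg=0.0):
    """3J coupling (Hz) for one dihedral angle, using
    3J = A cos^2(theta + phi) + B cos(theta + phi) + C."""
    x = math.radians(theta_deg + phi_deg)
    return A * math.cos(x) ** 2 + B * math.cos(x) + C

def average_coupling(thetas_deg, A, B, C, phi_deg=0.0):
    """Ensemble average: apply the Karplus equation frame by frame, then average."""
    values = [karplus(t, A, B, C, phi_deg) for t in thetas_deg]
    return sum(values) / len(values)

# Hypothetical parameters, for illustration only.
A, B, C = 10.0, -1.0, 0.5

# Two frames at 0 and 180 degrees: the average coupling differs strongly
# from the coupling computed at the average angle (90 degrees).
j_avg = average_coupling([0.0, 180.0], A, B, C)
j_of_avg_angle = karplus(90.0, A, B, C)
```

Because the relation is non-linear, `j_avg` and `j_of_avg_angle` differ; only the former is comparable with a measured, ensemble-averaged coupling.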

Fig. 2, right panels, shows the back-calculated average ^{3}*J* couplings and the corresponding experimental values reported in **Nozinovic et al. (2010)**. Note that in some cases experiments and simulations do not agree: this is because the simulation was performed at different temperatures using a simulated tempering protocol, and therefore the comparison between simulations and experiments is here made for illustrative purposes only. Significant discrepancies could originate from errors introduced by the Karplus equations, which can be as large as 2 Hz **Bottaro et al. (2018)**.

### Cluster analysis

The structures within a trajectory can be grouped into clusters of mutually similar conformations, to understand which different states are visited and how often. For clustering we use the DBSCAN **Ester et al. (1996)** algorithm with *ϵ* = 0.45 and min samples = 70 **Bottaro and Lindorff-Larsen (2017)**. As in the previous example, structures with eRMSD > 1.5 from native are discarded. Fig. 3A shows the trajectory projected onto the first two components of a principal component analysis done on the collection of **G**-vectors **Bottaro and Lindorff-Larsen (2017)**. Circles show the resulting 9 clusters, whose radius is proportional to the square root of their size. The 5500 structures (40%) that were not assigned to any cluster are shown as gray dots. For each cluster we identify its centroid, here defined as the structure with the lowest average distance from all other cluster members.

Ideally, clusters should be compact enough so that the centroid can be considered as a representative structure. This information is shown in the box-plot in Fig. 3B, which reports the distances (eRMSD and RMSD, as labeled) between centroids and cluster members. At the same time, structures within clusters are not all identical to one another. In order to visualize the intra-cluster variability we have found it useful to introduce a “dynamic secondary structure” representation. In essence, we detect base-stacking/base-pair interactions in all structures within a cluster, and calculate the fraction of frames in which each interaction is present.
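The two post-processing steps described here, picking a centroid and computing interaction populations, can be sketched in a few lines of plain Python. This is a hedged illustration, not the Barnaba implementation: the function names, the toy distance matrix and the interaction labels are all hypothetical, and the annotation of each frame is assumed to be already available as a set of labels.

```python
def cluster_centroid(dist, members):
    """Member with the lowest average distance from all other cluster members.

    dist is a symmetric matrix (list of lists) of pairwise distances,
    members is a list of frame indices belonging to one cluster.
    """
    if len(members) == 1:
        return members[0]

    def mean_distance(i):
        others = [j for j in members if j != i]
        return sum(dist[i][j] for j in others) / len(others)

    return min(members, key=mean_distance)

def interaction_populations(annotations):
    """Fraction of frames in which each annotated interaction is present."""
    n = len(annotations)
    counts = {}
    for frame in annotations:
        for interaction in frame:
            counts[interaction] = counts.get(interaction, 0) + 1
    return {k: v / n for k, v in counts.items()}

# Toy data: three frames with made-up pairwise distances and annotations.
dist = [[0.0, 1.0, 4.0],
        [1.0, 0.0, 2.0],
        [4.0, 2.0, 0.0]]
centroid = cluster_centroid(dist, [0, 1, 2])
populations = interaction_populations([
    {"G1-C14 WC", "U6-G9 tSW"},
    {"G1-C14 WC"},
])
```

Populations below one, as obtained for the second label in this toy input, are what the dynamic secondary structure representation encodes by color.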

The population of each interaction is shown by coloring the extended secondary structure representation (Fig.3C). This representation has some analogy with the “dot plot” representation used to display secondary structure ensembles obtained using nearest neighbor models, that reports the predicted probability of individual base pairs ** Jacobson and Zuker (1993)**. We can see that the first three clusters correspond to three different tetraloop structures. In cluster 1, the U6-G9 tSW base pair is present, together with the U6-C8 stacking typical of the native UUCG tetraloop structure. In cluster 2, no U6-G9 base pair is present, while in cluster 3 we observe stacking between U6-U7-C8-G9, as also described in the next section. In all clusters the population of the terminal base pairs and stacking is lower than one, indicating the presence of base fraying.

In our experience, cluster analysis is useful to understand and visualize qualitatively the different types of structures in a simulation. In many practical cases, however, the number of clusters and their population may differ depending on the employed clustering algorithm and associated parameters. Clustering may not even be meaningful when considering highly unstructured systems such as long single-stranded nucleic acids lacking secondary structures **Chen et al. (2012)**.

### Motif search

Barnaba can be used to search for structural motifs in a PDB file or trajectory using the eRMSD distance. In the following example, we illustrate this feature by taking the centroids of the first three clusters described above and search for similar structures within the PDB database. In order to focus on the loop structure, rather than on stem variability, we consider the tetraloop and the two closing base pairs for the search (residues 4-11 in Fig.1A). The search is performed against all RNA-containing structures in the PDB database (retrieved May 4th, 2018, resolution 3.50Å or better). The database considered here consists of 3067 X-ray, 652 NMR and 177 cryo electron-microscopy (EM) structures. Note that the search is purely based on the geometrical arrangement of nucleobases, without restriction on the sequence, a particular feature that is also enabled by the use of eRMSD.

Figure 4 shows the cluster centroids (gray) and the closest motif match, i.e. the lowest eRMSD substructure in the PDB database (orange). The eRMSD between the cluster centroid and the best match is indicated, together with the associated PDB code. Centroid 1 corresponds to the canonical UUCG tetraloop structure, with the signature tSW interaction between U6-G9 and G9 in syn conformation. Note that the eRMSD between centroid and best match is small (0.25), indicating that simulated and experimental structures are highly similar. Centroid 2 corresponds to a structure in which the stem is formed, C8 is stacked on top of U6 and G9 is bulged out. Centroid 3 features four consecutive stacked bases, U6-U7-C8-G9. Note that this latter structure is remarkably similar to the 4-stack loop described in **Bottaro and Lindorff-Larsen (2017)**.

As a rule of thumb, we consider as significant matches structures below 0.7 eRMSD, but there are cases in which it is worth considering structures in the 0.7-1.0 eRMSD range as well. More generally, it is useful to consider the histogram of all fragments with eRMSD below 1, as shown in Fig. 4, bottom panels. This type of analysis makes it possible to identify a good threshold value, in correspondence to minima in the probability distributions. For example, there are no structures in the PDB with eRMSD lower than 0.7 for centroid 3. In this case, a value of 0.9 should be used instead.

In this example we performed a simple search of a structure from simulation against experimentally-derived structures downloaded from the PDB database. In Barnaba, any arbitrary motif can be used as a query by providing a coordinate file with at least the positions of the C2, C4 and C6 atoms for each nucleotide. Searches with more complex motifs composed of two strands (e.g. K-turns, sarcin-ricin motifs, etc.) are also possible. Additionally, Barnaba allows for inserted bases, thereby identifying structural motifs with one or more bulged-out bases.

### Elastic Network Models

Elastic Network Models (ENMs) are minimal computational models able to capture the dynamics of macromolecules at a small computational cost. They assume that the system can be represented as a set of beads connected by harmonic springs, each having rest length equal to the distance between the two beads it connects in a reference structure (usually, an experimental structure from the PDB). First introduced to analyze protein dynamics **Tirion (1996)**, ENMs are also applicable to structured RNA molecules **Bahar and Jernigan (1998)**; **Setny and Zacharias (2013)**; **Zimmermann and Jernigan (2014)**. Barnaba contains routines to construct ENMs of nucleic acids and proteins and, as a unique feature, makes it possible to calculate fluctuations between consecutive C2-C2 atoms. In a previous work **Pinamonti et al. (2015)**, we have shown this quantity to correlate with flexibility measurements performed with selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) experiments **Merino et al. (2005)**. Here, we show an example of ENM analysis on two RNA molecules: the 174-nucleotide sensing domain of the *Thermotoga maritima* lysine riboswitch (PDB ID: 3DIG), and the *Escherichia coli* 5S rRNA (PDB ID: 1C2X). We construct an all-atom ENM (AA-ENM), where each heavy atom is a bead, with a cutoff radius of 7 Å. In Fig. 5 we show the flexibility of the RNA molecules as predicted by the ENM (black), which can be qualitatively compared with the measured SHAPE reactivity **Hajdin et al. (2013)** (orange).

The implementation of the ENM in Barnaba employs the sparse matrix package available in SciPy, which allows for significant speed-ups compared to a dense-matrix implementation. Fig. 6 shows the execution time for constructing ENMs (both SBP and AA) of biomolecules with sizes ranging from a few tens to several hundred nucleotides. Calculations were performed running Barnaba on a personal computer. This speed, combined with the significant memory saving granted by the sparse matrix representation, makes it possible to easily compute the vibrational modes and the local flexibility of large RNA systems such as ribosomal structures using a limited amount of computer resources.

## Discussion

Many RNA molecules are highly dynamic entities that undergo conformational rearrangements during function. For this reason, it is becoming increasingly important to develop tools to analyze not only single structures, but also trajectories (ensembles) obtained from molecular simulations. In this paper we introduce a software package to facilitate the analysis of nucleic acid simulations. The program, called Barnaba, is available both as a Python library and as a command-line tool. The output of the program can be easily used to calculate averages and probability distributions, or as input to the many existing plotting and analysis libraries (e.g. Matplotlib, scikit-learn) available in Python.

Barnaba consists of a number of functions: some of them implement standard calculations (RMSD, torsion angles, base-pairs and base-stacking detection). A unique feature of Barnaba is the possibility to calculate the eRMSD. This metric has been successfully employed in several contexts: for analyzing MD simulations **Kuhrova et al. (2016)**, as a biased collective variable in enhanced sampling simulations **Bottaro et al. (2016)**; **Yang et al. (2017)**; **Poblete et al. (2018)**, to construct Markov State models **Pinamonti et al. (2017)** and to cluster RNA tetraloop structures **Bottaro and Lindorff-Larsen (2017)**. In this paper we show the usefulness of this metric to monitor simulations over time, to perform cluster analysis and to search for structural motifs within trajectories/structures. This last feature can be extremely useful to experimental structural biologists, as it makes it possible to efficiently search for arbitrary query motifs within the entire PDB database. For analyzing simulations and clusters, we have found it useful to introduce a dynamic secondary structure representation, which recapitulates the variability of base-pair and base-stacking interactions within an ensemble.

Another unique feature of Barnaba is the possibility to back-calculate ^{3}*J* scalar couplings from structures. This calculation is *per se* extremely simple. However, it can be difficult to obtain from the literature the different sets of Karplus parameters, and the calculation of the corresponding dihedral angles is error-prone.

Finally, Barnaba contains a routine to construct ENMs of nucleic acid and protein systems and complexes. This is a useful, fast and computationally cheap tool to predict the local dynamical properties of biomolecules, as well as the chain flexibility of RNA molecules.

## Methods and Materials

### Implementation and availability

Barnaba is a Python library and command-line tool. It requires Python 2.7 or > 3.3, and the Numpy and Scipy libraries. Additionally, Barnaba requires MDTraj (http://mdtraj.org/) for manipulating structures and trajectories. Source code is freely available at https://github.com/srnas/barnaba under the GNU GPLv3 license. The GitHub repository contains documentation as well as a set of examples.

### Relative position and orientation of nucleobases

For each nucleotide, a local coordinate system is set up in the center of C2, C4, and C6 atoms. The x-axis points toward the C2 atom, and the y-axis in the direction of C4 (C/U) or C6 (A/G). The position of the origin of nucleobase *j* in the reference system constructed on base *i* is the vector **R**_{ij} = {*x*_{ij}, *y*_{ij}, *z*_{ij}}. Note that |**R**_{ij}| = |**R**_{ji}| but **R**_{ij} ≠ **R**_{ji}. The vector **R**_{ij} is central in the definition of the eRMSD metric and of the annotation strategy described below.

### eRMSD

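The eRMSD is a contact-map based distance, with the addition of a number of features that make it suitable for the comparison of nucleic acid structures. We briefly describe here the procedure, originally introduced in **Bottaro et al. (2014)**. Given a three-dimensional structure *α*, one calculates $\mathbf{R}_{ij}^{\alpha}$ for all pairs of bases in a molecule. The position vectors are then rescaled as follows:

$$\tilde{\mathbf{r}}_{ij} = \left(\frac{x_{ij}}{a}, \frac{y_{ij}}{a}, \frac{z_{ij}}{b}\right),$$

with *a* = 5 Å and *b* = 3 Å. The rescaling effectively introduces an ellipsoidal anisotropy that is peculiar to base-base interactions. Given two structures, *α* and *β*, consisting of *N* residues, the eRMSD is calculated as

$$\text{eRMSD} = \sqrt{\frac{1}{N}\sum_{i,j}\left|\mathbf{G}(\tilde{\mathbf{r}}_{ij}^{\alpha}) - \mathbf{G}(\tilde{\mathbf{r}}_{ij}^{\beta})\right|^{2}}.$$

**G** is a non-linear function of $\tilde{\mathbf{r}}$ defined as:

$$\mathbf{G}(\tilde{\mathbf{r}}) = \begin{pmatrix} \sin(\gamma\tilde{r})\,\tilde{r}_{x}/\tilde{r} \\ \sin(\gamma\tilde{r})\,\tilde{r}_{y}/\tilde{r} \\ \sin(\gamma\tilde{r})\,\tilde{r}_{z}/\tilde{r} \\ 1 + \cos(\gamma\tilde{r}) \end{pmatrix} \times \frac{\Theta(\tilde{r}_{\text{cutoff}} - \tilde{r})}{\gamma},$$

where $\gamma = \pi/\tilde{r}_{\text{cutoff}}$, $\tilde{r} = |\tilde{\mathbf{r}}|$ and Θ is the Heaviside step function. Note that the function **G** has the following desirable properties:

- $|\mathbf{G}(\tilde{\mathbf{r}}^{\alpha}) - \mathbf{G}(\tilde{\mathbf{r}}^{\beta})| \approx |\tilde{\mathbf{r}}^{\alpha} - \tilde{\mathbf{r}}^{\beta}|$ if $\tilde{r}^{\alpha}, \tilde{r}^{\beta} \ll \tilde{r}_{\text{cutoff}}$.
- $|\mathbf{G}(\tilde{\mathbf{r}}^{\alpha}) - \mathbf{G}(\tilde{\mathbf{r}}^{\beta})| = 0$ if $\tilde{r}^{\alpha}, \tilde{r}^{\beta} \geq \tilde{r}_{\text{cutoff}}$.
- $\mathbf{G}(\tilde{\mathbf{r}})$ is a continuous function.

The cutoff value is set to $\tilde{r}_{\text{cutoff}}$ = 2.4.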
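A compact way to check one's understanding of the eRMSD is to code it for a handful of bases. The sketch below follows the definition of Bottaro et al. (2014), with the rescaling (*a* = 5 Å, *b* = 3 Å), the four-component **G** function and the cutoff of 2.4 given in this section. It is written in plain Python for clarity and is not the Barnaba implementation; the function names and the two-base toy input are hypothetical.

```python
import math

A_SCALE = 5.0   # a, in angstrom
B_SCALE = 3.0   # b, in angstrom
CUTOFF = 2.4    # dimensionless cutoff on the rescaled distance
GAMMA = math.pi / CUTOFF

def rescale(R):
    """Anisotropic rescaling of a position vector R_ij (angstrom)."""
    x, y, z = R
    return (x / A_SCALE, y / A_SCALE, z / B_SCALE)

def g_vector(r_tilde):
    """Four-dimensional G vector; identically zero at and beyond the cutoff."""
    r = math.sqrt(sum(c * c for c in r_tilde))
    if r >= CUTOFF:
        return (0.0, 0.0, 0.0, 0.0)
    if r == 0.0:
        return (0.0, 0.0, 0.0, 2.0 / GAMMA)  # limit of the expression for r -> 0
    s = math.sin(GAMMA * r) / (GAMMA * r)
    return (s * r_tilde[0], s * r_tilde[1], s * r_tilde[2],
            (1.0 + math.cos(GAMMA * r)) / GAMMA)

def ermsd(R_alpha, R_beta):
    """eRMSD between two structures given as N x N tables of R_ij vectors."""
    n = len(R_alpha)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a base has no position relative to itself
            ga = g_vector(rescale(R_alpha[i][j]))
            gb = g_vector(rescale(R_beta[i][j]))
            total += sum((p - q) ** 2 for p, q in zip(ga, gb))
    return math.sqrt(total / n)

# Toy input: two bases stacked at 3.4 angstrom versus pulled apart to 6 angstrom.
stacked = [[None, (0.0, 0.0, 3.4)], [(0.0, 0.0, -3.4), None]]
apart = [[None, (0.0, 0.0, 6.0)], [(0.0, 0.0, -6.0), None]]
d = ermsd(stacked, apart)
```

Because **G** vanishes smoothly at the cutoff, pairs of distant bases contribute nothing to the sum, which is what makes the metric a contact-map-like distance.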

### Annotation

A pair of bases *i* and *j* is considered for annotation only if and .

#### Stacking

The criteria for base-stacking are the following:

Here, *θ*_{ij} is the angle between the vectors normal to the planes of the two bases. Similarly to other annotation approaches **Gendron et al. (2001)**; **Sarver et al. (2008)**; **Waleń et al. (2014)**, we identify four different classes of stacking interactions according to the sign of the z coordinates:

- upward (>> or 3′-5′) if *z*_{ij} > 0 and *z*_{ji} < 0
- downward (<< or 5′-3′) if *z*_{ij} < 0 and *z*_{ji} > 0
- outward (<> or 5′-5′) if *z*_{ij} < 0 and *z*_{ji} < 0
- inward (>< or 3′-3′) if *z*_{ij} > 0 and *z*_{ji} > 0

We notice that, with this choice, consecutive base pairs with alternating purines and pyrimidines result in a cross-strand outward stacking (see, e.g., Figure 1A).
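The four stacking classes depend only on the signs of the two z coordinates, so the classification step can be sketched in a few lines. The function name is hypothetical, and the snippet assumes that the pair has already passed the geometric stacking criteria; it does not implement those criteria.

```python
def stacking_class(z_ij, z_ji):
    """Stacking class from the signs of z_ij and z_ji (pair assumed stacked)."""
    if z_ij > 0 and z_ji < 0:
        return ">>"   # upward, 3'-5'
    if z_ij < 0 and z_ji > 0:
        return "<<"   # downward, 5'-3'
    if z_ij < 0 and z_ji < 0:
        return "<>"   # outward, 5'-5'
    if z_ij > 0 and z_ji > 0:
        return "><"   # inward, 3'-3'
    return None       # a vanishing z coordinate fits none of the four classes
```

For example, `stacking_class(3.4, -3.4)` gives the upward class, while swapping the roles of the two bases gives the downward one.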

#### Base-pairing

Base-pairs are classified according to the Leontis-Westhof nomenclature **Leontis and Westhof (2001)**, based on the observation that hydrogen bonding between RNA bases involves three distinct edges: Watson-Crick (W), Hoogsteen (H), and sugar (S). An additional distinction is made according to the orientation with respect to the glycosidic bonds, in cis (c) or trans (t) orientation.

In Barnaba, all non-stacked bases are considered base-paired if |*θ*_{ij}| < 60° and there exists at least one hydrogen bond, calculated as the number of donor-acceptor pairs with distance < 3.3 Å. Edges are defined according to the value of the angle *ψ*:

- Watson-Crick edge (W): 0.16 < *ψ* ≤ 2.0 rad
- Hoogsteen edge (H): 2.0 < *ψ* ≤ 4.0 rad
- Sugar edge (S): *ψ* > 4.0 rad or *ψ* ≤ 0.16 rad
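The edge assignment is a lookup on the thresholds listed above. The sketch below is illustrative only: the function name is hypothetical, and it assumes that *ψ* is expressed in radians and wrapped into the interval from 0 to 2π before the comparison.

```python
import math

def interacting_edge(psi):
    """Edge (W, H or S) from the angle psi, using the thresholds in the text."""
    p = psi % (2.0 * math.pi)  # assumption: psi is wrapped into [0, 2*pi)
    if 0.16 < p <= 2.0:
        return "W"  # Watson-Crick edge
    if 2.0 < p <= 4.0:
        return "H"  # Hoogsteen edge
    return "S"      # sugar edge: psi > 4.0 or psi <= 0.16
```

Applying the function to both bases of a pair, together with the cis/trans orientation, yields labels of the Leontis-Westhof type.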

These threshold values are obtained by considering the empirical distribution of base-base interactions shown in Figure 2 in ** Bottaro et al. (2014)**. Cis/trans orientation is calculated according to the value of the dihedral angle defined by , where N1/N9 is used for pyrimidines and purines, respectively.

We note that the annotation provided by Barnaba might fail in detecting some interactions, and sometimes differs from other programs. This is due to the fact that for non-Watson-Crick and stacking interactions it is not trivial to define a set of criteria for a rigorous discrete classification ** Waleń et al. (2014)**. Typically, these criteria are calibrated to work well for high-resolution structures, but they are not always suitable to describe nearly-formed interactions often observed in molecular simulations.

### Torsion angles and ^{3}*J* scalar couplings

We use the standard definition of backbone angles, glycosidic *χ* angle (O4′-C1′-N9-C4 atoms for A/G, O4′-C1′-N1-C2 for C/U) and sugar torsion angles (*v*_{0}…*v*_{4}) as shown in Figures 8 and 9 **Saenger (2013)**. Pseudorotation sugar parameters amplitude *tm* and phase *P* are calculated as described in **Altona and Sundaralingam (1972)**.

^{3}*J* scalar couplings are calculated using the Karplus equations

$$^{3}J = A\cos^{2}(\theta + \phi) + B\cos(\theta + \phi) + C.$$

Karplus parameters relative to the different scalar couplings are reported in Table 1.

### Elastic Network Model

In ENMs, a set of *N* beads is connected by pairwise harmonic springs that penalize deviations of inter-bead distances from their reference values. Spring constants are set to a constant value *κ* whenever the reference distance between the two beads is smaller than an interaction cutoff (*R*_{c}), and set to zero otherwise. Under these assumptions, the potential energy of the system can be approximated as

$$U = \frac{1}{2}\sum_{i,j=1}^{N} \delta\mathbf{r}_{i}^{T}\,\mathbf{M}_{ij}\,\delta\mathbf{r}_{j},$$

where **M** is the symmetric 3*N* × 3*N* interaction matrix, $\mathbf{M}_{ij}$ is its 3 × 3 block coupling beads *i* and *j*, and *δ***r**_{i} is the deviation of bead *i* from its position in the reference structure.

The user can select different atoms to be used as beads in the construction of the model. The optimal value of the parameter *R*_{c} depends on this choice, as described in Ref. ** Pinamonti et al. (2015)**.

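The covariance matrix is computed as

$$C_{ij,\mu\nu} = \langle \delta r_{i,\mu}\,\delta r_{j,\nu}\rangle = k_{B}T\sum_{\alpha}\frac{1}{\lambda_{\alpha}}\,v^{\alpha}_{i,\mu}\,v^{\alpha}_{j,\nu},$$

where *λ*_{α} and **v**^{α} are the eigenvalues and the eigenvectors of the interaction matrix **M**, respectively, and $k_{B}T$ is the thermal energy. The sum on *α* runs over all non-null modes of the system.

Mean square fluctuation (MSF) of residue *i* is calculated as:

$$\text{MSF}_{i} = \sum_{\mu} C_{ii,\mu\mu}.$$

The variance of the distance between two beads can be directly obtained from the covariance matrix in the linear perturbation regime as

$$\sigma^{2}_{d_{ij}} = \sum_{\mu,\nu}\frac{\tilde{d}_{ij}^{\mu}\,\tilde{d}_{ij}^{\nu}}{\tilde{d}_{ij}^{2}}\left(C_{ii,\mu\nu} + C_{jj,\mu\nu} - C_{ij,\mu\nu} - C_{ji,\mu\nu}\right), \tag{12}$$

where $\tilde{d}_{ij}^{\mu}$ is the *μ* Cartesian component of the reference distance between bead *i* and *j*.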
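The chain from reference coordinates to fluctuations can be sketched with dense NumPy arrays for a handful of beads. This is a hedged illustration, not the Barnaba routine (which, as stated in the Results, relies on sparse matrices): all function names are hypothetical, the spring constant and thermal energy are set to one in arbitrary units, and the five bead positions are made up.

```python
import numpy as np

def interaction_matrix(coords, cutoff, kappa=1.0):
    """3N x 3N interaction matrix for springs between beads closer than cutoff."""
    n = len(coords)
    M = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[i] - coords[j]
            dist = np.linalg.norm(d)
            if dist < cutoff:
                block = -kappa * np.outer(d, d) / dist ** 2
                M[3*i:3*i+3, 3*j:3*j+3] = block
                M[3*j:3*j+3, 3*i:3*i+3] = block
                M[3*i:3*i+3, 3*i:3*i+3] -= block
                M[3*j:3*j+3, 3*j:3*j+3] -= block
    return M

def covariance(M, kT=1.0, tol=1e-8):
    """kT times the pseudo-inverse of M, summing over non-null modes only."""
    lam, v = np.linalg.eigh(M)
    keep = lam > tol * lam.max()
    return kT * (v[:, keep] / lam[keep]) @ v[:, keep].T

def msf(C):
    """Mean square fluctuation of each bead: trace of its 3 x 3 diagonal block."""
    return np.diag(C).reshape(-1, 3).sum(axis=1)

def distance_variance(C, coords, i, j):
    """Variance of the i-j distance in the linear perturbation regime."""
    d = coords[i] - coords[j]
    u = d / np.linalg.norm(d)
    bi, bj = slice(3*i, 3*i+3), slice(3*j, 3*j+3)
    S = C[bi, bi] + C[bj, bj] - C[bi, bj] - C[bj, bi]
    return float(u @ S @ u)

# Five made-up bead positions, all within the cutoff of each other.
coords = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
M = interaction_matrix(coords, cutoff=2.0)
C = covariance(M)
fluct = msf(C)
var_01 = distance_variance(C, coords, 0, 1)
```

The six null modes of this toy network correspond to rigid translations and rotations and are excluded from the sum, as described in the text.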

For most practical applications of ENMs only the high-amplitude modes, i.e. those with the smallest eigenvalues, provide interesting dynamical information. The calculation of C2-C2 distance fluctuations using Eq. 12 requires the knowledge of all eigenvectors. This can be performed by reducing the system to the “effective interaction matrix” relative to the beads of interest **Zen et al. (2008)**. The interaction matrix is written in block form,

$$\mathbf{M} = \begin{pmatrix}\mathbf{M}_{C2} & \mathbf{W}\\ \mathbf{W}^{T} & \mathbf{M}_{other}\end{pmatrix},$$

where $\mathbf{M}_{C2}$ ($\mathbf{M}_{other}$) is formed by the rows and columns of **M** relative to the (non) C2 beads, while **W** represents the interactions between C2 and non-C2 beads. The effective interaction matrix is defined as

$$\mathbf{M}_{\text{eff}} = \mathbf{M}_{C2} - \mathbf{W}\,\mathbf{M}_{other}^{-1}\,\mathbf{W}^{T}.$$

This can be computed efficiently using sparse matrix-vector multiplication algorithms. The resulting effective matrix has reduced size (1/3 for SBP-ENM, 1/20 for AA-ENM), making its pseudo-inversion considerably faster. Note that, in case one is interested in computing the C2-C2 fluctuations for a portion of the molecule only, the algorithm could be further optimized by directly computing the effective interaction matrix associated with the required C2-C2 pairs.
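The reduction to an effective matrix is a Schur complement, and its key property can be verified numerically on a small example. The sketch below uses dense NumPy arrays and a generic symmetric positive-definite matrix as a stand-in for an interaction matrix, so that ordinary inverses can be compared; a real ENM matrix has null modes and requires a pseudo-inverse. The function name and the choice of retained indices are hypothetical.

```python
import numpy as np

def effective_matrix(M, keep):
    """Schur complement of M on the retained degrees of freedom."""
    keep = np.asarray(keep)
    other = np.setdiff1d(np.arange(M.shape[0]), keep)
    M_keep = M[np.ix_(keep, keep)]     # block of the beads of interest
    W = M[np.ix_(keep, other)]         # coupling to all other beads
    M_other = M[np.ix_(other, other)]  # block of the remaining beads
    # Solve a linear system instead of forming the explicit inverse.
    return M_keep - W @ np.linalg.solve(M_other, W.T)

# Stand-in matrix: random, symmetric and positive definite (not a real ENM).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 6))
M = X @ X.T + 6.0 * np.eye(6)

keep = [0, 1, 2]
M_eff = effective_matrix(M, keep)
```

For an invertible matrix, the inverse of `M_eff` coincides with the corresponding block of the full inverse, which is why the covariance of the retained beads can be obtained from the smaller matrix alone.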

## Acknowledgments

We thank D. E. Shaw Research for providing the simulation of the UUCG tetraloop. The research is funded by a grant from The Velux Foundations (S.B. and K.L.-L.), a Hallas-Møller Stipend from the Novo Nordisk Foundation (K.L.-L.), and the Lundbeck Foundation BRAINSTRUC initiative (K.L.-L.). G.B., S.R., S.B. and G.P. have received funding from the European Research Council (ERC) under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC grant agreement no. 306662 (S-RNA-S). W.B. is funded by VILLUM FONDEN (VKR023445) and the Danish Council for Independent Research (DFF-4181-00344).