## Abstract

Large macromolecules, including proteins and their complexes, very often adopt multiple conformations. Some of these conformations can be observed experimentally, for example with X-ray crystallography or cryo-electron microscopy. This structural heterogeneity is not incidental and is frequently linked to specific biological functions. Thus, the accurate description of macromolecular conformational transitions is crucial for understanding the fundamental mechanisms of life’s machinery. We report a real-time method to predict such transitions by extrapolating instantaneous eigen-motions, computed using normal mode analysis, to a series of twists. We demonstrate the applicability of our approach to the prediction of a wide range of motions, including large collective opening-closing transitions and conformational changes induced by partner binding. We also highlight particularly difficult cases of very small transitions between crystal and solution structures. Our method guarantees the preservation of the protein structure during the transition and gives access to conformations that are unreachable with classical normal mode analysis. We provide practical solutions to describe localized motions with a few low-frequency modes and to relax some geometrical constraints along the predicted transitions. This work opens the way to the systematic description of protein motions, whatever their degree of collectivity. Our method is available as a part of the NOn-Linear rigid Block (NOLB) package at https://team.inria.fr/nano-d/software/nolb-normal-modes/.

**Significance Statement** Proteins perform their biological functions by changing their shapes and interacting with each other. Getting access to these motions is challenging. In this work, we present a method that generates *plausible* physics-based protein motions and conformations. We model a protein as a network of atoms connected by springs and deform it along the least-energy directions. Our main contribution is to perform the deformations in a nonlinear way, through a series of twists. This allows us to produce a wide range of motions, some of them previously inaccessible, and to preserve the structure of the protein during the motion. We are able to simulate the opening or closing of a protein and the changes it undergoes to adapt to a partner.

Large macromolecules, including proteins and their complexes, are intrinsically flexible, and this flexibility is often linked with their function. A molecule in solution can be viewed as a structurally heterogeneous ensemble, where a finite number of conformational states (*e.g*. active-inactive, bound-unbound) may become stable under certain conditions to perform specific tasks. Identifying the molecular states relevant to protein functioning is necessary for our understanding of biological processes. Moreover, targeting protein functional motions bears a great potential to control and modulate proteins’ activities and interactions in physio-pathological contexts.

Structural heterogeneity can be probed by various experimental techniques. These include X-ray crystallography, cryo-electron microscopy (cryo-EM), nuclear magnetic resonance (NMR), small-angle scattering and many others (1). The first two methods yield structures of large macromolecules at high resolution. While X-ray crystallography captures single stable states, cryo-EM allows observing conformational ensembles in solution. The resolution attained by cryo-EM is very often lower than that of X-ray structures, mainly due to the *structural heterogeneity* of the measured samples. However, the ongoing revolution in cryo-EM instrumentation (2) has supplied an exponentially growing body of near-atomic resolution structures. These techniques provide valuable insights into proteins’ functioning and interactions with their environment. Nevertheless, experimental protein structure determination remains a time-consuming and costly process. The *systematic* description of the variety of shapes a protein adopts under particular environmental conditions, upon post-translational modifications and/or partner binding, remains out of reach. Hence, there is a need for computational tools able to efficiently and accurately predict functionally relevant protein conformations and, more generally, macromolecular motions.

Several decades ago, Hayward and Go (3) observed that large-scale protein dynamics can be described with a set of just a few *collective coordinates*, accessible through normal mode analysis (NMA). NMA thus provides an efficient way to reduce the dimensionality of the initial system and makes it possible to study conformational transitions in proteins and their complexes. This has motivated the development of NMA-based tools for multiple biological applications, including flexible fitting of atomistic structures into cryo-EM maps (4–11) or one-dimensional scattering profiles (12), prediction of crystallographic temperature factors (13–15), generation of structural ensembles for cross-docking (16, 17), prediction of protein hinge regions (18, 19), flexible docking (20–23), refinement of crystallographic structures (24, 25) and docking solutions (26–28), and many others. The suitability of the NMA for modeling conformational dynamics varies widely depending on the system studied and on the type of motions involved (29). The NMA was shown to describe highly collective motions better than localized deformations (30).

Atomistic molecular dynamics (MD) simulations represent an alternative to the NMA. They provide a practical tool to describe the structural heterogeneity around an equilibrium state and the flexibility exhibited by small solvent-exposed regions, such as loops. For instance, MD-based sampling has been applied to model the conformational diversity embedded in localized regions of cryo-EM maps (31). In addition, the concept of collective coordinates has been extended to MD (32–34), and the resulting methods have been applied to the study of free energy changes between different conformational states and of rare-event dynamics (35). Nevertheless, MD simulations are much more costly than the NMA, and the systematic characterization of conformational transitions with MD still remains computationally prohibitive.

In this work, we present an efficient real-time method to predict biomolecular transitions involving a wide range of motions, from local deformations, *e.g*. of a small loop, to highly collective domain motions. It relies on the nonlinear rigid block (NOLB) NMA (36). NOLB extends the classical NMA to describe nonlinear motions. Specifically, it extrapolates motions computed from instantaneous linear and angular velocities to large amplitudes. The resulting molecular motion is represented as a series of rigid block twists. We apply this nonlinear extrapolation to a combination of a few low-frequency normal modes to approximate conformational transitions. Importantly, our approach is conceptually simple and explores the conformational space in the Cartesian coordinate system. The nonlinearity of the computed motions allows a better approximation of experimentally observed transitions.

So far, the computation of nonlinear transitions using the NMA formalism has only been possible by cutting them into small steps and recomputing the normal modes at each step, and/or by performing the NMA in the internal coordinate system (11, 14, 37, 38). On average, the internal-coordinate NMA (iNMA) requires a smaller number of modes than the classical Cartesian-coordinate NMA to describe large structural transitions (14), and better predicts transitions upon protein docking (38). Working with internal coordinates also allows for a large dimensionality reduction through variable selection and model simplification (14, 39–43). Despite these advantages, iNMA requires solving a generalized eigenvalue problem with necessarily dense interaction matrices. This makes it computationally costly and prevents its application on a large scale. Moreover, small changes in the internal coordinates may result in very large overall structural changes, which makes the approach less amenable to conformational space exploration, as it generates instability in the solution.

To demonstrate the advantages of the method reported here, we assess structural transitions computed with the classical linear normal modes, the Cartesian nonlinear normal modes, and an iterative scheme where the nonlinear modes are updated while progressing to the target state. For this purpose, we composed three benchmark sets of proteins exhibiting various types of structural transitions. The first set presents examples of large domain motions, where ‘open’ and ‘closed’ conformations can be clearly identified (11). The second one comprises proteins changing their conformation upon binding to other proteins (44). The third one contains test cases from the Cryo-EM 2015/2016 Model Challenge, where the transition takes place between a crystal form and a conformation in solution (45). We find that the classical linear NMA behaves well on the first set, where the motions are mostly collective, but is not suited to describe the more localized deformations and very small transitions exhibited by the two other sets. We show that our Cartesian nonlinear approach systematically obtains better transitions than the linear one. Indeed, the final predicted structures are closer to the experimentally known targets and display fewer distortions. The improvement is particularly significant for changes associated with partner binding. We further demonstrate the usefulness of nonlinearity and mode updating to extend the applicability of the NMA to localized and disruptive motions. Last, but not least, our approach is very efficient in terms of computation and memory. It is implemented as a fully automated tool available at: https://team.inria.fr/nano-d/software/nolb-normal-modes/.

Our results allow revisiting the NMA-based description of biomolecular transitions. They pave the way to the systematic targeting and modulation of protein-protein interactions.

## Computational method

Protein shapes and motions are governed by a multitude of interatomic forces, resulting from intra- and inter-molecular interactions. Despite this high complexity, many functional motions can be approximated by a few *low-frequency modes* characteristic of the protein’s geometrical shape (30, 46, 47). To compute these modes, we represent the protein as an elastic network (Fig. 1, top panel on the left), where each node stands for an atom and two nodes *i* and *j* are connected by a spring whenever the distance $d_{ij}$ between the corresponding atoms is smaller than a cutoff value, typically 5 Å (*SI Appendix, Text 2F*). The normal modes are obtained by diagonalizing the mass-weighted Hessian matrix of the potential energy of this network (*SI Appendix, Text 2A*). To reduce the dimensionality of this diagonalization problem, we consider each protein residue as a rigid block, according to the *rotation translation blocks* (RTB) approach (48, 49) (Fig. 1, middle panel on the left, see also *SI Appendix, Text 2B*). With this coarse-grained representation, the computed normal modes are composed of *instantaneous linear velocities* and *instantaneous angular velocities*, defining translations and rotations for each block/residue.
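The elastic-network construction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the NOLB implementation. It assumes a Tirion-style potential with a single uniform spring constant `k` and the 5 Å cutoff quoted in the text; the function names `enm_hessian` and `normal_modes` are hypothetical. The sketch builds the all-atom Hessian from 3×3 super-elements, mass-weights it, and diagonalizes it. The RTB reduction is deliberately left out.

```python
import numpy as np


def enm_hessian(coords, cutoff=5.0, k=1.0):
    """All-atom elastic-network Hessian (3N x 3N).

    Every pair of atoms closer than `cutoff` (Å) is joined by a spring of
    stiffness `k`; a uniform k is an assumption of this sketch.
    """
    n_atoms = len(coords)
    hess = np.zeros((3 * n_atoms, 3 * n_atoms))
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # no spring between distant atoms
            block = -k * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # diagonal super-elements balance the off-diagonal ones
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def normal_modes(coords, masses, cutoff=5.0):
    """Diagonalize the mass-weighted Hessian H_w = M^-1/2 H M^-1/2.

    Returns the eigenvalues (ascending), the eigenvectors L (columns) and
    the Cartesian linear normal modes M^-1/2 L.
    """
    inv_sqrt_m = np.repeat(1.0 / np.sqrt(masses), 3)  # one entry per coordinate
    h_w = enm_hessian(coords, cutoff) * np.outer(inv_sqrt_m, inv_sqrt_m)
    eigvals, eigvecs = np.linalg.eigh(h_w)
    cartesian_modes = inv_sqrt_m[:, None] * eigvecs
    return eigvals, eigvecs, cartesian_modes


# Toy 8-atom cluster (Å) with carbon-like masses.
rng = np.random.default_rng(0)
toy_coords = rng.uniform(0.0, 2.5, size=(8, 3))
toy_vals, toy_vecs, toy_modes = normal_modes(toy_coords, np.full(8, 12.0))
```

For a connected, rigid network, the six lowest eigenvalues are numerically zero and correspond to global translations and rotations. The low-frequency modes of interest therefore start at index 6. The dense `eigh` call is only practical for small systems, which is one motivation for the RTB reduction.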

A straightforward way to compute normal-mode-guided structural transitions is to calculate the instantaneous displacements of each atom in a residue and then linearly extrapolate these up to a given amplitude *a*. However, at large amplitudes, this distorts interatomic distances and produces unrealistic molecular conformations. To circumvent this problem, we apply a nonlinear extrapolation (Fig. 1, bottom panel on the left), where each residue undergoes a *screw* (or a *twist*) motion (*SI Appendix, Text 2C*). Specifically, the linear velocity $\mathbf{v}$ is decomposed into two terms, namely $\mathbf{v}_{\parallel}$, which is collinear with the angular velocity $\boldsymbol{\omega}$, and $\mathbf{v}_{\perp}$, which is orthogonal to $\boldsymbol{\omega}$. We further represent the pair of $\mathbf{v}_{\perp}$ and $\boldsymbol{\omega}$ as a pure rotation around a new center $\mathbf{r}_0$. Hence, instead of rotating about the axis defined by $\boldsymbol{\omega}$ passing through its center of mass, each residue is rotated about the new axis defined by $\boldsymbol{\omega}$ passing through $\mathbf{r}_0$, and translated only along the direction of $\boldsymbol{\omega}$ (*SI Appendix, Eq. 10*). This nonlinear extrapolation guarantees the preservation of the topology of the protein structure subject to the motion.
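The distortion caused by linear extrapolation can be seen on a single rigid block with angular velocity ω. The sketch below uses NumPy, and the helper names `rotate` and `linear_step` are hypothetical. It compares the linear update x + a(ω × r) with the exact rotation by the angle a‖ω‖. Consider two atoms 1.5 Å apart, lying perpendicular to a unit ω. The linear update stretches their separation to 1.5·√(1 + a²), which is about 2.12 Å at a = 1, whereas the rotation keeps it at 1.5 Å.

```python
import numpy as np


def rotate(points, axis, angle, center):
    """Exact rigid rotation of `points` about `axis` through `center` (Rodrigues)."""
    n = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -n[2], n[1]], [n[2], 0.0, -n[0]], [-n[1], n[0], 0.0]])
    rot = np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)
    return (points - center) @ rot.T + center


def linear_step(points, omega, center, a):
    """First-order extrapolation of the same rotation: x + a * (omega x (x - center))."""
    return points + a * np.cross(omega, points - center)


# Toy two-atom block, 1.5 Å apart, rotating about z through its center.
block = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
omega = np.array([0.0, 0.0, 1.0])
com = block.mean(axis=0)
for a in (0.1, 0.5, 1.0):
    lin = linear_step(block, omega, com, a)
    rig = rotate(block, omega, a * np.linalg.norm(omega), com)
    print(f"a={a}: linear {np.linalg.norm(lin[1] - lin[0]):.3f} Å, "
          f"rigid {np.linalg.norm(rig[1] - rig[0]):.3f} Å")
```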

Our method computes normal-mode-guided nonlinear conformational transitions, starting from an experimentally determined structure or a high-quality 3D model. Specifically, normal modes are computed from the starting structure, which is then deformed along a selection of these modes up to a given amount of conformational deviation (Fig. 1, right panel, see also *SI Appendix, Text 2E*). The simulated conformational change can potentially be very large (several tens of Å). The algorithm may be run in an iterative mode, where the normal modes are re-computed on intermediate conformations. This allows modifying the topology of the network representing the structure and moving further away from the starting structure (Fig. 1, right panel, compare orange and red conformations). The method is guaranteed to produce *plausible* physics-based motions and conformations.

## Results

### NOLB nonlinear transitions improve the coverage of a wide range of functional motions

We assessed the nonlinear transitions computed by NOLB on 132 pairs of experimentally determined structures displaying a wide range of biologically relevant conformational changes. The root mean square deviation (RMSD) between the two structures of a pair ranges from 0.5 Å to 33 Å, and the motions involve up to 80% of the protein atoms. For each pair, we defined a starting structure and a target structure. For a subset of 23 pairs (open-closed set, see below), each structure alternately played the role of the starting structure and of the target structure, resulting in a total of 155 predicted transitions. The transitions were systematically computed using the ten lowest-frequency modes of the starting structure. In the general case, the target is not known, and one has to sample the amplitudes of the modes. Here, we place ourselves in a context where the amplitudes are determined from the known displacement between the starting and target structures (*SI Appendix, Eq. 13*). This yields the optimal (or close-to-optimal) transitions within our framework. The method is also very fast: computing all the transitions reported here took less than 5 minutes with one iteration, and about 15 minutes with five iterations, on a single CPU (*SI Appendix, Text 4*). The quality of a prediction was evaluated by its *transition coverage*, *i.e*. the relative RMSD explained by the prediction (*SI Appendix, Text 3A*). For instance, given a transition of 5 Å, a prediction achieving a coverage of 70% produces a final conformation 1.5 Å away from the target structure.
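The evaluation protocol can be illustrated with a short NumPy sketch that rests on two assumptions. First, the amplitudes are taken as the least-squares fit of the displacement vector onto the (generally non-orthonormal) modes. Second, coverage is defined as 1 − RMSD_final/RMSD_initial, which reproduces the 5 Å / 70% / 1.5 Å example given in the text. The exact definitions are in *SI Appendix, Text 3A* and *Eq. 13*, and this sketch does not claim to reproduce them. It also uses a linear combination of modes rather than the nonlinear NOLB extrapolation. All function names are hypothetical.

```python
import numpy as np


def optimal_amplitudes(modes, displacement):
    """Least-squares amplitudes a minimizing ||d - L a|| for modes L (3N x M).

    lstsq handles modes that are not orthonormal (e.g. Cartesian modes M^-1/2 L).
    """
    amplitudes, *_ = np.linalg.lstsq(modes, displacement, rcond=None)
    return amplitudes


def rmsd(displacement):
    """Per-atom RMSD of a flattened 3N displacement vector."""
    n_atoms = displacement.size // 3
    return np.sqrt(displacement @ displacement / n_atoms)


def coverage(initial_rmsd, final_rmsd):
    """Fraction of the initial RMSD explained by a prediction (assumed definition)."""
    return 1.0 - final_rmsd / initial_rmsd


# Toy example: 10 atoms, 3 random (non-orthonormal) modes, random target displacement.
rng = np.random.default_rng(1)
toy_modes = rng.normal(size=(30, 3))
toy_d = rng.normal(size=30)
toy_a = optimal_amplitudes(toy_modes, toy_d)
print(coverage(rmsd(toy_d), rmsd(toy_d - toy_modes @ toy_a)))
```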

On average, the NOLB nonlinear normal modes, computed with five iterations, covered 48% of the transitions, compared with 40% for the classical linear modes. Moreover, the nonlinear predictions better approximated the transitions in 92% of the cases (Fig. 2A). The superiority of the NOLB predictions was also found significant without any update of the modes along the transition (*SI Appendix, Figure S1*). The blood coagulation factor VIIa (Fig. 1, right panel, and *Movie S1*) gives an illustrative example of a large transition (6.2 Å) upon binding to a cellular partner. The transition involves a complex motion of an “arm” comprising about 20% of the protein. The classical linear modes covered one third of the transition, producing a conformation 4.1 Å away from the target. The nonlinear NOLB normal modes achieved 44% coverage (Fig. 1, conformations in orange) and 79% after updating the modes three times (conformations in red). The final conformation is only 1.3 Å away from the target.

### NOLB extends the applicability of the normal mode analysis to localized motions

We collected the pairs of experimental structures from three benchmark sets (*SI Appendix, Text 1*) designed for different practical applications, namely NMA, docking and cryo-EM fitting. The first set comprises 23 proteins undergoing opening/closing motions. The vast majority of these transitions involve more than 40% of the protein atoms (Fig. 2B, dark grey bars). They can be explained by a few low-frequency normal modes (typically 1–3) computed from the open form (Fig. 3A, see bars in blue tones on the left). The second set contains 95 structural transitions associated with the binding of a protein partner. Such transitions are particularly challenging for protein docking applications (20, 21, 23, 50–52). Indeed, they are often induced by the spatial proximity of the partner (induced-fit mechanism), which makes them very difficult to estimate from the unbound state alone. This set includes a great variety of motions, from highly localized to highly collective ones (Fig. 2B, medium grey bars). The transition coverage achieved by the classical linear normal modes is rather low (below 40%) for the majority of transitions (Fig. 3B, see colored bars). The few transitions explained by the first three modes (see right part of the plot) involve more than 70% of the protein atoms, and all of them concern antibodies. The third set comprises 14 transitions, either between a crystal structure and a solution structure solved by cryo-EM or between two cryo-EM solution structures (*SI Appendix, Table S1*). Unlike the other two sets, it is dominated by very small transitions (<2 Å, see *SI Appendix, Fig. S2*). The explanatory power of the first 10 modes is very poor on this set (Fig. 3C).

Overall, the ability of the classical NMA to predict transitions is largely determined by the transitions’ degree of collectivity (Fig. 2A, see the color gradient along the x-axis). Highly collective motions tend to be very well predicted, while localized motions tend to be poorly predicted, in agreement with previous works (14, 30, 53). We found that our nonlinear scheme allows us to *go beyond this observation and extend the applicability of the NMA*. Indeed, the largest improvement of NOLB predictions over the classical NMA is observed for localized transitions involving less than 20% of the protein atoms (Fig. 2A, grey dots). For these transitions, the coverage is on average more than twice as large, reaching a maximum of 60% (versus 40% for the linear normal modes). Illustrative examples include the Ephrin B4 receptor (2hle:r), a cysteine protease (1pxv:r), actin (1atn:r and 2btf:r) and the Rabex-5 VPS9 domain (2ot3:l), which undergo localized motions upon binding to their partners (Fig. 3B, see the location of the orange and red segments). While the linear modes predict between 23% and 36% of their transitions, our nonlinear scheme predicts between 43% and 60% of them. The linear and nonlinear transitions predicted for actin are illustrated in Fig. 4A (in blue and orange, see also *Movies S2* and *S3*).

### Updating of the modes allows relaxing the elastic network’s constraints

The transitions predicted by the classical NMA strongly depend on the geometrical shape of the starting structure. This is particularly visible on the first test set, where the closed-to-open transitions are significantly worse than the open-to-closed ones (compare the two plots in Fig. 3A). Moreover, the number of transitions explained at more than 40% by the first three modes drops from 18 to 8 when starting from the closed structure. This effect was observed previously (30) and has a clear physical explanation connected to the limitations of the elastic network model. Indeed, in the closed state, this model contains a larger number of elastic links than in the open state. Therefore, it is more difficult to produce a large deformation along a few directions from this more constrained starting point.

By re-computing the modes along the transition, our iterative scheme overcomes this limitation. Namely, it increases the average coverage in the closed-to-open direction from 53% to 61% (Fig. 3A, see the location of the red segments on the right). This result can be explained by the fact that, at each iteration, some elastic links are removed, alleviating some of the constraints exerted on the closed structure. As a consequence, the discrepancy between open-to-closed and closed-to-open predictions is largely reduced (compare the left and right plots). In four cases, namely the aspartate aminotransferase (9aat–1ama), the maltodextrin binding protein (1omp–1anf), the alcohol dehydrogenase (8adh–2jhf) and the guanylate kinase (1ex6–1ex7), the coverage achieved in the two directions even becomes equivalent. The largest increase in coverage is obtained for the diaminopimelate dehydrogenase (1dap–3dap), from 31% without any update to 54% after one update (Fig. 4B, compare orange and red conformations, and see *Movie S4*).

### Nonlinear transitions better preserve the protein structure

Beyond improving the coverage of the transitions, the NOLB method produces motions that better preserve the overall protein structure and local topology. The predicted transitions look visibly more realistic than those produced with the linear extrapolation, particularly for large displacements. For instance, the calcium ATPase pump (1su4–1t5s) undergoes a large domain motion during active transport. The RMSD between its open and closed conformations is 13.5 Å. The nonlinear transition computed by NOLB, without any update of the modes, reaches a coverage of 58% while preserving the structure of the protein very well (Fig. 4C, on the left, and see *Movie S5*). Conversely, the linear transition attains only 49% coverage, and it visibly distorts the cytoplasmic headpiece, where the closing motion takes place (Fig. 4C, on the right, and see *Movie S6*).

### Very small transitions remain difficult to predict

The third benchmark set, representing transitions between crystallographic structures and structures found in solution, was particularly challenging for both the classical NMA and the NOLB method (Fig. 3C). On average, the first ten modes account for only 12% of the structural transitions, and the improvement brought by the NOLB predictions is very limited. Using 40 active modes significantly improves the coverage, up to 26% of the transition. Nevertheless, this percentage remains low compared with the previous test cases, and too low for most practical applications. Most of the conformational changes in this set are of very small amplitude, some even below 1 Å (*SI Appendix, Fig. S2*), which may explain the low coverage we obtain. The average RMSD between the final conformation computed by NOLB and the target structure is 1.9 Å. The only very large transition of the set, that of the GroEL chaperone (1svt:A–3cau:A), with an initial RMSD of 12.1 Å, is covered at 58% by the NOLB iterative scheme.

## Conclusions

This work revisits the formalism of normal modes and demonstrates its applicability to the previously inaccessible cases of localized motions. Specifically, it critically assesses the relevance of the normal mode analysis to the computation of various structural transitions in biological macromolecules. Our results challenge the long-standing belief that the lowest-frequency modes can only describe collective transitions. Indeed, we show that nonlinear normal modes can also approximate local deformations such as loop motions. Moreover, iterative recomputation of the normal modes relaxes constraints imposed by the geometry of the protein and allows pushing the transitions even further. Another important advantage of our method is that the predicted conformations have a much better local geometry than those resulting from linear NMA perturbations.

Small structural changes, for example those present in the Cryo-EM 2015/2016 Model Challenge benchmark, remain very difficult to predict with the NMA formalism. Indeed, in this case, adding nonlinearity and iterative computations did not improve the results significantly. Activating a much larger number of modes can help approximate the transitions, but at a significant computational cost: the full diagonalization of the Hessian matrix scales as $O(N^3)$ with the number of degrees of freedom $N$. Therefore, for such cases it becomes preferable to use MD-based or other stochastic optimization techniques, *e.g*. simulated annealing, with the full range of degrees of freedom.

Our method is very CPU and memory efficient: it took us about 9 minutes to compute the nonlinear structural transitions for all 460 proteins of the PPDBv5 set on a desktop computer. This implies that the method can be applied on a very large scale. For instance, it can be used to model flexibility in docking calculations or to generate putative conformations that can be targeted by small molecules.

## Supporting Information (SI)

SI contains Figs. S1 and S2, Table S1, captions for Movies S1 to S6, and references for SI citations.

## SI Movies

Supporting movies S1-S6 can be found online.

## Materials and Methods

*SI Appendix, Text* includes detailed descriptions of the datasets (*SI Appendix, Text 1*), the computational framework (*SI Appendix, Text 2*), the assessment of the transitions (*SI Appendix, Text 3*) and the computational details, including command lines used to generate the transitions (*SI Appendix, Text 4*). The method is freely available as a part of the NOn-Linear rigid Block (NOLB) package at https://team.inria.fr/nano-d/software/nolb-normal-modes/. Scripts used to produce the reported data are also available at this address.

## Supporting Information Text

### 1. Datasets

#### Test set 1

For the first test set, we used protein structures from the iMod benchmark (1) prepared by Chacón and colleagues (available at http://chaconlab.org/multiscale-simulations/imod/imod-donwload/item/imod-benchmark). It was recently used to assess three coarse-grained elastic-network-model-based flexible fitting methods (2). It comprises 23 proteins, each given in “open” and “closed” conformations, and represents a wide variety of macromolecular motions. While hinge motions are largely represented, the dataset also comprises shear and other complex motions. The structures were extracted from the molecular motions database MolMovDB (3). All of them have fewer than 3% Ramachandran outliers (as computed by the MolProbity program (4)) and contain no broken chains or missing atoms. The average displacement for this test set is 5.1 ± 3.0 Å.

#### Test set 2

For the second test set, we selected examples from the Protein-Protein Docking Benchmark v5 (PPDBv5) (5). This benchmark contains 230 protein complexes with at least one of the partners solved in both bound (complexed) and unbound (free) states. All structures have a resolution better than 3.25 Å, and some of them contain more than one chain. We extracted 95 proteins with Cα RMSD displacements between the two states above 2 Å. This test set is well suited for assessing the range of applicability of flexible docking methods (6). Some of the structure pairs can also be classified as open-closed pairs. The average displacement for this test set is 4.0 ± 3.9 Å.

#### Test set 3

For the third test set, we selected seven cases from the Cryo-EM 2015/2016 Model Challenge (7). The initial set consisted of eight cases, but we decided not to consider one of them, namely the 70S ribosome. The selected cases are listed in Table S1. Each of them comprises one or several starting structures solved by X-ray crystallography and one or several target structures corresponding to a Model Challenge map. In one case (*γ*-secretase), we did not find homologous X-ray structures for the starting state and used several cryo-EM structures instead. The map resolutions range from 2.2 to 4.3 Å. The average Cα RMSD displacement between the two states is 2.6 ± 3.2 Å.

### 2. Computational model and framework

#### A. NMA theory

Let us consider a molecular system with $N$ atoms at an equilibrium position $\mathbf{q}_0 \in \mathbb{R}^{3N}$. Let $V: \mathbb{R}^{3N} \mapsto \mathbb{R}$ be the potential energy of the molecular system. Let us also introduce $\mathbf{q} \in \mathbb{R}^{3N}$, a small time-dependent displacement of the system around $\mathbf{q}_0$. The potential energy $V$ in the vicinity of $\mathbf{q}_0$ can thus be given by its quadratic approximation, $V(\mathbf{q}_0 + \mathbf{q}) \approx V(\mathbf{q}_0) + \tfrac{1}{2}\mathbf{q}^{T}\mathbf{H}\mathbf{q}$, which allows us to solve Newton’s equation of motion *analytically*,

$$\ddot{\mathbf{q}} = -\mathbf{M}^{-1}\mathbf{H}\,\mathbf{q}, \tag{1}$$

where $\mathbf{M}$ is the diagonal mass matrix, and $\mathbf{H}$ is the *Hessian matrix* of the potential energy $V$ evaluated at the equilibrium position $\mathbf{q}_0$. We then compute the square matrix of eigenvectors $\mathbf{L}$ and the diagonal matrix of eigenvalues $\boldsymbol{\Lambda}$ of the *mass-weighted* Hessian $\mathbf{H}_w = \mathbf{M}^{-1/2}\mathbf{H}\mathbf{M}^{-1/2}$,

$$\mathbf{H}_w = \mathbf{L}\boldsymbol{\Lambda}\mathbf{L}^{T}. \tag{2}$$

Let us now introduce $\boldsymbol{\eta} = \mathbf{L}^{T}\mathbf{M}^{1/2}\mathbf{q} \in \mathbb{R}^{3N}$, a projection of $\mathbf{q}$ into the eigenspace of $\mathbf{H}_w$, and $(\lambda_i)_{i=1\ldots 3N}$, the diagonal values of $\boldsymbol{\Lambda}$. Then, left multiplying Eq. 1 by $\mathbf{L}^{T}\mathbf{M}^{1/2}$ gives the following system of uncoupled equations, which can be solved analytically:

$$\ddot{\eta}_i = -\lambda_i\,\eta_i, \qquad i = 1,\ldots,3N. \tag{3}$$

We will refer to the columns of the $\mathbf{M}^{-1/2}\mathbf{L}$ matrix as *Cartesian linear normal modes*. We should specifically mention that these normal modes are generally not orthogonal, unless all the masses in $\mathbf{M}$ are equal to each other.

#### B. The RTB projection method

Many methods have been proposed to reduce the dimensionality of the NMA diagonalization problem. For example, Noguti and Gõ (8) and Levitt et al. (9), and later Ma et al. (10), Mendez and Bastolla (11), and Chacón et al. (1), explored the NMA approach in internal coordinates. However, a different idea, reducing the dimensionality of the original system by coarse-graining its representation, has gained much more popularity. One of the first coarse-graining methods was the *rotation translation blocks* (RTB) approach introduced by Durand et al. (12) and further developed by Tama et al. (13) and Li and Cui (14). In this method, individual amino-acid residues or groups of consecutive residues are considered as rigid blocks that can only exhibit rotational and translational motions (12, 13). The transformation from the RTB coordinate system, consisting of $n$ rigid blocks with $6n$ degrees of freedom (DOFs), to the all-atom coordinate system with $3N$ DOFs is performed by an *orthogonal projection matrix* $\mathbf{P} \in \mathbb{R}^{3N\times 6n}$, whose detailed form can be found elsewhere (15). We only mention that this projection matrix is obtained by writing down the conservation laws of the linear and angular momenta for a rigid block in mass-weighted coordinates (12).

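The normal modes are then computed by the diagonalization of the RTB-projected mass-weighted Hessian,

$$\mathbf{H}_w^{RTB} = \mathbf{P}^{T}\mathbf{H}_w\mathbf{P} = \mathbf{L}^{RTB}\boldsymbol{\Lambda}^{RTB}\left(\mathbf{L}^{RTB}\right)^{T}, \tag{4}$$

where $\mathbf{L}^{RTB}$ is the matrix composed of the RTB normal modes, with the corresponding diagonal eigenvalue matrix $\boldsymbol{\Lambda}^{RTB}$. The all-atom normal modes $\mathbf{L}^{w}$ (in mass-weighted coordinates) are then obtained as a projection of the RTB normal modes according to

$$\mathbf{L}^{w} = \mathbf{P}\mathbf{L}^{RTB}. \tag{5}$$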
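One possible construction of the RTB projection matrix is sketched below in NumPy. It is an illustration under assumptions and not necessarily the construction of ref. (15). Each block contributes three mass-weighted translation columns and three rotation columns, and the rotation columns are orthonormalized with the inverse square root of the block’s inertia tensor about its COM. Blocks are assumed to contain at least three non-collinear atoms so that this tensor is invertible. The function names `rtb_projection` and `rtb_modes` are hypothetical.

```python
import numpy as np


def rtb_projection(coords, masses, blocks):
    """Build P (3N x 6n): mass-weighted rigid translations and rotations per block.

    `blocks` is a list of atom-index sequences; P has orthonormal columns.
    """
    P = np.zeros((3 * len(coords), 6 * len(blocks)))
    for b, idx in enumerate(blocks):
        idx = np.asarray(idx)
        m = masses[idx]
        sqrt_m = np.sqrt(m)
        com = m @ coords[idx] / m.sum()
        r = coords[idx] - com
        rows = (3 * idx[:, None] + np.arange(3)).ravel()  # x, y, z rows of the block
        # Translations: sqrt(m_i) * e_alpha / sqrt(M_b), unit norm by construction.
        for alpha in range(3):
            t = np.zeros((len(idx), 3))
            t[:, alpha] = sqrt_m / np.sqrt(m.sum())
            P[rows, 6 * b + alpha] = t.ravel()
        # Rotations: sqrt(m_i) * (e_beta x r_i); their Gram matrix is the inertia tensor.
        raw = np.stack([(sqrt_m[:, None] * np.cross(np.eye(3)[beta], r)).ravel()
                        for beta in range(3)], axis=1)
        inertia = raw.T @ raw
        s, u = np.linalg.eigh(inertia)
        P[rows, 6 * b + 3:6 * b + 6] = raw @ (u @ np.diag(s ** -0.5) @ u.T)
    return P


def rtb_modes(h_w, P):
    """Diagonalize P^T H_w P and lift the RTB modes back to all atoms: L^w = P L^RTB."""
    eigvals, l_rtb = np.linalg.eigh(P.T @ h_w @ P)
    return eigvals, l_rtb, P @ l_rtb
```

Because $\mathbf{P}$ has orthonormal columns, the lifted modes $\mathbf{P}\mathbf{L}^{RTB}$ are orthonormal as well, while the eigenproblem shrinks from $3N$ to $6n$ dimensions.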

#### C. The nonlinear NOLB NMA method

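In the harmonic approximation, the vibrations of a multidimensional oscillator are uncoupled and can be found by solving Eq. 3. Diagonalization of the RTB-projected mass-weighted Hessian gives a set of eigenvectors composed of the *instantaneous linear velocities* and *instantaneous angular velocities* of individual rigid blocks. For a rigid block with mass $M_{b}$ and inertia tensor $\mathbf{I}$, we first compute these in non-mass-weighted coordinates as follows,

$$\mathbf{v} = \frac{\tilde{\mathbf{v}}}{\sqrt{M_{b}}}, \qquad \boldsymbol{\omega} = \mathbf{I}^{-1/2}\,\tilde{\boldsymbol{\omega}},$$

where $\tilde{\mathbf{v}}$ and $\tilde{\boldsymbol{\omega}}$ are the translational and rotational components of the block in the RTB eigenvector. Then, given a deformation amplitude $a$, the translational increment $\Delta\mathbf{x}$ in the rigid block's position and the angular increment $\Delta\phi$ in its orientation can be computed as

$$\Delta\mathbf{x} = a\,\mathbf{v}, \qquad \Delta\phi = a\,\lVert\boldsymbol{\omega}\rVert,$$

where the rigid block's rotation is described by a unit axis $\mathbf{n} = \boldsymbol{\omega}/\lVert\boldsymbol{\omega}\rVert$ passing through its center of mass (COM) $\mathbf{x}_{\mathrm{COM}}$, and an angle $\Delta\phi$. Finally, we rewrite the increment in the rigid block's position as a sum of two orthogonal vectors,

$$\Delta\mathbf{x} = \Delta\mathbf{x}_{\perp} + \Delta\mathbf{x}_{\parallel},$$

where $\Delta\mathbf{x}_{\perp}$ is orthogonal to $\mathbf{n}$ and $\Delta\mathbf{x}_{\parallel} = (\Delta\mathbf{x}\cdot\mathbf{n})\,\mathbf{n}$ is collinear to $\mathbf{n}$. We then represent the $\Delta\mathbf{x}_{\perp}$-related motion as a pure rotation about a new center $\mathbf{c}$ given as

$$\mathbf{c} = \mathbf{x}_{\mathrm{COM}} + \frac{\mathbf{n}\times\Delta\mathbf{x}_{\perp}}{\Delta\phi},$$

such that the final positions $\mathbf{x}_i$ of the block's atoms are expressed through their initial positions $\mathbf{x}_i^{0}$ as

$$\mathbf{x}_i = \mathbf{R}(\mathbf{n},\Delta\phi)\left(\mathbf{x}_i^{0} - \mathbf{c}\right) + \mathbf{c} + \Delta\mathbf{x}_{\parallel},$$

where $\mathbf{R}(\mathbf{n},\Delta\phi)$ is the rotation matrix describing the block's rotation about the axis $\mathbf{n}$ by the angle $\Delta\phi$. More details can be found in the original NOLB publication (15). It is easy to demonstrate that this is the only type of rigid-body motion that conserves the original kinetic energy. Indeed, using the parallel axis theorem, it is readily seen that the initial energy contribution of the linear velocity orthogonal to $\mathbf{n}$ is transformed into an equivalent contribution from the angular velocity about the shifted axis. As noted by Juan Cortés (LAAS-CNRS) in a private communication, the presented theory can also be formulated in terms of *screw algebra*, where a screw is a six-dimensional vector constructed from a pair of three-dimensional vectors, the linear and angular velocities.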
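The twist of a single rigid block can be sketched in NumPy as follows. This is a minimal illustration, not the NOLB source code: the function names are hypothetical, and `unweight_velocities` simply applies the conversion above for a given block mass and inertia tensor. Two properties make a quick sanity check: the result is an exact rigid motion for any amplitude, and for a small $a$ it reduces to the linear displacement $a\left(\mathbf{v} + \boldsymbol{\omega}\times(\mathbf{x} - \mathbf{x}_{\mathrm{COM}})\right)$.

```python
import numpy as np


def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` (rad) about the unit vector `axis`."""
    nx, ny, nz = axis
    K = np.array([[0.0, -nz, ny],
                  [nz, 0.0, -nx],
                  [-ny, nx, 0.0]])  # K @ u == np.cross(axis, u)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def unweight_velocities(v_w, omega_w, mass, inertia):
    """Mass-weighted block velocities -> Cartesian v = v_w / sqrt(M), omega = I^(-1/2) omega_w."""
    evals, evecs = np.linalg.eigh(inertia)
    inertia_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return v_w / np.sqrt(mass), inertia_inv_sqrt @ omega_w


def twist_block(coords, com, v, omega, a, eps=1e-12):
    """Move a rigid block's atoms (k, 3) along the screw motion of amplitude a."""
    dx = a * np.asarray(v, dtype=float)           # translational increment
    w_norm = np.linalg.norm(omega)
    dphi = a * w_norm                             # angular increment
    if w_norm < eps or abs(dphi) < eps:
        return coords + dx                        # no rotation: pure translation
    n = np.asarray(omega, dtype=float) / w_norm   # unit rotation axis
    dx_par = (dx @ n) * n                         # component collinear with the axis
    dx_perp = dx - dx_par                         # component orthogonal to the axis
    c = com + np.cross(n, dx_perp) / dphi         # shifted rotation center
    R = rotation_matrix(n, dphi)
    return (coords - c) @ R.T + c + dx_par
```

The shifted center $\mathbf{c}$ lies on the instantaneous screw axis of the block, which is why the orthogonal translation is absorbed into the rotation instead of being applied as a straight-line shift.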

#### D. Linear structural transitions

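Let us assume that we have two conformations of the same molecular system with a known correspondence between their atoms. If the two systems are composed of proteins that are not fully identical, this correspondence can be robustly deduced using, *e.g*., sequence alignment. Let us also assume that we are given the displacement vector $\Delta\mathbf{x} \in \mathbb{R}^{3N}$ between the two conformations after their optimal rigid superposition. It is easy to demonstrate that in this case the COMs of the two conformations match. We can now find the minimum root-mean-square deviation (RMSD) between the two conformations if one of them is allowed to deform along its $M$ lowest normal modes $\mathbf{L} \in \mathbb{R}^{3N\times M}$, which are not necessarily orthonormal, as

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\left\lVert \Delta\mathbf{x} - \mathbf{L}\mathbf{a}\right\rVert^{2}} = \sqrt{\frac{1}{N}\,\Delta\mathbf{x}^{T}\left(\mathbf{I} - \mathbf{L}\left(\mathbf{L}^{T}\mathbf{L}\right)^{-1}\mathbf{L}^{T}\right)\Delta\mathbf{x}}, \tag{11}$$

where $N$ is the number of atoms in the system, $\mathbf{I}$ is the identity matrix, and $\mathbf{a} \in \mathbb{R}^{M}$ are the optimal amplitudes of the linear deformations, given as

$$\mathbf{a} = \left(\mathbf{L}^{T}\mathbf{L}\right)^{-1}\mathbf{L}^{T}\Delta\mathbf{x}. \tag{12}$$

If the normal modes $\mathbf{L}$ are orthonormal (which happens if the mass matrix in Eq. 3 is the identity), the above equations simplify to

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\,\Delta\mathbf{x}^{T}\left(\mathbf{I} - \mathbf{L}\mathbf{L}^{T}\right)\Delta\mathbf{x}}, \qquad \mathbf{a} = \mathbf{L}^{T}\Delta\mathbf{x}.$$

It can be readily seen that if all the $3N$ modes are activated, the matrix $\mathbf{L}$ becomes square, $\mathbf{L}\mathbf{L}^{T}$ turns into the identity, and the RMSD reduces to zero.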
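Eqs. 11 and 12 translate directly into a few lines of NumPy. The sketch below assumes that `dx` is the flattened $3N$ displacement vector after superposition and that `L` stores the modes as columns; the function names are illustrative, not NOLB's API. Solving the normal equations mirrors Eq. 12; for ill-conditioned mode sets, `np.linalg.lstsq` would be the numerically safer choice.

```python
import numpy as np


def linear_amplitudes(L, dx):
    """Eq. 12: optimal amplitudes a = (L^T L)^{-1} L^T dx for modes L of shape (3N, M)."""
    return np.linalg.solve(L.T @ L, L.T @ dx)


def linear_rmsd(L, dx):
    """Eq. 11: RMSD remaining after the best linear deformation along the columns of L."""
    n_atoms = dx.size // 3
    residual = dx - L @ linear_amplitudes(L, dx)
    return np.sqrt(residual @ residual / n_atoms)
```

Because the amplitudes are chosen optimally, replacing `L` by any non-orthonormal basis of the same subspace leaves the RMSD unchanged, and using a complete set of $3N$ modes drives it to zero.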

#### E. Nonlinear structural transitions

The NOLB NMA method produces nonlinear deformations. Therefore, the RMSD in Eq. 11 is not exact in this case. However, given the displacement vector between the two conformations as before, we can still construct a deterministic deformation trajectory and compute the corresponding RMSD. Note that rotation operators do not commute, so the result of applying two rotations generally depends on their order. Therefore, to make the method deterministic when combining multiple nonlinear motions corresponding to different normal modes, we always apply the slower modes first. This choice is dictated by the fact that slower modes result in larger amplitudes of thermal fluctuations. Algorithm 1 lists the steps producing a nonlinear deformation towards the target structure. In this algorithm, we use an iterative procedure, and at each iteration we approximate the amplitudes of the nonlinear deformation by the analytically computed linear amplitudes from Eq. 12. This approximation is not valid at large deformation amplitudes $\mathbf{a}$. Therefore, if the RMSD computed for the linear approximation (Eq. 11) is larger than a certain threshold (we have chosen 0.1 Å), we split the deformation into smaller pieces. Each piece is computed from the linear amplitudes, scaled such that the total linear RMSD of the deformation equals the threshold value of 0.1 Å. We terminate the algorithm when the maximum number of iterations is exceeded (100 by default), or when the relative deformation becomes smaller than a tolerance of $10^{-6}$.
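The iteration just described can be sketched as below. This is an illustrative reading of Algorithm 1, not its actual implementation, under three stated assumptions: the nonlinear twist of all blocks is passed in as a hypothetical callable `apply_nonlinear`, the size of a step is measured as the RMSD of the linear displacement $\mathbf{L}\mathbf{a}$, and the relative deformation is taken as the size of the last step divided by the remaining distance to the target.

```python
import numpy as np


def deform_towards(x, x_target, L, apply_nonlinear,
                   threshold=0.1, max_iter=100, tol=1e-6):
    """Iteratively deform flattened coordinates x (3N,) towards x_target.

    apply_nonlinear(x, a) is a placeholder for the NOLB twist of all rigid
    blocks with mode amplitudes a (applying the slowest modes first).
    """
    x = np.array(x, dtype=float)
    n_atoms = x.size // 3
    for _ in range(max_iter):
        dx = x_target - x
        dist = np.linalg.norm(dx)
        if dist == 0.0:
            break
        a = np.linalg.solve(L.T @ L, L.T @ dx)        # linear amplitudes, Eq. 12
        step_rmsd = np.linalg.norm(L @ a) / np.sqrt(n_atoms)
        if step_rmsd > threshold:                     # split a large deformation
            a *= threshold / step_rmsd
        x_new = apply_nonlinear(x, a)
        relative = np.linalg.norm(x_new - x) / dist   # assumed definition
        x = x_new
        if relative < tol:
            break
    return x
```

Plugging in a purely linear update, `lambda x, a: x + L @ a`, recovers the best linear fit of Eq. 11 in a series of steps of at most 0.1 Å, which is a convenient way to test the loop before substituting the nonlinear twist.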

The algorithm above can be iterated multiple times. At each iteration, the elastic network model is updated and the normal modes are recomputed, as described in Algorithm 2. On-the-fly recomputation of normal modes has been previously proposed in the context of cryo-EM fitting and morphing applications (1, 16, 17). Note that our nonlinear model and the way we assess the predicted transitions naturally overcome the limitations of classical NMA schemes highlighted by Jernigan et al. (18, 19) for transitions involving a substantial rotation of a protein domain.
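The outer loop of Algorithm 2 is simple to express once a fixed-mode deformation routine exists. In this sketch, `compute_modes` and `deform` are hypothetical callables (for instance, an ANM Hessian diagonalization and an Algorithm 1 routine), and `n_diag` counts diagonalizations including the initial one; whether the initial diagonalization is counted as an update is an assumption of this sketch.

```python
import numpy as np


def deform_with_mode_updates(x_start, x_target, compute_modes, deform, n_diag):
    """Repeat a fixed-mode deformation, rebuilding the elastic network each time.

    compute_modes(x) -> (3N, M) normal modes of the network built on x;
    deform(x, x_target, L) -> coordinates deformed along the modes L.
    Returns the final coordinates and the RMSD to the target after each round.
    """
    x = np.array(x_start, dtype=float)
    n_atoms = x.size // 3
    history = []
    for _ in range(n_diag):
        L = compute_modes(x)        # new Hessian and normal modes on the current x
        x = deform(x, x_target, L)
        history.append(np.linalg.norm(x_target - x) / np.sqrt(n_atoms))
    return x, history
```

Recording the RMSD after each round makes it easy to see how much each additional diagonalization contributes to the transition.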

**Algorithm 1.** This deterministic algorithm produces a nonlinear structural deformation towards the target conformation.

**Algorithm 2.** This extension of the previous algorithm produces a nonlinear structural deformation with multiple updates of the Hessian matrix.

#### F. Potential function

Classical NMA methods can use any potential function, provided that the initial structure corresponds to its equilibrium position. Some recent developments can also handle a non-equilibrium initial state (20). In our method, we use an all-atom anisotropic network model (ANM) (21, 22), in which the initial structure is always at equilibrium, since the reference distances are taken from this structure. The all-atom ANM has the following potential function,

$$E = \frac{\gamma}{2}\sum_{i<j,\; d^{0}_{ij} < R_c}\left(d_{ij} - d^{0}_{ij}\right)^{2},$$

where $d_{ij}$ is the distance between the $i^{\text{th}}$ and the $j^{\text{th}}$ atoms, $d^{0}_{ij}$ is the reference distance between these atoms, as found in the original structure, $\gamma$ is the spring constant, and $R_c$ is a cutoff distance, typically between 3.5 Å and 15 Å. By default, we set this value to 5 Å. However, if the system contains loosely connected structural fragments, it makes sense to increase this value to 10 Å or more. The Hessian matrix corresponding to this potential function is composed of the following $3\times 3$ blocks (21–23),

$$\mathbf{H}_{ij} = -\frac{\gamma}{\left(d^{0}_{ij}\right)^{2}}\begin{pmatrix} x_{ij}^{2} & x_{ij}y_{ij} & x_{ij}z_{ij}\\ x_{ij}y_{ij} & y_{ij}^{2} & y_{ij}z_{ij}\\ x_{ij}z_{ij} & y_{ij}z_{ij} & z_{ij}^{2} \end{pmatrix} \quad (i \neq j,\ d^{0}_{ij} < R_c), \qquad \mathbf{H}_{ii} = -\sum_{j\neq i}\mathbf{H}_{ij},$$

where $x_{ij} = x_i - x_j$, $y_{ij} = y_i - y_j$, and $z_{ij} = z_i - z_j$. To compute this matrix rapidly, we use an efficient neighbor search algorithm (24).
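For small systems, the Hessian blocks above can be assembled with a brute-force double loop, as in the NumPy sketch below. `anm_hessian` is a hypothetical helper, not NOLB's implementation: it uses an $O(N^2)$ pair loop instead of the neighbor search of (24), and Cartesian rather than mass-weighted coordinates. Because the reference distances are taken from the input coordinates, the structure sits at the energy minimum, and a well-connected network has exactly six zero eigenvalues, corresponding to rigid-body motions.

```python
import numpy as np


def anm_hessian(coords, cutoff=5.0, gamma=1.0):
    """Dense ANM Hessian (3N x 3N) built on the reference structure coords (N, 3)."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = coords[i] - coords[j]                 # (x_ij, y_ij, z_ij)
            d2 = r_ij @ r_ij                             # (d0_ij)^2
            if d2 >= cutoff ** 2:
                continue                                 # no spring beyond R_c
            block = -gamma * np.outer(r_ij, r_ij) / d2   # off-diagonal block H_ij
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block          # H_ii = -sum_j H_ij
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess
```

The lowest-frequency normal modes are then the eigenvectors returned by `np.linalg.eigh(anm_hessian(coords))` just after the six zero-eigenvalue rigid-body modes.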

### 3. Assessment of the transitions

#### A. Transition coverage

To assess the quality of the computed transitions, we measure the extent to which they cover the conformational deviation between the aligned starting and target states. Transition coverage is computed as

$$\text{coverage} = \frac{\mathrm{RMSD}_{i} - \mathrm{RMSD}_{f}}{\mathrm{RMSD}_{i}},$$

where $\mathrm{RMSD}_{i}$ is the initial root-mean-square deviation between the starting and target structures, and $\mathrm{RMSD}_{f}$ is the deviation between the final structure obtained from the computed transition and the target structure. The coverage varies between 0 (null prediction) and 1 (perfect prediction).
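A possible implementation of the coverage score is sketched below. It assumes that both RMSD values are computed over corresponding atoms (e.g., the aligned Cα atoms) after optimal rigid superposition with the Kabsch algorithm; `superposed_rmsd` and `transition_coverage` are hypothetical helper names.

```python
import numpy as np


def superposed_rmsd(mobile, reference):
    """RMSD after optimal rigid superposition (Kabsch algorithm); inputs are (N, 3)."""
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))          # exclude improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return np.sqrt((diff ** 2).sum() / len(p))


def transition_coverage(start, final, target):
    """Fraction of the initial deviation removed by the predicted transition."""
    rmsd_i = superposed_rmsd(start, target)
    rmsd_f = superposed_rmsd(final, target)
    return (rmsd_i - rmsd_f) / rmsd_i
```

A prediction that reproduces the target exactly gives a coverage of 1, and returning the starting structure unchanged gives 0.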

#### B. Collectivity

Collective motions can be characterized by their *collectivity* $\kappa$, which is proportional to the exponential of the information entropy (25). The collectivity of a transition between two structures of a molecule with $N$ atoms can be computed (26) as

$$\kappa = \frac{1}{N}\exp\left(-\sum_{i=1}^{N} q_i^{2}\ln q_i^{2}\right),$$

where $q_i$ are scaled Cartesian displacements of individual atoms, $q_i^{2} = \alpha\,\lVert\Delta\mathbf{r}_i\rVert^{2}$, with $\Delta\mathbf{r}_i$ the displacement of the $i^{\text{th}}$ atom and the normalization factor $\alpha$ taken such that $\sum_{i=1}^{N} q_i^{2} = 1$. The quantity $N\kappa$ gives an effective number of nonzero displacements. Thus, $\kappa$ is confined to the interval $[1/N;\,1]$. If $\kappa = 1$, the corresponding transition is maximally collective and all the displacements have identical magnitudes, which happens, for example, for a rigid-body translation. In the limit of an extremely localized motion, where only a single atom is affected, $\kappa$ is minimal and equals $1/N$. In a similar way, one can estimate the degree of collectivity of a normal mode. For example, the collectivity of the $j^{\text{th}}$ mode is given by the same equation, provided that $q_i$ are now the scaled displacements of the $i^{\text{th}}$ atom in this normal mode.
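The collectivity formula can be evaluated with a few lines of NumPy, as in the minimal sketch below (the function name is hypothetical). It takes per-atom displacement vectors, normalizes their squared magnitudes so that they sum to one, and uses the convention $0\ln 0 = 0$ for atoms that do not move.

```python
import numpy as np


def collectivity(displacements):
    """Collectivity kappa of per-atom displacement vectors of shape (N, 3)."""
    sq = np.sum(np.asarray(displacements, dtype=float) ** 2, axis=1)  # ||dr_i||^2
    q2 = sq / sq.sum()                 # alpha chosen so that sum_i q_i^2 = 1
    nz = q2 > 0                        # 0 * ln(0) is taken as 0
    entropy = -np.sum(q2[nz] * np.log(q2[nz]))
    return np.exp(entropy) / len(q2)
```

For the $j^{\text{th}}$ normal mode stored as a column of a $3N\times M$ matrix `L`, the same function applies to `L[:, j].reshape(-1, 3)`.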

### 4. Computational details

For all the computations, we used the NOLB package, which rapidly performs linear and nonlinear NMA in Cartesian coordinates (15, 27). Given two states of a molecule in the PDB file format, we performed all the computations using the following commands: “NOLB initial.pdb final.pdb --linear” for the calculation of linear structural transitions, and “NOLB initial.pdb final.pdb --nlin” for the calculation of nonlinear structural transitions. Additional local minimization can be applied (with the “-m” flag) to keep the bond lengths and angles near their equilibrium values. By default, structural transitions are computed between all the Cα atoms of the two structures, whose residues are aligned to each other using sequence information only. For a few cases from test set 2, we identified ambiguities in the alignment that led to incorrect results. To resolve these cases, we performed an iterative alignment with 5 additional cycles, progressively excluding at each cycle the atoms whose deviation exceeded a certain threshold (2 Å for 1he8:r, 2z0e:l, and 1nw9:r; 4 Å for 1azs:r). The method is available free of charge for academic users on the three main platforms, macOS, Linux, and Windows. The method is also fast. For example, computing the nonlinear structural transitions for all 460 proteins from PPDBv5, with local minimization enabled and using the 10 lowest-frequency normal modes, took about 9 minutes on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz. Performing 5 iterations of the multi-diagonalization scheme increased the computing time to about half an hour.

**Movie S1. Nonlinear transition predicted for coagulation factor VIIa. The starting and target structures are colored in grey and black, respectively. The part of the transition in orange is obtained from the normal modes computed on the starting structure. The part in red is obtained by updating the modes three times. This transition was produced using the command “NOLB 1fak_r_u.pdb 1fak_r_b.pdb -n 10 --nlin --nIter 3 -m”.**

**Movie S2. Linear transition predicted for actin. The starting and target structures are colored in grey and black, respectively. The transition obtained with the classical linear modes is colored in blue. Note the incorrect size of the loop, which grows as the transition progresses. This transition was produced using the command “NOLB 1atn_r_u.pdb 1atn_r_b.pdb -n 10 --linear --trajectory -s 33”.**

**Movie S3. Nonlinear transition predicted for actin. The starting and target structures are colored in grey and black, respectively. The transition obtained with NOLB nonlinear modes is colored in orange. This transition was produced using the command “NOLB 1atn_r_u.pdb 1atn_r_b.pdb -n 10 --nlin -m”.**

**Movie S4. Nonlinear transition predicted for diaminopimelate dehydrogenase. The starting and target structures are colored in grey and black, respectively. The part of the transition in orange is obtained from the normal modes computed on the starting structure. The part in red is obtained by updating the modes five times. This transition was produced using the command “NOLB 1dap.pdb 3dap.pdb -n 10 --nlin --niter 5 -m”.**

**Movie S5. Nonlinear transition predicted for the calcium ATPase pump. The residues undergoing the highest displacements are highlighted in pink color and stick representation. This transition was produced using the command “NOLB 1su4.pdb 1t5s.pdb -n 10 --nlin”.**

**Movie S6. Linear transition predicted for the calcium ATPase pump. The residues undergoing the highest displacements are highlighted in magenta color and stick representation. Note the unphysical scaling of the highlighted fragments as the transition progresses. This transition was produced using the command “NOLB 1su4.pdb 1t5s.pdb -n 10 --linear --trajectory -s 60”.**

## Footnotes

A.H. developed the diagonalization scheme. S.G. proposed the twist method and coded the algorithm. E.L. performed the tests and plotted the figures. S.G. and E.L. wrote the manuscript.

The authors declare no conflict of interest.