Non-productive Binding Modes as a Prominent Feature of Aβ1-40 Fiber Elongation: Insights from Molecular Dynamics Simulation

Amyloid formation has been implicated in a number of neurodegenerative diseases. The elongation of amyloid fibers is thermodynamically strongly favorable but kinetic traps exist where the incoming monomer binds in an incompatible conformation that blocks further elongation. Unfortunately, this process is difficult to follow experimentally at the atomic level. It is also too complex to simulate in full detail and thus so far has been explored either through coarse-grained simulations, which may miss many important interactions, or full atomic simulations in which the incoming peptide is constrained to be near the ideal fiber geometry. Here we use an alternate approach starting from a docked complex in which the monomer is from an experimental NMR structure of one of the major conformations in the unbound ensemble, a largely unstructured peptide with the central hydrophobic region in a 310 helix. A 1000 ns full atomic simulation in explicit solvent shows the formation of a metastable intermediate by sequential, concerted movements of both the fiber and monomer. A Markov state model shows the unfolded monomer is trapped at the end of the fiber in a set of interconverting anti-parallel β-hairpin conformations. The simulation here may serve as a model for the binding of other non-β-sheet conformations to amyloid fibers.


Introduction
Neurodegenerative diseases are often characterized by the self-assembly of specific proteins into a fibrillar, "misfolded" amyloid state. The standard model for amyloid growth 1 predicts that the initial nucleation is rate-limiting; the extension of fibers is assumed to be a rapid process.
Recent experiments, however, have challenged this assumption. The dissolution of radiolabeled monomers from the ends of unlabeled fibers 2 supports a two-stage binding mechanism in which the fast reversible association of monomeric Aβ to fibers ("docking") is followed by a second, slower step, interpreted as a conformational transition of the incoming monomer on the fiber surface that results in essentially irreversible binding ("locking"). 3,4 Fluorescence microscopy indicates fiber elongation proceeds in a stop-and-go manner with bursts of fast elongation followed by periods of slow or absent growth [5][6][7][8] , which may be interpreted as failure of the incoming monomer to adopt the correct conformation after binding. The incorrectly docked monomer then arrests growth by preventing other incoming peptides from attaching to the fiber. 4 A structural understanding of this process at the atomic level has proven very difficult to achieve experimentally as the growing fiber ends comprise only a small fraction of the total sample. In the absence of experimental data on the molecular level, molecular dynamics (MD) has been used to explore the dock-lock hypothesis. The fiber elongation process is too complex and occurs on too long of a time-scale to fully model at the atomic level. Coarse-grained results suggest a rugged landscape for fiber elongation [9][10][11][12][13][14] with both productive and non-productive binding modes. 15 On the other hand, constrained atomistic simulations starting from an geometry favorable for binding often show a smooth, downhill trajectories without prominent metastable states. 16 Instead of the low-energy kinetic traps found in the coarse grained simulations of random coil interactions with amyloid fibers, diffusion and dehydration of the fiber tip are the primary obstacles to fiber elongation in these simulations. It is an open question how the binding of states that do not closely match the fiber conformation may initiate amyloid fiber elongation.
To explore this question for the Aβ1-40 peptide, we take a different approach by describing the system atomistically in explicit solvent but starting the simulation from a docked state in which the bound monomer is in a partially helical conformation reflective of some of the states in the unbound ensemble. The system eventually reaches a metastable state in which the C-terminal strand of the amyloid fiber is wrapped around a collapsed coil conformation of the monomer. This result confirms non-productive binding modes may be a prominent feature of Aβ fiber elongation.

Results and discussion
The simulations start from a NMR structure of the Aβ40 monomer in aqueous solution. 17 The Aβ40 monomer does not have a true structure in the traditional sense, but a wealth of experimental data has shown that certain conformations are more likely than others. 18,19,20,21 Although estimates of the actual conformations in the ensemble vary considerably, the central hydrophobic core (CHC, Leu17-Ala21) often folds into a helix in other experimental and computational studies. 22,23,21,24 The remainder of the peptide is largely unstructured and exposed to solvent, with the exception of the hydrophobic C-and N-ter both of which make contact with the CHC. The monomer therefore has a defined surface for docking, although the contacts defining the structure are few and weak.
The docked complex of the Aβ1-40 monomer with the amyloid fiber is shown in Figure 1A.
Solid State NMR measurements indicate that the two β-strands of the fiber are displaced relative to each other. 25 This stagger creates two asymmetrical shelfs at the ends of the fiber, one exposing the sidechains of the N-terminal strand and another the sidechains of the C-terminal strands. The monomer binds to the C-terminal end with the 310 helix resting on the inward facing hydrophobic residues (A22, I24, and L26). The helix is at an angle to the fiber with a slight tilt to the C-terminal strand of the fiber. The largely disordered termini of the Aβ1-40 monomer, which are poorly defined in the NMR structure, primarily sit above the surface of the fiber and make few contacts with the fiber surface, with the exception of R5, which makes a salt bridge to E14.

Unfolding of Aβ40 in a stepwise transition
The docked complex was formed by rigid body docking of the experimental structured of the monomer and fiber. However, the Aβ monomer is flexible and is expected to change as it adapts itself to the surface of the fiber. Moreover, the ends of the fiber only form a small fraction of the sample and are therefore not well defined in the experimental structure. As a result, the initial complex in the simulation is not likely to be stable and will evolve as the monomer unfolds on the surface of the fiber and the ends of the fiber fray and adapt their equilibrium positions. This transition is conveniently analyzed through cluster analysis. Clustering aims to group the conformations sampled during the trajectory to reflect the basins of the free energy landscape. Clustering requires a structural distance metric. Due to its disordered nature, the conformational ensemble is most efficiently analyzed by clustering the conformations by pairwise RMSD. Grouping conformations by clustering offers a suitable mechanism to identify the recurrence and transition probabilities during the time course of simulation. 26 Geometrical linkage clustering collects conformations, which are structurally similar and lie in the same energy basin of the free energy surface.
Visually, cluster analysis shows a stepwise transition away from the initial docked complex as the simulation progresses. The stepwise changes in this transition are easily tracked by changes in the inter-monomer ( Figure 2) and fiber-monomer contacts (Figure S1-S3). Early points can be clustered into 5-8 closely related groups containing near-identical structures. A transition away from the initial unbound structure begins to take place within the first 100 ns.
The first event is the sudden hydrophobic collapse of the protein around 50 ns, which is reflected by a decrease in both the radius of gyration ( Figure S4A) and the solvent accessible area ( Figure   S4B). At the same time, the binding of the monomer and conformational change within a monomer, drive a corresponding structural change in the fiber. As the monomer compacts on itself, the leading C-terminal β-strand loses contact with the monomer and becomes exposed (Figures S1). The loss of the leading C-terminal β-strand drives a structural transition as the monomer and fiber change conformation to adapt themselves to each other (Figures S5).
Interestingly, there is similar movement on the opposite side of the fiber. Similar to previous MD results by Okumura and Itoh,27 the loop connecting the C-and N-terminal β-strands at the opposite end begins to coil against itself within the first 100 ns (Figures S1-S3) mirroring the coil found between the monomer and fiber at the other end that forms as the simulation progresses ( Figure 1D).
After the initial hydrophobic collapse, more contacts start to develop between the hydrophobic residues in the N-and C-terminal regions and the CHC region of the monomer, driving a structural transition to a more collapsed conformation that becomes more prominent as time goes on. During this transition, a turn forms between D23 and S26 and another between H13 and L17. Simultaneously, the N-terminus disengages from the CHC at around 500 ns. This movement is guided by a corresponding movement of the leading C-terminal β-strand to cover the part of the surface of the monomer not in contact with the fiber. The effect of these two transitions is to form a U-shaped like structure that differs from a normal anti-parallel β-hairpin by the intercalation of some of the chains between the strands. This conformation bears a slight resemblance to the repeating steric zipper found in the amyloid structure, with the exception that the secondary structure is more distorted than a canonical β-sheet and the loop connecting the βsheets is much shorter than in the amyloid fiber. The orientation of the monomer is also inconsistent with the final "locked" conformation with the hairpin oriented nearly perpendicular to the fiber edge. Linkage cluster analysis 28,29 indicates this specific cluster grows at the expense of others as time progresses, eventually dominating the smaller, more structurally diverse clusters found in the 100 ns time frame (Figures S6-S7). This specific family of U-shaped structures therefore represents a distinct intermediate through which multiple conformations converge.
The system continues to evolve before eventually reaching stability around the 900 ns mark.
The most prominent change in the monomer is at the C-terminal end, which bends against itself near the 900 ns mark to form another hairpin structure. This movement is matched by a corresponding movement of the leading C-terminal β-strand of the fiber into contact with the Cterminus of the monomer. Once this occurs, the system stabilizes with time. At 900 ns, the higher order clusters (4-5 groups) are of comparable size to the previous cluster and no significant shift occurs in the range of 900 ns to 1 s time scale, suggesting near-convergence ( Figure S7), at least within the limited time frame of the simulation.

Transient metastable states found over the free energy landscape
Individual conformations are limited in their ability to describe a complex, dynamic system such as Aβ fiber binding. The analysis of the structure of cluster centers suggests the conformational transition from the bound to unbound structure is primarily driven by hydrophobic collapse.
Making this assumption, energy landscapes are constructed using the radius of gyration, Rg, as one collective variable and the RMSD from the initial docked structure as the second variable to probe the conformational space within the binding process ( Figure 3). The validity of an energy landscape constructed as in this manner is supported by a scree plot, which shows that 81% of the collective variance in RMSD can be explained by the first two principal components ( Figure   S8). To highlight the kinetic aspect of the transition, we can partition into the energy landscape into the three time regimes that were previously identified in Figure 2.
The energy landscape ( Figure 3A) from the first partition has several distinct energetic minima, a broad basin with an energy of -17.6 kJ/mol closely resembling the starting structure and several sharp minima with a slightly lower free energy that correspond to more compact structures with an alternate hydrogen bonding structure. In these alternate conformations ( Figure   3A left), the 310 helix of the initial structure unwinds slightly and the hydrogen bond between Asp23 and Ser26, which defines the loop that terminates the 310 helix, is lost. The N-terminus also moves away from the CHC and instead folds against the N-terminal part of the 310 helix.
Outside the broad basin near the central minimum, the overall surface is rough, with high-energy barriers between metastable states.
In contrast to the rough energy landscape found in the initial phases of the simulation, the energy landscape from the 100-500 ns time is smooth with two deep but broad minima separated by a low energy barrier, one dominated by a series of bend structures ( Figure 3B right) and another dominated by the hydrophobic collapse of the C-terminus against the CHC ( Figure 3B left). A conformational change in the loop region from V24-I32 converts either of these structures into the minimum energy conformations found between 500-1000 ns. The conformational change is driven by a change in the hydrogen-bonding network of the polar side chain of Asn27. 22 In the earlier structures, the Asn27 side chain is hydrogen bonded to the adjacent backbone residues. After the appearance of dominant cluster at ~500 ns (as discussed in Figure S7), the side chain moves in between the two strands of the loop, creating a hydrogen bond network between the backbone amides of Asn27 and Phe19; and Glu22 on one strand of the loop and Gly29 on the other. The side chain of Asn27 therefore appears to act as a switch to converting one conformation to another. Importantly, the hinge region at Gly25-Asn27 is wellmaintained in all the structural ensembles, which has been reported as key conformational change promoting aggregation in several previous reports. 30,31 Turns and bending within the regions Glu7-Gly9, His13-Gln15, and Val24-Asp27 also correlate well with earlier findings. 32,33,34,35 The transition to final conformation is largely mediated by a change on the fiber side. The hydrophobic Val24-Ile31 loop, which had previously become disordered with the loss of the final β-strand, at around 700 ns forms a structured loop that packs against the end of the pseudo-hairpin of the monomer. The final state is at the bottom of a broad energetic minimum that represents a large ensemble of closely related structures, reflecting the fact that convergence has been reached.

Validation of the converged structure with STD-NMR
The accuracy of the docked model was checked by STD NMR, which detects transient contacts between the Aβ1-40 monomer and high molecular weight species by spin diffusion ( Figure 1). The interpretation of the spectrum was complicated by line broadening from field inhomogeneity caused by the large fiber particles resulting in severe signal overlap. Due to the degree of signal overlap, peaks could not be uniquely assigned and a quantitative analysis is impossible. Nevertheless, even with the overlap, a qualitative estimation of the relative strengths of interactions among different types of residues could be made.
The strongest STD signal arises from the terminal protons from branched aliphatic residues: γ protons from Val12 and Val18, δ protons from Leu17 and Leu34, suggesting that these hydrophobic residues are most frequently in contact with the fiber and likely contribute most of the bulk of the binding energy ( Figure 1B). The aromatic rings of the Phe19 and Phe20 residues also make frequent contacts with the fiber surface. Both findings are reflected in the contact map of the MD simulation ( Figure S6), at least on a qualitative level. Specifically, F20 makes multiple contacts with the fiber. The terminal δ and γ of Ile31 and Ile32 and the β protons of Ala21 and Ala30 show a somewhat weaker STD signal. The STD effect is noticeably weaker in the Hα region 4.0-4.5 ppm. The lack of an STD signal in the Hα region indicates most of the contacts of the fiber are to the side chain of the Aβ40 monomer and not the backbone ( Figure 1C and 1D), in agreement with previous CPMG data. 36

Estimates of Markov States
Based on the conformational analysis of the trajectory, convergence is apparently achieved at ~900 ns from a transition from a metastable state occurring at around 400 ns. The apparent convergence does not mean that the system is static merely that the energy landscape is no longer changing on the nanosecond to microsecond time-scale. Complex biological systems often have multiple energy wells separated by high barriers that are inaccessible to molecular dynamics simulations with conventional sampling. The transition from the initial docked state to the final locked conformation takes place on a time-scale of tens of seconds, implying a series of rare events are evolved that are not sampled with a 1 µs molecular dynamics simulation.
To further explore the conformational space at convergence, we first performed accelerated molecular dynamics (aMD) to starting from ten conformations taken from snapshots at equal intervals within the 400 to 900 ns time segment. This method is an enhanced sampling technique that reduces the energy barrier that separate the states in the conformational hyperspace to favor transitions between energy minima. 37,38 A dual boosting scheme using the entire potential and a separate boosting term to the torsional energy terms of the peptide specifically was employed to accelerate both large scale movements of the protein and solvent (Table 1).

Materials and Methods
All the simulations were carried out using the Amber14 suite with the ff99SB-ILDN 1 force-field on a GPU workstation having a C600/X series Tesla card. Initial coordinates for the molecular modelling study were collected from the Protein Data Bank, accession codes 2LFM and 2M4J, for the partial helical 17  Classical MD simulation -We performed an unbiased simulation procedure within the uniform density approximation with application of periodic boundary conditions (PBC) in the threedimensional space. The simulation began with solvation of the complex with TIP3P water models in a truncated octahedron water-box with an edge-length of 10 Å from edge solute atom. 58 Neutralization of the solvent system was achieved by addition 7 Na + counter ions added using a Monte-Carlo simulation method. 59 10,717 water molecules were added for a total of 35,620 atoms. The length of the periodic box is 8 Å from the edge atoms of the docked complex.
The PME method with cubic β-spline interpolation and a grid spacing of 1 Å was used for calculation of the electrostatic interactions. 60 For the computation of the Lennard-Jones potential and Coulomb interactions, the non-bonded cut-off was fixed at 1.0 nm. The SHAKE algorithm was used for re-ordering the hydrogen bond length during simulation with an integration time step of 2 fs and a relative tolerance value of 10 -5 Å. 9 Energy minimization was first performed to relax the water molecules, holding the protein and ions of the system constant, for 2000 steps using the steepest descent algorithm. Following this, the protein system was relaxed, restraining the water molecules for equivalent steps. Finally, the entire system was minimized with 2000 steps using the steepest descent algorithm, followed by conjugate gradient minimization until the RMS fluctuations and gradient were stable. The system was then heated to bring the temperature to 300 K using Langevin dynamics. 61 During the uniform density approximation, constant pressure dynamics were performed with isotropic position scaling using the Berendsen barostat to maintain correct density. 62 The temperature in the production run was controlled using the canonical (constant T) ensemble. The total production run was 1000 ns (1 µs). The analysis of trajectory frames was performed using tools MMTSB for cluster analysis, 12 PCAsuite for principal component analysis  (2) where E refers to the threshold energy, which governs the potential surface affected by the boost, whereas α is the acceleration factor determining the shape of the modified potential. Values of E and α are calculated from number of atoms and the total energy from a 10 ns preparatory simulation and can be found in Table 1.