The kinetic landscape of nucleosome assembly: A coarse-grained molecular dynamics study

The organization of nucleosomes along the Eukaryotic genome is maintained over time despite disruptive events such as replication. During this complex process, histones and DNA can form a variety of non-canonical nucleosome conformations, but their precise molecular details and roles during nucleosome assembly remain unclear. In this study, employing coarse-grained molecular dynamics simulations and Markov state modeling, we characterized the complete kinetics of nucleosome assembly. On the nucleosome-positioning 601 DNA sequence, we observe a rich transition network among various canonical and non-canonical tetrasome, hexasome, and nucleosome conformations. A low salt environment makes nucleosomes stable, but the kinetic landscape becomes more rugged, so that the system is more likely to be trapped in off-pathway partially assembled intermediates. Finally, we find that the co-operativity between DNA bending and histone association enables positioning sequence motifs to direct the assembly process, with potential implications for the dynamic organization of nucleosomes on real genomic sequences.


Introduction
Nucleosomes are the structural units of chromatin, each consisting of ~147 base pairs of DNA wrapped around a protein histone octamer made of H3/H4 tetramer and two H2A/H2B dimers [1,2]. Nucleosomes not only enable the efficient compaction of the long Eukaryotic genomic DNA into the nucleus, but also influence key biological process by controlling the access to DNA by other proteins [3], for example during transcription [4]. Such important functions rely on the establishment of precise epigenetic patterns of nucleosome positions, histone variants and histone modifications, which mark the distinction between active and repressed genes, and between different genomic regions, e.g. promoters and gene bodies [5]. In order to preserve the identity of each cell, essential in multi-cellular organisms, this genomic organization has to be maintained [6] despite the continuous disassembly of nucleosomes over the course of DNA transcription [7] and replication [8]. However, the molecular details of nucleosome assembly and disassembly have not been yet fully elucidated.
In vitro experiments showed that the salt-induced disassembly of nucleosomes starts with DNA unwrapping and it is then followed by the progressive loss of the two H2A/H2B dimers [9], therefore forming firstly an asymmetric hexasome, and finally a tetrasome. Assembly proceeds in the opposite direction, starting with the binding of a H3/H4 tetramer, then followed by the binding of a H2A/H2B dimer on each side [10]. This order of events is due to the specific geometry of the nucleosome (with the H2A/H2B dimers surrounding the central H3/H4 tetramer), and to the more extensive network of protein-DNA hydrogen bonds on H3/H4 relative to H2A/H2B [1,11]. A similar assembly pathway was suggested to occur also in vivo [12], except that in this case the process is facilitated by histone chaperones limiting non-nucleosomal histone-DNA interactions [12], or by chromatin remodelers [13]. However, many experiments paint a picture of assembly (or disassembly) more complicated than a simple 2-step process involving tetrasomes, hexasomes and complete nucleosomes, highlighting the existence of various partially-assembled or non-canonical nucleosome structures, such as pre-nucleosomes [14], right-handed nucleosomes [15], remodelerassociated fragile nucleosomes [16], and partially opened nucleosomes [17]. Subnucleosomal conformations are also widely distributed across the genome in vivo [18], and their formation has been shown to depend in part on the underlying DNA sequence [19].
How such structures form and interconvert between each other remains unclear.
Characterizing the precise molecular details of nucleosome intermediates and the factors controlling their formation, e.g., salt concentration and DNA sequence, would greatly aid our understanding of how chromatin organization is established and maintained. In order to address these questions, we ran extensive coarse-grained molecular dynamics (MD) simulations of nucleosome assembly under different conditions. We consider a simple system made of the strong nucleosome-positioning 601 DNA sequence [20], a H3/H4 tetramer and two H2A/H2B dimers. By analyzing our MD trajectories with Markov state modeling [21], we reveal a complex kinetic landscape characterized by many metastable structures, some of which correspond to previously proposed non-canonical nucleosomes.
Our simulations show that A/T nucleosome positioning signals on the 601 DNA sequence direct the binding of the H2A/H2B dimer on the optimal DNA region, so that the dimer is then poised to easily bind the H3/H4 tetramer interface. This effect, which is observed only on the A/T-rich side of the 601 DNA, reveals a highly sequence-dependent assembly process, suggesting a potential mechanism allowing genomic sequences to directly control nucleosome organization in vivo [22]. We also investigated the influence of salt on the kinetics, finding that successful nucleosome assembly from a tetrasome is faster at low salt concentration, but that the system is also more likely to remain stuck in off-pathway kinetic traps. This explains the necessity to assemble nucleosomes via salt dialysis [23], or via the addition of chromatin remodelers [14]. The assembly/disassembly mechanism itself is also a function of salt concentration: while at low or intermediate salt the H2A/H2B dimers bind/unbind to/from the H3/H4 tetramer co-operatively with DNA wrapping, unbinding occurs only after DNA unwrapping at high salt. Overall, our results establish clear mechanistic relations between environmental conditions, DNA sequence, and the dynamics of nucleosome assembly.

Modeling nucleosome assembly dynamics
In our MD simulations, the DNA is modeled using the 3SPN2.C coarse grained model [24], where each nucleotide is represented by three beads centered on the base, sugar and phosphate groups. The histones are modeled according to the AICG2+ structure-based model [25], where each residue is represented by one bead centered on the Cα atom. This and similar coarse-grained models have been successfully applied to study complex processes such as nucleosome sliding [26][27][28][29], unwrapping [30][31][32], and assembly [33], at a fraction of the computational cost required for all-atom simulations [34][35][36][37]. Notably, the 3SPN2.C model can capture the sequence-dependent elasticity of DNA [24], and it has been shown to correctly predict the affinity of nucleosome formation for different sequences [32], making this model suitable to investigate the effect of sequence on the assembly kinetics. Further details on the coarse-grained model and its parametrization are provided in the Methods section.
Our simulation system consists of the 147-bp nucleosome-positioning 601 DNA [20], one histone H3/H4 tetramer, and two copies of histone H2A/H2B dimers. The 601 sequence was chosen for its widespread use in experiments and for their characteristic positioning motifs asymmetrically distributed around the nucleosome, allowing us to study their role during assembly. Specifically, 10-bp periodic A/T base steps are more frequent on the left side of 601 (called the beta side in Ref. [17], see Fig 1A), leading to enhanced nucleosome unwrapping on the right side relative to the left (i.e. asymmetric unwrapping) [9,38].
We performed nucleosome MD simulations for 10 8 MD steps at 400 mM mono-valent ion concentration starting from the tetrasome conformation, in which the H3/H4 tetramer is bound on the 601 DNA at the optimal position and both of H2A/H2B dimers are unbound from the tetramer but bound on DNA at different positions depending on the simulation run (see the left snapshot in Fig 1B as an example). Out of 215 runs, we found only one trajectory that reached to the complete nucleosome, which is shown in Fig 1B and S1 Movie.
As highlighted by the timeline of the sum of the radii of gyration of the left and right DNA sections (RgL+RgR), in this successful trajectory assembly occurs in two steps: firstly, the right half of DNA bends together with the binding of one H2A/H2B dimer to the tetramer (central cartoon in Fig 1B), then, the left half of DNA folds concomitantly with the assembly of the other H2A/H2B dimer to form the complete nucleosome (right cartoon). In Fig 1C, the same trajectory is projected on the 2-dimensional free energy landscape (obtained by the Markov state model as described below) along the left and right radii of gyration of DNA.
The free energy landscape also shows that the left side of the 601 DNA bends more easily (i.e., with a lower free energy cost) than the right side; this asymmetry is discussed later in the results. Other trajectories stopped at various apparently metastable states, due to the limitation of the simulation time.
Starting from the same tetrasome conformations used at 400 mM, we also performed 215 simulations at 200 mM and 300 mM, finding respectively 20 and 7 cases of complete nucleosome assembly (a representative trajectory at 200 mM is shown in Fig 1C). We will discuss the salt concentration dependence in more details later in the results.

Kinetic landscape of nucleosome assembly via Markov state modeling
In order to characterize the complete landscape and kinetics of nucleosome assembly and disassembly, we generated a Markov state model (MSM) of nucleosome dynamics using 1000 independent simulation runs of 10 8 MD steps each at 400 mM salt, starting from a diverse set of partially or fully assembled nucleosomes (215 of these are the same described in the previous section, see Methods for more details). The salt concentration of 400 mM was chosen so that neither the fully disassembled nor the fully assembled nucleosome conformations are too much preferred over the others, facilitating the characterization of the dynamics. To build an MSM, we first divide the conformational space into a set of M discrete microstates, each corresponding to a group of similar conformations, and then we learn from the unbiased MD trajectories the M x M transition probabilities to go from each state to another after a certain time interval, the so-called lag-time. Observing either complete assembly or disassembly events in a single trajectory is extremely rare, but the Markov state modeling approach allows us to combine many trajectories together to fully characterize the kinetics of the system, identifying the main long-lived conformations and the dominant pathways of nucleosome assembly. We generated an MSM with M=400 microstates and a lag-time of 4x10 6 MD steps (see Methods section for more details, and S1  In the T state one dimer is on the left side of the unwrapped DNA while the other dimer is on the right side, remaining close to the target H3/H4 interface. On the other hand, in the TL and TR states both unbound dimers interact solely with either the left or right side of the unwrapped DNA, respectively. The seven hexasome structures are divided into two types, depending on whether one H2A/H2B dimer is bound to the left (HL0, HL1 and HL2) or the right (HR0, HR1, HR2 and HR3) binding interface of the H3/H4 tetramer (respectively wrapping the left or the right sides of the 601 DNA in the canonical nucleosome). Each of these two hexamer types can adopt different conformations depending on the location of the unbound H2A/H2B dimer. In the left hexasome HL0, the unbound dimer is either far away or it interacts only weakly with nucleosomal DNA. In HL1, the unbound dimer is located on the right side of the DNA relatively close to its optimal location in the complete nucleosome. In HL2, the unbound dimer bridges two DNA gyres while facing outside of the nucleosome, in a configuration that cannot easily lead to assembly. HR0, HR1 and HR2 are the respective right hexasome versions of those just described. Finally, in hexasome HR3 a H2A/H2B dimer binds to the right interface of the H3/H4 tetramer, but it is stabilized by the wrapping of the left side of the 601 sequence. This metastable configuration is not found on the opposite side (there is no HL3 basin), suggesting that it originates from the marked propensity of the left side of 601 to bend. Notably, in the HR3 hexasome the DNA follows a right-handed super-helix, contrary to the canonical form that follows a left-handed one. For each of the 11 identified metastable states, we deposited as supplementary material 4 representative coarse-grained nucleosome configurations in PDB format (S1 Data). Researchers interested in investigating sub-nucleosome conformations in all-atom MD may use these as starting configurations following a previously-developed back mapping procedure [42].

Asymmetric dynamics of 601 nucleosomes
Our MSM also reveals a clear asymmetry between the left and right side of the 601 sequence. For example, the left hexasome state HL1 has an equilibrium probability about 20 times higher than its right counterpart HR1, consistent to what observed in past experiments on hexasomes [43]. Kinetics itself is also highly asymmetric: binding always occurs with a higher rate on the left side of 601, whereas unbinding occurs with a higher rate on the right (Fig 2), as observed in past experimental studies [17]. For instance, a tetrasome T transitions into a left hexasome HL1 after 4x10 6 MD steps with probability of ~3x10 -3 , one order of magnitude greater than the probability to observe a T to HR1 transition (~2x10 -4 ).
Similarly, the rate of nucleosome formation is ten times faster when this last assembly step involves H2A/H2B binding on the left side (i.e., from HR1 to N).
What is the molecular origin of this kinetic asymmetry? As shown in Fig 1A, the left side of 601 contains periodically spaced A/T base steps at most 5+10n positions from the dyad, whereas on the right side this periodic pattern is much weaker. The importance of the periodic motifs in nucleosome folding stems from the intrinsic bending of the DNA helix, which lowers the energy cost of wrapping around histones [32,38,44]. Indeed, we find that when the histone octamer is assembled (state N) the left side of the 601 DNA has a much higher probability to be fully wrapped than the right side ( Fig 3A). Furthermore, even the distributions of H2A/H2B dimer positions along DNA when they are unbound from the tetramer display strong differences ( Fig 3B): the left distribution has two sharp peaks separated by 10 base pairs, one of which corresponds to the location found in the left hexasome and complete nucleosome states, while H2A/H2B positioning on the right side of 601 is much more uniform. This shows that the sequence motifs on the left side of 601 (highlighted again in Fig 3B) facilitate the positioning of the H2A/H2B dimer so that it is poised for a successful binding with the H3/H4 tetramer.

Assembly pathways and salt dependence
The analysis of transition pathways [21] (S3 Fig) in the coarse-grained MSM of the 11 longlived states shows that two pathways make up most of the successful assembly transitions between the tetrasome T and the full nucleosome N: 66% of the pathways go through the more stable left hexasome HL1 (TàHL1àN), while 19% go through the less stable right hexasome HR1 (TàHR1àN). Therefore, all other hexasome conformations (HL0, HL2, HR0, HR2 and HR3) do not significantly contribute to successful assembly and may be therefore considered off-pathway kinetic traps.
The above landscape of nucleosome assembly and disassembly was obtained at an intermediate ionic strength (400 mM). However, an interesting question is how the assembly and disassembly processes are affected by changes in salt concentration. As described before, our assembly simulations from the tetrasome configurations at 200 mM, 300 mM, and 400 mM salt resulted in a complete nucleosome assembly in 20, 7 and 1 cases out of 215 runs, suggesting that low salt favors assembly. This is reasonable, since nucleosome are stabilized by protein-DNA electrostatic interactions. However, it does not explain the observation that the optimal procedure to assemble nucleosomes in vitro is by slowly decreasing the salt concentration from high to low values [23], and otherwise chromatin remodelers are required to successfully complete assembly [14]. One possibility is that at low ionic strength the kinetic landscape becomes more rugged, and the system struggles to escape from off-pathway traps. Indeed, Fig  During the successful assembly trajectories at 200 mM and 300 mM salt, we also found evidence of new metastable states that are not clearly captured by our MSM at 400 mM. In these conformations the nucleosome is already very compact but partially opened: due to the absence of DNA bending at super-helical location +/-3 (i.e., 3 DNA helical turns away from the dyad), one or both H2A/H2B dimers are unbound from the tetramer and oriented at a wider angle relative to the canonical conformation (see snapshot in Fig 4C). During our trajectories, we found this angle to vary between ~10 and ~50 degrees (the angle was computed based on the orientation of the longest helices in histones H2A and H4). PDB coordinates of representative partially opened conformations (part of the HL, HR or T states) have been deposited in S1 Data together with the other metastable states defined from the MSM. To further investigate their properties, we run 100 10 7 -steps simulations starting from the partially opened right hexasome at 200, 300, and 400 mM salt. Fig 4C shows that, similarly to the HR2 state, the escape from the partially opened state becomes slower as we decrease the salt concentration, presumably because this conformation is stabilized by electrostatic interactions of the H2A C-terminal and H2B N-terminal histone tails bridging the two DNA gyres. Furthermore, at 300 mM we find that starting from a partially opened hexasome conformation the probability to transition either into the complete nucleosome or into a more open hexasome is about 50%, similarly to a transition state (within 5x10 6 MD steps, we observe complete assembly events in 269 runs out of 400 starting from the left partially opened state, and in 156 out of 400 from the right one). Finally, we note that all the evidence points to these partially opened nucleosomes being the same as those identified in a recent FRET study [17]. In both experiments and in our simulations, 1. the angle between the open H2A/H2B dimer and the tetramer very similar (a value of 20 degrees was suggested from experiments), 2. the stability of these states decreases with increasing ionic strength, and 3. they are intermediates along the nucleosome assembly pathways.
In order to explore whether the disassembly mechanism is also affected by the salt concentration, we run additional 10 8 -step simulations starting from the complete nucleosome at 500, 600, 700, and 800 mM salt (30 trajectories each). As expected, nucleosomes become unstable at high salt, with frequent DNA unwrapping and dimer unbinding.
Interestingly, disassembly pathways appear qualitatively different at intermediate (400 mM) and high (800 mM) salt: while in the former case H2A/H2B unbinding from the tetramer proceeds cooperatively with the unwrapping of DNA, in the latter case DNA unwrapping anticipates histone unbinding (Fig 5). This trend is in qualitative agreement with the experimental literature: the cooperative transition is consistent with the recent results obtained by FRET at intermediate salt [17], while the non-cooperative behavior was suggested by a combination of SAXS and FRET at very high 1.9 M salt concentration [9].

Discussion
In conclusion, our coarse-grained MD simulations revealed the complex landscape of nucleosome assembly and its modulation by salt and DNA sequence. Firstly, we used Markov state modeling to summarize the assembly dynamics on 601 sequences at 400 mM salt concentration (at which the assembly is a reversible). Our MSM highlighted, apart from the expected canonical nucleosome, tetrasome and hexasome states, several long-lived configurations characterized by alternative interactions between histones and DNA.
Interestingly, in one of the right hexasome states (HR3), a canonical histone hexamer wraps the DNA in a right-handed way. Right-handed nucleosomes can form in experiments after the application of positive DNA supercoiling [45,46] and are also found in vivo within centromeres [15]. However, while past works investigated how such change in DNA handedness could originate from an equivalent change in the arrangement of the histone folds [45][46][47], our simulations suggest that these right-handed structures may also form via an alternative wrapping of the DNA around a canonical histone hexamer, without requiring a complex rearrangement of histones [45]. Since this state is found only on one side of 601, the simulations further suggest that right-handed nucleosomes are stabilized by specific DNA sequences. Torsional stresses, such as those generated by the passage of RNA polymerase, are also expected to affect the stability of various nucleosome configurations, as found in experiments on nucleosomes with super-coiled DNA [48], and in a recent MD study that investigated the effect of torsion on nucleosomal DNA unwrapping [49].
Our simulations also provide fresh insights into asymmetric nucleosome dynamics on 601 DNA. This sequence is characterized by several nucleosome positioning motifs, namely A/T base steps periodically spaced every 10 bp, which favor the wrapping of DNA around histones. However, these motifs are asymmetrically distributed along the sequence, favoring H2A/H2B dimer assembly on the left side of 601 by about one order of magnitude.
Asymmetric disassembly of nucleosomes along 601 was observed in many experiments [9,17,38]; our MD simulations revealed that DNA sequence motifs promote assembly by directing the binding of H2A/H2B dimers towards the H3/H4 target, an effect originating from the co-operativity between histone binding and DNA bending at low salt concentration. We speculate that by promoting the assembly of nucleosomes directly at highly flexible genomic regions [50], while avoiding stiff DNA regions such as poly-A tracts, the sequence dependence in the rates of assembly may also favor the maintenance of nucleosome positioning in vivo after disruptive events such as replication. In the future, we plan to investigate the dynamics of nucleosomes on genomic sequences to further explore this hypothesis. assembly, we find that it also stabilizes off-pathway kinetic traps, explaining the importance of salt dialysis for the successful assembly process [23]. Under some conditions, experiments found that histones and DNA form non-canonical pre-nucleosome structures that require chromatin remodelers to be converted into the canonical forms [14]. We speculate that pre-nucleosomes may resemble the haxasome states HL2 and HR2, where the unbound H2A/H2B dimer bridges the two opposite ends of nucleosomal DNA to prevent successful assembly unless some large rearrangement first occurs.
Simulations at 200 and 300 mM also revealed critical intermediate states along the successful pathways of nucleosome assembly: here the nucleosome is almost completely formed, but one H2A/H2B dimer does not yet make direct contacts with the H3/H4 tetramer, with the two interfaces at a wider angle relative to the canonical nucleosome. These partially opened nucleosomes where also recently identified in a series of FRET experiments [17].
Simulations indicate that these partially opened states are roughly halfway along the nucleosome pathways at low salt (i.e., they are transition states); in virtue of this, modulating the stability of this state, for instance by histone chaperones or epigenetic modifications, should have a large effect on assembly and disassembly rates, suggesting its potential role in vivo. Again, we found that the formation of these structures does not require any conformational change within histones (which may nevertheless play a role in nucleosome dynamics [51]), but are instead stabilized by non-specific electrostatic interactions between histone tails and DNA. In addition to conformational stability, salt concentration also affects the assembly/disassembly pathway itself: while at low and intermediate salt the H2A/H2B dimer binds the H3/H4 tetramer co-operatively with DNA bending (so that DNA sequence can directly affect kinetics), at high salt these two processes are uncoupled, with DNA unwrapping anticipating dimer-tetramer dissociation. These results recapitulate past experimental observations under various conditions [9,17].
The position coordinates of the metastable nucleosome conformations deposited in the supplementary material will represent a useful resource for researchers interested in further investigating nucleosome assembly, for instance by aiding the design of FRET experiments [17], or by providing starting configurations for all-atom MD simulations through a backmapping procedure [42]. Finally, we envision that our coarse-grained model will be of great use to further characterize nucleosome assembly in more realistic scenarios typical of in vivo experiments, for example in the presence of RNA polymerase [52], histone chaperones [47], and remodelers [13].

Coarse-grained model
To enable the complete study of nucleosome assembly and disassembly, we utilize coarsegrained molecular modeling: we map each amino-acid to a single bead [25], and each nucleotide to three beads corresponding to base, sugar, and phosphate groups [24]. The functional potentials and force-field parameters are chosen according to the AICG2+ structure-based model for histones [25] and the 3SPN2.C sequence-dependent model for DNA [24]. The flexible histone tails (residue ids 1-32 for H3, 1-23 for H4, 1-14 and 121-128 for H2A, 1-26 for H2B) are modeled according to a statistical local specifically designed for disordered regions [53] . The potential of the remaining folded regions is based on the structures of the first copies of each H3/H4 and H2A/H2B dimers present in the nuclesome crystal structure with PDB id 1KX5 (using a single reference for each pair of dimers ensures the symmetry of the histone octamer, which is important to study the asymmetry induced by DNA sequence). Briefly, whenever two protein residues are within a certain cutoff distance in the crystal structure, the coarse-grained potential includes an attractive interaction that favors the binding of the two residues [25].
Experimentally, in the absence of DNA or at high salt the histone octamer breaks into a H3/H4 tetramer and two H2A/H2B dimers [9,17,54], but using the default AICG2+ potential we do not observe histone octamer breakage under these conditions. This happens because the default AICG2+ scaling factor defining the overall strength of the residueresidue interactions was optimized for studying the folding of single-domain proteins, but in our case the attraction between the H3/H4 and H2A/H2B dimers is over-estimated. In order to find a suitable scaling factor for the native interactions between H3/H4 and H2A/H2B residues, we scanned different values performing MD simulation at either low salt (200 mM), at which we expect our nucleosomes to be stable, or high salt (800 mM), at which we expect disassembly. We find that scaling factors lower than 0.2 lead to unstable nucleosomes even at low salt, whereas for values higher than 0.4 we do not observe disassembly at even at Histones and DNA interact via excluded volume, Debye-Hückel electrostatics, and hydrogen bond interactions. The hydrogen bonds are modeled according to a distance-and angle-dependent potential based on nucleosome crystal structures, and was calibrated based on nucleosome structural stability and DNA unwrapping [26]. The size of the beads to define excluded volume interactions, and the parameter settings for the hydrogen bonds are the same used in our previous works [26][27][28]. The charges on the histone residues are determined via the RESPAC method [55], which optimizes the coarse-grained electrostatic potential against the all-atom one. The RESPAC procedure was applied to the globular part of the H3/H4 tetramer and the two H2A/H2B dimers, whereas for the flexible tails we used the standard integer charges on the Lysine, Arginine, Aspartic and Glutamic acids residues only.

Simulation setup
All reported molecular dynamics (MD) simulations have been performed with the software Cafemol [56] by integrating the equations of motion using Langevin dynamics and default MD settings. During the MD simulations, we apply harmonic restraints on the structured region of the two H3 histones (0.001 kcal/mol/Å 2 ) and on the phosphate beads of the DNA dyad residues (bp id 73, 0.1 kcal/mol/Å 2 ), limiting large-scale sliding of DNA relative to the H3/H4 tetramer (but not the unbound H2A/H2B dimers). The nucleosome sliding mechanism was already analyzed in previous studies [26,27] , and especially for the 601 positioning sequence considered here can be extremely slow. In our simulations, the H3/H4 is always placed at its optimal experimental position on the DNA sequence; preventing its sliding allows to avoid exploring less favorable positions and to focus solely on the assembly dynamics, which is our main focus. Finally, the system is enclosed into a spherical volume with a radius of 20 nm by a harmonic potential. Cafemol input files to run nucleosome assembly simulations are provided in the supplementary S2 Data.
The starting conformations for the 1000 MD simulation runs at 400 mM used to generate the MSM have been obtained during high-salt (800 mM) simulations starting from complete nucleosomes, and then selecting 1000 timeframes that include tetrasomes (215 conformations), left and right hexasomes (105 and 118 conformations, respectively), and nucleosomes (562 conformations). To quantify the salt-dependence of the nucleosome kinetics, we also performed, at both 200 and 300 mM salt, 215 10 8 -steps simulation runs starting from the 215 tetrasome conformations. To analyze the escape from 2 metastable states (either the HR2 hexasome or the partially opened state) as a function of salt concentration, we run 100 independent 10 7 -steps trajectories starting from the same conformation (either HR2 or partially opened HR1) at 200, 300, and 400 mM salt. In order to characterize the time necessary to escape from these states (analysis in Fig 4), we projected the molecular positions onto the slowest TICA motions [41] within each basin. The

Markov state modeling and analysis of metastable states
To generate the Markov state model of nucleosome assembly at 400 mM salt from our 1000 MD simulations, we first have to cluster the configurations into discrete states based on a distance between configurations defined on a suitable space. To this aim, we project the particle positions onto a series of collective variables that capture all the relevant modes of nucleosome dynamics: H2A/H2B dimer binding to DNA and to H3/H4, DNA wrapping, and DNA handedness. Since the two H2A/H2B dimers are identical, we generated variables that are symmetric under an exchange of these two dimers. In principle, the two H3/H4 binding interfaces are also identical, but the 601 DNA that is wrapped around the tetramer is not symmetric, so we do want to distinguish between the left and right H3/H4 interfaces, corresponding respectively to the dimer binding the left or the right side of the 601 DNA. We defined 5 types of collective variables (for a total of 4+4+2+2+1=13 collective variables): • For each H3/H4 tetramer interface, the sum and the maximum value of the contacts with the two H2A/H2B dimers. A dimer-tetramer contact is defined as c=1/(1+d/σ), where σ=1 nm and d is the distance between the center of mass of H2B residues 73-100 and the center of mass of H4 residues 65-92 (corresponding to the two interfacial regions). Therefore, a contact takes a value close to 1 when a H2A/H2B dimer binds the H3/H4 tetramer in its native configuration, whereas it is close to 0 when the H2A/H2B dimer is far from the H3/H4 tetramer. We first compute the 4 contacts between each of the 2 tetramer interfaces (t1 and t2) with the two dimers (d1 and d2): ct1,d1, ct1,d2, ct2,d1, and ct2,d2. Then, we define the 4 collective variables: ct1,d1+ct1,d2 ; ct2,d1+ct2,d2 ; max(ct1,d1,ct1,d2); and max(ct2,d1,ct2,d2). This variable enables us to capture the sliding of the H2A/H2B dimers along the DNA when they are unbound from the H3/H4 tetramer.
• For each side of the DNA, the number of base pairs wrapped around the histones, where a base pair is considered unwrapped if the center of mass of its base groups is displaced radially from the nucleosome super-helix axis by more than 6 Å. Starting from these 13 collective variables, we aggregate all 1000 MD trajectories and perform a further dimensionality reduction using time-lagged independent component analysis (TICA) [41] as implemented in PyEMMA 2 [57] using a lag-time of 10 6 MD steps.
We then select the top 8 slowest coordinates (which explain 95% of the kinetic variance in the data) and cluster the conformations into 400 discrete states using k-means [58] . The

Funding
The study was supported partly by JSPS KAKENHI grants 16H01303 (ST) and 20K06587     (C) The escape from the partially opened right hexasome identified during the successful hexamer-to-nucleosome assembly events. In order to highlight the conformational change, we indicate with two dotted red lines the wider angle between the unbound H2A/H2B dimer and the neighboring H3/H4 dimer, relative to that found in the fully assembled nucleosome.

Figures and Tables Captions
The left and right panels are the same plot as (B) except that the initial state is defined as the TICA 1 interval from -0.5 to 0.5. The time required to escape from these trapped states increases at low salt.  (HL1 or HR1). Above the arrows we indicate the percentage of pathways passing through this transition. The x coordinate corresponds to the committor probability as computed from the MSM, which is defined as the probability to reach state N before coming back to state T. S1 Movie. Representative assembly trajectory at 400 mM starting from a tetrasome conformation, the same shown in Fig 1B in the main text. S1 Data: For each of the 11 metastable states identified from our MSM at 400 mM using PCCA clustering, we deposited 4 representative coarse-grained nucleosome conformations in PDB format. The conformations are named "state*conf#.pdb", where * corresponds to the state name as described in the text (T, TL, TR, HL0, HL1, HL2, HR0, HR1, HR2 and HR3, N), and # goes from 1 to 4. The data also contain 4 representative partially opened conformations for each of the left hexasome, right hexasome, and tetrasome states, named "partially opened_*_conf#.pdb", where * is HL, HR or T and # goes from 1 to 4. S2 Data: Example Cafemol input files used to run nucleosome assembly simulations as described in the Methods section.