Dynamic simulations of transcriptional control during cell reprogramming reveal spatial chromatin caging

Chromosome structure is a crucial regulatory factor for a wide range of nuclear processes. Chromosome Conformation Capture (3C)-based experiments combined with computational modelling are pivotal for unveiling 3D chromosome structure. Here, we introduce TADdyn, a new tool that integrates time-course 3C data, restraint-based modelling, and molecular dynamics to simulate the structural rearrangements of genomic loci in a completely data-driven way. We applied TADdyn on in-situ Hi-C time-course experiments studying the reprogramming of murine B cells to pluripotent cells, and characterized the structural rearrangements that take place upon changes in the transcriptional state of 11 genomic loci. TADdyn simulations show that structural cages form around the transcription starting site of active loci to stabilize their dynamics, by initiating (hit) and maintaining (stick) interactions with regulatory regions. Consistent findings with TADdyn for all loci under study suggest that this hit-and-stick mechanism may represent a general mechanism to trigger and stabilize transcription.


INTRODUCTION
The three-dimensional (3D) structure of the genome has been shown to modulate transcriptional regulation [1][2][3] and to play a role in cancer and developmental abnormalities 4 . In the effort of characterising 3D genome structures, Chromosome Conformation Capture (3C)-based experiments 5 allow to capture a single snapshot of the genome conformation at a given time. A plethora of theoretical approaches have been developed to take advantage of 3C-based experimental data and model genome spatial organization. Restraint-based modelling approaches 6 take 3C-based contact frequencies as input and employ ad hoc conversions to spatial distances for determining 3D genome structure [7][8][9][10][11][12] . This approach has provided valuable insights into the structural organization of chromosomal regions in various organisms 13 .
Decreased sequencing costs together with more refined experimental protocols has recently permitted performing 3C-based time-resolved experiments to monitor genome conformation dynamics of biological processes at high resolution. For example, Hi-C experiments have been applied to study nuclear organization during mitosis 27,28 or meiosis [29][30][31] , perturbations induced by hormone treatment 32 and during induced neural differentiation 33 or cell reprogramming 34 . However, none of the strategies mentioned above can currently take full advantage of these new datasets, and computational approaches specifically designed for the simulation of timedependent conformational changes (4D) are not yet available. To fill this gap, we introduce TADdyn, a novel computational method which allows modeling 3D structural transitions of chromatin using time-resolved Hi-C datasets combining restraint-based modelling and molecular dynamics. We found two distinct phases of 3D genome conformation dynamics by applying TADdyn to 11 representative gene loci during cell reprogramming of mouse pre-B lymphocytes into Pluripotent Stem Cells (PSCs) 34 . First, the transcription start site (TSS) forms a 'cage'-like structure within a TAD, favoring specific contacts with open and active (enhancer) regions that may be located several kilo-bases (kb) away (here called the 'hit phase'). Second, the cage confines and stabilizes the dynamics of the TSS and its interactions with active regions against simulated thermal motion (the 'stick phase'). We propose this hit-and-stick model as a general mechanism for chromatin architectural changes linked to transcriptional activation.

RESULTS
The TADdyn modelling strategy. TADdyn is based on the following methodological steps (Methods and Fig. 1): (i) collection of experimental data, (ii) representation of selected chromatin regions using a bead-spring polymer model, (iii) conversion of experimental data into time-dependent restraints, (iv) application of steered molecular dynamics to simulate the adaptation of chromatin models to satisfy the imposed restraints, and (v) analysis of the spatial conformations obtained.
We applied TADdyn to a previously published in-situ Hi-C interaction time-series dataset (GEO accession number GSE96611). This included seven time-points of insitu Hi-C experiments during C/EBPa priming followed by Oct4, Sox2, Klf4 and Myc (OSKM)-induced reprogramming of B cells to PSCs 34 . We focused on 10 different ~2Mb regions of the mouse genome encompassing the promoters of 11 genes of interest. We selected these specific genes because they are representative of different time-dependent patterns of transcriptional activity, which allowed us to study how various different transcription dynamics interplay with changes in the 3D genomic organization. Specifically, we analyzed the early activated Tet2 gene; the late activated Sox2 and Nanog genes; the transiently activated Mmp3, Mmp12, CEBPa and Nos1ap genes; the transiently silenced Lmo7 genes, the stably silenced  Table 1).
Next, experimental Hi-C interaction matrices of the 10 regions at 5kb resolution were converted into TADdyn time-dependent spatial restraints (Fig. 1a). Specifically, each 5kb-bin was represented as a spherical particle of diameter 50nm, and at each experimental stage the Hi-C interaction matrix was converted into harmonic spatial restraints between these particles (Fig. 1b). This conversion follows the simple, yet effective, rationale that pair of particles with a high number of Hi-C interactions between them are restrained to stay close in space, while poorly contacting particle pairs are kept far apart 8 (Methods). The parameters of each imposed restraint between two particles (that is, the spring constant and its equilibrium distance) are then linearly interpolated between the values of the consecutive experimental time points using steered molecular dynamics simulations. This strategy favors the adaptation of the chromatin models to the imposed dynamic restraints (Methods and Suppl. Fig. 12). TADdyn restraints are specifically designed to allow smooth structural changes between models of consecutive experimental time-points.
TADdyn simulations are an accurate dynamic representation of the input Hi-C data. To quantify the degree of agreement of the TADdyn 4D models to the timeseries Hi-C interaction matrices, contact maps were calculated on an ensemble of 100 models obtained at each time step of the trajectories (Fig. 1d). Next, for each time-point the Spearman correlation coefficient (SCC) of the modeled contact map and the experimental Hi-C interaction matrices was computed as a measure of the agreement between the two interactions matrices (Methods). For the 11 loci, the SCC ranged between 0.60 and 0.89, which is indicative of accurate models 35 (Fig.   1d for Sox2 simulations and Suppl. Fig. 1-11c for all other loci). Notably, at a fixed cell stage, the Hi-C interaction map correlated best with the contacts maps at the corresponding time of the simulated reprogramming dynamics. These results show that for a diverse set of loci the simulated structures are a reliable 3D representation of two-dimensional Hi-C interaction matrices, which effectively reproduce the actual contact pattern at the correct time of the trajectory.
To test whether TADdyn is suited to model processes characterized by gradual chromatin structural transitions, two alternative sets of 100 simulations each for Sox2 were performed by completely deleting the restraints of stages D2 (DD2 simulation) and D6 (DD6 simulation). By doubling the time duration between the remaining adjacent transitions (Ba to D4 and D4 to D8), the total duration of the trajectories was maintained constant. We observed that the missing restraints only marginally affected the results of the simulations. Specifically, the conformations expected to represent the missing cell stages along the trajectories still provided accurate models at the removed stage (SCCD2=0.68 and SCCD6=0.75 in DD2 and DD6 simulations, respectively). Additionally, in the case of DD2, the Hi-C interactions map at D2 correlated best with the contact map computed on the models predicting the D2 stage (for DD6 the SCC between Hi-C at D6 and models in ES correlated slightly better than the one for D6; SCCES=0.76 vs. SCCD6=0.75). These results suggest that the simulated reprogramming dynamics are dominated by smooth and gradual chromatin rearrangements that can be effectively simulated by TADdyn.
In the following sections, we present the simulation details the simulations for three values between pairs of 3D models (Methods). As for the matrix-based analysis, the dRMSD clustering reflects the presence of different folding states that correlate with transcriptional activity (Fig. 2b and Suppl. Figures 1-11e).
Gene activation correlates with the appearance of "cage"-like structures. In our simulations (Supplementary videos 1-11), we observed a "caging" effect of the transcription start site (TSS) at the time when a given locus was transcriptionally active. To quantify this observation, we first calculated for each TSS the timedependent changes in its 3D structural embedding as well its explored volume (Methods, Fig. 3 and Suppl. Figs. 1-11f,g). Notably, TADdyn simulations showed that, as cells reprogramed, the TSS for all 11 loci remained largely accessible to other particles during their less transcriptionally active phases (e.g. stages B-D4 for Sox2 and D6-PSC for Mmp12) (Fig. 3a). During the most active stages the TSS particle became embedded in the 3D model (the TSS embedding was between 0.8-0.95 at the PSC stage for Sox2 and B-D4 for Mmp12) (Fig. 3a). In contrast, the embedding profile of the Rad23a locus was maintained around 0.6 during the entire simulation in parallel with the overall constant transcriptional activity of the gene.
Next, we assessed whether the 3D embedding would also affect the spatial dynamics of the TSS from different loci. For each simulation, we calculated the convex-hull of the TSS particle every 50 time-steps of the trajectory as a proxy of the volume explored by the TSS at each reprogramming stage (Methods and Fig. 3b).
The results indicate that the TSS of Sox2 explores a smaller volume in the D6-PSC stages, when the gene is transcriptionally active, compared to the volume explored in the B to D4 stages when the gene is transcriptionally silent. The TSS of Mmp12 acquired increased mobility during reprogramming as its expression levels decreased (compare B-D4 and D6-PSC stages). Interestingly, the explored volume of the Rad23a TSS barely changed during the entire reprograming process. Taken together, these findings, are consistent with previous super-resolution imaging works suggesting that the mobility of the promoter is constrained upon activation 36,37 .

Gene activity correlates with domain borders dynamics and enhancer proximity.
To further characterize the topological transitions between gene expression states, we calculated dynamic changes in TAD insulation score and border strength 38  This spatial proximity was maintained for the rest of the simulation while the Sox2 gene was transcribed. In contrast, the TSS 3D distance profiles of Rad23a and Mmp12 remained overall constant during reprogramming ( Fig. 4b) with marginal changes during the simulation.
To further assess to what extend Active particles ( Table 2) became proximal to the TSS of the locus, we counted for each reprogramming stage how many of the Active (A) particles became available to the TSS particle, either by spatial proximity only Analysis of these 11 loci revealed that TSS particles tend to be embedded inside the model structure and closer to other active particles when the gene's expression is higher (Fig. 5a, b), exploring less volume (Fig. 5c) and interact with greater numbers of surrounding APD particles (Fig. 5d). These findings suggest that the signatures of the caging effect are common to all 11 loci studied. Additionally, the finding that the numbers of both the AP and APD particles increased with elevated transcription activity (Figs. 4c and 5d)  against the random thermal motion (Fig. 5e).

DISCUSSION
Here In two alternative simulations of Sox2, we found that missing steps only marginally affect the results of our simulations suggesting that the simulated reprogramming dynamics are dominated by smooth and gradual spatial chromatin rearrangements.
In fact, TADdyn has turned out to be useful in determining whether a smooth spatial reorganization of a chromatin region between two consecutive time steps is physically feasible or is hindered by "energetic barriers" between the structures to be simulated.

TADdyn simulations have provided new insights into mechanisms underlying
chromatin domain formation and gene expression regulation, extending our current understanding of these processes. First, we found that upon activation, the promoter region establishes interactions with active chromatin particles that contain overlapping signals of chromatin accessibility (ATAC-seq) and regulatory activity (H3K4me2) (Fig. 4). These promoter-active interactions build a local structural domain that embeds the TSS and makes it significantly less accessible to external particles and more spatially constrained against random thermal motion (Figs. 3 and   4). Interestingly, these findings are consistent with previous super-resolution imaging studies suggesting that the mobility of the promoter is constrained upon activation 36,37 .
Notably, these active particles with ATAC-seq and H3K4me2 signal often include annotated enhancers as well as newly predicted putative enhancers. Such predicted enhancers can act at long-range, as exemplified by the distal active particles interacting with the Sox2 promoter. Upon transcription activation, the dynamic formation of local topological domains (cages) inferred here from high-resolution time-resolved Hi-C data, is reminiscent of structural or topological motifs (such as rosettes, or cliques) suggested in previous studies using polymer-physics-based models 42 as well as transcription factories or phase-separated condensates 39,40,43,44 .
We speculate that these cages might trigger and maintain the activation of the promoter by maximizing its local association with a series of proximal enhancers. At a larger scale, such single-gene cages might eventually coalescence into larger domains to form large multi-gene condensates.
Based on our findings that the active promoter becomes caged inside a structural domain together with open and active regions we propose a hit-and-stick model for promoter-enhancer transcription regulation (Fig. 5e). Our model predicts that this molecular cage ensures the establishment (hit phase) and the maintenance of a spatial neighborhood where promoter-enhancers interactions are favored for the entire active phase (stick phase). This second phase is accompanied by low accessibility to external particles and stabilization of the TSS against thermal motion.
Our results seem incompatible with alternative mechanisms of gene regulation that do not necessarily depend on promoter-enhancer interactions 55,56 nor its extension in time 57 . In such an alternative scenario, promoter-enhancer contacts are short-lived but continue to maintain a transcriptionally-competent chromatin environment at promoters to ensure their activity even if long-range interactions are lost. However, as the time difference between consecutive reprogramming samples used to obtain Hi-C interaction maps was in the order of tens of hours we cannot rule out that at smaller scales cell sub-populations diverge from the general mechanism here proposed. Therefore, this limitation could have precluded detection of other possible dynamic pathways of genome conformation, satisfying only a subset of the input interaction matrices. To address this, TADdyn will have to be implemented in the future using data obtained from finer time-resolved Hi-C or imaging-based approaches 58,59 . remaining loci (Ebf1, Lmo7, and Nos1ap) were longer than 100kb and were simulated in 3.0Mb regions centered on the promoter ( Fig. 1a and Supplementary   Figs. 1-11b). The possibility to study these regions using restraint-based modelling was tested a priori computing the matrix modelling potential (MMP) 35 on the 70 normalized sub-matrices (that is, the 10 genomic regions and the 7 cell stages). In brief, all Hi-C interaction matrices had MMP scores higher than 0.78, and contained very few low-coverage bins (<0.05% of the bins), indicating that the restrained-based approach at 5kb should provide accurate 3D models for all the regions 35
Representation of the chromatin region using bead-spring polymer model. Chromatin was represented as a bead-spring polymer describing the effective physical properties of the underlying chromatin 18,21 . Specifically, each Hi-C matrix bin of 5kb was represented as a spherical particle of diameter 50 nanometers (nm) using a compaction ratio of 0.01bp/nm 63,64 . Additionally, two non-harmonic potentials were introduced taking into account the excluded volume interaction (purely repulsive Lennard-Jones) and the chain connectivity (Finitely Extensible Nonlinear Elastic, FENE). Both potentials were used as previously described 18 . In the present application, the bending rigidity potential, although it is available in TADdyn, is not used for consistency with the initial models generated using the TADbit software at the B stage (see below). The center of mass of the chromatin chain was tethered to the origin O=(0.0,0.0,0.0 x, y, z coordinates, respectively) of the system using a Harmonic (Kt=50., deq=0.0), and was simulated inside a cubic box of size 50μm (much larger of the size of the models), which was also centered at O.
Encoding of the experimental data into TADdyn restraints. TADdyn accepts one of three possible initial conformations for the chromatin: (i) a polymer chain prepared as a random walk, (ii) an arrangement made of stacked rosettes 18 , or (iii) a previously generated data-driven model. For the 100 time-series simulations performed here, the initial conformations were the 100 optimal models were built using TADbit (https://3DGenomes.org/tadbit) as previously described 62  The optimal TADbit parameters optimized for the B stage (that is, lowfreq of -1.0, upfreq of 1.0, and maxdist of 300nm) were then used to define the set of distance harmonic restraints of the other time points of the series (Fig. 1b). To adapt the harmonic restraint of a given pair of particles (i,j) between consecutive time points tn and tn+1, one of the following 3 possible scenarios was applied (Fig. 1c): 1. If the pair (i,j) was restrained by the same type of distance restraint (Harmonic or LowerBoundHarmonic) in both tn and tn+1, the strength (k) and the equilibrium distance (deq) of the harmonic were both changed gradually from the values they had in tn to the values they had in tn+1. transformed into embedding (E=1.0-A) that is a measure of the propensity of a particle to be caged inside an internal cavity. The embedding ranges from 0.0 meaning that the particle is located on the interface of the model to 1.0 when the particle is closely surrounded by other particles inside the model (caged) (Fig. 3a and Supplementary Figs. 1-11f). (ii) The explored volume per particle every 5τLJ of trajectory was calculated. Specifically, all trajectories were partitioned in time intervals of 5τLJ. In each time interval, the 50 positions (i.e. one every 0.1τLJ) occupied by the particle i were considered, and the convex-hull embedding these 50 positions was calculated. In a given time interval, the average (over the 100 replicate runs) convex-hull volume explored by particle i was considered as the typical volume explored by the model particle i during the time interval ( Fig. 3b and Supplementary   Fig.s 1-11g). (iii) The spatial distance between selected pairs of particles was computed for the ensemble of simulations as the Euclidian distance in nanometers using TADbit (Fig. 4a and Supplementary Figs. 1-11h).
Time dependent insulation score analysis of TADdyn models. To study the partitioning of the models into structural domains (reminiscent of TADs 72-74 ), the insulation score (I-score) analysis 38 was performed on the models contact maps using the command line --is 100000 --ids 50000 --ez --im mean --nt 0.1 --bmoe 3.
The called domain borders, whose border strength was deemed significant by the Iscore pipeline, were used for further analysis ( Fig. 4b and Supplementary Fig.s 1-11i).
Active particles analysis. In each cell stage, we classified some model particles (5kb) into one of 3 possible categories (Supplementary Table 2 The statistical comparison between the distributions of the different quantities analyzed (spatial distance, embedding, and explored volume) was performed by bootstrapping (re-sampling with replacement). The re-sampling procedure is repeated 1,000 times for a fixed number of elements, specifically 200 for the comparisons in Fig. 4c and 1000 for the ones in Fig. 5a-c). For the 1,000 comparisons of the re-sampled distributions, we tested for significant differences a Wilcoxon test using R. The higher p-value was assigned to the corresponding pair of distributions under comparison. The comparisons that resulted in maximum p-values<0.01 were deemed to be significantly different.

DATA AVAILABILITY
The TADdyn approach is currently available as part of the TADbit Github repository (https://github.com/david-castillo/TADbit/tree/poly). The modified version of the COLVAR plug-in for LAMMPS is available at the Github repository (https://github.com/david-castillo/colvars) Particles are colored from red (start of the modeled region) to blue (end of the modeled region). The TSS of the locus of interest is represented by a black particle.

FIGURES
The models for all the stages (that is, from B to PSC) were dynamically built by TADdyn. Only the models at the experimental time-points are shown, but TADdyn allows to visualize the entire dynamics filling the blanks between stages (Suppl.