Abstract
We propose a model for the formation of chromatin loops based on the diffusive sliding of a DNA-bound factor which can dimerise to form a molecular slip-link. Our slip-links mimic the behaviour of cohesin-like molecules, which, along with the CTCF protein, stabilize loops which organize the genome. By combining 3D Brownian dynamics simulations and 1D exactly solvable non-equilibrium models, we show that diffusive sliding is sufficient to account for the strong bias in favour of convergent CTCF-mediated chromosome loops observed experimentally. Importantly, our model does not require any underlying, and energetically costly, motor activity of cohesin. We also find that the diffusive motion of multiple slip-links along chromatin may be rectified by an intriguing ratchet effect that arises if slip-links bind to the chromatin at a preferred "loading site". This emergent collective behaviour is driven by a 1D osmotic pressure which is set up near the loading point, and favours the extrusion of loops which are much larger than the ones formed by single slip-links.
The formation of long-range contacts, or loops, within DNA and chromosomes is a process which critically affects gene expression [1, 2]. For instance, looping between specific regulatory elements, such as enhancers and promoters, can dramatically increase transcription rates in eukaryotes [1]. The formation of these loops can often be successfully predicted by equilibrium polymer physics models, which balance the energetic gain of protein-mediated interactions with the entropic loss associated with loop formation [3–5].
However, recent high-throughput chromosome conformation capture (“Hi-C”) experiments [6, 7] have fundamentally challenged the view that equilibrium physics is sufficient to model chromosome looping. Hi-C experiments showed that the genomes of most eukaryotic organisms are partitioned into domains – called “topologically associated domains”, or TADs. In several cases, these domains were found to be enclosed within a chromosome loop, 100 – 1000 kilo-basepairs (kbp) in size, and the bases of the loops are statistically enriched in binding sites for the CCCTC-binding factor (CTCF) [7, 8]. CTCF is a DNA-binding protein with an important role in gene regulation, and CTCF-mediated loops preferentially enclose inducible genes, which are normally silent and are pressed into action in response to a stimulus (e.g., an inflammation or an increased concentration of a morphogen during development) [9]. The DNA-binding motif of CTCF is not palindromic, meaning that it has a specific direction on the DNA. Surprisingly, Hi-C analyses have recently revealed that most of the CTCF binding sequences only form a loop when they are in a “convergent” orientation (Fig. 1a) [7, 10]. Very few contacting CTCFs have a “parallel” orientation, and virtually none have a “divergent” one. This strong bias is puzzling, because, if we imagine drawing arrows on the chromatin fiber (corresponding to the CTCF binding site directions), then two loops with a pair of convergent or divergent arrows at their base are compatible with the same 3D structure [7, 9]. Consequently, no equilibrium polymer physics model can possibly distinguish between the two patterns.
In most cases CTCF-mediated loops are associated with cohesin [11], a ring-like protein complex thought to bind DNA by topologically embracing it [12]. There are two popular models for how cohesin might achieve this – as a dimer acting as molecular “hand-cuffs” in which each ring embraces one DNA duplex (Fig. 1a), or as a single ring that embraces two duplexes [13]. In both cases, the dimer/ring acts as a sliding bridge or molecular slip-link [14, 15], and we will use the latter term to describe both cases. In vitro and in vivo experiments show that cohesin does indeed topologically link to DNA (with binding mediated by “loader proteins” such as Scc2 or NIPBL [1, 16]), that it can slide along DNA diffusively, and that it remains bound for τ ∼ 20 minutes before dissociating [16–20].
One recent attempt to address the mechanism underlying CTCF and cohesin-mediated looping is the “loop extrusion model” which argues that cohesins (or other “loop extruding factors”) can actively create loops of 100 – 1000 kbp by travelling in opposite directions along the chromosome [21–23]. This model is appealing because it naturally explains the bias in favour of convergent loops, if the slip-link gets stuck when it finds a CTCF binding site pointing towards it (an assumption consistent with experiments probing CTCF and cohesin binding [9, 22, 24]). However, the model is based on several assumptions for which experimental evidence is currently lacking: most notably it requires that (i) each cohesin is able to determine and maintain the correct direction in order to extrude (rather than shrink) a loop, and (ii) that cohesin must be able to extrude loops of 100 – 1000 kbp in a timescale ∼ τ. The extrusion speed would therefore need to exceed that of an RNA polymerase (which is v ∼ 1 kbp/min), one of the most efficient and processive known chromosome-bound motors active during interphase. Whilst cohesin is known to have ATPase activity, this seems not to be involved in directional motion; instead it drives the gate-opening mechanism needed to form a topologically stable association with DNA [13].
Here, we propose an alternative model for the formation of CTCF-mediated loops, which does not require unidirectional motion, or any energetically costly explicit bias favouring loop extrusion. We start from the observation that the molecular topology of cohesin dimers – i.e. that of a slip-link – is compatible with diffusive sliding along DNA or chromatin [17]. From this premise, we formulate a non-equilibrium model where the binding and unbinding kinetics of cohesin violates detailed balance, and show that within this context passive sliding is sufficient to account for both the creation of loops of hundreds of kbp before dissociation, and the formation of convergent CTCF-mediated loops. The probability of formation of such loops in our framework differs from the canonical power law decay governing the statistics of equilibrium polymer looping, and is consistent with currently available data on CTCF loops. Finally, we show that many-body steric interactions between diffusing slip-links which always bind close to a preferred “loading site” can lead to the emergence of an “osmotic ratchet” which promotes loop extrusion over shrinking, again in the absence of any bias in the microscopic molecular diffusion.
Results
A. Single slip-link, 1D model
We begin by discussing an exactly solvable 1D model where a slip-link consisting of two cohesin rings in a dimer slides along the chromatin fiber. We assume that the slip-link binds with the cohesin rings at adjacent positions on the fiber (as in [22]), and that there is a constant detachment rate $k_{\rm off} \sim \tau^{-1}$. We consider two CTCF proteins bound to the fiber at a separation l to create a convergent pair of CTCF binding sites. [The case of a divergent pair is treated in the SI, and as expected leads to no stable looping (Fig. S1).] As cohesin interacts with a CTCF in a directionality-dependent manner (only when it faces the CTCF binding motif [9, 22]) we assume that when the slip-link reaches the two convergent CTCF sites it undergoes a conformational change decreasing koff [39]. For simplicity, we allow the two rings forming the cohesin dimer to diffuse until their separation reaches l, or until the dimer spontaneously unbinds, and consider both to be absorbing states. This is a non-equilibrium model as the binding-unbinding kinetics violate detailed balance: this violation is consistent with the experimentally well-established [13] ATPase activity associated with cohesin-chromatin interactions.
At a given time t, the slip-link holds together a chromatin loop of size x(t). In order to take into account the entropic loss associated with this loop, we include an effective thermodynamic potential V(x) (detailed below). The probability p(x, t) that the cohesin holds a loop of size x at time t obeys the following generalised Fokker-Planck equation,
$$\frac{\partial p}{\partial t} = D\,\frac{\partial^2 p}{\partial x^2} + \frac{1}{\gamma}\,\frac{\partial}{\partial x}\left(\frac{dV}{dx}\,p\right) - k_{\rm off}\,p, \qquad (1)$$
where D and γ are the effective diffusion and drag coefficients describing the relative motion between chromatin and cohesins. The fluctuation-dissipation theorem implies D = kBT/γ. The initial condition for Eq. (1) is p(x, 0) = δ(x − σsl), where σsl is the size of the slip-link. Boundary conditions are reflecting at x = σsl and absorbing at x = l.
We consider three possible cases. First, we model the “loop extrusion” process proposed in [21, 22, 27] by setting D = 0 and $-\frac{1}{\gamma}\frac{dV}{dx} = v$, where v is the extrusion speed. Second, we consider a “diffusion” model where cohesin diffuses in the absence of a potential, V = 0. Third, we model the effect of chromatin looping on a diffusing cohesin dimer by setting $V(x) = c\,k_BT \log(x)$, which models the thermodynamic entropic cost of looping via the known contact (looping) probability $p_{\rm eq}(x) \sim x^{-c}$. In this formula, c is a universal exponent, which in 3D is equal to 1.5 for loops made by infinitesimally thin random walks [28], ∼ 2.1 for internal looping within self-avoiding chains [28, 29], and 1 for contacts within a “fractal globule” [25]. We refer to this third model, with a logarithmic looping potential, as the “slip-link” model, as it more closely resembles the dynamics of slip-links on polymers [4, 14, 15].
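As detailed in the SI, we can analytically find the probability that a cohesin dimer binding at t = 0 will, at some point, form a CTCF-mediated loop before detaching. Denoting this probability by p(l), the three models predict the following dependence on loop size l (Fig. 1b):
$$p_{\rm extr}(l) = e^{-k_{\rm off} l/v}, \qquad p_{\rm diff}(l) = \frac{1}{\cosh(\alpha l)}, \qquad p_{\rm slip}(l) = \frac{(\sigma_{sl}/l)^{m}}{\alpha\sigma_{sl}\left[K_n(\alpha\sigma_{sl})\,I_m(\alpha l) + I_{-n}(\alpha\sigma_{sl})\,K_m(\alpha l)\right]}, \qquad (2)$$
where $\alpha = \sqrt{k_{\rm off}/D}$, n = (1 − c)/2, and m = (1 + c)/2; I and K denote the modified Bessel functions of the first and second kind respectively. Note that we have taken the σsl → 0 limit for the loop extrusion (pextr(l)) and diffusion (pdiff(l)) cases.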
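The looping probabilities discussed above can be checked numerically. The following is a minimal sketch, not the authors' code: it solves the backward equation associated with diffusion in the logarithmic potential with detachment, D u'' − (cD/x) u' − koff u = 0, with u'(σsl) = 0 and u(l) = 1, on a finite-difference grid, so that u(σsl) is the probability of reaching separation l before detaching. The function names, the grid size and the units (kbp, seconds) are illustrative assumptions. For c = 0 and σsl → 0 the result can be compared with 1/cosh(l sqrt(koff/D)).

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x

def looping_probability(loop_size, diff, k_off, c=0.0, start=0.0, n_grid=2000):
    """Probability of reaching x = loop_size before detaching, starting at x = start.

    Units are an assumption of this sketch: lengths in kbp, diff in kbp^2/s,
    k_off in 1/s. Use start > 0 whenever c > 0 (the drift diverges at x = 0).
    """
    h = (loop_size - start) / n_grid
    lower = [0.0] * n_grid
    diag = [-(2.0 * diff / h**2 + k_off)] * n_grid
    upper = [0.0] * n_grid
    rhs = [0.0] * n_grid
    # Reflecting boundary at x = start (ghost-point treatment of u' = 0).
    upper[0] = 2.0 * diff / h**2
    for i in range(1, n_grid):
        x = start + i * h
        drift = -c * diff / x  # drift velocity -V'(x)/gamma for V = c kBT log x
        lower[i] = diff / h**2 - drift / (2.0 * h)
        up = diff / h**2 + drift / (2.0 * h)
        if i == n_grid - 1:
            rhs[i] = -up       # absorbing boundary: u(loop_size) = 1
        else:
            upper[i] = up
    return solve_tridiagonal(lower, diag, upper, rhs)[0]

def extrusion_probability(loop_size, speed, k_off):
    """Deterministic extrusion at constant speed, surviving detachment."""
    return math.exp(-k_off * loop_size / speed)
```

For instance, `looping_probability(100, 10, 1/1200)` evaluates the pure-diffusion case for a 100 kbp loop with D = 10 kbp^2/s and τ = 20 min, while passing `c=2.1, start=1.0` adds the logarithmic potential (the value σsl = 1 kbp used here is purely illustrative).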
For large l, Eqs. (2) predict exponential decay of CTCF-mediated looping probabilities for all cases (Fig. 1b), with a power law correction for slip-links, $p_{\rm slip}(l) \sim e^{-\alpha l}\, l^{-c/2}$. This is markedly different from the power laws which determine the looping probability of an equilibrium polymer [14, 15]. The decay length is v/koff for the loop extrusion model [22], and $\alpha^{-1} = \sqrt{D/k_{\rm off}}$ for the diffusion and slip-link models; these are therefore the typical looping lengths formed before cohesin detaches. A typical CTCF-mediated loop length in vivo is ∼ 100 kbp [7, 9]; taking τ = 20 min means that loop extrusion is viable if v ≳ 5 kbp/min (compare v = 1 kbp/min for polymerase), whereas the diffusion or slip-link models require D ∼ 10 kbp^2/s or above. The latter condition is achievable under normal conditions: for instance, assuming a diffusion coefficient D0 = 0.01 − 0.1 μm^2/s, reasonable for protein sliding on chromatin [30], and a compaction rate C = 50 bp/nm (intermediate between a 10 and a 30 nm chromatin fiber) we get D = D0 C^2 ∼ 25 − 250 kbp^2/s (Figs. 1b, 1c).
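The estimates in this paragraph amount to simple unit conversions, which the following short sketch makes explicit. The function names are hypothetical; the only inputs are the numbers quoted in the text (τ = 20 min, a 100 kbp loop, D0 = 0.01 to 0.1 μm^2/s and C = 50 bp/nm).

```python
import math

def genomic_diffusion(d0_um2_per_s, compaction_bp_per_nm):
    """Convert a spatial diffusion coefficient (um^2/s) into kbp^2/s via D = D0 * C^2."""
    d0_nm2_per_s = d0_um2_per_s * 1.0e6                      # 1 um^2 = 1e6 nm^2
    d_bp2_per_s = d0_nm2_per_s * compaction_bp_per_nm ** 2   # (bp/nm)^2 * nm^2/s
    return d_bp2_per_s / 1.0e6                               # 1 kbp^2 = 1e6 bp^2

def required_extrusion_speed(loop_kbp, residence_min):
    """Minimal speed (kbp/min) needed to extrude a loop within the residence time."""
    return loop_kbp / residence_min

def diffusive_loop_length(d_kbp2_per_s, residence_min):
    """Typical loop length sqrt(D * tau), in kbp, reached by diffusion before detachment."""
    return math.sqrt(d_kbp2_per_s * residence_min * 60.0)

if __name__ == "__main__":
    print(required_extrusion_speed(100.0, 20.0))   # speed needed for a 100 kbp loop
    print(diffusive_loop_length(10.0, 20.0))       # loop length for D = 10 kbp^2/s
    print(genomic_diffusion(0.01, 50.0), genomic_diffusion(0.1, 50.0))
```

With D = 10 kbp^2/s and τ = 20 min, the diffusive length sqrt(Dτ) comes out slightly above 100 kbp, which is why this value of D marks the threshold quoted above.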
Hi-C experiments measuring the frequency of contacts between all genomic loci can be used to infer chromatin looping probabilities, and largely support a power law decay of contacts, with a chromosome-dependent exponent whose average is 1 [31]. However, these data do not distinguish between CTCF-mediated loops and other contacts, which may either form stochastically or through other chromatin-binding proteins [31–34]. Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) experiments [9] are able to single out contacts where both anchor points are bound to a protein of interest. Intriguingly, in CTCF ChIA-PET data [26], a fit to an exponential leads to a reasonable decay length (typical loop size) equal to ∼ 500 − 1000 kbp (Fig. 1d, and SI, Fig. S1). On the contrary, a fit to a power law is poorer and yields an effective exponent which is far from those which may be expected from equilibrium polymer physics models (Fig. 1e, and SI, Figs. S8, S9). This simple analysis supports the idea that the statistics of CTCF-mediated loops retain a signature of their underlying non-equilibrium nature and that, remarkably, this feature is captured by our simple 1D model of diffusing slip-links.
B. Single slip-link, 3D simulations
We now ask whether the effects predicted by our simple 1D model are confirmed by 3D Brownian dynamics simulations, which can more accurately account for both the 3D structure of chromosomal loops and the steric interactions between a molecular slip-link and chromatin. Specifically, we enquire whether diffusive sliding may account for the formation of CTCF-mediated loops, and what the probability of formation of such loops is in a 3D simulation. We consider a chromatin fiber modelled as a bead-and-spring polymer with bead diameter σ = 30 nm, C = 100 bp/nm, and vary the persistence length lp (see SI for further details and results); we load a single slip-link, modelled by two rigid rings (each ring has diameter 2R ∼ 3.4σ, and thickness σsl = σ) linked via a semiflexible hinge, which favours a planar hand-cuff configuration with the centres of the rings a distance 2R apart (Fig. 2a, inset, and SI, Fig. S5).
Figure 2a shows the frequency with which the slip-links form loops of size l when koff = 0 and the persistence length of the fiber takes values equal to either 4σ or 16σ (see SI for more values of lp). Since the slip-link cannot unbind, these curves represent equilibrium looping probabilities, peq(l); they indeed show clear evidence of power law decay for large l (Fig. 2a, and SI, Fig. S6). The exponent is c ∼ 2, in line with the contact probability between internal points in a self-avoiding walk in a good solvent [4, 28, 29] (appropriate for a segment of open chromatin). Each of the curves in Fig. 2a also shows two peaks at small and intermediate l. The first peak is associated with the minimum loop length needed to bring two beads ∼ 2R apart. The second one is due to the competition between the energy required to bend the chromatin fiber and the entropy of loop formation. This behaviour is recapitulated by analysing the distribution function of internal distances of a semi-flexible polymer (see SI, Fig. S6).
An important question is to what extent this more accurate 3D description can account for CTCF-mediated looping spanning several hundreds of kbp under realistic values of koff. In Figure 2b we plot the probability pslip(l) that a slip-link reaches a separation l along a flexible chromatin fiber. In other words, we ask, as in our 1D model in Figure 1, whether diffusing cohesin dimers can reach CTCF sites separated by a distance l before disassociating. Our simulations predict that such loops can indeed form; for instance a 100 kbp loop can form with probability ∼ 0.3 with min (see also SI, Suppl. Movie 1 and Fig. S7). We highlight that, in agreement with our 1D non-equilibrium models, the decay of pslip(l) is only compatible with an exponential, rather than a power law decay typically found in equilibrium simulations (Fig. 2a).
C. Multiple slip-links and the osmotic ratchet
So far, we have considered the case of a single cohesin dimer diffusing on chromatin. When, instead, multiple slip-links are present on the same chromatin segment, they may interact either sterically or entropically.
We first modify our 1D model to simulate the stochastic dynamics of N slip-links diffusing along a chromatin fiber of size L (Figs. 3a-c, see Methods), discretized into segments of length σsl. Each slip-link can exist in an unbound or chromatin-bound state, and the binding and unbinding rates are kon and koff respectively. When binding, the two slip-link monomers always occupy neighbouring sites along the fiber. [For simplicity, we set kon = koff; then, the numbers of bound and unbound slip-links are equal in steady state.] There are excluded volume interactions between cohesins, such that the ends of the slip-links cannot cross each other. In the SI, we present further results which include a “looping weight” (Figs. S3 and S4), i.e. an effective potential which accounts for the entropy, or probability of formation, of a network of loops [14, 15]. This effective potential has a quantitative effect on our results, but it does not modify the qualitative trends; in this section we report findings from 1D stochastic simulations without looping weight, as this version is simpler to analyse theoretically.
We consider two cases: in the first one slip-links bind at random (unoccupied) locations on the fiber; in the second one binding occurs at a preferred “loading site”. Figure 3a shows the time average of the maximal loop size 〈lmax〉 in steady state as a function of N, for the case of random rebinding. As the fiber gets more crowded, the slip-links form consecutive loops (Fig. 3a, inset, and Fig. S3c) which compete with each other. The maximum loop size which can be formed is thus limited and we observe that 〈lmax〉 decreases steadily with N (Fig. 3a).
A strikingly different result is found when slip-links always bind to the same location. This scenario mimics the experimental finding that the topological association of cohesin to DNA is facilitated by a loader protein (e.g., Scc2 or NIPBL [1, 16]), which has preferential binding sites within the genome. In this case, we observe that the maximum loop size 〈lmax〉 increases with N, rather than decreasing (Fig. 3b), thereby favouring effective loop growth over shrinking. In other words, the system now works as a ratchet, which rectifies the diffusion of the two ends of the loop subtended by a slip-link. The typical loop network found in steady state is very different from the case of random rebinding, and now consists of a large proportion of nested loops (Fig. 3b, inset, and Fig. S3d), which reinforce each other rather than competing for space along the fiber. Figure 3c shows the probability distribution of sizes for the largest loop and confirms the dramatic difference between the cases with and without preferential “loading site”.
In order to fully address this intriguing ratchet effect, we performed 3D Brownian dynamics simulations of a chromatin fiber interacting with N slip-links which can bind and unbind, in the presence of a loading site (see Methods). We find that the cooperative behaviour of multiple slip-links loaded at a specific site again leads to ratcheting and, in particular, we find that the outer loops can easily span hundreds of kbp (Fig. 3d) even when considering only a few slip-links. This ratchet effect may therefore provide a microscopic basis for the loop extrusion model in [21–23], valid under conditions where several cohesins (or other molecular slip-links) are bound to the same chromatin region.
The inset in Figure 3d shows a typical snapshot of our 3D simulations, which also highlights that nested loops are formed by closely “stacked” slip-links (Fig. 3d, inset, and Fig. S3e) that can be easily recognised in arc-diagram representations by characteristic “rainbow” patterns (SI, Suppl. Movie 2). Stacking is triggered by entropic forces which tend to diminish the total number of loops [14] and, thus, cluster the slip-links together. This behaviour is reminiscent of the “bridging-induced attraction” [32, 33] which drives the formation of protein clusters, although the underlying mechanism is here purely entropic.
To understand the emergence of a self-organized ratchet, we construct a simple theory by further analysing the 1D model (without looping weight). The key factor is the existence of a non-uniform slip-link density, and hence an osmotic pressure; the associated pressure gradient creates a force that rectifies the motion of cohesin rings placed close to the loading site. If volume exclusion does not significantly affect the density or pressure profiles (an assumption which is true if Nσsl/L is small enough, and which holds in our 1D stochastic simulations, Fig. S2), we can write down the following phenomenological equation determining the size l of a loop subtended by a symmetrically progressing slip-link starting from the loader,
$$\frac{dl}{dt} = k_{\rm on} N_{\rm off}\,\sigma_{sl}\; e^{-\sqrt{k_{\rm off}/D}\; l/2}, \qquad (3)$$
where Noff = Nkoff/(kon + koff) is the average number of unbound cohesins.
The maximal speed of this “osmotic ratchet” is achieved for loops close to the loading site, and is v ∼ konNoffσsl, which holds for $k_{\rm on} N_{\rm off} \ll D/\sigma_{sl}^2$. The maximal possible ratchet speed, achieved for $k_{\rm on} N_{\rm off} \gg D/\sigma_{sl}^2$, is instead v ∼ D/σsl, similar to the case of a “Brownian ratchet” modelling actin polymerisation close to a fluctuating membrane [35]. Eq. (3) further predicts that at a given time, l should grow logarithmically with N, and our data are indeed fitted well by the functional form a + b log N (Fig. 3b). While the theory we have presented explains why the cases with and without loading are fundamentally different, and why the former can create a ratchet, we should not expect it to be quantitative as it describes the dynamics of a typical loop, rather than the largest one, which is considered in Figure 3. In this respect, a more refined theory would require the application of extreme value statistics to a problem of N random walkers [36]: it would be of interest to pursue this analysis in the future.
Discussion and conclusions
In summary, we have proposed a series of non-equilibrium models to study the dynamics of molecular slip-links which bind to and detach from a chromatin fiber, and can slide diffusively in 1D along it when bound. These slip-links are a model for cohesins, condensins or other structurally similar proteins which bind DNA by topologically embracing it.
We suggest that these slip-links may play a pivotal role in the dynamic organization of chromosome loops. First, we have shown that 1D diffusive sliding of cohesin [16] is sufficient to explain the experimentally observed bias favouring the looping of convergent CTCF binding sites over divergent ones. The only additional assumptions are that the binding kinetics violate detailed balance, and that cohesin and CTCF interact in a directional manner, in agreement with experimental evidence [9, 22, 24]. Second, we have found that the probability of formation of cohesin/CTCF-mediated loops follows an exponential decay; hence it is fundamentally different from the power laws which govern polymer looping in thermodynamic equilibrium. Third, we have shown that a non-trivial and self-organized collective behaviour emerges when multiple slip-links slide along the same chromatin region. In particular, when binding occurs preferentially at a “loading site”, a ratchet effect arises, where slip-links set up an osmotic pressure which rectifies molecular diffusion. This ratchet provides a viable microscopic mechanism yielding extrusion of chromatin loops, which has been postulated by recent models of chromatin organization [22, 23].
Each of these results depends critically on our assumption that the binding/unbinding interactions between slip-links and chromatin violate detailed balance so that the system is out of equilibrium. This assumption is consistent with the current understanding of cohesin, which displays an ATPase activity associated with conformational changes. Importantly, our results show that the formation of convergent CTCF-mediated loops spanning several hundreds of kbp is consistent with simple unbiased diffusion at the molecular level, in the absence of any background motor activity or unidirectional motion of the slip-links, as was required in the previous loop extrusion model [22].
A consequence of our work is that it poses well-defined constraints on the minimal cohesin diffusion coefficient, D0 (in μm^2/s), and chromatin compaction, C (in bp/nm), which are needed for slip-links to be able to organise chromosome loops of hundreds of kbp, such as the typical convergent CTCF-mediated loops found in mammalian genomes [7]. Specifically, our analysis shows that a single slip-link requires a value of D0 C^2 ∼ 10 kbp^2/s or more to reach the end of a 100 kbp loop. The worst possible case for our theory occurs if the substrate along which cohesin slides is decompacted, as covering the same genomic stretch will then require larger D0. This scenario, corresponding to C ∼ 10 bp/nm (i.e., a 10-nm fiber), requires D0 ∼ 0.1 μm^2/s for loops to form in practice. This value is achievable for 1D protein diffusion along chromatin [30].
Recent in vitro experiments of cohesin diffusion on naked DNA [16] have been used to extrapolate a slow diffusion rate of D0 ∼ 0.001 μm2/s on chromatin. If this is the case in vivo, a single slip-link can still create CTCF-mediated loops, but only if cohesin slides on a compact 30-nm-like fiber (C ∼ 100 bp/nm). In practice, though, the ratchet effect described in Fig. 3 would enhance diffusivity to facilitate loop formation (e.g. ∼ 10 cohesin molecules can effectively increase D0 by an order of magnitude). A second factor which may favour the formation of longer loops is that during the S and G2 phase of the cell cycle there is evidence of a subpopulation of cohesins with τ ≫ 20 min. To quantify these arguments, Fig. 4 shows the probability of formation of a long 570 kbp convergent CTCF-mediated loop over time, for chromatin fibers with different N (see also Fig. S7). The results show that the osmotic ratchet is at work with as few as 3 bound cohesins per loop, and dramatically enhances looping probability (Fig. 4c).
We hope that the findings reported in this work will prompt new studies to measure cohesin diffusion accurately on reconstituted chromatin fibers in vitro, and as a function of the number of cohesins bound to the fiber. This would allow a test of our osmotic ratchet in the lab, and at the same time determine whether our mechanism of loop formation based on unbiased diffusive sliding can work in vivo. Furthermore, we envisage that high-throughput (e.g., ChIA-PET) experiments probing the looping probabilities controlled by different loop-mediating proteins (such as PolII) will also help illuminate general non-equilibrium features of loop formation in chromosomes. Finally, from a theoretical point of view, it would be of interest to study the behaviour of chromatin fibers subject to both molecular slip-links and more conventional bridging and writing proteins, such as those considered in [31–34, 37].
Methods
3D Brownian dynamics simulations of molecular slip-links
In our Brownian dynamics simulations we follow the evolution of a chromatin fibre and of one or more slip-links which are topologically bound to the fibre. The dynamics are evolved using the LAMMPS software [38] in Brownian dynamics mode (see SI).
Briefly, the force field includes: (i) non-linear springs between neighbouring chromatin beads to ensure chain connectivity; (ii) bending rigidity of the chromatin fiber; (iii) excluded volume interactions between any two beads, representing part of either the chromatin fiber or any slip-link. The main novelty in our simulations is represented by the slip-links, which are modeled as a pair of rings, each of which moves as a rigid body. The two rings are kept together by non-linear springs, and there are bending interactions favouring the “open” hand-cuff configuration (see SI, Fig. S5).
Slip-links can detach (when on the chromatin) or bind (when detached), at rates koff and kon respectively. Stochastic detachment and binding are simulated by means of an external code, which is interfaced with LAMMPS and is called every 1000 Brownian dynamics steps. More details are given in the SI.
1D stochastic simulations of molecular slip-links
1D stochastic simulations of N diffusing slip-link dimers on a chromatin fiber were performed using a kinetic Monte-Carlo algorithm, with rules defined as follows. At each time step we attempt, on average, to move each monomer of every chromatin-bound slip-link at random, either to the left or to the right. Moves which would lead to a clash between any two monomers are rejected. In the case with looping weight (see SI), we also include a Metropolis acceptance test, with an effective potential which mimics the entropic weight associated with the instantaneous looping network. At each time step, we also attempt, on average, to rebind each detached slip-link and to detach each bound slip-link, with rates kon and koff respectively; rebinding occurs either at a random position or at a loading site in the middle of the chromatin fiber.
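The rules above can be sketched as follows for the case without looping weight. This is an illustrative reimplementation under stated assumptions, not the code used for the figures: the function name `run_kmc`, the use of per-step probabilities `p_on` and `p_off` in place of the rates kon and koff, the random-sequential update order and the measurement over the second half of the run are all choices made for this example. Loop sizes are measured in lattice units (σsl).

```python
import random

def run_kmc(n_links, n_sites, p_on, p_off, n_steps, loading_site=None, seed=0):
    """Lattice kinetic Monte Carlo of slip-links without looping weight.

    loading_site=None means rebinding at a random position; otherwise all
    slip-links rebind at (loading_site, loading_site + 1).
    Returns (time-averaged maximal loop size, bound links, number unbound).
    """
    rng = random.Random(seed)
    bound = []            # each entry is [left_site, right_site]
    occupied = set()      # lattice sites currently holding a monomer
    n_unbound = n_links
    max_loop_sum = 0.0
    n_samples = 0
    for step in range(n_steps):
        # Diffusion: on average one attempted move per bound monomer.
        for _ in range(2 * len(bound)):
            link = rng.choice(bound)
            head = rng.randrange(2)
            target = link[head] + rng.choice((-1, 1))
            # Reject moves leaving the fiber or clashing with another monomer.
            if 0 <= target < n_sites and target not in occupied:
                occupied.remove(link[head])
                occupied.add(target)
                link[head] = target
        # Detachment of bound slip-links.
        remaining = []
        for link in bound:
            if rng.random() < p_off:
                occupied.difference_update(link)
                n_unbound += 1
            else:
                remaining.append(link)
        bound = remaining
        # Rebinding of detached slip-links on two neighbouring free sites.
        for _ in range(n_unbound):
            if rng.random() < p_on:
                if loading_site is None:
                    site = rng.randrange(n_sites - 1)
                else:
                    site = loading_site
                if site not in occupied and site + 1 not in occupied:
                    bound.append([site, site + 1])
                    occupied.update((site, site + 1))
                    n_unbound -= 1
        # Sample the largest loop during the second half of the run.
        if step >= n_steps // 2 and bound:
            max_loop_sum += max(right - left for left, right in bound)
            n_samples += 1
    mean_max = max_loop_sum / n_samples if n_samples else 0.0
    return mean_max, bound, n_unbound
```

Calling `run_kmc(10, 200, 0.01, 0.01, 20000, loading_site=100)` and `run_kmc(10, 200, 0.01, 0.01, 20000)` gives the two scenarios compared in the main text (loading site versus random rebinding); because monomers move one site at a time and cannot share a site, the two ends of different slip-links never cross.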
NON-EQUILIBRIUM 1D MODELS OF A SINGLE SLIP-LINK: STOCHASTIC SIMULATIONS
In this section we consider 1D stochastic simulations of a single slip-link diffusing in a logarithmic potential in the presence of two CTCF proteins, at mutual distance l, which act as barriers. For simplicity, as in the main text (Fig. 1) we only consider the relative distance between the slip-link monomers, x, and assume that it performs a random walk in an effective potential, whereas in reality both monomers diffuse and are subject to a potential dependent on the monomer-monomer separation – we expect the two situations to be qualitatively analogous. With respect to the case considered in the main text, we here assume that there are no absorbing states, but rather that the slip-link gains an energy є when it reaches a separation between the monomers x = l (i.e., sticking between CTCF and cohesin is not permanent here, so x can decrease later on). Correspondingly, the detachment rate will decrease at x = l: for concreteness, we assume koff is constant, and equal to k0, for x ≠ l, while it is equal to koff = k0e−є/(kBT) for x = l. The single cohesin we model, once off, rebinds at rate kon = k0, and when it does the monomers always start close together, so x = σsl (which is equal to the lattice spacing in our simulations). The logarithmic potential is V(x) = ckBT log x, and we choose here c = 2.1 which corresponds to the formation of internal loops in a self-avoiding walk (see discussion in the main text, different values of c lead to the same qualitative trends). The logarithmic potential and CTCF-cohesin interactions are incorporated in the algorithm via a standard Metropolis acceptance test.
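The algorithm described in this paragraph can be sketched as a short lattice simulation. The code below is a minimal illustration, not the authors' implementation: the function name, the per-step probability `p0` standing in for the rate k0, the order of the detachment and move attempts, and the choice kBT = 1 are assumptions of this example. The separation x is measured in lattice units, so x = 1 corresponds to σsl.

```python
import math
import random

def ctcf_loop_fraction(loop_sites, epsilon, c=2.1, p0=1e-3, n_steps=200_000, seed=0):
    """Fraction of time steps in which the slip-link is bound with x = loop_sites."""
    rng = random.Random(seed)

    def energy(x):
        # Logarithmic looping potential plus CTCF-cohesin attraction at x = l.
        return c * math.log(x) - (epsilon if x == loop_sites else 0.0)

    x = None              # None means the slip-link is detached
    hits = 0
    for _ in range(n_steps):
        if x is None:
            # Rebinding always resets the separation to the lattice spacing.
            if rng.random() < p0:
                x = 1
        else:
            # Detachment is suppressed by exp(-epsilon) at the CTCF sites.
            p_detach = p0 * math.exp(-epsilon) if x == loop_sites else p0
            if rng.random() < p_detach:
                x = None
            else:
                trial = x + rng.choice((-1, 1))
                if 1 <= trial <= loop_sites:
                    delta = energy(trial) - energy(x)
                    # Standard Metropolis acceptance test.
                    if delta <= 0.0 or rng.random() < math.exp(-delta):
                        x = trial
        if x == loop_sites:
            hits += 1
    return hits / n_steps
```

Comparing, for example, `ctcf_loop_fraction(10, 0.0)` with `ctcf_loop_fraction(10, 6.0)` illustrates how the directional attraction є enhances the occupancy of the CTCF-mediated loop state.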
Fig. S1 shows a plot of the probability that the slip-link is on and has x = l (i.e., the probability that a CTCF-mediated loop forms) once steady state is reached. As might be expected, we find that increasing є strongly favours the CTCF-mediated loops, with respect to other states where the slip-link subtends a smaller loop size. This case is instructive because it suggests that a thermodynamic directional attraction between CTCF and cohesin (here, the interaction parametrised by є) is sufficient to favour the formation of CTCF-mediated loops. It should be noted that the model is still a non-equilibrium one, because koff is constant for x ≠ l, and, mainly, because upon rebinding the slip-link always returns to the state with x = σsl. This second feature renders our model (both here and in the main text) to some extent similar to the diffusion “with resetting” model considered in [1], although here the motion is further constrained by the logarithmic potential. Based on our results, we therefore suggest that non-equilibrium (re)binding (i.e., the resetting) and thermodynamic directional attraction are enough to explain the bias favouring the formation of convergent CTCF loops (є > 0) with respect to divergent ones (where there is no directional attraction, and hence є = 0). Again, and as in the main text, because this is a non-equilibrium model, the probability of formation of a CTCF-mediated loop is not compatible with a power law: rather it decays approximately exponentially (see the log-linear plot in Fig. S1).
1D MODELS WITH MANY INTERACTING SLIP-LINKS, AND THE OSMOTIC RATCHET
1D model without looping weight
We now consider the case of multiple slip-links studied in the main text, and derive the formula for the density and effective extrusion force in the case where slip-links always rebind at the same “loading site”. This is the case which leads to the osmotic ratchet discussed in the main text. In this section, we consider 1D models (3D simulations are described separately below).
We first consider a simplified model without “looping weights”, where N slip-links simply diffuse on a chromatin fiber of length L: i.e., this model neglects the entropic cost associated with the formation of a given loop network. If we disregard excluded volume interactions, we can write down the following partial differential equation for the (average) density ρ(x,t) of slip-links bound to chromatin at position x, where the loading site is located at x = 0,
$$\frac{\partial \rho(x,t)}{\partial t} = k_{\rm on} N_{\rm off}\,\delta(x) - k_{\rm off}\,\rho(x,t) + D\,\frac{\partial^2 \rho(x,t)}{\partial x^2}, \tag{S27}$$
where $N_{\rm off}$ is the average number of unbound cohesins (which are available to bind at the loading site). The three terms on the right hand side of Eq. (S27) respectively denote binding at the loading site with rate kon, unbinding with rate koff from any site, and diffusion. Note that here D is the diffusion constant for a slip-link monomer moving along the chromatin fiber. This equation does not include noise, therefore it should be seen as a mean field theory, which predicts the average value of ρ(x,t). The steady state solution of Eq. (S27) which decays for x → ∞ (relevant for L → ∞) is given by
$$\rho(x) = A\,\exp\left(-\sqrt{\frac{k_{\rm off}}{D}}\,|x|\right), \tag{S28}$$
where A is a constant and, in a similar way to before, the decay length is $\sqrt{D/k_{\rm off}}$. Similarly to what was previously done in the section “Exactly solvable non-equilibrium models”, the constant A can be determined by integrating Eq. (S27) around 0, from x = −є to x = +є, and then sending є → 0. This procedure leads to the requirement that
$$2A\sqrt{D\,k_{\rm off}} = k_{\rm on} N_{\rm off}, \tag{S29}$$
and therefore ρ(x) in steady state is given by
$$\rho(x) = \frac{k_{\rm on} N_{\rm off}}{2\sqrt{D\,k_{\rm off}}}\,\exp\left(-\sqrt{\frac{k_{\rm off}}{D}}\,|x|\right). \tag{S30}$$
Computer simulations of N slip-links diffusing with excluded volume interactions on a chromatin fiber of size L confirm that the average density profile of bound slip-links is an exponentially decaying function centred on the loading site, in good agreement with Eq. (S30) even for a large number of slip-links (Fig. S2).
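As a quick consistency check on the mean-field picture, the sketch below solves the steady state of a diffusion equation with uniform unbinding and a point source at x = 0 on a finite grid, and compares it with an exponential profile of decay length sqrt(D/koff) and amplitude kon Noff / (2 sqrt(D koff)). The function names, grid size, boundary treatment (density set to zero far from the loading site) and parameter values are illustrative assumptions, not taken from the simulations reported here.

```python
import numpy as np

def analytic_density(x, k_on, k_off, D, n_off):
    """Exponential steady-state profile centred on the loading site at x = 0."""
    decay = np.sqrt(k_off / D)                      # inverse decay length
    amplitude = k_on * n_off / (2.0 * np.sqrt(D * k_off))
    return amplitude * np.exp(-decay * np.abs(x))

def numerical_density(k_on, k_off, D, n_off, half_length=40.0, dx=0.25):
    """Solve D rho'' - k_off rho + k_on n_off delta(x) = 0 by finite differences.

    The density is assumed to vanish outside the grid (an approximation that
    is harmless when half_length is many decay lengths).
    """
    x = np.arange(-half_length, half_length + 0.5 * dx, dx)
    n = x.size
    laplacian = (np.diag(-2.0 * np.ones(n))
                 + np.diag(np.ones(n - 1), 1)
                 + np.diag(np.ones(n - 1), -1)) / dx**2
    operator = D * laplacian - k_off * np.eye(n)
    source = np.zeros(n)
    source[n // 2] = k_on * n_off / dx              # discrete delta function at x = 0
    rho = np.linalg.solve(operator, -source)
    return x, rho

if __name__ == "__main__":
    x, rho = numerical_density(k_on=0.5, k_off=0.1, D=1.0, n_off=4.0)
    exact = analytic_density(x, 0.5, 0.1, 1.0, 4.0)
    print("largest absolute deviation:", np.max(np.abs(rho - exact)))
```

The integral of the profile equals kon Noff / koff, which is the balance between loading and unbinding in steady state.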
The 1D pressure exerted by the slip-link gas is equal to p(x) = NkBTρ(x); for a given slip-link at position x, there will be a difference in the pressure on the inside and outside of each head of the link, resulting in an outward force. Since the size of each slip-link head is σsl, we can estimate the force acting on a head at position x as follows:
$$f(x) \simeq -\sigma_{\rm sl}\,\frac{dp(x)}{dx} = -\sigma_{\rm sl}\,N k_BT\,\frac{d\rho(x)}{dx}. \tag{S31}$$
If we now imagine a slip-link placed symmetrically around the loading site, so that its two heads are at positions ±x, then the osmotic pressure will tend to increase the size of the loop l = 2x. If we assume, for simplicity, that the loop will remain symmetrical with respect to the loading site, we can write down the following equation for the effective extrusion velocity of the loop, v = dl/dt,
$$v = \frac{dl}{dt} = -\frac{2\sigma_{\rm sl} N k_BT}{\gamma}\left.\frac{d\rho}{dx}\right|_{x=l/2}, \tag{S32}$$
where γ is the slip-link’s effective drag coefficient. Eq. (S32) predicts that the maximal extrusion speed is attained when the loop is close to the loading site, where it can be approximated as
$$v_{\rm max} \simeq 2\sigma_{\rm sl} N \sqrt{D\,k_{\rm off}}\;\rho(0), \tag{S33}$$
where we have used the fluctuation-dissipation relation D = kBT/γ. Note that the solution of Eq. (S32) is given by
$$l(t) = 2\sqrt{\frac{D}{k_{\rm off}}}\,\ln\left(1 + \frac{v_{\rm max}}{2}\sqrt{\frac{k_{\rm off}}{D}}\;t\right), \tag{S34}$$
so that this simple theory predicts that extrusion should slow down with loop size, which should only increase logarithmically at later times. Note that Eq. (S34) predicts the average evolution of the loop size for a slip-link binding at the loading site at t = 0, whereas in Fig. 3 in the main text we plot the size and distribution probability of the largest loop at a given time. However, this simplified theory is useful as it clarifies that loops can be extruded provided the steady state slip-link density ρ(x) is not constant. Of course, if there is not a preferred loading site, the first term in Eq. (S27) becomes konNoff/L: in this case ρ(x) is constant in steady state, and there is no longer an osmotic pressure driving extrusion, in line with the results discussed in the main text for the case with random rebinding.
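The logarithmic slowing down of extrusion can be illustrated with a few lines of code. The sketch below assumes that the density decays exponentially with decay length λ, so that the extrusion velocity takes the form dl/dt = v0 exp(−l/2λ); it integrates this equation with a simple Euler scheme and compares the result with the closed form l(t) = 2λ ln(1 + v0 t / 2λ). The names `speed0` and `decay_length` and the numerical values are illustrative assumptions.

```python
import numpy as np

def loop_size_closed_form(t, speed0, decay_length):
    """Closed-form solution of dl/dt = speed0 * exp(-l / (2 * decay_length)), l(0) = 0."""
    return 2.0 * decay_length * np.log1p(speed0 * t / (2.0 * decay_length))

def loop_size_numerical(t_end, speed0, decay_length, dt=1e-3):
    """Forward-Euler integration of the same equation, used as a cross-check."""
    n_steps = int(round(t_end / dt))
    loop = 0.0
    for _ in range(n_steps):
        loop += dt * speed0 * np.exp(-loop / (2.0 * decay_length))
    return loop

if __name__ == "__main__":
    for t in (1.0, 10.0, 50.0):
        exact = loop_size_closed_form(t, speed0=1.0, decay_length=3.0)
        approx = loop_size_numerical(t, speed0=1.0, decay_length=3.0)
        print(f"t = {t:5.1f}  closed form = {exact:.4f}  Euler = {approx:.4f}")
```

At early times the loop grows linearly with speed v0; once l exceeds a few decay lengths the density gradient, and hence the driving force, becomes exponentially small.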
1D model with looping weight, and looping diagrams
The model discussed above corresponds to the case “without looping weight”. The case “with looping weight” discussed in the main manuscript can be considered by introducing an entropic potential which affects the motion of slip-link monomers (in practice, this is done through a standard Metropolis test). For simplicity, we assume that all loops are Gaussian, i.e., we disregard self-avoidance effects in this calculation. To compute the looping weight of a given configuration of slip-link heads (e.g., that in Figs. S3a,b), we first identify all loops. The number of loops, n, is equal to the number of bound slip-links, Nb, and we label their sizes l1,…,ln (see Figs. S3a,b). We then identify the number of “simple loops”, which do not contain another loop inside. In general, there will be a number ns ≤ n of simple loops. The probability of formation of each loop is ∼ l^{−3/2}, and this is weighted by another factor, $e^{-\kappa/l^2}$, for simple loops to model the energetic cost of bending; κ is a constant associated with the persistence length of the chromatin fiber. The looping weight is then
$$W = \prod_{i=1}^{n} l_i^{-3/2}\;\prod_{j\,\in\,{\rm simple}} e^{-\kappa/l_j^2}. \tag{S35}$$
This looping weight, W, is defined up to a multiplicative constant, and, in turn, it defines the potential in which the slip-links move (up to an irrelevant additive constant) via
$$V = -k_BT\,\ln W. \tag{S36}$$
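The bookkeeping described above (identify all loops, pick out the simple ones, combine the factors into a weight and a potential, then apply a Metropolis test) can be sketched as follows. This is a minimal illustration under stated assumptions: loops are represented as hypothetical (left, right) pairs of head positions, the bending factor for simple loops is assumed to be exp(−κ/l²), and energies are measured in units of kBT.

```python
import math
import random

def simple_loops(loops):
    """Return the loops that do not contain any other loop inside them."""
    simple = []
    for i, (left, right) in enumerate(loops):
        contains_other = any(
            j != i and left <= other_left and other_right <= right
            for j, (other_left, other_right) in enumerate(loops)
        )
        if not contains_other:
            simple.append((left, right))
    return simple

def log_looping_weight(loops, kappa):
    """Logarithm of the looping weight (up to an additive constant)."""
    log_w = sum(-1.5 * math.log(right - left) for left, right in loops)
    # Assumed bending penalty, applied to simple loops only.
    log_w += sum(-kappa / (right - left) ** 2 for left, right in simple_loops(loops))
    return log_w

def potential(loops, kappa):
    """Potential in units of kBT, defined as minus the log of the weight."""
    return -log_looping_weight(loops, kappa)

def metropolis_accept(old_loops, new_loops, kappa, rng=random):
    """Standard Metropolis test for a proposed move of a slip-link head."""
    delta = potential(new_loops, kappa) - potential(old_loops, kappa)
    return delta <= 0.0 or rng.random() < math.exp(-delta)

if __name__ == "__main__":
    # One loop nested inside another, followed by a consecutive loop.
    configuration = [(0, 10), (2, 6), (12, 20)]
    print(simple_loops(configuration))
    print(potential(configuration, kappa=8.0))
```

In this example the outer loop (0, 10) is not simple because it contains (2, 6), so only two of the three loops carry the bending factor.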
In Fig. 3 in the main manuscript we present results without looping weight; Fig. S4 shows the results of simulations with looping weight, with κ = 8 (in units of Δx², where Δx = σsm/2), a choice corresponding to a rather flexible polymer. The results show that the looping weight makes a notable quantitative change, but the qualitative trends are very similar to those in Fig. 3 in the main manuscript, with the model with loading leading to the ratchet effect discussed above and in the main manuscript.
Diagrams such as that in Figure S3 are useful to determine visually the looping topology without the need to show the 3D configuration. Such diagrams are used extensively for RNA secondary structure representations; we refer to these in our context as “looping diagrams”. In the text we refer to some specific loop configurations which are most easily described by these diagrams: these are the “consecutive loop” arrangement in Figure S3c, the “nested loop” one in Figure S3d, and the “stacked loops” of Figure S3e, where some of the loops in a nested loop are packed close to each other. As discussed in the main text, stacked loops are entropically favoured, hence they appear often in our simulations: in Supplementary Movies 2 and 3, where each arc is colored differently, stacked loops appear as rainbow patterns.
3D BROWNIAN DYNAMICS OF A CHROMATIN FIBER WITH A SINGLE MOLECULAR SLIP-LINK
In this section we give details and additional results for the three-dimensional Brownian dynamics simulations of a slip-link sliding diffusively on a chromatin fiber, which are discussed in the main text.
Brownian dynamics: force field and other simulation details
In our Brownian dynamics simulations we follow the evolution of a chromatin fiber and of a slip-link which is topologically bound to the fiber. The dynamics are evolved using a velocity-Verlet integration scheme within the LAMMPS software [2] in Brownian dynamics mode (NVT ensemble).
The chromatin fiber is modelled, as in Ref. [3], as a bead-spring self-avoiding and semi-flexible polymer; each of its beads has size σ. If we denote the position of the centre of the i-th chromatin bead by ri, and the separation between beads i and j by di,j = |ri − rj |, then we can express the finitely-extensible non-linear elastic (FENE) spring potential modelling the connectivity of the chain as follows:
$$U_{\rm FENE}(i,i+1) = -\frac{k}{2}\,R_0^2\,\ln\left[1-\left(\frac{d_{i,i+1}}{R_0}\right)^2\right] \tag{S37}$$
for di,i+1 < R0, and UFENE(i,i + 1) = ∞ otherwise; here we chose R0 = 1.6 σ and k = 30 є/σ².
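The semi-flexibility (bending rigidity) of the chain is described through a standard Kratky-Porod potential, defined in terms of the positions of a triplet of neighbouring beads along the polymer as follows:
$$U_{\rm B}(i,i+1,i+2) = \frac{k_BT\,l_p}{\sigma}\left[1-\frac{(\mathbf{r}_{i+1}-\mathbf{r}_i)\cdot(\mathbf{r}_{i+2}-\mathbf{r}_{i+1})}{d_{i,i+1}\,d_{i+1,i+2}}\right], \tag{S38}$$
where we set the persistence length lp = 4σ (which maps to ≃ 120 nm – see below; this is reasonable for chromatin [4]).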
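Self-avoidance is ensured by introducing a repulsive Weeks-Chandler-Andersen (WCA) potential between every pair of chromatin beads as follows:
$$U_{\rm LJ}(i,j) = 4\epsilon\left[\left(\frac{\sigma}{d_{i,j}}\right)^{12}-\left(\frac{\sigma}{d_{i,j}}\right)^{6}\right]+\epsilon \tag{S39}$$
for di,j < $2^{1/6}\sigma$, and ULJ(i, j) = 0 otherwise. In Eq. (S39) we set є = kBT.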
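The total potential energy experienced by chromatin bead i is given by
$$U_i = \sum_j\left[U_{\rm FENE}(i,j)\,\delta_{j,i+1} + U_{\rm B}(i,j,j+1)\,\delta_{j,i+1} + U_{\rm LJ}(i,j)\right], \tag{S40}$$
and its dynamics can be described by the Langevin equation
$$m\,\frac{d^2\mathbf{r}_i}{dt^2} = -\xi\,\frac{d\mathbf{r}_i}{dt} - \nabla U_i + \boldsymbol{\eta}_i, \tag{S41}$$
where m is the bead mass, ξ is the friction coefficient, and ηi is a stochastic delta-correlated noise. The variance of each Cartesian component of the noise, $\sigma_\eta^2$, satisfies the usual fluctuation dissipation relation $\sigma_\eta^2 = 2\xi k_BT$.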
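For readers who wish to experiment with the force field outside LAMMPS, the three contributions described above can be written as short functions. The sketch below uses the textbook forms of the FENE, Kratky-Porod and WCA potentials in simulation units (σ = 1, є = kBT = 1), with the parameter values quoted in the text as defaults; the function names are illustrative and this is not the code used for the simulations.

```python
import numpy as np

def fene(d, k=30.0, r0=1.6):
    """FENE bond energy for bead separation d; infinite beyond the maximum extension r0."""
    if d >= r0:
        return np.inf
    return -0.5 * k * r0**2 * np.log(1.0 - (d / r0) ** 2)

def kratky_porod(r_prev, r_mid, r_next, lp=4.0, sigma=1.0, kT=1.0):
    """Bending energy of a triplet of consecutive beads (zero for a straight segment)."""
    u = r_mid - r_prev
    v = r_next - r_mid
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return kT * lp / sigma * (1.0 - cos_theta)

def wca(d, epsilon=1.0, sigma=1.0):
    """Purely repulsive WCA energy: Lennard-Jones cut and shifted at its minimum."""
    if d >= 2.0 ** (1.0 / 6.0) * sigma:
        return 0.0
    sr6 = (sigma / d) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6) + epsilon

if __name__ == "__main__":
    straight = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0])]
    bent = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])]
    print("bending energy, straight:", kratky_porod(*straight))
    print("bending energy, right angle:", kratky_porod(*bent))
    print("WCA at contact (d = sigma):", wca(1.0))
    print("FENE at d = sigma:", fene(1.0))
```

With lp = 4σ a right-angle bend between consecutive bonds costs 4 kBT, which gives a feel for how strongly sharp kinks are suppressed.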
In order to model slip-links we build a pair of rings out of beads also of diameter σ, and allow each ring to move as a rigid body. The translational motion of the centre of mass of the ring is described by a Langevin equation as in Eq. (S41), while rotation is described by a similar equation where the force term is replaced by the torque on the centre of mass, calculated from the forces experienced by the component beads of the ring. Each ring is composed of 10 beads arranged so that it is large enough to encircle the chromatin fiber. The two rings are held together by a pair of FENE bonds (as in Eq. (S37)), and they are kept in an open “handcuff” arrangement via two bending interactions (as in Eq. (S38), but with lp = 100σ). The slip-link beads interact with each other, and with chromatin beads, via the WCA potential described above. Figure S5A shows the arrangement of a pair of rings and indicates the interactions between them. Slip-links are attached to a chromatin fiber by first positioning them in a folded handcuff arrangement such that each ring encircles an adjacent polymer bead; the bending interactions between the two rings then act to open the handcuff, and bend the polymer (see Fig. S5B). After this the slip-link is free to diffuse in 3D and along the polymer.
As is customary [5], we use simulation units where the mass of a polymer bead is m = 1, and the distance and energy units are σ = 1 and є = kBT respectively; the simulation time unit is given by
$$\tau_{\rm LJ} = \sigma\sqrt{m/\epsilon}. \tag{S42}$$
There are two other time scales in the system, the velocity decorrelation time τin = m/ξ and the Brownian time τB = σ²/Db; we set the friction ξ = 1, meaning that τin = τLJ = τB. Here Db = kBT/ξ is the diffusion coefficient of a bead of size σ. From the Stokes’ friction coefficient for spherical beads of diameter σ we have that ξ = 3πηsol σ, where ηsol is the solution viscosity. For the slip-link rings we set a total mass of each ring of mr = 2.75m; keeping τin = τLJ ensures a suitably larger friction ξr for these larger proteins (this means we approximate that each ring diffuses like a sphere of diameter 2.75σ). The numerical integration of Eq. (S41) uses a time step Δt = 0.01τLJ.
The mapping from simulation to physical units can be made as follows. Energies are mapped in a straightforward way as they are measured in units of kBT. To map length scales from simulation to physical units, we set the diameter, σ, of each bead to ∼ 30 nm ≃ 3 kbp (assuming an underlying 30 nm fiber, hence a 100 bp/nm compaction; of course, all our results would remain valid with a different mapping). For time scales, by requiring that the mean square displacement of a polymer bead matches that measured experimentally in Ref. [6], as done in Ref. [7], we obtain that τB = τLJ = 0.1 s.
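The mapping above amounts to a few multiplications, which the following sketch collects in one place. The constants are the values quoted in the text (σ ≃ 30 nm ≃ 3 kbp, τLJ = 0.1 s, Δt = 0.01 τLJ); the function names are illustrative.

```python
SIGMA_NM = 30.0     # bead diameter in nanometres
SIGMA_KBP = 3.0     # bead diameter in kilo-basepairs
TAU_LJ_S = 0.1      # simulation time unit in seconds
DT_TAU = 0.01       # integration time step in units of tau_LJ

def length_in_nm(n_sigma):
    """Convert a length in units of sigma to nanometres."""
    return n_sigma * SIGMA_NM

def length_in_kbp(n_sigma):
    """Convert a contour length in units of sigma to kilo-basepairs."""
    return n_sigma * SIGMA_KBP

def time_in_seconds(n_steps):
    """Convert a number of integration steps to seconds."""
    return n_steps * DT_TAU * TAU_LJ_S

if __name__ == "__main__":
    print("persistence length 4 sigma =", length_in_nm(4), "nm")
    print("fiber of 2000 sigma =", length_in_kbp(2000), "kbp")
    print("10^6 time steps =", time_in_seconds(10**6), "s")
```

For instance, a fiber of 2000σ corresponds to 6 Mbp, and one million integration steps to about 1000 s of real time.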
Brownian dynamics: additional results
In Fig. 2 in the main text, we have shown the looping frequency for a slip-link, with a flexible chromatin fiber (lp = 4σ). While this value is appropriate for open chromatin, which is usually found within CTCF-mediated loops [8], from a theoretical point of view it is of interest to ask what is the effect of changing the persistence length lp.
Figure S6A shows the frequency of looping as a function of loop size, as found with our Brownian dynamics simulations. It can be seen that, while the first peak, for smaller values of the loop length, remains in a similar position for all values of lp, for stiffer polymers there is a second shoulder, or smaller peak, for larger values of the loop length. Fig. S6B shows a prediction of looping probabilities obtained by using the analytical estimate in Ref. [9]. These results show that both peaks can be explained by considering a simple theory for semi-flexible polymers which neglects excluded volume interactions, by assuming that looping via a slip-link is equivalent to the constraint that the two ends of a loop are separated in 3D by a distance r. Physically, the first peak arises because very small loops cannot form, as a loop must at least span a distance r. The second peak is related to the well known optimal size of a loop in a semi-flexible polymer, which comes about due to the competition between the entropic cost, which favours shorter loops, and bending penalties, which favour longer loops [10].
For completeness, we report here the form of the analytical approximation for the distribution probability of the end-to-end distance r of a semiflexible polymer of size L, pL(r), used in Ref. [9]: where
In Eq. (S43), I0 is a modified Bessel function of the first kind, and J(L) is the so-called Shimada and Yamakawa J-factor, measuring the ring closure probability for a wormlike chain (see [10] for details and the derivation of this factor).
Finally, Fig. S7 shows results from simulations of a model chromatin fiber of size L = 2000σ, split into sections, in each of which we place a slip-link (see Fig. 4 of the main manuscript). At the end of each section we locate beads with high affinity for the slip-link – modelling convergent CTCF sites (the end “CTCF bead” of a given section is 5σ away from the start “CTCF bead” of the next section). The results show the fraction of slip-links which reach the sticky CTCF sites for different values of the persistence length, lp, and of the loop, or section, size (see also Suppl. Movie 1, valid for lp = 8σ and a loop size of 90σ). The simulations “with ratchet effect” consider three slip-links per section, with the topology arranged so as to give three loops where the largest loop contains the middle loop, which in turn contains the smallest loop (see also Suppl. Movie 2, and the corresponding nested rainbow rings determining looping topology). The “ratchet” effect, which is discussed in more detail in the next Section, leads to a dramatic increase in the fraction of large loops which can be formed within a flexible fiber. Our results also show that the stiffer the fiber, the more likely is the formation of CTCF-mediated loops (Fig. S7B).
3D Brownian dynamics with multiple slip-links
In Figure 3d of the main text we present results from 3D Brownian dynamics simulations of multiple slip-links (see also Suppl. Movie 3). These simulations were performed with the same geometry and force field described in the Section “3D Brownian dynamics of a chromatin fiber with a single molecular slip-link”, but now we consider N slip-links, which can additionally: (i) bind with rate kon if detached, (ii) detach with rate koff if bound. In practice, to simulate this we have coupled LAMMPS with an in-house code modelling stochastic detachment and binding. Detached slip-links are not simulated directly via Brownian dynamics but are taken into account to determine which slip-links unbind and which rebind.
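The in-house coupling code is not described in detail here, so the following is only a generic sketch of how stochastic binding and detachment with constant rates can be applied between blocks of Brownian dynamics steps. Each bound slip-link detaches within an interval dt with probability 1 − exp(−koff dt), and each detached one binds with probability 1 − exp(−kon dt), provided the loading site is free. The function name, the `site_free` flag and the rule that at most one slip-link loads per interval are assumptions made for illustration.

```python
import numpy as np

def update_binding(bound, k_on, k_off, dt, site_free, rng):
    """Return the new bound/unbound state of each slip-link after an interval dt.

    bound     : boolean array, True where a slip-link is on the fiber
    site_free : whether the loading site can currently accept a slip-link
    """
    p_off = 1.0 - np.exp(-k_off * dt)
    p_on = 1.0 - np.exp(-k_on * dt)
    new_bound = bound.copy()
    for i in range(bound.size):
        if bound[i]:
            if rng.random() < p_off:
                new_bound[i] = False        # detach: remove from the dynamics
        elif site_free and rng.random() < p_on:
            new_bound[i] = True             # bind at the loading site
            site_free = False               # assumed: one loading event per interval
    return new_bound

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.array([True, True, False, False])
    for block in range(5):
        state = update_binding(state, k_on=0.5, k_off=0.2, dt=1.0, site_free=True, rng=rng)
        print(block, state)
```

In a coupled scheme, a routine of this kind would be called every dt of simulated time, with the Brownian dynamics engine adding or removing the corresponding rings.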
ANALYSIS OF CHIA-PET DATA
To estimate the in vivo probability of finding cohesin and CTCF-mediated loops of a given length, we analysed ChIA-PET (Chromatin Interaction Analysis by Paired-End Tag Sequencing) data from Ref. [11] (data publicly available from the Gene Expression Omnibus (GEO), accession number GSE72816). In these experiments chromatin-chromatin interactions mediated by a specific protein are identified using immunoprecipitation; in this way pairs of interacting chromatin regions which are both bound by CTCF are identified. In particular, Fig. 1d of the main manuscript shows the contact probability between CTCF-bound sequences in GM12878 cells (data set from GEO accession number GSM1872886). Data were sorted into bins of size 5 kbp according to loop size. This contact probability, in the range 100 – 1000 kbp, is compatible with an exponential decay (see Fig. 1d), where the decay length of the exponential is of the order of hundreds of kbp, within the typical range of CTCF-mediated loops [8]. These data can also be fitted by a power law, but the fit is quite poor (Figs. 1e or S8a). The exponent resulting from the fit is ∼ −0.35, which is far smaller in magnitude than those which can be explained by a polymer physics model (e.g., ∼ −2.1 for an ideal self-avoiding walk, ∼ −1.5 for an equilibrium globule or ∼ −1 for a fractal globule). It is interesting to note that ChIA-PET contacts between sequences bound to RNA polymerase II appear to be better fitted by a power law, albeit with a similarly low exponent (Fig. S8b; data set from GEO accession number GSM1872887 [11]). These results suggest that CTCF-mediated contacts (as well as RNA PolII-mediated ones) obey contact decay laws which are incompatible with those which would be predicted on the basis of equilibrium polymer models.
Notably, such polymer models can however account very well for the contact decay laws typically found in Hi-C experiments [12] (which probe chromatin-chromatin interactions genome wide without selecting for specific proteins). A possible explanation is that Hi-C contacts encompass many interactions arising randomly from spatial proximity in 3D (i.e. there does not have to be a protein mediated interaction, since any chromatin regions with close proximity can be captured), which can be explained by polymer physics assuming randomly diffusing polymers. Taken together these observations support our hypothesis that cohesin/CTCF mediated loops form through a non-equilibrium mechanism in vivo, and are distinct from other chromatin diffusion mediated loops.
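The comparison between exponential and power-law descriptions of binned contact probabilities can be sketched as follows. The data here are synthetic (an exponential with a 300 kbp decay length plus multiplicative noise, on 5 kbp bins between 100 and 1000 kbp) and are generated only to illustrate the procedure; they are not the ChIA-PET data, and the least-squares fits in logarithmic space are an assumed, simple choice of fitting method.

```python
import numpy as np

def synthetic_contacts(decay_kbp=300.0, noise=0.05, seed=0):
    """Synthetic binned contact probabilities (illustration only)."""
    rng = np.random.default_rng(seed)
    s = np.arange(100.0, 1000.0, 5.0)                  # loop size in kbp
    p = np.exp(-s / decay_kbp) * np.exp(noise * rng.standard_normal(s.size))
    return s, p

def fit_exponential(s, p):
    """Fit log p = log c - s / decay; returns (decay length, prefactor)."""
    slope, intercept = np.polyfit(s, np.log(p), 1)
    return -1.0 / slope, np.exp(intercept)

def fit_power_law(s, p):
    """Fit log p = log c + exponent * log s; returns (exponent, prefactor)."""
    slope, intercept = np.polyfit(np.log(s), np.log(p), 1)
    return slope, np.exp(intercept)

def rms_log_residual(p, p_model):
    """Root-mean-square residual between data and model in log space."""
    return float(np.sqrt(np.mean((np.log(p) - np.log(p_model)) ** 2)))

if __name__ == "__main__":
    s, p = synthetic_contacts()
    decay, c_exp = fit_exponential(s, p)
    exponent, c_pow = fit_power_law(s, p)
    print("exponential fit: decay length =", decay, "kbp,",
          "residual =", rms_log_residual(p, c_exp * np.exp(-s / decay)))
    print("power-law fit: exponent =", exponent, ",",
          "residual =", rms_log_residual(p, c_pow * s**exponent))
```

Comparing the two residuals on the same data gives a simple, if crude, measure of which functional form describes the decay better over the chosen range.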
CAPTIONS FOR SUPPLEMENTARY MOVIES
Suppl. Movie 1: This movie shows the dynamics corresponding to Suppl. Fig. 4A, where a chromatin fiber of length L = 2000σ and persistence length lp = 8σ is divided into sections of size 90σ; we assume that each of the sections contains a single slip-link (modelling cohesin) at all times, and that it is delimited by a bead at each boundary which is sticky for the slip-link, to model the presence of CTCF convergent sites. Each of the arcs shown in the movie tracks the positions of the two ends of each slip-link along the chromatin fiber. The interaction between CTCF and cohesin is large enough to ensure virtually irreversible binding on the timescale of our simulations.
Suppl. Movie 2: As Suppl. Movie 1, but now with lp = 4σ, and with sections of size 190σ, with three slip-links per section. It can be seen that the simultaneous presence of the three slip-links leads to a ratcheting effect which favours loop formation.
Suppl. Movie 3: This movie shows the self-organization of the osmotic ratchet. The dynamics are slightly different from those shown in Fig. 3 of the main manuscript: now there is not a fixed number of slip-links, but slip-links bind (i.e., are created, when the loading site is unoccupied) at rate kon = 10⁻³ s⁻¹ and detach (i.e., are destroyed) at rate koff = 10⁻⁴ s⁻¹. The formation of “rainbow patterns” with arcs tightly stacked against each other is due to entropic forces which favour the presence of a single loop, kept together by several clustered slip-links, over that of many loops, where slip-links are homogeneously distributed.
Acknowledgements
This work was supported by ERC (CoG 648050, THREEDCELLPHYSICS), by ISCRA Grants HP10CYFPS5 and HP10CRTY8P, by computer resources at INFN and Scope at the University of Naples, and by the Einstein BIH Fellowship Award to MN.