## ABSTRACT

We demonstrate that DNA loops can stochastically propel site-specifically bound transcription factors (TFs) towards promoters. The source of propulsion is the gradual release of elastic energy stored in the DNA loops. Looping-mediated interaction of TFs with promoters is several times faster than the sliding mode. Elastic and entropic energy barriers associated with looping shape the distribution of distances between TF binding sites and promoters. The commonly observed multiprotein binding in gene regulation has been acquired through evolution to overcome the looping energy barrier. The presence of nucleosomes on the genomic DNA of eukaryotes is required to reduce the entropy barrier associated with looping.

## INTRODUCTION

Looping of DNA is critical for the activation and expression of various genes from prokaryotes to eukaryotes [1-6]. Binding of transcription factors (TFs) to their specific *cis*-regulatory motifs (CRMs) on the genomic DNA activates the downstream promoters of genes via looping of the intervening DNA segment, forming a synaptosome-type complex [7]. In most biological processes, DNA looping is required for the precise protein-protein interactions underlying gene expression and recombination [8]. The statistical mechanics of DNA looping and cyclization has been studied extensively [4, 9, 10]. However, it is still not clear why DNA looping is an integral part of transcription activation and repression, although the underlying site-specific protein-protein and protein-DNA interactions could also be achieved via a combination of one-dimensional (1D) and three-dimensional (3D) diffusion of TFs [11-14]. It is also not clear exactly how the DNA loop is formed between CRMs and promoters via TFs, although Rippe et al. [3] have already captured snapshots of the looping intermediates. Schleif [2] argued that DNA looping can simplify the evolution of the genomic architecture of eukaryotes by not imposing strict conditions on the spacing between TF binding sites and promoters. In this letter, we show that DNA looping combined with an asymmetric binding energy profile can stochastically propel TFs along DNA towards the promoters. We further demonstrate that the physics behind looping-mediated propulsion of TFs along DNA shapes the genomic architecture.

## THEORY

In our model, we assume that 1) the TF of interest (or multiprotein complex) has two different DNA-binding domains (DBDs), one for the CRM (DBD1) and one for the promoter (DBD2), similar to the synaptic complexes of transcription activation in eukaryotes, and 2) the TF reaches its specific binding site on DNA via a combination of 3D and 1D diffusion within the theoretical framework of Berg-Winter-von Hippel [12-16]. Here 1D diffusion is always slower than 3D diffusion. Therefore, any factor that speeds up sliding will also speed up the overall search by TFs. Site-specific binding of TFs to their CRMs bends the DNA [3, 4]. The *site-specific binding energy* (*E*_{bind}) released at the DNA-TF interface is partitioned into the elastic energy required to bend the DNA (*E*_{elastic}), the energy of the specific non-covalent bonds (*E*_{bond}), and the energy required to compensate for the chain entropy loss (*E*_{entropy}) at the specific binding site. The energy stored by the specific DNA-TF complex is *E* ≈ *E*_{bond} + *E*_{elastic}. In this setting, the DBD2 of the TF needs to interact distally with the promoter and activate transcription via looping of the intervening DNA segment.

Let us assume that the radius of gyration of the TF is $r_P$. Upon binding to its cognate stretch of DNA of size *X*_{0} base-pairs (bp, where 1 bp = $l_d$ = 3.4 × 10^{-10} m) located between S1 and S2, the TF bends the DNA segment into a circle around its spherical solvent-shell surface such that $X_0 = 2\pi r_P$, as shown in **Fig. 1A**. We set *X* = 0 at S1 and *X* = *X*_{0} at S2, where *X* is the current location of the DBD2 of the TF on the DNA, which spans (0, *L*) as in **Fig. 1B**. Here *X* is also the loop length. The energy required to bend a linear DNA will be *E*_{bend} = *E*_{elastic} + *E*_{entropy}. For a radius of curvature $r_P$, the elastic energy stored per unit length of bent DNA is $a/(2r_P^2)$ ($k_BT$ units), where *a* is the persistence length of DNA [9, 17]. Clearly, the *E*_{elastic} required to bend a DNA segment of length *X* into a circular loop of radius $X/2\pi$ will be

$$E_{elastic} \approx \frac{a X}{2\,(X/2\pi)^2} = \frac{2\pi^2 a}{X}.$$

This energy has to be derived either from the specific binding energy of TFs or from an external energy input in the form of ATP hydrolysis [18]. We will investigate *E*_{entropy} later.
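As a quick check of the magnitudes involved, the sketch below evaluates $E_{elastic} \approx 2\pi^2 a/X$ for the values used in this letter ($a \approx 150$ bp, $X_0$ = 50-100 bp). The helper name `elastic_energy` is a hypothetical choice for illustration.

```python
import math

def elastic_energy(X_bp, a_bp=150.0):
    """Bending energy (k_B T units) to close X_bp of DNA into a circle: 2*pi^2*a/X."""
    return 2.0 * math.pi**2 * a_bp / X_bp

a = 150.0  # persistence length (bp), the in vitro value quoted in the text
for X0 in (50.0, 100.0):
    print(f"X0 = {X0:.0f} bp -> E_elastic ~ {elastic_energy(X0, a):.1f} k_B T")

# Loop length at which the stored bending energy has dropped to ~1 k_B T
X_thermal = 2.0 * math.pi**2 * a
print(f"E_elastic ~ 1 k_B T at X ~ {X_thermal:.0f} bp")
```

This gives roughly 59 and 30 $k_BT$ for 50 and 100 bp loops, and shows that the stored energy falls to about 1 $k_BT$ only near $X \approx 2\pi^2 a \approx 3000$ bp.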

The energy stored in the site-specific DNA-TF complex can be dissipated via three different modes: 1) thermally induced physical dissociation of the TF from DNA, in which both the bonding and elastic energies dissipate into the heat bath along with an increase in chain entropy; 2) physical dissociation of only DBD2 from S2 and re-association elsewhere via looping through 3D space while S1-DBD1 remains intact, as modelled by Shvets and Kolomeisky [19]; and 3) stochastic propulsion of the TF on DNA via sliding of DBD2, which corresponds to a gradual increase in *X* from *X*_{0}. In mode 3, mainly the elastic energy dissipates, which in turn causes bulging of the DNA loop around the TF. The chain entropy does not increase much here since the intervening DNA is still in a looped conformation. This is similar to the sliding of nucleosomes via bulge-induced reptation dynamics of DNA [20-22]. The probability of spontaneous dissociation will be inversely correlated with the bonding energy, and dissociation is an endothermic process. Clearly, physical dissociation will not be the most probable route for dissipating the energy stored in the site-specific DNA-TF complex.

When the binding energy profile of the TF is such that the bonding energy near S1 is much higher than near S2, the bending energy stored in the site-specific TF-DNA complex can be gradually released via bulging of the DNA loop around the TF, which in turn stochastically propels the sliding DBD2 towards the promoter located at *L*, as shown in **Fig. 1B**. Rippe et al. [3] studied the NtrC system, where binding of NtrC at its specific site activates the downstream closed complex of *glnA* promoter-RNAP-σ^{54} via looping out of the intervening DNA. They showed that the transition from the inactive closed to the active open promoter complex involved an increase in the bending angle of the intervening DNA, which in turn is positively correlated with an increase in the radius of curvature. In our model, this corresponds to bulging of the DNA loop. Therefore, it is reasonable to assume that TFs are propelled via an increase in the radius of curvature of the bent DNA. Here the asymmetric binding energy profile is essential to break the symmetry of the stochastic force acting on the sliding TFs [23]. This is also a reasonable assumption since S1-DBD1 is a strong site-specific interaction whereas S2-DBD2 is a nonspecific interaction. **Fig. 1C** shows another possible mode of DNA-loop formation, which is common for silencer TFs. Based on these considerations, the position *X* of the TF on DNA obeys the following Langevin equation [24-26].

In **Eq. 1**, *F*(*X*) = −*dE*/*dX* = 2*π*^{2}*a*/*X*^{2} is the force acting on the TF that is generated by the bending potential *E* ~ *E*_{elastic} upon bulging of the DNA loop, Γ_{t} is δ-correlated Gaussian white noise, and *D*_{C} is the 1D diffusion coefficient for sliding of the TF. The energy involved in the bonding interactions is constant and therefore does not contribute to the force term. Here we ignore energy dissipation via chain entropy, mainly because binding of TFs at their specific sites attenuates the conformational fluctuations at the DNA-TF interface [12, 15, 27]. The Fokker-Planck equation describing the probability of observing a given *X* at time *t*, with *X* = *X*_{0} at *t* = *t*_{0}, can be written as follows [24, 25].
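For concreteness, one standard overdamped form consistent with these definitions is written out below. It assumes energies in $k_BT$ units and unit-intensity noise, $\langle \Gamma_t \Gamma_{t'} \rangle = \delta(t - t')$; this normalization is an assumption and may differ from the authors' notation.

$$\frac{dX}{dt} = D_C\,F(X) + \sqrt{2D_C}\,\Gamma_t, \qquad F(X) = \frac{2\pi^2 a}{X^2},$$

$$\frac{\partial P}{\partial t} = -\frac{\partial}{\partial X}\left[D_C\,F(X)\,P\right] + D_C\,\frac{\partial^2 P}{\partial X^2}.$$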

The form of *F*(*X*) suggests that it can propel the DBD2 of the TF only over short distances, since lim_{X→∞} *F*(*X*) = 0; in practice the propulsion is already negligible for *X* > 2*π*^{2}*a*, where *E*_{elastic} becomes comparable to the background thermal energy. The initial condition for **Eq. 2** is *P*(*X*, *t*_{0} | *X*_{0}, *t*_{0}) = *δ*(*X* − *X*_{0}) with *X*_{0} = 2*πr*_{P}, and the boundary conditions are as follows.
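Assuming the standard drift-diffusion form of **Eq. 2**, with probability flux $J = D_C F(X) P - D_C\,\partial P/\partial X$, a reflecting wall at $X_0$ (zero flux) and an absorbing promoter at $L$ correspond to the following conditions:

$$\left[D_C\,F(X)\,P - D_C\,\frac{\partial P}{\partial X}\right]_{X = X_0} = 0, \qquad P(L, t \mid X_0, t_0) = 0.$$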

Here *X*_{0} acts as a reflecting boundary for a given size of TF, and *L* is the absorbing boundary where the promoter is located. The asymmetric energy profile with respect to S1 and S2 is required for the reflecting boundary condition at *X*_{0} to be valid. Upon reaching the promoter via loop expansion of the intervening DNA segment, the TF subsequently activates transcription. The mean first passage time *T*_{B}(*X*) for the DBD2 of the TF to reach *L* starting from *X* ∈ (*X*_{0}, *L*) obeys the following backward Fokker-Planck equation with the appropriate boundary conditions [15, 28].
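Assuming the standard drift-diffusion form with drift $D_C F(X)$, the backward equation for the mean first passage time, with a reflecting boundary at $X_0$ and an absorbing one at $L$, reads

$$D_C\,F(X)\,\frac{dT_B}{dX} + D_C\,\frac{d^2 T_B}{dX^2} = -1, \qquad \left.\frac{dT_B}{dX}\right|_{X = X_0} = 0, \qquad T_B(L) = 0.$$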

The solution of **Eqs. 4** can be expressed as follows.
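Under the same assumed form, with $c = 2\pi^2 a$ so that the bending potential is $E(X) = c/X$, the backward equation can be integrated twice, first from the reflecting boundary and then down from the absorbing one, using the integrating factor $e^{-c/X}$. This gives the integral representation (our derivation; the authors' closed form may be arranged differently)

$$T_B(X) = \frac{1}{D_C}\int_{X}^{L} dy\; e^{c/y}\int_{X_0}^{y} e^{-c/z}\,dz .$$

As $c \to 0$ (no stored bending energy) this reduces at $X = X_0$ to $(L - X_0)^2/2D_C$, the pure-sliding result.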

Interestingly, lim_{L→∞} *T*_{B}(*X*) = *T*_{N}(*X*), where *T*_{N}(*X*) is the mean first passage time required by the DBD2 of the TF to reach *L* via pure 1D sliding in the absence of DBD1; it is the solution of the following differential equation [12, 15, 28].
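Setting $F = 0$ in the backward equation, with a reflecting boundary at $X_0$ and an absorbing one at $L$, gives the pure-sliding problem and its elementary solution, which reproduces $T_N(X_0) = (L - X_0)^2/2D_C$ as used in the Supporting Material:

$$D_C\,\frac{d^2 T_N}{dX^2} = -1, \quad \left.\frac{dT_N}{dX}\right|_{X_0} = 0, \quad T_N(L) = 0 \;\;\Longrightarrow\;\; T_N(X) = \frac{(L - X_0)^2 - (X - X_0)^2}{2D_C}.$$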

To obtain the target finding time, one needs to set *X* = *X*_{0} in **Eqs. 5** and **6**. One can define the factor by which looping-mediated propulsion accelerates the target finding rate of the TF over 1D sliding as *η* = *T*_{N}(*X*_{0})/*T*_{B}(*X*_{0}), which is clearly independent of *D*_{C} and depends only on *L*, *a* and *X*_{0}. Explicitly, one can write it as,
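Assuming the integral solution of the backward equation with drift $D_C F(X)$ (reflecting at $X_0$, absorbing at $L$, $c = 2\pi^2 a$) together with $T_N(X_0) = (L - X_0)^2/2D_C$, the diffusion coefficient cancels and $\eta$ takes the form (our derivation, not necessarily the authors' arrangement)

$$\eta = \frac{(L - X_0)^2}{2\displaystyle\int_{X_0}^{L} dy\; e^{c/y}\int_{X_0}^{y} e^{-c/z}\,dz}, \qquad c = 2\pi^2 a .$$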

This is the central result of this letter. Detailed numerical analysis (Supporting Material) suggests that *η* has a maximum, where ∂*η*/∂*L* = 0, at *L* = *L*_{opt}, and that lim_{L→∞} *η* = 1 (**Figs. 2A** and **B**). This is logical since for *L* > *L*_{opt}, *η* decreases towards 1, while for *L* < *L*_{opt} the stored energy is not completely utilized to propel the DBD2 of the TF. Further, its numerator goes to zero much faster than its denominator (**Fig. S1**, Supporting Material). The persistence length of typical DNA under *in vitro* conditions is *a* ~ 150 bp, and the radius of gyration of most eukaryotic TFs is *r*_{P} ~ 10-15 bp. Therefore, one can set the initial *X* = 2*πr*_{P} ~ 50-100 bp [30, 31]. Simulations of *η* at different values of *X*_{0}, with *L* ranging from *X*_{0} to 10^{5} (**Fig. 2A**), suggest that *L*_{opt} ~ 3*X*_{0} (see **Figs. 2C** and **2D**). When *a* ~ 150 bp and *X*_{0} ~ 50-100 bp, *L*_{opt} ~ 150-300 bp. Remarkably, this is the most probable range of distances between the CRMs and promoters of various genes observed across several genomes [32].
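A minimal numerical sketch of how *η*(*L*) and *L*_{opt} can be obtained is given below. It assumes the standard overdamped model (drift $D_C F(X)$ with $F = 2\pi^2 a/X^2$, reflecting wall at $X_0$, absorbing promoter at $L$), for which $D_C T_B(X_0) = \int_{X_0}^{L} dy\, e^{c/y}\int_{X_0}^{y} e^{-c/z}\,dz$ with $c = 2\pi^2 a$ and $D_C T_N(X_0) = (L - X_0)^2/2$. The function `mfpt_ratio`, the grid sizes and the scanned range of *L* are illustrative choices, not the authors' code. The double integral is evaluated by cumulative trapezoidal quadrature on a geometric grid that is fine near $X_0$, where the drift is strongest.

```python
import numpy as np

def mfpt_ratio(L, X0, a, n=20001):
    """eta = T_N(X0) / T_B(X0) for the potential E(X) = 2*pi^2*a/X (k_B T units).

    Assumes a reflecting boundary at X0 and an absorbing boundary at L;
    D_C cancels in the ratio, so it never appears.
    """
    c = 2.0 * np.pi**2 * a
    # Geometric grid: fine spacing near X0, where exp(-c/z) varies fastest
    y = X0 * (L / X0) ** np.linspace(0.0, 1.0, n)
    dy = np.diff(y)
    # Inner integral J(y) = int_{X0}^{y} exp(-c/z) dz, scaled by exp(c/L) to avoid underflow
    s = np.exp(-(c / y - c / L))
    J = np.concatenate(([0.0], np.cumsum(0.5 * (s[1:] + s[:-1]) * dy)))
    # Outer integrand exp(c/y) * int_{X0}^{y} exp(-c/z) dz (the exp(c/L) scaling cancels)
    h = np.exp(c / y - c / L) * J
    dc_tb = np.sum(0.5 * (h[1:] + h[:-1]) * dy)  # D_C * T_B(X0)
    dc_tn = 0.5 * (L - X0) ** 2                  # D_C * T_N(X0), pure sliding
    return dc_tn / dc_tb

a, X0 = 150.0, 50.0  # persistence length and initial loop length (bp)
L_values = X0 * np.linspace(1.5, 8.0, 131)
eta = np.array([mfpt_ratio(L, X0, a) for L in L_values])
i_max = int(np.argmax(eta))
L_opt, eta_max = L_values[i_max], eta[i_max]
print(f"L_opt ~ {L_opt:.0f} bp ({L_opt / X0:.2f} X0), eta_max ~ {eta_max:.1f}")
```

For *a* = 150 bp and *X*_{0} = 50 bp this places the maximum at roughly 3*X*_{0}, with *η*_{max} of order 10. Setting *a* close to zero returns *η* ≈ 1, a useful sanity check that the quadrature reproduces pure sliding.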

## RESULTS AND DISCUSSION

The efficiency of the stochastic propulsion is a maximum at *L*_{opt}. Although *L*_{opt} is not much affected by *a*, the maximum of *η* is positively correlated with *a*. This is logical since the stored elastic energy is directly proportional to *a*. Remarkably, at *L*_{opt} the interaction between the CRM-TF complex and the promoter will be ~10-25 times faster than normal 1D sliding. These results are demonstrated in **Fig. 2**. Here we assumed that the nonspecifically bound DBD2 of the TF does not dissociate before reaching the promoter, which is valid only when the search time is short compared with 1/*k*_{r}, where *k*_{r} is the dissociation rate constant [15], which depends on *μ*_{NS}, the nonspecific binding energy associated with DBD2. Clearly, *μ*_{NS} ≥ 12 *k*_{B}*T* is required to attain *L* ~ 300 bp, which can be achieved via multiprotein binding.
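These trends can be seen from a leading-order estimate (our simplification, not the authors' calculation, and assuming the drift $D_C F(X)$ of the standard overdamped model). When $2\pi^2 a/X \gg 1$ over the whole interval, diffusion is only a small correction, so $T_B(X_0)$ is roughly the deterministic travel time. With $k = L/X_0$ and $T_N(X_0) = (L - X_0)^2/2D_C$,

$$T_B(X_0) \approx \int_{X_0}^{L}\frac{dX}{D_C\,F(X)} = \frac{L^3 - X_0^3}{6\pi^2 a\,D_C}, \qquad \eta \approx \frac{3\pi^2 a}{X_0}\,\frac{k - 1}{k^2 + k + 1},$$

$$\frac{d\eta}{dk} = 0 \;\Longrightarrow\; k^2 - 2k - 2 = 0 \;\Longrightarrow\; \frac{L_{opt}}{X_0} \approx 1 + \sqrt{3} \approx 2.7, \qquad \eta_{max} \approx \frac{3\pi^2 a}{(3 + 2\sqrt{3})\,X_0}.$$

In this limit $L_{opt}/X_0$ does not depend on $a$, while $\eta_{max} \propto a/X_0$, matching the trends stated above; diffusive corrections shift these values somewhat.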

Noting that *E*_{elastic} ≈ 2*π*^{2}*a*/*X* ~ 3000/*X* (for *a* ~ 150 bp), the *E*_{entropy} component for a Gaussian chain can be computed as follows. Let us assume that looping occurs when $|\vec{R}| \le \xi$, where $\vec{R}$ is the end-to-end distance vector, ξ is the minimum looping distance (in m), and *Xl*_{d} is the maximum length of the DNA polymer. The density function of $\vec{R}$ is $P(\vec{R}) = \left(\frac{3}{2\pi X b^2}\right)^{3/2}\exp\left(-\frac{3|\vec{R}|^2}{2Xb^2}\right)$ [33, 34], where *X* is the number of monomers in the polymer and *b* is the average distance between monomers. The entropy loss upon looping of DNA is Δ*S*_{loop} ≈ ln(*P*_{l}/*P*_{all}) (*k*_{B} units), where $P_l = \int_{|\vec{R}| \le \xi} P(\vec{R})\,d^3R$ is the probability of finding loops and $P_{all} = \int_{|\vec{R}| \le X l_d} P(\vec{R})\,d^3R$ is the probability of finding all configurations, including loops. Explicitly, one can write Δ*S*_{loop} as follows.
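Assuming the Gaussian end-to-end density $P(\vec{R}) = (3/2\pi X b^2)^{3/2}\exp(-3|\vec{R}|^2/2Xb^2)$ and integrating it over the balls $|\vec{R}| \le \xi$ and $|\vec{R}| \le X l_d$ gives one explicit form (our derivation, which may be arranged differently from the authors' expression):

$$\Delta S_{loop} \approx \ln\frac{\operatorname{Erf}(v_\xi) - \frac{2v_\xi}{\sqrt{\pi}}\,e^{-v_\xi^2}}{\operatorname{Erf}(v_m) - \frac{2v_m}{\sqrt{\pi}}\,e^{-v_m^2}}, \qquad v_\xi = \xi\sqrt{\frac{3}{2Xb^2}}, \quad v_m = X l_d\sqrt{\frac{3}{2Xb^2}}.$$

For small $v_\xi$ and large $v_m$, the ratio behaves as $4v_\xi^3/(3\sqrt{\pi}) \propto X^{-3/2}$, which is the origin of the $(3/2)\ln X$ growth of $E_{entropy} = -\Delta S_{loop}$.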

Here Erf is the error function [29]. When ξ ≃ *b* ≃ *l*_{d} is small, this expression simplifies for large values of *X* [19]; it is closely linked to the Jacobson-Stockmayer factor, or *J*-factor, associated with polymer looping [9]. One finally obtains *E*_{entropy} ≈ (3/2) ln(*πX*/6).
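The (3/2) ln *X* growth of *E*_{entropy} can be checked numerically. The sketch below assumes the Gaussian end-to-end density and the loop criterion |R| ≤ ξ described above, and sets ξ = *b* = *l*_{d} = 1 (lengths in units of the monomer size) as an illustrative choice; the function name `loop_entropy_cost` is hypothetical. Because the additive constant depends on these choices, only the slope with respect to ln *X* should be compared with (3/2) ln(*πX*/6).

```python
import math

def loop_entropy_cost(X, xi=1.0, b=1.0, l_d=1.0):
    """E_entropy = -ln(P_l / P_all) in k_B T units for a Gaussian chain of X monomers.

    P_l and P_all are the probabilities that |R| <= xi and |R| <= X*l_d,
    obtained by integrating the 3D Gaussian end-to-end density over a ball.
    """
    k = math.sqrt(3.0 / (2.0 * X * b * b))

    def prob_within(u):
        # Closed-form integral of the Gaussian density over the ball |R| <= u
        v = k * u
        return math.erf(v) - 2.0 * v * math.exp(-v * v) / math.sqrt(math.pi)

    return -math.log(prob_within(xi) / prob_within(X * l_d))

for X in (1e2, 1e3, 1e4):
    print(f"X = {X:.0e}: E_entropy ~ {loop_entropy_cost(X):.2f} k_B T")

# Local slope dE_entropy / d(ln X); tends to 3/2 for large X
slope = (loop_entropy_cost(1e4) - loop_entropy_cost(1e3)) / math.log(10.0)
print(f"slope ~ {slope:.3f}")
```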

Clearly, bending linear DNA of size 50-100 bp into loops requires the hydrolysis of at least 3-5 ATP molecules (using *E*_{bend} = *E*_{elastic} + *E*_{entropy} and 1 ATP ~ 12 *k*_{B}*T*). In fact, *E*_{bend} is a minimum at *X*_{C} ≈ 4π^{2}*a*/3, where the average search time required to form the synaptosome is also a minimum [19]. When *X* < *X*_{C}, *E*_{bend} ∝ *X*^{-1}; when *X* > *X*_{C}, *E*_{bend} ∝ ln(*X*). When *a* ~ 150 bp, *X*_{C} ~ 2 kbp and the minimum of *E*_{bend} is ~13 *k*_{B}*T*, which requires the hydrolysis of at least 1 ATP. These results are demonstrated in **Fig. 3**. To simplify our model, we have ignored the entropic barriers imposed by the flanking regions of DNA; however, these grow only logarithmically with chain length, compared with the elastic energy. In the absence of energy input, biological systems can overcome the looping energy barrier in three possible ways: 1) multiprotein binding [10], which could be the evolutionary origin of combinatorial regulation by TFs; 2) placing sequence-mediated kinetic traps for DBD2 between CRMs and promoters [35]; and 3) placing nucleosomes all over the genomic DNA to decrease the *E*_{entropy} component. All these aspects are observed in natural systems.
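The location of the minimum and the ATP estimates above follow directly from $E_{bend} = 2\pi^2 a/X + (3/2)\ln(\pi X/6)$; setting $dE_{bend}/dX = 0$ gives $X_C = 4\pi^2 a/3$. The sketch below checks this by brute-force minimization on a log-spaced grid, taking 12 $k_BT$ per ATP as in the text; the function name `bend_energy` is a hypothetical choice.

```python
import numpy as np

def bend_energy(X, a=150.0):
    """E_bend = E_elastic + E_entropy (k_B T units), using the expressions in the text:
    E_elastic = 2*pi^2*a/X and E_entropy = (3/2) ln(pi*X/6)."""
    return 2.0 * np.pi**2 * a / X + 1.5 * np.log(np.pi * X / 6.0)

a = 150.0  # persistence length (bp)
X = np.logspace(1, 5, 200001)  # loop lengths from 10 bp to 100 kbp
X_c_numeric = X[np.argmin(bend_energy(X, a))]
X_c_formula = 4.0 * np.pi**2 * a / 3.0
print(f"X_C numeric ~ {X_c_numeric:.0f} bp, 4*pi^2*a/3 ~ {X_c_formula:.0f} bp")

ATP_KBT = 12.0  # ~12 k_B T per ATP, as assumed in the text
for X0 in (50.0, 100.0):
    E = bend_energy(X0, a)
    print(f"X = {X0:.0f} bp: E_bend ~ {E:.1f} k_B T ~ {E / ATP_KBT:.1f} ATP")
```

Both approaches put $X_C$ near 2 kbp for $a \approx 150$ bp, and the 50 and 100 bp loops come out at roughly 5 and 3 ATP equivalents, in line with the 3-5 ATP range quoted above.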

In multiprotein binding, the free energies of the DNA-protein and protein-protein interactions among TFs are utilized cooperatively for DNA looping. Here DBD1 and DBD2 may come from different proteins. Vilar and Saiz [10] showed that DNA looping is possible even at small TF concentrations when the number of TFs in a combination is sufficiently large. Multiprotein binding increases *X*_{0}. However, increasing *X*_{0} decreases both the maximum possible acceleration of the TF search and the energy barrier associated with DNA looping. As a result, natural systems optimize *X*_{0} between these two opposing factors for maximum efficiency by tuning the number of TFs in combinatorial binding. We conclude with an open fundamental question: what is the exact mechanism of DNA-loop-mediated transcription activation in real systems? Is it the stochastic propulsion of our model, or the repeated association-dissociation of DBD2 proposed by Shvets and Kolomeisky [19]? Future single-molecule experiments need to address these basic questions.
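The trade-off in *X*_{0} can be illustrated with two simple expressions. The sketch below uses a leading-order, drift-dominated estimate of the peak acceleration, $\eta_{max} \approx 3\pi^2 a/((3 + 2\sqrt{3})X_0)$ (our approximation, which neglects diffusive corrections), together with $E_{bend} = 2\pi^2 a/X_0 + (3/2)\ln(\pi X_0/6)$ from the text. The helper names are hypothetical.

```python
import math

A_BP = 150.0  # persistence length (bp), the value quoted in the text

def eta_max_drift(X0, a=A_BP):
    """Approximate peak acceleration over 1D sliding (drift-dominated limit)."""
    return 3.0 * math.pi**2 * a / ((3.0 + 2.0 * math.sqrt(3.0)) * X0)

def bend_energy(X, a=A_BP):
    """E_elastic + E_entropy in k_B T units, using the expressions in the text."""
    return 2.0 * math.pi**2 * a / X + 1.5 * math.log(math.pi * X / 6.0)

# Larger X0 (e.g. from multiprotein binding) lowers both the barrier and the gain
for X0 in (50.0, 100.0, 200.0, 400.0):
    print(f"X0 = {X0:5.0f} bp: eta_max ~ {eta_max_drift(X0):5.1f}, "
          f"E_bend ~ {bend_energy(X0):5.1f} k_B T")
```

Over this range (well below $X_C \approx 2$ kbp), both columns fall as *X*_{0} grows, which is the tension the paragraph describes.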

## CONCLUSION

In summary, for the first time we have shown that DNA loops can stochastically propel transcription factors along DNA from their specific binding sites towards the promoters. We have shown that the source of propulsion is the elastic energy stored in the specific looped DNA-protein complex. Elastic and entropic energy barriers associated with DNA looping shape the distribution of distances between TF binding sites and promoters. We argued that the commonly observed multiprotein binding in gene regulation might have been acquired over evolution to overcome the looping energy barrier. The presence of nucleosomes on the genomic DNA of eukaryotes is required to reduce the entropy barrier associated with looping.

## Supporting Material

The mean first passage time (MFPT) for the DNA-binding domain 2 (DBD2) of the TF to reach the promoter of a gene via the *DNA-loop mediated propulsion mechanism*, while DBD1 of the TF complex is still tightly bound to S1 of DNA (see **Fig. 1** of the main text for details), can be given as follows.

Here *X*_{0} is the initial position of DBD2 of the TF on DNA or the initial loop length, *L* is the location of the promoter, *a* is the persistence length of DNA, and *D*_{C} is the one-dimensional diffusion coefficient for sliding of DBD2 along DNA. We have assumed that the site-specific DBD1-S1 interaction is strong and intact (**Fig. 1A** of the main text). The function *G* can be defined as follows.

Here $E_1(z) = \int_z^{\infty} e^{-t}/t\,dt$ is the exponential integral [1].
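One way the exponential integral enters, assuming the bending potential $c/X$ with $c = 2\pi^2 a$ (our derivation; the authors' *G* may be arranged differently), is through the elementary antiderivative of the inner integrand of the MFPT:

$$\int e^{-c/z}\,dz = z\,e^{-c/z} - c\,E_1\!\left(\frac{c}{z}\right) + \text{const}.$$

Differentiating the right-hand side with $dE_1(u)/du = -e^{-u}/u$ gives $e^{-c/z}(1 + c/z) - e^{-c/z}\,c/z = e^{-c/z}$, which confirms the identity.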

Noting that *T*_{N}(*X* _{0}) = (*L* – *X* _{0})^{2}/2*D*_{C} (from **Eq. 6** of the main text) which is the MFPT associated with the finding of the promoter by TF starting from *X*_{0} via pure sliding dynamics, one can define *η* which is the number of times the DNA-loop driven searching of TF for the promoter is faster than the normal 1D sliding dynamics as follows.

Clearly, *η* does not depend on *D*_{C}; it depends only on the parameters *L*, *X*_{0} and *a*. Further, *T*_{N}(*X*_{0}) approaches zero much faster than *T*_{B}(*X*_{0}) as *L* tends towards infinity (see **Fig. S1** for details), and there exists an asymptotic limit lim_{L→∞} *η* = 1. This means that (see **Fig. S2** for details). The optimum distance between CRMs and promoter, *L*_{opt}, at which *η* is a maximum, can be obtained by solving *dη*/*dL* = 0 for *L* at given *a* and *X*_{0}. Explicitly, one can write this as follows.

This has a trivial solution, *L* = *X*_{0}. Discarding it, *L*_{opt} can be obtained by numerically solving the following equation for *L* at given *a* and *X*_{0}.

## REFERENCES

- [1].