Stochastic motion and transcriptional dynamics of pairs of distal DNA loci on a compacted chromosome

Chromosomes in the eukaryotic nucleus are highly compacted. However, for many functional processes, including transcription initiation, the 3D pair-wise motion of distal chromosomal elements, such as enhancers and promoters, is essential and necessitates dynamic fluidity. Therefore, the interplay of chromosome organization and dynamics is crucial for gene regulation. Here, we use a live imaging assay to simultaneously measure the positions of pairs of enhancers and promoters and their transcriptional output in the developing fly embryo while systematically varying the genomic separation between these two DNA loci. Our analysis reveals a combination of a compact globular organization and fast subdiffusive dynamics. These combined features cause an anomalous scaling of polymer relaxation times with genomic separation and lead to long-ranged correlations compared to existing polymer models. This scaling implies that encounter times of DNA loci are much less dependent on genomic separation than predicted by existing polymer models, with potentially significant consequences for eukaryotic gene expression.


.1 Generation of the distance series
The endogenous eve locus was tagged with MS2 stem loops. First, an attP site was integrated at the first intron of eve using CRISPR-mediated homology-directed repair [1]. The two Cas9 cutting sites are marked in Fig. S1a. Second, an attB-MS2 plasmid was used to deliver MS2 into the attP site. A genomic source of phiC31 integrase (BDSC #34770) was used for the second injection. The synthetic eve-MS2 allele carries 10.6 kb between the two Cas9 cutting sites.
A series of attB landing site lines were established by mobilizing the P-element construct P-2xattB-mW from a CyO balancer to the chromosome containing the eve-MS2 allele. A ∆2-3 P-transposes line (BDSC #3629) was used for the transposition. Offspring displaying both Cy+ and w+ were subject to inverse PCR to map the location of the attB insertions. attB insertions at -3321.9, -589.3, +47.5, +77.5 and +179.3 kb from the eve promoter in cis are used in this study. Furthermore, in order to avoid the insertion biases of P-element, two MiMIC lines (MI02916 and MI00239, [2]) were recombined with the eve-MS2 chromosome to yield the -75.9 and -186.8 kb lines. The exact insertion points of these lines and their conversion to distances between the blue and green labelled spots are listed in Table S1.
Finally, the PP7 reporter transgenes (parS-homie-evePr-PP7 or parS-lambda-evePr-PP7) were integrated into the landing sites described above through recombination-mediated cassette exchange using BDSC #34770 as the integrase source. Transgenic flies with insertion orientations shown in Fig. S1a were obtained and used for the imaging experiments.
Fluorescent protein tagged MCP, PCP and ParB lines were made as previously described [3]. For maternal MCP::3xmTagBFP2 and PCP::3xmKate2, a genomic landing site at 38F1 was used. For maternal ParB::eGFP, a landing site at 89B8 was used. All microinjections were performed as described previously [4].  Figure S1: Map of the genomic constructs. The transcription start site (TSS) of eve (2R:9979319), the center of the endogenous homie element (2R:9988933) and the two Cas9 cutting sites (2R:9979605 and 2R:9980606, marked by stars) used for MS2 knock-in are labelled with dashed arrows. Each evePr-PP7 reporter line is named after the distance between its insertion site and the eve TSS, with a sign indicating upstream (-) or downstream (+) to the eve locus. Note that the new eve allele is larger than the endogenous one due to the presence of transgenic selection markers (GFP and RFP). The MS2-parS distance can be calculated from the distance shown below the map. Genomic distances and size of the sequence elements are drawn in scale.

Microscopy
Virgins carrying three fluorescent protein fusions (yw; MCP::3xmTagBFP2/PCP::3xmKate2; ParB::eGFP/+) were crossed with males containing the eve-MS2 allele and the parS-evePr-PP7 reporter transgenes. Embryos from the crosses above were dechorionated manually and mounted as described in [5].  Table S1: Details of the distance series. All breaking point coordinates are from the Dm5 reference. Throughout the text, we cite the MS2-parS distances, and refer to their rounded values in legends for simplicity.
A Leica SP5 confocal microscope with an oil immersion 63x NA 1.44 objective was used in all imaging experiments. Three laser lines at 405 nm, 488 nm and 591 nm were used to excite mTagBFP2, eGFP and mKate2, respectively. For each z slice, the emission channels for 405 nm and 591 nm were acquired simultaneously, followed by the 488 nm channel. Power measured at the objective for the 405 nm, 488 nm and 591 nm lines was 0.4 µW, 1.1 µW and 0.5 µW, respectively. Three HyD detectors in photon counting mode were used to collect fluorescence emission spectra.
The voxel size for all images is 107x107x334 nm 3 (x/y/z). The stack interval for all time-lapse videos was 28 s, except for the ones shown in Fig. 3c, 4b, and 4d, where the stack interval is 5.4 s. For the data with a 28 s frame rate, an 8 µm z-stack covering the whole nucleus along the apical-basal direction is acquired during the 28 s time interval. Images were taken at 1,024 x 256 x 25 voxels and focused on the posterior half of the embryo, encompassing eve stripes 3-7. Embryos were timed by their exit from mitosis 13 [6]. All embryos were imaged from 20 ± 2 min to 55 ± 2 min in cell cycle 14. Based on a localization control construct, the measurement errors in spot localization in this setup are 180 ± 6 nm (mean ± s.e.m.) [3]).

Nucleus and DNA locus segmentation and tracking
Segmentation and tracking of cell nuclei and DNA loci was done as described before [3]. Briefly, we computed the difference between the blue and red channels (since NLS::MCP::3x mTagBFP2 is enriched in the nuclear compartment while ParB-eGFP is enriched in the cytoplasm) by subtracting the maximum z-projection of the green channel from the blue channel, followed by Gaussian blurring (σ = 5) and binarization (local Otsu's threshold at 5 x 5 µm 2 [7]). To account for the motion of the whole embryo and the nuclei during imaging, we inferred the nuclear movement by minimizing the cross-correlation between nuclear masks of two consecutive frames. All nuclear segmentation and tracking results were scrutinized manually.
For DNA locus segmentation, we first identify candidate loci using sharpened raw image stacks from each of the three channels using a 3D bandpass filter (size 11 x 11 x 7 pixels), derived by subtracting a uniform filter from a Gaussian kernel (σ = (1, 1, 0.6) pixel). All local maxima of this filtered image are considered as candidate loci. At each local maximum, we constructed cylinder mask with diameter of 13 pixels (1.4 µm) and a height of 7 pixels (2.3 µm). We then set an intensity threshold such that the maximum number of candidate loci in the nucleus at each time point was less than 20, followed by filtering the set of candidate locus using information on nuclear lineage, spot tracking and the relative location of spot pairs.
For DNA locus tracking, we calculated the sub-pixel centroid of each candidate locus, only considering candidate loci located in the corresponding nuclear region obtained from the nuclear segmentation results. Spot tracking was then performed in three steps: a pre-tracking step, a gap-filling step and a Bayesian filtering step [3].

Chromatic aberration correction
Imaging the positions of the eve-locus and the ectopic reporter with different wavelengths introduces chromatic aberrations, i.e. an instrumental error. This appears as a systematic error in our data (constant position displacements), which can vary across the field of view. Since positions of imaged loci are random, we assume that on average they have to be centered at zero, and any systematic errors are caused by the chromatic aberrations between the two channels. We perform a correction of this systematic error using data-driven and internally controlled (i.e., for the entire data set on any given day of data collection) calibration [3]. We consider the entire field of view and make a correction at each position in the 3D field of view.
We assume that the mean of the blue-green vector along each dimension is zero. For example, the probability that a blue spot appears above the associated green spot should be the same as the probability that it is below the associated spot. By applying linear regression to each dimension of displacements (x, y, and z) and extracting a 3x3 matrix, effectively the slope of this bias across the field of view in each dimension, the spot coordinates from a given color channel are scaled by the matrix to obtain the correction vector. This correction vector is the difference in the scaling of two channels, using one as a reference for the other to be matched to. Subtracting the correction vector from the original spot gives the corrected spot centroid, which can then be used for displacement calculation. This procedure effectively rescales the coordinates of all spots as to readjust the aberration slope across the fields of view to zero.
The assumptions underlying this approach were tested explicitly using control experiments with colocalized fluorescent proteins and three-color TetraSpec beads [3] (Fig. S2). This approach also allowed estimation of the residual measurement error in spot localization as being 180 ± 6 nm, in line with the errors inferred from our MSD analysis, see below (section 4.1).
Chromatic aberration correction was performed on a weekly basis, using all the embryos of the same genotype imaged over the week. An example of the correction is given in Fig. S2. We used this datadriven and internally controlled calibration scheme to estimate the blue-green distance for all genomic constructs, which we report in the main text.   Figure S3: Tracking of cell nuclei. a, Trajectories of nuclei (black), enhancer (blue) and promoter (green) within a single embryo (s=149 kb) in the lab frame of reference. b, Motion corrected DNA locus trajectories. c, Single-locus MSD M 1 (t) (Eq. (S25)) of the enhancer (blue), promoter (green), their center of mass (r 1 (t) + r 2 (t))/2 (red), and the nucleus trajectories (black). Approximate scaling of the nucleus MSD scaling is indicated. d, Velocity cross-correlation of the motion corrected locus trajectory and the nucleus trajectory, indicating no correlation beyond the first time point t = ∆t, as expected for purely random measurement errors in the nucleus tracking (section 4.3).

Nucleus motion correction
Following tracking of the DNA loci, two further operations are performed: (i) we corrected for chromatic aberration to gain access to the dynamics of the 3D distance vector R ij ; (ii) we performed motion correction of the trajectories to obtain the absolute motion of DNA loci within the nucleus frame of reference.
To correct chromatic aberration, we perform a data-driven, internally controlled calibration; and approach that was tested extensively in previous work [3]. Briefly, we pooled raw instantaneous locuspair distances across time, nuclei, and embryos, and analyzed the raw distances as a function of the locus-pair positions in the image field of view. To correct aberration, we applied a multivariate normal regression model to infer the correction matrix that allows the calculation of calibrated distances (see section above), which were used throughout further analysis.
To perform motion correction, we subtract the nucleus trajectories from the DNA locus trajectories (Fig. S3a,b). In many experimental systems, this is challenging as nuclei often undergo complex combinations of rotation, shape changes, and translation. During the final phase of the syncytial blastoderm stage of the developing Drosophila embryo, nuclei undergo very little relative motion, do not rotate, and simply show affine global translation, which we can track as described above. As the same nucleus trajectory is subtracted from both DNA loci, measurement errors in the nucleus tracking could introduce spurious correlations in the observed relative motion of the two DNA loci. To exclude systematic measurement errors in the nucleus tracking, we calculate the velocity cross-correlation of the motion corrected locus trajectory and the nucleus trajectory (Fig. S3d), which is consistent with the expectation for time-uncorrelated measurement errors in nucleus tracking (section 4.3). Furthermore, we find no traces of spurious motion in the trajectories of the DNA locus center of mass, as its MSD follows the same subdiffusive exponent as that of individual loci, which should be contrasted with the superdiffusive exponent of the nucleus movement (Fig. S3c). Together, these tests demonstrate that nucleus tracking and motion correction only introduces random time-uncorrelated measurement errors in the locus trajectories, which we account for using error-correction in the velocity cross-correlations (section 4.3).

Hidden Markov Model inference
Based on the architecture of our genetic system comprising the enhancer and promoter, which are each flanked by a homie insulator element, the system can be in one of three states: (i) A state where the two homie elements are unbound, termed O off . In this state, we expect large inter-locus distances. (ii) A topologically distinct state where the two homie elements are bound, but where transcription is inactive, termed P off . In this state, we expect small inter-locus distances, and no signal in the red channel. (iii) A state where the two homie elements are bound and transcription is active, termed P on . In this state, we expect small inter-locus distances, and the red channel to be on. The existence of these three states was demonstrated previously using various control experiments with individual elements of the genetic cassette missing [3]. To verify the assumption that active transcription corresponds to smaller physical distances in all sampled genomic separations, we plot interlocus distances depending on the intensity of the red channel (Fig. S4). This confirms that transcriptionally active states are associated with smaller physical distances, consistent with previous observations [3]. b full data set data with red < 0.25 data with red > 0.25 a Figure S4: Interlocus distances as a function of transcriptional state. a, Scatter plots of the entire data set for each genomic separations, with points colored by whether the red signal is on (red intensity > 0.25; red), or off (red intensity < 0.25; blue). b, Histograms of the interlocus distances for the entire data set (grey), points where the red signal is on (red intensity > 0.25; red), or off (red intensity < 0.25; blue).
Together, these arguments therefore suggest an underlying structure of state transitions as shown in Fig. S5a. Transitions between O and P toggle between two distinct topological states; transitions be-tween OFF and ON toggle between two transcriptionally distinct states. Importantly, the details of the state inference performed here does not influence our later analyses on the polymer configurations or dynamics, which are based directly on the observed distance trajectories, which do not sensitively depend on the inferred underlying states (see Supplementary Section 2.6).
To infer the particular state of the system at each moment in time, we modeled the data (inter-locus distance and intensity time-series) as a Hidden Markov Model (HMM) [8] calibrated using a Markov Chain Monte Carlo (MCMC) sampler [9]. The general idea underlying 'classical' inference is to maximize the probability of the data under some model, namely to find the parameters Θ that maximize the likelihood of the data D. In this manuscript we adopted a Bayesian approach, simultaneously estimating the state identity and the probability of the state transition rates given the observed data, i.e., the joint posterior distribution P(Θ|D), using Bayes' rule: where P(Θ) is the prior that encodes for prior knowledge about the parameter values, which we take to be log-uniform here since these are real positive and have some unknown scale [10].
The likelihood of the data P(D|Θ) is expressed in terms of the likelihood of the individual nuclei timeseries, which are independent: where D i = {(R 0 , R 1 , ...), (I 0 , I 1 , ...)} corresponds to the time series of the inter-locus distance R and the transcriptional fluorescence intensity I for a single nucleus. Assuming a HMM, the likelihood for a single nucleus takes the following form: where λ t is the hidden state of the system at time t (either O off , P off , or P on , Fig. S5a), P(R t , I t |λ t ) the "emission" probability, P(λ t |λ t−1 ) the transition probability, P(λ 0 ) the initial state probability, and the sum is performed over all possible state configurations Λ ≡ (λ 0 , λ 1 , ...). The likelihood above is computed efficiently using the forward algorithm [8].
For our specific case, we modeled the emission probability that relates the data to the hidden states, as two independent probability distributions, i.e. P(R, I|λ) = P(R|λ)P(I|λ). The first is the probability distribution P(R|λ) for the inter-locus distance R ij given each state λ. We find that the Maxwell-Boltzmann distribution provides a good fit to the experimental distributions of inter-locus distances (Fig. S5e). We therefore assume this general functional form for the distance distributions: This distribution is quantified by a single parameter σ R , which is the average distance, i.e. R ij = σ R . This is the probability distribution of end-to-end distances of an ideal chain polymer, with σ 2 R = Na 2 , where N is the number of monomers and a is the monomer length [11]. We choose this general functional form here as it simplifies the inference (single parameter) and also provides a good fit to our empirical observations. The second distribution P(I|λ) models the fluorescence I of the red channel (as a measure of transcriptional activity) conditioned on whether the state λ is transcriptionally active (P on ) or inactive (O off or P off ). To capture the broad distribution of fluorescence intensities in the transcriptionally active state (P on ), we employ a Rayleigh distribution as an effective model to approximate the expected overdispersed Poisson noise due to transcriptional bursts and the additive Gaussian measurement noise : Importantly, this distribution remains simple as it is quantified by a single parameter σ I , and turns out to be flexible enough to also account for the distribution of background fluorescence intensities in the transcriptionally inactive state (O off or P off ). Indeed, the above distribution captures the empirical distribution of intensities decently well (Fig. S5f), and the specific choice we made here did not impact significantly our state calling.
Next, the transition between the three states (O off , P off , and P on ) are defined by the generator or Laplacian matrix of the system according to the network in Fig. S5a: The resulting transition probabilities P(λ t |λ t−1 ) in Eq. S3 are then computed as the matrix exponential of G, i.e. P(λ t |λ t−1 ) ≡ exp[G∆t], where ∆t is the sampling time interval. The initial state distribution P(λ 0 ) is chosen as the steady state distribution of the system, which is obtained by solving the linear system GP = 0.
Lastly, Θ represents the entire set of parameters governing the observed distances, the observed intensities and the state transition rates with a total of six physical scales and five kinetic parameters that need to be inferred: Due to the underlying symmetries of our model, the eleven parameters above are not fully identifiable (structural identifiability). Indeed, in absence of structure imposed on the σ R and σ I , the states O off , P off and P on could be permuted and the inference would be degenerate. However, as we mentioned in the beginning of this section (see Fig. S4), we have some prior knowledge about the structure of the system. From the data, it is clear that Thus, to break the symmetries, we decided to enforce σ R (P off ) = σ R (P on ) ≡ σ R (P), effectively reducing the number of parameters by one. The same treatment could be performed on the σ I 's (i.e., enforcing σ I (O off ) = σ I (P off )), further reducing the number of parameters. Instead, we decided to keep these three parameters free as a control and monitor the relative scale of the σ I 's.
Inferring Θ for each clone individually (i.e., for different genomic separation s) is a challenging task, given the large number of parameters (ten parameters) and the risk of overfitting. This is especially true for the clones for which we have limited data (n ∼ 50 traces for s = 57, 82 and 88 kb, see Table S5). However, given our system, most of the parameters could be shared among the clones, since genomic separation is unlikely to affect all the σ I 's, σ R (P), and the rates { f 2 , b 1 , b 2 , b 3 } related to the P off ↔ P on transitions and the homie-homie unbinding transition P off → O off . To drastically simplify our inference problem, we thus initially made the assumption that only f 1 and σ R (O off ) are clone-specific (and thus , σ I (P on )} related to the observation distributions. c, Posterior distributions P(Θ|D) of the transition rates that are shared among the different clones, i.e., parameters that are assumed independent of genomic separation. d, Posterior distributions P(Θ|D) of the parameters that are specific to each clone, i.e., the distance σ R (O off ) and the rate f 1 . e, Comparison between the calibrated observation distributions (dotted lines, Eq. (S4) using the inferred σ R (O off ) and σ R (P)) and the underlying state based empirical distributions for distances. f, Comparison between the calibrated observation distributions (dotted lines, Eq. (S5) using the inferred σ I (O off ), σ I (P off ) and σ I (P on )) and the underlying state based empirical distributions for intensities. depend on genomic separation s), whereas all the other parameters are shared among the clones. Such a structure defines a hierarchical model characterized by the following likelihood function: where D = {D(s 1 ), ..., D(s N s )} is the full data set made of all the genomic separation s j , P(D(s j )|Θ) is the likelihood of the data for a specific s j (see Eq. S2), , f 1 } are the local parameters specific to each s j . Notably, this assumed hierarchical structure leads to a massive reduction in number of parameters, namely #Θ = 8 + 2 · N s instead of #Θ = 10 · N s (with N s = 6, #Θ = 20 versus #Θ = 60 respectively).
We first estimated the joint posterior distribution P(Θ H |D) of our hierarchical model (S8). The posterior P(Θ H |D) was sampled using a MCMC sampler [9], which enables estimation of the marginal posterior distribution for each parameter. All the marginal posterior distributions are shown in Fig. S5b-d, and the resulting parameter estimates (mean and standard deviation) can be found in Table S2 and S3. Interestingly, the posterior of the rate b 3 looks like the prior (log-unfiorm) and the mean rate b 3 is much smaller than b 2 . This clearly indicates that the contribution of b 3 to the model dynamics is negligible and 0.094 ± 0.006 0.034 ± 0.003 0.063 ± 0.005 0.002 ± 0.002 Leveraging the inferred parameters of the hierarchical model, we then reconstructed the states underlying the inter-locus distance and transcriptional activity of individual nulcei using a Viterbi algorithm and posterior decoding [8]. These algorithms take as input the time series of the distances R and red fluorescence values I of a single nucleus. The Viterbi algorithm finds a single state trajectory Λ = (λ 0 , λ 1 , ...) that maximizes P(Λ|D i , Θ), namely the most likely trajectory given the data and the inferred parameters. Alternatively, we also used posterior decoding that enables computation of the state marginal distribution P(λ t |D i , Θ) at time t given the data and the inferred parameters. A single state trajectory Λ = (λ 0 , λ 1 , ...) can then be computed from the state marginal distribution as λ t = argmax P(λ t |D i , Θ). It turns out that for our system, both approaches, either the Viterbi or Posteriro deconding, predict essentially the same underlying latent state trajectories (see example in Using the state inference, we then checked whether the emission probabilities P(R|λ) and P(I|λ) (Eq. S4 and S5) calibrated with the inferred parameters (Fig. S5e,f). Overall, the agreement is excellent supporting the validity of our inference. Lastly, to test our assumption behind our hierarchical model, that most parameters  are shared, we perform inference for each genomic separation separately. To do so, we fixed σ I (O off ), σ I (P off ) and σ I (P on ) to the values of the calibrated hierarchical model (Table S2) and we set b 3 = 0 as this parameter appeared unnecessary (see paragraph above). We then re-estimated f 1 , for each clone individually and these can be found in Table S4. Overall, the resulting parameter values are very close to the estimates of the hierarchical model (see also section 2.5, and Fig. S10), thus supporting our key assumption.

Validating the HMM using simulated data
To validate our reconstruction of the states and the inference of the underlying transition rates, we tested our HMM approach on simulated data. We aimed to generate synthetic data sets that mimic real data, namely simulated data that satisfy the same latent state transition with realistic rates We chose to test our approach on three different genomic separations; short (s = 58 kb), medium (s = 149 kb) and long (s = 595 kb). For each s, we generated 300 time-series of blue-green distance R and transcriptional intensity I , whose duration equalled 40 min and sampling time ∆t 30 sec as in real data. Time-series for the blue-green distance were generated using simulations of a Rouse polymer model with 1000 beads. We tracked the position of two beads equidistant from the central bead of the polymer, with a separation s/2 from the center such that the "genomic distance" was given by s = 58, 149, 595 beads. We then rescaled time and length-scales such that the relaxation time of the polymer τ = s 2 and the average inter-locus distance matched the experimental values for each value of s. To capture the dependence of the transition from the O off to the P off state on the dynamics of the 3D distance, we defined a simple implementation of distance-dependent homie-homie binding. Specifically, we defined a success rate of homie-homie binding r, such that the two ends of the polymer bind with probability r when in close proximity (defined by a threshold on the 3D distance of 400 nm). The parameter r was assumed constant for all simulations and set to 0.1% to approximately reproduce the f 1 rate estimated from real data. Note that this parameter r cannot be compared to the biological success rate of real homie binding, as it depends on the time-scale at which the system is sampled. Once the two homie elements are bound, we include an additional linear spring linking the two loci, with spring constant k link = 0.02k, where k = 1 is the spring constant of the backbone of the polymer. The parameter k link was chosen to generate distances R whose mean R 400 nm in the P off and P on state. In addition, once the P off state is reached, we ran a Gillespie algorithm to simulate the transition from the P off to P on state and from the P off back to the O off state [12]. The Gillespie simulation was performed according to the state transition in Fig. S5a, with rates set by the global parameters determined from real data (Table  S2 and b 3 = 0).
Next, to simulate the intensities I in the O off and P off state (i.e., the background), we generate uncorrelated samples from a Rayleigh distribution with parameter σ I (O off ) and σ I (P off ) set from Table S2, respectively. Lastly, to simulate the transcriptional intensities in the P on state, we sampled in log-space from a Gaussian processĨ(t) with meanμ, varianceσ 2 and covariance function K(t, t ). We chose K(t, t ) = exp (−|t − t |/τ e ) (Ornstein?Uhlenbeck) and we set the correlation time to τ e = 2 min, which approximates the temporal correlation in the transcriptional signal. The meanμ and the varianceσ 2 were defined such that the intensity time-series in real-space I(t) = exp(Ĩ(t)) is log-normal distributed with mean and variance equal to a Rayleigh distribution of parameter σ I (P on ) determined from Table  S2. Namely,μ andσ 2 are given by: All the simulated time-series were assumed to start in the O off state. As an example, a pair of simulated distance R and intensity I time series is shown in Fig. S7c.
With these three simulated data sets in hand (s = 58, 149 and s = 595 kb), we first aimed to compare the inferred parameters to the input ones used to generate the data. We performed the HMM inference for each data set separately, by sampling the joint posterior distribution P(Θ|D) with MCMC as described in section 2.1. Thus, for each data set, we estimated the ten free parameters all at once, namely The resulting marginal posterior distributions for s = 149 kb are displayed in Fig. S7a with the input (true) parameters highlighted by a vertical dashed line. For each simulated data sets, we estimated the relative error on all parameters. Overall, the absolute relative error for the rates are 30% on average (Fig. S7b top), while for the observation parameters these are 7% on average (Fig. S7b bottom). Thus, even when estimating all the parameters at once (hardest case), the errors remain reasonable and clearly allow us to distinguish different scales among the parameters.
Having checked our ability to infer parameters correctly, we then tested whether our state inference was accurate. We thus performed posterior decoding using our trained HMM on the three simulated data sets, as described in section 2.1. The whole state reconstruction for s = 149 kb is shown in Fig. S7d,e; the inferred state trajectories are visually hardly distinguishable from the ground truth. Assessing our reconstruction error on all simulated data sets, we recovered the correct state for more than 95% of the time points (Fig. S7f). Overall, the state inference is excellent and does not strongly depend on the underlying error on the rate parameters, since the signatures of the different states in the data are rather strong (especially for O off and P on ).

Single trajectory state analysis
Here, we demonstrate that the majority of single-cell trajectories explore the full range of interlocus distances that are sampled by the whole ensemble of trajectories in all three inferred states. From the ensemble of state trajectories (Fig. S8a), we plot the distance distributions for each single trajectory as a function of its inferred transcriptional and topological state (Fig. S8b-c). We find that in the majority of cases, the single trajectory distributions approximately match the population averaged distribution, suggesting that the single-cell trajectories explore the full range of distances of each state.
As a more stringent criterion, we perform a hypothesis test of the null hypothesis that the single-cell distribution is indistinguishable from the ensemble averaged distribution. Specifically, we use the twosided Kolmogorov-Smirnov test comparing for each trajectory the single-trajectory distribution to the population distribution. The null-hypothesis that the distributions are the same is rejected if p < 0.05. We find that for most data sets, for the majority of trajectories in all states for all genomic separations, the null-hypothesis is not rejected (Fig. S8e).

Estimation of transcriptional lifetime
To estimate the lifetime of transcriptional activity, we use the inferred state trajectories (section 2.1) to calculate the typical time the system remains in the P on state. Lifetime estimation is challenging due to the problem of censoring, referring to the fact that trajectories may start or end in the P on state, which gives only partial access to the lifetime in such a trajectory. To deal with this problem, we use the Kaplan-Meier estimator for the survival function of the transcription lifetimes [13]. The survival function S(t) gives the probability that the transcriptionally active state has not transitioned to a transcriptionally inactive state, and is defined as where p(t) is the probability distribution of transcriptional lifetimes. We use the Kaplan-Meier estimator [13,14] where N  As a second method to estimate the median lifetime is a maximum likelihood estimate under the assumption that the lifetime distribution is exponential, i.e.
where τ trans is the mean transcriptional lifetime. Then, the maximum likelihood estimate of τ trans is given by [14] τ trans = 1 and the median lifetime is given by ln(2) τ trans .
The median lifetime estimates from both methods are given in main text Fig. 2d. For the genomic separation of 595 kb, there are only 5 observed P on states out of a total of 164 trajectories, meaning that the lifetime is hard to estimate for this sample (see last panel of Fig. S9a). For 3.3 Mb, there are no observed P on states.
Furthermore, we compare these estimates to simply taking the mean of all observed lifetimes (Fig. S9b), showing that also in this case, there is no variation with genomic separation, and the typical lifetime is ≈ 10 min.  Figure S9: Lifetime-estimation using survival probability curves. a, Kaplan-Meier survival curves for each genomic separation. Shaded regions: 95% confidence intervals [14]. Grey lines: exponential fits obtained by using the maximum likelihood mean lifetime (using Eq (S13) in Eq. (S12)). b, Median lifetimes obtained by the exponential fit (triangles), or by reading off the time at which the Kaplan-Meier survival curves cross S = 0.5 (dots). Equivalent to main text Fig. 2d for the P on state. The mean of median lifetimes across genomic separations is noted above (error bars: standard deviation calculated from total variance). c, Mean lifetimes obtained by the exponential fit (triangles) and by simply taking the mean of all observed lifetimes (squares). Errorbars on exponential fit results represent 95% confidence intervals [14], and on means represent bootstrap error bars. Panels a-c, are for the P on state (red), while panels d-f show the same plots for the P off state (cyan), and panels g-i for the joined paired state when the system is either in P on or P off (magenta).

HMM parameter scaling analysis
To assess which parameters of our HMM model scale with genomic separation s, we systematically compare the parameters of our global inference using the hierarchical model to the parameters obtained by doing inference on each s separately (see also section 2.1). The hierarchical model was built on the assumption that most parameters are independent of s, and thus can be shared across clones (Eq. S8).
We first analyzed the parameters that were assumed to not scale with s (Fig. S10a), namely the distance σ R (P) and the rates { f 2 , b 1 , b 2 } related to the P off ↔ P on transitions and the homie-homie unbinding transition P off → O off . Overall, these parameters, when inferred separately for each s, remain numerically very close to the global estimate (black line) made with the hierarchical model. In addition, the power-law exponents characterizing the scaling of these parameters as function of s are close to zero (within error bars), validating our assumption that these are mostly independent of s. Next, we investigated the parameters that were assumed to scale with s (Fig. S10b), namely the open distance σ R (O off ) and the rate f 1 . When these parameters are inferred separately for each s, we again recover values that are numerically very close to the global estimate (black dots). The observed scaling of the parameters with s are very similar between the two inference approaches (hierarchical vs individual model, black and gray lines respectively). Moreover, the estimated scaling exponents are significantly different from zero and match the estimation based on data (Fig. 2).
Next, we investigated the impact of the parameter scaling on the steady state occupancy (i.e., the probability) of the three states. To calculate the occupancy, we solved analytically the steady state equation GP = 0, where G is the Laplacian matrix of the system (Eq. S6). We obtained the following expression for the probabilities as function of the rates: Using the equations above, the conditional probabilities P(P off |P) and P(P on |P) are easily calculated: Interestingly, both P(P off |P) and P(P on |P) do not depend on f 1 , and thus are not expected to vary with genomic separation s. Indeed, when computing these conditional probabilities using the parameters of the hierarchical model (Fig. S10c, black line), we found P(P off |P) = 0.65 and P(P on |P) = 0.35, which are independent on s and match the values estimated from data Fig. 2e. Similarly, when computing these probabilities using the parameters from the individual fits (Fig. S10c, color dots), the probabilities cluster around the global estimate (albeit s = 58 kb appears to be an outlier).
We then investigated the scaling of the occupancies themselves (Eq. S14). Given our parameter regime, we expect the following scaling for the probabilities: P(O off ) ∼ 1/(c + f 1 ) and P(P) ∼ P(P off ) ∼ P(p on ) ∼ f 1 , where f 1 ∼ s −1.17±0.51 (see Fig. S10b). We computed these probabilities using both the hierarchical and individual fit parameters and found scaling relationships that are very consistent with our expectation (Fig. S10d).
Next, we investigated the scaling of the lifetime (mean residence time) of the different states. In our HMM framework, the lifetimes are phase-type distributed and are easily computed from the state tran-sition matrix (Eq. S6). We found: Only T(O off ) depends on f 1 , and thus varies with s, whereas T(P off ), T(P on ) and T(P) are supposed to be mostly independent on s. We computed these lifetimes using both the hierarchical and individual fit parameters and found scaling relationships that are again very consistent with our expectation (Fig.  S10e). Indeed, T(P off ), T(P on ) and T(P) are essentially constant, while T(O off ) varies as the inverse of f 1 .
Lastly, we compared the occupancies and lifetimes predicted by the HMM parameters to the occupancies and lifetimes estimated from data based on state splitting (Fig. S10f,g). Overall, the excellent agreement between the different quantities predicted by the HMM parameters and the ones estimated from data (using state splitting) supports the validity of our inference and is indicative of the mathematical consistency of our approach. Importantly, it should be noted that such an agreement is not necessarily trivial. Indeed, the HMM model mainly acts as prior for the state inference, thus if the data possessed features that could not be accounted for by the model, discrepancies would have arisen.

Results without state splitting
The analysis of the two-locus dynamics shown in the main text is done for the O off only. To demonstrate that our results on the two-locus dynamics do not sensitively depend on the details of the state inference, we compare the statistics of the O off state to those obtained from the entire data set comprising all three states. Since the data are dominated by the O off state, the state splitting does not have a strong impact on most statistics, especially for large genomic separations. As expected, the average distances of the full data are lower than those of the O off state only, and the inferred two-locus diffusion coefficients Γ 2 are lower. However, key scaling exponents are not strongly affected, including the scaling of the inferred relaxation times τ (Fig. S11).

Analysis of no-homie data
To test the role of the homie insulator-mediated focal contacts for the transcriptional dynamics in our system, we employ a reporter construct in which the homie sequence is replaced by a λ DNA sequence of the same length. In this system, only two topological states are possible: a transcriptionally inactive configuration O off and a transcriptionally active configuration P on . To account for this in our state inference, we simplified our HMM hierarchical model by only modeling the O off ↔ P on transitions with rates f 1 and b 1 . Thus, we had only four global parameters {σ R (P on ), σ I (O off P on ), σ I (P on ), b 1 } and two local parameters {σ R (O off ), f 1 } to infer from the no-homie data. The resulting inferred parameters are displayed in Fig. S12. The state inference was then performed using this model specifically calibrated on the no-homie data.
Based on the state trajectories, we find that there is still frequent transcriptional activity for genomic separation 58 kb (8.5 ± 0.8%), but almost none (< 1% of time-points) for 149 kb (Fig. S13a). We furthermore compare the statistics of the 3D interlocus distance between the homie and no-homie constructs of the same genomic separation. Due to the different statistics of the state dynamics in the two constructs, we compare data from the O off state in both cases. We find that distance distributions largely overlap (Fig. S13b). The two-locus MSDs also exhibit a similar shape and are well fitted by the ideal chain theoretical expression, although we observe small, but not systematic, differences in the two-locus diffusion coefficient between homie and no-homie constructs (Fig. S13c).

Analysis of single-locus dynamics
The analysis of the motion of single chromosomal loci is only possible if any displacement and rotation of the cell nucleus as a whole is corrected for, as this would affect the apparent dynamics of the DNA loci. This problem does not affect relative measures based on the inter-locus distance vector R ij (t) (section 4.1). In particular correcting rotations is challenging as at least three loci need to be tracked to infer global rotations. Previous works have therefore focused solely on the statistics of the inter-locus distance [14][15][16], or, in recent work, relied on multi-locus tracking to correct global movement [17].
In our system, however, the only global mode to be corrected is an overall affine translation of the cell nuclei within the embryo, which we can correct for by nucleus tracking in the xy-directions (section 1.3). Nucleus tracking is difficult in the axial z-direction. We therefore estimate the statistics of single locus diffusion using the xy-components of the dynamics.
To characterize the motion of single loci, we calculate the single-locus mean-square-displacement (MSD) defined in each spatial dimension i = {x, y, z} as where averages are averages over time and samples. The 3D MSD is then given by Assuming that all three spatial dimension are identical, we therefore estimate M 1 (t) from the x and y components via M 1 (t) = 3 2 (M (x) . We find that the single-locus MSD is consistent across enhancer vs promoter loci, for different genomic separations, for different imaging time-intervals, and for the three inferred topological states (Fig. S14).
Additionally, we also report the velocity autocorrelation functions of single loci [18], given by where velocities are calculated over variable time-interval δ = n∆t where ∆t is the imaging timeinterval: v (δ) (t) = (r(t + δ) − r(t))/δ. We find that these correlations collapse with δ and are well described by the Rouse model prediction [18,19] (Fig. S15):

Analysis of two-locus dynamics
In this section, we provide details of how we test the theoretical predictions from section 5 using experimental data by calculation of two-locus MSDs, inference of the relaxation time (section 4.1) and collapsing the two-locus autocorrelations (section 4.2). Note before any further analysis, we correct R ij to account for chromatic aberration (section 1.4).

Two-locus MSD and relaxation time inference
As shown in main text Fig. 3, we find that the early-time scaling of the two-locus MSD exhibits behaviour close to the ideal Rouse exponent β = 1/2, and the long-time scaling of the two-locus autocorrelations is also close to the ideal Rouse expectation. Thus, we attempt to fit the complete two-locus MSD with the analytical prediction obtained for the ideal chain Rouse model [14]. We use the expression for an infinite continuous polymer; for discussion of the assumption s N, see section 5: where J = R 2 and τ = (J/Γ 2 ) 2 . Note that we refer to the diffusion coefficient obtained from the two-locus MSD as Γ 2 , while we write Γ 1 for the single locus diffusion coefficient. For any homogeneous polymer model, such as the ideal or crumpled chains, Γ 1 = Γ 2 . However, we do not make this constraint on the data, and indeed find that the fitted Γ 1 and Γ 2 differ (Fig. S14).
We find that the two-locus MSDs are well-fitted by Eq. S28, and the fitted value of J coincides with the measured asymptote R 2 (Fig. S17a). Additionally, the presence of measurement errors in the tracking affects the MSD curves, leading to an additive contribution of 4(σ 2 x + σ 2 y + σ 2 z ), reflecting that the error can be different in each spatial direction. We perform a Bayesian maximum likelihood fit of the twolocus MSD using the python-library tracklib [14] with free parameters {J, Γ 2 , σ z }. We use a log-uniform prior for these parameters since they are real positive and have some unknown scale. We do not fit measurement errors in the x and y-directions as the estimates for these errors obtained from such a 5-parameter fit are consistent with σ x = σ y = 0, and therefore not significant.
To determine the inference errors on the fitted parameters, we perform bootstrapping as described in refs. [20,21]. Briefly, from each data set of N cell trajectories {r k }, where k = 1...N, we generate N BS bootstrap realizations by randomly sampling N trajectories with replacement for each realization. To obtain the error in the parameters {J, Γ 2 , σ z }, we infer their value for each bootstrap realization by performing Bayesian MSD fitting on each realization and take the standard deviation of all obtained fit parameters as our estimate for the error in each fit parameter. The distributions of the bootstrap estimates are shown in (Fig. S17b-d), and range between 1 − 6% in the polymer parameters J, Γ 2 , with larger errors in the inferred measurement error amplitude σ z . Having obtained our estimate for the relaxation times, we compare them to the time-scales of transitioning from the O off state to the paired configuration, estimated from the HMM model: T(O off ) = f −1 1 . We find that these transition times correlate well with the relaxation time (Fig. S16a), and their absolute value is larger by a factor 10 on average (Fig. S16b). Furthermore, we demonstrate the accuracy of the relaxation time inference using ideal chain Rouse simulations. Our sampling of the two-locus MSD is limited by two key factors: the imaging time interval ∆t and the maximum trajectory length t max . We aim to infer the relaxation time scaling from a set of genomic separations with minimum and maximum relaxation times τ min and τ max , corresponding to the shortest and longest genomic separation. The inference will be limited if ∆t > τ min or t max < τ max . Based on the inference from experimental data, the shortest and longest genomic separation are close to these thresholds (Fig. S17a). To test the accuracy of the inference in this case, we generate simulated data with t max /τ max = 1 (as in the experimental data for s = 3.3 Mb), and vary ∆t/τ min (Fig. S18a). Experimentally, we have ∆t/τ min = 0.76 (for s = 58 kb). The larger this ratio, the more difficult the inference. However, we find that the inference performs well for ratios as high as ∆t/τ min = 3, and recovers the correct scaling of the relaxation times for various time interval to relaxation time ratios (Fig. S18b). An alternative, simplified way to obtain an estimate of the relaxation time is by fitting the early time twolocus MSD and measuring the intercept with the measured value of 2 R 2 (Fig. S19a). This intercept is then given by τ ∼ ( R 2 /Γ 2 ) 1/β . We test this simplified approach both with β ≈ 0.5 as measured (1.1 ± 0.03) × 10 6 (4.7 ± 0.1) × 10 4 3328 10 179 340 ± 10 (2.1 ± 0.1) × 10 6 (4.35 ± 0.09) × 10 4  from the single-locus MSD. We find that this simplified approach recovers the strongly reduced scaling exponent compared to the ideal Rouse expectation of γ = 2 (Fig. S19b).

a b
Figure S19

Two-locus autocorrelation collapse
We make use of the predicted collapse of the two-locus auto-correlation (ACF) (Eq. (S57)) to test the quality of the collapse for the relaxation time exponents for various polymer models and compare them to the anomalous scaling collapse S20. As expected, we find that the correlations do not collapse for the ideal and crumpled chain relaxation time scaling exponents, but they do exhibit an approximate collapse when using the measured relaxation time scaling γ = 0.7.

Velocity cross-correlations
Error velocities, defined by C These correlations are challenging to measure as the absolute motion is affected by global motion of the cell nuclei. Since the syncitial nuclei in the early Drosophila embryo are relatively stable and only exhibit slow affine translations, we correct for global motion in the xy-direction and estimate velocity cross-correlations for the x and y components as done for single locus MSDs (section 3). To further correct for the random localization errors in the nucleus tracking, we derive error-corrected correlation estimators. As we show in Fig. S3d, measurement errors in the nucleus tracking are uncorrelated in time. Importantly, however, even time-uncorrelated measurement errors can impact the estimated velocity cross-correlations, because the nucleus movement is subtracted from the motion of both loci, which may lead to spurious additional correlations in their apparent velocities. To overcome this issue, we derive the noise correction terms of the velocity cross-correlations, providing correlation estimators that are unbiased with respect to measurement noise.
Specifically, throughout, we consider the position of locus i in the frame of reference in the nucleus, where r i is the locus position in the lab frame and r n is the nucleus position in the lab frame. Our measurement of these positions contains measurement noise, and thuŝ where we assume temporally uncorrelated measurement errors, i.e. η i (t) = ξ(t) = 0 and η i (t)η j (t ) = δ ij δ(t − t ) and ξ(t)ξ(t ) = δ(t − t ). Here σ r is the measurement noise in locus tracking and σ n is the measurement error i the nucleus tracking. Combining these assumptions, we find and thusv (δ) Multiplying out, we can therefore find the correction terms to various correlation functions. First, we consider the correlation of the locus velocity with the nucleus velocity: This shows that under the assumption of uncorrelated noise this correlation should vanish except for τ = {0, δ}, which we verify on the experimental data, validating the assumption of time-uncorrelated tracking errors (Fig. S3d).
Similarly, in the velocity cross-correlations, most terms due to measurement noise average to zero, but the terms due to nucleus tracking errors do not, as the same random error affects both loci: Similarly, for the velocity autocorrelation, we obtain Note that the above derivation was done in 1D, for 3D all contributions simply add up, meaning that σ 2 n → σ 2 n,x + σ 2 n,y + σ 2 n,z . Importantly, these corrections show that errors scale with the time-interval δ at which velocities are calculated. Consequently, we find that the estimated velocity cross-correlations are more strongly affected at small δ. To identify how large δ should be for our estimate to be robust against measurement noise, we plot the cross-correlation intercepts C vv (0)/C v (0) as a function of δ for different values of the estimated measurement error. Based on the fitted measurement errors from the two-locus MSDs (section 4.1), we expect σ ≈ 200µm. We find that for δ in the range of 5 to 60 seconds, the estimated cross-correlations are strongly affected by realistic measurement error amplitudes (Fig. S21a). In contrast for δ > 60 s, the correlations are largely insensitive to realistic measurement errors for all genomic separations (Fig. S21b). The larger δ, however, the smaller the sample size of available timewindows in a finite trajectory, resulting in larger random errors. As a tradeoff, we therefore focus our analysis of velocity cross-correlations on δ = 90 s, but additionally show variation with δ in Fig. S22.

Theory prediction of velocity cross-correlations:
The velocity cross-correlation of two loci in the ideal chain Rouse model have been calculated analytically in ref. [19]. Importantly, in the limit s N, where N is the total polymer length, and δ τ N , where τ N is the relaxation time of the whole polymer, the cross-correlations of two loci separated by genomic separation s are completely determined by the dimensionless ratio δ/τ, where τ is the relaxation time of the segment s. The correlations are calculated using the tabulated values in ref. [19] (using viscoelastic exponent α = 1).
Thus, we can predict the velocity cross-correlations expected based on the inferred relaxation times without any free fitting parameters. We predict the correlations for varying s based on the scaling of Here, we additionally demonstrate the ideal chain prediction for varying A, showing that the prediction fails for any possible value of A (Fig. S22a).
To predict the correlations expected based on the anomalous scaling that we find in experiments, we use the measured value of A = 1.89 s kb −γ and γ = 0.7. Thus, both parameters are fixed based on the experimental data with no additional fitting, giving the blue line in Fig. S22a. Based on the arguments in the previous section, suggesting that larger time-intervals δ are less affected by measurement errors, we use δ = 300 s in the main text figure. Here, we additionally show the correlations for varying values of δ (Fig. S22b,d). The functional dependence of the correlations on δ is shown in Fig. S22c. We find that the correlations are generally well predicted by the theory curves that are completely constrained by the inferred relaxation time, with no further fitting involved.

Two-locus dynamics: theory
In this section, we provide a synthesis of the literature on the scaling exponents measured in this paper, and explain the theoretical arguments underlying our analysis and interpretation two-locus dynamics.

Two-locus MSD:
Here, we provide details of how we quantify the joint dynamics of two coupled DNA loci using the two-locus MSD: i.e. the MSD of the 3D interlocus distance vector R ij (t) = r 1 (t) − r 2 (t), where r 1,2 are the 3D positions of the two loci.
The two-locus MSD gives insight into the coupled dynamics of the two loci, with two simple limits: at small times t τ, we expect the two-locus MSD to simply reflect the independent diffusion of the two loci, while at large times t τ, the MSD approaches a plateau given by the equilibrium size of the chain. Here, τ is the relaxation time which quantifies the cross-over between the two regimes.
In the large-time regime, the chain segment connecting the two loci has fully relaxed (under the assumptions that the system reaches a steady state), meaning that the two-locus MSD is given by the equilibrated size of the chain For a fractal polymer chain configuration, the average 3D RMS-distance R 2 scales with the genomic separation s: where the static exponent ν = 1/d is determined by the fractal dimension d of the polymer. The fractal dimension is d = 2 for an ideal chain polymer, and d = 3 for a crumpled chain.
In the small-time regime, the chain segment connecting the two loci has not had sufficient time to relax, and the two loci therefore move independently, meaning that the two-locus MSD is given by the sum of the single-locus MSDs: Importantly, the dynamic exponent β can be related to the static exponent ν, and thus the fractal dimension d under specific assumptions for the dynamics of the polymer, which we detail below. To avoid confusion, we try to be specific about which assumptions relate to the statics and which to the dynamics. For statics, we use the terms "ideal" and "crumpled chain statistics" for fractal dimensions d = 2 and d = 3, respectively. For the dynamics of such ideal or crumpled chains, we use the term "generalized Rouse assumption" to refer to the assumption of independent viscous friction on each monomer, consistent with the terminology in [22].
Generalized Rouse approach: Within the framework of a generalized Rouse approach, β can be related to the static properties for arbitrary fractal dimension d [22][23][24]: This is derived as follows: at a time-lag t, due to stress propagation, a region of spatial size ∼ t β/2 behaves effectively as a single monomer. Then, the key assumption is the so-called 'free-draining' limit, i.e. only local friction on every monomer and no hydrodynamic interactions. Thus, the effective diffusion coefficient D of this region then scales with the number of monomers contained within the region, which is determined by the fractal dimension: D ∼ −d . Therefore, the MSD of the region is Dt ∼ t 1−βd/2 , which yields Eq. (S45). Eq. (S45) then yields β = 1/2 for d = 2 and β = 2/5 for the crumpled chain (d = 3).
Zimm model: The Zimm model describes polymers that have a hydrodynamic self-interaction [11], meaning that the monomer beads do not only interact with their nearest neighbours through the springs linking them, but through long-ranged hydrodynamic interactions. Within the Zimm model, the dynamic exponent is independent of the fractal packing dimension d and is given by β = 2/3.

Relaxation time scaling:
To quantify the time-scale on which the cross-over between the two regimes (Eq. (S42) and Eq. (S44)) occurs, we focus on the relaxation time τ. The relaxation time can be estimated as the time taken for the two loci to diffuse their typical distance of separation [22]. In terms of the two-locus MSD, this time-scale is therefore given by the intersection of the diffusive part (Eq. (S44)) and the long-time plateau (Eq. (S42)): Based on this relationship, we can infer the relaxation times from the experimental two-locus MSD curves, as shown in detail in section 4.1. Furthermore, Eq. (S46) predicts the scaling of the relaxation time with genomic separation for different fractal dimensions for both the generalized Rouse and the Zimm models: We therefore obtain the classical relaxation time scaling for the ideal chain Rouse model: For a crumpled chain described by the generalized Rouse model, both the diffusion exponent and the packing exponent change, and we therefore obtain Thus, the relaxation time exhibits a weaker scaling with genomic separation in the crumpled chain model (exponent ≈ 1.66) than in the ideal chain model (exponent 2). These two exponents are shown as the theory predictions in main text Fig. 4.
We can make the same arguments based on the exponents of the Zimm model for ideal and crumpled chains. Thus, for the ideal chain Zimm model, For a crumpled chain described by the Zimm model, we obtain τ 2/3 ∼ s 2/3 → τ ∼ s 1 (S51) These various results are summarised in Table S6.  Two-locus autocorrelation function: Besides the two-locus MSD, a second quantity that quantifies the joint dynamics of two loci based on the interlocus distance vector is the two-locus autocorrelation, given by The quantities M 2 and C 2 contain the same information, since they are directly related to each other through the relationship [14] M 2 (t) = 2(C 2 (0) − C 2 (t)) (S53) However, the function C 2 gives access to the scaling exponents that characterize the approach to equilibration in the long time limit t τ, which are 'washed out' in the cross-over to the plateau in M 2 .
Specifically, for C 2 , the short time behaviour (t τ) is determined by the chain size: The asymptotic power-law behaviour of the two-locus correlation function in the limit t τ has been derived based within the generalized Rouse framework [22], predicting that it decays with a power-law whose exponents depend on the fractal dimension: which gives λ = 1/2 for ideal and λ = 4/5 for crumpled chains. Note that this derivation additionally assumes s N, where N is the total length of the chromosome. In our system (Drosophila chromosome 2L), N = 25 Mb, significantly larger than the typical genomic separations we probe (58 kb -3.3 Mb). Furthermore, we do not see any signatures of whole chain relaxation in the single-locus MSDs (Fig. S14), suggesting this assumption is valid. Thus, the two-locus autocorrelation can be used to quantify the approach to equilibration of the system, which is separate to the small-time scaling of the two-locus MSD, quantified by exponent β, which is determined by the independent diffusion of the loci.
Finally, we can use the two-locus autocorrelation as an additional way to infer the relaxation times by collapsing the autocorrelations onto a single master curve. Specifically, we expect the correlations to collapse onto the mastercurve [22] C 2 (t) ∼ s −2/d C 2 (ts −γ ) where γ is the exponent of the relaxation time, τ ∼ s γ . Thus, by quantifying the collapse of C 2 for varying s, we can identify the scaling exponent γ (see section 4.2).

Comparison to mESC data
To compare our quantitative results to previous measurements of live imaging of pairs of DNA loci, we make a side-by-side comparison of our data set with that in ref. [14], where CTCF sites in the Fbn2 locus imaged in mouse embryonic stem cells (mESCs) were tagged with a genomic separation of 515 kb between fluorescent tags (C36 line). We compare our closes genomic separation to this data (595 kb), and find that both average distances and diffusive time-scales are much higher in our system. Specifically, while the average distance for mESCs is R ≈ 0.35µm, we find R ≈ 1.05µm, a factor 3 larger. Furthermore, the two-locus diffusivity Γ 2 = 0.0024µm 2 s −1/2 for mESCs, and Γ 2 = 0.045µm 2 s −1/2 in our system, almost 20-fold larger. As a result, despite the larger physical distance between the two loci, the relaxation time is almost an order of magnitude shorter than observed in ref. [14]. a b Figure S23: Two-locus analysis compared to data from ref. [14]. a, Probability distributions of the inter-locus distances R ij . Blue: our data for s = 595 kb (all states), orange: data of CTCF sites in the Fbn2 locus imaged in mouse embryonic stem cells (mESCs), with a genomic separation of 515 kb (C36 wildtype line) [14].

Data and materials availability
Code for this paper is available at https://github.com/dbrueckner/DynamicChromosome. The raw trajectory data is available at: https://doi.org/10.5281/zenodo.7616203. Additional code and software used in this paper are listed in Table S7.

Movie description
Supplementary Movies S1-S7 Three-color live imaging of two loci on the second Drosophila chromosome separated by increasing genomic distances of {57.8, 81.7, 87.5, 148.7, 189.6, 595.1, 3327.7} kb. The embryo is 2 h old. Movie starts from 27 min in cell cycle 14 and ends at 55 min. Blue spots mark eve-MS2 transcription as well as the physical position of the eve locus and eve enhancers. Green spots are from the parS sequence, which marks the location of the reporter construct. Red spots mark transcriptional activity from the PP7 reporter. Scale bar: 5 µm.