## Abstract

During cell division, the duplication of the genome starts at multiple positions called replication origins. Origin firing requires the interaction of rate-limiting factors with potential origins during the S(ynthesis)-phase of the cell cycle. Origins fire as synchronous clusters, a behaviour that is proposed to be regulated by the intra-S checkpoint. By modelling either the unchallenged or the checkpoint-inhibited replication pattern of single DNA molecules from *Xenopus* sperm chromatin replicated in egg extracts, we demonstrate that the quantitative modelling of the data requires: 1) a segmentation of the genome into regions of low and high probability of origin firing; 2) that regions with high probability of origin firing escape intra-S checkpoint regulation; 3) that the intra-S checkpoint controls the firing of replication origins in regions with low probability of firing. This model implies that the intra-S checkpoint is not the main regulator of origin clustering. The minimal nature of the proposed model should allow its use to analyse data from other eukaryotic organisms.

## Introduction

Eukaryotic genomes are duplicated in a limited time during the S phase of each cell cycle. Replication starts at multiple origins that are activated (fired) at different times in S phase to establish two diverging replication forks that progress along and duplicate the DNA at fairly constant speed until they meet with converging forks originating from flanking origins (DePamphilis and Bell, 2010; Machida et al., 2005). The mechanisms that regulate the timing of origin firing remain largely unknown (Raghuraman, 2001; Heichinger et al., 2006; Eshaghi et al., 2007; Baker et al., 2012; Audit et al., 2013; Rhind and Gilbert, 2013). The core motor component of the replicative helicase, the MCM2-7 complex, is loaded on chromatin from late mitosis until the end of G1 phase as an inactive head-to-head double hexamer (DH) to form a large excess of potential origins (DePamphilis et al., 2006; Ticau et al., 2015). During S phase, only a fraction of the MCM2-7 DHs are activated to form a pair of active Cdc45-MCM2-7-GINS (CMG) helicases and establish bidirectional replisomes (DePamphilis and Bell, 2010). MCM2-7 DHs that fail to fire are inactivated by forks emanating from neighboring fired origins (Blow et al., 2011). Origin firing requires S-phase cyclin-dependent kinase (CDK) and Dbf4-dependent kinase (DDK) activities as well as the CDK targets Sld2 and Sld3 and the replisome-maturation scaffolds Dpb11 and Sld7 in *S. cerevisiae*. The six initiation factors Sld2, Sld3, Dpb11, Dbf4, Sld7 and Cdc45 are expressed at concentrations significantly lower than the MCM complex and core replisome components, suggesting that they may be rate-limiting for origin firing (Mantiero et al., 2011; Tanaka et al., 2011). Among these six factors, Cdc45 is the only one to travel with the replication fork.

DNA replication initiates without sequence specificity in *Xenopus* eggs (Harland and Laskey, 1980; Méchali and Kearsey, 1984), egg extracts (Mahbubani et al., 1992; Hyrien and Méchali, 1992; Carli et al., 2016, 2018) and early embryos (Hyrien and Méchali, 1993; Hyrien et al., 1995) (for review see Hyrien et al., 2003). To understand how a lack of preferred sequences for replication initiation is compatible with a precise S-phase completion time, investigators have studied replication at the single DNA molecule level using the DNA combing technique (Lucas et al., 2000; Herrick et al., 2000; Blow et al., 2001; Marheineke and Hyrien, 2001, 2004). In contrast to population-based approaches, which average replication characteristics, this technique reveals cell-to-cell differences in origin activation that are important for understanding how genomes are replicated during S phase. These experiments did not detect a regular spacing of initiation events but revealed that the origin firing rate strongly increases from early to late replication intermediates, speeding up late replication stages (Lucas et al., 2000; Herrick et al., 2000). This observation has also been confirmed for many other model organisms, including human cell lines (Goldar et al., 2009).

Mathematical modelling based on the assumption (mean-field hypothesis) that the probability of firing of each replication origin can be replaced by the averaged probability of firing calculated over all degrees of freedom of the origin firing process (MCM2-7 DH density, genomic position, chromatin compaction, nucleosome density, etc.), and augmented with the assumption of independent origins and a constant fork speed, allowed the extraction of a time-dependent rate of replication initiation, *I*(*t*), from the measured eye lengths, gap lengths and eye-to-eye distances on combed DNA molecules (Herrick et al., 2002). The extracted *I*(*t*) markedly increased during S phase. Simulations incorporating this extracted *I*(*t*) reproduced the mean eye length, gap length and eye-to-eye distance, but the experimental eye-to-eye distance distribution appeared “peakier” than the simulated one (Hyrien et al., 2003; Jun et al., 2004). Modulating origin firing propensity by the probability to form loops between forks and nearby potential origins resulted in a better fit to the data without affecting *I*(*t*) (Jun et al., 2004).

Importantly, experiments revealed that in *Xenopus*, as in other eukaryotes, replication eyes are not homogeneously distributed over the genome but tend to cluster (Blow et al., 2001; Marheineke and Hyrien, 2004). First, a weak correlation between the sizes of neighbouring eyes was observed (Blow et al., 2001; Marheineke and Hyrien, 2004; Jun et al., 2004), consistent with firing time correlations. Second, more molecules with no or multiple eyes than expected for spatially uniform initiation were observed in replicating DNA (Marheineke and Hyrien, 2004). There are two potential, non-exclusive mechanisms for these spatiotemporal correlations. The first one, compatible with a mean-field hypothesis, is that activation of an origin stimulates nearby origins. The second one, no longer consistent with a mean-field hypothesis, is that the genome is segmented into multi-origin domains that replicate at different times in S phase. This second hypothesis has been explored numerically in human and has been shown to be compatible with the universal bell-shaped *I*(*t*) profile (Gindin et al., 2014).

Interestingly, experiments in *Xenopus* egg extracts revealed that intranuclear replication foci labelled early in one S phase colocalized with those labelled early in the next S phase, whereas the two labels did not coincide when origins or origin clusters were examined (Labit et al., 2008). Given the different characteristic sizes of timing domains (1-5 Mb) and origin clusters (50-100 kb) in the *Xenopus* system, it is possible that origin correlations reflect both a programmed replication timing of large domains and a more local origin cross-talk within domains.

It is now well accepted that the intra-S phase checkpoint regulates origin firing during both unperturbed and artificially perturbed S phase (Marheineke and Hyrien, 2004; Ge and Blow, 2010; Guo et al., 2015; Platel et al., 2015; Forey et al., 2020). DNA replication stress, through the activation of the S-phase checkpoint kinase Rad53, can inhibit origin firing by phosphorylating and inhibiting Sld3 and Dbf4 (Zegerman and Diffley, 2010). The metazoan functional analogue of Rad53 is Chk1. Experiments in human cells under low replication stress conditions showed that Chk1 inhibits the activation of new replication factories while allowing origin firing to continue within active factories (Ge and Blow, 2010). Experiments using *Xenopus* egg extracts suggested that the checkpoint mainly adjusts the rate of DNA synthesis by staggering the firing time of origin clusters (Marheineke and Hyrien, 2004). Recently, we showed that even during an unperturbed S phase in *Xenopus* egg extracts, Chk1 inhibits origin firing away from but not near active forks (Platel et al., 2015). We used our initial model for DNA replication in *Xenopus* egg extracts (Goldar et al., 2008), which combined time-dependent changes in the availability of a limiting replication factor and a fork-density-dependent affinity of this factor for potential origins, to model the regulation of DNA replication by the intra-S checkpoint. To account for this regulation, we replaced the dependency of origin firing on fork density by a Chk1-dependent global inhibition of origin firing with local attenuation close to active forks, as was proposed in other contexts (Trenz et al., 2008; Dimitrova and Gilbert, 2000; Thomson et al., 2010; Ge and Blow, 2010). This model was able to simultaneously fit the *I*(*f*) (the rate of origin firing expressed as a function of each molecule’s replicated fraction *f*) of a control and a UCN-01-inhibited Chk1 replication experiment (Platel et al., 2015). However, in that work we did not push the analysis further to verify whether our model was able to explain simultaneously *I*(*f*) (temporal program) and the eye-to-eye distance distribution (spatial program).

In the present work, using numerical simulations, we quantitatively analyse both the temporal and spatial characteristics of genome replication as measured by DNA combing in the in vitro *Xenopus* system. *Xenopus* egg extracts are a proven system for studying DNA replication in metazoans (Hoogenboom et al., 2017). Rooted in experimental data, we build a general and minimal model of DNA replication able to predict its temporal and spatial characteristics during either an unchallenged or a challenged S phase. By analysing the spatio-temporal pattern of DNA replication under intra-S checkpoint inhibition and comparing it to an unchallenged pattern, we disentangle the complex role of the intra-S checkpoint in replication origin firing.

## Results

### Finding the best integrative model of unperturbed S phase

Our previous model (Platel et al., 2015) failed to simultaneously reproduce the eye-to-eye distance distribution and the *I*(*f*) of the same control experiment (Figure 1a and b). This discrepancy could be explained if initiation events have a strong tendency to cluster (Blow et al., 2001; Marheineke and Hyrien, 2004). Clustering produces an excess of small (intra-cluster) and large (inter-cluster) eye-to-eye distances compared to random initiations, but only the former could be detected on single DNA molecules due to their finite length (Marheineke and Hyrien, 2004). Chk1 action has been proposed to regulate origin clusters (Ge and Blow, 2010). However, Chk1 inhibition by UCN-01 did not result in the broader eye-to-eye distribution predicted by random origin firing (Figure 1c and d), suggesting that mechanisms other than the intra-S checkpoint are involved in origin clustering.

We therefore explored the ability of several nested models with growing complexity (designated MM1 to MM4) to reproduce the experimental data (Appendix 1). MM1 corresponds to a mean-field hypothesis of origin firing: all potential origins have a constant firing probability *P*_{out} (Goldar et al., 2008; Gauthier and Bechhoefer, 2009). MM2 corresponds to MM1 with a local perturbation, whereby the proximity of forks facilitates origin firing (Jun et al., 2004; Löb et al., 2016) over a distance *d* downstream of an active fork, where the probability of origin firing is *P*_{local}. In MM3, origin firing does not follow the mean-field hypothesis; instead, the genome is assumed to be segmented into regions of high and low probabilities of origin firing (Gindin et al., 2014; Löb et al., 2016), as accepted for most eukaryotes (McCune et al., 2008; Yang et al., 2010; Rhind and Gilbert, 2013; Boulos et al., 2015; Das et al., 2015; Petryk et al., 2016; Siefert et al., 2017). In this scenario, the probability of origin firing of potential origins located within a fraction *θ* of the genome, *P*_{in}, is assumed to be higher than the firing probability *P*_{out} of potential origins in the complementary fraction 1 − *θ*. Lastly, MM4 combines the specific features of MM2 and MM3 into a single model. Furthermore, to verify whether the localized nature of potential origins (Yang et al., 2010; Arbona et al., 2018) can influence the spatio-temporal program of origin firing, each scenario was simulated assuming either a continuous or a discrete distribution of potential origins.

For each model, we coupled dynamic Monte Carlo numerical simulations to a genetic optimization algorithm to find the family of variables that maximized the similarity between the simulated and measured profiles of *I*(*f*), replicated fraction of single molecules, global fork density, eye-to-eye distances, gap lengths and eye lengths. MM4 with localized potential origins (Figure 2) provided the best fit to the experimental data (Appendix 1, Figure 8). The increase in concordance between MM4 and the data occurs at the expense of increasing the number of parameters, which is justifiable on statistical grounds (Appendix 1, Table 2).

### Verifying the predictive ability of the MM4 model

The real DNA replication process is far more complex than any of the above models. To explore how accurately MM4 can map a more complex process, we built, based on the replication process in other eukaryotes (McCune et al., 2008; Yang et al., 2010; Rhind and Gilbert, 2013; Boulos et al., 2015; Das et al., 2015; Petryk et al., 2016; Siefert et al., 2017) and our previous model (Platel et al., 2015), a more elaborate model (MM5, Appendix 2) to generate *in silico* data with 8%, 19% and 53% global replicated fractions. MM5 assumes that the replication pattern of the genome is reproduced by the coexistence of regions with low probability of origin firing and localised domains with higher probability of origin firing. Furthermore, MM5 explicitly includes the effect of the intra-S checkpoint through supplementary probabilities of origin firing inhibition. However, because the genome is broken randomly into smaller molecules during a combing experiment, the positional information of each combed single molecule is lost, and therefore only genome-averaged information can be extracted from a traditional combing experiment. We calculated the expected genome-averaged values for each parameter of MM5 (Appendix 2, “Reduction of MM5 to MM4”). Each sample was then fitted with MM4 (Appendix 2 Figure 1, Figure 2 and Figure 3) and we compared the extracted parameters with their expected values after reduction of MM5 to MM4 (Figure 3; Appendix 2, Table 3).

For each sample, the mean values of the inferred parameters were statistically similar to the input ones (Appendix 2, Table 3) and none of the pairwise differences between the predicted parameter values for the 3 considered samples were statistically significant. This demonstrates that our fitting and comparison strategies do not introduce artifactual differences between parameters if their values do not change between different samples (Appendix 2 Figure 4). In conclusion, any variation in parameter value detected by MM4 when analysing samples at different time points independently can be considered statistically significant. Therefore, MM4 can adequately model DNA replication dynamics more complex than itself using a reduced number of parameters.

### Retrieving the dynamics of an unchallenged S phase using the MM4 model

MM4 faithfully reproduced the temporal and spatial program of DNA replication from unperturbed S phase samples with global replicated fractions of 8%, 19% and 53% (Appendix 1, Figure 8; Appendix 3, Figure 1 and Figure 2). The fitted values of the parameters changed as S phase progressed (Figure 4). However, only changes in *J*, *θ*, *P*_{out} and *d* were statistically significant (Appendix 3 Figure 3). In particular, we found that *J* increased from 8% to 19% replication and then dropped back at 53% replication. *θ* and *P*_{out} increased only from 8% to 19% replication but not later, while *d* stayed constant between 8% and 19% replication and decreased at 53% replication.

These observations suggest that during an unchallenged S phase both the fraction (*θ*) of the genome with high probability of origin firing and the background probability (*P*_{out}) of origin firing outside that fraction increase as S phase progresses. Interestingly, *P*_{local} is higher than *P*_{in} and *P*_{out}, suggesting that firing of a potential origin significantly favours the firing of nearby potential origins over a distance *d*, compatible with a chromatin looping process (Löb et al., 2016). This fork-related firing process is consistent with the observation that nearby origins tend to fire at similar times, which has been proposed to result from a different regulation of nearby and distant origins by Chk1 (Ge and Blow, 2010; Platel et al., 2015).

### Modeling DNA replication under Chk1 inhibition

To decipher the regulation of origin firing by Chk1, we examined whether the MM4 model could also reproduce the replication program observed when the intra-S phase checkpoint was inhibited by the specific Chk1 inhibitor UCN-01. We analyzed combed fibres from a sample replicated in the presence of UCN-01 (replicated fraction 22%) that had spent the same interval of time in S phase as the control sample (global replicated fraction of 8%). The MM4 model reproduced the experimental observations very well (Appendix 3, Figure 4, *GoF*_{global} = 0.85). The three parameters *J*, *θ* and *P*_{out} were significantly higher in the UCN-01 treated sample than in the control samples with either the same harvesting time or a similar replicated fraction (22% and 19%, respectively) (Figure 5 and Appendix 3 Figure 5). The other parameters were unchanged compared to both control samples. These results suggest that upon Chk1 inhibition (i) the fraction *θ* of the genome, where initiation probability is high, increases during S phase; (ii) the probability of origin firing is insensitive to Chk1 within this fraction (*P*_{in} is unaltered) but is increased in the rest of the genome (*P*_{out} is increased); (iii) the import/activation rate of the limiting factor, *J*, is increased, while the starting number of factors, *N*_{0}, is unaffected. As expected, MM4 detected that Chk1 inhibition by UCN-01 increased origin firing (Platel et al., 2015; Syljuasen et al., 2005; Guo et al., 2015; Michelena et al., 2019; Pommier and Kohn, 2003; Deneke et al., 2016).

In conclusion, the level of active Chk1 appears to regulate the kinetics of S phase progression (i) by limiting the genome fraction that escapes its inhibitory action, (ii) by down-regulating the probability of origin firing outside this fraction (Syljuasen et al., 2005; Maya-Mendoza et al., 2007; Guo et al., 2015; Michelena et al., 2019), and (iii) by controlling the import/activation rate of limiting firing factors (Guo et al., 2015). However, no significant difference in the strength of origin regulation by nearby forks (*P*_{local}) was observed after Chk1 inhibition, suggesting that this local action is not mediated by Chk1 (Trenz et al., 2008; Ge and Blow, 2010).

## Discussion

We explored several biologically plausible scenarios to understand the spatio-temporal organization of replication origin firing in *Xenopus* egg extracts. We used a quantitative approach to objectively discriminate which model best reproduced the genomic distributions of replication tracks as analyzed by DNA combing at different stages of S phase. We found that model MM4 with discrete potential origins best reproduced the experimental data with a minimal number of adjustable parameters. This model combines five assumptions (Herrick et al., 2002; Goldar et al., 2008; Gauthier and Bechhoefer, 2009; Blow and Ge, 2009; Sekedat et al., 2010; Yang et al., 2010; Platel et al., 2015; Löb et al., 2016; Gindin et al., 2014; Arbona et al., 2018): 1) origin firing is stochastic; 2) the availability of a rate-limiting firing factor captures the essential dynamics of the complex network of molecular interactions required for origin firing; 3) the speed of replication forks is constant; 4) origins fire in a domino-like fashion in the proximity of active forks (Guilbaud et al., 2011; Löb et al., 2016); 5) the probability of origin firing is heterogeneous along the genome (Yang et al., 2010; Gindin et al., 2014).

We used MM4 to model DNA combing data from *Xenopus* egg extracts in the presence or absence of intra-S checkpoint inhibition. In both conditions, this model matched the experimental data satisfactorily. Furthermore, the inferred parameter values indicated that the global probability of origin firing and the rate of activation/import of the limiting firing factor (*J*) were increased after Chk1 inhibition by UCN-01 (Pommier and Kohn, 2003; Seiler et al., 2007; Guo et al., 2015). Importantly, this model assumes a heterogeneous probability of origin firing and suggests that Chk1 exerts a global origin inhibitory action during unperturbed S phase (Platel et al., 2015). On the other hand, the constancy of the initial number of limiting factors *N*_{0} in the presence or absence of UCN-01 suggests that Chk1 does not actively control origins before S phase actually starts (Lupardus et al., 2002; Stokes et al., 2002; Forey et al., 2020). These observations indicate that MM4 can deliver a reliable, minimally complex picture of origin firing regulation in *Xenopus* egg extracts.

### The global inhibition of origin firing by Chk1

We previously showed that Chk1 is active and limits the firing of some potential origins in an unperturbed S phase (Platel et al., 2015). Therefore, the earliest origins must be immune to Chk1 inhibition while later potential origins are strongly inhibited. The comparison between the modelling of Chk1 inhibition and of unperturbed S phase data suggests that i) the probability of origin firing is reduced by active Chk1 in a fraction 1 − *θ* of the genome, ii) in this Chk1-sensitive fraction the probability of origin firing increases as S phase progresses and iii) the probability of origin firing is unaffected by Chk1 inhibition within the Chk1-immune, *θ* fraction of the genome. Therefore, this model supports the idea that at the start of S phase, some origins fire unimpeded by Chk1, whereas others remain silent. The latter only become progressively relieved from Chk1 inhibition as S phase progresses. Indeed, recent works in cultured mammalian cells (Moiseeva et al., 2019), *Drosophila* (Deneke et al., 2016) and *Xenopus* (Krasinska et al., 2008) showed that in unperturbed S phase the global origin firing inhibitory effect (by Chk1 and Rif1) is reduced as S phase progresses. Interestingly, a recent study in unperturbed yeast cells suggests that dNTPs are limiting at the entry into S phase, so that, similar to *Xenopus* (Zou, 2007), the firing of the earliest origins creates a replication stress that activates the Rad53 checkpoint, which prevents further origin firing. Rad53 activation also stimulates dNTP synthesis, which in turn down-regulates the checkpoint and allows later origin firing (Forey et al., 2020). However, it remains uncertain whether this feedback loop also exists in *Xenopus* egg extracts, which contain an abundant pool of dNTPs.

A key mechanism of our model is the enhancement of origin firing close to active forks. The necessity to introduce this mechanism supports the domino-like view of DNA replication progression (Guilbaud et al., 2011; Löb et al., 2016). It was previously shown in *Xenopus* egg extracts that the probability of origin firing could depend on the distance between left and right approaching forks (Jun et al., 2004). While this could in principle reflect an origin firing exclusion zone ahead of forks (Lucas et al., 2000; Löb et al., 2016), our model did not allow for a negative *P*_{local}. Other proposed mechanisms for origin clustering include the relief of Chk1 inhibition ahead of active forks by the checkpoint recovery kinase polo-like kinase 1 (Plk1) (Trenz et al., 2008; Platel et al., 2015).

However, we find that the range, *d*, and the strength, *P*_{local}, of origin stimulation by nearby forks were both insensitive to checkpoint inhibition (Figure 5a and b). Other potential mechanisms, such as propagation of a supercoiling wave ahead of forks, may better explain this insensitivity to Chk1 inhibition (Achar et al., 2020).

### Heterogeneous probability of origin firing

In this model, the origin firing process in *Xenopus* egg extracts is not reliably described by a mean-field approximation. In other words, the probability of origin firing is heterogeneous along the genome. Based on this hypothesis, one important outcome of our study is that the genome can be segmented into domains where origin firing probability is either high and immune to Chk1 inhibition or subjected to a tight Chk1 control that attenuates as S phase progresses. This picture challenges the common view that the embryonic *Xenopus in vitro* system lacks temporal regulation by the intra-S checkpoint at the level of large chromatin domains, in contrast to findings in somatic vertebrate cells where Chk1 controls cluster or replication foci activation (Maya-Mendoza et al., 2007). However, observations of replicating nuclei in the *Xenopus* system have shown that early replication foci are conserved in successive replication cycles, supporting the heterogeneous domain hypothesis (Labit et al., 2008). Furthermore, we found that the fraction of the genome covered by these domains increases and that the inhibitory action of Chk1 decreases over time during an unperturbed S phase (Figure 4 and Figure 5), consistent with the idea that as S phase progresses more regions of the genome evade the checkpoint inhibition of origins. By comparing samples that have spent the same time interval in S phase or that have reached the same replicated fraction in the absence and presence of UCN-01 (Figure 5), we noticed that the probability of origin firing in the Chk1-immune domains (*P*_{in}) did not change upon Chk1 inhibition. This further suggests that these domains actually escape the regulation of origin firing by Chk1 that rules the rest of the genome.

Altogether, the results of our modelling approach and the existing literature suggest that in the *Xenopus* system the position of early replicating, Chk1-immune domains is conserved within an individual nucleus. However, there is no experimental or numerical evidence that the positions of these domains are conserved in a population of nuclei. If the position of these domains changed randomly from one nucleus to another, the mean replication timing pattern would be flat, which would imply that each nucleus has its own specific replication regulation process. While we cannot reject such a hypothesis objectively, the recent report of a structured replication timing program in zebrafish early embryos (Siefert et al., 2017) encourages us to assume that in *Xenopus* early embryos the positions of early replication domains are conserved from one nucleus to another. Thus, we propose that the mean replication timing pattern of *Xenopus* sperm nuclei in egg extracts is not flat but is structured similarly to other eukaryotic systems (Baker et al., 2012; Rhind and Gilbert, 2013; Boulos et al., 2015).

The generality of the assumptions and conclusions of our model suggests that it can be used to analyze the dynamics of S phase and its regulation by the intra-S phase checkpoint in other organisms.

## Methods and Materials

### Monte Carlo simulation of DNA replication process

A dynamical Monte Carlo method was used to simulate the DNA replication process as before (Goldar et al., 2008). We simulate the replicating genome as a one-dimensional lattice of *L* = 10^{6} blocks of value 1 for replicated and 0 for unreplicated state, respectively. To match the spatial resolution of DNA combing experiments, each block represents 1 kb. After one round of calculation an existing replication track grows in a symmetric manner by 2 blocks. Considering that the fork speed *v* = 0.5 kb.min^{−1} is constant, one round of calculation corresponds to 2 minutes. In the continuous case we assume that the potential replication origins are continuously distributed on the genome with an average density of one potential origin per 1 kb (1 block); in the discrete case we assume that potential origins are randomly distributed along the genome with an average density of one potential origin per 2.3 kb (Edwards et al., 2002). In both cases origins fire stochastically. Origin firing requires an encounter with a trans-acting factor whose number *N*(*t*) increases as S phase progresses with a rate *J*: *N*(*t*) = *N*_{0} + *Jt*. If an encounter produces an origin firing, the trans-acting factor is sequestered by replication forks and hence the number of available trans-acting factors is *N*_{f}(*t*) = *N*(*t*) − *N*_{b}(*t*), where *N*_{b}(*t*) is the number of bound factors. To ensure that origins do not re-fire during one cycle and are inactivated upon passive replication, only “0” blocks are able to fire. At each round of calculation, each block is randomly assigned 2 independent values between 0 and 1. The first one is compared to *θ* to decide whether the block belongs to the *θ* or the 1 − *θ* fraction of the genome. The second one is compared to *P*_{in} or *P*_{out}, respectively, to decide whether the block may fire. In total, *M* “0” blocks (*M* ≤ *L*) with a value strictly smaller than their reference probability may fire. If *M* ≤ *N*_{f}(*t*), all *M* blocks may fire; otherwise only *N*_{f}(*t*) blocks may fire. Furthermore, in MM2 and MM4, we consider that the probability of origin firing *P*_{local} may be increased downstream of a replication fork over a distance *d*_{fork}. The trans-acting factors sequestered by forks are released and made available for new initiation events when forks meet.
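The update rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study, and it rests on several assumptions that are not stated in the text: one factor is bound per replication eye (so a factor is released each time two eyes merge), the blocks that fire when candidates exceed the available factors are picked uniformly at random, and the local term simply raises the firing probability to `p_local` for any unreplicated block within `d` blocks of a replicated block. All function and argument names are hypothetical.

```python
import random

def count_eyes(lattice):
    """Number of contiguous runs of replicated ("1") blocks."""
    return sum(1 for i, x in enumerate(lattice)
               if x == 1 and (i == 0 or lattice[i - 1] == 0))

def grow_forks(lattice):
    """Extend every replicated track by one block on each side."""
    n = len(lattice)
    grown = lattice[:]
    for i in range(n):
        if lattice[i] == 0 and ((i > 0 and lattice[i - 1] == 1)
                                or (i < n - 1 and lattice[i + 1] == 1)):
            grown[i] = 1
    return grown

def near_fork(lattice, i, d):
    """True if a replicated block lies within d blocks of position i."""
    return any(lattice[max(0, i - d):min(len(lattice), i + d + 1)])

def simulate(L, n0, J, theta, p_in, p_out, p_local, d, rounds, seed=0):
    """Return the final lattice and the replicated fraction after each round."""
    rng = random.Random(seed)
    lattice = [0] * L
    history = []
    for t in range(rounds):
        lattice = grow_forks(lattice)
        # N_f(t) = N(t) - N_b(t), with N(t) = N_0 + J t.
        # Assumption: one bound factor per replication eye.
        n_free = max(0, int(n0 + J * t) - count_eyes(lattice))
        candidates = []
        for i in range(L):
            if lattice[i] == 1:
                continue  # only "0" blocks may fire
            # First draw: does the block belong to the theta fraction?
            p = p_in if rng.random() < theta else p_out
            # Assumed form of the local stimulation used in MM2 and MM4.
            if d > 0 and near_fork(lattice, i, d):
                p = max(p, p_local)
            # Second draw: may the block fire?
            if rng.random() < p:
                candidates.append(i)
        # If M > N_f(t), only N_f(t) blocks fire (random choice is an assumption).
        if len(candidates) > n_free:
            candidates = rng.sample(candidates, n_free)
        for i in candidates:
            lattice[i] = 1
        history.append(sum(lattice) / L)
    return lattice, history
```

Setting `p_local` equal to `p_out` and `d = 0` with `theta = 0` gives an MM1-like run, while a non-zero `theta` with `p_in > p_out` gives an MM3-like run; a small lattice (a few hundred blocks) is enough to see the replicated fraction rise round by round.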

### Measuring: the replicated fraction *f* (*t*), the rate of origin firing *I* (*t*), fork density *N*_{fork} (*t*), eye-to-eye, eye and gap length distributions

The genome is represented as a one-dimensional lattice of 10^{6} elements *x*_{i} ∈ {0,1}. At each round of calculation the replicated fraction is calculated as *f*(*t*) = ⟨*x*_{i}⟩, the average value of *x*_{i} over the genome.

The rate of origin firing per length of unreplicated genome per time unit (3 min) is calculated at each round of calculation by counting the number of newly created “1” blocks, *N*_{1}: *I*(*t*) = *N*_{1} / [(1 − *f*(*t*)) *L* Δ*t*], where Δ*t* = 3 min and *L* = 10^{6}. The density of replication forks, *N*_{fork}(*t*), is calculated at each round of calculation by counting the number of “01” tracks, *N*_{left}, and “10” tracks, *N*_{right}. The distributions of eye-to-eye distances, eye lengths and unreplicated gap sizes are then computed from the distribution of “0” and “1” tracks after reshaping the data (see below).
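As a rough illustration of these measurements, the sketch below computes the replicated fraction, the fork counts and the firing rate on a binary lattice. The firing-rate formula is read directly from the verbal definition (newly fired blocks per length of unreplicated genome per time unit), which is an assumption about the exact expression; the normalisation of the fork density is not specified here, so only the raw counts *N*_{left} and *N*_{right} are returned. Function names are hypothetical.

```python
def replicated_fraction(lattice):
    """f(t): average value of the lattice (1 = replicated, 0 = unreplicated)."""
    return sum(lattice) / len(lattice)

def count_forks(lattice):
    """Count "01" tracks (N_left) and "10" tracks (N_right)."""
    pairs = list(zip(lattice, lattice[1:]))
    n_left = sum(1 for pair in pairs if pair == (0, 1))
    n_right = sum(1 for pair in pairs if pair == (1, 0))
    return n_left, n_right

def firing_rate(n_new, f, L, dt=3.0):
    """Assumed form: I = N_1 / ((1 - f) * L * dt), per kb per minute if dt is in minutes."""
    unreplicated = (1.0 - f) * L
    return n_new / (unreplicated * dt) if unreplicated > 0 else 0.0

# Example on an 8-block lattice with two eyes.
lattice = [0, 1, 1, 0, 0, 1, 0, 0]
f = replicated_fraction(lattice)
forks = count_forks(lattice)
rate = firing_rate(2, f, len(lattice))
```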

### Comparing experimental and numerical data

The simulation results were compared to the DNA combing data from Platel *et al*. (**Platel et al., 2015**). The fluorescence intensities for total DNA and replicated tracks of each fiber were measured and binarized on a Matlab^{®} platform by using a thresholding algorithm. The threshold value was chosen to minimize the difference between the replicated fraction measured by *α*-^{32}P-dATP incorporation and by DNA combing. Replicated tracks larger than 1 kb were scored as eyes. Gaps were considered significant if > 1 kb; otherwise the two adjacent eyes were merged. Eyes whose lengths span from 1 to 3 kb were considered as new origin firing events. The time interval in which these new detectable events can occur was calculated as Δ*t* = 3 *min*, assuming a constant replication fork velocity of *v* ≈ 0.5 *kb*.*min*^{−1}. This data reshaping protocol was also applied to simulated DNA molecules, in order to match the spatial and temporal resolutions between the experimental and simulated data. The global replicated fraction of each sample was computed as the sum of all eye lengths divided by the sum of all molecule lengths. To minimize finite molecule length effects in comparisons between data and simulations, the experimental molecule length distribution was normalised, considered as the probability density of molecule length in the sample, and used to weight the random shredding of the simulated genome at each time (**Figure 6**). The global replicated fraction of simulated cut molecules was calculated. Only molecules from the simulation time that had the same global replicated fraction as the experimental sample were further considered.

Molecules were sorted by replicated fraction *f* (*t*). The rate of origin firing and fork density were calculated for each molecule as a function of *f* (*t*) (*I* (*f*) and *N*_{fork} (*f*), respectively) for both simulated and experimental data. The experimental *I* (*f*), *N*_{fork} (*f*), eye-to-eye distances, eye and gap length distributions were computed as the averaged value of three independent experiments.
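The reshaping rules for eyes and gaps can be sketched as follows in Python. This is only an illustration under stated assumptions: eyes are given as sorted `(start, end)` intervals in kb on a single molecule, gaps are merged before short eyes are discarded (the order is our choice), and the upper bound of 3 kb for new firing events is taken as inclusive. The function names are hypothetical.

```python
def reshape_eyes(eyes, min_len=1.0):
    """Merge eyes separated by gaps <= min_len, then keep eyes > min_len.

    eyes : list of (start, end) tuples in kb, sorted by start position
    """
    merged = []
    for start, end in eyes:
        if merged and start - merged[-1][1] <= min_len:
            # Gap too small to be significant: fuse with the previous eye.
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s > min_len]


def count_new_firings(eyes, lo=1.0, hi=3.0):
    """Count eyes whose length lies in (lo, hi] kb, scored as new firings."""
    return sum(1 for s, e in eyes if lo < e - s <= hi)
```

Applying the same two functions to experimental and simulated molecules is what keeps their spatial resolution comparable.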

### Modeling experimental data: parameters optimization

To estimate the parameters of the model, we fitted the six experimental observables (*I* (*f*), *N*_{fork} (*f*), replicated fraction of single fibers, eye-to-eye distances, eye and gap length distributions) using a genetic optimization algorithm (Matlab^{®}). The fitness function was defined as the sum of the squared differences between experimental and simulated data curves divided by the squared mean of the experimental data curve. The genetic optimization algorithm was set over three subpopulations of 20 individuals with a migration fraction of 0.1 and a migration interval of 5 steps. Each individual defined a set of variables for the simulation, and the variables were chosen within the range reported in **Table 1** for the model that best fit the data. At each generation, 3 elite children were selected for the next generation. The rest of the population corresponded to a mixture of 60% of children obtained after a scattered crossover between two individuals selected by roulette wheel selection and 40% of children obtained by uniform mutation with a probability of 0.2, leading to a variability of 8%. The genetic algorithm was stopped after 50 generations, corresponding to the convergence of the optimization method. As the size of the variable space is unknown, we considered a large domain of validity for the variables. This reduces the probability that the optimization process reaches a unique global minimum. For this reason, we repeated the genetic optimization method 100 times independently over each data set and considered for each optimization round only the best elite individual.
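A minimal Python sketch of the fitness function described above is given below. It is an assumption-laden reading of the text rather than the original Matlab code: each observable is a pair of equally sampled curves, the squared differences are summed per observable and divided by the squared mean of the experimental curve, and the contributions of all observables are added. The name `fitness` and the dictionary layout are hypothetical.

```python
def fitness(experimental, simulated):
    """Sum over observables of sum((exp - sim)^2) / mean(exp)^2.

    experimental, simulated : dict mapping observable name -> list of floats,
    with curves of equal length for each shared name.
    """
    total = 0.0
    for name, exp_curve in experimental.items():
        sim_curve = simulated[name]
        squared_diff = sum((e - s) ** 2 for e, s in zip(exp_curve, sim_curve))
        mean_exp = sum(exp_curve) / len(exp_curve)
        total += squared_diff / mean_exp ** 2
    return total
```

Dividing by the squared mean makes the contributions of observables with very different scales (for example a firing rate and a length distribution) comparable before they are summed.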

## Acknowledgments

This article is dedicated to the memory of our colleague and friend Alain Arneodo, who passed away during its elaboration and writing. The authors acknowledge Alain’s enthusiasm and constant support. This work was supported by the Fondation pour la Recherche Médicale [FRM DEI201512344404], the Centre National de Recherche Scientifique (CNRS), the department of genome biology of I2BC, by a PhD fellowship of IdEX Paris-Saclay university, the Commissariat à l’énergie atomique (CEA) and the Cancéropole Ile-de-France [PLBIO16-302].

## Appendix 1

### Different models

To model experimental observations, a series of nested models were compared with experimental data. Below are the fits of each model to the experimental sample with 8% global replicated fraction. To assess the goodness of fit (GoF), we considered the normalised mean square error between the simulated profile and the fitted entity as the indicator of likelihood. GoF costs vary between −∞ (bad fit) and 1 (perfect fit). If *GoF* = 0, *y*_{fit} is no better than a straight line at matching experimental data. The global cost, *GoF*_{global}, is calculated from the GoF values of the individual fitted entities *i*. All models reproduce with the same accuracy the distributions of replicated fibres, gap lengths and eye lengths. The major contributions to score values come from residuals of the average fork density, average *I* (*f*) and eye-to-eye distances distribution fits. From the value of *GoF*_{global} (Appendix 1, **Table 1**), the model that best described the whole data set is MM4 with localized distribution of potential origins: its *GoF*_{global} value is closest to one. However, MM4 also has the highest number of fitting variables (7) compared to other models (MM1 has 3 fitting variables, MM2 and MM3 have 5 fitting variables), which facilitates fitting the data.
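One common form of a normalised mean square error that matches the stated properties (an upper bound of 1 for a perfect fit, 0 when the fit is no better than a constant line at the mean, and no lower bound) is sketched below in Python. That this is the exact expression used for the GoF is an assumption, and the names `gof`, `y_exp` and `y_fit` are illustrative.

```python
def gof(y_exp, y_fit):
    """Assumed NMSE-type cost: 1 - ||y_exp - y_fit||^2 / ||y_exp - mean||^2."""
    mean_exp = sum(y_exp) / len(y_exp)
    residual = sum((e - f) ** 2 for e, f in zip(y_exp, y_fit))
    spread = sum((e - mean_exp) ** 2 for e in y_exp)
    # Raises ZeroDivisionError if y_exp is constant (spread == 0).
    return 1.0 - residual / spread
```

With this definition, a simulated curve equal to the mean of the experimental curve gives a value of 0, and any worse fit gives a negative value.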

### Models comparison

To address whether the better data fit with MM4 is solely due to the higher degree of complexity of the model, we used two different approaches: a traditional statistical hypothesis test, the extra sum of squares F test (**Bevington and Robinson, 2003**), and the Akaike’s criterion (*AIC*), which is based on information theory (**Ljung, 1998**). We can objectively reject MM1 as it did not reproduce in a satisfactory manner the averaged fork density, *I* (*f*) and eye-to-eye distances distributions (Appendix 1, **Figure 1** and **Figure 5**). MM2 and MM3 satisfactorily reproduced all measured quantities (Appendix 1, **Figure 2**, **Figure 3, Figure 6** and **Figure 7**) but with lower *GoF*_{global} value than the MM4 models (Appendix 1, **Table 1**). The discrete MM4 model has higher *GoF*_{global} value than the continuous one, whereas the continuous MM2 and MM3 models were better than or equal to their discrete version, respectively (Appendix 1, **Table 1**). To choose the best model, we compared the discrete MM4 model, continuous MM2, MM3 and MM4 corresponding to fits with highest *GoF*_{global} values (Appendix 1, **Table 1**). Comparing the discrete MM4 with the continuous MM2, MM3 and MM4 models led in all cases to *F* > 1 with p-values *p* < 10^{−6} and negative *AIC* values (Appendix 1, **Table 2**). The discrete MM4 model is therefore the best model and the observed increase in *GoF*_{global} does not reflect an overfitting of the data.
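For readers unfamiliar with these two criteria, the textbook least-squares forms can be sketched in Python as follows. This is a generic illustration, not the computation behind Appendix 1 Table 2: the function names are hypothetical, residuals are assumed Gaussian, and the sign convention (a negative AIC difference favours the more complex model) is our assumption.

```python
import math


def extra_sum_of_squares_f(ss_simple, k_simple, ss_complex, k_complex, n_points):
    """F ratio for nested least-squares models (requires k_complex > k_simple).

    ss_* : residual sum of squares, k_* : number of fitted parameters.
    """
    df_simple = n_points - k_simple
    df_complex = n_points - k_complex
    gain = (ss_simple - ss_complex) / (df_simple - df_complex)
    return gain / (ss_complex / df_complex)


def aic_least_squares(ss, k, n_points):
    """AIC for a least-squares fit, up to an additive constant."""
    return n_points * math.log(ss / n_points) + 2 * k


def delta_aic(ss_simple, k_simple, ss_complex, k_complex, n_points):
    """AIC(complex) - AIC(simple); negative values favour the complex model."""
    return (aic_least_squares(ss_complex, k_complex, n_points)
            - aic_least_squares(ss_simple, k_simple, n_points))
```

Both criteria penalise extra parameters, so a large *F* together with a negative AIC difference indicates that the improved fit is not explained by model complexity alone.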

## Appendix 2

### The MM5 model used to generate the *in silico* data

In MM5, localized potential origins were distributed with a uniform density *p* = 1 *kb*^{−1} and *N*_{dom} domains of size *l*_{dom} were randomly positioned along a genome of length *L* = 10^{5} *kb*. As in previous works, we assumed that at the start of S phase *N*_{0} limiting factors were available for origin firing, that their number, *N* (*t*), increased during the course of S phase as *N* (*t*) = *N*_{0} + *Jt*, and that each factor was sequestered by new forks upon origin activation and released and made available again for origin firing upon coalescence of converging forks. Forks progressed at a constant velocity *v* = 0.5 *kb*.*min*^{−1}. The probability of origin firing by encounter with a limiting factor was higher inside the domains (*P*_{0} + *P*_{dom}) than outside them (*P*_{0}). In addition, origins outside but not inside the domains had a non-zero probability *P*_{inhib} of being inhibited. Two local effects were allowed to act within a distance *d*_{fork} from active forks: *P*_{0} was enhanced by *P*_{fork} and origin inhibition was relieved with a probability *P*_{deinhib}. We simulated 300 complete S phases using the 10 parameter values listed in Appendix 2, **Table 1**, and extracted snapshots at 8%, 19% and 53% global replicated fractions. Each snapshot was considered as an independent sample and for each of them: i) the genome was randomly cut following the molecule length distribution presented in **Figure 6**, ii) the data were reshaped as described in Materials and methods to account for the finite experimental resolution and iii) the distributions of *I* (*f*), replicated fraction of single fibres, global fork density, eye-to-eye distances, gap lengths and eye lengths were determined.
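The bookkeeping of limiting factors, *N* (*t*) = *N*_{0} + *Jt* with sequestration and release, can be sketched in Python as below. The class `FactorPool` is hypothetical, and it assumes that one factor is bound per firing event and that one factor is freed per coalescence of converging forks; the stoichiometry in the actual simulations may differ.

```python
class FactorPool:
    """Track limiting factors: total grows linearly, some are bound to forks."""

    def __init__(self, n0, rate):
        self.n0 = n0        # factors available at the start of S phase
        self.rate = rate    # J, factors added per unit time
        self.bound = 0      # factors currently sequestered by forks

    def total(self, t):
        return self.n0 + self.rate * t

    def free(self, t):
        # Only whole factors can be used; never report a negative count.
        return max(0, int(self.total(t)) - self.bound)

    def fire(self, t):
        """Try to fire one origin at time t; return True on success."""
        if self.free(t) < 1:
            return False
        self.bound += 1     # assumption: one factor per firing event
        return True

    def coalesce(self):
        """Release one factor when two converging forks meet."""
        if self.bound > 0:
            self.bound -= 1
```

Because firing is refused when no factor is free, the rate of initiation in such a scheme is set by import of new factors and by recycling at fork mergers.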

### Fitting the *in silico* data by MM4 model

By independently fitting the simulated profiles of each global replicated fraction, we implicitly assume that samples could originate from separate experiments; hence MM4 parameter values are possibly different for each global replicated fraction. This allows us to accurately reproduce observations from each sample (Appendix 2, **Figure 1, Figure 2** and **Figure 3**).

### Reduction of MM5 to MM4

In the MM5 model origins fire globally with two origin firing probabilities (*P*_{0} and *P*_{0} + *P*_{dom}) eventually increased by a local origin firing probability (*P*_{fork}) close to an active fork, and the genome is divided into domains that either support or escape some inhibitory probability of firing (assumed to represent inhibition by the intra-S checkpoint). As the position of these domains is not identical between repeated simulations, we can reduce their description by specifying a fraction of the genome where origins escape checkpoint inhibition. In these domains, the global origin firing probability , with the pre-factor being due to normalization considerations. The local probability of origin firing (close to a fork) inside a domain is . Outside these domains, the global probability of origin firing is modulated by the probability of origin inhibition . In the same manner the local probability of origin firing is modulated by the action of intra-S checkpoint and the local cancellation of inhibition process . Local probabilities of origin firing only influence origins over a distance *d*_{fork} downstream of a fork. The MM4 model contains a unique local probability of origin firing, that corresponds to the average value of the two local probabilities of origin firing, . Therefore, by considering the essential ingredients of the MM5 model, we combined the parameters of the model to retrieve the parameters of MM4 (** Table 2**).

The values of these parameters can be compared directly to the parameters of the MM4 model obtained from the fitting of the simulated data for each sample (**Table 3**). To assess whether the difference between the expected and the inferred value of a parameter is statistically significant, we calculate a *t* statistic: for *t* ≥ 1 the difference is statistically significant, otherwise it is not. The values of parameters changed as the global replicated fraction increased (Appendix 2, **Table 3**). To assess the level of significance of these variations, we calculated the *χ*^{2} coefficient between the values of the same parameter obtained for different global replicated fractions. If *χ*^{2} < 1, the difference between the two values was not statistically significant; otherwise it was significant. Appendix 2, **Figure 4** shows that the differences of predicted parameter values among the 3 considered samples were not statistically significant, as was expected.

All *t* < 1 and *χ*^{2} < 1 (Appendix 2, **Figure 4**), indicating that the parameter values are constant across all three samples. Therefore, we conclude that the optimization procedure was able to recover the expected parameter values accurately for each sample. It should be noted that we chose a very conservative criterion to assess whether two parameters are different or not. The conditions *χ*^{2} = 1 or *t* = 1 are equivalent to a confidence level of *a* = 10^{−7} in the case of a two sided and one sided t statistics. In other words, with our criterion the probability of finding by chance that the values of two parameters are different is smaller than 10^{−7}.

The ability of the fitting procedure i) to circumscribe the values of MM4 model parameters close to the expected ones (Appendix 2, **Table 3**) and ii) to retrieve the constancy of these parameters’ values as the global degree of replication increases (Appendix 2, **Figure 4**) demonstrates the adequacy of our fitting strategy to recover the dynamics of DNA replication during S phase in the framework of the MM4 model by setting the null hypothesis as: the values of MM4 parameters do not change as S phase progresses. Therefore, rejection of this hypothesis for a considered parameter means its variation during S phase.

## Appendix 3

### Fitting the experimental profiles by MM4 model : Unchallenged S phase

We fitted independently the measured profiles for each global replicated fraction with the discrete MM4 model. The fits of observations from 8% global replicated fraction are presented in Appendix 1 **Figure 8** and those of 19% and 53% are presented in Appendix 3 **Figure 1** and **Figure 2**, respectively. In Appendix 3 **Table 1** we give the values of the fitted parameters. The reliability of observed differences among inferred MM4 parameters is assessed statistically by using the *χ*^{2} coefficient as defined in Appendix 2 (Appendix 3 **Figure 3**).

### Fitting the experimental profiles by MM4 model : Chk1 inhibited S phase

We fitted with the discrete MM4 model a sample that had spent the same time interval in S phase, in the presence of UCN-01, as the control sample with 8% global replicated fraction. The global replicated fraction of the UCN-01 sample was 22%. The fits are presented in Appendix 3 **Figure 4** and the obtained parameter values are given in Appendix 3 **Table 1**. The reliability of observed differences among inferred MM4 parameters between controls and the Chk1 inhibited sample is assessed statistically by using the *χ*^{2} coefficient as defined in Appendix 2 (Appendix 3 **Figure 5**).