The high-resolution in vivo measurement of replication fork velocity and pausing by lag-time analysis

An important step towards understanding the mechanistic basis of the central dogma is the quantitative characterization of the dynamics of nucleic-acid-bound molecular motors in the context of the living cell, where a crowded cytoplasm as well as competing and potentially antagonistic processes may significantly affect their rapidity and reliability. To capture these dynamics, we develop a novel method, lag-time analysis, for measuring in vivo dynamics. The approach uses exponential growth as the stopwatch to resolve dynamics in an asynchronous culture and therefore circumvents the difficulties and potential artifacts associated with synchronization or fluorescent labeling. Although lag-time analysis has the potential to be widely applicable to the quantitative analysis of in vivo dynamics, we focus on an important application: characterizing replication dynamics. To benchmark the approach, we analyze replication dynamics in three different species and a collection of mutants. We provide the first quantitative locus-specific measurements of fork velocity, in units of kb per second, as well as replisome-pause durations, some with the precision of seconds. The measured fork velocity is observed to be both locus and time dependent, even in wild-type cells. In addition to quantitatively characterizing known phenomena, we detect brief, locus-specific pauses at rDNA in wild-type cells for the first time. We also observe temporal fork velocity oscillations in three highly-divergent bacterial species. Lag-time analysis not only has great potential to offer new insights into replication, as demonstrated in the paper, but also has potential to provide quantitative insights into other important processes.


I. INTRODUCTION
At a single-molecule scale, all cellular processes are both highly stochastic as well as subject to a crowded cellular environment where they typically compete with a large number of potentially-antagonistic processes that share the same substrate [1,2]. In spite of these challenges, essential processes must be robust at a cellular scale to facilitate efficient cellular proliferation. Understanding how these processes are regulated to achieve robustness remains an important and outstanding biological question [3][4][5][6][7][8][9]. However, a central challenge in investigating these questions is the quantitative characterization of the activity of enzymes in the context of the living cell. For instance, although single-molecule assays can resolve the pausing of molecular motors on nucleic-acid substrates in the context of in vitro measurements [10,11], performing analogous measurements in the physiologically-relevant environment of the cell, where these processes are subject to antagonism, poses a severe challenge to the existing methodologies [12].
In this paper, we develop an approach, lag-time analysis, that facilitates the quantitative characterization of dynamics, with resolution of seconds, in the context of the living cell. The approach exploits exponential growth as the stopwatch to capture dynamics in exponentially proliferating cellular cultures [13] and unlike competing approaches, it can circumvent the difficulties and potential artifacts introduced by cell synchronization [14] or fluorescent labeling. Although in principle the approach is broadly applicable to all central dogma processes (i.e. replication, transcription, and translation) as well as other dynamics, we will focus on replication dynamics exclusively for concreteness.
In this paper, we demonstrate the first locus-specific genome-wide measurement of the fork velocity, in units of kilobases per second, and the duration of replisome pauses. It facilitates detailed comparisons to be made, not just between different loci in a single cell, but between wild-type and mutant cells as well as between bacterial species. We apply this approach to analyze three model bacterial systems: Bacillus subtilis, Vibrio cholerae, and Escherichia coli. In B. subtilis, we analyze transcription-induced replication antagonism which is the main determinant of replisome dynamics in a set of mutants with retrograde (reverse-oriented) fork motion. An analysis of V. cholerae provides evidence that fork number is an important determinant of fork velocity, but also provides clear evidence that fork velocity is time dependent. To explore this time-dependence, we analyze the fork-velocity in E. coli which provides strong evidence for temporally-oscillating fork velocity, consistent with a recent report [15]. Finally, we demonstrate that these oscillations are observed in all three organ- isms. In summary, the observed phenomena demonstrate the central importance of characterizing central dogma processes in the context of the living cell, where their activity is regulated and modulated by the cellular environment.

II. RESULTS
The bacterial cell cycle. The bacterial cell cycle is divided into three periods [16,17]: The B period is analogous to the G 1 phase of the eukaryotic cell cycle, corresponding to the period between cell birth and replication initiation. The C period is analogous to the S phase (and early M phase) in which the genome is replicated and simultaneously and sequentially segregated [18]. The D period is analogous to a combination of phases G 2 and late-M, corresponding to a period of time between replication termination and cell division, including the process of septation (i.e. cytokinesis). The demographics of cell-cycle periods of exponentially-growing bacterial cells were first quantitatively modeled by Cooper and Helmstetter in an influential paper [19] and then refined by multiple authors [20][21][22]. In the Methods Section, we generalize these models to demonstrate that the measured marker-frequency analysis quantitatively measures the cell-cycle replication dynamics. The key results are summarized below.
Lag-time analysis. Our strategy will be to use exponential growth as the stopwatch with which we resolve cell-cycle dynamics. In short, cells with greater cell-cycle progression (i.e. age) are depleted in the population, equivalent to an independent, exponentially-proliferating species that lags newborn cells by a time equal to its age [13]. (See Fig. 1 for a schematic illustration of the approach.) Lag-time analysis is the measurement of this time lag. In principle, this approach can be applied to characterize the dynamics of any biological molecules or complexes; however, for concreteness, we will focus on replication dynamics since the replication process is of great biological interest and next-generation sequencing provides a powerful tool for digital, as well as genome-wide, quantitation of the number of genomic loci.
In marker-frequency experiments, the number of each sequence N ( ) in a steady-state, asynchronously growing population is determined by mapping nextgeneration-sequencing reads to the reference genome. This marker frequency can be reinterpreted as a measurement of the lag time τ ( ): where N ( ) is the observed number of the locus at genomic position and N 0 is the observed number of the origin in the culture and k G is the growth rate. This relation can be understood as a consequence of the exponential growth law [13].
In a deterministically-timed model, the measured lag time would be equal to the replication time relative to initiation. In reality, the timing of all processes in the cell cycle is stochastic. We previously showed that the measured lag time is related to the distribution of durations in single cells by the exponential mean [13]: where E t is the expectation over stochastic time t with distribution t ∼ p i (·).
Determination of replisome pause durations. Replisome pause durations or the lag time difference between the replication of any two loci can be computed using the difference of lag times between the two loci: Repetitive sequences that cannot be mapped result in gaps. Panel C: Fork velocity is locus-dependent. The fork velocity is shown as a function of genomic position with an error region. Statistically significant differences in the fork velocity are observed between loci. There is significant bilateral (i.e. mirror) symmetry around the origin. Panel D: A visual representation of the relation between the log-marker-frequency and lag-time plots: Fold at the origin and rotate. Panel E: Lag-time analysis. The replication forks start at the origin at lag time zero and then accelerate and decelerate synchronously, as the forks move away from the origin. The consistency in arm position is a manifestation of bilateral symmetry. Panel F: Fork velocity as a function of lag time. In addition to bilateral symmetry, after Chr2 initiates, all four forks show roughly consistent velocities.
We emphasize that the observed lag-time difference is the exponential mean of the stochastic time difference, which has important consequences for slow processes.
Determination of the fork velocity. For fast processes, like single nucleotide incorporation, the exponential mean leads to a negligible correction (see Methods); therefore, the fork velocity has a simple interpretation: it is the slope of the genomic position versus lag-time curve: or equivalently it is the ratio of the growth rate to the log-slope: which can be directly determined from the marker frequency.
Lag-time analysis reveals V. cholerae replication dynamics. To explore the application of lag-time analysis to characterize replication dynamics, we begin our analysis in the bacterial model system Vibrio cholerae, which harbors two chromosomes: Chromosome 1 (Chr1) is 2.9 Mb and Chromosome 2 (Chr2) is 1.1 Mb. The origin of Chr1, oriC1, fires first and roughly the first half of replication is completed before the replication-initiator-RctBbinding-site crtS is replicated, triggering Chr2 initiation at oriC2 [23][24][25]. Chr1 and Chr2 then replicate concurrently for the rest of the C period. (See Fig. 2A.) To demonstrate the power of lag-time analysis, we compute the marker frequency, lag-time, and fork ve-  Lag-time analysis enables the measurement of the duration of fast processes. We focus first on the duration of time between crtS replication and the initiation of oriC2. Fluorescence microscopy imaging reveals that this wait time is very short [26], but it is very difficult to quantify since the precise timing of the replication of the crtS sequence is difficult to determine by fluorescence imaging; however, this is a natural application for lag-time analysis. To measure the lag-time difference between crtS replication and oriC2 replication, we use Eq. 3 to compute the replication time difference from the relative copy numbers. For this analysis, we generate a piecewise linear model with knots at the crtS and oriC2 loci. The measured lag time is a pause duration which is clearly resolved in the lagtime plot shown in Fig. 2E.
The fork velocity is locus dependent. It is qualitatively clear from the fork-distance-versus-time plot (Fig. 2E) that the fork velocity is locus dependent since the trajectory is not straight. To test this question statistically, we compare the 39-knot model to the null hypothesis (constant fork velocity), which is rejected with a p-value of p 10 −30 and therefore the data cannot be described by a single fork velocity. (See Tab. I.) The resulting velocity profiles are shown in Fig. 2CF.
Bilateral symmetry supports a time-dependent mechanism. Our understanding of the replication process motivated two general classes of mechanisms: (i) timedependent and (ii) locus-dependent mechanisms. Timedependent mechanisms, like a dNTP-limited replication rate, affect all forks uniformly and therefore loci equidistant from the origin should have identical fork velocities: where is the genetic position relative to the origin. In contrast, in a locus-dependent mechanism, like replication-conflict-induced slowdowns, the slow regions are expected to be randomly distributed over the chromosome. In this scenario we expect to see no bilateral symmetry between arms (e.g. Fig. 7B).
A bilateral symmetry between the arms is clearly evident in the data (the mirror symmetry about the origin in Figs. 2BC and is manifest in the lag-time analysis as the coincidence between the left and right arm trajectories and velocities in Figs. 2EF. To quantitate this symmetry, we divide the variance of the fork velocity into symmetric and antisymmetric contributions. (See the Supplemental Material Sec. IV A.) A time-dependent mechanism would generate a f S = 100% symmetric variance, whereas a locus-dependent mechanism would be expected to generate equal symmetric and antisymmetric variance contributions (f S = 50%). V. cholerae Chr1 and Chr2 have f S = 76% symmetry, consistent with a time-dependent mechanism playing a dominant but not exclusive role in determining the fork velocity. (See Tab. I.) The replisome pauses briefly at rDNA in B. subtilis. To explore the possibility that locus-dependent mechanisms could play a dominant role in determining the fork velocity profile, we next characterized the fork dynamics in the context of replication conflicts, where the antagonism between active transcription and replication, have been reported to stall the replisome by a locus-specific mechanism [9,27]. In B. subtilis, there are Distance: |`| (bp) Lag time: 3. B. subtilis fork dynamics and transcriptional conflicts. Panel A: Chromosomal structure for wild-type and mutant B. subtilis strains. The ter region in wild-type B. subtilis is positioned at 172 • , rather than 180 • , making the right arm shorter than the left. In rrnIHG(inv), the rrnIHG locus is inverted so that it is transcribed in a head-on orientation with respect to replication. In 257 • ::oriC, the origin is moved to 257 • , resulting in a short left arm that terminates at the terminus and a long right arm that replicates initially in the retrograde direction, before replicating the residuum of the right arm in the antegrade orientation. Panel B: Lag time in wild-type cells. Replication on the right arm (red) is delayed relative to the left arm (blue) by multiple endogenous co-directional rDNA loci. Panel C: Head-on conflicts lead to pausing. The rrnIHG genes are inverted so that transcription of the rDNA locus is in the head-on direction. A longer lag-time pause is observed at intermediate growth rates (CA, purple) than slow growth (minimal, red). Fork velocities elsewhere are roughly consistent. Panel D: Retrograde fork motion is slow. The retrograde fork motion in R is slow compared to antegrade replication in A1. Late antegrade motion in A2 is faster than early antegrade motion in A1. seven highly-transcribed rDNA loci on the right arm and only a single locus of the left arm. Consistent with the notion of rDNA-induced pausing, the ter locus is positioned asymmetrically on the genome, at 172 • rather than 180 • , leading the right arm of the chromosome to be shorter than the left arm. (See Fig. 3A.) In spite of the difference in length, both arms terminate roughly synchronously, implying that the average fork velocity is lower on the right arm, consistent with putative fork pausing at the rDNA loci. Are these conflict-induced pauses present in wild-type cells where the replication and transcription are co-directional? We have previously reported evidence based on single-molecule imaging that they are [12], but there is as of yet no other unambiguous supporting evidence.
To detect putative short pauses at the rDNA loci in wild-type B. subtilis, a low-noise dataset was essential. We therefore examined a number of different datasets, including our own, to search for a dataset with the low-est statistical and systematic noise. A marker-frequency dataset for a nearly wild-type strain growing on minimal media was identified for which the noise level was extremely low. (See Supplemental Material Sec. III C2.) The lag-time analysis is shown in Fig. 3B. Replication pauses should result in discrete steps in the lag time (e.g. Fig. 7B); however, no clearly-defined steps are visible in the lag-time plot. The pauses are either absent or too small to be clearly visible without statistical analysis.
To achieve optimal statistical resolution, we used the AIC model-selection framework [28,29] on four competing hypotheses: In Model 1, fork velocities are constant and equal on both arms with no pauses. In Model 2, fork velocities are constant but unequal on the left and right arms with no pauses. In Model 3, fork velocities are constant and equal on the left and right arms with equal-duration pauses at each rDNA locus. In Model 4, fork velocities are constant and unequal on the left and right arms with equal-duration pauses at each rDNA lo-cus. AIC selected Model 3 (equal arm velocities with rDNA pauses) and a pause duration of: is observed. The pause models were strongly supported over the non-pause models (∆AIC 23 = 4.3 and ∆AIC 43 = 9.4). Therefore, statistical analysis supports the existence of short slowdowns (i.e. pauses) at the rDNA, even if these features are not directly observable without statistical analysis. In higher-noise datasets, the statistical inference was ambiguous.
Strong, head-on conflicts lead to long pauses. Although we have just demonstrated that endogenous codirectional conflicts are detected statistically, they do not lead to a clear unambiguous signature. In contrast, strong, exogenous head-on conflicts in which the replisome and transcriptional machinery move in opposite directions can lead to particularly potent conflicts and even cell death [3][4][5][6][7][8][9]30]. The ability to engineer conflicts at specific loci facilitates the use of lag-time analysis for measuring the duration of the replication pauses.
To measure the pause durations due to head-on conflicts, we analyze the marker frequency from a strain, rrnIHG(inv), generated by Srivatsan and coworkers with three rDNA genes (rrnIHG) inverted so that they are transcribed in the head-on orientation. Markerfrequency datasets were reported for this strain in two growth conditions: minimal supplemented with casamino acids, in which the strain grows at an intermediate growth rate, and unsupplemented minimal media [31]. (Mutant cells cannot proliferate in rich media, presumably because the transcription conflicts are so severe [31].) In both slow and intermediate growth conditions, a clearly resolved step at the head-on locus is observed in the marker-frequency and lag-time analysis (Fig. 3B), exactly analogous to the simulated pause (Fig. 7B).
To determine the pause durations in the two growth conditions, we again consider a model with an unknown pause duration (at the inverted rDNA locus) and constant but unequal fork velocities on the left and right arms. The observed lag-time pauses are for the slow and intermediate growth rates, respectively. Although lag-time analysis reports a precise pause duration, it is important to remember that the observed lag time corresponds to the exponential mean of the stochastic state lifetime (Eq. 2), including cells that arrest and therefore never complete the replication process. Eq. 18 accounts for the pause generated by this arrested cell fraction. Srivatsan and coworkers report that 10% of the cells are arrested in intermediate growth, which accounts for 8.3 min of the lag time, leaving an estimated pause time of ∆τ pause = 1.4 ± 0.9 min for non-arrested cells, which is roughly consistent with the pause time observed in slow growth conditions.

Slow retrograde replication in B. subtilis.
Are all conflict-induced slowdowns consistent with long pauses at a small number of rDNA loci? Wang et al. have previously engineered a head-on strain, 257 • ::oriC, with less severe conflicts by moving oriC down the left arm of the chromosome to 257 • [32]. (See Fig. 3A.) The resulting strain has a very short left arm and a very long right arm, the first third of which is replicated in the retrograde (i.e. reverse to wild-type) orientation. This retrograde region contains only a single rDNA locus. All other regions are replicated in the antegrade (i.e. endogenous) orientation.
Consistent with the analysis of Wang et al., we position knots to divide the chromosome into three regions with three distinct slopes: an early antegrade region A1 (the short left arm) with log-slope α A1 = 0.34 ± 0.01 Mb −1 , a retrograde region R with log-slope α R = 0.63 ± 0.01 Mb −1 and a late antegrade region A2 with log-slope α A2 = 0.26 ± 0.01 Mb −1 , that replicates after the left arm terminates. (See Fig. 3C.) Due to the higher percentage of head-on genes in the R region compared with the A1 region, the conflict model predicts more rapid replication in region A1 versus R. Consistent with this prediction, the ratio of replication velocities is: revealing a strong replication-direction dependence. The slope appears relatively constant, consistent with a model of uniformly-distributed slow regions rather than a small number of long pauses as observed in the reversal of the rDNA locus rrnIHG. Our quantitative analysis is consistent with the interpretation of Wang et al. [32].
Rapid late replication due to genomic asymmetry. This dataset has a striking feature that is not emphasized in previous reports: Late antegrade fork velocity is faster than early antegrade velocity: v A2 /v A1 = 1.29 ± 0.05.
Although this effect is weaker than the replicationdirection dependence discussed above (Eq. 10), its size is still comparable. An analogous late-time speedup is seen in two other ectopic origin strains. (See the Supplemental Material Secs. VI H and VI J.) One potential hypothesis is that a locus-dependent mechanism slows the fork in the A1 region relative to the A2 region; however, no velocity difference is evident in these regions in the wild-type cells (Fig. 3B). Alternatively, one could hypothesize that there is some form of communication between forks that leads to a slowdown in region A1 due to the slowdown in region R; however, no coincident slowdown is observed in rrnIHG(inv) at a position opposite the rrnIHG locus, inconsistent with this hypothesis. Another possible hypothesis is that latetime replication is always rapid; however, no significant speedup is observed in either wild-type B. subtilis (Fig. 3A) or V. cholerae cells at the end of the replication process ( Fig. 3B and Fig. 2E). However, there is one extremely important difference between 257 • ::oriC and the wild-type strains: Due to the asymmetric positioning of the origin and replication traps at the terminus (Fig. 3A), there is only a single active replication fork as the A2 region is replicated. We therefore hypothesize that the fork velocity is inversely related to active fork number.
Fork number determines velocities in V. cholerae. To explore the effects of changes in the fork number on fork velocity, it is convenient to return to V. cholerae. In slow growth conditions, the cells start the C period with a pair of replication forks, for which the fork-number model predicts faster fork velocity, and finish the replication cycle with two pairs of forks, predicting slower fork velocity.
Although the structure of the velocity profile is more complex than predicted by the fork-number model alone, the observed fork velocity is broadly consistent with its predictions. If a mean fork velocity is computed before and after oriC2 initiates, the ratio is: which is quantitatively consistent with the hypothesis that more forks lead to a slowdown in replication and the size of the effect is comparable to what is observed in B. subtilis (Eq. 11). A mutant V. cholerae strain has been constructed that facilitates a non-trivial test of the fork-number model: In the monochromosomal strain MCH1, Chr2 is recombined into Chr1 at the terminus of Chr1, resulting in a single monochromosome (Chr 1-2). (See Fig. 4A.) Both the wild-type and MCH1 strains have essentially identical sequence content, implying the locus-dependent model would predict identical replication velocities; however, all replication in MCH1 occurs with only a single set of forks whereas the wild-type strain replicates the latter half of the C period with two pairs of forks, one pair on each chromosome.
The measured fork velocities are shown in Fig. 4B and support the fork-number model: MCH1 replicates the sequences after crtS at roughly 1.6 times the fork velocity of the wild-type cells, consistent with the forknumber model. Alternatively, we can consider the same quantitation of fork velocity we considered above: The ratio of fork velocities of loci replicated before crtS to those replicated afterwards: therefore only a very small slowdown is observed after crtS is replicated in MCH1, even though exactly the same sequences are replicated, again consistent with the fork-number model.
The fork velocity oscillates in E. coli. Although experiments in V. cholerae clearly support the fork-number model, there is significant variability that cannot be explained by this model alone. Are time-dependent variations in fork velocity also observed in organisms that The monochromosomal strain MCH1 has a single chromosome (green) which was constructed by recombining Chr2 (orange) into the terminus of Chr1 (blue) [33]. Under slow growth conditions the first part of the chromosome in both strains is replicated by a single pair of forks. When the fork reaches the crtS sequence on the right arm, Chr2 is initiated at oriC2 in the wild-type cells. All of Chr2 and the residuum of Chr1 replicate simultaneously, resulting in two pairs of active forks. In contrast, all sequences in MCH1 are replicated using a single pair of forks. Panel B: In MCH1, where all sequences are replicated by a single pair of forks, the fork velocity is faster than is observed in WT cells during the multifork region (greysequences replicated after crtS).
replicate a single chromosome? To answer this question, we worked in the gram-negative model bacterium Escherichia coli, which harbors a single 4.6 Mb chromosome. A large collection of marker-frequency datasets have already been generated for both rapid and slow growth conditions by the Rudolph lab [34]. As with the B. subtilis marker-frequency datasets, we selected those that had the lowest statistical and systematic noise. (See the Supplemental Material Sec. III C2.) The fork velocities are shown in Fig. 5. As before, statistically significant variation is observed in the fork velocity as a function of position. (See Tab. I and Supplemental Material Sec. IV C.) As discussed above in the context of V. cholerae, we had initially hypothesized that this variation might be a consequence of rDNA position or some other locus-dependent mechanism; however, there are three arguments against this hypothesis: (i) The slow-velocity regions are not coincident with rDNA locations (Fig. 5A) or relative GC content (Supplemental Material Sec. VI A). (ii) Consistent with the timedependent model, 84% (and 59%) of the observed variation in the fork velocity is symmetric for fast (and slow) growth. (iii) We would expect that a locus-dependent model would predict slow regions that are consistent between fast and slow growth, which is not observed.
(See the purple arrows in Fig. 5A.) We therefore conclude that the dominant mechanism for determining the fork velocity is a time-dependent mechanism, consistent with our observations for V. cholerae. The lag-time analysis is particularly informative in the context of the E. coli data (Fig. 5B): Although there is no alignment in the velocity with respect to locus position, there is clear alignment of the fork velocity variation with respect to lag time, not only between the left and right arms of the chromosome, but between slow and fast growth conditions. The observed oscillations appear roughly sinusoidal in character.
Fork velocity oscillations are observed in three organisms. Temporal oscillations in the fork velocity are an unexpected phenomenon. Are these features a systematic error with a single dataset? First we note that these oscillations are present in two E. coli growth conditions (LB and minimal). This phenomenon would be on sounder footing if similar oscillations are observed in two evolutionarily distant species: the gram-negative V. cholerae and gram-positive B. subtilis. If this phenomenon is observed, to what extent are the oscillations of similar character (e.g. phase, amplitude, and period)?
We compared the lag-time-dependent fork velocity for all three species. In B. subtilis, we have already discussed a rDNA-induced pausing on the right arm, which could complicate the interpretation of the data. We therefore consider the fork velocity on just the left arm. For E. coli and V. cholerae, we compute the average velocity as a function of lag time between the two arms. Since the different organisms and growth conditions have different mean fork velocities, we compare the fork velocity relative to the overall mean. The results are shown in Fig. 6.
All three organisms show oscillations with the same qualitative features: Each fork velocity has roughly the same phase: The velocity begins high, before decaying. The relative amplitudes, roughly 0.5 peak-to-peak, are all comparable with the largest-amplitude oscillations observed in V. cholerae and the smallest in E. coli. When the relative velocities are compared, it is striking how much consistency there is between growth conditions in E. coli and B. subtilis. Finally, the period of oscillation is comparable but distinct in all three organisms, ranging from 10 to 15 minutes. The oscillation characteristics are summarized in a table in Fig. 6.

III. DISCUSSION
The focus of this paper is on the development of lagtime analysis, which uses exponential growth as the timer to characterize cellular dynamics. Although the approach is in principle widely applicable to characterize the dynamics of any molecule of complex, we focus on replication dynamics to explore this novel approach. Although the approach has been understood at a conceptual level since the pioneering work of Cooper and Helmstetter [19], our recent exploration of stochastic models and the introduction of the exponential mean have clarified its interpretation [13] and, from an experimental perspective, the introduction of next-generation sequencing greatly expanded the potential of the approach for characterizing the dynamics of nucleic acids and in particular replication dynamics, where it has the potential to make precise measurements of replication timing. In multiple applications, we have used this approach to quantitatively measure time durations as short as seconds, a time resolution that is challenging, if not impossible, to achieve using other methods.
To appreciate the power of our approach, it is useful to compare our results to recent results of Nieduszynski and coworkers who have used experimental methods, cell synchronization (sync-seq), to directly resolve repli-   cation dynamics [35,36]. In this case, the time resolution is limited by a combination of the precision of cell synchronization, which is an imperfect tool [37], and the frequency of fraction collection (every 5 minutes). Although it would be interesting to compare the relative precision of our approach to this competing method, the authors do not report fork velocities, pause durations, or provide an error analysis of their reported replication times, questioning to what extent the approach is truly quantitative. Since the fractions are collected on five-minute intervals, this time resolution is the floor of the direct time resolution achieved by this approach. In contrast, we report on a range of pause durations that are shorter than 5 minutes. A significant experimental shortcoming of the syncseq approach is the necessity of cell synchronization. In most systems, synchronization requires cell-cycle arrest, which introduces a significant potential for artifactual results [14]; whereas lag-time analysis probes dynamics in steady-state growth. Our own preliminary analysis suggests that the timing of initiation at a subset of loci in Saccharomyces cerevisiae is changed by the cell synchronization procedure relative to steady-state growth [38].
In addition to these quantitative and high-timeresolution applications, we have also demonstrated the approach in a more conceptual context: using lag-time analysis to argue that the observed oscillations in fork velocity were temporal rather than locus dependent. Together, these examples demonstrate both conceptual and concrete advantages to lag-time analysis over the traditional marker-frequency analysis.
The significance of the fork velocity. Previous markerfrequency analyses have often reported a log-slope (e.g. [32,39]), which is closely related to the fork velocity. What new insights does the measurement of the fork velocity offer over this closely related approach? The fork velocity approach has two important advantages: (i) The first advantage is a conceptual one. The underlying quantity of interest is velocity (or rate per base pair). This is the quantity that is measured in vitro and is relevant in a mechanistic model. In contrast, the log-slope is an emergent quantity that is only relevant in the context of exponential growth. (ii) The second advantage is concrete: Although log-slope measurements allow ratiometric comparisons between fork velocity at different loci in the same dataset, they cannot be used to make comparisons across datasets. Any comparison of the logslope between cells with different growth rate (e.g. due to changes in growth conditions, mutations, species, etc.) are meaningless. For instance, the log-slopes of the wild-type and MCH1 V. cholerae strains are very different even though the changes in the fork velocity are small. Our wide-ranging comparisons between growth conditions, mutants, and organisms demonstrate the power of reporting fork velocity over the log-slope.
Applications to eukaryotic cells. Although our focus has been on replication in bacterial cells, an important question is to what extent our approach could be adapted to eukaryotic cells. First, we emphasize that the lag-time analysis is directly applicable without modification to the eukaryotic context. As such, the timing of the replication of loci can be analyzed; however, since the S phase is typically a smaller fraction of the cell cycle and the genomes of eukaryotic cells are larger, deeper sequencing will be required to achieve the same resolution we demonstrate in the context of bacterial cells. One significant potential refinement to this approach is the use of cell sorting (sort-seq) to enrich for replicating cells which can greatly increase the signal-to-noise ratio [35,36]; however, this approach appears to lead to significant flattening near early-firing origins, as we have observed in other contexts (Supplemental Material Sec. III C2), and therefore increasing sequencing depth is probably the most promising approach for eukaryotic systems when quantitative characterization is a priority. (See Methods Eqs. 22 and 23 for an estimate of resolution.) Although lag-time analysis can easily be extended to the eukaryotic context, the measurement of the fork velocity will require some care. A critical assumption in our analysis is that replication forks move unidirectionally at any particular locus, i.e. it can be either rightward or leftward moving but not both. (See Supplemental Material Sec. III C9.) Fork traps prevent this bidirectionality in many bacterial cells, but this does not apply to eukaryotic cells. However, if regions of the chromosome can be found where fork movement is unidirectional, e.g. sufficiently close to early-firing origins, fork velocity measurements could be made in eukaryotic cells. For instance, these conditions appear to be met for a significant fraction of the Saccharomyces cerevisiae genome [36]. With significant increases in sequencing depth, we expect analogous replication phenomenology, including pausing and locus-and time-dependent fork velocities, will be observed in eukaryotic systems using lag-time analysis. Importance of a model-independent approach. As we prepared this manuscript, we became aware of a competing group which also uses marker-frequency analysis to test a specific hypothesis: the fork velocity is oscillatory in E. coli [15], consistent with our own observations. Although our reports share some conclusions, this competing approach requires detailed models for the cell cycle and the fork velocity, along with explicit stochastic simulations. We demonstrate an approach to measure fork velocities independent of model assumptions or detailed hypotheses for the fork velocity, without the need for numerical simulation and complete with the ability to perform an explicit and tractable error analysis. We therefore expect our analysis will be both widely applicable as well as accessible to other investigators without specialized tools or modeling expertise.
Systematic error in datasets. It may seem perplexing that we have not pooled many existing datasets from multiple independent experiments. This would naïvely increase the statistical resolution and sensitivity from an analytical perspective. However, it is important to emphasize that not all marker-frequency experiments are of equal quality and that many datasets we have analyzed have clear signatures of systematic error. (See the Supplemental Material Sec. III C2.) In our analysis, we have prioritized the selection of artifact-free datasets over the indiscriminate pooling of data. We emphasize that to date, datasets have not been generated with quantitative replication dynamics analysis as a goal and we are confident that experimental protocols can be optimized to improve the data. Ref. [40] describes a promising approach, including harvesting populations earlier in exponential phase. We too are developing new protocols to increase data quality.
Multiple factors determine replisome dynamics. Our measurements of the replication velocity reveal that there are multiple important determinants that results in complex velocity profiles. dNTP pools regulate the fork velocity. Previous work had already demonstrated that increases (or decreases) in dNTP pool levels lead to concomitant decreases (or increases) in the C period duration, consistent with a dNTP-limited model of the replication velocity [41][42][43][44]. Our data is broadly consistent with these previous results, but in a subcellular context: (i) The fork-number model, in which fork velocities decrease as the number of active forks increase, is clearly consistent with a mechanism in which the nucleotide pool levels, although highly-regulated [45], cannot completely compensate for the increased incorporation rate associated with multiple forks. (ii) The observation of the fork velocity oscillations is also consistent with an analogous failure of the regulatory response to compensate, this time temporally. The initial fall in the fork velocity is consistent with a model in which dNTP levels initially fall as replication initiates and nucleotides begin to be incorporated into the genome. Reduction in the dNTP levels causes a regulatory response to increase dNTP synthesis by ribonucleotide reductase [45], but the finite response time of the network could lead to dynamic overshoot in the regulatory feedback, leading to oscillations [46]. Ref. [15] has also argued that this oscillating-dNTP-level model would lead to time-dependent oscillations in the mutation rate which are consistent with the origin-mirror-symmetric distribution of the mutation observed in E. coli. However, this interesting phenomenon and this hypothesized mechanism will require further investigation.
Retrograde fork motion leads to slow replication velocities. Retrograde fork motion, where the fork moves in the opposite direction from wild-type cells, lead to the largest changes in fork velocity observed. To what extent is the observed slowdown a consequence of a few long-duration pauses versus a region-wide slowdown? In regions which exclude the rDNA, the effect appears well distributed. However, it is important to note that the genomic resolution of lag-time analysis is still much too low to resolve individual transcriptional units. We anticipate that with increased sequencing depth as well as improvements in sample preparation, this approach could detect genomic structure in the fork velocity at the resolution of individual transcriptional units. Although we did analyze a number of mutants with retrograde fork movement in V. cholerae and E. coli (analysis not shown), the competing effect of increased fork number as well as the genomic instability of these strains made these experiments difficult to interpret quantitatively, since fork number and direction were both affected in these strains [32,47]. We concluded qualitatively that retrograde replication direction appears not to play as large a role in these gram-negative bacteria as it does in gram-positive B. subtilits, consistent with previous evidence [31,32,[48][49][50]. However, we expect lag-time analysis could be used to characterize even small effects of the retrograde fork orientation in more-carefully engineered strains, analogous to those that we analyzed in the context of B. subtilis [31,32].
Previous reports [31,32], including our own [12,[51][52][53][54], had reported long-duration replication-conflict induced pauses, especially in mutant strains where the orientation of rDNA [31] or other highly transcribed genes [51] are inverted to give rise to a head-on conflict between replication and transcription. The contribution of lag-time analysis in this context is multifold: First, we provide a quantitative number in the context of the very-short-duration pauses for co-directional transcription in wild-type cells. This analysis supports a longstanding hypothesis that the right arm of the B. subtilis chromosome is shorter than the left arm to compensate for pausing at the rDNA loci that arm predominately located on this arm.
We also report quantitative measurements for the longer pauses that results from head-on conflicts in mutants where highly transcribed genes are inverted. Our analysis gives us the ability to quantitatively differentiate the contributions of fork pausing and arrest in the analysis of the marker frequency, which was previously impossible. Our measurement of a timescale of minutes is consistent with our previous in vivo single-molecule measurements in which we report transcription-dependent disassembly of the core replisome [12]. Could the observed fork-velocity oscillations be misinterpreted as pauses? The observed lag-time offset between the two arms (e.g. Fig. 3B) is not predicted by fork-velocity oscillations.

Conclusion.
In this paper, we introduce a novel method for quantitively characterizing cellular dynamics by lagtime analysis. Although more broadly applicable, we focus our analysis on the characterization of replication dynamics using next-generation sequencing to quantitate DNA locus copy number genome-wide. The approach has the ability to make precise, even at the resolution of seconds, measurements of time differences and pause durations, as well as the ability to quantitatively measure fork velocities in vivo in physiological units of kb/s, at genomic resolutions of roughly 100 kb, for the first time. Importantly, unlike marker-frequency analysis, our approach allows direct quantitative comparisons to be made between growth conditions, mutant strains, and even different organisms. The resulting measurements of replication dynamics reveal complex phenomenology, including temporal oscillations in the fork velocity as well as evidence for multiple mechanisms that determine the fork velocity. The lag-time analysis has great potential for application beyond bacterial systems as well as the potential to significantly increase in resolution and sensitivity as sequencing depth and sample preparation improve.

IV. METHODS
Introduction to marker-frequency analysis. Our focus will be on marker-frequency analysis, which measures the total number of a genetic locus in an asynchronous population. The model was generalized to predict the marker frequency N ( ) of a locus a genomic distance away from the origin [20][21][22]: where N 0 is the number of origins, which grows exponentially in time with the rate of mass doubling of the culture, k G . Since the origin is replicated first, the number of origins is always largest compared to the numbers of other loci. Quantitatively, the copy number is predicted to decay exponentially with log-slope: where k G is the population growth rate and v is the fork velocity, typically expressed in units of kilobases per second. To derive this result, two critical assumptions were made: (i) the timing of the cell cycle is deterministic and (ii) the fork velocity is constant [19,20]. Initially, our naïve expectation was that the interplay between the significant stochasticity of the cell-cycle timing with the asynchronicity of the culture would prevent marker-frequency analysis from being used as a quantitative tool for characterizing cell-cycle dynamics. For instance, significant stochasticity is observed in the duration of the B period [55] (i.e. the duration of time between cell birth and the initiation of replication). Does this stochasticity lead to a failure of the log-slope relation (Eq. 15)?
Stochastic simulations support the universality of the log-slope relation. To explore the role of stochasticity and a locus-dependent fork velocity in shaping the marker frequency, we simulated the cell cycle using a stochastic simulation. Our aim was not to perform a simulation whose mechanistic details were correct, but rather to study how strong violations of the Cooper-Helmstetter assumptions, in particular how stochasticity, as strong or stronger than that observed, influenced the observed marker frequency and the log-slope relation (Eq. 15). In short, we used a Gillespie simulation [56] where the B period duration and the lifetime of replisome nucleotide incorporation steps are exponentially distributed, and we added regions of the genome where the incorporation rate was fast as well as a single slow step on one arm. See Fig. 7A and the Supplemental Material Sec. V A for a detailed description of the model, as well as movies of the marker frequency approaching steady-state growth, starting from a single-cell progenitor.
To our initial surprise, the stochasticity of the model had no effect on the predicted log-slope of the locus copy number. (See Fig. 7B.) In spite of the stochastic duration of the B period and the locus-dependence, the marker frequency still decays exponentially with the same decay length locally, i.e.: where k G was the empirically determined growth rate in the simulation and v( ) was the local fork velocity at the locus with position . We hypothesized that this result might be a special case of choosing an exponential lifetime distribution, since this is consistent with a stochastic realization of chemical kinetics. To test this hypothesis, we simulated several different distributions, including a uniform distribution, for the duration of the B period and the stepping lifetime for the replisome (as well as simulating multifork replication). In each case, the local log-slope relation (Eq. 16) held, even as the growth rate and fork velocities changed with the changes in the underlying simulated growth dynamics. We therefore hypothesized that Eq. 16 was a universal law of cell-cycle dynamics and independent of Cooper and Helmstetter's original assumptions.
The exponential-mean duration. Motivated by this empirical evidence, we exactly computed the population demography in a class of stochastically-timed cell models. We described these results elsewhere [13]. In short, we showed that there is an exact correspondence between these stochastically-timed models and deterministically-timed models in exponential growth. The relationship between the corresponding deterministic lifetime τ i of a state i and the underlying distribution p i in the stochastic model is the exponential mean (Eq. 2) [13]. The exponential mean biases the mean towards short times, the growth rate k G determines the strength of this bias, and the biological mechanism for this bias is due to the enrichment of young cells relative to old cells in an exponentially growing culture [13].
To understand the consequences of this result, we consider two special cases of this exponential mean. For processes with lifetimes short compared to the doubling time, Eq. 2 can be Taylor expanded to show that the exponential mean is: the regular arithmetic mean µ t with a leading-order correction proportional to the product of the growth rate and variance σ 2 t . In the context of single-nucleotide incorporation, this correction is on order one-part-in-amillion and therefore can be ignored. As a consequence, Eq. 16, corresponding to the transitions between states with short-lifetimes, is unaffected by the stochasticity, exactly as we observed in our simulations.
Another important case to consider is the strong disorder limit, in which a small fraction of the population stochastically arrests, i.e. with lifetime ∞, while the other individuals have exponential-mean lifetime τ 0 . Using the definition in Eq. 2, it is straightforward to show that the deterministic lifetime is: where T is the population doubling time and the second equality is an approximation for small . The exponential mean duration is extended by the arrest, but remains finite. Therefore, an arrest of a subpopulation is indistinguishable from a longer duration pause in an exponentially proliferating population. (See Ref. [13].) Marker-frequency demography. For a stochastic model with locus-dependent fork velocity, we showed that Eqs. 14 and 15 generalize to where we will call τ ( ) the lag time of a locus at position , which is equal to the sum of the differential lag times for each sequential step: where δτ i is the differential lag time for state i or the exponential mean of the state lifetime [13]. In the continuum limit, it is more convenient to represent this sum as an integral: where the fork velocity is defined: v( i ) ≡ 1 bp/δτ i . To demonstrate that the generalized stochastic model predicts the log-slope relation (Eq. 16), the log-slope can be derived by substituting Eq. 21 into Eq. 19, as was observed in the stochastic simulations, demonstrating the universality of Eq. 4. We note that Wang and coworkers had previously derived an equivalent expression using the deterministic framework of the Cooper-Helmstetter model in the Material and Methods Section of Ref. [31].
Stochasticity has a minimal effect on the marker frequency. We initially had hypothesized that stochasticity should affect the marker frequency. As explained above, it is the rapidity of base incorporation that explains why stochasticity is dispensable in this context. The same argument does not apply to the B period which is comparable to the duration of the cell cycle. However, for the marker frequency, it is lag-time differences between the replication times of loci that is determinative, and therefore the lag time of the B period cancels from these lagtime differences. Although it is mostly irrelevant for understanding wild-type cell dynamics, stochasticity and an arrested subpopulation will play an important role in one phenomenon we analyze: replication-conflict induced pauses.

Time resolution.
Due to the large number of reads achievable in next generation sequencing, the time resolution will be high in carefully designed analyses. The In spite of the stochasticity, it obeys the log-slope law locally (Eq. 16). Furthermore, the inferred lag-time pause (4.9 min) is predicted by the exponential mean (Eq. 2). Panel C: Tradeoff between genomic resolution and velocity precision. As the spacing between knots decreases, increasing the genomic resolution, the error in the velocity measurement increases.
number of reads is subject to counting or Poisson noise. It is therefore straightforward to estimate the experimental uncertainty in the lag time due to finite read number: where we have used a read number inspired by the replication-conflict pausing example. This estimate suggests that under standard conditions, time measurements with an uncertainty of seconds are possible using this approach.

Fork-velocity resolution.
To compute the slope in Eq. 4, the log-marker-frequency is fit to a piecewise linear function with equal spacing between knots. See Fig. 7B. There is an important tradeoff between genomic resolution (i.e. the genomic distance between knots) and fork velocity precision (i.e. the uncertainty in velocity measurement): Increasing the genomic distance between knots reduces the genomic resolution but also reduces the uncertainty in the velocity measurement. We therefore consider a series of models with increasing genomic resolution and use the Akaike Information Criterion (AIC) to select the optimal model [28,29]. See Supplemental Material Sec. III C10. This approach balances the desire to resolve features by increasing the genomic resolution with the loss of velocity precision. Given a knot spacing, it is straightforward to estimate the relative error: where n is the read depth in reads per base and ∆ is the spacing between knots in basepairs. Therefore, for a canonical next-generation-sequencing experiment, we can expect to achieve roughly 10% error in the fork velocity for 100 kb genomic resolution. Note that in our error analysis, we have included only the error from cell number N , not the error from the uncertainty in the cellcycle duration, which covaries between loci in a particular experiment. B. Traxler, A. Nourmohammad, J. Mougous, and J. Mittler for many useful conversations. We would like to thank P. Levin, J. Wang and L. Simmons for advice on our manuscript. We thank S. B. Peterson and A. Schaefer for help with V. cholerae. We would like to thank J. Wang, C. Possoz, F.-X. Barre, C. Rudolph for detailed conversations about their data. This work was supported by NIH grant R01-GM128191.
Data availability statement. The sequencing datasets generated during the current study are currently available on reasonable request, but will be uploaded to a public repository prior to publication. Relevant sequencing data from other studies can be found at the accession numbers given in Sec. II of the Supplemental Material. Digitized data are available on reasonable request.
Code availability statement. All MATLAB scripts written for this study are available on reasonable request. V. Supplemental theoretical results S10 A. Stochastic simulations S10 1. Stochastic simulations method S10 2. Stochastic simulations match the predictions of the log-slope S11 3. Simulation models S11 4. Simulation results S12 5. Re-scaling simulation units to compare to measured data S12 6. Movies of marker frequency dynamics approaching steady state growth S12

II. DATASETS USED IN THIS STUDY
All datasets were obtained from cells grown at 37 • C. Next-generation-sequencing FASTQ files and marker frequencies were either generated for this study, obtained from Wang et al. [2,3], or obtained from the European Nucleotide Archive (ENA). The corresponding project accessions are included if applicable. In the following table, Exp is shorthand for exponential phase growth and Stat is shorthand for stationary phase. Growth
All populations were grown at 37 • C. In our study, exponential phase cultures were grown to OD 600 of 0.60-0.80. Stationary phase samples were grown to OD 600 of 1.50 or greater. In Galli et al. [1], exponential phase cultures were grown to an OD 650 of 0.05 and 0.2 in M9 and LB, respectively. In Midgley-Smith et al. [4], exponential phase cultures were grown to an OD 600 of 0.48. In Srivatsan et al. [3], exponential phase cultures were grown to an OD 600 of 0.2-0.6.

B. Generation of marker frequency data for this study
For detailed protocols of marker-frequency generation for each dataset, see the corresponding references in Sec. II. In this study, strains were struck on solid agar and grown overnight at 37 • C. Three individual colonies were selected from each strain and used to inoculate 10 mL of LB media, which was grown overnight at 37 • C w/ 260 rpm shaking. The following morning, these overnight pre-cultures were back-diluted to OD 600 of 0.05 in 5 mL of either LB or M9 media. The same biological replicate was used to inoculate both growth conditions. Exponential phase cultures were grown to OD 600 of 0.60-0.80. Stationary phase samples were grown to OD 600 of 1.50 or greater. To harvest, cultures were centrifuged at 10,000 rpm for 5 mins at 25 • C, and the supernatant was removed. gDNA was prepared immediately following harvest using GeneJet Genomic DNA Purification Kit (Thermo Scientific, K0721). Purified gDNA was quantified using Qubit dsDNA HS Assay Kit (Invitrogen, Q32851) and quality was assayed using a Nanodrop 2000 Spectrophotometer (Thermo Scientific, ND-2000). Whole-genome libraries were prepared from purified gDNA using Twist 96-Plex Library Prep Kit (Twist Bioscience, 104950) with dual adapters. Libraries were subsequently pooled and sequenced on Illumina NovaSeq6000 (S4) to a depth of 6,000,000 reads per sample.

Sequence alignment
To align deep sequencing read outputs to the bacterial reference genomes for MFA, we used the read alignment tool Bowtie 2 via the MATLAB function bowtie2 [5]. We then extracted the position information from the resulting SAM file using an in-lab written MATLAB script and the command-line tool SAMtools [6].

Data selection
Not all marker-frequency experiments are of the same quality and many datasets we have analyzed have clear signatures of systematic error. We use two key criteria to determine the quality of the data. The first is a qualitative measure involving the shape of the marker-frequency profile and the second is a quantitative measure based on resolution analysis.
Marker profile near origin. Our analysis uses exponential growth as the stop-watch for resolving dynamics, so it is essential that the population is harvested at the right time for sequencing. We have found that the marker-frequency profile has a signature flattening near the origin of replication if the population is harvested too late in exponential phase, as shown in Fig. 1. The population is entering stationary phase, where nutrient depletion begins to cause slower growth. Entering stationary phase also causes a decrease in the population-wide rate of replication initiation, which leads to the flattening of the profile near the origin. For the rest of the chromosome, replication continues unchanged for the majority of the population. Since exponential growth is crucial to our analysis, we have chosen to only use datasets with cusp-like behavior near the origin(s), as opposed to a rounded concave-down shape.

Resolution analysis.
In the case where there is a single origin of replication for each chromosome, we expect only a single maximum in the marker frequency profile, corresponding to the origin. In an ideal noiseless situation, any local maxima distinct from the origin would correspond to other points of replication initiation and the fit would predict a retrograde fork velocity. However, due to the stochastic nature of the sequencing data, there can be spurious local maxima if the resolution is chosen to be too high. Fluctuations due to noise can cause the knots in the piecewiselinear fit along each arm of the chromosome to be non-monotonic, as shown in Panel C of Fig. 7 in the main paper. Thus, given two datasets for the same organism and growth condition, data quality was determined based on the best resolution that can be achieved without introducing spurious maxima.

To divide or not to divide by stationary phase data
To further remove any sequencing induced bias, we divide the copy number data obtained during exponential growth by the data obtained during stationary phase. This normalization step is done with the 1 kb binned values. Since the division of two noisy uncorrelated data sets typically results in a decreased signal-to-noise ratio (SNR), we used a comparison of the variance divided by the mean before and after division by stationary phase.
We expect the relative variance to decrease by normalization if the reduction in sequencing bias resulting from division has a larger effect than the increase in noise from dividing two noisy data sets. We first divide the noisy stationary phase data by its expected mean value. This ensures that it is of unit order and will not introduce a global scale factor during division, which would affect only the normalized data and not the pre-normalization data. We find that after division, the variance is roughly halved, suggesting that the normalization by stationary phase is successful in reducing sequencing bias without resulting in excessive noise. We thus chose to divide all exponential phase data with the corresponding stationary phase data. For some datasets, the stationary phase results were not provided for some of the ectopic origin mutants. The only difference in these mutants from their original strains is the addition of a short segment of DNA for the ectopic origin, which does not significantly change the locations of sequencing bias. Therefore, we chose to divide the exponential phase data for the ectopic origin strains by the stationary phase data from the original strains.

Filtering the data
To remove outliers from the marker-frequency data, we used a centered 200 kb median filter, such that at each position along the chromosome, we set the median marker-frequency from − 100 kb to + 100 kb as a baseline. The data was given periodic boundary conditions such that the centered medians still had 200 kb regions at the boundaries of the data set. We then discarded the 10% of points that most deviated from this centered median baseline, sorted by the absolute value of the difference. We also removed data points from regions known to be associated with mobile elements of low GC content, such as the Super Integron on Chr2 of V. cholerae, which is subject to anomalous sequencing bias.

Nearest-neighbor variance estimator
We used a nearest-neighbor estimate of variance: where x N +1 = x 1 , in agreement with periodic boundary conditions. This approach allows us to obtain a variance measurement without assuming the underlying distribution of the data and the corresponding expected value of each bin.

S6
6. Fitting slopes to the log of the copy number After taking the natural logarithm of the copy number, we expect the data to follow a roughly linear trend along each arm of the chromosome. Since our goal is to obtain higher resolution measurements of fork velocity along the chromosome, it is necessary to subdivide each arm into smaller segments with variable slopes. The fit to the data should be continuous, since discontinuities would create copy number ambiguity at segment junctions. Therefore, we use a continuous piecewise linear least squares fit to obtain slope measurements from the data. We call the MATLAB function lsqnonlin as part of an in-lab written fitter function. The fitter obtains vertical control point values at the junctions between segments. Each control point is used in the measurement of the slope both to the left and to the right of it.

Estimating errors for the control points
To obtain error estimates for the control points of the least squares fit, we use the Jacobian that lsqnonlin returns, which is the Jacobian of the difference between the fit data and the experimental data. If we have k fit parameters θ α , where α = 1, ..., k, and N data points y i , where i = 1, ..., N , then the Jacobian takes the form: where µ i (θ) is the fitter estimate of y i , based on the model with the set of parameters θ. We use the matrix form of the Fisher information: where f (Y ; θ) is the probability density function (PDF) of the random variable Y , given parameters θ. In our case, we expect the noise to be Gaussian distributed around the mean, so we have the PDF: where σ is the standard deviation. Now we take the derivative of the logarithm: Plugging this into Eq. 3 and noting that the matrix multiplication contracts over the i index, we are left with: The Cramér-Rao bound states that the inverse of the Fisher information provides a lower bound for the covariance matrix of unbiased estimators. More explicitly, the relation is as follows: We take the equality for our error estimates. The variances of the parameter estimates are given by the diagonal elements of the covariance matrix. To convert the control points of the piecewise linear fit into slopes and fork velocities, we use coordinate transformation matrices, which have the added benefit of also transforming the variance estimates from the Fisher information. The coordinate transformation matrices can either be obtained through direct reasoning or as Jacobians of the final coordinates relative to the initial coordinates.
The following examples will be square matrices of dimension 3, acting on sets of 3 parameters. These examples illustrate the procedure of obtaining the transformation matrices and are easily generalizable. To convert from a vector of control point values to a vector where the first element is the leftmost control point value and the other elements are the slopes, we use the following matrix: where ∆ ij is the chromosomal position (horizontal value) of the j-th control point minus the position of the i-th control point. We can thus relate the vector of slopes α and the vector of (vertical) control point values θ: From the slopes, we can obtain the fork velocities with the following matrix: The velocities v are related to α like so: To properly transform the covariance matrix obtained in Sec. III C 7, we must use the transformation matrix on both sides, like so: cov v = Slopes2Vels T · cov α · Slopes2Vels.
The error estimates for each parameter are the square roots of the corresponding diagonal elements.

Behavior near the terminus
Due to the stochastic nature of replisome progression, replication forks do not always meet at the terminus, which corresponds to the dif site [1]. If one fork arrives first, it can continue past the dif site, converting from antegrade to retrograde motion along the other arm. While there are Tus-Ter traps in E. coli and B. subtilis to limit the amount of retrograde motion, they are not 100% efficient and they are not present in V. cholerae [1]. Thus, in a population, the true termination points along the chromosome would form a distribution around the dif site. This means that in regions near the terminus, there can be antegrade and retrograde replication forks both contributing to the marker frequency, which contradicts one of the assumptions required for our analysis: replication proceeding only in one direction. Such retrograde motion would cause a flattening out around any fork convergence points observed in the marker-frequency data. Therefore, when doing a multi-segment analysis, we remove the two fork velocity measurements on either side of the marker-frequency minimum.

Selection of the number of fitted knots
First we bin the data into 1 kb regions, following the convention of previous marker frequency analysis studies.
To determine the number of segments that would best fit the data, we use the Akaike Information Criterion (AIC).
We compare a series of nested models with 2 k continuous piecewise-linear least squares fits, where k = 1, ..., 10. The AIC-optimal model for fast growth (in LB) had 39 knots, spaced by 100 kb, generating 38 measurements of locus velocity across the two chromosomes of WT V. cholerae. Other strains either have similar or smaller region sizes that minimized the AIC value, so for ease of comparison with other mutant strains, we take 100 kb to be the step size for determining fork velocity.

A. Bilateral symmetry analysis
To analyze whether an observed velocity profile was consistent with the predictions of the time-dependent mechanism, we test for bilateral symmetry between the left and right arms: where is the locus position relative to the origin. Assume the fork velocity has been computed over 2m equal-length genome segments, arranged symmetrically about the origin ( = 0). The segments i = −1... − m correspond to sequentially labeled segments along the left arm and the segments i = 1...m correspond to sequentially labeled segments along the right arm such that = ∆ i corresponds to the end point of each segment where ∆ is the segment length. Let the mean fork velocity be defined: where v i is the velocity over segment i.
To divide the variance into symmetric and antisymmetric contributions, we define symmetrized, (i), and antisymmetrized velocities, [i], for index pairs ±i: We define the symmetric and antisymmetric variances and the total variance is the sum of the symmetric and antisymmetric variances: Finally, we define the symmetric fraction of the variance: If the variation in the velocity obeys the bilateral symmetry (Eq. 16), the variance is all symmetric (i.e. f S = 1). If on the other hand, the variation is randomly distributed along the genome, we expect equal symmetric and antisymmetric combinations (i.e. f S = 1/2).

B. Estimation of average fork number per cell cycle
The following analysis is only an estimation. There are a few key assumptions that are inconsistent with cell phenomenology, but are kept to make the analysis tractable. In particular, we take the assumption that neither arm of the chromosome has fork arrest, which is inconsistent with the results for B. subtilis. This assumption allows us to obtain an estimate without prior knowledge of where fork arrest may occur. We also assume that the D period (time between end of chromosomal replication and cell division) is negligible, which is inconsistent when growth is extremely slow. This assumption allows us to use the copy number of the terminus as an approximation of the number of cells, which cannot be determined from MFA.
The population number density of forks n f ( ) at any position is given by the change in copy number between two consecutive positions along the chromosome: This equation is a result of the following reasoning: If there are two copies at = x and one copy at = x + 1, then there must be a fork that has just replicated = x. We need to integrate the number density over the entire chromosome to get the total number of forks for the population: where the factor of 2 came from having two arms. To get the average number of forks per cellN f , we divide by the total number of cells, which is roughly equal to the number of termini N ter : This can be related to lag time: which can be substituted into the equation above to get: This is closely related to the result derived in [7]. For two chromosomes, we add the number of forks for both, but divide everything just by the minimum N ter,i , which corresponds to the terminus of Chr1 in this case: In general, when there are multiple chromosomes: where i denotes which chromosome. To get a relative lag time, we use: Letting the subscript end denote the lag time corresponding to min(N ter,i ). We thus have: where τ end is the maximum lag time measured. This can be simplified to: which enables a calculation of average fork number per cell cycle using lag times.

S10
...  2. Stochastic simulation model schematics. Replication initiates at the origin (state 1) and proceeds bi-directionally along the left (L) and right arms (R). Each transition between simulation bases (i.e. state number) represents the replication of roughly 25 kb. States have exponentially-distributed lifetimes with a rate equal to the fork velocity at that position: kX = v( X ). A number of different initiation schemes were simulated. In the Origin Model, the initiation rate was proportional to the number origins (per cell). In the Terminus Model, the initiation rate was proportional to the number termini (per cell). In the Uniform Model, there is a uniformly distributed wait time after initiation before the next initiation occurs. In the Precise Model, there is a fixed time interval after termination before the next initiation occurs.

C. Statistically significant deviations of local fork velocity from the global mean
To test the statistical significance of the measured velocities, we use a null hypothesis test. The null hypothesis corresponds to a model with uniform fork velocity on the right and left arms. The alternative hypothesis corresponds to a model with a model-selected number of knots (m), corresponding to m − 1 genomic region-specific velocities.
To calculate the p-value, we begin by obtaining the χ 2 -test statistic: where O i denotes the observed values, E i denotes the expected values, and σ 2 i is the variance of the fork velocities measured using the fitter function (as described in Supplemental Material Sec. III C 8). In this case, E i is the global mean fork velocity for all i, since that is the null hypothesis. Since there is only one degree of freedom for the model, we use the one-dimensional χ 2 cumulative distribution function to determine the probability of measuring a χ 2 statistic as extreme as the one we've measured, assuming the null hypothesis is true. This p-value is found to be 10 −30 for all datasets except one with a p-value of 6 × 10 −12 . All of the p-values are much smaller than the standard threshold of 0.05, so we reject the null hypothesis. We thus conclude that there are statistically significant variations of the fork velocity relative to the mean.

Stochastic simulations method
The purpose of the stochastic simulation was to investigate the role of stochasticity in determining the log-phase growth demography of the population. It was not to generate a mechanistically realistic and detailed model of the cell cycle.
To simulate the cell cycle, we performed an exact stochastic simulation using the Gillespie Algorithm [8]. We idealized the cell cycle as follows: Genome representation: The genome was divided into two equal length arms each consisting to 100 course-grained bases. An explicit realization of all 5 Mb would be too slow to rapidly explore different cell models. Finer coarse-grained basepairs were explored but made no difference to the results. Replication initiation: We experimented with a number of different models for initiation since we initially believed that the stochasticity of initiation would limit the ability to quantitate the fork velocity. (i) We initially investigate a Terminus Model in which there was a constant rate of initiation k init after the termination of the on-going replication. (ii) In order to simulate a more realistic model, we then considered an Origin Model where there was constant rate k init of initiation at each origin. (iii) We explored a Precise Model with precise timed B period length T B . (iv) We explored a Uniform Model with a uniform distribution of B periods. Fork velocity: We used the replication of the coursegrained bases had exponentially distributed wait times k rep = v/∆ (i.e. constant rate per unit time) where v is the replication velocity and ∆ is the length of the course-grained bases. We note that this assumption makes replication more stochastic than a more realistic model in which wait times are exponentially distributed at the single-base level. We experimented with a number of different types of locus-dependent fork velocities. In addition, we added exponentially-distributed pauses of various durations at a specific locus. Termination: We only allowed a single direction of fork propagation. When the fork reached the end of the arm, replication terminated. Cell division: We assumed cells divided immediately after termination of the slowest arm of the chromosome. Initiation of the simulation: All simulations initiated from a single new-born cell with an un-replicated chromosome. Stop conditions: Simulation were run until there were 10 5 cells. Determination of growth rate. The number of cells was fit to an exponential to determine the growth rate k G between N cell (t) = 10 4 and N cell (t) = 10 5 cells. Generation of simulated marker frequency. The marker frequency was defined as the number of each genetic locus in the population at the termination of the simulation. In summary, the model was described by the parameters k init and the fork velocity v( ) (or pause time distribution) at each locus. The model output was the marker frequency at each locus N ( ) and the growth rate k G . See Fig. 2.

Stochastic simulations match the predictions of the log-slope
To test whether the log-slope law applies locally, irrespective of stochasticity in cell-cycle timing, we used a stochastic simulation to generate simulated marker-frequency data. The basic strategy was to simulate a stochastically timed cell cycle, including stochastically timed B periods as well as stochastic fork dynamics, and compare the observed population demographics to the prediction of non-stochastic models.
Log slope. We define the log slope: where N ( ) is the population copy number of the locus at position . Theoretically, the slope is predicted to be: where k G is the growth rate and v( ) is the locus-dependent fork velocity.
Log difference. For long pauses, the dynamics are characterized by a step rather than a slope. The step is defined: where N ( ) is the population copy number of the locus at position . Theoretically, the log difference is predicted to be: where k G is the growth rate, δτ ( ) is the exponential mean of the step wait time and the last equality applies in the special case of an exponentially-distributed wait time with rate k.

Simulation models
The model descriptions and motivations are summarized below: Terminus Model 1: In the terminus model, the replication initiation rate is proportional to the terminus number (per cell). In this model, there is a locus-independent fork velocity.
Terminus Model 2: Same as above, but with two fork velocities. On the last half of the right arm (region 1), the fork velocity is higher than the rest of the chromosome (region 0).
Terminus Model 3: Same as above, but with an exponentially-distributed pause close to the origin on the right arm.
Terminus Model 4: Same as Terminus 1, but with a lower initiation rate.
Origin Model: In the origin model, the replication initiation rate is proportional to the origin number.

Uniform Model:
The timing between initiations is uniformly distributed. This model was included to explore whether the observations were a special case of exponentially-distributed wait times.

Precise Model:
The timing between initiations is uniformly distributed. This model was included to explore whether the observations were a special case of exponentially-distributed wait times.

Simulation results
The results from the simulations are summarized in the table below. All numbers have at least precision of 1% and are in simulation units: simulation bases (sb) and simulation time (st).

Model
Parameter Parameter Observed Observed Predicted Initiation Fork dynamics Growth rate: k G Log slope/difference Log slope/difference Terminus 1

Re-scaling simulation units to compare to measured data
The simulations were performed in simulation units. We call the coarse-grained bases simulation bases (sb) and time units simulation time (st). To compare these simulations to experimental data, the prediction need to be rescaled to place the observations in biological units. To model a 5 Mb bacterial genome, with typical fork velocity of 1 kb/s, the conversion factors are: since we simulated typical fork velocities of 20 sb/st which we rescale the units to be equivalent to 1 kb/s.

Aside:
Note that due to the smaller number of sb relative to bp, the stochasticity of fork dynamics should be exaggerated in the simulations, strengthening our argument that the stochasticity does not effect the model predictions.

Movies of marker frequency dynamics approaching steady state growth
We include two examples of movies tracking the marker frequency dynamics from a single cell to steady state growth. Movies term1 mov.mp4 and term3 mov.mp4 show the dynamics of the Terminus 1 and Terminus 3 models respectively. Movie time is shown in simulation units. The dynamics emphasizes the importance of performing the calculations of the marker frequency in the steady-state limit. Not taking this limit correctly leads to anomalous results (e.g. [9]). S13

A. Comparison with GC content
One potential rate-limiting factor for replication is the distribution of GC content across the genome. However, at the current genomic resolution of 100 kb, we find no significant evidence of GC content determining the rate of replication. See Fig. 3 for the results of wild-type E. coli strain MG1655 in LB and M9.  Detailed results tabulated in SuppData.xlsx.