## Abstract

The time taken for cells to complete a round of cell division is a stochastic process controlled, in part, by intracellular factors. These factors can be inherited across cellular generations, which gives rise to often non-intuitive correlation patterns in cell cycle timing between cells of different family relationships on lineage trees. Here, we formulate a framework of hidden inherited factors affecting the cell cycle that unifies known cell cycle control models and reveals three distinct interdivision time correlation patterns: aperiodic, alternator and oscillator. We use Bayesian inference with single-cell datasets of cell division in bacteria, mammalian and cancer cells to identify the inheritance motifs that underlie these datasets. From our inference, we find that interdivision time correlation patterns do not identify a single cell cycle model but generally admit a broad posterior distribution of possible mechanisms. Despite this unidentifiability, we observe that the inferred patterns reveal interpretable inheritance dynamics and hidden rhythmicity of cell cycle factors. This reveals that cell cycle factors are commonly driven by circadian rhythms, but their period may differ in cancer. Our quantitative analysis thus reveals that correlation patterns are an emergent phenomenon that impacts cell proliferation and that these patterns may be altered in disease.

## I. INTRODUCTION

Cell proliferation, the process of repeated rounds of DNA replication and cell division, is driven by multiple cell extrinsic and intrinsic factors [1, 2]. Stochasticity in any or all of these factors therefore influences the time taken for a cell to divide, generating heterogeneity in cell cycle length, even in genetically identical populations. For example, stochastic gene expression [3] can lead to heterogeneity in cell cycle length [4–6] as these fluctuations can be propagated by concerted cellular cues [7]. These cues can exhibit reproducible stochastic patterns that are important in development, homeostasis and ultimately, for cell survival [8].

Single-cell technologies illuminate a world of cellular variation by replacing bulk-average information with single-cell distributions. A key challenge is to exploit cell-to-cell variability to identify the mechanisms of cellular regulation and responses [8, 9]. Time-lapse microscopy allows us to resolve cell dynamics such as division timing, growth and protein expression [10] (Figure 1a, left). This has led to many discoveries in cell cycle dynamics in bacteria [11–14] and mammalian cells [15–18]. Early advances included measuring the distribution of division times across single cells [19] and the correlations between cellular variables leading to cell size homeostasis [11], while more recent applications of time-lapse microscopy have captured multiple generations of proliferating cells, making lineage tracing possible [20, 21].

While single-cell distributions measure variation between cellular variables, they ignore both temporal signals and variations propagating across generations to entire lineage trees [24, 26–28]. These lineage tree correlation patterns can be robust and steady, similar to what is known in spatio-temporal pattern formation [29, 30]. Common examples of lineage tree correlation patterns concern the mother-daughter and the sister correlations that have been used to study cell size homeostasis in *E. coli* [11, 31] and other mechanisms generating correlated interdivision times such as population growth rate [19] and initiation of DNA synthesis [32].

A counter-intuitive correlation pattern presented by many cell types is the ‘cousin-mother inequality’ [27], where the interdivision times of cousin cells are more correlated than those of mother-daughter pairs. This inequality can be observed both in bacteria and mammalian cells (Figure 1b). More generally, lineage tree data gives rise to correlation patterns by comparing a single cell to any other cell on the tree (Figure 1a, right). Family relations – such as daughter, grandmother, cousin cells etc. – encode inheritance patterns, and correlations between these related cells have been used to understand the dynamics of cell populations [33, 34] (Figure 1c). Several stochastic models have been proposed to explain interdivision time correlation patterns. Most of them make prior assumptions on the underlying mechanism controlling cell division such as those focusing on cell size control [31], DNA replication [32, 35] or underlying oscillators [14]. For example, inheritance of DNA content can explain the correlation in interdivision time between sister cells in bacteria [32]. Similarly, it has been shown that a simple model with interdivision time correlations [28] cannot satisfy the ‘cousin-mother inequality’ [27], but a more complex kicked cell cycle model does [36]. It is presently unclear what information correlation patterns carry about the underlying mechanisms that generate them. This is because a unified and systematic framework to generate any desired interdivision time correlation pattern is lacking.

Here, we propose a stochastic model to investigate how cell cycle factors – which we define in this work as hidden properties that affect interdivision time – shape the lineage tree correlation patterns of cells. These could include physiological factors, such as cell size, growth rate and cell cycle checkpoints, or specific cell cycle drivers such as CDKs, mitogens and division proteins. We focus only on data describing patterns of interdivision time in bacterial and mammalian cell types, which circumvents intricate measurements of cell volume, mass, and DNA replication. This also avoids dealing with fluorescent reporter strains that may be difficult to engineer depending on cell type. We propose a generative model of correlation patterns that involves a number of hidden cell cycle factors and reduces to common mechanistic cell cycle models for specific parameter choices. Our theory predicts three distinct lineage correlation patterns: aperiodic, alternator and oscillator. We demonstrate how the model can be used to identify these patterns using Bayesian inference in bacteria and mammalian cells. Our analysis reveals several dynamical signatures of cell cycle factors hidden in lineage tree interdivision time data.

## II. RESULTS

### A. A general inheritance matrix model provides a unified framework for lineage tree correlation patterns

Previous studies [27, 28] found that simple inheritance rules, where interdivision times are correlated from one generation to another through a single parameter, cannot explain the lineage correlation patterns seen in experimental single-cell data. To address this issue, we propose a unified framework where the interdivision time is determined by a number of *cell cycle factors* that represent hidden variables such as cell cycle phase lengths, protein levels, cell growth rate or other unknowns (Figure 1c), that each have their own inheritance pattern.

The states of the cell cycle factors are assumed to form a vector *y*_{p} = (*y*_{p,1}, *y*_{p,2}, …, *y*_{p,N})^{⊤} that determines the interdivision time *τ*_{p} of a cell with index *p* via

$$\tau_p = f(\boldsymbol{y}_p). \tag{1a}$$

Inheritance from mother to daughter of the *N* cell cycle factors is described by a nonlinear stochastic Markov model on a lineage tree:

$$\boldsymbol{y}_{2m} = \boldsymbol{g}(\boldsymbol{y}_m) + \boldsymbol{e}_{2m}, \qquad \boldsymbol{y}_{2m+1} = \boldsymbol{g}(\boldsymbol{y}_m) + \boldsymbol{e}_{2m+1}, \tag{1b}$$

where *m* in ℕ denotes the mother cell index and 2*m* and 2*m* + 1 the daughter cell indices. *f* and ***g*** are possibly nonlinear functions that model the dependence of the interdivision time on cell cycle factors and the inheritance process, respectively. *e*_{p} = (*e*_{p,1}, *e*_{p,2}, …, *e*_{p,N})^{⊤} is a noise vector for which the pair *e*_{2m}, *e*_{2m+1} are identically distributed random vectors with covariance matrix independent of *m*. A non-zero covariance between these noise vectors can account for correlated noise of sister cells. We implicitly assume symmetric cell division such that the deterministic part of the inheritance dynamics ***g*** is identical between the daughter cells. Note that we choose (1a) to be deterministic since division noise can be modelled by adding one more cell cycle factor that does not affect inheritance dynamics ***g***.

The general model (1) includes many known cell cycle models as special cases. For example, the interactions between cell cycle factors could model cell size control mechanisms (Appendix 1 - Section A6 1), the coordination of cell cycle phases (Appendix 1 - Section A6 3), or deterministic cues, such as periodic forcing of the cell cycle (Appendix 1 - Section A7 1), or coupling of the circadian clock to cell size control (Appendix 1 - Section A7 2).

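The full model can only be solved for specific choices of *f* and ***g***, and these functions are generally unknown in inference problems. To overcome this limitation, we assume small fluctuations resulting in an approximate linear stochastic system (see Appendix 1 - Section A1 for a derivation) involving the interdivision time

$$\tau_p = \bar{\tau} + \boldsymbol{\alpha}^{\top} \boldsymbol{x}_p.$$

The vector of cell cycle factor fluctuations *x*_{p} = (*x*_{p,1}, *x*_{p,2}, …, *x*_{p,N})^{⊤} obeys

$$\boldsymbol{x}_{2m} = \boldsymbol{\theta} \boldsymbol{x}_m + \boldsymbol{z}_{2m}, \qquad \boldsymbol{x}_{2m+1} = \boldsymbol{\theta} \boldsymbol{x}_m + \boldsymbol{z}_{2m+1}.$$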

Here, $\bar{\tau}$ is the stationary mean interdivision time, ***θ*** is the *N × N inheritance matrix* and *z*_{2m} and *z*_{2m+1} are two noise vectors of length *N* that capture the stochasticity of inheritance dynamics and differentiate the sister cells (Figure 2a). We denote the *N × N* covariance matrices *S*_{1} = Var(*z*_{2m}) = Var(*z*_{2m+1}) and *S*_{2} = Cov(*z*_{2m}, *z*_{2m+1}), for all *m* in ℕ, of the noise terms (***z*** and ***e***) in individual cells and between sister cells, respectively. The noise terms are independent for all other family relations. The cell cycle factor fluctuations are scaled such that ***α*** = (*α*_{1}, *α*_{2}, …, *α*_{N})^{⊤} is a binary vector of length *N* made up of 1s and 0s depending on whether the function *f* determining the interdivision time has dependence on a given cell cycle factor (see Appendix 1 - Section A1 for details). Under this scaling each cell cycle factor has a positive effect on the interdivision time, and hence we do not distinguish between factors with positive or negative effects on interdivision time.
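To make the linearised model concrete, the following Python sketch simulates interdivision times on a full binary lineage tree. It assumes the linear form described above, in which each daughter inherits ***θ*** *x*_{m} plus its own noise vector and the interdivision time is the mean plus the sum of the factors selected by ***α***. The function name `simulate_tree`, the choice of independent Gaussian noise (that is, *S*_{1} proportional to the identity and *S*_{2} = 0) and all parameter values are illustrative assumptions rather than the settings used in this work.

```python
import random

def simulate_tree(theta, alpha, tau_bar, generations, sigma=1.0, seed=0):
    """Simulate interdivision times on a binary lineage tree.

    Cells use heap indexing: the root is 1, the daughters of m are 2m and 2m+1.
    Assumption: noise is independent Gaussian per factor and per cell
    (S1 = sigma^2 * identity, S2 = 0); the root starts at zero fluctuation.
    """
    rng = random.Random(seed)
    n = len(alpha)
    x = {1: [0.0] * n}
    # Every cell with index below 2**generations acts as a mother.
    for m in range(1, 2 ** generations):
        inherited = [sum(theta[i][j] * x[m][j] for j in range(n)) for i in range(n)]
        for daughter in (2 * m, 2 * m + 1):
            x[daughter] = [inherited[i] + rng.gauss(0.0, sigma) for i in range(n)]
    # Interdivision time: stationary mean plus the selected factor fluctuations.
    return {p: tau_bar + sum(a * xi for a, xi in zip(alpha, xp)) for p, xp in x.items()}

# Hypothetical two-factor example with one positively and one negatively inherited factor.
times = simulate_tree([[0.5, 0.0], [0.0, -0.3]], [1, 1], tau_bar=20.0, generations=3)
print(len(times), "cells simulated")
```

Sample correlations between family members can then be estimated from many such trees and compared with the model predictions.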

When the special case of a single cell cycle factor (*N* = 1) is considered, the inheritance matrix model system reduces to a well-known model with correlated division times [28, 37–39], and we will refer to this case as simple inheritance rules (see also Appendix 1 - Section A5). In the following, we will explore the correlation patterns generated by multiple cell cycle factors.

### B. The inheritance matrix model reveals three distinct interdivision time correlation patterns

We define a correlation pattern as the set of correlation coefficients between pairs of cells on a lineage tree. To describe it, we introduce a function *ρ*(*k, l*), which we call the *generalised tree correlation function*:

$$\rho(k, l) = \frac{\mathrm{Cov}(\tau_k, \tau_l)}{s_\tau},$$

where *τ*_{k} and *τ*_{l} are the interdivision times of cells in the pair (*k, l*), and *s*_{τ} is the interdivision time variance. The coordinate (*k, l*) describes the distance in generations from each cell in the pair to their shared nearest common ancestor (Figure 2b,c). We have derived a closed-form formula for *ρ*(*k, l*) (Eq. (M3) in Methods A; see Appendix 1 - Section A3 for a full derivation) as a weighted sum of powers of the inheritance matrix eigenvalues *λ*:

$$\rho(k, l) = \sum_{i,j=1}^{N} w_{ij} \lambda_i^{k} \lambda_j^{l}, \tag{4}$$
with

We observe that the eigenvalues determine the dependence of the tree correlation function on *k* and *l*, while the noise matrices *S*_{1} and *S*_{2} determine their relative weights *w*_{ij} (see (5)).
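The tree correlation function can also be evaluated numerically without the eigenvalue expansion. The Python sketch below does this by propagating covariances directly, assuming the linear inheritance model in which both daughters inherit ***θ*** *x*_{m} plus noise with covariances *S*_{1} and *S*_{2}. Under that assumption the stationary factor covariance Σ solves Σ = ***θ*** Σ ***θ***^{⊤} + *S*_{1}, sisters have covariance ***θ*** Σ ***θ***^{⊤} + *S*_{2}, and each further generation multiplies by ***θ***. The helper names and the fixed-point iteration are illustrative choices, not the implementation used in this work.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matpow(a, n):
    result = [[float(i == j) for j in range(len(a))] for i in range(len(a))]
    for _ in range(n):
        result = matmul(result, a)
    return result

def stationary_cov(theta, s1, iterations=500):
    """Iterate Sigma <- theta Sigma theta^T + S1; converges if all |eigenvalues| < 1."""
    sigma = [row[:] for row in s1]
    theta_t = transpose(theta)
    for _ in range(iterations):
        sigma = add(matmul(matmul(theta, sigma), theta_t), s1)
    return sigma

def quad(alpha, m):
    n = len(alpha)
    return sum(alpha[i] * m[i][j] * alpha[j] for i in range(n) for j in range(n))

def tree_correlation(k, l, theta, alpha, s1, s2):
    """Correlation of interdivision times for a pair at distances (k, l) from
    their nearest common ancestor, under the assumed linear model."""
    sigma = stationary_cov(theta, s1)
    theta_t = transpose(theta)
    if k == 0 or l == 0:
        # Direct lineage: one cell is the ancestor of the other.
        cov = matmul(matpow(theta, max(k, l)), sigma)
    else:
        # Separate branches: start from the sister covariance, then propagate.
        sisters = add(matmul(matmul(theta, sigma), theta_t), s2)
        cov = matmul(matmul(matpow(theta, k - 1), sisters), matpow(theta_t, l - 1))
    return quad(alpha, cov) / quad(alpha, sigma)

# Hypothetical single-factor example with inheritance 0.5 and uncorrelated sister noise.
args = ([[0.5]], [1.0], [[1.0]], [[0.0]])
print(tree_correlation(1, 0, *args), tree_correlation(2, 2, *args))
```

In this single-factor example the mother-daughter correlation equals the inheritance parameter, and the cousin correlation is its fourth power because the sister noise is uncorrelated.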

Our theoretical analysis reveals three distinct correlation patterns that can be generated by the inheritance matrix model (further details in Methods B). These can be classified by the eigenvalues of the inheritance matrix ***θ***: (i) if the inheritance matrix exhibits real positive eigenvalues, we observe an *aperiodic* pattern (Figure 2d); (ii) if the inheritance matrix has real eigenvalues with at least one negative eigenvalue, we observe an *alternator* pattern (Figure 2e); and (iii) if there is a pair of complex eigenvalues we observe an *oscillator* pattern (Figure 2f). An intuitive interpretation of the eigenvalue decomposition is that it transforms the cell cycle factors into effective factors inherited independently. Hence, the inheritance matrix is diagonal in this basis. However, the analogy is limited to the case where the inheritance matrix is symmetric and the eigenvalues are real. For simplicity, we will focus on models with two cell cycle factors and note that in higher dimensions (*N* ≥ 3), the correlation patterns involve a mixture of the three patterns discussed in detail in this section (Appendix 1 - Figure A6c,d and g,h).
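The eigenvalue-based classification can be sketched in a few lines of Python for the two-factor case. The function names, the numerical tolerance and the example matrices are illustrative assumptions. The same rule applied to a collection of sampled matrices gives the fraction of samples in each pattern.

```python
import math
from collections import Counter

def classify_pattern(theta, tol=1e-12):
    """Classify a 2x2 inheritance matrix by its eigenvalues.

    The eigenvalues are half_trace +/- sqrt(disc); a negative discriminant
    means a complex-conjugate pair.
    """
    (a, b), (c, d) = theta
    half_trace = (a + d) / 2
    disc = half_trace ** 2 - (a * d - b * c)
    if disc < -tol:
        return "oscillator"
    smallest = half_trace - math.sqrt(max(disc, 0.0))
    return "alternator" if smallest < 0 else "aperiodic"

def pattern_probabilities(samples):
    """Fraction of sampled inheritance matrices falling in each pattern."""
    counts = Counter(classify_pattern(theta) for theta in samples)
    return {name: counts[name] / len(samples)
            for name in ("aperiodic", "alternator", "oscillator")}

# Hypothetical matrices, one per pattern.
examples = [
    [[0.5, 0.0], [0.0, 0.3]],    # eigenvalues 0.5 and 0.3
    [[0.5, 0.0], [0.0, -0.3]],   # eigenvalues 0.5 and -0.3
    [[0.5, -0.5], [0.5, 0.5]],   # eigenvalues 0.5 +/- 0.5i
]
for theta in examples:
    print(classify_pattern(theta))
```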

To demonstrate the aperiodic correlation pattern, we utilise an inheritance matrix with positive real eigenvalues (Figure 2d). Characteristically, the modelled interdivision time correlations decay to zero as the distance to the most recent ancestor increases (Figure 2g) since the eigenvalues in (4) are bounded between 0 and 1. To look more closely at the patterns on the tree, we utilise two reductions of the generalised tree correlation function. These are the *lineage correlation function* (*ρ*(*k, l*) for *k* or *l* = 0) and the *cross-branch correlation function* (*ρ*(*k, l*) for *k* = *l*). We look at these functions for continuous *k, l* to visualise better the patterns that occur down the lineage and across the branches of the tree. The lineage correlation function gives the correlation dynamics as you go down the lineage tree, whereas the cross-branch correlation function gives the correlation dynamics as you move across neighbouring branches of the lineage tree. We observe that the interdivision time correlations decrease as we move both across generations and branches (Figure 2j).

In contrast, the alternator pattern generates oscillations with a fixed period of two generations in the lineage correlation function. The behaviour is typically observed for cell cycle factors with negative mother-daughter correlations (Appendix 1 - Section A6 1). In this case, we have at least one negative eigenvalue and thus (4) will alternate between positive and negative values for successive generations, producing the period two oscillation. We demonstrate this correlation pattern for the generalised tree correlation function (Figure 2h) using a diagonal ***θ*** matrix (Figure 2e). We observe alternating correlations across generations in the lineage correlation function, and the continuous interpolation of the cross-branch correlation function (Figure 2k). Although the period is fixed to two generations, the amplitude of the correlation oscillation varies with the absolute magnitude of the eigenvalues (Methods B).

To investigate the oscillator correlation pattern, we propose a hypothetical inheritance matrix ***θ*** with eigenvalues which are complex for *D, P* ≠ 0 and , *k* in ℤ (Figure 2f). The parameters *P* and *D* control the period and the respective damping of an underlying oscillator, i.e., the limit *D* → 1 leads to an undamped oscillation and *D* → 0 corresponds to an overdamped oscillation (see Methods C for details). Correspondingly, the graph of the generalised tree correlation function (Figure 2i) shows clear oscillations across generations. These correlation oscillations are also evident in the lineage correlation function but are absent in the cross-branch correlation function (Figure 2l). However, oscillations are possible in the cross branch correlation function for other choices of ***θ*** with complex eigenvalues (see model fits in Section II D and Methods B). In summary, the qualitative behaviour of the interdivision time correlation patterns can be studied using the eigenvalue decomposition of the inheritance matrix ***θ***.

### C. The cousin-mother inequality is not required to generate complex correlation patterns

Our analysis shows that of the three specified patterns, only the oscillator pattern cannot arise from simple inheritance rules. This is because it requires at least two inherited cell cycle factors (*N* ≥ 2) for the inheritance matrix to possess complex eigenvalues. We therefore asked whether the oscillator pattern is necessary for the cousin-mother inequality to be satisfied. We find that this is not the case, but instead, all three correlation patterns can be compatible with the cousin-mother inequality if *N* ≥ 2. To demonstrate this, we choose three specific two-dimensional inheritance matrices ***θ*** that produce the required eigenvalue structure (Figure 2d-f). We then use these matrices with our analytical solution for the *generalised tree correlation function* (Methods A) to map the regions where the cousin-mother inequality can be satisfied (Figure 2m-o). Interestingly, we find that oscillations can arise even in parameter regions that violate the cousin-mother inequality (Figure 2o). We conclude that both the cousin-mother inequality and the oscillator pattern are sufficient but not necessary conditions to rule out simple inheritance rules.

To understand which datasets can be explained by simple inheritance rules, we fit the one-dimensional model (*N* = 1) to six publicly available lineage tree datasets (Appendix 1 - Table A1) using Bayesian methods (Methods D). These datasets were chosen as they each had a sufficient number of cells for correlation analysis and covered a broad range of cell types. We found that the model fit is poor for the datasets that display the cousin-mother inequality, which is the case for cyanobacteria, clock-deleted cyanobacteria, neuroblastoma and human colorectal cancer cells (Appendix 1 - Figure A1a-f). Despite not obeying the cousin-mother inequality, the fit is also poor for mouse embryonic fibroblasts (Appendix 1 - Figure A1f) as the median inferred correlation lies outside the 95% confidence intervals for both the grandmother and cousin correlations which are included in the model fit, and the confidence intervals for the data vs the credible intervals from the inference show minimal overlap (Appendix 1 - Figure A2f). Another inequality may be violated in this dataset that cannot be explained using the one-dimensional model, suggesting that the absence of the cousin-mother inequality cannot rule out more complex division rules. The only cell type that has a good fit for the one-dimensional model is mycobacteria (Appendix 1 - Figure A1c). We thus conclude that the majority of the datasets must be described by higher dimensional inheritance dynamics of multiple cell cycle factors.

### D. The two-dimensional inheritance matrix model fits interdivision time correlation patterns from a range of cell types

We asked whether the correlation patterns are better described by a two-dimensional inheritance matrix model. Bayesian inference (Methods D) produced a good model fit for all six datasets (Figure 3a-f) for the two-factor inheritance matrix model, within relatively narrow error bars of mother, grandmother, sister and cousin correlations (Appendix 1 - Table A1). The credible intervals from the Bayesian inference matched the confidence intervals of correlations used for fitting (Appendix 1 - Figure A2). We quantified the quality of our fits using the Akaike information criterion (AIC) (Methods D, (M11)) for each dataset and compared these to the one-dimensional model (Appendix 1 - Table A1). The AIC estimates the goodness of fit with a penalty for model complexity allowing us to select the simplest model that explains the data. The AIC values indicate that the inheritance matrix model with two cell cycle factors provides the simplest fit for all cell types used here, except for the mycobacteria data where simple inheritance rules provided an equally good fit with a significant reduction in the number of model parameters. We expected the AIC to select the two dimensional model where the cousin-mother inequality was satisfied such as in cyanobacteria, clock-deleted cyanobacteria, neuroblastoma and human colorectal cancer cells. The match with the two-factor inheritance matrix model in fibroblasts was less obvious.
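The model comparison logic can be illustrated with the textbook form of the Akaike information criterion, AIC = 2*k* − 2 ln *L*, where *k* is the number of free parameters and *L* the maximised likelihood. The Python sketch below uses this standard definition, which may differ in detail from (M11), and the parameter counts and log-likelihood values are hypothetical placeholders rather than results from the datasets.

```python
def aic(max_log_likelihood, n_params):
    """Textbook AIC: lower values indicate a better trade-off of fit and complexity."""
    return 2 * n_params - 2 * max_log_likelihood

def preferred_model(candidates):
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps a model name to (max_log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical values: the richer model fits better but uses more parameters.
candidates = {
    "one factor": (-12.0, 2),
    "two factors": (-4.0, 7),
}
for name, (loglik, k) in candidates.items():
    print(name, aic(loglik, k))
print("preferred:", preferred_model(candidates))
```

With these placeholder numbers the two-factor model is preferred because its gain in log-likelihood outweighs the penalty for the additional parameters. If the fits were equally good, the penalty term would favour the one-factor model.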

Crucially, we find that the model has a good predictive capacity for correlations further down the lineage tree. For each pattern, we show several samples from the conditional posterior distribution (solid and shaded lines) to illustrate fits of the lineage correlation and cross-branch correlation function (Figure 3a-f). For all datasets except neuroblastoma, the curves also intercept the great-grandmother and great-great-grandmother correlations that were not used for fitting (Figure 3a-d,f), and bootstrapped confidence intervals from the data overlapped with the credible intervals obtained from Bayesian inference (Appendix 1 - Figure A2). We then asked which correlation patterns underlie the data. To assess this, we calculated the eigenvalues of each posterior sample of the inheritance matrix to categorise the aperiodic, alternator and oscillator patterns (Figure 3a-f, bar charts). We found that in every dataset, the dominant correlation pattern was identifiable with probabilities well above 50%, except for mycobacteria (Figure 3c) that was better described by simple inheritance rules (Appendix 1 - Figure A1c).

Cyanobacteria (Figure 3a), human colorectal cancer (Figure 3d) and mouse embryonic fibroblasts (Figure 3f) display a dominant oscillator pattern, but we see that their lineage correlation functions exhibit widely different periodicities. For example, the posterior lineage correlation for cyanobacteria displays a higher frequency oscillation than those in human colorectal cancer cells and fibroblasts. Clock-deleted cyanobacteria (Figure 3b) and mycobacteria (Figure 3c) display a dominant alternator pattern which could be induced by strong sister correlations. We see that clock-deleted cyanobacteria (Figure 3b) has a 100% alternator pattern in contrast to the 100% oscillator pattern seen for wild type cyanobacteria, suggesting that the deletion of the clock gene has completely transformed the correlation pattern and has abolished the underlying oscillation. Neuroblastoma (Figure 3e) displays a dominant aperiodic pattern. The predictive capacity for this cell type is weaker than for the other datasets, which we assume is due to the tight confidence interval in the correlations. Despite this discrepancy, we find that the inheritance matrix model produces excellent fits and has good predictive capacity for all other cell types studied in this work.

### E. Bayesian inference reveals that individual inheritance parameters are not identifiable

We next ask which mechanisms are responsible for generating the observed correlation patterns. The Bayesian inference used for model fitting (Methods D) samples parameters using an MCMC Gibbs sampler. The Gibbs sampler can be thought of as a random walk in parameter space that settles around parameter regions with high likelihood. We found that the explorations of the Gibbs sampler did not settle in a particular parameter subspace but meandered off to explore vast areas of the parameter space without improving the likelihood values (Appendix 1 - Figure A3a,b). Such behaviour is expected when model parameters are not identifiable and the posterior distribution of parameters cannot be efficiently sampled [40, 41].

To provide further evidence of unidentifiability, we obtained four histograms of a single parameter of the inheritance matrix for different initialisations. The four distributions are very different (Figure 4a), showing that the random walk does not settle to a stationary distribution. We further observe that the mean squared displacement increases without bound (Figure 4b) showing that the sampling does not settle in a particular subset of the parameter space. In contrast to the individual parameters, the sampled posterior distribution of the eigenvalues is consistent across the averages (Figure 4c) and their mean squared displacement converges rapidly (Figure 4d). We note that unidentifiability arises for the inheritance matrix model with multiple cell cycle factors and does not feature for simple inheritance rules (Appendix 1 - Section A5). This ultimately demonstrates that the interdivision time correlation patterns do not identify a single set of inheritance parameters, but rather need to be described by a distribution of inheritance mechanisms.
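A mean squared displacement diagnostic of this kind can be sketched as follows in Python. The time-averaged definition used here, the function name and the toy chains are assumptions made for illustration, and the exact definition behind Figure 4b,d may differ. A chain that keeps drifting shows a displacement that grows with the lag, whereas a chain confined to a fixed region shows a displacement that levels off.

```python
def mean_squared_displacement(chain, lag):
    """Average squared change of a scalar chain over a given lag (in samples)."""
    if not 0 < lag < len(chain):
        raise ValueError("lag must be between 1 and len(chain) - 1")
    diffs = [(chain[t + lag] - chain[t]) ** 2 for t in range(len(chain) - lag)]
    return sum(diffs) / len(diffs)

# Toy chains: a steadily drifting chain versus one confined to two values.
drifting = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
confined = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
for lag in (1, 2, 3):
    print(lag, mean_squared_displacement(drifting, lag),
          mean_squared_displacement(confined, lag))
```

For the drifting chain the displacement grows as the square of the lag, while for the confined chain it never exceeds one.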

### F. The inheritance matrix model predicts the hidden dynamical correlations of cell cycle factors

Clock-deleted cyanobacteria and neuroblastoma both satisfy the cousin-mother inequality (Figure 1b), which indicates that at least two cell cycle factors are responsible for the corresponding correlation patterns. The eigenvalues of the inheritance matrix concentrate in different regions of the admissible parameter space (Figure 4e), suggesting the correlation patterns that generate the cousin-mother inequality are distinct. For the clock-deleted cyanobacteria dataset, we found that all posterior samples were consistent with an alternator correlation pattern, while most posterior samples presented aperiodic correlation patterns in neuroblastoma (Figure 3b,e bar charts).

We hypothesised that different inheritance models generate these patterns. To verify this hypothesis and since we cannot identify the cell cycle factors directly, we computed the mother-daughter correlations between the two hidden cell cycle factors. Since the order of factors is interchangeable, we only distinguish between mother-daughter correlations between the same (corr(*x*_{m,i}, *x*_{2m,i}) and corr(*x*_{m,i}, *x*_{2m+1,i}) for *i* = 1, 2) and alternate factors (corr(*x*_{m,i}, *x*_{2m+1,j}) and corr(*x*_{m,i}, *x*_{2m,j}) for *i* ≠ *j* = 1, 2). The resulting posterior distributions revealed distinct correlation patterns of cell cycle factor correlations for clock-deleted cyanobacteria and neuroblastoma (Figure 4f). For clock-deleted cyanobacteria, we predict that at least one factor has a negative mother-daughter correlation while its cross-correlation with the other factor must be positive; while the correlations are of opposite sign for neuroblastoma (Figure 4f). We sketch influence diagrams that summarise these relationships between factors (Figure 4g,h). Thus, the different interdivision time correlation patterns observed for clock-deleted cyanobacteria and neuroblastoma stem from distinct hidden correlation patterns of cell cycle factor fluctuations.

### G. The inheritance matrix model reveals biological rhythms underlying the cell cycle

We observe that the lineage correlation functions of cyanobacteria, human colorectal cancer cells, and fibroblasts exhibit vastly different correlation oscillation periods (Figure 3). Next, we are interested to see whether the oscillations seen in these datasets are compatible with biological oscillators known to affect cell cycle control.

#### 1. Correlation oscillations and underlying rhythms can exhibit vastly different periods

The period of the correlation oscillation is related to the location of the eigenvalues of the inheritance matrix on the complex plane. We consider an eigenvalue *λ* of the inheritance matrix. In terms of the mean interdivision time $\bar{\tau}$, the correlation period *T*_{0} is:

$$T_0 = \frac{2\pi \bar{\tau}}{|\mathrm{Arg}(\lambda)|} \geq 2\bar{\tau}, \tag{6}$$

and the inequality, which follows from |Arg(*λ*)| ≤ *π*, means that the period *T*_{0} is always at least twice the mean interdivision time $\bar{\tau}$. More generally, there is an oscillation period associated with each eigenmode of the inheritance matrix, but the period is infinite for positive real eigenvalues, and thus only complex eigenvalues generate correlation oscillations with periods longer than the two-generation alternation. However, known biological oscillators that influence cell cycle control often have periods *less* than twice the mean interdivision time, such as stress response regulators [42, 43] and gene expression oscillations [44–46]. How can relatively slow observed correlation oscillations be compatible with much faster biological oscillators underlying the cell cycle?

The resolution to this issue is that the period of the correlation oscillation does not always match the period of the underlying oscillator. Because the oscillator is only observed once per generation, faster oscillators can produce the same correlations. There are therefore a number of possible oscillator periods *T*_{n} compatible with the correlation oscillation period *T*_{0} (Appendix 1 - Section A4), given by:

$$T_n = \frac{T_0 \bar{\tau}}{|\bar{\tau} + n T_0|},$$

for *n* in ℤ. This phenomenon, that the same correlation oscillation can be explained by multiple underlying oscillators, can be understood using the intuition in Figure 5a.
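The relation between an eigenvalue, the correlation period and the compatible oscillator periods can be made concrete with a short Python sketch. It assumes that the correlation period is 2π times the mean interdivision time divided by |Arg(*λ*)|, and that shifting the phase advance per generation by a multiple of 2π leaves the once-per-generation correlations unchanged, so that the compatible periods are 2π times the mean interdivision time divided by ||Arg(*λ*)| + 2π*n*|. The function names and numerical values are hypothetical.

```python
import cmath
import math

def correlation_period(eigenvalue, tau_bar):
    """Period T_0 of the correlation oscillation, in the units of tau_bar."""
    angle = abs(cmath.phase(eigenvalue))
    return math.inf if angle == 0.0 else 2 * math.pi * tau_bar / angle

def oscillator_period(eigenvalue, tau_bar, n):
    """Period T_n of an underlying oscillator that gives the same correlations.

    Assumption: adding 2*pi*n to the phase advance per generation cannot be
    distinguished when the oscillator is observed only once per generation.
    """
    angle = abs(abs(cmath.phase(eigenvalue)) + 2 * math.pi * n)
    return math.inf if angle == 0.0 else 2 * math.pi * tau_bar / angle

# Hypothetical example: phase advance of a quarter cycle per generation,
# mean interdivision time of 10 h.
lam = 0.8j
print(correlation_period(lam, 10.0))        # four generations
print(oscillator_period(lam, 10.0, -1))     # a faster compatible oscillator
```

Here a correlation oscillation spanning four generations is also consistent with an oscillator whose period is only four thirds of the mean interdivision time.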

#### 2. Circadian oscillations in cyanobacteria and fibroblasts support coupling of the circadian clock and the cell cycle

Cyanobacteria and fibroblasts both exhibit correlation patterns consistent with an oscillator underlying cell divisions (Figure 3a,f, bar charts). We observe that the posterior distribution of the eigenvalues is confined to a region with negative real parts for cyanobacteria and positive real parts for fibroblasts (Figure 5c). Using these distributions we estimate the median period of the correlation oscillations (using Equation 6) to be 41.7h for cyanobacteria and 144.3h for fibroblasts (Figure 5d). We wondered whether the stark difference in the periods of the correlation oscillations indicates a different underlying rhythm. However, we found this was not the case: both correlation patterns were consistent with an approximate circadian rhythm. The posterior of the oscillator period *T*_{−1}, which is closest to the period of correlation oscillation *T*_{0}, suggests a median period of 24.6h for cyanobacteria and a median period of 23.8h for fibroblasts (Figure 5e). We also validated the inference result using simulated data (Appendix 1 - Figure A9). This finding supports a strong coupling of circadian rhythms to the cell cycle, as reported previously for both cyanobacteria [13, 47] and fibroblasts [48–50]. Notably, we see that clock-deleted cyanobacteria displays a 100% alternator pattern (Figure 3b) and therefore has a lineage tree correlation pattern that cannot be described by an approximate 24h oscillator, in contrast to wild type cyanobacteria.

#### 3. Bimodal posterior distribution of underlying oscillations in human colon cancer

Finally, we turn to the analysis of cancer cell data. The dominant correlation pattern was oscillatory (78% posterior probability, Figure 3d, bar chart). The posterior distribution of complex eigenvalues for the oscillator pattern has support in a large region of the parameter space. It has two distribution modes depending on whether the eigenvalues have positive or negative real parts (Figure 5f). Similarly, the posterior of the correlation oscillation period is bimodal, too (Figure 5g), which means that two competing oscillator patterns are compatible with the data.

To disentangle these alternative hypotheses, we cluster the posterior samples by the real part of the eigenvalues. We label cluster A for negative real parts and cluster B for positive ones. The correlation periods of the individual clusters do not provide us with immediate clues about the underlying oscillators. Cluster A has a median correlation oscillation period of 51.2h while cluster B has a median period of 100.6h (Figure 5g). We therefore inspected the oscillator periods *T*_{−1} for each cluster, which are closest to the observed correlation period (Figure 5h). Cluster A has a median predicted oscillator period *T*_{−1} of 24.1h, which hints at a circadian oscillator underlying the cell cycle in agreement with a previous model [23]. However, only about 33% of posterior samples with complex eigenvalues were assigned to this cluster. The majority of posterior samples, cluster B, had a different predicted period with a median of 19.6h (Figure 5h). A possible explanation is that the circadian period is shortened in cancer cells.

A strength of the Bayesian framework is that it allows us to express our confidence in this prediction. We find that our analysis is not conclusive about the correlation pattern, as only 78% of posterior samples showed an oscillator pattern. As a result, about 52% of all the posterior samples favour a 19.6h oscillator and 26% favour the 24.1h oscillator, which approximately matches a circadian rhythm. 16% of the samples demonstrate alternator correlation patterns, and the remaining 6% of samples are aperiodic (compare bar charts in Figures 3d and 5g). We therefore ask whether these competing models make predictions that translate into testable hypotheses. We found that the oscillator correlation pattern predicts a negative grandmother correlation while the alternator pattern predicts a positive grandmother correlation (Figure 5i,j). Thus measuring the grandmother correlation with higher precision, for example, via increasing sample size, would tighten the confidence intervals of measured correlations (Figure 5i), and improve our ability to narrow down the true pattern. In contrast, predicting the great-grandmother correlation allows us to distinguish between the 19.6h and 24.1h rhythms (Figure 5h). Posterior samples in cluster A predicted a positive interdivision time correlation between a cell and its great-grandmother, while cluster B predicted a negative correlation (Figure 5k,l). While the great-grandmother correlation could not be estimated using the present data, deeper lineage trees could be used to discriminate the period of the biological oscillator and help reveal whether the circadian period is altered in cancer cells, or not. In summary, our theory helps to predict the hidden periodicities of biological oscillators from lineage tree interdivision time data.
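The overall posterior shares quoted above follow from combining the probability of the oscillator pattern with the cluster shares within it. The short Python check below uses only the rounded percentages stated in the text (78% oscillator, of which about 33% fall in cluster A, plus 16% alternator and 6% aperiodic), so the results are approximate, and the function name is illustrative.

```python
def overall_shares(p_oscillator, share_cluster_a, p_alternator, p_aperiodic):
    """Split the oscillator probability between clusters A and B."""
    return {
        "cluster A (24.1h)": p_oscillator * share_cluster_a,
        "cluster B (19.6h)": p_oscillator * (1 - share_cluster_a),
        "alternator": p_alternator,
        "aperiodic": p_aperiodic,
    }

shares = overall_shares(0.78, 0.33, 0.16, 0.06)
for name, value in shares.items():
    print(f"{name}: {value:.0%}")
```

The rounded values reproduce the 26% and 52% shares of all posterior samples attributed to the two candidate rhythms.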

## III. DISCUSSION

We propose a Bayesian approach to predict hidden cell cycle factor dynamics from interdivision time correlation patterns. Our underlying model fits the lineage tree data for a range of bacterial and mammalian cell types and allows us to classify different correlation patterns. Our inference demonstrates that these patterns are identifiable, but the individual inheritance parameters are not. This finding suggests that interdivision time correlations alone are insufficient to gain mechanistic insights into cell cycle control mechanisms. The identified correlation patterns, however, reveal the dynamics of the underlying cell cycle factors.

We focused on a data-driven approach without any prior assumptions of the division mechanism, allowing the interdivision time data to speak for itself. Other studies used a model similar to the inheritance matrix model proposed here, and linked latent factors to the interplay between cell cycle progression and growth [24]. Autoregressive models have also been used in bacteria to discriminate between different mechanisms of cell size control [14]. Additionally, they have been used to combine growth and cell cycle reporters to explain interdivision time dynamics in fibroblasts [25]. In principle, the inheritance matrix model can be used to model the inheritance dynamics of any factor affecting the interdivision time of a cell. In fact, it comprises many mechanistic models as special cases, such as those based on DNA replication, cell size control or cell cycle phases (Appendix 1 - Section A6 and Appendix 1 - Figure A5 and A6). In future work, it will be useful to improve the identifiability of the model parameters. This could be accomplished either through including knowledge of inheritance mechanisms through prior distributions, or by including additional data on measured cell cycle factor dynamics – such as cell cycle phases, cell size, protein expression etc. – in the inference.

Another limitation of our inference is that we computed the interdivision time variance *s*_{τ} in (M2) of the model assuming that trees have equal number of generations in each branch. The advantage of this estimator is that it does not assume any particular noise distribution but this may lead to a statistical bias compared to the sample variance of tree-structured data with branches of varying length [19, 24, 27, 51–53]. However, the approximation does not change the identified correlation patterns and the conclusion of this work, since any variance bias can be compensated by multiplying the noise matrices (*S*_{1,2} in Eqs. (5)) with a constant, and, for the data analysed, the interdivision time variance estimators cannot be distinguished within the 95% confidence intervals (Appendix 1 - Table A3). Developing a theory correcting for such biases in lineage tree data will be the subject of future work.

An important result of the present analysis is that lineage tree correlation patterns of very different cell types – cyanobacteria, mouse embryonic fibroblasts and human colorectal cancer – can be explained through an underlying circadian oscillator coupled to cell division. While the coupling between the cell cycle and circadian clock is well established both in cyanobacteria and mouse embryonic fibroblasts, it is less well studied in cancer [54, 55]. Our method robustly reconstructs the circadian rhythms from the interdivision time correlation patterns despite the lack of the cousin-mother inequality for fibroblasts, demonstrating the cousin-mother inequality is not required for complex correlation patterns (Section II C). It is interesting to observe the differences in the oscillatory correlation patterns in these organisms. They are characterised by complex eigenvalues with negative real parts in cyanobacteria, but positive real parts in fibroblasts (Figure 5c), resulting in opposite mother-daughter correlations for these datasets (Figure 3a,f).

It would be interesting to explore what mechanisms underlie these different patterns. While the circadian clock in fibroblasts relies on transcriptional mechanisms [49, 56, 57], the origin of the clock is non-transcriptional in cyanobacteria [58–60]. The negative mother-daughter correlation in cyanobacteria likely stems from size control mechanisms that are modulated by the circadian clock [13]. However, the mechanisms that generate positive mother-daughter correlations in fibroblasts are still to be explored. Interestingly, in human colorectal cancer, two oscillatory correlation patterns divide the posterior distributions into two distinct clusters with positive and negative mother-daughter correlations. If the circadian clock was to generate a positive mother-daughter correlation, as it does in fibroblasts which have a structurally related clock, the period corresponds to a 20h rhythm. This finding thus suggests that the circadian period is altered in cancerous cells. Indeed, several studies report similar periods of 18h and 20h for gene expressions in the human colorectal cancer core-clock [61, 62].

Our theory predicts that an oscillator’s period does not always match the period of the observed correlation oscillations. We describe a lower bound on the correlation period that is reminiscent of the Nyquist-Shannon sampling theorem. This theorem describes temporal aliasing in digital audio processing, where a high frequency signal produces low frequency oscillations when sampled at a frequency less than twice the signal frequency. Similarly, spatial aliasing is observed in digital image processing as a moiré pattern. In our analogy, the high frequency signal is a biological oscillator that couples to cell division and is sampled at the cell division frequency (Figure 5a). Our result thus extends the Nyquist-Shannon sampling theorem to lineage trees. Our finding has fundamental implications for the reconstruction of oscillator periods from interdivision time data, revealing that there exist a number of oscillators that can all explain the same correlation pattern.

Here, we concentrated on the oscillator periods *T*_{−1} that are closest to the correlation oscillation periods *T*_{0}. In principle, we cannot exclude that oscillators with shorter physiological periods are contributing to the observed lineage tree correlation patterns. For example, HES1 expression oscillates with a period of around 5h in human colon cancer cells [44, 45]. The stress response regulators NF-*κ*B and p53, which are critical for tumour development, oscillate with periods of approximately 100min and 5h respectively [42, 43]. The posterior distributions for periods in this region are not well separated (Appendix 1 - Figure A7c), which makes it challenging to identify factors that oscillate significantly faster than the cell cycle using interdivision time data. It is, however, unknown whether such hypothetical factors couple to cell division specifically in a manner to induce oscillatory interdivision time correlation patterns.

Going forward, there is a need to go beyond the Nyquist-Shannon limit and develop methods that have increased sensitivity to discriminate a broader range of oscillator periods. One way to circumvent the limitation would be to employ fluorescent reporters of the circadian clock that could be correlated directly with cell division timing. Another way would be to provide parallel readouts of the underlying rhythm through events that sub-sample the cell cycle, such as DNA replication, or the timing of individual cell cycle phases. Not only would we be able to look at the correlation in interdivision time between cells on a lineage tree, but we would also be able to analyse the correlations between individual phases and family members, to reveal specific phase control mechanisms. Our main findings result from the inheritance matrix model with two cell cycle factors, as this was sufficient to explain the correlation patterns of the chosen data. In principle, increasing the number of interacting cell cycle factors can lead to more complex composite patterns that involve combinations of the three patterns discussed in this paper, such as the alternator-oscillator (Appendix 1 - Figure A6c, d), aperiodic-oscillator (Appendix 1 - Figure A6g, h), or birhythmic correlation patterns. Such composite patterns could also arise as the result of nonlinear fluctuations that, within our framework, can be described by adding complexes of cell cycle factors to the inheritance matrix model (Appendix 1 - Section A2). The presence of such complexes induces higher-order harmonics in the correlation oscillations, similar to those observed in the cyanobacterial and mammalian circadian clock [12, 63], and detecting such complexes could provide an alternative route to increase the sensitivity of our inference method.

In summary, our findings highlight the predictive power of Bayesian inference on single-cell data and how it can be leveraged to draw testable hypotheses for the design of future experiments. This was exemplified for human colorectal cancer cells, where various patterns were compatible with the data, something that non-probabilistic approaches cannot accomplish as they fit only a single correlation pattern. In the future, it will be crucial to understand why different cell types have evolved specific lineage correlation patterns and how these patterns affect cell proliferation and disease. It would be interesting to understand whether specific correlation patterns give or reveal some fitness advantage and whether we can use them to predict cell survival. We anticipate that identifying hidden cell cycle factors and their rhythmicity using non-invasive methods such as interdivision time measurements will be instrumental in answering these questions and may benefit other fields where cell proliferation plays a pivotal role.

## CODE AVAILABILITY

Code available at https://github.com/pthomaslab/Lineage-tree-correlation-pattern-inference.

## METHODS

### A. Analytical solution of the inheritance matrix model

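From (2) and **E**[*z*_{p}] = **0**, for all *p* in ℕ, we see that the vector of cell cycle factors has zero mean **E**[*x*_{p}] = **0**. Its *N* × *N* covariance matrix **Σ** = Cov(*x*_{p}, *x*_{p}) satisfies a discrete-time Lyapunov equation:

$$
\boldsymbol{\Sigma} = \boldsymbol{\theta}\,\boldsymbol{\Sigma}\,\boldsymbol{\theta}^{\top} + \boldsymbol{S}_{1}. \tag{M1}
$$

From the solution of (M1), we compute the variance of the interdivision time

$$
s_{\tau} = \boldsymbol{\alpha}^{\top}\boldsymbol{\Sigma}\,\boldsymbol{\alpha}, \tag{M2}
$$

and the generalised tree correlation function *ρ*(*k, l*) (see Appendix 1 - Section A3 for a detailed derivation) given by:

$$
\rho(k,l) = \frac{\boldsymbol{\alpha}^{\top}\boldsymbol{\omega}(k,l)\,\boldsymbol{\alpha}}{\boldsymbol{\alpha}^{\top}\boldsymbol{\Sigma}\,\boldsymbol{\alpha}}, \tag{M3}
$$

where

$$
\boldsymbol{\omega}(k,l) = \boldsymbol{\theta}^{k}\,\boldsymbol{\Sigma}\,(\boldsymbol{\theta}^{\top})^{l} + \delta_{k\geq 1}\,\delta_{l\geq 1}\,\boldsymbol{\theta}^{k-1}\,\boldsymbol{S}_{2}\,(\boldsymbol{\theta}^{\top})^{l-1}
$$

with

$$
\delta_{k\geq 1} = \begin{cases} 1 & \text{if } k \geq 1, \\ 0 & \text{otherwise.} \end{cases} \tag{M4}
$$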

To ensure that the lineage tree correlation pattern is stationary, we require SR(***θ***) < 1, where SR(***θ***) = max(|*λ*_{1}|, |*λ*_{2}|, …, |*λ*_{N}|) is the spectral radius of ***θ***. This also ensures that the solutions to (M1); **Σ**, *S*_{1} and the function (M3) are unique and independent of the initial conditions.
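As an illustration, the stationary covariance and the tree correlation function can be evaluated numerically. The sketch below is not the authors' implementation: it assumes the Lyapunov equation takes the form Σ = θΣθᵀ + S₁, that the covariance between cells *k* and *l* generations from their common ancestor is θᵏΣ(θᵀ)ˡ plus a sister-noise term θᵏ⁻¹S₂(θᵀ)ˡ⁻¹ when *k*, *l* ≥ 1, and it solves for Σ by plain fixed-point iteration. All function names and the example parameter values are hypothetical.

```python
# Minimal pure-Python sketch (no external libraries); matrices are lists of lists.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matpow(a, n):
    size = len(a)
    out = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        out = matmul(out, a)
    return out

def stationary_covariance(theta, s1, tol=1e-12, max_iter=100000):
    """Iterate Sigma <- theta Sigma theta^T + S1; converges if SR(theta) < 1."""
    sigma = [row[:] for row in s1]
    theta_t = transpose(theta)
    for _ in range(max_iter):
        new = matadd(matmul(matmul(theta, sigma), theta_t), s1)
        diff = max(abs(x - y) for rn, rs in zip(new, sigma) for x, y in zip(rn, rs))
        sigma = new
        if diff < tol:
            return sigma
    raise RuntimeError("no convergence; is the spectral radius below 1?")

def quadratic_form(alpha, m):
    """Return alpha^T M alpha."""
    n = len(alpha)
    return sum(alpha[i] * m[i][j] * alpha[j] for i in range(n) for j in range(n))

def tree_correlation(k, l, theta, s1, s2, alpha):
    """Correlation between cells k and l generations from their common ancestor."""
    sigma = stationary_covariance(theta, s1)
    theta_t = transpose(theta)
    omega = matmul(matmul(matpow(theta, k), sigma), matpow(theta_t, l))
    if k >= 1 and l >= 1:
        # Contribution of noise shared by the two daughters of the common ancestor.
        sister = matmul(matmul(matpow(theta, k - 1), s2), matpow(theta_t, l - 1))
        omega = matadd(omega, sister)
    return quadratic_form(alpha, omega) / quadratic_form(alpha, sigma)

# Hypothetical two-factor example with complex eigenvalues 0.5 +/- 0.5i.
theta = [[0.5, -0.5], [0.5, 0.5]]
s1 = [[1.0, 0.0], [0.0, 1.0]]
s2 = [[0.0, 0.0], [0.0, 0.0]]
alpha = [1.0, 1.0]
for pair in [(1, 0), (2, 0), (1, 1), (2, 2)]:
    print(pair, tree_correlation(pair[0], pair[1], theta, s1, s2, alpha))
```

For larger matrices a dedicated Lyapunov solver would be preferable to fixed-point iteration; the loop is used here only to keep the sketch dependency-free.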

### B. Analysis of tree correlation patterns

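The patterns of the generalised tree correlation function can be characterised through its eigendecomposition. The general decomposition proceeds through finding the matrix of eigenvectors *U* of ***θ*** such that

$$
U\,\boldsymbol{\theta}\,U^{-1} = \boldsymbol{\Lambda} = \mathrm{diag}(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{N}) \tag{M5}
$$

is the diagonal matrix of eigenvalues. Defining *Ŝ*_{1,2} = *U* *S*_{1,2} *U*^{⊤} and **Σ̂** = *U* **Σ** *U*^{⊤}, the solution to (M1) is given by

$$
\hat{\Sigma}_{ij} = \frac{\hat{S}_{1,ij}}{1 - \lambda_{i}\lambda_{j}}. \tag{M6}
$$

This result can then be used to find an explicit expression for the generalised tree correlation function:

$$
\rho(k,l) = \frac{\sum_{i,j=1}^{N} \hat{\alpha}_{i}\hat{\alpha}_{j}\left(\lambda_{i}^{k}\lambda_{j}^{l}\,\hat{\Sigma}_{ij} + \delta_{k\geq 1}\,\delta_{l\geq 1}\,\lambda_{i}^{k-1}\lambda_{j}^{l-1}\,\hat{S}_{2,ij}\right)}{\sum_{i,j=1}^{N} \hat{\alpha}_{i}\hat{\alpha}_{j}\,\hat{\Sigma}_{ij}}, \tag{M7}
$$

where

$$
\hat{\boldsymbol{\alpha}} = (U^{-1})^{\top}\boldsymbol{\alpha}. \tag{M8}
$$

(M7) can be rewritten as a superposition of patterns (4) with weights given by (5).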

The pattern of the tree correlation function is thus governed by the eigenvalues of the inheritance matrix ***θ***: (i) if one eigenvalue, say *λ*_{1}, is positive then the factor contributing to lineage correlation decays monotonically, and the factor contributing to the cross-branch correlation decays twice as fast; (ii) if there is a negative eigenvalue, the factor alternates between negative and positive values with an envelope of |*λ*_{1}|^{k}, while the corresponding contribution to the cross-branch correlation decays monotonically as |*λ*_{1}|^{2k}; (iii) finally, if we have a pair of complex eigenvalues then the factors contributing to the lineage correlation function display damped oscillations with frequency Ω and envelope *D*^{k}, while the corresponding factors contributing to the cross-branch correlation oscillate with frequency 2Ω.
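The classification described above can be sketched for a two-factor inheritance matrix. This is an illustrative helper, not part of the published code: the function names are hypothetical, the eigenvalues are obtained from the trace and determinant of a 2 × 2 matrix, and the rule "any complex pair gives an oscillator, otherwise any negative eigenvalue gives an alternator, otherwise aperiodic" is our reading of the text.

```python
import cmath

def eigenvalues_2x2(theta):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = theta
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

def classify_pattern(theta, tol=1e-12):
    """Return 'oscillator', 'alternator' or 'aperiodic' for a 2x2 inheritance matrix."""
    lams = eigenvalues_2x2(theta)
    if any(abs(lam.imag) > tol for lam in lams):
        return "oscillator"   # complex conjugate pair: damped oscillations
    if any(lam.real < -tol for lam in lams):
        return "alternator"   # a single negative real eigenvalue is enough
    return "aperiodic"        # all eigenvalues real and non-negative

# Hypothetical example matrices.
print(classify_pattern([[0.5, 0.0], [0.0, 0.3]]))
print(classify_pattern([[-0.5, 0.0], [0.0, 0.3]]))
print(classify_pattern([[0.5, -0.5], [0.5, 0.5]]))
```

With more than two factors, real negative and complex eigenvalues can coexist, so a single label would no longer be sufficient.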

### C. Determining the period of correlation oscillations from the eigenvalues

We consider the case where the inheritance matrix ***θ*** has a pair of complex conjugate eigenvalues *λ*^{±} = *De*^{±i2π/P}. The lineage correlation function then oscillates whenever *D* ∉ {0, 1} and 2/*P* ∉ ℤ. The period of correlation oscillations per generation is given by

$$
T_{0} = \frac{2\pi}{|\mathrm{Arg}(\lambda^{\pm})|} = \frac{2\pi}{|\mathrm{Im}\,\ln(\lambda^{\pm})|}, \tag{M9}
$$

where Arg(*λ*) in (−*π, π*] is the argument of the eigenvalue and ln(·) is the complex logarithm. The former is the angle made between the line joining the origin and the eigenvalue *λ* on the complex plane with the real axis. This means that *T*_{0} = *P* if and only if *P* > 2. Otherwise, *T*_{0} is calculated in terms of *P* by equation (M9) (Appendix 1 - Figure A8).
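A short numerical sketch of this relationship, assuming that the correlation period is *T*₀ = 2π/|Arg(*λ*)| in units of generations; the function names and example values are hypothetical.

```python
import cmath
import math

def eigenvalue_from_period(damping, period):
    """Build lambda = D * exp(i * 2*pi / P) for a given damping D and period P."""
    return damping * cmath.exp(2j * math.pi / period)

def correlation_period(eigenvalue):
    """Correlation period T_0 in generations, using the principal argument."""
    arg = cmath.phase(eigenvalue)  # principal value in [-pi, pi]
    if arg == 0.0:
        raise ValueError("a positive real eigenvalue does not oscillate")
    return 2.0 * math.pi / abs(arg)

# For P > 2 the observed correlation period equals P ...
print(correlation_period(eigenvalue_from_period(0.8, 6.0)))
# ... whereas for P < 2 the oscillation is aliased to a longer period.
print(correlation_period(eigenvalue_from_period(0.8, 1.5)))
```

In the second call the eigenvalue has argument 4π/3, whose principal value is −2π/3, so the observed period is 3 generations rather than 1.5.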

### D. Data analysis & Bayesian inference of the inheritance matrix model

We determined all pairs of cells in a lineage tree, sorted them by family relations (*k, l*) and calculated the sample correlation coefficient of interdivision times (3). To maximise the number of samples used to calculate these correlations, an individual cell can appear in more than one pair. For example, if a cell had two cousins, it would be counted in two separate cousin pairs in the cousin-cousin correlation coefficient calculation. For training, we focus on the sample statistics for the family relations (*k, l*) in *C* = {(1, 0), (2, 0), (1, 1), (2, 2)}, which comprise the interdivision time sample variance and four interdivision time sample correlation coefficients given by the mother-daughter, grandmother-granddaughter, sister-sister and cousin-cousin relations (Figure 2a). Note that *ŝ*_{τ} is computed across all interdivision times used to calculate the correlation coefficients in each dataset. Errors are estimated using bootstrapping by re-sampling cell pairs with replacement 10,000 times. The resulting variances and correlation coefficients are given in Appendix 1 - Table A1.
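The bootstrap step can be sketched as follows. This is a minimal illustration rather than the analysis code used for the paper: it assumes the standard Pearson estimator for the sample correlation coefficient, and the data values, function names and the choice to skip degenerate resamples are all hypothetical.

```python
import math
import random
import statistics

def pearson(pairs):
    """Pearson correlation of (x, y) pairs; None if either variance is zero."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0.0 or syy == 0.0:
        return None
    return sxy / math.sqrt(sxx * syy)

def bootstrap_std(pairs, n_resamples=10000, seed=0):
    """Standard deviation of the correlation under resampling pairs with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        sample = rng.choices(pairs, k=len(pairs))
        r = pearson(sample)
        if r is not None:  # skip degenerate resamples with zero variance
            estimates.append(r)
    return statistics.stdev(estimates)

# Hypothetical mother-daughter interdivision times in hours.
EXAMPLE_PAIRS = [(16.1, 15.2), (14.8, 15.9), (17.3, 16.8), (15.5, 14.9),
                 (18.0, 17.1), (16.6, 16.0), (15.0, 15.7), (17.8, 18.2)]
print(pearson(EXAMPLE_PAIRS), bootstrap_std(EXAMPLE_PAIRS, n_resamples=1000))
```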

The vector of inferred model parameters for the two-dimensional model is **Θ** = (***θ***, *S*_{1}), where we fix ***α*** = (1, 1)^{⊤} and *S*_{2} = **0** for simplicity. A different choice of ***α*** did not affect our results (Appendix 1 - Figure A4). Since *S*_{1} is symmetric, it consists of the *N* variances and *N*(*N* − 1)/2 correlation coefficients between the components of ***z***. Thus for *N* = 2 the inheritance matrix model has seven free parameters to be estimated. We assumed that the log-likelihood for these statistics is the sum of square errors between the sample statistics (*ŝ*_{τ}, *ρ̂*(*k, l*)) and their model predictions:

$$
\ln \mathcal{L}(\boldsymbol{\Theta}) = -\frac{(\hat{s}_{\tau} - s_{\tau})^{2}}{2\sigma_{\hat{s}_{\tau}}^{2}} - \sum_{(k,l)\in C}\frac{(\hat{\rho}(k,l) - \rho(k,l))^{2}}{2\sigma_{\hat{\rho}(k,l)}^{2}}, \tag{M10}
$$

which is equivalent to assuming that the sample variance and correlation coefficients are normally distributed for large sample sizes. We calculate the interdivision time variance *s*_{τ} and the generalised tree correlation function *ρ*(*k, l*) from (M2) and (M3). Note that (M2) is the interdivision time variance from a tree where all lineages have the same number of generations, which approximates the variance across all cells in the observed trees (Appendix 1 - Table A3). For simplicity, we neglected possible correlations between the sample statistics in (M10) and used bootstrapped estimates for the standard deviations of the sample statistics, *σ*_{ŝτ} and *σ*_{ρ̂(k,l)} (Appendix 1 - Table A1). Note that the likelihood is independent of the mean since it is irrelevant for the correlation pattern. We assumed a flat prior with support restricted to SR(***θ***) < 1 and *S*_{1} positive semi-definite to guarantee the existence of a stationary correlation pattern.
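A sum-of-squares log-likelihood of this kind can be sketched in a few lines. The form below is an assumption: independent Gaussian errors with bootstrapped standard deviations, written up to an additive constant. The function name, argument layout and the numbers in the example are hypothetical and are not taken from the datasets.

```python
def log_likelihood(model_var, model_rho, data_var, data_rho, sd_var, sd_rho):
    """Gaussian log-likelihood (up to a constant) of the summary statistics.

    model_rho, data_rho and sd_rho are dicts keyed by family relation (k, l).
    """
    ll = -0.5 * ((data_var - model_var) / sd_var) ** 2
    for relation, rho_hat in data_rho.items():
        residual = (rho_hat - model_rho[relation]) / sd_rho[relation]
        ll -= 0.5 * residual ** 2
    return ll

# Hypothetical summary statistics for C = {(1,0), (2,0), (1,1), (2,2)}.
data_rho = {(1, 0): 0.40, (2, 0): 0.10, (1, 1): 0.60, (2, 2): 0.30}
sd_rho = {(1, 0): 0.05, (2, 0): 0.05, (1, 1): 0.05, (2, 2): 0.05}
model_rho = {(1, 0): 0.38, (2, 0): 0.12, (1, 1): 0.61, (2, 2): 0.27}
print(log_likelihood(4.1, model_rho, 4.0, data_rho, 0.3, sd_rho))
```

Neglecting correlations between the statistics, as the text does, is what allows the log-likelihood to be written as a plain sum over terms.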

The numerical implementation uses the adaptive Gibbs-sampler implemented in the Julia library `Mamba.jl` [64]. For each dataset, we sample 11 million parameter sets which include a burn-in transient of 1 million samples. These samples are removed before analysis of the output.
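To make the sampling and burn-in steps concrete, the toy sketch below uses a plain random-walk Metropolis sampler. This is a deliberately simpler technique than the adaptive Gibbs sampler in `Mamba.jl` used in the paper, and it is applied to a hypothetical one-parameter problem: a single inherited factor whose mother-daughter correlation is assumed to equal *θ*, with a flat prior on |*θ*| < 1. The observed value, its standard deviation and all names are illustrative assumptions.

```python
import math
import random

def log_posterior(theta, rho_hat=0.4, sd=0.05):
    """Flat prior on |theta| < 1 times a Gaussian likelihood for rho(1, 0) = theta."""
    if abs(theta) >= 1.0:
        return -math.inf  # outside the prior support (non-stationary pattern)
    return -0.5 * ((rho_hat - theta) / sd) ** 2

def metropolis(n_samples, burn_in, step=0.1, seed=1):
    """Random-walk Metropolis; returns the chain with the burn-in removed."""
    rng = random.Random(seed)
    theta = 0.0
    lp = log_posterior(theta)
    chain = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        lp_proposal = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_proposal - lp)):
            theta, lp = proposal, lp_proposal
        chain.append(theta)
    return chain[burn_in:]

samples = metropolis(n_samples=20000, burn_in=2000)
print(len(samples), sum(samples) / len(samples))
```

The chain lengths here are far shorter than the 11 million samples quoted above and serve only to show where the burn-in transient is discarded.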

For model comparison we use the AIC [65] given by

$$
\mathrm{AIC} = 2k - 2\ln\hat{\mathcal{L}}, \tag{M11}
$$

where *k* is the number of model parameters and $\ln\hat{\mathcal{L}}$ is the maximum value of the log-likelihood function given by (M10). For *S*_{2} ≠ **0**, the inheritance matrix model has *k* = *d*(1 + 2*d*) parameters where *d* is the number of cell cycle factors in the model. For *S*_{2} = **0** the number of parameters reduces to *k* = *d*(1 + 3*d*)/2.
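The parameter counts can be checked with a short sketch. It assumes the standard definition AIC = 2*k* − 2 ln L̂ and counts *d*² entries for the inheritance matrix plus *d*(*d* + 1)/2 entries for each symmetric noise matrix; the function names are hypothetical.

```python
def n_parameters(d, sister_noise_correlated):
    """Free parameters: d*d for theta, d(d+1)/2 for S1 and, if used, for S2."""
    count = d * d + d * (d + 1) // 2
    if sister_noise_correlated:
        count += d * (d + 1) // 2
    return count

def aic(k, max_log_likelihood):
    """Akaike information criterion for k parameters."""
    return 2 * k - 2 * max_log_likelihood

# Two cell cycle factors with S2 = 0 give the seven parameters quoted in the text.
print(n_parameters(2, sister_noise_correlated=False))
print(n_parameters(2, sister_noise_correlated=True))
```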

## ACKNOWLEDGMENTS

We thank Bruno Martins, Dimitris Volteras and Paul Piho for their comments on the manuscript. This work has been supported by a scholarship to FAH provided by the EPSRC Centre for Mathematics of Precision Health-care (EP/N014529/1) and MRC core funding to the London Institute of Medical Sciences (MC-A658-5TY60). ARB is funded by a CRUK Career Development Fellowship (C63833/A25729). PT is funded by a UKRI Future Leaders Fellowship (MR/T018429/1).

## APPENDIX

### A1. Small noise approximation

Here, we will derive the inheritance matrix model given by equations (2) in the main text. We assume that the fluctuations in the hidden cell factor dynamics are small, which leads to a computationally efficient approximation.

Firstly, in the limit of zero fluctuations, all cells must be identical. Hence, all cell cycle factors are equal to their means ***µ*** = (*µ*_{1}, *µ*_{2}, …, *µ*_{N})^{⊤} = **E**(*y*_{p}) and similarly for the noise vectors ***β*** = (*β*_{1}, *β*_{2}, …, *β*_{N})^{⊤} = **E**(*e*_{2m}) = **E**(*e*_{2m+1}) in Eq. (1b). From (1a) and (1b) we then obtain the equations for the means (A1), which can be efficiently solved for the mean interdivision time and ***µ*** using standard numerical methods.

Secondly, we can decompose the interdivision time and the cell cycle factor vector into their respective mean and fluctuating components (A2).

Denoting the index of the present cell by *p* and the one of its mother by *m*, we can expand *f* and ***g*** around the limit of zero fluctuations and we obtain to leading order
where

Using this expansion and (A2) in (1a) and (1b) of the main text we arrive at where we have set and giving the fluctuations around the mean for the noise vectors. Comparing (A7) with (A1) and collecting terms to leading order, we obtain the linearised system:

Next, we define the diagonal scaling matrix **Γ** with non-zero elements as
for *i* = 1, 2, …, *N*. Using the rescaled noise sources , we find the rescaled inheritance matrix ***θ*** and ***α***-coefficients

The rescaled cell cycle factor fluctuations follow (2) of the main text and we reach rescaled variance-covariance matrices *S*_{1} and *S*_{2} as follows

### A2. Beyond the small noise approximation: cell cycle factor complexes account for nonlinear fluctuations

Here we analyse the effect of nonlinearity on the interdivision time correlation patterns. For simplicity we consider a single cell cycle factor and follow the same lines as in section A1, Eq. (A9), while including terms of order *x*^{2}. This leads to the expansion interdivision time and factor fluctuations
where is the Jacobian of the cell cycle factor dynamics, as before, and is the Hessian. From the second equation we obtain

Defining and , combining Eqs. (A14) and (A16), and rescaling variables as in (A12) and (A13), we find the extended inheritance matrix model where

Here and *β* = 1 if , and analogously, and *β* = 0 if . Hence, the interdivision time correlation patterns with small to moderate fluctuations can be described through an extended linear system (A17) that includes nonlinear terms . These additional terms can be interpreted as cell cycle factors forming binary complexes. The presence of these complexes increases the number of cell cycle factors and extends the eigenvalue spectrum of the effective inheritance matrix Θ by *θ*^{2}. Hence, the presence of complexes leads to mixed correlation patterns. For example, for a single cell cycle factor, the eigenvalues of **Θ** are (*θ, θ*^{2}), which corresponds to an alternator pattern for *θ <* 0. More generally, we may expect that nonlinear patterns can be described through mixtures of aperiodic, alternator, and oscillatory patterns. For example, the complex eigenvalue spectrum of an oscillator pattern (*e*^{±i2π/P}) will include powers of complex eigenvalues (*e*^{±i4π/P}) resulting in harmonics of the fundamental correlation oscillation frequency similar to higher order harmonics observed in single-cell time-series of the circadian clock [12, 63].

### A3. Derivation of the generalised tree correlation function

In this section we derive an analytical expression for the *generalised tree correlation function*. This gives the Pearson correlation coefficient in interdivision time for any pair of related cells. We start with the equation for the Pearson correlation coefficient, and from there derive a formula for the interdivision time covariance using the known properties of the cell cycle factors ** x**. From this, we can derive the general formula for the correlation coefficient between any related cell pair.

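We associate a cell pair with an index (*k, l*) which measures the distance to the nearest common ancestor as given in Section II B (Figure 2a). From this, we denote their interdivision time fluctuations as *τ*′_{k} and *τ*′_{l} respectively. The Pearson correlation coefficient between these fluctuations is given by

$$
\rho(k,l) = \frac{\mathrm{Cov}(\tau'_{k}, \tau'_{l})}{\sqrt{\mathrm{Var}(\tau'_{k})}\sqrt{\mathrm{Var}(\tau'_{l})}} = \frac{\mathrm{Cov}(\tau'_{k}, \tau'_{l})}{s_{\tau'}}, \tag{A19}
$$

where *s*_{τ′} is the variance of the interdivision time fluctuations.

The interdivision time fluctuations *τ*′_{k} and *τ*′_{l} are calculated from the vectors of rescaled cell cycle factor fluctuations *x*_{k} and *x*_{l} as given in Section II A, giving the equations

$$
\tau'_{k} = \boldsymbol{\alpha}^{\top}\boldsymbol{x}_{k}, \qquad \tau'_{l} = \boldsymbol{\alpha}^{\top}\boldsymbol{x}_{l}. \tag{A20}
$$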

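Substituting (A20) into (A19), we obtain a formula for *ρ*(*k, l*) in terms of the cell cycle factor fluctuations ***x*** and the ***α*** coefficients alone:

$$
\rho(k,l) = \frac{\boldsymbol{\alpha}^{\top}\mathrm{Cov}(\boldsymbol{x}_{k}, \boldsymbol{x}_{l})\,\boldsymbol{\alpha}}{\sqrt{\boldsymbol{\alpha}^{\top}\mathrm{Var}(\boldsymbol{x}_{k})\,\boldsymbol{\alpha}}\,\sqrt{\boldsymbol{\alpha}^{\top}\mathrm{Var}(\boldsymbol{x}_{l})\,\boldsymbol{\alpha}}}.
$$

Since *x*_{k} and *x*_{l} are identically distributed in steady state, we have that Var(*x*_{k}) = Var(*x*_{l}) = Cov(***x***, ***x***) = **Σ** as specified in Methods A. We can write *ρ*(*k, l*) now as

$$
\rho(k,l) = \frac{\boldsymbol{\alpha}^{\top}\mathrm{Cov}(\boldsymbol{x}_{k}, \boldsymbol{x}_{l})\,\boldsymbol{\alpha}}{\boldsymbol{\alpha}^{\top}\boldsymbol{\Sigma}\,\boldsymbol{\alpha}}, \tag{A23}
$$

where ***α***^{⊤}**Σ*****α*** gives the variance of the interdivision time fluctuations *τ*′.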

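Using the model equation (2) we can write the formula for the ***x*** vectors for the two cells in the cell pair (*k, l*) as

$$
\boldsymbol{x}_{k} = \boldsymbol{\theta}\,\boldsymbol{x}_{k-1} + \boldsymbol{z}_{k}, \qquad \boldsymbol{x}_{l} = \boldsymbol{\theta}\,\boldsymbol{x}_{l-1} + \boldsymbol{z}_{l},
$$

where cells *k* and *l* have mother cells *k* − 1 and *l* − 1 respectively. The two cells are sisters if and only if their subscripts are both equal to 1, meaning they share a mother cell. Using recurrence of the model, we can write these equations as

$$
\boldsymbol{x}_{k} = \boldsymbol{\theta}^{k}\boldsymbol{x}_{0} + \sum_{i=1}^{k}\boldsymbol{\theta}^{k-i}\boldsymbol{z}_{i}, \qquad \boldsymbol{x}_{l} = \boldsymbol{\theta}^{l}\boldsymbol{x}_{0} + \sum_{j=1}^{l}\boldsymbol{\theta}^{l-j}\boldsymbol{z}_{j}, \tag{A26}
$$

where *x*_{0} is the vector of cell cycle factors for the most recent common ancestor for a cell pair given by (*k, l*), and the noise vectors in the two sums belong to the two different branches.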

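All that remains is to derive a function for Cov(*x*_{k}, *x*_{l}) which we will denote ***ω***(*k, l*). We calculate ***ω***(*k, l*) as follows using expectations:

$$
\boldsymbol{\omega}(k,l) = \mathbf{E}\!\left[(\boldsymbol{x}_{k} - \bar{\boldsymbol{x}}_{k})(\boldsymbol{x}_{l} - \bar{\boldsymbol{x}}_{l})^{\top}\right],
$$

where $\bar{\boldsymbol{x}}_{k}$ and $\bar{\boldsymbol{x}}_{l}$ are the mean vectors of *x*_{k} and *x*_{l} respectively which are both equal to **0**, giving

$$
\boldsymbol{\omega}(k,l) = \mathbf{E}\!\left[\boldsymbol{x}_{k}\boldsymbol{x}_{l}^{\top}\right]. \tag{A29}
$$

To find $\mathbf{E}[\boldsymbol{x}_{k}\boldsymbol{x}_{l}^{\top}]$ in terms of the model parameters, we substitute in equations (A26) for *x*_{k} and *x*_{l} and get

$$
\mathbf{E}\!\left[\boldsymbol{x}_{k}\boldsymbol{x}_{l}^{\top}\right] = \mathbf{E}\!\left[\left(\boldsymbol{\theta}^{k}\boldsymbol{x}_{0} + \sum_{i=1}^{k}\boldsymbol{\theta}^{k-i}\boldsymbol{z}_{i}\right)\left(\boldsymbol{\theta}^{l}\boldsymbol{x}_{0} + \sum_{j=1}^{l}\boldsymbol{\theta}^{l-j}\boldsymbol{z}_{j}\right)^{\top}\right].
$$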

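The noise term fluctuations ***z*** are only correlated if the cells are sisters, which only occurs when the cell pair has (*k, l*) = (1, 1). So for the summations above, we exclude all terms except where *i* = *j* = 1. Doing this and expanding we get

$$
\mathbf{E}\!\left[\boldsymbol{x}_{k}\boldsymbol{x}_{l}^{\top}\right] = \boldsymbol{\theta}^{k}\,\mathbf{E}\!\left[\boldsymbol{x}_{0}\boldsymbol{x}_{0}^{\top}\right](\boldsymbol{\theta}^{\top})^{l} + \delta_{k\geq 1}\,\delta_{l\geq 1}\,\boldsymbol{\theta}^{k-1}\,\mathbf{E}\!\left[\boldsymbol{z}_{2m}\boldsymbol{z}_{2m+1}^{\top}\right](\boldsymbol{\theta}^{\top})^{l-1}, \tag{A31}
$$

where *δ*_{k≥1} and *δ*_{l≥1} are given in Equation M4, and *z*_{2m} and *z*_{2m+1} denote the noise vectors of the two daughters of the common ancestor. We also have that

$$
\mathbf{E}\!\left[\boldsymbol{x}_{0}\boldsymbol{x}_{0}^{\top}\right] = \mathrm{Cov}(\boldsymbol{x}_{0}, \boldsymbol{x}_{0}) + \mathbf{E}(\boldsymbol{x}_{0})\,\mathbf{E}(\boldsymbol{x}_{0})^{\top} = \mathrm{Cov}(\boldsymbol{x}_{0}, \boldsymbol{x}_{0}).
$$

The matrix Cov(*x*_{0}, *x*_{0}) is equivalent to the covariance matrix for any ***x***, giving Cov(*x*_{0}, *x*_{0}) = **Σ**. This gives

$$
\boldsymbol{\theta}^{k}\,\mathbf{E}\!\left[\boldsymbol{x}_{0}\boldsymbol{x}_{0}^{\top}\right](\boldsymbol{\theta}^{\top})^{l} = \boldsymbol{\theta}^{k}\,\boldsymbol{\Sigma}\,(\boldsymbol{\theta}^{\top})^{l}.
$$

Similarly we have,

$$
\mathbf{E}\!\left[\boldsymbol{z}_{2m}\boldsymbol{z}_{2m+1}^{\top}\right] = \mathrm{Cov}(\boldsymbol{z}_{2m}, \boldsymbol{z}_{2m+1}) + \mathbf{E}(\boldsymbol{z}_{2m})\,\mathbf{E}(\boldsymbol{z}_{2m+1})^{\top}.
$$

As **E**(***z***) = **0**, and Cov(*z*_{2m}, *z*_{2m+1}) = *S*_{2} as stated in Methods A, we obtain,

$$
\mathbf{E}\!\left[\boldsymbol{z}_{2m}\boldsymbol{z}_{2m+1}^{\top}\right] = \boldsymbol{S}_{2}.
$$

Equation (A31) therefore becomes:

$$
\mathbf{E}\!\left[\boldsymbol{x}_{k}\boldsymbol{x}_{l}^{\top}\right] = \boldsymbol{\theta}^{k}\,\boldsymbol{\Sigma}\,(\boldsymbol{\theta}^{\top})^{l} + \delta_{k\geq 1}\,\delta_{l\geq 1}\,\boldsymbol{\theta}^{k-1}\,\boldsymbol{S}_{2}\,(\boldsymbol{\theta}^{\top})^{l-1}. \tag{A37}
$$

Substituting (A37) back into (A29) we get

$$
\boldsymbol{\omega}(k,l) = \boldsymbol{\theta}^{k}\,\boldsymbol{\Sigma}\,(\boldsymbol{\theta}^{\top})^{l} + \delta_{k\geq 1}\,\delta_{l\geq 1}\,\boldsymbol{\theta}^{k-1}\,\boldsymbol{S}_{2}\,(\boldsymbol{\theta}^{\top})^{l-1}, \tag{A38}
$$

giving us the final equation for ***ω***(*k, l*). Using the above equation in (A23), we obtain Eq. (M3) of the Methods.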

### A4. Derivation of the formula for the oscillator periods, *T*_{n}

The period of correlation oscillation as observed in the lineage correlation functions is given by (6). We can reveal the underlying oscillator periods by shifting the inferred period *T*_{0} to obtain a smaller period *T*_{n}. This means that shorter periods would produce the same inferred period in the lineage correlation function when sampled at the original frequency of once per cell cycle (Figure 5a).

The oscillator periods are obtained by adding or subtracting multiples of 2*π* to the argument of the eigenvalue which results in the new argument being in the same position in the complex plane. The oscillator period *T*_{n} with shift *n* in ℤ is therefore given by

$$
T_{n} = \frac{2\pi}{|\mathrm{Arg}(\lambda) + 2\pi n|}. \tag{A39}
$$

Taking (A39) and substituting in (6), we obtain *T*_{n} in terms of *T*_{0} as (7).
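The shift of the argument can be illustrated numerically. The sketch below assumes *T*ₙ = 2π/| |Arg(*λ*)| + 2π*n* | in units of generations, taking the eigenvalue of the conjugate pair with positive argument; the function name and the example eigenvalue are hypothetical.

```python
import cmath
import math

def oscillator_period(eigenvalue, n):
    """Candidate oscillator period T_n (in generations) for integer shift n."""
    arg = abs(cmath.phase(eigenvalue))      # use the eigenvalue with positive argument
    shifted = abs(arg + 2.0 * math.pi * n)  # same position in the complex plane
    if shifted == 0.0:
        raise ValueError("shifted argument is zero; no finite period")
    return 2.0 * math.pi / shifted

# Hypothetical eigenvalue with a correlation period of six generations.
EXAMPLE_LAMBDA = 0.8 * cmath.exp(2j * math.pi / 6.0)
for shift in (0, -1, 1):
    print(shift, oscillator_period(EXAMPLE_LAMBDA, shift))
```

Multiplying the result by the mean interdivision time would convert these periods from generations to hours; that conversion is an additional assumption and is not shown here.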

### A5. Solution of the tree correlation function and parameter identifiability for simple inheritance rules

We consider the limiting case of a single cell cycle factor (*N* = 1) resulting in simple inheritance rules. This situation could model a growth factor that can either increase or decrease interdivision times of cells depending on the monotonicity of *f* in Eq. 1. The analytical solution (M7) of the inheritance matrix model then reduces to
where . First, we observe that, given single cell measurements of the mother-daughter correlation coefficient *ρ*(0, 1), the daughter-daughter correlation coefficient *ρ*(1, 1) and the variance *s*_{τ}, the parameters *θ, S*_{1} and *S*_{2} are uniquely identifiable:

Thus measurements of the variance, lineage- and cross-branch correlations fully determine the parameters. The tree correlation function is, however, independent of *f*, which means that the interdivision time correlation pattern carries no information about whether the growth factor increases or decreases growth. The reason for this indifference is that cell cycle factors are identified only by their fluctuation pattern, i.e., for each cell cycle factor whose fluctuations *x* increase interdivision time, we could define another cell cycle factor whose fluctuations −*x* decrease interdivision time. We accounted for this unidentifiability issue through a similarity transformation using the scaling matrix **Γ** in (A12) and (A13) that transforms all cell cycle factor fluctuations to increase interdivision time. Of course, this unidentifiability could be removed through explicitly measuring the involved cell cycle factors.
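The identifiability of the single-factor parameters can be illustrated with a round trip between parameters and statistics. The relations used below are assumptions derived for *N* = 1 with *α* = 1, not formulas quoted from this appendix: stationary variance *s*_τ = *S*₁/(1 − *θ*²), mother-daughter correlation equal to *θ*, and daughter-daughter correlation equal to *θ*² + *S*₂/*s*_τ. Function names and numbers are hypothetical.

```python
def predict_statistics(theta, s1, s2):
    """Forward map: parameters -> (mother-daughter, sister-sister, variance)."""
    s_tau = s1 / (1.0 - theta ** 2)       # stationary variance for |theta| < 1
    rho_md = theta                        # mother-daughter correlation
    rho_ss = theta ** 2 + s2 / s_tau      # sister-sister correlation
    return rho_md, rho_ss, s_tau

def identify_parameters(rho_md, rho_ss, s_tau):
    """Inverse map: the three statistics determine theta, S1 and S2 uniquely."""
    theta = rho_md
    s1 = s_tau * (1.0 - theta ** 2)
    s2 = s_tau * (rho_ss - theta ** 2)
    return theta, s1, s2

stats = predict_statistics(theta=0.5, s1=0.75, s2=0.2)
print(stats)
print(identify_parameters(*stats))
```

Note that the sign of *f* never enters either map, which mirrors the statement that the correlation pattern cannot reveal whether the factor lengthens or shortens the cell cycle.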

### A6. Mapping mechanistic cell cycle and cell size control models to the inheritance matrix model

To further investigate the output of the inheritance matrix model, we propose multiple models of known cell cycle control mechanisms, and map them to our inheritance matrix model framework. All cell size models assume symmetric division.

#### 1. Cell size control model with correlated growth

Considering the influence of cell size control on interdivision time [11, 14, 31], here we propose a cell size control model where we have some mother to daughter inheritance of both the added size Δ and the growth rate *κ* (Appendix 1 - Figure A5g). The model equations are given by:

The noise terms *ξ* and *ϕ* are independent between sisters such that Cov(*ξ*_{2m}, *ξ*_{2m+1}) = Cov(*ϕ*_{2m}, *ϕ*_{2m+1}) = 0. Assuming exponential growth the formula for the interdivision time is given by
where *p* represents the index of a given cell. Taking the vector of cell cycle factors for the mother cell to be *y*_{m} = (*y*_{m,1}, *y*_{m,2}, *y*_{m,3})^{⊤} = (Δ_{m}, *s*_{b,m}, *κ*_{m})^{⊤} and comparing (1a, 1b) with (A42) and (A43), we obtain

Then we can calculate the means from (A1),

Then using (A44) in (A5) and (A12), we find

Assuming and using (A13), we find

The ***θ*** matrix has eigenvalues which give an aperiodic pattern for *a, b, c* > 0 and an alternator pattern otherwise (Appendix 1 - Figure A5i,j). These same patterns arise for all real eigenvalues in the 3D model in the same way as in the two-dimensional system. Only a single negative eigenvalue is needed for the lineage correlation function to display an alternator pattern. We are restricted to *a* in (−2, 2) and *b, c* in (−1, 1) to ensure SR(***θ***) < 1. The cousin-mother inequality for this system is too complex to be analysed analytically, so we use numerical methods to visualise the parameter region in which the cousin-mother inequality can be satisfied (Appendix 1 - Figure A5h).

For the case of the aperiodic pattern, we observe positive same factor mother-daughter correlation and negative alternate factor mother-daughter correlation (Appendix 1 - Figure A5k). In contrast, for an alternator pattern, the mother daughter same factor correlation is negative, but the alternate factor correlations vary between positive and negative values (Appendix 1 - Figure A5l).

#### 2. Simple cell size control model

For the special case of *b* = Var[*ϕ*] = 0 and *c* = 1, the model reduces to a simple cell size control model with fluctuating added size (Appendix 1 - Figure A5a). The inheritance matrix ***θ*** then has eigenvalues . Thus depending on the choice of *a*, this model can produce both an alternator and aperiodic pattern (Appendix 1 - Figure A5c,d). In this case, using (M3) the cousin-mother inequality becomes which cannot be satisfied for |*a*| < 2, which implies . Hence the cousin-mother inequality cannot be satisfied for any reasonable choice of *a* in this simple model (Appendix 1 - Figure A5b).

For an aperiodic pattern, this simplified model exhibits positive same factor mother-daughter correlation and negative alternate factor mother-daughter correlation (Appendix 1 - Figure A5e). In the alternator case, this model exhibits negative same factor mother-daughter correlation and also negative alternate factor mother-daughter correlation (Appendix 1 - Figure A5f).

#### 3. Abstract cell cycle phase model

We propose a model of two abstract cell cycle phases that have no integrated dependence on cell size (Appendix 1 - Figure A5m). The model equations are given by

The noise terms *ξ* and *ϕ* are independent between sister cells such that Cov(*ξ*_{2m}, *ξ*_{2m+1}) = Cov(*ϕ*_{2m}, *ϕ*_{2m+1}) = 0. In this case we have that the two factors make up the length of the cell cycle, so we simply have *τ*_{p} = *y*_{p,1} + *y*_{p,2}.

Therefore using (1a), and (1b) we obtain

We calculate the means from (A1),

Then using (A50) in (A5) and (A12), we find

As the noise terms are independent between sisters we have and using (A13) we obtain

The inheritance matrix ***θ*** has eigenvalues ***λ*** = (*a, c*) which give an aperiodic pattern for *a* and *c* > 0 and an alternator pattern otherwise (Appendix 1 - Figure A5o,p).

The analytical form of the cousin-mother inequality is complex so we use numerical methods to visualise the parameter region in which the cousin-mother inequality can be satisfied (Appendix 1 - Figure A5n).

We calculate individual factor mother-daughter correlations and find that for an aperiodic pattern, the model exhibits a range of correlation patterns (Appendix 1 - Figure A5q). However, for an alternator pattern, we obtain positive same factor mother-daughter correlation and negative alternate factor mother-daughter correlation (Appendix 1 - Figure A5r).

### A7. Models of circadian-clock-driven correlation patterns

#### 1. Kicked cell cycle model

Here we analyse the kicked cell cycle model [36] with our framework (Appendix 1 - Figure A6a). We will propose an inheritance matrix and then show that it reduces to the kicked cell cycle model for certain parameter choices. Consider the 3 × 3 inheritance matrix ***θ*** and noise vector *z*_{n} given by for *n* ∈ {2*m*, 2*m* + 1}. We have that *S*_{1} is given by Cov(*z*_{2m}, *z*_{2m}), however we assume that the noise terms *ξ*_{n} are independent between sisters such that *S*_{2} = Cov(*z*_{2m}, *z*_{2m+1}) = **0**. Assuming ***α*** = (1, 0, 0)^{⊤}, the interdivision times are governed by

The oscillator is represented by the cell cycle factors that evolve according to with oscillator inheritance matrix

We can solve (A56) along an ancestral lineage of *n* generations
where is the state of the ancestral cell. Substituting (A58) into (A55) and assuming , i.e., the cell cycle oscillator is deterministic, the interdivision time of the mother determines the interdivision time of the daughter cell via
where and which represent initial conditions. Assuming approximates the time at birth for *n* ≫ 1, this leads to

Comparing (A60) to Eq. (1) and (2) in [36], we see that our IMM agrees with the kicked cell cycle model when *D* = 1, , and large *n*.
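The step of solving the oscillator recursion along an ancestral lineage can be checked numerically. The sketch below does not use the oscillator inheritance matrix of this appendix. It assumes a hypothetical noise-free 2 × 2 rotation-scaling matrix with damping `r` and phase advance `omega` per generation, and compares *n*-fold iteration with the closed form obtained by applying the *n*-th matrix power to the ancestral state.

```python
import math

def step(state, r, omega):
    """One generation: rotate the state by omega and scale it by r (assumed form)."""
    x, y = state
    c, s = math.cos(omega), math.sin(omega)
    return (r * (c * x - s * y), r * (s * x + c * y))

def iterate(state, r, omega, n):
    """Apply the one-generation update n times along the ancestral lineage."""
    for _ in range(n):
        state = step(state, r, omega)
    return state

def closed_form(state, r, omega, n):
    """n-th power of the rotation-scaling matrix applied to the ancestral state."""
    x, y = state
    c, s = math.cos(n * omega), math.sin(n * omega)
    scale = r ** n
    return (scale * (c * x - s * y), scale * (s * x + c * y))

ancestor = (1.0, 0.0)                      # hypothetical ancestral state
r, omega, n = 0.9, 2 * math.pi / 5, 12     # hypothetical parameters
print(iterate(ancestor, r, omega, n))
print(closed_form(ancestor, r, omega, n))
```

The eigenvalues of this assumed matrix are `r` times exp(±i `omega`), so the state returns to the same phase every 2π/`omega` generations, here five.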

#### 2. Circadian-clock-driven cell size control model

Here we analyse the model of cell size control driven by the circadian clock proposed in Martins et al. [13] within the inheritance matrix model framework (Appendix 1 - Figure A6e). The division rate in Eq. (1) of [13] is given by
where *s* is the cell size with *s*_{b} being the size at birth. *G*(*t*) is a function of time *t* that couples the size control to the circadian clock, and *S*(*s, s*_{b}) is the division rate per unit volume of the cell. Assuming cells grow exponentially with growth rate *α*, we have
and the division size follows
where *s*_{b} is the size at birth and *t*_{b} is the time at birth.
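Under exponential growth, birth size, division size and interdivision time are tied together, which is what later allows the interdivision time to be expressed through cell sizes. A minimal sketch of that relation follows, assuming growth at a constant rate *α* from birth to division. The function names and numerical values are hypothetical.

```python
import math

def division_size(s_b, alpha, tau):
    """Size reached after growing exponentially for a time tau from birth size s_b."""
    return s_b * math.exp(alpha * tau)

def interdivision_time(s_b, s_d, alpha):
    """Time needed to grow exponentially from birth size s_b to division size s_d."""
    return math.log(s_d / s_b) / alpha

alpha = math.log(2) / 20.0   # hypothetical growth rate: size doubles in 20 h
tau = interdivision_time(1.0, 2.0, alpha)
print(tau)
print(division_size(1.0, alpha, tau))
```

The two functions are inverses of each other, so a cell that doubles its birth size divides after exactly one doubling time of the growth law.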

To map these to our inheritance matrix model, we observe that samples from (A63) follow
where is a drift term and is a zero-mean noise term that depends both on time of day and birth size. Note that both and are periodic functions of time at birth *t*_{b,m}. Since the latter is not explicitly modelled in our framework, here, we replace it with the state *x*_{0,m} of the circadian clock, such that the update equations in (A64) now appear as

To gain intuition into the shape of the unknown functions *g* and *h*, we linearise the equations around some basal level *x* = *δ* of a clock-less mutant, which gives

For simplicity assume and that the clock-less mutant follows a linear cell size control model with gamma-distributed size increments *ϕ*_{A,m} ∼ Gamma with mean Δ as in [13]. These assumptions lead to the relations,

Using *s*_{b,2m} = *s*_{b,2m+1} = *s*_{d,m}*/*2, we can obtain the linearised inheritance matrix model equations for the circadian cell size control model (Appendix 1 - Figure A6e):
where now *ξ*_{m} is the added size and *x*_{0,m} is the output *x*_{0,m} = *x*_{1,m} + *x*_{2,m} of a circadian oscillator governed by
for cell generation *n*, where **ϕ** = (*ϕ*_{1}, *ϕ*_{2})^{⊤} are noise terms added to the elements of *x*_{0} and is some complex eigenvalued 2 × 2 inheritance matrix given by

Following this, we see that the circadian clock is incorporated into this cell size control system in the same way as the kicked cell cycle model outlined in the previous section (Appendix 1 - Section A7 1). Using (A62) we can write the interdivision time of a cell with index *p* as

Then taking the vector of cell cycle factors for the mother cell to be *y*_{m} = (*y*_{m,1}, *y*_{m,2}, *y*_{m,3}, *y*_{m,4})^{⊤} = (*s*_{b,m}, *x*_{1,m}, *x*_{2,m}, *ξ*_{m})^{⊤} and comparing (1a, 1b) with (A68) and (A71) we obtain

Computing the means using (A1) we get

Then using (A72) in (A5) and (A12), we can solve for the means

Then taking and using (A13), we obtain the following for *S*_{1}:
where cor_{ij} indicates the correlation between a pair of noise terms *ϕ*_{i} and *ϕ*_{j}, and for *i*, *j* ∈ {1, 2, *A*}.

#### 3. Model comparison

We notice that the kicked cell cycle model has three cell cycle factors, while the circadian-clock-driven cell size control model has four cell cycle factors. The eigenvalues of the inheritance matrix **θ** determining the correlation patterns are
for the kicked cell cycle model and
and for the cell size control model. In both models, the complex pair of eigenvalues produces oscillatory behaviour. The overall correlation patterns are of mixed type, depending on the parameters *β* and *a*.

To compare the models quantitatively, we match their mother-daughter interdivision time correlation coefficient in the absence of clock coupling. For the kicked cell cycle model, we notice that *ρ*_{(1,0)} = *β* in the absence of clock coupling. The cell size control model reduces to the model in Appendix 1 - Section A6 1 in the absence of clock coupling, which satisfies . Since realistic cell size control mechanisms [11, 68–70] (*a* ∈ [0, 2)), ranging from sizers (*a* = 0) to adders (*a* = 1) to timers (*a* = 2), imply *β* ≤ 0, we find that the kicked cell cycle model obeys a mixed correlation pattern of the alternator/oscillator type while the cell size control model obeys an aperiodic/oscillator pattern.
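The sign of the mother-daughter correlation without clock coupling can be explored with a short simulation. The sketch assumes the standard linear size control rule, division size = *a* × birth size + gamma-distributed increment with mean Δ, exponential growth, and symmetric division, and it follows a single daughter per generation. The gamma shape parameter, the growth rate and all function names are hypothetical choices for illustration.

```python
import math
import random

def simulate_lineage(a, delta, shape, alpha, generations, seed=0):
    """Follow one daughter per division; return birth sizes and interdivision times."""
    rng = random.Random(seed)
    s_b = delta / (2.0 - a)  # noise-free fixed point of s_b -> (a * s_b + delta) / 2
    birth_sizes, times = [], []
    for _ in range(generations):
        increment = rng.gammavariate(shape, delta / shape)  # gamma noise with mean delta
        s_d = a * s_b + increment                           # assumed linear size control
        birth_sizes.append(s_b)
        times.append(math.log(s_d / s_b) / alpha)           # exponential growth
        s_b = s_d / 2.0                                     # symmetric division
    return birth_sizes, times

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Adder (a = 1): the division size always exceeds the birth size, so times stay positive.
sizes, times = simulate_lineage(a=1.0, delta=1.0, shape=10.0, alpha=1.0, generations=20000)
rho = pearson(times[:-1], times[1:])  # mother-daughter interdivision time correlation
print(sum(sizes) / len(sizes), rho)
```

For the adder the estimate comes out negative, in line with *β* ≤ 0 above. With *a* = 0 and broad noise this simple rule can produce a division size below the birth size, so a sizer would need narrower noise or an explicit constraint.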

Focusing on the common adder size control (*a* = 1), we find that the regions where the cousin-mother inequality is satisfied are remarkably similar in both models when *β* is matched accordingly (Appendix 1 - Figure A6b and f). The lineage correlation function (red line) oscillates in both models, but the cross-branch correlation function (blue line) alternates for the kicked cell cycle model (Appendix 1 - Figure A6c-d) and not for the cell size control model (Appendix 1 - Figure A6g-h).

### A8. Inference validation using simulated data

To validate the inference results discussed in the main text, we simulate interdivision time data using the maximum posterior parameters from the inference on two of the original live imaging datasets, and compare the output and model fit to our original inference.

We take the maximum posterior parameter sets from the original inference on two datasets (Appendix 1 - Table A2), cyanobacteria and mouse embryonic fibroblasts, and produce simulated interdivision time lineage data in *MATLAB* using custom scripts and Random Trees [67]. We chose these two datasets so that the posterior distribution of the inferred underlying period *T*_{−1} could be compared with the approximately 24 h results seen in the main text.

From these simulated data, the correlation coefficients are calculated using the methods outlined in Methods D, and we then repeat the model inference on these new, simulated correlations to compare with the original. The simulations produce correlation patterns that reproduce the experimentally measured correlations (comparing Appendix 1 - Figure A9a-b with Figure 3a,f).

The posterior distribution of the simulated patterns is the same for the cyanobacteria, exhibiting a 100% oscillator pattern (Appendix 1 - Figure A9a), matching the fit to the original dataset (Figure 3a). The mouse embryonic fibroblast dataset (Appendix 1 - Figure A9b) loses some of its original 100% oscillator pattern (Figure 3f) in favour of an alternator pattern. However, an oscillator pattern is still dominant.

For both cyanobacteria (Appendix 1 - Figure A9c) and mouse embryonic fibroblasts (Appendix 1 - Figure A9d), the posterior distribution of the correlation function oscillatory period, *T*_{−1}, inferred from the simulated data exhibits a large overlap with the original posterior distribution discussed in Section II G 2 (Figure 5e). The difference in the medians of these posterior distributions is 0.42 h for mouse embryonic fibroblasts (Appendix 1 - Figure A9d) and just 0.11 h for cyanobacteria (Appendix 1 - Figure A9c). This result validates our analysis of these posterior distributions, showing that the period that we reconstruct from the simulated correlation patterns is consistent with the original data.

### APPENDIX FIGURES

### APPENDIX TABLES

## Footnotes

Revised after review

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].