Abstract
The time taken for cells to complete a round of cell division is a stochastic process controlled, in part, by intracellular factors. These factors can be inherited across cellular generations, giving rise to often non-intuitive correlation patterns in cell cycle timing between cells of different family relationships on lineage trees. Here, we formulate a framework of hidden inherited factors affecting the cell cycle that unifies known cell cycle control models and reveals three distinct interdivision time correlation patterns: aperiodic, alternator and oscillator. We use Bayesian inference with single-cell datasets of cell division in bacterial, mammalian and cancer cells to identify the inheritance motifs that underlie these datasets. From our inference, we find that interdivision time correlation patterns do not identify a single cell cycle model but generally admit a broad posterior distribution of possible mechanisms. Despite this unidentifiability, we observe that the inferred patterns reveal interpretable inheritance dynamics and hidden rhythmicity of cell cycle factors. This reveals that cell cycle factors are commonly driven by circadian rhythms, but their period may differ in cancer. Our quantitative analysis thus reveals that correlation patterns are an emergent phenomenon that impacts cell proliferation and that these patterns may be altered in disease.
I. INTRODUCTION
Cell proliferation, the process of repeated rounds of DNA replication and cell division, is driven by multiple cell extrinsic and intrinsic factors [1, 2]. Stochasticity in any or all of these factors therefore influences the time taken for a cell to divide, generating heterogeneity in cell cycle length, even in genetically identical populations. For example, stochastic gene expression [3] can lead to heterogeneity in cell cycle length [4–6] as these fluctuations can be propagated by concerted cellular cues [7]. These cues can exhibit reproducible stochastic patterns that are important in development, homeostasis and ultimately, for cell survival [8].
Single-cell technologies illuminate a world of cellular variation by replacing bulk-average information with single-cell distributions. A key challenge is to exploit cell-to-cell variability to identify the mechanisms of cellular regulation and responses [8, 9]. Time-lapse microscopy allows us to resolve cell dynamics such as division timing, growth and protein expression [10] (Figure 1a, left). This has led to many discoveries in cell cycle dynamics in bacteria [11–14] and mammalian cells [15–18]. Early advances included measuring the distribution of division times across single cells [19] and the correlations between cellular variables leading to cell size homeostasis [11], while more recent applications of time-lapse microscopy have captured multiple generations of proliferating cells, making lineage tracing possible [20, 21].
(a) Time lapse observations. Cartoon demonstrating how time-lapse microscopy allows single cells to be tracked temporally as they go through the cell cycle to division. Multiple different factors affect the rate at which cells progress through the cell cycle from birth to subsequent division. Interdivision time data. Example lineage tree structure with possible ‘family relations’ of a cell between which correlations in interdivision time can be calculated. (b) Lineage correlation pattern. Plot of mother-daughter interdivision time correlation against cousin-cousin interdivision time correlation for the six publicly available datasets used in this work (Appendix 1 - Table A1, [13, 22–25]). The shaded red area indicates the region where the cousin-mother inequality is satisfied. (c) Identifying hidden cell cycle factors. Schematic showing the model motivation and process. We produce a generative model that describes the inheritance of multiple hidden ‘cell cycle factors’ that affect the interdivision time. The model is fitted to lineage tree data of interdivision time, and we analyse the model output to reveal the possible biological factors that affect the interdivision time correlation patterns of cells.
While single-cell distributions measure variation between cellular variables, they ignore both temporal signals and variations propagating across generations to entire lineage trees [24, 26–28]. These lineage tree correlation patterns can be robust and steady, similar to what is known in spatio-temporal pattern formation [29, 30]. Common examples of lineage tree correlation patterns concern the mother-daughter and the sister correlations that have been used to study cell size homeostasis in E. coli [11, 31] and other mechanisms generating correlated interdivision times such as population growth rate [19] and initiation of DNA synthesis [32].
A counter-intuitive correlation pattern presented by many cell types is the ‘cousin-mother inequality’ [27], where the interdivision times of cousin cells are more correlated than those of mother-daughter pairs. This inequality can be observed both in bacteria and mammalian cells (Figure 1b). More generally, lineage tree data gives rise to correlation patterns by comparing a single cell to any other cell on the tree (Figure 1a, right). Family relations – such as daughter, grandmother, cousin cells etc. – encode inheritance patterns, and correlations between these related cells have been used to understand the dynamics of cell populations [33, 34] (Figure 1c). Several stochastic models have been proposed to explain interdivision time correlation patterns. Most of them make prior assumptions on the underlying mechanism controlling cell division such as those focusing on cell size control [31], DNA replication [32, 35] or underlying oscillators [14]. For example, inheritance of DNA content can explain the correlation in interdivision time between sister cells in bacteria [32]. Similarly, it has been shown that a simple model with interdivision time correlations [28] cannot satisfy the ‘cousin-mother inequality’ [27], but a more complex kicked cell cycle model does [36]. It is presently unclear what information correlation patterns carry about the underlying mechanisms that generate them. This is because a unified and systematic framework to generate any desired interdivision time correlation pattern is lacking.
Here, we propose a stochastic model to investigate how cell cycle factors – which we define in this work as hidden properties that affect interdivision time – shape the lineage tree correlation patterns of cells. These could include physiological factors, such as cell size, growth rate and cell cycle checkpoints, or specific cell cycle drivers such as CDKs, mitogens and division proteins. We focus only on data describing patterns of interdivision time in bacterial and mammalian cell types, which circumvents intricate measurements of cell volume, mass, and DNA replication. This also avoids dealing with fluorescent reporter strains that may be difficult to engineer depending on cell type. We propose a generative model of correlation patterns that involves a number of hidden cell cycle factors and reduces to common mechanistic cell cycle models for specific parameter choices. Our theory predicts three distinct lineage correlation patterns: aperiodic, alternator and oscillator. We demonstrate how the model can be used to identify these patterns using Bayesian inference in bacteria and mammalian cells. Our analysis reveals several dynamical signatures of cell cycle factors hidden in lineage tree interdivision time data.
II. RESULTS
A. A general inheritance matrix model provides a unified framework for lineage tree correlation patterns
Previous studies [27, 28] found that simple inheritance rules, where interdivision times are correlated from one generation to another through a single parameter, cannot explain the lineage correlation patterns seen in experimental single-cell data. To address this issue, we propose a unified framework where the interdivision time is determined by a number of cell cycle factors that represent hidden variables such as cell cycle phase lengths, protein levels, cell growth rate or other unknowns (Figure 1c), that each have their own inheritance pattern.
The state of the cell cycle factors is assumed to be a vector yp = (yp,1, yp,2, …, yp,N)⊤ that determines the interdivision time τp of a cell with index p via

$$\tau_p = f(y_p). \tag{1a}$$

Inheritance from mother to daughter of the N cell cycle factors is described by a nonlinear stochastic Markov model on a lineage tree:

$$y_{2m} = g(y_m) + e_{2m}, \qquad y_{2m+1} = g(y_m) + e_{2m+1}, \tag{1b}$$

where m in ℕ denotes the mother cell index and 2m and 2m + 1 the daughter cell indices. Here f and g are possibly nonlinear functions that model the dependence of the interdivision time on cell cycle factors and the inheritance process, respectively. ep = (ep,1, ep,2, …, ep,N)⊤ is a noise vector for which the pair e2m, e2m+1 are identically distributed random vectors with covariance matrix independent of m. A non-zero covariance between these noise vectors can account for correlated noise of sister cells. We implicitly assume symmetric cell division such that the deterministic part of the inheritance dynamics g is identical between the daughter cells. Note that we choose (1a) to be deterministic since division noise can be modelled by adding one more cell cycle factor that does not affect inheritance dynamics g.
The general model (1) includes many known cell cycle models as a special case. For example, the interactions between cell cycle factors could model cell size control mechanisms (Appendix 1 - Section A6 1), the coordination of cell cycle phases (Appendix 1 - Section A6 3), or deterministic cues, such as periodic forcing of the cell cycle (Appendix 1 - Section A7 1), or coupling of the circadian clock to cell size control (Appendix 1 - Section A7 2).
The full model can only be solved for specific choices of f and g, and these functions are generally unknown in inference problems. To overcome this limitation, we assume small fluctuations resulting in an approximate linear stochastic system (see Appendix 1 - Section A1 for a derivation) involving the interdivision time

$$\tau_p = \bar{\tau} + \alpha^{\top} x_p.$$

The vector of cell cycle factor fluctuations xp = (xp,1, xp,2, …, xp,N)⊤ obeys

$$x_{2m} = \theta x_m + z_{2m}, \qquad x_{2m+1} = \theta x_m + z_{2m+1}.$$

Here, $\bar{\tau}$ is the stationary mean interdivision time, θ is the N × N inheritance matrix and z2m and z2m+1 are two noise vectors of length N that capture the stochasticity of inheritance dynamics and differentiate the sister cells (Figure 2a). We denote by S1 = Var(z2m) = Var(z2m+1) and S2 = Cov(z2m, z2m+1), for all m in ℕ, the N × N covariance matrices of the noise terms z (and e) in individual cells and between sister cells, respectively. The noise terms are independent for all other family relations. The cell cycle factor fluctuations are scaled such that α = (α1, α2, …, αN)⊤ is a binary vector of length N made up of 1s and 0s depending on whether the function f determining the interdivision time has dependence on a given cell cycle factor (see Appendix 1 - Section A1 for details). Under this scaling each cell cycle factor has a positive effect on the interdivision time, and hence we do not distinguish between factors with positive or negative effects on interdivision time.
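The linearised model can be simulated directly on a binary lineage tree. The following is a minimal sketch, not the authors' implementation: it assumes Gaussian noise (the text only specifies the covariances S1 and S2), starts the root cell at zero fluctuation rather than at stationarity, and uses the hypothetical helper name `simulate_lineage`. Cells are indexed so that mother m has daughters 2m and 2m + 1.

```python
import numpy as np

def simulate_lineage(theta, S1, S2, alpha, tau_mean, generations, rng):
    """Simulate factor fluctuations x_p and interdivision times tau_p.

    Cells use heap indexing: the root is cell 1 and mother m has daughters
    2m and 2m + 1. Index 0 is unused. Gaussian noise is an assumption.
    """
    n = theta.shape[0]
    x = np.zeros((2 ** generations, n))
    # Joint covariance of the stacked sister noise vectors (z_2m, z_2m+1).
    joint = np.block([[S1, S2], [S2.T, S1]])
    for g in range(1, generations):
        mothers = np.arange(2 ** (g - 1), 2 ** g)
        z = rng.multivariate_normal(np.zeros(2 * n), joint, size=mothers.size)
        drift = x[mothers] @ theta.T          # theta x_m for every mother
        x[2 * mothers] = drift + z[:, :n]
        x[2 * mothers + 1] = drift + z[:, n:]
    tau = tau_mean + x @ alpha                # interdivision times
    return x, tau

rng = np.random.default_rng(0)
theta = np.array([[0.5]])                     # single factor (N = 1)
x, tau = simulate_lineage(theta, np.eye(1), np.zeros((1, 1)),
                          np.ones(1), 10.0, 14, rng)
# Mother-daughter correlation, skipping early generations that are not yet
# close to the stationary state.
m = np.arange(2 ** 5, 2 ** 13)
print(np.corrcoef(tau[m], tau[2 * m])[0, 1])
```

For a single factor the estimated mother-daughter correlation should approach the inheritance parameter itself, which gives a quick sanity check of the simulation.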
(a) Diagram illustrating the inheritance matrix model with two cell cycle factors which affect the interdivision time of a cell. Each factor in the mother exerts an influence on a factor in the daughter through the inheritance matrix θ. (b,c) Schematics showing how the coordinate (k, l) introduced in Section II B is determined. This coordinate describes the distance to the most recent common ancestor for a chosen pair of cells. Examples shown are (b) sister pairs with (k, l) = (1, 1), and (c) aunt-niece pairs with (k, l) = (2, 1). (d-o) Panels demonstrating the three correlation patterns that arise from the inheritance matrix model with two cell cycle factors. (d-f) Example inheritance matrices θ that produce the desired patterns: (d) aperiodic, (e) alternator and (f) oscillator correlation patterns. (g-i) Three-dimensional plot of the generalised tree correlation function (Equation M3) demonstrating each of the three patterns. On each plot we highlight the lineage generation correlation function (k = 0 or l = 0) (red line) and the cross-branch generation correlation function (k = l) (blue line). The shading of the 3D plot indicates the correlation coefficient at that point on the surface. (j-l) The lineage and cross-branch generation correlation functions plotted individually, showing the different dynamics for each pattern. (m-o) Region plots showing parameter values where the relevant pattern is obtained (orange) and where the cousin-mother inequality is satisfied (blue) for the θ matrices given in panels (d-f). White bands on (o) indicate parameter values that result in real eigenvalues and therefore do not produce an oscillator pattern. Within the parameter region that both produces the desired pattern and satisfies the cousin-mother inequality, we choose a parameter set (red cross) which is used for the corresponding plots in the panels above. In all panels we fix α = (1, 1)⊤ and the noise vector z to have covariance equal to the identity matrix.
When the special case of a single cell cycle factor (N = 1) is considered, the inheritance matrix model system reduces to a well-known model with correlated division times [28, 37–39], and we will refer to this case as simple inheritance rules (see also Appendix 1 - Section A5). In the following, we will explore the correlation patterns generated by multiple cell cycle factors.
B. The inheritance matrix model reveals three distinct interdivision time correlation patterns
We define a correlation pattern to be the set of correlation coefficients of pairs of cells on a lineage tree. To describe it, we introduce a function ρ(k, l) which we call the generalised tree correlation function:

$$\rho(k, l) = \frac{\mathrm{Cov}(\tau_k, \tau_l)}{s_\tau},$$

where τk and τl are the interdivision times of cells in the pair (k, l), and sτ is the interdivision time variance. The coordinate (k, l) describes the distance in generations from each cell in the pair to their shared nearest common ancestor (Figure 2b,c). We have derived a closed-form formula for ρ(k, l) (Eq. (M3) in Methods A; see Appendix 1 - Section A3 for a full derivation) as a weighted sum of powers of the inheritance matrix eigenvalues λ:

$$\rho(k, l) = \sum_{i,j=1}^{N} w_{ij}\, \lambda_i^{k} \lambda_j^{l}, \tag{4}$$

with weights $w_{ij}$ given by Eq. (5), whose explicit form is derived in Methods A.
We observe that the eigenvalues determine the dependence of the tree correlation function on k and l, while the noise matrices S1 and S2 determine their relative weights wij (see (5)).
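The tree correlation function can also be evaluated numerically without the eigenvalue expansion. The sketch below is not the closed-form expression of Methods A; it is derived here directly from the linear model under the assumption of stationarity. The stationary factor covariance Σ solves Σ = θΣθ⊤ + S1, sister cells have factor covariance θΣθ⊤ + S2, and each further generation multiplies by θ on the corresponding branch. The function names are hypothetical.

```python
import numpy as np

def stationary_cov(theta, S1):
    """Solve Sigma = theta Sigma theta^T + S1 by vectorisation."""
    n = theta.shape[0]
    lhs = np.eye(n * n) - np.kron(theta, theta)
    return np.linalg.solve(lhs, S1.reshape(-1)).reshape(n, n)

def tree_correlation(k, l, theta, S1, S2, alpha):
    """Correlation of interdivision times for a pair at distances (k, l)
    from the most recent common ancestor (assumes a stationary tree)."""
    sigma = stationary_cov(theta, S1)
    power = np.linalg.matrix_power
    if k == 0 or l == 0:
        # One cell is a direct ancestor of the other (lineage correlation).
        cov = power(theta, k) @ sigma @ power(theta.T, l)
    else:
        # Branches split at the common ancestor's two daughters.
        sisters = theta @ sigma @ theta.T + S2
        cov = power(theta, k - 1) @ sisters @ power(theta.T, l - 1)
    return float(alpha @ cov @ alpha / (alpha @ sigma @ alpha))

# Illustrative two-factor parameters (hypothetical, not fitted values).
theta = np.array([[0.6, -0.3], [0.4, 0.2]])
S1, S2, alpha = np.eye(2), np.zeros((2, 2)), np.ones(2)
mother = tree_correlation(1, 0, theta, S1, S2, alpha)
cousin = tree_correlation(2, 2, theta, S1, S2, alpha)
print(mother, cousin, cousin > mother)
```

Comparing `tree_correlation(2, 2, ...)` with `tree_correlation(1, 0, ...)` in this way is one simple check of whether a given parameter set has cousin correlations exceeding mother-daughter correlations.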
Our theoretical analysis reveals three distinct correlation patterns that can be generated by the inheritance matrix model (further details in Methods B). These can be classified by the eigenvalues of the inheritance matrix θ: (i) if the inheritance matrix exhibits real positive eigenvalues, we observe an aperiodic pattern (Figure 2d); (ii) if the inheritance matrix has real eigenvalues with at least one negative eigenvalue, we observe an alternator pattern (Figure 2e); and (iii) if there is a pair of complex eigenvalues we observe an oscillator pattern (Figure 2f). An intuitive interpretation of the eigenvalue decomposition is that it transforms the cell cycle factors into effective factors inherited independently. Hence, the inheritance matrix is diagonal in this basis. However, the analogy is limited to the case where the inheritance matrix is symmetric and the eigenvalues are real. For simplicity, we will focus on models with two cell cycle factors and note that in higher dimensions (N ≥ 3), the correlation patterns involve a mixture of the three patterns discussed in detail in this section (Appendix 1 - Figure A6c,d and g,h).
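The eigenvalue-based classification above can be expressed in a few lines. This is a minimal sketch with a hypothetical function name and a numerical tolerance chosen for illustration; the example matrices are not those shown in Figure 2. Eigenvalues that are exactly zero are grouped with the aperiodic case here, which is an assumption since the text only discusses strictly positive eigenvalues.

```python
import numpy as np

def classify_pattern(theta, tol=1e-9):
    """Classify the correlation pattern from the eigenvalues of theta."""
    eig = np.linalg.eigvals(np.asarray(theta, dtype=float))
    if np.any(np.abs(eig.imag) > tol):
        return "oscillator"    # a pair of complex eigenvalues
    if np.any(eig.real < -tol):
        return "alternator"    # real, with at least one negative eigenvalue
    return "aperiodic"         # all eigenvalues real and non-negative

print(classify_pattern([[0.5, 0.1], [0.1, 0.4]]))
print(classify_pattern([[-0.5, 0.0], [0.0, 0.3]]))
print(classify_pattern([[0.4, -0.7], [0.7, 0.4]]))
```

Applying the same function to each posterior sample of θ would give the fraction of samples in each pattern class.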
To demonstrate the aperiodic correlation pattern, we utilise an inheritance matrix with positive real eigenvalues (Figure 2d). Characteristically, the modelled interdivision time correlations decay to zero as the distance to the most recent ancestor increases (Figure 2g) since the eigenvalues in (4) are bounded between 0 and 1. To look more closely at the patterns on the tree, we utilise two reductions of the generalised tree correlation function. These are the lineage correlation function (ρ(k, l) for k or l = 0) and the cross-branch correlation function (ρ(k, l) for k = l). We look at these functions for continuous k, l to visualise better the patterns that occur down the lineage and across the branches of the tree. The lineage correlation function gives the correlation dynamics as you go down the lineage tree, whereas the cross-branch correlation function gives the correlation dynamics as you move across neighbouring branches of the lineage tree. We observe that the interdivision time correlations decrease as we move both across generations and branches (Figure 2j).
In contrast, the alternator pattern generates oscillations with a fixed period of two generations in the lineage correlation function. The behaviour is typically observed for cell cycle factors with negative mother-daughter correlations (Appendix 1 - Section A6 1). In this case, we have at least one negative eigenvalue and thus (4) will alternate between positive and negative values for successive generations, producing the period two oscillation. We demonstrate this correlation pattern for the generalised tree correlation function (Figure 2h) using a diagonal θ matrix (Figure 2e). We observe alternating correlations across generations in the lineage correlation function, and the continuous interpolation of the cross-branch correlation function (Figure 2k). Although the period is fixed to two generations, the amplitude of the correlation oscillation varies with the absolute magnitude of the eigenvalues (Methods B).
To investigate the oscillator correlation pattern, we propose a hypothetical inheritance matrix θ (Figure 2f) whose eigenvalues are complex for D, P ≠ 0, except at a discrete set of values of P, indexed by k in ℤ, for which the eigenvalues are real (white bands in Figure 2o). The parameters P and D control the period and the damping, respectively, of an underlying oscillator, i.e., the limit D → 1 leads to an undamped oscillation and D → 0 corresponds to an overdamped oscillation (see Methods C for details). Correspondingly, the graph of the generalised tree correlation function (Figure 2i) shows clear oscillations across generations. These correlation oscillations are also evident in the lineage correlation function but are absent in the cross-branch correlation function (Figure 2l). However, oscillations are possible in the cross-branch correlation function for other choices of θ with complex eigenvalues (see model fits in Section II D and Methods B). In summary, the qualitative behaviour of the interdivision time correlation patterns can be studied using the eigenvalue decomposition of the inheritance matrix θ.
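To make the roles of a period and a damping parameter concrete, the sketch below uses a damped rotation matrix as a stand-in. This is an illustrative assumption and not necessarily the matrix of Figure 2f or Methods C: its eigenvalues are D·exp(±2πi/P), so D sets the decay per generation and P the period in generations. With identity noise covariance, the lineage correlation of this stand-in reduces to D^k cos(2πk/P).

```python
import numpy as np

def damped_rotation(D, P):
    """Hypothetical oscillator matrix with eigenvalues D * exp(+-2*pi*i/P)."""
    phi = 2 * np.pi / P
    return D * np.array([[np.cos(phi), -np.sin(phi)],
                         [np.sin(phi),  np.cos(phi)]])

theta = damped_rotation(D=0.9, P=6.0)
eig = np.linalg.eigvals(theta)
print(np.abs(eig))                      # modulus of both eigenvalues is D
print(2 * np.pi / np.abs(np.angle(eig)))  # period in generations is P

# Lineage correlation rho(k, 0) for this stand-in with S1 = identity and
# alpha = (1, 1): the (1, 1) entry of theta^k, i.e. D**k * cos(2*pi*k/P).
for k in range(7):
    print(k, np.linalg.matrix_power(theta, k)[0, 0])
```

As D approaches 1 the oscillation in the lineage correlation persists over many generations, whereas small D suppresses it after a generation or two.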
C. The cousin-mother inequality is not required to generate complex correlation patterns
Our analysis shows that of the three specified patterns, only the oscillator pattern cannot arise from simple inheritance rules. This is because it requires at least two inherited cell cycle factors (N ≥ 2) for the inheritance matrix to possess complex eigenvalues. We therefore asked whether the oscillator pattern is necessary for the cousin-mother inequality to be satisfied. We find that this is not the case, but instead, all three correlation patterns can be compatible with the cousin-mother inequality if N ≥ 2. To demonstrate this, we choose three specific two-dimensional inheritance matrices θ that produce the required eigenvalue structure (Figure 2d-f). We then use these matrices with our analytical solution for the generalised tree correlation function (Methods A) to map the regions where the cousin-mother inequality can be satisfied (Figure 2m-o). Interestingly, we find that oscillations can arise even in parameter regions that violate the cousin-mother inequality (Figure 2o). We conclude that both the cousin-mother inequality and the oscillator pattern are sufficient but not necessary conditions to rule out simple inheritance rules.
To understand which datasets can be explained by simple inheritance rules, we fit the one-dimensional model (N = 1) to six publicly available lineage tree datasets (Appendix 1 - Table A1) using Bayesian methods (Methods D). These datasets were chosen as they each had a sufficient number of cells for correlation analysis and covered a broad range of cell types. We found that the model fit is poor for the datasets that display the cousin-mother inequality, which is the case for cyanobacteria, clock-deleted cyanobacteria, neuroblastoma and human colorectal cancer cells (Appendix 1 - Figure A1a-f). The fit is also poor for mouse embryonic fibroblasts, even though this dataset does not satisfy the cousin-mother inequality (Appendix 1 - Figure A1f): the median inferred correlation lies outside the 95% confidence intervals for both the grandmother and cousin correlations, which are included in the model fit, and the confidence intervals of the data show minimal overlap with the credible intervals from the inference (Appendix 1 - Figure A2f). Another inequality that cannot be explained using the one-dimensional model may be violated in this dataset, suggesting that the absence of the cousin-mother inequality cannot rule out more complex division rules. The only cell type that has a good fit for the one-dimensional model is mycobacteria (Appendix 1 - Figure A1c). We thus conclude that the majority of the datasets must be described by higher dimensional inheritance dynamics of multiple cell cycle factors.
D. The two-dimensional inheritance matrix model fits interdivision time correlation patterns from a range of cell types
We asked whether the correlation patterns are better described by a two-dimensional inheritance matrix model. Bayesian inference (Methods D) produced a good model fit for all six datasets (Figure 3a-f) for the two-factor inheritance matrix model, within relatively narrow error bars of mother, grandmother, sister and cousin correlations (Appendix 1 - Table A1). The credible intervals from the Bayesian inference matched the confidence intervals of correlations used for fitting (Appendix 1 - Figure A2). We quantified the quality of our fits using the Akaike information criterion (AIC) (Methods D, (M11)) for each dataset and compared these to the one-dimensional model (Appendix 1 - Table A1). The AIC estimates the goodness of fit with a penalty for model complexity, allowing us to select the simplest model that explains the data. The AIC values indicate that the inheritance matrix model with two cell cycle factors provides the simplest fit for all cell types used here, except for the mycobacteria data where simple inheritance rules provided an equally good fit with a significant reduction in the number of model parameters. We expected the AIC to select the two-dimensional model where the cousin-mother inequality was satisfied, such as in cyanobacteria, clock-deleted cyanobacteria, neuroblastoma and human colorectal cancer cells. The match with the two-factor inheritance matrix model in fibroblasts was less obvious.
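As a reminder of how such a comparison works, the sketch below uses the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and L the maximised likelihood. Whether (M11) takes exactly this form is an assumption, and the log-likelihoods and parameter counts below are made-up placeholders rather than values from the paper.

```python
def aic(max_log_likelihood, n_params):
    """Standard Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * max_log_likelihood

# Hypothetical numbers for illustration only.
aic_one_factor = aic(max_log_likelihood=-12.0, n_params=3)
aic_two_factor = aic(max_log_likelihood=-4.0, n_params=9)
preferred = "two-factor" if aic_two_factor < aic_one_factor else "one-factor"
print(aic_one_factor, aic_two_factor, preferred)
```

The penalty term means a more complex model is only preferred when its likelihood gain outweighs the extra parameters, which is why a dataset that is already well fitted by one factor can favour the simpler model.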
Posterior correlation functions based on fitting to mother-daughter, grandmother-granddaughter, sister-sister and cousin-cousin correlations for three bacterial (left) and three mammalian (right) datasets: (a) cyanobacteria, (b) clock-deleted cyanobacteria, (c) mycobacteria, (d) human colorectal cancer, (e) neuroblastoma, and (f) mouse embryonic fibroblasts. Pearson correlation coefficients (white circles) and 95% bootstrapped confidence intervals (error bars) obtained through re-sampling with replacement of the original data (10,000 re-samples). Posterior distribution samples were clustered into aperiodic, alternator, and oscillator patterns (bar charts). We show multiple representative samples (solid and shaded lines) drawn from the posterior distribution (cf. Appendix 1 - Figure A2 without clustering). Where correlations appear missing, this is in cases where the lineage trees in the data were not deep enough for the correlations to be calculated. Only lineage and cross branch generations 1 and 2 were used in model fitting. Here all panels assume α = (1, 1)⊤, but taking α = (1, 0)⊤ produces similar results (Appendix 1 - Figure A4).
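The bootstrapped confidence intervals described in the caption can be sketched as follows. This assumes a percentile bootstrap in which related cell pairs are resampled together with replacement; the caption does not specify these details, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def bootstrap_corr_ci(a, b, n_resamples=10_000, level=0.95, rng=None):
    """Pearson correlation of paired samples with a percentile bootstrap CI."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    # Resample pairs (same indices for a and b) with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))
    ra = a[idx] - a[idx].mean(axis=1, keepdims=True)
    rb = b[idx] - b[idx].mean(axis=1, keepdims=True)
    r = (ra * rb).sum(axis=1) / np.sqrt((ra ** 2).sum(axis=1) * (rb ** 2).sum(axis=1))
    lo, hi = np.quantile(r, [(1 - level) / 2, (1 + level) / 2])
    return float(np.corrcoef(a, b)[0, 1]), float(lo), float(hi)

# Synthetic mother and daughter interdivision times (illustrative only).
rng = np.random.default_rng(1)
mother = rng.normal(size=300)
daughter = 0.6 * mother + 0.8 * rng.normal(size=300)
print(bootstrap_corr_ci(mother, daughter, n_resamples=2000, rng=rng))
```

Note that cells on a lineage tree are not independent, so resampling pairs as if they were independent is itself a simplification.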
Crucially, we find that the model has a good predictive capacity for correlations further down the lineage tree. For each pattern, we show several samples from the conditional posterior distribution (solid and shaded lines) to illustrate fits of the lineage correlation and cross-branch correlation function (Figure 3a-f). For all datasets except neuroblastoma, the curves also intercept the great-grandmother and great-great-grandmother correlations that were not used for fitting (Figure 3a-d,f), and bootstrapped confidence intervals from the data overlapped with the credible intervals obtained from Bayesian inference (Appendix 1 - Figure A2). We then asked which correlation patterns underlie the data. To assess this, we calculated the eigenvalues of each posterior sample of the inheritance matrix to categorise the aperiodic, alternator and oscillator patterns (Figure 3a-f, bar charts). We found that in every dataset, the dominant correlation pattern was identifiable with probabilities well above 50%, except for mycobacteria (Figure 3c) that was better described by simple inheritance rules (Appendix 1 - Figure A1c).
Cyanobacteria (Figure 3a), human colorectal cancer (Figure 3d) and mouse embryonic fibroblasts (Figure 3f) display a dominant oscillator pattern, but we see that their lineage correlation functions exhibit widely different periodicities. For example, the posterior lineage correlation for cyanobacteria displays a higher frequency oscillation than those in human colorectal cancer cells and fibroblasts. Clock-deleted cyanobacteria (Figure 3b) and mycobacteria (Figure 3c) display a dominant alternator pattern which could be induced by strong sister correlations. We see that clock-deleted cyanobacteria (Figure 3b) have a 100% alternator pattern, in contrast to the 100% oscillator pattern seen for wild type cyanobacteria, suggesting that the deletion of the clock gene has completely transformed the correlation pattern and has abolished the underlying oscillation. Neuroblastoma (Figure 3e) displays a dominant aperiodic pattern. The predictive capacity for this cell type is weaker than for the other datasets, which we assume is due to the tight confidence interval in the correlations. Despite this discrepancy, we find that the inheritance matrix model produces excellent fits and has good predictive capacity for all other cell types studied in this work.
E. Bayesian inference reveals that individual inheritance parameters are not identifiable
We next ask which mechanisms are responsible for generating the observed correlation patterns. The Bayesian inference used for model fitting (Methods D) samples parameters using an MCMC Gibbs sampler. The Gibbs sampler can be thought of as a random walk in parameter space that settles around parameter regions with high likelihood. We found that the explorations of the Gibbs sampler did not settle in a particular parameter subspace but meandered off to explore vast areas of the parameter space without improving the likelihood values (Appendix 1 - Figure A3a,b). Such behaviour is expected when model parameters are not identifiable and the posterior distribution of parameters cannot be efficiently sampled [40, 41].
To provide further evidence of unidentifiability, we obtained four histograms of a single parameter of the inheritance matrix for different initialisations. The four distributions are very different (Figure 4a), showing that the random walk does not settle to a stationary distribution. We further observe that the mean squared displacement increases without bound (Figure 4b), showing that the sampling does not settle in a particular subset of the parameter space. In contrast to the individual parameters, the sampled posterior distribution of the eigenvalues is consistent across the averages (Figure 4c) and their mean squared displacement converges rapidly (Figure 4d). We note that unidentifiability arises for the inheritance matrix model with multiple cell cycle factors and does not feature for simple inheritance rules (Appendix 1 - Section A5). This ultimately demonstrates that the interdivision time correlation patterns do not identify a single set of inheritance parameters, but rather need to be described by a distribution of inheritance mechanisms.
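The mean squared displacement diagnostic can be illustrated on synthetic chains. The sketch below assumes the time-averaged definition MSD(lag) = ⟨(θ(t + lag) − θ(t))²⟩, which may differ from the exact definition used for Figure 4; the two chains are synthetic stand-ins, not sampler output. A chain that wanders like a random walk has an MSD that keeps growing with lag, whereas a chain sampling a stationary distribution saturates at twice its variance.

```python
import numpy as np

def mean_squared_displacement(chain, lags):
    """Time-averaged squared displacement of a 1D chain for each lag >= 1."""
    chain = np.asarray(chain, dtype=float)
    return np.array([np.mean((chain[lag:] - chain[:-lag]) ** 2) for lag in lags])

rng = np.random.default_rng(2)
steps = rng.normal(size=20_000)
wandering = np.cumsum(steps)   # stand-in for an unidentifiable parameter
settled = steps                # stand-in for a well-identified quantity

lags = [1, 10, 100]
print(mean_squared_displacement(wandering, lags))  # grows roughly linearly in lag
print(mean_squared_displacement(settled, lags))    # stays near 2 * variance
```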
(a) Posterior distribution histograms for θ11 depend on the realisations of a Gibbs sampler and do not settle to a stationary distribution. (b) A log-log plot of mean squared displacement for the four θ variables that make up the inheritance matrix θ. The mean squared displacement for all four parameters increases linearly, meaning the sampling does not settle in any particular region of parameter space. (c) Sampled posterior distribution histograms for the eigenvalue λ1 for each realisation. The histograms are almost identical across the four averages, showing the distribution has converged. (d) Mean squared displacement for the eigenvalues of the inheritance matrix θ settles to a finite value. Plots (a) - (d) utilise sampling from the inference for the clock-deleted cyanobacteria dataset. (e) Density histogram of the real eigenvalue pairs for clock-deleted cyanobacteria (pink) and neuroblastoma (brown) demonstrating where the eigenvalues lie in the aperiodic (yellow) and alternator (red) regions. (f) Density histogram of same-factor against alternate-factor mother-daughter correlation for clock-deleted cyanobacteria (pink) and neuroblastoma (brown). We take a minimum threshold of 0.3 for the probability density to remove irrelevant samples. (g-h) Influence diagrams for same factor vs alternate factor correlations for (g) clock-deleted cyanobacteria and (h) neuroblastoma.
F. The inheritance matrix model predicts the hidden dynamical correlations of cell cycle factors
Clock-deleted cyanobacteria and neuroblastoma both satisfy the cousin-mother inequality (Figure 1b), which indicates that at least two cell cycle factors are responsible for the corresponding correlation patterns. The eigenvalues of the inheritance matrix concentrate in different regions of the admissible parameter space (Figure 4e), suggesting the correlation patterns that generate the cousin-mother inequality are distinct. For the clock-deleted cyanobacteria dataset, we found that all posterior samples were consistent with an alternator correlation pattern, while most posterior samples presented aperiodic correlation patterns in neuroblastoma (Figure 3b,e bar charts).
We hypothesised that different inheritance models generate these patterns. To verify this hypothesis and since we cannot identify the cell cycle factors directly, we computed the mother-daughter correlations between the two hidden cell cycle factors. Since the order of factors is interchangeable, we only distinguish between mother-daughter correlations between the same (corr(xm,i, x2m,i) and corr(xm,i, x2m+1,i) for i = 1, 2) and alternate factors (corr(xm,i, x2m+1,j) and corr(xm,i, x2m,j) for i ≠ j = 1, 2). The resulting posterior distributions revealed distinct correlation patterns of cell cycle factor correlations for clock-deleted cyanobacteria and neuroblastoma (Figure 4f). For clock-deleted cyanobacteria, we predict that at least one factor has a negative mother-daughter correlation while its cross-correlation with the other factor must be positive; while the correlations are of opposite sign for neuroblastoma (Figure 4f). We sketch influence diagrams that summarise these relationships between factors (Figure 4g,h). Thus, the different interdivision time correlation patterns observed for clock-deleted cyanobacteria and neuroblastoma stem from distinct hidden correlation patterns of cell cycle factor fluctuations.
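Under the linear model, the same-factor and alternate-factor mother-daughter correlations follow from the inheritance matrix and the stationary factor covariance. The sketch below is derived here from the model equations and is not taken from the paper's Methods: with Σ solving Σ = θΣθ⊤ + S1, the covariance between factor i in the mother and factor j in a daughter is (Σθ⊤)ij. The function name is hypothetical.

```python
import numpy as np

def factor_mother_daughter_corr(theta, S1):
    """Matrix of corr(x_{m,i}, x_{2m,j}); diagonal = same factor,
    off-diagonal = alternate factors (assumes stationarity)."""
    n = theta.shape[0]
    lhs = np.eye(n * n) - np.kron(theta, theta)
    sigma = np.linalg.solve(lhs, S1.reshape(-1)).reshape(n, n)
    cov = sigma @ theta.T                  # Cov(x_m, x_2m)
    sd = np.sqrt(np.diag(sigma))
    return cov / np.outer(sd, sd)

# Illustrative parameters (hypothetical, not posterior samples).
theta = np.array([[-0.4, 0.5], [0.3, 0.2]])
corr = factor_mother_daughter_corr(theta, np.eye(2))
print("same-factor:", np.diag(corr))
print("alternate-factor:", corr[0, 1], corr[1, 0])
```

Because both daughters share the same θ and the noise is independent of the mother, the result is identical for daughters 2m and 2m + 1.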
G. The inheritance matrix model reveals biological rhythms underlying the cell cycle
We observe that the lineage correlation functions of cyanobacteria, human colorectal cancer cells, and fibroblasts exhibit vastly different correlation oscillation periods (Figure 3). Next, we are interested to see whether the oscillations seen in these datasets are compatible with biological oscillators known to affect cell cycle control.
1. Correlation oscillations and underlying rhythms can exhibit vastly different periods
The period of the correlation oscillation is related to the location of the eigenvalues of the inheritance matrix on the complex plane. We consider an eigenvalue λ of the inheritance matrix. In terms of the mean interdivision time ⟨τ⟩, the correlation period T0 is:
$$T_0 = \frac{2\pi\,\langle\tau\rangle}{|\mathrm{Arg}(\lambda)|} \;\geq\; 2\,\langle\tau\rangle, \tag{6}$$
and the inequality means that the period T0 is never shorter than twice the mean interdivision time ⟨τ⟩. More generally, there is an oscillation period associated with each eigenmode of the inheritance matrix, but the period is infinite for real eigenvalues, and thus only complex eigenvalues generate correlation oscillations. The inequality follows from (6) using |Arg(λ)| ≤ π. However, known biological oscillators that influence cell cycle control often have periods less than twice the mean interdivision time, such as stress response regulators [42, 43] and gene expression oscillations [44–46]. How can relatively slow observed correlation oscillations be compatible with much faster biological oscillators underlying the cell cycle?
The resolution to this issue is that the period of the correlation oscillation does not always match the period of the underlying oscillator. Instead, there are a number of possible oscillator periods Tn compatible with the correlation oscillation period T0 (Appendix 1 - Section A4), given by:
$$T_n = \frac{T_0}{\left|\,1 + n\,T_0/\langle\tau\rangle\,\right|}, \tag{7}$$
for n in ℤ, where ⟨τ⟩ is the mean interdivision time. This phenomenon, that the same correlation oscillation can be explained by multiple underlying oscillators, can be understood using the intuition in Figure 5a.
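The relationship between a correlation period and its compatible oscillator periods can be tabulated numerically. The following is a minimal sketch, assuming the relation T_n = T_0 / |1 + n T_0/⟨τ⟩| with ⟨τ⟩ the mean interdivision time; the function name `oscillator_periods` and the numerical values are illustrative and not taken from the datasets.

```python
def oscillator_periods(t0, mean_tau, shifts=range(-3, 4)):
    """Return {n: T_n} for the assumed relation T_n = T0 / |1 + n * T0 / mean_tau|.

    t0       : correlation oscillation period (same time units as mean_tau)
    mean_tau : mean interdivision time
    shifts   : integer shifts n to evaluate
    """
    periods = {}
    for n in shifts:
        denom = abs(1.0 + n * t0 / mean_tau)
        if denom > 0.0:  # a vanishing denominator would mean an infinite period
            periods[n] = t0 / denom
    return periods


# Illustrative values only: correlation period of three generations.
periods = oscillator_periods(t0=3.0, mean_tau=1.0)
for n in sorted(periods):
    print(f"n = {n:+d}: T_n = {periods[n]:.3f}")
```

In this sketch n = 0 returns the correlation period itself, and every other shift yields a shorter period, in line with the statement that faster rhythms can hide behind a slow correlation oscillation.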
Figure 5. (a) Schematic showing how sampling a high frequency rhythm at each cell division could result in a lower frequency oscillator being constructed. (b) Possible oscillator periods (Equation 7) indexed by n for a given correlation oscillation period. (c) Density plot of the complex eigenvalue output from the model sampling for cyanobacteria (purple) and mouse embryonic fibroblasts (orange). (d) Posterior distributions of the correlation oscillation period T0 in cyanobacteria (purple) and mouse embryonic fibroblasts (orange). (e) Posterior distributions of the oscillator period T−1 in cyanobacteria (purple) and mouse embryonic fibroblasts (orange). Arbitrary units in (d) and (e) are used to compare histograms; the density values are not normalised in relation to each other in order to display both histograms clearly on the same plot. (f) Density plot of complex eigenvalues for human colorectal cancer. (g) Posterior distributions of the correlation oscillation period in human colorectal cancer (shaded area) and oscillator clusters corresponding to negative (cluster A, orange) and positive real parts (cluster B, blue). The bar chart shows the posterior mass of the clusters. (h) Posterior distributions of the oscillator periods T−1 corresponding to (g). (i) Model fit and 95% credible intervals for human colorectal cancer (cf. legend of Figure 3). Red area indicates the grandmother-granddaughter correlation explored in (j). (j) Posterior distributions of the oscillator and alternator clusters give grandmother correlations with opposite signs. (k) Lineage and cross-branch correlation functions of oscillator clusters A (orange) and B (blue) in human colorectal cancer. Red area indicates the great-grandmother great-granddaughter correlation explored in (l). (l) Posterior distributions of oscillator clusters A (orange) and B (blue) have great-grandmother correlations of opposite signs.
2. Circadian oscillations in cyanobacteria and fibroblasts support coupling of the circadian clock and the cell cycle
Cyanobacteria and fibroblasts both exhibit correlation patterns consistent with an oscillator underlying cell divisions (Figure 3e, bar chart). We observe that the posterior distribution of the eigenvalues is confined to a region with negative real parts for cyanobacteria and positive real parts for fibroblasts (Figure 5c). Using these distributions we estimate the median period of the correlation oscillations (using Equation 6) to be 41.7h for cyanobacteria and 144.3h for fibroblasts (Figure 5d). We wondered whether the stark difference in the periods of the correlation oscillations indicates a different underlying rhythm. We found that this was not the case: both correlation patterns were consistent with an approximately circadian rhythm. The posterior of the oscillator period T−1, which is closest to the period of correlation oscillation T0, suggests a median period of 24.6h for cyanobacteria and 23.8h for fibroblasts (Figure 5e). We also validated the inference result using simulated data (Appendix 1 - Figure A9). This finding supports a strong coupling of circadian rhythms to the cell cycle, as reported previously for both cyanobacteria [13, 47] and fibroblasts [48–50]. Notably, clock-deleted cyanobacteria display a 100% alternator pattern (Figure 3b) and therefore have a lineage tree correlation pattern that cannot be described by an approximate 24h oscillator, in contrast to wild type cyanobacteria.
3. Bimodal posterior distribution of underlying oscillations in human colon cancer
Finally, we turn to the analysis of cancer cell data. The dominant correlation pattern was oscillatory (78% posterior probability, Figure 3d, bar chart). The posterior distribution of complex eigenvalues for the oscillator pattern has support in a large region of the parameter space. It has two distribution modes depending on whether the eigenvalues have positive or negative real parts (Figure 5f). Similarly, the posterior of the correlation oscillation period is bimodal, too (Figure 5g), which means that two competing oscillator patterns are compatible with the data.
To disentangle these alternative hypotheses, we cluster the posterior samples by the real part of the eigenvalues. We label cluster A for negative real parts and cluster B for positive ones. The correlation periods of the individual clusters do not provide us with immediate clues about the underlying oscillators. Cluster A has a median correlation oscillation period of 51.2h while cluster B has a median period of 100.6h (Figure 5g). We therefore inspected the oscillator periods T−1 for each cluster, which are closest to the observed correlation period (Figure 5h). Cluster A has a median predicted oscillator period T−1 of 24.1h, which hints at a circadian oscillator underlying the cell cycle, in agreement with a previous model [23]. However, only about 33% of posterior samples with complex eigenvalues were assigned to this cluster. The majority of posterior samples, cluster B, had a different predicted period with a median of 19.6h (Figure 5h). A possible explanation is that the circadian period is shortened in cancer cells.
A strength of the Bayesian framework is that it allows us to express our confidence in this prediction. We find that our analysis is not conclusive about the correlation pattern, as only 78% of posterior samples showed an oscillator pattern. As a result, about 52% of all posterior samples favour a 19.6h oscillator and 26% favour the 24.1h oscillator, which approximately matches a circadian rhythm. 16% of the samples demonstrate alternator correlation patterns, and the remaining 6% of samples are aperiodic (compare bar charts in Figures 3d and 5g). We therefore ask whether these competing models make predictions that translate into testable hypotheses. We found that the oscillator correlation pattern predicts a negative grandmother correlation while the alternator pattern predicts a positive grandmother correlation (Figure 5i,j). Thus measuring the grandmother correlation with higher precision, for example by increasing the sample size, would tighten the confidence intervals of measured correlations (Figure 5i) and improve our ability to narrow down the true pattern. In contrast, the predicted great-grandmother correlation allows us to distinguish between the 19.6h and 24.1h rhythms (Figure 5h). Posterior samples in cluster A predicted a positive interdivision time correlation between a cell and its great-grandmother, while cluster B predicted a negative correlation (Figure 5k,l). While the great-grandmother correlation could not be estimated using the present data, deeper lineage trees could be used to discriminate the period of the biological oscillator and help reveal whether or not the circadian period is altered in cancer cells. In summary, our theory helps to predict the hidden periodicities of biological oscillators from lineage tree interdivision time data.
III. DISCUSSION
We propose a Bayesian approach to predict hidden cell cycle factor dynamics from interdivision time correlation patterns. Our underlying model fits the lineage tree data for a range of bacterial and mammalian cell types and allows us to classify different correlation patterns. Our inference demonstrates that these patterns are identifiable, but the individual inheritance parameters are not. This finding suggests that interdivision time correlations alone are insufficient to gain mechanistic insights into cell cycle control mechanisms. The identified correlation patterns, however, reveal the dynamics of the underlying cell cycle factors.
We focused on a data-driven approach without any prior assumptions of the division mechanism, allowing the interdivision time data to speak for itself. Other studies used a model similar to the inheritance matrix model proposed here, and linked latent factors to the interplay between cell cycle progression and growth [24]. Autoregressive models have also been used in bacteria to discriminate between different mechanisms of cell size control [14]. Additionally, they have been used to combine growth and cell cycle reporters to explain interdivision time dynamics in fibroblasts [25]. In principle, the inheritance matrix model can be used to model the inheritance dynamics of any factor affecting the interdivision time of a cell. In fact, it comprises many mechanistic models as special cases, such as those based on DNA replication, cell size control or cell cycle phases (Appendix 1 - Section A6 and Appendix 1 - Figure A5 and A6). In future work, it will be useful to improve the identifiability of the model parameters. This could be accomplished either through including knowledge of inheritance mechanisms through prior distributions, or by including additional data on measured cell cycle factor dynamics – such as cell cycle phases, cell size, protein expression etc. – in the inference.
Another limitation of our inference is that we computed the interdivision time variance sτ in (M2) of the model assuming that trees have equal number of generations in each branch. The advantage of this estimator is that it does not assume any particular noise distribution but this may lead to a statistical bias compared to the sample variance of tree-structured data with branches of varying length [19, 24, 27, 51–53]. However, the approximation does not change the identified correlation patterns and the conclusion of this work, since any variance bias can be compensated by multiplying the noise matrices (S1,2 in Eqs. (5)) with a constant, and, for the data analysed, the interdivision time variance estimators cannot be distinguished within the 95% confidence intervals (Appendix 1 - Table A3). Developing a theory correcting for such biases in lineage tree data will be the subject of future work.
An important result of the present analysis is that lineage tree correlation patterns of very different cell types – cyanobacteria, mouse embryonic fibroblasts and human colorectal cancer – can be explained through an underlying circadian oscillator coupled to cell division. While the coupling between the cell cycle and circadian clock is well established both in cyanobacteria and mouse embryonic fibroblasts, it is less well studied in cancer [54, 55]. Our method robustly reconstructs the circadian rhythms from the interdivision time correlation patterns despite the lack of the cousin-mother inequality for fibroblasts, demonstrating the cousin-mother inequality is not required for complex correlation patterns (Section II C). It is interesting to observe the differences in the oscillatory correlation patterns in these organisms. They are characterised by complex eigenvalues with negative real parts in cyanobacteria, but positive real parts in fibroblasts (Figure 5c), resulting in opposite mother-daughter correlations for these datasets (Figure 3a,f).
It would be interesting to explore what mechanisms underlie these different patterns. While the circadian clock in fibroblasts relies on transcriptional mechanisms [49, 56, 57], the origin of the clock is non-transcriptional in cyanobacteria [58–60]. The negative mother-daughter correlation in cyanobacteria likely stems from size control mechanisms that are modulated by the circadian clock [13]. However, the mechanisms that generate positive mother-daughter correlations in fibroblasts are still to be explored. Interestingly, in human colorectal cancer, two oscillatory correlation patterns divide the posterior distributions into two distinct clusters with positive and negative mother-daughter correlations. If the circadian clock were to generate a positive mother-daughter correlation, as it does in fibroblasts, which have a structurally related clock, the period would correspond to a 20h rhythm. This finding thus suggests that the circadian period is altered in cancerous cells. Indeed, several studies report similar periods of 18h and 20h for gene expression in the human colorectal cancer core-clock [61, 62].
Our theory predicts that an oscillator’s period does not always match the period of the observed correlation oscillations. We describe a lower bound on the correlation period that is reminiscent of the Nyquist-Shannon sampling theorem. This theorem describes temporal aliasing in digital audio processing, where a high frequency signal produces low frequency oscillations when sampled at a frequency less than twice the signal frequency. Similarly, spatial aliasing is observed in digital image processing as a moiré pattern. In our analogy, the high frequency signal is a biological oscillator that couples to cell division and is sampled at the cell division frequency (Figure 5a). Our result thus extends the Nyquist-Shannon sampling theorem to lineage trees. Our finding has fundamental implications for the reconstruction of oscillator periods from interdivision time data, revealing that there exist a number of oscillators that can all explain the same correlation pattern.
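The aliasing analogy can be checked with a few lines of arithmetic. The sketch below assumes a deterministic cosine rhythm read out exactly once per division at a fixed interdivision time, which is a strong idealisation of the stochastic model; the function name `sampled_rhythm` and all numbers are illustrative.

```python
import math


def sampled_rhythm(period, mean_tau, generations):
    """Sample cos(2*pi*t/period) at the division times t = k * mean_tau."""
    return [math.cos(2.0 * math.pi * k * mean_tau / period)
            for k in range(generations)]


mean_tau = 1.0       # one division per time unit (illustrative)
fast_period = 1.5    # rhythm faster than two interdivision times
slow_period = 3.0    # apparent period seen along the lineage

fast = sampled_rhythm(fast_period, mean_tau, 12)
slow = sampled_rhythm(slow_period, mean_tau, 12)

# Largest difference between the two sampled sequences.
max_gap = max(abs(a - b) for a, b in zip(fast, slow))
print(f"largest difference between sampled rhythms: {max_gap:.2e}")
```

Under these assumptions the two sampled sequences coincide up to rounding error, so division-time samples alone cannot tell the fast rhythm from the slow one.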
Here, we concentrated on the oscillator periods T−1 that are closest to the correlation oscillation periods T0. In principle, we cannot exclude that oscillators with shorter physiological periods are contributing to the observed lineage tree correlation patterns. For example, HES1 expression oscillates with a period of around 5h in human colon cancer cells [44, 45]. The stress response regulators NF-κB and p53, which are critical for tumour development, oscillate with periods of approximately 100min and 5h respectively [42, 43]. The posterior distributions for periods in this region are not well separated (Appendix 1 - Figure A7c), which makes it challenging to identify factors that oscillate significantly faster than the cell cycle using interdivision time data. It is, however, unknown whether such hypothetical factors couple to cell division specifically in a manner to induce oscillatory interdivision time correlation patterns.
Going forward, there is a need to go beyond the Nyquist-Shannon limit and develop methods that have increased sensitivity to discriminate a broader range of oscillator periods. One way to circumvent the limitation would be to employ fluorescent reporters of the circadian clock that could be correlated directly with cell division timing. Another way would be to provide parallel readouts of the underlying rhythm through events that sub-sample the cell cycle, such as DNA replication, or the timing of individual cell cycle phases. Not only would we be able to look at the correlation in interdivision time between cells on a lineage tree, but we would also be able to analyse the correlations between individual phases and family members, to reveal specific phase control mechanisms. Our main findings result from the inheritance matrix model with two cell cycle factors, as this was sufficient to explain the correlation patterns of the chosen data. In principle, increasing the number of interacting cell cycle factors can lead to more complex composite patterns that involve combinations of the three patterns discussed in this paper, such as the alternator-oscillator (Appendix 1 - Figure A6c, d), aperiodic-oscillator (Appendix 1 - Figure A6g, h), or birhythmic correlation patterns. Such composite patterns could also arise as the result of nonlinear fluctuations that, within our framework, can be described by adding complexes of cell cycle factors to the inheritance matrix model (Appendix 1 - Section A2). The presence of such complexes induces higher-order harmonics in the correlation oscillations, similar to those observed in the cyanobacterial and mammalian circadian clock [12, 63], and detecting such complexes could provide an alternative route to increase the sensitivity of our inference method.
In summary, our findings highlight the predictive power of Bayesian inference on single-cell data and how it can be leveraged to draw testable hypotheses for the design of future experiments. This was exemplified for human colorectal cancer cells, where various patterns were compatible with the data, something that non-probabilistic approaches cannot accomplish as they fit only a single correlation pattern. In the future, it will be crucial to understand why different cell types have evolved specific lineage correlation patterns and how these patterns affect cell proliferation and disease. It would be interesting to understand whether specific correlation patterns give or reveal some fitness advantage and whether we can use them to predict cell survival. We anticipate that identifying hidden cell cycle factors and their rhythmicity using non-invasive methods such as interdivision time measurements will be instrumental in answering these questions and may benefit other fields where cell proliferation plays a pivotal role.
CODE AVAILABILITY
Code available at https://github.com/pthomaslab/Lineage-tree-correlation-pattern-inference.
METHODS
A. Analytical solution of the inheritance matrix model
From (2) and E[zp] = 0, for all p in ℕ, we see that the vector of cell cycle factors has zero mean E[xp] = 0. Its N × N covariance matrix Σ = Cov(xp, xp) satisfies a discrete-time Lyapunov equation:
$$\Sigma = \theta\,\Sigma\,\theta^\top + S_1. \tag{M1}$$
From the solution of (M1), we compute the variance of the interdivision time
$$s_\tau = \alpha^\top \Sigma\,\alpha, \tag{M2}$$
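and the generalised tree correlation function ρ(k, l) (see Appendix 1 - Section A3 for a detailed derivation) given by:
$$\rho(k,l) = \frac{\alpha^\top \omega(k,l)\,\alpha}{\alpha^\top \Sigma\,\alpha}, \tag{M3}$$
where
$$\omega(k,l) = \theta^{k}\,\Sigma\,(\theta^\top)^{l} + \delta_{k\geq 1}\,\delta_{l\geq 1}\,\theta^{k-1} S_2\,(\theta^\top)^{l-1}, \tag{M4}$$
with $\delta_{k\geq 1} = 1$ if $k \geq 1$ and $\delta_{k\geq 1} = 0$ otherwise (and analogously for $\delta_{l\geq 1}$).
To ensure that the lineage tree correlation pattern is stationary, we require SR(θ) < 1 where SR(θ) = max(|λ1|, |λ2|, …, |λN|) is the spectral radius of θ. This also ensures that the solution Σ of (M1), and hence the variance (M2) and the function (M3), are unique and independent of the initial conditions.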
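The analytical solution can be evaluated numerically for a small model. The following is a minimal pure-Python sketch, assuming that the stationary covariance solves Σ = θΣθ⊤ + S1 and that, as derived in Appendix 1 - Section A3, Cov(xk, xl) = θ^k Σ (θ⊤)^l plus a sister-noise term θ^(k−1) S2 (θ⊤)^(l−1) when k, l ≥ 1. The helper names and the parameter values are illustrative, and the fixed-point iteration is a convenience choice rather than the authors' numerical method.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matpow(a, p):
    """p-th power of a square matrix (p >= 0)."""
    n = len(a)
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        out = matmul(out, a)
    return out


def quad(alpha, m):
    """Quadratic form alpha^T m alpha."""
    n = len(alpha)
    return sum(alpha[i] * m[i][j] * alpha[j] for i in range(n) for j in range(n))


def stationary_covariance(theta, s1, tol=1e-13, max_iter=10000):
    """Iterate Sigma <- theta Sigma theta^T + S1; converges when SR(theta) < 1."""
    sigma = [row[:] for row in s1]
    theta_t = transpose(theta)
    for _ in range(max_iter):
        new = add(matmul(matmul(theta, sigma), theta_t), s1)
        gap = max(abs(x - y) for rn, rs in zip(new, sigma) for x, y in zip(rn, rs))
        sigma = new
        if gap < tol:
            break
    return sigma


def tree_correlation(k, l, theta, s1, s2, alpha):
    """Interdivision time correlation for a cell pair at distances (k, l)."""
    sigma = stationary_covariance(theta, s1)
    omega = matmul(matmul(matpow(theta, k), sigma), transpose(matpow(theta, l)))
    if k >= 1 and l >= 1:  # sister noise enters only through the two branches
        sister = matmul(matmul(matpow(theta, k - 1), s2),
                        transpose(matpow(theta, l - 1)))
        omega = add(omega, sister)
    return quad(alpha, omega) / quad(alpha, sigma)


# Illustrative two-factor model with a complex eigenvalue pair (|lambda| < 1).
theta = [[0.3, -0.4], [0.5, 0.2]]
s1 = [[1.0, 0.0], [0.0, 1.0]]
s2 = [[0.0, 0.0], [0.0, 0.0]]
alpha = [1.0, 1.0]
mother_daughter = tree_correlation(1, 0, theta, s1, s2, alpha)
sister_sister = tree_correlation(1, 1, theta, s1, s2, alpha)
print(mother_daughter, sister_sister)
```

For larger models a dedicated Lyapunov solver would be preferable to the iteration used here, which is kept only to avoid external dependencies.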
B. Analysis of tree correlation patterns
The patterns of the generalised tree correlation function can be characterised through its eigendecomposition. The general decomposition proceeds through finding the matrix of eigenvectors U of θ such that
is the diagonal matrix of eigenvalues. Defining Ŝ1,2 = U S1,2U⊤ and
, the solution to (M1) is given by
This result can then be used to find an explicit expression for the generalised tree correlation function:
where
(M7) can be rewritten as a superposition of patterns (4) with weights given by (5).
The pattern of the tree correlation function is thus governed by the eigenvalues of the inheritance matrix θ: (i) if one eigenvalue, say λ1, is positive then the factor contributing to lineage correlation decays monotonically. The factor
contributing to the cross-branch correlation decays twice as fast; (ii) if there is a negative eigenvalue, the factor
alternates between negative and positive values with an envelope of $|\lambda_1|^{k}$, while the corresponding contribution to the cross-branch correlation decays monotonically as $|\lambda_1|^{2k}$. Finally, if we have a pair of complex eigenvalues
then the factors
contributing to the lineage correlation function display damped oscillations with frequency Ω and envelope $D^{k}$, while the factor
and the factor
oscillate with frequency 2Ω.
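The eigenvalue-based classification can be written down directly for a two-factor model. This is a sketch under the assumption that a complex eigenvalue pair marks the oscillator pattern, a negative real eigenvalue the alternator pattern, and only non-negative real eigenvalues the aperiodic pattern; the function names and the tie-breaking for mixed real spectra are illustrative choices.

```python
import cmath


def eigenvalues_2x2(theta):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = theta
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def classify_pattern(theta, tol=1e-12):
    """Label the correlation pattern implied by the spectrum of theta."""
    eigenvalues = eigenvalues_2x2(theta)
    if any(abs(lam.imag) > tol for lam in eigenvalues):
        return "oscillator"   # complex pair: damped correlation oscillations
    if any(lam.real < -tol for lam in eigenvalues):
        return "alternator"   # negative real eigenvalue: sign flips each generation
    return "aperiodic"        # non-negative real eigenvalues: monotonic decay


print(classify_pattern([[0.3, -0.4], [0.5, 0.2]]))
print(classify_pattern([[-0.5, 0.0], [0.0, 0.2]]))
print(classify_pattern([[0.5, 0.0], [0.0, 0.2]]))
```

Applied to each posterior sample of θ, a function of this kind would give the pattern frequencies summarised in the bar charts, assuming the same classification rule is used.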
C. Determining the period of correlation oscillations from the eigenvalues
We consider the case where the inheritance matrix θ has a pair of complex conjugate eigenvalues $\lambda_\pm = D e^{\pm i 2\pi/P}$. The lineage correlation function then oscillates whenever D ∉ {0, 1} and 2/P ∉ ℤ, so that the eigenvalues are not real. The period of correlation oscillations per generation is given by
$$\frac{T_0}{\langle\tau\rangle} = \frac{2\pi}{|\mathrm{Arg}(\lambda_\pm)|} = \frac{2\pi}{|\mathrm{Im}\,\ln(\lambda_\pm)|}, \tag{M9}$$
where ⟨τ⟩ is the mean interdivision time, Arg(λ) in (−π, π] is the argument of the eigenvalue and ln(·) is the complex logarithm. The former is the angle between the real axis and the line joining the origin to the eigenvalue λ on the complex plane. This means that $T_0/\langle\tau\rangle = P$ if and only if P > 2. Otherwise, T0 is calculated in terms of P by equation (M9) (Appendix 1 - Figure A8).
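The mapping from an eigenvalue to the correlation period can be illustrated with the standard library. The sketch assumes that the period per generation equals 2π/|Arg(λ)| with Arg taken in (−π, π]; the function names and the damping value are illustrative.

```python
import cmath
import math


def eigenvalue_from_period(damping, p):
    """Eigenvalue D * exp(i * 2*pi / P) for damping D and nominal period P."""
    return damping * cmath.exp(2j * math.pi / p)


def correlation_period(eigenvalue, mean_tau=1.0):
    """Correlation period 2*pi*mean_tau / |Arg(eigenvalue)|; infinite if Arg = 0."""
    angle = abs(cmath.phase(eigenvalue))  # cmath.phase returns a value in [-pi, pi]
    if angle == 0.0:
        return math.inf
    return 2.0 * math.pi * mean_tau / angle


# P > 2: the correlation period per generation equals P.
period_slow = correlation_period(eigenvalue_from_period(0.8, 4.0))

# P < 2: the argument wraps around and a longer period is observed instead.
period_wrapped = correlation_period(eigenvalue_from_period(0.8, 1.5))

print(period_slow, period_wrapped)
```

The second case shows why a nominal period below two generations never appears directly in the lineage correlation function.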
D. Data analysis & Bayesian inference of the inheritance matrix model
We determined all pairs of cells in a lineage tree, sorted them by family relations (k, l) and calculated the sample correlation coefficient of interdivision times (3). To maximise the number of samples used to calculate these correlations, an individual cell can appear in more than one pair. For example, if a cell had two cousins, it would be counted in two separate cousin pairs in the cousin-cousin correlation coefficient calculation. For training, we focus on the sample statistics with C = {(1, 0), (2, 0), (1, 1), (2, 2)} comprised of the interdivision time sample variance and four interdivision time sample correlation coefficients given by the mother-daughter, grandmother-granddaughter, sister-sister and cousin-cousin relations (Figure 2a). Note that ŝτ is computed across all interdivision times used to calculate the correlation coefficients in each dataset. Errors are estimated using bootstrapping by re-sampling cell pairs with replacement 10,000 times. The resulting variances and correlation coefficients are given in Appendix 1 - Table A1.
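The bootstrap error estimate for a single family relation can be sketched as follows. The example assumes that the interdivision times of all cell pairs of one relation are already collected as (first cell, second cell) tuples; the synthetic data, the function names and the reduced number of resamples are illustrative and are not part of the analysis pipeline.

```python
import math
import random


def pearson(pairs):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_sd(pairs, n_boot=10_000, seed=0):
    """Standard deviation of the correlation under resampling of cell pairs."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]  # sample pairs with replacement
        estimates.append(pearson(resample))
    mean = sum(estimates) / n_boot
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_boot - 1))


# Synthetic mother-daughter pairs (hours); values are made up for illustration.
rng = random.Random(1)
pairs = []
for _ in range(200):
    mother = rng.gauss(20.0, 2.0)
    daughter = 20.0 + 0.5 * (mother - 20.0) + rng.gauss(0.0, 1.7)
    pairs.append((mother, daughter))

rho_hat = pearson(pairs)
rho_sd = bootstrap_sd(pairs, n_boot=1000)  # fewer resamples than in the text, for speed
print(f"correlation = {rho_hat:.3f} +/- {rho_sd:.3f}")
```

Resampling whole pairs, rather than individual cells, keeps the family relation intact within each bootstrap replicate.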
The vector of inferred model parameters for the two-dimensional model is Θ = (θ, S1), where we fix α = (1, 1)⊤ and S2 = 0 for simplicity. A different choice of α did not affect our results (Appendix 1 - Figure A4). Since S1 is symmetric, it consists of the N variances and N(N − 1)/2 correlation coefficients between the components of z. Thus for N = 2 the inheritance matrix model has seven free parameters to be estimated: the four entries of θ and the three independent entries of S1. We assumed that the log-likelihood for these statistics is the sum of square errors:
$$\ln \mathcal{L}(\Theta) = -\frac{(\hat{s}_\tau - s_\tau)^2}{2\,\sigma_{\hat{s}_\tau}^2} - \sum_{(k,l)\in C} \frac{\left(\hat{\rho}(k,l) - \rho(k,l)\right)^2}{2\,\sigma_{\hat{\rho}(k,l)}^2}, \tag{M10}$$
which is equivalent to assuming that the sample variance and correlation coefficients are normally distributed for large sample sizes. We calculate the interdivision time variance sτ and the generalised tree correlation function ρ(k, l) from (M2) and (M3). Note that (M2) is the interdivision time variance from a tree where all lineages have the same number of generations, which approximates the variance across all cells in the observed trees (Appendix 1 - Table A3). For simplicity, we neglected possible correlations between the sample statistics in $\ln \mathcal{L}$ and used bootstrapped estimates for the standard deviations of the sample statistics, $\sigma_{\hat{s}_\tau}$ and $\sigma_{\hat{\rho}(k,l)}$ (Appendix 1 - Table A1). Note that the likelihood is independent of the mean since it is irrelevant for the correlation pattern. We assumed a flat prior with support restricted to SR(θ) < 1 and S1 positive semi-definite to guarantee the existence of a stationary correlation pattern.
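The likelihood and prior described above can be sketched in a few lines. The example assumes a Gaussian sum-of-squares log-likelihood (up to an additive constant) over the variance and the four correlation coefficients, and a flat prior on the stated support; the dictionary keys, function names and numbers are hypothetical and do not correspond to any dataset.

```python
def log_likelihood(predicted, observed, sd):
    """Minus one half of the summed squared standardised errors."""
    return -0.5 * sum(((observed[key] - predicted[key]) / sd[key]) ** 2
                      for key in observed)


def in_prior_support(spectral_radius, s1_eigenvalues):
    """Flat prior: requires SR(theta) < 1 and S1 positive semi-definite."""
    return spectral_radius < 1.0 and all(ev >= 0.0 for ev in s1_eigenvalues)


# Hypothetical summary statistics: variance plus correlations for the
# family relations (1, 0), (2, 0), (1, 1) and (2, 2).
observed = {"var": 4.1, (1, 0): -0.30, (2, 0): 0.10, (1, 1): 0.55, (2, 2): 0.20}
sd = {"var": 0.4, (1, 0): 0.05, (2, 0): 0.06, (1, 1): 0.04, (2, 2): 0.07}
predicted = {"var": 4.0, (1, 0): -0.28, (2, 0): 0.12, (1, 1): 0.50, (2, 2): 0.15}

log_l = log_likelihood(predicted, observed, sd)
log_posterior = log_l if in_prior_support(0.6, [0.2, 1.3]) else float("-inf")
print(log_l, log_posterior)
```

A sampler would evaluate `predicted` from the model parameters at every step and reject any proposal that falls outside the prior support.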
The numerical implementation uses the adaptive Gibbs-sampler implemented in the Julia library Mamba.jl [64]. For each dataset, we sample 11 million parameter sets which include a burn-in transient of 1 million samples. These samples are removed before analysis of the output.
For model comparison we use the AIC [65] given by
$$\mathrm{AIC} = 2k - 2\ln\hat{\mathcal{L}}, \tag{M11}$$
where k is the number of model parameters and $\ln\hat{\mathcal{L}}$ is the maximum value of the log-likelihood function given by (M10). For S2 ≠ 0, the inheritance matrix model has k = d(1 + 2d) parameters where d is the number of cell cycle factors in the model. For S2 = 0 the number of parameters reduces to k = d(1 + 3d)/2.
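The parameter count and the AIC are simple enough to verify by hand or in code. The sketch below assumes the standard definition AIC = 2k − 2 ln L̂ and counts d² entries for θ plus d(d + 1)/2 independent entries for each symmetric noise matrix; the function names and the log-likelihood value are illustrative.

```python
def n_parameters(d, sister_noise=True):
    """Free parameters of the inheritance matrix model with d factors.

    theta contributes d*d entries; each symmetric noise matrix (S1, and S2
    when sister noise is included) contributes d*(d + 1)/2 entries.
    """
    theta_entries = d * d
    symmetric_entries = d * (d + 1) // 2
    n_noise_matrices = 2 if sister_noise else 1
    return theta_entries + n_noise_matrices * symmetric_entries


def aic(k, max_log_likelihood):
    """Akaike information criterion, 2k - 2 ln(L_max)."""
    return 2 * k - 2.0 * max_log_likelihood


k_two_factor = n_parameters(2, sister_noise=False)   # two-factor model with S2 = 0
print(k_two_factor, aic(k_two_factor, max_log_likelihood=-3.0))
```

For two factors and S2 = 0 this count reproduces the seven free parameters used in the inference.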
ACKNOWLEDGMENTS
We thank Bruno Martins, Dimitris Volteras and Paul Piho for their comments on the manuscript. This work has been supported by a scholarship to FAH provided by the EPSRC Centre for Mathematics of Precision Health-care (EP/N014529/1) and MRC core funding to the London Institute of Medical Sciences (MC-A658-5TY60). ARB is funded by a CRUK Career Development Fellowship (C63833/A25729). PT is funded by a UKRI Future Leaders Fellowship (MR/T018429/1).
APPENDIX
A1. Small noise approximation
Here, we will derive the inheritance matrix model given by equations (2) in the main text. We assume that the fluctuations in the hidden cell factor dynamics are small, which leads to a computationally efficient approximation.
Firstly, in the limit of zero fluctuations, all cells must be identical. Hence, all cell cycle factors are equal to their means µ = (µ1, µ2, …, µN)⊤ = E(yp) and similarly for the noise vectors β = (β1, β2, …, βN)⊤ = E(e2m) = E(e2m+1) in Eq. (1b). From (1a) and (1b) we then find that
which can be efficiently solved for
and µ using standard numerical methods.
Secondly, we can decompose the interdivision time and the cell cycle factor vector into their respective mean and fluctuating components by
Denoting the index of the present cell by p and the one of its mother by m, we can expand f and g around the limit of zero fluctuations and we obtain to leading order
where
Using this expansion and (A2) in (1a) and (1b) of the main text we arrive at
where we have set
and
giving the fluctuations around the mean for the noise vectors. Comparing (A7) with (A1) and collecting terms to leading order, we obtain the linearised system:
Next, we define the diagonal scaling matrix Γ with non-zero elements as
for i = 1, 2, …, N. Using the rescaled noise sources
, we find the rescaled inheritance matrix θ and α-coefficients
The rescaled cell cycle factor fluctuations follow (2) of the main text and we reach rescaled variance-covariance matrices S1 and S2 as follows
A2. Beyond the small noise approximation: cell cycle factor complexes account for nonlinear fluctuations
Here we analyse the effect of nonlinearity on the interdivision time correlation patterns. For simplicity we consider a single cell cycle factor and follow the same lines as in section A1, Eq. (A9), while including terms of order $x^2$. This leads to the expansion of the interdivision time and factor fluctuations
where
is the Jacobian of the cell cycle factor dynamics, as before, and
is the Hessian. From the second equation we obtain
Defining and
, combining Eqs. (A14) and (A16), and rescaling variables as in (A12) and (A13), we find the extended inheritance matrix model
where
Here and β = 1 if
, and analogously,
and β = 0 if
. Hence, the interdivision time correlation patterns with small to moderate fluctuations can be described through an extended linear system (A17) that includes nonlinear terms
. These additional terms can be interpreted as cell cycle factors forming binary complexes. The presence of these complexes increases the number of cell cycle factors and extends the eigenvalue spectrum of the effective inheritance matrix Θ by $\theta^2$. Hence, the presence of complexes leads to mixed correlation patterns. For example, for a single cell cycle factor, the eigenvalues of Θ are $(\theta, \theta^2)$, which corresponds to an alternator pattern for θ < 0. More generally, we may expect that nonlinear patterns can be described through mixtures of aperiodic, alternator, and oscillatory patterns. For example, the complex eigenvalue spectrum of an oscillator pattern ($e^{\pm i 2\pi/P}$) will include powers of complex eigenvalues ($e^{\pm i 4\pi/P}$), resulting in harmonics of the fundamental correlation oscillation frequency similar to higher order harmonics observed in single-cell time-series of the circadian clock [12, 63].
A3. Derivation of the generalised tree correlation function
In this section we derive an analytical expression for the generalised tree correlation function. This gives the Pearson correlation coefficient in interdivision time for any pair of related cells. We start with the equation for the Pearson correlation coefficient, and from there derive a formula for the interdivision time covariance using the known properties of the cell cycle factors x. From this, we can derive the general formula for the correlation coefficient between any related cell pair.
We associate a cell pair with an index (k, l) which measures the distance to the nearest common ancestor as given in Section II B (Figure 2a). From this, we denote their interdivision time fluctuations as $\tau'_k$ and $\tau'_l$, respectively. The Pearson correlation coefficient between these fluctuations is given by
$$\rho(k,l) = \frac{\mathrm{Cov}(\tau'_k, \tau'_l)}{s_{\tau'}}, \tag{A19}$$
where sτ′ is the variance of the interdivision time fluctuations.
The interdivision time fluctuations $\tau'_k$ and $\tau'_l$ are calculated from the vectors of rescaled cell cycle factor fluctuations xk and xl as given in Section II A, giving the equations
$$\tau'_k = \alpha^\top x_k, \qquad \tau'_l = \alpha^\top x_l. \tag{A20}$$
Substituting (A20) into (A19), we obtain a formula for ρ(k, l) in terms of the cell cycle factor fluctuations x and the α coefficients alone
Since xk and xl are identically distributed in steady state, we have that Var(xk) = Var(xl) = Cov(x, x) = Σ as specified in Methods A. We can write ρ(k, l) now as
where α⊤Σα gives the variance of the interdivision time fluctuations τ′.
Using the model equation (2) we can write the formula for the x vectors for the two cells in the cell pair (k, l) as
where cells k and l have mother cells k − 1 and l −1 respectively. The two cells are sisters if and only if their subscripts are both equal to 1, meaning they share a mother cell. Using recurrence of the model, we can write these equations as
where x0 is the vector of cell cycle factors for the most recent common ancestor for a cell pair given by (k, l).
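All that remains is to derive a function for Cov(xk, xl) which we will denote ω(k, l). We calculate ω(k, l) as follows using expectations:
$$\omega(k,l) = E\!\left[(x_k - E[x_k])\,(x_l - E[x_l])^\top\right], \tag{A27}$$
where $E[x_k]$ and $E[x_l]$ are the mean vectors of xk and xl respectively, which are both equal to 0, giving
$$\omega(k,l) = E\!\left[x_k\,x_l^\top\right]. \tag{A28}$$
To find ω(k, l) in terms of the model parameters, we substitute in equations (A26) for xk and xl and get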
The noise term fluctuations z are only correlated if the cells are sisters, which only occurs when the distance we have (k, l) = (1, 1). So for the summations above, we exclude all terms except where i = j = 1. Doing this and expanding we get
where δk≥1 and δl≥1 are given in Equation M4. We also have that
The matrix Cov(x0, x0) is equivalent to the covariance matrix for any x, giving Cov(x0, x0) = Σ. This gives
Similarly we have,
As E(z) = 0, and Cov(z2m, z2m+1) = S2 as stated in Methods A, we obtain,
Equation (A31) therefore becomes:
Substituting (A37) back into (A29) we get
giving us the final equation for ω(k, l). Using the above equation in (A23), we obtain Eq. (M3) of the Methods.
A4. Derivation of the formula for the oscillator periods, Tn
The period of correlation oscillation as observed in the lineage correlation functions is given by (6). We can reveal the underlying oscillator periods by shifting the inferred period T0 to obtain a smaller period Tn. This means that shorter periods would produce the same inferred period in the lineage correlation function when sampled at the original frequency of once per cell cycle (Figure 5a).
The oscillator periods are obtained by adding or subtracting multiples of 2π to the argument of the eigenvalue, which leaves the eigenvalue at the same position in the complex plane. The oscillator period Tn with shift n in ℤ is therefore given by
$$T_n = \frac{2\pi\,\langle\tau\rangle}{\left|\mathrm{Arg}(\lambda) + 2\pi n\right|}, \tag{A39}$$
where ⟨τ⟩ is the mean interdivision time. Taking (A39) and substituting in (6), we obtain Tn in terms of T0 as (7).
A5. Solution of the tree correlation function and parameter identifiability for simple inheritance rules
We consider the limiting case of a single cell cycle factor (N = 1) resulting in simple inheritance rules. This situation could model a growth factor that can either increase or decrease interdivision times of cells depending on the monotonicity of f in Eq. 1. The analytical solution (M7) of the inheritance matrix model then reduces to
where
. First, we observe that, given single cell measurements of the mother-daughter correlation coefficient ρ(0, 1), the daughter-daughter correlation coefficient ρ(1, 1) and the variance sτ, the parameters θ, S1 and S2 are uniquely identifiable:
Thus measurements of the variance and of the lineage and cross-branch correlations fully determine the parameters. The tree correlation function is, however, independent of f, which means that the interdivision time correlation pattern carries no information about whether the growth factor increases or decreases growth. The reason for this indifference is that cell cycle factors are identified only by their fluctuation pattern, i.e., for each cell cycle factor whose fluctuations x increase interdivision time, we could define another cell cycle factor whose fluctuations −x decrease interdivision time. We accounted for this unidentifiability issue through a similarity transformation using the scaling matrix Γ in (A12) and (A13) that transforms all cell cycle factor fluctuations to increase interdivision time. Of course, this unidentifiability could be removed by explicitly measuring the involved cell cycle factors.
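The identifiability statement for N = 1 can be illustrated with a short round-trip calculation. The sketch below assumes a scalar recurrence x_daughter = θ x_mother + z with noise variance S1 and sister noise covariance S2, and assumes that interdivision time fluctuations are proportional to x with unit scaling; the closed forms and the function names are a reconstruction for illustration, not the paper's displayed equations.

```python
def tree_correlations(theta, s1, s2):
    """Forward map (theta, S1, S2) -> (variance, rho(0,1), rho(1,1)) for N = 1."""
    if abs(theta) >= 1:
        raise ValueError("stationarity requires |theta| < 1")
    variance = s1 / (1.0 - theta ** 2)      # solves variance = theta^2 * variance + S1
    rho_mother_daughter = theta             # Cov(x0, x1) = theta * variance
    rho_sisters = (theta ** 2 * variance + s2) / variance
    return variance, rho_mother_daughter, rho_sisters


def identify_parameters(variance, rho_mother_daughter, rho_sisters):
    """Inverse map: recover (theta, S1, S2) from the three measured statistics."""
    theta = rho_mother_daughter
    s1 = variance * (1.0 - theta ** 2)
    s2 = variance * (rho_sisters - theta ** 2)
    return theta, s1, s2


# Round trip with arbitrary illustrative values.
stats = tree_correlations(theta=0.4, s1=0.5, s2=0.1)
print(stats)
print(identify_parameters(*stats))
```

Because the inverse map exists and involves no choice of sign, the three statistics determine the three parameters uniquely in this sketch, while nothing in it depends on f.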
A6. Mapping mechanistic cell cycle and cell size control models to the inheritance matrix model
To further investigate the output of the inheritance matrix model, we propose multiple models of known cell cycle control mechanisms, and map them to our inheritance matrix model framework. All cell size models assume symmetric division.
1. Cell size control model with correlated growth
Considering the influence of cell size control on interdivision time [11, 14, 31], here we propose a cell size control model where we have some mother to daughter inheritance of both the added size Δ and the growth rate κ (Appendix 1 - Figure A5g). The model equations are given by:
The noise terms ξ and ϕ are independent between sisters such that Cov(ξ2m, ξ2m+1) = Cov(ϕ2m, ϕ2m+1) = 0. Assuming exponential growth the formula for the interdivision time is given by
where p represents the index of a given cell. Taking the vector of cell cycle factors for the mother cell to be ym = (ym,1, ym,2, ym,3)⊤ = (Δm, sb,m, κm)⊤ and comparing (1a, 1b) with (A42) and (A43), we obtain
Then we can calculate the means from (A1),
Then using (A44) in (A5) and (A12), we find
Assuming and using (A13), we find
The θ matrix has eigenvalues which give an aperiodic pattern for a, b, c > 0 and an alternator pattern otherwise (Appendix 1 - Figure A5i,j). These same patterns arise for all real eigenvalues in the 3D model in the same way as in the two-dimensional system. Only a single negative eigenvalue is needed for the lineage correlation function to display an alternator pattern. We are restricted to a in (−2, 2) and b, c in (−1, 1) to ensure SR(θ) < 1. The cousin-mother inequality for this system is too complex to be treated analytically, so we use numerical methods to visualise the parameter region in which the cousin-mother inequality can be satisfied (Appendix 1 - Figure A5h).
For the case of the aperiodic pattern, we observe positive same factor mother-daughter correlation and negative alternate factor mother-daughter correlation (Appendix 1 - Figure A5k). In contrast, for an alternator pattern, the mother daughter same factor correlation is negative, but the alternate factor correlations vary between positive and negative values (Appendix 1 - Figure A5l).
2. Simple cell size control model
For the special case of b = Var[ϕ] = 0 and c = 1, the model reduces to a simple cell size control model with fluctuating added size (Appendix 1 - Figure A5a). The inheritance matrix θ then has eigenvalues . Thus depending on the choice of a, this model can produce both an alternator and aperiodic pattern (Appendix 1 - Figure A5c,d). In this case, using (M3) the cousin-mother inequality becomes
which cannot be satisfied for |a| < 2, which implies
. Hence the cousin-mother inequality cannot be satisfied for any reasonable choice of a in this simple model (Appendix 1 - Figure A5b).
For an aperiodic pattern, this simplified model exhibits positive same factor mother-daughter correlation and negative alternate factor mother-daughter correlation (Appendix 1 - Figure A5e). In the alternator case, this model exhibits negative same factor mother-daughter correlation and also negative alternate factor mother-daughter correlation (Appendix 1 - Figure A5f).
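To make the simple cell size control model concrete, the following sketch simulates a single lineage. It assumes the linear size-control map s_d = a·s_b + Δ with symmetric division and exponential growth at fixed rate κ, so that τ = ln(s_d/s_b)/κ; this specific form, the gamma distribution chosen for the added size Δ, and all function names are assumptions made for illustration. The mean and variance of Δ and the value of κ follow the values fixed in the caption of Appendix 1 - Figure A5 (E[ξ] = 1, Var(ξ) = 0.1, κ = 1).

```python
import math
import random


def simulate_lineage(a, n_generations, mean_added=1.0, var_added=0.1,
                     kappa=1.0, seed=0):
    """Interdivision times along one lineage of the assumed map s_d = a*s_b + Delta."""
    rng = random.Random(seed)
    shape = mean_added ** 2 / var_added     # gamma shape (an assumed noise model)
    scale = var_added / mean_added          # gamma scale
    s_birth = mean_added / (2.0 - a)        # noise-free fixed point, needs a < 2
    times = []
    for _ in range(n_generations):
        added = rng.gammavariate(shape, scale)
        s_division = a * s_birth + added
        times.append(math.log(s_division / s_birth) / kappa)  # exponential growth
        s_birth = s_division / 2.0          # symmetric division
    return times


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Adder (a = 1): mother-daughter correlation of interdivision times.
times = simulate_lineage(a=1.0, n_generations=20000)
rho = pearson(times[:-1], times[1:])
print(round(rho, 3))
```

A small-noise linearisation of this assumed map gives a mother-daughter interdivision time correlation of (a − 2)/4, i.e. about −0.25 for an adder, so the simulated value should be negative and of that order; the exact number depends on the noise level and the random seed.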
3. Abstract cell cycle phase model
We propose a model of two abstract cell cycle phases that have no integrated dependence on cell size (Appendix 1 - Figure A5m). The model equations are given by
The noise terms ξ and ϕ are independent between sister cells such that Cov(ξ2m, ξ2m+1) = Cov(ϕ2m, ϕ2m+1) = 0. In this case we have that the two factors make up the length of the cell cycle, so we simply have τp = yp,1 + yp,2.
Therefore using (1a), and (1b) we obtain
We calculate the means from (A1),
Then using (A50) in (A5) and (A12), we find
As the noise terms are independent between sisters we have and using (A13) we obtain
The inheritance matrix θ has eigenvalues λ = (a, c) which gives an aperiodic pattern for a and c > 0 and an alternator pattern otherwise (Appendix 1 - Figure A5o,p).
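The rule relating eigenvalues to correlation patterns can be written as a small helper. This is a minimal sketch: the function name `classify_pattern` is hypothetical, labelling any spectrum with a complex eigenvalue as an oscillator is an assumption about how mixed cases are treated, and a zero eigenvalue is counted as aperiodic here for simplicity.

```python
def classify_pattern(eigenvalues, tol=1e-12):
    """Label the correlation pattern implied by the eigenvalues of theta.

    complex eigenvalue present        -> "oscillator"
    all real, at least one negative   -> "alternator"
    all real and non-negative         -> "aperiodic"
    """
    eigs = [complex(e) for e in eigenvalues]
    if any(abs(e.imag) > tol for e in eigs):
        return "oscillator"
    if any(e.real < -tol for e in eigs):
        return "alternator"
    return "aperiodic"


# Eigenvalues (a, c) from the two example parameter choices of the phase model.
print(classify_pattern([0.3, 0.4]))      # a, c > 0
print(classify_pattern([-0.25, 0.9]))    # one negative eigenvalue
print(classify_pattern([0.5 + 0.5j, 0.5 - 0.5j]))
```

For the two-phase model the eigenvalues are real, so only the first two labels can occur, in line with the statement that a single negative eigenvalue suffices for an alternator pattern.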
The analytical form of the cousin-mother inequality is complex so we use numerical methods to visualise the parameter region in which the cousin-mother inequality can be satisfied (Appendix 1 - Figure A5n).
We calculate individual factor mother-daughter correlations and find that for an aperiodic pattern, the model exhibits a range of correlation patterns (Appendix 1 - Figure A5q). However, for an alternator pattern, we obtain positive same factor mother-daughter correlation and negative alternate factor mother-daughter correlation (Appendix 1 - Figure A5r).
A7. Models of circadian-clock-driven correlation patterns
1. Kicked cell cycle model
Here we analyse the kicked cell cycle model [36] with our framework (Appendix 1 - Figure A6a). We will propose an inheritance matrix and then show that it reduces to the kicked cell cycle model for certain parameter choices. Consider the 3 × 3 inheritance matrix θ and noise vector zn given by
for n ∈ {2m, 2m + 1}. We have that S1 is given by Cov(z2m, z2m), however we assume that the noise terms ξn are independent between sisters such that S2 = Cov(z2m, z2m+1) = 0. Assuming α = (1, 0, 0)⊤, the interdivision times are governed by
The oscillator is represented by the cell cycle factors that evolve according to
with oscillator inheritance matrix
We can solve (A56) along an ancestral lineage of n generations
where
is the state of the ancestral cell. Substituting (A58) into (A55) and assuming
, i.e., the cell cycle oscillator
is deterministic, the interdivision time of the mother determines the interdivision time of the daughter cell via
where
and
which represent initial conditions. Assuming
approximates the time at birth for n ≫ 1, this leads to
Comparing (A60) to Eq. (1) and (2) in [36], we see that our IMM agrees with the kicked cell cycle model when D = 1, , and large n.
2. Circadian-clock-driven cell size control model
Here we analyse the model of cell size control driven by the circadian clock proposed in Martins et al. [13] within the inheritance matrix model framework (Appendix 1 - Figure A6e). The division rate in Eq. (1) of [13] is given by
where s is the cell size with sb being the size at birth. G(t) is a function of time t that couples the size control to the circadian clock, and S(s, sb) is the division rate per unit volume of the cell. Assuming cells grow exponentially with growth rate α, we have
and the division size follows
where sb is the size at birth and tb is the time at birth.
To map these to our inheritance matrix model, we observe that samples from (A63) follow
where
is a drift term and
is a zero-mean noise term that depends both on time of day and birth size. Note that both
and
are periodic functions of time at birth tb,m. Since the latter is not explicitly modelled in our framework, here, we replace it with the state x0,m of the circadian clock, such that the update equations in (A64) now appear as
To gain intuition into the shape of the unknown functions g and h, we linearise the equations around some basal level x = δ of a clock-less mutant, which gives
For simplicity assume and that the clock-less mutant follows a linear cell size control model with gamma-distributed size increments ϕA,m ∼ Gamma with mean Δ as in [13]. These assumptions lead to the relations,
Using sb,2m = sb,2m+1 = sd,m/2, we can obtain the linearised inheritance matrix model equations for the circadian cell size control model (Appendix 1 - Figure A6e):
where now ξm is the added size and x0,m is the output x0,m = x1,m + x2,m of a circadian oscillator governed by
for cell generation n, where ϕ = (ϕ1, ϕ2)⊤ are noise terms added to the elements of x0 and
is some complex eigenvalued 2 × 2 inheritance matrix given by
Following this, we see that the circadian clock is incorporated into this cell size control system in the same way as the kicked cell cycle model outlined in the previous section (Appendix 1 - Section A7 1). Using (A62) we can write the interdivision time of a cell with index p as
Then taking the vector of cell cycle factors for the mother cell to be ym = (ym,1, ym,2, ym,3, ym,4)⊤ = (sb,m, x1,m, x2,m, ξm)⊤ and comparing (1a, 1b) with (A68) and (A71) we obtain
Computing the means using (A1) we get
Then using (A72) in (A5) and (A12), we can solve for the means
Then taking and using (A13), we obtain the following for S1:
where corij indicates the correlation between a pair of noise terms ϕi and ϕj, and
for i, j ∈ {1, 2, A}.
3. Model comparison
We notice that the kicked cell cycle model has three cell cycle factors, while the circadian-clock-driven cell size control model has four cell cycle factors. The eigenvalues of the inheritance matrix θ determining the correlation patterns are
for the kicked cell cycle model and
and for the cell size control model. In both models, either the complex pair of eigenvalues
produces oscillatory behaviour. The overall correlation patterns are of mixed type, depending on the parameters β and a.
To compare the models quantitatively, we match their mother-daughter interdivision time correlation coefficient in the absence of clock coupling. For the kicked cell cycle model, we notice that ρ(1,0) = β in the absence of clock coupling. The cell size control model reduces to the model in Appendix 1 - Section A6 1 in the absence of clock coupling, which satisfies . Since realistic cell size control mechanisms [11, 68–70] (a ∈ [0, 2)) ranging from sizers (a = 0) to adders (a = 1) to timers (a = 2) imply β ≤ 0, we find that the kicked cell cycle obeys a mixed correlation pattern of the alternator/oscillator type while the cell size control model obeys an aperiodic/oscillator pattern.
Focusing on the common adder size control (a = 1), we find that the regions where the cousin-mother inequality is satisfied are remarkably similar in both models when β is matched accordingly (Appendix 1 - Figure A6b and f). The lineage correlation function (red line) oscillates, but the cross-branch correlation function (blue line) alternates for the kicked cell cycle model (Appendix 1 - Figure A6c-d) but not for the cell size control model (Appendix 1 - Figure A6g-h).
A8. Inference validation using simulated data
To validate the inference results discussed in the main text we simulate interdivision time data using the maximum posterior parameters from the inference on two of the original live imaging datasets, and compare the output and model fit to our original inference.
We take the maximum posterior parameter sets from the original inference on two datasets (Appendix 1 - Table A2), cyanobacteria and mouse embryonic fibroblasts, and produce simulated interdivision time lineage data in MATLAB using custom scripts and Random Trees [67]. We chose to look at these two datasets in order to analyse the posterior distribution of the inferred underlying period T−1 to compare to the approximately 24h results seen in the main text.
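The simulation protocol described in the caption of Appendix 1 - Figure A9 can be sketched as follows. This is an illustrative Python sketch, not the MATLAB scripts used for the paper: it uses a single scalar factor with recurrence x_daughter = θ x_mother + z instead of the inferred matrices, the parameter values and the original dataset size `n_cells_original` are hypothetical placeholders, and sister noise is drawn from a bivariate Gaussian with variance S1 and covariance S2.

```python
import math
import random


def simulate_tree(theta, s1, s2, n_generations, x_root, rng):
    """Simulate one complete binary tree; cells are returned in heap order,
    so the mother of cell i (i >= 1) is cell (i - 1) // 2."""
    if abs(s2) > s1:
        raise ValueError("sister covariance cannot exceed the noise variance")
    cells = [x_root]
    frontier = [x_root]
    for _ in range(n_generations - 1):
        next_frontier = []
        for x_mother in frontier:
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            # Correlated sister noise via a 2x2 Cholesky factor.
            z1 = math.sqrt(s1) * e1
            z2 = (s2 / math.sqrt(s1)) * e1 + math.sqrt(s1 - s2 ** 2 / s1) * e2
            next_frontier += [theta * x_mother + z1, theta * x_mother + z2]
        cells += next_frontier
        frontier = next_frontier
    return cells


rng = random.Random(1)
theta, s1, s2 = 0.5, 1.0, 0.3            # hypothetical parameter values

# Step 1: one large tree of 11 generations; its last 1000 cells
# provide approximately stationary initial conditions.
big_tree = simulate_tree(theta, s1, s2, 11, 0.0, rng)
initial_pool = big_tree[-1000:]

# Step 2: many small trees of 6 generations (63 cells each).
n_cells_original = 1000                   # hypothetical dataset size
n_trees = n_cells_original // 63
small_trees = [simulate_tree(theta, s1, s2, 6, rng.choice(initial_pool), rng)
               for _ in range(n_trees)]

# Step 3: keep 85% of the simulated cells, sampled without replacement.
all_cells = [x for tree in small_trees for x in tree]
kept = rng.sample(all_cells, int(0.85 * len(all_cells)))
print(len(big_tree), len(small_trees[0]), len(kept))
```

For brevity the subsampling step here discards the family bookkeeping; a full analysis would track which cell indices survive so that mother-daughter, sister and cousin pairs can still be formed from the heap-ordered trees.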
From this simulated data, the correlation coefficients are calculated using the methods outlined in Methods D, and then we look at the model inference on these new, simulated correlations, to compare to the original. These simulations produce correlation patterns that reproduce the experimentally measured correlations (comparing Appendix 1 - Figure A9a-b with Figure 3a,f).
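For readers who wish to reproduce the uncertainty estimates on simulated correlations, the sketch below computes a Pearson correlation coefficient for a list of family pairs together with a bootstrapped 95% confidence interval. The use of the percentile method is an assumption, since the details are deferred to Methods D, and the synthetic mother-daughter pairs and all function names are hypothetical. The demonstration uses 2,000 re-samplings to keep the run time short; the analysis in this work uses 10,000.

```python
import math
import random


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


def bootstrap_ci(pairs, n_boot=10000, level=0.95, seed=0):
    """Percentile bootstrap interval from re-sampling pairs with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = sorted(
        pearson([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    low = stats[int((1.0 - level) / 2.0 * n_boot)]
    high = stats[int((1.0 + level) / 2.0 * n_boot) - 1]
    return low, high


# Synthetic mother-daughter pairs with a positive correlation (illustration only).
data_rng = random.Random(42)
demo_pairs = []
for _ in range(200):
    mother = data_rng.gauss(0.0, 1.0)
    daughter = 0.5 * mother + data_rng.gauss(0.0, 1.0)
    demo_pairs.append((mother, daughter))

rho_hat = pearson(demo_pairs)
ci_low, ci_high = bootstrap_ci(demo_pairs, n_boot=2000)
print(round(rho_hat, 3), round(ci_low, 3), round(ci_high, 3))
```

Re-sampling whole pairs, and not individual cells, keeps the family relationship intact within each bootstrap replicate.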
The posterior distribution of patterns for the simulated cyanobacteria data is unchanged, exhibiting a 100% oscillator pattern (Appendix 1 - Figure A9a) that matches the fit to the original dataset (Figure 3a). The simulated mouse embryonic fibroblast data (Appendix 1 - Figure A9b) lose some of the original 100% oscillator pattern (Figure 3f) in favour of an alternator pattern. However, the oscillator pattern is still dominant.
We see that for cyanobacteria (Appendix 1 - Figure A9c) and mouse embryonic fibroblasts (Appendix 1 - Figure A9d), the posterior distribution of the correlation function oscillatory period T−1 inferred from the simulated data exhibits a large overlap with the original posterior distribution discussed in Section II G 2 (Figure 5e). The difference in the medians of these posterior distributions is 0.42 h for mouse embryonic fibroblasts (Appendix 1 - Figure A9d) and just 0.11 h for cyanobacteria (Appendix 1 - Figure A9c). This result validates our analysis of these posterior distributions, showing that the period we reconstruct from the simulated correlation patterns is consistent with the original data.
APPENDIX FIGURES
(a-f) Plots showing data (open markers) against model predictions (solid black) for the one-dimensional model [28] for (a) cyanobacteria, (b) clock-deleted cyanobacteria, (c) mycobacteria, (d) human colorectal cancer, (e) neuroblastoma and (f) mouse embryonic fibroblasts. We fit the model using the same likelihood function (M10) and methods (Methods D) as in the main text. Points (black) give the median model output for each correlation and error bars give the 95% bootstrapped confidence intervals from 10,000 re-samplings with replacement. Circular points show the model fitted correlations (mother-daughter, grandmother-granddaughter, sister-sister and cousin-cousin) whereas triangular points demonstrate model predictions. For this fitting we used 100,000 samples (in contrast to 10 million used in the main text).
(a-f.i) Plots of model fits and predictions (solid markers) against the data (open markers) for the family pair correlation coefficients for (a) cyanobacteria, (b) clock-deleted cyanobacteria, (c) mycobacteria, (d) human colorectal cancer, (e) neuroblastoma and (f) mouse embryonic fibroblasts. Colours of the solid markers represent the fits and predictions for parameter samples clustered by correlation pattern. Inset for each panel is a bar chart giving the distribution of the three patterns for each dataset. (a-f.ii) Plots of model output against the data for the interdivision time covariance. In this figure, the error bars for the data (unfilled black points) are calculated via bootstrapping of 10,000 samples with replacement to give the 95% confidence interval. For the model, error bars represent the 95% credible interval, computed by taking the 2.5th and 97.5th percentiles of the sampled values. For all plots, circles indicate fitted correlations and triangles show predicted correlations. We can see that the model fit is good for all datasets, as the model error bars overlap with those of the data, and this is reflected in the low AIC given in Appendix 1 - Table A1.
(a) Trace of the log-likelihood from four initialisations of the inference on the clock-deleted cyanobacteria dataset (different colours). (b) Histogram of the posterior distribution of the log-likelihood for the inference samples on the clock-deleted cyanobacteria dataset. The histograms for the individual initialisations align, demonstrating convergence of the log-likelihood.
Same panels as in Figure 3 but with α = (1, 0)⊤ and showing only one sample. We show the calculated family correlations with 95% bootstrapped confidence intervals (open markers) and a single sample of the model fit for (a) cyanobacteria, (b) clock-deleted cyanobacteria, (c) mycobacteria, (d) human colorectal cancer, (e) neuroblastoma and (f) mouse embryonic fibroblasts. Posterior parameter sets are clustered by correlation patterns (bar charts). For this fitting we used 100,000 samples (in contrast to 10 million used in the main text). We see a similar fit and similar pattern distributions for all cell types except for mycobacteria (c), which here displays a dominant oscillator pattern.
(a-f) Simple cell size control model. (a) Model schematic. (b) The cousin-mother inequality cannot be satisfied for any choice of parameter a. (c-d) Generalised tree correlation function plots for (c) a = 1 and (d) a = −1.5, resulting in an aperiodic and an alternator pattern respectively. (e-f) Same vs alternate factor mother-daughter correlation plots for (e) a = 1 and (f) a = −1.5. In panels (b-f) we fix E[ξ] = 1, Var(ξ) = 0.1, κ = 1. (g-l) Cell size control model with correlated growth rate. (g) Model schematic. (h) Region plot with fixed parameter a = 1 showing the parameter space b, c in (−1, 1) that satisfies the cousin-mother inequality (blue). Example parameter choices are also plotted for an aperiodic (yellow) and an alternator (red) pattern. (i-j) Generalised tree correlation function plots for (i) (b, c) = (0.2, 0.7) and (j) (b, c) = (−0.81, 0.88), resulting in an aperiodic and an alternator pattern respectively. (k-l) Same vs alternate factor mother-daughter correlation plots for (k) (b, c) = (0.2, 0.7) and (l) (b, c) = (−0.81, 0.88). In panels (h-l) we fix E[ξ] = E[ϕ] = 1, Var(ξ) = Var(ϕ) = 1, κ = 1. (m-r) Two cell cycle phase model. (m) Model schematic. (n) Region plot with fixed parameter b = −0.75 showing the parameter space a, c in (−1, 1) that satisfies the cousin-mother inequality (blue). Example parameter choices are also plotted for an aperiodic (yellow) and an alternator (red) pattern. (o-p) Generalised tree correlation function plots for (o) (a, c) = (0.3, 0.4) and (p) (a, c) = (−0.25, 0.9), resulting in an aperiodic and an alternator pattern respectively. (q-r) Same vs alternate factor mother-daughter correlation plots for (q) (a, c) = (0.3, 0.4) and (r) (a, c) = (−0.25, 0.9). In panels (n-r) we fix Var(ξ) = Var(ϕ) = 1.
(a-d) Kicked cell cycle model. (a) Model schematic. The mother to daughter IDT inheritance is given by where a is the size control parameter. The ‘kick’ to the cell cycle is produced by a two-dimensional complex eigenvalued inheritance matrix model system with oscillator behaviour. (b) Region plot for β = −0.25 (blue) and β = 0.25 (grey), demonstrating the region for this model where the cousin inequality is satisfied. Here we fix the variances of the noise terms ξ1, ξ2 and ξτ all equal to 0.1. (c-d) Plot of the generalised tree correlation function for (c) (D, P) = (0.85, 2.5) and (d) (D, P) = (0.85, 5). In both these plots we take β = −0.25, meaning the model has a mixture of alternator and oscillator behaviours. The cousin inequality is satisfied for both these parameter choices. (e-h) Circadian cell size control model. (e) Model schematic. The parameter a gives how the daughter’s birth size depends on the mother’s birth size, and b gives the coupling of the circadian oscillator to the size control. (f) Region plot demonstrating where the cousin inequality is satisfied. We fix a = 1, b = 1. Correlations between noise terms are fixed equal to 0 and we set ηi = 0.1 for i ∈ {1, 2, A}. (g-h) Plots of the generalised tree correlation function for the same fixed parameters specified in panel (f), with (g) (D, P) = (0.85, 2.5), and (h) (D, P) = (0.85, 5). As we fix a = 1, these plots show a combination of aperiodic and oscillator behaviour. We note that for (D, P) = (0.85, 2.5), the cousin inequality is not satisfied. This demonstrates that oscillatory behaviour is not a necessary condition for the cousin inequality to be satisfied.
Histogram of the posteriors of the possible periods underlying the lineage correlation function for (a) cyanobacteria, (b) mouse embryonic fibroblasts and (c) human colorectal cancer, calculated using Equation 7. Numerical values give medians of the posterior distributions for each Tn. For (c) human colorectal cancer, we take the median period of each cluster where the clusters are allocated through the sign of the real part of the eigenvalue (see Figure 5). For all panels the correlation oscillation period T0 is given in green and the oscillator periods in different colours. The period analysed in Section II F corresponds to the histograms of T−1 (blue).
Plot of the function for P against the observed lineage correlation function period T0 given in Equation (M9) (blue line), for an oscillator pattern given in Section II B. We see that T0 = P for P > 2. For chosen T0 = 3 with τ = 1 and various n we see how the parameters P that produce the corresponding T0s are directly equal to the possible Tn we can derive from the chosen T0 (black points), using Equation (7).
Model fits and distribution of patterns for data simulated using the maximum posterior parameter set (Appendix 1 - Table A2) for (a) cyanobacteria, and (b) mouse embryonic fibroblasts. To simulate interdivision time lineage trees, we take the maximum posterior parameter sets from the original inference on the two datasets. These trees are simulated using Eqs. (2) in MATLAB using custom scripts which utilise ‘Random trees’ branching process [67]. For each dataset, we first simulate a complete tree of 11 generations (2047 cells) and take the last 1000 cells to sample stationary initial conditions. For the final simulated data, we simulated a number of smaller trees of 6 generations (63 cells each) to better represent live imaging experiments. We divide the number of cells in the original dataset by 63 and simulate this number of trees, with each tree having initial condition sampled from the last 1000 cells of the original large tree. We then randomly sample 85% of the simulated cells without replacement to imitate loss of cells from imaging mid experiment. The calculation of the family interdivision time correlation coefficients and the parameter inference was done in the same way as with the original datasets as outlined in Methods D. Pearson correlation coefficients (white dots) and 95% bootstrapped confidence intervals (error bars) were obtained through re-sampling with replacement (10,000 samples) of the simulated data. Posterior samples were clustered into aperiodic, alternator, and oscillator patterns (bar charts). We show several representative samples (solid and shaded lines) of the model fit drawn from the posterior distribution. We assume α = (1, 1)⊤. 
(c-d) Histograms of the inferred oscillator period T−1 for the original inference (blue) and inference on the simulated data (orange) for cyanobacteria (c) and mouse embryonic fibroblasts (d), demonstrating significant overlap of the oscillator period of the simulated parameter set (black dashed line) and the posterior distribution from Bayesian inference. Note that the posterior distributions of the real (red) and simulated datasets (blue) also overlap. Dashed lines give the median period of these posterior distributions for original inference (blue) and inference on simulated data (orange). Maximum posterior parameters used in the simulations are given in Appendix 1 - Table A2.
APPENDIX TABLES
Lineage tree statistics obtained from each dataset used in this work. Mean interdivision time τ, tree variance ŝτ, CVs and all correlation coefficients ± standard deviation of the bootstrap distributions from 10,000 re-samplings with replacement. Statistics were calculated on all available cells that could be put in the required family pair (Methods D). Shaded datasets exhibit the cousin-mother inequality.
Maximum posterior matrices from the original inference, used to simulate the interdivision time trees analysed in Appendix 1 - Section A8 and Appendix 1 - Figure A9.
Comparison of different variance estimators. Mean and 95% confidence intervals calculated from bootstrap distributions of 10,000 re-samplings with replacement for each dataset used in this work. The estimators are obtained as follows: bare variance is computed using all available cells that could be put in the required family pair (Methods D). The lineage variance is calculated through the weighted variance with weights following arguments similar to [22, 66]. Here Di is the number of divisions in the lineage that came before cell i and Ntrees is the total number of trees in the whole dataset. The censored variance is calculated after pruning trees such that each tree contains lineages of the same length as in [24, 27].
Footnotes
Revised after review