How fast are cells dividing: Probabilistic model of continuous labeling assays

Correct estimates of cell proliferation rates are crucial for quantitative models of the development, maintenance and regeneration of tissues. Continuous labeling assays are used to infer proliferation rates in vivo. So far, the experimental and theoretical study of continuous labeling assays focused on the dynamics of the mean labeling-fraction but neglected stochastic effects. To study the dynamics of the labeling-fraction in detail and fully exploit the information hidden in fluctuations, we developed a probabilistic model of continuous labeling assays which incorporates biological variability at different levels, between cells within a tissue sample but also between multiple tissue samples. Using stochastic simulations, we find systematic shifts of the mean-labeling fraction due to variability in cell cycle lengths. Using simulated data as ground truth, we show that current inference methods can give biased proliferation rate estimates with an error of up to 40 %. We derive the analytical solution for the Likelihood of our probabilistic model. We use this solution to infer unbiased proliferation rate estimates in a parameter recovery study. Furthermore, we show that the biological variability on different levels can be disentangled from the fluctuations in the labeling data. We implemented our model and the unbiased parameter estimation method as an open source Python tool and provide an easy to use web service for cell cycle length estimation from continuous labeling assays (https://imc.zih.tu-dresden.de/cellcycle).

C: Cartoon of the labeling assay for tissue shows labeled cells (green) at the start of the experiment (t = 0) and later (t = t 1 ).
For three exemplary cells, their individual cell cycle is shown. Since they have different cell ages and cell cycle lengths, they become labeled at different times. Note, cell j = 2 is older than cell j = 3, but stays unlabeled due to its significantly longer cell cycle length τ i,2 which is represented by the smaller gray area compared to the other two cells. D: Hierarchical probabilistic model for the cell cycle length. Each cell has an individual cell cycle length τ i,j which depends on the sample level cell cycle length τ i , which in turn depends on the population level cell cycle length such that < τ i,j > j = τ i , < τ i,j > i,j =< τ i > i = τ .
the biological variability on cell and tissue scale could be estimated and disentangled given the fluctuating 40 data in the labeling assays.

41
Here, we develop a probabilistic model of continuum labeling assays that incorporates biological vari-42 ability on the cell and sample levels as well as the measurement errors. By simulating this model, we 43 find systematic deviations from the previous deterministic model in the fraction of labeled cells. To access 44 whether these deviations are of practical relevance for the estimation of the cell cycle length, we derive an 45 analytical expression for the likelihood. Using this expression, we perform a parameter recovery study based 46 on the maximum likelihood method. We compare the accuracy and precision of our model to the previous deterministic model. Finally, we present a web service that allows to easily analyze continuous labeling 48 assays with our probabilistic model.

Probabilistic Model of S-Phase Labeling
In order to develop a probabilistic model accounting for variability between cells (indexed by j) and 51 between samples (indexed by i), we first define a single cell model of S-phase labeling. As we consider 52 asymmetric cell divisions the total number of cycling cells stays constant and the actual division of a cell 53 after M-phase is not relevant. Instead label incorporation during the S-phase is important. Therefore, we 54 consider the end of the previous S-phase as our start time for the cell age. When the cell enters its S-phase, 55 the label, e.g. a Thymidine analog like BrdU or EdU, becomes embedded in the newly synthesized DNA 56 and the cell gets labeled. Then, the labeling state s i,j of cell j in sample i is a Heaviside function H which 57 changes the state from zero (unlabeled) to one (labeled): where g i,j ∈ {0, 1} determines if that cell proliferates, τ i,j is the cell cycle time of that cell (T c ), f i,j ∈ [0, 1] 59 the relative S-phase length of that cell (T s /T c ), a i,j ∈ [0, 1] the age of that cell at t = 0 normalized to T c 60 and t the time after the start of the labeling experiment ( Figure 1A,B,C).

61
In the idealized limit of identical cellular parameters and uniformly distributed cell ages a i,j , we recover 62 the Nowakowski model: where k is the expected number of labeled cells in a sample of n cells and g the probability of a cell to be 64 proliferating. In this limit, the fraction of labeled cells k/n in our model equals the labeling index of the 65 Nowakowski model.

66
In the biological more relevant probabilistic case, we define random variables for the other cell parameters: where S i is the random variable for the labeling state. Since G and the Heaviside function are Boolean, S i 80 is also Boolean and thus it follows a Bernoulli distribution. its probability p(τ i , g, f, σ c ; t i ). Therefore, the probability for a cell to be labeled in the sample i is given by: In the probabilistic case, the counting process of labeled cells itself needs to be considered because the number of observed cells is relatively small which introduces noise. Here, it is modeled by a Binomial process B(n i , p i ). The number of labeled cell in the sample i equals the number of successes in a Binomial process using n i trials and p i as success probability, where n i corresponds to the number of cells observed in one sample: In order to generalize this distribution for different samples, it is interpreted as the conditional probability P (K = k|T = τ i ) of finding one sample i in the overall experiments. Consequently, the law of total probability can be applied in order to calculate the probability to find k labeled cells independent of a specific sample: Figure 3 depicts the perfect match between the probabilistic simulation and the analytic solution (Eq. 14) of our probabilistic model. The pmf solves the inverse problem and depends on the following set of parameter: τ, g, f, σ c , σ. The pmf with the parameters φ and the experimental data k = {k 0 , k 1 , ..., k n } at t = {t 0 , t 1 , ..., t n }) build the likelihood function L for the cell labeling experiment: To obtain parameter estimates with the likelihood (Eq. 16) and test its accuracy, we preform a parameter 97 recovery study under realistic conditions. We simulate an experiment as in Figure  where just the bias will appear more clearly. Additionally, our model is able to produce accurate error 126 intervals resulting in the targeted accuracy independent of the number of measurements (Table 1)   Web service for parameter estimation