Modelling somatic mutation accumulation and expansion in a long-lived tree with hierarchical modular architecture

In a long-lived organism with a modular architecture, such as a tree, somatic mutations accumulate throughout the lifespan and result in genetic mosaicism in each module within the same individual. In recent years, next-generation sequencing technology has provided snapshots of such intra-organismal genetic variability. However, the dynamic processes underlying the accumulation and expansion of somatic mutations during growth remain poorly understood. In this study, we constructed a model that describes these processes in a form that can be applied to a real tree. Given that the proliferation dynamics of meristematic cells vary across plant species, our model comprehensively represents multiple possible processes for elongation and branching. Using published data from a poplar tree, we compared the predictions of the models with the observations and explained the cell lineage dynamics underlying somatic mutation accumulation that were not evident from the snapshot of the sequenced data. We showed that somatic genetic drift during growth increases inter-meristem mosaicism, resulting in genetically distinct branches and less integrity within an individual tree. We also showed that somatic genetic drift during branching leads to a mutation accumulation pattern that does not reflect the tree topology. Our modelling framework can help interpret and provide further insights into the empirical findings of genetic mosaicism in long-lived trees.


In long-lived organisms, somatic mutations accumulate during mitotic growth and tissue regeneration because the DNA of somatic cells is continuously exposed to endogenous damage (e.g., production of reactive oxygen species), exogenous damage (e.g., ionizing radiation and ultraviolet light), and replication errors. In unitary animals, [...]

Branching is the process in which the axillary meristem is generated from cells arising from the apical meristem. We modelled this process by assuming that the α stem cell initials proliferate radially through a given number of successive cell divisions (Fig. 2c), and that the stem cell initials for the newly formed axillary meristem are sampled from the resulting daughter cells (Fig. 2d, e). In the sampling process of stem cells for the newly formed axillary meristem, [...]

Modelling mutation

[...] elongation and branching processes at a given rate per cell division. The mutation occurs in each of the two daughter cells independently, as assumed in a previous study (Klekowski et al., 1989). When mutation occurs in a stem cell, the mutated genomic site is randomly [...]

We define the stem cell state by Eq. (2), and the meristem state of genomic site k at a branch n is given as the sum of the stem cell states over the stem cells in the meristem, that is, the number of mutated cells in the meristem (Fig. 3). In [...]

We analysed four extreme models with varying elongation and branching processes (Table 1) and examined how each process affects the accumulated number and the distribution of somatic mutations across branches. With regard to the elongation process, we focused on structured and stochastic meristems. During elongation of the structured meristem, replacement of stem cell lineages never occurs (Fig. 2a), whereas in the stochastic meristem, replacement of stem cell lineages frequently occurs (Fig. 2b). With regard to the branching process, we examined unbiased and biased branching, in which the parameter of the sampling probability in Eq. (1) was set to 10 and 0.5, respectively. With a value of 10, the sampling probability given in Eq. (1) is close to the uniform distribution, and stem cells for the axillary meristem are sampled at random (Fig. 2d). With a value of 0.5, however, the sampling probability is concentrated around its centre, and stem cells for the axillary meristem are sampled more frequently from cells in the vicinity of that centre (Fig. 2e). Taken together, we focused on 2 × 2 models (Table 1).

3.5. Mathematical formulation of somatic mutation accumulation

[...] stochastic models, we used a mathematical model proposed by Klekowski et al. (1989) by focusing on the somatic mutation accumulation at a single site in a single branch. Here we briefly explain the model. Note that, in this mathematical model, we considered a mutation at a single site, whereas mutations at multiple genomic sites are considered in the simulation model described before. In addition, we exclusively focused on the case when [...] = 1, as in the application of the simulation model (Table 2). Let i be the number of mutated cells at an arbitrary site in a population of α stem cells (0 ≤ i ≤ α). We introduced the state vector at time t, whose elements represent the probability that the stem cell population includes i mutated cells at time t: [...]

In the structured model, the number of mutated cells changes from i to j with a transition probability that is given by the binomial distribution as follows: [...]; otherwise, the transition probability is 0. By introducing a transition matrix composed of these transition probabilities (0 ≤ i, j ≤ α), the state change of the stem cell population in the structured model is given as follows: [...]

[...] After a cell division, the number of non-mutated daughter cells becomes 2(α − i). Therefore, the probability that a given number of these daughter cells are selected to be mutated is given as Eq. (7). After the mutation event, the number of mutated daughter cells becomes 2i plus the number of newly mutated daughter cells. The probability that j mutated cells are sampled from the mutated daughter cells and α − j non-mutated cells are sampled from the remaining non-mutated daughter cells is given as [...] Overall, the probability that the number of mutated cells changes from i to j is given as the product for all possible cases as given in Eq. (6) (see Klekowski et al., 1989). By [...]

[...] Table 2.

We counted predicted mutated sites in each branch and compared these values with the observed SNP counts for each branch of the poplar tree (Hofmeister et al., 2020).
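As a rough illustration of the single-site formulation in Section 3.5, the following Python sketch builds one-step transition matrices for the structured and stochastic models and advances a state vector by one cell division. It is a minimal sketch under stated assumptions, not the authors' implementation. The names `alpha` (number of stem cells), `mu` (mutation probability per daughter cell per division), and all function names are hypothetical. The structured case assumes that each non-mutated lineage mutates independently with probability `mu`. The stochastic case assumes binomial mutation among the 2(α − i) non-mutated daughter cells, followed by sampling α cells without replacement from the 2α daughter cells. The state is treated as a row vector.

```python
from math import comb


def structured_matrix(alpha, mu):
    """P[i][j]: probability of going from i to j mutated stem cells.

    Assumption: lineages are never replaced, so j >= i, and each of the
    alpha - i non-mutated lineages mutates independently with probability mu.
    """
    P = [[0.0] * (alpha + 1) for _ in range(alpha + 1)]
    for i in range(alpha + 1):
        for j in range(i, alpha + 1):
            P[i][j] = comb(alpha - i, j - i) * mu ** (j - i) * (1 - mu) ** (alpha - j)
    return P


def stochastic_matrix(alpha, mu):
    """P[i][j] when alpha cells are resampled from the 2 * alpha daughters.

    Assumption: m of the 2 * (alpha - i) non-mutated daughters mutate
    (binomial), then alpha cells are drawn without replacement
    (hypergeometric); the products are summed over all possible m.
    """
    P = [[0.0] * (alpha + 1) for _ in range(alpha + 1)]
    total = comb(2 * alpha, alpha)
    for i in range(alpha + 1):
        non_mutated = 2 * (alpha - i)
        for m in range(non_mutated + 1):
            p_mut = comb(non_mutated, m) * mu ** m * (1 - mu) ** (non_mutated - m)
            mutated = 2 * i + m
            for j in range(alpha + 1):
                p_sample = comb(mutated, j) * comb(2 * alpha - mutated, alpha - j) / total
                P[i][j] += p_mut * p_sample
    return P


def step(state, P):
    """Advance a row state vector by one division: new_state = state * P."""
    n = len(state)
    return [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]
```

Starting from a state vector with all probability on zero mutated cells and applying `step` repeatedly gives the distribution of the number of mutated stem cells after successive divisions under either model. In the stochastic matrix, entries with j < i are non-zero, which reflects the loss of mutated lineages by drift.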

Given that SNP data are derived from the sequence of bulk samples that include many

To compare predicted and observed patterns of somatic mutation distribution, we calculated the mean squared error (MSE), which measures the amount of error in the model, as follows: [...]

These results demonstrate that the degree of intra-meristem mosaicism is higher in the structured than in the stochastic models. In addition to intra-meristem mosaicism, [...] -biased models, respectively) compared with the structured models (0.0125 ± 0.00255 and 0.0221 ± 0.00595 for the structured-unbiased and -biased models, respectively).
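For reference, the MSE comparison between predicted and observed mutation counts can be sketched in Python as follows. This is a minimal sketch that assumes the MSE takes its usual form, the average squared difference between predicted and observed counts over the compared categories (for example, branches or mutation patterns). The function name and the example numbers are hypothetical and are not taken from the poplar data.

```python
def mean_squared_error(predicted, observed):
    """Average squared difference between paired predicted and observed counts."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical counts, for illustration only.
example_predicted = [10.0, 12.0, 8.0]
example_observed = [11.0, 12.0, 6.0]
example_mse = mean_squared_error(example_predicted, example_observed)
```

A smaller value indicates that the predicted distribution of mutations is closer to the observed one, so the four models can be ranked by this quantity.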

Greater variation in the frequency of mutated stem cells across meristems highlights the higher degree of inter-meristem mosaicism in the stochastic models. We also found that the variance in the biased models was greater than that in the unbiased models, suggesting [...]

The most frequently observed mutation pattern was the singleton, in which the mutation was present only at a single branch (Fig. 5). All eight possible distribution patterns of singletons were present in all four models, which was consistent with the data from the poplar tree. Twin mutations, in which the mutation was present at two branches, were less frequent than singleton mutations (Fig. A3). Among the predicted twin mutations, [...] expanded to branches 2 and 4 but was lost in branch 3, was predicted only in the structured models (Fig. 5a, b). The observed twin mutations in the poplar tree showed the presence of topology-independent twin mutations, although the count was very small. Two types of twin mutations were present in the data but were absent in model predictions (Fig. 5d). [...] branching is expected to have a greater impact within trees with a more complex modular architecture, especially in species with a structured meristem.

We also found that the mutation distribution pattern does not necessarily reflect tree topology. Some mutations that did not follow the tree topology were predicted in the structured models and observed in the poplar tree (Fig. 5). Some previous empirical studies preferentially filtered out candidate mutations not following tree topology (Wang [...] branches 4 and 8 were not the main stem but were lateral branches. In this case, these patterns are predictable. In addition, the counts of mutations were much higher in the data from the poplar tree compared with the prediction (Fig. 5). This finding may be because, [...]

Table 1. Four models with different elongation and branching processes.

I. Structured-unbiased model
Stem cell lineages are maintained during the elongation processes.
In the branching process, replacement of stem cell lineages occurs randomly.

II. Structured-biased model
Stem cell lineages are maintained during the elongation processes.
In the branching process, replacement of stem cell lineages occurs in a biased manner.

[Figure: panels for the Stochastic-unbiased model, the Stochastic-biased model, and the data from the poplar tree; branches labelled Main axis and Lateral branch; mutation patterns labelled Singleton, Twin, Triplet, and Quartet.]
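The elongation and branching processes that distinguish the four models in Table 1 can be sketched for a single genomic site as follows. This is a minimal sketch under stated assumptions, not the authors' simulation code. All names (`mu`, `r`, `sigma`, and the functions) are hypothetical. The form of the sampling probability in Eq. (1) is not reproduced here, so the sketch assumes a Gaussian weight over angular cell positions in [−π, π) centred at 0, which is nearly uniform for `sigma = 10` and concentrated for `sigma = 0.5`.

```python
import math
import random


def elongate_structured(meristem, mu, rng):
    """Structured meristem: every lineage persists and may newly mutate."""
    return [cell or rng.random() < mu for cell in meristem]


def elongate_stochastic(meristem, mu, rng):
    """Stochastic meristem: each cell yields two daughters (each may mutate),
    then the same number of stem cells is drawn at random from the daughters."""
    daughters = [cell or rng.random() < mu for cell in meristem for _ in range(2)]
    return rng.sample(daughters, len(meristem))


def weighted_sample(weights, k, rng):
    """Draw k distinct indices, each draw proportional to the remaining weights."""
    indices = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(k):
        pick = rng.choices(range(len(indices)), weights=remaining, k=1)[0]
        chosen.append(indices.pop(pick))
        remaining.pop(pick)
    return chosen


def branch(meristem, mu, r, sigma, rng):
    """Branching: cells divide r times on a ring (daughters stay adjacent),
    then the stem cells of the axillary meristem are sampled with an assumed
    Gaussian weight over angular position."""
    cells = list(meristem)
    for _ in range(r):
        cells = [cell or rng.random() < mu for cell in cells for _ in range(2)]
    n = len(cells)
    angles = [-math.pi + 2 * math.pi * (k + 0.5) / n for k in range(n)]
    weights = [math.exp(-a * a / (2 * sigma * sigma)) for a in angles]
    return [cells[k] for k in weighted_sample(weights, len(meristem), rng)]
```

In this sketch a meristem is a list of booleans (`True` for a mutated stem cell). Combining `elongate_structured` or `elongate_stochastic` with `branch` at `sigma = 10` or `sigma = 0.5` gives the 2 × 2 combinations described in Table 1.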