Comparison of Tug-of-War Models Assuming Moran versus Branching Process Population Dynamics

Mutations arising during cancer evolution are typically categorized as either ‘drivers’ or ‘passengers’, depending on whether they increase the cell fitness. Recently, McFarland et al. introduced the Tug-of-War model for the joint effect of rare advantageous drivers and frequent but deleterious passengers. We examine this model under two common but distinct frameworks, the Moran model and the branching process. We show that frequently used statistics are similar between a version of the Moran model and the branching process conditioned on the final cell count, under different selection scenarios. We infer the selection coefficients for three breast cancer samples, resulting in good fits of the shape of their Site Frequency Spectra. All fitted values for the selective disadvantage of passenger mutations are nonzero, supporting the view that they exert deleterious selection during tumorigenesis that driver mutations must compensate.


Introduction
As demonstrated in the seminal paper [12], mutations in different cancers vary substantially in counts and patterns.These differences reflect distinct defects in DNA repair mechanisms, cancer exposures, and cell types.The authors also reported evidence for 'driver' mutations in about 120 genes, which contribute to tumorigenesis.However, the majority of somatic mutations are likely 'passengers', which do not have an effect on tumor progression.In the language of population genetics, driver mutations are selectively advantageous to cancers, while the passengers are at best neutral.
The concept of a model involving the joint effect of rare advantageous and frequent neutral or slightly deleterious mutations can be applied to describe evolution of cancer genes.To the best of our knowledge, such model was first introduced by McFarland and co-authors, in a series of publications [18,19,20], and named the Tug-of-War model to reflect the competition between driver and passenger mutations.
The original Tug-of-War model [18] assumed that the cell death rate increases with the number of cells in the population increasing, which creates a mechanism for limiting the eventual tumor size.
In other papers [16,17], a Moran model was used for the population process, which provides a strict bound on the number of proliferating cells (see relevant discussion in [16]).Another assumption of [18] was that driver mutations become instantly fixed in the population, which may be acceptable under very strong selection (for mathematical details, see Bobrowski et al. [2]), but in general it is not satisfied.
The literature includes many examples of comparisons of how mutation, drift and selection interact in different population dynamics frameworks such as branching process versus Moran model [3,4] or Wright-Fisher model with population of varying size [5,6].Two possible modes of selection in cell populations are "crowding out" in which a faster-growing clone makes the slower-growing one rare to the point of being negligible, or "competitive replacement", in which individual cells inhibit each other's replacement by a descendant.Supercritical branching process models lead to the former, while the Moran and Wright-Fisher models to the latter.A version interpolating between these two mechanisms is the well-cited Gerrish and Lenski model [11].We will return to these models in the Discussion.
In the present paper, we compare the Tug-of-War in the multitype Moran model with constant population size and a critical multitype branching process.The latter is conditional on non-extinction or other restrictions.We explore similarities and differences between the two types of selection in cell populations.This contributes to the ongoing discussion of which models are most appropriate for proliferating cell populations under drift, mutation, and selection.
We begin with mathematical definitions of the two versions of Tug-of-War process.Then we present simulation results, which demonstrate the differences between the long-term behavior of the two versions under different selection scenarios.Finally, we infer the selection coefficients for some breast cancer samples using the Moran framework, and cross-examine the fitted parameters with the branching process.
2 Models and Data 2.1 Moran process Tug-of-War models Model descriptions in this section, are essentially summaries of the descriptions in [16] and to avoid redundancy, we summarize only the most essential features.For more detailed descriptions, see [16].

Model A
In this model (Figure 1A), we contextualize the Tug-of-War within the Moran model framework with multiple allelic types.We examine a population with a constant size of N cells.Each cell i is described by a pair of integers γ i = (α i , β i ), where α i and β i represent the quantities of drivers and passengers in its genotype, respectively.The cell's fitness is then defined as where s > 0 is the selective advantage associated with a driver mutation, while d ∈ (0, 1) represents the selective disadvantage of a passenger mutation (selection coefficients, of driver and passenger mutations; see the Natural Selection chapter of the book by Durrett [9]).Under the time-continuous Markov Chain assumption, the time until the next death -replacement event is exponentially distributed with parameter Σ f = i∈{1,...,N } f i .The dying cell i is selected from a distribution biased by fitness, i.e., with probability mass function (pmf) {f i /Σ f , i = 1, . . ., N }.The cell j that replaces the dying cell is selected from the same distribution.
The time until the next mutation event is exponentially distributed with parameter Nµ, where µ is the mutation rate per cell.The cell selected to mutate is chosen uniformly among the N cells.Its state then changes from (α, β) to (α + 1, β) with probability p ∈ (0, 1), or to (α, β + 1) with probability 1 − p.In combination, the time to the next event is random and exponentially distributed with parameter called the total rate of death -replacement and mutation events.

Model B
Model B (Figure 1B) is defined similarly to Model A, with the time until the next death -replacement event being exponentially distributed with parameter Σ f .However, instead of being biased by fitness, the dying cell is chosen among all N cells uniformly.The time until the next event is likewise exponentially distributed with the same parameter in Equ.(1).Moran models: N , constant cell count in the process; i, cell dying and to be replaced; j, cell replacing cell i; Notation for branching process: N (t), cell count at time t; i, cell chosen for division; {g 0 , g 1 , g 2 }, progeny count distribution; X, progeny count of cell i; N (t + T ), cell count at time t + T after division event.

Model A versus Model B
As we noted in [16], the crucial difference between Model A and Model B lies in the expected fitness increment in the population after the death -replacement event.This is equal to the difference of f j − f i , where f i and f j are the fitnesses of the dead cell and the replacing cell, in the absence of mutations.The mutation-selection balance condition was derived in [2] in the form of When ps > (1 − p)d, drivers "dominate" over passengers, and the reverse occurs if ps < (1 − p)d.
The expected fitness is intact after the death -replacement process in Model A, hence the expected fitness trend follows the mutation-selection condition.In Model B, the outcome is more complex (see Results and Discussion).

Branching process model
For the branching process model (Figure 1C), we consider a population consisting of N (t) cells at time t.Similar to the Moran models, the fitness of a cell i of type (α i , β i ) with α i drivers and β i passengers is defined by Two possible event types can occur in the branching process: cell division and mutation.The time to the next cell division is exponentially distributed with rate Σ f = i∈{1,...,N (t)} f i , fitness sum of all N (t) cells.The dividing cell is chosen from the N (t) cells with probability weighted by fitness {f i /Σ f }, and its progeny count follows a given pmf {g 0 , g 1 , . . .}.If the progeny count is 0, the population loses the chosen cell.On the other hand, it survives if the progeny count is 1, and multiplies if the progeny count is more than 1.Additionally, the time to the next mutation event is exponentially distributed with rate µN (t).A uniformly chosen cell acquires a driver and changes its type to (α + 1, β) with probability p, or it acquires another passenger to become type (α, β + 1) with probability 1 − p.

Criticality and conditioning
One important difference between the Moran and branching process settings is that the total cell count at any time remains constant at N in the Moran models.The branching process model starts from the same cell count, i.e.N (0) = N , but it can change at random throughout time.For a direct comparison to Moran models, we may assume that E(N (t)) = N .This is satisfied if we require that the mean progeny count of any cell is 1, i.e. that the branching process is critical [1,15].
Even with criticality, at any time N (t) can fixate at 0 (in which case the process enters extinction) or increase to a larger count.The probability of extinction increases as either t increases ( [15], Section 3.3) or the cell fitness increases as a result of time scale change.We analyze the effects of two types of conditioning on the results of the branching process.The first type conditions the branching process on non-extinction, meaning N (t f ) > 0 at the final time t f .The second type imposes a more stringent condition on the branching process, requiring that for some small constant c.

Distribution of progeny cell counts
We also study the impact of the distribution of progeny cell count {g k , k = 0, 1, . . .} on the outcomes of the branching process.In this study, we impose g k = 0 for k > 2. Therefore, a chosen cell dies if k = 0, remains unchanged if k = 1, or divides into two progeny cells if k = 2.The criticality requirement, discussed in the previous section, is satisfied if g 0 = g 2 .Note that the branching process setting discussed in this paper is equivalent to a birth-death process where each cell i with fitness f i dies with rate g 0 f i or divides with rate g 2 f i .
A common progeny count distribution is such that g 0 = g 2 = 0.25 and g 1 = 0.5, which is equivalent to a binomial distribution with rates n = 2, p = 0.5.For a direct comparison in simulations to the Moran models, the fitness in the branching process is scaled up by 4. Hence, the wait time until the next division event of a cell with fitness f i is exponentially distributed with rate 4f i .This event has equal probability to be a cell death or a cell division, both at 0.25.Therefore, the model is equivalent to a birth-death process, where the birth and death rates are 0.25 × 4f i = f i .In comparison, in the Moran models, the death -replacement events also occur at rate f i , resulting in two cells being chosen to divide and die, respectively.Because the event rates are now similar, it is easier to directly compare the model behaviors under different selection scenarios.
We will also investigate the effect of changing the progeny cell count distribution {g 0 , g 1 , g 2 }.
First, we retain the criticality by assuming g 0 = g 2 , and analyze the branching process with different values for g 1 .We set g 0 = g 2 = 0.5 and g 1 = 0, which doubles the probabilities for cell division and death events.This is similar to increasing the birth and death rates in a birth-death process, hence we name this model "fast BP".We also consider g 0 = g 2 = 0.05 and g 1 = 0.9 ("slow BP"), which decreases the cell division and death probabilities.Second, we investigate the supercritical branching process by setting g 2 > g 0 .For each of these parameter sets, we will analyze the differences in sample statistics, compared to the binomial branching process and the Moran models.

Site Frequency Spectrum
As in other preceding paragraphs, in this section, we include a summary of the descriptions in [16].
One of the common summary statistics of the sequence data is the so-called Site Frequency Spectrum.In a sequencing experiment with n cells, we can estimate for each novel somatic mutation call the number of cells carrying that mutation.The number S n (k) of mutations present in k cells is put into a vector (S n (1), S n (2), . . ., S n (n − 1)) called the Site Frequency Spectrum, abbreviated to SFS.
Frequently number of cells that were sampled is not know, as for example in the bulk sequencing data.However, we can estimate the relative proportion of the mutant at each site, and so arrive at a frequency spectrum based on proportions with notation S(x) = S(k/n), where x is treated as a continuous variable, such that x ∈ (0, 1) (or x ∈ (0, 1/2) if we consider diploid genome).The SFSs S(k) and S(x) are idealized versions of the empirical variant allele frequency (VAF) graph.It is convenient for reasons explained in [16] (Section 2.4) to employ the cumulative tail of the SFS S(x) 2.4 DNA Sequencing of Cell Samples from Breast Cancer Specimens

DNA sample collection and processing
Most of the details of DNA sample collection and processing in this section, overlap with those in [16] and we refer to the description there.We only mention here several basic details.Tissue samples from primary breast tumor were collected at the Department of Applied Radiology of the Maria Sklodowska-Curie National Research Institute of Oncology, Krakow Branch in Poland.The set of tumor and normal control samples called specimen G2 is HER2+ breast cancer, while sets described as G32 and G41 are triple-negative breast cancer type and luminal A type, respectively.

Removal of FFPE artifacts
Fixation of tissues in formalin leads to deamination of cytosine to uracil, which can be recognized by sequencing as C>T or G>A type modifications [8].
A significant portion of the variants detected in our WES data are a possible artifact of sample fixation in formalin.This is indicated primarily by the statistics of the number of variants of a specific type, where C:G>T:A definitely dominates (Table 1 The reason for such a large number of this type of changes may be the duplication of variants related to deamination in PCR amplification (necessary in the case of WES, especially in the case of samples with a low amount of DNA).
Omitting all C:G>T:A variants would result in the loss of approximately 1/6 of the true variants.
However, information about the frequency of reads with a specific orientation can be used to identify variants associated with the method of sample fixation.For this purpose, the SOBDetector [7] program was used.The software is based on the fact that formalin fixation most likely affects only one of the DNA strands (the C:G pair becomes the T:G pair) and therefore the pairedend next-generation sequencing approach can help this additional filtering step.By counting not only the number of reads supporting alternative alleles, but also the relative orientation of the reads (Forward-Reverse:FR or Reverse-Forward:RF), these FFPE artifacts will likely have a strand orientation bias toward one of the directions, while true mutations should have approximately the same number of FR and RF reads.
This work uses data in which the expected FFPE artifacts have been filtered out by SOBDetector.

Behavior of Moran and branching process models in extreme cases
We investigate the similarities and differences between Moran and branching process models.One thousand simulations are performed for each model, and each simulation starts with N = 100 cells with no mutations at t 0 = 0 under different values for the selection coefficients s and d, mutation rate µ and probability p of driver mutations (or equivalently probability 1 − p of passenger mutations).
We then examine the statistics at final time t f = 100 as well as during the entire time line [t 0 , t f ].
Five versions of branching processes are studied.This includes the branching process with g 0 = g 2 = 0.25 and g 1 = 0.5 conditioned on non-extinction (yellow), and the same branching process conditioned on the final population N (t f ) restricted in [90, 110] (orange).These models are termed "binomial BP" in the comparisons, since the progeny cell count distribution is binomial in this case.We also include the branching process with g 0 = g 2 = 0.5, g 1 = 0 (purple, termed "fast BP") and g 0 = g 2 = 0.05, g 1 = 0.9 (cyan, termed "slow BP"), both conditioned on The fifth branching process model is conditioned on non-extinction with g 0 , g 1 and g 2 computed such that the population size is expected to double between [t 0 , t f ] under neutral evolution.It can be shown that g 0 and g 2 are required to satisfy We choose g 1 = 0.5, resulting in g 0 = 0.2465 and g 2 = 0.2535.This model is referred to as "supercritical BP" (gray) in the numerical comparisons.Finally, Moran A and Moran B are represented in dark blue and green, respectively.
Since we scale up the fitness by 4 in the branching process models to make them similar to the Moran models, in the results we scale the fitnesses down by 4, for more convenient comparisons.
The division count, defined to be the total number of cell division events observed in a simulation, increases linearly with the total cell count and cell fitness, i.e. the rate at which cells divide.Since the expected cell count in a critical BP is identical to the Moran models, the division count in BP is 4 times higher than in the Moran models due to the fitness being scaled up.Therefore, we also downscale the BP division count by 4 to directly compare between different models.

Neutral evolution case
Figure 3 presents the simulated results for s = d = 0, which implies that all mutations are neutral, µ = 0.1 and p = 1/11.Since all cells have the same fitness, the formulations for Moran A and Moran B are identical.This is reflected in identical distributions among all statistics among the two variations (see Appendix C, Table 2, dark blue for Moran A and green for Moran B).Moreover, the binomial BP conditioned on N (t f ) ∈ [90, 110] also has the same distributions of allele counts and singleton counts, albeit with slightly higher variances (at final time in Figure 3B-C and throughout history in Figure 3K-L).The higher variances originate from wider distributions of event counts in the BP compared to Moran, these latter stemming from the fact that total cell count in BP varies throughout time, as opposed to Moran where the total cell count remains constant (Figure 3D-G).
Note that even though the statistics have higher variances, their averages are similar to the Moran models both throughout time and at the final time (Appendix C, Table 2, orange).
The impact of relaxing the conditioning of BP can be observed by comparing binomial BP conditioned on N (t f ) ∈ [90, 110], binomial BP conditioned on non-extinction and supercritical binomial BP with g 0 = 0.2465, g 1 = 0.5 and g 2 = 0.2535, conditioned on non-extinction.As the population size can vary widely throughout time if conditioned only on non-extinction (Figure 3G), every statistics in the comparison has much higher variances (Appendix C, Table 2, yellow for critical BP and gray for supercritical BP).However, for critical BP, the averages remain faithful to both Moran models and the more stringently conditioned BP.Only in the case of supercritical BP do all statistics differ, even cumulative mutation count (Figure 3H) and cumulative division/replacement count (Figure 3J), which is strictly associated with rapidly growing population size.
We then evaluate the impact of changing the progeny cell count distributions, while retaining criticality.The fast BP has increased g 0 and g 2 and therefore is equal to a birth-death process with higher rates.This leads to higher variances in the population size throughout time compared to the binomial BP, even if similarly conditioned (Figure 3G, Appendix C: Table 2, purple).
Importantly, the fast BP also results in both less alleles and lower percentage of singletons within all alleles (Figure 3B-C, K-L).Conversely, the slow BP is equivalent to a birth-death process with lower rates, whose population size therefore varies less than the binomial BP (Figure 3G).
Both its allele count and percentage of singleton count is much higher than in the binomial BP    2) is satisfied, therefore the average fitness remains constant for Moran A model (Figure 4I).Moreover, the fitness in binomial BP also remains unchanged on average over time.As a result, the distributions for all statistics are similar between the balanced evolution setting and the neutral evolution setting (Appendix B , Table 3), discussed in the last section, for Moran A and binomial BP.The outcomes following modulation of the progeny cell count distribution or relaxing the conditioning for BP, likewise remain unchanged compared to the neutral evolution setting.

Driver domination case
To understand the consequences when the driver mutations are strongly selectively advantageous, we set s = 0.25, d = 0, µ = 0.1 and p = 1/10 and compare the statistics in Figure 5 and in Appendix B , Table 4.
Because condition (2) is no longer satisfied, the fitness coefficients and mutation rates are not at equilibrium and fitness in Moran A increases over time (Figure 5I), leading to an increase in its replacement count (Figure 5J) as compared to the neutral or balanced evolution setting.As a result, the counts of both alleles and singletons are slightly lower in the selective evolution as compared to previous settings (Figure 5B-C).The same is also true for all remaining models.Remarkably, binomial BP still behaves identically to Moran A, differing only in population size over time (Figure 5G).As before, relaxing the conditioning on binomial BP leads only to higher variances of the statistics without changing their averages.Like in previous examples, supercritical BP has much higher averages and variances in all statistics than other models, both at the end of simulation as well as throughout time.Similarly as in the previous case, changing the progeny cell count distribution in BP while retaining criticality results in allele count and singleton count converging to different values (Figure 5B-C, K-L).
As in the balanced evolution setting, the only model differing in fitness from the remaining critical processes is Moran B (Figure 5I), this time resulting in twice as many replacements compared to other models (Figure 5F, J).In later moments of the simulation, the number of replacements is even higher than in the critical BP.Consequently, in Moran B, the counts of alleles and singletons decrease at a fast rate after reaching maximum values (Figure 5B-C, K-L).

Passenger domination case
Finally, we investigate the setting where passenger mutations are strongly deleterious, with parameters s = 0, d = 0.5, µ d = 0.1 and p = 1/10 corresponding to Figure 6.
In Moran A, as cells accumulate increasingly more mutations, their fitness decrease to 0 because of the passenger mutations' deleterious coefficient (Figure 6I), therefore they stop dividing (Figure 6J).However, the mutation process depends only on the population size (Figure 6G) and therefore occurs at a constant rate throughout time (Figure 6H).The consequence is that every cell sooner or later would acquire a unique mutation, therefore the cell population almost consists only of singletons (Figure 6B-C).
As is the case for other settings, binomial BP matches the average statistics from Moran A throughout history and at the final time (Appendix B , Table 5).The same is true for relaxing the conditioning on binomial BP, which only increases the variances.However, unlike the selective evolution setting, altering the progeny cell count distribution does not change the steady state values for the allele and singleton counts, as both converge to the same distributions as binomial BP and Moran A (Figure 6B-C, K-L).Nonetheless, compared to these models, the fast BP converges faster and the slow BP takes more time to converge.
The deleterious evolution setting is the only scenario in which supercritical BP behaves similarly to most of the other models.While fitness tends to zero, cells stop dividing an the impact of the supercriticality is no longer significant.
Finally, as usual, Moran B has a much higher steady state fitness compared to other models (Figure 6I).Therefore, although the replacement count is lower than in the balanced or selective evolution settings, cells do not stop dividing as is the case with Moran A, because the fitness does not converge to 0 (Figure 6E, I).This results in much lower allele count and singleton count (Figure 6A-B, J-K).Despite accumulating passenger mutation, after the initial period of dropping the average fitness stays at nonzero value, due to the drift process favoring fixation of clones with higher fitness.

Fitting breast cancer SFS
We use the Moran A model to fit the mutational SFS from 3 samples of breast cancer.We fix population size N = 100 cells, average time between mutation events L = N µ = 6, probability of driver mutations p = 0.01, final time t f = 100.We then vary the values for s and d, simulate the SFS from 1,000 simulations and compute the average SFS.For a given sample, we compute the cumulative tail of the SFS S(f i ) i.e., the proportion of mutations occurring at frequencies > f i , and similarly the average {S(f i |s, d)} for every combination of (s, d).The reverse cumulative SFS is evaluated for mutations with frequency f > 0.05.The error for (s, d) is defined as where I is the largest index such that S(f i ) and S(f i |s, d) are both positive.The range of best d parameter values does not vary among cases, with the G2 sample (Figure 7, panel A) having greater tolerance for changes in this parameter value.The impact of the s parameter on the SFS tail shape is significant: the range of the best fits varies between cases.The value of s is small for sample G2 (Figure 7, panel A) and slightly bigger for G32 (Figure 8, panel A).In the case of sample G41, the best fit was obtained using a relatively high s parameter value (Figure 9, panel A), indicating strong selection.
We compare the SFS from the 100 best (s, d) parameters against the breast cancer data-based SFS (Figures 7-9, panel B).The SFS from each sample is well fitted with Moran A, not only with the optimal parameter set but also with other (s, d) combinations.The fit is particularly good for the region of SFS with low frequency (f < 0.2).There are exponentially fewer mutations occurring at larger f , resulting in relatively higher discrepancy in the long tail of the SFS from Moran A.
However, the overall shape of the observed SFS can be fitted well by the Moran A model.
We also compare the SFS from the binomial BP (g 0 = g 2 = 0.25, N (t f ) ∈ [90, 110]), using the 100 best (s, d) parameters from Moran A inference (Figures 7-9, panel C).Similarly to Moran A, the SFS from BP can fit the shape of the observed SFS tail well.This is consistent with our finding from the previous section that the binomial BP with tight conditioning on the final population size behaves similarly to the Moran A model, resulting in similar statistics under a range of selection scenarios.Each SFS from Moran A or binomial BP is averaged from 1,000 simulations.

Discussion and Conclusion
As it has been expected, Moran model A behaves comparably to the binomial BP conditioned on final population size being close to the initial count.This manifests in similar statistics under different extremes of selection scenarios (Section 3.1), including values that are observable in sequencing samples.Crucially, SFS fitting results for experimental breast cancer data (Section 3.2) are similar between the two models.This finding might hold mathematical importance, and requires further investigation.Moreover, interestingly, the similarity between Moran A and binomial BP becomes more pronounced as the latter is conditioned more tightly to resemble the constant population size expected in Moran models.When conditioned only on non-extinction, the population size of each BP realization may deviate significantly (Figures 3-6G).This leads to higher variances in allele and singleton counts (Figures 3-6B-C, K-L) as well as mutation and division/replacement counts (Figures 3-5D-F, H-N), although the means of these statistics remain similar (Appendix B, Tables 2-5).However, the high population size variance also results in different SFS in non-extinction BP compared to tightly conditioned BP and Moran model A (Figures 3-6A).
Our Moran model B, similar though not identical to the model introduced in [2], exhibits a phenomenon known as the drift barrier, which prevents the deleterious passenger mutations from dominating temporal trends in fitness, even under mutation-selection balance condition tipped in their favor (sp < dq).Indeed, under this condition, due to drift, the fitness may decrease or increase depending on how much smaller sp is than dq.In addition, fitness generally increases at mutation-selection equilibrium (sp = dq).On the contrary, trends in Moran model A and binomial BP model follow the mutation-selection condition.
The effect of increasing fitness in model B was already predicted in [2] and described in [16].
While in Moran A fitness stays constant (as expected), in the case of Moran B, clones with higher fitness are favored, even for the same initial conditions and in absence of new mutations (see section 3.1.1 in [16]).This behavior results from the difference between Moran A and Moran B in the expected change in population fitness after a death-replacement event.As shown in Eqs. ( 4) and ( 5) in [16], the expected fitness change is 0 in Moran A and is ≥ 0 in Moran B. In general, fitness in Moran A depends only on the balance between drivers and passengers, while the trends in Moran B are more complex, as explained mathematically and confirmed by simulations in [2].The drift and selection pattern in Moran B biases it toward increasing fitness.
Among the BP variations, the supercritical model behaves differently in all cases compared to other models, due to the difference in population size growth rate (Figures 3-5G).The only exception is the deleterious evolution setting, in which the impact of supercriticality is less prominent, since the fitness being close to zero means cells stop dividing.In this scenario all statistics are much more similar to those obtained from other models (Appendix C, Table 5).
Our experiments with different progeny cell count distributions in BP show that the fast BP always has higher variances in the population size throughout time compared to the binomial BP, even if similarly conditioned (Figures 3-6G).The fast BP also results in both less alleles and lower percentage of singletons within all alleles (Figures 3-5B-C, K-L).This is not observed in case of deleterious evolution (Figure 6B-C, K-L), where there is only a difference in the rate of reaching the steady state, which is the same as for other models.On the other hand, the averages of mutation and division/replacement event counts (Figures 3-6D-F and Appendix C, Table 2-5) do not differ from averages for Moran model A and binomial BP.Reversely, the population size in slow BP varies less, and the allele count and percentage of singleton count is much higher, compared to the binomial BP.
There are features shared between all Moran models and BP variations across different selection scenarios.The more division/replacement events occur during a simulation, the less alleles and singletons we observe both at the sampling time point as well as throughout tumor history.This is especially pronounced in case of deleterious evolution, where in Moran B continuously dividing population under selective pressure prevents the accumulation of singletons (Figure 6B-C, K-L).Conversely, if the events occurring during a simulation are dominantly mutations, then the population consists of more alleles and singletons.In conclusion, across models, higher selection is associated with less alleles and singletons, higher pace of allele death, and cumulative SFS with fat tail.
It seems relevant to note that the frequently cited reference by Gerrish and Lenski [11] introduces a model of competition in populations of constant size in an asexual population.From the reading of this corner-stone paper, it seems that it uses results from supercritical branching processes and then just scales them intuitively into the constant population size framework.This method seems not mathematically rigorous.Our comparison of Moran and branching process models identifies subtle but important differences between the two approaches.Overall, we have shown that the critical binomial BP and the Moran A model behave similarly in the Tug-of-War setting under distinct selection scenarios.This finding is relevant for improving simulating efficiency and optimizing model inference.Branching process and Moran model remain the two main stochastic modeling approaches in population genetics, where they provide the theoretical framework to uncover a tumor's history from sequencing snapshots.However, BP simulation is considerably more time-consuming, as the cell population size can change arbitrarily due to random fluctuations.This problem is exacerbated in critical or near-critical BP, which is applicable for modeling many cancers.In this setting, the BP often has high probability of extinction, hence the high fraction of simulations that have to be discarded makes model inference computationally costly.In such cases, it would be more efficient to employ Moran A, which we have shown to provide comparable sample statistics and which is easier to implement.However, more work is needed to establish the theoretical equivalence between the Moran A model and the critical BP, and if this compatibility breaks down under certain conditions.
Both the Moran A model and the binomial BP can fit the SFS tail in our breast cancer samples well.However, the inference is complicated by a wide range of selection coefficients that result in equally comparable SFS to the data.These coefficients exhibit a trade-off between driver and passenger mutations, as the same SFS can result from driver mutations being more advantageous if the passenger mutations are also more deleterious, and vice versa.Therefore, the mutational SFS alone is not adequate to differentiate between these different selection settings.Separately, we found that the inference for all of our samples requires d > 0, confirming the observations from McFarland et al. [19] that passenger mutations exhibit a deleterious effect during tumor progression.

Figure 1 :
Figure 1: Graphical depiction of cell death and division events in (A) Moran A model, (B) Moran B model, and (C) branching process.Notation for all models: t, current time; T , time to next event; f (α, β), fitness of cell with α drivers and β passengers; Σ f = k f (α k , β k ).Notation for

Figure
Figure3Ndetails the rates at which alleles are lost from the population, divided into two

Figure 4 :
Figure 4: Comparisons between Moran and branching process (BP) models in the "balanced" setting.(A) Average cumulative tail of the mutational Site Frequency Spectra.(B) Distributions of allele counts at t f .(C) Distributions of singleton counts at t f .(D-F) Distributions of counts of driver mutations (D), passenger mutations (E) and divisions (F) within [t 0 , t f ]. (G-N) Trajectories of the averages over time of population sizes (+/-std) (G), cumulative mutation counts (H), fitness (I), cumulative division/replacement counts (J), allele counts (K), percentage of singletons among all alleles (L), allele birth counts (M) and allele death counts (N).Allele death counts (lines) are categorized into mutation events (circles) and division/replacement events (diamonds).Dark blue = Moran

Figure 6 :
Figure 6: Comparisons between Moran and branching process (BP) models in the "deleterious" setting.(A) Average cumulative tail of the mutational Site Frequency Spectra.(B) Distributions of allele counts at t f .(C) Distributions of singleton counts at t f .(D-F) Distributions of counts of driver mutations (D), passenger mutations (E) and divisions (F) within [t 0 , t f ]. (G-N) Trajectories of the averages over time of population sizes (+/-std) (G), cumulative mutation counts (H), fitness (I), cumulative division/replacement counts (J), allele counts (K), percentage of singletons among all alleles (L), allele birth counts (M) and allele death counts (N).Allele death counts (lines) are categorized into mutation events (circles) and division/replacement events (diamonds).Dark blue = Moran

Figures 7 -
Figures 7-9 present the fitting results for the SFS from the breast SFS data.The (s, d) combinations with low error exhibit a trade-off between driver and passenger mutations: the observed SFS can be simulated by Moran A with either low values for s and d, or high values for both (panel A in each figure).As a result, the 100 best (s, d) parameters (marked as squares) can be approximated by linear regression.

Figure 7 :
Figure 7: Results from fitting the SFS from sample G2. (A) Heatmap of error between data SFS tail and average SFS tail for distinct (s, d) combinations in Moran A. Dark blue squares: 100 best (s, d) combinations, with linear regression.Dark red square: best (s, d) combination.(B) Comparison between sample SFS (black line), average SFS under Moran A from 100 best (s, d) combinations (dark blue dots) and the best (s, d) combination (dark red dots).(C) Comparison between sample SFS (black line), average SFS under "binomial BP" (g 0 = g 2 = 0.25, N (t f ) ∈ [90, 110]) from 100 best (s, d) combinations (red dots) and the best (s, d) combination (dark red dots).Each SFS from Moran A or binomial BP is averaged from 1,000 simulations.

Figure 8 :
Figure 8: Results from fitting the SFS from sample G32.(A) Heatmap of error between data SFS and average SFS for distinct (s, d) combinations in Moran A. Dark blue squares: 100 best (s, d) combinations, with linear regression.Dark red square: best (s, d) combination.(B) Comparison between sample SFS (black line), average SFS under Moran A from 100 best (s, d) combinations (dark blue dots) and the best (s, d) combination (dark red dots).(C) Comparison between sample SFS (black line), average SFS under "binomial BP" (g 0 = g 2 = 0.25, N (t f ) ∈ [90, 110]) from 100 best (s, d) combinations (red dots) and the best (s, d) combination (dark red dots).Each SFS from Moran A or binomial BP is averaged from 1,000 simulations.

Figure 9 :
Figure 9: Results from fitting the SFS from sample G41.(A) Heatmap of error between data SFS and average SFS for distinct (s, d) combinations in Moran A. Dark blue squares: 100 best (s, d) combinations, with linear regression.Dark red square: best (s, d) combination.(B) Comparison between sample SFS (black line), average SFS under Moran A from 100 best (s, d) combinations (dark blue dots) and the best (s, d) combination (dark red dots).(C) Comparison between sample SFS (black line), average SFS under "binomial BP" (g 0 = g 2 = 0.25, N (t f ) ∈ [90, 110]) from 100 best (s, d) combinations (red dots) and the best (s, d) combination (dark red dots).

Table 1 :
Statistics of the number of variants of a specific type.
). Patient ID C:G>A:T C:G>G:C C:G>T:A T:A>A:T T:A>C:G T:A>G:C sum