Survival of the densest accounts for the expansion of mitochondrial mutations in ageing

The expansion of deleted mitochondrial DNA (mtDNA) molecules has been linked to ageing, particularly in skeletal muscle fibres; its mechanism has remained unclear for three decades. Previous accounts assigned a replicative advantage to the deletions, but there is evidence that cells can, instead, selectively remove defective mtDNA. We present a spatial model that, without a replicative advantage, but instead through a combination of enhanced density for mutants and noise, produces a wave of expanding mutations with wave speed consistent with experimental data, unlike a standard model based on replicative advantage. We provide a formula that predicts that the wave speed drops with copy number, in agreement with experimental data. Crucially, our model yields travelling waves of mutants even if mutants are preferentially eliminated. Justified by this exemplar of how noise, density and spatial structure affect muscle ageing, we introduce the mechanism of stochastic survival of the densest, an alternative to replicative advantage, that may underpin other phenomena, like the evolution of altruism.


Introduction
Mitochondria are central in health and disease [1].Mutations in the mitochondrial DNA (mtDNA) can impair organellar functions [2], with negative impact on cellular physiology.These mutations can be inherited or acquired over time [3].The accumulation of DNA mutations to high levels has been repeatedly linked to ageing [4,5,6], especially in postmitotic tissues such as neurons or muscles [7].In skeletal muscle, when the proportion of mutants in a zone of the fibre exceeds a threshold value [8,9], an oxidative phosphorylation defect is triggered, that is often revealed through deficiency in cytochrome c oxidase activity.Skeletal muscle fibre damage occurring with age leads to a reduction in muscle mass and strength through atrophy and fibre breakage, termed sarcopenia [10].This is one of the first manifestations of old age and has a knock-on effect on general health.Sarcopenia in mammals has been connected to high levels of mitochondrial deletions in skeletal muscle fibres [11,12,13,14].
The high levels of mitochondrial deletions observed in ages skeletal muscle are attributed less to a high mutation rate than to the clonal expansion of these mutations.Indeed, clonality is the defining feature of this expansion.Single-cell studies have consistently reported that fibres in damaged zones of the muscles are taken over by a single deletion.This has been observed in rats [15,13], rhesus monkeys [16,17] and humans [18].The fact that different affected fibres generally have different deletions suggests that founder mutations occur during the lifetime of an individual and then expand clonally [18].The mechanism behind this clonal expansion remains to be established, and has eluded scientists for more than three decades [19,20,21,22,23,24].
In evolutionary theory, if one species (here the mtDNA deletions) invades a system and outcompetes another species (the wildtype mtDNA), it is often concluded that the former has a replicative advantage.This has been the approach followed by numerous authors [24,25,26,27,28], who have made extensive use of mathematical population genetics modelling in order to compare quantitative predictions with experimental data.However, there is no accepted mechanism to justify a replicative advantage for mtDNA deletions that can explain clonal expansion.It was initially proposed that deletions, being smaller, replicate in a shorter time, leading to a replicative advantage [19,29].This was observed in some specific cases, namely for fast-proliferating cells [30] and cells recovering their mtDNA populations after heavy depletion [31].However, through numerical simulations it has been shown that the size-induced advantage cannot explain the observed accumulation in non-replicating muscle fibres, especially in short-lived animals [24].
Other mechanisms have been proposed, termed the vicious cycle [20], survival of the slowest [21], survival of the sick [32] and crippled mitochondria [33], but they were later found implausible and/or incapable of accounting quantitatively for experimental data (see Ref. [34] for a review).Other studies have modelled the phenomenon through a neutral stochastic model, i.e. without assigning a replicative advantage to mutants [22,35,23].In this approach, it is argued that random drift alone is sufficient to explain the clonal expansion.However, a more recent study [25] has shown that, for models like this, neutral random drift can account for clonal expansion only for very long-lived species (see penultimate section of Results).
In this work, we develop and test a mathematically transparent, physically-motivated stochastic model of the expansion of mtDNA deletions, that takes into account the spatial structure of muscle fibres as multinucleate cells, with many nuclei distributed along their length, each one controlling a surrounding population of mitochondria that can diffuse along the fibre (Fig. 1A).All previous models have treated muscle fibres as a single well-mixed region, containing an isolated and unstructured population of mitochondria.
Our work establishes that mitochondrial deletions can expand in muscle fibres without a replicative advantage and, surprisingly, even if mutants are preferentially eliminated from the system.Despite being subject to higher rates of mitophagy -something that finds experimental support [36,37,38,39,40,41] -mutants can come to dominate zones of the fibres.We also predict that the expansion of deletions takes place through a travelling wave of mutants propagating in the muscle fibres.We are able to recapitulate the spatial features of the distribution of the mutant load along fibres and align our quantitative predictions with available experimental data, using a model with only 4 parameters (with each parameter being constrained by auxiliary experimental data).
Our model structure derives from the generalized Lotka-Volterra model [42], although our conclusions are robust to other model choices widely used in mitochondrial genetics.We provide a mathematical expression for the mutant propagation wave speed, finding that it drops as the density of mtDNA in the fibre increases; we corroborate this prediction on further experimental data.In contrast to the model we present, we show that a standard model based on a replicative advantage of mutants produces an implausibly fast clonal expansion, incompatible with experimental data.Finally, a further quantity of interest is the rate of occurrence of founder mutations.We argue that previous neutral models, neglecting the spatial structure of the system, have greatly overestimated this rate.
To the best of our knowledge, the clonal expansion of mitochondrial deletions in skeletal muscle fibres is the first phenomenon with a compelling description as stochastic survival of the densest: a species that outcompetes another due to a combination of increased density, spatial structure and stochasticity, that can counterbalance a higher death rate.We claim that this mechanism might be the driving force behind other counterintuitive evolutionary phenomena, such as the spread of altruistic behaviour, that has analogies to the expansion of deletions.Stochastic survival of the densest is separate from frequency and density-dependent selection (e.g. the Allee effect [43]), and spatiallystructured game theoretic models (see SI.6 for further discussion).

Mean heteroplasmy increases in a spatial stochastic model with preferential elimination of mutants
In our full model, skeletal muscle fibres are schematised as a chain of regions, with each region representing a system of a nucleus surrounded by a population of mtDNA molecules, subdivided into wildtype and mutants.Individual molecules of mtDNA can be degraded, replicate and diffuse along the muscle fibre.We begin by analysing a single region.We will then explore the consequences of allowing two regions to be coupled, allowing mtDNAs to hop between adjacent regions.Extended chains of such regions will be considered in the following section.Importantly, this paper focuses on the dynamics of a single pre-existing mutation that reaches high levels through clonal expansion.We thus do not consider de novo mutations occurring continuously through time.
For one region, we use the stochastic model first introduced in Ref. [44] and further analysed in Ref. [45], that is close to the established model [22].We denote the wildtype and mutant copy number respectively by w and m.Heteroplasmy is the mutant proportion, namely h = m/(m + w).The per-capita degradation rate µ is common and constant; the per-capita replication rate λ(w, m) is also common to both species, but state-dependent, and is λ(w, m) = µ + c(N ss − w − δm), with parameters c, N ss and δ interpreted below.We remark that these birth and degradation rates are the same for mutants and wildtypes: no explicit selective advantage is assumed and the model can be called neutral.The full model is formalised via a reaction network (Fig. 1B) and Eq.(S11)) as a Poisson point process.
The parameter δ is one of the key elements of the model.Setting δ = 1 allows for a differential contribution to the replication rate between mtDNA species [45].In SI.1.1 we show that δ = 1 is equivalent to having a difference in carrying capacity or density between the two species and that 0 < δ < 1, corresponds to mutants being the densest species.In the following it is always 0 < δ < 1.
After a transient, the system hits a state for which λ = µ and then starts fluctuating around the deterministic steady state line, of equation w + δm = N ss in the w − m plane, as depicted in Fig. 1B.Hence the parameter N ss can be interpreted as the target toward which the effective population w+δm is steered.Typical values of N ss in healthy human tissues are 10 3 − 10 4 [46].The parameter c > 0 is a control strength (see SI.1 for interpretation).
As a number of authors have observed [22,23,35,45,47], this model exhibits an increase in mean mutant copy number, in its neutral version defined in Eq. ( 1) and even in the presence of preferential elimination of mutants in the form of a higher degradation rate.Intuitively, while fluctuating around the steady state line the system on average drifts toward regions of higher m (Fig. 1B).We refer to SI.1 for a mathematical treatment and an intuitive explanation of this effect that, as we show below and in the next section, lies at the core of our explanation of muscle ageing.While an increase in mean mutant copy number is known, here we investigate another property: an increase in mean heteroplasmy h in a spatially structured system, whose regions are coupled together and exchange mtDNA molecules.We first couple together two regions (Fig. 1C), with each regions population evolving according to the dynamics described above and with the addition of exchange of mtDNA molecules at a constant rate per molecule γ, an additional parameter whose value we derive from experimental studies (SI.8).In this setting, in addition to the increase in mean mutant copy number we also observe an increase in mean heteroplasmy h i in each region, in a neutral model (Fig. 1D, red line), as well as in the presence of enhanced clearing of mutants (Fig. 1D, green line).In the single region case, mean heteroplasmy is, by contrast, constant and always decreases when mutants are preferentially degraded (Fig. S1B).
Two key elements are needed for a spatially-extended system to exhibit the increase in mean heteroplasmy.First, noise: the effect is not observed in the deterministic version of our model (Fig. 1D, black line), that we define in SI.3.Second, mutants must be the densest species, i.e. 0 < δ < 1.Therefore, we have termed this effect stochastic survival of the densest (SSD).A heuristic explanation is that the mean copy number of the densest species is driven up by stochastic fluctuations over time.Then, when we couple regions together allowing molecules of mtDNA to hop between adjacent regions, the local increase in mutant copy number causes a net flow of mutants into neighbouring regions (Fig. 1C).This, in turn, leads to an increase in the mean proportion of mutants with time.

Stochastic survival of the densest is favoured as a model for muscle fibre ageing
In this section, we ask whether SSD, the effects highlighted in the previous section, can reproduce experimental data on the expansion of deletions in skeletal muscle fibres.We also compare SSD to a standard model based on giving mutants a replicative advantage over wildtypes.Coupling together hundreds of regions along a line we obtain a physical model of a skeletal muscle fibre.
Fig. 2A shows data obtained from human vastus lateralis samples [48]: a high-heteroplasmy zone is present, flanked by transition zones to low or zero heteroplasmy.Previous mathematical models of mtDNA deletion expansion have not addressed this spatial structure.A well-known feature of the clonal expansion of deletions is the high absolute copy number in zones of the muscle fibres taken over by mutants.This is clearly visible in Fig. 2B, which reports data on mtDNA copy number for the same muscle fibre [48].In the zone where only mutants are present the absolute copy number is approximately 5 times higher than in the wildtype-only zone, meaning a fivefold increase in carrying capacity for mutants.A fivefold increase is found also in Ref. [49], for a patient with ophthalmoplegia associated with an mtDNA deletion in muscle.
The functional form of the heteroplasmy profile in Fig. 2A (red fitted line) is convincingly the form expected for a travelling wave (see SI.12), namely a sigmoid 1/(1 + e τ x ) with x being the position along the muscle fibre centred at the point where h = 1 and τ = (2.464• 10 −2 ± 2 • 10 −5 )µm (maximum likelihood estimation, MLE) suggesting that the expansion is a wave-like phenomenon.We have estimated the speed of this wave analysing data from from Ref. [50] on the length of abnormal zones in rhesus monkeys and age of the subject.Regressing the lengths against age (Fig. 2C), we observe a relationship (p = 5 • 10 −4 ) which is approximately linear (R 2 = 0.76) and corresponds to an average wave speed of (0.131±0.025)µm/day.In SI.13 we further discuss the analysis of these data and explain why the true wave speed is likely to be marginally larger.
We consider two ways of modelling mutant expansion and reproducing the fivefold increase in mutant carrying capacity, one corresponding to a mutant mtDNA with a higher replication ratereplicative advantage (RA) model -and the other to our SSD model.A standard RA model reproduces Figure 2: Stochastic survival of the densest predicts a wave-like expansion of mtDNA mutants at a speed in agreement with experimental observations, while a standard replicative advantage model predicts a speed a factor of ≈ 300 too large.A) Spatial profile of mutant fraction (heteroplasmy) along a human skeletal muscle fibre [48].The heteroplasmy profile follows a sigmoid, the shape expected for a travelling wave.(B) Spatial structure of copy number for wildtype (blue) and mutants (orange) for the same muscle fibre as in panel A. The heteroplasmic zones present a higher absolute copy number, i.e. mutants are present at a higher density.(C) Experimental data on the length of abnormal zones of muscle fibres against age in rhesus monkeys, from Ref. [50].An approximate linear relationship is found (R 2 = 0.76, p = 5 • 10 −4 ), compatible with a wave-like expansion with speed (0.131 ± 0.025)µm/day (linear fit).(D) Stochastic simulations of a spatially extended model with a replicative advantage for mutants, with our best estimate of the model parameters (see SI.8) for muscle fibres of rhesus monkeys predicts a wave-like expansion with a speed of 40µm/day, 300 times faster than the observed speed.(E) Simulations of survival of the densest, with same death and replication rate for wildtypes and mutants, yields a mutant wave speed of 0.2µm/day for the fibres or rhesus monkeys, which is comparable with experimental observations (see (C)).(F) Survival of the densest predicts a wave of mutants even when mutants have a degradation rate higher than wildtypes and the same replication rate.Parameters are different from (E) and listed in SI.8. the fivefold increase in carrying capacity through a larger replication rate for mutants.In this model, the wildtype replication rate is Eq. ( 1) with δ = 1, and the mutant replication rate is the same but with N ss replaced by 5N ss (see Eq. (S45)).The death rate µ will be constant and common to the two species.It has been known since the work of Fisher and Kolmogorov [51,52] that an advantageous mutation arising in a spatially extended system can come to dominate it, sweeping it in a wave-like fashion, with a high-heteroplasmic front advancing at constant speed (FK waves henceforth).We have performed stochastic simulations of this RA model, with values of the other parameters estimated from experimental data (SI.8) and initialising the mutant content of the system as seen in Fig. 2A.The simulations show a wave-like expansion with a speed of 40µm/day (Fig. 2D), 300 times faster than the observed 0.131µm/day.We interpret this as strong evidence against the hypothesis of a simple replicative advantage being the cause of the expansion of deletions in skeletal muscle fibres.Therefore, a standard replicative advantage model is not consistent with the observed spatiotemporal dynamics of the expansion of mtDNA deletions in muscle fibres.
In our approach, SSD, we reproduce the increase in mutant carrying capacity, extending our neutral model from the previous section to a chain of regions, using δ = 1/5.In Fig. 2E we show the results of the simulations for our model, with all parameter values specified by the literature (SI.8).The predicted speed is 0.2µm/day, an improvement of two orders of magnitude over the model based on replicative advantage.It is exclusively the stochastic version of the model that exhibits this wave-like expansion; in the deterministic version, the high-heteroplasmy front only diffuses away (Fig. S3A).Moreover, stochastic survival of the densest reproduces the expansion of mutants even if they are preferentially degraded (see Fig. 2F), whereas in a deterministic model a higher degradation rate for mutants leads to their extinction (Fig. S3B).The key ingredients for a spatially-extended system to show SSD are thus stochasticity and differences in carrying capacity, or density, between the two species.
The RA model (FK waves) predicts that wave speed increases when N ss increases [53].SSD predicts, on the contrary, that a smaller N ss yields a faster wave.Intuitively, this is the case because the expansion of mutants is driven by stochastic fluctuations, whose effect generally becomes smaller in larger systems (see also Eq. (S14)).Simulations reported in SI.5 for the neutral model reveal how wave speed varies as a function of the parameters of our model and we find they can be approximated by the phenomenological equation v 2 ss .Eq. ( 2) is also the wave speed of a FK wave but where k is the net replicative advantage k RA .In our case k SSD is interpretable as the effective advantage induced by SSD.

Stochastic survival of the densest is consistent with wider data and the effect is robust to model and parameter variation
The leading edge of a faster wave is flatter than that of a slower wave (e.g.Ref. [54], summarised in SI.12).By exploiting this property, it is possible to compare the speeds of two waves by examining their shapes.Data on muscle fibres for humans [48] (Fig. 3A, B) and rats [13] (Fig. 3C, D) show that flatter waves of mutants propagate along fibres with a smaller N ss .This is in agreement with the prediction of our model and against the predictions for FK waves.
Skeletal muscle fibres can be broadly classified into Type 1 (oxidative) and Type 2 (glycolytic) fibres [55].The former rely on OXPHOS to function and have twice as many mitochondria as the the latter [56,57,50], that depend on glycolysis.It is known that Type 2 fibres are more affected by sarcopenia with ageing [12,58,59,60,61,50].While there are other physiological differences between the fibre types, this is in line with the predictions of our model that a smaller copy number per nucleus produces a faster expansion of mutants.Small, short-lived animals like rodents show sarcopenia on a time-scale of years (from 2 years) and have a larger proportion of Type 2 fibres compared to longlived animals such as rhesus monkeys and humans [62], that exhibit sarcopenia on a longer time-scale (decades).
Studies involving changes to mitochondrial copy number are consistent with an inverse relationship between copy number per nucleus and wave speed.It has been found that copy number depletion caused by antiretroviral therapy [63] or AKT2 deficiency [64] is associated with enhanced sarcopenia.Likewise, statins are well known for increasing the risk of sarcopenia [65,66,67] and have consistently been associated with reduction in mitochondrial copy number [68,69,70].Conversely, increase in mtDNA content through exercise [71,72] or overexpression of TFAM [73] and parkin [74] have been found to protect against sarcopenia and muscle atrophy.If mtDNA deletions had a replicative advantage, the inverse relationship between copy number per nucleus and wave speed would not be observed.
We have obtained probability distributions for the wave speeds predicted by survival of the densest and a RA model (FK waves), by inserting draws from the distributions of parameter values (given in SI.8) into Eq.( 2), with the appropriate interpretation of k for the two models.The predicted distributions are plotted in Fig. 4A together with the distribution of the experimentally observed wave speed obtained via linear fit (Fig. 2C).After accommodating this parametric uncertainty, survival of the densest remains much superior to RA at reproducing the observed speed.Finally, we have verified numerically that our results are robust to a selection of model choices for the control mechanism on the mitochondrial population.The main feature of our model, the travelling waves of mutants whose speed decreases for larger N ss , does not require the specific linear dependence of replication rate on copy number of Eq. ( 1).Other neutral feedback controls are possible, as long as they encode a larger mutant density.In SI.11 we show that a quadratic (Fig. S3C) and reciprocal (Fig. S3D) feedback control produce a wave of mutants qualitatively similar to that produced by linear feedback.

Wavelike spreading implies that low mtDNA deletion rates can nonetheless yield high mutant load
It has often been assumed that neutral stochastic models of mtDNA dynamics cannot explain the mutational load in skeletal muscle fibres of short-lived animals, due to a perceived incompatibility between the de novo mutation rate of mtDNA polymerase and observed mutational loads [25,15,13,34].This is one of the reasons that has motivated theorists to develop RA models.However, previous attempts to understand mutant proliferation in muscle fibres have modelled the system as a bulk of well-mixed mtDNAs [35,25,26].In Fig. 4B we show the evolution of a chain of regions with N ss = 1001 initialized with a single mutant molecule of mtDNA.We observe a high-heteroplasmy front, of mean height 0.30, meaning that the mutants reach 100% heteroplasmy around 30% of the time.The fixation probability of a founder mutation increases by three orders of magnitude compared to the well-mixed case (black line).As we argue more in detail in SI.9, this result dramatically lowers the de novo mutation rate needed to account for the observed mutant loads, since a higher fixation probability implies that fewer mutations need to arise.Additionally, in SI.10, we argue that previous modelling studies overestimated mutant loads -and therefore mtDNA mutation rates -by using bulk models to interpret experimental data.In conclusion, we have found two reasons why the mutation rate required to reproduce the observed mutant loads in aged skeletal muscle fibres without an RA Figure 3: A steeper wave of mutants propagates more slowly in fibres which have a higher copy number, in agreement with the predictions of stochastic survival of the densest.(A) Significant difference in the steepness of wavefronts of two human muscle fibres H1 and H2: τ H1 = (2.4 • 10 −2 ± 10 −3 )/µm for H1 and τ H2 = (1.4•10−2 ±2•10 −3 )/µm for H2 (MLE).Data from Ref. [48].According to the mathematics of travelling waves (SI.12), the steeper wave is slower.(B) Comparison between the corresponding N ss of the two fibres H1 and H2.We have found evidence (p = 10 −4 , one-tailed Welch's t-test) that the average copy number in normal zones of the two fibres H1 and H2, N ss in our model, is higher for the steeper, and hence slower, wave.This is in qualitative agreement with the predictions of our stochastic survival of the densest model that wave speed decreases with copy number.In contrast, a model based on replicative advantage predicts that speed increases with copy number.(C) Two muscle fibres in rats, R1 and R2, present waves of mutants with significantly different steepness of the waveform: τ R1 = (4.0• 10 −2 ± 6 • 10 −3 )/µm for R1 and τ R2 = (1.8• 10 −2 ± 2 • 10 −3 )/µm for R2 (MLE).(D) We have found an indication (p = 0.06, one-tailed Welch's t-test) that the average copy number in normal zones of the two fibres R1 and R2, is higher for the steeper wave (R1).Data from Ref. [13].2) we find that survival of the densest predicts a distribution for the wave speed (red histogram) compatible with observations (blue), whereas a F wave driven by replicative advantage with the same model parameters is two orders of magnitude faster (grey).(B) The probability that a single founder mutation (time 0, blue spike) takes over the system is equal to the mean heteroplasmy at long times in an ensemble of simulations.This probability dramatically increases in a spatially-structured system with N regions (≈ 0.3, red line) and N ss mtDNA molecules per region with respect to a single well-mixed region with the same total copy number N N ss , which is how previous studies modelled skeletal muscle fibres [35,25,26], where it stays constant at the initial value ( for mutants is much lower than previously thought.

Implications for the evolution of altruism
An altruistic trait is one that benefits others while costing its carriers [75].In SI.1.2,specifically in Eq. (S8), we show that the global carrying capacity of the system -the population it sustainsincreases with the proportion of mutants.In Eq. (S9) we show that, equivalently, the replication rate of both wildtypes and mutants is an increasing function of the proportion of mutants.If we assign a higher degradation rate to mutants, as we do in order to model a notional higher mitophagy rate for mitochondrial deletions, then mutants can be considered an altruistic species.Indeed, the increase in carrying capacity and replication rate can be seen as a benefit that mutants bring to the whole population, including wildtypes, at the cost of a higher degradation rate.Our model of the spread of mitochondrial deletions in skeletal muscle fibres can therefore be reinterpreted as a model for the spread of altruistic behaviour.An increase in carrying capacity is present also in mathematical models of public good production [76,77] and cooperative use of limited resources [75].These two behaviours are observed in microbial communities and are believed to be early examples of altruism in the history of life [78].
The two most prominent accounts for the evolution of altruism are kin selection and group selection [79].The former approach argues that altruism directed towards genetically related individuals can increase the overall success of the altruistic gene, which is likely to be shared by relatives of the carriers [80].In group selection, altruism spreads through advantages conferred to the group; for instance, a group whose members are ready to sacrifice themselves for other members will likely prevail over other groups [81].These approaches seek a deterministic advantage for altruistic traits.In our model the spread of altruism is driven by noise and has no deterministic counterpart.We now highlight our contributions in this area, referring to SI.6 for further links to the wider evolutionary biology literature.
A first, important merit of our model consists in describing the spread of an altruistic trait in a spatially extended system using a widely-used model of population genetics, the Lotka-Volterra model, which uses linear per molecule transition rates.The Moran model is another fundamental model of stochastic population genetics, dealing with the less general case of finite fixed-size populations.Previously, the study in Ref. [82] has shown that in a modified Moran Model with nonlinear rates, where the assumption of fixed population size is relaxed, altruists with a higher carrying capacity can have a higher fixation probability despite a higher death rate.Analogous results are presented in Ref. [83], which is based on a modification of the Wright-Fisher model, retaining the feature of discrete, non-overlapping populations.Our continuous-time model uses simpler (linear) rates, and the relevance of the Lotka-Volterra model, widely used in population genetics, indicates that SSD might be a widespread effect.
The increase in mean mutant copy number in the single-region system in a neutral model was highlighted 20 years ago in the field of mitochondrial biology, in the context of the relaxed replication model formulated in Refs.[22,35,23,45] and further analysed in Refs.[44,45].Models yielding an increase in mean copy number of an altruistic species have found that space can amplify the effect of randomness [77,47,84], in the sense that the altruistic species can tolerate a higher selective elimination in a spatially structured system.A key contribution of our work is to show explicitly that spatial structure can lead to a qualitatively different result, namely an increase in the proportion of altruists, that in turn leads to a travelling wave of altruists (the mutants in the mitochondrial setting).
Another contribution of our study is the use of a physically motivated and biologically interpretable microscopic mechanism.The study in Ref. [53] presents a phenomenological, macroscopic model that exhibits a noise-driven wave of altruists.We show in the SI.4 that the model in Ref. [53] is a special case of ours, in the limit of continuous space.We have further been able to obtain a formula for the wave speed covering a range of biologically and clinically relevant regimes not covered by other studies, namely without taking δ → 1 and without imposing the quasi-static limit, i.e. very slow diffusion (see SI.5).
Finally, but perhaps most importantly, we have identified the mtDNA populations in skeletal muscles as a candidate system that shows a travelling wave of a preferentially eliminated species.Muscle ageing could therefore be the first example of a new class of evolutionary phenomena best described as SSD.

Discussion
In this study, we have modelled the accumulation of mutation during ageing through a bottomup, physically interpretable stochastic model based on Lotka-Volterra dynamics, one of the simplest classic models of population genetics.Our aim has been to explain why mitochondrial deletions clonally expand despite possibly being preferentially eliminated through quality control mechanisms [36,37,38,39,40].Ours is the first spatial model to account for the spread of mutant mtDNA, schematising skeletal muscle fibres as a chain of nuclei each surrounded by a population of mitochondria that can move diffusively along the fibre.
The main achievement of our study is predicting a wave-like clonal expansion of deletions even if they are subject to preferential elimination.This counterintuitive result has its roots in the increased density of deletions -a widely observed fact -and in the stochastic nature of the model.Previous studies have tried to account for the expansion of deletions assigning them higher replication rate: we have discovered that a standard model with spatial structure and with a replicative advantage for mutants produces a wave of deletions that is 300 times faster than observed.Crucially, our model predicts, instead, a speed that is of the same order of magnitude as the observed one.We have accounted for this effect mathematically and shown that our observations are robust to the particular choice of replication rate for both species and to the uncertainty in parameter values.We have provided a candidate functional form for the wave speed (Eq.( 2)) that has implications for therapy, since existing drugs allow us to modulate some of the parameters influencing the propagation of mutants (i.e.copy number N ss and turnover rate µ).Moreover, we have corroborated our prediction of a wave speed decreasing with copy number by examining how copy number changes onset time and likelihood of developing sarcopenia.Eq. (S34) highlights that the largest preferential elimination M that mutants can tolerate and still expand also decreases with N ss .Therefore a larger copy number not only slows down the clonal expansion, but also makes it more likely to be stopped by a given rate of selective mitophagy.Finally, we have shown that the wave-like spread, together with a reevaluation of the current estimates of the mutant loads, greatly lowers the de novo mutation rate needed to reproduce the observed clonal expansion.A travelling wave of mutant implies that a lower mutation rate can nonetheless yield pathology.Future work might explore the interaction between the model presented here and mitochondrial network dynamics, as done in Ref. [28] for a model based on replicative advantage.Given the evidence for sharing of mtDNA between distinct cells [85,86,87], the effect we identify need not be restricted to multinucleate muscle fibres but could yield apparently invasive waves of mtDNA deletions in other tissue types.
In conclusion, our core claim is that it is not a replicative advantage, but the increased density of deletions and the randomness intrinsic to biological processes that drive the clonal expansion of mitochondrial deletions in skeletal muscle fibres.To our knowledge, this phenomenon is the first experimental candidate for what we have termed stochastic survival of the densest.The mechanism we propose might have applications elsewhere.For example, a classic setting for the wave-like spread of a trait is in the uptake of agriculture [88,89,90,91].However agriculture might not impart an explicit replicative advantage, and might lead to higher death rates [92,93,94,95], but nonetheless spread due to an increase in carrying capacity of the land.We believe that this study and the simplicity of the microscopic model we use might pave the way for an increased recognition of this intriguing mechanism in evolutionary biology, in light of the connections we have drawn with the evolution of altruistic behaviour.

Supporting Information 1 One-region model: deterministic and stochastic treatment
The fundamental unit of our model is the single region hosting a well-mixed population evolving according to linear feedback control.This constitutes the building block of our description of the skeletal muscle fibre.Our physical model of muscle fibres is a chain of these regions.
In our model, there are two species: wildtypes w and and mutants (deletions) mitochondrial DNA molecules m.
Every molecule is degraded at rate µ and replicates at rate λ(w, m) given by i.e. the larger between 0 and µ + c(N ss − w − δm).However, for biologically relevant values of the parameters, µ + c(N ss − w − δm) is practically always positive, as we noticed in our simulations.Therefore we can also use the simpler form in Eq. ( 1).Because of the linear relationship between the replication rate λ and the copy number we name the replication rate linear feedback control .The meaning of the parameters N ss and δ are explained in the main text, where we also mentioned that c > 0 modulates the strength of the control.This means that the larger c, the more strongly the system is penalised to be away from steady state when w + δm = N ss , i.e. when the terms in parentheses on the right-hand-side (RHS) is = 0 and hence λ = µ.In other words, a larger value of c > 0 will cause a stronger push toward steady state.
In the following we study a deterministic and stochastic version of the model defined by the above rates.

Deterministic (ODE) treatment
In a deterministic setting, the above rates give rise to the following system of ODEs: The analysis of this deterministic dynamical system is useful to understand the stochastic version of the model.We limit our analysis to w > 0, m > 0 given the meaning of the variables as copy numbers.We start by noticing the existence of the trivial, unstable fixed point (0, 0).We then notice a set of attractive fixed points which form the straight line of equation w + δm = N ss in the (m, w) plane.We refer to this as the central manifold (CM) or steady state (µ = λ) line.This line includes the configurations in which there is only a species, namely the wildtype fixed point (N ss , 0) and the mutant fixed point (0, N ss /δ).The carrying capacity of a species is the population size of this species at its fixed point, at which the species exists in isolation.We see that in our model the carrying capacities are N ss for wildtypes and N ss /δ for mutants.When 0 < δ < 1 -that is always the case in this work -the mutant carrying capacity is larger than wildtype.Another interpretation of the condition 0 < δ < 1 is that mutants are less tightly controlled or sensed by the system, as a change in the number of mutants entails a smaller variation in λ(w, m) than an equal change in the number of wildtypes.Hence, in our model mutants are the densest and less tightly controlled species.The red line in Fig. 1B is the CM for a system with N ss = 500 and δ = 0.5.
We define the heteroplasmy h as the proportion of mutants a conserved quantity of the dynamical system, namely which can be verified by direct calculation.Eq. (S4) is equivalent to the ratio m/w being constant, meaning that any line passing through the origin is a constant-heteroplasmy line.Hence, a way to summarise the behaviour of the system is: • If the system is in steady state, i.e. if w + δm = N ss , the dynamics stop.
• If the system is not in steady state, it will move toward the CM along a line connecting the initial condition to the origin.
The same behaviour can be recovered from the full solution of the system, which is where w 0 = w(0) and m 0 = m(0) are the initial conditions.From Eq. (S5) it follows that For t → ∞, as e −cNsst → 0, w(t) + δm(t) → N ss .This shows that the parameter c is connected to how fast the state of the system decays to the CM, justifying its interpretation of c as control strength.In summary, in the deterministic single-region model the system evolves toward the CM of equation w + δm = N ss with the condition ḣ = 0. We remark on the difference in carrying capacity of the two species, being modulated by the parameter δ.

Carrying capacity of the whole system and replication rate increase with heteroplasmy
The global carrying capacity K of the system is the value of the total population n = w + m at steady state.K can be expressed as a function of h.Since m = hn and w = (1 − h)n the steady state condition w + δm = N ss becomes where the first equality holds because we have defined K as the value of n at steady state.This leads to For δ < 1, K(h) is an increasing function of h: the higher the proportion of mutants, the larger the population that the system can sustain.When h = 1, i.e. mutants are in isolation, the mutant carrying capacity Nss δ is recovered.One way to interpret this is that mutants are using the resources of the system in a more economical way.The economical use of a limited resource is considered one of the earliest and simplest forms of altruism [75,53], that brings benefits to all the other individuals in the system regardless of their identity.
Another way of seeing the benefit that mutants bring to all the other molecules in the system is in terms of enhanced replication rate.By expressing w, m in terms of h, n as above, Eq. (S1) can be written as that shows that, for a given population size n, the common replication rate is an increasing function of h when δ < 1.Notice that here, differently from Eq. (S8), n = K since we are not assuming that the system is in the steady state.
In the study of the expansion of mitochondrial deletions, mutants are assigned a higher degradation rate, to model the higher mitophagy rate to which they are subject.The mitochondrial deletions of our model, for which δ < 1, can hence be seen in a very general sense as agents that benefit others paying a cost, in the form of a higher degradation rate.This aligns with the definition of biological altruism [82,75], hence the mutants of our model can be considered an altruistic species.

A stochastic treatment exhibits selection reversal
In the stochastic formulation of the model, the per capita degradation and replication rates µ and λ are interpreted as instantaneous probability (in an infinitesimal time interval) that each molecule is degraded or replicates.In this setting, w, m and h are random variables, and we ask questions about their probability distributions and their moments.Stochastic population dynamics models can be formalised as chemical reactions networks, consisting, in the terminology of chemical systems, of a set of reactants, reactions and products.Consider a general chemical system consisting of N distinct chemical species (X i ) interacting via R chemical reactions, where the j th reaction is of the form where s ij and r ij are stoichiometric coefficients.We define kj as the per molecule rate for the jth reaction.A chemical reaction network can be described by the associated N × R stoichiometry matrix S ij = r ij − s ij and the set of rates kj .The reactants are the molecules of the two species.In our model the reactions are birth and death and the possible products are i) another molecule of the same species (birth), or ii) ∅ for the degradation of a molecule (death reaction).
The chemical reaction network for the single-region model of the first section of Results is This is a network with N = 2 species and R = 4 reactions, with stoichiometry matrix given by The global rates for wildtypes or mutants are found by multiplying the per molecule rates µ, λ by w or m.This is true only in the case S ij ≤ 1; more details for the general case can be found in Ref. [41].These reactions and rates can be used to set up a Chemical Master Equation (CME), a system of coupled ODEs in P (w, m, t), the probability that the system is in the state (w, m) at time t.
In principle, solving the CME would give the probability that the system is found in any state at any given time.However, given the nonlinearity of the global rates, the CME cannot be solved exactly.
One can explore the behaviour of the stochastic models by simulating the CME, which can be done through Gillespie's stochastic simulation algorithm (SSA) [96,97].This algorithm is exact, in that each Gillespie simulation represents an exact sample from P (w, m, t).The simulations allowed us to observe the increase in mean mutant copy number m for δ < 1 (Fig. S1A, red line).However, it is worthwhile having an analytical account of this effect, in order to know how the increase in mutant copy number depends on the parameters of the model.We have obtained an effective description of the system, in the form of an approximate stochastic differential equation (SDE) that shows an increase in the number of mutants for δ < 1.The steps are the following: 1. Applying the Kramers-Moyal expansion to the CME, obtaining a Fokker-Planck equation (a PDE for the probability distribution P (w, m, t)).
2. Converting the Fokker-Plack Equation into a system of two coupled SDEs in the variables w(t) and m(t).
3. Applying a stochastic dimensionality reduction procedure, that exploits the fact that the system fluctuates around a central manifold (CM), to get a single SDE for m(t) that shows a positive drift for δ < 1.
The first two steps are standard [98,99].From the chemical reaction network in Eq. (S11), we obtain the system of SDEs: .
where .W 1 and .W 2 are two i.i.d.Wiener increments, i.e.Gaussian noise with zero mean and variance dt (see also Eq.S31).The third steps relies on a recently developed technique [77,84,47], which works specifically for systems fluctuating around a CM.In the next section SI.2 we detail each step, while here we only present and interpret the results.The final, effective SDE is with dW a Wiener increment.The first term on the RHS is the drift, and it corresponds to a logistic growth with carrying capacity Nss δ .When the drift is positive, since dW = 0, copy number increases on average.The drift is positive for 0 < δ < 1, i.e. when mutants have a higher carrying capacity or density, and δm < N ss .Because of noise the final value of the mutant copy number will not be Nss δ .The dynamics stop either when m = 0 or m = Nss δ , the only values for which dm = 0.This approximation relies on the system fluctuating closely around the CM, which is true for large values of c (strong control).The parameter c is not present in Eq. (S14) because this equation holds exactly for c → ∞.Indeed, when simulating Eq. (S14) using the Euler scheme, the agreement with the results of the exact Gillespie simulations is better for larger values of c (not shown).By applying Itô's formula (see SI.2) it is possible to find the SDE for h, which is where n = w + m.We notice in this equation the absence of a drift term, meaning that i.e. mean heteroplasmy is constant as stated in the main text.This is shown in Fig. S1B, that reports data from an ensemble of stochastic simulations (apart from the black line that refers to the ODE system).We see that the red line stabilises to a constant value after a transient.The transient is due to a difference in the steady state value of the effective population w + δm between the deterministic and stochastic systems.The two systems are both initialised in the steady state of the deterministic one, hence the stochastic system first hits its steady state and then evolves as described.
A heuristic way to understand the increase in m is the following.The per capita rates in Eq. (S11) are linear, hence the system is a stochastic Lotka-Volterra model, for which each individual has the same chance of generating a lineage that will take over the whole system [100].Hence, this probability must be 1 w 0 +w 0 .Since there are m 0 mutants, the probability that a mutant will take over the whole population is m 0 w 0 +w 0 = h 0 .The steady state population of mutants is Nss δ .Based on this, one can conclude that the mean number of mutants m → h 0 Nss δ as t → ∞.Taking the limit t → ∞ means that we wait long enough for fixation to happen and for the population to reach the steady state.Recall that this differs from the deterministic case, in which there is a locus of fixed points of equation w + δm = N ss (the CM) and the system equilibrates at the point of the CM for which heteroplasmy is h 0 .For 0 < δ < 1, h 0 Nss δ > m 0 always, hence m will increase.It is also illuminating to consider the case in which δ ≈ 0, namely when the mutants are scarcely controlled.When a wildtype dies at steady state this increases the birth rate.Either a mutant or a wildtype will be the next birth, but if a mutant is born this leaves the elevated birth rate (caused by the degradation of the wildtype) almost unchanged, encouraging further possible mutant births.
Interestingly, the stochastic system can exhibit an increase in m even if we introduce preferential elimination of mutants (Fig. S1A, green line).This can be done by increasing mutant degradation rate by , reflecting what we expect for defective mtDNA molecules.Indeed, under certain cases there is evidence for enhanced clearing of defective mutations [101,102,103], subject to higher mitophagy rates.Mathematically, this is described by replacing the second reaction in Eq. (S11) by with > 0, leaving all other reactions unchanged.In the next section (Eq.(S34)) we will see that, through the techniques in Refs.[77,84,47], it is possible to derive an SDE for mutant copy number that is valid for µ : .
The drift in m is now positive for δm < N ss and < 2(1−δ)µ Nss .In this case, the combined effect of stochasticity and differences in carrying capacity (density) produces a stochastic reversal of the Figure S1: (A) In the stochastic one-region system, mean mutant copy number increases in the neutral model (red) and in presence of preferential eliminarion of mutants (green).This is a consequence of higher carrying capacity for mutants (δ < 1) and noise.In the corresponding deterministic neutral model, mutant copy number remains constant (black).Parameter values in SI.7.Error bars are SEM.(B) In the one-region system, heteroplasmy is constant in the neutral deterministic system, as well as mean heteroplasmy in the neutral stochastic system (after a transient).Any selective elimination of mutants in the stochastic system causes a decrease in mean heteroplasmy, while mean number of mutants can still increase (green line, see panel A).Parameter values in SI.7.Error bars are SEM.direction of deterministic selection [77].Indeed, in the deterministic case, for any > 0 and for any initial condition with non-zero w 0 the system tends to the point (N ss , 0), the only stable fixed point, corresponding to wildtype fixation (mutant extinction).Conversely, in the stochastic case, there is a critical M = 2(1−δ)µ Nss such that, for < M , m still increases, meaning that mutant fixation is more likely than wildtype.Inspection of the expression for M , interpretable as the largest amount of additional mitophagy rate that still allows mutant to expand, reveals the following interesting points: • M increases with µ, meaning that the noisier the system, the higher the mitophagy rate that mutants can tolerate and still expand.
• For 0 < δ < 1, M is decreasing function of δ.The higher the the difference in density, the higher the mitophagy rate that mutants can overcome.
• M is a decreasing function of N ss .In particular, M → 0 when N ss → ∞.This is common to stochastic effects, whose intensities decrease with the system size, vanishing in the deterministic limit for which system size tends to infinity.
2 Derivation of the effective SDE for the stochastic model constrained to the CM In this section we show that Eq. ( S14) is an effective, approximate description for the system defined by the chemical reaction network in Eq. (S11) given the replication rate in Eq. (S1).We provide detail for each of the steps listed in SI.1.The approach used here is adapted from standard texts [98,99] and from Refs.[84,47].We strive to provide a self-contained treatment, hoping to make our derivation and the set of techniques accessible to a wider audience.Our exposition partly follows Ref. [41], where the same procedure is used for a related problem.
1 -From the CME to a Fokker-Planck equation via Kramers-Moyal expansion.
Let us denote n = (w, m), a vector specifying the system state.The CME can be written in compact form as dP (n, t) dt = n =n T (n|n )P (n , t) − T (n |n)P (n, t) (S19) T (n|n ) is the transition rate from state n to n.Given the reaction network in Eq. (S11), the corresponding (global) transition rates are The first step is to write the CME in a continuous form.We replace copy numbers n by the abundances x = n Nss relative to the effective population size.The variable x can be considered continuous for large N ss .The CME becomes We now proceed by expanding the CME through the second-order multivariate Kramers-Moyal expansion [98], that can be written as where H x (x) is the Hessian matrix of T (x |x)P (x), whose general expression is (S23) A transition x → x corresponds to some reaction j which moves the state from x to x .Since we know how each reaction affects state x through the constant stoichiometry matrix S ij , and since the global rate of each reaction is independent from x itself (see Eq. (S20)), we may transition from a notation involving x and x into a notation involving x and the reaction j that affects x .We may therefore define T j (x) := T (x |x), and let H x (x) → H j (x).
We now wish to re-write Eq. (S22) as a Fokker-Planck equation.Since the integral in Eq. (S22) is over x , and for a given x every x corresponds to a reaction j, we may interpret the integral in Eq. (S22) as a sum over all reactions, i.e. .x → R j=1 .Hence, for the j th reaction, [(x − x)] i = S ij .With these observations, we may write the first term in the integral in Eq. (S22) as where with A a vector of length N , S the N × R stoichiometry matrix and T the vector of transition rates, of length R. To re-write the second integral of Eq. (S22), we write an element of the Hessian H j in Eq. (S23) as where j = 1, . . ., R and l, m = 1, . . ., N .Thus, we may write where and B is an N × N matrix, and Diag(T) is a diagonal matrix whose main diagonal is the vector T.
We may therefore re-write Eq. (S22) as a Fokker-Planck equation for the state vector x of the form From a system of coupled ODEs for P (w, m, t) (the CME of Eq. (S19)) we have obtained a single PDE for P (x, t) approximating copy numbers as continuous variables.
2 -From Fokker-Planck to a system of Itô SDEs.
where GG T ≡ B (where G is an N × R matrix) and .W is a vector of length R of independent Wiener increments, that satisfy and σ is a constant that controls the strength of the noise.For our chemical reaction network Eq.(S11), the corresponding system of Itô SDEs is given in Eq. (S13).
3 -From a system of Itô SDEs to a single effective SDE for a system forced onto the central manifold.
What follows is an adaptation from [47,84], to which we refer for a proof and for interpretation of the functions introduced.We start from Eq. (S30).The procedure can be applied when the system admits a manifold on which A = 0, the central manifold (CM).We though it would be an helpful complement to Refs.[47,84] to give an explicit sequence of the steps involved for the specific system at hand.
• Identify the CM.
• Find the Jacobian of A, namely J such that and evaluate it on the CM, obtaining J CM .
• Find the eigenvalues λ 1 , • • • , λ N of J CM .The CM is the kernel of J CM , hence the multiplicity of the zero eigenvalue is the dimensionality of the CM.
• Compute the decomposition of J CM , writing where W = (w 1 , • • • , w N ) is the matrix of eigenvectors with those corresponding to the zero eigenvalues written first.
• Compute J + CM , defined as J + CM = W −1 Λ + W, where Λ + is the diagonal matrix with diagonal elements λ + 1 , • • • , λ + N defined as • For each element of A, compute the Hessian H i as and evaluate it on the CM, obtaining H i CM .
• Compute the matrix P as where I is the identity matrix, and the matrices Q i as [J + il P T H l P + P il (J +T H l P + P T H l J + )] • Calculate the vector g(x), whose elements are • Finally, effective SDEs for the variable z, namely the variable x constrained to the CM, are where the equations are now uncoupled because all the functions are evaluated on the CM.
This procedure allows one to obtain Eq. (S14) as an effective description of the neutral stochastic model defined in Eq. (S11).We provide a Mathematica notebook (see Supplementary File 1) to obtain Eq. (S30) from the system of SDEs in Eq. (S13).This technique is extended in Ref. [84] to the more general case in which the stochastic system is equivalent to a system of Itô SDEs of the form where h ≡ h(x) is another vector-valued function of x and σ is a small parameter, such that for = 0 the system admits a CM.In this case, Eq. (S32) becomes The stochastic system in which mutants are subject to preferential elimination (Eq.(S17)) corresponds to the case h(x) = (0, x 2 ) (recall that x 2 = m /Nss).Inserting this into Eq.(S34), one obtains Eq. (S18), that describes the noise and density-induced selection reversal.
Change of variable through Itô's formula to obtain an SDE for heteroplasmy.
Itô's formula states that, for an arbitrary function y(x, t) where x satisfies Eq. (S30), we may write the following SDE: where H (x) is the Hessian matrix of y(x, t) with respect to x (see Eq. ( S23), where T (x |x)P (x) should be replaced with y(x, t)).Applying this rule to Eq. (S14) for where h is the heteroplasmy and n is the total copy number, one obtains Eq. (S15) 3 The two-region model: deterministic and stochastic formulation The deterministic two-region model is formalised via the ODE system The addition of the second term on the RHS, proportional to the constant hopping rate γ accounts for diffusion of the mtDNA molecules between neighbouring regions.The first term on the RHS of each equation means that each cell seeks to maintain its own separate target population, i.e. copy number control is local.This system does not admit an analytical solution.However, it has a CM given by w 1 + δm 1 = w 2 + δm 2 = N ss , w 1 = w 2 (and consequently m 1 = m 2 ).It is easy to see that if these conditions are satisfied, all the time derivatives (the left-hand-sides) are zero and the dynamics stop, i.e. steady state is reached.A similar system of ODEs can be used to describe chains of regions of arbitrary length (see SI.5).
The stochastic formulation of the model is obtained by replicating the reaction network in Eq. (S11) for w 2 , m 2 and adding the reactions accounting for diffusion of the mtDNA molecules.The death rate is still constant and common, and each region has its own independently controlled replication rate given by λ i = µ + c(N ss − w i − δm i ), for i = 1, 2. The full chemical reaction network is (S37) The dimensionality reduction technique from [84], used to obtain Eq. (S18) is not viable in this case to obtain an SDE showing the increase in h 1 (or h 2 ) when 0 < δ < 1.The technique works by projecting the system onto the CM, where the two regions are identical; however, the increase in mean heteroplasmy is driven by fluctuations that move one of the regions away from the CM condition, after which the system relaxes to steady state with an average increase in h.The approximation involved in the technique is too crude to capture the effect.However, stochastic simulations consistently show the increase in h in chains of arbitrary length when 0 < δ < 1.This effect, conveniently highlighted in the two-region system (see Fig. 1D), is key to the appearance of a wave of heteroplasmy in a chain of regions.4 The one-region model is a microscopic building block for a system exhibiting a noise-driven wave In Ref. [53], a stochastic continuous-space model is presented, in the form of partial differential equations, describing a noise-driven wave of mutants, possibly subject to higher death rates than wildtypes.The paper is an extension of the classical Fisher-Kolmogorov PDE to a stochastic setting.
In this section, we show how that our microscopically interpretable bottom-up model, based on the simple Lotka-Volterra dynamics, gives rise to the model in Ref. [53] for δ → 1 and in a continuous-space limit.
Let us reconsider the deterministic system of Eq. (S2) and change variable to (n, h), where n = w + m is the total population.We already observed that .h . t = 0.For n we have .
The constancy of h allows us to write m = hn, hence Considering the case α → 0, equivalent to δ → 1, we can write Neglecting O(α 2 ) terms we obtain with K(h) = N ss (1 + αh).The total copy number follows a logistic growth with a carrying capacity K(h) that is a linearly increasing function of heteroplasmy h.In order to model a spatially-extended system as a muscle fibre, we give spatial dependence to copy number, i.e. n ≡ n(x).The hopping of mtDNA molecules between neighbouring regions of the muscle fibres is described in this case as a diffusion term, proportional to the second spatial derivative of ∂ 2 n(x) ∂x 2 .In order to model a muscle fibre, we can therefore write where we have explicitly written the time-dependence.D is the diffusion coefficient, for which D = γL 2 , where L is the inter-region spacing.Eq. (S42), derived from the special case of the simple single-region Lotka-Volterra model, is the starting point of the work in Ref. [53].By introducing noise from Wright-Fisher sampling and taking the low-diffusion limit D → 0, a wave equation for h can be derived which holds also in the case of slow preferential elimination of mutants.The limit δ → 1, that we have taken here to derive Eq. (S42) from Eq. (S2), is needed also in Ref. [53] in order to derive the wave equation.Hence our model, for some limiting values of the parameters D and δ, leads to a wave-equation for h.We stress the remarkable fact that from one of the simplest models of population genetics, the Lotka-Volterra, it is possible to derive an equation for a travelling wave of heteroplasmy driven by stochastic survival of the densest, albeit in some limit.

Phenomenological formula for wave speed
In the previous section we have shown how our model, in the limits γ → 0 and δ → 1, leads to an analytical description of a noise-driven wave.Moreover, through numerical simulations we have shown that the wave-like expansion is exhibited by our model without having to assume the above limits.We have performed simulations of a 201-region chain with 110 combination of the parameter values, with δ ∈ [0.1, 1], N ss ∈ [2,35], γ ∈ [0, 0.25], µ ∈ [0.01, 0.15].For each parameter configuration we have observed a wave-like expansion, except, as expected, when δ = 1 or γ = 0. We have calculated the wave speed plotting the wavefront at different times and measuring the distance covered, measuring ss .The excellent fit (R 2 = 0.99) shows that the wave speed v is well predicted by Eq. (S43), except for x → 0. B)Although the residuals do not follow a Gaussian distribution (Shapiro-Wilk, p < 10 −4 ), the absence of a clear pattern in the plot suggests that Eq. ( S43) is a satisfactory approximation in the parameter range considered.C) The Q-Q plot of the residuals against a normal distribution confirms the absence of a clear pattern.
the degree of shift between the two wavefronts.In some cases, the overlap could not be total since the wavefront's steepness changed slightly over time: in those cases we have looked for an overlap of the midpoint of the curves (i.e. the point for which h = 0.5).In all cases, it has been verified that the speed is constant, by measuring it for several different time intervals.We have found that v can be predicted with excellent accuracy through the formula Apart from the intercept, Eq. ( S43) is analogous to the formula for the wave speed of FK waves, with k being an effective reaction rate induced by stochastic survival of the densest, given by and β = 0.20723±4•10 −5 is calculated via MLE.In Fig. S2A we show the fit, for which R 2 = 0.99, and we observe residuals without any clear pattern.However, the presence of an intercept β = 0 means that Eq. (S43) is not accurate for x → 0, for which v = 0.In Fig. 4A we compare the probability distribution of the wave speed predicted through Eq. (S43), obtained through Monte-Carlo sampling on the basis of the estimated distributions for the parameters of our model, given in SI.8.Finally, we have verified that in a deterministic model deployed on a chain, i.e.Eq. (S36) applied to a larger number of regions, the high-heteroplasmy front diffuses away even if δ < 1 (Fig. S3A), which is consistent with the statement that the wave-like expansion is noise-driven.
6 Further links to the wider literature Density-dependent selection refers to situations in which the fitnesses of the different species within a population depend differently on population density (or size) [104].The theory of r/K selection [105,106], one of the most influential ideas in evolutionary biology [107,108,109], examines how selection shapes the strategies of species in the presence (or in absence) of density effects.The study of density-dependent selection has attracted interest across five decades [105,110,111,112], including in systems where stochasticity is important [113,114,115,104,116,117,118].In stochastic survival of the densest, wildtypes and mutants have a common per capita replication rate.Per capita degradation rates are constant, and the mutant degradation rate can be higher than the wildtype by a constant additive factor.The mutant and wildtype rates therefore depend in the same way on copy numbers, and hence our model is not an example of density-dependent selection.Frequencydependent selection [119] is distinct, but related, to density-dependent selection.Numerous definitions of frequency dependent-selection have been used over the decades, a classic one being that the relative fitness of a species varies with the relative frequencies of other species [120].Our model satisfies a strong mathematical definition of frequency-independent selection, based on the per capita growth rate (PCGR) of the species, the difference between per capita birth and death rates.The PCGR can be defined as a vector-valued function of copy number variables, with an output for each species.Frequency-independent selection is assured when the Jacobian of the PCGR is independent of copy numbers [121].This is clear for our linear feedback model (and all generalised Lotka-Volterra models), since the replication rate in Eq. ( 1) is linear in w and m.Therefore, stochastic survival of the densest is not a case of frequency-dependent selection.
In the context of a single-species population, the Allee effect can be described as a positive dependence of the PCGR on the density or size of the population for some interval of the values of population size [43].This contrasts with theories in which the PCGR always decreases with population size because of competition for resources or space among members of the same species (intra-specific competition), as in logistic growth.When the dynamics of a population exhibits the Allee effect, it means that there is some other mechanism at play that counterbalances the general limiting effect of competition on the PCGR.An example is that of plants that reproduce through pollination, whose rate is increased by the density of plants in the environments.The Allee effect is connected to altruism, in that the efficiency of some fitness-increasing behaviours, e.g.defence and feeding, is enhanced by a large population size when the associated tasks are carried out cooperatively [122].The difference between stochastic survival of the densest and the Allee effect is two-fold.The Allee effect refers to the possibility that PCGR in a population can be an increasing function of population size (for some interval) despite the increase in intra-specific competition that generally comes with higher density.Our work instead models inter-specific (between species) competitions, showing that the species that is denser in isolation can prevail despite a higher degradation rate and no replicative advantage.Moreover, the PCGR of our neutral model is proportional to N ss − w − δm for both species; in the case of a higher degradation rate for mutants, their PCGR is proportional to N ss − w − δm.In both cases we have linearly decreasing functions of the population size, with no positive dependence on it.
Previous work has explored how the spatial structure of a population can favour [123,124] or hinder [125] the evolution of altruistic behaviour.These studies refer to game theoretical models that can be recast in terms of population dynamics models, due to the correspondence between the replicator equation and the generalised Lotka-Volterra model, in their deterministic [42] and stochastic [126] versions.Similar results have been found applying agent-based models to cell populations [75,127].In these studies, the continuous spatial structure of the population influences the way its members interact, with interaction possible only within spatial neighbourhoods.The recurring theme of these studies is that when altruists cluster together they share the benefits of their behaviour mainly among themselves, and this can favour the spread of altruism.Our work is different in that we have a discrete spatial structure: individuals can migrate to different regions, but the population itself within a region is well-mixed; all its members interact in the same way with one another and everyone benefits from the mutants' altruism in the same way.In this sense, our model is more similar to Refs.[77,82,83].
7 Parameter values for simulations in Figs. 1

and S1
In this section we specify the simulations and list the parameters for Figs.1D and S1, with the aim of allowing a smooth reproduction of the results.The parameter values are not derived from the scientific literature and are not meant to describe a realistic system.Rather, they are chosen to show the increase in mean mutant copy number and the increase in mean heteroplasmy with a limited computational effort in the simplest possible systems.Realistic parameter values for simulating skeletal muscle fibres models as a chain of regions (Fig. 2) are reported in the next section.
Fig. 1D refers to the two-region system.Data obtained simulating the deterministic ODE system Eq.(S36) (black line) and the chemical reaction network Eq.(S37) (other lines) with parameters γ = µ = 7 • 10 −2 /day, δ = 10 −1 , c = 10 −3 /day, N ss = 455.The chemical reaction network is simulated via the Gillespie algorithm.One region is initialised with zero heteroplasmy w 10 = 455, m 10 = 0 and the other with w 20 = 450, m 20 = 50, corresponding to an initial heteroplasmy of 10 −1 .For the neutral and deterministic cases mutants and wildtypes have the same degradation rate µ.For the cases of slow and fast preferential elimination the degradation rate of mutants is increased by an additive constant (see Eq. (S17)).The increase in degradation rate is = 1.5 • 10 −3 /day for slow elimination and = 6.25 • 10 −3 /day for fast elimination.For the deterministic system the quantity plotted is h mean = (h 1 + h 2 )/2, the mean of the heteroplasmies of the two regions.For the stochastic system, we plot h mean = ( h 1 + h 2 )/2, averaging over an ensemble of 3.6 • 10 4 simulations.Error bars are standard error of the mean (SEM).Fewer (10 3 ) realizations are enough to highlight a statistically significant change in h mean in the stochastic systems.
Fig. S1 refers to the one-region system.Data is obtained simulating the deterministic ODE system Eq.(S2) (black lines) and the chemical reaction network Eq.(S11) via Gillespie algorithm (other lines) with parameters µ = 7 • 10 −2 /day, δ = 10 −1 , c = 2.5 • 10 −3 /day, N ss = 325.The system is initialised at steady state with w 0 = 300, m 0 = 250 corresponding to an initial heteroplasmy h 0 ≈ 0.45.For the neutral and deterministic cases mutants and wildtypes have the same degradation rate µ.For the cases of slow and fast preferential elimination the degradation rate of mutants is increased by an additive constant (see Eq. (S17).The increase in degradation rate is = 2.935 • 10 −4 /day for slow elimination and = 1.1735 • 10 −3 /day for fast elimination.In panel A, for the deterministic system we plot mutant copy number m; for the stochastic systems we plot m , averaging over an ensemble of 10 4 simulations.In panel B, for the deterministic system we plot heteroplasmy h; for the stochastic systems we plot h , averaging over an ensemble of 10 4 simulations.Fewer ( 103 ) realizations are enough to highlight a statistically significant changes in h and m .discuss the point estimates used to obtain Figs.2D, E, F and the estimated ranges used to obtain Fig. 4A.We derive these values from the scientific literature, in order to simulate a realistic travelling wave of mutants and compare the wave speed predicted by our simulations with that estimated from experiments (Fig. 2C).
Our model has a total of five parameters, all having an immediate and clear biological interpretation.We do not tune any of these parameters.Instead, we analyse experimental data to estimate parameter values that apply to skeletal muscle fibres of rhesus monkeys and use point estimates for our simulations (Figs.2.D, E).In Fig. 2.D we show that stochastic simulations (Gillespie algorithm) model based on a net replicative advantage k RA for mutants predicts a travelling wave of heteroplasmy with a wave speed that is approximately 300 times faster than the observed speed.Crucially, in Fig. 2E we show that stochastic simulations of our model for the same system predict a wave of heteroplasmy with a speed of expansion comparable with observations.Setting up this simulation required some care, due to the slow dynamics caused by the large values of N ss .The wavefront of a travelling wave has the shape of a sigmoid whose steepness depends on its speed (see SI.12), therefore one needs to initialise the wavefront with the correct steepness such that the wavefront remains approximately stationary.For all the other simulations presented in the paper, this is not an issue since the heteroplasmy wavefront assumes the steepness corresponding to its constant speed in a short time transient (faster dynamics), that can then be neglected.Instead, for Fig. 2E we established an appropriate approximate initial steepness after a series of trials, by verifying that the shape did not change appreciably after running the simulation for a sufficient number of days (around 50).The chosen initial heteroplasmy profile is h 0 (x) = 1/(e τ x−b + 1) for x > 0, with τ = 7.15 • 10 −4 µm and b = 3.915, over 500 regions of length L = 30µm.We also introduced approximations at the boundaries of the system.In order to simulate the effect of a macroscopic zone of the muscle fibre taken over by mutants we fixed the value of the heteroplasmy of leftmost region of the chain at 1.At the right boundary, we truncated the values of heteroplasmy lower than 5 • 10 −3 to zero, to rule out the possibility that edge effects alter the speed of expansion of the heteroplasmy wavefront.We verified that the rightmost regions are mutant-free up to the longest simulated time.In Fig. 2F, we show a wave of heteroplasmy that advances despite the mutants having a degradation rate 1% higher than that of wildtype.This simulation has been performed with N s s = 4 to obtain a faster dynamics and wave and avoid the issues with initialisation described above.
In addition, in Fig. 4A we provide evidence that our conclusions are robust to the uncertainty in parameter values.We have drawn parameter values according to the distributions inferred from experimental data (see discussion below for details).We have then obtained distributions of the wave speeds predicted by a SSD and RA model (FK waves), inserting parameter values in the corresponding formulae for wave speed, namely Eq. (S43) with the appropriate interpretation of k for each case (setting β = 0 for FK waves).The approach used to obtain the distributions of the predicted wave speeds in is detailed in Ref. [128] and can be easily reproduced using the online tool at http://caladis.org/and the parameter estimates given below.The distributions plotted in Fig. 4A confirm that SSD is strongly favoured to account for the expansion.
The parameters µ, γ, c, N ss are common to SSD and to the RA model, whereas the net mutant replicative advantage k RA is replaced by δ in our model.We estimate the degradation rate µ from the half-life of mitochondria.Reported values for the half-life in the literature range from a few to 100 days [129,130,131,132,44].We therefore assume that the half-life is uniformly distributed on the range (2,100) days.As a point estimate, we have chosen 10 days for the half-life, corresponding to a degradation rate of µ = 0.07/day.
N ss represents the average copy number per nucleus when only wildtypes are present.A study [46] reports N ss 3500 for healthy human skeletal muscle fibres.We therefore suppose that for rhesus monkeys muscle fibres our uncertainty on N ss has a uniform distribution on the range (2000,5000).We have chosen N ss = 3000 since macaques are smaller animals.According to our empirical formula in Eq. ( 2), changing N ss from 3000 to 4000 would reduce the wave speed by about 10%, Therefore, we do not expect the uncertainty on N ss to affect our results significantly.
An important quantity in the study of muscle ageing, although not directly a parameter of our model, is the internuclear distance L. It has been reported that in skeletal muscle of healthy mice around 30 nuclei are present in 1mm of fibre length, i.e. ∼ 1 nucleus per 30µm [133].We therefore take L = 30µm as a point estimate, while assuming a uniform distribution on the range (25,35)µm.
The hopping rate γ is linked to the diffusion coefficient of mtDNA molecules in the muscle fibre, through the relationship D = γL 2 [134].Individual organelles in muscle fibres appear to be tightly packed.However, a study, Ref. [135], reported that the movement of nucleoids within organelles in these tissues can be effectively described as diffusion with a value of D corresponding to γ = 0.1/day.Therefore we choose this value as a point estimate of the individual mtDNA molecules.We quantify the uncertainty about the value of γ according to a uniform distribution on the range (0.05,0.15)/day The control strength c is the only parameter for which we do not have an optimal measurement.Our estimate is based on the fact that, in replicative cells, during a cell-cycle (≈ 1 day) the number of mtDNA roughly doubles.In this situation, in which a fast mitochondrial biogenesis is needed, we suppose that population is growing exponentially at a rate λ(w = 0, m = 0).For a doubling time of 1 day, this corresponds to c = ln 2/day − µ N ss , that for N ss = 3000 and µ = 0.07/day gives c = 2 • 10 −4 /day.Since c is not an independent parameter of our model, it is not present in the phenomenological wave speed formula Eq. ( 2).The parameters δ and k RA correspond to the two ways of reproducing the increase in mutant carrying capacity (or density), respectively for our model and for a model based on a replicative advantage.From the experimental data [48] reported in Fig. 2B, in high-heteroplasmy regions density is approximately increased by a factor F = 5, as reported also in Ref. [49].We therefore choose 5 as the point estimate of F and we assume that F has a uniform distribution on the range (2,8).As for δ, we notice that in our model the carrying capacity m * for mutants is obtained by setting in Eq. (S1) λ = µ and w = 0 gives, leading to m * = N ss /δ.The carrying capacity for wildtypes is obtained by setting λ = µ and m = 0 in Eq. (S1), leading to w * = N ss .Therefore, δ = 1/F and we chose δ = 1/5 as a point estimate.In a replicative advantage model, the mutants have a replication rate given by Eq. (S1) (with δ = 1) and the addition of a net replicative advantage k RA , namely In order to reproduce an F-fold increase of mutant carrying capacity, it must be k RA = cN ss (F − 1).Indeed, setting λ m = µ and w = 0 yields m * = k RA /c + N ss .Hence, we assume k RA to be uniformly distributed on the range (1, 7)cN ss and we choose k = 4cN ss as a point estimate, yielding as the mutant replication rate in an RA model.In Fig. 2F we show that stochastic survival of the densest yields a travelling wave of mutants even if mutants and wildtype replicate at the same rate and mutants are preferentially degraded.The mutant degradation rate is 10% higher than wildtype, which is µ = 0.07/day.In this simulation, for numerical convenience, we have used N ss = 4, in order to obtain a faster dynamics.Other parameters are δ = 2/3, γ = 0.12/day, c = 0.4/day.9 Stochastic survival of the densest can account for mutational load in short-lived mammals In the penultimate Result section we have claimed stochastic survival of the densest lowers the value of R mut required to yield a given mutant load through a neutral model.Here we explain and expand on this claim.Previous efforts to understand the ability of neutral genetic models to account for mammalian ageing have often focused on understanding the compatibility between mutant loads and the de novo mutation rate (R mut ) through mathematical modelling.The approach is to develop a model predicting observed amounts of mutant molecules as a function of mutation rate and then choose the R mut that matches experimentally observed (i.e.target) mutant loads.In Ref. [35] a neutral stochastic model of degradation and replication of mtDNA successfully predicted commonly accepted values of mutant loads in humans with mutation rates R mut which fall into accepted ranges.However, it was later argued that this mechanism cannot explain mutant loads for short-lived animals like rodents [25], which show similar accumulation patterns on much shorter timescales (≈ 3 years) [15,13].For the model to predict the observed mutant loads in short-lived animals, R mut must be so high that it would produce an unrealistically high mutational diversity, whereas experimental studies report a single deletion that clonally expands [34].It has therefore often been assumed that neutral stochastic models of mtDNA dynamics cannot explain the mutational load in short-lived animals.
However, previous neutral models neglected the spatial structure of muscle fibres and demanded the target mutant load be reached through pure random drift and continual addition of new sequence mutations.In this work we have shown that spatial structure induces a wave-like clonal expansion of mutants driven by stochastic survival of the densest.In Fig. 4B we have shown that this dramatically increases the chance that a mutation clonally takes over a macroscopic zone of a muscle fibre.The system simulated is a chain of N = 1001 regions, each with N ss = 4, initialised with a single mutation mtDNA molecule with δ = 0.1 in the central region serving as a founder mutation and no additional mutational events.This creates a high-heteroplasmy front, of mean height 0.30, meaning that the mutants reach 100% heteroplasmy around 30% of the time in an ensemble of 5 • 10 2 simulations.Early neutral models modelled muscle fibres as isolated, well-mixed populations of mitochondria [22,35], which implies constant mean heteroplasmy (red line in Fig. S1B).For our example, with 1001 × N ss = 4004 mtDNA molecules, the initial value of heteroplasmy after one mutation arises, would be 1 /1004, meaning that mutants would take over the fibre around 0.025% of the time.This simulation shows that spatial structure can dramatically increase the fixation probability of a founder mutation.The increase scales linearly with N , as it can be inferred by the previous example.Since a mice muscle fibre can contain up to thousands of nuclei (see next section), the fixation probability can be up to three orders of magnitude higher than previously thought.Therefore, fewer mutations will need to arise (lower R mut ) before a high-heteroplasmy region manifests itself.In the next section we make this argument quantitative.
More generally, what determines mutant load over time, and hence the progression of sarcopenia, is not only R mut , as assumed in early studies, but rather the speed of the wave of mutants.Different animals can have different mtDNA turnover rates, copy numbers, type of mutation and diffusion rate that, according to Eq. ( 2), affect the wave speed.Therefore different species would be expected to experience sarcopenia on different time scales.In particular, short-lived animals are predicted to have higher wave speeds for a larger proportion of fibres, since they have more glycolytic, low-copy-number fibres [62].
10 Revisited estimates of mutant loads and mutation rate R mut In the main text (penultimate Results section) and in the previous section we have shown how a given R mut can yield a mutant load that is much higher than the predictions of previous models (Fig. 4B) that ignore the spatial structure of the system.Here we show that misinterpretation of the available experimental data -caused by neglecting spatial structure -has led to wrong estimates of the mutant loads.
Most existing modelling studies [35,25,26] for humans and rodents have tried to match predictions to the following end-of-life target mutant loads: 5-10% fibres must have reached a heteroplasmy level of 60%.We remark that in these studies muscle fibres are modelled as a single region, with an unstructured and well-mixed mitochondrial population.In the following we refer to the the former quantity as threshold cell-fraction (TCF) and to the latter as threshold heteroplasmy (TH).The TH of 60% comes from a study on cybrid HeLa cells, stating that a 5196 base pair deletion needs to reach a cell-fraction of > 60% before the cell shows a reduction in COX activity [136].However, TH is cell-dependent [137] and this value needs not be relevant for skeletal muscle fibres in humans and rodents.The TCF of 5% [35,25] is often justified on the basis of three studies [18,138,139].In the first study, 5% is the upper bound of the range (0.1-5)% [18].The second study found a maximum of 0.37% COX-negative fibres in human limb muscle over a large age range [138].The last study [139] did observe high percentages of COX-negative muscle fibres (∼ 40%) but, it refers to a patient affected by a mitochondrial deletion disease, whereas we are interested in muscle ageing in healthy subjects.
We propose new estimates after examining data on skeletal muscle fibres of aged healthy mammals [50,48,140,18] and considering the spatial structure of muscle fibres.In these studies, hundreds of serial sections of length ∼ 10µm were cut along muscle fibres and stained for cytochrome c oxidase (COX) and succinate dehydrogenase (SDH) activities.Succinate dehydrogenase refers to complex II of the respiratory chain, the only complex fully encoded by the nucleus.Therefore, negative SDH activities indicate a nuclear defect, whereas normal or hyperactive activities and negative COX activities point towards mtDNA defects.An example of these data, from Ref. [48], is Fig. 2A, where additionally the heteroplasmy is measured for each section.Notice that, for simplicity, we have reported the spatial dimension as continuous, while in reality the measurements (the blue circles in the plot) refer to the section as a whole.
First, we suggest a TH of 90%, based on the fact that in Refs.[18,140] regions of skeletal muscle fibres containing mitochondrial dysfunctions were shown to contain > 90% of deletion mutations.Second, we claim that, in the light of a spatially-structured model, the TCF is not the relevant quantity to use as a target for simulations.Instead, we propose a new quantity, the probability P obs (t) of observing an abnormal region in a fibre at age t, which we estimate using data in Refs.[48,50], based on the following argument.For notational simplicity, we drop the time dependence, i.e.P obs = P obs (t).First, we make the following assumptions: distinct but nearby abnormal zones in a single fibre are the result of a single mutation event, this was not observed in the studies we investigated.
3. An abnormal zone is caused by a single mutation event, not by more than one mutation that expand and merge into a single zone.
In Ref. [48] segments of 2000 µm of 10652 human vastus lateralis (VL) muscle fibres of aged (92 years old) subjects were analysed, and 98 abnormal zones were found.In the context of our model and under our assumptions, this means that 98 mutations occurred and expanded to ∼ 90% in its original region as well as neighbouring ones, giving rise to a macroscopic abnormal zone.Based on our estimate of L = 30µm for the inter-region spacing (SI.8), the number of regions included in the measurements described above is N = 10652•2000 30 , meaning that the probability that a mutation arises in any region and subsequently spreads in an observable manner is 98/N ≈ 1.4 • 10 −4 .This probability is equivalent to P obs , the probability per region of observing an abnormal zone.In Refs.[50] segments of 1600µm of 2115 muscle fibres of 34-year-old rhesus monkeys vastus lateralis (VL) were analysed, and 51 abnormal zones were found.By the same reasoning, this corresponds to P obs = 0.05%.These estimates for rhesus monkeys and humans refer to subjects at the end of the typical lifespan for the respective species, hence our proposed end-of-life P obs are in the range ≈ Importantly, it is the spatial structure of our model, with its numerous regions composing a fibre, that leads to the introduction of P obs , whose estimates are orders of magnitude lower than those for the TCF.We can see this with a hypothetical example.Let us suppose that a study on human VL reports that 10% of examined fibres of an individual of age t show an abnormal zone at some point along their length.For simplicity let us suppose that the deletion has fixed in this abnormal zone.Previous studies modelled fibres as a single well-mixed population.For a population of size N ss , if we consider that each molecule has the same chance of fixing given by 1 /Nss as in the generalised Lotka-Volterra model, this requires that R mut t ≈ 0.1 (neglecting times to coalesce).This is because we expect N ss R mut t mutations to arise in a time t.Let us now take into account the spatial extension of the fibres.An average VL fibre of length ≈ 6.5cm is constituted by N = 6.5×10 4 30 ≈ 2200 regions, assuming that a region has length 30µm, see SI.8).A correct interpretation of the experimental observation is that in at least one of these regions a mutation arises and fixes with probability 0.1.This corresponds to a value of P obs such that 1 − (1 − P obs ) 2200 = 0.1, namely (for large N ) P obs = 0.1 /N ≈ 4.5 • 10 −5 .In Fig. S2B we show that in the context of stochastic survival of the densest a single mutation with δ = 0.05 has a chance of fixing > 1 /Nss = 0.25.We have performed other simulations for 3 ≤ N ss ≤ 20 in the range and 0.05 < δ < 0.5, finding that in all cases the chance of fixing of a single founder mutation is ≈ 1 /Nss.Therefore, we can conclude that in the spatially extended case R mut t ≈ P obs ≈ 0.1 /N, where we have assumed that R mut is so low that we can treat R mut t as a probability.We see that in this case the inferred mutation rate is three order of magnitudes lower than in the well-mixed population context.More generally, for large N , which is true for muscle fibres, we predict R mut t to be N times smaller than previously assumed.
In summary, we have presented two reasons why R mut could be a small (and so realistic) number.In this section we have argued that the literature has focused on the TCF but R mut does not need to be equal to this number -it is, in fact, several orders of magnitude smaller.In the previous section we have shown that, because of stochastic survival of the densest, a single founder mutation spreads with a macroscopic probability, and this leads to values of R mut much lower than in the well-mixed population picture for a given mutant load.In conclusion, neutral models based on random drift do not need an excessively high mutation rate to reproduce the observed mutant loads and can be a strong candidate to explain the expansion of mitochondrial deletions in muscle fibres.Figure S3: A deterministic model deployed on a chain does not produce mutant expansion despite differences in carrying capacity, while different choices of feedback control still produce a wave of mutants when stochasticity is present.(A) The deterministic ODE model of Eq. (S36) applied to a chain of regions does not produce a travelling wave of heteroplasmy, even with larger carrying capacity for mutants (δ < 1).Rather, we observe diffusion of the heteroplasmy front (with reflecting boundaries).The curves are the heteroplasmy values obtained by ODE integration of a system analogous to Eq. (S36) for 41 regions.Parameters are µ = 0.07/day γ = 0.1/day, δ = 0.1, c = 0.02/day, N ss = 3000.(B) In the deterministic chain, when the degradation rate of mutants is higher than that of wildtype, mutants go extinct despite a higher carrying capacity.Parameters as in (A), with mutant death rate being 1% higher than mutant.12 Relationship between steepness and speed of a travelling wave This section is based on Secs.13.2 and 13.3 of Ref. [54].Let us consider the classical Fisher-Kolmogorov reaction-diffusion equation where h is heteroplasmy, k is the reaction rate and D is the diffusion coefficient.This equation admits travelling wave solutions moving in the positive x-direction with speed c given by h(x, t) = h(z) = h(x − ct), having defined z = x−ct.If the mutants of our system had a replicative advantage k, Eq. (S48) would describe the wave-like expansion of the high-heteroplasmy front in the continuous-space approximation.
If the system is initialised such that h(x, 0) ∼ e −ax as x → ∞, for arbitrary a > 0, the heteroplasmy wavefront will evolve into an approximately sigmoidal shape with a steepness τ linked to the wave speed through meaning that the slower the wave, the steeper the wavefront (larger τ ).We speculate that waves driven by stochastic survival of the densest can be described by an equation analogous to Eq. (S48), with an effective reaction rate induced by stochastic survival of the densest, that we hypothesise is given by Eq. (S44).Therefore, also for our model we expect that steeper waves will be slower.In absence of a formal proof, we have verified this intuition computationally.In Fig. S4 we plot the travelling waves of heteroplasmy for two systems that differ only in copy number N ss .The blue curve corresponds to a system with a smaller N ss and hence, according to Eq. ( 2) presenting a faster wave; the orange curve corresponds to a larger N ss and hence a slower wave.As expected, the slower wave has a steeper wavefront.
In Fig. 3 we have reported experimental data on human (panels A and B) and rat (panels C and D) muscle fibres, showing that the steeper wavefront corresponds to the fibre with a larger N ss .We can conclude that the steeper wavefront corresponds to a slower wave.Hence, the data reported in Fig. 3 support the prediction of our model that slower waves travel in fibres with a larger N ss .

Rhesus monkeys data collection
The data plotted in Fig. 2C on the length of the abnormal zones in muscle fibres of rhesus monkeys was originally published in Ref. [50].The measurements were performed on skeletal muscle tissues of 11 different animals and the data set consists of the lengths of several abnormal zones for each subject.For each animal, we only use the length of the longest abnormal zone, that we assume has originated from a mutation at birth.We consider shorter regions to be the results of mutations that have arisen later in life.Although these assumption are unlikely to be precisely true, our analysis is enough to give an indicative order-of-magnitude estimate, that can be considered as a moderately tight lower bound on the speed of expansion.

Statistical methods
Error bars in Figs.1D and S1A, B are standard error of the mean (SEM).The parameters of the fits reported in Figs.2C and S2C are estimated via linear least squares and the uncertainty reported is the standard deviation.For the fits reported in Figs.2A and 3A, C non-linear least squares is used and the uncertainty is the standard deviation.The statistical test performed on the data in 3B, D is a one-tailed unequal-variances t-test (Welch's test).The wave speeds relative to Figs. 2D, E and those corresponding to the points in Fig. S2A have been calculated as explained in SI.5.

Figure 1 :
Figure 1: Stochastic survival of the densest can produce increases in the proportion of mutants even if they are subject to higher mitophagy rates than wildtypes.(A) Dysfunctional mtDNA mutants expand in muscle fibres with age in a wave-like manner, leading to defects in OXPHOS.(B) The one-region system fluctuates around the steady state line (red line), on average moving toward regions of higher m (average proportion of mutant h stays constant).(C) Differences in carrying capacity, noise and spatial structure (exchange of mtDNA between neighboring regions) lead to an increase in h , an effect we have termed stochastic survival of the densest.(D) In the stochastic two-region system, mean mutant fraction (heteroplasmy) increases in the neutral model (red) and even if mutants are preferentially eliminated (green).Parameter values in SI.7.Error bars are SEM.

Figure 4 :
Figure 4: Survival of the densest robustly predicts a wave speed two orders of magnitude closer to observations compared to a standard model based on replicative advantage, and predicts a greatly enhanced probability for the expansion of founder mutations than models lacking spatial structure.(A) Inserting probabilistic estimates of the model parameters (see SI.8) into Eq.(2) we find that survival of the densest predicts a distribution for the wave speed (red histogram) compatible with observations (blue), whereas a F wave driven by replicative advantage with the same model parameters is two orders of magnitude faster (grey).(B) The probability that a single founder mutation (time 0, blue spike) takes over the system is equal to the mean heteroplasmy at long times in an ensemble of simulations.This probability dramatically increases in a spatially-structured system with N regions (≈ 0.3, red line) and N ss mtDNA molecules per region with respect to a single well-mixed region with the same total copy number N N ss , which is how previous studies modelled skeletal muscle fibres[35,25,26], where it stays constant at the initial value(1/(N ss N ) ≈ 2.5•10 −4 , black line).Lines relative to previous times show the advance of the wave of heteroplasmy seeded by the founder mutation.Parameters are N = 1001, N ss = 4, δ = 0.1, µ = 0.07/day, γ = 0.1/day, c = 0.04/day.
Figure 4: Survival of the densest robustly predicts a wave speed two orders of magnitude closer to observations compared to a standard model based on replicative advantage, and predicts a greatly enhanced probability for the expansion of founder mutations than models lacking spatial structure.(A) Inserting probabilistic estimates of the model parameters (see SI.8) into Eq.(2) we find that survival of the densest predicts a distribution for the wave speed (red histogram) compatible with observations (blue), whereas a F wave driven by replicative advantage with the same model parameters is two orders of magnitude faster (grey).(B) The probability that a single founder mutation (time 0, blue spike) takes over the system is equal to the mean heteroplasmy at long times in an ensemble of simulations.This probability dramatically increases in a spatially-structured system with N regions (≈ 0.3, red line) and N ss mtDNA molecules per region with respect to a single well-mixed region with the same total copy number N N ss , which is how previous studies modelled skeletal muscle fibres[35,25,26], where it stays constant at the initial value(1/(N ss N ) ≈ 2.5•10 −4 , black line).Lines relative to previous times show the advance of the wave of heteroplasmy seeded by the founder mutation.Parameters are N = 1001, N ss = 4, δ = 0.1, µ = 0.07/day, γ = 0.1/day, c = 0.04/day.

Figure S2 :
Figure S2: An empirical formula accurately predicts the survival of the densest wave speed obtained through stochastic simulations for a wide range of parameter values.A) The blue dots are 110 values of wave speed measured through stochastic simulations of the linear feedback control for varying values of the parameters N ss , µ, γ and 0 < δ < 1.The black line is the linear fit against the variable x = (1 − δ) (µγ 3 ) 1 4 /N1 3 FigureS3: A deterministic model deployed on a chain does not produce mutant expansion despite differences in carrying capacity, while different choices of feedback control still produce a wave of mutants when stochasticity is present.(A) The deterministic ODE model of Eq. (S36) applied to a chain of regions does not produce a travelling wave of heteroplasmy, even with larger carrying capacity for mutants (δ < 1).Rather, we observe diffusion of the heteroplasmy front (with reflecting boundaries).The curves are the heteroplasmy values obtained by ODE integration of a system analogous to Eq. (S36) for 41 regions.Parameters are µ = 0.07/day γ = 0.1/day, δ = 0.1, c = 0.02/day, N ss = 3000.(B) In the deterministic chain, when the degradation rate of mutants is higher than that of wildtype, mutants go extinct despite a higher carrying capacity.Parameters as in (A), with mutant death rate being 1% higher than mutant.(C) Quadratic and (D) reciprocal feedback control produce a qualitatively similar wave-like expansion of mutants in a stochastic model.See SI.11 for definition of the feedbacks and parameter values.

11
Quadratic and reciprocal feedback produce a wave of heteroplasmyIn Fig.S3C,D we show that other types of feedback controls produce an analogous travelling wave of heteroplasmy without the need of a replicative advantage for mutants.The two additional types of controls are the quadratic control, namelyλ(w, m) = µ + a[N 2 ss − (w + δm) 2 ],(S46)and the reciprocal control, given by λ(w, m) = µ − b a and b are control strength parameters.Importantly, these control mechanisms all produce wave speeds that decrease when N ss increases, a fundamental difference between our model and models based on a net replicative advantage for mutants.The data reported in Fig.S3C,D are obtained averaging over 400 realizations with the following parameters.For quadratic feedback, Fig.S3Cand Eq. (S46): µ = 7 • 10 −2 /day, γ = 0.12/day, δ = 2 /3, a = 0.4/day, N ss = 4.For reciprocal feedback, Fig.S3Dand Eq. (S47): µ = 7•10 −2 /day, γ = 10 −2 /day, δ = 0.1, b = 1.5/day,N ss = 3.The small values of N ss are chosen for numerical convenience, since a smaller copy number produces a faster dynamics and faster waves.

Figure S4 :
Figure S4: The two curves are the travelling wavefronts of two systems that differ only in the average wildtype copy number N ss .The smaller the N ss , the faster the wave, according to Eqs. (S43) and (S44).The blue curve is the wavefront propagating in the system with the smaller N f ast ss = 2, while the blue curve is relative to N slow ss = 6.The plot shows that faster waves have a flatter wavefront, as it is the case for FK waves.