Diffusion approximations in population genetics and the rate of Muller's ratchet

Diffusion theory is a central tool of modern population genetics, yielding simple expressions for fixation probabilities and other quantities that are not easily derived from the underlying Wright-Fisher model. Unfortunately, the textbook derivation of diffusion equations as scaling limits requires evolutionary parameters (selection coefficients, mutation rates) to scale like the inverse population size -- a severe restriction that does not always reflect biological reality. Here we note that the Wright-Fisher model can be approximated by diffusion equations under more general conditions, including in regimes where selection and/or mutation are strong compared to genetic drift. As an illustration, we use a diffusion approximation of the Wright-Fisher model to improve estimates for the expected time to fixation of a strongly deleterious allele, i.e. the rate of Muller's ratchet.


I. INTRODUCTION
The Wright-Fisher (WF) model captures in simple mathematical form the combined effects of natural selection, mutation, and genetic drift, and therefore plays a foundational role in quantitative evolutionary theory and population genetics. Unfortunately, the main quantities of interest such as fixation probabilities cannot be computed in closed form within this model. An established approximation strategy relates the discrete WF model to a continuous diffusion process in allele frequency space obtained via a scaling limit in which the population size and timescale go to infinity in a correlated manner. This scaling transformation was first used by Fisher [1], Wright [2] and Kimura [3], and more rigorous expositions were given by Kolmogorov [4], Malécot [5] and Feller [6]. This approach plays a central role in most textbooks expositions [7][8][9][10], and it has been said that "the standard diffusion approximation has permeated the field so thoroughly that it shapes the way in which workers think about the genetics of populations" [11].
A key limitation of the WF diffusion equation, however, is its narrow range of applicability: for the deterministic forces of selection and mutation to survive the scaling limit, selection coefficients and mutation rates must scale like the inverse effective population size 1/N . This condition is neither biologically well motivated (selection coefficients and mutation rates do not covary with population size) nor universally applicable (viral populations, for instance, can have large effective sizes O(10 7 ) [12]). Perhaps because the scaling limit of the WF model is often presented as the diffusion approximation of population genetics, diffusion methods are sometimes believed to be restricted to the weak selection-weak mutation regime of evolution: "In order to use diffusion approximations it is necessary that the mutation rates be of no larger order of magnitude than 1/N " [13]. Outside this regime, the common practice is to either assume neutral evolution and effectively set all selection coefficients and mutation rates to zero, as in Kimura's original work [3], or to use deterministic equations [14], i.e. to neglect genetic drift altogether.
Nevertheless, there are important cases where genetic drift is small compared to selection and mutation, and yet plays a key role in driving evolutionary change. An example is Muller's ratchet [15,16], viz. the irreversible accumulation of slightly deleterious mutations in finite asexual populations. This process has been invoked in the evolution of mitochondria [17], RNA viruses [18], Y chromosomes [19], ageing [20], cancer [21], and sex [22]. Obtaining analytical expressions for the mean click time of the ratchet is therefore an important goal for theory.
At first sight, diffusion theory seems ideally suited for this problem: Muller's ratchet is akin to a fixation problem for a two-type Wright-Fisher process, where one type represents unloaded alleles, and the other type comprises all other, deleterious mutations [19,23,24].
However, recent works have reinforced the belief that diffusion theory is inapplicable under finite selection strengths [25,26]. An emerging consensus is that diffusion theory just does not apply to Muller's ratchet, at least not in the slow click regime: "The fact that the extinction of the fittest class is due to [. . . ] a rare, large fluctuation and not [. . . ] a typical fluctuation prohibits simple diffusive treatments of the ratchet" [26]. 1 This paper has two objectives. First, we stress that diffusion methods in population genetics do not in fact require s, u, v ∼ N −1 (s selection coefficient, u, v mutation rates) to correctly capture the interplay between mutation and selection in large populations; an alternative form of the WF diffusion equation, obtained as an interpolation of the WF process rather than as a scaling limit, provides an excellent approximation valid under the more general conditions s, u, v 1 N . The fact that diffusion theory can be used to derive more general results than the usual classical Wright-Fisher diffusion equation has been noted in the past (see e.g. [11] and references therein), but has not been sufficiently appreciated in our view. Second, we use the alternative approximation by diffusion to derive a new analytical formula for the mean fixation time of a deleterious allele, a central quantity in the context of Muller's ratchet. We find that not only can diffusion theory be applied to this problem, it actually provides a better approximation than previous estimates in [25,26] based on WKB expansions.

II. SCALING LIMIT VS. INTERPOLATION
We focus on the haploid WF model for two types A and B. For a population with effective size N , this is the Markov chain on the set {0, 1/N, · · · , (N − 1)/N, 1} defined by the binomial sampling formula N X n+1 ∼ Binom(N, p(X n )).
Here X n denotes the frequency of A types at generation n, and p(x) represents the probability that an offspring is of type A given that the frequency of A in the parental population is x.
The form of the function p(x) depends on the specifics of mutation-selection dynamics. For instance, if A has (1 + s) times more offspring than B, and the probability of a mutation Other functional forms are more appropriate for viability selection, etc.
In the limit of large population sizes, the standard diffusion approximation considers the WF model on the slow timescale T = nN , and assumes that selection and mutation are the WF diffusion equation, Eq. (4) below.
where dW T denotes a standard Wiener process and the Ito convention for stochastic differential equations is used. When brought back to the fast timescale t = T /N , this process Eq. (4) above is the "WF diffusion equation" found in textbooks. By construction, this approximation relies on the condition that all evolutionary parameters scale as 1/N when N is taken to infinity, corresponding to the "weak selection-weak mutation" regime of evolution [7]. A rule of thumb due to Nei states that selection can be considered small if N s 2 1 [25,27].
There is, however, another way to approximate the WF model with a diffusion process.
By the central limit theorem, Eq. (1) may be written as where ν approaches a standard normal variable if N 1. The latter equation is the Euler-Maruyama discretization of the Ito diffusion equation at times t = 0, 1, · · · . If the fraction X t changes only slightly during each time increment (6) is an accurate interpolation of the original process, Eq. (1). We simplify Eq. (6) further as Versions of Eq. (7) appear in e.g. [24,28], but, to our knowledge, have never been used to derive analytical results valid beyond the weak selection-weak mutation limit of evolutionary dynamics. A mathematical discussion of neutral diffusions at large mutation rates can be found in [29,30].

Stationary distributions
That the diffusive approximation in Eq. (7) has a broader domain of validity than the standard WF diffusion in Eq. (4) can be seen by comparing their stationary distributions.
We recall that, given a stochastic differential equation of the kind described above, a stationary distribution P eq exists as long as both boundaries, x = 0 and x = 1, are reflecting (v = 0, u = 0) and can be calculated as [31] In Fig. 1 we compare the analytical expressions from both diffusive approximations for P eq with simulations of the WF process with p(x) from Eq. (2). Results are shown for a range of parameters s, u, v, both within and outside the weak selection-weak mutation regime defined In Fig. 2 we compare the relative errors of mean establishment times t est between WF simulations and predictions derived from the two diffusive approximations (Eq. (4) and Eq. (7)). Details for the calculations are found in the Appendix. Here, again, the relative errors for the prediction of the WF diffusion increase outside the weak selection regime s 2 > 1/N , unlike the estimates derived from the diffusive approximation by interpolation (Eq. (7)). Details of the calculations can be found in the Appendix A. is commonly referred to as the slow click regime. Although significant progress was made in the last decade [33], Haigh's model is difficult to study analytically, and a common approach is to approximate it by a two-type WF model [24][25][26]. In this simpler model, one class corresponds to the fittest (unmutated) individuals, with unit fitness, and all other mutants are collectively assigned a reduced fitness (1 − s) with s = (1 − e −U )/(1 − e −U/S ); the effective mutation rate away from the fittest class is then u = 1 − e −U . Assuming fecundity selection following [25,26], we arrive at the WF model in Eq. (1) with sampling probability . In contrast to Eq. (2) the selection coefficient is redefined as a coefficient of negative selection, hence the normalisation term of mean fitness is adjusted accordingly.
We used classical expressions for hitting times of one-dimensional diffusions [7] combined with Laplace's approximation of integrals to derive from Eq. (7) an analytical expression for the expected time between clicks of the ratchet; technical details are provided in the Appendix. On a logarithmic scale, this expression reads compared to the approximation that is found by [26] log E(T click ) ∼ 2N [s − u + u log(u/s)], (11) and corresponds to the expected hitting time of the WF diffusion (Eq. (4) instead of Eq. (7)). Fig. 3 compares the accuracy of our result with earlier heuristic diffusion approximations [19,34] as well as more recent WKB expansions of Moran models [26]. The near-perfect agreement in the large N limit-including when S, U 1/N -confirms that diffusion methods are more general than scaling limits and can be successfully applied outside the weak selection-weak mutation regime of evolution.

V. CONCLUSION
Because it does not depend on the mode (fecundity or viability) of selection, on generations being overlapping or non-overlapping, etc., the classical WF diffusion equation (4) where in our setting a(x) = p(x) − x and b(x) = x(1 − x)/N .
We are interested in the expected time for the stochastic variable x to hit the boundary at x = 0, given an initial position x = x 0 (to be specified below). In the scope of diffusion theory this is referred to as the mean first passage time t(x 0 ), which is described by a second order differential equation [31] a(x 0 ) dt(x 0 ) dx 0 Hence, the exponential integral is approximately reduced to a Gaussian integral on a bounded interval, which can be evaluated in terms of error functions.
For the ratchet click time in the slow-clicking regime, we assume the initial position to be precisely at the deterministic equilibrium, i.e. where a(x 0 ) = x 0 which means the initial point x 0 and the critical point x c of the potential Ψ(x) coincide, x 0 = x c = 1 − u/s. It is from this, that the condition for the slow clicking regime can be deduced, namely u < s, which implies the existence of a deterministic equilibrium.
We arrive finally at the following expression for the expected click time in terms of our evolutionary parameters N, s, u On a logarithmic scale, Eq. (A7) yields in the limit of large populations in accordance with Eq. (10) from the main text.
An analogous procedure, in which we replace a(x) = sx(1−x)−ux from the WF diffusion, Eq. 4, can be used to derive an estimate for Muller's ratchet rate from the WF diffusion For σ < 1 − u s , where σ 2 = u 2N s 2 denotes the variance of the Gaussian, the approximation can be reduced to which is the same equation as derived by [26] through WKB approximations of the associated Moran process (after a rescaling of time and population size t → t /2N and N → 2N ). The result in Eq. (A10) coincides on a logarithmic scale with Eq. (11), as described in the main text.

Fixation Probability and Fixation Times
Additionally, we here report the comparison of the estimated fixation probabilities and fixation times for the Wright-Fisher model without mutations u = v = 0. In Fig. 4 we compare both diffusive approximations, Eq. (4) and Eq. (7), as well as the formula reported in [39]. We notice that the corrections in the strong selection regime are minor. Analytical formulae are compared to mean fixation probabilities from a total of 10 8 (for N = 100) and 10 7 (for N = 1000) Wright-Fisher simulations (Eq. (1)). The estimate from the WF diffusion, Eq. (4) is the classical formula for the fixation problem as presented by Crow and Kimura [40] More recently, it was shown in [39] that is slightly more accurate that the Crow-Kimura formula, among other good properties.
Making use of the diffusive scheme by interpolation, Eq. (7), given an initial point x = x 0 , the probability of fixation at x = 1 is approximated by which reduces to the classical fixation problem for x 0 = 1/N .
Using Eq. (A2) with the appropriate boundary conditions t(0) = t(1) = 0 (two absorbing boundaries), the mean first hitting time at x = 1, conditional on fixation at x = 1, with initial condition x = x 0 can be computed through the following integral [31] t(x 0 ) = π 0 (x 0 ) π 1 (x 0 ) with Φ(x) = x −2a(z) b(z) dz. Herein, π 1 (x) and π 0 (x) denote the fixation probabilities at x = 1 and x = 0, respectively, as a function of an initial point x 0 . The relative errors in estimates deriving from the differing a(x) in the WF diffusion (Eq. (4)) and the diffusive approximation by interpolation (Eq. (7)) are compared in Fig. 5 relative to mean times of fixation from WF simulations for varying parameters s, N .

Mean establishment time with reversions
As presented in Fig. 2  are compared. These formulae can be derived equivalently to the above described fixation times. The probability of establishment, given an inital point x 0 , is given by where as before Φ(x) = x −2a(z) b(z) dz and x c denotes the metastable state defined by p(x c ) = x c .
From this the expected time to establishment , assuming the initial frequency x 0 = 1 N , is computed by (A16)

WF simulations
Stochastic simulations were run on Wolfram Mathematica according to the dynamics from Eq. (1), while varying the evolutionary parameters N, s, u, v as given in the main text.
Due to limited computation times only a constrained set of parameters are tested that do not exceed computational feasibility.