Wald’s martingale and the Moran process

Many models of evolution are stochastic processes, in which some quantity of interest fluctuates randomly in time. One classic example is the Moran birth-death process, where that quantity is the number of mutants in a population. In such processes we are often interested in the absorption (i.e. fixation) probabilities and the conditional distributions of absorption time. Those conditional time distributions can be very difficult to calculate, even for relatively simple processes like the Moran birth-death model. Instead of considering the time to absorption, we consider a closely related quantity: the number of mutant population size changes before absorption. We use Wald's martingale to obtain the conditional characteristic functions of that quantity in the Moran process. Our expressions are novel, analytical, and exact, and their parameter dependence is explicit, so it is easy to explore their properties in parameter space. We also use them to approximate the conditional characteristic functions of absorption time, and we state the conditions under which that approximation is particularly accurate. Martingales are an elegant framework for solving principal problems of evolutionary stochastic processes. They do not require us to evaluate recursion relations, so we can quickly and tractably obtain absorption probabilities and times of evolutionary stochastic processes.

Author summary

The Moran process is a probabilistic birth-death model of evolution. A mutant is introduced to an indigenous population, and we randomly choose organisms to live or die on subsequent time steps. Our goals are to calculate the probabilities that the mutant eventually dominates the population or goes extinct, and the distribution of the time it requires to do so. The conditional distributions of time are difficult to obtain for the Moran process, so we consider a slightly different but related problem: we instead calculate the conditional distributions of the number of times that the mutant population size changes before it dominates the population or goes extinct. We use a martingale identified by Abraham Wald to obtain elegant and exact expressions for those distributions. We then use them to approximate conditional time distributions, and we show when that approximation is accurate. Our analysis outlines the basic concepts of martingales and demonstrates why they are a formidable tool for studying probabilistic evolutionary models such as the Moran process.

Introduction

In a martingale, some quantity is conserved throughout a stochastic process. So if we know the value of that quantity at the beginning of the process, then we know it at the end. We can often calculate absorption (i.e. fixation) probabilities [8,13] and/or times [17,18] from it. Abraham Wald identified a powerful martingale for stochastic processes whose steps are independent and identically distributed [17,18]. Wald's martingale is the seminal result of sequential analysis [19,20]. From that martingale, he obtained absorption probabilities and the conditional characteristic functions (CFs) of absorption times. We will show that Wald's martingale can be applied to the Moran process if we discard time steps on which the mutant population size does not change. We can therefore obtain the conditional CFs of the number of times that the mutant population size changes before it achieves fixation or goes extinct, i.e. of the number of 'active steps' [21]. We will then use those CFs to approximate the CFs of the time to fixation and extinction, and we will show the conditions under which that approximation is particularly accurate. Our CFs for the number of mutant population size changes are novel, clean, and exact results. More generally, Wald's methodology demonstrates an elegant approach to investigating classic problems of evolutionary models [8,13,22].

Fig. 1 illustrates the problem considered by sequential analysis [19,20]. Let S_t be the cumulative sum of t realizations of independent and identically distributed (i.i.d.) random variables X ~ Pr(X) (Fig. 1, blue curves). Setting S_0 = 0, we write

S_t = \sum_{i=1}^{t} X_i.

On sequential time steps, we observe one realization of X and add it to the sum of all previous realizations (Fig. 1, red dots). S_t is then a random walk in time.

Fig 1. Schematic of sequential analysis. We initialize a cumulative sum to be S_0 = 0. On subsequent time steps, we make one observation of X ~ Pr(X). The X are i.i.d. on each time step (blue curves). For illustrative purposes, we horizontally shift Pr(X) given S_{t-1} (red dots); in this example, the peak of the blue curve is horizontally aligned with the red dot below it. The sequential sum S_t of observations of X is a random walk in time t. While S_t remains between two absorbing barriers a and b, we continue adding observations of X to S_t. We are interested in the probabilities that S_t hits either barrier first, and the distribution of the number of observations T it required to get there. In this example, S_T = a and T = 56.
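The sequential procedure just described is easy to sketch in code. The following is a minimal simulation, not the paper's own implementation; the barrier values and the ±1 step distribution are illustrative assumptions of ours.

```python
import random

def sequential_walk(a, b, step_sampler, rng):
    """Accumulate i.i.d. observations of X until the sum S_t meets or
    exceeds barrier a > 0 or b < 0. Returns (S_T, T)."""
    s, t = 0, 0
    while b < s < a:
        s += step_sampler(rng)
        t += 1
    return s, t

# Illustrative step distribution: X = +1 with probability 0.6, else -1.
rng = random.Random(0)
step = lambda r: 1 if r.random() < 0.6 else -1

runs = [sequential_walk(5, -5, step, rng) for _ in range(10000)]
frac_a = sum(s >= 5 for s, t in runs) / len(runs)  # estimate of Pr(S_T = a)
```

Because the steps are ±1, S_T hits the barriers exactly, which is the situation in which Wald's analysis is exact.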

Define two constant finite absorbing barriers a > 0 and b < 0 (vertical dashed lines, Fig. 1). While b < S_t < a, we continue making new observations of X, one per time step, and adding them to the sum. We stop this sequential process when S_t meets or exceeds either absorbing barrier (which happens in finite time [17]). We want to find the probabilities Pr(S_T = a) and Pr(S_T = b) that the process stops at each barrier, i.e. the 'absorption probabilities.' We also want to find the (conditional) distributions of the number of observations required to hit them, Pr(T | S_T = a) and Pr(T | S_T = b). Wald derived these quantities from a martingale [17,23,24]:

e^{S_t h} φ_X(h)^{-t},

where φ_X(h) is the moment generating function (MGF) of X and h is its independent variable. For the Moran process, φ_X(h) exists because X can only assume the finite values +1, 0, or -1.

The quantity E[e^{S_t h} φ_X(h)^{-t}] is conserved throughout a stochastic process whose steps are i.i.d. We illustrate this conservation by taking the expectation of both sides of Wald's martingale:

E[e^{S_t h} φ_X(h)^{-t}] = E[e^{S_{t-1} h} φ_X(h)^{-(t-1)}] = ... = E[e^{S_0 h}] = 1.

Doob's optional stopping theorem [24,25] states that a randomly-stopped martingale is still a martingale. Letting T be a random variable (the stopping time):

E[e^{S_T h} φ_X(h)^{-T}] = 1.    (1)

Eq. 1 is known as the fundamental identity in sequential analysis [17,20]. Evaluating Eq. 1 at the nonzero root h_0 of φ_X(h) = 1 gives e^{a h_0} Pr(S_T = a) + e^{b h_0} Pr(S_T = b) = 1. The process absorbs in finite time under weak conditions [17], so we can insert Pr(S_T = b) = 1 - Pr(S_T = a) and rearrange:

Pr(S_T = a) = (1 - e^{b h_0}) / (e^{a h_0} - e^{b h_0}).    (2)

Wald [17] also showed that we can obtain the conditional CFs of T from his martingale. Splitting the expectation in Eq. 1 over the two absorbing barriers:

E[e^{a h} φ_X(h)^{-T} | S_T = a] Pr(S_T = a) + E[e^{b h} φ_X(h)^{-T} | S_T = b] Pr(S_T = b) = 1.

Since φ_X(h) is convex (under weak assumptions [17]), its logarithm has two real roots in h, and its derivative is nonzero at those roots. Then for imaginary τ, -log φ_X(h) = τ has two complex roots h_1(τ) and h_2(τ) in the neighborhood of τ = 0.

February 18, 2020 4/21
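For a ±1 step distribution, the nonzero root h_0 of the MGF equation φ_X(h) = 1 satisfies e^{h_0} = q/p, so the absorption probability at the upper barrier can be evaluated in closed form. A minimal sketch (the function name and parameter values are ours, not the paper's):

```python
def wald_absorption_prob(p_up, a, b):
    """Pr(S_T = a) via the fundamental identity, for steps X = +/-1
    with Pr(X = +1) = p_up != 1/2. The nonzero root h_0 of
    phi_X(h) = 1 satisfies e^{h_0} = (1 - p_up) / p_up."""
    e_h0 = (1.0 - p_up) / p_up
    return (1.0 - e_h0 ** b) / (e_h0 ** a - e_h0 ** b)

prob = wald_absorption_prob(0.6, 5, -5)  # ~0.8836
```

For a = 1 and b = -1 this reduces to p_up itself, since the walk absorbs on its very first step.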

Inserting φ_X(h_1(τ)) = e^{-τ} and φ_X(h_2(τ)) = e^{-τ} into the expectations above (so that φ_X(h)^{-T} = e^{τT}), we obtain a system of two equations:

E[e^{a h_1(τ)} e^{τT} | S_T = a] Pr(S_T = a) + E[e^{b h_1(τ)} e^{τT} | S_T = b] Pr(S_T = b) = 1,
E[e^{a h_2(τ)} e^{τT} | S_T = a] Pr(S_T = a) + E[e^{b h_2(τ)} e^{τT} | S_T = b] Pr(S_T = b) = 1,

which can be written in terms of the conditional CFs of T, ψ_{T|a}(τ) = E[e^{τT} | S_T = a] and ψ_{T|b}(τ) = E[e^{τT} | S_T = b]:

e^{a h_1(τ)} ψ_{T|a}(τ) Pr(S_T = a) + e^{b h_1(τ)} ψ_{T|b}(τ) Pr(S_T = b) = 1,
e^{a h_2(τ)} ψ_{T|a}(τ) Pr(S_T = a) + e^{b h_2(τ)} ψ_{T|b}(τ) Pr(S_T = b) = 1.    (3)

We have two equations, so we can solve for both conditional CFs. Wald's analysis is exact when S_T hits the absorbing barriers a and b exactly [17], as in the Moran process. For an introduction to the Moran process, see [12].

Consider the Moran birth-death process with a fixed population size N, and mutants with a relative reproductive advantage of r with respect to indigenous individuals [11-13]. Let S_{t-1} be the mutant population size on time step t - 1, and let X_t be the change in the mutant population size on time step t. Then

S_t = S_0 + \sum_{i=1}^{t} X_i,

where S_0 is the initial mutant population size.

Let p_{X↑}, p_{X↓}, and p_{X0} represent the Moran process transition probabilities, and let F_{t-1} = rS_{t-1} + N - S_{t-1} represent the total fitness of the graph on time step t - 1. The distribution of X_t is:

p_{X↑} = Pr(X_t = +1 | S_{t-1}) = (r S_{t-1} / F_{t-1}) ((N - S_{t-1}) / N),
p_{X↓} = Pr(X_t = -1 | S_{t-1}) = ((N - S_{t-1}) / F_{t-1}) (S_{t-1} / N),
p_{X0} = Pr(X_t = 0 | S_{t-1}) = 1 - p_{X↑} - p_{X↓}.

We cannot directly apply Wald's martingale to the Moran process because the X_t are not independent: the distribution of X_t depends on the state S_{t-1}, and therefore on previous observations of X.
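These transition probabilities can be written as a short helper. This is a sketch assuming the standard well-mixed Moran birth-death rates (birth chosen proportional to fitness, death uniform at random); the function name is ours.

```python
def moran_step_probs(S, N, r):
    """Moran birth-death transition probabilities for the change X_t,
    given S mutants among N individuals and mutant relative fitness r.
    Assumes birth chosen proportional to fitness, death uniform."""
    F = r * S + (N - S)                    # total fitness F_{t-1}
    p_up = (r * S / F) * ((N - S) / N)     # mutant reproduces, resident dies
    p_down = ((N - S) / F) * (S / N)       # resident reproduces, mutant dies
    return p_up, p_down, 1.0 - p_up - p_down
```

Note that p_up / p_down = r in every transient state, which is what makes the active-step analysis below state-independent.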

Instead, consider the random variable Y_t ≡ X_t | (X_t ≠ 0). In doing so we eliminate time steps on which no change in the mutant population size occurs, i.e. we only consider 'active steps' [21]. Redefine S_t = S_0 + \sum_{i=1}^{t} Y_i, and let p_{Y↑} and p_{Y↓} represent the transition probabilities:

p_{Y↑} = p_{X↑} / (p_{X↑} + p_{X↓}) = r / (r + 1),
p_{Y↓} = p_{X↓} / (p_{X↑} + p_{X↓}) = 1 / (r + 1).

The transition probabilities of Y_t are independent and identical [4,6]. Therefore Wald's martingale applies to the sequence of active steps.

Typically we would set the Moran absorbing barriers to be a = N (fixation) and b = 0 (extinction), and we would set S_0 to be some value between them. However, Wald's analysis assumes that S_0 = 0, so we shift the barriers to a = N - S_0 and b = -S_0. The MGF of Y is φ_Y(h) = p_{Y↑} e^h + p_{Y↓} e^{-h}, and the nonzero root h_0 of φ_Y(h) = 1 satisfies e^{h_0} = p_{Y↓} / p_{Y↑} = 1/r. Inserting h_0, a, and b into Eq. 2:

Pr(S_T = a) = (1 - r^{-S_0}) / (1 - r^{-N}).    (4)

We recover the fixation probability of the Moran process [2,11-13].

We can generalize Eq. 4 to find the probability that the Moran process achieves any state a before b, starting from some S_0 between them. Call that probability α_{[b,S_0,a]}:

α_{[b,S_0,a]} = (1 - r^{-(S_0 - b)}) / (1 - r^{-(a - b)}).    (5)

We next use Eq. 3 to find the conditional CFs of the number of mutant population size changes to achieve absorption, C_T. We begin by solving φ_Y(h) = e^{-τ} for the two roots h_1(τ) and h_2(τ), which has the analytical solution:

e^{h_{1,2}(τ)} = (e^{-τ} ± sqrt(e^{-2τ} - 4 p_{Y↑} p_{Y↓})) / (2 p_{Y↑}).

The conditional CFs of C_T, ψ_{C_T|a}(τ) and ψ_{C_T|b}(τ), follow by inserting h_1(τ) and h_2(τ) into Eq. 3. We evaluated Eq. 3 for r = 0.9 (top plots) and r = 1.5 (bottom plots), and we used N = 10 and S_0 = 3 in all plots.
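Because the ratio p_{X↑}/p_{X↓} = r holds in every state, each active step is +1 with probability r/(1+r) regardless of the state, and the fixation probability can be checked with a simple simulation of active steps only. A sketch with illustrative parameter values:

```python
import random

def fixation_prob(S0, N, r):
    """Moran fixation probability (1 - r**-S0) / (1 - r**-N), r != 1."""
    return (1 - r ** -S0) / (1 - r ** -N)

def fixes(S0, N, r, rng):
    """Simulate active steps only: each change in the mutant count is
    +1 with probability r/(1+r), independent of the current state."""
    s = S0
    p_up = r / (1 + r)
    while 0 < s < N:
        s += 1 if rng.random() < p_up else -1
    return s == N

rng = random.Random(1)
est = sum(fixes(3, 10, 1.5, rng) for _ in range(20000)) / 20000
```

The Monte Carlo estimate est should land close to fixation_prob(3, 10, 1.5), roughly 0.716.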

The real (pink) and imaginary (gray) parts of Eq. 3 are shown by the thick solid traces in Fig. 3. The real and imaginary parts of the CFs are even and odd, respectively, and they pass through 1 and 0 at τ = 0. To verify our results, we simulated the Moran process 100,000 times with the stated parameter values. We stored whether the initial mutant population fixed or went extinct, and how many times the mutant population size changed before doing so. We applied the Fourier transform to our simulated results to compare them with our predicted CFs. Their match is excellent because our analysis is exact, and we performed sufficiently many simulations for the Moran process to converge closely to that solution.
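The verification procedure can be sketched as follows: simulate many runs, record the number of active steps C_T conditional on fixation, and compare the empirical conditional CF with the analytical one. Only the empirical side is shown here, with 20,000 runs rather than 100,000 for brevity; since C_T counts only changes, it suffices to simulate the active steps directly.

```python
import cmath
import random

def run_active(S0, N, r, rng):
    """One run counting active steps; returns (fixed?, C_T)."""
    s, c = S0, 0
    p_up = r / (1 + r)
    while 0 < s < N:
        s += 1 if rng.random() < p_up else -1
        c += 1
    return s == N, c

rng = random.Random(2)
runs = [run_active(3, 10, 1.5, rng) for _ in range(20000)]
c_fix = [c for fixed, c in runs if fixed]

def empirical_cf(samples, omega):
    """Empirical conditional CF: average of e^{i*omega*C_T} over runs."""
    return sum(cmath.exp(1j * omega * c) for c in samples) / len(samples)
```

By construction the empirical CF passes through 1 at ω = 0, with even real part and odd imaginary part, matching the symmetries visible in Fig. 3.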

Notice that the conditional CFs in the top row of Fig. 3 are similar to those in the bottom row. Therefore the CFs of C_T are not particularly sensitive to changes in r.

Comparing the left column with the right column, we see that the CFs of C_T do depend on which absorbing state we condition on.

In Fig. 5, the bottom x-axes correspond to ψ_{C_T|b}(τ) and ψ_{C_T|a}(τ), and the top x-axes correspond to ψ_{T|b}(τ) and ψ_{T|a}(τ). The bottom x-axis markings suggest that each pair of CFs differs mainly by a rescaling of the τ axis. If we can find those scaling factors κ_a (fixation) and κ_b (extinction), then we can approximate ψ_{T|a}(τ) and ψ_{T|b}(τ) from ψ_{C_T|a}(τ) and ψ_{C_T|b}(τ).
This observation indicates that the conditional CFs of T can be approximated as:

ψ_{T|a}(τ) ≈ ψ_{C_T|a}(κ_a τ),  ψ_{T|b}(τ) ≈ ψ_{C_T|b}(κ_b τ),

where κ_b and κ_a are scaling constants conditional on extinction or fixation, respectively. A natural choice for κ_b and κ_a is the expected waiting time per change, T_C, in the mutant population size, conditional on fixation or extinction. Then we approximate T as (number of mutant population size changes) · (expected time per change):

T | a ≈ C_T · E[T_C | a],  T | b ≈ C_T · E[T_C | b].

We now calculate κ_a = E[T_C | a] (the calculation of κ_b is analogous).
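The scaling constant can also be estimated directly from simulation as the ratio of the mean absorption time to the mean number of changes, conditional on fixation. A sketch with illustrative parameter values, useful only as a cross-check on the closed-form κ_a derived below:

```python
import random

def run_full(S0, N, r, rng):
    """Full Moran run tracking elapsed time T and active steps C_T."""
    s, t, c = S0, 0, 0
    while 0 < s < N:
        F = r * s + (N - s)
        p_up = (r * s / F) * ((N - s) / N)
        p_down = ((N - s) / F) * (s / N)
        u = rng.random()
        t += 1
        if u < p_up:
            s, c = s + 1, c + 1
        elif u < p_up + p_down:
            s, c = s - 1, c + 1
    return s == N, t, c

rng = random.Random(3)
fix = [(t, c) for ok, t, c in (run_full(3, 10, 1.5, rng) for _ in range(5000)) if ok]
kappa_a = sum(t for t, c in fix) / sum(c for t, c in fix)  # ~ E[T|a] / E[C_T|a]
```

Since each active step takes at least one time step, kappa_a is always at least 1.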

E[T_C | a] is the conditional expected time to absorption, E[T | a], divided by the conditional expected number of mutant population size changes, E[C_T | a].

The conditional expected time to fixation is:

E[T | a] = \sum_{S} E[V_S | a] E[T_V | a],

where V_S is the number of visits to state S_t, and T_V is the time spent per visit [6]. E[T_C | a] is then:

E[T_C | a] = \sum_{S} E[V_S | a] E[T_V | a] / E[C_T | a].    (7)

We now calculate the three conditional expectations in the summand.

The conditional expected number of visits to transient states. We calculate E[V_S | a] by noting that V_S | a is geometrically distributed [26]. Let p_{V|a} be the probability that the mutant population fixes without returning to the state S_t that it currently occupies, and let R_{S_t} = 0 denote that occurrence:

p_{V|a} = Pr(R_{S_t} = 0 | S_T = a).

The only way that the Moran process can fix and not return to S_t is that a) the process increases by 1, and b) it then fixes without returning. We find these two absorption probabilities by inserting the appropriate starting conditions and absorbing barriers into Eq. 5. For a) we insert S_0 = S_t, a = S_t + 1, and b = S_t - 1, which gives α_{[S_t - 1, S_t, S_t + 1]}; for b) we insert S_0 = S_t + 1, a = N, and b = S_t, which gives α_{[S_t, S_t + 1, N]}. Pr(S_T = a) is the fixation probability (Eq. 4), but starting from S_0 = S_t. p_{V|a} is then:

p_{V|a} = α_{[S_t - 1, S_t, S_t + 1]} α_{[S_t, S_t + 1, N]} / α_{[0, S_t, N]}.

Given these quantities, E[V_S | a] = (p_{V|a})^{-1} if a state was visited at least once before fixation. Since the Moran process fixes, we know that all states S_t ≥ S_0 will be visited at least once. But for S_t < S_0, we need the probability that the process arrives at state S_t at some time before it eventually fixes. Let p_{A|a} represent that probability, and let A_{S_t} = 1 represent that occurrence:

p_{A|a} = Pr(A_{S_t} = 1 | S_T = a).

By Bayes' rule, p_{A|a} is then:

p_{A|a} = (1 - α_{[S_t, S_0, N]}) α_{[0, S_t, N]} / α_{[0, S_0, N]}  for S_t < S_0,

and p_{A|a} = 1 for S_t ≥ S_0. We can now find E[V_S | a]:

E[V_S | a] = p_{A|a} / p_{V|a}.

The conditional expected time spent in transient states per visit. We obtain E[T_V | a] by noting that T_V | a is also geometrically distributed. Therefore E[T_V | a] = (p_{C|a})^{-1}, where p_{C|a} is the probability that the mutant population size changes on a time step, conditional on fixation:

p_{C|a} = p_{X↑}|a + p_{X↓}|a.

We find p_{X↑}|a by Bayes' rule, and we evaluate the likelihood by inserting S_0 = S_t + 1, a = N, and b = 0 into Eq. 5:

p_{X↑}|a = p_{X↑} α_{[0, S_t + 1, N]} / α_{[0, S_t, N]}.

We find p_{X↓}|a analogously:

p_{X↓}|a = p_{X↓} α_{[0, S_t - 1, N]} / α_{[0, S_t, N]}.

Inserting our expressions for p_{X↑} and p_{X↓} gives p_{C|a}, and hence E[T_V | a].

The conditional expected number of changes in the mutant population size.
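The generalized absorption probability α_{[b,S_0,a]} of Eq. 5 takes the closed form (1 - r^{-(S_0-b)}) / (1 - r^{-(a-b)}), and the Bayes-conditioned transition probabilities can then be sketched as follows (the helper names are ours):

```python
def alpha(b, S0, a, r):
    """Probability the process reaches state a before state b,
    starting from S0 (generalized fixation probability; r != 1)."""
    return (1 - r ** -(S0 - b)) / (1 - r ** -(a - b))

def step_probs_given_fixation(S, N, r):
    """Moran transition probabilities conditioned on eventual fixation,
    via Bayes' rule with alpha(...) as the likelihoods."""
    F = r * S + (N - S)
    p_up = (r * S / F) * ((N - S) / N)
    p_down = ((N - S) / F) * (S / N)
    fix_here = alpha(0, S, N, r)
    return (p_up * alpha(0, S + 1, N, r) / fix_here,
            p_down * alpha(0, S - 1, N, r) / fix_here,
            1 - p_up - p_down)  # staying put leaves the likelihood unchanged
```

Conditioning on fixation inflates the up-step probability and deflates the down-step probability, and the three probabilities still sum to 1: this is exactly the one-step recursion satisfied by the fixation probability.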
We can obtain E[C_T | a] from ψ_{C_T|a}(τ), i.e. from the slopes of the black lines in Figs. 3 and 4. We can also obtain it by summing E[V_S | a] over all transient states:

E[C_T | a] = \sum_{S} E[V_S | a].

We can now insert E[V_S | a], E[T_V | a], and E[C_T | a] into Eq. 7 to obtain κ_a. The resulting approximations are compared with simulation results in Fig. 6. That match is closer in some plots than it is in others. The top row of Fig. 6 shows that, when S_0 = 2, our approximation for ψ_{T|a}(τ) matches simulation results very accurately, but our approximation for ψ_{T|b}(τ) differs from simulation results noticeably, particularly further from the origin. The bottom row of Fig. 6 shows that, when S_0 = 8, the converse is true.

Sojourn CFs establish approximation accuracy

We can show why the accuracy of our approximation for conditional absorption times depends on S_0 and on where absorption occurs. Consider the conditional CFs ψ_{T_S|a}(τ) and ψ_{T_S|b}(τ) of the time T_S that the Moran process spends in each transient state; the 'sojourn CFs' [5,16]. We can find ψ_{T_S|a}(τ) and ψ_{T_S|b}(τ) by noting that T_S = V_S T_V, i.e. the total time spent in a state is the number of visits to that state multiplied by the time spent per visit. Since the T_V are i.i.d.:

ψ_{T_S|a}(τ) = E[ψ_{T_V|a}(τ)^{V_S} | S_T = a].

We can evaluate this conditional expectation. Using the tower property to split it, conditional on whether the process arrived at the transient state or not (i.e. A = 0 or A = 1):

ψ_{T_S|a}(τ) = (1 - p_{A|a}) E[ψ_{T_V|a}(τ)^{V_S} | A = 0, a] + p_{A|a} E[ψ_{T_V|a}(τ)^{V_S} | A = 1, a].

The first conditional expectation on the right hand side is 1, since V_S = 0 for states where the process never arrives. The second conditional expectation is a geometric sum that we can evaluate:

E[ψ_{T_V|a}(τ)^{V_S} | A = 1, a] = p_{V|a} ψ_{T_V|a}(τ) / (1 - (1 - p_{V|a}) ψ_{T_V|a}(τ)).

Finally, T_V | a is also geometrically distributed, and the CF of a geometrically-distributed random variable is:

ψ_{T_V|a}(τ) = p_{C|a} e^{τ} / (1 - (1 - p_{C|a}) e^{τ}).

ψ_{T_S|b}(τ) is calculated analogously.

When conditioned on extinction (Fig. 7, second row), ψ_{T_S|b}(τ) differs substantially among the states S_t. T | b then depends not only on C_T | b but also on which states the process occupied, and how many times each state was visited, before extinction. Our approximation T | b ∝ C_T | b will be less accurate because T | b is sensitive to the specific path to extinction, as well as to the length of the path (i.e. the value of C_T | b). Collectively, these results suggest that we can accurately approximate ψ_{T|a}(τ) when S_0 is small, or ψ_{T|b}(τ) when S_0 is large.
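The geometric CF is easy to validate against a Monte Carlo estimate. A minimal sketch using the e^{iωT} convention; the success probability 0.4 and frequency 0.5 are arbitrary illustrative values:

```python
import cmath
import random

def geometric_cf(p, omega):
    """CF of a geometric variable on {1, 2, ...} with success prob p:
    E[e^{i*omega*T}] = p e^{i*omega} / (1 - (1 - p) e^{i*omega})."""
    z = cmath.exp(1j * omega)
    return p * z / (1 - (1 - p) * z)

def sample_geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    t = 1
    while rng.random() >= p:
        t += 1
    return t

rng = random.Random(4)
samples = [sample_geometric(0.4, rng) for _ in range(20000)]
emp = sum(cmath.exp(1j * 0.5 * t) for t in samples) / len(samples)
```

emp should agree with geometric_cf(0.4, 0.5) to within Monte Carlo error, and the CF equals 1 at ω = 0 as every CF must.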

Notice that some sojourn CFs in Fig. 7 appear very similar to each other.

There are several ways to handle the conditional distributions of absorption time. First, we can settle for approximations to the problem [4]. Second, we can settle for closed-form but intractable expressions for those distributions [3] and numerically evaluate them. Third, we can tweak the problem itself: rather than calculate T, we can exactly and easily calculate C_T, and observe that those distributions are related to each other. The conditional, general, and exact distribution of T might be difficult to obtain, but we can get a tractable proxy for it.

Absorption statistics of Markov chains are typically calculated inductively [27,31]. We can calculate them by brute force if the size of the state space is small enough [5,31,32,37]. But quite often they are written using sums of products over all of state space [4,5,33-36]. If we consider more complicated evolutionary stochastic processes, e.g. an evolutionary graph with many partitions [8,11,38], Markov chains quickly become an infeasible method of analysis. Martingales circumvent these issues because they are conservation statements, so we do not need to evaluate recursions over state space.

Other popular approaches to studying evolutionary stochastic processes include simulation [34,39-41] and approximation [11,37,42,43]. While both approaches are useful, they also have drawbacks. In principle, any well-defined stochastic process can be studied by simulating it enough. But simulation results cannot be generalized beyond their specific parameter settings, and computation time can prohibit us from evaluating a stochastic process over all of parameter space, particularly if the dimensionality of parameter space is high. Approximations can yield tractable analytical expressions for quantities of interest, but they require us to take a special limit such as large population size [11] or weak selection [37,42,43]; such results are only valid in those particular limits. Martingales can provide exact results that are valid over all parameter space without requiring simplifying assumptions.

The key step is to find a quantity whose expectation is independent of the state of an evolutionary stochastic process. That quantity is then conserved throughout the process, so we can use Doob's optional stopping theorem [24,25] to extract statistics of interest. There are a couple of ways to find such quantities. We can modify the process such that transition probabilities are state-independent, as we have done here by discarding time steps where the mutant population size remains unchanged. Alternatively, we can show that the expectation of some state-dependent quantity always equals 1 (a product martingale) or 0 (a sum martingale), regardless of the state [8,13]. Finding such a quantity can be difficult, in