bioRxiv
New Results

Genealogical distances under low levels of selection

Elisabeth Huss, Peter Pfaffelhuber
doi: https://doi.org/10.1101/495770
Elisabeth Huss
University of Freiburg
Peter Pfaffelhuber
University of Freiburg
  • For correspondence: p.p@stochastik.uni-freiburg.de

Abstract

For a panmictic population of constant size evolving under neutrality, Kingman’s coalescent describes the genealogy of a population sample in equilibrium. However, for genealogical trees under selection, not even the expectations of the most basic quantities, such as the height and length of the resulting random tree, are known. Here, we give an analytic expression for the distribution of the total tree length of a sample of size n under low levels of selection in a two-alleles model. We prove that trees are shorter than under neutrality under genic selection and if the beneficial mutant has dominance h < 1/2, but longer for h > 1/2. The difference from neutrality is 𝒪(α²) for genic selection with selection intensity α and 𝒪(α) for other modes of dominance.

1 Introduction

Understanding population genetic models, e.g. the Wright-Fisher or the Moran model, can be achieved in various ways. Classically, allelic frequencies are described by diffusions in the large population limit, and for simple models such as two-alleles models, the theory of one-dimensional diffusions leads to predictions for virtually all quantities of interest (Ewens, 2004). Moreover, starting with Kingman (1982) and Hudson (1983), genealogical trees started to play a big role in the understanding of the models as well as of DNA data from a population sample. Most importantly, all variation seen in data can be mapped onto a genealogical tree. Under neutral evolution, the mutational process is independent of the genealogical tree. As a consequence, the length of the genealogical tree is proportional to the total number of polymorphic sites in the sample.

Genealogies under selection have long been an interesting object to study (see e.g. Wakeley, 2010 for a review). Starting with Krone and Neuhauser (1997) and Neuhauser and Krone (1997), genealogical trees under selection could be described using the Ancestral Selection Graph (ASG). In addition to coalescence events, which indicate joint ancestry of ancestral lines, selective events affect the genealogy in the following way: First, going backward in time, splitting events indicate possible ancestry. Since fit types are more likely to produce offspring in selective events, they are more likely to be true ancestors in such splitting events. However, true ancestry can only be decided once the ancestral types are known. So, in a second stage, going forward in time through the ASG, types are fixed and it can be decided which of the possible ancestors is true. The disadvantage of these splitting events is that they make this genealogical structure far more complicated to study than the coalescent for neutral evolution.

In recent years, much progress has been made in the simulation of genealogical trees under selection. Mostly, these simulation algorithms use the approach of the structured coalescent, which is based on Kaplan et al. (1988). Here, the allelic frequency path is generated first, and conditional on this path, coalescence events are carried out. (See also Barton et al., 2004 for a formal derivation of this approach.) Simulation approaches based on this idea include the inference method by Coop and Griffiths (2004), msms by Ewing and Hermisson (2010), and discoal by Kern and Schrider (2016). However, the structured coalescent approach has been used in only a few studies in order to obtain analytical insights (see e.g. Taylor, 2007).

Recently, genealogies under selection have been studied by Depperschmidt et al. (2012) using Markov processes taking values in the space of trees, i.e. the genealogical tree is modelled as a stochastic process which is changing as the population evolves. As for many Markov processes, the equilibrium can be studied using stationary solutions of differential equations. In our manuscript, we will make use of this approach in order to compute an approximation for the total tree length under a general bi-allelic selection scheme, which is assumed to be weak; see Section 4. Our results are extensions of Theorem 5 of Depperschmidt et al. (2012), where an approximation of the Laplace-transform of the genealogical distance of a pair of individuals under bi-allelic mutation and low levels of selection was computed.

The paper is structured as follows: In Section 2, we introduce the model we are going to study, i.e. genealogies in the large population limit for a Moran model under genic selection, incomplete dominance, or over- or under-dominant selection. The last three cases we collect under the term other modes of dominance. We give recursions for the Laplace-transform (and the expectation) of the tree length of a sample in Theorem 1 and Corollary 4 for genic selection, and in Theorem 2 and Corollary 7 for other modes of dominance. In Section 3, we discuss our findings and also provide some plots, based on numerical solutions of the recursions, of the change of tree lengths under selection. Section 4 gives some preliminaries for the proofs. In particular, we give a brief review of the construction of evolving genealogies from Depperschmidt et al. (2012). Finally, Section 5 contains all proofs.

2 Model and main results

We will obtain approximations for the tree length under selection. While Theorem 1 and its corollaries describe the case of genic selection, Theorem 2 and its corollaries deal with other modes of dominance. All proofs are found in Section 5.

Genic selection

Consider a Moran model of size N, where every individual has type either • or ∘, selection is genic, type • is advantageous with selection coefficient α, and mutation is bi-directional. In other words, consider a population of N (haploid) individuals with the following transitions:

  1. Every pair of individuals resamples at rate 1; upon such a resampling event, one of the two individuals involved dies, the other one reproduces.

  2. Every line is hit by a mutation event from ∘ to • at rate θ∘ > 0, and by a mutation event from • to ∘ at rate θ• > 0.

  3. Every line of type • places an offspring on a randomly chosen line at rate α.
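The three transitions above can be sketched as a Gillespie-type simulation of the number of • individuals. This is only an illustration under stated assumptions: the function and parameter names (`simulate_moran`, `rate_to_beneficial`, `rate_to_deleterious`) are hypothetical, the two mutation rates are passed by direction rather than by symbol, and a selective offspring is assumed to be placed on a line chosen uniformly among all N lines.

```python
import random

def simulate_moran(N, alpha, rate_to_beneficial, rate_to_deleterious, t_max, seed=0):
    """Simulate the number k of beneficial (•) individuals up to time t_max.

    Population-wide jump rates when k individuals are beneficial:
      1. resampling: each of the k*(N-k) mixed pairs resamples at rate 1 and
         either member reproduces with probability 1/2;
      2. mutation: each deleterious line becomes beneficial at rate
         rate_to_beneficial, each beneficial line becomes deleterious at rate
         rate_to_deleterious;
      3. selection: each beneficial line places an offspring on a uniformly
         chosen line at rate alpha (assumption: chosen among all N lines).
    Returns the frequency k/N at time t_max.
    """
    rng = random.Random(seed)
    k = N // 2
    t = 0.0
    while True:
        mixed_pairs = k * (N - k)
        up = 0.5 * mixed_pairs + rate_to_beneficial * (N - k) + alpha * k * (N - k) / N
        down = 0.5 * mixed_pairs + rate_to_deleterious * k
        total = up + down
        if total == 0.0:          # absorbed (only possible without mutation)
            break
        t += rng.expovariate(total)
        if t > t_max:
            break
        if rng.random() < up / total:
            k += 1
        else:
            k -= 1
    return k / N
```

With these rates, the expected change of the frequency per unit time due to selection is alpha * x * (1 - x), in line with the selection term stated for X in the text.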

Let X denote the frequency of • in the population. Mutation leads to an expected change dX of Embedded Image for Embedded Image and Embedded Image per time dt, and selection of αX(1 − X)dt. Recall that X follows – in the limit N → ∞ – the SDE Embedded Image for some Brownian motion W; see e.g. (5.6) in Ewens (2004).

In the sequel, we rely on the possibility of studying the genealogical tree of a sample taken in the large population limit of a Moran model in equilibrium. Since the time-point of sampling is the same for all ancestral lines, the resulting tree is ultra-metric and is given by the genealogical distances between all pairs of individuals in the sample. In addition, marks on the tree describe mutation events from • to ∘ or from ∘ to •; see also Figure 1. This possibility is implicitly assumed in the ancestral selection graph of Neuhauser and Krone (1997), and formally justified by results obtained in Depperschmidt et al. (2012); precisely, their Theorem 4 states that the genealogical tree under selection has a unique equilibrium.

Figure 1:

A population of size N = 5 following Moran dynamics with resampling, mutation, and genic selection in equilibrium. In the graphical representation on the left, bullet points indicate mutation events from the beneficial • type to ∘ and back. Grey arrows indicate neutral resampling events. Selective events are given through black arrows and only take effect if the arrow starts at the beneficial (black) type. Selective arrows starting with the deleterious type cannot be used and are indicated by a cross. Here, the full tree of all individuals in the model is drawn on the right for two points in time, but it is also possible to study the tree of a population sample.

We will write ℙα[.] for the distribution of genealogical trees taken from the large population limit of Moran models under the selection coefficient α and 𝔼α[.] for the corresponding expectation. In particular, ℙ0[.] and 𝔼0[.] are reserved for neutral evolution, α = 0. Within the genealogical tree, we pick a sample of size n and let Embedded Image We note that in the absence of selection, Ln does not depend on the mutational mechanism and Embedded Image, where Embedded Image, k = 2, …, n are the coalescence times in the tree; see e.g. (3.25) in Wakeley (2008). In particular, for λ ≥ 0, Embedded Image with f1 = 1 since the empty product is defined to be 1. We are now ready to state our first main result, which gives a recursion for an approximation of the Laplace-transform of the tree length under selection for small α.
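As a numerical companion to the neutral case, the following sketch evaluates the Laplace-transform and the expectation of the neutral tree length. The displayed formulas are not reproduced in this copy, so the code rests on standard facts about Kingman's coalescent that are assumed to match them: with every pair coalescing at rate 1, the time with k lines is exponential with rate k(k − 1)/2, these times are independent, and the tree length is the sum of k times the k-th coalescence time. The function names are hypothetical.

```python
def neutral_laplace(n, lam):
    """E_0[exp(-lam * L_n)] under neutrality, pair coalescence rate 1."""
    value = 1.0                              # empty product for n = 1
    for k in range(2, n + 1):
        rate = k * (k - 1) / 2               # total coalescence rate with k lines
        value *= rate / (rate + lam * k)     # Laplace-transform of k * T_k
    return value

def neutral_expected_length(n):
    """E_0[L_n] = sum over k of k * E[T_k] = 2 * (1 + 1/2 + ... + 1/(n-1))."""
    return sum(2.0 / (k - 1) for k in range(2, n + 1))
```

For example, `neutral_laplace(1, lam)` returns 1 for every `lam`, matching the convention f1 = 1, and the derivative of `neutral_laplace(n, lam)` at `lam = 0` is minus the expected tree length.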

Theorem 1

(Genealogical distances under genic selection). Let Embedded Image and Θ be given as in (1), Ln as in (2) and Embedded Image Then, x1, x2, … satisfy the recursion x1 = 0 and Embedded Image where a1, a2, … satisfy a recursion a1 = 0 and Embedded Image where b1, b2, … satisfy a recursion b1 = 0 and Embedded Image where c1, c2, … satisfy a recursion c1 = 0 and Embedded Image where e1, e2, … satisfy a recursion e1 = 0 and Embedded Image and finally – recall (3) – Embedded Image with Embedded Image and Embedded Image

Remark 1

(Interpretations). In the proof, we will see that the quantities an, bn, cn, … have interpretations within the Moran model. If the tree length of the genealogy of a sample of individuals 1, …, n is denoted Ln, the genealogical distance of individuals i and j is Rij, and Ui is the type of individual i (either • or ∘), these are Embedded Image Moreover, in Theorem 2, another quantity will arise, which is Embedded Image We note that, from these definitions, only an relies on the model with selection (since all other terms are within the neutral model). In addition, a1 = b1 = c1 = e1 = 0. The initial value d2 is given through the initial condition f1, as well as f2, g1 and g2.

Remark 2

(Comparing neutral and selective genealogies).

  1. Note that for α = 0, (4) gives precisely (3). Moreover, there is no linear term in α in the recursion (4). This finding is reminiscent of Theorem 4.26 in Krone and Neuhauser (1997) and Theorem 5 in Depperschmidt et al. (2012), but we note that for other models of dominance, a linear term arises; see Theorem 2.

  2. Let us compare tree lengths under neutrality and under selection qualitatively. Crucially, the quantity dn as given in (11) is positive. As a consequence, by the recursions, en from (8) is positive, cn from (7) is positive, bn from (6) is positive, and an from (5) is positive. The effect is that xn for small α is positive, i.e. Embedded Image for small α, which implies that genealogical trees are generally shorter (in the so-called Laplace-transform-order) under selection. In particular, we have shown the intuitive result that expected tree lengths are shorter under selection; see also Corollary 4 for a quantitative result concerning expected tree lengths.

  3. While xn, an are quantities within the selected genealogies, all other quantities can be computed under neutrality, α = 0. However, if one would like to obtain finer results, i.e. specify the 𝒪(α³)-term in (4), more quantities within selected genealogies would have to be computed. In principle, this is straightforward using the approach of our proof of Theorem 1.

Remark 3

(Solving the recursions). All recursions for xn, an, bn, cn, en, hn are of the form Embedded Image with µ1 = 0 and can readily be solved by writing Embedded Image with Π∅ := 1.
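The solution scheme of Remark 3 can be illustrated on a generic first-order linear recursion. Since the exact coefficients are not reproduced here, the sketch below assumes the form µ_n = p_n µ_{n−1} + q_n with µ_1 = 0, where `p` and `q` are hypothetical placeholder coefficient functions; the closed form is then the sum over k = 2, …, n of q_k times the product of p_j for j = k + 1, …, n, with the empty product equal to 1.

```python
def solve_by_iteration(p, q, n):
    """Iterate mu_k = p(k) * mu_{k-1} + q(k) from mu_1 = 0 up to k = n."""
    mu = 0.0
    for k in range(2, n + 1):
        mu = p(k) * mu + q(k)
    return mu

def solve_by_product_sum(p, q, n):
    """Closed form: sum_{k=2}^n q(k) * prod_{j=k+1}^n p(j), empty product = 1."""
    total = 0.0
    for k in range(2, n + 1):
        product = 1.0
        for j in range(k + 1, n + 1):
            product *= p(j)
        total += q(k) * product
    return total
```

Both functions return the same value up to rounding; the iterative version needs n − 1 steps, whereas the product-sum form makes the dependence on each inhomogeneity q_k explicit.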

Since we can directly obtain expected tree lengths from the Laplace-transforms in Theorem 1, we obtain also a recursion for expected tree lengths by using that Embedded Image.

Corollary 4

(Expected tree length under genic selection). With Embedded Image and Ln as in Theorem 1, let Embedded Image Then, Embedded Image satisfy the recursion Embedded Image and Embedded Image where Embedded Image satisfy a recursion Embedded Image and Embedded Image where Embedded Image satisfy the recursion Embedded Image and Embedded Image where Embedded Image satisfy the recursion Embedded Image and Embedded Image where Embedded Image satisfy the recursion Embedded Image and Embedded Image and finally Embedded Image with Embedded Image and Embedded Image

The following result, the special case n = 2, was already obtained in Theorem 5 of Depperschmidt et al. (2012).

Corollary 5

(Genealogical distance of two individuals under genic selection). With Embedded Image and Ln as in Theorem 1, Embedded Image

Other modes of dominance

In a diploid population, (1) only models the frequency of • correctly if selection is genic, i.e. if the selective advantage of an individual which is homozygous for • is twice the advantage of a heterozygote. For other modes of dominance, we have to introduce a dominance coefficient h ∈ (−∞, ∞) and change the dynamics of the Moran model as follows: Let X be the frequency of • in the population. In addition to 1. and 2. from the beginning of Section 2, we add frequency-dependent selection events:

3’. Every line of type • places an offspring to a randomly chosen line at rate α(X + h(1 − X)). Every line of type ∘ places an offspring to a randomly chosen line at rate αhX.

Note that 3’. is best understood by assuming that every line picks a random partner and if the pair is a heterozygote, it has fitness advantage αh, and if it is homozygous for •, it has fitness advantage α. (Here, we have assumed that h ≥ 0, but some modifications of 3’. also allow for h < 0.) The expected effect of 3’. on X is then αX(1 − X)(X + h(1 − X)) − αX(1 − X)hX = αX(1 − X)(h+(1 − 2h)X) and the frequency of • follows – in the limit N → ∞ – the SDE Embedded Image We will write ℙα,h[.] for the distribution of genealogical trees and allele frequencies under this scenario, and 𝔼α,h[.] for the corresponding expectation. Recalling that ℙα[.] and 𝔼α[.] are the corresponding operators for genic selection, we have ℙα[.] = ℙ2α,1/2[.]. We note that 3’. above for 2α and h = 1/2 does not directly transform to 3. under genic selection for selection intensity α. However, as argued above (14), the effect on the process X only comes from the difference of the effects of type • and ∘ and the same argument applies on the level of genealogies, and this difference is the same for 3. and 3’.
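The algebra behind the selection drift can be checked numerically. The sketch below (with hypothetical function names) computes the expected effect of the frequency-dependent selection events once from the two event rates and once from the closed form αX(1 − X)(h + (1 − 2h)X), and also confirms that intensity 2α with h = 1/2 reproduces the genic drift αX(1 − X).

```python
def drift_from_events(x, alpha, h):
    """Expected change of the frequency x of • per unit time from events 3'."""
    # beneficial lines (fraction x) reproduce at rate alpha*(x + h*(1-x))
    # and hit a deleterious line with probability 1 - x
    gain = alpha * x * (1 - x) * (x + h * (1 - x))
    # deleterious lines (fraction 1-x) reproduce at rate alpha*h*x
    # and hit a beneficial line with probability x
    loss = alpha * (1 - x) * x * h * x
    return gain - loss

def drift_closed_form(x, alpha, h):
    """Closed form alpha * x * (1 - x) * (h + (1 - 2h) * x)."""
    return alpha * x * (1 - x) * (h + (1 - 2 * h) * x)

def genic_drift(x, alpha):
    """Selection drift alpha * x * (1 - x) under genic selection."""
    return alpha * x * (1 - x)
```

The check covers only the drift of the allele frequency; the statement that the same equivalence holds on the level of genealogies is the paper's argument and is not tested by this code.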

We note that h = 0 means a positively selected recessive allele, while h = 1 refers to a dominant selectively favoured allele. Again, we obtain an approximation of the Laplace-transform of the tree length of a sample of size n.

Theorem 2

(Genealogical distances under any form of dominance). Let Embedded Image and Θ be given as in (14), Ln as in (2) and Embedded Image Then, y1, y2, … satisfy the recursion y1 = 0 and Embedded Image where h1, h2, … satisfy the recursion h1 = 0 and Embedded Image and en was given in Theorem 1.

Remark 6

(Comparing genealogies).

  1. Most interestingly, neutral trees differ from trees under genic selection only in order α², whereas the difference is in order α for other forms of dominance. While this may be counter-intuitive at first sight, it can be easily explained. Note that the model actually does not change if we replace α by −α and h by 1 − h at the same time. By doing so, we just interchange the roles of allele • and ∘. For h = 1/2, this means that our results have to be identical for α and −α, leading to a vanishing linear term in (4). For h ≠ 1/2, this symmetry does not have to hold, leading to a linear term in α.

  2. Similar to our reasoning in Remark 2.2, the sign of hn in the recursion for yn determines if tree lengths are shorter or longer under selection. We see that the behaviour changes at h = 1/2. By construction, hn is positive, so if h < 1/2, yn is positive as well and we see that trees are shorter under selection (in the Laplace-transform order). If h > 1/2, the reverse is true and trees are longer under selection. This result is not surprising for over-dominant selection, h > 1, since the advantage of the heterozygote leads to maintenance of heterozygosity or balancing selection, which in turn is known to produce longer genealogical trees.
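The symmetry used in Remark 6.1 can be illustrated on the level of the selection drift. In the sketch below (hypothetical function names), the drift of the frequency y = 1 − x of allele ∘ under parameters (α, h) is compared with the drift of a focal allele under (−α, 1 − h); the two coincide, which is the interchange of roles described above. Only the drift term of the frequency process is checked here, not the genealogies.

```python
def selection_drift(x, alpha, h):
    """Selection drift of the focal allele: alpha*x*(1-x)*(h + (1-2h)*x)."""
    return alpha * x * (1 - x) * (h + (1 - 2 * h) * x)

def drift_of_other_allele(y, alpha, h):
    """Drift of y = 1 - x when the focal allele has parameters (alpha, h)."""
    return -selection_drift(1 - y, alpha, h)
```

For h = 1/2 the map (α, h) → (−α, 1 − h) only flips the sign of α, which is why quantities that do not distinguish the two alleles must be even in α in the genic case.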

Corollary 7

(Expected tree length under any form of dominance). With Embedded Image and Ln as in Theorem 2, let Embedded Image Then, Embedded Image satisfy the recursion Embedded Image and Embedded Image where Embedded Image satisfy the recursion Embedded Image and Embedded Image and Embedded Image was given in Corollary 4.

Corollary 8

(Genealogical distance of two individuals under any form of dominance). With Embedded Image and Ln as in Theorem 2, Embedded Image

3 Discussion

A fundamental question in population genetics is: How does selection affect genealogies of a sample of individuals? We have added to this question an analysis of tree lengths under low levels of selection, both for genic selection and for other modes of dominance. While our results are only given through recursions, these give valuable insights. Recall that under neutrality, Embedded Image where Z is Gumbel distributed (see p. 255 of Wiuf and Hein, 1999). In particular, for large n, Embedded Image where γe ≈ 0.57 is the Euler-Mascheroni constant.
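The logarithmic growth of the neutral tree length can be illustrated by simulation. The sketch below draws tree lengths of Kingman's coalescent with pair coalescence rate 1 and compares their average with 2(log n + γe); that this expression is the large-n approximation meant in the text is an assumption, since the displayed formula is not reproduced here, and the function names are hypothetical.

```python
import math
import random

EULER_MASCHERONI = 0.5772156649015329

def sample_tree_length(n, rng):
    """One draw of the neutral tree length of a sample of size n."""
    length = 0.0
    for k in range(n, 1, -1):
        # with k lines, the waiting time is exponential with rate k*(k-1)/2
        length += k * rng.expovariate(k * (k - 1) / 2)
    return length

def asymptotic_mean(n):
    """Large-n approximation 2 * (log n + Euler-Mascheroni constant)."""
    return 2.0 * (math.log(n) + EULER_MASCHERONI)
```

For n = 50, the exact mean 2(1 + 1/2 + … + 1/49) and the approximation differ by roughly 1/n, so the approximation is already accurate at the sample size used in Figure 2(A).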

Summing up and extending previous results for large samples, the main findings of our study on expected tree lengths under selection are the following. A proof is found in the appendix.

Proposition 9

(Summary). For h = 1/2, there is C1/2 < ∞ such that Embedded Image For h < 1/2, there is C<1/2 < ∞ such that Embedded Image For h > 1/2, there is C>1/2 < ∞ such that Embedded Image

Since 𝔼0[Ln] = 𝒪(log n) for large n, this shows that the order of magnitude in the change due to selection is much smaller than the length of the full tree for large samples. Note that this finding is in line with Przeworski et al. (1999), where simulations of the ancestral selection graph are used to find that the overall tree shape under selection is not too different from neutrality for low levels of selection.

In order to get more quantitative insights, we have numerically solved the recursions from Corollaries 4 and 7, and plotted the effect on the tree length for various scenarios. Figure 2(A) analyses the effect of genic selection in large samples (i.e. n = 50). Interestingly, there is some mutation rate Embedded Image, which gives the largest effect. This is clear since very little mutation implies almost no change in the genealogy relative to neutrality since almost always only the beneficial type is present in the population, and very high mutation rate implies that selection is virtually inefficient, leading to nearly neutral trees. Moreover, we can see here that Θ(1 − Θ) enters the recursion for the change in tree length only linearly. In Figure 2(B), we display the change in tree length for h = 0. Since 1 − 2h enters the recursion only linearly, the graph looks qualitatively the same for other dominance coefficients.

Figure 2:

Using the recursions from Corollaries 4 and 7, we see differences in expected tree length. (A) For genic selection and large samples, the effect changes with the total mutation rate Embedded Image and is linear in Θ(1 – Θ). (B) Plot of the change in total tree length for small values of α with h = 0, dependent on the sample size, and three parameters of Embedded Image.

In principle, the approach we use here is comparable to the ancestral selection graph in the sense that all events happening within the ASG are also implemented in our construction. However, within the ASG, when a splitting event occurs, it is not clear which of the two lines is the true ancestor, so both lines are followed. As a consequence, the ASG always looks longer than a neutral coalescent due to the splitting events, and the tree length can be computed only once the ASG is pruned to become the true genealogical tree. In our approach, the information needed at a splitting event is different: only the type of the additional line and the type of an individual within the sample are required.

The approach we use here to study genealogies under selection can be used for other statistics than the total tree length. In principle, every quantity of a sample tree can be described. The reason why we chose the tree length is its simple structure due to coalescence events: If two lines in a sample of size n coalesce, all that remains is a tree of n – 1 individuals, already implying the recursive structure for the tree lengths which is apparent in Theorems 1 and 2. In addition, in order to describe the effect of selection, we rely on a description of the sample which was already used in the mathematics literature for the so-called Fleming-Viot process (which generalizes the Wright-Fisher diffusion); see e.g. Etheridge (2001) and Depperschmidt et al. (2012). In principle, our approach can be extended e.g. to include more than two alleles, population structure, recombination etc. However, one would still have to find a recursive structure, which will often be feasible only in the case of weak selection.

4 Preliminaries for the proofs

We here present the construction of a tree-valued process in a nutshell, leaving out various technical details. All details of the construction are given in Depperschmidt et al. (2012).

Any genealogical tree is uniquely given by all genealogical distances between pairs of (haploid) individuals. So, in order to describe the evolution of genealogical trees, it suffices to describe the evolution of all pairwise distances. We also note that the tree length, which we consider in all our results, is a function of pairwise distances. (See e.g. Section 8 of Depperschmidt et al. (2012).) Consider a sample of size n taken from the Moran model at time t as described in Section 2, and let R(t) := (Rij(t))i<j be the pairwise genealogical distances (note that Rij = Rji and Rii = 0), and U(t) := (U1(t), …, Un(t)) the allelic types, either • or ∘. Within the sample of size n, we will speak of R(t) as the sample tree, and of U(t) as the types within the sample. Note that there is the possibility to extend the sample by picking new individuals at random, leading to types Un+1(t), Un+2(t), …, which we will need for selective events below. We will consider some smooth, bounded function Embedded Image and are going to describe the change in 𝔼α,h[Φ(R(t), U(t))] due to the evolution of the Moran model. We have to take into account several mechanisms:
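To make concrete that the tree length is a function of the pairwise distances, here is one possible way to compute it; it is a sketch and not necessarily the representation used in Depperschmidt et al. (2012). It assumes an ultra-metric distance matrix in which Rij equals twice the time to the most recent common ancestor of i and j (distances grow at speed 2), and uses the identity that the tree length equals half the weight of a minimum spanning tree of the distances plus the height of the tree. The function name is hypothetical.

```python
def tree_length(R):
    """Total branch length of the ultra-metric tree with distance matrix R.

    Assumes R[i][j] = 2 * (time to the common ancestor of i and j).
    Uses: length = (minimum spanning tree weight) / 2 + max(R) / 2.
    """
    n = len(R)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [float("inf")] * n      # cheapest connection of each leaf to the tree
    best[0] = 0.0
    mst_weight = 0.0
    for _ in range(n):             # Prim's algorithm on the complete graph
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        mst_weight += best[v]
        for w in range(n):
            if not in_tree[w] and R[v][w] < best[w]:
                best[w] = R[v][w]
    height = max(max(row) for row in R) / 2
    return mst_weight / 2 + height
```

For a sample of size 2 this returns R12, and for three individuals where 1 and 2 coalesce after time 0.5 and the remaining two lines after a further time 1.0, it returns 3 · 0.5 + 2 · 1.0 = 3.5.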

  • 0. Growth of the tree: During times when no events happen, all genealogical distances grow deterministically and linearly (with speed 2). In time dt, the change is, using the partial derivative of Φ with respect to the ij-th coordinate in Embedded Image, denoted by Embedded Image, Embedded Image

  • 1. Resampling: If a pair of the N individuals within the Moran model resamples, either none, one, or two of them are within the sample of size n. If none, the sample tree is not affected. If one, and this one reproduces, the sample tree is not affected either. If one, and this one is replaced by the individual outside of the sample, the effect is the same as if we had picked the other individual to begin with. Since the expectation 𝔼α,h[…] averages over all possible samples of size n, there is no resulting effect in this case either. If two, i reproduces and j dies, say, the effect is that distances to individual j are replaced by distances to individual i, and the new type of individual j is the type of i. Since all pairs within the sample resample at rate 1, the change in time dt is Embedded Image with Embedded Image

  • 2. Mutation: We note that the mutational mechanism can also be described by saying that every line in the Moran model mutates at rate Embedded Image, with outcome • with probability Θ and outcome ∘ with probability 1 − Θ. Since mutation only affects the alleles of the individuals in the sample, we have in time dt Embedded Image with Embedded Image and analogously for Ui,∘.

  • 3. Selection: By the dynamics of the Moran model, we have to take into account that selective events depend on the types of individuals. We say that the kth individual has fitness αχk := αχ(Uk), and χ is the fitness function. For genic selection and other modes of dominance, the fitness functions are denoted by χk and χk,m, respectively, where for other modes of dominance, m is some randomly picked (haploid) individual. (Note that n ≪ N, so m is outside of the sample with high probability.) The fitness functions are given by Embedded Image respectively, where we will have 1 ≤ k ≤ n and m > n below. As for resampling, selective events occur in a pair of one individual i giving birth and the other, individual j, dying, as given through the function θij from above. We start with genic selection (recall that 𝔼α[.] = 𝔼2α,1/2[.]). Here, we find that in time dt, since n ≪ N, Embedded Image where summands are zero if j > n, and summands with i ≤ n only give a negligible effect; for the remaining summands, i is any individual outside of the sample, so we choose i = n + 1 without loss of generality and obtain Embedded Image which gives, by permuting the sampling order of j and n + 1 in the first term, Embedded Image Note that the ≈ is exact in the limit N → ∞. For other modes of dominance, we find analogously the effect Embedded Image
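The deterministic growth and the replacement operation used in the resampling and selection steps above can be sketched on a distance matrix and a list of types. The names `grow`, `replace` and `mutate` are hypothetical, and the code only illustrates the updates of (R, U) described in the text, not the generator acting on Φ.

```python
def grow(R, dt):
    """All distances between distinct individuals grow at speed 2."""
    n = len(R)
    return [[R[i][j] + (2.0 * dt if i != j else 0.0) for j in range(n)]
            for i in range(n)]

def replace(R, U, i, j):
    """Individual i reproduces and j dies: j inherits i's distances and type."""
    n = len(R)
    R_new = [row[:] for row in R]
    for k in range(n):
        R_new[j][k] = R[i][k]
        R_new[k][j] = R[k][i]
    R_new[i][j] = R_new[j][i] = 0.0    # parent and offspring have distance 0
    R_new[j][j] = 0.0
    U_new = U[:]
    U_new[j] = U[i]
    return R_new, U_new

def mutate(U, i, new_type):
    """A mutation only changes the type of individual i, never the distances."""
    U_new = U[:]
    U_new[i] = new_type
    return U_new
```

A neutral resampling event and a successful selective event apply the same `replace` operation; they differ only in the rate at which the pair (i, j) is chosen.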

In the calculations above, Φ can be any smooth function which depends on a randomly picked sample. Now, we consider a sample of size n + j for some n, j ≥ 0 and focus on the function, for some 0 ≤ i ≤ n, Embedded Image Note that although Embedded Image is dealing with a sample of size n + j, the tree length is only computed with respect to the genealogical tree of the first n individuals. In words, the quantity Embedded Image is the Laplace-transform of the length of the genealogical tree of a sample of size n under selection, on the event that the first i individuals within the sample, as well as j additionally picked individuals (outside of the sample) carry the beneficial type •. For i = j = 0, Embedded Image is thus the Laplace-transform of the tree length of a sample of size n under selection and the main object of study in Theorems 1 and 2. The following lemma is an application of the general theory from 0.-3. above.

Lemma 10

For n ≥ 2, and with the convention that Embedded Image, in the limit N → ∞, Embedded Image for genic selection, and the last two terms change to Embedded Image for other modes of dominance.

Remark 11

Simple algebra shows that the α-terms in (23) and (24) agree for h = 1/2, i.e. (23) with 𝔼α[.] agrees with (24) for 𝔼2α,1/2[.]. For future reference, note that for i = j = 0, the α-term in (24) gives Embedded Image

Proof of Lemma 10. We will omit dependencies on t and write Embedded Image in the proof. The effect of tree growth on Embedded Image is that the tree grows by ndt in time dt, i.e. in time dt Embedded Image Let I = {1, …, i}, H = {i + 1, …, n} and J = {n + 1, …, n + j}. For resampling, we distinguish between events among I, events among I ∪ H, with at most one partner within I, events with one partner within I and the second among J, events with one partner in H and the second among J, and events with two partners in J. Only if two among I ∪ H coalesce, n decreases. This gives Embedded Image For mutation, we note that for Embedded Image, therefore the effect of mutation is Embedded Image Last, for selection, we have to distinguish the cases of genic selection and other modes of dominance. For genic selection, we have that χk = 1(Uk = •) and we note that for i ∈ I and j ∈ J, Embedded Image, therefore the effect is, from (20), Embedded Image For other modes of dominance Embedded Image Therefore, the effect of selection is here, from (21), Embedded Image □

The next result is collected from Theorem 4 and Lemma 8.1 in Depperschmidt et al. (2012).

Lemma 12

The process (R, U) of genealogical distances and types has a unique equilibrium under ℙα,h. This equilibrium is described by Embedded Image for all possible Φ. Moreover, this equilibrium, denoted by (R(∞), U(∞)), satisfies Embedded Image

5 Proof of Theorems 1 and 2

Proof of Theorem 1. To begin, we note that the quantities as defined in Remark 1 for n = 1 – since L1 = 0 – are given by a1 = b1 = c1 = e1 = 0. Moreover, we note that Embedded Image since the mutational history of single lines, leading to U1 and Un+1 are independent of the genealogy for α = 0 and therefore, from Lemma 12, we see that ãn := αan = 𝒪 (α). For the recursion on xn, we write – from Lemma 10 – Embedded Image In equilibrium, the right hand side must equal 0, and using this equality also for α = 0, we have Embedded Image which gives exactly (4), where an is given in (11). For the recursion on an, we write with Lemma 10 Embedded Image In equilibrium, both sides must be 0, and dividing both sides by α gives a recursion for an, but we still have to show that the last term is bn + 𝒪 (α). First, we can replace 𝔼α[.] by 𝔼0[.] in this expression, since we are making an error of order α at most. Then, for α = 0, i.e. neutral evolution, we note that two individuals k ≠ 𝓁, which have genealogical distance Rk𝓁, both have type • in either of two cases: (i) there is no mutation on the path between k, 𝓁 in the genealogy, and their joint ancestor has type •; (ii) there is a mutation on the path between k,𝓁, and both mutational events determining the types of k, 𝓁 give the type •. For α = 0, the mutational process is independent of coalescence events, hence we find the probabilities Embedded Image and Embedded Image in cases (i) and (ii), respectively. Hence, for any k, 𝓁 = 1, …, n + 2 and k ≠ 𝓁 Embedded Image Hence, we obtain that Embedded Image which proves (5), where bn is given as in (11). For the remaining recursions, we always work with α = 0 and set 𝔼[.] := 𝔼0[.]. In order to obtain a recursion for bn, consider a coalescent with n + 2 lines and distinguish the following cases for the first step:

  1. Coalescence of lines among the first n lines, except for lines 1,2 (rate Embedded Image − 1);

  2. Coalescence of lines 1,2 (rate 1);

  3. Coalescence of lines n + 1 and 1 (rate 1);

  4. Coalescence of lines n + 1 and one of 2, …, n (rate n − 1);

  5. Coalescence of lines n + 1 and n + 2 (rate 1);

  6. Coalescence of lines n + 2 and one of 1, …, n (rate n).
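As a sanity check on the six cases above, the following sketch enumerates all pairs among the n + 2 lines, assigns each pair to a case according to the verbal descriptions, and counts the pairs per case; since every pair coalesces at rate 1, the counts are the case rates. The function names are hypothetical, and the rate of case 1, which appears only as an image in this copy, is assumed to be the binomial coefficient "n choose 2" minus 1.

```python
from itertools import combinations
from math import comb

def classify(pair, n):
    """Return the case number (1 to 6) of a pair of lines (a, b) with a < b."""
    a, b = pair
    if b <= n:                       # both lines among the first n
        return 2 if (a, b) == (1, 2) else 1
    if b == n + 1:                   # line n + 1 with one of the first n
        return 3 if a == 1 else 4
    return 5 if a == n + 1 else 6    # b == n + 2

def case_rates(n):
    """Number of pairs (equal to the total rate) falling into each case."""
    counts = {case: 0 for case in range(1, 7)}
    for pair in combinations(range(1, n + 3), 2):
        counts[classify(pair, n)] += 1
    return counts
```

The six counts add up to the total coalescence rate of n + 2 lines, so the cases form a complete decomposition of the first coalescence event.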

Recalling Embedded Image, we write by a first-step decomposition Embedded Image This shows (6). For cn, we use the same coalescent, and by distinguishing the six cases, we write Embedded Image which shows (7). For dn, let B ∈ {2, …, n} be the number of lines in a coalescent starting with n lines, just before lines 1 and 2 coalesce. Then, Embedded Image Then, Embedded Image and we see that Embedded Image Finally, for en, we again use a recursion. Consider a coalescent with n + 1 lines and make a first-step-analysis. In this first step, we distinguish four cases:

  1. Coalescence of lines 1 or 2 with one of 3, …, n; rate Embedded Image − 1

  2. Coalescence of lines 1 and 2; rate 1

  3. Coalescence of lines n + 1 and 1; rate 1

  4. Coalescence of lines n + 1 and one of 2, …, n; rate n − 1

Hence, Embedded Image This shows (8). □

Proof of Corollary 4. We have to compute Embedded Image. A close inspection of the recursions for x reveals that (i) xn is a sum of products, and in each summand, some factor dk enters and (ii) dk = 𝒪 (λ) for small λ for all k, which is best seen from (11). As a consequence, we can compute the derivative with respect to λ at λ = 0 in each summand which enters xn by taking the derivative of dk with respect to λ and set λ = 0 in all other factors. Summing up, we have the same recursions as in Theorem 1 with (i) λ = 0 in all terms except dn, and (ii) replace dn with the derivative according to λ at λ = 0. This gives the recursions as given in the corollary and Embedded Image with Embedded Image and the result follows. □

Proof of Corollary 5. Applying Theorem 1, we get Embedded Image This gives the first assertion. The second follows since a2 is 𝒪 (λ) and the derivative with respect to λ at λ = 0 is easily computed. □

Proof of Theorem 2. Starting in the same way as in the proof of Theorem 1, we get, as in (25) – recall (20) – Embedded Image For the last term, we compute Embedded Image Now, we use that 𝔼α,h[.] = 𝔼0[.] + 𝒪 (α) as in Lemma 12 in order to obtain an approximate recursion for the last term. Consider a coalescent with n + 2 lines and distinguish the following cases:

  1. Coalescence of lines among the first n lines (rate Embedded Image);

  2. Coalescence of lines n + 1 and 1 (rate 1);

  3. Coalescence of lines n + 1 and one of 2, …, n (rate n − 1);

  4. Coalescence of lines n + 1 and n + 2 (rate 1);

  5. Coalescence of lines n + 2 and one of 1, …, n (rate n).

We then get Embedded Image Using the definitions of hn and en from Remark 1 then gives the result. □
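A quick sanity check on the case distinctions used in the proofs of Theorems 1 and 2 is that the listed rates must add up to the total coalescence rate of the coalescent being considered, namely C(n + 2, 2) for n + 2 lines and C(n + 1, 2) for n + 1 lines. The sketch below performs this check in Python. It assumes that the rate hidden behind each "Embedded Image" in the case lists is C(n, 2); this is an inference from the totals, not something recoverable from the page, and the function names are hypothetical.

```python
from math import comb


def rates_six_cases(n):
    """Proof of Theorem 1, recursions for bn and cn (n + 2 lines).

    Assumption: the image in case 1 is C(n, 2).
    """
    return [comb(n, 2) - 1, 1, 1, n - 1, 1, n]


def rates_four_cases(n):
    """Proof of Theorem 1, recursion for en (n + 1 lines).

    Assumption: the image in case 1 is C(n, 2).
    """
    return [comb(n, 2) - 1, 1, 1, n - 1]


def rates_five_cases(n):
    """Proof of Theorem 2 (n + 2 lines).

    Assumption: the image in case 1 is C(n, 2).
    """
    return [comb(n, 2), 1, n - 1, 1, n]


if __name__ == "__main__":
    for n in range(2, 8):
        # Each case list should exhaust all pairs of lines exactly once.
        assert sum(rates_six_cases(n)) == comb(n + 2, 2)
        assert sum(rates_four_cases(n)) == comb(n + 1, 2)
        assert sum(rates_five_cases(n)) == comb(n + 2, 2)
    print("all case lists account for the full coalescence rate")
```

Under the stated assumption, each list partitions the pairs of lines completely, which is what a first-step decomposition requires.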

Proof of Corollary 7. The proof is basically the same as for Corollary 4: The recursion for yn is a sum of products, where each summand contains a factor dn, and dn = 𝒪 (λ). Therefore, the derivative with respect to λ at λ = 0 is obtained by taking derivatives only of dn and setting λ = 0 in all other instances. □

Proof of Corollary 8. Applying Theorem 2, we get Embedded Image and with d2 from the proof of Corollary 5, the result follows. □

A Proof of Proposition 9

For finite n, all results can be read off from Corollaries 4 and 5; see also Remark 6.1 for genic selection and Remark 6.2 for other modes of dominance. It remains to show uniformity in n. For the first assertion, we have that Embedded Image, where an ∼ bn if 0 < lim inf Embedded Image. From Remark 3, we can solve the recursions for Embedded Image and ãn, which all are of the form Embedded Image We want to find the behaviour of µn for large n. Hence, Embedded Image Since Embedded Image and the recursion for Embedded Image comes with κ = 1, we find that Embedded Image Next, the recursion for Embedded Image comes with κ = 2, and Embedded Image, so Embedded Image. In the recursion for Embedded Image, we have κ = 2, so Embedded Image. In the recursion for ãn, we have κ = 1, so Embedded Image log n. Finally, for Embedded Image, we write Embedded Image which gives the first assertion of Proposition 9.

Similarly, in order to study the effect of other modes of dominance, we note that the recursion for Embedded Image comes with κ = 2, therefore Embedded Image. Then, Embedded Image which gives the second and third assertions of Proposition 9.

Acknowledgements

This research was supported by the DFG priority program SPP 1590, and in particular through grant Pf-672/8-1 to PP. We thank two anonymous referees for their careful reading and remarks, which helped to improve the manuscript.

References

  1. Barton, N., A. Etheridge, and A. Sturm (2004). Coalescence in a random background. Annals of Applied Probability 14(2), 754–785.
  2. Coop, G. and R. C. Griffiths (2004). Ancestral inference on gene trees under selection. Theo. Pop. Biol. 66(3), 219–232.
  3. Depperschmidt, A., A. Greven, and P. Pfaffelhuber (2012). Tree-valued Fleming–Viot dynamics with mutation and selection. Annals of Applied Probability 22(6), 2560–2615.
  4. Etheridge, A. (2001). An introduction to superprocesses. American Mathematical Society.
  5. Ewens, W. (2004). Mathematical Population Genetics. I. Theoretical introduction. Second edition. Springer.
  6. Ewing, G. and J. Hermisson (2010). Msms: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26(16), 2064–2065.
  7. Hudson, R. (1983). Properties of a neutral allele model with intragenic recombination. Theo. Pop. Biol. 23, 183–201.
  8. Kaplan, N., T. Darden, and R. Hudson (1988). The coalescent process in models with selection. Genetics 120, 819–829.
  9. Kern, A. D. and D. R. Schrider (2016). Discoal: flexible coalescent simulations with selection. Bioinformatics 32(24), 3839–3841.
  10. Kingman, J. (1982). The coalescent. Stochastic Process. Appl. 13(3), 235–248.
  11. Krone, S. and C. Neuhauser (1997). Ancestral processes with selection. Theo. Pop. Biol. 51, 210–237.
  12. Neuhauser, C. and S. Krone (1997). The genealogy of samples in models with selection. Genetics 145, 519–534.
  13. Przeworski, M., B. Charlesworth, and J. D. Wall (1999). Genealogies and weak purifying selection. Molecular Biology and Evolution 16(2), 246–252.
  14. Taylor, J. (2007). The common ancestor process for a Wright-Fisher diffusion. Electron. J. Probab. 12, 808–847.
  15. Wakeley, J. (2008). Coalescent Theory: An Introduction. Roberts & Company.
  16. Wakeley, J. (2010). Natural selection and coalescent theory. In Evolution since Darwin: The First 150 Years, pp. 119–149. Sunderland, MA: Sinauer and Associates.
  17. Wiuf, C. and J. Hein (1999). Recombination as a point process along sequences. Theo. Pop. Biol. 55, 248–259.
Posted April 02, 2019.

Subject Area

  • Evolutionary Biology