Analysis and rejection sampling of Wright–Fisher diffusion bridges

doi:10.1016/j.tpb.2013.08.005

Theoretical Population Biology

Volume 89, November 2013, Pages 64-74

https://doi.org/10.1016/j.tpb.2013.08.005

Abstract

We investigate the properties of a Wright–Fisher diffusion process starting at frequency $x$ at time 0 and conditioned to be at frequency $y$ at time $T$. Such a process is called a bridge. Bridges arise naturally in the analysis of selection acting on standing variation and in the inference of selection from allele frequency time series. We establish a number of results about the distribution of neutral Wright–Fisher bridges and develop a novel rejection-sampling scheme for bridges under selection that we use to study their behavior.

Introduction

The Wright–Fisher Markov chain is of central importance in population genetics, and it has contributed greatly to the understanding of the patterns of genetic variation seen in natural populations. Much recent work has focused on developing sampling theory for neutral sites linked to sites under selection (Smith and Haigh, 1974, Kaplan et al., 1989, Nielsen et al., 2005, Etheridge et al., 2006). Typically, the site under selection is assumed to have dynamics governed by the diffusion process limit of the Wright–Fisher chain, in which case the genealogy of linked neutral sites can be constructed using the framework of Hudson and Kaplan (1988). However, due to the complicated nature of this model, analytical theory is necessarily approximate, and the main focus is on simulation methods. In particular, a number of simulation programs, including mbs (Teshima and Innan, 2009) and msms (Ewing and Hermisson, 2010), have recently appeared to help facilitate the simulation of neutral genealogies linked to sites undergoing a Wright–Fisher diffusion with selection.

Simulations of Wright–Fisher paths under selection can be easily carried out using standard techniques for simulating diffusions. Frequently, however, it is necessary to simulate a Wright–Fisher path conditioned on some particular outcome. For example, to simulate the path of an allele under selection that is currently at frequency $x$, a time-reversal argument shows that it is possible to simulate a path starting at $x$ conditioned to hit 0 eventually (Maruyama, 1974). However, more complicated scenarios, including the action of natural selection on standing genetic variation, require more elaborate simulation methods (Peter et al., 2012).

The stochastic process describing an allele that starts at frequency $x$ at time 0 and is conditioned to end at frequency $y$ at time $T$ is called a bridge between $x$ and $y$ in time $T$, or a bridge between $x$ and $y$ over the time interval $[0, T]$. Wright–Fisher diffusion bridges appear naturally in the study of selection acting on standing variation because it is necessary to know the path taken by an allele at current frequency $y$ that fell under the influence of natural selection at a time $T$ generations in the past when it was segregating neutrally at frequency $x$. Wright–Fisher diffusion bridges are also of interest for their application to inference of selection from allele frequency time series (Bollback et al., 2008, Malaspinas et al., 2012, Mathieson and McVean, 2013, Feder et al., 2013). In particular, analysis of bridges can help determine the extent to which more signal is gained by adding further intermediate time points.

In addition to their applied interest, there are interesting theoretical questions surrounding Wright–Fisher diffusion bridges. For alleles conditioned to eventually fix, Maruyama (1974) showed that the distribution of the trajectory does not depend on the sign of the selection coefficient; that is, both positively and negatively selected alleles with the same absolute value of the selection coefficient exhibit the same dynamics conditioned on eventual fixation. It is natural to inquire whether the analogous result holds for a bridge between any two interior points. Moreover, the degree to which a Wright–Fisher bridge with selection will differ from a Wright–Fisher bridge under neutrality is not known (in connection with this question, we recall the well-known fact that the distribution of a bridge for a Brownian motion with drift does not depend on the drift parameter, and so it is conceivable that the presence of selection has little or no effect on the behavior of Wright–Fisher bridges). Lastly, the characteristics of the sample paths of the frequency of alleles destined to be lost in a fixed amount of time are not only interesting theoretically but may also have applications to geographically structured populations (Slatkin and Excoffier, 2012).
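The Brownian fact recalled above can be checked in a few lines. The following is a sketch for a Brownian motion with unit variance and constant drift $\mu$; the symbols $p_{\mu}$, $u$ and $s$ are introduced here for illustration and do not appear in the original text. The drift enters the transition density only through an exponential factor that depends on the endpoints and the elapsed time, and this factor cancels in the density of the bridge at an intermediate time $s \in (0, T)$.

```latex
\begin{align*}
p_{\mu}(x, y; t)
  &= \frac{1}{\sqrt{2 \pi t}} \exp\left( -\frac{(y - x - \mu t)^{2}}{2 t} \right)
   = p_{0}(x, y; t) \exp\left( \mu (y - x) - \tfrac{1}{2} \mu^{2} t \right), \\
\frac{p_{\mu}(x, u; s)\, p_{\mu}(u, y; T - s)}{p_{\mu}(x, y; T)}
  &= \frac{p_{0}(x, u; s)\, p_{0}(u, y; T - s)}{p_{0}(x, y; T)}
     \exp\left( \mu (u - x) + \mu (y - u) - \mu (y - x)
       - \tfrac{1}{2} \mu^{2} \left[ s + (T - s) - T \right] \right) \\
  &= \frac{p_{0}(x, u; s)\, p_{0}(u, y; T - s)}{p_{0}(x, y; T)}.
\end{align*}
```

The same cancellation applies to every finite-dimensional distribution by the Markov property, which is why the Brownian bridge carries no information about $\mu$.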

Here, we investigate various features of Wright–Fisher diffusion bridges. The paper is structured as follows. First, we establish analytical results for neutral Wright–Fisher bridges. Then, we derive a novel rejection sampler for Wright–Fisher bridges with selection, and use it to study the properties of such processes. For example, we estimate the distribution of the maximum of a bridge from 0 to 0 under selection, and investigate how this distribution depends on the strength of selection.

Section snippets

Background

The Wright–Fisher diffusion with genic selection is a diffusion process $\{X_{t}, t \geq 0\}$ with state space $[0, 1]$ and infinitesimal generator $L = \gamma x (1 - x) \frac{\partial}{\partial x} + \frac{1}{2} x (1 - x) \frac{\partial^{2}}{\partial x^{2}}.$ When $\gamma = 0$, the diffusion is said to be neutral; otherwise, the drift term captures the strength and direction of natural selection.
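A generator of this form corresponds to the stochastic differential equation $dX_{t} = \gamma X_{t} (1 - X_{t})\, dt + \sqrt{X_{t} (1 - X_{t})}\, dB_{t}$, where $B_{t}$ is a standard Brownian motion. As a rough illustration of simulating unconditioned paths, the sketch below applies an Euler–Maruyama discretization. The function name `simulate_wf_path`, the step count, and the clipping of values to $[0, 1]$ are assumptions made here for illustration; they are not taken from the paper, and clipping is a crude way of handling the boundaries.

```python
import numpy as np


def simulate_wf_path(x0, gamma, T, n_steps=1000, rng=None):
    """Euler-Maruyama sketch of dX = gamma*X*(1-X) dt + sqrt(X*(1-X)) dB.

    Hypothetical helper: returns the path on a grid of n_steps + 1 points.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    path = np.empty(n_steps + 1)
    path[0] = x0
    for i in range(n_steps):
        x = path[i]
        if x <= 0.0 or x >= 1.0:
            # Both boundaries are absorbing: freeze the rest of the path.
            path[i + 1:] = x
            break
        drift = gamma * x * (1.0 - x)
        noise = np.sqrt(x * (1.0 - x) * dt) * rng.standard_normal()
        # Assumption: clip overshoots back into [0, 1].
        path[i + 1] = min(1.0, max(0.0, x + drift * dt + noise))
    return path
```

For example, `simulate_wf_path(0.3, 2.0, 1.0)` returns one discretized trajectory started at frequency 0.3 with $\gamma = 2$ over one unit of diffusion time. A finer grid reduces the discretization error near the boundaries, where the diffusion coefficient vanishes.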

The corresponding Wright–Fisher diffusion bridge, $\{X_{t}^{x, z, [0, T]}, 0 \leq t \leq T\}$, is the stochastic process that results from conditioning the Wright–Fisher diffusion to start with value $x$ at time 0 and end with value $z$ at

Transition densities for the neutral Wright–Fisher diffusion

When there is no natural selection (i.e., $\gamma = 0$), the transition densities of the Wright–Fisher diffusion can be expressed as $f (x, y; t) = \sum_{l = 2}^{\infty} q_{l} (t) \sum_{k = 1}^{l - 1} \binom{l}{k} x^{k} (1 - x)^{l - k} B (y; k, l - k),$ where the $q_{l} (t)$ are the transition functions of a death process starting at infinity with death rate $\frac{1}{2} n (n - 1)$ when $n$ individuals are left alive and $B (\cdot; \alpha, \beta)$ is the density of the Beta distribution with parameters $\alpha$ and $\beta$ (Ethier and Griffiths, 1993). That is, $q_{l} (t)$ is the probability that a Kingman coalescent tree with
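The series above can be read as a mixture: draw $l$ from the death process at time $t$, draw $k$ from a Binomial$(l, x)$ distribution, and then draw $y$ from a Beta$(k, l - k)$ distribution. The sketch below turns that reading into an approximate sampler for the neutral transition. Several choices are assumptions made here rather than anything stated in the text: the function names are hypothetical, the start "at infinity" is truncated at a finite `n_max`, and the cases $k = 0$ and $k = l$, which fall outside the interior density, are treated as absorption at 0 and 1.

```python
import numpy as np


def sample_lineages(t, n_max=200, rng=None):
    """Approximate draw from q_l(t): the death process is started at n_max
    (an assumed truncation of 'infinity') with death rate n*(n-1)/2."""
    rng = np.random.default_rng() if rng is None else rng
    n, elapsed = n_max, 0.0
    while n > 1:
        # Holding time in state n is exponential with mean 2 / (n*(n-1)).
        elapsed += rng.exponential(2.0 / (n * (n - 1)))
        if elapsed > t:
            break
        n -= 1
    return n


def sample_neutral_transition(x, t, n_max=200, rng=None):
    """Approximate draw of the neutral frequency at time t, started at x."""
    rng = np.random.default_rng() if rng is None else rng
    l = sample_lineages(t, n_max, rng)
    k = rng.binomial(l, x)
    if k == 0:
        return 0.0  # assumption: treated as loss of the allele
    if k == l:
        return 1.0  # assumption: treated as fixation of the allele
    return float(rng.beta(k, l - k))
```

Because the Beta$(k, l - k)$ mean is $k / l$ and the binomial mean is $l x$, the sample average of many draws should stay close to $x$, which gives a quick sanity check on the sketch. The truncation at `n_max` matters most for very small $t$, when many lineages remain.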

General framework

When selection is incorporated into the Wright–Fisher model, there is no known series formula for the transition density akin to (3.1) (but see Kimura, 1955, Kimura, 1957 for attempts using perturbation theory, as well as Song and Steinrücken (2012) and Steinrücken et al. (2012) for methods of approximating an eigenfunction expansion computationally). Therefore, analytical results for distributions associated with the corresponding bridge like those we obtained in the neutral case are not
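Even without a closed-form transition density, the law of paths under selection can be compared with the neutral law path by path. The following is a heuristic Girsanov-type calculation under the generator given in the Background section; it is offered as a sketch of why a rejection step is plausible and is not necessarily the construction used by the authors. Here $\mathbb{P}_{\gamma}$ and $\mathbb{P}_{0}$ (notation introduced for this sketch) denote the laws on $[0, T]$ of the diffusion with selection and of the neutral diffusion, both started at $X_{0} = x$.

```latex
\begin{align*}
\frac{d\mathbb{P}_{\gamma}}{d\mathbb{P}_{0}}
  &= \exp\left( \int_{0}^{T} \frac{\gamma X_{t} (1 - X_{t})}{X_{t} (1 - X_{t})} \, dX_{t}
      - \frac{1}{2} \int_{0}^{T}
        \frac{\left[ \gamma X_{t} (1 - X_{t}) \right]^{2}}{X_{t} (1 - X_{t})} \, dt \right) \\
  &= \exp\left( \gamma (X_{T} - X_{0})
      - \frac{\gamma^{2}}{2} \int_{0}^{T} X_{t} (1 - X_{t}) \, dt \right).
\end{align*}
```

On a bridge from $x$ to $z$ the first term equals the constant $\gamma (z - x)$, so relative to the neutral bridge the bridge with selection is weighted in proportion to $\exp\left( -\frac{\gamma^{2}}{2} \int_{0}^{T} X_{t} (1 - X_{t}) \, dt \right)$. This weight lies in $(0, 1]$, which suggests accepting a proposed neutral bridge path with that probability, and it depends on $\gamma$ only through $\gamma^{2}$. Bridges that start or end at the boundary would need a more careful treatment than this formal calculation provides.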

Discussion

We have examined the behavior of Wright–Fisher diffusion bridges under both neutral models and models with genic selection. Although various conditioned Wright–Fisher diffusions have been studied in the past, Wright–Fisher diffusions conditioned to obtain a specific value at a predetermined time have not been studied extensively. We have elucidated some of the properties of Wright–Fisher bridges using a combination of analytical theory and simulations.

In contrast to Brownian motion with drift,

Acknowledgments

The authors thank M. Slatkin and B. Peter for initial discussions that led to our interest in this topic.

JGS was supported in part by NIH NRSA trainee appointment grant T32-HG00047 and by NIH grant R01-GM40282. RCG was supported by the Miller Institute for Basic Research in Science, University of California at Berkeley. SNE was supported in part by NSF grant DMS-0907630.

References (30)

A. Beskos et al. Exact simulation of diffusions. Annals of Applied Probability (2005)
J.P. Bollback et al. Estimation of $2N_{e}s$ from temporal allele frequency data. Genetics (2008)
J.F. Crow et al. An Introduction to Population Genetics Theory (1970)
E. Csáki et al. On the joint distribution of the maximum and its location for a linear diffusion. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques (1987)
A. Etheridge et al. An approximate sampling formula under genetic hitchhiking. Annals of Applied Probability (2006)
S.N. Ethier et al. The transition function of a Fleming–Viot process. Annals of Probability (1993)
G. Ewing et al. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics (Oxford, England) (2010)
Feder, A., Kryazhimskiy, S., Plotkin, J.B., 2013. Identifying signatures of selection in genetic time series. arXiv...
R. Fisher. On the dominance ratio. Proceedings of the Royal Society of Edinburgh (1922)
R.C. Griffiths et al. Diffusion processes and coalescent trees
R.R. Hudson et al. The coalescent process in models with selection and recombination. Genetics (1988)
N. Ikeda et al.
N.L. Kaplan et al. The “hitchhiking effect” revisited. Genetics (1989)
M. Kimura. Some problems of stochastic processes in genetics. The Annals of Mathematical Statistics (1957)
M. Kimura. Stochastic processes and distribution of gene frequencies under natural selection
