Analysis and rejection sampling of Wright–Fisher diffusion bridges
Introduction
The Wright–Fisher Markov chain is of central importance in population genetics, and it has contributed greatly to the understanding of the patterns of genetic variation seen in natural populations. Much recent work has focused on developing sampling theory for neutral sites linked to sites under selection (Smith and Haigh, 1974, Kaplan et al., 1989, Nielsen et al., 2005, Etheridge et al., 2006). Typically, the site under selection is assumed to have dynamics governed by the diffusion process limit of the Wright–Fisher chain, in which case the genealogy of linked neutral sites can be constructed using the framework of Hudson and Kaplan (1988). However, due to the complicated nature of this model, analytical theory is necessarily approximate, and the main focus is on simulation methods. In particular, a number of simulation programs, including mbs (Teshima and Innan, 2009) and msms (Ewing and Hermisson, 2010), have recently appeared to help facilitate the simulation of neutral genealogies linked to sites undergoing a Wright–Fisher diffusion with selection.
Simulations of Wright–Fisher paths under selection can be carried out easily using standard techniques for simulating diffusions. Frequently, however, it is necessary to simulate a Wright–Fisher path conditioned on some particular outcome. For example, to simulate the path of an allele under selection that is currently at frequency $x$, a time-reversal argument shows that it is possible to simulate a path starting at $x$ conditioned to hit 0 eventually (Maruyama, 1974). However, more complicated scenarios, including the action of natural selection on standing genetic variation, require more elaborate simulation methods (Peter et al., 2012).
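As one illustration of the standard techniques for unconditioned paths, the following is a minimal Euler–Maruyama sketch. It assumes the diffusion solves $dX_t = \gamma X_t(1-X_t)\,dt + \sqrt{X_t(1-X_t)}\,dW_t$; this scaling of the selection parameter, the function name `simulate_wf_path`, and the clipping rule at the boundaries are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def simulate_wf_path(x0, gamma, t_end, n_steps, rng=None):
    """Approximate a Wright-Fisher path on [0, t_end] by Euler-Maruyama.

    Assumed SDE (hypothetical scaling): dX = gamma*X(1-X) dt + sqrt(X(1-X)) dW.
    Returns a list of n_steps + 1 frequencies, starting with x0.
    """
    if rng is None:
        rng = random.Random()
    dt = t_end / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # 0 and 1 are absorbing: once hit, the path stays there.
        if 0.0 < x < 1.0:
            drift = gamma * x * (1.0 - x)
            diffusion = math.sqrt(x * (1.0 - x))
            x += drift * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            # The discretized step can overshoot [0, 1]; clip it back.
            x = min(1.0, max(0.0, x))
        path.append(x)
    return path


if __name__ == "__main__":
    path = simulate_wf_path(x0=0.3, gamma=2.0, t_end=1.0, n_steps=1000)
    print(path[-1])
```

A scheme of this kind is only approximate near 0 and 1, where the diffusion coefficient vanishes, and it does not by itself produce conditioned paths.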
The stochastic process describing an allele that starts at frequency $x$ at time 0 and is conditioned to end at frequency $y$ at time $t$ is called a bridge between $x$ and $y$ in time $t$, or a bridge between $x$ and $y$ over the time interval $[0, t]$. Wright–Fisher diffusion bridges appear naturally in the study of selection acting on standing variation because it is necessary to know the path taken by an allele at current frequency $y$ that fell under the influence of natural selection at a time $t$ generations in the past when it was segregating neutrally at frequency $x$. Wright–Fisher diffusion bridges are also of interest for their application to inference of selection from allele frequency time series (Bollback et al., 2008, Malaspinas et al., 2012, Mathieson and McVean, 2013, Feder et al., 2013). In particular, analysis of bridges can help determine the extent to which more signal is gained by adding further intermediate time points.
In addition to their applied interest, there are interesting theoretical questions surrounding Wright–Fisher diffusion bridges. For alleles conditioned to eventually fix, Maruyama (1974) showed that the distribution of the trajectory does not depend on the sign of the selection coefficient; that is, both positively and negatively selected alleles with the same absolute value of the selection coefficient exhibit the same dynamics conditioned on eventual fixation. It is natural to inquire whether the analogous result holds for a bridge between any two interior points. Moreover, the degree to which a Wright–Fisher bridge with selection will differ from a Wright–Fisher bridge under neutrality is not known (in connection with this question, we recall the well-known fact that the distribution of a bridge for a Brownian motion with drift does not depend on the drift parameter, and so it is conceivable that the presence of selection has little or no effect on the behavior of Wright–Fisher bridges). Lastly, the characteristics of the sample paths of the frequency of alleles destined to be lost in a fixed amount of time are not only interesting theoretically but may also have applications to geographically structured populations (Slatkin and Excoffier, 2012).
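The Brownian fact recalled above can be checked directly. The short derivation below is a sketch under the assumption of unit diffusion coefficient; the notation $p_\mu(t; x, y)$ for the transition density of Brownian motion with drift $\mu$ is introduced here for illustration only.

```latex
% Transition density of Brownian motion with drift \mu (unit variance):
\begin{align*}
p_\mu(t; x, y)
  &= \frac{1}{\sqrt{2\pi t}} \exp\!\left(-\frac{(y - x - \mu t)^2}{2t}\right)
   = p_0(t; x, y)\, \exp\!\left(\mu (y - x) - \tfrac{1}{2}\mu^2 t\right).
\end{align*}
% Density at an intermediate time 0 < s < t of a bridge from x to y over [0, t]:
\begin{align*}
\frac{p_\mu(s; x, z)\, p_\mu(t - s; z, y)}{p_\mu(t; x, y)}
  &= \frac{p_0(s; x, z)\, p_0(t - s; z, y)}{p_0(t; x, y)}
     \exp\!\Big(\mu\big[(z - x) + (y - z) - (y - x)\big]
       - \tfrac{1}{2}\mu^2\big[s + (t - s) - t\big]\Big) \\
  &= \frac{p_0(s; x, z)\, p_0(t - s; z, y)}{p_0(t; x, y)}.
\end{align*}
```

The exponential factors cancel exactly, and the same cancellation applies to any finite collection of intermediate times by the Markov property, so the bridge law is free of $\mu$. No comparable factorization is available in general for the Wright–Fisher diffusion, which is why the effect of selection on its bridges is a genuine question.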
Here, we investigate various features of Wright–Fisher diffusion bridges. The paper is structured as follows. First, we establish analytical results for neutral Wright–Fisher bridges. Then, we derive a novel rejection sampler for Wright–Fisher bridges with selection, and use it to study the properties of such processes. For example, we estimate the distribution of the maximum of a bridge from 0 to 0 under selection, and investigate how this distribution depends on the strength of selection.
Background
The Wright–Fisher diffusion with genic selection is a diffusion process with state space $[0, 1]$ and infinitesimal generator $$\mathcal{L} = \frac{1}{2} x(1-x) \frac{\partial^2}{\partial x^2} + \gamma x(1-x) \frac{\partial}{\partial x}.$$ When $\gamma = 0$, the diffusion is said to be neutral; otherwise, the drift term captures the strength and direction of natural selection.
The corresponding Wright–Fisher diffusion bridge is the stochastic process that results from conditioning the Wright–Fisher diffusion to start with value $x$ at time 0 and end with value $y$ at
Transition densities for the neutral Wright–Fisher diffusion
When there is no natural selection (i.e., $\gamma = 0$), the transition densities of the Wright–Fisher diffusion can be expressed as $$f(x, y; t) = \sum_{m=2}^{\infty} q_m(t) \sum_{l=1}^{m-1} \binom{m}{l} x^l (1-x)^{m-l} \, B(y; l, m-l),$$ where the $q_m(t)$ are the transition functions of a death process starting at infinity with death rate $m(m-1)/2$ when $m$ individuals are left alive and $B(\cdot\,; a, b)$ is the density of the Beta distribution with parameters $a$ and $b$ (Ethier and Griffiths, 1993). That is, $q_m(t)$ is the probability that a Kingman coalescent tree with
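The mixture structure described in this snippet (a death process, then a Binomial draw, then a Beta draw) suggests a simple way to sample from the neutral transition law. The sketch below is a hedged illustration, not the authors' code: the function names are hypothetical, the death process "starting at infinity" is approximated by starting from a finite number `n_start` of lineages (reasonable when $t$ is large compared with $2/n_{\text{start}}$), and draws in which all or none of the lineages carry the allele are mapped to the absorbing states 1 and 0.

```python
import random


def sample_lineages(t, n_start=200, rng=None):
    """Number of lineages left at time t in a pure death process.

    The death rate is m(m-1)/2 when m lineages remain. Starting from
    n_start (an assumed finite truncation of 'infinity').
    """
    if rng is None:
        rng = random.Random()
    m = n_start
    elapsed = 0.0
    while m > 1:
        elapsed += rng.expovariate(m * (m - 1) / 2.0)
        if elapsed > t:
            break
        m -= 1
    return m


def sample_neutral_transition(x, t, n_start=200, rng=None):
    """Draw the frequency at time t of a neutral path started at x."""
    if rng is None:
        rng = random.Random()
    m = sample_lineages(t, n_start, rng)
    # Each of the m lineages carries the allele with probability x.
    l = sum(1 for _ in range(m) if rng.random() < x)
    if l == 0:
        return 0.0  # allele lost by time t
    if l == m:
        return 1.0  # allele fixed by time t
    return rng.betavariate(l, m - l)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_neutral_transition(0.3, 0.5, rng=rng) for _ in range(5)]
    print(draws)
```

Because the neutral diffusion is a martingale, the sample mean of many such draws should be close to the starting frequency `x`, which gives a quick sanity check on the sampler.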
General framework
When selection is incorporated into the Wright–Fisher model, there is no known series formula for the transition density akin to (3.1) (but see Kimura, 1955, Kimura, 1957 for attempts using perturbation theory, as well as Song and Steinrücken (2012) and Steinrücken et al. (2012) for methods of approximating an eigenfunction expansion computationally). Therefore, analytical results for distributions associated with the corresponding bridge like those we obtained in the neutral case are not
Discussion
We have examined the behavior of Wright–Fisher diffusion bridges under both neutral models and models with genic selection. Although various conditioned Wright–Fisher diffusions have been studied in the past, Wright–Fisher diffusions conditioned to attain a specific value at a predetermined time have not been studied extensively. We have elucidated some of the properties of Wright–Fisher bridges using a combination of analytical theory and simulations.
In contrast to Brownian motion with drift,
Acknowledgments
The authors thank M. Slatkin and B. Peter for initial discussions that led to our interest in this topic.
JGS was supported in part by NIH NRSA trainee appointment grant T32-HG00047 and by NIH grant R01-GM40282. RCG was supported by the Miller Institute for Basic Research in Science, University of California at Berkeley. SNE was supported in part by NSF grant DMS-0907630.
References
- et al. Exact simulation of diffusions. Annals of Applied Probability (2005)
- et al. Estimation of $2N_e s$ from temporal allele frequency data. Genetics (2008)
- et al. An Introduction to Population Genetics Theory (1970)
- et al. On the joint distribution of the maximum and its location for a linear diffusion. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques (1987)
- et al. An approximate sampling formula under genetic hitchhiking. Annals of Applied Probability (2006)
- et al. The transition function of a Fleming–Viot process. Annals of Probability (1993)
- et al. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics (Oxford, England) (2010)
- Feder, A., Kryazhimskiy, S., Plotkin, J.B., 2013. Identifying signatures of selection in genetic time series. arXiv...
- On the dominance ratio. Proceedings of the Royal Society of Edinburgh (1922)
- et al. Diffusion processes and coalescent trees
- The coalescent process in models with selection and recombination. Genetics
- The “hitchhiking effect” revisited. Genetics
- Some problems of stochastic processes in genetics. The Annals of Mathematical Statistics
- Stochastic processes and distribution of gene frequencies under natural selection