Speeding up Inference of Homologous Recombination in Bacteria

Bacteria reproduce clonally but most species recombine frequently, so that the ancestral process is best captured using an ancestral recombination graph. This graph model is often too complex to be used in an inferential setup, but it can be approximated for example by the ClonalOrigin model. Inference in the ClonalOrigin model is performed via a Reversible-Jump Markov Chain Monte Carlo algorithm, which attempts to jointly explore: the recombination rate, the number of recombination events, the departure and arrival points on the clonal genealogy for each recombination event, and the range of genomic sites affected by each recombination event. However, the Reversible-Jump algorithm usually performs poorly due to the complexity of the target distribution since it needs to explore spaces of different dimensions. Recent developments in Bayesian computation methodology have provided ways to improve existing methods and code, but are not well-known outside the statistics community. We show how exploiting one of these new computational methods can lead to faster inference under the ClonalOrigin model.


Introduction
Recombination is a critical process in evolution, particularly when analysing within-species variation. Bacteria reproduce clonally, but recombination exists in most species, where a donor cell contributes a small segment of its DNA to a recipient cell. This process is analogous to gene conversion in eukaryotes and is typically modelled using an Ancestral Recombination Graph (ARG) model (Hudson, 1990;Griffiths, 1996;Wiuf and Hein, 2000). The simulation of genomic data under this model is relatively easy (Didelot et al., 2009;Brown et al., 2016) but inference of the ancestral process given genomic data is much harder. This inference problem is important even when recombination is not directly of interest, because ignoring recombination leads to inaccurate reconstruction of the clonal part of the ancestry (Schierup and Hein, 2000;Hedge and Wilson, 2014).
The ClonalOrigin model (Didelot et al., 2010) can be regarded as a good approximation of the aforementioned ARG process, in which recombination events are modelled independently given the clonal genealogy, denoted throughout as T . For completeness, we note the existence of related approximations to the ARG such as the sequential Markov coalescent for eukaryotes (McVean and Cardin, 2005;Marjoram and Wall, 2006) and the bacterial sequential Markov coalescent (De Maio and Wilson, 2017). There are also simpler approximations such as the ClonalFrame model (Didelot and Falush, 2007;Didelot and Wilson, 2015) in which the origin of each recombination event is not modelled.
The major challenge when implementing the ClonalOrigin model is to efficiently explore the joint posterior distribution of the recombination rate (ρ), the number of recombination events (R), their departures (a 1 , ..., a R ) and arrivals (b 1 ,...,b R ) on the clonal genealogy and the sites delimiting the start (x 1 , ..., x R ) and end (y 1 , ..., y R ) points of each recombination event on the genome. In order to explore such a complex distribution using Markov Chain Monte Carlo (MCMC), one must resort to the Reversible-Jump MCMC (RJMCMC) algorithm (Green, 1995) which in plain words aims at visiting spaces of different dimensions. This is the approach that was taken previously in both the original standalone implementation of the ClonalOrigin model (Didelot et al., 2010) and a recent reimplementation (Vaughan et al., 2017) within the BEAST2 framework (Bouckaert et al., 2019).
Unfortunately, as known by computational statisticians, the RJMCMC algorithm usually performs poorly due to the difficulty of proposing "good" trans-dimensional jumps. Because of this, the number of iterations (and consequently the running time) required by the algorithm for obtaining a reasonable approximation of the posterior distribution may be impractically large. Recent developments in Bayesian computation methodology provide ways of improving existing methods and code, but are not well-known outside the statistics community.
Such state-of-the-art methods belong to the framework of Sequential Monte Carlo (SMC) methods (see e.g. Doucet et al., 2001;Del Moral et al., 2006) where inference is performed using an appropriate sequence of intermediate target distributions leading to the desired one. Some of these methods have been successfully implemented to some extent when inferring coalescent trees as data arrives (see e.g. Dinh et al., 2018;Everitt et al., 2019). By performing inference in a sequential manner the complexity of the problem is reduced and the the desired target distribution may be explored more efficiently. The ideas presented here are based on the aforementioned SMC methodology that can lead to faster inference in the ClonalOrigin model; nonetheless, these ideas could be in principle applied to other evolutionary models.

Bayesian inference under the ClonalOrigin model
In this section we briefly describe the elements of the ClonalOrigin model, namely the parameters of interest, prior distributions on these parameters and the likelihood function for the data. For more details on assumptions and derivations of formulae please refer to cited references and the supplementary material.
• We use a coalescent tree to represent the clonal genealogy of n samples (D = D 1:n ) and denote such a tree by T = (τ, t), which is composed of a topology τ and a vector t = (t 2 , . . . , t n ) of branch lengths. A Kingman's coalescent prior (Kingman, 1982) is assumed for T .
• Let R denote the number of recombination events affecting the DNA sequences, which we assume a priori to be distributed according to a Poisson random variable with mean ρL/2, where ρ denotes the global recombination rate and L = n i=2 it i is the total branch length of the tree T , see e.g. Wiuf and Hein (1999) for more details. We also let L (s, t) denote the sum of branch lengths from time s to time t on the tree.
• Each recombination event is formed by the following variables: departure and arrival points on the genealogy, and start and end sites on the genome. These four variables, when referring to the i-th recombination edge, are denoted by a i , b i , x i and y i , respectively.
• Both variables a i and b i are fully determined by a time and lineage on the tree (denoted respectively by a i [t] and a i [l], respectively), and the chosen priors are those from the construction of an ARG as a point process along the genomes (Wiuf and Hein, 1999).
• The priors for x i and y i are constructed assuming a uniform distribution on the sequence for x i and a geometric distribution of mean δ > 0 for the difference y i − x i | x i . When the sequence is made of B blocks comprising a total length of L the priors need to be modified accordingly as in Didelot et al. (2010).
• The entire set of variables describing the recombination events is denoted by R = (R, a 1:R , b 1:R , x 1:R , y 1:R ) .
• Mutation events occur at rate θ/2 across the genealogy and on existing recombination edges; for simplicity we assume that all substitutions are equally likely, as in the evolutionary model JC69 (Jukes and Cantor, 1969).
Under the above the assumptions, the full prior for the set of parameters of interest is given by (see the supplementary material for the derivation) where p 0,ρ , p 0,δ , p 0,θ are arbitrary priors for the recombination rate, mean-tract length and mutation rate. For the likelihood computation, first recall that mutation events occur at rate θ/2 across the genealogy and on existing recombination edges; we then assume for simplicity that all substitutions are equally likely (Jukes and Cantor, 1969). The computation of the likelihood function L (T , R, θ; D), for any tree T = (τ, t), set of recombination events R, and mutation rate θ, is done using Felsenstein's pruning algorithm (Felsenstein, 1973(Felsenstein, , 1981) (see also the supplementary material for more details).
The posterior on the full set of parameters is obtained through Bayes' Theorem However, inferring the whole set of parameters represents a big challenge. One could instead fix one or more parameters and work with the resulting conditional distributions, for example finding a point estimate of T and then fix it to infer the rest of the parameters. We will consider this incomplete approach for some of the examples presented later where we aim to explore The algorithm Markov chain Monte Carlo (MCMC) is commonly the method of choice for exploring posterior distributions. If we had access to the the marginal posterior for the number of recombination events π (R | ρ, δ, θ, T , D), the inference of the remaining parameters could be carried out independently across different values of R. Since such a marginal is not available we need to infer R jointly with the rest of the parameters leading to a posterior that has no fixed dimension. The celebrated Reversible-Jump MCMC (RJMCMC) algorithm (Green, 1995) provides an elegant solution and corresponds to the generalisation of the Metropolis-Hastings (MH) algorithm that allows "jumps" across different dimensions. In our context, these jumps will correspond only to going up or down by one dimension, i.e. given R ≥ 1 recombination events they can go up to R + 1 or down to R − 1. In addition to these trans-dimensional moves, we still need to perform intra-dimensional moves that allow the full exploration of the desired posterior. Therefore, for fixed R we could also perform moves on ρ, δ, θ, T and R | R = {a 1:R , b 1:R , x 1:R , y 1:R }.
Unfortunately, RJMCMC typically suffers from bad mixing in the sense that the resulting chain converges slowly to the desired posterior; this will in turn require a prohibitively large number of iterations for obtaining accurate answers. Due to recent developments in methodology (Karagiannis and Andrieu, 2013;Andrieu et al., 2018), the acceptance ratio for trans-dimensional moves within the RJMCMC can be understood as an importance sampling estimate of the ratio of two marginal densities, i.e. for an upwards move the ratio corresponds to π (R + 1 | D) /π (R | D), and using a single importance point (more details on this can be found in the supplementary material). One could then ask whether improving the aforementioned estimate could result in a chain with better mixing. The answer turns out to be positive but with some restrictions. Two straightforward approaches to achieve variance reduction of the estimator are annealed importance sampling (Neal, 2001) and simply using more than one importance point.
In plain words, the annealing procedure creates a smooth bridge between distributions of different dimension; so in a sense instead of attempting one big jump from R to R + 1 we attempt many, say T > 1, smaller jumps. Clearly, the downside of this approach is the extra cost of performing T steps that could result in a rejection in the MCMC algorithm. Whether annealing provides an advantage or not will entirely depend on the quality of the smaller jumps. In the ClonalOrigin context, when proposing to add a new recombination event that is in a bad region of the posterior, the idea is to use the smaller jumps to correct its position before deciding to accept or not.
On the other hand, considering N ≥ 2 importance points when jumping from R to R + 1 requires the creation of several proposed recombination events which are used for estimating the aforementioned ratio of marginals. However, if the upwards ends up being accepted one needs to select which of the proposed recombination events will be retained for the next iteration of the algorithm. To do this, a categorical distribution is used that selects which recombination event will survive according to its weight or contribution to the estimate of the acceptance ratio. Therefore, a recombination event that is more plausible than the rest will have a larger weight and, if the upwards move is accepted, it will have a greater chance of being retained ready for the next iteration. Figure 1 illustrates both the annealing and the multiple importance points schemes when T = 5 and N = 4. The trees in black correspond to the current value of T on which recombination events are appended. Colours correspond to different stages of the proposed recombination event when perturbed; the idea of annealing is that the proposed event in blue at t = 1 will be perturbed in such way that the the one in red at t = T has a better chance of being accepted. This process can be repeated N times, possibly in parallel, noting that perturbations could be drastic (as in n = N in the figure) or almost nonexistent (as in n = 3); it all depends on the quality of the the proposed event at t = 1 and the perturbation mechanism. Only one event from those at t = T is selected for the final accept-reject step in the MCMC, but events leading to a higher value in the posterior will have a better chance of being selected and possibly added later on, as discussed previously.
Algorithm 1 describes one step of the full process which we have termed Reversible-multiple-Jump MCMC (RmJMCMC) in line withB Andrieu et al. (2018), where it was firstly introduced and studied, and belonging to the wider class of MH with Asymmetric Acceptance Ratio (MHAAR) algorithms. Here we have adapted RmJMCMC to the ClonalOrigin context. We refer the reader to the appendix, where we describe the UAM and DAM algorithms used in the RmJMCMC. The implementation in C++ of the RmJMCMC algorithm is freely available at: https://github.com/fmedina7/ClonOr_cpp Notice that the upwards move agrees with the description in the previous paragraph; however, the downwards move appears to be more intricate. In words, we propose to delete a recombination event but in order to decide whether to accept or not we must test if this deletion is convenient. This is done by generating N − 1 recombination events and computing the resulting acceptance ratios as if we were trying an upwards move. Low values for the aforementioned ratios would imply that the proposed deletion is possibly a good decision, whereas if many ratios result in high values it might indicate that the deletion is a poor choice. The final decision rule results in a valid algorithm as shown in Andrieu et al. (2018), otherwise the algorithm would not be exact in the sense that it does not target the desired posterior distribution. Non-exact or noisy methods have been explored in the past (see e.g. Alquier et al., 2016), however we do not discuss them any further as the bias that is introduced is typically difficult to quantify.
One further observation is the obvious increased computational cost of RmJMCMC as opposed to performing only RJMCMC (equivalent to RmJMCMC when T = N = 1) or running multiple RJMCMC chains in parallel. Every iteration of RmJMCMC is at least T N times more expensive than one RJMCMC iteration. However, as opposed to running RJMCMC for longer or multiple independent RJMCMC chains, RmJMCMC has provable better convergence properties (Andrieu et al., 2018) that can reduce burn-in times and improve convergence towards a region of high posterior probability. This is discussed in more

Flavours of RmJMCMC
In order to implement Algorithm 1, we must specify the sampling auxiliary distributions − → ϕ (· | R) and ← − ϕ (· | R) (which could both depend on other parameters, e.g. ρ, δ, θ, T ). In the following section we present results using simple choices, the joint prior on (a, b, x, y) for the distribution − → ϕ and a discrete uniform on the set {1, . . . , R} for ← − ϕ assuming there are R active recombination events. In mathematical terms, the associated densities are . . , R}. The previous choices greatly simplify the computation of the output ratios in Algorithms 2 and 3, which involve mainly ratios of likelihood functions.
• With probability 1/2 try an upwards move: 2. For each n ∈ {1, . . . , N } perform an UAM as in Algorithm 2 using as input R, E

Computer
Otherwise return R.

Delete the recombination event with index
3. Perform a DAM as in Algorithm 3 using as input R , J using the common distribution − → ϕ (· | R).
5. For each n ∈ {2, . . . , N } perform an UAM as in Algorithm 2 using as input R, E and discarding the other output variables.
Additionally, we must define the way to perturb recombination events at every small step in the annealing process. This is done using an MCMC algorithm as explained in Algorithms 2 and 3; hence the perturbation is fully defined once we select a proposal distribution for R fixing the value of R, i.e. we need to perturb at least one recombination event within the existing R events. For the examples in the following section we choose to perturb (using the prior as proposal) only the newly created recombination event when going upwards, or the recombination event to be deleted when going downwards, for both cases this event is denoted by E * in Algorithms 2 and 3. Doing this provides a very simple expression for the acceptance ratio in the MCMC steps within the annealing that involves only ratios of likelihood functions.
Finally, we must decide the number of annealing steps T , the sequence of real numbers {γ t } T t=0 and the value of replicates N .
In the examples that follow we use different values for T and N which involve different costs and running times. As mentioned earlier, the computational cost increased as T and N increases, however an appealing property of RmJMCMC is that loops involving N (either upwards or downwards) can be performed in parallel, this may lead to higher efficiency when computing running times. Due to this, the value of N is commonly determined by the number of cores available in the computer or server. Respecting the required sequence {γ t } T t=0 we simply choose a linear interpolation γ t = t/T . We want to emphasise that the choice of − → ϕ and ← − ϕ were made in accordance to the original ClonalOrigin implementation from Didelot et al. (2010); whereas the choices for the proposals within the MCMC and the elements of the sequence {γ t } T t=0 were made for convenience. However, the way RmJMCMC was formulated permits the use of more complex approaches that could provide better results in terms of efficiency. Some of these improvements are briefly described in the Discussion and are devoted to future work.

Results
This section compares the RmJMCMC algorithm and the standard RJMCMC for the ClonalOrigin model. We start with some toy simulated experiments, moving later on to results using real, and previously studied, datasets. The code used for generating the forthcoming results is available at: https://github.com/fmedina7/ClonOr_cpp

Application to simulated data
We present several simulated examples in order to explore the scalability and usability of RmJMCMC. We compare the settings when N = T = 1 (corresponding to the RJMCMC algorithm), when T = 4 and N = 4, when T = 2 and N = 8, and when T = 1 and N = 16. The efficiency between these configurations is compared using an estimation of the effective sample size (ESS) for a quantity of interest, as is commonly done in MCMC (Robert and Casella, 2004). In words, the ESS indicates how many independent samples were obtained from the total number of iterations for which the algorithm was run, and if the chain exhibits large correlation then the ESS will typically be much smaller than the number of iterations.
Throughout this section we consider a "typical setting" where data was simulated using a mean tract length of δ = 236, a mutation rate per site θ s = 0.03, and a recombination rate per site ρ s = 0.005. These three parameters remain fixed, and consequently the inference is carried only for the number of recombination events and their locations on the genome and on the tree, i.e. the target distribution is given by (2).
Example 1. Typical setting: 50 sequences, length 50 kilobase pairs (Kb), mean tract length δ = 236, global mutation rate θ = 1500 (θ s = 0.03), global recombination rate ρ = 250 (ρ s = 0.005). Figure 2 contains traceplots for the number of recombination events (top row) and for the value of the log-likelihood (middle and bottom rows). Observe that the schemes for which N > 1 have a similar performance when comparing against iteration number (plots on the left); this is not surprising since the annealing moves (those involving T ) are perturbed using an MCMC algorithm with a proposal equal to the prior, and also the multiple instances (those involving N ) are generated using the prior. However, the computational burden is different in each setting since for a fixed value of N T the algorithm could be parallelised using N cores but still requires T − 1 serial iterations of the annealing process. Hence, for this case, the setting where T = 1 and N = 16 should perform best when taking into account the running time. The plots on the right incorporate this information and notice that all the schemes still seem to perform better than the standard RJMCMC chain (in blue) as they reach a region where the likelihood is high in a shorter amount of time, this can also be seen more clearly in the plots from the   bottom row. It is worth noting that despite the RJMCMC chain being closer to the true number of recombination events, it is not a reliable indicator of whether the chain has converged since the value of the log-likelihood is still in a transient phase.  Figure 5 also shows recombination frequencies, but between edges in the tree. Coloured squares indicate pair of edges on the tree joined by several recombination events, depending on the scale. In this case, the differences between the schemes and against the ground truth are negligible.
This example is similar to the previous one, except for the length of the sequence which is now 10Kb. The global mutation and recombination rates have been scaled appropriately leaving the per-site rates the same as before. Figure 6 contains some plots of interest, we observe once more an improvement of the ESS and faster convergence of the log-likelihood without considering running times. However, when time is taken into account (right plots) running RmJMCMC seems to be worth it only when T = 1 and N = 16. Having shorter sequences makes the inference problem less expensive, so the schemes where T > 1 do not seem to add any benefit. Here we compare how the ESS is affected as the length of the sequences becomes larger and as the number of sequences increases. Figure 7 presents the results for lengths of 10Kb, 20Kb and 50Kb. From the plots in the bottom row, the inverse of the ESS appears to be linear as the length of the sequence increases, more specifically, doubling the sequence length halves the ESS. Interestingly, the reduction of ESS for the RJMCMC chain (in blue) when taking into account the running time seems more drastic than the other schemes.
In contrast, increasing the sequence length seems to have a logarithmic effect on the inverse of the ESS as seen in Figure 8; however, when considering running times, the effect becomes linear. For this case, the method with less reduction of ESS is the RJMCMC chain. Thus, RmJMCMC seems to outperform RJMCMC when having larger sequences rather than when having lots of short sequences.

Application to Escherichia coli data
Here we present results for a dataset of 27 sequences from Escherichia coli on two regions with different recombination intensity; this dataset has been previously studied using ClonalOrigin (Didelot et al., 2012). Similarly as in the simulated examples, we have fixed the clonal genealogy T and also some of the global parameters. The specific details on how these fixed values were obtained can be found in Didelot et al. (2012).
Example 4. E. coli data on region of length 27.1Kb. The fixed parameters considered here are the clonal genealogy T and the mutation rate per site θ s = 0.0125. Therefore, the inference is performed on the set of recombination events R and the global parameters ρ and δ, i.e. the target distribution arises from (1) by fixing T and θ accordingly.  Figure 9 presents some preliminary results on a region with a moderate recombination frequency of ρ s near 0.01. Observe that despite increasing N does not seem to improve the value of the ESS adjusted for running time, the log-likelihood trace plot appears to indicate there is some advantage of taking N > 1 during the burn-in phase. Interestingly, the trace plots in the bottom row show that when N > 1 the chain appears to favour large values for ρ during the burn-in stage, whereas for δ might take different routes towards the apparent high-posterior region. Clearly, further simulations are needed to rule out whether RmJMCMC offers a clear advantage. Example 5. E. coli data on region of length 3.4Kb. Here we consider a shorter region but for which we know the recombination frequency is higher. The fixed parameters considered once again are the clonal genealogy T and the mutation rate per site θ s = 0.0125, we have also fixed the tract length δ = 542 due to the high levels of recombination. The inference is thus performed on the set of recombination events R and the global parameter ρ, i.e. the target distribution arises from (1) by fixing T , θ and δ accordingly.
Observe from Figure 10 that this is a more challenging region for the E. coli data. In particular, observe that apparently the setting N = 12 does not provide an advantage in terms of reaching a high likelihood region in a shorter time; however, when looking at the trace plots for the number of recombination events and for ρ it does appear to have an improvement. In this case ESS plots have not been presented since clearly the chain has not reached stationarity, even after nearly 21 days of running the algorithms. Recall that accurate ESS values are based on the fact that the chains have converged. Finally, observe that the recombination rate in this region is much higher than the previous region, with ρ s reaching values around 0.17. Due to the difficulty of this example, further simulations are needed in order to have a clearer picture of the benefits, if any, from running a RmJMCMC algorithm.

Discussion
The RmJMCMC algorithm has the potential of speeding up the convergence in the ClonalOrigin model towards a region of high likelihood, hence reducing the number of iterations and running time when compared to a standard RJMCMC. Here the RmJMCMC algorithm has been presented in the context of the ClonalOrigin model, but in principle it could be implemented to other models in population genetics where the dimension of the posterior is not fixed.
The version used in the examples presented here is possibly the simplest one could implement, namely using the prior distribution for proposing and perturbing recombination events. However, other alternatives are possible that may lead to better results, for example splitting or merging existing recombination events, or modifying the prior accordingly if some regions in the tree or in the genome are more likely to recombine. Similarly, different perturbations in the annealing step may be considered since in the presented examples there is no clear advantage when taking T > 1. Alternatives include random-walk style proposals for some (or all) of the variables involved in the proposed recombination event (a i , b i , x i , y i ). In this respect, within the annealing process, one should also consider perturbing all of the existing recombination events and not just the recently proposed one.
Finally, it is worth pointing out that the number of cores used in the presented examples was not particularly large; nonetheless we were able to observe significant improvements in some cases. It would be interesting to perform a much larger study where several hundred cores are at hand.

Supplementary Material Details on priors
We use a coalescent tree to represent the clonal genealogy of n samples and denote a tree by T = (τ, t), which is composed of a topology τ and a vector t = (t 2 , . . . , t n ) of branch lengths. A Kingman's coalescent prior (Kingman, 1982) is assumed for T , meaning The number of recombination events R is assumed, a priori, to be a Poisson random variable with mean ρL/2, where ρ denotes the global recombination rate and L = n i=2 it i is the total branch length of the tree T . Hence Each recombination event is formed by the following variables: departure and arrival points on the genealogy, and start and end sites on the genome. These four variables, when referring to the i-th recombination edge, are denoted by a i , b i , x i and y i , respectively. Both variables a i and b i are fully determined by a time and lineage on the tree (denoted respectively by a i [t] and a i [l], respectively), and the chosen priors are those from the construction of an ARG (Wiuf and Hein, 1999). These are a uniform distribution on the tree for b i , and from this arrival point the edge reconnects at a i on the tree at a rate proportional to the number of active lineages. We then have and, assuming that at time b i [t] there are m ∈ {2, . . . , n} active lineages and letting s k = where L (s, t) denotes the sum of branch lengths from time s to time t on the tree. If u ∈ (s k , s k−1 ) for any k ∈ {2, . . . , m}, considering s 1 = ∞, Once a i [t] has been found, the lineage a i [t] on which the recombination event departs is chosen uniformly across the active lineages. Hence The previous two probabilities lead to The priors for x i and y i are constructed assuming a uniform distribution on the sequence for x i and a geometric distribution of mean δ > 0 for the difference y i − x i | x i . When the sequence is made of B blocks comprising a total length of L = B b=1 L b , the priors need to be modified accordingly as in (Didelot et al., 2010). Let s b i denote the i-th site in block the b-th block, where i ∈ {1, . . . , L b } and s b L b + 1 = s b+1 1 , then if u is not the first site of a block P [x i = u | δ] = κ (a constant); whereas if u is at the beginning of a block (say u = s b 1 ) we can use the uncensored variablesx i andȳ i (which denote the true sites where the recombination begins and ends) to derive where In order to find the density of y i we use the fact thatȳ i −x i |x i ∼ Geom (δ) on the positive integers. We then have for v ≥ u and u ∈ s b 2 , . . . , s b Hence, The previous equations fully define the prior to be used for the set of recombination events, denoted by R = (R, a 1:R , b 1:R , x 1:R , y 1:R ), i.e.
p 0 (R |ρ, δ, T ) = R!p 0 (R | ρ, T ) p 0 (a 1:R , b 1:R , x 1:R , y 1:R |R, ρ, δ, T ) , where the factorial comes from ordering the recombination events (e.g. ordering according to one of the continuous variables x or y). Assuming recombination events occur independently given the genealogy T and introducing priors for ρ and δ, we obtain

Likelihood computation
Recall that the vector D = D 1:n corresponds to the n observed DNA sequences each composed of L sites, we denote the i-th site in the k-th sequence by D (i) k . For computing the likelihood function at the i-th site, the local tree T (i) = τ (i) , t (i) is constructed as a function of the clonal genealogy T and the set of recombination events R. From this local tree with n leaves, let z if k ≤ n; and p z,w (t) denotes the transition probability from the basis z to the basis w in t units of time, and π corresponds to the limiting probability of such transitions.
For the JC69 model (Jukes and Cantor, 1969), we have π(z) = 0.25 for each z ∈ {A, C, G, T } and Hence, assuming independence across the L different sites in all the sequences we obtain RJMCMC as an importance estimator within MCMC To keep notation simple we remove the variables ρ, δ, θ, T and the data D from the conditional distributions, i.e. we write π (R) ≡ π (R | ρ, δ, θ, T , D). In order to jump up in dimension, i.e. to go from R recombination events to R + 1, we must create four variables a R , b R , x R , y R . This is done using an auxiliary distribution, for example the prior for such variables, which we denote by ϕ R→R+1 and that in principle could also depend on the data or other parameters. Similarly, in order to perform a jump down in dimension we need a way to merge or remove recombination events, this is done using the distribution ϕ R+1→R . Letting ξ 1:R = {a 1:R , b 1:R , x 1:R , y 1:R }, the auxiliary ϕ s allow us to transform ξ 1:R into ξ 1:R+1 and vice-versa, where ξ 1:R is not necessarily the same as ξ 1:R . For simplicity, suppose that the resulting one-to-one transformation (which we denote by T R→R+1 ) has a Jacobian with determinant equal to 1, then π (R + 1) π (R) = π dξ 1:R+1 , R + 1 ϕ R+1→R (du ) π (R) = π ξ 1:R+1 , R + 1 ϕ R+1→R (u ) π (R) π (ξ 1:R | R) ϕ R→R+1 (u) × π (dξ 1:R | R) ϕ R→R+1 (du) = E π ξ 1:R+1 , R + 1 ϕ R+1→R (u ) π (ξ 1:R , R) ϕ R→R+1 (u) , the above expectation is taken on the variables ξ 1:R ∼ π (dξ 1:R | R − 1), u ∼ ϕ R→R+1 , and where ξ 1:R+1 , u = T R→R+1 (ξ 1:R , u). The ratio inside the previous expectation is exactly the ratio used in the RJMCMC algorithm, which serves as an estimate for the ratio of the "ideal" unfeasible scenario in which we could explore the posterior of R without the need of ξ 1:R .

Annealing Moves
The full algorithm requires the introduction of the upwards and downwards annealed moves. For this, we require the introduction of auxiliary distributions, the first − → ϕ (· | R) is for generating a new recombination event, i.e. for moving upwards; whereas the second ← − ϕ (· | R) is for selecting an index when deleting a recombination event. Moreover, these two distributions might also depend on ρ, δ, θ, T and/or D. Simple choices are the prior of a single recombination event (a, b, x, y) for − → ϕ (described above), and a uniform distribution on {1, . . . , R} for ← − ϕ assuming there are R active recombination events. We now present the upwards and downwards moves.
Despite the previous algorithms being very similar, they have important differences. For instance, the upwards move requires as input a newly created recombination event, whereas for the downwards this requirement is replaced with the need for an index. Also, even though the sequence {γ t } T t=0 is required in both cases, the construction of η t is different depending on the type of move. We want to stress that this construction is essential for obtaining a valid algorithm, otherwise the procedure could target something different to the desired posterior. A simple choice for the sequence if γ t = t/T , which implies γ T −t = 1 − γ t .
Finally, we emphasise that in both algorithms, the variables (R, E * ) can be obtained from R , J * and viceversa; this means that ratios of the form η t+1 /η t can be solely expressed in terms of (R, E * ) or R , J * . Similarly, when perturbing (R, E * ) using a MCMC process we are implicitly perturbing the corresponding R , J * . In addition to this, the outputs are not that different since one set of variables can be obtained from the other. In both algorithms we chose to work with (R, E * ) and only compute the corresponding pair R , J * when needed.