## Abstract

We give recursions for the expected site-frequency spectrum associated with Xi-coalescents, that is, exchangeable coalescents which admit *simultaneous multiple mergers* of ancestral lineages. Xi-coalescents arise, for example, in association with population models of skewed offspring distributions with diploidy, recurrent advantageous mutations, or strong bottlenecks. In contrast, Lambda-coalescents admit multiple mergers of lineages, but at most one such merger each time. Xi-coalescents, as well as Lambda-coalescents, can predict an excess of singletons, compared to the Kingman coalescent. We compare estimates of coalescent parameters when Xi-coalescent models are applied to data obtained from Lambda-coalescents, and vice versa. In general, Xi-coalescents predict fewer singletons than corresponding Lambda-coalescents, but a higher count of mutations of ‘size’ larger than singletons. We analyse unfolded site-frequency spectra obtained for nuclear loci of the diploid Atlantic cod, and obtain different coalescent parameter estimates than previously obtained with Lambda-coalescents. Our results provide new inference tools, and suggest that for nuclear population genetic data from diploid or polyploid highly fecund populations that may have skewed offspring distributions, one should apply Xi-coalescents rather than Lambda-coalescents.

## Introduction

The coalescent approach, i.e. the idea of considering the (random) ancestral relations of alleles sampled from natural populations, has provided rich mathematical theory [cf. 4], and very useful inference tools (cf. e.g. [14, 39] for reviews). Initiated by the Kingman coalescent [24, 26, 25], coalescent models now include the family of Lambda-(Λ-)coalescents [31, 32, 13], and Xi-(Ξ-)coalescents [35, 29, 33]. Ξ-coalescents admit *simultaneous multiple mergers* of ancestral lineages. Thus, in each merger event, distinct groups of ancestral lineages can merge at the same time, and each group can have more than two lineages. Λ-coalescents, in contrast, only allow one group (possibly containing more than two lineages) to merge each time. Due to multiple mergers, the derivation and application of inference tools becomes harder as one moves from the Kingman coalescent to Λ-coalescent models, and from Λ-coalescents to Ξ-coalescents. Ξ-coalescents can be obtained from diploid population models [6, 30, 11]. They also arise in models of repeated strong bottlenecks [8], and in models of selective sweeps [15, 16].

[20] obtained closed-form expressions for the expected site-frequency spectrum, as well as (co)-variances, when associated with the Kingman coalescent. [7] obtain recursions for expected values and (co)-variances when associated with Λ-coalescents. However, the complexity of the recursions means that (co)-variances, when associated with Λ-coalescents, can only be computed for small sample sizes. The expected values can be applied in distance statistics [21], as well as in an approximate likelihood approach [7, 18].

Multiple merger coalescents can be obtained from population models that admit large offspring numbers and skewed offspring distributions, characteristics associated with many marine populations [1, 19, 5, 9, 10, 22, 23, 34]. Indeed, [2] find a much better fit (measured by the sum of the squared distances between observed and expected values) between data on nuclear genes from Atlantic cod and Λ-coalescents than with the Kingman coalescent. Simulation results of [34] suggest that the site-frequency spectrum of (at least some) Ξ-coalescents is multimodal, a pattern observed in data on the nuclear *Ckma* gene in Atlantic cod [2]. Based on this evidence, a way to compute expected values of the site-frequency spectrum associated with Ξ-coalescents should be a welcome and important addition to the set of inference tools for population genetics.

In this work, we obtain recursions for the expected site-frequency spectrum associated with general Ξ-coalescents, with an approach similar to the one applied by [7]. We compare estimates of coalescent parameters when applied to simulated data obtained under Λ-coalescents, and vice-versa. Since the recursions for the expected values are already fairly complex, and computationally intensive, we expect recursions for the (co)-variances to be even more so. The (co)-variances will therefore not be addressed.

We estimate coalescent parameters associated with 4-fold Xi-coalescents for the unfolded site-frequency spectrum of 3 nuclear loci [2] of the highly fecund Atlantic cod. Our simple method involves minimising the distance between observed and expected values, where the distance is not calibrated by the corresponding variance. Hence, it is not a formal test. Our estimates differ from previous estimates obtained with the use of Lambda-coalescents. The main biological implication of our results is that Xi-coalescents should be applied to nuclear data from highly fecund diploid (or polyploid) populations, and Lambda-coalescents to haploid data such as mitochondrial DNA.

The paper is structured as follows: First we give a precise mathematical description of the various coalescent models. We then state our main result on the expected site frequency spectrum of Ξ-coalescents, Theorem 2. This is followed by a discussion of some specific examples of Ξ-coalescents. Some numerical examples which illustrate the difference in the site-frequency spectrum between Lambda- and Xi-coalescents are then presented, followed by an application to nuclear Atlantic cod data. The proofs are collected in an appendix.

## Theory

### Coalescent models

We briefly review the basic coalescent models, namely the Kingman-, Λ-, and Ξ-coalescents. They are all continuous-time Markov chains taking values in the space of partitions of the natural numbers ℕ := {1, 2,…}, whose restriction to the first *n* integers can be described as follows. Let *𝒫*_{n} denote the space of partitions of [*n*] := {1,…, *n*}. We write *π* for a generic element of *𝒫*_{n}, and #*π* for the *size* of *π*, i.e. for the number of blocks *π*_{i} ∈ *π*. Thus for *π* ∈ *𝒫*_{n} we have *π* = {*π*_{1},…, *π*_{#π}} with #*π* ≤ *n*. If *π*, *π*′ ∈ *𝒫*_{n} with #*π* = *m*, we write *π*′ ≺ *π* if there exist *i*, *j* ∈ [*m*] with *π*′ = {*π*_{ℓ} : *ℓ* ∈ [*m*], *ℓ* ∉ {*i*, *j*}} ∪ {*π*_{i} ∪ *π*_{j}}, i.e. *π*′ is obtained from *π* by merging blocks *π*_{i} and *π*_{j}. If the transition rates of the continuous-time Markov chain with values in *𝒫*_{n}, and starting from state {{1},…, {*n*}} at time *t* = 0, are given by

$$
q_{\pi,\pi'} =
\begin{cases}
1 & \text{if } \pi' \prec \pi, \\
-\binom{\#\pi}{2} & \text{if } \pi' = \pi, \\
0 & \text{otherwise,}
\end{cases}
$$

we refer to the process as the Kingman-*n*-coalescent. The process is stopped at the first time *t* > 0 at which it reaches the state {{1,…, *n*}}, i.e. when the most recent common ancestor of the *n* lineages has been reached.
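The dynamics just described can be sketched in a few lines of Python. This is only an illustration under the standard convention that each pair of blocks merges at rate 1, so that the holding time with *m* blocks is exponential with rate *m*(*m* − 1)/2; the function names `simulate_kingman` and `expected_tmrca` are ours and are not taken from the C code accompanying the paper.

```python
import random
from math import comb

def simulate_kingman(n, rng):
    """Return one path [(time, partition), ...] of a Kingman-n-coalescent."""
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t = 0.0
    path = [(t, list(blocks))]
    while len(blocks) > 1:
        m = len(blocks)
        # Holding time: exponential with total rate C(m, 2).
        t += rng.expovariate(comb(m, 2))
        # Each pair merges at rate 1, so the merging pair is uniform.
        i, j = rng.sample(range(m), 2)
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        path.append((t, list(blocks)))
    return path

def expected_tmrca(n):
    """Expected time to the most recent common ancestor: sum of 1 / C(k, 2)."""
    return sum(1 / comb(k, 2) for k in range(2, n + 1))

if __name__ == "__main__":
    for time, partition in simulate_kingman(5, random.Random(1)):
        print(round(time, 3), [sorted(b) for b in partition])
```

The path always has *n* states, since every transition reduces the number of blocks by exactly one; this is the property that multiple-merger coalescents give up.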

If *π*, *π*′ ∈ *𝒫*_{n} with #*π* = *m* and there exist indices *i*_{1},…, *i*_{k} ∈ [*m*] with *π*′ = {*π*_{ℓ} : *ℓ* ∈ [*m*], *ℓ* ∉ {*i*_{1},…, *i*_{k}}} ∪ {*π*_{i1} ∪ … ∪ *π*_{ik}}, we write *π*′ ≺_{m,k} *π* and say that a *k*-merger has occurred, with 2 ≤ *k* ≤ *m*. For a finite measure Λ on [0,1], define the merger rates as in (3) if the integral in (3) exists, and as in (4) otherwise. A *𝒫*_{n}-valued continuous-time Markov chain with transition rates *q*_{π,π′} from *π* to *π*′ given by (5) is referred to as a Λ-*n*-coalescent. The waiting time in state *π* is exponential with rate *λ*_{m} as in (3, 4).
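As a small numerical illustration of Λ-coalescent merger rates, the following sketch assumes the usual convention that a given group of *k* out of *m* blocks merges at rate ∫ *x*^{k−2}(1 − *x*)^{m−k} Λ(*dx*), and takes Λ to be the Beta(2 − *α*, *α*) distribution, for which the integral reduces to a ratio of beta functions. Both the convention and the function names `beta_rate` and `total_rate` are assumptions made for this example.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_rate(m, k, alpha):
    """Rate at which a fixed group of k of m blocks merges when
    Lambda = Beta(2 - alpha, alpha): B(k - alpha, m - k + alpha) / B(2 - alpha, alpha)."""
    if not 2 <= k <= m:
        raise ValueError("need 2 <= k <= m")
    if not 0 < alpha < 2:
        raise ValueError("need 0 < alpha < 2")
    return exp(log_beta(k - alpha, m - k + alpha) - log_beta(2 - alpha, alpha))

def total_rate(m, alpha):
    """Total rate of leaving a state with m blocks."""
    return sum(comb(m, k) * beta_rate(m, k, alpha) for k in range(2, m + 1))

if __name__ == "__main__":
    for alpha in (1.0, 1.5, 1.9):
        print(alpha, [round(beta_rate(5, k, alpha), 4) for k in range(2, 6)])
```

A useful sanity check under this convention is the consistency relation rate(*m*, *k*) = rate(*m* + 1, *k*) + rate(*m* + 1, *k* + 1), which holds because (1 − *x*) + *x* = 1 inside the integral.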

Now we specify the transition rates for a Ξ-*n*-coalescent. Let *π*, *π*′ ∈ *𝒫*_{n} with #*π* = *m*, and suppose there exist *r* disjoint groups of block indices, the *j*-th group containing *k*_{j} ≥ 2 indices, *j* = 1,…, *r*, such that *π*′ is obtained from *π* by merging, for each *j* ∈ [*r*], the blocks with indices in the *j*-th group into a single block. Thus, such a transition is a *simultaneous multiple merger*, where in each group *k*_{j} ≥ 2 blocks merge into a single block, and the *r* mergers occur simultaneously. The vector (*k*_{1},…, *k*_{r}) specifies the merger sizes.

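Let Δ denote the infinite simplex

$$
\Delta := \Big\{ (x_1, x_2, \ldots) : x_1 \ge x_2 \ge \cdots \ge 0, \ \sum_{i \ge 1} x_i \le 1 \Big\}.
$$

Let *x* ∈ Δ and *m* ∈ ℕ, let (*k*_{1},…, *k*_{r}), with *k*_{i} ≥ 2, be the vector of the *r* merger sizes, and let *m* − (*k*_{1} + … + *k*_{r}) be the number of blocks unaffected by the given merger.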

Define the functions of *x* ∈ Δ_{0} := Δ \ {(0, 0,…)} = Δ \ {0} as in (6), among them *g*(*x*, *m*). Let Ξ_{0} denote a finite measure on Δ_{0}, and write Ξ := Ξ_{0} + *aδ*_{{0}}. Further, consider the number of ways of arranging *m* items into *r* non-empty groups whose sizes are given by (*k*_{1},…, *k*_{r}). With *l*_{j} denoting the number of *k*_{1},…, *k*_{r} equal to *j*, and *s* := *m* − (*k*_{1} + … + *k*_{r}), one checks that this number equals [35]

$$
\frac{m!}{k_1! \cdots k_r! \; s! \; \prod_{j \ge 2} l_j!}. \tag{7}
$$

Now define the merger rates [35] as in (9) if the integral in (9) exists, and as in (10) otherwise. A continuous-time *𝒫*_{n}-valued Markov chain with transitions *q*_{π,π′} given by (11) [35], with *g*(*x*, *m*) given by (6), is referred to as a Ξ-*n*-coalescent. The waiting time in state *π* is exponential with rate λ_{m} as in (9, 10).
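The combinatorial factor mentioned above, the number of ways of arranging *m* items into *r* groups of prescribed sizes with the remaining items left alone, is easy to get wrong when several groups have the same size. The sketch below computes it as *m*! divided by the product of the *k*_{j}!, the factorial of the number of unaffected items, and the *l*_{j}! (groups of equal size are interchangeable), and checks the count against brute-force enumeration. The function names are ours.

```python
from collections import Counter
from itertools import combinations
from math import factorial

def count_mergers(m, sizes):
    """Number of ways to pick unordered disjoint groups of the given sizes from m items."""
    s = m - sum(sizes)  # items not taking part in any group
    if s < 0 or any(k < 2 for k in sizes):
        raise ValueError("sizes must be >= 2 and sum to at most m")
    denom = factorial(s)
    for k in sizes:
        denom *= factorial(k)
    for multiplicity in Counter(sizes).values():  # the l_j
        denom *= factorial(multiplicity)
    return factorial(m) // denom

def count_mergers_brute_force(m, sizes):
    """Enumerate all sets of disjoint groups directly (small m only)."""
    def rec(available, remaining):
        if not remaining:
            return {frozenset()}
        found = set()
        for group in combinations(sorted(available), remaining[0]):
            for rest in rec(available - set(group), remaining[1:]):
                found.add(rest | {frozenset(group)})
        return found
    return len(rec(set(range(m)), list(sizes)))

if __name__ == "__main__":
    print(count_mergers(6, (3, 2)), count_mergers(6, (2, 2)))
```

For 6 blocks and merger sizes (3, 2) there are 60 possible transitions, while for sizes (2, 2) the two groups are interchangeable and the count is 45 rather than 90.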

### The site-frequency spectrum

The site-frequency spectrum is a simple summary statistic of the full DNA sequence data, but contains valuable information about variation among individuals. We assume the infinitely-many sites mutation model [40], in which mutations occur as independent Poisson processes on the branches of a given gene genealogy with rate *θ*/2 for some constant *θ* > 0, and no two mutations occur at the same site. The constant *θ* is determined by the ratio *μ*/*c*_{N}, where *μ* is the per-generation mutation rate, and *c*_{N} is the probability of two distinct individuals (gene copies) sharing a common ancestor in the previous generation. We refer to [18] for a discussion of the relation between mutation and timescales of different coalescent processes.

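Given sample size *m*, we let *ξ*_{i}^{(m)} denote the number of polymorphic sites at which one variant (the derived mutation) is observed in *i* copies. The collection

$$
\left( \xi_1^{(m)}, \ldots, \xi_{m-1}^{(m)} \right)
$$

is known as the (unfolded) *site-frequency spectrum.* If information about ancestral states is unavailable, so that one does not know which variant is new, one considers the *folded* spectrum in which

$$
\eta_i^{(m)} := \frac{\xi_i^{(m)} + \xi_{m-i}^{(m)}}{1 + \delta_{i,\,m-i}}, \qquad 1 \le i \le \lfloor m/2 \rfloor.
$$

[20] obtains closed-form solutions for expected values and (co)-variances of the site-frequency spectrum associated with the Kingman coalescent Π^{(k)}. Indeed [20],

$$
\mathbb{E}\left[ \xi_i^{(m)} \right] = \frac{\theta}{i}, \qquad i \in [m-1]. \tag{14}
$$

Let *B*_{i}^{(m)} denote the random total length of branches subtending *i* ∈ [*m* − 1] leaves. Result (14) follows from [20]

$$
\mathbb{E}\left[ B_i^{(m)} \right] = \frac{2}{i}
$$

and the infinitely-many sites mutation model.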
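To make the two summary statistics concrete, the following sketch tabulates the unfolded and folded spectra from a small 0/1 matrix, where rows are sampled sequences, columns are sites, and 1 marks the derived variant. The input format and function names are assumptions made for this example; the folding rule used is the usual one, in which classes *i* and *m* − *i* are pooled and the middle class (if *m* is even) is counted once.

```python
def unfolded_sfs(haplotypes):
    """haplotypes: list of m equal-length 0/1 lists (1 = derived variant).
    Returns [xi_1, ..., xi_{m-1}]."""
    m = len(haplotypes)
    sfs = [0] * (m - 1)
    for site in zip(*haplotypes):      # iterate over columns (sites)
        i = sum(site)                  # number of copies of the derived variant
        if 0 < i < m:                  # ignore monomorphic sites
            sfs[i - 1] += 1
    return sfs

def folded_sfs(sfs):
    """Fold [xi_1, ..., xi_{m-1}] into classes 1, ..., floor(m / 2)."""
    m = len(sfs) + 1
    folded = []
    for i in range(1, m // 2 + 1):
        if i == m - i:
            folded.append(sfs[i - 1])  # middle class counted once
        else:
            folded.append(sfs[i - 1] + sfs[m - i - 1])
    return folded

if __name__ == "__main__":
    sample = [
        [1, 0, 1, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 1, 0, 1],
    ]
    spectrum = unfolded_sfs(sample)
    print(spectrum, folded_sfs(spectrum))
```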

Due to the multiple merger property of Λ- and Ξ-coalescents, closed-form expressions for the expected branch lengths are quite hard to obtain. A key quantity in computing them for multiple merger coalescents is *p*^{(m)}[*k*, *i*], which can be described as the probability that, starting from *m* blocks and conditioned on there being at some point in time exactly *k* blocks, one of them, sampled uniformly at random, subtends *i* ∈ [*m* − *k* + 1] leaves. See Figure 1 for an illustration.

With *g*(*m*, *k*) we denote the expected length of time during which we see *k* ∈ {2,…, *m*} blocks, given that we started from *m* ≥ 2 blocks. Given *p*^{(m)}[*k*, *i*] and *g*(*m*, *k*), the expected total length 𝔼[*B*_{i}^{(m)}] of branches subtending *i* leaves can be computed as follows:

$$
\mathbb{E}\left[ B_i^{(m)} \right] = \sum_{k=2}^{m-i+1} p^{(m)}[k,i] \, k \, g(m,k), \tag{16}
$$

where moreover *g*(*m*, *k*) can be computed recursively. This was shown in [7] for Λ-coalescents but in fact holds for Ξ-coalescents as well. Hence, it suffices to obtain a recursion for *p*^{(m)}[*k*, *i*]. For the Λ-case, [7] obtain the recursion

$$
p^{(m)}[k,i] = \sum_{n=k}^{m-1} p_{m,n} \, \frac{g(n,k)}{g(m,k)} \left( \mathbb{1}_{(i > m-n)} \, \frac{i-(m-n)}{n} \, p^{(n)}[k,\, i-(m-n)] + \mathbb{1}_{(i < n)} \, \frac{n-i}{n} \, p^{(n)}[k,i] \right),
$$

in which *p*_{m,n} is the probability of the block-counting process jumping from *m* to *n* blocks.

Before we turn to the expected site-frequency spectrum associated with Ξ-coalescents, we give the recursion to compute *g*(*m*, *k*).

**Lemma 1**. *Let p*_{m,k} *denote the probability of the block-counting process associated with a 𝒫*_{m}*-valued* Ξ*-m-coalescent with transition rates* (11) *jumping from m* ≥ 2 *to k* ∈ [*m* − 1] *blocks. For any n* ≥ 1 *and m* > *n we have*

$$
g(m,n) = \sum_{k=n}^{m-1} p_{m,k} \, g(k,n),
$$

*with the boundary condition*

$$
g(m,m) = \frac{1}{\lambda_m}
$$

*for any m* ≥ 2, *where* λ_{m} = −*q*_{π,π} *with* #*π* = *m, see (9, 10, 11)*.

A proof of Lemma 1 is given in the Appendix.
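The way the pieces fit together can be sketched for the single-merger (Λ-type) case, where a jump from *a* to *n* blocks is one merger of *a* − *n* + 1 blocks. The code below is a minimal sketch, not the authors' C implementation: it assumes the first-step recursion for *g*, the Λ-case recursion of [7] for *p*^{(m)}[*k*, *i*], and the sum over *k* of *k* · *p*^{(m)}[*k*, *i*] · *g*(*m*, *k*) for the expected branch lengths. The callables `jump_prob` and `total_rate` are hypothetical inputs supplied by the user; the Kingman coalescent is used as a check because its expected branch lengths 2/*i* are known.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def kingman_jump_prob(m, n):
    """Block-counting chain of the Kingman coalescent: always m -> m - 1."""
    return Fraction(1) if n == m - 1 else Fraction(0)

def kingman_total_rate(m):
    """Total rate of leaving a state with m blocks."""
    return Fraction(comb(m, 2))

def expected_branch_lengths(m, jump_prob, total_rate):
    """Return [E[B_1], ..., E[B_{m-1}]] for a single-merger coalescent."""

    @lru_cache(maxsize=None)
    def g(a, k):
        # Expected time spent with k blocks when starting from a blocks.
        if a == k:
            return 1 / total_rate(a)
        return sum(jump_prob(a, n) * g(n, k) for n in range(k, a))

    @lru_cache(maxsize=None)
    def p(a, k, i):
        # Probability that a uniformly chosen block, at the time there are
        # k blocks, subtends i of the a initial blocks (conditional on hitting k).
        if a == k:
            return Fraction(1 if i == 1 else 0)
        if g(a, k) == 0:
            return Fraction(0)
        total = Fraction(0)
        for n in range(k, a):
            weight = jump_prob(a, n) * g(n, k) / g(a, k)
            if weight == 0:
                continue
            d = a - n  # the first merger joins d + 1 blocks
            if i > d:  # the chosen block contains the merged block
                total += weight * Fraction(i - d, n) * p(n, k, i - d)
            if i < n:  # the chosen block avoids the merged block
                total += weight * Fraction(n - i, n) * p(n, k, i)
        return total

    return [
        sum(k * p(m, k, i) * g(m, k) for k in range(2, m - i + 2))
        for i in range(1, m)
    ]

if __name__ == "__main__":
    print(expected_branch_lengths(6, kingman_jump_prob, kingman_total_rate))
```

Exact rational arithmetic is used so that the Kingman check is an equality rather than a floating-point comparison. For Ξ-coalescents the recursion for *p* has to be replaced by the one in Theorem 2, since a single jump can then consist of several simultaneous mergers.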

### The expected site-frequency spectrum associated with Ξ-coalescents

Since formula (16) holds for any exchangeable coalescent, it suffices to obtain a recursion for *p*^{(m)}[*k*, *i*] when associated with Ξ-coalescents. The recursion in Thm. 2 below for the quantity *p*^{(m)}[*k*, *i*] is a key ingredient needed to compute the expected site-frequency spectrum. Before we state the result, we review our notation for partitions of positive integers. We need to consider partitions of integers, since in a Ξ-coalescent the current number of active blocks can, in one transition, change from *m* to *n* in many different ways, unless *m* and *n* are small.

A *partition* of *n* ∈ ℕ is a non-increasing sequence of positive integers whose sum is *n*. By *ñ* we denote the set of all partitions of *n*, and by *v* a generic element of *ñ*. By way of example, the partitions of 4 are (4), (3, 1), (2, 2), (2, 1, 1), and (1, 1, 1, 1). If *v* ∈ *ñ*, we write |*v*| = *n*. The *size* of a partition *v* is defined as the length of the sequence, and is denoted by #*v*. Partitions of *n* are further grouped by their size #*v* = *k* ∈ [*n*]. Thus, #*v* = 1 if and only if *v* = (*n*), and #*v* = *n* if and only if *v* = (1,…, 1).

Another way of representing an integer partition *v* is by specifying how often each positive integer *i* ∈ ℕ appears in *v* (see [12] for details). Thus, we will also denote *v* by 〈*α*_{1}, *α*_{2},…〉, where *α*_{i} denotes the number of times integer *i* appears in the given partition *v*. A partition *μ* = 〈*β*_{1},…〉 is a *sub-partition* of *v* = 〈*α*_{1},…〉, denoted *μ* ⊂ *v*, if and only if *β*_{i} ≤ *α*_{i} for all *i*. For a set partition *π* ∈ *𝒫*_{n}, we define the *integer partition associated with π* as the partition of *n* obtained by listing the block sizes of *π* in decreasing order. A more detailed discussion of partitions of integers can be found e.g. in [38].

The role of integer partitions in association with Ξ-coalescents should now be clear. We can enumerate all the possible ways the block-counting process can jump from *m* to *n* active blocks by specifying the partitions of *m* of size *n* (see e.g. [12] for details). The elements of the sequence specify the merger sizes, with the obvious exclusion of mergers of size 1. By way of example, the integer partition (3, 2, 1) of 6 specifies a simultaneous merger of 3 blocks and 2 blocks, while one block remains unchanged, when we have 6 active blocks. By (11), any such transition happens at rate λ_{6,(3,2)}.
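The enumeration described here can be sketched as follows: a jump from *m* to *n* active blocks corresponds to a partition of *m* into exactly *n* parts, and the parts of size at least 2 are the merger sizes. The generator below is a plain recursive enumeration written for this example (the names `partitions_with_size` and `merger_sizes` are ours); it is not tuned for the sample sizes used in the numerical results.

```python
def partitions_with_size(m, n, largest=None):
    """Yield the partitions of m into exactly n parts as non-increasing tuples."""
    if largest is None:
        largest = m
    if n == 0:
        if m == 0:
            yield ()
        return
    # The first part can be at most m - (n - 1), and must be large enough
    # that n parts of at most this size can still sum to m.
    for first in range(min(largest, m - n + 1), 0, -1):
        if first * n < m:
            break
        for rest in partitions_with_size(m - first, n - 1, first):
            yield (first,) + rest

def merger_sizes(partition):
    """Drop the parts equal to one: they are blocks unaffected by the merger."""
    return tuple(part for part in partition if part >= 2)

if __name__ == "__main__":
    for v in partitions_with_size(6, 3):
        print(v, "->", merger_sizes(v))
```

For 6 active blocks jumping to 3, this lists (4, 1, 1), (3, 2, 1) and (2, 2, 2), that is, a single 4-merger, a simultaneous 3-merger and 2-merger, and three simultaneous 2-mergers.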

More generally, given integer partition with *v* = 〈 *α*_{1}, *α*_{2},…〉, put r := *n* − *α*_{1} for the number of elements of the sequence that are larger than 1, so that
Then *v*_{1} ≥ *v*_{2} ≥ … ≥ *v*_{r} ≥ *2,* and defining := (*v*_{1}, *v*_{2},…, *v*_{r}), the corresponding transitions in which specifies the sizes of the *r* mergers involved happen at rate , see (11). Moreover, the probability of such a transition is given by [12, Lemma 2.2.2]
where the factor
denotes the number of different ways of merging *m* blocks in *r* groups specified by (see also (7)), and , λ_{m} are given by (8, 9, 10).

Now we state our main theorem, which contains the recursion for *p*^{(m)}[*k*, *i*] needed to compute the expected site-frequency spectrum. The theorem holds for all Xi-*m*-coalescents whose block-counting process visits every possible state with positive probability, which is true of all examples of Xi-coalescents that we consider. The assumption is not very restrictive, since it excludes only pathological cases like the star-shaped coalescent.

**Theorem 2.** [12] *Consider a* Ξ*-m-coalescent with transition rates* (11) *such that the corresponding block-counting process hits every k* ∈ [*m*] *with positive probability. Then, for* 2 ≤ *k* ≤ *m and* 1 ≤ *i* ≤ *m* − *k* + 1, *the quantity p*^{(m)}[*k*, *i*] *satisfies the recursion* (21), *with the boundary cases p*^{(m)}[*m*, *i*] = 𝟙_{(i = 1)}.

A proof is provided in the Appendix.

### Specific examples of Xi-coalescents

Out of the rich class of Xi-coalescents, several special cases have been identified either for biological / modeling relevance or mathematical tractability. The example most relevant for us is concerned with diploidy.

Haploid population models are probably the most common models in mathematical population genetics. Diploidy, and other forms of polyploidy, are, however, widely found in nature. Atlantic cod is diploid, and oysters show both tetraploidy and triploidy [cf. e.g. 28]. In polyploid models which admit skewed offspring distributions, one should observe up to *M* ≥ 2 simultaneous mergers, where *M* is some fixed number which reflects the level of polyploidy.

Indeed, a mathematical description of a Xi-coalescent which admits up to *M* simultaneous mergers is as follows. Take a finite measure Λ on [0, 1] (which would normally describe a Λ-coalescent). For convenience, let *F* := Λ/Λ([0, 1]) be the corresponding normalized probability measure. Then, with *M* ≥ 2, define the measure Ξ on the simplex Δ as in (22).
The interpretation is this: If the normalized Lambda-measure *F* produces a multiple merger event, in which individual active ancestral lineages (blocks of the current partition) take part with probability *x* ∈ (0, 1], then the participating lineages are randomly grouped into *M* simultaneous mergers (each with probability 1/*M*). Observe that if *F*(*dx*) = *δ*_{0}(*x*)*dx*, then (22) corresponds to a Kingman-coalescent with time scaled by a constant factor.

For a given merger of the Ξ-coalescent into *r* ∈ [*M*] groups of sizes (*k*_{1},…, *k*_{r}), with *k*_{1} ≥ … ≥ *k*_{r} ≥ 2, when *n* active ancestral lineages are present, with *n* − (*k*_{1} + … + *k*_{r}) lineages unaffected by the given merger, the transition rates are given by (23).
The rates (23) depend on the choice of the probability measure *F* determined by the underlying population model. A proof of (23) is in the Appendix.
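The verbal interpretation of the *M*-fold construction can be mimicked by a small simulation: each of the *n* active lineages takes part in the event with probability *x*, and each participating lineage is assigned independently and uniformly to one of *M* groups. This is only a sketch of that interpretation, not an implementation of the rates (23); in particular it does not condition on at least one merger taking place, and the function name `sample_simultaneous_merger` is ours.

```python
import random
from collections import Counter

def sample_simultaneous_merger(n, x, M, rng):
    """Sample one event among n lineages.

    Returns (sizes, unaffected): the sizes of the groups in which at least
    two lineages merge (largest first), and the number of lineages that
    do not merge with any other lineage."""
    group_counts = Counter()
    for _ in range(n):
        if rng.random() < x:                    # lineage takes part in the event
            group_counts[rng.randrange(M)] += 1  # uniform choice among M groups
    # A lineage that is alone in its group does not merge with anything.
    sizes = sorted((c for c in group_counts.values() if c >= 2), reverse=True)
    unaffected = n - sum(sizes)
    return sizes, unaffected

if __name__ == "__main__":
    rng = random.Random(2024)
    for _ in range(5):
        print(sample_simultaneous_merger(20, 0.6, 4, rng))
```

With *M* = 4 the output typically shows several groups of comparable size rather than one large merger, which is the qualitative difference from the corresponding Lambda-coalescent.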

Xi-coalescents which admit at most *M* = 4 simultaneous mergers arise from diploid Cannings population models with skewed offspring distribution as shown in [30, 6], and are thus relevant for population genetics. In fact, if a haploid model (for example for mitochondrial DNA) is governed by a Λ-coalescent, the corresponding diploid model (concerning the core genome) might naturally lead to Xi-coalescents. Indeed, [6] derive a 4-fold Xi-coalescent from a diploid model, in which exactly one pair of diploid parents contribute diploid offspring in each reproduction event. Hence, since 4 parental chromosomes are involved in each event, one can observe up to 4 simultaneous mergers. This was also observed by [30] in association with a diploid population model, but under a more general reproduction law than considered by [6]. Xi-coalescents are also classified by [11] for a very general diploid exchangeable Cannings model in which arbitrary pairs of diploid parents contribute offspring in each reproduction event. This generalises the model by [30], in which each individual forms at most one parental pair in each generation. For a detailed classification of coalescent limits, see [30, 33, 29, 11].

In truly diploid models, as considered by [30, 6], selfing is excluded, which leads to a ‘separation of timescales’ phenomenon in the ancestral process, in which blocks which reside in the same diploid individual instantaneously ‘disperse’; thus the configuration of blocks in diploid individuals becomes irrelevant in the ancestral process (see Cor. 4.3 in [30]).

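A natural candidate for *F* may be the beta distribution with parameters *ϑ* > 0 and *γ* > 0 (cf. e.g. [6]), with density

$$
f(x) = \frac{\Gamma(\vartheta + \gamma)}{\Gamma(\vartheta)\,\Gamma(\gamma)} \, x^{\vartheta - 1} (1 - x)^{\gamma - 1}, \qquad 0 < x < 1. \tag{24}
$$

In this case, the rate in (23) can be written explicitly in terms of *ϑ* and *γ*.
A different choice is based on a model of Eldon and Wakeley [19], where

$$
F(dx) = \delta_{\Psi}(x)\,dx. \tag{25}
$$

In this case, the rates in (23) reduce to an explicit function of *Ψ*.
In (25) the parameter *Ψ* has a clear biological interpretation as the fraction of the diploid population replaced by the offspring of the reproducing parental pair in one generation. The interpretation of the parameters in (24) is perhaps less clear.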

A multi-locus ancestral recombination graph in which simultaneous mergers are admitted is obtained by [6], borrowing the framework of [30]. There, one can think of the reproduction model as a two-atom Lambda-measure, with one atom at zero and another at some point *Ψ* ∈ (0, 1). If the atom at *Ψ* has mass of order at most *N*^{−2}, the limit process admits simultaneous mergers. The order *N*^{−2} represents the order of the expected time which two gene copies need to coalesce when only 1 diploid offspring is produced in each reproduction event. In [6], complete dispersion of chromosomes also occurs, and the configuration of blocks among diploid individuals becomes irrelevant in the limit process.

Xi-coalescents can also be obtained from a population model where the population size varies substantially due to recurrent bottlenecks. This has been introduced and discussed in [8], who obtain a randomly time-changed Kingman coalescent, which thus yields a Xi-coalescent.

Durrett and Schweinsberg [15, 16] show that a Xi-coalescent gives a good approximation [16, cf. Prop. 3.1] to the genealogy of a locus subject to recurrent beneficial mutations.

These examples suggest that Xi-coalescents form an important class of mathematical objects with which to study genetic diversity.

### Numerical results

The result (21) in Thm. 2 holds for an arbitrary number (up to ⌊*m*/2⌋) of simultaneous mergers. However, it is quite a challenge to compute *p*^{(m)}[*k*, *i*] for large *m* due to the number of terms in the recursion (21). In our numerical examples, we restrict to Ξ-coalescents which admit simultaneous mergers in up to 4 groups. The number 4 represents the number of parental chromosomes involved in a large reproduction event, i.e. when the number of diploid offspring of a given pair of diploid parents constitutes a significant part of the total population. Such Ξ-coalescents can be shown to arise from *truly* diploid Cannings population models [30, 6], and are thus highly relevant for population genetics.

The recursion for *p*^{(m)}[*k*, *i*] simplifies a bit when one restricts to Ξ-coalescents with at most *M* ≥ 2 simultaneous mergers (*M* = 1 simply gives a Lambda-coalescent), as shown in Cor. 3. Before we state Cor. 3, we briefly review *ordered mergers.* Define 𝕄 := {2, 3,…}, and let *ℳ*(*m*, *n*), defined in (26), denote the set of single and up to *M* simultaneous ordered mergers by which the block-counting process can jump from *m* ≥ 2 to *n* ∈ [*m* − 1] blocks in *r* ∈ [*M*] mergers. Thus, *ℳ*(*m*, *n*) corresponds to the set of all integer partitions *v* of *m* of size *n* such that, if *v* = 〈*γ*_{1}, *γ*_{2},…〉, we have ∑_{j≥2} *γ*_{j} = *n* − *γ*_{1} ∈ [*M*]. Indeed, for 1 ≤ *n* ≤ *m*, we have a bijection which maps such a partition *v* = (*v*_{1}, *v*_{2},…) to the ordered merger (*v*_{1},…, *v*_{r_v}), with *r*_{v} := max{*j* : *v*_{j} ≥ 2} = *n* − *γ*_{1}, for *v* = 〈*γ*_{1}, *γ*_{2},…〉. Obviously, the inverse bijection is obtained by appending *γ*_{1} = *m* − (*v*_{1} + … + *v*_{r_v}) elements equal to one.
Thus informally, ordered mergers are just integer partitions where all elements equal to one are omitted.
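Ordered mergers can also be generated directly, without first listing integer partitions and then discarding the ones. The sketch below uses the fact that a merger of size *k* removes *k* − 1 blocks, so the merger sizes (*k*_{1},…, *k*_{r}) taking *m* blocks to *n* blocks satisfy (*k*_{1} − 1) + … + (*k*_{r} − 1) = *m* − *n*, with *r* at most *M* and at most *n*. The function name `ordered_mergers` is ours, and the enumeration is meant as an illustration rather than as the scheme used in the accompanying C code.

```python
def ordered_mergers(m, n, M):
    """List all non-increasing tuples (k_1, ..., k_r) with k_j >= 2 and r <= M
    that take m active blocks to n active blocks."""
    found = []
    max_groups = min(M, n)  # each merged group leaves one block behind

    def extend(prefix, remaining, largest):
        # remaining = number of blocks that still have to be removed
        if remaining == 0:
            if prefix:
                found.append(tuple(prefix))
            return
        if len(prefix) == max_groups:
            return
        for k in range(min(largest, remaining + 1), 1, -1):
            extend(prefix + [k], remaining - (k - 1), k)

    extend([], m - n, m)
    return found

if __name__ == "__main__":
    for M in (1, 2, 4):
        print(M, ordered_mergers(6, 3, M))
```

With *M* = 1 only the single merger (4) remains, which is the Lambda-coalescent case; with *M* = 4 the list also contains (3, 2) and (2, 2, 2).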

For ease of presentation, we will also write *ℳ*(*m*, *n*) ∋ *μ* = (*α*) ≡ (*α*_{2}, *α*_{3},…), where *α*_{j} denotes the number of occurrences of mergers of size *j* in merger *μ*. By *γ* = (*β*) ⊂ *μ* = (*α*) we denote a *submerger γ* of *μ*, where *β*_{j} ≤ *α*_{j} for all *j*, *including* the case *β*_{j} = 0. Finally, the *size* of *μ* is just the length of the sequence, i.e. #*μ* = *r*. Note that if *μ* is the ordered merger corresponding to some integer partition *v* = 〈*γ*_{1}, *γ*_{2},…〉, as above, then #*v* = *n* = #*μ* + *γ*_{1}.

**Corollary 3.** *Consider a 𝒫*_{n}*-valued* Ξ-*coalescent with transition rates* (11) *such that at most M* ≥ 2 *simultaneous mergers are possible. Then p*^{(m)}[*k*, *i*] *satisfies the recursion* (27), *with the boundary cases p*^{(m)}[*m*, *i*] = 𝟙_{(i = 1)}.

Informally, the first sum in recursion (27) is over the number of blocks the block-counting process can jump to, given that it starts in *m*, and conditioned on it hitting *k*. The second sum is over all (up to *M* simultaneous) mergers (*μ*) in which one can jump from *m* to *n* blocks, and the last sum is over all the ways mergers involving the *i* leaves can be nested within each given merger *μ.*

Corollary 3 is simply another way of representing *p*^{(m)}[*k*, *i*] (21) (see Thm. 2) in terms of ordered mergers. The switch in focus to ordered mergers from integer partitions obviates the need to keep track of all the 1s. Also, since we restrict to Ξ-coalescents which admit at most *M* simultaneous mergers, the required mergers can be generated quite efficiently.

The computation of *p*^{(m)}[*k*, *i*] can be checked by noting that for each fixed *m* ≥ 2, and with 2 ≤ *k* ≤ *m*, the values *p*^{(m)}[*k*, *i*] sum to one over *i* ∈ [*m* − *k* + 1]. One can also compute the expected total length of external branches with a simple recursion as follows. Consider the expected length of *i* external branches, given *m* ≥ 2 active blocks (*i* of which are singleton blocks). Define, for all *m* ∊ 𝕄,

$$
C_m := \bigcup_{n=1}^{m-1} \mathcal{M}(m,n) \tag{28}
$$

as the set of all ordered (up to *M* simultaneous) mergers given *m* active blocks, where *ℳ*(*m*, *n*) was defined in (26).

**Lemma 4.** *Consider a 𝒫*_{n}*-valued* Ξ-*coalescent with transition rates* (11) *such that at most M simultaneous mergers are possible. With λ*_{m} *given by* (9, 10), *let C*_{m} *be given by* (28), *and consider the probability of each merger in C*_{m} *when the number of active blocks is m. Then the expected length of external branches satisfies the recursion* (29), *with the corresponding boundary condition.*

#### Expected branch lengths 𝔼^{(m, π)}[*B*_{i}]

Under the infinitely many sites mutation model, the expected number of mutations observed in *i* copies is *θ*/2 times the expected branch length 𝔼[*B*_{i}], where θ > 0 is the appropriately scaled mutation rate. Hence, it suffices to consider 𝔼[*B*_{i}] in a comparison of the site-frequency spectrum associated with different coalescent models. In Figure 2, we consider the 4-fold Ξ-coalescent, when the measure *F* in (23) is associated with the beta-density, with *α* ∊ [1, 2) [36],

$$
F(dx) = \frac{1}{\Gamma(2-\alpha)\,\Gamma(\alpha)} \, x^{1-\alpha} (1-x)^{\alpha-1} \, dx. \tag{30}
$$

The range of *α* is the interval [1, 2), since for *α* ∊ [1, 2), one obtains a Lambda-Beta coalescent from a supercritical branching population model [36]. We do not have a microscopic diploid population model which explicitly yields a Xi-Beta coalescent. However, the results of [30] indicate that such a process should exist. Also, the Lambda-Beta coalescent is one of the most studied examples of Lambda-coalescents. In contrast, the existence of the Xi-Dirac coalescent was proved by [6]. The expected branch lengths associated with a Lambda-coalescent with *F*-measure (30), as well as the Kingman coalescent, are also shown in Figure 2 for comparison. We consider the normalised expected spectrum

$$
\frac{\mathbb{E}^{(\Pi)}\left[ B_i^{(n)} \right]}{\mathbb{E}^{(\Pi)}\left[ B^{(n)} \right]}, \qquad i \in [n-1], \tag{31}
$$

in which 𝔼^{(Π)}[*B*^{(n)}] is the expected total size of the genealogy. Define the normalised branch lengths *R*_{i} := *B*_{i}^{(n)}/*B*^{(n)}. Consider further the (random) total number of segregating sites, and define the normalised spectrum as the site-frequency spectrum divided by this total (and as zero if there are no segregating sites). The reasons for our preference for the normalised expected spectrum (31) over the unnormalised expected spectrum are the following. The quantity (31), which is an approximation of the expected normalised spectrum, is, clearly, *not* a function of the mutation rate θ. The expected normalised spectrum is well approximated by the expected normalised branch lengths 𝔼[*R*_{i}], and is also quite robust to changes in mutation rate, if the mutation rate is not very small [18]. Since (31) is a decent approximation of 𝔼[*R*_{i}] (results not shown), (31) is therefore a good approximation of the expected normalised spectrum. One can therefore use (31) to estimate coalescent parameters, for example by minimising sum-of-squares, without the need to jointly estimate the mutation rate.
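As a worked illustration of the normalised expected spectrum, consider the Kingman coalescent, for which the expected length of branches subtending *i* leaves is 2/*i* [20]. The symbol φ below is ours, introduced only for this example.

```latex
% Normalised expected spectrum under the Kingman coalescent, sample size n.
% Uses E[B_i] = 2/i, so the expected total length is 2 H_{n-1},
% where H_{n-1} = 1 + 1/2 + ... + 1/(n-1) is a harmonic number.
\[
  \varphi_i^{(n)}
  = \frac{\mathbb{E}\big[B_i^{(n)}\big]}{\mathbb{E}\big[B^{(n)}\big]}
  = \frac{2/i}{\sum_{j=1}^{n-1} 2/j}
  = \frac{1}{i \, H_{n-1}},
  \qquad i \in [n-1].
\]
% Example: n = 5 gives H_4 = 25/12, hence
\[
  \big(\varphi_1^{(5)}, \varphi_2^{(5)}, \varphi_3^{(5)}, \varphi_4^{(5)}\big)
  = \Big(\tfrac{12}{25}, \tfrac{6}{25}, \tfrac{4}{25}, \tfrac{3}{25}\Big)
  = (0.48,\ 0.24,\ 0.16,\ 0.12).
\]
```

The mutation rate has cancelled, and singletons account for 48% of the expected total length at *n* = 5; multiple-merger coalescents shift this proportion upwards.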

As Figure 2 shows, the corresponding Lambda-Beta and Xi-Beta coalescents predict different patterns of the site-frequency spectrum, at least for the parameter values shown. Both processes can predict a significant excess of singletons relative to the Kingman coalescent.

Similar conclusions can be reached from Figure 3, in which the *F* measure is associated with the Dirac-measure *F*(*dx*) = *δ*_{Ψ}(*x*)*dx* for some *Ψ* ∊ [0, 1]. The Xi-Dirac-coalescent, for *Ψ* = 0.95, displays a multimodal graph of the normalised expected spectrum, but the relative ‘height’ of the modes is small (< 1% of the total expected length). As well as an excess of singleton polymorphisms, [2] observe multi-modality, or small ‘bumps’, in the site-frequency spectrum associated with the *Ckma* gene in Atlantic cod.

#### Coalescent parameter estimates

As Figures 2 and 3 indicate, one should be able to distinguish between corresponding Xi- and Lambda-coalescents from an observed site-frequency spectrum. In Table 1 we record parameter estimates obtained when the ‘data’ are branch lengths *B*_{i} simulated under either a 4-fold Xi-coalescent (23) or a Lambda-coalescent (5) with parameter values as shown. We consider an extensive comparison of different examples of the large class of Xi-coalescents to be beyond the scope of the current work. If 0 ≤ *ϑ* < 1, Ξ(*ϑ*) denotes a 4-fold Dirac-Xi coalescent, with *F*-measure *F*(*dx*) = *δ*_{Ψ}(*x*)*dx* in (23) with *Ψ* = *ϑ*, and Λ(*ϑ*) a Dirac-Lambda coalescent. If *ϑ* ∊ [1, 2), Ξ(*ϑ*) denotes a 4-fold Beta-Xi coalescent, and Λ(*ϑ*) a Beta-Lambda coalescent. Estimates of *ϑ* attributed to coalescent process Π_{2} are obtained by minimising the *ℓ*_{2} distance (32) between the normalised lengths drawn from coalescent process Π_{1} and the normalised expected spectrum (31) under Π_{2}.
If the *B*_{i} are drawn from a Xi-Dirac coalescent (Π_{1}) we estimate *ϑ* associated with a Lambda-Dirac coalescent (Π_{2}), and vice versa. If the *B*_{i} are drawn from a Xi-Beta coalescent (Π_{1}), we estimate *ϑ* associated with a Beta-Lambda coalescent (Π_{2}), and vice versa.
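The estimation step amounts to a one-dimensional search: compute the normalised expected spectrum for each candidate parameter value and keep the value that is closest to the observed normalised spectrum in *ℓ*_{2} distance. The sketch below shows this with a grid search. The family `toy_expected_spectrum`, with weights proportional to *i*^{−a}, is a hypothetical stand-in used only to make the example runnable: for *a* = 1 it coincides with the Kingman normalised expected spectrum, but for other values it is not a coalescent model. In an actual application one would plug in the normalised expected spectrum obtained from the recursions.

```python
def normalise(values):
    """Divide by the total so that the entries sum to one."""
    total = sum(values)
    return [v / total for v in values]

def toy_expected_spectrum(a, n):
    """Hypothetical stand-in family (NOT a coalescent model): weights i^(-a)."""
    return normalise([i ** (-a) for i in range(1, n)])

def l2_distance(observed, expected):
    """Euclidean distance between two spectra of equal length."""
    return sum((o - e) ** 2 for o, e in zip(observed, expected)) ** 0.5

def fit_by_grid(observed, expected_fn, grid):
    """Return the grid value whose expected spectrum is closest to the data."""
    return min(grid, key=lambda par: l2_distance(observed, expected_fn(par)))

if __name__ == "__main__":
    n = 10
    grid = [1.0 + 0.1 * j for j in range(11)]
    observed = normalise([40, 12, 7, 5, 3, 3, 2, 2, 1])  # made-up counts
    print(fit_by_grid(observed, lambda a: toy_expected_spectrum(a, n), grid))
```

Since the distance is not weighted by the variance of each class, classes with large expected values, in practice the singletons, dominate the fit.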

A more suitable distance statistic could be the variance-calibrated statistic (33), where 𝕍^{(n, Π2)}[*R*_{i}] denotes the variance of *R*_{i} computed with respect to Π_{2}. However, we can represent neither 𝕍^{(n, Π2)}[*R*_{i}] nor 𝔼^{(n, Π2)}[*R*_{i}] as simple functions of *ϑ* or *n*. In actual applications, one would replace *R*_{i} in (33) with the normalised site-frequency spectrum.

As Table 1 shows, a Lambda-Dirac coalescent underestimates *Ψ* when the data are generated by a Xi-Dirac coalescent, and Lambda-Beta overestimates *α* when the data are generated by a Xi-Beta coalescent. When we switch the generation of data from Xi- to Lambda-coalescents, we reach the opposite conclusions. A Xi-Dirac coalescent overestimates the parameter (*Ψ*) when the data are generated by a Lambda-Dirac coalescent, and the Xi-Beta coalescent underestimates *α* when the data are generated by a Lambda-Beta coalescent.

The difference between the corresponding Xi- and Lambda-coalescents is further illustrated in Figure 4, in which the distance between the normalised expected spectra (31), with Π as shown, is quantified by the *ℓ*_{2} norm (32). The graphs in Figure 4 show clearly that even when the parameters associated with the corresponding Xi- and Lambda-coalescents are the same, the difference in the normalised expected spectra can be substantial (except of course when the Lambda-coalescent is the Kingman coalescent, which happens when *α* = 2 or *Ψ* = 0).

The difference in estimates between corresponding Lambda- and Xi-coalescents may be understood from the way the Xi-coalescent process is constructed. Indeed, given *k* blocks drawn from the associated Lambda-coalescent, we see one *k*-merger with probability 4^{1−k}, which quickly becomes small as *k* increases. A much more likely outcome is for the blocks to become (evenly) distributed into four groups. The effect on the genealogy of drawing a large number *k* is thus reduced in a Xi-coalescent relative to the corresponding Lambda-coalescent.
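The probability 4^{1−k} quoted above can be verified by direct enumeration, under the assumption (made explicit here) that each of the *k* participating blocks is assigned independently and uniformly to one of four groups. The function name `prob_single_merger` is ours.

```python
from fractions import Fraction
from itertools import product

def prob_single_merger(k, M=4):
    """Probability that k blocks, each assigned uniformly and independently
    to one of M groups, all end up in the same group."""
    same_group = sum(
        1
        for assignment in product(range(M), repeat=k)
        if len(set(assignment)) == 1
    )
    return Fraction(same_group, M ** k)

if __name__ == "__main__":
    for k in range(2, 8):
        print(k, prob_single_merger(k), float(prob_single_merger(k)))
```

Already for *k* = 6 the probability of a single 6-merger is below one in a thousand, so a large draw from the driving Lambda-measure is almost always split into several smaller simultaneous mergers.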

C code written for the computations is available at http://page.math.tu-berlin.de/~eldon/programs.html.

### Application to Atlantic cod data

Atlantic cod is a diploid highly fecund marine organism, whose reproduction is potentially characterised by a skewed offspring distribution [1, 2]. Since Xi-coalescents can arise from diploid population models which admit skewed offspring distributions [30, 6], one should analyse population genetic data of nuclear loci in diploid highly fecund populations with Xi-coalescent models. Indeed, [2] obtain population genetic data at three nuclear loci from Atlantic cod. We use the *ℓ*_{2}-norm (32) to fit (see Table 2) the 4-fold Xi-Beta and Xi-Dirac coalescents to the unfolded site-frequency spectrum (USFS) of the *Ckma*, *Myg*, and *HbA2* genes obtained by [2].

The USFS of the nuclear genes *Ckma*, *Myg*, and *HbA2* are all characterised by a high relative amount of singletons. Thus, singletons have the most weight in our estimate, in particular since we do not calibrate the difference between observed and expected values by the variance. The estimates of *α* associated with the Xi-Beta coalescent are therefore all at 1.0, which we attribute to the excessive amount of singletons. The excessive amount of singletons also increases the estimate of *Ψ* associated with the Xi-Dirac coalescent. In particular, our Xi-based estimates of *Ψ* are higher than the Lambda-based estimates obtained by [2] (see Table 2 in [2]). Possibly the Xi-coalescent assigns less mass to the external branches than the corresponding Lambda-coalescent for a given parameter value, but the exact shift in mass may vary between different Xi-coalescents. The Xi-Dirac coalescent is able to predict the excessive amount of singletons, the Xi-Beta coalescent much less so (Figures 5–6).

Our estimates of *Ψ* for the combined data on *Ckma* are quite a bit smaller than for the partitioned data (into A and B alleles), and for the supposedly neutral loci *Myg* and *HbA2.* [2] also observe a similar pattern. Of the three loci, the Xi-coalescents give best fit to the *Ckma* data. In view of the modes in the right tail of the USFS for *Ckma*, [2] conclude that *Ckma* is under strong selection. Even though the Xi-Dirac coalescent does show a multimodal spectrum (Figure 3), the modes are small relative to the expected length of external branches, and do not quite explain the modes observed for *Ckma* (Figure 6).

## Discussion

We prove recursions for the expected site-frequency spectrum associated with Xi-coalescents which admit simultaneous multiple mergers of active ancestral lineages (blocks of the current partition). We give a class of Xi-coalescents which is ‘driven’ by a finite measure (a Lambda-measure) on the unit interval, which determines the law of the total number of active lineages which may merge each time. This class of Xi-coalescents can be applied to populations of arbitrary ploidy. We apply the recursions to compare estimates of coalescent parameters between Lambda- and Xi-coalescents. Finally, we estimate coalescent parameters associated with Xi-coalescents for Atlantic cod where the data are the unfolded site-frequency spectrum on nuclear loci.

The framework we develop will allow us to extend the recursions to more complicated settings, for example to populations structured into discrete subpopulations [17, 27], or possibly to populations with some form of continuous distribution in space [3]. However, extending the recursions to such structured settings is very likely to increase their computational complexity.

Computing the full expected site-frequency spectrum for a 4-fold Xi-coalescent with sample size *n* ≥ 100 unfortunately takes some time (on the order of hours). We do not provide a detailed analysis, but the time can be shortened by considering the lumped site-frequency spectrum, where one collects all classes of size larger than some number *m* ≪ *n* into one class. How such lumping would affect the inference remains to be seen. However, one can analyse small samples (*n* ≤ 100) with our recursions, as our example shows. Exact likelihood methods in the spirit of [5] are yet to be developed for Xi-coalescents, and will likely be computationally intensive.
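The lumping operation itself is simple to state in code. The following Python sketch (the function name `lump_sfs` is hypothetical) keeps classes 1 to *m* and collects all classes of size larger than *m* into one final class. It only post-processes a given spectrum; it does not show how the recursions themselves would be shortened.

```python
from typing import List, Sequence


def lump_sfs(sfs: Sequence[float], m: int) -> List[float]:
    """Lump a site-frequency spectrum.

    sfs[i - 1] holds the value for mutations of size i, for i = 1, ..., n - 1.
    The result has m + 1 entries: sizes 1..m unchanged, followed by the sum
    over all sizes larger than m.
    """
    if not 1 <= m < len(sfs):
        raise ValueError("m must satisfy 1 <= m < number of classes")
    return list(sfs[:m]) + [sum(sfs[m:])]


if __name__ == "__main__":
    spectrum = [5, 3, 2, 1, 1]  # illustrative counts for sample size n = 6
    print(lump_sfs(spectrum, 2))
```

The total of the lumped spectrum equals the total of the original one, so normalised spectra stay normalised after lumping.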

Our simple method of minimising the deviation between observed and expected values of the unfolded site-frequency spectrum should not be applied in a formal test procedure, since we do not scale the deviations by the corresponding variance. We do not give recursions for the (co)-variances of the site-frequency spectrum associated with Xi-coalescents, as these will very likely be too computationally intensive to be useful. This has already been shown to be the case for the much simpler Lambda-coalescents [7]. Our recursions provide a way to distinguish between Xi-coalescents and other demographic effects such as population growth, with the use of approximate likelihoods [18].

We obtain estimates of coalescent parameters associated with Xi-coalescents for data on nuclear loci of Atlantic cod [2]. Our estimates differ from previous estimates obtained with the use of Lambda-coalescents [2], due to the simultaneous merger characteristic of Xi-coalescents. Regardless of exact estimates, our results, coupled with those of [2], suggest that multiple merger coalescents might be the proper null model with which to analyse population genetic data on Atlantic cod, as well as other populations which may exhibit HFSOD. Our specific examples of Xi-coalescents may well represent an oversimplification of the actual mating schemes. However, one could use the framework of, say, [11], to develop new examples of Xi-coalescents for specific mating schemes, for example when one successful female produces offspring with many males.

Rigorous inference methods to distinguish the effects of high fecundity coupled with skewed offspring distribution (HFSOD) from selection are yet to be developed. The common notion is that selective sweeps lead to an excess of singletons. The main genetic signature of HFSOD is also an excess of singletons. The unfolded site-frequency spectrum of the *Ckma* gene [2] is trimodal, with an excess of singletons, and small modes of mutations of larger size. These smaller modes are not captured by the examples of Xi-coalescents that we apply. Durrett and Schweinsberg [37, 16] use a stick-breaking construction to obtain a good approximation (*≤ O*(1/(log(N))^{2})) to a selective sweep where *N* denotes the population size. [2] conclude that *Ckma* is under a form of balancing selection. Our examples of Xi-coalescents also give best fit to the *Ckma* gene of the three loci studied by [2], although our method does not constitute a formal test. We refer to [2] for more detailed discussion of the variation observed at *Ckma*, and the supposedly neutral nuclear loci *Myg* and *HbA2.* The congruence between our Xi-coalescent examples and *Ckma*, and between Xi-coalescents and selective sweeps studied by [37, 16], and the different site-frequency spectra predicted by Lambda- and Xi-coalescents, lead us to conclude that Xi-coalescents form an important class of mathematical objects with which to study genetic diversity.

## Appendix

### Proof of recursion (21) for *p*^{(n)}[*k*, *b*] (Thm. 2)

The requirement *p*^{(n)}[*n*, 1] = 1 is obvious (therefore *p*^{(n)}[*n, k*] = 0 for all *k* > 1). From now on, we consider the case 1 < *k* < *n*. By *P*_{A} we denote the set of all partitions of the set *A*; and [*n*] := {1, 2,…, *n*} for all *n* ∊ ℕ := {1, 2,…}. Let Π = {Π_{t}, *t* ≥ 0} be a *P*_{ℕ}-valued exchangeable coalescent, defined on the probability space we denote the projection of Π onto *𝒫*_{[n]}, which will be associated with coalescent processes started from *n* ≥ 2 leaves. Define
ie. the first time the block counting process associated with Π^{(n)} hits state *k*, with *τ*_{n}^{(n)} = 0. Let
Recall that we assume that for each *k* ∊ [*n*]. Thus we can define the conditional law On the conditional probability space we sample uniformly at random a block *π*_{0} from the blocks of , ie. for Then we can write

We define the first jump time *τ* of the block-counting process

Now consider some *k ≤ m* < *n*, and partition ie. an integer partition *v* of *n* into *m* elements. Define
(recall that *π*^{↓} denotes the integer partition associated to *π* ∊ *P*_{[n]} obtained by listing the block sizes of *π* in decreasing order). Then it is clear that we have the decomposition
We define the conditional law .

For each we define the set of blocks
The set of blocks *π*^{(v,0)} (*w*) contains all blocks of the partition ) that will eventually merge into the block *π*_{0}(*w*).

For any integer subpartition *ϱ* = 〈*β*_{1}, *β*_{2}, …〉 ⊂ ν = 〈*α*_{1}, *α*_{2}, …〉, we need to be able to compute the probability on Indeed,
The event {(*π*^{(v,0)})^{↓} = *ϱ*} states that the sizes of the blocks of *π*^{(v,0)} are given by the integer subpartition *ϱ*. Equation (38) follows from the exchangeability of Π, and the form of the probability mass function of the multivariate hypergeometric distribution, which applies to the event of sampling *β*_{i} blocks from *α*_{i} for each *i*, for a total of #*ϱ* out of #*v*. The condition #*ϱ* < #*v* − *k* + 1 is required since we condition on the block counting process to hit *k* blocks.
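As a reminder of the distribution invoked here, the Python sketch below evaluates the standard multivariate hypergeometric probability mass function, Π_i C(*α*_{i}, *β*_{i}) / C(Σ*α*_{i}, Σ*β*_{i}). It is not a reconstruction of Equation (38), which may carry additional conditioning; reading *α*_{i} as the number of available blocks in category *i* and *β*_{i} as the number sampled from it is an assumption of this sketch, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb, prod
from typing import Sequence


def multivariate_hypergeometric_pmf(
    alpha: Sequence[int], beta: Sequence[int]
) -> Fraction:
    """Probability of drawing beta[i] items from category i, for every i,
    when sum(beta) items are drawn without replacement from a population
    holding alpha[i] items of category i."""
    if len(alpha) != len(beta):
        raise ValueError("alpha and beta must have the same length")
    if any(b < 0 or b > a for a, b in zip(alpha, beta)):
        return Fraction(0)  # impossible draw
    numerator = prod(comb(a, b) for a, b in zip(alpha, beta))
    return Fraction(numerator, comb(sum(alpha), sum(beta)))


if __name__ == "__main__":
    # Two categories with 2 and 1 items; draw one item from each.
    print(multivariate_hypergeometric_pmf([2, 1], [1, 1]))
```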

Given (38) we can compute . Indeed, the decomposition
(where the union is disjoint) together with (38) gives
Given Equation (39) one can now compute *p*^{(n)}[*k*,*b*]: By the decomposition (36), we obtain
We apply Lemma 5 to and Equation (39) to to obtain

### Proof of computation of (Lemma 5)

Let (34) and *τ* be as defined in the proof for Thm. 2. Recall further that see Equation (20).

By we denote the block-counting process associated with Π^{(n)}, starting from *n*. Let *g(n, k)* denote the expected length of time that spends in state *k ≤ n*,
Clearly, where *#π* = *k*, see (11).

**Lemma 5.** *For* k *≤ m < n and* *assuming* *we have*
*where g*(.,.) *is defined in* (41).

**Proof.** Assuming one obtains
where we use *#v* = *m* and apply the strong Markov property of the process (Π^{(n)})^{↓} at time *τ* to obtain the last equality.

Now we consider and obtain where we use the strong Markov property of the block-counting process in the last step. Now the statement follows from (43) and (44).

### Proof of recursion (18) for *g*(*n,k*) (Lemma 1)

**Proof.** For *m* ≥ 2, let λ_{m} := −*q*_{π,π} with *q*_{π,π} given by (11), and *# π* = *m*. Again write for the block-counting process starting from *m*.

Let denote the first jump time of the block-counting process, and let , *k ≤ m* − 1. For *n* = *m*, one obtains
and (19) is established.

For *n < m*, we decompose according to the value of the block-counting process after the first jump and obtain
where we use the strong Markov property of *Y* in the second-to-last equality. Thus (18) is established.
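The first-step argument can be illustrated numerically. The Python sketch below assumes a generic decreasing block-counting chain given by a total jump rate `lam(m)` out of state *m* and jump probabilities `jump(m, j)` from *m* to *j* < *m*; these names are hypothetical, and the code does not reproduce the Xi-coalescent rates (11) or the exact indexing of (18) and (19). The Kingman coalescent, for which the expected time spent with *k* lineages is 2/(*k*(*k* − 1)), serves as a check.

```python
from fractions import Fraction
from functools import lru_cache
from typing import Callable


def make_g(lam: Callable[[int], Fraction], jump: Callable[[int, int], Fraction]):
    """Return g(n, k): the expected time the chain started from n spends in
    state k, computed by conditioning on the first jump."""

    @lru_cache(maxsize=None)
    def g(n: int, k: int) -> Fraction:
        if k < 2:
            raise ValueError("state 1 is absorbing; use k >= 2")
        if k > n:
            return Fraction(0)  # the chain only decreases, so k is never hit
        if k == n:
            return 1 / lam(n)  # mean holding time in the starting state
        # First jump goes from n to some j with k <= j < n; states below k
        # contribute nothing, and the chain restarts from j (Markov property).
        return sum((jump(n, j) * g(j, k) for j in range(k, n)), Fraction(0))

    return g


def kingman_lam(m: int) -> Fraction:
    """Total coalescence rate of the Kingman coalescent with m lineages."""
    return Fraction(m * (m - 1), 2)


def kingman_jump(m: int, j: int) -> Fraction:
    """Kingman coalescent: only binary mergers, so m always jumps to m - 1."""
    return Fraction(1) if j == m - 1 else Fraction(0)


if __name__ == "__main__":
    g = make_g(kingman_lam, kingman_jump)
    print(g(10, 3), g(5, 5))
```

Memoisation keeps the cost polynomial in *n*, since each pair (*n*, *k*) is evaluated once.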

### Proof of Equation (23)

Fix with and let . Then we have to prove that
Let Δ denote the infinite simplex, and recall the function *f* from (6) used to describe the rates (11) of a Xi-coalescent. We define the map We then have that the rate of a -merger is given by
Since supp , we can rewrite the rate in (45) as
Define Then the integrand in (46) is given by
which is zero if *r + l* > M. For *r* + *l* ≤ *M*, one obtains
for any choice of indices *i*_{1},…, *i*_{r+l} which are all different and smaller than *M*. Therefore, with (*a*)_{m} := *a*(*a* − 1) … (*a* − *m* + 1) for *m* ∊ ℕ, (*a*)_{0} := 1 denoting the falling factorial, we have
We note further that

Thus we get and we can represent the rate as This completes our proof. □

## Acknowledgments

JB and BE acknowledge support by Deutsche Forschungsgemeinschaft (DFG) grant BL 1105/3-1 as part of SPP Priority Programme 1590 ‘Probabilistic Structures in Evolution’.