Epistasis, inbreeding depression and the evolution of self-fertilization

Diala Abu Awad; Denis Roze

doi:10.1101/809814

ABSTRACT

Inbreeding depression resulting from partially recessive deleterious alleles is thought to be the main genetic factor preventing self-fertilizing mutants from spreading in outcrossing hermaphroditic populations. However, deleterious alleles may also generate an advantage to selfers in terms of more efficient purging, while the effects of epistasis among those alleles on inbreeding depression and mating system evolution remain little explored. In this paper, we use a general model of selection to disentangle the effects of different forms of epistasis (additive-by-additive, additive-by-dominance and dominance-by-dominance) on inbreeding depression and on the strength of selection for selfing. Models with fixed epistasis across loci, and models of stabilizing selection acting on quantitative traits (generating distributions of epistasis) are considered as special cases. Besides its effects on inbreeding depression, epistasis may increase the purging advantage associated with selfing (when it is negative on average), while the variance in epistasis favors selfing through the generation of linkage disequilibria that increase mean fitness. Approximations for the strengths of these effects are derived, and compared with individual-based simulation results.

INTRODUCTION

Self-fertilization is a widespread mating system found in hermaphroditic plants and animals (e.g., Jarne and Auld, 2006; Igic and Busch, 2013). In Angiosperms, the transition from outcrossing to selfing occurred multiple times, leading to approximately 10–15% of species self-fertilizing at very high rates (Barrett et al., 2014). Two possible benefits of selfing have been proposed to explain such transitions: the possibility for a single individual to generate offspring in the absence of a mating partner or pollinator (“reproductive assurance”, Darwin, 1876; Stebbins, 1957; Porcher and Lande, 2005a; Busch and Delph, 2012), and the “automatic advantage” stemming from the fact that, in a population containing both selfers and outcrossers, selfers tend to transmit more copies of their genome to the next generation if they continue to export pollen, thus retaining the ability to sire outcrossed ovules (Fisher, 1941; Charlesworth, 1980; Stone et al., 2014). The main evolutionary force thought to oppose the spread of selfing is inbreeding depression, the decreased fitness of inbred offspring resulting from the expression of partially recessive deleterious alleles segregating within populations (Charlesworth and Charlesworth, 1987). When selfers export as much pollen as outcrossers (leading to a 50% transmission advantage for selfing), inbreeding depression must exceed 0.5 to compensate for the automatic advantage of selfing (Lande and Schemske, 1985). However, observations from natural populations indicate that self-fertilizing individuals do not always export as much pollen as their outcrossing counterparts, as some of their pollen production is used to fertilize their own ovules (see references in Porcher and Lande, 2005a). This phenomenon, known as pollen discounting, decreases the automatic advantage of selfing (Nagylaki, 1976; Charlesworth, 1980), thus reducing the threshold value of inbreeding depression above which outcrossing can be maintained (e.g., Holsinger et al., 1984). It may also lead to evolutionarily stable mixed mating systems (involving both selfing and outcrossing) under some models of discounting such as the mass-action pollination model (Holsinger, 1991; Porcher and Lande, 2005a).

Several models have explored the evolution of mating systems while explicitly representing the genetic architecture of inbreeding depression (e.g., Charlesworth et al., 1990; Uyenoyama and Waller, 1991; Epinat and Lenormand, 2009; Porcher and Lande, 2005b; Gervais et al., 2014), and highlighted the importance of another genetic factor (besides the automatic advantage and inbreeding depression) affecting the evolution of selfing. This third factor stems from the fact that selection against deleterious alleles is more efficient among selfed offspring (due to their increased homozygosity) than among outcrossed offspring, generating positive linkage disequilibria between alleles increasing the selfing rate and the more advantageous alleles at selected loci. Alleles increasing selfing thus tend to be found on better purged genetic backgrounds, which may allow selfing to spread even when inbreeding depression is higher than 0.5 (Charlesworth et al., 1990). This effect becomes more important as the strength of selection against deleterious alleles increases (so that purging occurs more rapidly), as recombination decreases, and as alleles increasing selfing have larger effects, so that linkage disequilibria can be maintained over larger numbers of generations (Charlesworth et al., 1990; Uyenoyama and Waller, 1991; Epinat and Lenormand, 2009). This corresponds to Lande and Schemske’s (1985) verbal prediction that a mutant allele coding for complete selfing may increase in frequency regardless of the amount of inbreeding depression.

Most genetic models on the evolution of selfing assume that deleterious alleles at different loci have multiplicative effects (no epistasis). Charlesworth et al. (1991) considered a deterministic model including synergistic epistasis between deleterious alleles, showing that this form of epistasis tends to flatten the relation between inbreeding depression and the population’s selfing rate, inbreeding depression sometimes increasing at high selfing rates. Concerning the spread of selfing modifier alleles, the results were qualitatively similar to the multiplicative model, except that, for parameter values where full outcrossing is not stable, the evolutionarily stable selfing rate tended to be slightly below 1 under synergistic epistasis (whereas it would have been at exactly 1 in the absence of epistasis). Other models explored the effect of partial selfing on inbreeding depression generated by polygenic quantitative traits under stabilizing selection (Lande and Porcher, 2015; Abu Awad and Roze, 2018). This type of model typically generates distributions of epistatic interactions across loci, including possible compensatory effects between mutations. When effective recombination is sufficiently weak, linkage disequilibria generated by epistasis may greatly reduce inbreeding depression, and even generate outbreeding depression between selfing lineages carrying different combinations of compensatory mutations. However, the evolution of the selfing rate was not considered by these models.

In this paper, we use a general model of epistasis between pairs of selected loci to explore the effects of epistasis on inbreeding depression and on the evolution of selfing. We derive analytical approximations showing that epistatic interactions affect the spread of selfing modifiers through various mechanisms: by affecting inbreeding depression, the purging advantage of selfers and also through linkage disequilibria between selected loci. Although the expressions obtained can become complicated for intermediate selfing rates, we will see that the condition determining whether selfing can spread in a fully outcrossing population often remains relatively simple. Notably, our model allows us to disentangle the effects of additive-by-additive, additive-by-dominance and dominance-by-dominance epistatic interactions on inbreeding depression and selection for selfing — while the models used by Charlesworth et al. (1991), Lande and Porcher (2015) and Abu Awad and Roze (2018) impose certain relations between these quantities. The cases of fixed, synergistic epistasis and of stabilizing selection acting on quantitative traits (Fisher’s geometric model) will be considered as special cases, for which we will also present individual-based simulation results. Overall, our results show that, for a given level of inbreeding depression and average strength of selection against deleterious alleles, epistatic interactions tend to facilitate the spread of selfing, due to the fact that selfing can maintain beneficial combinations of alleles.

METHODS

Life cycle

Our analytical model represents an infinite, hermaphroditic population with discrete generations. A proportion σ of ovules produced by a given individual are self-fertilized, while its remaining ovules are fertilized by pollen sampled from the population pollen pool (Table 1 provides a list of the symbols used throughout the paper). A parameter κ represents the rate of pollen discounting: an individual with selfing rate σ contributes to the pollen pool in proportion 1 − κσ (e.g., Charlesworth, 1980). Therefore, κ equals 0 in the absence of pollen discounting, while κ equals 1 under full discounting (in which case complete selfers do not contribute to the pollen pool). We assume that the selfing rate σ is genetically variable, and coded by ℓ_σ loci with additive effects:

$$\sigma = \sum_{i=1}^{\ell_\sigma} \left( \sigma_i^M + \sigma_i^P \right)$$

where the sum is over all loci affecting the selfing rate, and where σ_i^M and σ_i^P represent the effect of the alleles present respectively on the maternally and paternally inherited genes at locus i (note that the assumption of additivity within and between loci may not always hold, in particular when selfing rates are close to 0 or 1). The model does not make any assumption concerning the number of alleles segregating at loci affecting the selfing rate; however, our analysis will assume that the variance of σ in the population remains small and that linkage disequilibria between loci affecting the selfing rate may be neglected, effectively leading to the same expression for the selection gradient on the selfing rate as in a simpler model considering the spread of a mutant allele changing σ by a small amount. Although we assume that the selfing rate is purely genetically determined, our general results should still hold when σ is also affected by (uncorrelated) environmental effects, after multiplying expressions for the change in the average selfing rate over time by the heritability of σ.
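As a minimal illustration of the life cycle described above, the following Python sketch computes an individual's selfing rate from additive allelic effects and its relative contribution to the pollen pool, 1 − κσ. It also counts the gene copies transmitted by a rare individual with selfing rate σ in an otherwise outcrossing population, under the simplifying assumption (made here for illustration only, not the model analysed in this paper) that selfed offspring suffer a fixed inbreeding depression δ and that purging is ignored. All function names are hypothetical.

```python
def selfing_rate(maternal_effects, paternal_effects):
    """Additive selfing rate: sum over loci of maternal and paternal allelic effects."""
    return sum(maternal_effects) + sum(paternal_effects)


def pollen_contribution(sigma, kappa):
    """Relative contribution to the pollen pool, 1 - kappa * sigma."""
    return 1.0 - kappa * sigma


def gene_transmission(sigma, kappa, delta):
    """Gene copies transmitted by a rare partial selfer, relative to an outcrosser (= 1).

    Illustrative assumption: fixed inbreeding depression delta, no purging.
    """
    # A selfed seed carries two parental gene copies but survives with weight 1 - delta.
    selfed = sigma * (1.0 - delta)
    # An outcrossed ovule carries one parental gene copy (weight 1/2).
    outcrossed_ovules = (1.0 - sigma) / 2.0
    # Exported pollen sires outcrossed ovules of other plants (weight 1/2).
    exported_pollen = pollen_contribution(sigma, kappa) / 2.0
    return selfed + outcrossed_ovules + exported_pollen


def threshold_inbreeding_depression(kappa):
    """Value of delta at which gene_transmission equals 1 for any sigma."""
    return (1.0 - kappa) / 2.0
```

Under these assumptions the threshold is 0.5 without pollen discounting (κ = 0) and decreases linearly to 0 under full discounting (κ = 1), in line with the verbal argument of the Introduction.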


Table 1: Parameters and variables of the model.

The fitness W of an organism is defined as its overall fecundity (that may depend on its survival), so that the expected number of seeds produced by an individual is proportional to W, while its contribution to the population pollen pool is proportional to W (1 − κσ). We assume that W is affected by a possibly large number ℓ of biallelic loci. Alleles at each of these loci are denoted 0 and 1; we assume an equal mutation rate u from 0 to 1 and from 1 to 0, assumed to be small relative to the strength of selection at each locus. The overall mutation rate (per haploid genome) at loci affecting fitness is denoted U = uℓ. The quantity X_j^M (resp. X_j^P) equals 0 if the individual carries allele 0 on its maternally (resp. paternally) inherited copy of locus j, and equals 1 otherwise. The frequencies of allele 1 at locus j on the maternally and paternally inherited genes (averages of X_j^M and X_j^P over the whole population) are denoted p_j^M and p_j^P. Finally, p_j = (p_j^M + p_j^P)/2 is the frequency of allele 1 at locus j in the whole population.

Genetic associations

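Throughout the paper, index i will denote a locus affecting the selfing rate of individuals, while indices j and k will denote loci affecting fitness. Following Barton and Turelli (1991) and Kirkpatrick et al. (2002), we define the centered variables:

$$\zeta_i^M = \sigma_i^M - \bar{\sigma}_i^M, \quad \zeta_i^P = \sigma_i^P - \bar{\sigma}_i^P, \quad \zeta_j^M = X_j^M - p_j^M, \quad \zeta_j^P = X_j^P - p_j^P$$

where σ̄_i^M and σ̄_i^P are the averages of σ_i^M and σ_i^P over the whole population. The genetic association between the sets 𝕌 and 𝕍 of loci present in the maternally and paternally derived genome of an individual is defined as:

$$D_{\mathbb{U},\mathbb{V}} = E\left[ \zeta_{\mathbb{U},\mathbb{V}} \right]$$

where E stands for the average over all individuals in the population, and with:

$$\zeta_{\mathbb{U},\mathbb{V}} = \left( \prod_{x \in \mathbb{U}} \zeta_x^M \right) \left( \prod_{y \in \mathbb{V}} \zeta_y^P \right)$$

For example, D_j,j is a measure of the departure from Hardy-Weinberg equilibrium at locus j, while D_ø,jk measures the linkage disequilibrium between loci j and k on paternally derived haplotypes. Finally, D̃_𝕌,𝕍 is defined as (D_𝕌,𝕍 + D_𝕍,𝕌)/2, and D̃_𝕌,ø will be denoted D̃_𝕌.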
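To make the definition of genetic associations concrete, the sketch below estimates an association from a finite sample of diploid genotypes as the average product of centered allelic indicators, with one set of loci taken on the maternal haplotype and one on the paternal haplotype. The data layout (each individual as a pair of 0/1 tuples) and the function names are assumptions made for this illustration; the analytical model itself considers an infinite population.

```python
from math import prod


def allele_frequencies(haplotypes):
    """Frequency of allele 1 at each locus among a list of haplotypes."""
    n = len(haplotypes)
    n_loci = len(haplotypes[0])
    return [sum(h[j] for h in haplotypes) / n for j in range(n_loci)]


def association(individuals, maternal_set, paternal_set):
    """Average of the product of centered indicators over individuals.

    individuals: list of (maternal_haplotype, paternal_haplotype) pairs.
    maternal_set / paternal_set: locus indices taken on each haplotype.
    """
    p_mat = allele_frequencies([ind[0] for ind in individuals])
    p_pat = allele_frequencies([ind[1] for ind in individuals])
    total = 0.0
    for mat, pat in individuals:
        zeta_m = prod(mat[x] - p_mat[x] for x in maternal_set)  # empty set gives 1
        zeta_p = prod(pat[y] - p_pat[y] for y in paternal_set)
        total += zeta_m * zeta_p
    return total / len(individuals)


def symmetrized_association(individuals, set_u, set_v):
    """Average of the association and of its sex-of-origin mirror image."""
    return 0.5 * (association(individuals, set_u, set_v)
                  + association(individuals, set_v, set_u))


# Two fully homozygous genotypes at two loci, in equal proportions.
inbred = [((0, 0), (0, 0)), ((1, 1), (1, 1))]
excess_homozygosity = association(inbred, [0], [0])    # same locus, both haplotypes
paternal_ld = association(inbred, [], [0, 1])          # two loci, paternal haplotype
```

In this toy sample both quantities equal 0.25, the maximum possible at allele frequency 0.5, reflecting complete homozygosity and complete linkage disequilibrium.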

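Using these notations, the variance in selfing rate in the population can be written as:

$$V_\sigma = 2 \sum_{i, i'} \left( \tilde{D}_{ii'} + \tilde{D}_{i,i'} \right)$$

where the sum is over all pairs of loci i and i' affecting the selfing rate (including i = i'). Ignoring genetic associations between different loci affecting the selfing rate, this becomes:

$$V_\sigma \approx 2 \sum_{i} \left( \tilde{D}_{ii} + \tilde{D}_{i,i} \right)$$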

General expression for fitness, and special cases

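The fitness of an individual divided by the population mean fitness can be expressed in terms of “selection coefficients” a_𝕌,𝕍 representing the effect of selection acting on the sets 𝕌 and 𝕍 of loci (Barton and Turelli, 1991; Kirkpatrick et al., 2002):

$$\frac{W}{\overline{W}} = 1 + \sum_{\mathbb{U},\mathbb{V}} a_{\mathbb{U},\mathbb{V}} \left( \zeta_{\mathbb{U},\mathbb{V}} - D_{\mathbb{U},\mathbb{V}} \right) \tag{8}$$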

Throughout the paper, we assume no effect of the sex-of-origin of genes on fitness, so that a_𝕌,𝕍 = a_𝕍,𝕌. The coefficient a_j,ø = a_ø,j will be denoted a_j and represents selection for allele 1 at locus j. The coefficient a_j,j represents the effect of dominance at locus j, while a_jk,ø and a_j,k represent cis and trans epistasis between loci j and k. Coefficients a_jk,j and a_jk,jk respectively correspond to additive-by-dominance and dominance-by-dominance epistatic interactions between loci j and k, measured as deviations from additivity. Throughout the paper, we will assume that selection is weak, all a_𝕌,𝕍 being of order ϵ (where ϵ is a small term), and derive general expressions for inbreeding depression and the strength of selection for selfing to leading order in a_𝕌,𝕍 coefficients. Results for any particular fitness function can then be obtained by computing the corresponding expressions for a_𝕌,𝕍 coefficients. We will consider three examples of fitness function that have been used in previous papers, and lead to different properties of the three forms of epistasis mentioned above. Approximate expressions for a_𝕌,𝕍 coefficients under these fitness functions are computed in Supplementary File S1.

Uniformly deleterious alleles

Our first example corresponds to the case where allele 1 at each fitness locus j is deleterious, with selection and dominance coefficients s and h. Epistatic interactions occur between pairs of loci, and are decomposed into additive-by-additive (e_axa), additive-by-dominance (e_axd) and dominance-by-dominance (e_dxd) epistasis (see Supplementary Figure S1 for an interpretation of these terms). We assume multiplicative effects of epistatic components on fitness W (i.e., additive effects on log W), so that:

$$W = (1 - hs)^{n_{he}} (1 - s)^{n_{ho}} (1 + e_{a \times a})^{n_2} (1 + e_{a \times d})^{n_3} (1 + e_{d \times d})^{n_4} \tag{9}$$

where n_he and n_ho are the numbers of loci at which a deleterious allele is present in the heterozygous (n_he) or homozygous (n_ho) state, while n₂, n₃ and n₄ are the numbers of interactions between 2, 3 and 4 deleterious alleles at two different loci, given by:

$$n_2 = \frac{n_{he}(n_{he}-1)}{2} + 2 n_{he} n_{ho} + 4 \frac{n_{ho}(n_{ho}-1)}{2}, \quad n_3 = n_{he} n_{ho} + 4 \frac{n_{ho}(n_{ho}-1)}{2}, \quad n_4 = \frac{n_{ho}(n_{ho}-1)}{2}$$

For example, n₂ is given by the number of pairs of heterozygous loci in the genome (n_he (n_he − 1)/2), plus twice the number of pairs involving one heterozygous locus and one homozygous locus for the deleterious allele (n_he n_ho), plus four times the number of pairs of homozygous loci for the deleterious allele (n_ho (n_ho − 1)/2). In such models with fixed epistasis and possibly large numbers of loci, combinations of mutations quickly become advantageous when epistasis is positive, in which case they sweep through the population. We therefore focus on cases where e_axa, e_axd and e_dxd are negative, and assume throughout that deleterious alleles stay at low frequencies in the population (p_j remains small). As shown in Supplementary File S1, equation 9 leads to a_jk = a_j,k ≈ e_axa, a_jk,j ≈ e_axd and a_jk,jk ≈ e_dxd, while the strength of directional selection at each locus (a_j) is affected by e_axa and the effective dominance (a_j,j) is affected by e_axd. Because epistatic coefficients are the same for all pairs of loci, equation 9 leads to a situation where the variances of a_jk, a_jk,j and a_jk,jk over pairs of loci equal zero, while their mean values may depart from zero.
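The counting of interactions can be sketched in a few lines of Python. The expression for n₂ follows the verbal description above. The counts used for n₃ and n₄, and the form of the fitness function (each epistatic component entering as a factor (1 + e) raised to the corresponding count), are assumptions of this sketch, chosen to be consistent with multiplicative effects on W and with the mapping to the model of Charlesworth et al. (1991). Function names are hypothetical.

```python
def interaction_counts(n_he, n_ho):
    """Numbers of interactions among 2, 3 and 4 deleterious alleles at two loci."""
    hom_pairs = n_ho * (n_ho - 1) // 2
    het_pairs = n_he * (n_he - 1) // 2
    # n2 as described in the text: het-het pairs count once, het-hom pairs
    # twice, hom-hom pairs four times.
    n2 = het_pairs + 2 * n_he * n_ho + 4 * hom_pairs
    # Assumed: one three-allele interaction per het-hom pair, four per hom-hom pair.
    n3 = n_he * n_ho + 4 * hom_pairs
    # Assumed: one four-allele interaction per hom-hom pair.
    n4 = hom_pairs
    return n2, n3, n4


def fitness(n_he, n_ho, s, h, e_axa, e_axd, e_dxd):
    """Fitness with multiplicative single-locus and epistatic components (assumed form)."""
    n2, n3, n4 = interaction_counts(n_he, n_ho)
    return ((1 - h * s) ** n_he * (1 - s) ** n_ho
            * (1 + e_axa) ** n2 * (1 + e_axd) ** n3 * (1 + e_dxd) ** n4)
```

As a consistency check, n₂ equals the total number of pairs of deleterious alleles in the genome minus the pairs formed by the two copies at the same homozygous locus.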

Charlesworth et al. (1991) explored the effect of synergistic epistasis (measured by a parameter β) on inbreeding depression, using a fitness function that imposes relations between h, e_axa, e_axd and e_dxd. As explained in Supplementary File S1, their fitness function (equation 2 in Charlesworth et al., 1991) is equivalent to setting e_axa = −βh², e_axd = −βh (1 − 2h) and e_dxd = −β (1 − 2h)² in our equation 9.
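The mapping above can be checked pair by pair. The following worked calculation is a sketch, assuming that each pair of loci contributes approximately e_axa n₂ + e_axd n₃ + e_dxd n₄ to log W (weak epistasis), with (n₂, n₃, n₄) equal to (1, 0, 0) for two heterozygous loci, (2, 1, 0) for one heterozygous and one homozygous locus, and (4, 4, 1) for two homozygous loci. The counts for n₃ and n₄ are assumptions consistent with the description of n₂ given earlier.

```latex
\begin{align*}
\text{het--het:} \quad & e_{a \times a} = -\beta h^2 \\
\text{het--hom:} \quad & 2 e_{a \times a} + e_{a \times d}
    = -2\beta h^2 - \beta h (1 - 2h) = -\beta h \\
\text{hom--hom:} \quad & 4 e_{a \times a} + 4 e_{a \times d} + e_{d \times d}
    = -\beta \left[ 4h^2 + 4h(1 - 2h) + (1 - 2h)^2 \right] = -\beta
\end{align*}
```

Each pair thus contributes −β times the product of per-locus weights h (heterozygous) and 1 (homozygous), as would be expected if synergistic epistasis acts on a weighted count of mutations in which heterozygous loci count h and homozygous loci count 1.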

Gaussian stabilizing selection

Our second fitness function corresponds to stabilizing selection acting on an arbitrary number n of quantitative traits, with a symmetrical, Gaussian-shaped fitness function. The general model is the same as in Abu Awad and Roze (2018): r_αj denotes the effect of allele 1 at locus j on trait α, and we assume that the different loci have additive effects on traits:

$$g_\alpha = \sum_{j=1}^{\ell} r_{\alpha j} \left( X_j^M + X_j^P \right) \tag{13}$$

where g_α is the value of trait α in a given individual (note that g_α = 0 for all traits in an individual carrying allele 0 at all loci). We assume that the values of r_αj for all loci and traits are sampled from the same distribution with mean zero and variance a². The fitness of individuals is given by:

$$W = \exp\left( - \frac{\sum_{\alpha=1}^{n} g_\alpha^2}{2 V_s} \right) \tag{14}$$

where V_s represents the strength of selection. According to equation 14, the optimal value of each trait is zero. We assume that a² is small relative to V_s, so that selection is weak at each locus. This model generates distributions of fitness effects of mutations and of pairwise epistatic effects on fitness (the average value of epistasis being zero), while deleterious alleles have a dominance coefficient close to 1/4 in an optimal genotype (Martin and Lenormand, 2006b; Martin et al., 2007; Manna et al., 2011). In a population at equilibrium, equations 13 and 14 lead to expressions for a_j (under which the rarer allele at locus j is disfavored) and for the pairwise coefficients a_jk = a_j,k, while a_jk,j and a_jk,jk are smaller in magnitude (see Supplementary File S1). This scenario thus generates a situation where additive-by-additive epistasis (a_jk = a_j,k) is zero on average (because the average of r_αj is zero) but has a positive variance among pairs of loci, while additive-by-dominance and dominance-by-dominance epistasis are negligible. As in the previous example, we will generally assume that the deleterious allele at each locus j (allele 1 if p_j < 0.5, allele 0 if p_j > 0.5) stays rare in the population, by assuming that (1 − 2p_j)² is close to 1; this is also true in the next example.
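Two properties stated above (a dominance coefficient close to 1/4 in an optimal genotype, and pairwise epistasis whose sign depends on the mutational effects) can be illustrated with a short Python sketch. It assumes additive allelic effects on each trait and a log-fitness equal to minus the sum of squared trait values divided by 2V_s; function names and the numerical effects are hypothetical, and epistasis is measured on the log-fitness scale for simplicity.

```python
def trait_values(effects, maternal, paternal):
    """Trait values from additive allelic effects; effects[alpha][j] is r_alpha_j."""
    return [sum(r * (xm + xp) for r, xm, xp in zip(row, maternal, paternal))
            for row in effects]


def log_fitness(effects, maternal, paternal, v_s):
    """Log of Gaussian fitness with optimum at zero for every trait."""
    traits = trait_values(effects, maternal, paternal)
    return -sum(g * g for g in traits) / (2.0 * v_s)


def pairwise_log_epistasis(effects, j, k, v_s):
    """Deviation from additivity (log scale) of two heterozygous mutations
    placed on the same haplotype of an otherwise optimal genotype."""
    n_loci = len(effects[0])
    wild = [0] * n_loci

    def mutant(*loci):
        return [1 if x in loci else 0 for x in range(n_loci)]

    w_jk = log_fitness(effects, mutant(j, k), wild, v_s)
    w_j = log_fitness(effects, mutant(j), wild, v_s)
    w_k = log_fitness(effects, mutant(k), wild, v_s)
    return w_jk - w_j - w_k


# Hypothetical effects of two loci on two traits.
effects = [[0.1, -0.2],
           [0.05, 0.3]]
het = log_fitness(effects, [1, 0], [0, 0], v_s=1.0)
hom = log_fitness(effects, [1, 0], [1, 0], v_s=1.0)
dominance = het / hom  # ratio of heterozygous to homozygous log-fitness effects
```

In this sketch the ratio is 1/4 for any mutation in an optimal genotype, because doubling the phenotypic effect quadruples the squared distance to the optimum, and the pairwise term equals minus the sum over traits of r_αj r_αk divided by V_s, which is positive for compensatory mutations and negative otherwise.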

Non-Gaussian stabilizing selection

The last example we examined is a generalization of the fitness function given by equation 14, in order to introduce a coefficient Q affecting the shape of the fitness peak (e.g., Martin and Lenormand, 2006a; Tenaillon et al., 2007; Gros et al., 2009; Roze and Blanckaert, 2014; Abu Awad and Roze, 2018): where is the Euclidean distance from the optimum in phenotypic space. The fitness function is thus Gaussian when Q = 2, while Q > 2 leads to a flatter fitness peak around the optimum. The expressions for a_𝕌,𝕍 coefficients derived in Supplementary File S1 show that the variances of a_jk = a_j,k, a_jk,j and a_jk,jk over pairs of loci have the same order of magnitude, and that additive-by-additive epistasis (a_jk = a_j,k) is zero on average, while additive-by-dominance and dominance-by-dominance epistasis (a_jk,j, a_jk,jk) are negative on average when Q > 2. Note that Q > 2 also generates higher-order epistatic interactions (involving more than two loci); however, we did not compute expressions for these terms.

Quasi-linkage equilibrium (QLE) approximation

Using the general expression for fitness given by equation 8, the change in the mean selfing rate per generation can be expressed in terms of genetic associations between loci affecting the selfing rate and loci affecting fitness. Expressions for these associations can then be computed using general methods to derive recursions on allele frequencies and genetic associations (Barton and Turelli, 1991; Kirkpatrick et al., 2002). For this, we decompose the life cycle into two steps: selection corresponds to the differential contribution of individuals due to differences in overall fecundity and/or survival rates (W), while reproduction corresponds to gamete production and fertilization (involving either selfing or outcrossing). Associations measured after selection (that is, weighting each parent by its relative fitness) will be denoted , while associations after reproduction (among offspring) will be denoted . Assuming that “effective recombination rates” (that is, recombination rates multiplied by outcrossing rates) are sufficiently large relative to the strength of selection, genetic associations equilibrate rapidly relative to the change in allele frequencies due to selection. In that case, associations can be expressed in terms of allele frequencies by computing their values at equilibrium, for given allele frequencies (e.g., Barton and Turelli, 1991; Nagylaki, 1993). Note that when allele frequencies at fitness loci have reached an equilibrium (for example, at mutation-selection balance), one does not need to assume that the selection coefficients a_𝕌,𝕍 are small relative to effective recombination rates for the QLE approximation to hold, but only that changes in allele frequencies due to the variation in the selfing rate between individuals are small. 
We will thus assume that the variance in the selfing rate in the population V_σ stays small (and therefore, the genetic variance contributed by each locus affecting the selfing rate is also small), and compute expressions to the first order in V_σ. This is equivalent to the assumption that alleles at modifier loci have small effects, commonly made in modifier models.

Individual-based simulations

In order to verify our analytical results, individual-based simulations were run using two C++ programs, one with uniformly deleterious alleles with fixed epistatic effects (equation 9) and the other with stabilizing selection on n quantitative traits (equation 14). Both are described in Supplementary File S5 and are available from Dryad. Both programs represent a population of N diploid individuals with discrete generations, the genome of each individual consisting of two copies of a linear chromosome with map length R Morgans. In the first program (fixed epistasis), deleterious alleles occur at rate U per haploid genome per generation at an infinite number of possible sites along the chromosome. A locus with an infinite number of possible alleles, located at the mid-point of the chromosome, controls the selfing rate of the individual. In the program representing stabilizing selection, each chromosome carries ℓ equidistant biallelic loci affecting the n traits under selection (as in Abu Awad and Roze, 2018). The selfing rate is controlled by ℓ_σ = 10 additive loci evenly spaced over the chromosome, each with an infinite number of possible alleles (the selfing rate being set to zero if the sum of allelic values at these loci is negative, and one if the sum is larger than one). In both programs, mutations affecting the selfing rate occur at rate U_self = 10⁻³ per generation, the value of each mutant allele at a selfing modifier locus being drawn from a Gaussian distribution with standard deviation σ_self centered on the allele value before mutation. The selfing rate is set to zero during an initial burn-in period (set to 20,000 generations) after which mutations are introduced at selfing modifier loci.
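The handling of selfing modifier loci in the stabilizing selection program can be sketched as follows. This is an illustrative Python sketch rather than the authors' C++ code: the function names are hypothetical, and the mutation step assumes that U_self applies per haplotype and that at most one modifier locus mutates per haplotype per generation, details the text above does not specify.

```python
import random


def selfing_phenotype(maternal_alleles, paternal_alleles):
    """Selfing rate from additive modifier alleles, truncated to the interval [0, 1]."""
    total = sum(maternal_alleles) + sum(paternal_alleles)
    return min(1.0, max(0.0, total))


def mutate_modifier_haplotype(alleles, u_self, sigma_self, rng):
    """Return a copy of the haplotype after a possible modifier mutation.

    With probability u_self (assumed per haplotype), one randomly chosen locus
    receives a new value drawn from a Gaussian centered on its current value.
    """
    mutated = list(alleles)
    if rng.random() < u_self:
        locus = rng.randrange(len(mutated))
        mutated[locus] = rng.gauss(mutated[locus], sigma_self)
    return mutated


rng = random.Random(1)
haplotype = [0.0] * 10  # ten modifier loci, all at zero as during the burn-in
new_haplotype = mutate_modifier_haplotype(haplotype, 1e-3, 0.03, rng)
```

Truncating the summed allelic values keeps the phenotype a valid probability while letting allelic values drift outside [0, 1]; the values 1e-3 and 0.03 in the usage lines are placeholders for U_self and σ_self.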

RESULTS

Effects of epistasis on inbreeding depression

We first explore the effects of epistasis on inbreeding depression, assuming that the selfing rate is fixed. Throughout the paper, inbreeding depression δ is classically defined as:

$$\delta = 1 - \frac{\overline{W}^{\mathrm{self}}}{\overline{W}^{\mathrm{out}}}$$

where W̄^self and W̄^out are the mean fitnesses of offspring produced by selfing and by outcrossing, respectively (e.g., Lande and Schemske, 1985). In Supplementary File S2, we show that a general expression for δ in terms of one- and two-locus selection coefficients, in a randomly mating population (σ = 0) is given by: where the sums are over all loci affecting fitness, and with:

ρ_jk being the recombination rate between loci j and k. With arbitrary selfing, and assuming all ρ_jk ≈ 1/2, equation 17 generalizes to: with several higher-order terms depending on genetic associations between loci generated by epistatic interactions (see equation B17 in Supplementary File S2 for the complete expression). The term F in equation 19 corresponds to the inbreeding coefficient (probability of identity by descent between the maternal and paternal copy of a gene), given by:

$$F = \frac{\sigma}{2 - \sigma}$$

at equilibrium, while G_jk is the identity disequilibrium between loci j and k (Weir and Cockerham, 1973), given by:

$$G_{jk} = \phi_{jk} - F^2$$

(ϕ_jk is the joint probability of identity by descent at loci j and k). Under free recombination (ρ_jk = 1/2), it simplifies to:

$$G_{jk} = \frac{4 \sigma (1 - \sigma)}{(4 - \sigma)(2 - \sigma)^2}$$

which will be denoted G hereafter. Given that G_jk is only weakly dependent on ρ_jk, G_jk should be close to G for most pairs of loci when the genome map length is not too small.
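The equilibrium inbreeding coefficient and identity disequilibrium can be evaluated numerically with the following Python sketch. The expression used for the joint probability of identity by descent ϕ_jk is an assumption of this sketch: it is the equilibrium of the standard two-locus recursion under partial selfing, ϕ = σ[(1 − x)/2 + xF + (1 − x)ϕ/2] with x = 2ρ(1 − ρ), and is not copied from the source. Function names are hypothetical.

```python
def inbreeding_coefficient(sigma):
    """Equilibrium inbreeding coefficient under partial selfing."""
    return sigma / (2.0 - sigma)


def joint_identity_by_descent(sigma, rho):
    """Equilibrium joint probability of identity by descent at two loci.

    Assumed expression, solved from the two-locus selfing recursion.
    """
    x = 2.0 * rho * (1.0 - rho)  # chance that two gametes differ in recombination state
    numerator = 2.0 - sigma - (2.0 - 3.0 * sigma) * x
    denominator = 2.0 - sigma * (1.0 - x)
    return inbreeding_coefficient(sigma) * numerator / denominator


def identity_disequilibrium(sigma, rho=0.5):
    """Excess of joint identity by descent over the product of single-locus values."""
    f = inbreeding_coefficient(sigma)
    return joint_identity_by_descent(sigma, rho) - f * f


def identity_disequilibrium_free(sigma):
    """Closed form under free recombination (rho = 1/2)."""
    return 4.0 * sigma * (1.0 - sigma) / ((4.0 - sigma) * (2.0 - sigma) ** 2)
```

Both routes agree at ρ = 1/2, and the identity disequilibrium vanishes under complete outcrossing (σ = 0) and complete selfing (σ = 1), where identity by descent no longer varies among individuals.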