Bridging the gap between the evolutionary dynamics and the molecular

Abstract


Introduction
In eukaryotes, meiosis is a fundamental step in the reproduction process, allowing the formation of haploid cells from a diploid cell.This process requires the success of a key step, namely, the pairing of homologous chromosomes.Correct pairing is essential for proper segregation of chromosomes in daughter cells.In addition, it allows for the formation of cross-overs, thus implementing recombination and generating new combinations of alleles.Finally, meiosis and recombination are at the heart of questions of hybrid sterility and speciation [1][2][3].However, the evolutionary dynamics of meiotic recombination still remains poorly understood.A correct understanding of this dynamics requires explicit description of the population genetics processes, on one side, and the molecular mechanisms of meiosis, on the other side.In the present work, we present an attempt in this direction, using theoretical and simulation models.
In mammals and many other eukaryotes, recombination points are not uniformly distributed along the chromosomes [4].Instead, crossovers frequently occur at the same positions in independent meioses, into regions of the genome called recombination hotspots, where the frequency of crossing-over occurrence is 10 to 100 times higher than in the rest of the genome [5,6].Recombination hotspots are typically 1 to 2 kb long, and they are often located outside of genes.More than 30,000 hotspots were found in humans [4,7], and around 40,000 in mice [8].These hotspots are characterized in humans by the presence of a sequence motif (13-mer) determining up to 40% of crossing-overs [9].This motif has helped to identify the PRDM9 gene as the gene responsible for hotspot location, [9,10].PRDM9 encodes a DNA-binding protein, and the hotspots therefore correspond to strong binding sites of this protein.
Once bound to its target site, the PRDM9 protein trimethylates surrounding histones (H3K4me3 and H3K36me3 [12,13]), inducing the recruitment of the double strand break (DSB) machinery near the target site [14] (for a review see [15]).DSB induction, followed by DNA resection, produces a single-stranded end that searches for its homologous sequence on the other chromosome and is then repaired, using the sequence at the same locus on the homologous chromosome as the template.This repair leads to the conversion of the sequence located near the break (often overlapping the site targeted by PRDM9 [16]).Some of these gene conversion events lead to the formation of crossing-overs (CO), while others are repaired without exchange of flanking regions (non crossing-overs, NCO).In some cases, repair is done not with the homologue but with the sister chromatid [16].
Crucially, when allelic variation exists at a given target motif that modulates the binding affinity for PRDM9, DSBs will form more frequently on the allele for which PRDM9 has the highest affinity (the 'hot' allele).Given that DSBs are repaired by using the intact homologue as a template, this process leads to the preferential replacement of 'hot' alleles by alleles for which PRDM9 has a lower affinity.This mechanism of biased gene conversion favors the transmission of mutations that inactivate hotspots, leading to an increase in the frequency of these mutant inactive versions of binding sites, as well as their fixation in the population.This self-destruction phenomenon, commonly called the "hotspot conversion paradox" [17], leads to the progressive inactivation, hereafter called erosion, of PRDM9 binding sites at the genome scale, over short evolutionary times (in the order of 10,000 to 100,000 generations [18]), which therefore raises the question of the maintenance of recombination in the long term.
As a solution to the paradox, a model has been proposed [19], which works as follows: the disappearance of recombination hotspots compromises the proper functioning of meiosis and greatly reduces the fertility of individuals.In this context, new PRDM9 alleles, recognizing new target sites already present by chance in the genome and thus restoring recombination, would be positively selected.They would eventually replace the alleles currently segregating in the population.By analogy with the so-called Red Queen dynamics [20] typically displayed by prey-predator or host-pathogen systems, this model has been called the intragenomic Red Queen model of recombination hotspot evolution [19].It predicts a rapid evolutionary dynamics of recombination landscapes, as well as a strong positive selection on the DNA binding domain of PRDM9.
Several empirical observations support the Red Queen model.First, fine scale recombination landscapes differ between closely related species, like between humans and chimpanzees [10,21], or between modern humans and Denisovan [18], suggesting a fast turnover of recombination hotspots.Second, the DNA binding domain of PRDM9 is a zinc finger domain, consisting of an array of 7 to 10 zinc fingers.This domain is encoded by a minisatellite, which mutates rapidly [22] by a combination of point mutations and unequal crossing over between the sequence repeats.This allows for the rapid accumulation of new combinations of zinc fingers, and thus of new alleles recognizing different hotspots, providing the necessary mutational input for the Red Queen to run.In part because of this high mutation rate, PRDM9 is typically characterized by a high genetic diversity in natural populations [23][24][25][26][27] .Finally, non-synonymous substitutions in the zinc finger domain of PRDM9 are more frequent than synonymous substitutions, suggesting the presence of a strong positive selection acting on the DNA binding domain [3,25].
Studies combining computer simulations and theoretical analyses have also given many arguments in favor of the Red Queen model [19], [28].However, current theoretical models remain silent on two essential points.
First, what exact mechanism explains the positive selection acting on PRDM9 to restore recombination?Current models invoke a decline in fertility caused by the erosion of recombination, but without providing a precise explanation of this point.Second, these models do not provide any explanation, at this stage, of the link between the intra-genomic Red Queen and the phenotype of hybrid sterility induced by PRDM9 in the mouse [1,29].
In this respect, a series of molecular and comparative analyses conducted on the mouse provide some clues.First, the differential erosion of PRDM9 binding sites between mouse subspecies (due to different PRDM9 alleles eroding their respective target sites within each subspecies) leads to asymmetrical binding configurations in hybrid individuals [30].Specifically, in a F1 hybrid, each of the two PRDM9 alleles has eroded its targets in the genome of its parental strain, but not in the other strain's genome.Each PRDM9 allele will therefore tend to bind preferentially to the still active target sites present on the chromosomes inherited from the other parent, but not to the homologous but eroded sites on the chromosome from its own parent.These asymmetrical binding patterns of PRDM9 across the whole genome are suspected to be involved in the sterility phenotype.Indeed, Chip-Seq experiments have uncovered a correlation between PRDM9 binding asymmetry rates in a variety of mouse hybrids for different pairs of PRDM9 alleles and hybrid sterility [31,32].Cytogenetic observations have also shown that chromosomes that are more asymmetrically bound are less often correctly paired during metaphase I [32].Finally, the introduction by transgenesis in mice of a PRDM9 allele possessing the human zinc finger domain, which has never been present in the mouse species complex and has then not eroded its targets asymmetrically in either of the two populations, restores both a good binding symmetry and high levels of fertility [1].
In the light of these empirical results, a model has been proposed [1], according to which PRDM9 would in fact have a dual role during meiosis: in addition to being responsible for the recruitment of the DSB machinery, it would also facilitate the pairing between the homologous chromosomes, thanks to its symmetrical binding, that is, its simultaneous binding to the same target site on both homologues.More precisely, PRDM9 is thought to act to bring the sites on which it is bound to the chromosomal axis.The mechanisms involved in this step are not very clear, although it would seem that H3K4me3 markers [33] and the SSXRD and KRAB domains of PRDM9 are implicated [34].This is where symmetry would play a key role (Fig 1): when PRDM9 is bound symmetrically, the two homologous loci are each brought closer to the chromosomal axis, which makes the complementary sequence of the homologue more directly accessible to the single-stranded end produced by resection of the DSB [35].In contrast, in the case where PRDM9 binds only on one of the two homologous loci, a possible DSB would be repaired only later on, either by the homologue as a non CO event or by the sister chromatid [16].
This mechanistic model based on the symmetrical binding of PRDM9 provides a globally coherent explanation of the hybrid sterility phenotype.A last question remains open however: could this dual role of PRDM9 also explain the Red Queen evolutionary dynamics of recombination observed within a single population?Of note, in the hybrid, failure of meiosis is a consequence of macroscopic asymmetric sequence patterns due to differential erosion in the two parental lineages.Such macroscopic asymmetric sequence patterns are unlikely within a single population.On the other hand, statistical binding asymmetries might nevertheless occur and play a role.
To investigate this question, in this paper, we revisit the theoretical modeling of the intra-genomic Red Queen of PRDM9 -dependent recombination.In contrast to previous work [19,28] (but see recent work of Baker et al. [36]), we explicitly model the molecular mechanism of meiosis and, more specifically, the function currently hypothesized for PRDM9.Our specific aim was to test whether the combined effects of biased gene conversion and symmetry provide sufficient ingredients for running an intra-genomic Red Queen, and this, under empirically reasonable parameters values.

Results
To investigate the evolutionary dynamics of PRDM9 -dependent recombination, we developed a simulation program modeling the evolution of a randomly mating population of diploid individuals and accounting Figure 1: A model of how symmetrical PRDM9 binding facilitates chromosome pairing.Upon binding DNA at a specific target motif, PRDM9 (orange oval) brings the DNA segment close to the chromosomal axis.Some of the sites bound by PRDM9 may then undergo a DSB (red star).Resection of the DSB generates a single-stranded end, which will search for a complementary sequence to use as a template for repair.In the case where PRDM9 is symmetrically bound (i.e. on both homologues, case on the left-hand side), the templates provided by the two sister chromatids of the homologue are more directly accessible, thus facilitating homology search and pairing with the homologue.The break can then be repaired either as a CO or a NCO event, in both cases, implementing gene conversion at the broken site.In the case of asymmetrical PRDM9 binding (case shown on the right-hand side), the homologue is less directly accessible, preventing efficient homologue engagement.The broken site is assumed to be repaired later on, once the homologues have synapsed (and this, thanks to other DSBs occurring at symmetrically bound sites somewhere else on the same pair of chromosomes), as NCO events.In the case where the homologue bears an inactive binding site at the position corresponding to the DSB, the NCO will effectively implement biased gene conversion in favor of the inactive version.Meiosis is modeled step by step.First, the PRDM9 proteins binds to their target sites, according to a simple chemical equilibrium (Fig 2C).The probability of occupation of a site will thus depend on the binding affinity of PRDM9 for this site.It may also depend on PRDM9 concentration, which can itself depend on the genotype of the individual (either homozygous for single PRDM9 allele, or heterozygous for two distinct PRDM9 alleles), through gene dosage effects.Then, a small number of double strand breaks are induced at some of the sites bound by PRDM9 (Fig 2D).At that step, a key assumption of the model is that meiosis will succeed only if at least one of those DSBs occurs at a site that is also bound by PRDM9 on at least one of the two chromatids of the homologous chromosome, in which case a single cross-over is performed at the site of one of the DSBs fulfilling this requirement.Finally, all DSBs are repaired using the homologue (Fig 2E ), upon which meiosis is assumed to proceed successfully, producing four gametes, one of which is chosen at random for reproduction (Fig 2F).
Of note, because of the symmetry requirement, the probability of success of meiosis will depend on the genotype of the individuals on which it is attempted, which will thus induce differences in fertility between individuals.In addition, explicit repair of the DSBs by the homologue effectively implements the process of gene conversion at the binding sites.The main question is then to what extent these two aspects of the molecular mechanism are susceptible to influence the evolutionary dynamics.
The overall procedure is run for several tens of thousands of generations, during which several summary statistics are monitored: the frequency of each PRDM9 allele and the resulting genetic diversity at the PRDM9 locus, the mean proportion of binding sites of PRDM9 alleles that are currently active, their mean affinity across the genome, the mean probability of symmetrical binding and the mean probability of success of meiosis.The model parameters and the monitored statistics are summarized in Table 1 The results section is divided in three main parts.First, a simple version of the model is presented, in which PRDM9 gene dosage is ignored (that is, assuming that the number of proteins produced by a given PRDM9 allele is the same in a homozygote and in a heterozygote for this allele).Although empirically questionable, this assumption offers a simpler basis for understanding key features of the model and of the resulting evolutionary dynamics.Then, in a second part, we introduce PRDM9 gene dosage and work out its consequences.Finally, we attempt an empirical calibration of the model based on current knowledge in the mouse, so as to test its predictions.

Intragenomic Red Queen
We first ran the model without gene dosage, and with a low mutation rate at the PRDM9 locus (u = 5×10 −6 ).
Note that the parameter values used here are not meant to be empirically relevant.Instead, the aim is to illustrate the different regimes produced by the model.Assuming a low mutation rate for PRDM9 results in few, rarely more than one, alleles segregating at any given time in the population (Fig 3A).We call this a monomorphic regime, after Latrille et al [28].The dynamic is as follows.First, an allele appears in the Relative difference in fertility between homozygotes and heterozygotes for young alleles σ z Relative difference in fertility between homozygotes and hemizygotes for alleles of level of erosion z a These variables also change over time; b the mean is over all individuals carrying this allele (with a weight of 1 for heterozygotes and a weight of 2 for homozygotes).
population and invades, until reaching a frequency close to 1.Meanwhile, the proportion θ of active binding sites for this allele decreases, due to the erosion of these sites by inactivating mutations and biased gene conversion.Once the allele has eroded a fraction of around 20% of its sites, it is quickly replaced by a newly arisen PRDM9 allele that recognizes a different hotspot sequence motif.This rapid replacement clearly suggests the presence of strong positive selection.The newly invading allele then erodes its target until being replaced by a new allele which in turn follows a similar trajectory, and so on.
A key aspect of the model is that it does not explicitly invoke a fitness function.Instead, the positive selection that seems to be operating on new alleles (Fig 3) is an emerging property of the mechanism of meiosis.This positive selection can be more precisely understood by examining how the rate of PRDM9 symmetrical binding (defined by Eq (15) and Eq (14)) and the mean fertility of a typical allele (Eq (16)) evolve over the lifespan of this allele (Fig 3D and 3E), and how this relates to the level of erosion (Eq (13),  .This is expected: the more frequent an allele, the more often it will be implicated in a meiosis and the more opportunities its target sites will have to undergo gene conversion events.
Second, erosion results in a decrease of the mean affinity of the sites that are still active (Fig 3C).
This reflects the fact that sites of high affinity are more often bound by PRDM9, and thus are more often converted by inactive mutant versions.
Third, sites of lower affinity are also less often symmetrically bound (i.e.bound simultaneously on both homologues).The key quantity that captures this effect is the conditional rate of symmetrical binding (q).Since DSBs are chosen uniformly at random among all bound sites by PRMD9, q corresponds to the probability that a DSB will occur in a symmetrically bound site and will thus contribute to a successful pairing of the homologues.This probability is a monotonous function of the mean affinity of the binding sites for the PRDM9 protein (Fig 3D).The mean value of q over all target sites of a given PRDM9 allele is thus lower for older alleles (Fig 3D).
Finally, since, in our model, meiosis requires at least one DSB in a symmetrically bound site, the mean fertility of an older allele is lower (Fig 3E).Hence, new alleles (young alleles) will be positively selected at the expense of old alleles and will ultimately replace them in the population.
To assess to what extent the requirement of symmetrical binding impacts the evolutionary dynamics of PRDM9, we performed additional simulations with a simpler model, in which DSBs can be repaired as COs even when PRDM9 binds only one of the two homologues (i.e. this corresponds to a model where PRDM9 is required for the formation of DSBs, but not for chromosome pairing).With this setting, the model still predicts a turnover of PRDM9 alleles, but with unreasonably high, in fact nearly complete, levels of erosion (Supplementary Figure 2).Fundamentally, PRDM9 alleles persist in the population until they have no more sites to bind, at which point they cannot anymore recruit DSBs and are thus selectively eliminated for this trivial reason.Thus, a key result obtained here is that the requirement of symmetrical binding for chromosome pairing, combined with preferential erosion of high affinity sites, is sufficient for creating differences in fitness (fertility) between old and young alleles, which in turn will spontaneously induce a Red Queen dynamics at the level of the population, while ensuring moderate levels of hotspot erosion.
Of note some stochastic deviations from this typical life-cycle for a PRDM9 allele are sometimes observed, such as an allele being first outcompeted by a subsequent allele but then showing a rebound in frequency when the competitor has itself eroded a large fraction of its target sites.Such deviations are relatively rare and do not seem to fundamentally change the overall regime.It should also be noted that we observe an increase in the rate of symmetrical binding and in the mean fertility at the beginning and at the end of the life of the alleles.The reason for this is that these two summary statistics are defined, for each allele, as a mean over all diploid genotypes carrying this allele segregating in the population.As a result, when old alleles have declined to a low frequency, they often find themselves in a heterozygous state with new alleles, which restores the rate of symmetrical binding and thus the fertility of the corresponding diploid individual.Likewise, when a new allele appears in the population, it is in a heterozygous state with an older allele, which gives a lower rate of symmetrical binding and fertility than being in the homozygous state.Control simulations (without the requirement of symmetry) run in the polymorphic case result in less extreme erosion levels than control simulations in the monomorphic regime, although still much stronger than in the simulations in which symmetrically binding is required (Supplementary Figure 2).The overall dynamics under this control appears to be neutral, with a turnover caused by mutational input of new alleles and loss of old alleles by drift.

Scaling experiments
The simulations shown in the previous section have helped to establish that the molecular details of the implication of PRDM9 during meiosis (such as depicted in where P inv corresponds to the probability of invasion.In this expression, if u increases, this also increases the rate of replacement of old alleles thus shifting the equilibrium towards weaker equilibrium erosion.These lower erosion levels imply a higher rate of symmetrical binding and a higher mean fertility (Fig 5C ).A second scaling experiment was performed, now varying the rate v of target inactivation.From this experiment, it appears that v has a very weak impact on standing diversity (Fig 5D).On the other hand, it has an important impact on equilibrium erosion levels (Fig 5E ) and on fertility (Fig 5F ), with an opposite effect, compared to what was observed with u.This comes from the fact that the rate of erosion is equal to 4N vg.Increasing v will therefore result in an increase of the rate of erosion at equilibrium.
To summarize, the equilibrium set point of the Red Queen dynamic is directly impacted by the two parameters u and v, whose effects are opposed.In the case of diversity, u (or, more precisely, N u) is the key determining factor, while v seems to have minor impact.These results are consistent with the results obtained by Latrille et al. [28].

Comparison with analytical developments
In parallel with the simulator, the model was also explored analytically, using the self-consistent mean-field approach originally described in Latrille et al. [28].The complete analytical developments can be found in Supporting materials sections 1.1 to 1.3.Here, a simplified presentation is given, introducing the main intuitions and formulae.
For these analytical developments to be tractable, three preliminary conditions were introduced.First, a polymorphic regime (N u >> 1) is assumed for PRDM9.This assumption makes it possible to neglect the presence and the impact on the dynamics of homozygous individuals, which facilitate the approximations.
Second, we assume a strong selection regime (4N s 0 >> 1, where s 0 is the mean selection coefficient acting on new PRDM9 alleles).This allows us to rely on deterministic approximations for the typical trajectory of PRDM9 alleles.Finally, we assume a weak erosion of the target sites at equilibrium, which allows us to linearize the equations.
First, to a good approximation, the rate of erosion of the targets of an allele depends on its frequency in the population : where ρ = N vd 2h = 4N vg and g = d 8h is the net rate of erosion per generation.As a result, the cumulated erosion of an allele of age t is: where f (t ′ ) is the frequency of the allele at time t ′ .The quantity z can thus be seen as the intrinsic age of the allele (i.e. an allele ages more rapidly if more frequent in the population).Because of selection, the frequency of an allele changes as a function of its erosion, or intrinsic age z.
Assuming weak erosion (z << 1) allows one to linearize the differential equation describing the evolution of the frequency of an allele, which gives : Eq 3 says that the variation in frequency of an allele is proportional to the difference between the erosion of the targets for this allele (z) and the mean erosion for all other alleles in the population (z).The proportionality coefficient, α, corresponds to the linear response of the log-fitness as a function of erosion.It can be expressed as a function of the mechanistic parameters of the model (Supporting materials section 1 Eq 56).Thus, if an allele is younger, the allele has eroded less targets than the average (z < z) and will increase in frequency.Conversely when the allele reaches and then exceeds the mean erosion level, its frequency will start to decrease.
The previous equation shows the evolution of the frequency of a typical allele in the population, but it depends on the mean level of erosion z, which is unknown.To determine its value, we can apply a self-consistent argument, essentially saying that all alleles entering the population have this typical dynamic.
Using this self-consistent argument leads to the following explicit expression for z [28] : Here, we find three constants of the model.There is first of all ρ, the net erosion constant, which by increasing will contribute to increasing the mean equilibrium erosion level z.Then, in the denominator, there is µ = 4N u, the population-scaled mutation rate of PRDM9, which by increasing leads to a more frequent emergence of new alleles.This restricts the lifetime of alleles in the population and thus decreases the average cumulative erosion rate.Finally, we find again the constant α.If the latter increases, this means that the log-fitness will decrease more quickly as a function of erosion.As a result, the allele will be more quickly eliminated (i.e. with less erosion), which lowers the equilibrium erosion level z.
Note that this equation is the same as that presented in Latrille et al [28].However, in [28], α and g were phenomenological parameters, set arbitrarily, while in our mechanistic model, they emerge directly from the molecular mechanisms of meiosis.As such, they can be expressed as functions of the model parameters (d, h, N or v, see Supporting matrials section 1 Eq 56 and Eq 34).
From there, we can express the mean equilibrium quantities over the population and across the whole simulation as a function of z.Namely θ, the mean proportion of still active target sites, w, the mean fertility of the alleles, and the diversity D of PRDM9 in the population at equilibrium : In the expression for w, q corresponds to the mean probability of symmetrical binding at equilibrium.It depends on z.
The analytical approximations presented here are plotted on

Taking into account PRDM9 gene dosage
The previous results were obtained with a model assuming the same concentration of the PRDM9 protein product of a given allele in individuals that are either homozygous or heterozygous for this allele.Yet, in reality, gene dosage seems to be an important aspect of the regulation of PRDM9 expression [37].To account for this fact, gene dosage was introduced in the simulation model.This was done by assuming that a homozygote produces twice as many protein products as a heterozygote for a given allele.The main consequence of introducing gene dosage is that, in homozygotes, the occupancy of a site of a given affinity is increased, compared to a heterozygote, leading to a higher probability of symmetrical binding and fertility.
This could have an important impact on the Red Queen dynamics.
To illustrate this point, Fig 6 contrasts the results obtained with and without gene dosage in an otherwise identical parameter configuration.We observe a drastic change of regime between the two settings.While the simulation without gene dosage gives a polymorphic regime, the simulation with gene dosage gives a strict monomorphic regime with an extremely dominant allele staying at a frequency close to 1 during several thousands of generations, a time period during which new alleles apparently cannot invade the population.
What happens here is that, due to gene dosage, homozygotes have a fitness advantage over heterozygotes.
Since alleles at high frequency are more often present in a homozygous diploid genotype, they have an advantage over low frequency alleles.And so, paradoxically, old (but not too old) alleles at high frequency can have a higher mean fitness than new alleles that just appeared in the population, and this, in spite of their starting from the analytical results shown above (for the complete analytical developments, see Supporting materials section 1.4).Here, perturbative means that it is valid only when the impact of dosage is weak.
However it will also give the conditions on the model parameters for this to be the case.Thus, with dosage, the evolution of the frequency f of an allele through time is now given by: where α het corresponds to the α coefficient (slope of the log-fitness as a function of z) for a heterozygote and : is the relative difference in fertility between homozygotes and heterozygotes for young alleles (or homozygous advantage).
Thus, unlike what was observed without dosage, the fate of an allele is now determined by a competition between two different effects : an age effect and a frequency effect.The age effect (everything else being equal, older alleles tend to have a lower fitness than younger alleles) was already present without dosage (Eq 3).The frequency effect (everything being equal, more frequent alleles benefit from a homozygous advantage and will therefore tend to become even more frequent) is the specific contribution of dosage.Importantly, the frequency effect (which is proportional to σ 0 ) systematically acts against diversity.
Based on Eq 6, we can determine when dosage will play an important role.First, σ 0 should be large.
Second, there should be enough time between successive invasions for eviction to take place.An analytical approximation for the time between invasions is given by τ = 2 µαz .Altogether we predict a qualitative change induced by gene dosage on PRDM9 standing diversity when N u is large (i.e. the regime would have been polymorphic without dosage) and σ 0 τ is also large (i.e.eviction due to homozygous advantage is sufficiently strong and has enough time to occur).Conversely, we predict a polymorphic regime when N u is large and when the dosage effects are negligible, i.e. when σ 0 τ is small.These predictions were verified by conducting two scaling experiments, exploring a broad range of values for the mutation rates at the PRDM9 locus u and at the targets v (Supplementary Figure 3), or varying the mean number of DSB (d) and the mean binding affinity ȳ of the target sites (Supplementary Figure 4).In all cases, these experiments confirm that the regime is polymorphic in the presence of dosage if 4N u > 10 and σ 0 τ < 3.
Altogether, gene dosage entails additional constraints, which tend to substantially restrict the conditions for observing a high diversity of PRDM9 alleles in the population.

Empirical input parameters
Finally an empirical calibration of the model was attempted, using parameter values roughly estimated based on current knowledge in mammals and, more specifically, in the mouse.The effective population size (N e ) of different Mus musculus subspecies such as Mus m. musculus or Mus m. domesticus is around 10 5 [38].The number of hostspots recognized per PRDM9 alleles across the genome (h) is estimated between 15, 000 and 20, 000 [39].Knowing that the mouse possesses 20 pairs of chromosomes, this corresponds approximately to h ≈ 1000 sites per chromosome.The mean number of DSBs (d) performed per individual is estimated between 200 and 250 per meiosis [40], which correspond to approximately 10 DSBs per chromosome pair per meiosis.An exponential distribution for the affinities has a good fit to observed distribution obtained from Chip-seq experiments [29] (Supplementary Figure 5).The mean, however, is unknown but can be determined by assuming an average of 5, 000 targets bound per meiocyte [41] out of 20, 000 available targets.
We obtain a mean of approximately 0.2.
Concerning the target mutation rate, considering that the mutation rate per nucleotide per generation across the mouse genome is 5.4.10 −9 [42] and that the motifs characterising most of the hotspots in the mouse are several tens of nucleotides long [30], the inactivating mutation rate per generation (v) at the targets can be estimated at 10 −7 [28].The other mutation rate to determine is the one at the PRDM9 locus (u).Owing to the mutational instability of the minisatellite encoding the zinc finger domain, this mutation rate is high and has been estimated at (10 −5 ) in humans [22].However, this is the raw mutation rate (including mutation that lead to either non-functional or functionally equivalent alleles).In contrast, the mutation rate u of the model is the net functional mutation rate (probability of creating a new functional allele recognizing entirely different targets), and the net rate is likely to be smaller than the raw experimental estimate.Accordingly, and as in Latrille et al [28], we used u = 3.10 −6 .Finally, since a population size of N = 10 5 is too large for running the simulator, a standard scaling argument was used, by setting N = 5.10 3 and multiplying u and v by a factor 20 (i.e. using between u = 6.10 −5 to u = 6.10 −4 and v = 2.10 −6 ).The other parameters are set for the smallest chromosome in mouse, so with lower h and d than the mean in the entire genome (h = 800, d = 8, ȳ = 0.2).This rescaling leaves approximately invariant the following quantities: D, z, σ and s 0 .The settings just presented represent our reference for empirical confrontation.Based on this reference, several variations of the model were also explored, which are described below.

Model predictions
The simulator was calibrated based on these rough empirical estimates for its parameters, then run and monitored for several key summary statistics, specifically, the genetic diversity of PRDM9 (D), the proportion of eroded sites (z), the mean haplo-insufficiency at the birth of the allele (σ het 0 ), the mean haplo-insufficiency over alleles sampled at the equilibrium regime (σ hemi  Equilibrium values of output summary statistics at equilibrium (PRDM9 diversity D,mean level of erosion z, mean haplo-insufficiency at the birth of the allele (σ 0 ), mean haplo-insufficiency at the equilibrium (σ z ) and the mean selection coefficient experienced by new PRDM9 alleles (s 0 ) as a function of the input parameters (mutation rate at PRDM9 locus (u), mean of the affinity distribution (ȳ) , number of DSB par meiosis (d), dosage coefficient (c hom ) and maximum number of meiosis allowed for each individual), and with fixed values for the other parameters (d = 8, h = 800 and v = 10 −7 ) a Haplo-insufficiency is here formally defined as the relative difference in success rate of meiosis between homozygotes and hemizygotes.This is equivalent to the relative difference in fertility, except in the case where n mei = 5, where fitness and rate of successful meiosis are not proportional.
The first line of Table 2 reports the results obtained from simulations run under the parameter values corresponding to our reference for empirical comparison (see also Fig 7 for a typical simulation trajectory).
The level of erosion (z) predicted by the model is consistent with what is known in the mouse (between 20% and 50% of erosion [1,29,30]).However, the predicted PRDM9 diversity appears to be too low: these simulations result in a monomorphic regime (i.e.D ∼ 1), whereas the PRDM9 diversity observed in natural populations of mice is of the order of D = 6 to D = 18 [29] (See Supplementary materials section 2).Of note, the diversity predicted by the model is the diversity of functionally different alleles (owing to the assumption made by our model of non-overlapping sets of targets for different alleles).In reality, closely related alleles can share a substantial fraction of their targets [29].However, even accounting for this redundancy, the empirical functional diversity is still typically greater than 1, of the order of 2 to 6 in mouse subspecies [29] (See Supplementary materials section 2).Thus, the monomorphic regime obtained here seems to be in contradiction with empirical observations.This low diversity can be explained by the fact that we are in a range of parameters involving an eviction regime due to gene dosage effects (compare Fig 7A and Fig 6D,E,F), such that one allele dominates in the population, while all the other alleles are counter-selected except during the short phases of allelic replacement (on average, the mean selection coefficient experienced by new alleles, s 0 , is negative, Table 1).
The fact that eviction is caused by gene dosage can be further verified by running the model without gene dosage (i.e.c = 1, Table 2, line 2), in which case higher levels of diversity are produced.Intermediate levels of gene dosage (c = 1.5, Table 2, line 3), on the other hand, give results that are essentially identical to those obtained with a dosage directly proportional to the number of gene copies (c = 2).Gene dosage also results in a non-negligible haplo-insufficiency, especially in old alleles (Table 2 column σ z ), with a reduction of a few percents in the success of meiosis in hemizygotes (or, equivalently, heterozygotes with two alleles of same age), compared to homozygotes.Interestingly, such levels of haplo-insufficiency are comparable to those observed in old alleles in the mouse (B6, C3H [37]).The predicted haplo-insufficiency of young alleles σ 0 is weaker (Table2).Such an age-dependency for the impact of gene dosage was previously suggested [1].
Nevertheless, at least in its current form and under those parameter values, the model does not predict an empirically reasonable regime.
Increasing the PRDM9 functional mutation rate (u) by as much as a factor 10 is not sufficient to get out of the eviction regime, resulting instead in a predicted PRDM9 diversity still very close to one (fourth row of Table 2).Alternatively, increasing the mean affinity (or, equivalently, the concentration of PRDM9 in the cell) allows for higher levels of PRDM9 equilibrium diversity while maintaining empirically reasonable levels of erosion (fifth row of Table 2).However, the mean number of sites bound by PRDM9 in a meiocyte would then predicted to be too large (∼ 20, 000), compared to current empirical knowleldge (∼ 5, 000 [41]).In addition, in this parameter regime, the model predicts very low levels of haplo-insufficiency (σ z < 1 × 10 −3 for old alleles), substantially lower than empirically observed levels in the mouse (σ z > 1 × 10 −2 [37]).
The model is naive in several other respects.First, gametes may often not be limiting and, as a result, the fitness of an individual may not be proportional to the probability of success of meiosis, such as assumed by the model thus far.A less-than-linear relation between fitness and success of meiosis can be modeled by increasing the number of meiosis that an individual can attempt before being declared sterile.Allowing for 5 attempts, thus mimicking a situation where an up to 80% reduction in the number of gametes would not substantially affect the reproductive success of individuals (sixth row of Table 2), we observe a higher functional diversity with a still reasonable level of erosion.The predicted reduction in meiosis success in hemizygotes is also consistent with the ones observed empirically in mice (∼ 2 to 5% [37]).However, the fitness now reacts more weakly to variation in the success of meiosis, and as a result, the model is running in an nearly-neutral regime, with a mean scaled selection coefficient acting on new alleles entering the population (s 0 ) smaller than 1/N e = 10 −5 , such that the turnover of recombination landscapes is mostly driven by the neutral mutational turnover at the PRDM9 locus.Although this could be seen as a possible working regime for the evolutionary dynamics of recombination given the high mutation rate at this locus, it is incompatible with the empirical support previously found for a positive selection acting on PRDM9 [3,25].
Alternatively, the version of the model considered thus far assumes that the total number of DSBs induced along the chromosome does not depend on the subsequent steps of the process.In reality, DSBs are tightly regulated, in a way that may entail a negative feedback inhibiting the creation of further DSBs once the chromosome has managed to synapse.For that reason, the mean number of DSBs per successful meiosis, which is what is empirically measured [14], may be substantially smaller than the maximum number of DSBs allowed before a meiocyte undergoes apoptosis.Yet, the success rate of meiosis depends on the maximum, not on the mean number.As a way to indirectly account for this, we performed a last simulation allowing for d = 24 DSBs (instead of 8, last row of Table 2, Fig 8).Running the model under this configuration results in an evolutionary dynamics which is not in the eviction regime, with moderate levels of diversity (D = 2.5), reasonable levels of erosion (z = 0.26), and strong positive selection on new PRDM9 alleles (s 0 > 10 −4 ).On the other hand, the haplo-insuffiency predicted for old alleles is now weaker than that observed empirically [37].
Altogether, these results show that the requirement of symmetric binding can result in a Red Queen maintaining erosion at moderate levels under empirically reasonable parameter values.On the other hand, there is only a narrow window for which eviction due to dosage is avoided and a sufficiently high PRDM9 diversity is maintained, while still having haplo-insufficiency for old alleles and strong selection acting on PRDM9.

Discussion
Fundamental role of the symmetrical binding of PRDM9 The first studies on the function of PRDM9 uncovered its role in targeting DSBs at specific sites, by its capacity to set histone marks to recruit the DSB machinery [12,[43][44][45].More recently, Davies et al. (2016) discovered that this gene plays another important role during meiosis, namely, by facilitating chromosome pairing [1].In this process, the symmetrical binding of PRDM9 to its target sites plays a key role.By bringing to the chromosomal axis the sites on the homologue, PRDM9 presents them to the single-stranded DNA produced by the resection around the DSB, thereby facilitating the search for the complementary sequence (Fig 1).This second function of PRDM9, combined with the differential erosion of the target sites between sub-species (which is itself a direct consequence of the first function of PRDM9, the recruitment of the DSB machinery), was found to be sufficient for explaining most aspects of the hybrid sterility pattern [1].
Here, we show that, similarly, these two aspects of the function of PRDM9 during meiosis provide the necessary ingredients for explaining the intragenomic Red Queen occurring in the context of a single population -thus giving a globally coherent picture bridging the gap between the molecular mechanisms and the evolutionary dynamics of recombination.Using scaling experiments, we have also characterized what determines the steady state regime.Overall, we recover the main results of Úbeda & Wilkins 2011 [19] and Latrille et al. 2017 [28], although now with a mechanistic foundation.In particular, PRDM9 diversity is mostly determined by the mutational input at the PRDM9 locus u, while the mean erosion and fertility at equilibrium are the result of a balance between erosion (v, g) and selection of new alleles (u, α).

Impact of gene dosage of PRDM9 on the Red Queen dynamics
Gene dosage of PRDM9 implies that a homozygote has a PRDM9 concentration for its allele which is twice that of each of the two alleles of a heterozygote.By the law of mass action, everything else being equal, increased dosage results in an increased occupancy of the targets and therefore an increased probability of symmetrical binding and a higher fertility.This clearly impacts the Red Queen dynamics for certain combinations of parameters, by acting against diversity and leading to a monomorphic regime characterized by the eviction of minor alleles (mostly found in a heterozygous state) by the currently dominant allele (mostly present in a homozygous state).The higher the selection coefficient associated to the homozygous advantage (captured here by σ 0 ), the stronger the effect against diversity.This effect is mitigated when PRDM9 is in high concentration in the cell.Indeed, a large concentration of PRDM9 makes the binding probability at a target of a given affinity less responsive to gene dosage (Supplementary Figure 6), thus reducing the fertility gap between a homo-and a heterozygote (Supplementary Figure 7).Based on these observations, we therefore hypothesize that the PRDM9 expression level has been selected at a sufficiently high level so as to limit the impact of gene dosage.This is an interesting hypothesis that could be studied in the future.

Comparison with Baker et al.'s model
In a recent article, Baker et al. [36] also propose a mechanistic model of the PRDM9 Red Queen.Several of their conclusions agree with ours.Most notably, Baker et al. also recognize that the symmetric binding of PRDM9 provides the key ingredient for running the Red Queen.However, the two studies differ on some important points.
First, in our model, the amount of PRDM9 protein is not limiting (that is, most PRDM9 molecules are not bound to the target sites), whereas Baker et al. assume strong competition between targets for PRDM9 proteins (most molecules are bound).Of note, gene dosage and competition are two distinct aspects of the steady-state equilibrium of PRDM9 binding.Gene dosage is a direct consequence of the law of mass action (site occupancy is an increasing function of the free concentration of PRDM9, see Supplementary Figure 1), a phenomenon that can happen even when PRDM9 is not limiting, as shown here.Competition between targets, on the other hand, relates to the fact that the inactivation of some of the binding sites results in extra PRDM9 molecules available for other sites to bind, an effect that is negligible if PRDM9 is in excess (as in our model), but important otherwise (as in Baker et al's model [36]).Whether PRDM9 is limiting ultimately depends on the total concentration of PRDM9 relative to the mean affinity and total number of sites, a question that still remains open.In any case, our model shows that competition between targets is not a necessary feature to drive the evolutionary dynamics of PRDM9.
The second difference concerns the affinity distribution.Baker et al. use a two-heat distribution.
Here we use an exponential distribution (Supplementary Figure 5), which is motivated by results from Chip-seq experiments [29], although with some uncertainty regarding the shape of the distribution in the low affinity range, which is not captured by Chip-seq.The use of these different affinity distributions has clear consequences on the behavior of models with respect to dosage.Indeed, increasing the gene dosage can have two opposing effects.On the one hand, a higher dosage increases the symmetrical binding at sites of intermediate affinity.On the other hand, it also results in more sites of low affinity being bound, and this, most often asymmetrically.Depending on the balance between these two effects, different outcomes are obtained.In our case, where the affinity distribution has a moderate variance, the increased symmetrical binding of sites of intermediate affinity always wins.As a result, there is always an advantage to increasing dosage.That is, homozygotes always have an advantage over heterozygotes, which creates an eviction regime that tends to play against diversity.A contrario, in Baker's model, the two-heat affinity distribution results in a non-monotonous dependency, with a turning point, such that the dosage is in favor of homozygotes for young alleles, but in favor of heterozygotes for old alleles.This turning point is one potential solution to the problem of eviction.Here, we present an alternative solution, which is captured in our model by the statistics σ 0 τ .Intuitively, eviction does not take place if the homozygote advantage is sufficiently weak and erosion sufficiently rapid, such that ageing alleles are replaced before eviction has enough time to take place.
As it turns out, it is not so easy to find empirically reasonable parameter configurations such that the eviction regime is avoided, although this could be the consequence of other aspects of the biology of meiosis and reproduction being missed by the model.On the other hand, our model predicts that old alleles should be haplo-insufficient, thus in agreement with what is observed in the mouse (for allele B6 and C3H [37]).This haplo-insufficiency of old alleles is not observed systematically in Baker's et al. model, but only occasionally and in small populations.
Altogether it would be useful to empirically investigate the haplo-insufficiency for young and old alleles over a broader range of alleles and species, in order to determine whether a lifelong homozygous advantage is systematic or occasional.More globally, it will be important to unravel the exact roles of affinity distribution, PRDM9 concentration and competition between targets in the evolutionary dynamics.Finally, all this raises the question of a possible evolution of the PRDM9 expression level, the affinity (whether in its distribution or in its mean) and the number of targets recognized per allele.

Current limitations and perspectives
In addition to those discussed above, the model introduced here has other limitations.First, in its current form, the model implements only limited variance among alleles in their strength at birth.Yet, in reality, some alleles are dominant over others [46], such that, in a heterozygote for a strong and a weak allele, DSBs are more often produced at the target sites of the stronger allele.Of note, although some dominance is expected to emerge purely as a consequence of erosion (with younger alleles being on average dominant over older ones), a substantial part of it appears to be instead related to intrinsic differences between alleles regardless of their age [16,47].If stronger alleles have an advantage over weaker alleles, either because they promote a higher binding symmetry or just because of their higher penetrance, then weaker alleles should be less likely to invade.As a result, the population would tend to be dominated by stronger alleles.In good approximation, this would amount to running the model under a lower effective mutation rate (now to be taken as the rate at which new functional and sufficiently strong PRDM9 alleles are being produced by mutation) but otherwise would not fundamentally change the evolutionary dynamics.
Second, the model currently assumes that all DSBs are repaired with the homologue.In reality, some are repaired with the sister chromatid [16].This simplifying assumption, however, should have a moderate impact on our conclusions, as it essentially amounts to a change in the rate of erosion, which can be accounted for by considering a lower mutation rate v at the target sites.Our experiments suggest that small changes in v do not fundamentally impact the dynamics.
Although deserving more careful examination, the limitations just mentioned are therefore probably minor.Perhaps a more fundamental issue is to fully understand how the eviction regime induced by gene dosage effects can be avoided, so as to predict reasonable levels of PRDM9 diversity, while having a Red Queen regime driven by strong positive selection on PRDM9.As it stands, the model appears to be stuck in a dilemma, such that either dosage has a substantial impact on fertility, but then eviction takes place and the model fails at explaining empirically observed levels of PRDM9 diversity, or the effects of dosage are made weaker by assuming globally less stringent conditions for achieving a good fertility (such as allowing for an excess in PRDM9 protein, or in gametes or in DSBs), but then the fitness differences between old and new alleles also become small and the model approaches a nearly-neutral regime, in which the turnover of PRDM9 alleles is primarily driven by the mutational turnover at the locus.Although the latter regime may not be so unreasonable from an evolutionary point of view, it fails at explaining the positive selection observed on PRDM9, or its haplo-insufficiency, depending on the detailed model configuration (last two rows of Table 2).
This dilemma is still in need of a robust and convincing explanation.In this respect, as mentioned above, the questions of the affinity distribution and the concentration of PRDM9 relative to the affinity of the target sites should be explored in depth.Alternatively, the empirical regime of the Red Queen may possibly alternate between long nearly-neutral epochs, with occasional bouts of positive selection whenever the current alleles become too eroded.Such occasional epsiodes of positive selection could be sufficient to induce the empirically observed patterns of accelerated evolution of the zinc finger domain at the non-synonymous level.The impact of Hill-Robertson interference and its dissipation on the evolutionary dynamics of PRDM9 could also contribute to maintaining a high diversity at the PRDM9 locus, a point that may also need to

The model
The simulation model is graphically summarized in Fig 2 , and its key parameters are listed in Table 1.
This model assumes of a population of N diploid individuals, whose genetic map is composed of a single chromosome.A PRDM9 locus is located on this chromosome.Each PRDM9 allele has a set of h binding sites, each of which has an intrinsic binding affinity for its cognate PRDM9 protein.The positions of binding sites along the chromosome are drawn uniformly at random and their affinities are drawn according to an exponential law of parameter ȳ (Supplementary Figure 5).Each target site is associated to a unique allele and different sites cannot overlap.In practice, this is implemented by encoding the chromosome as an array of L slots, such that, upon creation of a new PRDM9 allele by mutation, each binding site of this allele chooses one of the available slots uniformly at random.Given the composition of the population of the current generation, the next generation, of the same population size, is generated as follows.

Mutations
The PRDM9 locus mutates with a probability u per gene copy.Given a mutation, a new functional PRDM9 allele is created to which are associated h new sites along the genome, according to the procedure just described.Next, each target site recognized by each allele currently present in the population mutates with a probability v.This type of mutation results in a complete inactivation of the target site.Of note, all target sites are assumed to be monomorphic for the active variant at the time of the birth of the corresponding PRDM9 allele.As a result, the loss of target sites is entirely contributed by gene conversion acting on inactivating mutations that have occurred after the birth of the allele.Also, we neglect new target sites that might arise by mutation, which is reasonable since hotspot alleles newly arisen by mutation would be at a very high risk of being lost by conversion.

Meiosis and reproduction
A meiosis is attempted on a randomly selected individual, according to the following steps.Of note, the meiosis can fail (in which case we assume that the meiocyte undergoes apoptosis) for multiple reasons, all of which will be described below.In the simulations presented here, unless stated otherwise, whenever meiosis fails, then a new individual is chosen at random from the population and a new meiosis is attempted.
First, the two homologous chromosomes are replicated, thus creating a set of 4 chromatids.Then, PRDM9 binds to its target site according to an occupation probability determined by the chemical equilibrium.A binding site i has an affinity where [P S i ] = x 1 and [S i ] = 1 − x i are the proportions of target site i (across meiocytes) which are occupied by PRDM9 or free, respectively.Thus: We assume that PRDM9 is not limiting, meaning that most PRDM9 molecules are free [P ] f ree ≈ [P ] tot ; there is therefore a total absence of competition between binding sites.Thus, if we define the rescaled affinity as y i = [P ] tot K i , then the occupancy probability can be re-written : Based on this equation, for each target site, binding is randomly determined, by drawing a Bernoulli random variable of parameter x i for site i.Of note, at a given target locus, there are four instances of the binding site (one on each of the two sister chromatids for each of the two homologues), and binding is determined independently for each of those instances.
Once the occupation status of all binding sites has been determined, the total number k of sites bound by PRDM9 over the four chromatids, is calculated.If k = 0, meiosis fails.Otherwise, each site bound by PRDM9 undergoes a DSB with a probability equal to p = min 1, d k , with d being a parameter of the model.In most experiments, we use d = 6.Thus, in the most frequent case where k > d, an average number of d DSBs are being produced.This procedure aims to model the regulation of the total number of DSBs through the genome, which in mammals seems to be independent from PRDM9 binding [1,14].
Next, the four chromatids are scanned for symmetrically bound sites, which the model assumes are essential for chromosome pairing (see introduction).If no such site is detected, meiosis fails.Otherwise, one of the symmetrically bound sites is uniformly picked at random and becomes the initiation site for a CO.Thus, only one CO is performed per meiosis (and this, in order to model CO interference), and all other DSBs are repaired as NCO events.Note that in our model we do not allow for the possibility of DSB repair by the sister chromatid.
A successful meiosis therefore produces 4 chromatids, with exactly one CO between two of them, and some events of gene conversion at all sites that have undergone a DSB.Of note, in the presence of inactive versions of the binding sites (created by mutations, see above), these gene conversion events are effectively implementing the hotspot conversion paradox [17].Finally one chromatid is randomly picked among the tetrad to become a gamete.The whole procedure is repeated, starting from the choice of a new individual until another gamete is obtained.Then these two gametes merge and create a new diploid individual of the next generation.All these steps, from the choice of an individual until the choice of a chromatid among the tetrad, are performed as many time as needed until the complete fill of the next generation.

gene dosage
In the case where the gene dosage of PRDM9 is taken into account, it is assumed that, for a given PRDM9 allele, a homozygote will produce twice the concentration of the corresponding protein compared to a heterozygote.By assuming that [P ] homo tot = 2[P ] het tot , the probability that a PRDM9 protein binds to a site i of rescaled affinity y i is where c = 1 for a heterozygote and c = 2 for a homozygote.

Summary statistics
Several summary statistics were monitored during the run of the simulation program.They were used, first, to evaluate the time until convergence to the equilibrium regime (burn-in).The burn-in was taken in excess, so as to be adapted to all simulation settings.In practice, the burn-in is set at 10000 generations over a total of 50, 000 generations.Second, the summary statistics were averaged over the entire run (burn-in excluded), thus giving a quantitative characterization of the equilibrium regime, as a function of the model parameters.
The main statistics are the PRDM9 diversity in the population (D), the mean proportion of binding sites that are still active per allele θ, the mean symmetrical binding probability (q) and the mean fertility (w).
Diversity is defined as the effective number of PRDM9 alleles, or in other words as the inverse of the homozygosity [28]: where f i,t is the frequency of allele i at time t.With this definition, when K alleles segregate each at frequency 1 K in the population, the diversity D is equal to K.
The mean proportion of binding sites that are still active per allele is defined as: where θ i,t is the proportion of sites that are still active for allele i at time t.Equivalently, z i,t = 1 − θ i,t gives a measure of the level of erosion of the target sites of allele i and, by extension z t = 1 − θ t gives the mean erosion level.
The probability of symmetrical binding q corresponds to the mean probability of having a PRDM9 protein bound on at least one of the two chromatids of the homologous chromosome at a certain position, given that a DSB has occured at this very position.In the case of a homozygous individual, this quantity can be obtained analytically for a given complete diploid genotype and is given by: where x = cy 1+cy is the occupancy of a binding site of affinity y and, for any j, < x j > is the mean of x j over all sites, taking into account their affinity distribution.In the case of a heterozygous individual, with two PRDM9 alleles 1 and 2, the symmetrical binding rate is: where the subscripts 1 and 2 correspond to averages over sites of allele 1 and 2, respectively.This statistic was then averaged over all diploid complete genotypes present in the population at a given generation.
For a given genotype, the fertility can be computed analytically.Here we assumed that fertility in proportional to the rate of success of meiosis.Thus, it is equivalent to 1-(the mean probability of failure of meiosis) characterized by the absence of a DSB in a symmetric site.In turn, the number of DSBs in a symmetric site follows a Poisson law with parameter dq where d is the average number of DSBs per meiocyte and q is the mean probability of symmetrical binding for this allele.Thus, the mean fertility of an allele can be expressed as follows : For a given allele, and at a given time, this statistic was then averaged over all diploid complete genotypes carrying this allele present in the population at a given generation (and then averaged over all generations of the run).

Scaling experiments
In order to visualize how the equilibrium regime varies according to the model parameters, first, a central parameter configuration was chosen, which will represent a fixed reference across all scaling experiments.
Then only one parameter at a time (or two for bi-dimensional scaling) is systematically varied over a range of regularly spaced values on both sides of the reference configuration.This parameter is fixed along the entire simulation and is variable only between simulations.For each parameter value over this range, a complete simulation is run.Once the equilibrium is reached for a given simulation setting, the summary statistics described above are averaged over the entire trajectory, excluding the burn-in.These mean equilibrium values, which characterize and quantify the stationary regime of the model, were finally plotted as a function of the varying parameter(s).
The central parameters were chosen as follows.First, the parameters of population size N was set to 5, 000, corresponding to the maximum population size that can be afforded computationally.Then the number of sites recognized as target sites by each PDRM9 allele's protein was set to h = 400 in order to get closer to the average number of target sites found on the smallest chromosome of the mouse (around 800 sites) while limiting the memory requirements.The mean for the affinity distribution of the target was set to ȳ = 6.The parameter d representing the number average of DSBs in a meiocyte is set at 6 which corresponds to the approximate number of DSBs found on the smallest chromosome of the mouse.For uni-dimensional scaling, the parameter v, the mutation rate at target sites, was set to 2 × 10 −6 and the parameter u was set to 5 × 10 −5 .For the bi-dimensional scaling of d and ȳ, u = 2 × 10 −4 and v = 5 × 10 −5 .
for the key molecular processes involved in meiotic recombination (Fig 2).The genome consists of a single chromosome, bearing a locus encoding the PRDM9 gene (Fig 2A).The locus mutates at a rate u, producing new alleles (Fig 2B).Each allele recognizes a set of binding sites randomly scattered across the genome, of varying binding affinity.Binding sites undergo inactivating mutations at a rate v (Fig 2B).At each generation, sexual reproduction entails the production of gametes that are themselves obtained by explicitly implementing the process of meiosis on randomly chosen individuals (Fig 2C-F).

Figure 2 :
Figure 2: Diagram summarizing the main features of the model and the successive steps of the simulation cycle.(A) The model assumes a population of N diploid diploid individuals (2N chromosomes, vertical lines).Each chromosome has a PRDM9 locus (filled oval, with a different color for each allele) and, for each PRDM9 allele, a set of target sites (filled rectangles, with color matching their cognate PRDM9 allele) of variable binding affinity (variable width of the filled rectangles).(B) Mutations at the PRDM9 locus create new alleles (here, purple allele), while mutations at the target sites inactivate PRDM9 binding (grey sites with a cross).(C-E) meiosis; (C) Chromosome replication and binding of PRDM9 to its target sites (horizontal arrows).(D) Induction of DSBs at a small number of randomly chosen sites bound by PRDM9 (red stars) and search for DSBs at symmetrically bound sites (symmetrical DSBs); if no symmetrical DSB is found, meiosis fails; otherwise, one symmetrical DSB is chosen uniformly at random (here, on the yellow binding site).(E) Completion of the crossing-over (red curved arrow) and repair of all DSBs using the homologous chromosome (red box on top); (F) random choice of one of the 4 gametes of the tetrad, which will contribute to the next generation.

Fig
Fig 3B) and the mean affinity of the remaining non-eroded sites (Fig 3C).

Figure 3 :
Figure 3: A simulation trajectory showing a typical evolutionary dynamics, under a monomorphic regime (u = 5 × 10 −6 and v = 5 × 10 −5 ).In all panels, each color corresponds to a different allele.Note that a given color can be reassigned to a new allele later in the simulation.Successive panels represent the variation through time of (A) the frequency of PRDM9 alleles and their corresponding (B) proportion of active sites, (C) mean affinity, (D) probability of symmetrical binding and (E) fertility.The thick line singles out the trajectory of a typical allele.

Figure 4 :
Figure 4: A simulation trajectory showing a typical evolutionary dynamics, under a polymorphic regime (u = 5 × 10 −4 and v = 5 × 10 −5 ).In all panels, each color corresponds to a different allele.Note that a given color can be reassigned to a new allele later in the simulation.Successive panels represent the variation through time of (A) the frequency of PRDM9 alleles and their corresponding (B) proportion of active sites, (C) mean affinity, (D) probability of symmetrical binding and (E) mean fertility.The thick line singles out the trajectory of a typical allele.Note that the scale of the axes of the ordinates are not the same as in Fig 3.

Fig 1 )
are sufficient to induce a Red Queen dynamics.However, it remains to be understood how the equilibrium regime quantitatively depends on the parameters of the model.To address this issue, scaling experiments were performed (see methods).These experiments are helpful for determining how the different characteristics of the model at equilibrium (the allelic diversity at the PRDM9 locus, the average level of erosion of the target sites or the fertility of the individuals) respond to variation in the values of the model parameters.The results of these scaling experiments are shown in Fig 5 (in blue), along with the predictions of an analytical approximation (in orange), introduced further below.A first scaling experiment was carried out on the mutation rate u at the PRDM9 locus.It immediately appears that u has a direct impact on the standing diversity (Fig 5A), which is roughly proportional to N u, the population-scaled mutation rate.Increasing u also reduces the equilibrium erosion level (Fig 5B).This can be explained as follows.The equilibrium set point of the Red Queen is fundamentally the result of a balance between erosion and invasion.The rate of invasion by new alleles can be expressed as 2N uP inv

Figure 5 :
Figure 5: Scaling of key summary statistics at equilibrium, as a function of the mutation rate u at the PRDM9 locus and the mutation rate v at the target sites.The statistics are: the PRDM9 diversity D as a function of u (A) and v (D); the mean erosion z (i.e. the mean fraction of target sites that have been inactivated) at equilibrium as a function of u (B) and v (E); the mean fertility of the population w as a function of u (C) and v (F).On each graph, the mean (blue line) and standard variation (blue vertical bars) over a simulation are displayed against the prediction of the analytical approximation (orange line).The area colored in green corresponds to the range of parameters for which the analytical model verifies the assumptions of a high diversity (1 < 4N u < 100), a low erosion (z < 0.5) and strong selection on new PRDM9 alleles (4N s 0 > 3).
Fig 5A to F (orange curves), againstthe results obtained directly using the simulation program (blue curves).The model and the analytical approximation give qualitatively similar results in the range of parameters validating all the conditions (in practice, we consider that the analytical results should be valid in the following intervals for the model parameters: 1 < 4N u < 100, z < 0.5 and 4N s 0 > 3).Concerning PRDM9 diversity, substantial differences are observed between the simulation results and the analytical approximations, up to a factor of 10, in the scaling of u (Fig5A).However, the nature of the regime, polymorphic (many alleles segregating at the same time in the population) or monomorphic (only one allele present in the population at a time), is correctly predicted.In particular, we can say that the nature of the regime is directly and mostly determined by N u and the level of erosion has almost no influence (Fig5D).Finally, the analytical approximations are less accurate for low and high u or high v.These correspond to strong erosion regimes (low u and high v) or to regimes with weak selection (high u), for which the assumption of the analytical developments are not met.
Fig 6F).The old allele then quickly disappears and all other alleles competes for invasion.This transient polymorphic regime is unstable however: as soon as one of the young alleles reaches a higher frequency than its competitors, it benefits from a homozogous advantage, ultimately evicting all competitors and thus becoming the new dominant allele.The example on Fig 6 corresponds to only one particular parameter settings showing that gene dosage can act against diversity.To more systematically assess the conditions under which gene dosage is expected to have an impact on the qualitative regime of the model, a perturbative development was conducted,

Figure 6 :
Figure 6: Example of a PRDM9 dynamic without gene dosage (A) to (C) and with gene dosage (D) to (F) (mutation rates u = 5 × 10 −4 at the PRDM9 locus, and v = 2 × 10 −6 ) at the target sites.In all panels, each color corresponds to a different allele.Note that a given color can be reassigned to a new allele later in the simulation.Successive panels represent the variation through time of (A),(D) the frequency of PRDM9 alleles and their corresponding (B),(E) proportion of active sites and (C),(F) selection coefficient.

z)
and the selection coefficient experienced by new PRDM9 alleles (s 0 ).The predictions are shown in Table 2 and examples of Red Queen dynamics are shown in Figs 7 and 8.

Figure 7 :
Figure 7: An example evolutionary trajectory under the fitness scheme allowing for only one meiosis per individual (n mei = 1, and with parameters u = 6 × 10 −5 and ȳ = 0.2).Variation through time of the frequency of each PRDM9 allele (A) and their proportion of active sites (B).
be investigated.Finally, population structure could play a role, by maintaining different pools of alleles in different sub-populations connected by recurrent migration, thus resulting in a large diversity at the metapopulation level, and this, in spite of non-negligible gene dosage effects.Population structure is also pointing toward the other big question still in need of a model-based exploration: the potential role of PRDM9 in hybrid sterility and speciation.

Table 1 :
Descritption of input parameters and output variables