Bridging the gap between the evolutionary 2 dynamics and the molecular mechanisms of 3 meiosis : a model based exploration of the PRDM 9 4 intra-genomic Red Queen

12 Molecular dissection of meiotic recombination in mammals, combined with population-genetic and 13 comparative studies, have revealed a complex evolutionary dynamics characterized by short-lived recom- 14 bination hot-spots, highly suggestive of an intra-genomic Red Queen. At the center of this exceptional 15 dynamics is a key gene, Prdm9 . Initially discovered for its role in hybrid sterility, Prdm9 was subsequently 16 recognized to have a double molecular function during meiosis, ﬁrst recruiting the double strand break 17 machinery and thus determining the location of hot spots; second, facilitating the pairing of homologous 18 chromosomes. This latter function requires symmetric binding of Prdm9 (on both homologues). However, 19 in many respects, the evolutionary mechanisms underlying this complex dynamics are still poorly unders- 20 tood. Here, we present a theoretical model of the evolutionary dynamics of meiotic recombination integra- 21 ting current knowledge about the underlying molecular mechanisms. Our modeling work gives important 22 insights into the evolutionary dynamics of recombination. Most importantly, a Red Queen spontaneously 23 emerges as an outcome of the interplay between the dual molecular functions of PRDM9 . Thus, on one 24 side, hot spots are eroded by biased gene conversion, itself a result of the recruitment of the double-strand 25 break machinery by Prdm9 . On the other side, the reduced symmetrical binding of Prdm9 caused by the 26 loss of high afﬁnity binding sites induces a net positive selection eliciting new Prdm9 alleles recognizing 27 new targets. The model also offers new insights about the inﬂuence of the genetic dosage of PRDM9 on the 28 dynamics of the Red Queen, which can paradoxically result in negative selection on new

This model assumes of a population of N diploid individuals, whose genetic map is composed of a single 137 chromosome. A Prdm9 locus is located on this chromosome. Each Prdm9 allele has a set of h binding sites generally produce an excess of gametes. It is thus likely that a fraction of meioses can fail without affecting 164 the fitness of individuals. To model this phenomenon, we also performed some simulations where we allow 165 several attempts of meiosis per individual to produce a gamete. The number of attempts (n mei ) was varied 166 from n mei = 1 (i.e. fitness proportional to meiosis success) to n mei = 5 (i.e. a reduction in the number of 167 gametes has no fitness coast up to −80%). 168 First, the two homologous chromosomes are replicated, thus creating a set of 4 chromatids. Then, 169 PRDM9 binds to its target site according to an occupation probability determined by the chemical equi-170 7/40 librium. A binding site i has an affinity is therefore a total absence of competition between binding sites. In this model, if we define the rescaled 176 affinity as y i = [P] tot K i , then the occupancy probability can be re-written : The total number of sites bound by PRDM9 is given by k = to model the regulation of the total number of DSBs through the genome, which in mammals seems to be 184 independent from PRDM9 binding [11], [1]. If, at the end of this step, no DSB has been induced in the 185 genome, meiosis fails. Otherwise, the four chromatids are scanned for symmetrically bound sites, which 186 are essential for crossing-over (CO) achievement. According to the current model of mammalian meiosis, 187 when a site has been broken by the DSB machinery, a single strand is generated. This single stranded DNA such site becomes a potential position for CO. If no symmetrically bound site has been detected, meiosis fails. Otherwise, if at least one symmetrical binding with a DSB has been detected, one of them is uniformly 196 picked at random and becomes the initiation site for a CO. Only one CO is performed per meiosis (and this, 197 in order to model CO interference). The 9/40 where c = 1 for a heterozygote and c = 2 for a homozygote. This implies that the binding probability to a 223 target of affinity y is x het i = y i 1+y i for a heterozygote and x hom i = 2y i 1+2y i for a homozygote.

224
Summary statistics 225 Several summary statistics were monitored every 100 generations during the run of the simulation pro-226 gram. They were used first, to evaluate the time until convergence to the equilibrium regime (burnin). The

227
burnin was taken in excess, so as to be adapted to all simulation settings. In practice, the burnin is set at 228 10000 generations over a total of 50, 000 generations. Second, the summary statistics were averaged over 229 the entire run (burnin excluded), thus giving a quantitative characterization of the equilibrium regime, as a 230 function of parameters. These numerical measurements are a key to achieving a better understanding of the 231 dependence of the stationary regime of the Red Queen dynamics on the model parameters. 232 The main statistics are the following : first, the PRDM9 diversity in the population (D). Second, for 233 each allele, the mean erosion of its target sites (θ ), the mean symmetrical binding probability (q) and the 234 mean fertility (w). Concerning the two statisitcs q and w, the equations presented here correspond to their 235 analytical approximations used to override sampling limitations.

236
All of these statistics are now described in detail.

238
The diversity is defined as the effective number of PRDM9 alleles, or in other words as the inverse of 239 the homozygosity [19] : where f i,t is the frequency of each allele i.

241
With this definition, when K alleles segregate each at frequency 1 K in the population, the diversity D is 242 equal to K.

243
Mean activity 244 The mean activity corresponds to the proportion of sites that are still active for each allele. This fraction 245 is noted θ i ∈ [0, 1] for allele i. The mean activity is the mean of the activity of all the current alleles weighted 246 10/40 by their frequency.
Equivalently, z i ≈ 1 − θ i gives a measure of the level of erosion of the target sites of an allele. By extension, 248 when averaged over all alleles in the population, it corresponds to the mean equilibrium erosion level.

249
Probability of symmetrical binding 250 The probability of symmetrical binding q corresponds to the mean probability of having a PRDM9 251 protein bound on at least one of the two chromatids of the homologous chromosome at a certain position,

252
given that a DSB has occured at this very position. In the case of a heterozygous individual at the Prdm9 253 locus, with two alleles of mean erosion level z 1 and z 2 , this quantity can be obtained analytically for a given 254 complete diploid genotype and is given by : and for a homozygous individual with an allele of mean erosion level z : In these equations x = cy 1+cy is the mean occupancy of a binding site of affinity y, and < x > is the mean

285
Once the equilibrium is reached for a given simulation setting, the summary statistics described above   lation ( f i,t ), shows a regime with few alleles segregating at any given time in the population, due to a low 295 PRDM9 mutation rate (u = 5 × 10 −6 ). We call this a monomorphic regime, after Latrille et al [19]. The 296 14/40 dynamic is as follows. First, an allele appears in the population and invades, until reaching a frequency close to 1. Meanwhile, the activity of this allele θ decreases, due to the erosion of its target sites caused by 298 inactivating mutations and biased gene conversion. Once the allele has eroded a fraction of around 20% of 299 its sites, it is quickly replaced by one of the low-frequency alleles that keep being created by mutations at 300 the PRDM9 locus. This rapid replacement clearly suggests the presence of strong positive selection. This 301 positive selection is entirely a consequence of the molecular mechanisms of meiosis such as implemented 302 by the simulation model (the process implicated will be described below). The newly invading allele then 303 erodes its target until being replaced by a new allele which in turn follows a similar trajectory, and so on.

304
This cyclic process clearly shows that a Red-Queen process is operating, under the combined action of ero-305 sion of the PRDM9 target sites by inactivating mutations and biased gene conversion, and positive selection 306 on new PRDM9 alleles.

307
A key aspect of the model is that it does not explicitly invoke a fitness function. Instead, the positive 308 selection that seems to be operating on new alleles ( Figure 2) is an emerging property of the mechanism of 309 meiosis. This positive selection acting on new alleles can be more precisely understood by examining how 310 the rate of PRDM9 symmetrical binding (7),(8) and the mean fertility of a typical allele (9) evolve over the 311 lifespan of this allele ( Figure 2D and 2E), and how this relates to the level of erosion (6) ( Figure 2B) and the 312 mean affinity of the remaining non-eroded sites ( Figure 2C). First, we observe a clear correlation between 313 allele frequency and activity. When the frequency of an allele increases in the population, its general activity 314 decreases, meaning that the proportion of still active sites for this allele decreases. This erosion ( Figure 2B) 315 seems to occur at a rate which is directly proportional to the frequency of the allele in the population ( Figure   316 2A). This is expected, the more frequent an allele, the more often it will be implicated in a meiosis and the 317 more opportunities its bounding sites will have to undergo gene conversion events. Second, erosion results 318 in a decrease of the proportion of active sites for a given allele. But, more importantly, it results in a decrease 319 of the mean affinity of the sites that are still active ( Figure 2C). This reflects the fact that sites of high affinity 320 are eroded more rapidly. This last point is also expected, since the sites of high affinity are more often bound 321 by PRDM9 (the binding probability = cy 1+cy is higher when the affinity y is higher), and thus are more often are chosen uniformly at random among all bound sites, q corresponds to the probability that a DSB will 328 occur in a symmetrically bound site and will thus contribute to a successful pairing of the homologues. This 329 probability is a monotonous function of affinity ( Figure 2D). The mean value of q over all sites of a given 330 allele is lower for older alleles ( Figure 2D). Finally, since, in our model, meiosis requires at least one DSB 331 in a symmetrically bound site, the mean fertility of an older allele is lower ( Figure 2E). Hence, new alleles 332 (young alleles) will be positively selected at the expense of old alleles and will ultimately replace them in 333 the population.

334
Thus, a key result obtained here is that the requirement of symmetrical binding for chromosome pairing, 335 combined with preferential erosion of high affinity sites, is sufficient for creating differences in fitness 336 (fertility) between old and young alleles, which in turn will spontaneously induce a Red Queen dynamics 337 at the level of the population.

338
It should be noted that we observe a slight increase in the rate of symmetrical binding and in the mean  in the values of the model parameters.

365
A first scaling experiment was carried out on the mutation rate u at the Prdm9 locus. It immediately 366 appears in Figure 4 A that u has a direct impact on the standing diversity, which is roughly proportional to 367 24Nu, the population scaled mutation rate. Increasing u also reduces the equilibrium erosion level (Figure 4    First, to a good approximate, the rate of erosion of the targets of an allele depends on its frequency in 403 the population : where ρ = Nvd 2h ≈ Nvg 2h , with g as the gene conversion rate, is the net rate of erosion per generation. As a 405 19/40 result, the cumulated erosion of an allele of age t is : where f (X) is the frequency of the allele at time X. z can thus be seen as the intrinsic age of the allele (i.e.

407
an allele ages more rapidly if more frequent in the population). Because of selection, the frequency of an 408 allele changes as a function of its erosion, or intrinsic age z.

409
Assuming weak erosion (z << 1) allows one to linearize the differential equation describing the evolution 410 of the frequency, which gives : if an allele is younger, its erosion is lower than the mean erosion at equilibrium, the allele have eroded less 416 target than the average and will increase in frequency. But when the allele reaches the mean erosion level 417 and overtake it, this implies that the allele has a level of erosion higher than the average and will decrease 418 in frequency in the population. 419 The previous equation shows the evolution of the frequency of a typical allele in the population, but 420 it depends on the mean level of erosionz which is unknown. To determine its value, we can apply a self-421 consistent argument, essentially saying that all alleles entering the population have this typical dynamic.

422
Using this self-consistent argument leads to the following explicit expression forz [19] : Here, we find 4 constants of the model. There is first of all ρ, the net erosion constant, which by 424 increasing will contribute to increasing the mean equilibrium erosion levelz. Then, in the denominator, 425 there is µ = 4Nu, the population-scaled mutation rate of PRDM9, which by increasing will allow a more 426 frequent emergence of new alleles. This restricts the lifetime of alleles in the population and thus decreases 427 the average cumulative erosion rate. Finally, we find again the constant α. If the latter increases, this means 428 that the log-fitness will decrease more quickly as a function of erosion. As a result, the allele will be more 429 20/40 quickly eliminated (i.e. with less erosion), which lowers the equilibrium erosion levelz.
Note that this equation is the same as that presented on Latrille et al. However, in their paper, α and 431 g were phenomenological parameters, set arbitrarily, while in our mechanistic model, they emerge directly 432 from the molecular mechanisms of meiosis. As such, they can be expressed as functions of the model para-433 meters (d, h, N or v, see supplementary materials equations (56) and (34)).This gives a better understanding 434 of the mechanisms and the dependence between the outputs of the model and the input parameters.

435
From there, we can express all the mean equilibrium quantities over the population and across the whole 436 simulation as a function ofz. Namelyθ , the mean activity of the target sites,w, the mean fertility of the 437 alleles, and the diversity D of PRDM9 in the population at the equilibrium : In the expression ofw we can find the parameterq corresponding to the mean probability of symmetrical 439 binding at the equilibrium. This parameter depends itself onz. however it will also give the conditions on the model parameters for this to be the case. Thus, with dosage, 490 the evolution of the allele frequency through time is now given by : where α het corresponds to the α coefficient for a heterozygote and : is the relative difference in fertility between homozygotes and heterozygotes for young alleles (or homozy-493 gous advantage). 494 We therefore note that the evolution of the frequency of an allele no longer depends solely on its in-495 trinsic age (i.e. erosion level z) but also on its frequency at time t. Contrary to what was observed without 496 dosage, the fate of an allele is now determined by a competition between two different effects : an age 497 effect and a frequency effect. The age effect (everything else being equal, older alleles tend to have a lower 498 fitness than younger alleles) was already present without dosage (equation 12). The frequency effect (eve-499 rything being equal, more frequent alleles benefit from a homozygous advantage and will therefore tend to 500 become even more frequent) is the specific contribution of dosage. Importantly, the frequency effect (which 501 is proportional to σ 0 ) systematically acts against diversity.

502
Based on the equation 15, we can determine when the dosage will play an important role. Indeed these 503 effects will be all the more pronounced as σ 0 is large. Also, there should be enough time between successive 504 invations for eviction to take place. An analytical approximation for the time between invasions is given by : Altogether we predict a qualitative change induced by gene dosage on PRDM9 standing diversity when Nu 507 is large (i.e. the regime would have been polymorphic without dosage) and σ 0 τ is also large (i.e. eviction 508 24/40 due to homozygous advantage is sufficiently strong and has enough time to occurs, and thus the regime 509 turns out to be monomorphic with dosage). Conversely, we predict a polymorphic regime when Nu is large 510 and when the dosage effects are negligible, i.e. when σ 0 τ is small.

511
These predictions were verified by conducting bi-dimensional scaling experiments. First, a scaling of 512 the parameters u and v was performed to determine which parameter combination would lead to a change 513 of regime ( Figure 6). Once the scaling performed, we note that a certain number of simulations underwent 514 a change of regime between the experiments without and with genetic dosage. In particular, the most pro-515 nounced differences that are observed between 6A and B relate to parameters allowing for a polymorphic 516 regime without dosage (i.e. u high, or 4Nu >> 1), a small erosion parameter v giving a fairly substantial 517 latency time between two allele invasions (τ) and a high selection coefficient associated to genetic dosage 518 σ 0 . In all cases, the criterion that σ 0 τ should be large is met (i.e. always > 3).  (Figure 7). Here, a change of regime is essentially found when these two parameters are low. It 523 is also well predicted by the criterion that σ 0 τ should not be large ( Figure 7C). Of note, both parameters, d 524 andȳ, play on σ 0 , rather than on τ. Thus, if DSBs are limiting or sites tend to be of low affinity, a change in 525 dosage can make a big difference in the chance of having at least one DSB in a symmetrically bound site. 6 × 10 −5 0.2 5 2.5 0.22 2.9 × 10 −2 7.5 × 10 −2 2 TABLE 2. Results of the simulations after empirical calibrations. these tab shows the values of output statistics at the stationary equilibrium (PRDM9 diversity D, mean activity θ , mean haplo-insuficiency at the birth of the allele (σ 0 ) and mean haplo-insuficiency at the equilibrium (σz) depending on different values of parameter (mutation rate at PRDM9 locus (u), mean of the affinity distribution (ȳ) and maximum number of meiosis allowed for each individual) with fixed parameters (d = 8, h = 800 and v = 2 × 10 −6 ) Figure 8. An example of a PRDM9 dynamic with only one meiosis allowed per individual (n mei = 1, u = 6 × 10 −5 andȳ = 0.2). In all the graphs, each color correspond to a different allele. If a color appears several generation after the death of an allele of the same color, it is considered as a complete different allele.This graph represents the evolution along time of the frequency of each PRDM9 allele (A) and their proportion of active sites (B). This corresponds to an eviction regime.
The first line of Table 2 Supplementary materials section 5). Thus, the monomorphic regime obtained here seems to be in contradiction with empirical observations. This monomorphic regime can be explained 581 by the fact that we are in a range of parameters involving an eviction regime where one allele dominates in 582 the population while all the other alleles are counter-selected ( Figure 8A). In turn, this eviction is caused 583 by a non-negligible haplo-insufficiency, especially in old alleles (Table 2 column

589
Increasing the PRDM9 functional mutation rate (u) by a factor 10 does not even allow to get out of 590 this monomorphic regime with predicted PRDM9 diversity still very close to one (second row of Table 2).

591
Moreover, there is a priori no reason to consider that the PRDM9 mutation rate should be higher in mice 592 than in humans. Alternatively, increasing the mean affinity (or, equivalently, the concentration of PRDM9 in 593 the cell) allows for higher levels of PRDM9 equilibrium diversity while maintaining empirically reasonable 594 levels of erosion (third row of Table 2). However, the mean number of sites bound by PRDM9 in a meiocyte 595 would then predicted to be too large (∼ 20, 000), compared to current empirical knowleldge (∼ 5, 000

599
All these results were obtained assuming that the reproductive fitness is directly proportional to the 600 success rate of meiosis, which, as mentioned above, is not realistic in the case of mammals. Assuming 601 instead that gametes are not limiting by allowing for up to five meioses per individual (fourth row of Table   602 2), we observe a higher functional diversity with a still reasonable level of erosion ( Figure 9). Indeed, we are 603 no more in the eviction regime which tends to acts against diversity ( Figure 8B). The predicted reduction 604 in meiosis success in hemizygotes is also consistent with the ones observed empirically in mice (∼ 2 to 5%

605
[27]). Of note, by making meioses non-limiting, we decoupled reproductive fitness and the meiosis success 606 rate, thus reducing the decline in fertility due to dosage, which makes it possible to get out of the eviction 607 regime. Altogether, we deduce that there is no need for much more realism in our model to correctly fit the 608 empirical data.   [1]. In this process, the symmetrical binding of PRDM9 to its target sites plays a key 615 role. Indeed, by bringing back to the chromosomal axis the sites on the homologue, it presents it to the single 616 strand of DNA produced by the DSB, thereby facilitating the search for the complementary sequence. This 617 second function of PRDM9 combined with the differential erosion of the target sites between sub-species 618 (which is itself a direct consequence of the first function of PRDM9, the recruitment of the DSB machinery), 619 was found to be sufficient for explaining most aspects of the hybrid sterility pattern [1]. 620 Here, we show that, similarly, these two aspects of the function of PRDM9 during meiosis, provide the More fondamentally, the Red Queen results from a balance between erosion and invasion by new 625 PRDM9 alleles that are positively selected. On the one hand, the erosion is the direct consequence of the 626 first role of PRDM9 (recruitment of the DSB machinery), which triggers dBGC. On the other hand, the po-627 sitive selection comes from differences in fertility between young alleles with brand new active target sites 628 and ensuring a good fertility, and old alleles having eroded there high affinity targets leading to asymmetri-629 cal binding at low affinity sites generating bad fertility. Symmetry therefore plays an important role both in 630 hybrid context where the differential erosion on both chromosomes generates sequence asymmetry, and in 631 panmictic context where the erosion of high affinity targets generate a statistical asymmetry of binding.

637
Impact of genetic dosage on the Red Queen dynamics 638 The Prdm9 genetic dosage implies that a homozygote has a PRDM9 concentration for its allele which is 639 twice that of each of the two alleles of a heterozygote. Everything else being equal, this increases the occu-640 pancy of each target and therefore the probability of symmetrical binding and fertility. This clearly impact 641 the Red Queen dynamics for certain combinations of parameters by acting against diversity (polymorphic 642 regime becomes monomorphic). In these cases, on one side, a dominant allele is present and by its high 643 frequency in a homozygous state it is positively selected even when it has eroded many of its target. On 644 the other side, all the other alleles necessarily appearing in the heterozygous state are counter selected. The 645 higher the selection coefficient associated to the homozygous advantage (captured here by σ 0 ), the stronger 646 the effect against diversity. This effect is however mitigated when PRDM9 is in high concentration in the 647 cell. Indeed, when the PRDM9 concentration increases, the difference in binding probability for a given 648 activity target decreases between a homozygote and a heterozygote (Supplemental Figure 1). By extension, 649 this reduces the fertility gap between a homozygote and a heterozygote. Thus, the homozygous advantage  Figure 1. Binding probability depending on the free PRDM9 concentration for homozygotes heterozygotes (y = 0.6). The binding probability for homozygotes is always higher than for heterozygotes but the difference decreases when the free PRDM9 concentration increases Supplemental Figure 2. Homozygotes selection coefficient associated to genetic dosage (σ 0 ) depending on the total PRDM9 concentration in the cell and the mean affinity of the target sites. The σ 0 is all the greater as the concentration of PRDM9 and the average affinity is low 40/40