Down the Penrose stairs: How selection for fewer recombination hotspots maintains their existence

In many species, meiotic recombination events tend to occur in narrow intervals of the genome, known as hotspots. In humans and mice, double strand break (DSB) hotspot locations are determined by the DNA-binding specificity of the zinc finger array of the PRDM9 protein, which is rapidly evolving at residues in contact with DNA. Previous models explained this rapid evolution in terms of the need to restore PRDM9 binding sites lost to gene conversion over time, under the assumption that more PRDM9 binding always leads to more DSBs. In recent experimental work, however, it has become apparent that PRDM9 binding on both homologs facilitates DSB repair, and moreover that, in the absence of enough symmetric binding, meiosis no longer progresses reliably. We therefore consider the possibility that the benefit of PRDM9 stems from its role in coupling DSB formation and efficient repair. To this end, we model the evolution of PRDM9 from first principles: from its binding dynamics to the population processes that govern the evolution of the zinc finger array and its binding sites in the genome. As we show, the loss of a small number of strong binding sites leads to the use of a greater number of weaker ones, resulting in a sharp reduction in symmetric binding and favoring new PRDM9 alleles that restore the use of a smaller set of strong binding sites. This decrease in PRDM9 binding symmetry and in its ability to promote DSB repair drive the rapid zinc finger turnover. These results imply that the advantage of new PRDM9 alleles is in limiting the number of binding sites used effectively, rather than in increasing net PRDM9 binding, as previously believed. By extension, our model suggests that the evolutionary advantage of hotspots may have been to increase the efficiency of DSB repair and/or homolog pairing.


34
Meiotic recombination is initiated by the deliberate infliction of double strand breaks (DSBs) 35 across the genome and resolved with their subsequent repair off homologous chromosomes. In many 36 species, including all plants, fungi and vertebrates investigated to date, the vast majority of meiotic 37 recombination events localize to narrow intervals of the genome known as hotspots (reviewed in Tock 38 and Henderson 2018). Why recombination is concentrated in hotspots in some taxa, as opposed to being 39 more uniformly spread across the genome, as in others, such as Drosophila (Manzano-Winkler et al. Moreover, among species with hotspots, there exists a great deal of variation in how hotspots are 42 localized, and it is largely unknown why these different mechanisms have been adopted in different 43

lineages. 44
In humans and mice, and likely many other metazoan species, the primary determinant of hotspot 45 locations is PRDM9 and the DNA-binding specificity of its Cys-2-His-2 Zinc Finger (ZF) array (Myers et  is observed in PRDM9-/-mice (Brick et al. 2012) and, more tentatively, in a human female lacking both 59 copies of PRDM9 (Narasimhan et al. 2016). This observation suggests that in species that habitually use 60 PRDM9, this mechanism masks a "default" mechanism for recombination that resembles the one 61 employed by species lacking PRDM9. whereas in species that use PRDM9 to direct recombination, the recombination landscape is rapidly-65 evolving. Consequently, closely related species, such as humans and chimpanzees, or even different 66 subspecies, use sets of hotspots with no more overlap than expected by chance (Ptak et   More specifically, when individuals are heterozygous for "hot" and "cold" alleles, defined as 71 those more or less likely to experience a DSB during meiosis, GC leads to the under-transmission of hot 72 alleles in a manner mathematically analogous to purifying selection against them (Nagalaki and Petes The specific importance of PRDM9 binding symmetrically likely stems from the fact such sites 132 can be repaired more rapidly, and that early recombining sites are preferentially used for COs and 133 homolog pairing. In particular, genome-wide analyses of DSB outcomes at PRDM9 binding sites provide 134 direct evidence that DSBs at asymmetrically-bound hotspots are more likely to experience delays in their These observations raise the possibility that selection driving the evolution of the PRDM9 ZF 145 stems from its role in DSB repair, rather than DSB initiation, as previously envisioned. The Red Queen 146 theory of recombination hotspots suggests that the erosion of PRDM9 binding sites should lead to a 147 measurable decrease in fitness, which can be restored by the introduction of a novel PRDM9 allele. The 148 asymmetric binding and associated reduced fertility observed in hybrid mice is the only known evidence 149 consistent with this phenomenon. It occurs as the consequence of the lineage-specific erosion in each 150 parental lineage leading to widespread heterozygosity at PRDM9 binding sites, an extreme scenario that 7 is unlikely to occur within populations, in part because gene conversion will act to rapidly remove 152 heterozygosity at PRDM9 binding sites. Instead, one might envisage that PRDM9 binding asymmetry 153 arises not from heterozygosity in its underlying binding sites, but as a consequence of PRDM9's 154 competitive binding: namely, that the erosion of strong binding sites shifts binding toward many more 155 weaker ones, and in so doing reduces the chances of symmetric binding and of the successful completion 156 of meiosis. 157 On that basis, it seems likely that selection on PRDM9 is mediated by a requirement for 158 symmetric binding. Based on this premise, we model the co-evolution of PRDM9 and its binding sites 159 from first principles, assuming, as seems more realistic, that: (i) the number of DSBs is regulated 160 independently of PRDM9, (ii) PRDM9 binding is competitive between sites, and (iii) fitness is a function 161 of the probability that the smallest chromosome experiences some minimal number of DSBs at sites 162 symmetrically bound by PRDM9. We model the co-evolution of PRDM9 and its binding sites in a panmictic population of constant 168 size. An individual is described in terms of their two PRDM9 alleles. Each PRDM9 allele recognizes a 169 distinct set of binding sites, which may have distinct binding affinities or "heats" (i.e., different 170 probabilities of being bound). The model is composed of four main parts (Fig. 1). The first describes 171 PRDM9 binding during meiosis in terms of Michaelis-Menten kinetics, in which sites compete for 172 binding their cognate PRDM9 proteins. We use this model to approximate the probability that any given 173 site on the four chromatids (i.e., the four copies of a chromosome present during meiosis) is bound by 174 PRDM9 (Fig 1A). We then approximate the probabilities of any given site experiencing a DSB, and of 8 any given locus experiencing a DSB while being symmetrically bound by PRDM9, i.e., of a symmetric 176 DSB (Fig. 1B). Next, we model the population processes. We use Wright-Fisher sampling with viability selection 182 in a panmictic diploid population of size to describe the change in PRDM9 allele frequencies in the 183 population (Fig. 1C). New PRDM9 alleles arise from mutation with probability per individual meiosis 184 and establish a new set of binding sites. Selection on a PRDM9 allele arises from its impact on fitness, 185 and more specifically, the need for at least one symmetric DSB on per chromosome per meiosis. For 186 simplicity, we consider only the smallest chromosome because fitness defects arising from asymmetry in 187 PRDM9 binding should affect the smallest chromosomes first (assuming, as seems plausible, that they 188 have fewer sites at which a symmetric DSB might be made). Also for simplicity, we assume that 189 8 on 9 individual fitness is equal to the probability of a symmetric DSB in a single meiosis. In reality, we expect 190 fitness to depend on the proportion of an individual's gametes that have a sufficient number of symmetric 191 DSBs, but this fitness should be a monotonically-increasing function of the probability per meiosis, and 192 therefore we expect the qualitative conclusions to be unchanged. Lastly, we use a deterministic 193 approximation to describe the erosion of binding sites over time (Fig. 1D). Mutations that destroy the 194 ability of a site to bind to its cognate PRDM9 are introduced with probability

Molecular dynamics of PRDM9 binding 199
We model the binding of PRDM9 as a consequence of Michaelis-Menten kinetics, wherein 200 binding sites compete for association with a fixed number of PRDM9 molecules. In Supplementary  201 Section S1, we solve these dynamics at equilibrium, i.e., when the rate at which PRDM9 dissociates from 202 sites with any given binding affinity ("heat") is equal to the rate at which PRDM9 binds those sites. 203 Specifically, we show that the heat of a site, i.e., the probability that a site with binding affinity is bound 204 The total number of PRDM9 proteins of a given type depends on whether an individual is 213 homozygous or heterozygous for the corresponding PRDM9 allele. We assume that a fixed number of 214 PRDM9 molecules are produced per allele. Consequently, the corresponding PRDM9 binding sites will 215 always be more likely to be bound in individuals homozygous for the cognate PRDM9 allele relative to 216 heterozygotes. Moreover, the ratio of heats for sites with different binding affinities will differ between 217 genotypes. 218 219

Molecular dynamics of DSBs and symmetric DSBs 220
Next, we approximate the probability that a PRDM9-bound site experiences a DSB in a given 221 meiosis. We assume that the total number of DSBs ‫)ܦ(‬ is constant and that all PRDM9 bound sites are

ሻ
This approximation assumes that conditional on being bound, PRDM9 molecules arising from different 225 alleles are equally likely to recruit the DSB machinery, but allows for some degree of dominance as a 226 consequence of differences in the numbers of sites bound by each allelic variant genome-wide. 227 In Supplementary Section S2, we show that the probability that a site with binding affinity i 228 experiences a symmetric DSB, i.e., that at least one of the chromatids is bound by PRDM9 and 229 experiences a DSB and that at least one sister chromatid of each homolog is bound by PRDM9, is 230

ሻ
For brevity, we omitted indexes corresponding to the genotype at the PRDM9 locus, which affect

Evolutionary dynamics of PRDM9 alleles 234
We model individual fitness as the probability per meiosis that at least one symmetric DSB 235 occurs on the smallest chromosome, which comprises a proportion ‫ݎ‬ of the genome. We make the 236 simplifying assumption that the proportion of binding sites of any affinity found on that chromosome is 237 also ‫ݎ‬ , and remains fixed over time; the number of binding sites with binding affinity We also assume that the probability of symmetric DSBs at different binding sites are independent (i.e., 239 ignoring interference between them). Under these assumptions, we approximate the fitness of a 240 homozygote for PRDM9 allele ݆ as one minus the probability that none of the binding sites on the 241 smallest chromosome experience a symmetric DSB, i.e., 242 where the product is taken over the The marginal fitness associated with a PRDM9 allele is calculated as a weighted average over genotypic 246 frequencies, in which homozygotes for the focal allele,

Estimated parameters 286
Except where stated otherwise, the parameters used in simulations are given in Table 1  Probability that a given site with binding affinity i, recognized by PRDM9 allele j, is bound by PRDM9 in an individual with PRDM9 alleles j and k.

ܿ ,
Probability that any PRDM9-bound site experiences a DSB in an individual with PRDM9 alleles j and k.

ߙ ,
Probability of a PRDM9 binding site with affinity i, recognized by PRDM9 allele j, being symmetrically bound by PRDM9 and experiencing a DSB in an individual with PRDM9 alleles j and k.

ܹ ,
The fitness of an individual with PRDM9 alleles j and k ܹ The marginal fitness of PRDM9 allele j across possible genotypes ݃ , Probability or rate of a PRDM9 binding site with affinity i, recognized by PRDM9 allele j, experiencing gene conversion spanning the PRDM9 binding motif, in an individual with PRDM9 alleles j and k.
݃ Probability or rate of a PRDM9 binding site with affinity i, recognized by PRDM9 allele j, experiencing gene conversion spanning the PRDM9 binding motif, across the population.

Fitness when considering one heat 305
Our model differs from previous ones ( If we assume that there are always more PRDM9-bound sites than DSBs per meiosis, and ignore 313 assumptions (ii) and (iii), such that the number of bound PRDM9 molecules is proportional to the number 314 of binding sites and the number of DSBs is proportional to the number of bound PRDM9 molecules, 315 fitness will decrease over time as binding sites are lost, as expected under Red Queen theories of hotspot 316 evolution (Fig 2A: green line). This decrease in fitness is due to the dwindling numbers of DSBs per 317 meiosis, however-the probability of symmetric binding per site remains constant (Fig 2B: blue lines). If 318 instead, we assume that PRDM9-bound sites compete for a constant number of DSBs, as seems more 319 realistic (Diagouraga et al. 2018), but continue to assume that PRDM9 binding is not competitive-i.e., 320 ignore only assumption (ii)-fitness does not change over time, because both the number of DSBs per 321 meiosis and the probability of symmetric binding per site remain constant (Fig 2A: blue line). Thus, there 322 is no source of selection to drive the evolution of PRDM9. Lastly, if we consider a more biologically 323 realistic model, in which there is competition for both PRDM9 binding and DSBs, fitness increases over 324 time, with the loss of binding sites (Fig 2A: red line); indeed, as the number of binding sites decreases, 325 the proportion of sites that are bound symmetrically increases (Fig 2B:

Fitness when considering two heats 341
To understand how fitness might decay in a more realistic setting, with competition among 342 binding sites for PRDM9 molecules and among bound sites for DSBs, we turn to a model with two 343 classes of binding sites. Specifically, we assume that there are a small number of strong binding sites that 344 are bound symmetrically at substantial rates, and a larger number of weaker binding sites, which are not. 345 Consequently, recombination occurs almost entirely at the strong binding sites, whose numbers decrease 346 over time and determine how fitness behaves; the weak binding sites, whose numbers do not change 347 First, we consider how fitness depends on the number of hotspots, in individuals that are either 357 homozygous or heterozygous for two PRDM9 alleles with the same binding distribution (Fig 3). 358 Increasing the number of hotspots has conflicting effects on fitness: the proportion of bound PRDM9 359 localized to hotspots rises, and therefore there is a larger number of DSBs at hotspots, but the probability 360 that any given hotspot is bound symmetrically decreases (Fig 3 -Figure Supplement 1). Increasing 361 PRDM9 expression--and hence the number of PRDM9 molecules--has the opposite effects: it increases 362 the probability that any given hotspot is bound symmetrically, but also increases the proportion of bound 363 PRDM9, and thus DSBs, localized to weak binding sites (Fig 3 -Figure Supplement 1). Consequently, 364 for any value of ݇ ଵ , there is an optimal number of hotspots, which maximizes the probability of forming 365 symmetric DSBs, and this number increases with PRDM9 expression levels (in order to counteract 366 PRDM9 localization to weak binding sites). These considerations imply that the optimal number of 367 hotspots per allele is always smaller in heterozygotes relative to homozygotes (Fig 3). The optimal 368 number of hotspots is also always smaller when hotspots are stronger, as fewer are needed to prevent 369 PRDM9 binding to weak binding sites (Fig 3). ), fitness will always be higher in individuals 381 homozygous for a given PRDM9 allele than in individuals heterozygous for two distinct PRDM9 alleles 382 with the same number of hotspots (Fig 3A). Weak hotspots are far from being saturated for PRDM9 383 binding: the increased expression of a given PRDM9 allele in homozygotes relative to heterozygotes 384 results in a notable increase in the probability that hotspots will be bound by PRDM9, while only slightly 385 increasing the proportion of PRDM9 bound to weak binding sites (Fig 3 -Figure Supplement 1A). 386 Consequently, the benefit of the increase in symmetric binding at hotspots outweighs the cost of the slight 387 increase in the proportion of DSBs localized to weak binding sites. 388 In contrast, when hotspots are relatively strong (e.g., ), fitness can be higher in individuals 389 homozygous for a given PRDM9 allele or heterozygous for two similar alleles, depending on the number 390 of hotspots recognized by the alleles (Fig 3B). When the number of hotspots far exceeds the number of 391 PRDM9 molecules, hotspots are not saturated for PRDM9 binding, and fitness is always higher in 392 homozygotes, as in the case with weaker hotspots. However, as hotspots are lost, those hotspots 393 remaining approach saturation for PRDM9 binding and consequently, the increased expression of a given 394 PRDM9 allele in homozygotes relative to heterozygotes has a lesser impact on rates of symmetric binding 395 (Fig 3 -Figure Supplement 1B). Moreover, the cost associated with greater PRDM9 expression leading 396 to an increased proportion of PRDM9 localizing to weak binding sites becomes more pronounced (Fig 3 -397 Figure Supplement 1B). The net effect is that, for PRDM9 alleles that recognize smaller numbers of 398 hotspots, fitness is higher in heterozygotes than in homozygotes: the loss in fitness caused by PRDM9 399 localization to weak binding sites in homozygotes relative to heterozygotes outweighs the small increase 400 in symmetric binding at hotspots. Thus, in contrast to the case for weaker hotspots, for stronger hotspots 401 (e.g., ݇ ଵ ൌ 5 ), individuals heterozygous for two PRDM9 alleles, each with the optimal number of 402 hotspots for heterozygotes, will have higher fitness than individuals homozygous for a PRDM9 allele 403 with the same number of hotspots (Fig 3B). 404 The relationship between the fitness of homozygotes and heterozygotes provides important 405 insights into the evolutionary dynamics of our model. Consider a simplified model, with a population of 406 infinite size, in which mutations to PRDM9 alleles with any number of hotspots arise every generation. If 407 individuals heterozygous for a given mutant allele have greater fitness than any other individual in the 408 population, that allele will quickly invade the population and rise to appreciable frequencies. Under these 409 conditions, the PRDM9 alleles that successfully invade the population will necessarily be those with 410 optimal numbers of hotspots in heterozygotes. When hotspots are relatively weak (e.g., ݇ ଵ ൌ 5 0 ), the 411 population exhibits cycles (Fig 3A): a PRDM9 allele with the optimal number of hotspots in 412 heterozygotes invades the population and eventually fixes (arrow 1 in Fig 3A), after which individuals are 413 homozygous for that allele and have greater fitness than any PRDM9 heterozygote, thereby preventing 414 any other new allele from invading the population. Hotspots will then be lost from the population (arrow 415 2 in Fig 3A), until another PRDM9 allele can invade (arrow 3 in Fig 3A), fix and begin a new cycle. 416 When hotspots are relatively strong (e.g., ݇ ଵ ൌ 5 ), the population remains at a fixed point at which 417 PRDM9 alleles with the optimal number of hotspots in heterozygotes are constantly invading (Fig 3B): 418 once such an allele reaches appreciable frequencies and individuals become homozygous for the allele, 419 their fitness becomes lower than that of a heterozygote for a new PRDM9 alleles with the optimal number 420 of hotspots, and this prevents from them reaching fixation. These two kinds of behaviors, of cycles and a 421 fixed point, carry over with some modifications to the case of more complex models. 422 423

Dynamics of the two-heat model 424
Next, we consider the dynamic in a finite population size and given a mutation rate at the PRDM9 425 locus. We assume that the number of hotspots associated with new PRDM9 alleles is uniformly 426 distributed across a wide range (ܵ ଵ ൌ 1 െ 5 0 0 0 ), which spans the optimal values for both homozygotes 427 and heterozygotes across the range of hotspot dissociation constants that we explore (݇ ଵ ൌ 5 െ 5 0

). 428
Where possible, the values that we use for model parameters are based on empirical observations, 429 including the mutation rates at the PRDM9 locus and its binding sites, the number of DSBs initiated per 430 meiosis, and the probability that a gene conversion event removes the hot allele at a PRDM9 binding site 431 in individuals heterozygous for hot and cold alleles ( Table 1). We focus on how the evolutionary 432 dynamics depend on the dissociation constant at hotspots, k 1 , which we know less about, and the effective 433 population size, which varies over orders of magnitude in natural populations (Leffler et al. 2012). 434 When the population size is large (e.g., ܰ ൌ 1 0 ), the dynamics are similar to that of the infinite 435 population size case ( Fig. 4A and C). When hotspots are fairly weak (e.g., ݇ ଵ ൌ 5 0 ), they follow an 436 approximate cycle (Fig 4A): a new PRDM9 allele that establishes a near optimal number of hotspots for 437 heterozygotes invades and fixes, dominating the population in a homozygous state until enough hotspots 438 are lost that the population becomes susceptible to invasion of a new allele, likewise with a near optimal 439 number of hotspots for heterozygotes. Diversity at the PRDM9 locus is low throughout most of this cycle, 440 with bouts of diversity in the period during which the dominant allele is taken over by a new one. Fitness 441 peaks when a new allele is fixed and then decreases as that allele loses its strong binding sites. 442 ), dynamics are approximately at a fixed point 452 (Fig 4C): new PRDM9 alleles that lead to near optimal numbers of hotspots for heterozygotes continually 453 invade but never fix. The fitness of individuals homozygous for such PRDM9 alleles is lower than that of 454 individuals heterozygous for one such allele and those already present. Consequently, new PRDM9 455 alleles experience a kind of frequency-dependent selection, in which they are favored at low frequencies 456 because they have near optimal numbers of hotspots for heterozygotes, but selected against at higher 457 frequencies because of the reduced fitness of homozygotes. As a result, diversity remains high and 458 average fitness is approximately constant. ; compare Fig 4A and B). The reduced 461 mutational input at hotspots and correspondingly lower rate at which they are lost leads to longer intervals 462 of time between fixation events. In turn, the reduced mutation input at PRDM9 and stronger drift leads to 463 reduced peaks of PRDM9 diversity during those events. 464 In contrast, when hotspots are strong (e.g., ), dynamics differ markedly in smaller 465 population compared to larger ones (compare Fig 4C and D): whereas in large populations, PRDM9 466 alleles never fix, in small populations, the lower mutational input at the PRDM9 locus and stronger 467 genetic drift allows PRDM9 alleles to fix intermittently (Fig 4D). PRDM9 alleles with a small number of 468 hotspots are typically lost rapidly and without reaching fixation--the same behavior as in larger 469 populations. But when a mutation occurs to a PRDM9 allele with a large number of hotspots, such that 470 homozygotes for it have higher fitness than any heterozygotes (see Fig. 3B), and the allele happens to 471 reach high frequency by chance, it can be maintained in the population at appreciable frequencies-until it 472 has lost a sufficient number of hotspots for the population to become susceptible to the invasion of a new 473 PRDM9 allele. These chance fixation events and the corresponding number of hotspots leads to 474 intermittent, chaotic cycles. 475 In both small or large populations, stochasticity in the PRDM9 mutations that arise and escape 476 immediate loss gives rise to a distribution of initial number of hotspots associated with segregating 477 PRDM9 alleles. We obtain this distribution by weighting PRDM9 alleles by the product of their sojourn 478 times and mean frequencies (Fig 5). This distribution is centered close to but above the optimal number of 479 hotspots for heterozygotes, in part because newly arising alleles are always found in heterozygotes (Fig 5  480 and 6A). The mean is generally closer to this optimal number in large populations than in small 481 populations, reflecting the greater mutational input at PRDM9 (Fig 6A). However, when hotspots are 482 strong and the population size is small, the intermittent fixations of PRDM9 alleles with a large number 483 of hotspots shift the distribution towards larger initial numbers of hotspots (Fig 5 and Fig 6A). In turn, the distribution of the final numbers of hotspots associated with PRDM9 alleles exiting 496 the population (similarly weighted by sojourn times and mean frequencies) is centered below the number 497 at which they become selected against (at which the population fixed for such an allele would become 498 susceptible to invasion for one with the optimal numbers of hotspots for heterozygotes), because they 499 26 continue to lose hotspots during the time it takes a new PRDM9 allele to fix. More hotspots are lost 500 during this time interval in larger populations than in smaller populations, owing to the increased 501 mutational input at hotspots (Fig 5 and 6A). 502 The turnover time at PRDM9, defined as the mean product of sojourn times and mean frequencies 503 across PRDM9 alleles, is determined principally by the rate at which hotspots are lost and how many 504 hotspots need to be lost on average before an allele becomes selected against. In particular, turnover time 505 is inversely proportional to population size (Fig 6B). This relationship is driven almost entirely by the 506 increased mutational input at PRDM9 binding sites in larger populations (and correspondingly increased 507 rate at which hotspots are lost), while the mutational input at PRDM9 is only minorly rate-limiting in 508 smaller populations (Fig 6 -Figure Supplement 1A, Fig 6 -Figure Supplement 2). Similarly, the 509 turnover time is reduced for strong hotspots relative to weak one (Fig 6B). This reduction results from the 510 lower initial number of hotspots (Fig 6A), as well as from the fact that stronger hotspots experience more 511 DSBs and are correspondingly lost more quickly. 512 The effects of the population size and the dissociation constant of hotspots on turnover times 513 shape diversity levels at PRDM9. Inversely, the peak levels of diversity observed are influenced primarily 514 by the mutation rate at PRDM9 (Fig 6 -Figure Supplement 2, compare rows 1 and 2). The effect of 515 population size on PRDM9 diversity is explained in part by the mutational input at binding sites 516 decreasing the time between turnover events (Fig 6 -Figure Supplement 2, compare rows 1 and 3), and 517 the high levels of diversity that accompany them, and in part by the mutational input at PRDM9 518 increasing diversity during such periods (Fig 6C, Fig 6 -Figure Supplement 1B). As explored above, 519 the strength of hotspots additionally plays an important role in shaping patterns of PRDM9 diversity by 520 determining the relative fitnesses in heterozygous or homozygous individuals for different PRDM9 521 alleles, and thus whether alleles will typically have a homozygous or heterozygous advantage. 522 Notably, the co-evolutionary dynamics of PRDM9 and its binding sites entail a substantial 523 genetic load, i.e., the average fitness is substantially lower than the optimal value. Moreover, and in 524 contrast to the usual tenet of population genetics, the average fitness is greater in smaller populations (Fig  525  6D). These phenomena arise for two reasons. First, the greater efficacy of selection and mutational input 526 at PRDM9 in larger populations shifts the distribution of initial numbers of hotspots among invading 527 alleles towards values optimizing fitness in heterozygotes (Fig 5 and 6A). Inversely, because the 528 maximum possible fitness for homozygous individuals is always higher than that of heterozygotes (Fig  529   3), the optimal number of initial hotspots for maximizing fitness across the population will be closer to 530 that which optimizes fitness in homozygotes. Because the shift towards optimal values for heterozygotes 531 tends to bring those values further from what would be optimal for homozygotes, this tends to cause a 532 reduction in mean fitness. Second, the increased mutational input at hotspots in larger populations leads to 533 lower (and less optimal) numbers of hotspots recognized by PRDM9 alleles segregating in the population 534 at any given time (Fig 5 and 6A). in which the loss of individual binding sites leads to increased PRDM9 binding and DSB formation at 552 remaining sites. In this setting, the rates at which binding sites are lost from the population due to GC and 553 the probabilities that they experience symmetric DSBs are each determined by their underlying affinities 554 for PRDM9. Strong hotspots are lost more rapidly than weak ones, resulting in a shift from a small 555 number of strong binding sites to a larger number of weak binding sites over the lifespan of a PRDM9 556 allele. This shift is accompanied by a reduction in the number of symmetric DSBs, and increasing 557 selection against that PRDM9 allele. New PRDM9 alleles restore binding sites, thereby serving as a 558 countervailing force to GC. But they are favored because, in doing so, PRDM9 binding is redirected to a 559 smaller proportion of the genome, and correspondingly is more likely to bind symmetrically. 560 Accordingly, we suggest that the selection driving the evolution of the PRDM9 ZF arises from the 561 requirement of symmetric binding in DSB repair, perhaps ultimately because of homolog pairing, and 562 thus from changes in the shape of its binding distribution over time, rather than from changes in net 563 binding, as previously believed. 564 We employed a number of simplifying assumptions in our model. Notably, we assumed a very 565 simple distribution of heats that suffices to demonstrate key behaviors of PRDM9 evolution, but is likely 566 insufficient to capture all quantitative dynamics. Considering a more complete distribution (i.e., with 567 many heats) may be necessary to more accurately characterize the timescale of PRDM9 turnover, for 568 example, or how the fitness of an allele in heterozygotes changes over time relative to homozygotes. We 569 also modeled the loss of hotspots deterministically, without considering the contribution to asymmetry 570 that might stem from heterozygosity at binding sites for instance. Also not considered in the model is the 571 possibility that when PRDM9 is not bound symmetrically, occasional repair off the sister chromatid rather 572 than the homolog could reduce the rate at which hotspots are lost, while a mutagenic effect of 573 recombination may increase the rate at which hotspots are lost. Lastly, by assuming that the majority of 574 PRDM9 is bound in the absence of strong binding sites, and that all PRDM9-bound sites are equally 575 likely to experience a DSB, we implicitly assumed that each allele in heterozygotes was responsible for 576 roughly half of all PRDM9-bound sites, and half the DSBs per meiosis, thereby ignoring some potential 577 dominance effects. Despite these limitations, our model already provides important new insights into 578 PRDM9 evolution and the selection pressures that shape the recombination landscape, as discussed 579 below. 580 581 The requirement for symmetry shapes the recombination landscape 582 In our two-heat model, PRDM9 alleles can arise with different binding distributions (i.e., with 583 different numbers of hotspots), and selection favors PRDM9 alleles for which PRDM9 binding is 584 concentrated to the smallest proportion of the genome (i.e., with a relatively small number of hotspots, but 585 enough to prevent PRDM9 binding to the more numerous weaker sites). In a more realistic model, with 586 many heats, we would likewise expect selection to favor highly skewed distributions, i.e., with hotspots, 587 as is observed empirically. 588 The benefit of PRDM9 binding symmetry presumably lies in the fact that the same locus used for 589 DSB initiation on one chromosome is, on the homolog, preferentially used as template for DSB repair. As 590 far as we are aware, Davies et al. 2016 was the first to hypothesize that this might reflect a more general 591 benefit of using hotspots for localizing meiotic recombination: that preferentially using a small proportion 592 of the genome as both sites for DSB initiation and templates for DSB repair vastly reduces the search 593 space during homolog engagement, i.e., the number of genomic positions each DSB needs to be checked 594 against before finding the appropriate template for repair (Davies et al. 2016  In our model, we considered individual fitness to be the probability that at least one DSB forms 600 on the smallest chromosome at a site that had been symmetrically bound by PRDM9. In doing so, we 601 considered only the probability that at least one site would find the appropriate template for repair after 602 preferentially checking each of n other PRDM9-bound sites. A more complex model might further 603 consider that the benefit of symmetry is a function of n, i.e., that the benefit of symmetry would be 604 improved if even fewer sites needed to be checked, or inversely, that being symmetrically bound would 605 be less predictive of homology in the presence of a very large number of bound sites. Although we do not 606 explicitly consider changes in the size of the search space that result from changes in net PRDM9 binding, 607 our results suggest an implicit benefit to the use of a smaller number of sites for recombination, when 608 considering competition between binding sites for PRDM9 and between bound-sites for DSBs (as shown 609 in Fig. 1A). 610 It further stands to reason that if binding symmetry, in terms of PRDM9 or of downstream factors 611 involved in DSB initiation and repair, enables DSB repair, it also likely serves to improve the efficacy of 612 DSB-dependent mechanisms of homolog pairing and synapsis. In this regard, it is interesting that many 613 taxa with DSB-independent mechanisms of homolog pairing, such as Drosophila or C. elegans, do not 614 have hotspots, whereas most species with DSB-dependent mechanisms, such as yeast, plants or 615 vertebrates, do. We speculate that the general prevalence of hotspots may be related to the distribution of 616 DSB-dependent mechanisms for proper homolog pairing and segregation. 617 618 Does the decay of hotspots by GC lead to more or fewer hotspots? 619 Intriguingly, the loss of individual hotspots in our model may lead to an increase or a decrease in 620 the total number of hotspots in our model, depending on the distribution of heats considered, as well as 621 how hotspots are defined. The standard definition of hotspots, also used operationally to identify them, is 622 as short intervals of the genome (typically < 2 kilobases) with recombination rates that exceed the local or 623 genome-wide background rate by more than some arbitrary multiplicative factor (e.g., five-fold; reviewed 624 in Tock and Henderson 2018). In the two-heat model that we explored, the stronger class of binding sites 625 would easily be identified as hotspots, whereas the weaker binding sites would be taken as non-specific 626 binding, if they were detected at all. However, if we were to consider more realistic binding distributions, 627 we would find some binding sites with heats just below any given threshold used to define hotspots, 628 which would become hotspots after some of the stronger ones are lost. Given such a distribution, we 629 could define a threshold such that the number of hotspots increases over the lifespan of a PRDM9 allele 630 as its binding becomes more dispersed along the genome. and Keeney 2015). Therefore, this recombination mechanism is likely ancestral to the evolution of 686 PRDM9's role in recombination, and still conserved in the presence of PRDM9. How and why PRDM9 is 687 repeatedly lost remains unknown however, as is the benefit PRDM9 typically confers over the use of 688 default hotspots. 689 The selection that drives PRDM9 evolution in our model may help answer the latter. If the benefit 690 of hotspots in species without PRDM9 is that they too allow for the recruitment of factors involved in 691 both DSB initiation and DSB repair on both homologs, then PRDM9 may be favored when it further 692 enhances the degree of symmetric recruitment of such factors relative to that of default hotspots, i.e., 693 when PRDM9-mediated hotspots are fewer and hotter than default hotspots given the same number of 694 DSBs. In support of this possibility, DSB heats measured by DMC1 ChIP-seq are higher in wild type than 695 in PRDM9-/-mice (Brick et al. 2012). has been lost so many times in different lineages (Baker et al. 2017, Cavassim et al. 2022. It is tempting 706 to speculate that the answer to this question also relates to the advantage of symmetry. Perhaps in these 707 species, segregating PRDM9 alleles no longer provided an advantage over the default use of promoter-708 like features, possibly because the use of such features in those species result in more symmetry than in 709 others. 710 711