A Population Genetics Theory for piRNA-regulated Transposable Elements

Transposable elements (TEs) are self-reproducing selfish DNA sequences that can invade the genome of virtually all living species. Population genetics models have shown that TE copy numbers generally reach a limit, either because the transposition rate decreases with the number of copies (transposition regulation) or because TE copies are deleterious, and thus purged by natural selection. Yet, recent empirical discoveries suggest that TE regulation may mostly rely on piRNAs, which require a specific mutational event (the insertion of a TE copy in a piRNA cluster) to be activated — the so-called TE regulation ”trap model”. We derived new population genetics models accounting for this trap mechanism, and showed that the resulting equilibria differ substantially from previous expectations based on a transposition-selection equilibrium. We proposed three sub-models, depending on whether or not genomic TE copies and piRNA cluster TE copies are selectively neutral or deleterious, and we provide analytical expressions for maximum and equilibrium copy numbers, as well as cluster frequencies for all of them. In the full neutral model, the equilibrium is achieved when transposition is completely silenced, and this equilibrium does not depend on the transposition rate. When genomic TE copies are deleterious but not cluster TE copies, no long-term equilibrium is possible, and active TEs are eventually eliminated after an active incomplete invasion stage. When all TE copies are deleterious, a transposition-selection equilibrium exists, but the invasion dynamics is not monotonic, and the copy number peaks before decreasing. Mathematical predictions were in good agreement with numerical simulations, except when genetic drift and/or linkage disequilibrium dominates. Overall, the trap-model dynamics appeared to be substantially more stochastic and less repeatable than traditional regulation models.

1 Introduction 1 Transposable elements (TEs) are repeated sequences that accumulate in genomes 2 and often constitute a substantial part of eukaryotic DNA. According to the 3 canonical "TE life cycle" model (Kidwell and Lisch 2001;Wallau et al. 2016), TE 4 families are not maintained actively for a long time in genomes. TEs are the most 5 active upon their arrival in a new genome (often involving a horizontal transfer, 6 Gilbert and Feschotte 2018); their copy number increases up to a maximum, at 7 which point transposition slows down. TE sequences are then progressively 8 degraded and fragmented, accumulate substitutions, insertions, and deletions, up to 9 being undetectable and not identifiable as such. The reasons why the total TE 10 content, the TE families, and the number of copies per family vary substantially in 11 the tree of life, even among close species, are far from being well-understood, which 12 raises interesting challenges in comparative genomics. 13 TEs spread in genomes by replicative transposition, which ensures both the 14 genomic increase in copy number and the invasion of populations across generations 15 of sexual reproduction. They are often cited as a typical example of selfish DNA 16 sequences, as they can spread without bringing any selective advantage to the host 17 species, and could even be deleterious (Orgel and Crick 1980;Doolittle and 18 Sapienza 1980). Even if an exponential amplification of a TE family could, in Capy 2006). Even though TEs can be inactivated by regular genomic mutations, as 30 any other DNA sequences, there exist documented mutational mechanisms that 31 specifically target repeated sequences, such as repeat induced point mutations in 32 fungi (Selker and Stevens 1985;Gladyshev 2017). Alternatively, substitutions or 33 internal deletions in TEs could generate non-autonomous elements, able to use the 34 transposition machinery without producing it, decreasing the transposition rate of 35 autonomous copies (Hartl et al. 1992;Robillard et al. 2016). 36 Transposition regulation refers to any mechanism involved in the control of the 37 transposition rate by the TE itself or by the host. There is a wide diversity of 38 known transposition regulation mechanisms; some prevent epigenetically the 39 transcription of the TE genes (Deniz et al. 2019), others target the TE transcripts 40 (Adams et al. 1997), or act at the protein level (Lohe and Hartl 1996). Recently, the 41 discovery of piRNA regulation systems have considerably improved and clarified our 42 understanding of TE regulation (Brennecke et al. 2007;Malone and Hannon 2009;43 Zanni et al. 2013;Ozata et al. 2019). piRNA regulation seems to concern a wide 44 range of metazoan species (Huang et al. 2021), and acts on TE expression through 45 a series of complex mechanisms, which can be summarized by a simplified 46 regulation scenario known as the "trap model" (Bergman et al. 2006;Kofler 2019). 47 In such a scenario, regulation is triggered by the insertion of a TE in specific "trap" 48 regions of the genome, the piRNA clusters. TE sequences inserted in piRNA 49 clusters (thereafter TE cluster insertions) are transcribed into small regulating 50 piRNAs, that are able to silence homologous mRNAs from the same TE family by 51 recruiting proteins from the PIWI pathway.

52
Early models, starting from Charlesworth and Charlesworth (1983), assumed 53 that the strength of regulation increases with the copy number. The transposition 54 rate is then expected to drop progressively in the course of the TE invasion up to 55 the point where transposition stops. In contrast, the PIWI regulation pathway 56 displays unique features that may affect substantially the evolutionary dynamics of 57 TE families: (i) it relies on a mutation-based mechanism, involving regulatory loci 58 that may need several generations to appear (ii) the regulatory loci in the host 59 genome segregate independently from the TE families and have their own 60 evolutionary dynamics (the TE amplifies in a genetically-variable population, which 61 is a mixture of permissive and repressive genetic backgrounds), and (iii) the 62 regulation mechanism may not be strongly dependent on genomic copy number. 63 The consequences of these unique features on the TE invasion dynamics are not 64 totally clear yet. Individual-based stochastic simulations have shown that piRNA regulated by a trap model thus appear to differ substantially from the predictions of 71 the traditional population genetics models. 72 With this paper, we extend the existing corpus of TE population genetics 73 models by proposing a series of mathematical approximations for the dynamics of 74 TE copy number when regulated by piRNA in a "trap model" setting. We studied 75 three scenarios, differing by whether or not TE copies induce a fitness cost when 76 inserted in piRNA clusters and/or in other genomic locations: Scenario (i) neutral 77 TEs, (ii) deleterious TEs and neutral TE cluster insertions, and (iii) deleterious 78 TEs and deleterious TE cluster insertions. We showed that an equilibrium copy 79 number could be achieved in scenarios (i) and (iii), while the TE family decays 80 (after having invaded) in scenario (ii). We confirmed that the transposition rate 81 does not condition the equilibrium number of copies in the neutral model (i), but it 82 does when TEs are deleterious. Model (iii), in which both genomic and TE cluster 83 insertions are deleterious, lead to a complex transposition-selection-regulation 84 equilibrium (the "transposition-selection cluster" balance in Kofler (2019)), in which 85 regulatory alleles do not reach fixation and stabilize at an intermediate frequency.

86
The robustness of these predictions was validated by numerical simulations. Model setting and notation traces back to Charlesworth and Charlesworth (1983), 90 who proposed to track the mean TE copy numbern in a population through the 91 difference equation: where u is the transposition rate (more exactly, the amplification rate per copy and 93 per generation), and v the deletion rate. In this neutral model, if u and v are 94 constant, the copy number dynamics is exponential. If the transposition rate u n is 95 regulated by the copy number (u 0 > v, du n /dn < 0, and lim(u n ) < v), then a 96 stable equilibrium copy numbern can be reached.

97
However, in most organisms, TEs are probably not neutral. If TEs are 98 deleterious, fitness w decreases with the copy number (w n < w 0 ). As a consequence, 99 individuals carrying more copies reproduce less, which decreases the average copy 100 number every generation. The effect of selection can be accounted for using 101 traditional quantitative genetics, considering the number of copies n as a 102 quantitative trait: ∆n ≃ Var(n)∂ log(w n )/∂n, where Var(n) is the variance in copy 103 number in the population, and ∂ log(w n )/∂n approximates the selection gradient on 104 4/30 n. The approximation is better when the fitness function w n is smooth and the copy 105 number n is not close to 0. Assuming random mating and no linkage disequilibrium, 106 n is approximately Poisson-distributed in the population, and Var(n) ≃n.
107 Charlesworth and Charlesworth (1983) proposed to combine the effects of 108 transposition and selection to approximate the variation in copy number among 109 generations t and t + 1 as: where sn t = ∂ log w n /∂n|n t .

111
When the transposition rate is high, the (neglecting u 2 in (1 + u) 2 ). This correction remains an approximation, as the effect 121 of linkage disequilibrium of TE copies is more subtle and complex (Roze 2022).

122
Overall, linkage disequilibrium due to transposition (slightly) amplifies the selection 123 penalty for high transposition rates, and tends to decrease the genomic copy 124 number.

126
Data analysis was performed with R version 4.0 (R Core Team 2020).

127
Mathematical model analysis involved packages deSolve (Soetaert et al. 2010) and 128 phaseR (Grayling 2014). All figures and analyses can be reproduced from the 129 scripts available at https://github.com/lerouzic/amodelTE. The K possible insertion sites are equally spread on chromosomes; the k piRNA clusters are distributed at one of the chromosome tips, representing a proportion π of the genome. Recombination is suppressed within clusters, but is possible in non-cluster genomic regions. In transposition-permissive genomes (no TE insertion in piRNA clusters), TEs located in normal insertion sites can transpose in any random location with a transposition rate u. TEs located in clusters cannot transpose, and prevent the transposition of all other elements in the genome. In order to ensure the generality of the results, simulations were set up to minimize genetic linkage: recombination between normal sites was free (recombination rate r = 0.5), and the genome had 30 chromosomes (always larger than the number of clusters). In the default conditions, 300 out of K = 10, 000 insertion sites are piRNA clusters (π = 0.03, close to the estimated proportion in the genome of Drosophila melanogaster ).

159
3.1 Neutral trap model 160 The model assumes k identical piRNA clusters in the genome, and the total 161 probability to transpose in cluster regions is π. Each cluster locus can harbor two 162 alleles: a regulatory allele (i.e., the cluster carries a TE insertion), which segregates 163 at frequency p, and an "empty" allele (frequency 1 − p). When all clusters are piRNA cluster (with probability π), which is only possible in permissive genotypes. 173 Assuming random mating and no linkage disequilibrium between clusters and the 174 rest of the genome (Cov(n t , p t ) = 0, i.e. no correlation between n and the genotype 175 at the regulatory clusters), we approximated the discrete generation model with a 176 continuous process, and the neutral model (equation 2) was rewritten as a set of 177 two differential equations onn (relabelled n for simplicity) and p: (3) 179

180
Here, n stands for non-cluster TE copies; for simplicity, equation 3 assumes that 181 π ≪ 1, so that 1 − π ≃ 1 (a more precise version of equation 3 for large values of π 182 is provided in Appendix A.1). Expressing the variation of the number of TE 183 cluster insertions m = 2kp as dm/dt = πdn/dt would be more straightforward here, 184 we kept the dp/dt formulation solely for consistency with more complex models 185 described below.

186
The initial state of the system is n 0 > 0 copies per individual in the population, 187 and no piRNA cluster insertions (p 0 = 0). The system of equation 3 admits three dn/dp = 2k/π, and n = n 0 + 2pk/π at any point of time: The fixation of regulatory alleles in all clusters is asymptotic (lim t→∞ p = 1), 195 and the equilibrium is asymptotically stable (dn/dt > 0 and dp/dt > 0). Figure 2 196 illustrates the effect of u and k on the dynamics n t and p t . The increase in 197 regulatory allele frequency p is due to new transposition events in clusters (there 198 would be multiple alleles at the sequence level, all being functionally equivalent).

199
Assuming that n 0 is small, the number of copies at equilibrium is proportional to to simplify the analysis, we derived the results assuming that the deleterious effects 214 of TE copies were independent, i.e. w n = exp(−ns), where n is the copy number 215 and s the coefficient of selection (deleterious effect per insertion), so that 216 ∂ log w n /∂n = −s.

217
The following calculation relies on the additional assumption that piRNA 218 clusters do not represent a large fraction of genomes (π ≪ 1, leading to n ≫ 2kp, 219 i.e. that the number of TE copies in the clusters is never large enough to make a 220 difference in the total TE count). We will describe two selection scenarios that Open symbols: simulations, plain lines: difference equations, hyphenated lines: predicted equilibria. The copy number equilibriumn does not depend on the transposition rate, and the TE cluster insertion frequency at equilibriump = 1 in all conditions. The time necessary to reach the equilibrium (both forn andp) increases with k. In simulations, frequencies could be slightly > 1 due to rare events in which several TEs could insert simultaneously in the same cluster.
Deleterious TEs and neutral clusters If TE cluster insertions are neutral, the 225 model becomes: The match between simulations (featuring a finite number of insertion sites, 237 linkage disequilibrium, and non-overlapping generations) and this theoretical result 238 was very good for a single cluster (k = 1), but degraded with larger number of 239 clusters; n * was overestimated by ∼ 10% compared to simulations for k = 5 from which an exact solution forp could not be calculated. The following 247 approximation (from Appendix A.4): happens to be acceptable for a wide range of transposition rates and for small 249 selection coefficients (s < 0.1) (Figure 4).

250
The drop inp when the number of clusters k increases suggests that there might 251 be a number of TE cluster insertions above which the transposition is effectively go through a stage where up to n * deleterious copies are present in the genome.

272
This makes it possible to approximate mathematically the critical cluster size π c , 273 from which the population fitness drops below an arbitrary threshold w c and is at 274 risk of extinction: This allows for a new equilibrium E 4 : The equilibrium exists (n > 0 andp > 0) whenever s < u(1 + 2u), i.e. the 284 transposition rate must be substantially larger than the selection coefficient. It 285 corresponds to the "Transposition-selection cluster" equilibrium described from A linear stability analysis (Appendix A.5) shows that for the whole range of u, 295 π, and k, as well as for most of the reasonable values of s, the equilibrium is a 296 stable focus, i.e. the system converges to the equilibrium while oscillating around it. 297

Genetic drift 298
The models described above assume infinite population size, which may not hold for 299 low-census species and for laboratory (experimental evolution) populations. We  The standard population genetics theory predicts that selection in small 314 populations is less effective at eliminating slightly deleterious alleles. Assuming that 315 TE copies are deleterious, they should be eliminated faster in large populations 316 compared to small ones. Although this mechanism has been proposed to explain 317 the accumulation of junk DNA (including TE copies) in multicellular eukaryotes 318 (Lynch and Conery 2003), little is known about how the equlibrium copy number of 319 an active TE family is expected to be affected by drift even in the simplest 320 scenarios (Charlesworth and Charlesworth 1983). Yet, informal models suggest that 321 drift may have a limited effect, as copy number equilibria rely on the assumption 322 that evolutionary forces that limit TE amplification (regulation and/or selection)

382
The biology of the piRNA cluster regulation was also simplified. We considered 383 that piRNA clusters were completely dominant and epistatic, i.e. a single genomic 384 insertion drives the transposition rate to zero. Relaxing slightly this assumption is 385 unlikely to modify qualitatively the model output, e.g. considering that regulatory 386 insertions are recessive would change the frequency of permissive genotypes from

387
(1 − p) 2k to (1 − p 2 ) k , which would increase the TE cluster insertion frequency at  In order to compute the equilibria, we assumed no epistasis on fitness, i.e.

397
constant ∂ log w/∂n = −s. Deriving the model with a different fitness function is 398 possible, although solving the differential equations could be more challenging.

17/30
Instead of our fitness function w n = e −ns , Charlesworth and Charlesworth (1983) 400 proposed w n = 1 − sn c (c being a coeffcient quantifying the amount of epistasis on 401 fitness), while Dolgin and Charlesworth (2006) later used w n = e −sn−cn 2 (different 402 parameterizations for directional epistasis are discussed in e.g. Le Rouzic 2014).

403
Considering negative epistasis on fitness (i.e. the cost of additional deleterious 404 mutations increases) in TE population genetic models has two major consequences: 405  (Roessler et al. 2018), and micro-organisms (Sousa et al. 2013), 426 makes it impossible to derive models that are both accurate and universal. to confirm this speculation, which cannot be addressed solely based on theoretical 465 considerations.

466
An interesting hypothesis was raised by Kelleher et al. (2018) Assuming that n 0 is reasonably small and π ≪ 1, the term n 0 π/2k can be 669 further neglected, and: The Jacobian matrix corresponding to the equilibrum (n,p) from equation 11 is: Eigenvalues are negative (i.e., the equilibrum is stable) for all tested parameter 674 combinations. Eigenvalues happen to be complex for all parameter combinations