P-elements strengthen reproductive isolation within the Drosophila simulans species complex

Determining mechanisms that underlie reproductive isolation is key to understanding how species boundaries are maintained in nature. Transposable elements (TEs) are ubiquitous across eukaryotic genomes. However, the role of TEs in modulating the strength of reproductive isolation between species is poorly understood. Several species of Drosophila have been found to harbor P-elements (PEs), yet only D. simulans is known to be polymorphic for their presence in wild populations. PEs can cause reproductive isolation between PE-containing (P) and PE-lacking (M) lineages of the same species. However, it is unclear whether they also contribute to the magnitude of reproductive isolation between species. Here, we use the simulans species complex to assess whether differences in PE status between D. simulans and its sister species, which do not harbor PEs, contribute to multiple barriers to gene flow between species. We show that crosses involving a P D. simulans father and an M mother from a sister species exhibit lower F1 female fecundity than crosses involving an M D. simulans father and an M sister-species mother. Our results suggest that the presence of PEs in a species can strengthen isolation from its sister species, providing evidence that transposable elements can play a role in reproductive isolation and facilitate the process of speciation. IMPACT SUMMARY Transposable elements (TEs) are repetitive genetic units found across the tree of life. They play a fundamental role on the evolution of each species’ genome. TEs have been implicated in diversification, extinction, and the origin of novelty. However, their potential role in contributing to the maintenance of species boundaries remains largely understudied. Using whole genome sequences, we compared the relative content of TEs across the three species of the Drosophila simulans complex. We find that the presence of one TE, P-element, in D. simulans, and its absence in the sister taxa, differentiates the three species. P-elements (PEs) cause a suite of fitness defects in Drosophila pure-species individuals if their father has PEs but their mother does not, a phenomenon known as hybrid dysgenesis (HD). We thus studied the possibility that PEs enhance isolation between recently-diverged species. In particular, we studied whether the progeny from interspecific crosses were more prone to suffer from HD than pure species. We found that the presence of paternal PEs reduces hybrid female fecundity, mirroring observations of HD described within species. The effect of PEs is stronger in the interspecific hybrids than in pure species. Our results suggest that PEs can strengthen reproductive isolation in well-formed sister species that still hybridize in nature and pose the question of whether other TEs are involved in the formation of species or in their persistence over time.

species individuals if their father has PEs but their mother does not, a phenomenon known as hybrid 48 dysgenesis (HD). We thus studied the possibility that PEs enhance isolation between recently-diverged 49 species. In particular, we studied whether the progeny from interspecific crosses were more prone to 50 suffer from HD than pure species. We found that the presence of paternal PEs reduces hybrid female 51 fecundity, mirroring observations of HD described within species. The effect of PEs is stronger in the 52 interspecific hybrids than in pure species. Our results suggest that PEs can strengthen reproductive 53 isolation in well-formed sister species that still hybridize in nature and pose the question of whether 54 other TEs are involved in the formation of species or in their persistence over time.

INTRODUCTION
the number of base pairs from each sample aligning to each transposable element and 174 calculated the effective copy number per TE per sample by first dividing that quantity by used a fixed estimate of 125 Mb. We generated consensus sequences per TE per line, 178 defining each base as the major allele mapping to that site and leaving as undefined any 179 site for which no allele had more than five supporting reads. This approach differs from 180 previous approaches (Serrato-Capuchina et al. 2020), in that we used the total number 181 of base pairs to calculate the TE copy number and not the number of mapped reads. 182 The results are similar (see below). If a line showed fewer than 0.5 copies of the PEs 183 (i.e., a single heterozygote PE copy), it was considered PE-negative. To assess whether 184 our inference was affected by the way we scored copy number, we tried two different 185 cutoff schemes and counted a copy as complete if it was missing (i.e., the alignment to 186 the reference contained ambiguous bases for) no more than than 0% and 25% of the 187 sequence. This approach potentially reveals the number of copies in each isofemale at 188 the time of sequencing. We verified the results from this approach with the long-read 189 sequencing. 190

191
To determine which TEs differ in their content across the three species of the simulans species 192 complex, we used a linear discriminant analysis (LDA) for the species group. We used the 193 function 'lda' (library MASS, (Venables et al., 2003)) to find the best two linear discriminant (LD) 194 functions, LD1 and LD2. For these analyses, we only retained TEs for which we had inferred at 195 least one full copy in more than each sequenced line. We plotted the scores of these two 196 functions using the function plot (library graphics, (R Core Team 2016)). We extracted the 197 relative contribution of each PE from the object scaling. To determine which TEs differed 198 between species, we fit nested ANOVAs for each TE using lm (library 'stats', (R Core Team 199 2016)). For all these linear models, estimated copy number was the response, species was a 200 fixed factor, and the line was a nested effect within species. We then performed pairwise 201 comparisons using a Tukey HSD test using the function glht (library 'multcomp', (Hothorn et al.,202 from seven lines of each of M and P D. simulans to seven lines of each of M D. simulans, D. 246 sechellia, and D. mauritiana, for a total of 294 types of crosses (14 × 21). For both within-and 247 between-species crosses, we placed four to nine day-old virgins of each species in 30mL 248 bottles with yeast and cornmeal medium. For conspecific crosses, we used a 1:1 ratio of 249 females to males in each cross. For heterospecific crosses, we used a 1:2 ratio of females to 250 males in each cross to account for greater female choosiness (Lachaise et al,. 1986;Turissini et 251 al., 2018), with no more than 50 total flies per vial. Right after mixing, we maintained the vials at 252 24ºC for 24 hours. After this time, vials were placed in a 25ºC or 29ºC incubator (Percival DR-253 36VL; Percival Scientific Inc.; Boone, USA). Every six days, we transferred the adults to a new 254 vial and inserted a pupation substrate (KimWipes, Kimberly-Clark; Roswell, USA) treated with 255 0.5% propionic acid to the old vial. Vials were inspected daily for the production of progeny. F1s 256 produced were collected for up to 15 days after parentals were removed. Our goal was to study 257 at least 20 individuals per cross. We dissected 20 females for each cross with the exception of 258 the crosses listed in Table S3, for which we dissected 40 females per cross. 259 260

Ovary number, counts 261
One of the most pronounced phenotypes of the HD syndrome is reduced female fecundity, as 262 evidenced by the presence of atrophied gonads (ovaries). We scored the number of functional 263 ovaries in pure species and F1 hybrids in the simulans species complex at two temperatures. 264 The procedure for conspecific and heterospecific crosses was identical. We produced F1 265 females as described above (See 'F1 Production'). To score female fecundity, we counted the 266 number of functional gonads in each F1 female (i.e., 0, 1, or 2 ovaries). We collected females 267 aged four to nine days post-eclosion, anesthetized them with CO2, and extracted their 268  Second, we studied the effects of PE copy number as well as the effects of its presence 292 and absence. For each of two types of interspecific crosses for which we observed evidence of 293 HD (i.e., ♀sech ´ ♂sim and ♀mau ´ ♂sim), we fitted a linear mixed model where the response 294 of the model was whether a female showed evidence of dysgenesis or not, the type of cross 295 (either conspecific or heterospecific) was a fixed effect, and the number of PE copies in the D. 296 simulans genome was a continuous trait. Each of these datasets included interspecific data and 297 intraspecific data (within D. simulans) from ♀ M ´ ♂ P crosses. We fitted a binomial regression 298 using the function glmer in the library lme4 (Bates et al., 2013). We included the interaction 299 between these two effects to account for differences in HD caused by the number of PE copies 300 and assessed its significance using a likelihood ratio test (function 'lrtest', R library lmtest; 301 ). An average of five to ten pairs of testes were mounted per slide. Using a microscope, we 325 scored whether each pair of testes had motile sperm within five minutes of having mounted the 326 samples. For each interspecific genotype combination, we scored sperm motility for 100 males 327 per cross (398 combinations: 39,800 total males).
First, we explored whether the genomes of the three species from the simulans complex 335 differed in their TE content. Figure 1A shows an LDA of the TEs that contribute to differences 336 among genotypes (i.e., species and lines). We show the results when we allowed for up to 25% 337 of a sequence to be missing in a TE to count it as present in that line. The first discriminant 338 function, LD1, separates the three species (85% of the trace) more effectively than LD2 (15% of 339 the trace). Figure 1B shows the loadings in LD1 for 27 TEs. PE and Het-A have the largest 340 contributions to the discriminant function, but each of them contributes ~10% of the variance. 341 Figure S1 shows the loadings for LD2, but this function is at least time five times less effective 342 at separating species than LD1. These results suggest that the differences in TE content in the 343 simulans species group are multifaceted and cannot be explained by a single type of TE. 344 Figures S2-S3 show LDAs for a different, and more stringent, cutoff (necessitating 0% of the 345 consensus sequence missing to count it as present in that line) to estimate TE copy number; 346 regardless of the cutoff, P-element (PE) content differs between species in the simulans 347

complex. 348
Next, we studied whether any TE was present in one of the species of the simulans 349 clade but not in the other ones. Figure 2 shows the estimated number of copies for the twelve 350 TEs that explain most of the variance in LD1 (listed in Figure 1B). Figures S4-S5 show the other 351 38 TEs. The majority of TEs were either present in all the three species-but in different copy 352 numbers ( Figure 2)-or absent in the three species ( Figures S4-S5). This similarity reflects the 353 close genealogical relationship between the triad of species. Different cutoffs had no effect on 354 our results. Figure S6 shows boxplots for a more stringent cutoff to estimate TE copy number.  presence-absence between D. simulans and the other two species of the simulans complex 382 (see above). We studied whether PEs had an effect on female hybrid fecundity. Specifically, we 383 studied whether F1 hybrid females from ♀mau ´ ♂sim and ♀sech ´ ♂sim crosses were more 384 likely to show atrophied ovaries, a proxy of hybrid dysgenesis if the fathers carried PEs. Figure  385 3 shows examples of progeny with atrophied ovaries from conspecific and heterospecific 386 crosses. First, females from ♀M ´ ♂M crosses showed few females with atrophied gonads in 387 both conspecific and heterospecific crosses (~0.1%, Table 1). Second, ♀M/♂P females from 388 heterospecific crosses showed a higher rate of gonad atrophy than females from conspecific 389 crosses at both temperatures ( Figure 4, Table 1). ♀M/♂P F1 females from interspecific crosses 390 were more likely to have atrophied ovaries than ♀M/♂P females from crosses within D. 391 simulans (Table 1). Third, we found dysgenic females in all ♀M ´♂ P crosses at both 392 temperatures but, as expected (Kidwell and Novy 1979), the frequency of gonad atrophy in ♀M 393 ´ ♂ P crosses is higher at 29ºC than at 25ºC in conspecific crosses (2sEP: X 2 = 159.37, df = 1, 394 P < 1 × 10 -10 ) and heterospecific crosses (2sEP mau × sim: χ 2 = 122.76, df = 1, P < 1 × 10 -10 ; 395 2sEPsech × sim: χ 2 = 18.593, df = 1, P = 1.618 × 10 -5 ; Figure 4A vs. 4C and Figure 4B vs. 4D).  Table 2). The rate of decrease in female fecundity as PE 416 increases was similar in heterospecific and conspecific crosses (interaction in Table 2). We be fecund than F1 females from conspecific crosses to be dysgenic at both 25ºC and 29ºC 420 (odds ratios in Table 2). The PE copy number effect was also significant, indicating the 421 existence of a negative relationship between paternally inherited PEs and ovary number in the 422 female ♀M/♂P progeny ( Table 2). The interaction term (PE copy ´ cross) in the comparison at 423 29ºC was significant (Likelihood Ratio test; c 2 = 7.605, df = 1, P = 5.821´ 10 -3 ). This result 424 suggests that the slope of the decrease in fecundity as PE copy number increases was slightly 425 more pronounced in ♀M/♂P heterospecific crosses than in ♀M/♂P conspecific crosses (Slope 426 sech ´ sim -slope sim ´ sim = -0.083, SD = 0.031, Z = -2.721, P= 6.51´ 10 -3 ). Results for all crosses 427 were identical when we used a previous inference of PE copy number ((Serrato-Capuchina et 428 al. 2020), Table S5). 429 cross), were completely sterile and produced no sperm, regardless of the identity of the cross. 434 The PE status of D. simulans did not affect hybrid male fertility. 435 this report, we studied the role of PEs in RI between three closely related species of the 441 simulans species complex by crossing D. simulans lines that varied in whether they carry PEs to 442 two sister species that do not carry PEs. We find that PE status of the father (and the number of 443 PE copies) affects F1 female fertility in interspecific crosses. HD in F1 ♀M/♂P females is 444 stronger in hybrid females than in within-species females, suggesting that hybrids are more 445 susceptible to the effects of HD than pure-species individuals. 446 The importance of TEs in the speciation process remains largely unexplored (Watson 447 and  The relative importance of different mechanisms of successful genome invasion by TEs 501 (horizontal gene transfer vs. introgression) remains largely unexplored. It is likely that multiple 502 mechanisms play a role in the transmission of PEs, and only a robust systematic assessment of 503 the relative frequency of PEs in different species of Drosophila and the putative vectors will reveal the relative importance of each mechanism in the spread of PEs within and between 505 species. Our results show that the dynamics of PEs, and possibly of TEs in general, should not 506 only be addressed with a lens on the fitness effects that PEs have on within-species crosses, 507 but also on the effects that they might have on interspecific hybrids. 508  Figure S1 shows the LD2 relative loadings.

FIGURE 2. The abundance of TEs varies in the three species of the simulans complex. 521
We filtered out copies in which at least 25% of the sequence of the TE was missing. We show 522 the 12 TEs with the largest loadings in LD1; other TEs are shown in Figures S4-S5