Abstract
Co-evolution between transposable elements (TEs) and their hosts can be antagonistic, where TEs evolve to avoid silencing and the host responds by reestablishing TE suppression, or mutualistic, where TEs are co-opted to benefit their host. The TART-A TE functions as an important component of Drosophila telomeres, but has also reportedly inserted into the D. melanogaster nuclear export factor gene nxf2. We find that, rather than inserting into nxf2, TART-A has actually captured a portion of nxf2 sequence. We show that Nxf2 is involved in suppressing TART-A activity via the piRNA pathway and that TART-A produces abundant piRNAs, some of which are antisense to the nxf2 transcript. We propose that capturing nxf2 sequence allowed TART-A to target the nxf2 gene for piRNA-mediated repression and that these two elements are engaged in antagonistic co-evolution despite the fact that TART-A is serving a critical role for its host genome.
Introduction
Transposable elements (TEs) must replicate faster than their host to avoid extinction. The vast majority of new TE insertions derived from this replicative activity are deleterious to their host: they can disrupt and/or silence protein-coding genes and lead to chromosome rearrangements (Y. C. Lee, 2015; Y. C. G. Lee & Karpen, 2017; Petrov, Fiston-Lavier, Lipatov, Lenkov, & Gonzalez, 2011). In response to the mutational burden imposed by TEs, TE hosts have evolved elaborate genome surveillance mechanisms to identify and target TEs for suppression. One of the best-known genome defense pathways in metazoan species involves the production of piwi-interacting small RNAs, also known as piRNAs (Brennecke et al., 2007). PiRNA precursors are produced from so-called piRNA clusters, which are located in heterochromatic regions of the genome and contain fragments of many families of TEs, whose insertions have accumulated in these regions. These precursors are processed into primary piRNAs, which use sequence homology to guide Piwi proteins to complementary transcripts produced by active transposable elements (Brennecke et al., 2007; Gunawardane et al., 2007). Piwi proteins induce transcriptional silencing through cleavage of the TE transcript. The sense-strand cleavage product of the TE transcript can then aid in processing piRNA precursors through a process known as the ping-pong cycle, which amplifies the silencing signal (Brennecke et al., 2007; Gunawardane et al., 2007). Alternatively, the cleaved transcript can be processed by the endonuclease Zucchini into additional “phased” piRNAs starting from the cleavage site and proceeding in the 3’ direction (Han, Wang, Li, Weng, & Zamore, 2015; Mohn, Handler, & Brennecke, 2015).
In addition to piRNAs, various other host mechanisms have evolved to target TEs (Cam, Noma, Ebina, Levin, & Grewal, 2008; Esnault et al., 2005; Satyaki et al., 2014; Thomas & Schneider, 2011)(mammalian systems reviewed in (Molaro & Malik, 2016)). Despite these multiple layers of genome surveillance, active TEs are found in the genomes of most organisms. The ubiquity of active TEs suggests that host silencing mechanisms are not completely effective, possibly because the TE and its host genome are involved in an evolutionary “arms race” where TEs are continuously evolving novel means to avoid host silencing and the host genome is constantly reestablishing TE suppression (Parhad & Theurkauf, 2019). On the host side, many TE silencing components have been shown to be evolving rapidly under positive selection (Crysnanto & Obbard, 2019; Helleu & Levine, 2018; Jacobs et al., 2014; Kelleher, Edelman, & Barbash, 2012; Kolaczkowski, Hupalo, & Kern, 2011; Levine, Vander Wende, Hsieh, Baker, & Malik, 2016; Obbard, Jiggins, Bradshaw, & Little, 2011; Obbard, Jiggins, Halligan, & Little, 2006; Simkin, Wong, Poh, Theurkauf, & Jensen, 2013), in agreement with on-going host-TE conflict. On the transposon side, a TE can mount a counter-defense by silencing or blocking host factors (Fu et al., 2013; McCue, Nuthikattu, & Slotkin, 2013; Nosaka et al., 2012) or simply evade host silencing by replicating in permissive cells (L. Wang, Dou, Moon, Tan, & Zhang, 2018) or cloaking themselves in virus-like particles (Mari-Ordonez et al., 2013). However, there are surprisingly few examples of any of these strategies (Cosby, Chang, & Feschotte, 2019). In fact, there is some evidence that, rather than an evolutionary arms race, the rapid evolution of host silencing genes is related to avoiding gene silencing due to off-target effects (i.e. piRNA autoimmunity (Blumenstiel, Erwin, & Hemmer, 2016; Luyang Wang, Barbash, & Kelleher, 2019)) and/or co-evolution with viruses (reviewed in (Cosby et al., 2019)).
While there are currently only a few examples of TE counter-defense strategies, there are many examples of TEs being co-opted by their host genome for its own advantage (see reviews (Bohne, Brunet, Galiana-Arnoux, Schultheis, & Volff, 2008; Chuong, Elde, & Feschotte, 2017; Cosby et al., 2019; Feschotte, 2008; Volff, 2006)). TEs can disperse regulatory sequences across the genome, which allows them to rewire gene regulatory networks. Such rewiring phenomena have been implicated in a variety of evolutionary innovations from pregnancy to dosage compensation (Chuong, Elde, & Feschotte, 2016; Chuong, Rumi, Soares, & Baker, 2013; Dunn-Fletcher et al., 2018; C. Ellison & Bachtrog, 2019; C. E. Ellison & Bachtrog, 2013; Fuentes, Swigut, & Wysocka, 2018; Lynch, Leclerc, May, & Wagner, 2011; Lynch et al., 2015; Notwell, Chung, Heavner, & Bejerano, 2015; Pontis et al., 2019). TEs are also an important source of host genes and noncoding RNAs (Joly-Lopez & Bureau, 2018; Kapusta et al., 2013). Hundreds of genes in species ranging from mammals to plants have been acquired from transposons (Bohne et al., 2008; Joly-Lopez, Hoen, Blanchette, & Bureau, 2016; Volff, 2006). Finally, TEs can act as structural components of the genome. There is evidence that TEs may play a role in centromere specification in a variety of species (Chang et al., 2019; Chueh, Northrop, Brettingham-Moore, Choo, & Wong, 2009; Klein & O’Neill, 2018), and in Drosophila, which lacks telomerase, specific TEs serve as telomeres by replicating to chromosome ends (Levis, Ganesan, Houtchens, Tolar, & Sheen, 1993; Traverse & Pardue, 1988).
In Drosophila melanogaster, three related non-LTR retrotransposons occupy the telomeres: HeT-A, TAHRE, and TART, which are often abbreviated as HTT elements (Abad et al., 2004b; Biessmann et al., 1992; Levis et al., 1993; Sheen & Levis, 1994). These elements belong to the Jockey clade of Long Interspersed Nuclear Elements (LINEs), which contain open reading frames for gag (ORF1) and an endonuclease/reverse transcriptase protein (ORF2, lost in HeT-A) (Malik, Burke, & Eickbush, 1999; Villasante et al., 2007). These elements form head-to-tail arrays at the chromosome ends and their replication solves the chromosome “end-shortening” problem without the need for telomerase (Biessmann & Mason, 1997).
These telomeric elements represent a unique case of TE domestication. They serve a critical role for their host genome, yet they are still active elements, capable of causing mutational damage if their activity is left unchecked (Khurana, Xu, Weng, & Theurkauf, 2010; Savitsky, Kravchuk, Melnikova, & Georgiev, 2002; Savitsky, Kwon, Georgiev, Kalmykova, & Gvozdev, 2006). All three elements have been shown to produce abundant piRNAs, and RNAi knockdown of piRNA pathway components leads to their upregulation (Savitsky et al., 2006; Shpiz & Kalmykova, 2011; Shpiz et al., 2011), consistent with the host genome acting to constrain their activity and raising the possibility that, despite being domesticated, these elements are still in conflict with their host (Y. C. Lee, Leek, & Levine, 2017).
There are multiple lines of evidence that this is indeed the case: the protein components of Drosophila telomeres are rapidly evolving under positive selection, potentially due to a role in preventing the HTT elements from overproliferation (Y. C. Lee et al., 2017). There is a high rate of gain and loss of HTT lineages within the melanogaster species group (Saint-Leandre, Nguyen, & Levine, 2019), and there is dramatic variation in telomere length among strains from the Drosophila Genetic Reference Panel (DGRP) (Wei et al., 2017). These observations are more consistent with evolution under conflict than with a stable symbiosis (Saint-Leandre et al., 2019). Furthermore, the nucleotide sequence of the HTT elements evolves extremely rapidly, especially in their unusually long 3’ UTRs (Casacuberta & Pardue, 2002; Danilevskaya, Tan, Wong, Alibhai, & Pardue, 1998). Within D. melanogaster, three TART subfamilies have been identified, known as TART-A, TART-B, and TART-C, which contain completely different 3’ UTRs (Sheen & Levis, 1994).
In this study we have characterized the presence of sequence within the coding region of the D. melanogaster nxf2 gene that was previously annotated as an insertion of the TART-A transposon (Sackton et al., 2009). We find that the shared homology between TART-A and nxf2 is actually the result of TART-A acquiring a portion of the nxf2 gene, rather than the nxf2 gene gaining a TART-A insertion. We also find that nxf2 plays a role in suppressing TART-A activity, likely via the piRNA pathway. Our findings support a model where TART-A produces antisense piRNAs that target nxf2 for suppression as a counter-defense strategy in response to host silencing. We identified nxf2 cleavage products from degradome-seq data that are consistent with Aub-directed cleavage of nxf2 transcripts and we find that, across the Drosophila Genetic Reference Panel (DGRP), TART-A copy number is negatively correlated with nxf2 expression. Our findings suggest that TEs can selfishly manipulate host silencing pathways in order to increase their own copy number and that a single TE family can benefit, as well as antagonize, its host genome.
Results
The TART-like region of nxf2 is conserved across the melanogaster group
It was previously reported that the homology between nxf2 and TART-A is due to an insertion of the TART-A transposable element in the nxf2 gene that became fixed in the ancestor of D. melanogaster and D. simulans (Sackton et al., 2009). To investigate the homology between these elements in more detail, we first extracted 700 bp of sequence from the 3’ region of the nxf2 gene that was annotated as a TART-A insertion (Figure 1A) and used BLAST (Altschul, Gish, Miller, Myers, & Lipman, 1990) to search this sequence against the TART-A RepBase sequence, which was derived from a full-length TART-A element cloned from the iso1 D. melanogaster reference strain (Abad et al., 2004a). Within the 700 bp segment of nxf2, there are four regions of homology to the 3’ UTR of the TART-A consensus sequence. These regions are between 63 bp and 228 bp in length and share 93%-96% sequence identity with TART-A (Figure 1B). The 5’ UTR of TART-A is copied from its 3’ UTR during reverse transcription, which means that, for a given element, both UTRs are identical in sequence (George, Traverse, DeBaryshe, Kelley, & Pardue, 2010). The homology between nxf2 and the TART-A 3’ UTR is therefore mirrored in the TART-A 5’ UTR as well (Figure 1B).
To investigate the evolutionary origin of the homology between nxf2 and TART-A, we identified nxf2 orthologs in D. simulans, D. yakuba, D. erecta, D. biarmipes, and D. elegans. We created a multiple sequence alignment and extracted the sub-alignment corresponding to the 700 bp segment with homology to TART-A (Figure 1C). The TART-like region of nxf2 is clearly present in all six of these species, which means that, if this portion of the nxf2 gene was derived from an insertion of a TART-A element, the most recent timepoint at which the insertion could have occurred is in the common ancestor of the melanogaster group, ∼15 million years ago (Obbard et al., 2012). At the nucleotide level, there is only weak homology between nxf2 coding sequence and transcripts from more distantly related Drosophila species, such as D. pseudoobscura. However, at the peptide level, the C-terminal region of Nxf2, which was thought to be derived from TART-A, is actually conserved across Drosophila, from D. melanogaster to D. virilis (Figure S1), suggesting that, if a TART-A element did insert into the nxf2 gene, it was not a recent event.
A portion of nxf2 was captured by the D. melanogaster TART-A element
If an ancestral TART-A element was inserted into the nxf2 gene in the common ancestor of the melanogaster group, the shared homology between nxf2 and TART-A should be present in most, if not all, extant species in the group. To test this prediction, we obtained the sequences for previously identified TART-A homologs from D. yakuba and D. sechellia (Casacuberta & Pardue, 2002; Villasante et al., 2007). We aligned these sequences to the D. melanogaster TART-A consensus sequence and found that the TART-A region that shares homology with the nxf2 gene is only present in the D. melanogaster TART-A sequence (Figure 2A & S2). Next, we used BLAST to search the canonical TART-A sequence against the D. melanogaster reference genome. We identified 5 full-length TART-A sequences in the assembly (3 from the X chromosome and 2 from the dot chromosome), all of which contain the nxf2-like sequence. The nxf2-like sequence from these five elements is 100% identical to that from the canonical TART-A sequence. We also identified an additional four TART-A fragments that overlapped with the nxf2-like region. One of the four is also 100% identical to the canonical sequence while the remaining three are between 96%-99% identical to the canonical sequence.
We added these nine sequences to the multiple sequence alignment in Figure 1C and inferred a maximum likelihood phylogeny in order to better understand the evolutionary history of the nxf2/TART shared homology (Figure 2B). The youngest node in the phylogeny represents the split between the D. melanogaster nxf2 and TART-A elements, suggesting that the event leading to the shared homology between these sequences occurred relatively recently, which is consistent with the high degree of sequence similarity between the D. melanogaster TART-A and nxf2 subsequences. Based on these results, we conclude that the nxf2/TART-A shared homology is much more likely to have arisen via the recent acquisition of nxf2 sequence by TART-A after the split of D. melanogaster from D. simulans/sechellia, rather than via an insertion of TART-A into the nxf2 gene. The mechanism by which TART-A could have acquired a portion of nxf2 is not clear; however, one possibility is transduction, a process in which genomic regions flanking a TE insertion are incorporated into the TE itself due to aberrant retrotransposition (Moran, DeBerardinis, & Kazazian, 1999; Pickeral, Makalowski, Boguski, & Boeke, 2000).
The nxf2 gene plays a role in suppressing the activity of D. melanogaster telomeric elements
Nxf2 is part of an evolutionarily conserved gene family with functions related to export of RNA from the nucleus (Herold et al., 2000). In Drosophila, a paralog of nxf2 (nxf1) has been shown to be involved in the nuclear export of piRNA precursors and the nxf2 gene itself was identified as a member of the germline piRNA pathway via an RNAi screen (Czech, Preall, McGinn, & Hannon, 2013; Dennis, Brasset, Sarkar, & Vaury, 2016). More recently, several studies have independently shown that Nxf2 is involved in the co-transcriptional silencing of transposons as part of a complex with Nxt1 and Panoramix (Batki et al., 2019; Fabry et al., 2019; Murano et al., 2019; Zhao et al., 2019). To determine whether nxf2 is involved in the suppression of TART-A, we used a short hairpin RNA (shRNA) from the Drosophila transgenic RNAi project (TRiP) with a nos-GAL4 driver to target and knock down expression of nxf2 in the ovaries. We sequenced total RNA from the nxf2 knockdown and a control knockdown of the white gene. We observed a strong increase in expression for a variety of TE families upon knockdown of nxf2 (Figure S3). The three telomeric elements HeT-A, TAHRE, and TART-A are among the top 10 most highly upregulated transposable elements, with HeT-A showing a ∼300-fold increase in expression in the nxf2 knockdown (TAHRE: ∼110-fold increase, TART-A: ∼30-fold increase)(Figure 3). We repeated the experiment using an shRNA that targeted a different region of nxf2 and observed a similar pattern and a strong correlation between the TE expression profiles of both knockdowns (Spearman’s rho=0.94, Figure S4). These results support previous findings that nxf2 is a component of the germline piRNA pathway and show that this gene is particularly important for the suppression of the telomeric TEs HeT-A, TAHRE, and TART-A.
TART-A piRNAs may target nxf2 for silencing
Previous studies have reported abundant piRNAs derived from the telomeric TEs, HeT-A, TAHRE and TART-A (Savitsky et al., 2006; Shpiz et al., 2007; Shpiz et al., 2011). We sought to determine whether piRNAs arising from the nxf2-like region of TART-A could be targeting the nxf2 gene for downregulation via the piRNA pathway. We used previously published piRNA data from 16 wild-derived strains from the Drosophila Genetic Reference Panel (DGRP)(Song et al., 2014). Because the 5’ UTR is copied from the 3’ UTR, we masked the 5’ UTR of TART-A before aligning the piRNA data. Among the 16 strains, we found a large variation in TART-A piRNA production ranging from 60 – 12,300 reads per million (RPM). From the pool of 16 strains, we identified ∼1.3 million reads that aligned to TART-A, 98% of which map uniquely (see Methods)(Figure 4A). TART-A piRNAs have previously been shown to exhibit the 10bp overlap signature of ping-pong cycle amplification (Hur et al., 2016) and we identified both sense and antisense piRNAs arising from TART-A (Figure 4B) as well as an enrichment of alignments where the 5’ end of one piRNA is found directly after the 3’ end of the previous piRNA (i.e. 3’ to 5’ distance of 1), consistent with piRNA phasing (Figure 4C). We identified ∼95,000 piRNAs arising from the TART-A region that shares homology with nxf2. Of these reads, 59% are antisense to TART-A and 41% are sense.
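The phasing signature described above (an excess of 3’ to 5’ distances equal to 1) can be illustrated with a short sketch. This is not the analysis pipeline used in this study: the function name, the read representation (1-based inclusive start and end coordinates, all on one strand), the window size, and the toy reads are assumptions made for illustration only.

```python
from collections import Counter

def three_to_five_prime_distances(reads, max_dist=50):
    """Tally distances from each read's 3' end to downstream 5' ends.

    reads: list of (start, end) pairs, 1-based inclusive, all on one strand
    (assumed layout). A distance of 1 means the next read begins on the
    base immediately after the previous read ends.
    """
    starts = [start for start, _ in reads]
    counts = Counter()
    for _, end in reads:
        for start in starts:
            dist = start - end
            if 1 <= dist <= max_dist:
                counts[dist] += 1
    return counts

# Hypothetical toy reads: three head-to-tail reads plus one offset read.
toy_reads = [(1, 25), (26, 50), (51, 76), (30, 55)]
distances = three_to_five_prime_distances(toy_reads)
print(distances.most_common(1))  # [(1, 2)]
```

In real data, an enrichment at distance 1 relative to the other distances in the window is what would be read as evidence of phasing.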
We next focused on piRNA production from nxf2. We reasoned that, if nxf2 expression is subject to piRNA-mediated regulation, we should see piRNAs derived from the nxf2 transcript, outside of the region that shares homology with TART-A. We masked the nxf2/TART-A region of shared homology and aligned the piRNA sequence data to the nxf2 transcript. We found low but consistent production of piRNAs from nxf2 across all 16 DGRP strains (between 1.5 and 41 RPM), with 99.7% of nxf2-aligned reads mapping uniquely. To increase sequencing depth, we pooled the data from all 16 strains (2,624 nxf2 reads total) and examined piRNA abundance along the nxf2 transcript (Figure 4D). We found that the most abundant production of piRNAs from nxf2 occurs at the 3’ end of the transcript, downstream from the regions of shared homology with TART-A (Figure 4D). Overall, 99.4% of reads from nxf2 are derived from the sense strand of the transcript (Figure 4E) and the nxf2 piRNAs also show evidence of phasing (Figure 4F). The enrichment of nxf2-derived piRNAs downstream from the region of shared homology with TART-A, along with our observation that almost all nxf2 piRNAs are derived from the sense strand, suggests that these piRNAs are not amplified via the ping-pong cycle, but are instead produced by the Zucchini-mediated phasing process.
These results are consistent with a model where antisense piRNAs from the nxf2-like region of TART-A are bound by Aubergine and targeted to sense transcripts from the nxf2 gene. Aub cleaves target transcripts between the bases paired to the 10th and 11th nucleotides of its guide piRNA, resulting in a cleavage product with a 5’ monophosphate that shares a 10 bp sense:antisense overlap with the guide piRNA that triggered the cleavage. These cleavage products can be enriched and sequenced using an approach known as degradome-seq (Addo-Quaye, Eshoo, Bartel, & Axtell, 2008). We analyzed published degradome-seq and Aub-immunoprecipitated piRNA data from wild-type D. melanogaster ovaries (W. Wang et al., 2014) to determine whether we could detect nxf2 cleavage products resulting from targeting by antisense TART-A piRNAs. The degradome-seq data are 100 bp paired-end reads which are long enough to distinguish between the TART-like region of nxf2 and the nxf2-like region of TART-A. We found three locations within the TART-like region of nxf2 where we observe degradome cleavage products that share the characteristic 10bp sense:antisense overlap with TART-A antisense piRNAs (Figure S5). These results can be explained under the following model: TART-A antisense piRNAs are produced by the ping-pong cycle and bound to Aubergine. A subset of these piRNAs (those from the nxf2-like region of TART-A) guide Aub to nxf2 transcripts which are then cleaved. Aub cleavage products can be further processed by Zucchini in the 5’ to 3’ direction thereby producing phased piRNAs from nxf2 transcripts downstream from the nxf2/TART-A regions of shared homology (Figure 5).
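The cleavage rule in this model can be made concrete with a minimal sketch. It assumes that antisense guide piRNAs and degradome read 5’ ends have both been expressed in sense-transcript coordinates (1-based, inclusive); the function names and the toy coordinates are hypothetical and do not correspond to the positions reported in Figure S5.

```python
def predicted_product_5p_end(guide_start, guide_end):
    """Predict the 5' end of the cleavage product for an antisense guide.

    The guide covers transcript positions guide_start..guide_end, so guide
    nucleotide 1 pairs with guide_end and nucleotide 10 pairs with
    guide_end - 9. Cleavage between the bases paired to nucleotides 10 and
    11 leaves a downstream product whose 5' end is at guide_end - 9.
    """
    return guide_end - 9

def supported_guides(antisense_guides, degradome_5p_ends):
    """Return guides whose predicted cleavage site matches a degradome read."""
    observed = set(degradome_5p_ends)
    return [g for g in antisense_guides
            if predicted_product_5p_end(*g) in observed]

# Hypothetical toy data in sense-transcript coordinates.
guides = [(100, 125), (200, 224)]
degradome_ends = [116, 300]
supported = supported_guides(guides, degradome_ends)
print(supported)  # [(100, 125)]

# The product (5' end at 116) and the guide (ending at 125) share 10 bases.
overlap = guides[0][1] - predicted_product_5p_end(*guides[0]) + 1
```

A degradome read that starts exactly at the predicted position therefore overlaps its guide by 10 bp, which is the sense:antisense signature searched for in the degradome-seq data.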
If piRNAs from TART-A are targeting nxf2 and downregulating its expression, knockdown of piRNA pathway components that either decrease piRNA production from TART-A (ping-pong and/or primary piRNA pathway components) or disrupt silencing of nxf2 (primary piRNA components) should result in an increase in expression of nxf2. We analyzed published RNA-seq data from nos-GAL4 driven knockdowns of sixteen genes that were identified as components of the piRNA pathway and that were specifically shown to be involved in repression of HeT-A and TAHRE (Czech et al., 2013). We compared the expression of nxf2 in each piRNA component knockdown to its expression in the control knockdown of the white gene and found that nxf2 shows increased expression in 14 of the 16 knockdowns, which represents a significant skew towards upregulation (one-sided binomial test P=0.002)(Figure 6).
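The reported P-value can be reproduced with a short calculation. The sketch below assumes that the null hypothesis gives each knockdown an equal (0.5) chance of showing increased or decreased nxf2 expression; that null probability and the function name are assumptions, since the test is only described as a one-sided binomial test.

```python
from math import comb

def binomial_upper_tail(successes, trials, p=0.5):
    """Probability of observing at least `successes` out of `trials`."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# 14 of 16 knockdowns show increased nxf2 expression.
p_value = binomial_upper_tail(14, 16)
print(round(p_value, 4))  # 0.0021
```

Under this assumption the tail probability is 137/65,536, which rounds to the P=0.002 given above.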
Natural variation in TART-A copy number is correlated with nxf2 expression levels
Previous work has shown that there is large variation in HTT element copy number at the telomeres of wild Drosophila strains (Walter et al., 2007; Wei et al., 2017). Our results predict that, if TART-A piRNAs are targeting nxf2 for suppression, then strains with more copies of TART-A should have lower expression of nxf2. To test this prediction, we used previously published Illumina genomic sequencing data and microarray gene expression profiles from the Drosophila Genetic Reference Panel (DGRP)(Huang et al., 2014; Mackay et al., 2012). We used the Illumina data to infer TART-A copy number for 151 DGRP strains (see Methods) and obtained nxf2 microarray gene expression levels from whole adult females for these same strains. We found that, as predicted, there is a strong negative correlation between TART-A copy number and nxf2 gene expression levels among the DGRP (Figure 7) (Spearman’s rho = -0.48, P=4.6e-10).
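For readers unfamiliar with the statistic, Spearman’s rho is the Pearson correlation of the ranked values. The sketch below is a plain-Python illustration with made-up numbers; the function names and the five toy data points are hypothetical and are not DGRP measurements.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy values: TART-A copy number and nxf2 expression per strain.
copy_number = [2, 5, 3, 8, 6]
expression = [9.1, 7.4, 8.8, 6.0, 7.9]
rho = spearman_rho(copy_number, expression)
```

Because the statistic depends only on ranks, it is insensitive to the different scales of copy-number estimates and microarray expression values.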
Discussion
If the coding sequence of a gene shares sequence homology with a known transposable element, the most likely explanation for this shared homology is that a portion of the gene was derived from a TE insertion. This is, understandably, what was previously reported by Sackton et al. for the nxf2 gene and the TART-A TE (Sackton et al., 2009); however, our analyses are not consistent with such a scenario. Specifically, based on sequence similarity and phylogenetic clustering, the event that created the shared homology between nxf2 and TART-A must have occurred relatively recently, after D. melanogaster diverged from D. simulans, yet the putative insertion of TART-A in the nxf2 gene is shared across Drosophila. A scenario that is more consistent with these observations is one where, rather than the nxf2 gene gaining sequence from TART-A, the TART-A element captured a portion of the nxf2 gene, likely via aberrant transcription that extended past the internal TART-A poly-A signal to another poly-A signal in the flanking genomic region. This process has been observed for other TEs and is known as exon shuffling or transduction (Moran et al., 1999; Pickeral et al., 2000). Notably, the nxf2-like sequence of TART-A is located in its 3’ UTR, which would be expected if it were acquired via transduction (Figure 1). Interestingly, TART is part of the LINE family of non-LTR retrotransposons and human LINE-1 (L1) elements are known to undergo transduction fairly frequently (Goodier, Ostertag, & Kazazian, 2000; Moran et al., 1999; Pickeral et al., 2000). However, transduction would require that an active TART-A element was inserted somewhere upstream of the 3’ region of nxf2 at some point in the D. melanogaster lineage, but has since been lost from the population. Is this possible given that TART-A should only replicate to chromosome ends? The TIDAL-fly database of polymorphic TEs in D. melanogaster reports several polymorphic TART-A insertions far from the chromosome ends, which suggests that this element is occasionally capable of inserting into locations outside of the telomeres (Rahman et al., 2015).
The aberrant TART-A copy that acquired a portion of the nxf2 gene most likely arose as a single polymorphic insertion in an ancestral D. melanogaster population, yet the nxf2-like region of TART-A is now present in all full-length TART-A elements in the D. melanogaster reference genome assembly. We were unable to find any D. melanogaster TART-A elements in the reference genome, or in GenBank, whose 3’ UTR lacks the nxf2-like sequence. This suggests that the initially aberrant TART-A copy, which acquired a portion of nxf2, has now replaced the ancestral TART-A element, consistent with the gene acquisition event conferring a fitness benefit to TART-A.
How could the gene acquisition benefit TART-A? We found that the nxf2-like region of TART-A produces abundant antisense piRNAs that share homology with the nxf2 gene, and the nxf2 gene produces additional phased piRNAs from the unique sequence directly downstream from the regions of shared homology (Figure 4). These two observations are consistent with a scenario where TART-derived piRNAs guide Aub proteins to the nxf2 transcript. The TART-A piRNAs may then act as “trigger” piRNAs that catalyze cleavage of nxf2 transcripts while also resulting in the production of phased piRNAs starting in the region of shared homology and proceeding in the 3’ direction to the end of the nxf2 transcript (Figure 5). The piRNA-mediated cleavage of nxf2 transcripts, which is supported by degradome-seq data (see Figure S5), should result in a reduction in nxf2 expression levels. PiRNA-mediated suppression of nxf2 is consistent with our finding that disruption of the piRNA pathway by RNAi tends to result in increased nxf2 expression (Figure 6). Given that nxf2 plays a role in suppressing TART-A activity, reduced nxf2 levels should relieve TART-A suppression, which would presumably increase TART-A fitness by allowing it to make more copies of itself. Indeed, in the DGRP, we find that individuals with lower nxf2 expression levels tend to have higher numbers of TART-A copies and vice versa (Figure 7).
If additional copies of TART-A act to further suppress nxf2 expression, which then further de-represses TART-A, why is there not run-away accumulation of telomere length in D. melanogaster? Previous work has shown that long telomeres in D. melanogaster are associated with both reduced fertility and fecundity (Walter et al., 2007), so it is possible that a run-away trend towards increasing telomere length is balanced by a fitness cost.
Targeting of host transcripts by transposon-derived piRNAs has been previously observed in Drosophila. Most notably, piRNAs from the LTR retrotransposons roo and 412 play a critical role in embryonic development by targeting complementary sequence in the 3’ UTR of the gene nos, leading to its repression in the soma (Rouget et al., 2010). More recent results suggest hundreds of maternal transcripts could be regulated in a similar fashion (Barckmann et al., 2015). However, these represent cases where TE piRNAs have been co-opted to regulate host transcripts, whereas our results suggest that the piRNA targeting of nxf2 is a counter-defense strategy by TART-A. This type of strategy has only been previously observed in plants (Cosby et al., 2019). In rice, a CACTA DNA transposon produces a micro-RNA that targets a host methyltransferase gene known to be involved in TE suppression (Nosaka et al., 2012), while in Arabidopsis, siRNAs from Athila6 retrotransposons target the stress granule protein UBP1b, which is involved in suppressing Athila6 GAG protein production (McCue et al., 2013).
Given that viruses and other pathogens have evolved a variety of methods to block or disrupt host defense mechanisms, it is surprising that there is much less evidence for TEs adopting similar strategies (Cosby et al., 2019). However, unlike viruses, TEs depend heavily on vertical transmission from parent to offspring. Any counter-defense strategy that impacts host fitness would therefore decrease the fitness of the TE as well. Furthermore, disruption of host silencing is likely to lead to upregulation of other TEs, making it more likely that there will be a severe decrease in host fitness, similar to what is observed in hybrid dysgenesis. These explanations are relevant to our results: TART-A may be targeting nxf2 for its own advantage, but our knockdown experiment shows that nxf2 suppression causes upregulation of many other TEs besides TART-A (Figures 3 and S3) and other studies have shown that nxf2 mutants are sterile (Batki et al., 2019; Fabry et al., 2019). Why, then, does TART-A appear to be targeting nxf2 in spite of these potentially deleterious consequences? One possibility is that the suppression of nxf2 expression caused by TART-A is relatively mild (i.e. much less than the level of down-regulation caused by the RNAi knockdown), which is enough to provide a slight benefit to TART-A without causing widespread TE activation. It is also possible that the suppression effect was initially much larger, but has since been counterbalanced by cis-acting variants that increase nxf2 expression. Future work examining TE activation under varying levels of nxf2 expression may help to determine whether there is a tipping point where nxf2 suppression becomes catastrophic.
In summary, our results show that so-called domesticated TEs, if active, can still be in conflict with their host and raise the possibility that TE counter-defense strategies may be more common than previously recognized, despite the potentially deleterious consequences for the host.
Methods
TART-A sequence analysis
We used the TART-A sequence from RepBase (Jurka, 2000), which is derived from the sequence reported in (Abad et al., 2004a) (Genbank accession AJ566116). This sequence represents a single full-length TART-A element cloned from the D. melanogaster iso1 reference strain. The nxf2-like portion of this sequence is 100% identical to another TART-A element cloned and sequenced from D. melanogaster strain A4-4 (Genbank DMU02279)(Levis et al., 1993) as well as the TART-A sequence from the FlyBase canonical set of transposon sequences (version 9.42)(Thurmond et al., 2019) (cloned from D. melanogaster strain Oregon-R: Genbank AY561850)(Berloco, Fanti, Sheen, Levis, & Pimpinelli, 2005).
We used BLAST (Altschul et al., 1990) to compare the TART-A sequence to the D. melanogaster nxf2 transcript and visualized BLAST alignments with Kablammo (Wintersinger & Wasmuth, 2015). To compare TART-A among Drosophila species, we used the D. yakuba TART-A sequence reported in (Casacuberta & Pardue, 2002)(GenBank AF468026), which includes the 3’ UTR. We also used the D. sechellia TART-A ORF2 reported by (Villasante et al., 2007)(Genbank AM040251) to search the D. sechellia FlyBase r1.3 genome assembly for a TART-A copy that included the 3’ UTR, which we found on scaffold_330:4944-14419. We attempted a similar approach for D. simulans, but were unable to find a TART-A copy in the D. simulans FlyBase r2.02 assembly that included the 3’ UTR. We aligned the D. melanogaster, D. yakuba and D. sechellia TART-A sequences to each other, and to the D. melanogaster nxf2 transcript (FlyBase FBtr0089479), using nucmer (Kurtz et al., 2004). We then used mummerplot (Kurtz et al., 2004) to create a dotplot to visualize the alignments. To identify all copies of TART-A carrying the nxf2-like sequence, we used BLAST to search the TART-A 3’ UTR against the D. melanogaster release 6 reference genome.
nxf2 sequence analysis
We downloaded nxf2 transcripts from the NCBI RefSeq database for Drosophila simulans (XM_016169386.1), yakuba (XM_002095083.2), erecta (XM_001973010.3), biarmipes (XM_017111057.1), and elegans (XM_017273027.1) and created a codon-aware multiple sequence alignment using PRANK (Loytynoja, 2014), which we visualized with JalView (Waterhouse, Procter, Martin, Clamp, & Barton, 2009). To compare Nxf2 peptide sequences, we used the web version of NCBI BLAST to search the D. melanogaster Nxf2 peptide sequence against all Drosophila peptide sequences present in the RefSeq database. We then used the NCBI COBALT (Papadopoulos & Agarwala, 2007) multiple-sequence alignment tool to align the sequences shown in Figure S1.
TART-A/nxf2 gene tree
We extracted the nxf2-like sequences from all TART-A copies present in the D. melanogaster reference genome and aligned them to the TART-like nxf2 sequences from seven Drosophila species using PRANK. We then inferred a maximum likelihood phylogeny with 100 bootstrap replicates using RAxML (Stamatakis, 2014).
nxf2 knockdown
We used two different strains from the Drosophila Transgenic RNAi Project (TRiP) that express dsRNA for RNAi of nxf2 (Bloomington #34957 & #33985), as well as a control strain for RNAi of the white gene (Bloomington #33613). Seven males of each of these strains were crossed to seven 3-5 day old virgin females carrying the nos-GAL4 driver (Bloomington #25751). After 6 days of mating, we discarded the parental flies and then transferred F1 offspring to fresh food for 2.5 days before collecting ovaries from six females for each cross. We performed two biological replicates for each of the three crosses, dissected the ovaries in 1x PBS and immediately transferred them to RNAlater. We extracted RNA using Trizol/Phenol-Chloroform and used the AATI Fragment Analyzer to assess RNA integrity. We then prepared stranded, total RNA-seq libraries by first depleting rRNA with ribo-zero and then using the NEBnext ULTRA II library prep kit to prepare the sequencing libraries. The libraries were sequenced on an Illumina NextSeq instrument with 150 bp paired-end reads.
nxf2 knockdown RNA-seq analysis
The average insert sizes of the total RNA-seq libraries were less than 300 bp, which resulted in overlapping mate pairs for the majority of sequenced fragments. Rather than analyzing these data as paired-end reads, we merged the overlapping mates to generate single-end reads using BBmerge (Bushnell, Rood, & Singer, 2017). We removed rRNA and tRNA contamination from the merged reads by aligning them to all annotated rRNA and tRNA sequences in the D. melanogaster reference genome using Hisat2 (Kim, Langmead, & Salzberg, 2015) and retained all unaligned reads. In order to quantify expression from genes as well as TEs, we combined all D. melanogaster transcript sequences (FlyBase version 6.26) with D. melanogaster RepBase TE consensus sequences. We accounted for multi-mapping reads by using bowtie2 (Langmead & Salzberg, 2012) to align each read to all possible alignment locations (using --all and --very-sensitive-local) and then using eXpress (Roberts & Pachter, 2013) to estimate FPKM values, accounting for the multi-mapped alignments. We averaged FPKM values between biological replicates and assessed the reproducibility of both TE and gene expression profiles in the nxf2 knockdown by comparing the results from the two different dsRNA hairpins.
piRNA analysis
We analyzed previously published piRNA data from 16 strains from the Drosophila Genetic Reference Panel (DGRP)(Song et al., 2014). We used cutadapt (Martin, 2011) to trim adapter sequences from each library and then removed rRNA and tRNA sequences by using bowtie (Langmead, 2010) to align the reads to all annotated rDNA and tRNA genes in the D. melanogaster reference genome, retaining the reads that did not align. We then created a reference database composed of the following sequence sets: a hard-masked version of the D. melanogaster reference genome assembly (release 6) where all TE sequences and the nxf2 gene were replaced by N’s using RepeatMasker, the full set of D. melanogaster RepBase TE consensus sequences, and the nxf2 transcript, with its TART-like region replaced by N’s. We used the unique-weighting mode in ShortStack (Axtell, 2013; Johnson, Yeoh, Coruh, & Axtell, 2016) to align the piRNA reads to this reference database. With this mode, ShortStack probabilistically aligns multi-mapping reads based on the abundance of uniquely mapping reads in the flanking region. We then used the ShortStack alignments and Bedtools (Quinlan & Hall, 2010) to calculate coverage for sense and antisense alignments to TART-A as well as nxf2. To test for evidence of piRNA phasing, we used the formula described in (Han et al., 2015).
piRNA component knockdowns
We used the RNA-seq counts for nxf2 reported in GEO accession GSE117217 from 16 RNAi knockdowns of piRNA pathway components as well as a control knockdown of the Yb gene (Czech et al., 2013). For each knockdown, we normalized nxf2 expression by dividing the raw counts by the sum of all gene counts and reported the result in Reads Per Million (RPM).
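The normalization described above amounts to scaling a gene's raw count by the library total. A minimal Python sketch follows; the function name and the toy counts are hypothetical and are not taken from GSE117217.

```python
def reads_per_million(gene_counts, gene="nxf2"):
    """Scale one gene's raw count by the sum of all gene counts, per million reads."""
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("library has no counted reads")
    return gene_counts[gene] * 1_000_000 / total


# Hypothetical toy counts for a single knockdown library (illustration only).
toy_counts = {"nxf2": 50, "geneA": 400_000, "geneB": 599_950}
nxf2_rpm = reads_per_million(toy_counts)
```

Applying the same function to each knockdown library would put the nxf2 values on a common scale regardless of sequencing depth.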
Degradome-seq analysis
We used degradome-seq and Aub-immunoprecipitated small RNA data from wild-type D. melanogaster strain w1 (W. Wang et al., 2014). We used bowtie2 to align the degradome-seq data to the same reference sequence used in the piRNA analysis except we unmasked the nxf2 transcript. We analyzed the small RNA data as described under “piRNA analysis” and then used bedtools to extract degradome read alignments whose 5’ end was located in the TART-like region of nxf2 and antisense small RNA alignments whose 5’ end was located in the nxf2-like region of TART-A and whose length was consistent with piRNAs (23-30 bp). We then used bowtie to align the minus strand piRNAs to the nxf2 transcript and used bedtools to identify piRNAs whose 5’ end overlapped the 5’ end of degradome reads by 10 basepairs.
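The 10 bp 5’ overlap test can be illustrated with a short Python sketch. This is not the bedtools procedure used here; it assumes reads are given as hypothetical (start, end, strand) tuples in 0-based, half-open coordinates on the nxf2 transcript, with degradome reads on the plus strand and piRNAs on the minus strand.

```python
def five_prime(start, end, strand):
    """Return the 0-based position of a read's 5' end (half-open coordinates)."""
    return start if strand == "+" else end - 1


def overlapping_pairs(degradome, pirnas, overlap=10):
    """Pair plus-strand degradome reads with minus-strand piRNAs whose 5' ends
    define an overlap of exactly `overlap` bases."""
    # Index minus-strand piRNAs by the position of their 5' end.
    minus_by_5p = {}
    for start, end, strand in pirnas:
        if strand == "-":
            minus_by_5p.setdefault(five_prime(start, end, strand), []).append(
                (start, end, strand)
            )
    pairs = []
    for start, end, strand in degradome:
        if strand != "+":
            continue
        # A partner piRNA has its 5' end `overlap - 1` bases downstream.
        partner_pos = five_prime(start, end, strand) + overlap - 1
        for pirna in minus_by_5p.get(partner_pos, []):
            pairs.append(((start, end, strand), pirna))
    return pairs


# Hypothetical toy alignments (illustration only).
toy_degradome = [(100, 150, "+")]
toy_pirnas = [(84, 110, "-"), (90, 115, "-")]
toy_pairs = overlapping_pairs(toy_degradome, toy_pirnas)
```

In the toy data, only the first piRNA has its 5’ end (position 109) nine bases downstream of the degradome read’s 5’ end (position 100), so the two reads share bases 100 through 109.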
TART-A copy number variation and nxf2 expression
We used Illumina genomic sequencing data from the DGRP (Huang et al., 2014; Mackay et al., 2012) to estimate TART-A copy number. Across strains, the DGRP Illumina data differ in terms of coverage, read length, and paired versus single-end data. To attempt to control for these differences, we trimmed all reads to 75 bp and treated all data as single-end. We also downsampled all libraries to ∼13 million reads. We first trimmed each strain’s complete dataset (unix command: zcat file.fastq.gz | cut -c 1-75) and then aligned the trimmed reads to the D. melanogaster release 6 genome assembly using bowtie2 with the --very-sensitive option. We then corrected the resulting bam file for GC bias using DeepTools (Ramirez, Dundar, Diehl, Gruning, & Manke, 2014) and counted the number of aligned reads in the corrected bam file using samtools (Li et al., 2009). We removed all strains with fewer than 13 million aligned reads and, for each remaining strain, we calculated the fraction of reads to keep by dividing the smallest number of aligned reads across all remaining strains (13,594,737) by the total number of aligned reads for that strain. We then used this fraction to randomly downsample the GC corrected bam file using the subsample option from samtools view (Li et al., 2009). We converted each bam file to a fastq file with samtools fastq and aligned the fastq file to the D. melanogaster RepBase TE sequences with bowtie2 using the --very-sensitive, --local, and --all options. With --all, bowtie2 reports every possible alignment for each multi-mapping sequence. We then used eXpress to retain a single alignment for each multi-mapping sequence based on the abundance of neighboring unique alignments. We used the eXpress bam files to calculate the median per-base coverage (excluding positions with coverage of zero) for the TART-A coding sequence (i.e. ORF1 & ORF2), for each individual.
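The read trimming and the per-strain downsampling fraction can be sketched in Python as follows. The strain names and all read counts other than 13,594,737 are hypothetical, and the sketch only computes the fractions; the downsampling itself was done with samtools view.

```python
def trim_read(sequence, length=75):
    """Keep only the first `length` bases of a read."""
    return sequence[:length]


def downsample_fractions(aligned_counts, min_reads=13_000_000):
    """Drop strains below `min_reads`, then return the fraction of reads to keep
    per strain so that every strain matches the smallest remaining library."""
    kept = {strain: n for strain, n in aligned_counts.items() if n >= min_reads}
    if not kept:
        raise ValueError("no strain passes the read-count threshold")
    target = min(kept.values())
    return {strain: target / n for strain, n in kept.items()}


# Hypothetical strains and counts; only 13,594,737 comes from the text above.
toy_counts = {
    "strainA": 13_594_737,
    "strainB": 27_189_474,
    "strainC": 9_000_000,
}
toy_fractions = downsample_fractions(toy_counts)
toy_trimmed = trim_read("ACGT" * 25)
```

Here the smallest passing library is kept whole, a library twice its size is downsampled by half, and the library below the threshold is excluded.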
To estimate TART-A copy number, we divided the median TART-A coverage of each strain by that strain’s median per-base coverage of all uniquely-mappable positions in the D. melanogaster reference genome (calculated from the GC corrected, downsampled bam file). Uniquely-mappable positions were identified using mirth (https://github.com/EvolBioInf/mirth). We obtained nxf2 expression values from previously published microarray gene expression profiles from whole adult females for all DGRP strains (Huang et al., 2015).
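The copy number estimate is a ratio of two median coverages, which the following Python sketch illustrates. The per-base coverage lists and the mappability mask are hypothetical toy inputs. Zero-coverage positions are excluded for TART-A as described above; how zero-coverage genomic positions were treated is not stated, so the sketch leaves them in.

```python
from statistics import median


def estimate_copy_number(te_coverage, genome_coverage, uniquely_mappable):
    """Divide the median nonzero TE coverage by the median coverage of
    uniquely mappable genomic positions."""
    te_nonzero = [depth for depth in te_coverage if depth > 0]
    unique = [
        depth for depth, ok in zip(genome_coverage, uniquely_mappable) if ok
    ]
    if not te_nonzero or not unique:
        raise ValueError("no usable positions for the estimate")
    baseline = median(unique)
    if baseline == 0:
        raise ValueError("genomic baseline coverage is zero")
    return median(te_nonzero) / baseline


# Hypothetical toy coverage values (illustration only).
toy_te = [0, 30, 40, 50]
toy_genome = [10, 200, 10, 12, 8]
toy_mask = [True, False, True, True, True]
toy_copy_number = estimate_copy_number(toy_te, toy_genome, toy_mask)
```

In the toy example the median nonzero TART-A coverage is 40 and the median coverage of uniquely mappable positions is 10, giving an estimate of four copies; the position with coverage 200 is ignored because it is not uniquely mappable.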