Abstract
In Arabidopsis, LTR-retrotransposons are activated by mutations in the chromatin remodeler DECREASE in DNA METHYLATION 1 (DDM1), giving rise to 21-22nt epigenetically activated siRNAs (easiRNAs). We purified virus-like-particles (VLPs) from ddm1 and ddm1rdr6 mutants in which genomic RNA is reverse transcribed into complementary DNA. Next generation short-read and long-read sequencing of VLP DNA (VLP DNA-seq) revealed a comprehensive catalog of active LTR-retrotransposons without the need for mapping transposition, and independent of genomic copy number. Linear replication intermediates of ATCOPIA93/EVADE revealed multiple central polypurine tracts (cPPT), a feature shared with HIV where cPPT promote nuclear localization. For ATCOPIA52, cPPT intermediates were not observed, but abundant circular DNA indicated transposon “suicide” by auto-integration within the VLP. easiRNA targeted ATCOPIA93/EVADE genomic RNA, polysome association of GYPSY (ATHILA) subgenomic RNA, and transcription via histone H3 lysine-9 dimethylation. VLP DNA-seq provides a comprehensive landscape of LTR-retrotransposons, and their control at transcriptional, post-transcriptional and reverse transcriptional levels.
Introduction
Long terminal repeat (LTR) retrotransposons are a major component of the large genomes of most animal and plant species (Huang et al., 2012; Wang et al., 2014). However, the relative activity of any individual element can only be assessed using transposition assays, or by comparing insertion sites among individuals within a population. The mouse genome, for example, contains more than one million endogenous retroviruses, of which only a handful are autonomous elements. We have developed an alternative strategy for retrotransposon discovery, using next generation sequencing of replication intermediates from viral-like particles (VLP). By sequencing intermediates from different genetic backgrounds, insights can be gained into mechanisms of genetic and epigenetic regulation.
VLPs have been isolated in yeast and Drosophila (Bachmann et al., 2004; Eichinger and Boeke, 1988; Kenna et al., 1998) as well as in plants (Bachmair et al., 2004; Jaaskelainen et al., 1999). Ty1/Copia elements in plants have a single open reading frame that encodes both the GAG protein, which is the capsid protein responsible for VLP formation, and the reverse transcriptase, RNase H, and integrase polyprotein (POL) which are co-assembled with their genomic RNA (gRNA) into VLPs (Finnegan, 2012; Pachulska-Wieczorek et al., 2016; Peterson-Burch and Voytas, 2002; Sabot and Schulman, 2006). Ty3/gypsy elements also have a single GAG-POL ORF, although the POL proteins are in a different order. In yeast, Ty1 uses a frameshift between GAG and POL to enhance translation of GAG. In both Drosophila and plants, the Ty1/copia GAG protein is translated from an abundant, alternatively spliced subgenomic RNA (Chang et al., 2013; Yoshioka et al., 1990). In Arabidopsis Ty1/copia elements, the subgenomic GAG RNA is more efficiently translated than unspliced GAG-POL transcripts, and blocking splicing leads to significant reduction of GAG protein translation (Oberlin et al., 2017). After VLP formation in the cytoplasm, LTR retrotransposons proliferate through tRNA-primed reverse transcription of gRNA, followed by nuclear import of cDNA and integration into new loci (Schorn and Martienssen, 2018) (Supplemental Fig. S1). In yeast and Arabidopsis, tRNA-iMet initiates reverse transcription of the LTR from the primer binding site (PBS) to the 5’ end of the R region making minus-strand strong-stop DNA (Griffiths et al., 2018; Mules et al., 1998; Schorn and Martienssen, 2018). RNase H degrades the template RNA upstream of the PBS, and minus-strand strong-stop DNA is transferred to the 3’ LTR to prime minus strand cDNA synthesis toward the PBS (Supplemental Fig. S1). During the extension of minus-strand cDNA synthesis, the template RNA is degraded except for an RNase H-resistant polypurine tract (PPT) near the 3’ LTR (Wilhelm et al., 2001). This PPT RNA fragment primes plus-strand strong-stop DNA synthesis up to U5 and the PBS sequence from the translocated minus strand (Supplemental Fig. S1B). Then, the plus-strand cDNA is transferred to the 5’ end to prime full length double-stranded DNA. Additional central PPT (cPPT) can also initiate plus-strand synthesis which is displaced by the 3’ end of plus-strand DNA causing DNA flaps to form during Ty1 replication (Garfinkel et al., 2006). cPPT and DNA flaps have been found in the highly active lentivirus HIV-1 where they play roles in nuclear import and in preventing mutagenesis by APOBEC (Hu et al., 2010; VandenDriessche et al., 2002; Wurtzer et al., 2006; Zennou et al., 2000).
Inhibition of retrotransposons by small RNA has been reported in metazoans and plants, as well as in fission yeast, and occurs at the transcriptional and post-transcriptional levels. In Drosophila, piwi-interacting RNA (piRNA) trigger transcriptional silencing of transposons in the germline (Czech et al., 2018) resembling fission yeast in this respect (Volpe et al., 2002). By contrast, Ago2 and Dcr2 lie in the post-transcriptional pathway, and mutations result in increased somatic retrotransposition (Xie et al., 2013). In mammalian embryos, 3’ tRNA fragments (3’-tRF) control transposition of LTR retrotransposons both post-transcriptionally and by direct inhibition of reverse transcription (Schorn et al., 2017). In Arabidopsis, transcriptional activation of some LTR retrotransposons by stress, or by loss of histone methylation, also requires loss of 24nt small RNA and RDR2/RNA polymerase IV (Ito et al., 2011; Mirouze et al., 2009). By contrast, in ddm1 mutants and wild-type pollen, most transposons are transcriptionally activated and 24nt siRNA are partly replaced by 21-22nt easiRNA (Slotkin et al., 2009). In ddm1 mutants, easiRNA are generated by RDR6 (Creasey et al., 2014; Nuthikattu et al., 2013) from the non-functional ATHILA2 and ATHILA6 Ty3/gypsy retrotransposons but also from the functional, TY1/copia element EVADE, and are triggered by diverse miRNA. In wild-type, retroelements generate easiRNA only in pollen, where they are targeted at the PBS by miR845, and biogenesis occurs via a non-canonical pathway (Borges et al., 2018).
In order to develop a comprehensive catalog of functional retrotransposons in Arabidopsis, we performed VLP DNA sequencing from ddm1 mutants, as well as genome-wide polysomal RNA (translatome) and chromatin immunoprecipitation (ChIP) sequencing. VLP sequencing recovered all known active retrotransposons in Arabidopsis, without the need for genome sequencing of transposition events. Replication intermediates revealed profound differences between elements with multiple cPPT and high integration rates, and elements with no cPPT which preferentially integrated into themselves (“suicidal” auto-integration within the VLP). We examined the roles of easiRNA in retrotransposon control by investigating ddm1rdr6 double mutants (Creasey et al., 2014; Lippman et al., 2004; Vongs et al., 1993). We found that some retrotransposons are regulated post-transcriptionally by RNA interference, while others are regulated at the transcriptional level by histone H3 lysine-9 methylation guided by small RNA. We conclude that easiRNA inhibits retrotransposition at multiple levels in the replication cycle and identify features of active retrotransposons that promote activity and escape from silencing.
Results
Characterization of functional LTR retrotransposons by VLP DNA sequencing
Functional LTR retrotransposons form VLPs assembled from GAG proteins (Sabot and Schulman, 2006) (Supplemental Fig. S1A). Reverse transcription occurs inside the VLPs, and cDNA products are subsequently imported into the nucleus bound to the integrase protein. After integration into new genomic loci these insertions transcribe additional gRNA. We purified VLPs after treatment with DNase I (Methods), and sequenced cDNA products from wild-type, ddm1, and ddm1rdr6 using both short read (Illumina) and long read (Oxford Nanopore Technologies) sequencing platforms (Supplemental Fig. S2). EVADE is one of two full length elements of the ATCOPIA93 family in A. thaliana Col-0, and when it is transcriptionally activated, it is the most successful retroelement by far in terms of copy number increases, although transposition of ATGP3, ATCOPIA13, ATCOPIA21, ATCOPIA31, ATCOPIA51, ATCOPIA63, ATGP2N, and ATRE1 have also been detected under non-stressed conditions (Ito et al., 2011; Quadrana et al., 2019; Tsukahara et al., 2009). Full length VLP DNA from all of these elements was dramatically enriched in ddm1 and ddm1rdr6 mutants consistent with active reverse transcription (Fig. 1; Supplemental Figs. S2,S3; Supplemental Table S1). VLP DNA from some ATCOPIA families were more enriched in ddm1 than ddm1rdr6, likely due to transcriptional down-regulation in ddm1rdr6 (Creasey et al., 2014). VLP DNA from ATHILA families were enriched in ddm1rdr6, but comprised small fragments, mostly from the LTR likely reflecting abortive retrotransposition intermediates from these non-functional elements (Supplemental Fig. S3A) (Marco and Marin, 2008). By contrast, long read coverage of EVADE and other active COPIA elements spanned the entire element, and was increased in ddm1rdr6 (Fig. 1B; Supplemental S2B; Supplemental Table S1). Furthermore, linear extrachromosomal DNA (ecDNA) was dramatically increased in ddm1rdr6 by Southern blot (Fig. 2A).
cDNA can exist in both linear and circular forms, and circular forms were previously reported for EVADE and other members of the ATCOPIA93 family (Lanciano et al., 2017; Reinders et al., 2013). We observed outward-facing paired-end read alignments from Illumina VLP-seq samples mapping to ATCOPIA51, ATCOPIA52, EVADE, and ATGP3, consistent with junction-crossing reads from circular templates (Fig. 2B). Outward-facing pairs appeared in the ddm1 and ddm1rdr6 samples, but not in WT, and were present in very low numbers after read de-duplication for most of the elements. Exceptionally, non-concordant read pairs were highly abundant in ATCOPIA52. Circular ecDNA formation was confirmed by inverse PCR whose products corresponded to one-LTR in size (Fig. 2C,D), and ATCOPIA52 was by far the most abundant. Double stranded one-LTR circular products are thought to be generated by integrase-mediated autointegration in VLP, or as gapped intermediates in cDNA synthesis (Garfinkel et al., 2006; Munir et al., 2013; Sloan and Wainberg, 2011). In contrast, two-LTR (tandem) circular DNA with junction nucleotides is formed in the nucleus by non-homologous end joining and enhanced when integrase is non-functional (Garfinkel et al., 2006; Sloan and Wainberg, 2011). The inverse PCR products of ATCOPIA52 were one-LTR in size, suggesting the circular DNA was either a gapped double stranded circular intermediate, or else a double stranded product of autointegration into same strands or opposite strands (Supplemental Fig. S1), which result in deletion circles, or inversion circles, respectively (Garfinkel et al., 2006; Munir et al., 2013; Sloan and Wainberg, 2011). Both inversion and deletion circles were detected in large numbers based on outward facing reverse-forward and forward-forward paired end reads, respectively, indicating auto-integration was the major source of these circles (Fig. 2B).
In yeast, auto-integration occurs near the central PPT (cPPT) taking advantage of a DNA flap structure (Garfinkel et al., 2006). There was no strong indication of a DNA flap based on polypurine sequences and read alignment in ATCOPIA52. We mapped individual long reads to investigate the integration sites (Fig. 3; Supplemental Figs. S1C, S3B). Deletion circles are predicted to have either the 5’ or the 3’ LTR, as well as a deleted portion of the full length cDNA, up to the integration site, while inversion circles have an inverted portion separating the two LTR (Garfinkel et al., 2006). Strikingly, many of the ATCOPIA52 ONT reads fell into these categories, comprising either the 5’ or the 3’ LTR contiguous with a truncated or inverted portion of the retrotransposon (Supplemental Fig. S1C). These structural variants indicated the presence of circularly permuted reads, which were presumably arbitrarily sheared during library preparation. Among all the COPIA and GYPSY elements examined, only ATCOPIA52 gave rise to large numbers of these structural variants. The inversions spanned diverse regions of the element, consistent with inversion circles. The deleted portions terminated at inferred autointegration sites, which were distributed throughout the length of the element, consistent with the lack of a cPPT flap in ATCOPIA52. One possibility is that nuclear import of cDNA is not efficient for ATCOPIA52, leading to elevated autointegration inside the VLP. This could be due to mutations in nuclear localization (Kenna et al., 1998), or else to reduced translation of the integrase gene (see below), although read distributions were comparable for ddm1 and ddm1rdr6, so easiRNA likely did not play a major role.
In sharp contrast, in EVADE we observed discontinuous regions of read alignments flanked by multiple cPPT, defined as 15-19 nt polypurine sequences (Figs. 1B, 3; Supplemental Fig. S3B). These regions represent active replication intermediates, generated by both minus strand and plus strand strong stop DNA, as well as extension products that terminate at cPPT and DNA flaps. The numbers of these intermediates, as well as their abundance, were significantly elevated in long-read sequencing data from ddm1rdr6 double mutants (Figs. 1B, 3; Supplemental Fig. S3B). ATGP3 also had elevated levels of strong stop intermediates, but few if any cPPT and no circular reads.
21-22nt easiRNA control retrotransposition
In a previous study, DCL2/4 were shown to promote transcription of EVADE transgenes driven by an ectopic promoter, while RDR6 had no effect, which was interpreted as evidence that easiRNA might promote transposition in wild-type cells (Mari-Ordonez et al., 2013). We tested whether easiRNA contribute to EVADE control in ddm1 and ddm1rdr6 mutants. Both ddm1 and ddm1rdr6 contained higher copy numbers of ATCOPIA93 than wild-type implying high rates of EVADE transposition, while copy numbers of ATGP3 and ATCOPIA52 remained constant. Using quantitative PCR, we detected an increase from 2 copies of EVADE in wild-type to 12 copies in ddm1 to 40 copies in ddm1rdr6 F2 siblings (Fig. 4A). Similar increases were observed in F2 and F3 progeny from backcross rdr6 progeny that inherited active EVADE elements epigenetically (Fig. 4C) (Mari-Ordonez et al., 2013). We detected parallel increases in gRNA levels reflecting these increases in copy number (Fig. 4B,D). Consistent with gRNA levels, extrachromosomal EVADE copies were also more abundant in ddm1rdr6 than in ddm1 (Fig. 2A). RNase H cleavage products just upstream of the PBS, which are a hallmark of active transposition (Schorn et al., 2017), were readily detected for EVADE in both ddm1 and ddm1rdr6 (Supplemental Fig. S4A,B). We conclude that easiRNA actually inhibit EVADE retrotransposition, in ddm1 mutants.
In backcrosses to wild-type (WT) plants, EVADE activity is inherited epigenetically but copy number increases are thought to be limited by a switch from 21nt to 24nt siRNA, accompanied by re-methylation and silencing (Mari-Ordonez et al., 2013; Mirouze et al., 2009; Reinders et al., 2013). Interestingly, active EVADE elements can be re-silenced through the female gametophyte, but not through the male gametophyte (Reinders et al., 2013) where easiRNA normally accumulate (Borges et al., 2018; Slotkin et al., 2009). We sequenced small RNA from wild type and ddm1 flower buds and pollen, and found that 21-22nt easiRNA from ATCOPIA93/EVADE were abundant in ddm1 inflorescence tissues, but absent from pollen (Fig. 5). In contrast, ATHILA2 and ATHILA6A easiRNA were present in wild type pollen (Slotkin et al., 2009), while ATCOPIA31 21-22nt easiRNA were strongly upregulated in ddm1 pollen. Thus the absence of EVADE easiRNA in pollen must be due to transcriptional repression independent of DDM1, and likely accounts for the lack of paternal re-silencing (Reinders et al., 2013).
Post-transcriptional suppression by easiRNA
Since easiRNA in ddm1 mutants depend on AGO1 (Nuthikattu et al., 2013), and AGO1 represses translation of target mRNA (Li et al., 2013), we tested whether easiRNA can affect translation efficiency of transposon transcripts. Translating ribosome affinity immunopurification (TRAP) RNAseq has been utilized to estimate polysomal occupancy and translation efficiency in plants (Juntawong et al., 2014). Furthermore, microsome-polysomal fractionation has revealed that microRNA-dependent translational control takes place on the endoplasmic reticulum (Li et al., 2013). We generated TRAP lines of 35S:FLAG-RPL18 in ddm1 and ddm1rdr6 mutant backgrounds, and performed total RNAseq, total-polysomal RNAseq, and microsome-polysomal RNAseq. The polysomal RNA occupancy (Polysomal RNA / Total RNA) was obtained for 3903 transposable elements defined as open reading frames from TAIR10 annotation (see Methods). As for the comparison between ddm1 and ddm1rdr6, we could detect the effect of the rdr6 mutation in microsome-polysomal RNAseq data for known targets of RDR6, such as ARF4 (Marin et al., 2010), and for a handful of transposons (Fig. 6A; Supplemental Fig. S5; Supplemental Tables S2,S3). Among 31 up-regulated transposons in ddm1rdr6 relative to ddm1, 26 elements belonged to ATHILA LTR retrotransposon families (Supplemental Table S3), which are a major source of RDR6-dependent easiRNA. Although ATHILA elements in A. thaliana cannot transpose, a subgenomic mRNA encoding ORF2 (the “env” gene) is spliced from the full length mRNA (Havecker et al., 2004; Wright and Voytas, 2002), and was enriched on polysomes (Supplemental Fig. S5; Supplemental Table S3). This subgenomic RNA is targeted extensively by miRNA which trigger easiRNA production (Creasey et al., 2014). Interestingly, the other 3 elements were ATENSPM3, LINE1_6 and VANDAL3, all of which have been identified as active elements in ddm1 mutants, or in population level studies of transposon variants (Stuart et al., 2016). These non-LTR and DNA transposons are also targets of miRNA and generate RDR6-dependent easiRNA (Creasey et al., 2014). EVADE easiRNA are generated from the GAG subgenomic RNA (Mari-Ordonez et al., 2013), but polysomal occupancy was not increased in ddm1rdr6 (Fig. 6B). GAG subgenomic mRNA from ATCOPIA52 was highly enriched in polysomes, consistent with previous studies (Oberlin et al., 2017), whereas the relative abundance of EVADE POL transcripts on polysomes indicates higher translation rates of integrase and reverse transcriptase (Oberlin et al., 2017). Unlike for ATHILA, polysome association of COPIA transcripts were unaffected by RDR6.
easiRNA require miRNA triggers that target these transcripts (Creasey et al., 2014), and ATCOPIA52 LTRs were targeted by a single miRNA in the R region of the LTR. Consistent with this miRNA acting as a trigger, easiRNA accumulated along the length of the mRNA between the LTRs (Fig. 6B; Supplemental Table S4). In the case of EVADE, 4 miRNA were predicted to target the gRNA somewhere along its length. Remarkably, miR2938 was predicted to target the start codon of the GAG gene immediately 5’ of the easiRNA cluster, while miR5648-5p targets the 3’ end of the easiRNA cluster (Supplemental Fig. S4C,D; Supplemental Table S4). EVADE easiRNAs were down-regulated in ddm1dcl1 as compared to ddm1 (Supplemental Fig. S4E) suggesting that miRNA were involved (Creasey et al., 2014). miR2938 and miR5648-5p expression were reported in pollen and root cells (Breakfield et al., 2012; Grant-Downton et al., 2009). We did not detect miRNA-mediated cleavage by PARE-seq (Creasey et al., 2014) or by RACE-PCR, but secondary siRNA, such as easiRNA, do not require cleavage so long as miRNA recognition recruits RdRP (Axtell et al., 2006; de Felippes et al., 2017). Consistent with induction without cleavage, EVADE easiRNA were not phased (Arribas-Hernandez et al., 2016). miR5663 was detected in inflorescence tissues (Supplemental Fig. S4F), and targets the EVADE intron near the splice acceptor site (Supplemental Fig. S4C). Interestingly, the level of unspliced RNA was increased in ddm1dcl1 mutants (Supplemental Fig. S4G), indicating that miR5663 might target unspliced gRNA and promote the accumulation of spliced GAG RNA, but further experiments would be required to demonstrate this requirement. Negative regulation of P-element splicing by piRNA has been reported in Drosophila (Teixeira et al. 2017). ATCOPIA21 and ATCOPIA51 had no strongly predicted miRNA targets, and easiRNA were barely detected in somatic tissues (Fig. 1B; Supplemental Fig. S3A) (Oberlin et al., 2017) accounting for lack of regulation by RDR6. In contrast, significant levels were detected in pollen (Fig. 5) (Borges et al., 2018), where most gypsy and copia class retrotransposons are targeted by miR845, a pollen specific miRNA that targets the primer binding site (Borges et al., 2018).
Transcriptional repression by easiRNA
Both 21/22nt easiRNA and especially 24nt siRNA can direct RNA directed DNA methylation in plants, via AGO6 and AGO4, respectively (Borges and Martienssen, 2015). However, genome wide bisulphite sequencing revealed few if any differences in DNA methylation between ddm1 and ddm1rdr6, as ddm1 mutants already had very low levels of DNA methylation (Creasey et al., 2014). In many organisms, histone modification can also be guided by small RNA, especially histone H3 lysine-9 dimethylation (Fagegaltier et al., 2009; Gu et al., 2012; Martienssen and Moazed, 2015; Volpe et al., 2002). We therefore performed H3K9me2 ChIP sequencing in ddm1 and ddm1rdr6, and compared this to wild type. We found that ATHILA elements, which matched by far the most abundant easiRNA, had ectopic H3K9me2 in ddm1 mutants, which was absent in ddm1rdr6 mutants (Fig. 7; Supplemental Fig. S6). Furthermore, the levels of small RNA correlated extremely well with the levels of H3K9me2 found at individual ATHILA elements (Fig. 7; Supplemental Fig. S6). Interestingly, COPIA elements actually gained H3K9me2 in the absence of easiRNA (Supplemental Fig. S7), along with previously reported increases in 24nt siRNA and reduced transcript levels (Creasey et al., 2014). We therefore conclude that in the absence of DDM1, 21/22nt easiRNA and 24nt siRNA can guide H3K9me2 at GYPSY and COPIA elements respectively.
Discussion
Next generation sequencing of VLP DNA detected all known functional LTR retrotransposons in Arabidopsis, as well as some non-functional ones. Full length VLP DNA from ATCOPIA and ATGP families (Fig. 1B; Supplemental Fig. S3A) corresponded to relatively young and low-copy elements known to transpose. Ancient ATHILA elements did not make full length VLP DNA confirming these gypsy retrotransposons are non-functional (Havecker et al., 2004; Marco and Marin, 2008), but short products matching the LTR appeared to correspond to aborted strong stop replication intermediates (Supplemental Fig. S3A). Interestingly, similar LTR fragments from ATHILA2 comprise an important family of dispersed centromeric satellite repeats known as 106B (May et al., 2005; Thompson et al., 1996), and retrotransposition might account for their origin. Thus functional and non-functional retrotransposons could be readily distinguished even though non-functional ATHILA elements are present in copy numbers 3 to 4 orders of magnitude higher than active ATCOPIA and ATGP elements. As for ATCOPIA52, non-productive one-LTR circular DNA, corresponding to autointegration “suicide” products, markedly accumulated in the VLP at levels far higher than productive retrotransposons such as EVADE. In contrast, two-LTR circular products of ATCOPIA52 were very rare, whereas small amounts of EVADE two-LTR products were present as previously described (Reinders et al., 2013), presumably due to recombination of non-integrated copies by host DNA repair enzymes in the nucleus. Both short read and long read sequencing revealed that these auto-integration products in ATCOPIA52 VLP led to non-functional deletion and inversion circles, accounting for lack of transposition.
Our study shows that RDR6-dependent easiRNA inhibit retrotransposition at multiple levels: via post-transcriptional silencing of genomic RNA, by translational suppression of subgenomic RNA, and by controlling transcription via histone modification. ATHILA elements are no longer functional, but they are the primary source of easiRNA which arise by miRNA targeting of a spliced subgenomic RNA encoding the “ENV” protein (Creasey et al., 2014). These easiRNA inhibit polysome association of this subgenomic RNA, and also inhibit transcript levels by guiding histone H3K9me2. This transcriptional silencing occurred in the absence of DNA methylation in ddm1 mutants. In plants, RNAi dependent histone modification is thought to depend entirely on RNA dependent DNA methylation, found in asymmetric CHH contexts. As CHH methylation stays more or less the same in ddm1, while H3K9me2 is increased (Fig. 7), this might indicate the existence of a novel pathway for RNA guided histone methylation, resembling that found in Drosophila, C.elegans and fission yeast, which lack DNA methylation. Further investigation will be required to establish if such a pathway exists.
In contrast to ATHILA, linear extrachromosomal copies of EVADE accumulated in ddm1 and were further enriched by mutations in RDR6. Like ATHILA, EVADE is targeted by 3 or 4 miRNA that likely trigger easiRNA from the subgenomic GAG gene transcript, which is found associated with polysomes (Oberlin et al., 2017). However, association of the GAG mRNA with polysomes was unaffected in ddm1rdr6 mutants. Instead, levels of gRNA increased 3-fold, suggesting that EVADE easiRNA act postranscriptionally to target gRNA directly. ATCOPIA52 easiRNA arose from full-length gRNA between the two LTR. Polysomal association of full length EVADE GAG-POL is far more abundant than ATCOPIA52 GAG-POL, although both were unchanged in the absence of RDR6 (Fig. 6). As the INT protein is translated from this transcript, this could contribute to lack of nuclear integration of ATCOPIA52 relative to EVADE. Thus, while easiRNA have a significant impact on COPIA gRNA accumulation, and so inhibit increases in copy number, they have only limited impact on translation. 22nt tRNA-derived small RNA fragments (3’CCA-tRFs) were recently shown to inhibit endogenous retroviruses (ERV) in mammalian cells by targeting the PBS by RNA interference (Schorn et al., 2017), and it is possible that EVADE easiRNA may have a similar function in plants.
In conclusion, next generation long-read and short-read sequencing of VLP DNA has revealed features that distinguish functional and non-functional replication intermediates, and provides a powerful tool for identifying active transposons from complex genomes, and for investigating molecular signatures of LTR retrotransposons. One such feature is the central PPT (cPPT), which is present in EVADE but absent in ATCOPIA52. cPPT are hallmarks of the most active retrotransposons including Ty1 in yeast, as well as HIV and other lentiviruses, where cPPT are thought to be important for nuclear import of cDNA (VandenDriessche et al., 2002; Zennou et al., 2000). Our work shows that these features may play a significant role in the activity of EVADE, the most active retrotransposon in Arabidopsis, and that their absence may account for the lack of nuclear integration of ATCOPIA52, and high levels of “suicide” by autointegration. By comparing VLP sequencing, transcriptome sequencing and translatome sequencing we have been able to establish the multiple levels at which easiRNA regulate the Arabidopsis retrovirome. Our methods are widely applicable to other plant and animal models and to human cells, especially those with genomes that contain very large numbers of non-functional LTR retrotransposons.
Author Contribution
SCL, EE and RM designed the study; SCL and EE performed the experiments; SCL, EE, BB, and AS analyzed the data and its significance; SCL, EE and RM wrote the manuscript.
Disclosure declaration
The authors declare no competing interest.
Methods
Plant materials
All genotypes in this study are Col-0 background including wild-type, dcl1-11, ddm1-2, and rdr6-15. Genotyping primers are listed in Supplemental Table S6. Homozygous plants of ddm1-2 and ddm1-2 rdr6-15 were generated from heterozygous ddm1-2 backcrossed five times with Col-0 (ddm1-2 BC5), and their 2nd generation was used for VLP DNA-seq experiments. For polysomal RNAseq experiments, inbred ddm1-2 was independently crossed to 35S:FLAG-RPL18 and to rdr6-15 35S:FLAG-RPL18. The F3 plants were used for polysomal RNA purification.
gDNA extraction and DNA analyses
Whole inflorescence stems of 4 week-old Arabidopsis plants were frozen and ground in liquid nitrogen. Total gDNA was isolated using Nucleon PhytoPure kit (GE healthcare). EVADE DNA copy number was quantified using qPCR with EVADE qPCR primers and single copy gene primers as reference (the primers are listed in Supplemental Table S6). Southern blotting was performed using EVADE Probe B as described (Mirouze et al., 2009).
ChIP was performed with two biological replicates of 10-d-old seedlings using H3K9me2 (Abcam; ab1220) and H3 (Abcam; ab1791) antibodies, following a previously described protocol (Ingouff et al., 2017). Sequencing libraries were prepared using NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs) with size selection for ∼200 bp insert DNA. The ChIP-seq libraries were sequenced using Illumina NextSeq High Output SR 76 with 76-cycle single reads. Two biological replicates were prepared and sequenced for each genotype of interest. Prior to alignment, adapter trimming was performed using Trimmomatic (Bolger, et al., 2014) and read quality was assessed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Reads were aligned to the TAIR10 reference genome using BWA-MEM (https://arxiv.org/abs/1303.3997) with default parameters. Only primary alignments were retained, and optical and PCR duplicates were removed using Samtools (Li et al., 2009). Peak calling was performed using MACS2 (Zhang et al., 2008) broad peak calling with a q-value cutoff of 0.05 and normalization by signal per million reads. Peaks that were differentially regulated across genotypes were identified using MAnorm (Shao et al., 2012) and confirmed between biological replicates. Annotation of these differentially regulated peaks was performed using a combination of BEDOPS (Neph et al., 2012) tools and custom scripts. Deeptools (Ramirez et al., 2014) was used to visualize the data.
RNA extraction and RT-qPCR
Total RNA was isolated from the same tissues used for gDNA extraction with Direct-zol RNA MiniPrep Plus (Zymo Research). DNase I was treated on column. cDNA was synthesized with SuperScript VILO Master Mix (Thermo Fisher Scientific). qPCR was performed using iQ SYBR Green Supermix. Primers are listed in Supplemental Table S6.
Total polysome was isolated using ribosome immunopurification as described previously (Mustroph et al., 2009; Mustroph et al., 2013). Briefly, inflorescence tissues of FLAG-RPL18 lines were ground in liquid nitrogen and transferred to polysome extraction buffer (PEB). Cell debris was removed by centrifugation and filtering with miracloth. The supernatant was taken and transferred to pre-washed EZview anti-FLAG agarose beads (Sigma) for 2 h at 4 °C. The agarose beads containing polysomes were washed once with PEB and three times with washing buffer. Polysomes were eluted using 3X FLAG peptide (Sigma) and used for RNA extraction with Direct-zol RNA miniprep kit (Zymo Research) including DNase I treatment. Ribosomal RNA (rRNA) in the samples was depleted by Ribo-Zero Magnetic Kit (Plant Leaf) (Epicentre). Then, rRNA free samples were used for RNA-seq library preparation using ScriptSeq v2 RNA-Seq Library Preparation Kit (EPicentre). Microsome-polysomal RNA was obtained using a previously described method with some modifications (Li et al., 2013). Briefly, 2 g frozen tissues were suspended to 7 ml microsome extraction buffer (MEB). After removing cell debris by filtration with micracloth and centrifugation at 10,000g for 15 min at 4°C, the supernatant was transferred on the top of 1.7M/0.6M sucrose cushions and applied to ultracentrifugation using swing rotor at 140,000g for 1 h at 4°C. The microsome fraction of the 1.7M/0.6M layer interface was harvested and diluted 10 times by MEB and centrifuged at 140,000g for 0.5 h at 4 °C to obtain microsome pellet. The pellet was re-suspended with 8 ml PEB and used for ribosome immunopurification and RNA-seq library preparation as described above. The PE 101 sequencing data was obtained using Illumina HiSeq 2000 platform. The paired-end reads were mapped to Arabidopsis TAIR10 genome using Tophat and the polysome occupancy (Polysomal RNA / Total RNA) was calculated using systemPipeR package (Backman and Girke, 2016) with raw count data obtained by Cuffnorm.
VLP DNA-seq
Virus-like-particles were purified using modified method reported previously (Bachmair et al., 2004). 4 g of 4 week-old whole inflorescence stems were ground with 10 ml of ice-cold VLP extraction buffer and 10 ml of sea sand on ice. 10 ml of the extraction buffer and Triton X-100 were added and mixed. The slurry was transferred to a 50 ml tube and centrifuged for 5 min at 180g and 4 °C. The supernatant was carefully transferred onto 5 ml of prechilled 15% sucrose, 10 mM potassium phosphate buffer, pH 7.2 and ultracentrifuged for 1.5 h at 109,000g and 4 °C using fixed angle rotor. The pellet was washed with the 15% sucrose buffer and resuspended with 4 ml particle suspension buffer to obtain VLP fractions. To remove non-VLP DNA, 0.5 ml of the VLP sample was treated with 5 µl of 1 mg/ml DNase I at 37°C for 10 min. 20 µl of 0.25 M EDTA, 50 µl of 10% SDS, 25 µl of 10 mg/ml proteinase were added and incubated at 65°C for 10 min. VLP DNA was purified by 0.5 ml equilibrated (pH 8.0) phenol:chloroform:IAA (25:24:1) mixture three times and with 0.5 ml chloroform:IAA (24:1) once. The last aqueous fraction was transferred into a new 1.5ml tube and used for 100% ethanol precipitation with 40 µl 3M sodium acetate, pH 7.0. The DNA pellet was washed with 70% ethanol, dried, and resuspended with 100 µl TE buffer. 1 µl of RNase A (10 mg/ml) was added to the VLP DNA sample and incubated 10 min. The treated DNA sample was purified using DNA Clean & Concentrator (Zymo Research). The DNA was sheared to 650 bp using Covaris S220 and subsequently used for DNA-seq library preparation with NEBNext Ultra DNA Library Prep Kit (New England Biolabs). The paired-end sequencing datasets with 101 nt read length (PE101) were obtained by Illumina HiSeq 2000. Adapters were trimmed from raw reads with Skewer (Jiang et al., 2014) in paired-end mode and read pairs with both mates longer than 25 nt were retained. Reads were aligned to the TAIR10 genome with STAR (Dobin et al., 2013) in two-pass mode to improve spliced alignment at unannotated introns. Intact bacteria co-purified with VLP, as indicated by large numbers of reads mapping to bacterial genomes (up to 95% in WT), and these were discarded. Reads mapping equally well to multiple locations were randomly assigned, and chimeric/split read alignments were output separately from concordant alignments. Optical and PCR duplicates were removed from the alignments with picard-tools (http://broadinstitute.github.io/picard). Counts of reads mapping to the TAIR10 transposon annotations were computed with featurecounts (Liao et al., 2014). Pairwise differential expression at TAIR10 transposon loci was tested across three wild-type, two ddm1, and three ddm1rdr6 replicates using quasi-likelihood F-tests in edgeR (Robinson et al., 2010), controlling FDR at 5% and a log2(fold-change) threshold of 2.
Oxford Nanopore Technologies (ONT) long-read libraries were prepared as follows: 10 ng per genotype of purified VLP DNA extract was pooled from the replicate samples and initially amplified following the conditions in the “1D Low-input genomic DNA with PCR” (SQK-LSK108) protocol with reagents. End-repair, dA-tailing and PCR adapter ligation were performed, followed by 16 cycles of PCR amplification. PCR products were purified and concentrated with Ampure XP beads (Agencourt), and 300 ng of eluate per sample was carried through to library preparation following the “1D Genomic DNA by Ligation” protocol with SKQ-LSK109 reagents. Libraries were loaded onto r9.4 (FLO-MIN106) flow cells and sequenced on a GridION X5. Basecalling was performed offline with Guppy v2.3.1 using the default r9.4.1 model. Using porechop (https://github.com/rrwick/Porechop), ONT sequencing adapters were trimmed from 5’ ends, 3’ ends, and the middle of raw reads. Reads with middle adapters were split. Remaining reads longer than 100 bp were aligned to the TAIR10 reference with minimap2 (Li, 2018) for coverage and read alignment plots. Structural variants were called on NGMLR (Sedlazeck et al., 2018) alignments using Sniffles (Sedlazeck et al., 2018) with default parameters, except minimum read support was reduced to 3.
5’ RACE PCR
5’ RACE PCR was performed using FirstChoice RLM-RACE Kit (Thermo Fisher Scientific) without the treatments of calf intestine alkaline phosphatase and tobacco acid pyrophosphatase. A gene-specific primer was used for cDNA synthesis after adaptor ligation (Supplemental Table S6). 1st and 2nd nested PCR was performed with the primers are listed.
Small RNA-seq data
Small RNA-seq libraries from inflorescence and pollen for comparisons of 21, 22, and 24nt small RNA between wild-type and ddm1 were prepared as previously described (Borges et al., 2018). Wild-type pollen sample was previously deposited in the Gene Expression Omnibus (GEO) database (GSM2829912). Briefly, small RNAs were purified by running total RNA from pollen and inflorescence tissues on acrylamide gels (15% polyacrylamide, 7 M urea) with size-selection of 18-to-30-nt regions. Small RNAs were extracted from the gel bands using Trizol LS (Life Technologies) and Direct-zol columns (Zymo Research). Libraries were prepared with the TruSeq small RNA sample preparation kit (Illumina) and sequenced in Illumina MiSeq platform. Data analysis was done as previously reported (Borges et al., 2018). 21-22nt small RNA datasets from inflorescence (Creasey et al., 2014) were obtained from NCBI GEO accession GSE52951. After adapter trimming with Skewer, reads were quality filtered with fastp (Chen et al., 2018) and aligned to the TAIR10 genome with ShortStack (Axtell, 2013) with default parameters except “--bowtie_m 1000 --ranmax 50”.
LTR Retrotransposon Annotation
GenomeTools was used to structurally annotate retrotransposons across the TAIR10 genome. First, LTRharvest (Ellinghaus et al., 2008) was run to detect LTR sequences with at least 85% similarity separated by 1-15 kbp flanked by target site duplications and the TGCA motif. Then, LTRdigest (Steinbiss et al., 2009) was run to annotate internal transposon features including the PBS, PPT, and GAG and POL protein homology.
Genome Browser Figures
Genome-wide read coverage for VLP DNA, small RNA, total and polysomal RNA libraries was calculated with bamCoverage from deepTools (Ramirez et al., 2014) and normalized to reads per nucleotide per million mapped reads and plotted across the genome with Gviz (Hahne and Ivanek, 2016) or IGV (Thorvaldsdottir et al., 2013).
Data access
The datasets generated during and/or analyzed during the current study are available at NCBI (GEO study: GSE128932).
Acknowledgements
We thank Vincent Colot, Leandro Quadrano, Tetsuji Kakutani and all members of the Martienssen laboratory for discussions. Research in the Martienssen laboratory is supported by the US National Institutes of Health (NIH) grant R01 GM067014, The National Science Foundation Plant Genome Research Program, and by the Howard Hughes Medical Institute. The authors acknowledge assistance from the Cold Spring Harbor Laboratory Shared Resources, which are funded in part by the Cancer Center Support Grant (5PP30CA045508).
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵