Abstract
Chromosome dosage plays a significant role in reproductive isolation and speciation in both plants and animals, but underlying mechanisms are largely obscure1. Transposable elements can promote hybridity through maternal small RNA2, and have been postulated to regulate dosage response via neighboring imprinted genes3,4. Here, we show that a highly conserved microRNA in plants, miR845, targets the tRNAMet primer-binding site (PBS) of LTR-retrotransposons in Arabidopsis pollen, and triggers the accumulation of 21 to 22-nucleotide small RNA in a dose dependent fashion via RNA polymerase IV. We show that these epigenetically activated small-interfering RNAs (easiRNAs) mediate hybridization barriers between diploid seed parents and tetraploid pollen parents (“the triploid block”), and that natural variation for miR845 may account for “endosperm balance” allowing formation of triploid seeds. Targeting the PBS with small RNA is a common mechanism for transposon control in mammals and plants, and provides a uniquely sensitive means to monitor chromosome dosage and imprinting in the developing seed.
Epigenetic silencing of transposable elements (TEs) is regulated by cytosine methylation (mC), which is established by RNA-directed DNA methylation (RdDM) and occurs in three sequence contexts in plants (mCG, mCHG and mCHH, where H is A, C or T)5. TE methylation can be maintained independent of RdDM, but undergoes reprogramming in the germline via small RNA and histone modification, in animals and to some degree in plants6. Flowering plants reproduce sexually through double fertilization, resulting from the fusion of two sperm cells (in pollen grains) with the egg cell and central cell (in the embryo sac), to produce a diploid embryo and a triploid endosperm, respectively7. In Arabidopsis pollen8–10 and endosperm11,12 active removal of DNA methylation from TEs flanking imprinted genes is essential for fertility and for seed viability10,13. Targeted demethylation occurs specifically in vegetative and central cell nuclei and has been associated with the accumulation and mobilization of easiRNA into neighboring germ cells8–10,14, but the biological significance of these small RNAs has remained unclear. Here we show that easiRNAs play a critical role in epigenetic reprogramming before and after fertilization in interploid hybrids.
easiRNA from TE transcripts are triggered by microRNAs (miRNAs), most notably miR845a (21-nt) and miR845b (22-nt) that target multiple transposons in Arabidopsis pollen15 (Fig. 1a), where TEs are also expressed9. miR845a and miR845b target specific Gypsy and Copia retrotransposons at the 18nt primer-binding site (PBS) (Fig. 1b and Supplementary Table 1), where tRNAs initiate reverse transcription, and their targets generate easiRNA specifically in pollen (Fig. 1c). MIR845 belongs to a highly conserved family of miRNAs in plants with a complex history of retention and loss in Brassicaceae16, and is not present in the perennial Arabis alpina where Gypsy transposons otherwise targeted by miR845 have contributed to a massive genome expansion17. MIR845 orthologs have been found in drought-stressed rice leaves18 and diploid strawberry19, and may have been derived from tRNAiMet through a single inversion event19. However, as target complementarity extends from the PBS towards the 5’ LTR (Fig. 1b), it appears that MIR845 in Arabidopsis has derived from truncation and inversion of 5’ LTRs (plus PBS). Intriguingly, in mammalian cells, abundant 18 to 22-nt 3’CCA tRNA fragments also match endogenous retroviruses at the PBS, and strongly suppress retrotransposition, suggesting an ancient mechanism for transposon control20.
We used a GFP sensor incorporating a 3’UTR miR845b target site and driven by the ubiquitously expressed UBIQUITIN10 (UBQ10) promoter (Fig. 1d) to demonstrate strong reduction of GFP expression in pollen (Fig. 1e) and sperm cells (Supplementary Fig. 1a) of the Arabidopsis accession Columbia (Col-0). Strikingly, the same GFP sensor was not silenced in pollen from Landsberg (Ler-0), suggesting that miR845b was missing from this accession (Fig. 1e). In Ler-0, MIR845a has been deleted completely while MIR845b has a single nucleotide polymorphism (SNP) in the complementary miRNA* sequence that creates an extra bulge in the duplex that impairs miRNA processing (Supplementary Fig. 2a-f). Importantly, the Ler-0 MIR845 haplotype is conserved in many Arabidopsis accessions such as Kro-0, Bay-0 and Tsu-0, and we found that the levels of miR845b are depleted in these ecotypes (Fig. 1f). To address the potential effect of miR845 on TE silencing, we compared Col-0 and Ler-0 pollen transcriptomes and found that TE transcripts are overall more abundant in Ler-0 pollen (Supplementary Fig. 3a), including miR845 targets such as ATGP2 and ATCOPIA36 that have escaped silencing in this ecotype (Supplementary Fig. 3c). Further, miR845-targeted TEs such as ATGP2 and ATCOPIA41 accumulate easiRNA in Col-0 pollen, while other TE families such as ATCOPIA63 are expressed and produce easiRNA specifically in Ler-0 (Supplementary Fig. 3b).
easiRNAs have been associated with RdDM activity15,21, and in pollen, these pathways are activated in the vegetative cell nucleus (VN) but not in sperm cells (SC)8, where 21 and 22-nt easiRNAs accumulate9. Therefore we performed bisulfite sequencing of FACS-sorted Col-0 and Ler-0 pollen nuclei, and found that levels of mCHH in the Ler-0 VN were substantially lower than in Col-0 VN (Fig. 2a), resembling seed methylomes in this respect22. This difference is particularly striking in pericentromeric heterochromatin, where the chromomethylases CMT2 and CMT3 maintain mCHH and mCHG, respectively (Fig. 2a). In contrast, the levels of mCHH in Col-0 and Ler-0 SC nuclei were found to be identical and very low (Fig. 2a)8,10,23. In the VN, we found approximately 6000 differentially methylated regions (DMRs) for mCHH between Col-0 and Ler, which overlapped primarily with TEs (including predicted miR845 targets) (Fig. 2b and Supplementary Table 2). The majority of DMRs were hypermethylated in Col-0 VN (Fig. 2b), but not in the VN methylomes of the RdDM mutant nrpd1a (largest subunit of RNA Polymerase IV) or cmt2, indicating the that loss of mCHH in Ler-0 VN occurs at particular loci targeted by CMT2 and Pol IV-dependent siRNAs (Fig. 2c-e). Taken together, these observations suggest that natural variation in miR845 and easiRNA biogenesis is associated with deregulation of TE silencing in pollen, which might have contributed to epigenetic variation in Arabidopsis populations24.
In Arabidopsis, most 20 to 22-nt miRNAs are processed by DICER-LIKE1 (DCL1) and loaded into ARGONAUTE1 (AGO1) in order to mediate post-transcriptional gene silencing (PTGS). We therefore crossed the GFP sensor with the null dcl1-5 mutant allele and the strong hypomorphic ago1-9 allele, and observed that GFP expression was restored in mutant pollen grains isolated from heterozygous plants, thus confirming that miR845 function in pollen requires the canonical miRNA pathway (Fig. 3a and Supplementary Fig. 1c). In somatic tissues, miRNA-directed biogenesis of secondary siRNAs from target transcripts requires either a “double hit” with two 21-nt miRNAs or a single hit with a 22-nt miRNA. Cleaved transcripts are converted into double stranded RNA (dsRNA) by the RNA-dependent RNA polymerase RDR6, and processed into 21- and 22-nt siRNAs by DCL4 and DCL2, respectively25. We used the miR845b-GFP sensor in dcl1-5/+ and dcl1/+,dcl2/dcl4 mutant backgrounds and purified wild-type, dcl1, dcl2/4 and dcl1/2/4 pollen by fluorescence-activated cell sorting (FACS) (Supplementary Fig. 1d). Small RNA sequencing of sorted pollen confirmed that most miRNAs were depleted in segregating dcl1 mutant pollen, including miR845a and miR845b (Fig. 3c,d and Supplementary Fig. 1e), as well as abundant secondary siRNAs derived from the GFP transcript that were present in wild-type pollen but not in dcl1, dcl24 and dcl124 (Fig. 3b). However, loss of easiRNAs was observed only in dcl2/4 pollen, but not in dcl1 mutants (Fig. 3c), including 21/22-nt small RNAs produced from 5’ LTRs upstream of the PBS (miR845b target site) (Supplementary Fig. 4a). In somatic tissues, small RNAs matching to these LTRs are 24-nt in length and produced by Pol IV, RDR2, and DCL325, raising the possibility that 21/22-nt easiRNA in pollen are dependent on Pol IV. Indeed, small RNA from nrpd1a mutant pollen lost siRNAs for the majority of TEs in all size classes (Supplementary Fig. 4b), indicating that easiRNA biogenesis in pollen at miR845 targets depends on Pol IV, DCL2 and DCL4. Similar pathways have been described under certain types of genotoxic stress, and may be relevant here26,27.
As these Pol IV-easiRNAs (21/22-nt) are not significantly depleted in dcl1 pollen (Fig. 3c) where both miR845a and miR845b are down-regulated (Supplementary Fig. 1e), we hypothesized that miR845 contributes to Pol IV-easiRNA biogenesis either during meiosis or early at the onset of gametogenesis. In order to test this possibility, we cloned the Col-MIR845b locus and transformed Ler-0 wild-type plants. Strikingly, we observed strong and specific up-regulation of 21- and 22-nt TE-derived siRNAs, resembling wild-type Col-0 pollen (Fig. 3e). Interestingly, restored easiRNA biogenesis in Ler:MIR845b pollen did not result in significant changes in CHH methylation (Supplementary Fig. 5b), supporting the idea that easiRNAs in pollen do not modulate RdDM activity in the VN, because they are actively transported to the sperm cells9,14.
Parental small RNA differences can build strong barriers to hybridization2, and we have previously speculated that they might play a role in interspecific and interploid hybridization barriers in flowering plants3. Spontaneous chromosome doubling (polyploidization) is common in plants and is a major pathway towards reproductive isolation and speciation4,28. This is because hybrid seeds collapse as a result of unbalanced expression of imprinted genes in the endosperm, a phenomenon known as the “triploid block”. The “endosperm balance number” hypothesis further postulates the existence of specific loci responsible for seed collapse in different strains4. The triploid block can be conveniently demonstrated in Arabidopsis using the omission of second division (osd1) mutant that forms unreduced diploid male and female gametes that are self-fertile29,30. Diploid osd1 pollen crossed to wild-type seed parents leads to the production of triploid seeds with tetraploid endosperm, that abort at high frequency depending on the genetic background29,30. In order to test whether miR845b-directed easiRNA biogenesis is involved in the triploid block response, we used a line carrying a T-DNA insertion at the MIR845b locus (mir845b-1) in Col-0 in which miR845b was down-regulated by roughly one half (Fig. 4a), and 21/22-nt easiRNAs were much lower than 24-nt siRNA (Fig. 4b), resembling dcl2/4 mutants and wild type Ler-0 in this respect. Thus miR845b-directed easiRNA biogenesis is dose-sensitive, which is consistent with a role in endosperm balance. We next generated double mutants between mir845b-1 and osd1-1 in Col-0 to test whether loss of easiRNA in pollen has an effect on the triploid block. Pollinations of wild-type Col-0 seed parents with osd1-1 homozygous pollen gives rise to approximately 5% of viable seeds, while pollinations with osd1-1/mir845b-1 pollen had significantly increased seed viability at 35% (Fig. 4c). Interestingly, previous studies have shown that the triploid block response is much weaker in certain Arabidopsis accessions such as Ler-031,32, where miR845-dependent easiRNAs are lost. Therefore, we crossed pollen from osd1-2 (Ler-0 background) and osd1-2 expressing Col-miR845b (osd1-2:MIR845b) to Ler-0 wild-type female parents. However, easiRNA up-regulation in Ler-0 diploid pollen (Fig. 4a,b) was not sufficient to restore the triploid block, since the levels of viable triploid seeds remained comparable (Fig. 4c). We conclude that miR845b-dependent easiRNAs are required but not sufficient for the triploid block, suggesting that there are additional regulatory mechanisms to balance parental gene dosage in Ler-0 endosperm, as previously reported30,32.
In summary, paternally expressed miR845b stimulates dose-dependent biogenesis of 21/22-nt secondary siRNAs via Pol IV transcription, to mediate the triploid block response. These observations are consistent with previous results showing that the ratio of 21/22-nt vs 24-nt siRNA is much higher in interploid hybrid seeds with paternal excess31–33. Mechanistically, 21/22-nt easiRNAs might stabilize unbalanced genomic imprinting in interploid seeds, by promoting dose-dependent expression of imprinted genes in tetraploid endosperm. In previous studies, reduced levels of maternal 24-nt siRNAs resulted in up-regulation of maternally expressed genes (MEGs) but did not suppress triploid seed abortion33, while loss-of-function mutations in several paternally expressed genes (PEGs) are strong suppressors of the triploid block response30,34. Disrupted expression of PEGs and TEs is also associated with post-zygotic barriers in crosses between Arabidopsis species35,36, thus implying that up-regulation of PEGs is at least in part responsible for seed abortion upon interspecific and interploidy hybridizations.
One possibility is that paternal easiRNAs are involved in silencing MEGs, which include important components of the Polycomb Repressive Complex (PRC2) such as MEDEA (MEA) and FERTILIZATION INDEPENDENT SEED 2 (FIS2). PEGs are silenced by PRC2-mediated H3K27me3 in the endosperm37, and so would be upregulated in triploid seeds. This idea is supported by the fact that MEGs accumulate 21/22-nt easiRNAs in interploid seeds with paternal excess, thus suggesting they are paternal in origin33 (Supplementary Fig. 6a-c), while maternal expression of MEA and FIS2 is specifically repressed in these crosses38. Another possibility is that easiRNA promotes the expression of PEGs, by antagonizing RdDM activity at imprinted loci. In an accompanying paper, Martinez et al. show that paternal Pol IV mutations suppress the triploid block39, which is consistent with these mechanisms15. Our work reveals a key function for germline easiRNA in plants, suggesting parallels with germline piRNA in Drosophila that also control hybrid viability2, and 3’CCA tRF that regulate endogenous retroviruses in early mammalian development20. Dose-dependent regulation by miRNA might also contribute to “endosperm balance number”, which has been classically implicated in the triploid block4.
Methods
Plant material and growth conditions
Plants were grown under long day conditions at 22 °C. Seeds were always surface sterilized with sodium hypochlorite, sowed on Murashige and Skoog (MS) medium and stratified for 3 days at 4°C. Seedlings were transplanted to soil two weeks after germination and grown under long day conditions at 22 °C. We used the following Arabidopsis mutants: dcl1-5 (CS16069), ago1-9 40 (originally in Ler, but introgressed in Col-0 for 6 generations of backcrossing), mir845b-1 (SAIL_172_A08), osd1-2 (GT21481), nrpd1a-3 (Salk_128428) and osd1-3 (Koncz collection) 29. The mir845b-1 tetraploid plant was obtained by colchicine treatment as previously described 41. Additionally to Col-0 and Ler-0, we used the following ecotypes: Tsu-0 (CS1564), Kro-0 (CS1301), Bay-0 (CS954), Cvi-0 (CS902), RLD-1 (CS28687), Wei-0 (CS76301), Van-0 (CS28796), Sha (CS28737), Mt-0 (CS28502).
Transgene cloning
The UBQ10 promoter and miR845b target site promoter were cloned into a GFP reporter vector obtained from the VIB Department of Plant Systems Biology, UGent, by using Gateway technology (Life Technologies). Ectopic expression of MIR845a and MIR845b in Col-0 was performed by gateway cloning both MIR genes into pB2GW7. Expression of Col-MIR845b in Ler-0 was performed by PCR amplification of 1kb fragment of genomic DNA flanking MIR845b in Col-0, which was cloned into pMDC123 by Gateway cloning system. Primers used in this study are listed in Supplementary Table 4.
Pollen collection and FACS
Pollen was collected in 1.5mL eppendorf tubes by vortexing open flowers in pollen extraction buffer (PEB, 10 mM CaCl2, 2 mM MES, 1 mM KCl, 1% H3BO3, 10%) for 3 min 42, followed by filtration through a 30um mesh (Partec/Sysmex) and centrifugation at 5,000g for 1 min. Pollen was suspended in 50ul of PEB, immediately frozen in liquid nitrogen and stored at - 80°C until RNA extraction. Purification of pollen expressing the GFP-miR845b sensor in dcl1-5/+ and dcl1/+,dcl2/dcl4 mutants was performed by vortexing open flowers in 1 mL of PEB and filtering through a 30um mesh before FACS. Pollen population was identified in SSC/FSC scatter plots, and GFP fluorescence was analyzed by excitation with a 488nm laser and detected with a 530/30 bandpass filter (Supplementary Fig. 1d).
RNA purification, qPCR analysis and sequencing
Pollen total RNA was extracted using Trizol LS reagent (Life Technologies) by vortexing with glass beads for 5 minutes, and concentrated with Direct-zol columns (Zymo Research). miRNA-qPCR was performed using the Quantimir kit (System Biosciences) following manufacturer instructions. Libraries for RNA sequencing from Col-0 and Ler-0 pollen were prepared using rRNA-depleted total RNA samples and the ScriptSeq-v2 RNA-Seq Library Prep kit (Epicenter/Illumina), following manufacturer instructions. cDNA libraries were sequenced on a HiSeq2500 instrument. FastQ files were processed and mapped to TAIR10 genome using SubRead 43, normalized by calculating the number of paired-end reads per kilobase of transcript per million of mapped reads (FPKM), and analyzed using R scripts. Small RNAs were purified by running total RNA on acrylamide gels (15% polyacrylamide, 7M urea) and performing size selection of approximately 18 to 30-nt region using a small RNA ladder (Zymo Research). The small RNA fraction was isolated from acrylamide gel plugs by grinding with a plastic pestle in Trizol LS (Life Technologies), and concentrated using Direct-zol columns (Zymo Research). Libraries were prepared using the TruSeq Small RNA Sample Preparation Kit (Illumina) following manufacturer instructions, barcoded and sequenced in Illumina HiSeq 2500, NextSeq500 or MiSeq platforms depending on sample pooling strategies and desired sequencing depth. After de-multiplexing, 36-nt reads were pre-processed by trimming the 3’ adapter and filtering collapsed reads according to length and quality. Filtered reads were mapped to the Arabidopsis TAIR10 genome annotation (or GFP transgene) with bowtie reporting all multi-mappers. Only perfect match reads were used for down-stream analysis, and reads mapped to multiple genomic locations where normalized by dividing non-redundant read counts by the number of genomic hits, and subsequently calculating the number of reads per million of filtered (18-30nt) and perfectly mapped reads. Additional downstream analyses were performed using R scripts. A summary of all small RNA sequencing data generated in this study is presented in Supplementary Table 3. Inflorescence small RNA data was previously reported21.
Bisulfite sequencing and DNA methylation analysis
Pollen nuclei were isolated as previously described42,44. Approximately 50.000 nuclei were purified by FACS and used to construct sequencing libraries of bisulfite-treated DNA using the Pico Methyl-Seq kit (Zymo Research), according to manufacturer instructions. Non-directional libraries were sequenced on HiSeq and NextSeq platforms. Single end 100 and/or 76 reads were preprocessed using Trimmomatic 45 to remove adapters, trim the first 10 nucleotides and split the read in half. Preprocessed C/T- and G/A-converted reads were mapped to C/T- and G/A-converted TAIR10 genome allowing two mismatches. Perl and Python scripts were used to recover the sequence of each mapped read and calculate methylation at each individual cytosine covered by 3 or more reads. Differentially methylated regions (DMRs) were defined as 100bp bins containing at least 4, 6 or 8 differentially methylated mCGs, mCHGs or mCHHs, respectively, and absolute methylation difference of 0.35. DMRs localizing 200bp of each other were merged. A summary of all genome-wide bisulfite sequencing data generated in this study is presented in table S4. Bisulfite sequencing data of cmt2 mutant VN was previously reported 23.
Author contributions
F.B., C.K. and R.A.M designed the study. F.B., J.S.P., F.V.E., P.W. and G.M performed experiments, and F.B. analyzed the data. All authors contributed with ideas and discussion. F.B. and R.A.M prepared the manuscript.
Competing financial interests
The authors declare no competing financial interests.
Supplementary information
Acknowledgements
We thank all members of the Martienssen laboratory for discussions. Research in the Martienssen laboratory is supported by the US National Institutes of Health (NIH) grant R01 GM067014, The National Science Foundation Plant Genome Research Program, and by the Howard Hughes Medical Institute and Gordon and Betty Moore Foundation. The authors acknowledge assistance from the Cold Spring Harbor Laboratory Shared Resources, which are funded in part by the Cancer Center Support Grant (5PP30CA045508). Research in the Köhler laboratory was supported by a European Research Council Starting Independent Researcher grant, a grant from the Swedish Science Foundation and a grant from the Knut and Alice Wallenberg Foundation (to C.K.). G.M. was supported by a Marie Curie IOF Postdoctoral Fellowship (PIOF-GA-2012-330069). Sequencing for the Köhler laboratory was performed by the SNP&SEQ Technology Platform, Science for Life Laboratory at Uppsala University, a national infrastructure supported by the Swedish Research Council (VRRFI) and the Knut and Alice Wallenberg Foundation.