ABSTRACT
The zygote, a totipotent stem cell, constitutes a critical stage of the life cycle for all sexually reproducing organisms. It is produced by fusion of two differentiated cells, egg and sperm, that in plants have radically different siRNA transcriptomes from each other and from multicellular embryos. Here we examined the small RNA transcriptome of unicellular rice zygotes. We find that the overall 24-nt siRNA landscape in the zygote resembles that of the unfertilized egg cell, consistent with maternal carry-over. A large fraction (∼75%) of the siRNAs in the zygote arise from a small proportion (∼2%) of siRNA loci, which corresponded to similar loci in the egg cell and ovary. However, these highly expressing loci were distinct from endosperm siren loci that were detected in later stage endosperm. miRNA abundances changed rapidly after fertilization, resulting in a miRNA profile distinct from the egg cell. Notably, the de novo 24-nt siRNAs expressed by the zygote had characteristics of canonical siRNAs, such as proximity to genes and tendency to overlap TIR DNA transposable elements, and resembled seedling siRNA loci, indicating a return to the canonical siRNA profile typical of vegetative tissues. Taken together, our results suggest that resetting of the gametic epigenome towards the canonical vegetative profile is initiated before the first embryonic division.
Introduction
Gametes and zygotes constitute critical developmental stages in the life cycle of all sexually reproducing organisms. During fertilization, the egg cell fuses with a sperm cell to form the zygote, which is an undifferentiated and totipotent stem cell that initiates embryogenesis. Flowering plants undergo double fertilization, in which a second sperm cell fuses with the central cell to give rise to the endosperm, a nutritive tissue that nurtures the developing embryo or germinating seedling [reviewed in (Lord and Russell 2002)]. The maternal to zygotic transition (MZT) in animals consists of two steps: zygotic genome activation (ZGA) and maternal RNA degradation. In animals, early embryogenesis is controlled by maternal gene products pre-deposited in the egg cell. Depending on the organism, the zygotic genome does not become transcriptionally active until a number of cell divisions occur (Tadros and Lipshitz 2009). However, MZT in flowering plants proceeds differently [reviewed in (Armenta-Medina and Gillmor 2019)]. In rice, thousands of genes are upregulated in zygotes, many of which are undetected in the egg cell, consistent with similar observations in maize and Arabidopsis zygotes (Chen et al. 2017; Zhao et al. 2019). Furthermore, zygotic transcription was shown to be required for early embryogenesis (Zhao et al. 2019; Kao and Nodine 2019). These observations suggest that in angiosperms, unlike animals, zygotes are transcriptionally active, and plant ZGA occurs in the zygote.
Along with dynamic changes in gene expression, epigenomic reprogramming has been observed during plant reproduction. In rice and maize, the egg cell is ∼10 times larger than sperm in diameter, and thus ∼1000 times larger than the sperm cell in volume (Kranz, Bautor, and Lörz 1991; Anderson et al. 2013), and its chromatin is diffused (Scholten, Lörz, and Kranz 2002). In contrast, the sperm cell chromatin undergoes global condensation, paralleling protamine deposition in animal sperm cells (Kimmins and Sassone-corsi 2005). Many other sex-specific changes in chromatin occur during plant reproduction, as reported in the model plant Arabidopsis thaliana (Wang and Köhler 2017). For example a male-germline specific histone H3 variant MGH3 (also termed H3.10) is present in the sperm cell (Okada et al. 2005; Borg and Berger 2015), following the removal of H3.1 (Borg et al. 2020). H3.10 is resistant to trimethylation at H3K27 (H3K27me3), thus priming the activation of key genes for sperm differentiation and embryogenesis (Borg et al. 2020). Upon karyogamy, H3.10 is removed from the paternal chromatin via a replication independent process (Ingouff et al. 2007). Other histone H3 variants, such as H3.3, are also removed from egg cell chromatin upon karyogamy, followed by loading of newly synthesized histones, again via a replication independent mechanism (Ingouff et al. 2010). In addition, other cells of both male and female gametophytes in Arabidopsis experience global chromatin changes as well. Heterochromatin is decondensed in the central cell, the cell which gives rise to endosperm (Pillot et al. 2010). A similar phenomenon occurs in the pollen vegetative cell, the cell which encapsulates the sperm cells and enables their migration through the style to the ovule (Schoft et al. 2009; Mérai et al. 2014; Hsieh et al. 2016). 
Relaxation of heterochromatin in the pollen vegetative cell has been reported to produce short interfering RNA (siRNA) that traffic into the sperm cells, and proposed to reinforce transposon silencing in the gametes (Slotkin et al. 2009; Calarco et al. 2011; Martínez et al. 2016; Park et al. 2016; Kim et al. 2019). Similarly, it has been proposed that siRNAs traffic from the central cell to the egg cell, as well as from the endosperm into the developing embryo (Hsieh et al. 2009; Ibarra et al. 2012; Martinez and Köhler 2017).
In addition to chromatin reprogramming, there is also evidence for changes in DNA methylation during plant reproduction, especially in the context of RNA-directed DNA methylation (RdDM) [reviewed in (Gehring 2019)]. In plants, RdDM can function in both de novo and maintenance DNA methylation [reviewed in (Cuerda-Gil and Slotkin 2016)]. Briefly, 24-nt siRNAs are produced and loaded onto an Argonaute protein (AGO). This siRNA-AGO complex base-pairs with the nascent transcript of RNA polymerase V (Pol V), using it as a scaffold to recruit the DNA methyltransferase DRM2. DRM2 methylates cytosines in all sequence contexts; methylation in the CHH context (mCHH), where H is A, C or T, is a strong indicator of RdDM in rice and maize (Tan et al. 2016, 2018; Gent et al. 2013), although not in all plants (Zemach et al. 2013). Methylated DNA is recognized by chromatin remodelers, which in turn lead to the deposition of repressive histone marks, such as H3 dimethylated at lysine 9 (H3K9me2). In specific genomic contexts, H3K9me2 recruits RNA polymerase IV (Pol IV), which produces the majority of 24-nt siRNAs in plants [reviewed in (Matzke and Mosher 2014)]. Multiple studies reported that disruption of RdDM leads to a variety of reproductive phenotypes, including aborted embryos (Autran et al. 2011; Grover et al. 2018), arrested pollen (Wang et al. 2020), defective triploid block when the seeds were produced from a 2n maternal × 4n paternal cross (Erdmann et al. 2017; Borges et al. 2018; Martinez et al. 2018; Satyaki and Gehring 2019) and defective floral development (Dorweiler et al. 2000; Moritoh et al. 2012). These observations suggest siRNAs and RdDM are essential for normal plant reproduction.
In mammals, it has long been proposed that fusion of two epigenetically distinct gametes presents a challenge in reproduction, and resetting of the epigenome is required for the pluripotent state of the early embryo [reviewed in (Messerschmidt, Knowles, and Solter 2014)]. Epigenome reprogramming in mammals includes large-scale erasure of somatic chromatin signatures in germ cell precursors, establishment of sex-specific signatures in gametes, and post-fertilization resetting towards pluripotency [reviewed in (Messerschmidt, Knowles, and Solter 2014; Saitou, Kagiwada, and Kurimoto 2012; Tang et al. 2016)]. The functional consequences of epigenomic changes in gametic fate acquisition and subsequent zygotic totipotency in plants are unclear. It is clear, however, that in plants the majority of DNA methylation is stably transmitted both maternally and paternally [reviewed in (Gehring 2019)]. In C. elegans, siRNAs can serve as carriers of transgenerational epigenetic information and can be inherited across a few generations [reviewed in (Houri-Zeevi and Rechavi 2017)]. While multiple changes in siRNA profiles have been observed during plant reproduction (Schoft et al. 2009; Slotkin et al. 2009; Calarco et al. 2012; Ibarra et al. 2012; Li et al. 2020; Grover et al. 2020), transgenerational inheritance of siRNAs, or the lack thereof, has yet to be rigorously demonstrated in plants.
In vegetative tissues, such as seedling, 24-nt siRNAs coincide with mCHH islands, which are enriched around genes, marking the ends of TEs and euchromatin-heterochromatin boundaries (Gent et al. 2013; Li et al. 2015); we refer to such an siRNA profile as the canonical siRNA profile. We previously showed that the siRNA transcriptome is reprogrammed in rice gametes (Li et al. 2020), where siRNA transcriptomes of egg and sperm were distinct from each other genome-wide, as well as distinct from that of the seedling (Fig. 1). The relative magnitude of the egg-borne and sperm-borne contribution of siRNAs to the zygote and the stage at which the embryo transitions toward a canonical siRNA profile are unknown. Since siRNA production is influenced by histone modifications and DNA methylation, and siRNAs in turn can direct histone modifications and DNA methylation, the siRNA transcriptome is an output and indicator of the epigenome. Given the likely importance of siRNAs during plant reproduction, we sequenced the small RNA transcriptome of rice zygotes and investigated changes in the small RNA transcriptome soon after fertilization. We inferred that the siRNA transcriptome initiates resetting towards the canonical profile before the first cell division, along with zygotic genome activation.
Results
We collected unicellular rice zygotes 8 hours after pollination (HAP), which corresponds to the completion of S-phase, just prior to the first zygotic division (Ding et al. 2009; Anderson et al. 2017). We produced small RNA transcriptomes from 6 replicates, with ∼50 zygotes in each replicate. For our analyses, we also included small RNA transcriptome data from rice gametes, ovary and seedlings (Li et al. 2020). Except where indicated otherwise, siRNAs used for analyses were small RNA reads (20 – 25-nt) not overlapping 90% or more of their lengths with miRNA [miRBase v22, (Kozomara, Birgaoanu, and Griffiths-Jones 2019)], 5S rRNA, tRNA, NOR or phasiRNA loci [as detected in (Li et al. 2020)], and multi-mapped reads were included in all analyses unless indicated otherwise.
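The read-selection rule described above (20 – 25-nt reads, excluded when 90% or more of their length overlaps a miRNA, rRNA, tRNA, NOR or phasiRNA locus) can be illustrated with a minimal sketch. It assumes reads and excluded features are given as (chromosome, start, end) tuples in half-open coordinates, and it tests each excluded feature separately rather than merging features first. The function names and toy coordinates are hypothetical and are not taken from the analysis pipeline used in this study.

```python
def overlap_fraction(read, feature):
    """Fraction of the read's length covered by one feature (half-open coordinates)."""
    r_chrom, r_start, r_end = read
    f_chrom, f_start, f_end = feature
    if r_chrom != f_chrom:
        return 0.0
    shared = min(r_end, f_end) - max(r_start, f_start)
    return max(shared, 0) / (r_end - r_start)


def is_sirna(read, excluded_features, min_len=20, max_len=25, max_overlap=0.9):
    """Keep 20-25-nt reads unless >= 90% of their length overlaps an excluded feature."""
    length = read[2] - read[1]
    if not (min_len <= length <= max_len):
        return False
    return all(overlap_fraction(read, f) < max_overlap for f in excluded_features)


# Hypothetical toy data: one excluded locus (e.g. a miRNA) and four reads.
excluded = [("chr1", 100, 121)]
reads = [
    ("chr1", 100, 124),  # 24 nt, 21/24 of its length overlaps: kept
    ("chr1", 99, 121),   # 22 nt, 21/22 of its length overlaps: dropped
    ("chr1", 500, 524),  # 24 nt, no overlap: kept
    ("chr1", 500, 530),  # 30 nt: dropped by the length filter
]
kept = [r for r in reads if is_sirna(r, excluded)]
```

In this toy example, only the first and third reads pass the filter.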
The global siRNA pattern in zygote is determined by siRNA transcript carryover from the egg cell
We previously found that genome-wide redistribution of siRNAs occurred in gametes (Li et al. 2020). In canonical vegetative tissues, such as seedling shoot, 24-nt siRNAs were produced from gene-rich, euchromatic regions, which correspond to mCHH islands around genes (Gent et al. 2013; Li et al. 2015). In contrast, the sperm cell has a complementary pattern, in which 24-nt siRNA were spread out across wide pericentromeric regions instead. The egg cell and ovary had a different pattern, where 24-nt siRNAs were concentrated at discrete loci (Fig 1A, top five tracks). We found that in a whole-genome view, the zygote had a very similar pattern to the egg cell (Fig 1A, bottom track). This pattern was reproducible across all zygote replicates (Fig S1A). In addition, the 24-nt siRNA expression pattern of zygote clustered with the egg cell, rather than with sperm cell or the average of the gametes (Fig S1B). Further, genome-wide 24-nt siRNA relative abundances of zygote and egg cell were highly correlated (r = 0.94, P = 0, Pearson’s correlation on egg vs. zygote tracks in Fig 1A). The siRNA landscape in zygote can be explained by siRNA carryover from the egg cell, since the egg cell is ∼1000-fold larger than the sperm cell by volume (Kranz, Bautor, and Lörz 1991; Anderson et al. 2013; Li et al. 2019). Although 24-nt siRNAs function in the nucleus, 24-nt siRNAs were found primarily in the cytoplasm of whole-plant homogenates (Ye et al. 2012). Thus, we predict that siRNAs already present in the egg cell before fertilization would contribute to much of the siRNAs present in the zygote. This is consistent with previous observations that the 50 most highly expressed genes in egg cell remained as most highly expressed in zygote, whereas the 50 most highly expressed genes in the sperm cell became much lower expressed in the zygote (Anderson et al. 2013; 2017). 
Further, the most highly expressed miRNAs in sperm were also strongly downregulated in the zygote (Fig S8A), again consistent with the prediction that sperm small RNAs were diluted by the egg cell cytoplasm, and that the overall zygote siRNA landscape is determined by transcript carryover from the egg cell.
To take a deeper look at the siRNA composition in zygote, we looked at the length profile of siRNAs and compared with our recent data from other cell and tissue types (Li et al. 2020). We found that in zygotes, as in all other tissues, 24-nt siRNAs were the most predominant (Fig 1B). This is consistent with the observation that in angiosperms, the most abundant siRNAs are 24-nt siRNA that are involved in RNA-directed DNA methylation [reviewed in (Cuerda-Gil and Slotkin 2016)]. Zygote trended well with egg, except with a slight reduction of Gypsy (LTR retrotransposon) reads and CentO (centromeric tandem repeat) reads. Like the egg cell, the zygote has a low abundance of siRNAs overlapping terminal inverted repeat (TIR) transposons (PIF/Harbinger, Tc1/Mariner, Mutator, or hAT superfamilies). These patterns showed that overall, the zygote siRNA pattern is similar to that of the egg cell, supporting that overall siRNAs in zygote were dominated by maternal transcripts carryover from the egg cell.
Next, we looked at the distribution of 24-nt siRNAs relative to genes. We produced metagene siRNA coverage plots for seedling, gametes, and zygote (Fig 1C). Seedling had a strong peak about 700-bp upstream of the transcription start site (TSS), corresponding to where TIR transposons are enriched in the genome, with the exception of the CACTA superfamily (Han, Qin, and Wessler 2013), and such a peak was absent in gametes. Zygote was similar to egg; however, zygote had a slight increase in 24-nt siRNA coverage upstream of the TSS. This observation indicated that although the overall zygote siRNA pattern is like that of the egg cell, the zygote might be starting to return to the canonical siRNA pattern. We previously identified representative genomic regions as egg-specific siRNA loci, seedling-specific siRNA loci, sperm-specific siRNA loci and three-way intersection loci (Li et al. 2020). When we quantified the relative abundances of zygote reads in each category, two observations were apparent. 1) Zygote had very few siRNAs in sperm-specific loci (Fig 1D). This is consistent with the prediction that the sperm siRNAs were diluted by the egg cell cytoplasm. 2) More interestingly, zygote had gained siRNAs in seedling-specific loci. This observation is another indication that the recovery of the canonical siRNA pattern may be initiated in the zygote.
A small number of siRNA loci in zygote expressed abundant siRNA, which correspond to ovary siren loci
It has been previously reported that rice developing endosperm (7-8 days after pollination) has a unique siRNA profile in which a small number of loci accounted for the majority of siRNAs (Rodrigues et al. 2013). These siRNA loci were termed siren loci (siRNA in the endosperm). A similar phenomenon was recently reported in Brassica rapa and Arabidopsis ovules and endosperm (Grover et al. 2020), and the term siren loci was also used by Grover et al. to describe these loci. To more accurately define the edges of variably sized siRNA loci, here we used ShortStack (Axtell 2013) rather than simply dividing the genome into 100-bp bins as previously (Li et al. 2020). In ovary, ∼2% (n = 818) of the siRNA loci accounted for 75% of the siRNA, and the numbers were similar in egg cell and zygote (Fig 2A). We termed these 818 loci found in ovary ovary siren loci. Here we use the term siren loci in a general sense to describe loci with highly abundant siRNAs, independently of their presence in endosperm, because the abundant siRNA loci in rice ovaries and eggs showed little correlation with the siren loci in endosperm, at least at the specific later endosperm stage examined (Li et al. 2020; Rodrigues et al. 2013). Reanalysis of the endosperm siRNAs revealed an even more extreme distribution, where ∼1% (n = 122) of the siRNA loci accounted for 75% of the siRNA expression. For subsequent analysis, we defined these 122 loci as endosperm siren loci. In contrast, in seedling and sperm cell, ∼45% of the siRNA loci accounted for 75% of the siRNA expression (Fig 2A). Siren loci were found on all 12 chromosomes and had no clear relationship relative to gene density in terms of their genomic distributions (Fig S2A). Similar to what was reported in Brassica (Grover et al. 2020), siren loci were on average longer than non-siren siRNA loci (Fig S2B). 
Median lengths for endosperm siren loci and ovary siren loci are 6.4-kb and 1.4-kb, respectively, whereas median lengths for other siRNA locus categories were well below 1-kb (Fig S2B). However, even after adjusting for length, siren loci were still ∼10-fold higher expressed per kilobase than non-siren loci in endosperm and ovary (Fig S2C).
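The statement that a small proportion of loci accounts for 75% of siRNAs corresponds to a cumulative-abundance calculation over loci ranked by read count. The following is a minimal sketch of that calculation, assuming one read count per locus; the function name and the toy counts are hypothetical, and this is not necessarily how the values in Fig 2A were computed.

```python
def fraction_of_loci_for_share(read_counts, share=0.75):
    """Smallest fraction of loci, ranked by abundance, whose reads reach `share` of the total."""
    ranked = sorted(read_counts, reverse=True)
    total = sum(ranked)
    running = 0
    for n_loci, count in enumerate(ranked, start=1):
        running += count
        if running >= share * total:
            return n_loci / len(ranked)
    return 1.0


# Hypothetical toy profiles with 100 loci each.
skewed = [400, 400] + [2] * 98   # two dominant loci, siren-like
even = [10] * 100                # reads spread evenly across loci

skewed_fraction = fraction_of_loci_for_share(skewed)
even_fraction = fraction_of_loci_for_share(even)
```

With these toy inputs, 2% of loci reach the 75% threshold in the skewed profile, whereas 75% of loci are needed in the even profile.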
We reanalyzed publicly available DNA methylome datasets, including those from our previous study, to examine the methylation status at ovary siren loci (Tan et al. 2016; 2018; Li et al. 2020). mCG and mCHG were higher at ovary siren loci relative to non-ovary-siren loci across all tissues examined, even in ddm1 and drm2 leaves (Fig S3A – B, P < 10−5, generalized linear model with logit link followed by Tukey tests). Our previous analysis of ovary small RNA loci, which also included thousands of small RNA loci with lower 24-nt siRNA abundance, revealed that most egg/ovary-specific small RNA loci have low levels of mCHH (Li et al. 2020). In contrast, in this set of 818 ovary siren loci, mCHH was elevated compared to non-ovary-siren loci in ovary and egg cell (Fig S3C, P < 10−5, generalized linear model with logit link followed by Tukey tests). Palea and lemma of rice florets and mature embryo also had elevated mCHH at ovary siren loci (Fig S3C). mCHH levels were not as strongly elevated, if at all, at ovary siren loci in the rest of the tissues (Fig S3C). DNA methylation at all contexts was also consistently higher around and on genes nearest or overlapping ovary siren loci in ovary and egg cell (Fig S4). To test whether ovary siren loci were associated with histone modifications indicative of heterochromatin, we reanalyzed publicly available ChIP-seq datasets (Fig S5) (Tan et al. 2016; Lu et al. 2018; Liu et al. 2016; Zhang et al. 2012; Lu et al. 2015; Maher et al. 2018; Lu et al. 2020; Zahraeifard et al. 2018; Zhang et al. 2017). We clustered siRNA locus categories based on their chromatin profiles (Fig S6). We found that in vegetative tissues, such as seedling shoot, on which most of the published datasets were based, ovary siren loci had an intermediate chromatin profile between euchromatin and heterochromatin (Fig S5 and Fig S6).
In Brassica rapa and Arabidopsis, siren loci detected in ovules matched those detected in endosperm (Grover et al. 2020). Our 8-hour zygote siRNAs did not match the 7-8 day endosperm siRNAs in rice embryo (Rodrigues et al. 2013). We found that the 7-8 day endosperm siren loci had low siRNA expression in other tissues, whereas ovary siren loci had high expression in egg cell and zygote, but low expression in seedling, sperm cell and endosperm (Fig 2B, linear model with log(RPM) followed by Tukey tests under a P = 0.05 cutoff), consistent with the levels of mCHH at ovary siren loci in these tissues (Fig S3C). In addition, most endosperm siren loci overlapped genes (median distance to nearest genes = 0-bp), whereas ovary siren loci had an intermediate distance from their adjacent genes (median distance to nearest genes = 1.54-kb, Fig 2C). Unlike other siRNA loci, siren loci were less likely to overlap annotated transposable elements. On average, ∼15% of the length of endosperm siren loci overlapped annotated transposable elements, and ∼45% for ovary siren loci (Fig 2D). The majority of ovary siren loci that did overlap transposon loci overlapped Gypsy retrotransposons. On average, ∼25% of the length of ovary siren loci overlapped Gypsy retrotransposons (Fig 2D). The subset of loci that did overlap annotated Gypsy elements had on average ∼3-fold fewer siRNAs per kilobase than those that did not (Fig S3D, P = 6.4e-42, linear model with log(RPKM) followed by Tukey test). In addition, ovary siren loci are unlikely to match an unannotated high-copy transposable element, as siRNAs from ovary siren loci were less repetitive than siRNAs from ovary siRNA loci as a whole (Fig S3E). Taken together, these results showed that a small number of siRNA loci that were not associated with transposable elements accounted for the majority of siRNAs in zygote, and that siRNAs from these loci were also highly expressed in the egg cell and ovary, where they coincided with high mCHH.
Zygote experienced major change in miRNA expression and produced de novo miRNAs
Our previous study found that the rice zygote expressed de novo mRNA transcripts before the first zygotic division (Anderson et al. 2017), indicating that plant zygotic genome activation (ZGA) occurs in the zygote. De novo mRNA expression was also observed in maize and Arabidopsis zygotes (Chen et al. 2017; Zhao et al. 2019). However, whether the zygote has started to produce miRNAs, which are known regulators of gene expression, has not been explored in plants. It was found that the global mRNA expression of zygote clustered with the egg cell, rather than with sperm cell or the average of the gametes (Anderson et al. 2013; 2017), and thus we hypothesized that the global expression of zygote miRNAs will be similar to that of the egg cell. However, that was not the case. We performed a principal component analysis (PCA) on miRNA expression pattern (values for each miRNA listed in Supplemental Dataset 1). We found that the first PC axis explained ∼55% of variance in our dataset, corresponding to the global differences between seedling and gametes, whereas the second PC axis explained ∼14% of the variances, corresponding to the global differences between egg cell and sperm cell. Contrary to our prediction that the zygote would cluster with the egg cell, the global expression of zygote miRNAs was about mid-way in between the egg and seedling (Fig 3A). This observation can be explained by a major change in expression of miRNAs between the egg cell and the zygote. For example, miR159a.1/b, family members of miR159, which were the most abundant miRNAs in the egg cell and ovary, accounted for ∼70% of all miRNA reads in these tissues (Li et al. 2020); in contrast, in the zygote, the relative abundances of miR159a.1/b decreased to ∼34% of all miRNA reads in the zygote, which is closer to ∼25% in seedling and sperm cell, albeit still abundant in the zygote (Fig 3B). The high expression of miR159a.1/b raised concerns on whether they dominated the principal component analysis. 
This was not the case. A principal component analysis excluding expression values for miR159a.1/b returned the same results (Fig S7A). Moreover, miRNAs more highly expressed in the zygote contributed to displacing zygote libraries to the left along PC1, and miRNAs more highly expressed in the egg contributed to displacing egg libraries to the right along PC1 (Fig S7B, r = −0.62, P = 3e-34, Pearson’s correlation). Taken together, these results suggested that miRNA expression is dynamic in unicellular zygotes.
We also looked at de novo expressed miRNAs, which were defined as miRNAs that were lowly expressed in the egg cell (< 1 read per million siRNA reads) but upregulated in the zygote (Fig. 3C). Since top sperm-enriched miRNAs were all downregulated by orders of magnitude in the zygote (Fig S8A), we concluded that miRNA carryover from the sperm cell is very limited. Thus, we defined de novo miRNAs as zygote-expressed miRNAs that were lowly expressed in egg cell, instead of lowly expressed in both gametes. We detected 18 miRNAs representing 11 miRNA families (from 314 expressed miRNAs in our dataset) that met this criterion (Fig 3C). These miRNAs were detected as differentially expressed between zygote and egg under a log2FC > 1, FDR < 0.05 cutoff, and they all had near undetectable expression in all six replicates of egg cell. All 18 were also lowly expressed in sperm, further confirming sperm small RNA contribution was limited. These de novo expressed miRNAs showed that the zygote has started to express new miRNAs before the first zygotic division, much like the case of mRNAs (Anderson et al. 2017) during plant ZGA.
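The selection criteria for de novo miRNAs stated above (below 1 read per million in the egg cell, log2FC > 1 and FDR < 0.05 between zygote and egg) can be written as a simple filter. The sketch below assumes that normalized abundances and FDR values have already been produced by a differential expression tool; the pseudocount, the function name and the miRNA names and values are hypothetical and serve only to illustrate the logic.

```python
import math


def is_de_novo(egg_rpm, zygote_rpm, fdr, pseudocount=0.1):
    """Apply the three stated cutoffs: egg < 1 RPM, log2FC > 1, FDR < 0.05.

    The pseudocount is an assumption added here to avoid division by zero.
    """
    log2fc = math.log2((zygote_rpm + pseudocount) / (egg_rpm + pseudocount))
    return egg_rpm < 1 and log2fc > 1 and fdr < 0.05


# Hypothetical values: (egg RPM, zygote RPM, FDR).
toy_mirnas = {
    "miR-A": (0.2, 15.0, 0.001),    # low in egg, strongly up in zygote
    "miR-B": (50.0, 200.0, 0.001),  # up in zygote but already expressed in egg
    "miR-C": (0.1, 0.15, 0.6),      # low in egg, not significantly changed
}
de_novo = [name for name, values in toy_mirnas.items() if is_de_novo(*values)]
```

Only the first hypothetical miRNA satisfies all three criteria.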
Upregulation of de novo miRNAs in the zygote would predict the downregulation of their mRNA targets. Using published degradome data (Zhou et al. 2010) and 5’ RACE validated targets (Liu et al. 2009) for rice miRNAs, together with published mRNA transcriptomes from rice gametes and zygotes (Anderson et al. 2013; 2017), we examined the expression of predicted targets of miRNAs (Fig S8B). We found that targets of miR393 family members, auxin F-box 2 (AFB2, Os04g32460) and TIR1 (Os05g05800), were downregulated in the zygote (Fig S8B). In addition, a target of miR171, Hairy Meristem 3 (HAM3, Os04g46860), a GRAS family transcription factor was also downregulated in the zygote. In Arabidopsis, triple mutant of ham3 and its paralogs ham1 ham2 exhibited abnormal floral meristem initiation (Engstrom et al. 2011). Two other HAM-like genes (Os02g44360, Os02g44370), both detected as miR171 targets by a published degradome (Zhou et al. 2010), were also downregulated in the zygote. Other predicted miRNA targets were not downregulated in the zygote; these targets were either lowly expressed in the egg cell to begin with, or had variances in expression levels that were too high to confidently call downregulation (Fig S8B). Lastly, we previously showed that RNA contamination from surrounding sporophytic tissue in the egg cell was minimal (Li et al. 2020). Using similar analyses, we detected five miRNAs that were highly expressed in pre-anthesis ovary but nearly undetectable in all six replicates of zygote, suggesting that RNA contamination from surrounding sporophytic tissue during zygote isolation is negligible (Fig S8C).
Zygote de novo siRNA loci had characteristics of canonical vegetative siRNA loci
Since genome-wide reprogramming of siRNA loci occurs in gametes whereas later-stage (7-8 days after flowering) immature embryos resemble seedlings (Rodrigues et al. 2013; Li et al. 2020), it is expected that the siRNA transcriptome resets to the canonical siRNA profile during embryogenesis. To test whether such a resetting process may be initiated in the zygote, we first determined the de novo siRNA loci expressed in zygotes. First, we identified siRNA loci from seedling, gametes, and zygote using ShortStack (Axtell 2013) and kept only 24-nt-dominant loci. We then identified zygote siRNA loci that did not overlap with any egg siRNA loci. Since transcript carryover from the sperm cell is minimal (Fig 1D, Fig S8A), de novo siRNA loci were defined as zygote NOT egg siRNA loci, consistent with the standard definition of de novo mRNA loci in zygotes (Chen et al. 2017; Anderson et al. 2017; Zhao et al. 2019).
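The "zygote NOT egg" definition is an interval set difference: a zygote locus is retained only if it overlaps no egg locus. A minimal sketch follows, assuming loci are (chromosome, start, end) tuples in half-open coordinates. The function names and toy loci are hypothetical, and the pairwise comparison is quadratic; a genome-scale analysis would more likely use an interval tool (for example, `bedtools intersect -v`), which is mentioned here only as one possibility and not as the method used in this study.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def loci_not_overlapping(query_loci, reference_loci):
    """Return query loci that overlap none of the reference loci."""
    return [q for q in query_loci
            if not any(overlaps(q, r) for r in reference_loci)]


# Hypothetical toy loci.
zygote_loci = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 100, 300)]
egg_loci = [("chr1", 250, 400)]
sperm_loci = [("chr2", 50, 150)]

# Zygote NOT egg: the de novo definition.
de_novo_loci = loci_not_overlapping(zygote_loci, egg_loci)
# Zygote NOT gametes: the more stringent subset that also excludes sperm loci.
not_gametes_loci = loci_not_overlapping(de_novo_loci, sperm_loci)
```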
The genomic distribution of zygote de novo loci mirrored seedling siRNA loci and gene density, unlike zygote/egg or zygote/sperm intersection loci (Fig 4A). Zygote NOT gametes loci, defined as zygote siRNA loci that did not overlap either egg siRNA loci or sperm siRNA loci, can be viewed as a more stringent subset of zygote de novo loci, and these showed the same trend as zygote de novo loci. Zygote NOT gametes loci and zygote de novo loci were largely equivalent, as zygote NOT gametes loci constituted ∼81% of zygote de novo loci (Fig 4B, 14993/18467 loci). A feature of the canonical siRNA pattern is that 24-nt siRNAs are enriched near genes, as exemplified by the distribution of seedling loci closely matching that of genes (Fig 4A). We found that 84% of the zygote de novo loci overlapped seedling siRNA loci (Fig S9A). In fact, zygote/seedling intersection loci were enriched among zygote de novo loci (Fig S9B, P = 1.3e-34, Fisher's exact test). Hierarchical clustering also showed that the genomic distribution of zygote de novo loci was more closely related to that of genes or seedling siRNA loci than zygote/egg or zygote/sperm intersection loci (Fig S9C). We found that 4% of zygote de novo loci overlapped sperm siRNA loci and not seedling siRNA loci. However, these loci were unlikely to be the result of siRNA carryover from the sperm cell, for the following reasons. First, there was a lack of correlation between expression levels in sperm and zygote (Fig S9D). Second, carryover would predict that these are among the most highly expressed loci in sperm, which was not the case, as the vast majority of them (98%) were not among the most highly expressed siRNA loci in sperm (Fig S9D). Although spread across all 12 rice chromosomes, the siRNA abundances at zygote de novo loci were overall low (Fig 4B), only accounting for ∼14% of siRNA reads in zygote. This is consistent with the data that overall, the zygote siRNA landscape is similar to that of the egg cell. 
Nonetheless, the zygote has started to produce siRNAs that were undetected in the egg cell before the first zygotic division, in addition to producing de novo miRNAs. Further, since the distribution of zygote de novo loci coincided with seedling loci and gene density, siRNA expression in the zygote had started to return to the stable canonical siRNA profile that is maintained in post-germination development.
We produced metagene plots for siRNA reads overlapping zygote de novo loci. Consistent with the genome-wide distribution, in the zygote, there was a noticeable peak upstream of TSS, resembling the profile seen in seedling (Fig 4C). Immature embryo (Rodrigues et al. 2013), like seedling, also had a prominent peak upstream of TSS, suggesting that during embryogenesis, the siRNA landscape redistributes from the gametic profile back to the canonical profile, and such a process has begun in the zygote. We also quantified the relative abundances of siRNAs at total zygote siRNA loci, zygote/egg intersection loci, and zygote de novo loci. First, we found that zygote and egg had near identical relative abundances across total zygote loci and zygote/egg intersection loci (Fig 4D), which was expected given siRNA carryover from the egg cell (Fig 1A). Second, we found that zygote had low siRNA abundance at zygote de novo loci, which was again expected given that zygote de novo loci were low expressors of siRNAs (Fig 4B). However, we also found that zygote de novo loci were not zygote specific. Rather, seedling and immature embryo also had siRNAs arising from zygote de novo loci, and seedling had an even higher relative abundance of siRNAs at zygote de novo loci than the zygote itself (Fig 4D – E). Lastly, we detected 12% of the zygote de novo loci to be zygote specific, not overlapping gamete siRNA loci or seedling siRNA loci (Fig S9A). This small subset of loci accounted for only less than 1% of siRNAs in zygote, and even fewer in embryo (Fig S9E), and thus they may have represented siRNAs expressed transiently in the zygote.
We further tested the hypothesis that zygote de novo loci are indeed similar to canonical siRNA loci by comparing the distance to the nearest genes and the proportion of loci overlapping different transposable element superfamilies. We found that like seedling siRNA loci, which had a median distance of 966-bp to their nearest genes, zygote de novo loci had a median distance of just 1.11-kb (Fig 5A), whereas egg siRNA loci and sperm siRNA loci were further away from their nearest genes, with median distances of 1.54-kb and 4.06-kb, respectively. In addition, canonical siRNA loci tend to overlap with TIR DNA transposons rather than Gypsy retrotransposons, since TIR transposons are enriched near genes (Han, Qin, and Wessler 2013), while Gypsy retrotransposons are enriched in gene-poor pericentromeric regions (Kawahara et al. 2013). As expected, 10% of the zygote de novo loci overlapped a Gypsy element, and 42% overlapped a TIR element, consistent with a shift towards seedling siRNA loci (8.8% and 63%, respectively, Fig 5B). In contrast, 19% and 49% of the egg and sperm siRNA loci, respectively, overlapped with a Gypsy element, consistent with farther distances to their nearest genes. These overlaps were defined as overlapping a TIR or Gypsy element by at least 1-bp, which is a permissive criterion. In a supporting analysis, we used a stricter cutoff, defining overlap as overlapping a TIR or Gypsy element over at least 33% of the length of the locus, which produced the same pattern (Fig S10). Taken together, both lines of evidence support that the reestablishment of canonical siRNA loci has initiated in the zygote.
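The two overlap criteria used above (at least 1-bp, or at least 33% of the locus length) differ only in the threshold applied to the number of locus bases covered by transposable elements. A minimal sketch, assuming half-open (chromosome, start, end) coordinates and element annotations that do not overlap one another (otherwise bases would be double-counted); function names and coordinates are hypothetical.

```python
def covered_bp(locus, elements):
    """Bases of the locus covered by elements, assuming elements do not overlap each other."""
    chrom, start, end = locus
    total = 0
    for e_chrom, e_start, e_end in elements:
        if e_chrom == chrom:
            total += max(min(end, e_end) - max(start, e_start), 0)
    return total


def overlaps_by_bp(locus, elements, min_bp=1):
    """Criterion 1: at least `min_bp` bases of the locus are covered."""
    return covered_bp(locus, elements) >= min_bp


def overlaps_by_fraction(locus, elements, min_fraction=0.33):
    """Criterion 2: at least `min_fraction` of the locus length is covered."""
    length = locus[2] - locus[1]
    return covered_bp(locus, elements) / length >= min_fraction


# Hypothetical 300-bp locus and two alternative TE annotations.
locus = ("chr1", 0, 300)
edge_te = [("chr1", 290, 500)]     # covers 10 bp of the locus
internal_te = [("chr1", 100, 250)]  # covers 150 bp of the locus

edge_calls = (overlaps_by_bp(locus, edge_te), overlaps_by_fraction(locus, edge_te))
internal_calls = (overlaps_by_bp(locus, internal_te), overlaps_by_fraction(locus, internal_te))
```

In this toy case the edge-overlapping element satisfies only the 1-bp criterion, whereas the internal element satisfies both.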
Discussion
We report here the first small RNA transcriptome of plant unicellular zygotes, which provides insights into epigenome reprogramming during plant reproduction. Overall, the siRNA transcriptome of the zygote is similar to that of the egg cell (Fig 1A – C, Fig S1B, summarized in Fig 6), which can be explained by transcript carryover from the egg cell and dilution of sperm cell siRNAs, since the egg cell is ∼1000-fold larger than the sperm cell, and 24-nt siRNAs are predominantly found in the cytoplasm (Ye et al. 2012). This is consistent with the observation that transcript carryover from the sperm cell is limited, as the zygote had few siRNAs overlapping sperm-specific siRNA loci (Fig 1D), and the most highly expressed miRNAs in the sperm cell were downregulated by orders of magnitude in the zygote (Fig S8A).
Similar to Brassica rapa and Arabidopsis ovules (Grover et al. 2020), a small number of loci accounted for most of the siRNAs in the zygote (Fig 2A), and these loci were termed siren loci. Siren loci were first discovered in rice endosperm (Rodrigues et al. 2013); however, siren loci in the zygote were distinct from the siren loci of endosperm collected 7-8 days after fertilization, and instead coincided with siren loci detected in ovary and egg cell (Fig 2B – D). It should be noted, however, that endosperm nuclei divide rapidly, massive changes in cellular organization occur over a short time, and endosperm DNA undergoes active DNA demethylation [reviewed in (Gehring and Satyaki 2017)]. It is quite possible that the central cell and earlier stages of endosperm development have an siRNA transcriptome more like that of the zygote. It has been proposed that the embryo receives siRNAs from the endosperm (Martinez and Köhler 2017). This does not appear to be the case in 7-8 day rice seeds, since rice embryos had low siRNA abundance at endosperm siren loci at this stage (Rodrigues et al. 2013). A recent publication demonstrated that trans-acting siRNAs from ARFs (tasiR-ARF) traffic across ovule cell layers to regulate megaspore mother cell (MMC) identity in Arabidopsis (Su et al. 2020). It has also been proposed that siRNAs may traffic from the seed coat into the embryo during seed development (Grover et al. 2018; 2020). Likewise, it is possible that siren siRNAs in the egg cell and zygote are instead produced in the ovary tissue. As ovary siren loci have higher mCHH in ovary and the egg cell (Fig S3C), siren siRNAs may serve biological functions in these tissues. Since these siren siRNAs persisted in the zygote, mCHH may also be higher at both maternal and paternal alleles in the zygote, which may suggest a functional significance of these siren siRNAs post-fertilization.
Unlike the siRNA transcriptome, which was globally similar to the egg cell, there was a global change in miRNAs associated with the zygotic transition (Fig 3A), which included the downregulation of miRNAs, as well as the expression of de novo miRNAs (Fig 3B – C, Fig S7). Eighteen miRNAs that were undetected in the egg cell were significantly upregulated in the zygote (Fig 3C), implying that before the first embryonic division the expression of miRNAs as well as mRNAs (Anderson et al. 2017) from the zygotic genome is a feature of zygotic genome activation in plants. The predicted targets of these miRNAs include auxin receptors (AFB2 and TIR1) that we speculate might be involved in the fine tuning of auxin signaling, and meristem related factors (HAM transcription factors), whose downregulation may be involved in establishing zygotic identity (Fig S8B). However, it remains to be investigated whether these differentially expressed miRNAs and their targets have significant functions in ZGA and embryogenesis.
The observation that the distribution of de novo siRNA loci in the zygote mirrored canonical siRNA loci (Fig 4A, Fig S9A – C) suggests that the transition to the canonical siRNA transcriptome is initiated in the zygote (summarized in Fig 6). This is further supported by several lines of evidence. 1) Zygote siRNAs from zygote de novo loci, like seedling and embryo siRNAs, had a noticeable peak upstream of the TSS (Fig 4C), and such a peak was minimal in gametes. 2) Zygote de novo loci had characteristics similar to those of canonical siRNA loci, such as proximity to genes and a tendency to overlap TIR rather than Gypsy transposable elements (Fig 5). 3) The properties of zygote de novo loci are not an artefact of how these loci were defined, as the zygote overall had more gene-proximal siRNAs (Fig 1C) and more siRNAs overlapping seedling-specific siRNA loci (Fig 1D). Zygote siRNA loci overall were closer to genes (Fig 5A) and tended to overlap more TIR DNA transposons but fewer Gypsy retrotransposons (Fig 5B) when compared to the egg cell. Lastly, 4) most zygote de novo loci were not zygote specific. Seedling had even higher siRNA abundance at zygote de novo loci (Fig 4D – E). All the above results suggest that the resetting of the siRNA transcriptome is initiated in the zygote and is completed by the time the seedling emerges (Fig 6).
Due to the extreme difficulties associated with zygote isolation and low input sequencing, epigenome profiling for plant zygotes has been challenging. Plant gametes are highly dimorphic in terms of size, chromatin (Wang and Köhler 2017; Borg and Berger 2015; Ingouff et al. 2010), and gene expression (Anderson et al. 2013), consistent with a differential reprogramming of gamete epigenomes prior to fertilization inferred from their siRNA profiles (Li et al. 2020). However, changes in the gametes must be followed by resetting to the canonical somatic profile during the next generation. Here, using the siRNA transcriptome of unicellular rice zygotes, we inferred that the resetting to the canonical siRNA transcriptome is initiated along with the zygotic transition, soon after fertilization, which is consistent with previous observations in Arabidopsis that the resetting of H3 variants also occurs in the zygote before the first cell division (Ingouff et al. 2007; 2010). Resetting of the siRNA transcriptome after fertilization would predict a lack of transgenerational siRNA inheritance in plants. However, genome-wide reprogramming of siRNAs does not preclude the persistence and inheritance of selected siRNA molecules. Therefore, transgenerational siRNA inheritance in plants is not excluded by our data. Lastly, as siRNA expression is influenced by chromatin structure, and siRNAs can either reinforce or initiate DNA methylation and histone modifications, the siRNA transcriptome is an indicator and output of the epigenome. Thus, it is likely that resetting of other features of the gametic epigenome, such as histone modifications and chromatin conformation, is also initiated in the unicellular zygote in plants.
Methods
Plant growth condition and zygote collection
Rice (Oryza sativa) variety Kitaake was grown in soil in a greenhouse under natural light conditions. Zygote isolation was performed as described (Anderson et al. 2017; Li et al. 2019). Briefly, rice flowers were hand pollinated. At eight hours post pollination, ovaries were dissected. A transverse cut was made at the middle region of the ovary in a droplet of 0.3 M mannitol. The lower part of the cut ovary was gently pushed with an acupuncture needle under a phase contrast inverted microscope. Once the zygote floated out of the ovary incision, it was captured with a fine glass capillary and immediately frozen in liquid nitrogen. Fifty zygotes were collected for each replicate, and six replicates were collected (Supplemental Table 1).
RNA extraction and small RNA library construction
RNA extractions were performed using the Ambion RNAqueous Total RNA kit (AM1931), including an on-column DNase I treatment using Qiagen DNase I (79254). Total RNA was run on a Bioanalyzer (Agilent) to check for RNA integrity, using the eukaryotic total RNA-pico program. RNA input for library construction was ∼30 ng. Small RNA libraries were made using the NEXTflex small RNA-seq kit v3 (PerkinElmer NOVA-5132-05), with the following modifications. A ¼ dilution of adapters was used. The 3’ adapter ligation step was done at 20°C overnight. All libraries were amplified for 24 cycles. The library product was size selected using PippinHT (Sage Science) 3% agarose gel cassettes.
Small RNA sequencing analysis
Analyses were based on the Os-Nipponbare-Reference-IRGSP-1.0 reference genome (Kawahara et al. 2013). Genome annotations for transposable elements, genes, miRNAs, 5S rRNA, tRNA, NOR, CentO repeats and phasiRNA loci were as described (Li et al. 2020). Quality filtering, adapter trimming, PCR duplicate removal and alignment were performed as described (Li et al. 2020). Briefly, small RNA-seq reads were quality filtered and trimmed of adapters using cutadapt (Martin 2011), parameters “-q 20 -a TGGAATTCTCGGGTGCCAAGG -e .05 -O 5 --discard-untrimmed -m 28 -M 33”. PCR duplicates were then removed using PRINSEQ, parameters “prinseq-lite.pl -fastq out_format 3 -out_good -derep 1” (Schmieder and Edwards 2011). The four random nucleotides at each end were then removed using cutadapt “-u 4” followed by cutadapt “-u -4”. Reads were aligned to the genome with BWA-backtrack (version 0.7.15) (Li and Durbin 2009), parameters “aln -t 8 -l 10”. Except where indicated otherwise, multi-mapping reads were included in all analyses. The uniquely mapping subset of siRNAs was defined as reads with MAPQ values of at least 20, selected using SAMtools (Li et al. 2009). Except where indicated otherwise, siRNAs used for analyses were small RNA reads (20 – 25-nt) that did not overlap miRNA, 5S rRNA, tRNA, NOR or phasiRNA loci by 90% or more of their lengths, as determined by the BEDTools intersection tool (Quinlan and Hall 2010). For analysis of overlaps of siRNAs with Gypsy retrotransposons, the CentO centromeric tandem repeat, Terminal Inverted Repeat (TIR) DNA transposons, and 24-nt siRNA loci, only siRNAs that overlapped by at least 50% of their lengths were counted. CACTA elements were excluded from the TIR DNA transposons. Distances to closest genes were obtained using the BEDTools closest tool. Whole-genome small RNA heatmaps were made on 50-kb intervals using IGVtools (Thorvaldsdottir, Robinson, and Mesirov 2013). For better visualization of midrange values, heatmap intensity was capped at 1.25X coverage (per 10 million 24-nt siRNAs).
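The read-length logic in this section can be made explicit with a short sketch. Reads retained at 28 to 33 nt after adapter trimming still carry four random nucleotides at each end, so removing them yields 20 to 25-nt inserts. The sketch also shows the 90% overlap rule used to exclude reads from annotated non-siRNA loci. It is an illustration under stated assumptions, not the pipeline itself (which used cutadapt and BEDTools); all function names and the half-open coordinate convention are hypothetical.

```python
# Illustrative sketch only: the study used cutadapt and BEDTools for these steps.
# Function names and the half-open (start, end) convention are hypothetical.

RANDOM_NT = 4  # random nucleotides at each end of the adapter-trimmed read

def strip_random_nt(seq, n=RANDOM_NT):
    """Remove n bases from each end (analogous to cutadapt -u n then -u -n)."""
    return seq[n:len(seq) - n]

def is_small_rna_length(seq, lo=20, hi=25):
    """True if the insert length falls in the 20 to 25-nt window."""
    return lo <= len(seq) <= hi

def overlap_fraction(read, feature):
    """Fraction of the read's length that lies inside the feature."""
    shared = max(0, min(read[1], feature[1]) - max(read[0], feature[0]))
    return shared / (read[1] - read[0])

def is_sirna(read, excluded_features, max_frac=0.90):
    """Keep a read unless 90% or more of its length overlaps an excluded locus
    (miRNA, 5S rRNA, tRNA, NOR or phasiRNA annotations)."""
    return all(overlap_fraction(read, f) < max_frac for f in excluded_features)

# A 32-nt adapter-trimmed read: 4 random nt + 24-nt insert + 4 random nt.
raw_read = "ACGT" + "T" * 24 + "GGCC"
insert = strip_random_nt(raw_read)
```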
miRNA expression analysis
To measure miRNA expression, the BEDTools coverage tool was used to count the number of 20-25nt reads that overlapped annotated miRNA positions by at least 90% of their length (Supplemental Dataset 1). The R package edgeR was used to analyze miRNA expression (McCarthy, Chen, and Smyth 2012). Individual miRNA counts were normalized by total mapped small RNAs and filtered for >1 count per million reads (CPM) in at least three libraries. Differential expression was called using cutoffs of |log2FC| > 1 and FDR < 0.05. Differentially expressed miRNAs were visualized as counts per million miRNAs. Principal component analyses were performed using log-transformed CPM values.
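The filtering and cutoff logic above can be sketched in a few lines. This is a simplified illustration, not the edgeR workflow used in the study: the function names are hypothetical, and the fold changes and FDR values are assumed to come from a separate statistical test.

```python
# Illustrative sketch only: the study used the R package edgeR.
# Function names are hypothetical; log2FC and FDR are taken as given inputs.

def cpm(count, total_mapped):
    """Counts per million, normalized by total mapped small RNA reads."""
    return count * 1_000_000 / total_mapped

def passes_expression_filter(counts, totals, min_cpm=1.0, min_libraries=3):
    """Keep a miRNA if it exceeds min_cpm in at least min_libraries libraries."""
    n_expressed = sum(1 for c, t in zip(counts, totals) if cpm(c, t) > min_cpm)
    return n_expressed >= min_libraries

def is_differential(log2_fc, fdr, min_abs_log2fc=1.0, max_fdr=0.05):
    """Apply the |log2FC| > 1 and FDR < 0.05 cutoffs."""
    return abs(log2_fc) > min_abs_log2fc and fdr < max_fdr

# Four hypothetical libraries with 2 million mapped small RNA reads each.
totals = [2_000_000] * 4
kept = passes_expression_filter([5, 3, 4, 0], totals)      # above 1 CPM in 3 libraries
dropped = passes_expression_filter([1, 1, 3, 3], totals)   # above 1 CPM in only 2
```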
ChIP-seq data analyses
Chromatin Immunoprecipitation (ChIP) data were obtained from the NCBI Sequence Read Archive, accession numbers listed in Supplemental Table 2. Reads were aligned to the Os-Nipponbare-Reference-IRGSP-1.0 reference genome using Bowtie2 (Langmead and Salzberg 2012) with default parameters, except that the --trim-to parameter was set to 101 for all paired-end reads (from Tan et al. 2016, Liu et al. 2018, Lu et al. 2020), to 36 for a subset of single-end reads (from Zhang et al. 2020), and to 49 for the other single-end reads (Lu et al. 2015, Zahraeifard et al. 2018, Maher et al. 2018). Read coverage was calculated for each set of loci using the BEDTools coverage tool. To reduce memory requirements, 2000 loci were randomly subsampled from each set, except for the ovary siren loci, which comprised only 818 loci and were used without subsampling. Principal component analysis was done using the log-transformed median coverage of each locus category for each ChIP-seq library.
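The subsampling and summary steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the fixed random seed, the base-2 logarithm and the pseudocount of 1 are all choices made for the example, since the text specifies only random subsampling of 2000 loci and a log transformation of the median.

```python
# Illustrative sketch only: names, seed, log base and pseudocount are assumptions.
import math
import random
import statistics

def subsample_loci(loci, n=2000, seed=0):
    """Randomly draw n loci without replacement; sets with n or fewer loci
    (such as the 818 ovary siren loci) are returned unchanged."""
    if len(loci) <= n:
        return list(loci)
    return random.Random(seed).sample(list(loci), n)

def log_median_coverage(coverages, pseudocount=1.0):
    """Log-transformed median coverage of one locus category in one library."""
    return math.log2(statistics.median(coverages) + pseudocount)

large_set = subsample_loci(range(5000))   # subsampled down to 2000 loci
small_set = subsample_loci(range(818))    # kept whole
```

One such value per locus category and ChIP-seq library would then form the matrix passed to the principal component analysis.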
Definition of siren loci and zygote de novo loci
Small RNA loci were identified from the initial 20-25nt total small RNA alignment bam files using ShortStack (Axtell 2013) with default parameters, after merging replicates. For each tissue type (ovary, egg cell, sperm cell, zygote, seedling shoot and endosperm), siRNA loci were defined as loci with RPM > 2 (default) that were 24-nt-dominant and not detected as a miRNA locus (“DicerCall=24; MIRNA=N”). Endosperm siren loci were defined as the highest expressing loci that accounted for 75% of the cumulative RPM in the endosperm. Similarly, ovary siren loci were defined as the highest expressing loci that accounted for 75% of the cumulative RPM in the ovary. The 75% cutoff was selected based on the turning point of the cumulative expression vs. percentage rank plot for ovary (Fig 2A). Zygote de novo loci were identified using the BEDTools intersect tool (Quinlan and Hall 2010) and were defined as zygote NOT egg loci (zygote siRNA loci that did not overlap any egg cell siRNA loci). To visualize the genomic distribution of siRNA loci, bed files of loci were imported into IGV and visualized across the whole genome.
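The two locus definitions above can be sketched as follows. This is an illustration, not the code used in the study. The function names and data layout are hypothetical, and one detail is an assumption: the locus at which the cumulative RPM reaches the 75% mark is included in the siren set. The de novo definition mirrors what `bedtools intersect -v` reports (entries in one file with no overlap in the other), although the exact invocation used is not stated in the text.

```python
# Illustrative sketch only: names and data layout are hypothetical.

def top_loci_by_cumulative_rpm(rpm_by_locus, fraction=0.75):
    """Return the highest expressing loci that together account for the given
    fraction of total RPM (assumption: the locus reaching the mark is included)."""
    total = sum(rpm_by_locus.values())
    ranked = sorted(rpm_by_locus.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, rpm in ranked:
        if cumulative >= fraction * total:
            break
        selected.append(name)
        cumulative += rpm
    return selected

def loci_overlap(a, b):
    """True if two (chrom, start, end) half-open loci share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def de_novo_loci(zygote_loci, egg_loci):
    """Zygote NOT egg: zygote loci that overlap no egg cell locus."""
    return [z for z in zygote_loci
            if not any(loci_overlap(z, e) for e in egg_loci)]

# Hypothetical example data.
rpm = {"A": 50, "B": 25, "C": 10, "D": 8, "E": 4, "F": 3}
siren = top_loci_by_cumulative_rpm(rpm)
zygote = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
egg = [("chr1", 150, 300), ("chr2", 300, 400)]
novel = de_novo_loci(zygote, egg)
```

In this toy example two of six loci carry 75% of the total RPM, which parallels the observation that a small proportion of loci accounts for most siRNAs.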
Data Access
All small RNA data have been deposited in the Sequence Read Archive, BioProject PRJNA533115.
Code Access
All R code for data visualization and statistical analyses was deposited at https://github.com/cxli233/zygote_smRNA/
Author contributions
CL, JIG, SDR and VS designed the study. HX and HF collected zygotes. SDR supervised zygote collections. CL produced small RNA sequencing libraries. CL and JIG analyzed data. VS supervised data collection and analyses. CL wrote the manuscript with input from all authors.
Acknowledgements
We thank Zachary Liechty and Christian Santos for assistance in R programming; and Alina Yalda, Jake Anichowski, and Michelle Binyu Cui for greenhouse maintenance and technical assistance. The UC Davis Genome Center provided Illumina sequencing, library quality control and size selection services. CL also acknowledges partial support from the Elsie Taylor Stocking Memorial Fellowship from the Department of Plant Biology at University of California, Davis. This study was supported in part by resources and technical expertise from the Georgia Advanced Computing Resource Center, a partnership between the University of Georgia’s Office of the Vice President for Research and Office of the Vice President for Information Technology. This research was funded by the National Science Foundation (IOS-1547760) and the U.S. Department of Agriculture (USDA) Agricultural Experiment Station (CA-D-XXX-6973-H).