ABSTRACT
Background In diverse eukaryotes, DNA methylation is deposited within the coding regions of many genes, termed gene body methylation (GbM). Whereas the role of DNA methylation on the silencing of transposons and repetitive DNA is well understood, gene body methylation is not associated with transcriptional repression, and its biological importance has remained unclear.
Results We report a newly discovered type of GbM in plants, which is under constitutive addition and removal by dynamic methylation modifiers in all cells, including the germline. Methylation at Dynamic GbM genes is removed by the DRDD demethylation pathway, and added by an unknown source of de novo methylation, most likely the maintenance methyltransferase MET1. We show that the Dynamic GbM state is present at homologous genes across divergent lineages (>100 million years), indicating evolutionary conservation. Furthermore, we demonstrate that Dynamic GbM is tightly associated with the presence of a promoter/regulatory chromatin state within the gene body, in contrast to other gene body methylated genes. Lastly, we find Dynamic GbM is associated with enhanced gene expression plasticity across development and diverse physiological conditions, whereas stably methylated GbM genes exhibit reduced plasticity.
Conclusions This study proposes a new model for GbM in regulating gene expression plasticity, including a newly discovered type of GbM in which increased gene expression plasticity is associated with the activity of DNA methylation writers and erasers and the enrichment of a regulatory chromatin state. We propose that Dynamic GbM is a hallmark of genes that can navigate a greater landscape of transcriptional states by maintaining regulatory flexibility.
INTRODUCTION
In diverse eukaryote species, DNA methylation is important for transcriptional silencing of transposable elements [1,2], repressing recombination at repetitive regions [3] and establishing genomic imprinting [4,5]. DNA methylation is a reversible and dynamic mark, which can be added or removed by a number of writer or eraser enzymes [6]. Many of these writer pathways have evolved to promote highly stable inheritance of DNA methylation over cell divisions and reproductive generations [7,8]. For example, the enzyme MET1 (also known as DNMT1 in animals) acts on hemimethylated DNA at the replication fork, faithfully copying existing methylation patterns to the daughter strand at symmetrical CG sites [6]. Consequently, the vast majority of methylated CG sites are highly stable and consistently methylated across all cells [9]. In plants, many additional writers also maintain and methylate DNA de novo at CG, CHG and CHH sites, including the CHROMOMETHYLASE (CMT) family and the DOMAINS-REARRANGED METHYLTRANSFERASE (DRM) family [6]. Additionally, a family of plant-specific DNA glycosylase enzymes (collectively named DRDD enzymes) act as methylation erasers, removing DNA methylation from thousands of loci within the genome [10]. This family is comprised of the enzymes DEMETER (DME) [11], REPRESSOR OF SILENCING 1 (ROS1) [12], and DEMETER-LIKE2&3 (DML2/DML3) [13]. The DRDD pathway is critical for reproductive development [11,14,15], and also maintains epigenetic homeostasis by protecting transcribed genes from silencing by pathways targeting proximal repetitive DNA [12,16–20]. Additionally, the DRDD pathway can also act during somatic development to generate differences in the levels of DNA methylation between tissues [10].
These methylation writers and erasers together shape DNA methylation patterns across the genome, into two broad categories: 1) “heterochromatic” DNA methylation, in which cytosines are methylated in CG, CHG and CHH sequence contexts, abundant at repetitive DNA, heterochromatin and some intergenic sequences within euchromatin [6], and 2) “gene body” DNA methylation (GbM), in which the coding region of genes are methylated solely in the CG sequence context [9]. Whereas the function of heterochromatic DNA methylation is broadly understood, primarily functioning in the transcriptional silencing of repetitive DNA [1,21,22], or the potent regulation of certain specialized target genes [17,18,23,24], the functional role of GbM has remained more enigmatic [9,25,26]. GbM is associated with moderate-to-highly expressed genes [26,27], is inherited with high accuracy [28] and is conserved at homologous target genes across lineages [25]. However, a mechanistic link between GbM and gene regulation has been difficult to establish. Some lineages have dispensed with GbM altogether [9], raising questions about its necessity for gene regulation. Indeed, DNA methylation writer mutants or naturally-occurring genomes with reduced GbM do not display widespread changes to the transcriptional output of GbM genes, or strongly perturbed phenotypes [27,28]. Why GbM occurs in so many eukaryotic genomes, and at conserved genes in distinct lineages is therefore yet to be fully resolved.
The majority of GbM genes in the genome exhibit consistent methylation of CGs within all cells. In this study, we show that a subset of GbM genes exhibit heterogeneous methylation patterns between cells, indicating dynamic modification to methylation during development. This dynamic methylation heterogeneity is due to the activity of both writer and eraser pathways in all cell and tissue types, is evolutionarily conserved, and represents a newly discovered epigenetic state associated with regulatory histone modifications not typically found within gene bodies. Lastly, we show that Dynamic GbM genes exhibit dramatically higher gene expression plasticity than their stably methylated counterparts, suggesting that active modification to GbM may permit a greater exploration of the gene expression landscape.
RESULTS
DNA demethylation targets a subset of gene bodies
We recently generated a quadruple somatic mutant (drdd) of the four active DNA demethylases (DRDD enzymes) in the model plant Arabidopsis thaliana [10]. Unlike previously studied DNA demethylation mutants, drdd mutants exhibited increased CG methylation at a subset of gene bodies. To understand this genic demethylation further, we compared the 581 genes most evidently targeted by DRDD to previously described gene-body methylated (GbM) genes [9] (Fig. 1A). These genes were identified based on two cutoffs: the presence of a WT vs drdd differentially methylated region (DMR) within the gene body (see methods), and an additional filter requiring at least 5 CG dinucleotides (10 CG cytosines total) that gained >20% methylation in drdd mutants. We found that the genes targeted by DRDD were a mostly distinct group compared to previously identified GbM genes (Fig. 1B), exhibiting a number of different properties. Previously identified GbM genes exhibit high levels of CG methylation throughout the gene body (Fig. 1A, C), and each individual CG is consistently highly methylated across the majority of cells (Fig. 1A, E), much like the vast majority of methylated CGs in the genome (Fig. 1D). As the methylation state of these methylated genes are stable across development, we hereafter refer to these genes as Stable GbM genes. In contrast, genes targeted by DRDD display reduced methylation overall, in large part due to the fact that the majority of methylated CGs are methylated in only a subset of WT cells (in drdd mutants, these CGs are methylated in all cells, much like Stable GbM genes (Fig. 1A,E)). We refer to these genes as Dynamic GbM genes, due to the active removal of methylation that is occurring during plant development. The intermediate methylation state of CGs within Dynamic GbM genes (methylated in 10-85% of cells) is unusual and distinct from the vast majority of the genome (Fig. 1D-E), as the methylation maintenance by MET1 typically copies existing methylation states during DNA replication to ensure consistency across cell populations.
A) Genome browser snapshot showing representative example of a gene with Stable GbM and a gene with Dynamic GbM in both WT and drdd mutants. Red bars represent methylated cytosines, and the height of the bar represents the percentage of cells in which a cytosine is methylated. B) A Venn diagram showing the number of Stable and Dynamic GbM genes and the overlap between the two groups. C) Box plot showing the overall percentage CGs methylation across entire genes (all CGs across all cells) with Stable or Dynamic GbM. Line = median, box = interquartile range, whiskers = 5th and 95th percentile. D) Violin plot showing the distribution of methylation heterogeneity levels of all CGs in the Arabidopsis genome, shown as the percentage of cells in which a CG is methylated. E) The proportion of fully methylated, heterogeneously methylated and unmethylated CGs within both Stable and Dynamic GbM gene bodies, in both WT and drdd mutants.
DNA demethylation generates cellular heterogeneity in all tissues and cell types
As Dynamic GbM genes exhibit DNA demethylation in a subset of cells, we sought to determine the underlying developmental context of these methylation dynamics. It is possible that DRDD removes methylation in a specific subset of differentiated cell types within somatic tissues, or that the reduced methylation is developmentally specific to the tissue type (rosette leaves) of our sample. To test this, we assembled whole-genome methylation sequencing data from a number of studies that have defined the methylation states of a number of cell and tissue types within Arabidopsis [10,29–32], and examined the cellular heterogeneity of DNA methylation patterns of both Stable GbM and Dynamic GbM genes. To our surprise, we discovered that the intermediate methylation of Dynamic GbM genes was not specific to active demethylation of a specific tissue or cell type (Fig. 2A-B) within somatic tissues, but rather a feature of all the tissue and cell types we examined. This suggested that the removal of methylation by DRDD is ubiquitous across somatic development. By contrast, and as expected, Stable GbM genes were consistently methylated in the majority of somatic cells (Fig. 2B). As Dynamic GbM genes show cellular heterogeneity in DNA methylation patterns (a signature of active demethylation) in all tissues, we next wished to examine if the stem cell population in shoot meristems and germline cells also display cellular heterogeneity, or if the Dynamic GbM state is restricted to somatic development. By analyzing the cellular heterogeneity of CG methylation in shoot apical meristem stem cells (labelled with the CLV3 promoter [32]), meiocyte cells from the male gametophyte [31] and sperm cells [30], we clearly observe intermediate methylation of CG sites across cells in all cases (Fig. 2C-D). To confirm that the cellular heterogeneity of CG methylation was due to the active removal by the DRDD pathway, we next examined DNA methylation in sperm nuclei mutant for one or both of the major DRDD genes DME and ROS1 [30]. In demethylase mutant sperm, Dynamic GbM CG methylation patterns were homogenous across 100% of cells, conclusively demonstrating that heterogeneity between cells in WT sperm is due to active demethylation by DME and ROS1 (Fig. 2C). Together, these data demonstrate that Dynamic GbM loci are actively demethylated in a subset of all cells in somatic, meristem and germline tissues, and consequently are not demethylated in a specific lineage or developmental context.
A) Representative genome browser snapshot showing heterogeneously methylated CGs at a Dynamic GbM locus in a variety of cell and tissue types. (B and C) Violin plots showing the distribution of methylation heterogeneity levels of CGs within Stable and Dynamic GbM genes (methylated domains only) from various somatic cell and tissue types (B), and meristem and germ cells (C). Solid lines represent the median and dashed lines represent the interquartile range. All Dynamic GbM violins are significantly different to Stable GbM violins (p=<0.0001), with the exception of ros1; dme (+/-) sperm cells. D) Proportion of fully methylated, heterogeneously methylated and unmethylated CGs across all tissue and cell types for both Stable and Dynamic GbM gene bodies.
De novo methylation is required to maintain dynamic GbM
As Dynamic GbM genes are demethylated in the germline, we reasoned that they must also be subject to de novo methylation to counteract DRDD activity. If Dynamic GbM genes are not also targeted by one or more methyltransferases, then DRDD activity would ultimately remove all methylation, and the methylation patterns of dynamic GbM loci would not be stable over inheritance. We observe Dynamic GbM in WT samples from multiple independent laboratories [10,23,33], as well as distinct ecotypes [34] (Figure S1), so the methylation patterns of Dynamic GbM loci must therefore be robust against many generations of inbreeding. To identify the methyltransferase pathway that counteracts DRDD to maintain Dynamic GbM, we examined Dynamic GbM methylation patterns in mutants of the major de novo methyltransferase pathways, RNA-directed DNA methylation (RdDM) [33] and Chromomethylases 2 and 3 (CMT2/CMT3) [33,35]. Unexpectedly, we observed no evidence that these de novo pathways act at Dynamic GbM loci, with quadruple ddcc mutants exhibiting near-identical methylation levels and cellular heterogeneity to wild-type (Fig. 3A-B). De novo methylation of Dynamic GbM loci must therefore occur by an unknown mechanism. We therefore hypothesized that the maintenance methyltransferase MET1 may exhibit de novo activity at these loci. De novo methylation by MET1/DNMT1 is not unprecedented – it has been observed in mammalian cells [36], and de novo CG methylation has been observed at an introduced transgene within Arabidopsis [37]. Recently, de novo CG methylation was also reported in inbred ddcc lines [38]. Heterozygous met1 mutants lose the vast majority of gene body methylation [33], including at most Dynamic GbM genes (Fig. 3C). We therefore chose to exploit this system to perform a genetic experiment to test whether reintroduction of homozygous WT MET1 alleles is sufficient to incur de novo methylation at Dynamic GbM loci. Heterozygous met1 mutants were self-fertilized, and two subsequent generations of WT-segregant (+/+) progeny were selected for whole genome methylation sequencing by enzymatic methyl-seq (EM-seq). As has been previously reported, the memory of DNA methylation patterns maintained by MET1 is initially lost in WT progeny of heterozygous mutants [33]. Similarly, we did not see complete reintroduction of Dynamic GbM in WT segregant plants (Fig. 3C). While we did observe a modest gain in methylation at a subset of CGs within Dynamic GbM genes in two out of three wild-type segregant lines, this enrichment was not statistically significant (Fig. 3C). Furthermore, inbreeding WT segregant lines for an additional generation did not result in further restoration of methylation at Dynamic GbM loci (Fig. 3C). We then further assessed the number of putative de novo methylation events per CG across all cells (e.g. the presence of one or more methylated reads per CGs) at Dynamic GbM and Stable GbM gene bodies. As expected, CG methylation events were very rare at both types of GbM genes in homozygous met1 mutants (Figure S2). In met1 heterozygotes and WT-segregants, we observed that de novo methylation was approximately 25% more common at Dynamic GbM genes that Stable GbM genes (Figure S2). We also observed rare but heritable CG methylation events at individual GbM genes, which varied from individual CGs to near-WT restoration of methylation patterns (Figure S3). Together, these results suggest that de novo methylation by MET1 at an unmethylated locus is rare. If MET1 does maintain the Dynamic GbM state through de novo methylation activity, then it likely occurs when MET1 is recruited to the replication fork by the presence of hemimethylated CGs, or adjacently methylated CGs. Our data do not conclusively prove that MET1 maintains the Dynamic GbM state and thus leaves open the possibility that unknown mechanisms could operate at these loci.
A) Representative genome browser snapshot of a Dynamic GbM locus in met1, ddcc and drdd mutants. Scale bar = 200 bp. B) Boxplots showing the cellular heterogeneity of CGs within Dynamic GbM genes in WT, drdd mutants and multiple de novo methylation mutants. C) Boxplots showing the cellular heterogeneity of CGs within Dynamic GbM genes in homozygous (-/-) and heterozygous (+/-) met1 mutants, as well as multiple WT-segregant progeny from a selfed met1 heterozygote. In B and C, horizontal lines = median, box = interquartile range and whiskers = 95th percentile. Horizontal dashed lines represent the cutoffs used to identify heterogeneously methylated CGs (10-85% methylation across all cells).
Dynamic GbM loci are conserved across distant lineages
As Dynamic GbM loci are clearly maintained over many generations of inheritance and between divergent ecotypes of Arabidopsis thaliana (Fig. S1), we sought to determine whether the Dynamic methylation state was conserved over evolutionary time. Evolutionary conservation of the Dynamic GbM state in distant lineages would suggest that the targeting of methylation/demethylation pathways to these loci in all cells serves an adaptive benefit. We isolated the methylation profiles of the most likely homologous genes from five additional genomes representing lineages that diverged from Arabidopsis thaliana between 6 and 117 million years ago [39,40]. Homologs were identified by blast homology and further filtered using two criteria: 1) only homologs that were classified as one-to-one orthologs by Inparanoid orthology analysis [41] were included, to avoid misidentification of homologs in gene families with duplications. 2) homologs possessing >1% non-CG methylation across the gene body were excluded, in order to remove genes targeted by de novo methylation pathways, which could impact CG methylation heterogeneity. Even in species that diverged >100 million years ago, intermediate methylation levels at individual CGs (representing cellular heterogeneity in methylation) were clearly observable at orthologous genes to the Dynamic GbM loci observed in Arabidopsis (Fig. 4A). To verify that the frequency of intermediately methylated CGs at Dynamic GbM homologs (likely orthologs) were enriched relative to the number expected by chance, we compared homologs to the set of 581 Dynamic GbM genes against homologs to ten randomly selected sets of 581 genes. Both the number and proportion of heterogeneously methylated CGs within Dynamic GbM homolog gene bodies were substantially higher than the number and proportion observed across the gene bodies of randomly selected genes (Fig. 4B). For all species analyzed, the number of heterogeneously methylated CGs was >2-fold higher than in randomly selected genes. These data strongly suggest that the loci targeted by DRRD and MET1 are similar across distant evolutionary lineages, and that the Dynamic GbM epigenetic state may offer an adaptive benefit to certain loci.
A) Representative Dynamic GbM gene homologs from six different species. Each red bar represents a single CG, and the bar height is the percent methylation heterogeneity across cells. Dashed lines represent the cutoffs used to call heterogeneously methylated cytosines (10-85% of cells). Rightmost panel: the estimated divergence time between species and Arabidopsis thaliana, obtained from TimeTree [64]. Points represent the median divergence time estimate, and vertical dashed lines represent the 95% confidence interval. B) Violin plots showing the number (left panel) and proportion (right panel) of heterogeneously methylated CGs across all homologs of Dynamic GbM genes, and ten identically-sized groups of randomly selected genes. Solid lines represent the median and dashed lines represent the interquartile range. *p = <0.05, ***p = <0.0005.
Dynamic GbM is associated with a regulatory chromatin state
As the Dynamic GbM state is conserved over evolutionary time, we sought to understand the biological purpose of this pattern of epigenetic modification. Similarly to previous studies on Stable GbM [9,27,28], we did not identify any strong associations between Dynamic GbM and broad patterns of gene expression. Although a small number of Dynamic GbM genes exhibited altered transcript abundance in drdd and met1+/- mutants (28 genes in drdd & 37 non-overlapping genes in met1+/-), overall transcriptional output (in standard growth conditions) was largely unaltered in the absence of Dynamic GbM (Figure S4). We also found that Dynamic GbM is associated with genes that exhibit a range of expression levels, whereas Stable GbM is predominantly associated with moderate-highly expressed loci (Figure S4). Additionally, while GO-term analysis identified a few statistically significant terms across GbM genes (Figure S5), there was no overall functional similarity that could account for more than 20% of total Dynamic GbM genes.
To better understand the possible function of Dynamic GbM beyond impacting gene expression levels, we examined the chromatin state of individual exons exhibiting Dynamic GbM, compared with exons exhibiting a Stable GbM state and genomic total exons as a control. Coding regions are typically associated with the histone marks H3K4me1 and H3K36me3, which consequently correlate with high expression levels, elevated DNA repair, and low mutation rates [42,43]. Both of these histone marks were enriched in total exons relative to the genomic average, and enriched further still in Stable GbM exons (Fig. 5A), consistent with their proposed function as stably expressed, functionally important (e.g. housekeeping) constitutive genes [25,27]. In contrast, Dynamic GbM exons exhibited reduced presence of these typical coding region histone marks, instead showing a significant enrichment in H3K4me2, as well as increased levels of H3K4me3 and histone acetylation. Exons are typically depleted for the histone mark H3K4me2 which is associated with a regulatory promoter chromatin state [44]. Consequently, we observed a much greater degree of overlap between Dynamic GbM genes and intergenic chromatin states (Fig. 5B-C) – including those normally associated with distal promoter sequences – compared to Stable GbM genes. The majority of genes in the genome, including Stable GbM genes, display a single peak of H3K4me2 at the promoter and transcriptional start site, which is replaced by H3K4me1 within the coding region. In Dynamic GbM genes, we observed elongated H3K4me2 domains, which encompassed the entire gene body, or occasionally additional distinct H3K4me2 peaks that coincided with the cytosines displaying dynamic methylation turnover (Fig. 5F). While Dynamic GbM genes are typically shorter than Stable GbM genes (Fig. 5D), the enrichment of H3K4me2 is independent of gene length (Fig. 5E). Together, these data show that Dynamic GbM is associated with a regulatory, or “promoter” chromatin state, suggesting a novel link between gene body methylation and transcriptional regulation.
A) Boxplots showing the normalized read depth of ChIP-seq data of fourteen histone modifications (data from [42]) within exons overlapping Dynamic GbM domains (blue), Stable GbM domains (green) or total exons (grey). Horizontal lines = median, box = interquartile range, whiskers = 5th and 95th percentiles. The red box highlights model gene body (H3K4me1) and promoter (H3K4me2) histone marks. B) The percentage overlap between Stable and Dynamic GbM gene bodies and four chromatin states identified by [44]. C) The total base pair overlap between Stable and Dynamic GbM gene bodies and a chromatin state typically enriched in distal promoters. D) Boxplot showing the distribution of gene body lengths of both Stable and Dynamic GbM genes. E) Boxplot showing the normalized read depth of H3K4me2 ChIP-seq data at Stable and Dynamic GbM genes separated into bins based on gene body length. F) Genome browser snapshot of raw H3K4me1 and H3K4me2 ChIP-seq data at Stable and Dynamic GbM genes. Tracks represent WT data from four independent studies. Data obtained from [65].
Due to the presence of regulatory histone marks within the gene bodies of Dynamic GbM genes, we sought to understand the relationship between the Dynamic GbM state and transcription. Dynamic GbM genes do not show a substantial change in transcriptional output between WT and mutants with defective MET1 maintenance of GbM or DRDD demethylation of GbM (Figure S3). We therefore hypothesized that the Dynamic GbM state does not directly modulate transcriptional output (like RNA-directed DNA methylation or other silencing mechanisms) but rather facilities a more flexible or open-ended regulatory structure, due to the presence of regulatory histone marks. To test this hypothesis, we leveraged large transcriptome datasets that compare transcript abundance across a wide range of tissues and cell types [45], biotic and abiotic stresses [46] and inter-individual variability under consistent conditions [47]. Using these datasets, we defined the coefficient of variation for both Stable and Dynamic GbM genes across conditions, to examine whether GbM patterns relate to the range, or plasticity of possible expression states a gene can exhibit. Stable GbM genes exhibited low variability across development and variable conditions, consistent with the observed correlative association between stable gene body methylation and consistently expressed housekeeping genes [25,27](Fig. 6A). In contrast, Dynamic GbM genes exhibit high variance in expression levels across developmental contexts, and physiological conditions, suggesting their altered methylation and/or chromatin state confers increased expression plasticity. This increased variance was not observed across multiple individual WT replicates grown in consistent conditions (Fig. 6A), suggesting that high variance is not caused by high stochasticity in expression levels inherent to these loci. We therefore propose that the abundance of regulatory histone modifications within Dynamic GbM genes enables a greater range of possible expression states, in stark contrast to stable methylation, which promotes canalization towards a single, robust and consistent expression state (Fig. 6B). One prediction of this model is that disruption of methylation dynamics at dynamic GbM loci should impact gene expression plasticity. drdd mutants possess consistently high methylation levels at Dynamic GbM genes in the majority of all cells, similar to Stable GbM genes in WT (Fig. 1E). We therefore sought to test if drdd mutants exhibit reduced variance in Dynamic GbM expression levels compared to WT. Consistent with our model, drdd mutants exhibited a 30% lower standard deviation in Dynamic GbM gene expression compared to WT (Figure S6), whereas the standard deviation in Stable GbM gene expression was unchanged. This finding further supports our model – that the turnover of DNA methylation patterns at Dynamic GbM loci is associated with enhanced gene expression plasticity.
A) Boxplots showing the coefficient of variation in transcript abundance for all genes, Stable GbM, and Dynamic GbM genes across development, diverse physiological stresses and WT replicates in standard growth conditions. Horizontal lines = median, box = interquartile range, whiskers = 5th and 95th percentile. ***p=<0.0005, *p=<0.05. B) Schematic showing the proposed impact of Stable and Dynamic GbM on gene expression plasticity, represented as Waddington’s landscape. Stable GbM genes exhibit low variance, suggesting canalization into a single, robust gene expression state. Dynamic GbM genes exhibit high variance, suggesting the capacity to access multiple possible gene expression states.
DISCUSSION
DNA methylation within the bodies of genes has been a well-known feature of eukaryotic genomes for several decades [26,27,48]. Despite this, the precise function of gene body methylation has remained unclear. A positive correlation between gene body methylation and robustly expressed genes has been well established in both animals and plants [27,48], with genes that occupy the extremes of the distribution of expression states typically being unmethylated in their gene body [27]. In this study, we have presented further evidence that gene body methylation functions by modulating the possible positions within “expression space” that genes can occupy. Stable gene body methylated genes (in which CGs are methylated in every cell) exhibit a dramatically lower gene expression plasticity across development and diverse physiological conditions than the genomic average (Fig. 6). A recent high-throughput genetic knockout study also supports this model. Genetic knockout of the CG methylation maintenance methyltransferase MET1 in 18 different wild-type Arabidopsis accessions resulted in a dramatic increase in overall expression variability between genotypes, suggesting that CG methylation functions in canalizing transcriptional activity, or reducing variability [49]. Interestingly, studies of the responses of both animals and plants have also identified GbM as a correlate of gene expression plasticity. A study of two seagrass species showed that the genes that display low expression plasticity in response to environmental stress share properties of Stable GbM genes and are predicted to be methylated [50]. A similar association has been observed in corals transplanted between high and low fitness environments, where GbM is proposed to manage the balance between low/un-methylated environmentally-responsive genes and transcriptionally stable GbM genes [51].
Our study identifies a new sub-type of GbM locus within plant genomes, an exception that helps demonstrate the rule associating GbM with reduced gene expression plasticity. This sub-type of GbM, which we term Dynamic GbM, is distinguished by the targeting of methylated CGs by the DRDD DNA demethylation pathway in all cell and tissue types. Dynamic GbM genes display greatly elevated gene expression plasticity in comparison to their Stable GbM counterparts, strongly suggesting that enzymatic modification of the Stable GbM state disrupts its primary function in restricting expression plasticity. One consequence of this DNA demethylation activity is the generation of cellular heterogeneity in DNA methylation patterns and the production of unmethylated CGs within stem and gamete cells. In order to be stable over inheritance and evolution, Dynamic GbM loci must therefore also be re-methylated by one or more methyltransferase enzymes, most likely the maintenance methyltransferase MET1. The dynamic activity of demethylase and methyltransferase enzymes at these loci therefore functions as a constant source of methylation turnover, in stark contrast to the cellular consistency of Stable GbM genes. We therefore propose that dynamic methylation turnover in all cell types is a mechanism that could enable greater expression plasticity over the development of the entire organism. As the targeting and expression of methylation modifier pathways is not cell or tissue-type specific in plants, we reason that they target Dynamic GbM genes universally, so that increased gene expression plasticity is available when adaptively beneficial. Consequently, a Dynamic GbM gene may be targeted by methylation writers and erasers even in cell types or conditions in which the gene is not expressed.
While our study offers strong evidence for an association between GbM and gene expression plasticity, this proposed function is largely based on a correlative association, similar to prior hypotheses of GbM function. To move towards a mechanistic understanding of the roles of Dynamic and Stable GbM, we performed a detailed examination of the chromatin states of each. Our results show a striking association between Dynamic GbM and chromatin marks typically associated with regulatory regions, such as H3K4me2. This is consistent with the increased expression plasticity of Dynamic GbM genes, as genic H3K4me2 is likely to increase the extent to which regulators of transcription can target and modify expression of these loci [52,53]. H3K4me2 has also been shown to be associated with genes with high expression variability in Arabidopsis, as well as increased tissue-specificity across development [54], consistent with our proposed model. Intriguingly, gene-body H3K4me2 has also been reported in human cells, and – consistent with our model – is enriched within genes that show high cell-type specificity across development [55], aka elevated plasticity in possible expression states. This raises the possibility that properties of the Dynamic GbM epigenetic state could be conserved across eukaryote kingdoms. In contrast to Dynamic GbM genes, Stable GbM genes show the strongest genic enrichment of histone marks associated with polymerase II elongation and the “housekeeping gene” chromatin state, namely H3K4me1 and H3K36me3 [44]. Defining the precise molecular relationship between gene body methylation, dynamic methylation turnover, expression plasticity and a regulatory histone modification state is a key future extension of this research. One finding within our study points towards a causative role for DNA methylation dynamics in impacting gene expression variability. drdd mutants, which cannot generate methylation dynamics at GbM genes, exhibit a reduction in the variability of Dynamic GbM gene expression states (Figure S6). As gene expression plasticity is an important determinant of how organisms adapt to stress, variable environments, and climate change [51,56], discerning the precise ways in which epigenetic states impact plasticity is an important future question.
CONCLUSIONS
In this study, we define a new feature of the epigenetic landscape in plants. A subset of gene bodies within the genome are targeted by DNA methylation writers and erasers in all cells, which create a dynamic DNA methylation turnover. This dynamic DNA methylation state is associated with a “promoter-like” chromatin state, as well as enhanced gene expression plasticity. Our study also demonstrates a clear link between typical gene body methylation (which is stable across all cells) and a canalized expression state with low plasticity. Overall, our study offers a substantial advance towards understanding the functional role of enigmatic gene body methylation in eukaryotes.
MATERIALS AND METHODS
Plant Growth Conditions
All plants were grown on a 1:1 mix of potting compost and vermiculite in a Percival AR100L3 with 16-hour days at 21-22°C and 60% humidity.
met1 mutation segregation
Seeds from a single self-fertilized heterozygous met1-3 [57] plant (obtained from the Arabidopsis Biological Resource Center – stock number CS16394) were grown for 4 weeks in standard growth conditions as outlined above. A single rosette leaf was collected from individual plants and DNA was extracted as described [58]. The genotype of the MET1 locus was then assayed using the following primers: METNF2, TAGCCAACAAGTTATCGCTTACTC; METNR2, TTCGCAAACCATTCTTCACAGAGC; TL-4, TAATTGCGTCGAATCTCAGCATCG. Plants that were identified as heterozygous for met1-3 and homozygous for the WT MET1 allele were self-fertilized and collected to repeat as described for an additional generation.
DNA extraction and Enzymatic Methyl Sequencing (EM-Seq)
Leaf tissue from 28-day old plants was collected and flash frozen in liquid nitrogen. Two leaves per sample were ground using a Qiagen TissueLyser II and immediately after 200 μl of CTAB was added and heated to 65°C for approximately 20 minutes. This was followed by the addition of 200 μl chloroform, 10 minute centrifugation and ethanol washes to isolate total genomic DNA. The DNA concentration was quantified using a Qubit Flex fluorometer, and a diluted spike-in was added according to NEB recommendations (1 μl diluted pUC19 (0.001 ng) control DNA and 1 μl diluted unmethylated lambda DNA (0.02 ng) in 0.1X TE, pH 8.0 per 10–200 ng plant sample DNA). Genomic DNA samples with added spike-ins were sonicated to an average fragment size of 300bp using a Covaris S220 Focused Ultrasonicator (80s treatment, 140 peak power, 10 duty factor 200 cycles/burst). Subsequently, samples were incubated with an RNAse cocktail and RNAse was removed using a Qiagen spin column. EM-seq libraries were constructed following the NEBNext® Enzymatic Methyl-seq Kit protocol. Library quality was verified with a KAPA Library Quant Kit and fragment analyzer. The samples were pooled to a concentration of 4 ng/μl and 19.2 nM and sequenced using an Illumina NextSeq 2000 (paired-end 150 x 150 bp) 11 million average reads/sample. Further information on each EM-seq library sequenced in this study is available in Supplemental Table 1.
Whole genome methylation analysis
Mapping of whole genome bisulfite sequencing and EM-seq libraries was performed as described [10]. In brief, reads were processed with TrimGalore (Babraham Bioinformatics), trimming 8bp from the 5’ end of reads and enforcing a 3’ end quality score of >25%. Reads were mapped to the Araport11 genome using Bismark 0.20.1 [59], allowing for 1 mismatch per read in the seed region and removing PCR duplicates. Methylation values for each cytosine were calculated using the Bismark methylation extractor function. The efficiency of EM-seq conversion was verified by quantifying the percentage of methylation for reads mapped to the chloroplast. EM-seq conversion rates were >99.7% across all samples.
To classify Stable and Dynamic GbM genes, data from WT and drdd mutant bisulfite sequencing and EM-seq libraries [10] was used. Analysis was initially performed on a single high-depth EM-seq replicate to avoid inter-individual variation in methylation profiles. Subsequently, the classification of Dynamic GbM genes was also verified by pooling four bisulfite sequencing replicate libraries, showing that Dynamic GbM identification is robust against variation between replicates of the same genotype. CGs exhibiting a methylation percentage <= 10% were classified as unmethylated, CGs >=85% were classified as fully methylated and CGs methylated between 10-85% were classified as heterogeneously methylated, as each deduplicated read derives from an independent cell in the original sample. Dynamic GbM genes were identified based on two cutoffs: 1) the presence of a WT vs drdd differentially methylated region (DMR) within the gene body (DMRs were called based on aggregation of differentially methylated CGs within overlapping 200bp windows, as described in our previous work [10,24]). 2) Dynamic GbM genes required at least 5 CG dinucleotides (10 CG cytosines total) that gained >20% methylation in drdd mutants. This second step removed genes possessing only a small number of heterogeneously methylated sites in an otherwise unmethylated or fully methylated gene body. Stable GbM genes were identified as described [9]. These two cutoffs resulted in a list of 581 total Dynamic GbM genes.
The evolutionary conservation of Dynamic GbM was performed by identifying most likely homologs to the 581 Arabidopsis genes using BLAST homology from 5 additional species: Arabidopsis lyrata, Capsella rubella, Theobroma cacao, Citrus clementina and Vitis vinifera. Homologs were further filtered using two criteria: 1) only homologs that were classified as one-to-one orthologs by Inparanoid orthology analysis were included, 2) homologs possessing >1% non-CG methylation across the gene body were excluded, in order to remove genes targeted by de novo methylation pathways. Additionally, most likely homologs were also identified for ten sets of 581 randomly selected genes, which were then averaged as a control group. The whole genome methylation profiles of each species were downloaded and remapped from other studies [39,40]. For each Dynamic GbM / random homolog, the number of heterogeneously methylated CGs was isolated using bedtools intersect. Both the number and density of heterogeneously methylated CGs was compared relative to average across 10 sets of randomly selected genes. Statistical significance was calculated using an unpaired two-tailed t-test.
RNA-seq analysis
Prior to mapping, adapters were trimmed using Trim Galore (Babraham Bioinformatics), trimming 9 bp from the 5’-end of reads, and enforcing a 3’-end quality of >25%. Reads were mapped to the Araport11 genome using STAR, permitting 0.05 mismatches as a fraction of total read length and discarding reads that did not map uniquely [60]. Differentially expressed genes were identified by running htseq-count and DESeq2 [61], ensuring a minimum of two-fold change in expression and a Benjamini– Hochberg corrected P-value <0.05. GO term analysis was performed using agriGO v2.0 with default parameters [62].
Chromatin state analysis
Chromatin state classification for the entire Arabidopsis genome was obtained from Sequeira-Mendes et al [44]. The percentage and base-pair overlap between Stable and Dynamic GbM genes and each chromatin state was calculated using Bedtools intersect. In Figure 5, “Gene body” refers to chromatin states 3 and 7 combined, “Promoter (TSS)” refers to state 1, “Promoter (distal)” refers to state 2, and “Other intergenic” refers to states 4, 5, 8 and 9 combined.
Expression variance analysis
Gene expression counts for Stable GbM genes, Dynamic GbM genes and Total genes were obtained from the Bio-Analytic Resource for Plant Biology (BAR) Expression Browser platform [63]. The coefficient of variation for each gene was then calculated using all available data across tissues/cell types (305 data points) or physiological stress conditions (165 data points). Genes that were not represented in at least 75% of all tissues/conditions were excluded. 133 genes that were identified in both the Stable and Dynamic GbM gene groups were also excluded to avoid overlap between datasets. To calculate the variability in gene expression across identical conditions, RNA-seq from 14 wild-type replicates was obtained from a study by Cortijo et al [47], and the coefficient of variation of TPM values for each gene was calculated.
DECLARATIONS
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Availability of data and materials
All high throughput sequencing data generated in this study will be made available (upon acceptance) at the NCBI Gene Expression Omnibus.
Competing interests
The authors declare that they have no competing interests
Funding
CW is supported by an NSF Postdoctoral Fellowship in Biology (award #2209401).
Authors’ contributions
CJW and KAT performed experiments. CJW, DD, GJM and BPW performed data analysis. BPW conceived of the study and wrote the manuscript.
FIGURE LEGENDS
The rate of de novo CG methylation events per CG across all cells. Bars represent the mean, error bars represent standard deviation. Individual data points represent independent genetic lines that were sequenced. Chloroplast CG methylation is included a control for the false positive rate from incomplete conversion or sequencing errors.
Genome browser snapshots of four Dynamic GbM genes in WT, drdd, met1 heterozygotes (+/-) and three independent lines of WT-segregant progeny (+/+).
A) Boxplots showing distribution of Log2 fold-change in transcript abundance for Stable and Dynamic GbM genes in WT v met1 heterozygous mutants (which lose GbM) and WT v drdd mutants (which exhibit elevated GbM at Dynamic GbM genes). Horizontal line = median, box = interquartile range, whiskers = 1st and 99th percentile. B) Distribution of Stable GbM genes across WT transcript abundance deciles. C) Distribution of Dynamic GbM genes across WT transcript abundance deciles. In B and C, 1 = highest expression decile, 10 = lowest expression decile.
Symbol color is a heat map scale for statistical significance and symbol size is proportional to the number of genes for the GO term. The x axis indicates the enrichment ratio for DEGs relative to all genes with the GO term.
Barplots show the number of genes in the extreme ends of the expression distribution. Extreme high = top expression decile, extreme low = bottom expression decile + silent genes. Asterisks represent the total standard deviation in TPM expression values for each group.
Acknowledgements
We are grateful to Annika Quist for her assistance with mapping RNA-seq data. We are thankful to Bob Schmitz, Steve Jacobsen and PRV Satyaki for helpful feedback and discussion.