Unmethylated Regions Encompass The Functional Space Within The Maize Genome

Delineating the functional space within genomes has been a long-standing goal shared among geneticists, molecular biologists, and genome scientists. The genome of Zea mays (maize) has served as a model for locating functional elements within the gene-distal intergenic space. A recent development has been the discovery and use of accessible chromatin as a proxy for functional regulatory elements. However, the idea has recently arisen that DNA methylation data could supplement the use of accessible chromatin data for homing in on regulatory regions. Here, I test the robustness of using DNA methylation as a proxy for functional space. I find that CHG methylation can be non-arbitrarily partitioned into hypo-methylated and hyper-methylated regions. Hypo-methylated CHG regions are stable across development and contain nearly all accessible chromatin. Note: changes that will be made in version 2: expand introduction; expand discussion; add additional analyses; expand methods; link to github scripts.


Introduction
Accessible chromatin regions (ACRs) are regions either depleted of nucleosomes or containing unstable nucleosomes, thereby allowing access to non-nucleosome DNAbinding proteins. The first survey of accessible chromatin in maize was performed with differential MNase treatment 1 and revealed elevated accessibility at KNOTTED1 binding sites, conserved non-coding sequences, and the transcription start sites of expressed genes. Later studies performed with differential MNase 2 , DNase-seq 3,4 , and ATAC-seq 5,6 collectively revealed that accessible chromatin was the probable location of cisregulatory elements (CREs). Accumulating genetic evidence suggested that large plant genomes-most notably that of maize-contained substantial quantities of CREs within the gene-distal space, dozens of kilobases away from the nearest genes [7][8][9][10][11] . Researchers have attempted to home in on these remote putative CREs by narrowing the focus to ACRs included congruent estimates of 80% of ACRs shared between V2 inner stem and husk 3 and ~80% of leaf ACRs shared with ear primordia 5 . Slightly less congruent estimates include 23% of leaf ACR bp shared with root 2 , and ~37% of leaf ACRs shared with root 4 . Overall, the maize ACR literature suggests that accessible chromatin is dynamic among tissues, or at a minimum, ACR estimates are sensitive to differences in experimental and computational methods. These factors pose a challenge for exhaustively surveying the regulatory space of maize. ACRs present only in rare cell types or induced only by specific environmental cues could be invisible to most experimental designs. Furthermore, weakly accessible regions or very small ACRs (for example, cryptic accessible chromatin within H3K27me3 domains 12 ) are expected to fall below the detection thresholds of most methods.
In a recently published comment, Crisp et al. 13 proposed using DNA methylation, specifically in the CHG sequence context (hereafter called mCHG), to overcome the limitations associated with accessible chromatin data. Regions with depleted mCHG levels could act as a guide to home in on the functional space within large plant genomes.
The proposed benefits of using mCHG included the following: 1. Methylation cleanly partitions into hypo-methylated and hyper-methylated regions. 2. Hypo-methylated regions remain stable, despite developmental changes and environmental stressors. 3. Hypo-methylated regions encompass all ACRs, regardless of the ACRs' developmental and stress-responsive dynamics. 4. Collectively, hypo-methylated regions provide clarity to demarcating the regulatory space, including distal CREs and the upstream edges of genes' promoters. Such demarcations have previously remained ambiguous. 5. Furthermore, hypo-methylated regions can distinguish genuine genes from pseudogenes.
The maize methylome literature lends credence to these claims; however, as of now there has been only one study 22 that has rigorously tested them in the context of a large plant genome. The potential utility of using DNA methylation to demarcate the functional space behooves us to test the robustness of these claims. Therefore, this manuscript aims to test the following hypotheses: 1. The maize methylome is bimodal and can be separated into clear hypomethylated and hyper-methylated regions. 2. Hyper-methylated regions are stable across development. 3. The totality of accessible chromatin is encompassed by hypo-methylated regions.

The Maize Methylome is Bimodal
In this section of the manuscript, I aim to achieve the following goals: (1) Test if bimodal methylation distributions persist across diverse tissue types.
(3) Establish a threshold for identifying discrete hypo-methylated regions in a non-arbitrary manner. highly expressed genes (sup. data file 1) were used for the hypo-methylated control. Such exons were expected to constitutively exclude mCHG. sup. data files 3 and 4). In the three endosperm methylomes (9 DAP, 12 DAP, and mature), the DMRs ranged from approximately 20-60% mCHG and mCG, with modes centered around 40%. If chromosome-specific DMRs were prevalent, modes between 20-50% should have been visible in global methylation distributions, which they were not, even in endosperm (although endosperm methylomes did show slight increases in midmethylation bins ( fig. 3.1I)). This indicated that among all methylomes surveyed, methylation status was largely consistent among loci of homologous chromosomes.
Although the mid-methylated windows are rare, it is necessary to explain the small proportion that do exist. These windows could be transitions from hypo-to hypermethylated regions, and thus should reside within close proximity to hypo-methylated windows. Alternatively, the windows could be standalone and independent of hypomethylated regions or allele-specific hypo-methylated regions. To address this question, I plotted the mid-methylated windows' distances to the nearest hypo-methylated windows. and D, which display the proportions of mid-methylated windows directly adjacent to or within 1 kb of hypo-methylated windows. As methylation approached 20%, midmethylated windows were increasingly in close proximity to hypo-methylated windows.
Meiocyte displayed anomalous low proportions that could be attributed to the high sparseness of its methylome. Approximately 77% of mCHG and 93% of mCG midmethylated windows could be explained by proximity to hypo-methylated windows. The unexplained mid-methylated windows may be standalone; however, it was unknown how many of these windows could be attributed to stochastic methylation variation.
Collectively, the evidence suggested that mid-methylated regions did not constitute independent elements. The leaf (6 days) (BS-seq) methylome is shown as a representative sample. Plotted are 2D histograms of % methylation versus distance to nearest hypo-methylated window in the CHG context (A) and CG context (B).

C & D:
The proportion of 100 bp windows falling into discrete % methylation bins that satisfy the criteria of being either adjacent to or within one kb of the nearest hypo-methylated window. All methylomes shown. Each point corresponds to a single methylome (denoted on right). Boxplots indicate quartiles.
In summary, there is little compelling evidence to refute the bimodal methylome.
20% mCHG and mCG acts as a non-arbitrary threshold for identifying hypo-methylated regions. Supplementary figure 3.2 shows parallel analyses performed in the CHH sequence context. mCHH remained consistently low across all elements and was unsuitable for identifying hypo-methylated regions. Therefore, mCG and mCHG were used to identify regions that were either lowly methylated (< 20%) or lacking methylation (hereafter known as unmethylated regions-UMRs).

Unmethylated Regions (UMRs) are Stable Across Development
This section assesses the stability of UMRs across diverse developmental stages.  were moderately more stable than UMRs from single biological samples (sup. fig. 3.11).
This was expected, considering that merged replicates should conceal nonreproducible UMRs. However, the length-stability curves persisted, suggesting that experimental variation alone could not fully account for the reduced stability of small UMRs. B73 methylomes were ranked by the baseline-corrected stabilities of their distal UMRs (sup. fig. 3.12). Root (6 days) ranked as the least stable; although, with increasing UMR length, it converged with the other methylomes (sup. fig. 3.10C). It is noteworthy that the methylomes ranked as most stable (e.g. ear (5 cm)) also had relatively low library coverages, suggesting that low coverages could artificially inflate UMR stabilities.
An analysis of non-conserved UMR methylation revealed that non-conserved UMRs generally did not become hyper-methylated (sup. fig. 3 In order to identify developmentally dynamic ACRs, all tissues were compared to each other in pairwise to identify overlapping ACRs. ACRs were grouped into categories from 0 to 7, based on the number of tissues containing overlapping ACRs, with 0 denoting a tissue-specific ACR and 7 denoting an ACR that overlaps other ACRs in every tissue. These ACR categories were tested for overlap with UMRs ( fig. 3.4). ACRs were compared against the consensus UMR set, as well as leaf (6 days) and root (6 days) UMR sets filtered by lengths of 150 bp and larger. With an overlap threshold of 1 bp, the  The nature of the ACR-UMR overlaps were analyzed in greater depth in fig. 3.4B, which breaks down the fractions of ACRs overlapping UMRs (for distal ACRs only).
More than 95% of category 1-7 ACRs were contained within UMRs by at least half of their ACR lengths. However, the percent overlap decreased as the fraction approached complete overlap, suggesting that as much as 1/4 of distal category 1-7 ACRs overlapped UMRs but simultaneously overlapped space outside of UMRs. The remainder of ACRs were fully contained within UMRs. To account for artifacts associated with peak-calling edges, ACRs were expanded outward by 2 kb in order to detect UMRs within close proximity. This expansion increased the category 0 ACR overlap rate from 92% to 95%, leaving a remaining 5% of ACRs that were isolated within the intergenic space.
Analogous analyses for genic and proximal ACRs are shown in sup. fig. 3 increased progressively as UMRs became more constitutively accessible, but decreased at the most constitutively accessible UMRs. The input SNPs showed no overlap bias among UMR groups, indicating that the differences in QTL enrichment among groups could not be attributed to mapping artifacts.
It is possible that the top three quartiles of the non-ACR UMRs actually did contain ACRs, despite being missed by the peak caller. This could explain the elevated QTL hits for these quartiles compared to Q1. To address this, I relaxed the Genrich peak- Therefore, these data collectively suggest the existence of a population of UMRs lacking ACRs and lacking genetic evidence of functionality.

Discussion
The demarcation of the genome's functional space is a long-standing goal among plant biologists and plant breeders. ACRs have served as a useful proxy for homing in on the regulatory space in large plant genomes 3,5 . However, the dynamic nature of ACRs and the virtually unstudied effect of stressors on ACR induction in maize means that the regulatory space may extend beyond what has been described in ACR studies. Indeed, the ~11 megabases of ACRs previously reported 2,3 is far exceeded by the ~40 megabases of UMRs in the intergenic space alone and the ~170 megabases of UMRs in total.
Furthermore, UMRs can be identified with greater confidence than can ACRs. Because UMRs fall into two virtually qualitative categories-hypo-methylated or hypermethylated-they can be identified without subjective biases and without the need for an input mapping control. The rise of accurate long-read sequencing technology 20 , coupled with enzymatic methyl-seq 21 , will allow UMRs within the highly repetitive portion of the genome to be identified. Currently, this portion of the genome is off-limits to assays that rely on small read technology, including bisulfite-seq, ATAC-seq, DNase-seq, and MNase-seq. Therefore, the potential of UMRs may be even greater with the adoption of these new technologies.
The counts and total megabases of called UMRs varied substantially among tissues (sup. fig. 3.7). These differences could largely be attributed to variation in library sparseness. It is notable that the variation in UMR counts and median lengths decreased as methylomes increased in coverage, indicating that the higher coverage libraries The practical utility of DNA methylation behooves us to rigorously compare it to accessible chromatin data. Here, I report that nearly all ACRs are contained by UMRs.
However, up to 1/4 of ACRs may spill over the edge of the UMR boundary, akin to the asymmetric placement of ACRs at the edge of transcription units. It still must be determined if the cis-regulatory elements within the ACRs spill over the UMR boundaries as well. Parsing the ACRs into the 0-7 tissue specificity metric revealed that constitutive and developmentally dynamic ACRs are similarly contained within UMRs.
This indicates that the UMRs from a single tissue are sufficient to encompass the ACRs specific to a different tissue. This is further supported by the nearly identical overlap rates of the leaf, root, and consensus UMR sets. However, the exception to this is the most extreme tissue-specific ACRs, present in only 1 of the 8 tissues. As much as 8% of these  ATAC-seq reads (ear (5 mm), leaf (6 day), and mesophyll) and DNase-seq reads (inner husk and inner stem) were pooled together, along with their respective input controls, to get aggregate read counts at each UMR. UMRs are separate into subcategories based on the number of tissues in which a UMR overlaps an ACR. The category 0 means that the UMR never overlaps ACRs. NC: distal UMR negative control regions. A: The coverages of the input reads. Input read counts per UMR are divided by the length of the UMR. B: The experimental read counts per UMR divided by the input read counts per UMR. Boxplots denote quartiles. Data points denote outliers.

Supplementary Figure 3.17: UMRs overlapping relaxed-stringency ACR peaks.
For the distal UMRs previously lacking any ACR overlaps, the UMRs were split into quartiles Q1-Q4. Genrich peak-calling stringency was relaxed and the UMRs re-analyzed for overlaps with the new ACR peaks. The percentages on the y-axis show what percent of each UMR group now overlaps the new ACR peaks.