Benchmarking DNA Methylation Assays for Marine Invertebrates

Interrogation of chromatin modifications, such as DNA methylation, has potential to improve forecasting and conservation of marine ecosystems. The standard method for assaying DNA methylation (Whole Genome Bisulfite Sequencing), however, is too costly to apply at the scales required for ecological research. Here we evaluate different methods for measuring DNA methylation for ecological epigenetics. We compare Whole Genome Bisulfite Sequencing (WGBS) with Methylated CpG Binding Domain Sequencing (MBD-seq), and a modified version of MethylRAD we term methylation-dependent Restriction site-Associated DNA sequencing (mdRAD). We evaluate these three assays in measuring variation in methylation across the genome, between genotypes, and between polyp types in the reef-building coral Acropora millepora. We find that all three assays measure absolute methylation levels similarly, with tight correlations for methylation of gene bodies (gbM), as well as exons and 1Kb windows. Correlations for differential gbM between genotypes were weaker, but still concurrent across assays. We detected little to no reproducible differences in gbM between polyp types. We conclude that MBD-seq and mdRAD are reliable cost-effective alternatives to WGBS. Moreover, the considerably lower sequencing effort required for mdRAD to produce comparable methylation estimates makes it particularly useful for ecological epigenetics.


Introduction
The alarming effects of climate change on marine environments have led to a growing interest in Ecological Epigenetics. This relatively new field, focused on the interrelationships between environment, epigenetic modification, gene expression, and phenotypic variation (Bossdorf et al. 2008), has potential to improve forecasting and conservation of marine ecosystems. For instance, epigenetic modifications are hypothesized to mediate phenotypic plasticity, a mechanism important for resilience to environmental change (Reusch 2013;Eirin-Lopez and Putnam 2019). In humans, individuals prenatally exposed to famine show persistent differences in DNA methylation at relevant genes alongside alterations in disease risk (Painter et al. 2005;Heijmans et al. 2008). There is evidence that effects may extend even to the grandchildren of those who experienced food shortage (Kaati et al. 2007). Evidence from other mammals adds further support for such intergenerational, and even transgenerational effects (Radford et al. 2014;Irmler et al. 2020). In one remarkable case, traumatic olfactory conditioning in male mice was reported to produce epigenetic effects in F1s, and behavioral sensitivity even in F2s (Dias and Ressler 2014).
Intergenerational effects and maternal effects have also been reported in plants (Feil and Fraga 2012), corals  and sea urchins (Wong et al. 2018;Strader et al. 2019;Wong et al. 2019). While such reports are exciting, it is important to maintain a reserved view on the overall importance of epigenetics for adaptation, especially as many published examples await independent replication (Horsthemke 2018) or have had attempts at replication fail to produce the same results (Irmler et al. 2020).
A notable feature found in plants and invertebrates is an association between gene body methylation (methylation of CpG sites within coding regions; gbM), and gene expression. In both groups, genes with gbM tend to be actively and stably expressed, whereas those without gbM tend toward less active, inducible expression (Zemach and Zilberman 2010;Sarda et al. 2012;Takuno and Gaut 2012;Gavery & Roberts, 2013;Takuno and Gaut 2013;Dixon et al. 2014;Dimond & Roberts, 2016;Takuno et al. 2016).
Although gbM does not systematically regulate gene expression in plants or animals (Bewick et al. 2016;Zilberman 2017;Bewick et al. 2018;Bewick et al. 2019;Harris et al. 2019;Choi et al. 2020), comparisons between populations may still be ecologically informative. Indeed, in the coral Acropora millepora, comparative methylomics predicted fitness characteristics of transplanted corals better than either SNPs or gene expression (Dixon et al. 2018). The potential to predict fitness in novel conditions is especially important for conservation efforts involving outplanting individuals to maintain and rescue wild populations (van Oppen et al. 2015;van Oppen et al. 2017).
Hence there is a need for cost-effective examination of chromatin modifications in ecological contexts. While chromatin marks such as histone modifications are undoubtedly important, DNA methylation is currently the easiest to measure, and the best-studied (Hofmann 2017).
Here, we use a model reef-building coral, Acropora millepora, to benchmark methods for assaying DNA methylation. Reef-building corals are prime candidates for the application of ecological epigenetics. They are exceptional both in their socioecological value, and sensitivity to anthropogenic change (Cesar 2000;Foden et al. 2013). As they are long-lived and sessile, they cannot migrate in response to suboptimal conditions, and must instead depend upon plasticity. Using this system, we compare three assays for measuring DNA methylation: Whole Genome Bisulfite Sequencing (WGBS), Methylated CpG Binding Domain Sequencing (MBD-seq) (Serre et al. 2009), and a modified version of the MethylRAD (Wang et al. 2015). WGBS, considered the gold standard for measuring DNA methylation, works by chemical conversion of unmethylated cytosines to uracils. Following PCR amplification, these bases are read as thymines. Hence, when mapped against a reference, fold coverage of reads indicating cytosine at a given site relative to fold coverage indicating thymines quantifies the rate at which the site was methylated in the original DNA isolation. MBDseq works by capturing methylated DNA fragments with methyl-CpG-binding domains affixed to magnetic beads. This methodology has been used previously for ecological studies in A. millepora (Dixon et al. 2016;Dixon et al. 2018) and benchmarked against bisulfite sequencing in cultured embryonic stem cells (Harris et al. 2010). MethylRAD selects for methylated DNA through the activity of methylation-dependent restriction enzymes. DNA is digested with these enzymes, producing sticky ends exclusively near methylated recognition sites that allow for adapter ligation and sequencing. Methylation is quantified based on resulting fold coverage within a given region. The original MethylRAD protocol involved size selection for short fragments that were cut on both sides of palindromic methylated recognition sequences (Wang et al. 2015). We have modified the protocol by size-selecting for all digestion-derived fragments in the 170-700 bp range. The method is now conceptually similar to the genotyping by sequencing (GBS) protocol described in Elshire et al. (2011) and Andrews et al. (2016).
To differentiate it from the original methylRAD, we refer to it as methylation-dependent Restriction site-Associated DNA sequencing (mdRAD).
With these three assays, we examine variation in methylation between genomic regions, between two polyp types (axial and radial), and between coral colonies (genotypes). We compare results from each assay to assess how consistently they measure methylation, and the optimal sequencing effort to maximize sensitivity while minimizing costs.

Sample collection
Two adult colonies of A. millepora were collected by SCUBA on November 25 th , 2018, one from Northeast Orpheus (labeled N12), and one from Little Pioneer Bay (labeled L5), under the Great Barrier Reef Marine Park Authority permit G18/41245.1. Colonies were maintained in the same raceway with flow of unfiltered seawater for 22 days.
Branches from each colony were submerged in 100% ethanol and immediately placed at -80°C for 48 hours. Samples were then maintained at -20 or on ice for approximately 48 hours during transport to the laboratory where they were again stored at -80°C until processing.

DNA Isolation
For each axial polyp sample, the very tips of four branches were cut off and pooled. For radial polyps, similar amounts of tissue were pooled from the sides of the same four branches. Tissue samples were lysed in Petri dishes with 2 ml of lysis buffer from an RNAqueous™ Total RNA Isolation Kit (cat no. AM1912). DNA was isolated using phenol:chloroform:isoamyl alcohol with additional purification using a Zymo DNA cleanup and concentrator kit (cat no. D4011)(Supplemental Methods file). Isolations were quantified using a Quant-iT™ PicoGreen™ dsDNA Assay Kit (cat no. P7589). The same DNA isolations were used for each downstream methylation assay. We isolated three replicates from each genotype-tissue pairing, for a total of 12 isolations (2 colonies, 2 tissues, 3 replicates per). In downstream analyses, we use treatment groups to refer to either coral colony (N12 vs L5), or polyp type (tip vs side).

Whole genome bisulfite sequencing library preparation
Whole genome bisulfite sequencing (WGBS) libraries were prepared using a Zymo Pico Methyl-Seq Library Prep Kit (cat no. D5455). Each library was prepared from 100 ng of genomic DNA. For half the samples, we included 0.05 ng (0.05%) of λ phage standard DNA to estimate conversion efficiency. The final sample size was 8 (2 genotypes, 2 tissues, 2 replicates per; Table 1). The 8 libraries were sequenced across four lanes on a Hiseq 2500 for single-end 50 bp reads at The University of Texas Austin Genome Sequencing and Analysis Facility (GSAF). Single-end sequencing was recommended in the Zymo Pico Methyl-Seq manual. mdRAD library preparation mdRAD libraries were prepared using a protocol based on Wang et al. (2015). Importantly, Wang et al. (2015) selected small sized fragments that had been cut on either end by the enzyme due to palindromic recognition sequences. Since in our hands the yield of the palindrome-derived product was very low, we instead sequenced any ligated fragments in the 170-700 bp range. We also used different oligonucleotide sequences, designed for similarity to those used in the current 2bRAD protocol (Table   S1) (Dixon et al. 2015;Matz et al. 2018;Matz 2019). A detailed version of the protocol used is included as a Supplemental Methods file. We prepared libraries using two  (Table S1). At this point in the protocol, all samples were distinguishable by the dual barcoding scheme.
The concentration of each PCR product was quantified using PicoGreen™ dsDNA Assay Kit (cat no. P7589). Based on these concentrations, 200 ng of each product was combined into a final pool with approximate concentration of 32 ng/µl. A portion of this pool was then size selected for 170 -700 bp fragments using 2% agarose gel and purified using a QIAquick gel Extraction kit (cat no. 28704). After gel purification, the pool was sequenced with a single run on a NextSeq 500 for paired-end 75 bp reads at the University of Texas Genome Sequencing and Analysis Facility. The final number of libraries included in the pool was 24 (2 genotypes, 2 tissues, 2 different restriction endonucleases, 3 replicates per combination; Table 1). As this methylation assay depends on fold coverage to infer methylation levels, single-end reads are a more costeffective approach. We opted for paired-end reads in this case only to ensure proper product structure for benchmarking purposes.

MBD-seq library preparation
MBD-seq libraries were prepared using a Diagenode MethylCap kit (cat no. C02020010) as described previously (Dixon et al. 2016;Dixon et al. 2018). Briefly, genomic DNA was sheared to a target size of 300 -500 bp. Concentrations based on PicoGreen™ dsDNA assay on genomic DNA were assumed not to have changed during shearing. Because limited genomic DNA remained, we prepared these libraries from pools of genomic DNA for each genotype-tissue pair. Also due to limited genomic DNA, the two libraries for N12 tips were prepared using only 0.565 μg as input. For the remaining libraries, half were prepared with 1 μg of input and the other half from 1.5 μg. During capture of methylated DNA, we retained the flow-through for sequencing, which we refer to at the unbound fraction. Captured methylated fragments were eluted from capture beads in one single total elution using High Elution Buffer. The final sample size was 8 (2 genotypes, 2 tissues, 2 replicates per; Table 1). After capture, fragment size was assessed using 1.5% agarose gels. The captured and unbound fractions both ranged between 200 and 1000 bp. These fragments were submitted to the University of Texas Genome Sequencing and Analysis Facility. Here the fragments were further sheared to a target size of 400 bp. This additional shearing was done to ensure appropriate library sizes of 300 -500 bp for sequencing. Libraries were prepared with a NEBNext Ultra II DNA Library Preparation Kit (cat no. E7645). Libraries were sequenced with a single run on a NextSeq 500 for single-end 75 bp reads.

Whole genome bisulfite sequencing data processing
Raw reads were trimmed and quality filtered using cutadapt, simultaneously trimming low-quality bases from the 3' end (-q 20) and removing reads below 30 bp in length (-m 30) (Martin 2011). Trimmed reads were mapped to the A. millepora reference genome

MBD-seq data processing
Raw reads were trimmed and quality filtered using cutadapt simultaneously trimming low-quality bases from the 3' end (-q 20) and removing reads below 30 bp in length (-m 30) (Martin 2011). Trimmed reads were mapped to the A. millepora reference genome (Fuller et al. 2019) with bowtie2 using the --local argument (Langmead and Salzberg 2012). Alignments were sorted and indexed using samtools (Li et al. 2009), and PCR duplicates were removed using MarkDuplicates from Picard Toolkit (Broad Institute 2019). Fold coverage for different regions (eg. gene boundaries, exon boundaries, 1 Kb windows, etc.) was counted using multicov from BEDTools (Quinlan and Hall 2010).
Detailed steps used to process the MBD-seq reads are available on the git repository (Dixon 2020).

mdRAD data processing
All mdRAD reads were expected to contain NNRWCC as the first six bases of the forward read, and ACAC as the first four bases of the reverse read (Table S1; Supplementary Methods Section). The degenerate NNRW sequence in the forward read allows for discrimination of PCR duplicates, as uniquely ligated digestion products are unlikely (1/64) to bear identical sequences for these four bases. With this in mind, we used a custom python script to filter out any reads for which the first 20 bp was duplicated in a previous read (ie a likely PCR duplicate). At the same time, all paired end reads were filtered to retain only those with the expected NNRWCC beginning to the forward read and ACAC in the reverse read. These non-template bases were trimmed, along with adapters and low-quality bases using cutadapt (Martin 2011).
Trimmed reads were mapped to the A. millepora reference genome (Fuller et al. 2019) with bowtie2 using the --local argument (Langmead and Salzberg 2012). Alignments were sorted and indexed using samtools (Li et al. 2009). Fold coverage for different was counted using multicov from BEDTools (Quinlan and Hall 2010). Detailed steps used to process the mdRAD reads are available on the git repository (Dixon 2020).

Designating of regions of interest
Statistical analyses for all three assays were based on windows recorded in .bed files.
These included genes, exons, upstream sequences, and tiled windows of varying sizes.
Gene, exon, and upstream sequence boundaries were identified from the reference GFF file (Fuller et al. 2019). Upstream sequences included 1 Kb upstream of each gene.
These were intended to approximate promoter regions. Tiled windows were generated using makewindows from the BEDTools suite (Quinlan and Hall 2010). General statistics for these regions such as length, nucleotide content, and the number of CpGs, were extracted from the reference genome with a custom python script using SeqIO from Biopython (Cock et al. 2009). All downstream analyses of methylation level and differences between groups were based on these regions.

Whole genome bisulfite statistical analysis
Statistical analyses of WGBS data were conducted on the .cov files output from Bismark. Analysis was conducted only on CpG sites. Methylation level was calculated in several ways. The simplest metric was the overall fractional methylation, calculated as the number of methylated counts divided by all counts summed across CpGs within the region.
We report this as the % methylation on the log2 scale throughout the manuscript (eg Figure 1A). We calculated a similar metric using generalized logistic regression. Here the estimate of a region's methylation level was the sum of the intercept and the region's coefficient for a model of the probability of methylation given all methylated and unmethylated counts within the region. We also report the frequency of methylated CpGs, calculated as the number of methylated CpG sites divided by the total number of CpG sites within a region. We classified a CpG as methylated when the number of methylated counts was significantly greater than the null expectation with 0.01 error rate (binomial test; p-value < 0.05). We also calculated the ratio of methylated CpGs to the total length (bp).
Statistical analysis of differences in methylation between treatment groups (tissue type or colony) was done with the MethylKit package (Akalin et al. 2012).
Filtering parameters supplied to the filterByCoverage() function were lo.count=5, and hi.perc=99.9. The function methylKit::unite() was run using min.per.group = 4, so that only sites with data from all samples in each treatment group passed. Methylation counts for particular regions were isolated using the appropriate .bed file, the Granges() function from the GenomicRanges package (Lawrence et al. 2013), and the regionCounts() function from MethylKit.

MBD-seq statistical analysis
Statistical analyses of MBD-seq data were conducted on the fold coverages output from BEDTools multicov. Methylation level was calculated based on the difference in fold coverage between the captured and unbound fractions taken during library preparation. We quantified this using DESeq2 as the log2 fold change between the two fractions from a model including colony and polyp type as covariates (Love et al. 2014).
Following previous studies (Dixon et al. 2016;Dixon et al. 2018), we refer to this value as the MBD-score. We also calculated methylation level based on the fragments per kilobase per million reads (FPKM) from the captured fraction averaged across all samples. Differential methylation was also assessed using DESeq2. This was done in two ways, one using both the captured and unbound fractions, the other using only the captured fraction. Using both the captured and unbound fractions, the effect of treatment group was assessed as the interaction between treatment group and fraction.
In other words, we assessed the effect of treatment group on the difference between the captured and unbound fractions. To assess methylation differences without using the unbound fraction, we simply compared fold coverages from the captured fraction between treatment groups with a model including the alternative grouping as a covariate. DESeq tests were run using fitType = 'local' and significance was assessed using Wald tests.

mdRAD statistical analysis
Statistical analyses of mdRAD data were conducted on the fold coverages output from BEDTools multicov. Methylation level was calculated as FPKM averaged across all samples, and as the fragments per recognition site per million reads. Methylation differences were calculated using DESeq2 comparing fold coverage between treatment groups while controlling for the restriction enzyme used and the other treatment group.
DESeq tests were run using fitType = 'local' and significance was assessed using Wald tests.

Simulating reduced fold coverage
To assess the importance of fold coverage for methylation statistics, we simulated reduced fold coverages for each of the three assays. For MBD-seq and mdRAD, this was done by sampling iteratively lower total counts with replacement weighted by the gene's proportion of total read counts in the original dataset. To clarify, to simulate read reductions for 28188 genes for each sample, a vector of weights was generated by dividing each gene's fold coverage by the total for the sample. A vector ranging from 1 to 28188 was then randomly sampled with replacement, with probabilities set by the weight vector. The number of times each value was sampled was then totaled to give each genes' count in the simulated fold reduction. For WGBS, the trimmed fastq files were randomly sampled without replacement and all processing steps were repeated as indicated above.

Statistical reporting
Unless otherwise noted, we report significant results as those with false discovery corrected p-values less than 0.1 (FDR < 0.1) (Benjamini and Hochberg 1995).
Correlations are reported as Pearson correlations. All scripts for data processing and analysis in this study are available on GitHub: (Dixon 2020).

WGBS sequencing results
Sequencing the WGBS libraries produced 954 million single-end reads across 8 samples (2 from each colony-tissue type pair; median = 120 million per sample).
Trimming and quality filtering reduced the median to 119 million per sample. Mapping efficiency was 40% on average, with a median of 47 million mapped reads per sample.
Conversion efficiency averaged 98.5 ± se 0.05% based on spiked in lambda DNA and 98.0 ± se 0.10% based on mitochondrial DNA. The overall percentage of mapped reads was 39% of raw reads.

MBD-seq sequencing results
Sequencing the MBD-seq libraries produced a total of 488 million single-end reads.
These were divided across 8 samples each with two libraries (one captured and one unbound

mdRAD sequencing results
Sequencing the mdRAD libraries produced a total of 284 million paired-end reads across 24 libraries (3 replicates for each of the 4 colony-polyp type combinations each prepared with 2 different restriction enzymes). These were filtered to include only reads with the appropriate adapter sequences found in both the forward and reverse directions (~71% of reads) and to remove PCR duplicates based on degenerate sequences incorporated into the forward read (average 13.5% duplication rate). On average 60% of raw reads passed both these filters (172 million total passing reads).
Trimming and quality filtering further reduced this by 0.2%, for 74 million reads for Fspe1 libraries (median = 5.9 million per sample) and 98.5 million for the Mspj1 libraries (median = 7.3 million per sample). Properly paired mapping efficiency averaged 77% and 66% for Fspe1 and Mspj1 libraries respectively, giving final median read counts of 4.6 and 4.9 million reads per library. The final percentage of raw reads that passed all filters and properly mapped was thus 44% for Fspe1 and 42% for MspJ1.

Estimating methylation level
Measurements of absolute levels of gbM were consistent across assays. Each assay identified a bimodal distribution of gbM (Figure 1 A-C). Pearson correlations between assays were all greater than 0.8 (Figure 1 D-F). All three assays correlated negatively with the CpGo/e, with the strongest correlation for WGBS ( Figure S1). Correlations similar to those for gbM were found for exons ( Figure S2  two assays ( Figure S5). For MBD-seq, metrics that did not include the unbound fraction (FPKM and a similar metric based on the number of CpGs) correlated poorly with other assays ( Figure S6).
Hence sequencing the unbound fraction is important for measuring absolute methylation level with MBDseq. For mdRAD, the two restriction enzymes produced nearly equivalent results. mdRAD FPKM was more consistent with other assays than a similar metric based on the number of recognition sites ( Figure S7).

Methylation differences between groups
Estimates of differential methylation between coral colonies were concordant between assays, but less so than methylation level. Each assay identified extensive differential methylation between the two colonies (Figure 2A-C). The number of significant differentially methylated genes (DMGs) detected with each assay reflected the sample sizes used, rather than overall sequencing effort (Table 1)  with the other two assays similarly (Pearson correlation = 0.39 and 0.41). mdRAD and WGBS were less correlated (Pearson correlation = 0.26). Correlations were stronger (0.31 -0.55) when only methylated genes (> 3.1% methylation based on WGBS; Figure   1A) were considered. Similar results were found for differences between exons ( Figure   S9), 1 Kb windows (Figure S10), and upstream regions of coding sequences ( Figure  Figure 2: Correla7on of gbM difference es7mates between two coral colonies (genotypes). (A-C) Volcano plots illustra7ng differen7al gbM for the indicated assay. Red points indicate significant genes (FDR < 0.1). The number of biological samples, libraries, total number of filtered and aligned reads, and the number of significant and nonsignificant genes is given in the sub7tle for each panel. (D-F) ScaYerplots of gbM difference es7mates for the indicated assays. Pearson correla7ons are indicated in the top leZ. S11). Hence, estimates of methylation differences between colonies (genotypes) were noisy, but reproducible across assays.
In contrast to differential methylation between colonies, differences between polyp types were weak, and not reproducible across assays. The number of significant differences was reversed compared to the colony comparison, with the most (169 DMGs) detected by WGBS, the second (12 DMGs) by MBD-seq, and the least (1 DMG) by mdRAD ( Figure S12). There was no overlap in significant calls between assays.
Difference estimates based on WGBS showed no correlation with the other two assays (Pearson correlation between 0.01 and 0.02). MBD-seq and mdRAD correlated weakly 0.2 ( Figure S12).

Spatial precision
Correlations between assays were generally robust across window sizes. For each assay, we calculated methylation level, as well as methylation differences between the two colonies for tiled windows of varying sizes: (100bp, 500bp, 1Kb, 5Kb, and 10Kb).
Correlations between assays were generally consistent across window sizes, both for methylation level and methylation differences ( Figure 4). As with gbM, correlations for methylation level were stronger (2-4 fold) than those for methylation differences. Hence, for the coral genome, MBD-seq and mdRAD reproducibly agree with the singlenucleotide measures from WGBS even across small regions.

Effect of fold coverage on detecting methylation differences
Given the importance of reducing sequencing costs for ecological epigenetics, we sought to evaluate the importance of sequencing effort for each assay in estimating methylation statistics. To do this, we simulated reduced sequencing effort by random resampling of fold coverage from the datasets. We then re-calculated estimates of methylation level and methylation differences from the reduced sets. As we detected no reproducible differences between polyp types ( Figure S12), we focused on differences between colonies (genotype).
For estimates of absolute levels of gbM, fold coverage appeared to matter very little. We found that correlation between assays plateaued between 0.75 and 0.80 with roughly 20% of the original sequencing effort ( Figure S13). Although lower, correlation of gbM differences also plateaued with relatively little sequencing effort ( Figure 4A-C).
Hence correlation between assays was sensitive only to severe reductions in fold coverage. Moreover, increasing fold coverage appeared unlikely to improve correlations between assays.
Detecting significant DMGs in contrast, was more dependent on fold coverage.
For the sake of comparability, here we reduced the mdRAD dataset to just eight libraries prepared with the FspE1 enzyme. To illustrate the importance of fold coverage for statistical significance, we plotted the proportion of DMGs detected by at least two of the assays (all overlapping regions in figure 3) that were also detected with each read reduction ('any 2' trace in Figure 4 D-F). Given its similarity to the sensitivity metric used to evaluate classification models, we refer to this statistic as comparative sensitivity.
For a more stringent test of sensitivity, we also computed this value based on DMGs detected in each of the alternative assays ('alt. 2' in Figure 4 D-F). Based on this analysis, it appeared that increasing sequencing effort would have returned many more DMGs for WGBS, somewhat more for MBD-seq and relatively few more for mdRAD. We also assessed how often DMG calls by each assay were corroborated by the other assays. Here we computed comparative precision as the proportion of DMGs from a given reduction that were also significant for at least two of the assays' full datasets ('any 2' in Figure 4G-I). For greater stringency, this was also computed based on significance in the two alternative assays. Corroboration rates were slightly higher for WGBS DMGs, but generally similar for all three assays. When we repeated the analysis using the full mdRAD dataset (which still used less overall sequencing; Table1), mdRAD detected many more corroborated differences, with only slightly lower comparative precision ( Figure S14). In summary, mdRAD can identify reproducible differences in Figure 5: Effect of simulated read reduc7ons on es7mates of methyla7on differences between coral colonies. Columns are assigned to the three assays. Rows are assigned to sta7s7cs measuring agreement between assays. Each data point represents a simulated reduc7on in fold coverage. (A-C) Pearson correla7on between assays as fold coverage is reduced. (D-F) Sensi7vity of each assay in detec7ng significant differences (FDR < 0.1) detected by other assays. For each reduc7on in fold coverage, compara7ve sensi7vity is computed as the number of significant genes shared with the comparison divided the total significant genes for the comparison. Comparisons include any 2: genes that were significant in any 2 assays; alt. 2: genes that were significant for both the alterna7ve assays (G-I) Precision of each assay in detec7ng only significant differences (FDR < 0.1) also detected by other assays. For each reduc7on in fold coverage, compara7ve precision is computed as the number of significant genes shared with the comparison divided the total significant genes for the fold reduc7on. Read counts on the X axis refer to the total number of reads included in the final filtered alignment file, hence mapping efficiencies and PCR duplica7on rates should be accounted for when deciding on total sequencing effort. methylation with sensitivity and precision comparable to MBD-seq and WGBS with relatively little fold coverage.

Discussion
Here we present a benchmarking study of methods for assaying DNA methylation for ecological epigenetics in a marine invertebrate. We found that all three assays measure methylation level consistently, with a minimum correlation of 0.8 for gbM (Figure 1).
Analysis of differential methylation was less consistent, but still indicated reproducible differences between coral colonies ( Figure 2). Surprisingly, we found no such reproducible differences between polyp types (branch tips compared to branch sides; Figure S12). It is interesting to note that in this case WGBS identified 169 DMGs, none of which were detected by the other assays. This may reflect greater sensitivity of WGBS, however, since the other assays identified more of the reproducible differences between genotypes (Figure 2; Figure S13D-F), greater sensitivity seems unlikely. Given the extensive transcriptional differences between axial and radial polyps (Hemond et al. 2014), the absence of reproducible methylation differences between them suggests that variation in gbM is not involved for tissue-specific gene regulation in corals. This result adds to growing evidence that gbM does not directly regulate gene expression in invertebrates (Zilberman 2017;Bewick et al. 2018;Harris et al. 2019).
Simulating reduced sequencing effort for each assay showed that fold coverage is most important in the context of statistical significance. While the number of corroborated DMGs dropped steeply with fold coverage (Figure 4 D-F), correlations between assays were relatively stable (Figure 4 A-C; Figure S8). This suggests that adding a second assay to a methylomic experiment can provide valuable corroboration even with relatively little sequencing effort. This approach could also potentially prevent spurious conclusions. For instance, here we detected no reproducible differences in gbM between polyp types, a conclusion distinct from the one we would have drawn from WGBS alone (over 150 DMGs). Based on these results, we suggest an experimental strategy that uses high fold coverage for one assay to obtain statistical significance and low coverage from one or more other assays for corroboration. For instance, mdRAD could be used to sequence a large number of individuals to identify significant differences, with WGBS, MBD-seq, or both applied with relatively lower coverage for confirmation.
To conclude, MBD-seq and mdRAD are cost effective alternatives to WGBS, providing consistent estimates of methylation level and similar or greater sensitivity to methylation differences at lower library preparation and sequencing costs. The considerably lower sequencing effort required for mdRAD makes it particularly promising for the large sample sizes needed for ecological epigenetics.