Summary
The focus of this study is to profile changes in DNA methylation occurring with increased age in almond breeding germplasm in an effort to identify possible biomarkers of age that can be used to assess the potential individuals have to develop aging-related disorders in this productive species.
To profile DNA methylation in almond germplasm, 70 methylomes were generated from almond individuals representing three age cohorts (11, 7, and 2-years old) using an enzymatic methyl-seq approach followed by analysis to call differentially methylated regions (DMRs) within these cohorts.
Weighted chromosome-level methylation analysis reveals hypermethylation in 11-year-old almond breeding selections when compared to 2-year-old selections in the CG and CHH contexts. A total of 17 consensus DMRs were identified in all age-contrasts, and one of these DMRs contains the sequence for miR156, a microRNA with known involvement in regulating the juvenile-to-adult transition.
Almond shows a pattern of hypermethylation with increased age, and this increase in methylation may be involved in regulating the vegetative transition in almond. The identified DMRs could function as putative biomarkers of age in almond following validation in additional age groups.
Introduction
The study of aging has centered primarily around mammalian systems with a focus on humans (Kirkwood, 2005; Ferrucci et al., 2020); however, the aging process has also been shown to impact plants with emphasis placed on long-lived perennials (Munné-Bosch, 2007; Thomas, 2013; Brutovská et al., 2013; Woo et al., 2018). These impacts can include things like diminished growth, reduced flower and fruit production, and the development of aging-related disorders (Kester & Jones, 1970; Van Dijk, 2009). In perennial plants and in other organisms such as humans, causal mechanisms underlying the development of age-related phenotypes include genetic alterations such as somatic mutations or differential epigenetic marks (Jaligot et al., 2000; Dubrovina & Kiselev, 2016; Ogneva et al., 2016; Xiao et al., 2019; Wang et al., 2020). In fact, DNA methylation in particular has been proposed as a biomarker of aging in many systems, serving as a biological “clock” that can be used to track aging and predict aging outcomes (Runov et al., 2015; Jylhävä et al., 2017; Xiao et al., 2019).
Studying epigenetic alterations like differential DNA methylation associated with advanced age in perennial plant systems can (1) provide a means to track aging in these systems and (2) lead to an increased understanding of the development of age-related disorders or degeneration of important physiological processes. This information is valuable to agricultural industries that rely on sustained production of perennial crops, including fruit and nut trees. Almond (Prunus dulcis [Mill.] D.A. Webb) is an example of a perennial nut crop that is negatively affected by the aging process through the exhibition of non-infectious bud failure, an aging-associated disorder (Kester & Jones, 1970; Micke, 1996; Kester et al., 2004). Additionally, almond trees are primarily produced by clonal propagation for orchard establishment, meaning age, and thus susceptibility to age-related impacts, is difficult to determine (Ally et al., 2010; de Witte & Stöcklin, 2010; Salguero-Gomez, 2018). A means to track aging, particularly in crops like almond produced by clonal propagation or shown to exhibit age-related disorders, benefits growers, producers, and consumers by helping to protect the supply chain of these valuable commodities.
Profiling genome-wide DNA methylation is one approach to quantify differential epigenetic marks in an effort to model alterations associated with advanced age. Whole-genome enzymatic methylation sequencing is equivalent to the “gold standard” bisulfite sequencing approach to profile the methylome at the nucleotide level (Feng et al., 2020). Utilizing this approach provides information on both genome-wide methylation in each context (CG, CHG, and CHH [H = A, T, or C]) and allows for the identification of differentially methylated regions (DMRs), pin-pointing regions of the genome showing dynamic patterns of methylation associated with increased age (Vaisvila et al., 2020; Feng et al., 2020). Identification of specific regions of the genome showing changes in methylation associated with aging provides both the opportunity to develop biomarkers to track aging and information on those genes or genic regions that might contribute to the development of age-related phenotypes (Xiao et al., 2019).
In this study, we utilize almond breeding germplasm grown from seed following pedigreed crosses as part of the almond breeding program at the University of California, Davis. Because they were grown from seed, the individuals used in this study are of known age and are thus particularly useful for generating models to track aging in this species, where clonal propagation is standard. The goal of this study was to examine DNA methylation patterns in the genome of a productive perennial crop by performing an exhaustive methylome profiling of ∼70 almond individuals from three distinct age-cohorts. The hypothesis is that the almond breeding selection cohorts will exhibit, on average, divergent DNA methylation profiles associated with age. Our overall aim is to identify variability in the almond methylome that could enable model development to track aging in this clonally propagated crop and provide targets (i.e., differentially methylated regions) for further investigation into mechanisms influencing age-related phenotypes such as non-infectious bud failure or the juvenile-to-adult transition. This work additionally serves as a model to explore aging and its impacts in other important perennial crops.
Materials and Methods
Plant Material
Almond leaf samples were collected in May 2019 from the canopy of 30 distinct breeding selections planted in 2008, in 2012, and in 2017, totaling 90 individuals sampled. These selections represent three almond age cohorts aged 11, 7, and 2-years at the time of sampling. Almond breeding germplasm sampled for this study is maintained at the Wolfskill Experimental Orchards (Almond Breeding Program, University of California – Davis, Winters, CA). The pedigree of each sample collected was also documented, including both the female and male parents of each individual. Leaf samples were collected in the field and immediately stored on ice and then at −20 °C until shipping. Samples were shipped on ice to the Ohio Agricultural Research and Development Center (OARDC; The Ohio State University, Wooster, OH, USA) and immediately stored at −20 °C until sample processing.
DNA Extraction
High-quality DNA was extracted from leaves following a modified version of the protocol outlined in Vilanova et al., 2020. Briefly, samples were ground to a fine powder with a mortar and pestle in liquid nitrogen, and 50 mg of the ground material was added to 1 mL of extraction buffer (2% w/v CTAB; 2% w/v PVP-40; 20 mM EDTA; 100 mM Tris-HCl [pH 8.0]; 1.4 M NaCl), 14 μL beta-mercaptoethanol, and 2 μL RNase (10 mg/mL). The solution was incubated at 65 °C for 30 mins and on ice for 5 mins followed by a phase separation with 700 μL chloroform:isoamyl alcohol (24:1). The aqueous phase (∼800 μL) was recovered, and 480 μL binding buffer (2.5 M NaCl; 20% w/v PEG 8000) was added followed by 720 μL 100% ice-cold ethanol.
A silica matrix buffer was prepared by adding 10 g silicon dioxide to 50 mL ultra-pure water prior to incubation and centrifugation steps. Silica matrix buffer (20 μL) was added to each sample, and samples were gently mixed for 5 mins. Samples were spun for 10 secs and the supernatant was removed. To resuspend the remaining mucilaginous material (but not the pellet), 500 μL cold 70% ethanol was used, and supernatant was removed. Another 500 μL cold 70% ethanol was added to resuspend the silica pellet, the tubes were spun for 5 secs, and the supernatant was removed. The pellet was allowed to dry at room temperature for 5 mins and was resuspended in 100 μL elution buffer (10 mM Tris-HCl [pH 8.0]; 1 mM EDTA [pH 8.0]) followed by a 5 min incubation at 65 °C. Samples were centrifuged at 14,000 rpm for 10 mins at room temperature and 90 μL of supernatant was transferred to a new tube. DNA concentration was assessed by fluorometry using a Qubit™ 4 and Qubit™ 1X dsDNA HS Assay Kit (ThermoFisher Scientific, Waltham, MA, USA).
Enzymatic Methyl-Seq Library Preparation and Illumina Sequencing
Whole-genome enzymatic methyl-seq libraries were prepared using the NEBNext® Enzymatic Methyl-seq kit (New England BioLabs® Inc., Ipswich, MA, USA) according to the manufacturer’s instructions. Each sample was prepared using 100 ng input DNA in 48 μL TE buffer (1 mM Tris-HCl; 0.1 mM EDTA; pH 8.0) with 1 μL spikes of both the CpG unmethylated Lambda and CpG methylated pUC19 control DNA provided in the kit. The samples were sonicated using a Covaris® S220 focused-ultrasonicator in microTUBE AFA Fiber Pre-Slit Snap-Cap 6×16 mm tubes (Covaris®, Woburn, MA, USA) with the following program parameters: peak incident power (W) = 140; duty factor = 10%; cycles per burst = 200; treatment time (s) = 80.
Following library preparation, library concentration and quality were assessed by fluorometry using a Qubit™ 4 and Qubit™ 1X dsDNA HS Assay Kit (ThermoFisher Scientific) and by electrophoresis using a TapeStation (Agilent, Santa Clara, CA, USA). Library concentration was further quantified by qPCR using the NEBNext® Library Quant Kit for Illumina® (New England BioLabs® Inc.). Libraries were equimolarly pooled in batches of ∼15 (five libraries per age cohort) and cleaned using an equal volume of NEBNext® Sample Purification Beads (New England BioLabs® Inc.). The library pools were eluted in 25 μL TE buffer (1 mM Tris-HCl; 0.1 mM EDTA; pH 8.0), and concentration and quality were assessed by fluorometry and electrophoresis as above. Library pools were sequenced on two lanes of the Illumina® HiSeq4000 platform to generate 150-bp paired-end reads.
Processing and Alignment of Enzymatic Methyl-Seq Libraries
Methyl-Seq read quality was initially assessed using FastQC v. 0.11.7 (Andrews, 2010) and reads were trimmed using TrimGalore v. 0.6.6 and Cutadapt v. 2.10 with default parameters (Krueger, 2016). Forward read fastq and reverse read fastq files from the two HiSeq4000 lanes were merged for each library to produce single fastq files for both read one and read two. Reads were aligned to the ‘Nonpareil’ v. 2.0 almond reference genome, deduplicated, and methylation calls were generated using Bismark v. 0.22.3 (Krueger & Andrews, 2011) with default parameters in paired-end mode. To test conversion efficiency, reads were also aligned to both the Lambda and pUC19 nucleotide sequence fasta files provided by NEB (https://www.neb.com/tools-and-resources/interactive-tools/dna-sequences-and-maps-tool). All analyses were performed using the Ohio Supercomputer Center computing resources (Ohio Supercomputer Center, 1987).
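As a rough illustration of the conversion check, the sketch below computes conversion efficiency from cytosine calls on the unmethylated Lambda control, where every methylated call is treated as a conversion failure. It assumes tab-separated Bismark-style coverage lines (chromosome, start, end, percent methylation, methylated count, unmethylated count); the function name and the example counts are hypothetical and are not taken from this study.

```python
def conversion_efficiency(cov_lines):
    """Return the fraction of control cytosine calls read as unmethylated.

    Each line is assumed to hold six tab-separated fields, with the
    methylated and unmethylated read counts in columns 5 and 6.
    """
    meth = 0
    unmeth = 0
    for line in cov_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        meth += int(fields[4])
        unmeth += int(fields[5])
    total = meth + unmeth
    if total == 0:
        raise ValueError("no cytosine calls found")
    return unmeth / total


# Hypothetical calls on the unmethylated Lambda control.
example = [
    "lambda\t10\t10\t0.0\t0\t50",
    "lambda\t25\t25\t2.0\t1\t49",
]
print(f"conversion efficiency: {conversion_efficiency(example):.4f}")
```

The same counts from the methylated pUC19 control would be read the other way round, with the methylated fraction expected to be high.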
Weighted Genome-wide Methylation Analysis of Age-Cohorts
Weighted genome-wide percent methylation values were calculated for each individual within each cohort by taking the total number of methylated reads at each cytosine and dividing this by the total number of reads (methylated + unmethylated) at each cytosine. Weighted values were calculated for each methylation context. These values were used as input to R v. 4.0.2 (R Core Team, 2020) to perform beta regression using the package betareg v. 3.1-3 (Cribari-Neto & Zeileis, 2010). Pairwise comparison of least-squares means was completed by the functions emmeans() and cld() from the R packages emmeans v. 1.5.2-1 and multcomp v. 1.4-14 with an alpha = 0.05 and Sidak adjustment (Hothorn et al., 2008). The R package ggplot2 v. 3.3.2 was used to create plots for weighted percent methylation within each methylation-context (Wickham, 2016). Files were then subset by chromosome (chr1 – chr8), and weighted percent methylation values were calculated for all individuals by chromosome using the same formula as above for each methylation context. These values were used as input in R v. 4.0.2 (R Core Team, 2020) to perform beta regression and subsequent pairwise comparison of least-squares means as performed above for the genome-wide weighted percent methylation values.
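The weighted calculation described above can be sketched as follows. This is a minimal illustration that assumes "weighted" means pooling reads across all cytosines in a context (summed methylated reads over summed total reads), so that deeply covered sites contribute more than sparsely covered ones. The function names and read counts are hypothetical, and an unweighted per-site mean is included only for contrast.

```python
def weighted_methylation(sites):
    """Pooled methylation level: sum of methylated reads / sum of all reads.

    `sites` is an iterable of (methylated_reads, unmethylated_reads) pairs,
    one pair per cytosine in a given context.
    """
    sites = list(sites)
    meth_total = sum(m for m, _ in sites)
    read_total = sum(m + u for m, u in sites)
    if read_total == 0:
        raise ValueError("no reads covering any cytosine")
    return meth_total / read_total


def mean_site_methylation(sites):
    """Unweighted alternative: average of per-cytosine methylation fractions."""
    fractions = [m / (m + u) for m, u in sites if m + u > 0]
    if not fractions:
        raise ValueError("no covered cytosines")
    return sum(fractions) / len(fractions)


# Hypothetical counts for three cytosines with uneven coverage.
sites = [(9, 1), (0, 10), (1, 0)]
print(weighted_methylation(sites), mean_site_methylation(sites))
```

The two summaries differ whenever coverage is uneven: the single-read site in the example pulls the unweighted mean upward but barely moves the pooled value.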
Differential Methylation Analysis of Age-Cohorts
Coverage files for each methylation context produced by Bismark were prepared for input into the R package DSS (Dispersion Shrinkage for Sequencing Data) v. 2.38.0 (Wu et al., 2013; Feng et al., 2014; Park & Wu, 2016). The functions DMLtest() and callDMR() were used with a significance p.threshold set to 0.0001 to identify differentially methylated regions (DMRs) through pairwise comparisons between the three age cohorts. Comparisons were made relative to the oldest cohort in each DMR test (i.e., 11-year-old cohort relative to 2-year-old cohort).
Classification and annotation of differentially methylated regions
Following identification of DMRs in each age-contrast (11 – 2 year; 11 – 7 year; 7 – 2 year) and methylation-context, DMRs were further characterized based on the directionality of differential methylation. Hypermethylated DMRs are those that show increased methylation in the oldest cohort in each contrast, and hypomethylated DMRs are those that show decreased methylation in the oldest cohort in each contrast. The cumulative binomial probability of identifying an equal or greater number of hypermethylated DMRs in each age-contrast by methylation-context was calculated using the R base package stats command pbinom() where x = the number of hypermethylated DMRs in each age-contrast by methylation-context, size = the total number of DMRs identified in each age-contrast/methylation-context, p = 0.5, and lower.tail = FALSE.
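The binomial test described above can be sketched in a few lines. The example below computes the upper-tail probability of observing at least k hypermethylated DMRs out of n under the null of equal probability (p = 0.5), using exact integer arithmetic so that large DMR counts do not underflow. The function name and counts are hypothetical. Note that in R, pbinom(q, size, prob, lower.tail = FALSE) returns P(X > q), so an inclusive "equal or greater" probability corresponds to passing q = k − 1; the `inclusive` flag below makes that distinction explicit.

```python
from fractions import Fraction
from math import comb


def upper_tail_fair(k, n, inclusive=True):
    """Upper-tail probability for X ~ Binomial(n, 0.5).

    inclusive=True  -> P(X >= k)
    inclusive=False -> P(X > k), matching pbinom(k, n, 0.5, lower.tail=FALSE)
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    start = k if inclusive else k + 1
    favourable = sum(comb(n, i) for i in range(start, n + 1))
    # Exact ratio of integers, converted to float only at the end.
    return float(Fraction(favourable, 2 ** n))


# Hypothetical contrast: 8 hypermethylated DMRs out of 10 in total.
print(upper_tail_fair(8, 10), upper_tail_fair(8, 10, inclusive=False))
```

For contrasts with thousands of DMRs and a strong skew toward hypermethylation, both forms give vanishingly small probabilities, so the distinction matters mainly for small or nearly balanced sets.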
To visualize enrichment of DMRs across the eight chromosomes in the ‘Nonpareil’ genome, circos plots were generated with one track depicting each DMR classified as either hyper- or hypomethylated and two additional tracks depicting DMR enrichment across the genome. To create the circos plots, the R package circlize v. 0.41.2 (Gu et al., 2014) was used along with the bed files for all hyper- and hypomethylated DMRs in each methylation-context for all age-contrasts. The command circos.genomicRainfall() was used to create the first track with dots representing each individual DMR (red – hypermethylated, blue – hypomethylated) and positioned based on the number of DMRs occurring in that location. The command circos.genomicDensity() was used to create the two additional tracks representing enrichment of hyper- and hypomethylated DMRs on each chromosome where the taller the peak, the higher the number of DMRs occurring in the specific region (Gu et al., 2014).
Following classification into hyper- and hypomethylated regions, bed files were generated for these DMRs using genomic coordinates. These bed files were used as input along with the ‘Nonpareil’ genome annotation file into the R v. 4.0.2 (R Core Team, 2020) packages GenomicRanges v. 1.40.0 (Lawrence et al., 2013) and genomation v. 1.20.0 (Akalin et al., 2015) to prepare a GRanges object and annotate DMRs using the command annotateWithFeatures(). Initial annotation of DMRs by features includes the percentage of DMRs overlapping one of four features: gene, exon, 5’ untranslated region (UTR), and 3’ UTR. The DMRs were further annotated, and gene ontology (GO) enrichment was performed using the software HOMER v. 4.11 (Heinz et al., 2010) and the R package topGO v. 2.40.0 (Alexa A, 2020). Initially, all DMRs were annotated by assigning the gene with the closest transcriptional start site to each DMR using annotatePeaks.pl -noann with the ‘Nonpareil’ genome and genome annotation files. This produced a list of gene identifiers from the genome annotation file that are associated with each DMR. The GO terms assigned to each DMR-associated gene were used as input along with the ‘Nonpareil’ genomic annotation file to determine enrichment of GO terms in each age-contrast DMR-associated gene set. The DMR-associated genes in each methylation-context were classified based on biological process, molecular function, and cellular component GO term to produce tables depicting the number of DMR-associated genes assigned to each descriptor.
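The nearest-gene assignment step can be illustrated with a short sketch. It assumes that each DMR is represented by its midpoint and that the transcriptional start site (TSS) is the gene start on the plus strand and the gene end on the minus strand; these conventions, the function names, and the gene records are assumptions made for illustration and do not reproduce the HOMER implementation.

```python
def tss(gene):
    """Strand-aware transcriptional start site of a gene record."""
    return gene["start"] if gene["strand"] == "+" else gene["end"]


def nearest_gene(dmr, genes):
    """Return the id of the gene whose TSS is closest to the DMR midpoint.

    `dmr` is a (chrom, start, end) tuple; `genes` is a list of dicts with
    keys id, chrom, start, end, strand. Returns None if the chromosome
    carries no annotated gene.
    """
    chrom, start, end = dmr
    midpoint = (start + end) // 2
    candidates = [g for g in genes if g["chrom"] == chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda g: abs(tss(g) - midpoint))["id"]


# Hypothetical annotation with one gene on each strand.
genes = [
    {"id": "geneA", "chrom": "chr1", "start": 1000, "end": 3000, "strand": "+"},
    {"id": "geneB", "chrom": "chr1", "start": 5000, "end": 8000, "strand": "-"},
]
print(nearest_gene(("chr1", 7000, 7400), genes))
print(nearest_gene(("chr1", 1500, 1700), genes))
```

Because assignment is by distance alone, a DMR can be linked to a gene it does not overlap, which is worth keeping in mind when interpreting DMR-associated gene lists.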
The DMRs were then further classified based on the occurrence of overlapping genomic regions among DMRs when comparing age-contrasts. The bed files generated above were used as input in the bedtools v. 2.29.2 (Quinlan & Hall, 2010) command intersect -wao to identify overlaps in DMRs from each of the age-contrasts (i.e., 11-7 contrast compared to 11-2 contrast). Finally, genomic regions were identified that contain significant DMRs in all three age-contrasts using bedtools intersect. These overlapping DMRs were annotated using the annotatePeaks.pl script as above to find DMR-associated genes as well as GO terms, Pfam identifiers, and Interpro identifiers associated with each gene. The genomic DMR sequence was extracted from the ‘Nonpareil’ genome fasta file, and individual DMR fasta files were searched against the miRBase database v. 22.1 (Kozomara et al., 2019) to identify any putative microRNAs (miRNAs) within those regions. Searches were performed by sequence using an e-value cutoff of 10 and the Prunus persica (L.) Batsch species filter.
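The idea behind the overlap analysis can be sketched without bedtools. The example below treats DMRs as BED-style half-open intervals and keeps those from one contrast that overlap at least one DMR in each of the other contrasts. This is only one possible definition of a "shared" region, and the function names and coordinates are hypothetical rather than taken from the study.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_across(reference, *others):
    """Intervals in `reference` overlapping some interval in every other set."""
    return [
        r for r in reference
        if all(any(overlaps(r, o) for o in other) for other in others)
    ]


# Hypothetical DMR coordinates for the three age-contrasts.
dmrs_11_2 = [("chr1", 100, 600), ("chr5", 2000, 2400)]
dmrs_11_7 = [("chr1", 450, 700), ("chr5", 3000, 3100)]
dmrs_7_2 = [("chr1", 550, 650), ("chr5", 2100, 2200)]

print(shared_across(dmrs_11_2, dmrs_11_7, dmrs_7_2))
```

The pairwise scan is quadratic in the number of DMRs, which is acceptable for a few thousand regions; sorted or indexed intervals would be preferable for larger sets.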
Annotation of unknown protein sequences
Genes coding for proteins with unknown function and associated with the DMRs shared across the three age-contrasts were interrogated using in silico approaches to characterize the proteins. Several programs were used to annotate these protein sequences and determine additional information about their putative functions. The program ProtParam was used to characterize protein properties including molecular weight (Gasteiger et al., 2005). To predict subcellular localization, the program YLoc was used (Briesemeister et al., 2010a,b). Finally, the Motif tool on the GenomeNet website (https://www.genome.jp/tools/motif/) was used to search a protein query against several databases to identify putative alignments (Marchler-Bauer et al., 2013; Sigrist et al., 2013; Finn et al., 2014).
Results
Genome-wide methylation analysis in almond accessions representing three age-cohorts
Following DNA isolation, library preparation, and Illumina sequencing, a total of 21 almond breeding selections were used for subsequent analysis in the 2-year-old age cohort, 25 in the 7-year-old age cohort, and 24 in the 11-year-old age cohort. Sequencing results show aligned coverage for almond accessions ranged from 3.85 – 50.41X with an average mapping efficiency of 49.8% (Data S1). Conversion efficiency was greater than 98% based on alignment to the Lambda reference sequence file (Data S1).
Analysis of weighted genome-wide percent methylation within all methylation-contexts (CG, CHG, and CHH) revealed a significant increase in weighted methylation in the 11-year-old age cohort compared to the 2-year-old in the CG (p-value = 0.0105) and CHH (p-value = 0.0399) contexts, respectively (Fig. 1a, c). There was also a significant increase in CG methylation in the 11-year-old age cohort compared to the 7-year-old age cohort (p-value = 0.0115; Fig. 1a). There was not a significant difference in weighted genome-wide methylation in the CHG context when comparing age cohorts (Fig. 1b).
To further analyze weighted methylation in these samples, methylation data for each individual were processed per chromosome, and weighted methylation was analyzed at the chromosome level for each methylation-context (Fig. S1a-c). Pairwise comparisons of DNA methylation within each chromosome revealed significant differences in cytosine methylation on distinct chromosomes for each methylation-context (Table S1a-c). In the CG context, both the 2 – 11 year and the 7 – 11-year age-contrasts were significant on chromosomes 1, 3, 5, 7, and 8 (Table S1). In the CHG context, both the 2 – 11 year and the 7 – 11-year age-contrasts were significant on chromosome 5, the 7 – 11-year age-contrast was significant on chromosome 7, and the 2 – 11-year age-contrast was significant on chromosome 8 (Table S1). Finally, in the CHH context, the 2 – 11-year age-contrast was significant on chromosomes 5, 7, and 8 (Table S1). Overall, significant differences in chromosome-level DNA methylation between age cohorts tend to occur on chromosomes 5, 7, and 8.
Identification and classification of differentially methylated regions (DMRs) between age cohorts
DMRs were identified based on comparisons between the age cohorts in each methylation context. Most DMRs identified are in the CG context, followed by CHH and CHG, respectively (Table 1). These DMRs were further classified as hyper- and hypomethylated based on the amount of methylation in the older cohort compared to the younger. Hypermethylated DMRs have a higher amount of methylation in the older cohort, while hypomethylated DMRs have a higher amount of methylation in the younger cohort for each comparison. In the CG context, 96%, 94%, and 64% of the identified DMRs were hypermethylated in the 11 – 2 year, 11 – 7 year, and 7 – 2-year age-contrasts, respectively (Table 1). In the CHG context, 68%, 52%, and 64% of DMRs were hypermethylated in the 11 – 2 year, 11 – 7 year, and 7 – 2-year age-contrasts, respectively (Table 1). Finally, in the CHH context, 82%, 38%, and 82% of DMRs were hypermethylated in the 11 – 2 year, 11 – 7 year, and 7 – 2-year age-contrasts, respectively (Table 1). The cumulative binomial probability of the occurrence of hypermethylated DMRs was less than 1 × 10⁻⁶ for all age-contrasts except 11 – 7 year in the CHG and CHH contexts, suggesting there are more hypermethylated DMRs than would be expected given an equal probability of hyper- and hypomethylated DMRs in the genome. Identified DMRs ranged in length from 51 to 4,824 base pairs (Fig. S2a-l). The average length of a gene in the ‘Nonpareil’ genome is 2,912 bp, so most of the identified DMRs are much shorter than the average gene.
The distribution of CG-context DMRs showed a similar pattern across all chromosomes, where the 11 – 2-year age-contrast has the highest number of DMRs per chromosome, followed by the 11 – 7 year and 7 – 2-year age-contrasts (Fig. S3a). In the CHG and CHH contexts, the distribution of DMRs showed greater variability, with the 11 – 7-year age-contrast typically showing the lowest number of DMRs across all chromosomes, while the 11 – 2 and 7 – 2-year age-contrasts oscillate in number of DMRs occurring on each chromosome across the genome (Fig. S3b,c).
Classification of DMRs as hyper- or hypomethylated in the age cohort comparisons
Using the classifications of hyper- and hypomethylated, DMRs were plotted across the eight chromosomes of the ‘Nonpareil’ genome revealing unique distributions based on both methylation-context and age-contrast, as well as indicating DMR enrichment in specific chromosomes (Fig. 2). In the CG context, DMR enrichment occurs in the 11 – 2-year age-contrast, with predominantly hypermethylated DMRs, though enrichment of hypomethylated DMRs appears on chromosome 5 (Fig. 2a). The CHG context represents the lowest overall enrichment of DMRs compared to the other methylation-contexts, with regions throughout the genome showing slight enrichment of DMRs (Fig. 2b). Finally, DMRs in the CHH context show similar patterns in enrichment for both the 7 – 2 year and 11 – 2-year age-contrasts, with evident DMR enrichment occurring on chromosomes 3 and 8 (Fig. 2c). The 11 – 7-year age-contrast in the CHH methylation-context is the only contrast to have a higher number of hypomethylated DMRs compared to hypermethylated (Table 1; Fig. 2c).
Following classification of DMRs as either hyper- or hypomethylated in each age-contrast, DMRs were compared among age-contrasts to identify overlapping genomic regions. This analysis revealed several overlapping DMRs from distinct age-contrasts. The highest number of overlaps occurred in CG context hypermethylated DMRs, particularly when comparing the 11 – 2 by 11 – 7 and 11 – 2 by 7 – 2 age-contrasts (Table 2). Interestingly, the 11 – 7 by 7 – 2 age-contrast revealed very few overlaps in hypermethylated DMRs, and no overlapping hypomethylated DMRs (Table 2). Finally, a comparison was performed to identify DMRs with overlapping genomic regions among all three age-contrasts, showing the 11 – 2 age-contrast contains DMRs that share a genomic region with the overlapping DMRs in the 11 – 7 by 7 – 2 comparison (Table 2). This final analysis revealed 17 overlapping DMRs among the three age-contrasts, meaning these DMRs share overlapping genomic coordinates (Table S2). These 17 DMRs are the longest (ranging from 106 – 1,504 base pairs) in the 11 – 2 age-contrast, followed by the 11 – 7 (56 – 713 base pairs) and 7 – 2 (52 – 1,037 base pairs) age contrasts (Table S2). Analysis of the average percent methylation of cytosines within the genomic regions identified in the 11 – 7 age-contrasts shows that in the majority of these regions, cytosines become more methylated with increased age across the three age cohorts (Figs S4a-m, S5a-c, S6).
Annotation of hyper- and hypomethylated differentially methylated regions (DMRs)
An annotation was performed classifying all hyper- and hypomethylated DMRs in each methylation-context into four categories (gene, exon, 5’ untranslated region [UTR], and 3’ UTR) based on their association with features in the ‘Nonpareil’ genome annotation. CG context DMRs generally tended to have higher associations with genes and exons compared to the other methylation contexts, while CHG DMRs tended to have higher associations with 5’ UTRs and CHH DMRs with 3’ UTRs compared to the other contexts (Table S3).
Identified DMRs were then annotated using the ‘Nonpareil’ genome annotation file to determine the closest gene associated with each DMR. Enrichment analysis was performed for both hyper- and hypomethylated DMR-associated genes in all methylation contexts for each age-contrast, revealing a suite of biological process, molecular function, and cellular component gene ontology (GO) terms associated with each contrast (Tables S4-S9). Comparing annotations between age-contrasts identified GO terms unique to each age-contrast in each methylation-context and degree of methylation (i.e., hyper or hypo). For example, a subset of genes associated with hypermethylated DMRs in the CG context from all three age-contrasts were assigned the molecular function GO terms transmembrane transporter activity, protein serine/threonine kinase activity, and DNA-binding transcription factor activity (Table S4b).
Annotation of genes associated with 17 hypermethylated DMRs identified across the three age-contrasts
Of the DMRs identified in each age-contrast, 17 hypermethylated DMRs were found to share genomic regions in all three age-contrasts, meaning these regions showed consistent significant increases in methylation in the older age cohort relative to the younger in each age-contrast. The 17 DMRs were annotated using the ‘Nonpareil’ genome annotation to identify the closest associated gene. In total, eight previously annotated genes including FAR1-RELATED SEQUENCE 5 (FRS5), a receptor-interacting serine/threonine-protein kinase 4 (Ripk4), and dCTP pyrophosphatase 1 (Dctpp1) were identified as associated with nine of the DMRs (Table 3). One CG DMR and one CHG DMR are both associated with the gene Tryptophan aminotransferase-related protein 3 (TAR3) (Table 3). The remaining eight DMRs are associated with genes of unknown function (Table 3). These eight unknown protein sequences were used as input into three programs to determine properties including predicted motifs, localization, and weight. Two of these unknown proteins contained transposase_24 motifs, and four are predicted to be localized to the nucleus (Table S10).
In addition to identifying genes associated with these 17 shared DMRs, the DMR genomic sequences (Data S2) were searched against the miRBase database using the Prunus persica species filter to identify any potential miRNAs within these regions. Results from this analysis showed that two of the 17 DMRs, CGDMR8 and CHGDMR1, contain miRNA sequences. The miRNA identified in CGDMR8 is ppe-miR6276, and the miRNA identified in CHGDMR1 is ppe-miR156.
Discussion
Perennial plant aging and the impacts of this process, particularly on productive fruit and nut crops, remain a neglected area of research with potential applications for agricultural production and crop improvement. The ability to track age in clonally propagated crops could aid in the mitigation of age-related disorders like non-infectious bud failure (Kester et al., 2004) and, more broadly, in overcoming the decrease in plant performance resulting from intense production systems that affect orchard/vineyard longevity. Biomarkers of age in these species, however, are lacking. The aim of this study was to test the hypothesis that, on average, almond breeding selection cohorts will exhibit divergent DNA methylation profiles associated with age. The long-term goals of this work are to develop biomarkers of age in almond, a clonally propagated crop, and to further our understanding of the aging process in perennial species. To address this, whole-genome DNA methylation profiles were generated for ∼70 almond individuals from three distinct age cohorts, and comparisons were made between cohorts to identify regions of interest for further study into their involvement in the aging process and utility as biomarkers of age in almond.
DNA hypermethylation in the CG and CHH contexts is associated with increased age in almond
The DNA methylation profiles generated for individuals in the three age cohorts (11, 7, and 2-years old) were compared at the whole-genome and chromosome level, which showed that hypermethylation in the CG and CHH methylation-contexts is associated with increased age in almond. Further, the probability of identifying the number of hypermethylated DMRs observed in this study was very low for most age-contrasts, suggesting there was a disproportionately high number of hypermethylated DMRs identified compared to hypomethylated DMRs. This result supports previous work theorizing an increase in total genomic DNA methylation with increased age in plants (Dubrovina & Kiselev, 2016). Previous studies have also shown that genome-wide hypermethylation can result in a high number of identified hypermethylated DMRs in subsequent analyses, as was reported in several species including Monterey pine (Pinus radiata D.Don), peach (P. persica), and coast redwood (Sequoia sempervirens [D.Don] Endl.) (Bitonti et al., 2002; Fraga et al., 2002b; Huang et al., 2012).
DNA methylation has been proposed as a “biological clock” capable of predicting the true, ontogenetic age of an individual due to observed patterns of increased methylation with increased age in a variety of species (Runov et al., 2015). Results in this study suggest that almond fits this pattern of hypermethylation, and thus DNA methylation may serve as a biomarker of age in this species. Whole-genome hypermethylation with increased age represents an opportunity to develop high-throughput screening methods that do not require whole-genome sequencing. These methods could include high-performance liquid chromatography or capillary electrophoresis (Stach et al., 2003; Armstrong et al., 2011).
Epigenetic regulation by DNA methylation, histone modifications, and chromatin remodeling has also been shown to modulate the juvenile-to-adult phase transition in plants, including in gymnosperms like Monterey pine and in angiosperms such as peach (Bitonti et al., 2002; Fraga et al., 2002a; Xu et al., 2018). The juvenile period in almond is approximately 3-4 years, thus differential patterns of methylation observed in this study between the 2-year cohort and the 7- and 11-year cohorts could be associated with the juvenile-to-adult transition as has been documented in other plants (Dubrovina & Kiselev, 2016). Patterns of differential methylation identified in this study and associated with specific regions of the genome further demonstrate the potential involvement of DNA methylation in regulating this transition in almond. Further investigation is needed focusing on the involvement of DNA-methylation in the juvenile-to-adult transition in almond and other perennial species, including those with available transgenic germplasm exhibiting reduced juvenility, such as apple (Flachowsky et al., 2011; Kumar et al., 2020).
Differentially methylated regions (DMRs) in the CG and CHH contexts are enriched on specific chromosomes in the almond genome
Following identification of DMRs in the three age-contrasts, these regions were plotted across the almond genome, showing enrichment of DMRs on specific chromosomes, particularly in the CG and CHH methylation-contexts. These so-called “hotspots” of differential DNA methylation could suggest that loci in these regions are prone to epigenetic alterations, including methylation. Transposable elements (TEs) tend to be heavily methylated and have been reported to have involvement in developmental processes in almond, such as the juvenile-to-adult transition (Han et al., 2018; Corso-Díaz et al., 2020; Wyler et al., 2020), suggesting the “hotspots” identified in almond in this study could contain TE sequences. This type of pattern has been documented in other species such as rice, where a study on salt tolerance identified DMRs that tended to cluster on specific chromosomes and were typically associated with TEs on these chromosomes (Ferreira et al., 2019).
In this study, hypermethylated DMRs were enriched on chromosome 8 in the CG context, and on chromosomes 3 and 8 in the CHH context. As increased levels of methylation tend to occur in regions rich in TEs, it is possible that regions enriched in DMRs contain TEs which become increasingly methylated with age. Interestingly, a study in Brachypodium distachyon (L.) P.Beauv. found that DMRs were highly correlated with genetic diversity as classified by presence of single nucleotide polymorphisms (SNPs) throughout the genome (Eichten et al., 2016). This genetic diversity was found to be related to the presence of TEs at these sites, potentially contributing to the formation of SNPs as well as leading to differential levels of methylation between the lines tested (Eichten et al., 2016). Given the heterozygosity and diversity in almond germplasm, it may be relevant to compare regions enriched in DMRs from the age-contrasts to SNP data in almond to test for a correlation between increased methylation and genetic diversity, particularly for traits associated with growth and development and length of juvenility.
To identify genetic components involved in the juvenile-to-adult transition in Prunus, a study utilizing two almond × peach interspecific populations found a single QTL on chromosome 6 associated with juvenility period, defined as the number of years to first fruit (Donoso et al., 2016). However, this QTL only explained ∼13% of the variability in time to first fruit in these populations, suggesting there may be additional regions influencing this trait (Donoso et al., 2016), as has been observed in other species such as citrus (Raga et al., 2012). Additional work is necessary to explore the genetic variability associated with juvenility in almond to see if chromosomes enriched in DMRs based on age are associated with variation in these traits among almond populations. Further, the TE landscape of regions enriched in DMRs could be characterized to determine if there is an abundance of TEs at these locations, potentially explaining the high number of DMRs observed in these regions. A recent study in almond characterized the TE landscape in the ‘Texas’ cultivar and compared the distribution of TEs in the genome to that of peach (Alioto et al., 2020). This study revealed not only an increased involvement of TEs in trait diversity in almond compared to peach, but also showed enrichment of TEs on almond chromosomes 3 and 8 (Alioto et al., 2020).
DMRs as potential biomarkers of age in almond
The DMRs identified in this study represent those regions in the genome that showed either an increase or a decrease in cytosine methylation with increased age in almond. The results herein suggest that DMRs tend to be hypermethylated in the age-contrasts, meaning there is an increase in methylation in these regions with increased age in each age-contrast. This pattern fits with the weighted genome-wide methylation patterns showing significant increases in methylation in the CG and CHH contexts between the 2- and 11-year-old age cohorts. Unique, hypermethylated DMRs were identified in each age-contrast, providing information on DNA methylation dynamics associated with age. Regions that show increased methylation in each age contrast are of particular interest due to their potential suitability as biomarkers of age, since these regions show incremental increases in methylation from 2-to-7 years and again from 7-to-11 years old.
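For readers unfamiliar with weighted methylation, the calculation can be sketched as follows. This assumes that "weighted" refers to the commonly used read-weighted definition, in which methylated read counts are summed across all cytosines in a region and divided by the summed total read counts; the per-site counts shown are hypothetical.

```python
def weighted_methylation(sites):
    """Read-weighted methylation level for a set of cytosines.

    sites: iterable of (methylated_reads, total_reads) pairs, one per cytosine.
    Sites with deeper coverage contribute proportionally more to the result.
    """
    methylated = 0
    total = 0
    for methylated_reads, total_reads in sites:
        methylated += methylated_reads
        total += total_reads
    if total == 0:
        raise ValueError("no read coverage at the supplied sites")
    return methylated / total


# Hypothetical per-cytosine counts (methylated reads, total reads).
example_sites = [(8, 10), (1, 2), (0, 8)]
level = weighted_methylation(example_sites)
print(f"weighted methylation = {level:.2f}")
```

Unlike a simple mean of per-site fractions, this measure limits the influence of cytosines with very low coverage.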
Within the DMRs identified in each of the age-contrasts, 17 hypermethylated DMRs were identified with overlapping genomic regions in all three contrasts. Once these regions are validated via DNA methylation profiling in additional almond cohorts of known age, a predictive model could be developed considering cytosine methylation level. This model could be applied to clonal germplasm to predict ontogenetic age, providing a basis to screen germplasm for susceptibility to undesirable, age-related phenotypes. These tools may have implications for germplasm management in breeding, production (orchard), propagation (nursery), and conservation (repository) settings.
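The identification of regions shared among all three age-contrasts can be sketched as an interval intersection. This is an illustrative example under stated assumptions, not the pipeline used in this study: coordinates are 0-based and half-open, the `Region` type and the contrast variable names are hypothetical, and the coordinates are invented.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(a: Region, b: Region) -> Region:
    """Shared sub-region of two overlapping regions."""
    return Region(a.chrom, max(a.start, b.start), min(a.end, b.end))


def consensus_regions(first, second, third):
    """Return sub-regions covered by a DMR in each of the three lists."""
    shared = []
    for a in first:
        for b in second:
            if not overlaps(a, b):
                continue
            ab = intersect(a, b)
            for c in third:
                if overlaps(ab, c):
                    shared.append(intersect(ab, c))
    return shared


# Hypothetical hypermethylated DMRs from three age-contrasts.
contrast_a = [Region("chr8", 100, 300), Region("chr3", 50, 90)]
contrast_b = [Region("chr8", 200, 400)]
contrast_c = [Region("chr8", 250, 500), Region("chr3", 60, 80)]

consensus = consensus_regions(contrast_a, contrast_b, contrast_c)
print(consensus)
```

The nested loops are adequate for the modest number of DMRs in a single study; an interval tree or a sorted sweep would be preferable for much larger inputs.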
The genetic features associated with these specific DMRs could also have involvement in developmental processes, including the juvenile-to-adult transition. To annotate these DMRs, genes were identified with transcriptional start sites close to or overlapping the DMR. Of the 17 DMRs, nine were found to be associated with eight annotated genes. These genes include FRS5, a FAR1-related protein in the FAR1 gene family which is involved in light perception and was demonstrated to have involvement in plant development and regulation of aging processes in Arabidopsis (Lin & Wang, 2004; Ma & Li, 2018; Xie et al., 2020). The gene TAR3 was found to be associated with two of the 17 identified DMRs, one in the CG context and one in the CHG context. This gene is part of a gene family known as TRYPTOPHAN AMINOTRANSFERASE (TAR), whose members are a component of one of the major auxin biosynthetic pathways (Hofmann, 2011). The TAR genes are involved in the first step of the pathway, in which tryptophan is converted to indole-3-pyruvic acid, which is subsequently converted to auxin (Brumos et al., 2014). Auxin is a well-known regulator of plant development and senescence processes (Ljung, 2013; Khan et al., 2014; Mueller-Roeber & Balazadeh, 2014). While tryptophan-independent pathways for auxin biosynthesis are present in plants, TAR genes are involved in the primary biosynthetic pathway, and disruption of TAR3 expression could impede auxin production (Hofmann, 2011). These genes and the others identified represent interesting targets for future study on their potential involvement in aging processes in almond, including in the vegetative transition. Additionally, eight proteins of unknown function were identified as associated with the overlapping DMRs. Characterization of these proteins could reveal novel genes with potential functions in plant development and aging in almond.
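The annotation step described above, in which genes are linked to a DMR by the proximity of their transcriptional start site (TSS), can be sketched as follows. The distance threshold is an assumption, since "close to" is not defined here, and the `Gene` type, gene names, and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        """TSS is the 5' end: start on the plus strand, end on the minus strand."""
        return self.start if self.strand == "+" else self.end


def distance_to_interval(pos: int, start: int, end: int) -> int:
    """Distance from a position to a closed interval (0 if inside it)."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def genes_near_dmr(dmr_chrom, dmr_start, dmr_end, genes, max_distance=2000):
    """Names of genes whose TSS overlaps the DMR or lies within max_distance bp.

    max_distance is an assumed, illustrative threshold.
    """
    return [
        g.name
        for g in genes
        if g.chrom == dmr_chrom
        and distance_to_interval(g.tss, dmr_start, dmr_end) <= max_distance
    ]


# Hypothetical annotations and DMR coordinates.
genes = [
    Gene("geneA", "chr8", 5000, 7000, "+"),
    Gene("geneB", "chr8", 1000, 9500, "-"),
    Gene("geneC", "chr3", 5000, 7000, "+"),
]
nearby = genes_near_dmr("chr8", 4000, 4500, genes)
print(nearby)
```

Strand is taken into account because the TSS of a minus-strand gene lies at its highest coordinate, which changes which DMRs fall within the threshold.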
In addition to identifying nearby genes associated with these DMRs, microRNAs (miRNAs) were also surveyed in these regions. Interestingly, two of the DMRs were found to contain miRNA sequences, including one with the sequence for ppe-miR156, a well-characterized miRNA known to be a major regulator of development and phase transition in plants (Wu et al., 2009). Previous studies have shown that miR156 regulates vegetative phase transition in plants by targeting SQUAMOSA-PROMOTER BINDING PROTEIN-LIKE (SPL) genes, inhibiting their expression during juvenility (Wu et al., 2009; Jia et al., 2017). As plants age, miR156 expression is repressed, allowing activation of target genes and inducing the transition to adult (Wu et al., 2009). In fact, studies in both Arabidopsis and maize show that mutants overexpressing miR156 experience prolonged juvenility (Wu & Poethig, 2006; Chuck et al., 2007). A recent study in Arabidopsis showed that light perception via FAR1-family genes may also interact with miR156 in regulating plant development (Xie et al., 2020).
Previous work has shown 11 putative members of the miR156 family in peach, suggesting that the miRNA identified in this study could be one of several in this family in almond (Luo et al., 2013); other work has shown that the function of miR156 in the aging pathway is conserved in Prunus species (Bastías et al., 2016). It has also been proposed that miR156 is regulated by epigenetic modifications (Xu et al., 2018), including DNA methylation; however, more work is needed to disentangle epigenetic regulation of miR156 expression as well as to identify and characterize targets of miR156 in almond. Additionally, expression of miR156 has been altered through transgenic approaches in other crops to delay flowering, leading to increased plant biomass or abiotic stress tolerance (Zheng et al., 2016; Kang et al., 2020). Manipulation of miR156 in Prunus could lead to potential applications for crop improvement, including decreased juvenility, which could greatly shorten breeding cycles for these crops. Results in this study identified a hypermethylated DMR overlapping miR156 and associated with increased age, suggesting one mode of regulation for this miRNA could be cytosine methylation. Induced DNA methylation using gene-editing techniques could provide a potential avenue for manipulation of miR156 in almond or other Prunus species, not only for crop improvement applications, but also to further our understanding of this miRNA in plant development and aging in these crops. Since almond is recalcitrant to tissue culture methods, the application of gene-editing techniques in this species would first require optimization of methods allowing propagation of modified material.
The second miRNA sequence identified was ppe-miR6276, which was first identified as a novel miRNA in Japanese apricot (P. mume Sieb. et Zucc) and found to have homology with a miRNA sequence in peach (Gao et al., 2012). This miRNA is currently uncharacterized and could provide an interesting target for further investigation in almond and other Prunus species to determine its potential role in the plant aging process.
The study of aging in perennial plants is limited despite the potential applications for agriculture and plant production, particularly for fruit and nut crops. Results from this work show that DNA hypermethylation is associated with age in almond and identify specific genomic regions that could serve as putative biomarkers of age for this species. Biomarkers of age are valuable for clonally propagated crops, whose ontogenetic age can be difficult to determine, because they allow germplasm to be screened and selected for low potential to develop age-related disorders. Further, the DMRs identified in this study can be used to guide future studies aimed at increasing our understanding of plant aging and vegetative phase transition in perennials. Perennial plants, including almond, are known for having long juvenile periods, which can inhibit breeding and improvement efforts. Epigenetic regulators of phase transition such as DNA methylation could provide another tool for developing perennial crops with shorter juvenile periods, dramatically shortening breeding cycles for these species.
Author Contribution
KMDW and JFR conceptualized and designed the study. KMDW and TMG performed the tissue sampling. KMDW and ESA performed all the laboratory portions of the project. KMDW performed all analyses with the assistance of JFR and CEN. KMDW prepared the manuscript with the assistance of JFR and CEN. All authors contributed to editing the manuscript prior to submission. All the authors approved the submission and revised version of this manuscript.
Data Availability
The ‘Nonpareil’ almond reference genome v. 2.0 fasta and gff files, along with descriptions of the data, can be found at https://www.rosaceae.org/publication_datasets. All sequencing data for this project have been deposited in the NCBI Sequence Read Archive under BioProject PRJNAXXXXX, including BioSamples X, Y, and Z. All code used to perform analyses reported in the manuscript can be found at (link to GitHub).
Acknowledgements
We would like to acknowledge Matthew Willman for his assistance with the statistical analysis and preparation of the scripts used to perform the computational analyses for this manuscript. We would also like to acknowledge the Ohio Supercomputer Center for access to computing resources and the Translational Plant Sciences Graduate Program for the fellowship for KMDW. This work was supported by The Ohio State University CFAES-SEEDS program grant # 2019-125, the Almond Board of California Grant HORT35, the U.S. Department of Health and Human Services National Institutes of Health - National Cancer Institute - Cancer Center Support Grant (CCSG) P30CA016058, and the USDA National Institute of Food and Agriculture AFRI-EWD Predoctoral Fellowship 2019-67011-29558.