Non-canonical aberrant DNA hypermethylation in glioma

Aberrant DNA hypermethylation is a hallmark of cancer although the underlying molecular mechanisms are still poorly understood. To study the possible role of 5-hydroxymethylcytosine (5hmC) in this process we analyzed the global and locus-specific genome-wide levels of 5hmC in primary samples from 54 gliomas and 72 colorectal cancer patients. Levels of 5hmC in colorectal cancer were very low and no consistent changes were detected between control tissues and tumors. As expected, levels of 5hmC in non-tumoral brain samples were high and significantly reduced at the 49,601 CpG sites in gliomas. Strikingly, hypo-hydroxymethylation at 4,627 (9.3%) of these CpG sites was associated with aberrant DNA hypermethylation. The DNA regions containing these CpG sites were enriched in H3K4me2, and presented a different genuine chromatin signature to that characteristic of the genes classically aberrantly hypermethylated in cancer. We conclude that this data identifies a novel 5hmC-dependent non-canonical class of aberrant DNA hypermethylation in glioma.


Introduction 52
DNA methylation at the fifth position of cytosine (5mC) has been one of the most 53 studied epigenetic modifications in mammals to date. 5mC is involved in the regulation 54 of multiple physiological and pathological processes, including cancer, and when 55 located at gene promoters, it is usually linked to transcriptional repression. 56 As distinctive features of tumorigenesis, local DNA hypermethylation and global 57 hypomethylation have been attributed to changes in 5mC levels [10; 11]. However, the 58 discovery a few years ago, of 5-hydroxymethylcytosine (5hmC), a new epigenetic mark 59 resulting from 5mC oxidation, is reshaping our view of the cancer epigenome [29; 47]. 60 This 5mC to 5hmC conversion in mammals is mediated by ten-eleven translocation 61 proteins (TET1, TET2, and TET3), a family of α-ketoglutarate (αKG) and Fe(II)-62 dependent dioxygenases [21; 47]. Global levels of 5hmC in the genome fluctuate 63 considerably according to tissue type, and are consistently around 10-fold lower than 64 those of 5mC, though it is interesting that the highest levels of both marks are found in 65 The fact that there are now methods available that distinguish 5mC and 5hmC positions 72 at single-base resolution within the genome prompted us to reassess the role of DNA 73 methylation status in tumorigenesis from a 5hmC perspective. The method used here 74 allowed us to describe global and genome-wide locus-specific 5mC and 5hmC patterns 75 in colon and brain samples, to identify a specific chromatin signature associated with 76 changes of these epigenetic marks in cancer and, most notably, to describe a novel non-77 canonical type of aberrant DNA hypermethylation in cancer. 78 Materials and Methods). A total of 49,601 CpG sites that were hypohydroxymethylated 147 were identified in gliomas, but almost no hyper-hydroxymethylated sites were found 148 (see Materials  H4K20me1 and H3K79me2 (Fisher's exact test, p<0.05) ( Figure 3D), but not with the 165 repressive histone modification H3K27me3, which has been previously shown to be 166 associated with aberrant DNA hypermethylation in cancer [38; 52] ( Figure 3D). A 167 similar framework was used to test for the enrichment of our selected probes over the 168 computer-generated chromatin segmentation states from the ENCODE ChromHMM 169 project. Using this approach, we found that hypohydroxymethylated CpG sites were 170 significantly associated with transcription regulation and enhancers (Fisher's exact test; 171 p < 0.05) ( Figure 3E). 172

DNA hyper-methylation in glioma 175
To study the relationship between changes in 5mC and 5hmC in glioma, we first 176 identified aberrantly methylated CpG (d5mC) sites. The comparison of the methylation 177 data between tumoral and control samples (see Materials and Methods) identified 2,727 178 hypo-and 12,050 hyper-methylated CpG sites in gliomas (Supplementary Tables 4  179   and 5). Next, we compared these d5mC sites with the previously identified hypo-180 hydroxymethylated CpG sites ( Figure 3A, Supplementary Table 3). Interestingly, this 181 approach showed that 4,627 (38.4%) of the CpG sites aberrantly hypermethylated in 182 gliomas also lose 5hmC ( Figure 4A, Supplementary Table 6). 183 To investigate, at a functional genomic level, the characteristics of these two classes of 184 aberrantly hypermethylated CpG sites in gliomas we first analyzed their genomic 185 distribution in relation to density of CpG sites and we found that the hypermethylated 186 CpG sites that lose 5hmC (hyper5mC-hypo5hmC) were enriched in low density CpG 187 regions (Wilcoxon rank sum test, p<0.001, D=-0.11) as compared with the 188 hypermethylated CpG sites that showed no changes in 5hmC (hyper5mC) (Wilcoxon 189 rank sum test, p<0.001, D=-0.23) ( Figure 4B, Supplementary Tables 6 and 7). In line 190 with this, hyper5mC-hypo5hmC sites were strongly depleted from CGIs (chi-squared 191 test, p<0.001, OR=0.42) ( Figure 4B). Hierarchical clustering using the differentially 192 methylated CpG sites showed that the hyper5mC-hypo5hmC sites were slightly more 193 methylated in control brain samples than the hyper5mC sites, and that they were more 194 uniformly hypermethylated in glioma ( Figure 4C). To further corroborate our results, 195 we took advantage of recently published data on the whole-genome bisulfite sequencing 196 (WGBS) in glioma [42]. We found that, in addition to a large percentage of CpGs (n: 197 4,051; 88%) showing the same patterns of change as in our methylation arrays, the 198 WGBS analysis identified more than 10 6 new hyper5mC-hypo5hmC sites, thus 199 confirming that this is a frequent event in glioma (Figure 4-figure supplement 1).   does not principally occur at repeated DNA. As changes in 5hmC at repeated DNA 281 could not explain the global differences previously observed by mass spectrometry, we 282 decided to study the possible contribution of single copy sequences. Genome-wide 283 profiling of 5mC and 5hmC of healthy tissue identified a 10-fold increase in abundance 284 of CpG sites frequently hydroxymethylated in brain compared to colorectal tissue, 285 providing evidence that the level of this epigenetic mark is highly tissue type-286 dependent, and also that it is very abundant in the brain [16; 24; 29; 32; 37; 44]. 287 Moreover, 5hmC was enriched in regions with low CpG density and in introns in both 288 colorectal and brain tissue. As the 5hmC is enriched in different genomic regions, these 289 results support the notion that 5hmC is not simply a demethylation intermediate [1]. 290 Interestingly, 5hmC co-localized in regions marked with the activating histone PTM 291 H3K4me1. This histone mark has been previously associated with gene enhancers [20; 292 48], which suggests that DNA hydroxymethylation might play a role in gene regulation 293 in trans. Moreover, we have recently found an association between H3K4me1 and DNA 294 hypomethylation during aging in stem and differentiated cells [12], which may 295 represent an interesting link between aging and cancer at these genomic regions. 296 Colorectal tumors showed more changes with respect to 5mC than to 5hmC. However, 297 in contrast, glioma presented more changes in 5hmC than in 5mC, suggesting that the 298 dynamics of DNA methylation and hydroxymethylation in cancer is highly tumor-type 299 dependent. Moreover, the great number of hypo-hydroxymethylated single CpG sites in 300 glioma could explain the global differences previously observed by mass spectrometry 301 [23; 27; 28; 39] and suggests that, in contrast to 5mC, most DNA hypo-302 hydroxymethylation in brain tumors occurs at single copy sequences. 303 The behavior of 5hmC led us to next identify two types of CpG sites aberrantly 304 hypermethylated in glioma: aberrantly hypermethylated CpG sites that showed no 305 changes in 5hmC; and hypermethylated CpG sites that lose 5hmC. hypermethylation at these DNA regions could be due to an attempt by the cell to reverse 327 or repair the loss of 5hmC at functionally sensible loci. This possibility is supported by 328 the fact that the non-canonical aberrant hypermethylation described here seems to play 329 an important role in gene regulation. Intriguingly, 5hmC at gene promoters has also 330 been proposed to protect from aberrant hypermethylation in colorectal cancer [50]. 331 Thus, although it seems that 5hmC plays an important role in the regulation of the DNA 332 methylation changes in cancer, more research is needed to fully understand its role. 333 The non-canonical aberrant hypermethylation described here seems to have a similar were also included in the model definition, but excluding those found to be correlated to 410 the phenotype of interest. P values were corrected for multiple testing using the 411 Benjamini-Hochberg method for controlling false discovery rate (FDR). An FDR 412 threshold of 0.001 was employed to determine differentially methylated and 413 hydroxymethylated probes. Additionally, these probes were filtered according to their 414 effect size, keeping only those probes with methylation or hydroxymethylation changes 415 between-groups which exceeded the median of all differences for the same comparison. 416 The probes without no significant 5hmC signal on control samples were filtered out 417 from the set of hypo-hydroxymethylated probes in glioma. 418 419

Identification of hydroxymethylated probes 420
In order to identify those probes representing the regions where the 5hmC mark is 421 located, a differential hydroxymethylation analysis was performed as described Distance to both the centromere and telomere was measured for each of the probes in 502 the HumanMethylation450 microarray. In order to find significant differences between 503 the probes within the subset of interest and those in the background, a Wilcoxon non-504 parametric test was used. Once again, a significance level of 0.05 was employed for all 505 tests, and Cliff's Delta (D) was used as a measure of effect size. 506 507

Microarray background correction 508
Although it is sometimes referred to as a genome-wide solution, the 509 HumanMethylation450 BeadChip only covers a fraction of the entire genome. In its 510 27K predecessor, the probes were mainly located at gene promoter regions, while the 511 newer HumanMethylation450 BeadChip additionally includes probes located inside 512 genes and in intergenic regions [4]. 513 The irregular distribution of probes can however lead to unwanted biases when studying 514 whether a selected subset of probes is enriched with respect to any functional or clinical 515 mark. For this reason, here a reference to the background distribution of features was 516 included in all statistical tests performed in order to prevent our conclusions from being 517 driven by the irregular distribution of probes. In qualitative tests (CGI status, genomic 518 region, or histone mark enrichment), the contingency matrix was built to represent the 519 background distribution of the microarray. In quantitative tests (CpG density, distance 520 to centromeres and telomeres) the corresponding metric was compared between the 521 subset of interest and the remaining probes in the microarray. Thus, any significant 522 result would indicate a departure from the fixed background distribution and ignore any 523 bias inherent in the test. In order to plot the CpG and histone peak information on the circular genome-wide and 541 example graphs, smoothing was applied to the data. CpG enrichment information for 542 canonical and non-canonical hypermethylation was generated by partitioning the 543 genome into intervals of 10kbp and assigning to each a score corresponding to the 544 average coverage of the selected CpGs in the interval. 545 546

Whole-genome bisulfite sequencing (WGBS) datasets 547
Supplementary data referenced in [42] was used as a validation dataset in glioblastoma. 548 Previously processed data in the form of quantified methylation for each CpG measured 549 in both strands of the genome was downloaded and filtered. Only methylation measures 550 from CpGs having a total read count higher than 10 were retained. 551 The resulting dataset comprised only two samples (normal and tumoral), so a 552 descriptive strategy was used to distinguish the different types of probes according to 553 their methylation status. Hydroxymethylated probes were identified as those having a 554 5hmC measure higher than 0.1. Differentially methylated probes were defined as those 555 having an absolute difference in their methylation values, between the control and 556 tumor samples, higher than a given threshold (0.2 for 5mC and 0.1 for 5hmC). 557 The validation datasets may contain either one or two methylation measures for each 558 CpG in the genome as they measure methylation in both strands. Strand-agnostic CpG 559 regions representing the CpG dinucleotides with at least one measure were defined in 560 order to compute the degree of intersection between the WGBS and methylation arrays 561 results. 562

TCGA expression datasets 564
In order to analyze changes in gene expression, samples of glioblastoma multiforme 565 (GBM) were selected from among the data generated by the TCGA Research Network 566 (http://cancergenome.nih.gov). Expression Level-3 pre-processed data was obtained for 567 572 GBM samples (10 controls and 562 tumors). The moderated t-test approach in the 568 R/Bioconductor package limma was used to assess the differential expression status of 569 each gene in the TCGA datasets. The normalized expression ratio in the TCGA datasets 570 was used as the response variable, and the sample group (normal/tumoral) as the 571 covariate of interest. No adjustment for possible confounders was performed in this 572 case. An FDR threshold of 0.001 was used to correct for multiple hypotheses. No 573 filtering on effect size was applied in this case. 574 575 Data analysis workflow 576 All the necessary steps for upstream and downstream analyses were defined and 577 implemented using the Snakemake tool [26], which helps data scientists to generate a 578 reproducible and inherently parallel processing pipeline. Individual workflow tasks 579 were implemented in R (version 3.2.2) and Python (version 3.4.3). 580 581 Acknowledgments 582 We thank Ronnie Lendrum for editorial assistance. We also thank the Tumor Bank of 583 the Hospital Virgen de la Salud (BioB-HVS   shown below. 5mC hypermethylation (blue to yellow) and 5hmC loss (gray to blue) in gliomas are shown above.
Lower panel shows the full genome bisulfite sequencing (WGBS) data including all the CpG sites in the same region. The associated change in gene expression is shown on the right.