Characterising sex differences of autosomal DNA methylation in whole blood using the Illumina EPIC array

Background Sex differences are known to play a role in disease aetiology, progression and outcome. Previous studies have revealed autosomal epigenetic differences between males and females in some tissues, including differences in DNA methylation patterns. Here, we report for the first time an analysis of autosomal sex differences in DNAme using the Illumina EPIC array in human whole blood by performing a discovery (n = 1171) and validation (n = 2471) analysis. Results We identified and validated 396 sex-associated differentially methylated CpG sites (saDMPs) with the majority found to be female-biased CpGs (74%). These saDMP’s are enriched in CpG islands and CpG shores and located preferentially at 5’UTRs, 3’UTRs and enhancers. Additionally, we identified 266 significant sex-associated differentially methylated regions overlapping genes, which have previously been shown to exhibit epigenetic sex differences, and novel genes. Transcription factor binding site enrichment revealed enrichment of transcription factors related to critical developmental processes and sex determination such as SRY and ESR1. Conclusion Our study reports a reliable catalogue of sex-associated CpG sites and elucidates several characteristics of these sites using large-scale discovery and validation data sets. This resource will benefit future studies aiming to investigate sex specific epigenetic signatures and further our understanding of the role of DNA methylation in sex differences in human whole blood. Supplementary Information The online version contains supplementary material available at 10.1186/s13148-022-01279-7.


INTRODUCTION
Sex is an important covariate in all epigenetic research due to its role in the incidence, progression and outcome of many phenotypic characteristics and human diseases (1,2). There is an increasing interest as to which role epigenetic modifications (such as DNA methylation) may play in the underpinnings for relationships between environmental exposures and disease onset. In addition, sex has previously been shown to have a strong influence on DNA methylation variation (3-7). However, the idea that DNA methylation variation between males and females may underlie the sex biases observed in diseases has not been well documented thus far.
Sex differences in disease prevalence are sometimes explained at the molecular level and rooted in genetic differences between males and females. Differences in sex chromosome complement have independently been shown to direct differences in gene expression and chromatin organization (8-11). Furthermore, these differences in sex chromosome complement are sufficient to explain sex bias seen in some diseases. X chromosome number has previously been shown to impact immune cell population and occasionally therefore the development of diseases such as autoimmunity (12, 13).
Previous research has also revealed sex differences in gene expression of autosomal genes as well as sex chromosome linked genes (14). It is worth noting that most of the differences in gene expression on the autosomes are small differences (15). However small expression differences may still be associated with great effects on phenotypic characteristics and disease incidence and onset. Others also identified sex differences in chromatin accessibility and histone modifications, thus suggesting that different epigenetic factors contribute to gene expression sex biases seen in some diseases (16).
Sex specific gene expression and levels of sex hormones may be mediated by epigenetic mechanisms, including DNA methylation. Several genome wide association methylome studies (or Epigenome Wide Association Studies -EWAS) have highlighted differences in DNA methylation patterns linked to sex differences in genes on the autosomes (17-20). Previous studies have reported sites showing varying methylation due to sex differences in various tissues such as saliva, placenta, brain, pancreatic islets and whole blood (17,19,(21)(22)(23)(24)(25)(26)(27). Early studies in this area did not account for cross hybridizing probes which map to both autosomal and sex chromosome loci (28,29). A meta-analysis of 76 published studies with correction for cross hybridizing probes only validated 184 sex associated sites (30). Furthermore, cellular heterogeneity is also important for identification of sex associated sites as several studies have previously demonstrated that cell type proportions may occasionally vary by sex and that could result in identification of cell type associated type differences rather than sex associated differences (24).
Due to X chromosome inactivation in females, large differences in methylation levels of X chromosomes can be observed between males and females (31). Recent research suggests that normalising methylation data with the sex chromosomes introduces a large technical bias to many autosomal CpGs. This technical bias has been reported to result in many autosomal CpG sites being falsely associated with sex and extra steps need to be taken to ensure minimal technical bias is introduced to the autosomal CpGs.
Here, we use the EPIC BeadChip to assess autosomal sex differences in DNA methylation levels from whole blood at individual sites and genomic regions. All individuals involved in this study were part of Understanding Society: The UK Household longitudinal study (32). Additionally, we account for cell type composition, cross hybridization and adequately handle the technical bias introduced by sex chromosomes. To our knowledge, this is the first study using the Illumina EPIC BeadChip (allowing for interrogation of ~850,000 sites across the genome) to investigate autosomal sex differences in DNA methylation.

Participants
Whole blood Illumina Infinium MethylationEPIC BeadChip DNAme data was collected from 1175 participants involved in Understanding Society: The UK Household Longitudinal Study (33). In wave 3 of the study (2011-12) blood samples were collected from a portion of the study participants. Individuals were considered eligible to give a blood sample if they were over the age of 16 and met the other requirements detailed in the UK Household Longitudinal Study Biomarker user guide. Our study population was restricted to participants of white ethnicity. A full description of the dataset and data processing has been described by (34).

DNA methylation data
Samples of whole blood DNA from 1175 participants were obtained following the protocol described in (34). Raw signal intensities were processed using the R package bigmelon (35) and wateRmelon (36)from idat files. Prior to normalisation of the data, outlier samples were identified using principal component analysis and subsequently removed from the data set. The reported age of each sample was compared to predicted age using the epigenetic age method implemented by agep in the R package bigmelon (35) .Further, the reported sex of the samples was checked using a DNA methylation-based sex classifier (31)which predicts sex based on the methylation difference of X and Y chromosomes. 4 samples were subsequently removed after this step as reported and predicted sex did not match. The data was then normalised via the interpolatedXY adjusted dasen method using the adjustedDasen function in the R package watermelon (37) which first normalises autosomal CpGs by dasen and then the corrected methylation values of sex chromosome linked CpGs are estimated as the weighted average of their nearest neighbours on autosomes. Following normalisation of the data, SNP probes, cross hybridizing probes (29) and X or Y linked probes Since we had such stringent parameters to define what we considered a significantly associated saDMP for males and females. We performed principal component analysis (PCA) to see how male and female beta values clustered in PC space and to evaluate the effect of DNAme at the saDMPs. As shown in Figure 1C, male and female samples formed clear clusters based on the beta values of the significant sex associated DMPs (554 CpGs). PC1 explained 14.2% of the variance and PC2 explained 4.1% of the variance. Based on Figure 1C we can conclude that these saDMPs are sufficient to contribute to the clear separation of male and female samples in PC space.

Characterisation of sex associated DMPs
The saDMPs were found in 223 unique genes with 68 of these genes harbouring several saDMPs with an average of 2.7 saDMPs per gene ( Figure 1D). CRISP2, a gene known to be involved in sperm function and male fertility (Lim et al.2019), harboured the largest number of saDMPs, 9. We performed GO and KEGG analyses, but did not identify any significantly enriched biological processes or pathways for these genes.
To help us try to gain more insight into the functional role of these saDMPs, we characterised their genomic location and further compared this with the autosomal background. We found saDMPs are preferentially located CpG islands and CpG shores and depleted in open sea regions compared to the autosomal background ( Figure 1E). Moreover, saDMPs hypermethylated in females are enriched at promoters and exons, with saDMPs hypermethylated in males being enriched at 5'UTR, potentially acting as alternative promoters ( Figure 1F). Interestingly, we observed that all saDMPs display enrichment at enhancers, which, together with their presence at promoters, indicates they could play a role in gene regulation. Lastly, we also note that all saDMPs were depleted at transposable elements and introns compared to the autosomal background.
Enrichment of saDMPs at enhancers suggests that some of the saDMPs could potentially regulate distal genes (48, 49). We further annotated the saDMPs to genes by assessing their contacts with promoters, and thus, distal genes. Following this, we further annotated the saDMPs to 46 additional genes, 35 of them being annotated to saDMPs hypermethylated in females and 11 to saDMPs hypermethylated in males (see Figure   S2A-B).
To evaluate the interactions between the proximal (located in genes) and distal (those in 3D proximity) saDMPs, we collated these and produced protein-protein interaction networks to visualize the networks of these genes (see Figure S2C-D). Of the 11 genes linked to saDMPs hypermethylated in males, we found three histones (HIST1H3A, HIST1H4A and HIST1H4B), which are known to interact with CDYL ( Figure S2C), a gene that we also found to harbour an saDMP hypermethylated in males. Chromodomain Y-like protein (CDYL) is a chromatin reader binding to heterochromatin (H3K9me3, H3K27me2 and H3K27me3) that is crucial for spermatogenesis, male fertility and X chromosome inactivation (50). From the list of genes linked to saDMPs hypermethylated in females, KDM2A regulates circadian gene expression by repressing activity of CLOCK-ARNTL which has previously been shown to exhibit sex dimorphism (51). In addition, ODF2L; outer dense fiber of sperm tails 2 like is also linked to saDMPs hypermethylated in females and has previously be shown to interact with PRSS23, which is involved in ovulation (52).
We then performed functional enrichment of genes annotated to those saDMPs to identify enriched biological processes or terms. Enrichment of the genes annotated to those saDMPs hypermethylated in females revealed several processes (129 terms), including adaptive immune system, estrogen receptor binding and androgen receptor binding (FDR < 0.05) (Additional File 5). Furthermore, enrichment of the genes annotated to those saDMPs hypermethylated in males displayed enrichment of terms such as dosage compensation by inactivation of X chromosome, hormone receptor binding and gene / chromatin silencing (FDR <0.05) (Additional File 6).

Enrichment of saDMPs in transcription factor binding sites
To identify common features among the sex associated DMPs, we performed transcription factor (TF) binding site and gene ontology analyses. First, we evaluated whether the saDMPs were enriched in motifs for TFs (100 bp window). For the saDMPs hypermethylated in females, we found 329 enriched TFs (p.value < 0.05) (Figure 2A and Additional File 3) with strongest evidence for TROVE2, XRCC1 and NFIL3.
TROVE2/R060 is an RNA binding protein known to be involved in stimulating of androgen receptor regulated genes. We also found SOX9 and SRY TFs to be enriched in DMPs hypermethylated in females, which are genes known to be involved in male sex determination ( Figure 2A) (53). For saDMPs hypermethylated in males, we identified 64 enriched TFs, including CEPBG, AFF4 and ELK1. CEPBG is a transcription factor which has previously been reported to stimulate adipocyte differentiation in an ESR1/CEBPA mediated pathway (54). Furthermore, GABPA and ELK1 have previously been shown to be associated with sex associated differentially methylated regions in the mouse liver (55).
To analyse whether the TF motifs were enriched for annotation to biological processes or pathways, we performed pathway analyses using the GO and KEGG databases. We identified 15 enriched KEGG pathways for the TFBS enriched at saDMPs hypermethylated in females, spanning a wide range of processes such as the transcriptional regulation in cancer, several specific cancer pathways, viral carcinogensis and more ( Figure 2B). In addition, we also found 39 enriched GO terms ranging from transcription factor activity, E-box binding, transcription coactivator activity and interestingly, bHLH transcription factor binding ( Figure S3B). Nevertheless, we found no enriched KEGG terms for the TFs enriched at saDMPs hypermethylated in males, likely due to the small number of enriched TFs. However, we identified 9 enriched GO terms such as NAD, NADP binding and oxidoreductase activity ( Figure 3SA).
One possibility is that transcription factor motifs for genes encoded on the sex chromosomes may act as hubs in the enriched TF motif network. To assess this, we produced protein-protein interaction networks to visualize the networks of these TFs ( Figure S4). Although we identified some enriched motifs for several TFs encoded on the X chromosomes in the saDMPs hypermethylated in males such as ELK1, TGIF2LX and TCEAL6 ( Figure S4A), we observed that they were not central hub nodes in the network. Nevertheless, we did identify several central TF motifs encoded on the sex chromosomes for the saDMPs hypermethylated in females ( Figure S4B). These included 11 TFs encoded on the X chromosome and 2 on the Y chromosome including SRY and KDM5D.
We further utilised cytohubba package to robustly identify if these TFs were in fact hub genes in the network.
This revealed that ELK1 did in fact act as a hub gene in the TF network, however the other 49 genes were encoded on the autosomes ( Figure 2C). Interestingly, for the TF motifs enriched at saDMPs hypermethylated in females, we found SRY (coded for on the Y chromosome) and SOX9, BRCA1 and other autosomal genes ( Figure 2D). BRCA1 is a tumour suppressor gene which is heavily implicated in breast cancer and its promoter hypermethylation in peripheral blood has previously been shown to be linked to decreased risk of breast cancer (56).

Relationship with gene expression
The 554 saDMPs were then further explored in association with the expression levels of their annotated genes using publicly available data for whole blood poly(A)+. We did not identify any sex biased gene expression patterns corresponding to differences in DNAme levels at these genes ( Figure S5). Some of the genes annotated to our saDMPs met the P value threshold after correcting for multiple testing using the Benjamini and Hochberg method (adjusted pvalue<0.05), however did not meet the log fold change threshold (log 2 FC>1) such as TARP, KCNT1 and CEBPB ( Figure S5A). However, these genes have low expression in blood and therefore, this analysis may be more relevant for other tissues such as brain or skeletal muscle. The majority of the differentially expressed genes are located on the sex chromosomes, but we also did observe differential expression between males and females for several autosomal genes ( Figure   S5B-S5C). These results indicate that the differences in DNA methylation observed between males and females do not lead to any large difference in gene expression, but only some small ones.

Sex associated differentially methylated regions
Given that several genes harboured numerous saDMPs, we postulated whether some of the saDMPs were part of larger differentially methylated regions associated with sex. We therefore searched for differentially methylated regions associated with sex. Following adjustment for multiple testing (FDR), we identified a large list of 14,386 sex associated differentially methylated regions. We therefore considered a saDMRs significant if it harboured at least 2 CpGs, had an FDR value smaller than 0.05 and had a maximum absolute beta fold change value within the region greater than 0.05 in either direction. Following filtering of the list of saDMRs, we identified 311 significant sex associated DMRs on the autosomes between males and females located in 226 unique genes (Additional File 2). The number of CpGs within the DMRs ranged from 3 to 58 and had an average width of 1793 base pairs (bp) ranging from 63bp to 5638 bp. Note that PRKXP1 saDMR, although hypermethylated in males, contains a single CpG which is hypermethylated in females.
A saDMR harbouring 58 CpGs overlapped the promoter region of a gene called SCAND3. The top hits in the saDMR list overlapped promoter regions of the following genes; ATP5J, GABPA, DYRK2, SMAD2, CRISP2, AND PRKXP1. ATP5J and GABPA are genes (hypermethylated in males) which have previously been reported to be implicated in early onset of Alzheimers disease (57) , a disease known to affect females more than males. Furthermore, ATP5J is a gene known to be a target gene of oestrogen, previously shown to serve an inhibitory role in the sex differences in hepatocellular carcinoma (58). DRYK2, SMAD2 and CRIPS2 have previously been shown to exhibit functions which are sex specific (21). CpGs harboured by the gene ZPBP2 are hypermethylated in females and previous work has implicated this gene in sex specific effects in asthma and, consistent with our findings, also showed that methylation at this gene is lower in males than females. PRKXP1 is located on chromosome 15 and CpGs in this region have previously been associated with Crohns disease and intestinal inflammation, a disease which has previously been reported to be more prevalent in females (59). These results suggest that DNAme differences at these sites may translate to sex biases seen in disease such as Alzheimers and asthma. Moreover, SCAND3 is expressed mainly in seminal vesicle and testis, CRIPS2 is expressed uniquely in testis, whereas GABPA has high expressions in placenta and vagina.

DISCUSSION
Here, we conducted the first study aiming to characterize autosomal sex differences in DNAme between males and females in whole blood using the IlluminaEPIC BeadChip, which interrogates ~850,000 sites across the genome. Previous studies were performed using Illumina450K BeadChip that covers only ~450,000 sites. We identified 554 sex associated differentially methylated positions on the autosomes whilst adequately handling the technical bias introduced by normalising with the sex chromosomes and correcting for cross hybridisation of some EPIC probes. Previous work has reported contradicting results, with one group finding that there is higher methylation on autosomes in females (5, 21, 60, 61), another group identifying higher methylation on autosomes in males (62, 63) and a third group finding no significant difference in DNAme on autosomes between males and females (19, 64). Our results support the former and we found that 70% of these loci (389 CpGs) showed higher methylation in females compared to males.
The inconsistency seen in the literature is due to the differing normalisation methods applied to DNAme microarray data. Previous research has shown that as the methylation levels of CpG sites on the X chromosome differ largely between males and females, normalisation methods which normalise array data indiscriminately with CpG sites on the autosomes introduce large technical biases for autosomal CpGs (31).
Using such normalisation methods, will therefore lead to many autosomal CpG sites being falsely associated with sex. Our choice of normalisation method greatly reduced technical bias at autosomal CpGs for male and female samples, and, thus, we report a highly robust catalogue of autosomal CpGs differentially methylated between males and females.
We further categorized these 554 saDMPs into two groups, those that were hypermethylated in males (n=166) and those that were hypermethylated in females (n=389). Several saDMPs found to be hypermethylated in females overlapped the transcription start site (TSS) of genes not previously been reported to exhibit sex differences in DNAme including DYRK2, SMAD2 and SHANK2. Interestingly, it has previously been shown that sex hormones can regulate SHANK expression leading to a sex differential expression in SHANK2 (65). Furthermore, this gene has previously been implicated in autism spectrum disorder, a disorder known to exhibit higher prevalence in males rather than females (66) In contrast, the most significant saDMP hypermethylated in males is located in the CpG island of a gene located on chromosome 21 called GABPA. GABP is a methylation sensitive transcription factor and has previously been shown to be a transcriptional activator of Cyp 2d-9, which is a gene encoding a male specific steroid in mice (67). Sex differences in these regions have previously been identified (21).
Interestingly, as well as DYRK2 and GABPA being the genes annotated to the most significant saDMPs hypermethylated in females and males respectively, they were also the two most significant saDMRs, suggesting these regions could account for important sex biases observed in some diseases. This is further supported by the fact that GABPA has also been heavily associated with early onset of Alzheimers disease, Parkinsons disease, breast cancer and autism (68) and DYRK2 implication in cancer (69). Our KEGG term analysis at those saDMPs hypermethylated in females also implicated many cancer related KEGG terms including thyroid cancer, endometrial cancer, non-small cell lung cancer and more ( Figure 2B).
The saDMR harbouring the highest number of CpG sites (n=58) is located on chromosome 6, overlaps SCAND3/ ZBED9 and is hypermethylated in males. Hypermethylation of this gene has been suggested to be implicated in hepatocellular carcinoma which is third leading cause of deaths related to cancer (70). These results collectively support the hypothesis that sex differences in autosomal DNAme may account for some of the sex differences seen in disease prevalence, onset and progression. Moreover, we did identify saDMPs in genes known to exhibit sex differences in DNAme such as CRIPS2, SLC9A2, DDX43 which are involved in spermatogenesis and male fertility (21, 30). Specifically, CRIPS2 harboured 9 significant saDMPs, all hypermethylated in females, and is part of a group of proteins called CRISPs which show male biased expression in the male reproductive tract. CRIPS2 plays an important role in spermatogenesis, acrosome reaction and gamete fusion (71) . Some of our saDMPs were located in genes known to show sex by age effects, such as PRR4, a gene associated with dry eye syndrome (Perumal et al,. 2016). Despite this, recent research shows that the adult blood DNA methylome is largely affected by sex, but that these methylome sex differences do not change throughout adulthood and so are largely independent from age effects (72, 73).
The Illumina EPIC array has an increased coverage of the genome, including distal regulatory elements (74).
It was interesting that the 554 saDMPs were still found to be significantly enriched at CpG islands and CpG shores but depleted in open sea regions of the genome ( Figure 1E). As the genomic location of DNA methylation normally alters its function, with methylation in CpG islands normally functioning to serve long term silencing of genes (75) and CpG island shore methylation being strongly related to gene expression (76) this suggests a potential functional role for these saDMPs. To further support these findings, we identified enrichment of these saDMPs at enhancers, 5'UTRs and promoters ( Figure 1F). Despite this enrichment at regulatory regions, we found little correlation of these sites alone with differences in gene expression between males and females, suggesting that these saDMPs may not be sufficient alone to predict gene expression. It is worthwhile mentioning that, while some genes associated with these saDMPs displayed statistically significant difference in expression between males and females, the differences in expression were modest.
Instead, these saDMPs are likely to interact with transcription factors and other regulatory features to affect gene expression and chromatin organisation. This potential link was identified in our TF motif analysis, where we found SRY (sex determining region Y) and SOX9 (SRY-box 9) transcription factor motifs also known as the sex determining factors, to be enriched at those saDMPs hypermethylated in females and further identified these as hubs in the TF network. SRY has been found to bind and repress WNT activation of ovarian genes, and both SRY and SOX9 transcription factors have been shown to bind the promoter regions of many targets of involved in differentiation of the testis (53). Furthermore, we also found SR1 TFBS enriched in the saDMPs hypermethylated in females, a gene known to interact with SOX9 to increase its own expression to propel differentiation of testis beyond SRY activity (53) It has previously been reported that 3D genome organisation can impact sex biased gene expression through direct and indirect effects of cohesion and CTCF looping on enhancer interactions with sex biased genes (77), Recently, it was shown that with rising oestrogen levels, the female brain exhibits sex hormone driven plasticity and that chromatin changes underlie this (78). Interestingly, by annotating our saDMPs to distal genes using chromatin loops, we were able to identify contacts between saDMPs and three genes HIST1H3A, HIST1H4A and HIST1H4B which are core components of nucleosome, thereby responsible for playing a role in chromatin organisation. We further were able to identify interactions between these three genes and a gene harbouring a saDMP found to be hypermethylated in males called CDYL. These results suggest that although we found DNAme to not be predictive of sex differences in gene expression ( Figure   S5), these saDMPs may interact with other genes, transcription factors and other epigenetic modifications to direct chromatin organisation and regulatory networks.
Lastly, we acknowledge limited overlap with previous studies yet conclude that this is due to our extremely large sample size (n=1171) and improved handling of sex bias introduced by normalising such data with the sex chromosomes. Both of these factors contribute to our ability to detect true positives and obtain a more robust catalogue of true sex associated autosomal CpGs.