I. Abstract
Background Methylmercury (MeHg) is an environmental pollutant of global public health concern. MeHg is associated with immune dysfunction but the underlying mechanisms are unclear. The most common route of MeHg exposure is through consumption of fatty fish that contain beneficial n-3 polyunsaturated fatty acids (PUFA) that may protect against MeHg toxicity.
Objectives To better inform individual costs and benefits of fish consumption, we aimed to identify candidate epigenetic biomarkers of biological responses that reflect MeHg toxicity and PUFA protection.
Methods We profiled genome-wide DNA methylation using Illumina Infinium MethylationEPIC BeadChip in whole blood from N=32 individuals from Madre de Dios, Peru. Madre de Dios has high artisanal and small-scale gold mining activity, which results in high MeHg exposure to nearby residents. We compared DNA methylation in N=16 individuals with high (>10 µg/g) vs. N=16 individuals with low (<1 µg/g) total hair mercury (a proxy for methylmercury exposure), matched on age and sex.
Results We identified hypomethylated (i.e., likely activated) genes and promoters in high vs. low MeHg-exposed participants linked to Th1/Th2 immune imbalance, decreased IL-7 signaling, and increased marginal zone B cells. These three pathways are feasible mechanisms for MeHg-induced autoimmunity. In addition, we identified candidate epigenetic biomarkers of PUFA-mediated protection: hypomethylated enhancer binding sites for retinoid X receptor (RXR) and retinoic acid receptor α (RARα). Last, we observed hypomethylated enhancer and promoter binding sites for glucocorticoid receptor (GR), which is associated with developmental neurotoxicity, and transcription factor 7-like 2 (TCF7L2), which is associated with type 2 diabetes (T2D) risk.
Discussion Here, we identify a set of candidate epigenetic biomarkers for assessing individualized risk of autoimmune response and protection against neurotoxicity due to MeHg exposure and fish consumption. In addition, our results may inform surrogate tissue biomarkers of early MeHg exposure-related neurotoxicity and T2D risk.
II. Introduction
The organic heavy metal methylmercury (MeHg) is an environmental pollutant of global public health concern1-3. MeHg is formed by microbial methylation of inorganic mercury that deposits in waterways after atmospheric emission from coal-fired power plants and burning of gold-mercury amalgams during artisanal gold mining3,4. After conversion to the organic form, MeHg bioconcentrates and biomagnifies in the aquatic food web, and humans are exposed via consumption of contaminated fish and seafood4-6. Clinically apparent MeHg neurotoxicity, including gross motor impairment, was first documented after high-dose poisoning events that occurred in Minamata, Japan7 and Iraq8 (reviewed in 9,10). However, exposure to the much lower doses that are common in fish-consuming communities is sufficient to cause subtler neurotoxic effects, including impairment of learning and memory, particularly in individuals exposed during critical windows of brain maturation, including fetal development or early childhood (reviewed in 11). In addition, both MeHg and inorganic mercury species trigger increased levels of circulating inflammatory cytokines and autoreactive antibodies and T-cells in humans12-21 and animals22-25, as well as autoimmune disease risk26,27, suggesting that MeHg increases risk of chronic inflammation and autoimmunity28,29. One hypothesis for the underlying mechanism of immunotoxicity is MeHg-induced mitochondrial dysfunction and a subsequent increase in reactive oxygen species (ROS) and oxidative damage to cellular components18,19,30,31. These damaged cellular components, including damaged mitochondrial DNA and membrane proteins, serve as damage-associated molecular patterns (DAMPs) that activate innate immune responses through the same pathways as pathogen-associated molecular patterns (PAMPs)32.
However, not all studies of low-dose MeHg exposure point to clear autoimmune responses in exposed individuals33-35, and there are noted beneficial effects of other nutrients contained within MeHg-contaminated fish and seafood, including polyunsaturated fatty acids (PUFAs) that support healthy brain development in infants and children9,36 and provide cardiovascular protection in adults37,38. In addition, some communities have strong cultural connections to foodways centered on fish and seafood that should be considered in this cost-benefit analysis39. Considering the increasing environmental levels of MeHg with climate change40-42 and the projected increase in human exposure43,44, understanding the individualized cost-benefit relationship between toxic and healthful outcomes of fish consumption is an important public health priority. To offer personalized environmental health recommendations, often referred to as “precision environmental health” or “functional exposomics”45, it is important to develop reliable and informative biomarkers of effect that report on both toxic and protective health responses in individuals.
A current focus of the precision environmental health approach is the development of epigenetic biomarkers of effect that reflect individuals’ biological responses to chemical exposures45. Epigenetic biomarkers comprise changes to the DNA-protein structure, including DNA methylation and histone modifications, as well as regulatory non-coding RNA, that reflect current or past gene expression responses to pollutants or nutrients46. Ideal epigenetic biomarkers are specific to particular chemical or dietary exposures and reliably report on biological responses that reflect known mechanisms of toxicity or protection46. Although exciting and highly promising in theory, in practice, descriptive discovery experiments for MeHg exposure have identified only a small number of candidate epigenetic biomarkers47-55, possibly due, at least in part, to the low variance in exposure within studied populations that limits statistical power.
In this study, we conducted a discovery epigenetic experiment in age- and sex-matched Peruvian adults with high variance in MeHg exposure due to nearby artisanal and small-scale gold mining2 (Table 1). This approach has two distinct strengths. First, we identify candidate epigenetic biomarkers that are directly relevant to health risks and protections that are specific to this population of highly exposed, vulnerable Peruvians, including indigenous Amazonian individuals2. Viewed through an environmental justice lens, we value the initial development of public health tools that directly serve the needs of the most vulnerable communities, rather than testing the cross-population validity of biomarkers initially developed in and for Western populations. Second, the high exposure variance in our study allowed us to identify candidate epigenetic biomarkers that reflect known MeHg and PUFA biology. Because they reflect this established biology, these biomarkers likely have strong public health relevance to Western populations with lower exposures, even though these biomarkers were undetected in previous Western population studies with lower exposure variance47-55. Therefore, this study represents an important advance in the development of precision environmental health tools for personalized recommendations of MeHg-contaminated fish and seafood consumption.
III. Methods
Sample population
This study leverages a larger mercury exposure assessment study in communities around the Amarakaeri Communal Reserve in Madre de Dios, Peru2. This reserve is bordered on the east by heavy artisanal and small-scale gold mining activity (ASGM), a form of mining that uses large inputs of elemental mercury and contaminates local fish with methylmercury2. Residents of these communities are exposed primarily to methylmercury by consuming methylmercury-contaminated fish2. We previously quantified total mercury levels in proximal 2-centimeter segments of head hair, which represents ∼2-3 months’ growth2,56. For populations in this region, methylmercury is the dominant form of mercury in scalp hair56. Thus, total mercury level in this hair segment length approximates primarily methylmercury exposure over the prior 2-3 months2,56. For this study, we selected a subset of 16 adults with high chronic methylmercury exposure (defined as >10 µg/g total hair mercury) and 16 adults with low chronic exposure (defined as <1 µg/g total hair mercury), matched on age and sex2 (Table 1).
DNA extraction
For both DNA methylation and mtDNA analyses, 8.5 mL of whole blood was collected in PAXgene Blood DNA Tubes (Qiagen, 761115), which contain 2 mL of a proprietary additive that prevents coagulation of the blood and preserves genomic DNA. Tubes were stored for no more than four hours at room temperature, transferred to a -20°C freezer for a period of four to seven days, and finally transferred to -80°C until being shipped on dry ice to Duke University, where they were stored at -80°C until DNA was isolated. For DNA isolation, the frozen whole blood samples were thawed in a 37°C water bath for 15 minutes and then immediately processed. PAXgene Blood DNA kits (Qiagen, 761133) were used according to the manufacturer’s instructions to extract high molecular weight DNA from tissue (not cells).
DNA methylation
We assessed genome-wide DNA methylation using Illumina Infinium MethylationEPIC BeadChips and analyzed DNA methylation microarray data in R using the standard pipeline in RnBeads57. We estimated cell type proportions within whole blood samples using the estimateCellCounts function in the minfi package58 and estimated pairwise associations between age, sex, MeHg group, and cell type proportion variables. In addition, we estimated epigenetic age using an algorithm incorporated into the RnBeads pipeline, which uses an elastic net regression method (glmnet) and the Horvath laboratory age annotation as a response variable to account for different epigenetic age pacing between younger and older age groups59. We used pre-defined age predictors developed from training methylation datasets from multiple, publicly available studies (incorporated into the RnBeads pipeline) to annotate our data; these training datasets include Infinium 27K BeadChip (N=2,286 from 6 studies), Infinium 450K BeadChip (N=1,866 samples from 20 studies from Gene Expression Omnibus or The Cancer Genome Atlas), and Reduced Representation Bisulfite Sequencing data (N=232 samples of German origin). Training datasets include majority European samples (datasets are listed at https://github.com/epigen/RnBeads_web/blob/master/ageprediction.html.) We conducted differential methylation analysis on the site and region level between high and low MeHg groups (based on a binary MeHg variable) adjusted for sex, age, community and estimated proportions of the following cell types: CD8+ T cells, CD4+ T cells, B cells, natural killer cells, monocytes, granulocytes. RnBeads computed p-values and adjusted p-values (using the Benjamini-Hochberg false discovery rate (FDR) correction for multiple comparisons) on the site and region levels using hierarchical linear models from the limma package and fitted using an empirical Bayes approach on derived M-values. 
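Two computations named above, the M-value transform and the Benjamini-Hochberg FDR adjustment, can be illustrated with a minimal Python sketch. This is not the RnBeads or limma implementation; the function names and the clamping constant `eps` are assumptions introduced only for illustration.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value (0..1) to an M-value, log2(beta / (1 - beta)).

    eps is an illustrative clamp (an assumption) that avoids log of zero.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

A site would pass the FDR <0.05 threshold when its adjusted value from `benjamini_hochberg` falls below 0.05.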
Then, RnBeads assigned ranks to differentially methylated sites based on three criteria: 1) the difference in mean methylation, 2) the quotient in mean methylation, and 3) a statistical test (limma or t-test). A combined rank was computed based on the maximum rank among these three metrics (the lower the rank, the greater the evidence for differential methylation). Differentially variable sites were computed using the diffVar method from the missMethyl R package60. Differential methylation on the region level was computed using: 1) the mean differences in means across all sites in a region between high and low MeHg groups, 2) the mean of quotients in mean methylation, and 3) the combined p-value from all site p-values in the region. Each region was assigned a combined rank based on the maximum rank among these three metrics. Regions were divided into four genetic context categories: genomic tiling, CpG islands, promoters, and genes57. Differential variability on the region level was computed similarly to differential methylation on the region level, using the mean of variances, log-ratio of the quotient of variances, and p-values from the differentiality test to compute ranks. We conducted a Gene Ontology (GO) enrichment analysis of differentially methylated and differentially variable genes in our DNA methylation results using a hypergeometric test61, as well as a Locus Overlap Analysis (LOLA) enrichment62 using Fisher’s exact tests to derive ranked enrichments in functional genomic and epigenomic annotations from the following reference databases: cistrome_cistrome, cistrome_epigenome, codex, encode_segmentation, encode_tfbs, Sheffield_dnase, and ucsc_features.
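The combined-rank idea described above (take the worst, i.e., maximum, of the three per-criterion ranks) can be sketched in Python as follows. This is an illustrative sketch, not RnBeads code; the helper names, the use of the absolute log2 quotient, and the tie handling are assumptions.

```python
import math

def rank_descending(values):
    """Rank 1 = largest value; tied values share the smallest applicable rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def rank_ascending(values):
    """Rank 1 = smallest value; tied values share the smallest applicable rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def combined_rank(mean_diffs, quotients, pvalues):
    """Combined rank per site: the maximum (worst) of its three ranks.

    A site scores well only if it ranks well on all three criteria:
    large absolute mean difference, quotient far from 1, and small p-value.
    """
    r_diff = rank_descending([abs(d) for d in mean_diffs])
    r_quot = rank_descending([abs(math.log2(q)) for q in quotients])
    r_pval = rank_ascending(pvalues)
    return [max(ranks) for ranks in zip(r_diff, r_quot, r_pval)]
```

Because the maximum is used, a site with a very small p-value but a negligible methylation difference still receives a poor combined rank.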
Mitochondrial DNA damage and copy number
We assessed mitochondrial DNA copy number (mtDNA CN) and mitochondrial DNA damage (mtDNA damage) as follows. DNA was quantified using PicoGreen (ThermoFisher P7589) with a standard curve of a HindIII digest of lambda DNA (Invitrogen 15612-013) as described63. Samples were then diluted to 3 ng/μL in 0.1X TE buffer for use in long amplicon Polymerase Chain Reaction (LA-PCR) and real time PCR assays. We measured mtDNA damage using an established long-range qPCR assay that evaluates whether DNA lesions are present that can halt or slow DNA polymerase progression during PCR amplification. This assay’s primers amplify an 8.9 kb fragment from mtDNA. Samples with greater loads of DNA damage yield fewer PCR products. For each mtDNA damage qPCR reaction, we used 15 ng DNA template, 0.4 µM each of forward (5’-TCT AAG CCT CCT TAT TCG AGC CGA-3’) and reverse (5’-TTT CAT CAT GCG GAG ATG TTG GAT GG-3’) primers, nuclease-free water, and LongAmp Hot Start Taq 2× Master Mix (New England Biolabs), as described63. We amplified this product under the following conditions: an initial denaturation step of 2 min at 94°C, 21 cycles of denaturation at 94°C for 15 seconds and annealing at 64°C for 12 minutes, with a single final extension step at 72°C for 10 minutes. We quantified qPCR products using PicoGreen dye in a 96-well plate reader as described63. We calculated DNA lesion frequency for mtDNA following a Poisson equation [$f(x) = e^{-\lambda}\lambda^{x}/x!$], where $\lambda$ is the average lesion frequency in the reference template (i.e., the zero class; $x=0$, $f(0) = e^{-\lambda}$), as previously described64. We compared amplification of mtDNA in people with high hair mercury ($A_{HIGH}$) to amplification of mtDNA in people with low hair mercury ($A_{LOW}$) with a relative amplification ratio ($A_{HIGH}/A_{LOW}$). We defined the DNA lesion frequency as $\lambda = -\ln(A_{HIGH}/A_{LOW})$.
We calculated lesion frequency per base pairs (bp) of mtDNA by adjusting for amplicon size and normalizing amplification of the long mtDNA fragment to the short mtDNA fragment that reflects mtDNA CN per cell63.
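The lesion-frequency arithmetic above can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the per-10-kb scaling is one common way to adjust for amplicon size and is not stated in the text, and the amplification inputs are assumed to be background-subtracted fluorescence values.

```python
import math

AMPLICON_BP = 8900  # length of the long mtDNA amplicon (8.9 kb, from the text)

def normalize_to_copy_number(long_amplification, copy_number):
    """Scale long-fragment amplification by mtDNA copy number (short-fragment assay)."""
    return long_amplification / copy_number

def lesion_frequency(amp_high, amp_low):
    """Lesions per amplicon from the Poisson zero class: lambda = -ln(A_HIGH / A_LOW)."""
    return -math.log(amp_high / amp_low)

def lesions_per_10kb(amp_high, amp_low, amplicon_bp=AMPLICON_BP):
    """Express lesion frequency per 10 kb (an assumed reporting unit)."""
    return lesion_frequency(amp_high, amp_low) * 10_000 / amplicon_bp
```

For example, if normalized amplification in the high group were half that in the low group, `lesion_frequency` would return ln(2), roughly 0.69 lesions per amplicon; equal amplification gives zero.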
We measured mtDNA CN using an established short-range, real-time, standard curve-based qPCR assay that is specific to mtDNA. We prepared serial dilutions of a plasmid containing a 107-base fragment of the mitochondrial tRNA-Leu(UUR) gene to create a standard curve to then calculate absolute mtDNA CN, as previously described63. We evaluated associations between MeHg and mtDNA CN or mtDNA damage with tests of correlation (Fig. 2B, Supplemental Fig. 2), as well as multivariate regression models, adjusted for age, sex, and cell type proportions.
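Standard curve-based quantification of this kind typically fits a line to threshold cycle (Ct) versus log10 of the known plasmid copies and then inverts that line for unknown samples. The Python sketch below illustrates the idea; the function names and any Ct values passed to it are hypothetical and are not taken from this study.

```python
import math

def fit_standard_curve(standard_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in standard_copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate absolute copies in a sample."""
    return 10 ** ((ct - intercept) / slope)
```

A slope near -3.32 corresponds to approximately 100% amplification efficiency, which is a common quality check for the dilution series.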
IV. Results
Differential DNA methylation at the site and region levels
We observed 43,011 CpG sites with differential DNA methylation between high and low MeHg groups (p <0.05) and 9 CpG sites with differential DNA methylation (FDR <0.05). At the region level, we observed the following. In genomic tiling regions, we observed 7,565 regions with differential DNA methylation between high and low MeHg groups (p <0.05). In addition, we observed 370 gene regions, 567 promoter regions, and 376 CpG island regions with differential DNA methylation (p <0.05). Tables with complete data are available in the Gene Expression Omnibus (GSE207443).
Differential variability in DNA methylation at the site and region levels
We observed 45,403 CpG sites with differentially variable DNA methylation between high and low MeHg groups (p <0.05) and 5 CpG sites with differentially variable DNA methylation (FDR <0.05). At the region level, we observed the following. In genomic tiling regions, we observed 7,745 regions with differentially variable DNA methylation between high and low MeHg groups (p <0.05). In addition, we observed 347 gene regions, 568 promoter regions, and 368 CpG island regions with differentially variable DNA methylation (p <0.05). Tables with complete data are available in the Gene Expression Omnibus (GSE207443).
GO enrichment in differential and differentially variable DNA methylation
Gene Ontology (GO) enrichment analysis leverages the GO Consortium’s curated, logical hierarchy of gene sets and their functional annotations to identify genes enriched within discovery datasets61. These gene sets are curated into groupings, or GO “terms”, within three categories: Biological Processes, Molecular Functions, and Cellular Components61. We observed enrichments of GO terms for regions in genes and promoters only, and no enrichments for genomic tiling regions or CpG islands. We observed 112 Biological Process (BP) GO terms enriched in gene regions and 51 terms enriched in promoter regions with hypomethylated DNA in high vs. low MeHg groups (using a cutoff of combined rank among the 1000 best ranking regions, all with p≤0.01) (Supplemental Tables S1-2, selected terms in Table 2). We observed 46 BP GO terms enriched in gene regions and 51 terms enriched in promoter regions with hypermethylated DNA in high vs. low MeHg groups (using a cutoff of combined rank among the 1000 best ranking regions, all with p≤0.01) (Supplemental Tables S3-4, selected terms in Table 3). In addition, we saw 70 BP GO terms in genes and 128 BP GO terms in promoters with hypervariable DNA methylation between exposure groups, using the same cutoffs (Supplemental Tables S5-6, selected terms in Table 4), as well as 33 terms in gene regions and 43 terms in promoter regions with hypovariable DNA methylation between exposure groups (Supplemental Tables S7-8). Most enriched GO terms were related to immune response, with a particular focus on the innate immune response/inflammation (Tables 2-4, Supplemental Tables S1-S8).
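The hypergeometric test behind a GO enrichment p-value asks how likely it is to see at least the observed number of term-annotated genes among the top-ranked regions by chance. A minimal Python sketch follows; the function and parameter names are illustrative assumptions, and the published analysis may differ in how it defines the gene universe.

```python
from math import comb

def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    universe:  total number of genes tested
    annotated: genes in the universe carrying the GO term
    selected:  genes in the top-ranked set (e.g., the 1000 best-ranking regions)
    overlap:   selected genes that carry the GO term
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A term is called enriched when this tail probability falls below the chosen cutoff (p≤0.01 in the results above).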
LOLA enrichment in differential and differentially variable DNA methylation
To complement the gene-centric GO enrichment analysis, we additionally performed Locus Overlap Analysis (LOLA) to identify regulatory regions within our dataset enriched for functional genomic and epigenomic annotations62. We observed enrichments of LOLA annotations for regions in genomic tiling regions and CpG islands only, and no enrichments for genes or promoters. Most enriched annotations were for binding sites for transcription factors involved in hematopoiesis and immune response in regions hypomethylated in high vs. low MeHg groups (Fig. 1, Supplemental Figs. 3-13). The second most common signal in our LOLA results was for general repression in regions hypomethylated in high vs. low MeHg groups (Supplemental Figs. 3-13). In particular, we observed enrichment in repressive signals in hypomethylated regions in high vs. low MeHg groups, which suggests reactivation of repressed regulatory regions (Supplemental Figs. 3-13). Signals of gene repression include repressive histone modifications (e.g., H3K27me3) and loss of methylation (indicating binding and activation) in regions associated with binding of proteins that deposit repressive histone modifications (e.g., polycomb repressive complex components EZH265 and SUZ1266) or remove activating histone modifications (e.g., SMARCA4, a component of the SWI/SNF chromatin remodeling complex67, which recruits histone deacetylase repressor complexes68).
Predicted epigenetic age, cell type proportions, and mitochondrial endpoints
We computed predicted epigenetic age, sometimes referred to as an “epigenetic clock” biomarker, based on age-related changes in DNA methylation in specific CpG sites69-75. Accelerated aging, as evident from a discrepancy between chronological age and computed epigenetic age, has been associated with environmental pollutants and disease risk in past studies69-75. In our data, the predicted epigenetic ages computed from DNA methylation data were consistently lower than reported chronological age (on average 11 years lower, ranging from 8 to 17 years lower) (Fig. 2A). Since predicted epigenetic age was highly correlated with reported chronological age (R²=0.86) (Fig. 2A), these results indicate a systematic underestimation of age by the epigenetic age algorithm in our dataset. In addition, we did not observe any association between predicted epigenetic age and MeHg exposure (Fig. 2B). In pairwise tests of association between proportions of different immune cell types with mercury exposure, only monocyte proportion was associated with binary MeHg (Wilcoxon test p=8.7×10⁻⁵) (Fig. 2B, Supplemental Fig. 1). Neither continuous nor binary total hair mercury was associated with mtDNA damage (R²=7×10⁻⁵) (Fig. 2C) or mtDNA CN (R²=0.0028) (Fig. 2D). In addition, mtDNA damage was not highly correlated with mtDNA CN (R²=0.07) (Supplemental Fig. 1). We observed a broad distribution of both mtDNA damage and mtDNA CN biomarkers in both high and low MeHg exposure groups (Fig. 2C-D).
V. Discussion
In this study, we identified candidate epigenetic biomarkers of MeHg-induced toxicity and protection in Peruvian individuals with high (>10 µg/g) vs. low (<1 µg/g) total hair mercury (a proxy for methylmercury exposure), matched on age and sex.
The primary signal in our pathway enrichment data was of a clear immune phenotype in response to MeHg exposure. The human immune response comprises general innate responses as well as antigen-specific adaptive responses76. Both innate and adaptive immune responses include a humoral component (circulating chemical effectors, including cytokines and chemokines) and a cell-mediated component (including general neutrophil and macrophage77 responses and both general78,79 and antigen-specific79 B-cell and T-cell action). Most pathways that were enriched in hypomethylated regions in genes and promoters in high MeHg- vs. low MeHg-exposed Peruvians reflect innate immune response activation (Table 2, Supplemental Tables S1-2). Loss of DNA methylation in promoters and genes is generally associated with gene activation80, implying activation of these innate immune pathways in response to MeHg exposure. This innate response included classic neutrophil and macrophage activation81,82, as well as mast cell release of serotonin83 and eicosanoids like prostaglandins84 (Tables 2 and 4, Supplemental Tables S1-S2). In addition, we observed evidence of immune responses in several T- and B-cell subtypes (Tables 2 and 4, Supplemental Tables S1-S2). Both T-cells and B-cells develop effector cells in response to specific antigens, including DAMPs85. A subset of each effector cell subtype is retained as memory cells following immune response resolution86. T-cell responses are generally divided into cytotoxic (CD8+ T-cells) and helper (CD4+ T-cells) responses; CD4+ responses are further subdivided into T-helper type 1 (Th1) and T-helper type 2 (Th2) responses87. Th1 responses promote inflammation and, if uncontrolled, cause autoimmunity and tissue damage due to chronic inflammation88. Th2 responses include anti-inflammatory cytokines, as well as eosinophilic responses (e.g., IgE- and histamine-mediated signaling), that counterbalance Th1 responses88.
Here, we observed a clear CD4+ response, including both Th1 (inflammatory cytokines and chemokines: interleukin-1 (IL-1), interleukin-1β (IL-1β), interleukin-6 (IL-6), interleukin-8 (IL-8), tumor necrosis factor α (TNFα), macrophage-activating interferon γ (IFNγ)) (Tables 2 and 4, Supplemental Tables S1-S2 and S4-S5) and Th2 (anti-inflammatory interleukin-10 (IL-10), eosinophil activation, interleukin-5 (IL-5), interleukin-13 (IL-13), B-cell isotype switching) (Supplemental Tables S1-S2 and S4-S5). Importantly, the Th1 response is most evident in our GO enrichments of differential mean DNA methylation (Tables 2 and 3, Supplemental Tables S1-S4) and the Th2 signal is clearest in GO enrichments of hypervariable DNA methylation (Table 4, Supplemental Tables S4-S5). These results indicate that MeHg induces a similar Th1 response in most individuals, but that some individuals mount a stronger balancing Th2 response than others. This result suggests a mechanism by which MeHg-exposed individuals who exhibit Th1-dominant signaling with little Th2 counterbalance may be at higher risk of developing autoimmunity. Therefore, DNA methylation levels at genes and gene promoters associated with Th2 signaling in our dataset are strong candidate epigenetic biomarkers of individual risk of autoimmune response to MeHg.
We observed two additional signals related to the development of autoimmunity. First, our data are consistent with expansion of autoreactive T cells. Autoreactive helper T cell subsets can form in response to self-antigen; T helper type 17 (Th17) cells are most likely to be autoreactive, followed by Th1 cells89. Th17 cells are stimulated to differentiate from naïve CD4+ T cells by IL-6 and transforming growth factor-β (TGF-β), which stimulate downstream STAT3 signaling89. Our data show increases in all three of these signals (Tables 2 and 4, Supplemental Tables S1-S4), indicating an environment conducive to increased Th17 cell production. Regulatory T (Treg) cells provide negative regulation of Th17 cells and suppress autoimmune responses and disease development90,91. Treg cells are derived from naïve CD4+ T cells when IL-6 and TGF-β levels decrease during resolution of an inflammatory response89. In addition, interleukin-7 (IL-7) signaling promotes expansion of the Treg pool90,91. The increased IL-6 and TGF-β signaling in response to MeHg, as well as decreased IL-7 signaling (Table 3, Supplemental Tables S3-S4), are consistent with an expanded pool of autoreactive Th17 cells and a diminished population of Treg cells that suppress autoreactivity in individuals with high MeHg exposure. Second, we observed that hypomethylated regions in high MeHg-exposed individuals were enriched for gene promoters involved in marginal zone B cell differentiation (Table 2), which is consistent with activation of the target genes of these promoters. Marginal zone B cells can become autoreactive when co-stimulated by self-antigens and DAMPs92, and autoreactive marginal zone B cells can also activate autoreactive T cells92.
Therefore, in addition to DNA methylation linked to Th2 signaling, hypomethylated enhancers and promoters associated with decreased IL-7 signaling and increased marginal zone B cell differentiation are strong candidates for epigenetic biomarkers that report on inter-individual differences in autoimmune response to MeHg. The hypervariable DNA methylation in multiple inflammatory pathways is strong evidence of population distribution in immune response to MeHg in our dataset, including both high and low responders that carry differential DNA methylation signatures of response (Supplemental Tables S5-S6).
The transcription factor signal that we observed in our LOLA enrichments is consistent with the immune phenotype reflected in our GO enrichment data. Specifically, we observed hypomethylation of tiling regions (likely enhancers) and promoters containing binding sites for transcription factors that control differentiation of macrophages, neutrophils, T-cells and B-cells (Fig. 1A-B, Supplemental Figs. S3-S13). Broadly, hematopoiesis generates a range of blood cell types, including red (erythrocytes) and white (leukocytes) cells93. Leukocytes are derived from either myeloid or lymphoid lineages; myeloid precursors differentiate into neutrophils and monocytes/macrophages and lymphoid precursors develop into B-cells and T-cells93. Spi1/PU.1 is a master regulator of hematopoiesis that directs differentiation within both myeloid and lymphoid lineages through varying concentration (e.g., low in multipotent precursors, high in mature B-cells and macrophages) and co-activator partners94. During early hematopoiesis, Spi1/PU.1 interacts with factors GATA-2, CEBPα/β and c-Jun to drive white blood cell differentiation94,95. In the presence of STAT3 (Fig. 1A) and interleukin-3 signaling (Supplemental Table S1), cells further develop into neutrophils and macrophages96. In contrast, RUNX197,98, RUNX397,98, TCF399, and TCF1299 (Fig. 1A) promote T cell lineage commitment. We observed evidence of signaling through additional hematopoietic transcription factors, including LMO2, which is a scaffold protein that enables formation of protein complexes that include components TAL1, LYL1, and GATA-2, which act at varying stages of hematopoiesis, primarily early stages100-103. These results are supported by an increase in monocyte cell proportion (on average, 6% monocytes in high MeHg vs. 4% in low MeHg; p=8.7×10⁻⁵) in individuals with high MeHg exposure (Fig. 2B, Supplemental Fig. S2).
In addition to directing development of specific immune cell types, the transcription factors identified in our dataset have relevant roles in innate immune response that we observed in our GO enrichments. For example, the transcription factor c-Fos is a component of the master factor activator protein 1 (AP1) that activates downstream innate immunity104. STAT3 mediates cytokine signaling, partly by upregulating c-Fos104. BATF is another member of the AP-1 family that dimerizes with Jun proteins and provides negative feedback to AP1 transcription105. Last, some of the proteins with binding sites enriched in high vs. low MeHg-exposed individuals regulate chromatin remodeling and transcription. For example, SMARCA4 is a component of the SWI/SNF chromatin remodeling complex106.
Some of our results are particularly relevant to our study population. Specifically, our data point to a potential mechanistic link between MeHg exposure and type 2 diabetes (T2D). T2D is characterized by persistently high blood glucose levels due to impaired insulin secretion from pancreatic β cells, insensitivity to insulin in peripheral tissues, and increased glucose production in the liver107. Several well-established genetic risk factors for T2D are variants in the transcription factor 7-like 2 (TCF7L2) gene108-110 that drive expression of functional splice isoforms of this gene109,111. The protein product of this gene is the high mobility group box-containing transcription factor TCF7L2 which activates Wnt signaling with tissue-specific outcomes108,112-114. In pancreatic β cells, human TCF7L2 variants impair normal insulin production and secretion in response to glucose115,116; impaired insulin response could lead to T2D, which is supported by the positive correlation between TCF7L2 variant frequency and population T2D risk117. In enteroendocrine cells, TCF7L2 may influence T2D susceptibility through its transcriptional regulation of proglucagon, which is the precursor of the insulinotropic peptide hormone glucagon-like peptide 1 (GLP-1)108. Together with insulin, GLP-1 regulates blood glucose homeostasis111. Our data show that DNA binding sites for TCF7L2 are enriched in tiling regions (likely enhancers) and in gene promoters that are hypomethylated in Peruvians with high vs. low MeHg exposure (Fig 1A, Supplemental Fig. 3). Loss of DNA methylation in these regions likely reflects binding of TCF7L2 to regulatory binding sites and activation of downstream signaling. If MeHg triggers aberrant TCF7L2 signaling in pancreatic β cells or enteroendocrine cells, in addition to the leukocyte signal observed in this study, then hypomethylation of TCF7L2 enhancers in blood cells may serve as a surrogate tissue biomarker of early MeHg-related T2D risk. 
MeHg exposure is toxic to pancreatic β cells118. However, MeHg is related to T2D risk in some but not all epidemiological studies (reviewed in 119). Most notably, cross-sectional analyses in the population-representative National Health and Nutrition Examination Surveys (NHANES) in the United States120 and Taiwan121 show positive associations between T2D and MeHg exposure. A large prospective human study confirmed this positive association122. In contrast, several cross-sectional and prospective human studies report no association123-125, or even an inverse association126 (attributed to higher consumption of protective dietary nutrients in highly exposed groups126), between MeHg and T2D risk. These equivocal results suggest population-specific risk profiles. American Indians in the U.S. have higher diabetes risk than do other ethnic groups, which suggests a higher baseline genetic risk in indigenous Peruvians that may be exacerbated by diet and environment127. Individuals carrying TCF7L2 risk alleles who develop impaired glucose tolerance show increased conversion of this pre-diabetic state to full T2D onset, as compared to glucose-intolerant non-carriers128. These data suggest that MeHg-induced TCF7L2 signaling may pose a greater disease risk in a population with a higher baseline risk for disease.
Some of our results may be more generalizable to health outcomes of MeHg in Western populations. Importantly, our results inform potential biological mechanisms that have not been resolvable in Western epigenetic datasets, possibly due to lower variance in exposure. For example, DNA binding sites for glucocorticoid receptor (GR), encoded by the NR3C1 gene, are enriched in gene promoters that are hypomethylated in individuals with high vs. low MeHg exposure (Fig. 1B, Supplemental Fig. S5), and the GO term “response to dexamethasone” (GO:0071548), which reflects GR activation by dexamethasone, is enriched in genes that are hypomethylated in high vs. low MeHg-exposed individuals (Supplemental Table S1). In leukocytes, GR signaling is important for dampening and resolving inflammatory responses129. Importantly, in the hippocampus, MeHg exerts neurotoxicity through GR signaling130. MeHg binds GR directly and attenuates GR activation by endogenous ligands, leading to decreased GR signaling that contributes to developmental neurotoxicity of MeHg130. In addition, rats exposed during development to a complex environmental contaminant mixture containing MeHg showed a dampened ability to reduce serum corticosterone levels following an experimental acute stress event131; because GR is responsible for returning corticosterone to homeostatic levels in healthy animals, these data provide functional evidence of GR signaling disruption in MeHg-exposed animals131. A prior study provides initial evidence that an epigenetic biomarker of MeHg-induced GR inhibition in blood may reflect signaling in brain; specifically, in utero mercury exposure of child participants in the Seychelles Child Development Study predicted leukocyte DNA methylation of the NR3C1 gene132. Future work should focus on whether DNA methylation of the NR3C1 gene in blood reflects similar epigenetic profiles at this gene in the hippocampus of rodents exposed to environmentally relevant levels of MeHg.
If confirmed, DNA methylation at this gene in blood may serve as an actionable and accessible surrogate epigenetic biomarker of MeHg-induced neurotoxicity.
Another important finding from our results suggests an epigenetic biomarker for a protective biological response to fish consumption. DNA binding sites for the transcription factors retinoid X receptor (RXR) and retinoic acid receptor α (RARα) are enriched in tiling regions (likely enhancers) that are hypomethylated in Peruvians with high vs. low MeHg exposure (Fig. 1A, Supplemental Figs. S3 and S5). PUFA found in large, fatty fish, including docosahexaenoic acid (DHA), activate RXR signaling133,134, which triggers downstream antioxidant signaling that protects against MeHg-induced neurotoxicity135,136. RXR can form heterodimers with RARα135; RXR-RARα signaling is critical for hippocampus-dependent learning and memory137, as well as DHA-augmented fetal neurodevelopment135, which is disrupted by early life MeHg exposure10. The enrichment for DNA binding sites for transcription factor PML (Fig. 1A, Supplemental Figs. S3 and S5) in our data likely reflects RXR-RARα signaling, providing further support for activation of this pathway; this signal likely reflects binding sites within the queried database of a cancer fusion gene of PML and RARα that heterodimerizes with RXR and binds to RXR-RARα DNA binding sites138. Since human MeHg exposure in MDD occurs primarily through fish consumption, individuals with the highest MeHg exposure also have the highest fish consumption2. Birth cohort data from the high fish- and seafood-consuming populations in the Republic of Seychelles and the Faroe Islands highlight the importance of considering the health benefits of fish consumption, which may outweigh the harms of MeHg exposure in some exposure settings9,10. Future work should explore whether epigenetic biomarkers of RXR-RARα activation by fish consumption reflect RXR-RARα signaling in the hippocampus, which is the primary target of MeHg neurotoxicity.
Validation of a biomarker that reports on how protective fish consumption is for a particular individual is a critical step in providing individualized health recommendations, particularly to those at high risk for harm, such as pregnant women and small children.
In addition to differential DNA methylation, we investigated three additional biomarkers that may inform MeHg response in our study participants: epigenetic age and two mitochondrial biomarkers, mtDNA damage and mtDNA copy number (CN). We observed that a commonly used epigenetic age algorithm (which predicts age based on tissue DNA methylation) systematically underestimated chronological age in our study (Fig. 2A). This result suggests that current algorithms, which have been trained and tested on datasets from European individuals, may not generalize to non-European populations. For algorithms to be more generalizable tools, they should be trained and tested on more diverse datasets. In addition, we observed no relationship between either mitochondrial biomarker and MeHg exposure (Fig. 2C-D). We also observed promoter hypermethylation of the RIG-I signaling pathway (Supplemental Table S4), through which DAMPs trigger innate immune responses139. Although prior papers have not reported clear associations between mitochondrial biomarkers and MeHg exposure, it was unclear whether this lack of signal was statistical or biological140,141. The clear signal in immune signaling pathways in our data, coupled with the lack of association between mitochondrial biomarkers and MeHg, indicates that the biological role of mitochondria in the response to MeHg is more complex than previously thought. For example, mitochondrial DAMPs may serve as a signal of self-damage that triggers endogenous suppression of inflammation to promote healing142. The high variance in both mitochondrial markers in both high and low exposure groups (Fig. 2C-D), as well as hypervariable promoter DNA methylation in pathways involved in ROS production and response (Table 4), strongly implies unmeasured source(s) of variation in our population that require study before these biomarkers can be fully realized in population health settings.
It is worth discussing why the primary transcription factors in our LOLA enrichments function during cellular differentiation, even though our study profiled mature, circulating white blood cells. There are two possible explanations for this finding. The first is that mature cells may carry persistent DNA methylation signatures of past differentiation programs; this possibility is supported by past evidence of similar DNA methylation memories143. The second possibility is that these differentiation programs may be reactivated in mature cells to enable dedifferentiation and phenotypic switching between cell types by changing epigenetic programs144-146. This possibility is supported by enrichment in our dataset for the GO term “cell dedifferentiation involved in phenotypic switching” (GO:0090678) (Supplemental Tables S5-S6).
This study has several important limitations. First, our study participants are from a region where residents commonly live in the same villages or towns in which they were born2. Therefore, individuals with high adult exposures may have had high developmental exposures, as well. Our results may reflect acute epigenetic responses to MeHg, persistent effects of developmental MeHg exposure, or a combination of both. Second, most study participants in the high MeHg group reside in indigenous communities in the Madre de Dios region (Table 1), because the highest MeHg exposures accrue to high fish-consuming residents of these native communities2. Therefore, we are unable to definitively separate DNA methylation changes due to genetic differences between indigenous and non-indigenous study participants from environmental effects on DNA methylation due to MeHg exposure. However, the differential DNA methylation signal that we observed here largely reflects known biology in MeHg toxicity, which supports a primarily environmental effect, even in the presence of known genetic variation. Third, because this study is cross-sectional, there are several limits to the interpretability of our results. For example, our MeHg exposure biomarker reflects only 2-3 months’ prior exposure, which may reflect transient exposure or, alternatively, relatively constant chronic exposure. Therefore, we are unable to assess whether these epigenetic changes reflect responses to short or long exposure durations. In addition, we are unable to assess the persistence, if any, of our observed epigenetic markers. These questions should be assessed in future cohorts with time-resolved exposures and epigenetic outcomes.
Here, we identify a set of candidate epigenetic biomarkers for assessing individualized risk of autoimmune response and protection against neurotoxicity due to MeHg exposure. In addition, our results may inform surrogate tissue biomarkers of early MeHg-related neurotoxicity and T2D risk. This set of candidate epigenetic biomarkers represents an important step towards personalized health recommendations for MeHg-contaminated fish consumption.
Data sharing
We have deposited raw data files and sample phenotype data, as well as tables containing differential DNA methylation and differentially variable DNA methylation on both site and region levels, in the Gene Expression Omnibus (GSE207443).
Acknowledgments
This work is supported by Hunt Oil Peru LLC (HOEP-QEHSS-140003, W.K.P.), Duke Population Research Institute P2C pilot funds (2P2CHD065563-06 SUB#60P2034949, W.K.P.), NIH grant K01-ES32044-01 (C.W.), and a Duke Global Health Institute Postdoctoral Fellowship (C.W. and W.K.P.). The authors acknowledge Ernesto Ortiz and Axel Berky for their roles in data collection for the parent study, as well as study participants and local field workers.
Footnotes
Conflicts of Interest. The authors declare that they have nothing to disclose.