Human Immune Cell Epigenomic Signatures in Response to Infectious Diseases and Chemical Exposures

Variations in DNA methylation patterns in human tissues have been linked to various environmental exposures and infections. Here, we identified the DNA methylation signatures associated with multiple exposures in nine major immune cell types derived from peripheral blood mononuclear cells (PBMCs) at single-cell resolution. We performed methylome sequencing on 111,180 immune cells obtained from 112 individuals who were exposed to different viruses, bacteria, or chemicals. Our analysis revealed 790,662 differentially methylated regions (DMRs) associated with these exposures, which are mostly individual CpG sites. Additionally, we integrated methylation and ATAC-seq data from same samples and found strong correlations between the two modalities. However, the epigenomic remodeling in these two modalities are complementary. Finally, we identified the minimum set of DMRs that can predict exposures. Overall, our study provides the first comprehensive dataset of single immune cell methylation profiles, along with unique methylation biomarkers for various biological and chemical exposures.

based on fluorescence-activated cell sorting (FACS) and genome-wide methylation profiles in the CG context of each cell. This analysis revealed two sub-clusters within B cells and NK cells. We then identified differentially methylated regions (DMRs) within each of these nine cell types. For the cohorts exposed to viruses, bacteria, and chemicals, we conducted snmC-seq2 on the seven cell types using 173 PBMC samples collected from 112 donors, resulting in 111,180 PBMC methylation profiles. We examined the impact of each exposure on the methylome of the immune cell types and trained a model using these DMRs to predict different types of exposures based on the methylation status of several thousand loci. Additionally, we generated single-nucleus ATAC-seq data from patients with the same exposures and found a high correlation between these two modalities. However, we observed minimal overlap between the exposure-associated changes in methylation and chromatin accessibility, indicating the need to study epigenomic remodeling associated with exposures using both techniques.
Our study provides valuable resources and insights into the molecular mechanisms underlying the impact of different exposures on DNA methylation in immune cells. To our knowledge, this is the first large-scale, single-cell DNA methylation dataset on major immune cell types, which will be instrumental in developing diagnostic assays for detecting and containing infections at their source. Moreover, the potential gene regulatory effects of differential methylation in intergenic regions offer new insights into human pathogenesis. The unique DNA methylation signatures identified in this study contribute to our understanding of the epigenetic regulation of immune processes and offer a novel approach to diagnosing exposures to known and unknown pathogens.

Results
In this study, our main focus was on three categories of exposures: viral, bacterial, and chemical ( Figure 1A). To examine viral exposure, we evaluated the methylation patterns of individuals who were part of a prospective study on HIV-1 infection prevention. Additionally, we analyzed a cohort of volunteers who participated in a vaccine trial against the H3N2 flu virus and a group of patients who experienced SARS-CoV-2 infection with varying degrees of disease severity. For bacterial exposures, we examined the methylation profiles of patients infected with MRSA and MSSA, as well as vaccinated technicians who handled Bacillus anthracis. In terms of chemical exposure, we obtained samples from individuals with high levels of 3,5,6-trichloro-2-pyridinol (TCPY) resulting from exposure to chlorpyrifos-an organophosphate insecticide known to be neurotoxic (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. into seven immune cell types on 384-well plates, and subjected to single-nucleus methylation sequencing. Data analysis was performed using our pipeline. C. Single nucleus clustering of cells from all exposures, colored by cell surface markers (i), global mCG levels (ii), and different exposures (iii).

Methylation Signatures in Immune Cells from Healthy Donors at Single-Cell Resolution
To elucidate the cell type-specific methylation signatures for major immune cell classes at singlecell resolution, we obtained peripheral blood mononuclear cells (PBMCs) from healthy donors across different cohorts. As outlined in our methodology (Figure 1), we performed fluorescenceactivated cell sorting (FACS) to isolate seven major immune cell types from PBMCs, followed by single-cell methylation sequencing (Luo et al., 2017(Luo et al., , 2018. Methylation reads from individual immune cells covered approximately 5%-10% of the genome ( Figure S2A) and underwent filtering based on read mapping rate, total read count, and global CpG methylation (mCG) level ( Figure   S2B-D). In total, we obtained 22,341 high-quality cells from healthy donors that passed the filtering step and were used for subsequent analysis. To cluster these single cells, we partitioned the genome into 5 kb bins, generating a cell-by-bin matrix (Liu et  To identify potential regulatory elements in different immune cells, we detected hypomethylated differentially methylated regions (DMRs) for each cell type. Given the considerable variation in global mCG levels among these clusters ( Figure 2C), we performed separate DMR calling between cell types with low mCG and high mCG. In total, 495,479 DMRs were identified across the immune cell types, with most being unique to a specific cell type ( Figure 2D). Notably, subclusters of B cells and T cells shared only a small number of DMRs ( Figure S2G), while NK cells . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint exhibited more DMRs shared with Tc-Mem cells than with other T cell subtypes ( Figure 2D, S2G).
To gain insights into their potential impact on gene regulation, we associated each DMR with underlying genomic features and observed significant enrichment in intragenic and promoter regions for each immune cell type ( Figure 2E). Motif analysis using DMRs revealed the enrichment of known lineage-specific transcription factor binding sites in corresponding cell types ( Figure 2F). For instance, TCF motifs were enriched in T cells, while the EBF1 motif showed enrichment in B cells ( Figure 2F). To further validate the accuracy of our data, we plotted the methylation status at T cell marker CD3 genes, which revealed hypomethylation in T cells and hypermethylation in other cell types ( Figure 2G).

Impact of HIV-1 Infection on the Methylome
To investigate the impact of HIV-1 infection on the methylome of immune cells, we conducted a comprehensive analysis using immune cells obtained from the same individuals at different stages: "pre" (before infection), "acute" (after diagnosis), and "chronic" (after treatment) ( Figure   1). We employed snmC-seq2 (Luo et al., 2017(Luo et al., , 2018 to profile the methylome of these cells. Based on the methylation profiles, we identified 13 clusters of immune cells ( Figure S3A) and annotated them based on both cell surface markers and methylation profiles. These cell clusters were evenly distributed across the three stages of the disease ( Figure S3B). Two sub-clusters of B cells and NK cells were identified based on methylation and cell surface markers, while other cell types were assigned based on cell surface markers ( Figure 3A). Similar to the analysis performed on healthy control samples, we classified two clusters of B cells as B-Mem and B-Naive, and two NK cell clusters as NK-Active and NK-Naive, based on the global mCG level ( Figure S3C). The ratio between B-Mem and B-Naive remained consistent across the three stages of HIV-1 infection. However, the ratio between NK cell sub-clusters varied between the "acute," "chronic," and "pre" stages, with "pre" samples exhibiting fewer NK-Active cells and more NK-Naive cells ( Figure S3D) (P=0.0073, Chi-square test, comparing "acute" and "pre"; P=9.39e-06, comparing "chronic" and "pre").
Significant changes in the genome-wide mCG levels of certain cell types were observed following HIV-1 infection. For instance, memory CD8 T cells exhibited an increase in global mCG levels from "pre" to "chronic" and "acute" stages ( Figure S3E). Other cell types such as Memory B cells, Naive B cells, and Naive NK cells also demonstrated significant global mCG changes between the "pre" and "acute" stages ( Figure S3E). Subsequently, we identified differentially methylated regions (DMRs) between the "pre," "acute," and "chronic" stages for these cell types. Interestingly, highly distinctive methylation patterns were observed between the three stages across all samples ( Figure 3B), indicating that these DMRs are highly consistent with disease progression.
In total, we identified 104,426 DMRs between the three stages, with 103,460 (99.07%) of them being single CpG sites (Table S2). Single CpG DMRs contained fewer CG contexts within 250 bp . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made and single CpG cell type DMRs (Wilcoxon Rank Sum Test, p-value=0.0) ( Figure S3F). The largest number of hypomethylated DMRs (hypo-DMRs) was observed in memory CD8 T cells in the "acute" stage, while hypermethylated DMRs (hyper-DMRs) were predominant in the "pre" stage ( Figure S3G), consistent with the observed global mCG increase between the "pre" and "acute" and "chronic" stages of infection. Interestingly, the memory state of different cell types, including active NK cells, exhibited more DMRs compared to their naive forms ( Figure S3G).
To determine which transcription factors (TFs) may bind to these hypo-DMRs, we identified enriched TF motifs using HOMER (Heinz et al., 2010). ETS family motifs were found to be enriched in most cell types across the three stages of HIV-1 infection, whether in hypo-DMRs or hyper-DMRs ( Figure 3C), suggesting a significant rearrangement of ETS family transcription factor binding following HIV-1 infection. For example, ETS family motifs were enriched in the hyper-DMRs of the "acute" and "chronic" stages in monocytes and naive CD8 T cells, while they were also enriched in the hypo-DMRs of the "pre" stage in these cell types ( Figure 3C). The enrichment of PU.1 and Fli1 motifs ( Figure 3C), which are transcription factors (Minderjahn et al., 2020) involved in regulating immune cell differentiation, suggests that immune cells may be undergoing differentiation into other lineages. RUNX motifs were enriched in hyper-DMRs of the "pre" stage in Naive NK cells, memory and naive CD8 T cells, while also being enriched in hypo-DMRs of the "acute" stage in these cell types. Together with other transcription factors, RUNX1/2 and ETS1 can alter chromatin state (Korinfskaya et al., 2021), potentially influencing T cell lineage determination. RUNX1 and RUNX3 have also been reported to drive the adaptive response of NK cells during MCMV infection (Rapp et al., 2017). Interestingly, motifs of transcription factors associated with circadian function were enriched in hypo-DMRs from the "pre" stage ( Figure 3C). Furthermore, we identified differentially methylated genes (DMGs) among the nine cell types between different HIV-1 groups. In pairwise comparisons of the "pre," "acute," and "chronic" . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint stages, we identified 288 DMGs (Table S3), with the majority found between the "pre" and "acute" stages in memory CD8 T cells ( Figure S3H). Among these DMGs, 112 genes exhibited hypomethylation in the "pre" stage, while 69 genes showed hypomethylation in the "acute" stage.
Both groups of DMGs were enriched in genes with immune-related functions ( Figure S3I).
To validate the reliability of these methylation signatures and investigate potential regulatory mechanisms, we integrated DNA methylation data with single-cell ATAC-seq data. We mapped the cells from single-cell ATAC-seq to methylation clusters and transferred the cluster labels using Canonical Correlation Analysis (CCA) (Stuart et al., 2019). To validate our integration approach, we calculated the genome-wide correlation between these two modalities ( Figure 3D).
Specifically, we divided the genome into 5 kb bins and calculated the hypomethylation score using single-cell methylation data and the number of Tn5 insertions using single-cell ATAC data . We then calculated the correlation between these two measurements for each bin. We observed a strong correlation between the two modalities (methylation and open chromatin) across all cell types, with the highest correlation observed in monocytes ( Figure 3D). Methylation in NK cells also showed a high correlation with chromatin accessibility in T cells ( Figure 3D). Interestingly, we detected a loss of methylation and an increase in accessibility after HIV-1 infection in memory CD8 T cells at the intron of DGKH ( Figure 3E), a gene previously reported to exhibit differential methylation between elite controllers (individuals able to maintain undetectable viral loads for at least 12 months without antiretroviral therapy) and individuals receiving antiretroviral therapy (Frias et al., 2021). Although this region experienced a loss of chromatin accessibility, the methylation level remained unchanged between the "acute" and "chronic" stages ( Figure 3E). When comparing all DMRs with differentially accessible regions (DARs) from the two modalities, we found a minimal overlap of only about 2%, highlighting the necessity of using both technologies to profile the epigenomic remodeling associated with HIV-1 infection.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Impact of influenza infection on the methylome
To investigate how influenza infection can affect the methylome of immune cell types, we conducted experiments examining PBMCs collected from influenza virus-infected volunteers before and 28 days after infection, as part of an experimental influenza vaccine trial ( Figure 1).
Using single-cell methylation sequencing on these cells, we identified 11 cell clusters ( Figure S4A) and further annotated them into 9 cell types based on cell surface markers and methylation profiles ( Figure 3F). The distribution of cells from all patients was even among the clusters, indicating the absence of significant batch effects ( Figure S4B). Similar to the control and HIV cohorts, we observed two sub-clusters of B cells and NK cells, but the ratio between these subclusters varied significantly compared to the other cohorts. The ratio between B-Memory and B-Naive cells was 0.54:1, while the ratio between NK-Active and NK-Naive was 0.3:1 ( Figure S4D).
Interestingly, NK-Active cells clustered together with memory CD8 T cells, which also exhibited . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint lower global mCG levels ( Figure 3F), suggesting similarity between NK and memory CD8 T cell methylomes.
The global mCG levels of the nine cell types did not show significant differences between the 'pre' and 'post' groups, except for some moderate changes in NK-Naive cells ( Figure S4E). Furthermore, no differentially methylated genes (DMGs) were identified between the "pre" and "post" groups, indicating that there were no drastic changes in gene body methylation 28 days after infection. To further characterize the genomic regions whose methylation is associated with influenza infection, we identified differentially methylated regions (DMRs) between the 'pre' and 'post' stages of influenza infection in all cell types. We found 71,995 DMRs associated with influenza infection in all cell types (Table S4), with the vast majority (99.49%) being single CpG sites in CD4 memory T cells ( Figure S4F). These DMRs were equally distributed between hypoand hyper-DMRs ( Figure S4F) and exhibited distinct patterns between the two groups (pre vs. post) ( Figure 3G).
To determine the potential transcription factors that could bind to these DMRs, we identified  (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made To verify the quality of our single-cell methylation data and further validate the identified DMRs, we integrated the methylation data with the single-cell ATAC-seq data and transferred the cell type labels to the single-cell ATAC-seq cells. We calculated the genome-wide correlation between the hypomethylation score and the number of Tn5 insertions using 5 kb bins with the single-cell methylation and ATAC-seq data. The two modalities showed high correlation in B cells, monocytes, and NK cells, while the correlation was lower in T cells ( Figure S4G).
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023.  Ctrl_NK-active Ctrl_NK-naive . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; Red indicates stronger correlation, while blue indicates weaker correlation.

Impact of SARS-CoV-2 Infection
The SARS-CoV-2 virus, the pathogen causing COVID-19, continues to be a major global public health concern. However, little is known about the impact of SARS-CoV-2 infection on the dynamics of DNA methylation in immune cells. To examine this, we profiled the methylome of immune cells as previously described. We collected samples from patients with severe and nonsevere disease and identified 21 cell clusters ( Figure S5A), which were annotated to 10 cell types based on both cell surface markers and methylation profiles. In addition to two sub-clusters of B cells (B-Mem, B-Naive) and NK cells (NK-Active, NK-Naive), we also identified two populations of monocytes ( Figure 4A), present in both severe and non-severe disease patients ( Figure S5B).
Compared to the subclusters of B cells and NK cells, the genome-wide global mCG levels of these two clusters of monocytes are more similar ( Figure S5C). The ratios of cells in these two clusters of monocytes are comparable between severe and non-severe samples, while control monocytes are significantly enriched in one of the clusters ( Figure 4B) (P=2.05e-237, Chi-square test). We refer to the cluster of monocytes that are rare in control samples as 'Monocyte2', and the monocyte cluster that is abundant in control samples as 'Monocyte1', a cluster also identified in other exposures. In addition to monocytes, the cell ratios between the two clusters of NK cells . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint and B cells are also significantly different between non-severe, severe, and control samples ( Figure S5D, S5E) (P=2.76e-20, Chi-square test, comparison between B-Mem and B-Naive; P=3.51e-05, Chi-square test, comparison between NK-Active and NK-Naive).
To compare the differences between the two clusters of monocytes, we identified the differentially methylated genes (DMGs) and differentially methylated regions (DMRs) between them. We identified 321 DMGs between 'Monocyte1' and 'Monocyte2', of which 262 and 59 genes are hypomethylated in 'Monocyte2' and 'Monocyte1', respectively ( Figure S5D, Table S5). Methylation levels at these genes show distinctive patterns between the two clusters of monocytes ( Figure   S5D). We also identified DMRs between the two clusters of monocytes, resulting in 118,186 DMRs, about 64,195 of which were single CpG sites (Table S6) To compare the methylomes of these immune cell types, we first examined global mCG between control, non-severe, and severe COVID-19 samples. We found that the global mCG was significantly different in the two clusters of monocytes, naive NK cells, naive CD4, and CD8 T cells ( Figure S5I). To further determine the methylation signatures associated with non-severe or severe COVID-19, we identified the DMRs in each of the ten immune cell type clusters. In total, we identified 203,809 DMRs, of which 84.06% (171,319) are single CpG sites (Table S7).  Figure 4D). Further correlation analysis of DMRs between these samples showed a higher correlation between non-severe and control samples than between non-severe and severe COVID-19 patients ( Figure 4E). To further verify that the DMRs are associated with . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint COVID-19, we shuffled the labels of the samples and performed DMR analysis between the randomly grouped samples, which showed that the DMRs are less distinctive compared to the DMRs between COVID-19 samples and controls ( Figure 4K).
To further identify the transcription factor activities associated with non-severe or severe COVID-19, we examined the enrichment of DNA binding motifs at DMRs in the ten immune cell clusters.
We found that control hypo-DMRs in Monocyte1, which are also mostly hypermethylated in severe and non-severe COVID-19 samples, are significantly enriched in IRF and ETS motifs ( Figure 4F). As described above, we performed integration between our methylation data and single-cell ATAC-seq data. In order to validate our integration approach and data quality, we calculated the genome-wide correlation between these two modalities ( Figure 4G). We observed that the two modalities (methylation and open chromatin) were strongly correlated across all cell types, with the strongest correlation observed in monocytes and the lowest correlation in different types of T cells ( Figure 4G). Surprisingly, only ~5% of the DMRs in each cell type overlapped with a peak in the corresponding cell type in ATAC-seq data, indicating that most of the methylation changes associated with COVID-19 are in inaccessible chromatin.

Impact of MRSA and MSSA infection on the methylome
Methicillin-resistant Staphylococcus aureus (MRSA) and methicillin-susceptible Staphylococcus aureus (MSSA) are both caused by the same strain of bacteria which differ in their susceptibility to antibiotics. To better understand the impact of this infection on the immune cell methylome, we performed snmC-seq2 on the MRSA-MSSA cohort described above. Based on the global methylation profile of the cells, 25 clusters were identified, which were annotated to 9 cell types . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint based on both cell surface markers and methylation profiles ( Figure 5A, S6A). These included two sub-clusters of B cells and NK cells, which were labeled as, B-Mem, B-Naive, NK-Active, and NK-Naive, respectively ( Figure S6C). These clusters were distributed evenly among MRSA, MSSA patients, and control samples ( Figure S6B). The number of cells in the two sub-clusters of NK cells was comparable in the three cohorts ( Figure S6D), while we captured more memory B cells in the MSSA cohort compared to the other two ( Figure S6D) (P=9.36e-10, Chi-square test).
We observed large changes in global mCG levels between MRSA, MSSA, and controls in all nine cell types ( Figure S6E). To further characterize the methylation signatures correlated with MRSA or MSSA infection, we identified DMRs between MRSA, MSSA, and control samples. In total, we identified 134,868 DMRs between the three groups, the majority of which were from memory CD8 T cells ( Figure S6F, Table S8), which also showed a significant change in global mCG ( Figure   S6E). Similar to other exposures, the majority (93.46%) of DMRs were single CpG sites.
Visualization of the methylation levels at these DMRs demonstrated high reproducibility, showing a clear distinction between patient and control samples ( Figure 5B). The correlation of the methylation level of samples from the three cohorts at these DMRs confirmed that the samples from the three groups are very distinctive from each other at the DMRs ( Figure 5C). To further validate that the MRSA or MSSA-associated DMRs we identified are not caused by heterogeneity between individuals, we shuffled the labels of the samples as described above and identified the DMRs between these random groups. The result showed that the DMRs between random groups are much less distinctive than MRSA or MSSA-associated DMRs ( Figure S6G).
To identify transcription factors that can potentially bind at these DMRs, we investigated the     Figure 5E, S7A). Immune cells from technicians who handled anthrax, both frequently and infrequently, clustered together across all cell types ( Figure S7B). Based on global mCG levels ( Figure S7C), we identified two clusters of B cells (B-Mem and B-Naive) and two clusters of NK cells (NK-Active and NK-Naive). The ratio of NK-Active cells in the 'frequent' or 'infrequent' groups was significantly lower than in the control samples (P=1.74e-64, Chi-square test), while the ratio of memory and naive B cells was comparable across the three cohorts ( Figure   S7D).
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint Significant differences in global mCG levels were observed among these immune cell clusters across the three groups ( Figure S7E). To explore the epigenetic characteristics of immune cells in individuals handling anthrax frequently or infrequently in BSL3 facilities, we identified differentially methylated regions (DMRs) between these groups compared to a control population.
In total, we found 76,757 DMRs among the three groups, with 72,837 (95.12%) consisting of single CpG sites (Table S9). Control samples displayed the highest number of hypo-or hyper-DMRs in all cell types ( Figure S7F), suggesting that the observed differences were mainly between the control and BA cohorts. Visualization of the methylation levels at these DMRs clearly distinguished the two groups and the control cohort ( Figure 5B). We conducted a permutation test similar to the one described earlier and found that DMRs between random groups were less distinctive ( Figure S7G). The correlation of methylation levels at these DMRs in donors from the three groups also exhibited higher correlations within groups ( Figure S7H).
To identify potential transcription factors that could bind to these DMRs, we performed an As mentioned earlier, we integrated our methylation data with single-cell ATAC-seq data. To validate our integration approach and data quality, we calculated the genome-wide correlation between these two modalities. Similar correlations to other exposures were observed between the two modalities across all cell types ( Figure S7I). Interestingly, we observed simultaneous loss of methylation and gain of accessibility, as well as gain of methylation and loss of accessibility, in the 'frequent' group at a locus containing a long non-coding RNA (lncRNA) and a pseudogene ( Figure 5H). Only approximately 5.5% of the DMRs in each cell type overlapped with a peak in the corresponding cell type in the ATAC-seq data, indicating that most of the methylation changes associated with working frequency in high-risk facilities occur in closed chromatin regions.
Impact of organophosphate exposure on the methylome . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Organophosphates (OP) are common insecticides that can harm humans and animals if exposed for too long or at high levels. However, little is known about how OP can affect the methylome of immune cells when exposed to different levels. We performed snmC-seq2 on the peripheral blood mononuclear cells (PBMCs) of individuals exposed to OP at varying levels (Low, Med, and High).
These immune cells clustered into 22 clusters and were further annotated into 9 cell types based on both cell surface markers and methylation profiles ( Figure 6A, S8A). Cells from patients with different levels of OP exposure were evenly distributed across the cell types ( Figure S8B). Similar to other exposures, we identified B-Mem, B-Naive, NK-Active, and NK-Naive based on their global methylation levels ( Figure S8C). The proportions of the two clusters of B cells were comparable across different levels of exposure, while the exposure groups had significantly more NK-Active cells compared to control samples ( Figure S8D) (P=1.44e-56, Chi-square test).
We explored the association of methylation changes with different levels of OP exposure and observed significant changes in the global mCG levels in most of the cell types ( Figure S8E). We further identified differentially methylated regions (DMRs) between the different exposure groups and control samples in all nine cell types, resulting in 198,807 DMRs among the four groups (Low, Med, High, and Control) (Table S10). Memory CD8 T cells had the highest number of DMRs compared to other cell types ( Figure S8F). Visualization of the methylation levels at these DMRs showed distinct methylation patterns between these groups ( Figure 6B), particularly between OP exposure and control. We performed a similar permutation test as described above and found that the DMRs between random groups were much less distinctive ( Figure S8G). The correlation analysis of these samples at the DMRs revealed weak distinctions between the 'Low', 'Med', and 'High' group samples, while the distinction between control samples was evident ( Figure 6C).
To further investigate which transcription factors might be affected in these nine cell types, we Similarly, we performed integration between our methylation data and single-cell ATAC-seq data.
To validate our integration approach and data quality, we calculated the genome-wide correlation between these two modalities. We observed similar correlations to other exposures between the two modalities in all cell types ( Figure S8H). Consistent with other exposures, the majority of DMRs associated with OP were located in inaccessible chromatin regions.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Comparison of methylation signatures from all exposures
To compare the methylation signatures across different exposures, we merged the methylation files from the same exposure and identified the differentially methylated regions (DMRs) among all the groups using methylpy. This strategy does not rely on control samples and allows for a general comparison of the methylation levels across all the samples. For instance, in Naive B cells, we identified a total of 188,563 hypo-DMRs across all exposures, most of which were unique to a specific exposure ( Figure 7A). Interestingly, MRSA and MSSA shared more DMRs than any other two exposures, which aligns with the similarities observed between these two exposures ( Figure 7A). We also observed that different levels of OP exposures shared more hypo-DMRs

Predicting exposure types with single-cell methylomes
Considering that 790,662 differentially methylated regions (DMRs) were associated with various exposures in the nine immune cell types, we hypothesized that some of these DMRs could accurately predict different exposures. We conducted multiple rounds of training to identify the . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint optimal number of DMRs for accurate prediction. As an example, we used DMRs associated with HIV-1 infection to predict its three stages (pre, acute, chronic). We divided the individual donors into training and test cohorts and created 150 and 50 pseudo-individuals by combining single cells through permutation. Using logistic regression, we trained a model with the 150 pseudoindividuals in the training cohort for each cell type and tested the model with pseudo-individuals in the test cohort. Remarkably, the model was able to perfectly predict the different stages of HIV-1 infection. Next, we aimed to minimize the number of DMRs required for accurate prediction by selecting the top weighted DMRs in the primary prediction at the 10% quantiles. Surprisingly, we achieved perfect prediction of different HIV-1 infection stages using only 30% of the DMRs, and the pre-stage could be predicted with just 10% of the DMRs (Figure 8A-C).
We then expanded the prediction analysis to other exposures, and the results demonstrated that we could predict most exposures with an area under the curve (AUC) greater than 0.95 in receiver operating characteristic (ROC) curves using only a few thousand DMRs ( Figure 8D). The minimum discriminative DMRs are provided in Table S11, which can serve as biomarkers for diagnosing different exposures using readily accessible cells.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Discussion
Here, we conducted an analysis of single-cell methylomes in major immune cell types from PBMCs obtained from both healthy donors and patients exposed to bacteria, viruses, and chemicals. In total, we analyzed the methylomes of 104,876 high-quality cells representing seven major immune cell types. This dataset serves as a valuable resource for studying the epigenomes and epigenetic features of immune cells in response to various exposures. Notably, this study represents the first human immune cell methylome atlas at single-cell resolution. Using this comprehensive resource, we identified differentially methylated regions (DMRs) associated with each exposure and developed unsupervised models to predict exposure based on these DMRs. While our study provides valuable resources and insights into infectious diseases and chemical exposure, there are certain limitations to consider. In order to identify methylation features of major immune cell types in an unbiased manner, we focused on sorting seven immune cell types from PBMCs. However, this approach limited our investigation to only these cell types, and we were unable to explore other cell types of interest. Additionally, we analyzed the methylomes of over 100,000 immune cells processed in different batches, which introduced potential batch effects. Although we placed different cell types on the same plate to control for this, some batchrelated variability may still exist. Despite applying stringent filtering to identify exposureassociated DMRs, it is possible that we captured individual differences in methylation rather than exposure-related differences. Furthermore, the cohorts were enrolled at different times, and PBMCs were collected and processed by different groups, introducing technical pre-analytic variability specific to each cohort.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint

Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Joseph R. Ecker (ecker@salk.edu).

Materials availability
This study did not generate new unique reagents.

Materials availability
This study did not generate new unique reagents.

Data and code availability
De-identified molecular data and sample metadata is available through controlled access via dbGaP PHS Accession phs003204.v1.p1.

HIV Cohort:
The iPrEx cohort (Iniciativa Profilaxis Pre-Exposición, or the PreP pre-exposure prophylaxis initiative) was a phase III clinical trial aimed at determining the efficacy of pre-exposure prophylaxis (PrEP), an antiretroviral treatment, for preventing HIV infection (Grant et al., 2010).
The study enrolled high-risk individuals who were continuously monitored for the onset of HIV-1 infection. During the study, some participants became HIV-1 positive. We obtained samples from . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint nine donors taken approximately ~250 days (median of 258 days) before testing positive for HIV-1 (pre), on the day of the HIV-1 positive blood draw (acute), and approximately ~200 days after treatment (chronic). We observed significant variation in the initial virus load and the load after 200 days during the chronic stage when standard HIV-1 treatment was provided. For this study, we utilized peripheral blood mononuclear cell (PBMC) samples from nine subjects at three time points (see Supplementary Table S1).

Influenza Cohort:
The BARDA-Vaccitech FLU010 Study with A/BELGIUM/4217/2015 (H3N2) challenge aimed to assess the protective capabilities of a vaccine candidate, VTP-100, as a standalone influenza vaccine. We obtained pre-and post-challenge samples from 18 donors who received the placebo vaccine (see Supplementary Table S1), with post-challenge sampling conducted 28 days after the challenge dose.

COVID-19 Cohort:
Patients who were hospitalized following SARS-CoV-2 infection, with similar COVID severities based on the WHO ordinal severity score between 4 and 7, were enrolled in this cohort. Some patients showed improvement with moderate disease (non-severe group), while others experienced clinical deterioration within a 14-day period (increased severity score) and required invasive mechanical ventilation or succumbed to the disease (severe group). We obtained samples from nine patients in each group (see Supplementary Table S1).

MRSA-MSSA Cohort:
We collected samples from 19 patients who tested positive for either Methicillin-resistant  Supplementary Table S1). The time intervals between the second and third time points ranged from 3 to 35 days since the first positive test.

Bacillus anthracis Cohort:
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint We obtained PBMC samples from 27 individuals who were vaccinated against anthrax and worked with Bacillus anthracis in a controlled facility while wearing personal protective equipment (see Supplementary Table S1). These individuals were trained technicians working in Biosafety precursor molecules to chemical agents, biological warfare agents, explosives, precursors to explosives, agricultural chemicals, or radioactive materials in a controlled environment.

Organophosphates Cohort:
Organophosphates (OP) are a class of pesticides known to have a severe impact on the dopaminergic and serotonergic systems. One commonly used form of this pesticide, Chlorpyrifos, has extensive use in the United States. Exposure to chlorpyrifos, the most widely used OP in the US, was estimated based on levels of its urinary metabolite 3,5,6-trichloro-2-pyridinol (TCPY) and classified as high, moderate, or low (with undetectable levels of TCPy). In this study, we analyzed DNA methylation patterns in 18 high-exposure, six moderate-exposure, and three low-exposure samples (see Supplementary Table S1).

Control Cohorts:
We obtained PBMC samples from 12 healthy donors through a commercial vendor (Dx Biosamples LLC). These donors represented a range of ethnic diversity, age groups, and sexes (see Supplementary Table S1). Together with the pre-HIV samples and the pre-influenza challenge samples mentioned above, we obtained a total of 34 control PBMC samples.
De-identified human PBMC samples were provided by the collaborating teams (listed in the author contributions section). Informed consent was obtained from the donors by the collaborating teams and their respective institutional IRBs.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023.

Library preparation and Illumina sequencing
For library preparation, we followed the previously described methods for bisulfite conversion and

Single-cell methylation data processing (alignment, QC)
For alignment and quality control (QC) of the single-cell methylation data, we employed the same mapping strategy used in our previous single-cell methylation projects in our lab .
Specifically, we utilized our in-house mapping pipeline, YAP (https://hq-1.gitbook.io/mc/), for all the mapping-related analysis. The pipeline includes the following main steps: (1) demultiplexing FASTQ files into single cells, (2) reads-level QC, (3) mapping, (4) BAM file processing and QC, . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 30, 2023. ; https://doi.org/10.1101/2023.06.29.546792 doi: bioRxiv preprint and (5) generation of the final molecular profile. Detailed descriptions of these steps for snmC-seq2 can be found in the work by Luo et al. (2018). All the reads were mapped to the human hg38 genome, and we calculated the methylcytosine counts and total cytosine counts for two sets of genomic regions in each cell after mapping.
We filtered out low-quality cells based on three metrics generated during mapping: mapping rate > 50%, final mC reads > 500,000, and global mCG > 0.5. Chromosomes X, Y, and M were excluded from the analysis, and the remaining genome was divided into 5 kb bins to create a cellby-bin matrix. In this matrix, each bin was assigned a hypomethylation score (hypo-score) calculated from the p-values of a binomial test, which indicates the probability of hypomethylation of that bin. The matrix was further binarized for downstream analysis using a hypo-score cutoff of >= 0.95. Bins covered by fewer than 5 cells and those with an absolute z-score greater than 2 were filtered out. Additionally, we excluded bins that overlapped with the ENCODE blacklist using To annotate the cells, we used both the single-cell methylation clustering results and cell surface markers. In almost every cohort, we observed two clusters of B cells and NK cells, which were distinguished by their global mCG levels. Therefore, we assigned these clusters as naive and memory B cells, naive and active NK cells. We also merged clusters with cell surface markers indicating memory CD4 and CD8 T cells, even if they exhibited multiple clusters in the T-SNE embedding.

Differentially methylated regions (DMR) identification
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Within each subgroup, rows and columns were clustered using ward linkage and the Jaccard metric.

Differentially methylated loci (DML) identification
We used DSS (Park and Wu 2016) to call DMLs for each exposure. The samples were grouped based on exposure types, and pairwise comparisons were made to identify DMLs. To optimize memory and time usage, DSS was run on each chromosome, and the results were merged to calculate the false discovery rate (FDR) using the statsmodels package. We compared the DMRs from methylpy with the DMLs from DSS by examining the heatmaps of mCG levels at these sites in all samples. After evaluation, we decided to use the DMRs from methylpy for downstream analysis.

Validation of DMRs by shuffling the samples
To validate that the identified DMRs for each exposure were not confounded by batch effects or other factors, we shuffled the group labels of the samples within each exposure and identified DMRs among the randomly assigned groups. We quantified the methylation levels of all samples at the DMRs from the random groups and performed t-tests on the methylation levels between each pair of groups.

Motif enrichment
We obtained the hypo-and hyper-DMRs reported by methylpy from the columns 'hypermethylated_samples' and 'hypomethylated_samples'. HOMER was used to identify enriched motifs within these different sets of DMRs for each exposure. The results from HOMER's 'knownResults.txt' output files were used for downstream analysis. Only motif enrichments with a p-value < 0.01 were retained. The motif enrichment results were visualized using scatterplots in seaborn.

Functional enrichment of DMRs
To perform functional enrichment analysis of DMLs, we utilized GREAT (http://great.stanford.edu/public/html/index.php).

Integration with single-cell ATAC
We integrated our single-cell methylation data with single-cell ATAC-seq data from HIV-1, COVID-19, BA, and OP cohorts, as well as single-cell multiome data from influenza. This integration was performed using Canonical Correlation Analysis (CCA), where we transferred our methylation cell annotations to the cells from the other modality. To generate the peaks and bigwig files for each cell type, we utilized SnapATAC2 (Zhang et al., 2021;Fang et al., 2021).

Correlation of single-cell methylation and single-cell ATAC
To assess the correlation between single-cell methylation and single-cell ATAC, we calculated the correlation between the hypo-score of each 5 kb bin and Tn5 insertions in each bin. This correlation was performed between cell types and within matched cell types.

Prediction of exposures using sub-clusters, genes, and DMRs
To predict different exposures, we utilized three pieces of information: sub-clusters, genes, and differentially methylated loci (DMLs).For sub-cluster prediction, we performed clustering within each sorted cell type from all exposures and identified sub-clusters based on Leiden clustering.
We divided the samples into training and test sets at a ratio of 6:4. Then, we generated 1000 pseudo-individuals by permuting the training and testing samples (700 as training and 300 as the test set). From each pseudo-individual, we randomly sampled 500 cells and calculated their distribution across the sub-clusters. We used this distribution to predict different exposures.
For gene-based prediction, we utilized the methylation level of genes. For each cell type, we generated 400 pseudo-individuals, with 300 as training samples and 100 as the test set. The samples were divided into training and testing sets at a ratio of 7:3. We calculated the gene body methylation level for all genes and used this information to predict different exposures.Similarly, for DML-based prediction, we utilized DMLs to predict different exposures. The process was similar to the gene-based prediction, where we quantified the DMLs by methylation level and used this information to predict different exposures within each cell type.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made               . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made