Epigenetic and Transcriptional Dysregulation in CD4+ T cells of Patients with Atopic Dermatitis

Atopic dermatitis (AD) is one of the most common skin disorders in children. Disease etiology involves genetic and environmental factors, with the 29 independent AD risk loci enriched for risk allele-dependent gene expression in the skin and CD4+ T cell compartments. We investigated epigenetic mechanisms that may account for genetic susceptibility in CD4+ T cells. To understand gene regulatory activity differences in peripheral blood T cells in AD, we measured chromatin accessibility (ATAC-seq), NFKB1 binding (ChIP-seq), and gene expression (RNA-seq) in stimulated CD4+ T cells from subjects with active moderate-to-severe AD and age-matched, non-allergic controls. Open chromatin regions in stimulated CD4+ T cells were highly enriched for AD genetic risk variants, with almost half of AD risk loci overlapping with AD-dependent ATAC-seq peaks. AD-specific open chromatin regions were strongly enriched for NFκB DNA binding motifs. ChIP-seq identified hundreds of NFKB1-occupied genomic loci that were AD-specific or Control-specific. As expected, the AD-specific ChIP-seq peaks were strongly enriched for NFκB DNA binding motifs. Surprisingly, Control-specific NKFB1 ChIP-seq peaks were not enriched for NFKB1 motifs, instead containing motifs for other classes of human TFs, suggesting a mechanism involving altered indirect NFKB1 binding. Using DNA sequencing data, we identified 63 instances of genotype-dependent chromatin accessibility at 36 AD risk variants (30% of AD risk loci) that could lead to genotype-dependent expression at these loci. We propose that CD4+ T cells respond to stimulation in an AD-specific manner, resulting in disease and genotype-dependent chromatin accessibility involving NFKB binding. AUTHOR SUMMARY Stimulated CD4+ T cells from patients with atopic dermatitis have disease-dependent regulation of how gene expression is regulated. This regulation is disease dependent and the way the DNA is accessible and the transcription factor NFKB1 binds is enriched for genetic risk variants. Clinically, the CD4+ T cells in the peripheral blood of patients with AD respond to stimulation in a disease and genotype-dependent manner.


ABBREVIATIONS: 75
Allelic Reproducibility Score (ARS) 76 Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq) 77 Atopic dermatitis (AD) 78 in the skin as well as the disruption in the skin barrier [18,19]. An important role for NFkB 123 in CD4+ T cells in AD was established in a mouse model of AD in which mice injected 124 with CD4+ T cells with inhibited NFkB signaling showed marked improvement in AD-like 125 skin lesions compared to those injected with CD4+ T cells with a control vector [20]. 126 Herein we hypothesized that AD loci may be epigenetically regulated. In order to 127 test this hypothesis, we first focused on measuring the chromatin accessibility, NFKB1 128 binding, and gene expression in stimulated CD4+ T cells from subjects with active 129 moderate-to-severe AD, along with age and ancestry-matched healthy, non-allergic 130 controls. We identified 34,216 regions of chromatin across the genome that are 131 accessible in an AD-dependent manner. These regions are highly enriched for DNA 132 sequence motifs recognized by NFkB transcription factors. We therefore performed ChIP-133 seq for NFKB1 in the AD and control individuals, identifying 20,322 genomic loci with AD-134 dependent NFKB1 occupancy. Whole genome sequencing of these individuals and 135 application of our MARIO method to identify allelic activity [21] revealed 63 instances of 136 genotype-dependent chromatin accessibility at 36 AD risk variants that might lead to the 137 genotype-dependent gene expression at these loci. Collectively, our finding demonstrate 138 that the pathoetiology of AD involves epigenetic changes in CD4+ T cells, especially via 139 NFkB-mediated gene expression regulated mechanism. 140

RESULTS: 141
We created a set of 3,143 AD-associated genetic risk variants at 29 independent risk loci 142 (Supplemental Table 1  CD4+ T cells, along with skin (sun exposed and sun unexposed) ( Table 1). This analysis 147 indicates that alteration of gene regulatory mechanisms in CD4+ T cells is likely an 148 important factor underlying AD-associated genetic risk. 149 We recruited six moderate-to-severe AD patients (average EASI score of 30) and 150 six age and ancestry-matched controls without known deleterious mutations in the 151 Fillagrin gene. Demographics are indicated in Table 2. Adults with persistent AD had 152 childhood onset of the disease. The mean total IgE among AD subjects (180.8 kU/L) was 153 higher than among controls (61.7 kU/L). Peripheral blood was obtained from each subject 154 and CD4+ T cells were isolated and stimulated for 45 hours with anti-CD3/CD28 beads 155 (Figure 1). 156

Global mapping of the chromatin accessibility landscape in AD CD4+ T cells 158
We performed assay for Transposase-Accessible Chromatin followed by 159 sequencing (ATAC-seq) to identify genome-wide chromatin accessibility. The data 160 obtained were of high quality, with an average of almost 70,000 peaks per dataset, an 161 average Fraction of Reads Inside of Peaks (FRiP) score of 0.32, and an average 162 transcription start site (TSS) enrichment score of 20.5 (Supplemental Table 2). Pairwise 163 comparisons of each dataset identified strong agreement between subjects within cases 164 and controls (Supplemental Figure 1 A and B). 165 We assessed the the overlap of chromatin accessibility data with AD genetic risk 166 variants using the RELI method [21]. Seven of the twelve ATAC-seq datasets were 167 significantly enriched for AD risk loci with 7-14 overlapped risk loci for each subject (RELI 168 pcorrected: 0.01-1.6x10 -3 ) (Supplemental Table 3). 169 In a pairwise assessments performed using MAnorm [24], most ATAC-seq peaks 170 were shared between AD patients and demographically matched controls (75.0-88.4%). 171 The remaining peaks were either stronger in AD (AD-specific) or in Control (Control-172 specific) (representative analysis Figure 2 A, full results in Supplemental Figure 2). We 173 identified 34,216 regions of chromatin across the genome that were accessible in an AD-174 dependent manner, yielding 409 AD-specific and 398 Control-specific peaks that are 175 present in three or more pairs (Supplemental Figure 3). We defined these ATAC-seq 176 peaks that were AD-specific or control-specific in three or more subject pairs as 177 "consistently AD-specific" and "consistently control-specific" peaks, respectively. 178 Consistently AD-specific ATAC-seq peaks overlapped AD-associated genetic risk 179 variants at 13 of the 29 AD risk loci (2.0-fold enrichment, pcorrected=0.015) (Supplemental 180 Table 3). These results indicate that chromatin is accessible in a disease-specific manner 181 in CD4+ T cells at almost half of the AD risk loci. 182 To identify potential transcription factors (TFs) whose binding might be affected by 183 differential chromatin accessibility, we performed TF binding site motif enrichment 184 analysis on AD-and control-specific ATAC-seq peaks. These analyses revealed that 185 NFkB DNA binding motifs were more strongly enriched in the AD-specific vs control-  Table 4). Motif enrichment analysis in consistently AD-specific or consistently control-189 specific peaks confirmed that NFkB binding motifs were highly enriched in an AD-specific 190 manner, with ~15% of consistently AD-specific peaks containing predicted NFKB binding 191 sites (P<10 -13 ), and only ~3% of consistently control-specific peaks (P=1) (Figure 2 C-D,  192 and Supplemental Table 5). These data highlight the potential for more robust direct between consistently AD-specific and consistently control-specific ATAC-seq peaks. 210 "Consistently specific" peaks were defined as those peaks that were AD-or control-211 specific in at least three cases or controls, respectively. Results are shown for 212 representative Cis-BP NFKB motif M05887_2.00. Full results are provided in 213 Supplemental Table 5. 214 215 216

NFKB1 binds in an AD specific manner at hundreds of genomic loci in CD4+ T cells 217
We performed NFKB1 (p50) chromatin immunoprecipitation (ChIP-seq) 218 experiments to measure NFκB binding to the genome of stimulated CD4+ T cells, 219 obtaining an average of ~11,000 peaks per subject, with an average FRiP score of 0.012 220 (Supplemental Table 2). All ChIP-seq peak datasets had highly significant overlap with 221 a previously published CD4+ T cell NFKB1 ChIP-seq dataset (GSE126505) 222 (Supplemental Table 6). As expected, the NFkB DNA binding motif was highly enriched 223 in each of our NFkB ChIP-seq datasets (Cis-BP NFkB motif M05887_2.00 enrichment: 224 10 -4158 < P < 10 -123 (Supplemental Table 7)). We identified 20,322 genomic loci with AD-225 dependent NFKB1 occupancy. AD-specific NFKB1 ChIP-seq peaks were enriched for 226 overlap with AD-specific ATAC-seq peaks in all six pairs (between 5.9 and 38.1-fold 227 enrichment, 3.00x10 -25 < p < 3.20x10 -203 ) (Supplemental Table 8). There was 228 substantially more variability between subject matched pairs in the NFKB1 ChIP-seq 229 experiments compared to the ATAC-seq experiments, with a median of 51.5% shared 230 NFKB1 peaks vs. a median of 91.9% shared ATAC-seq peaks (Figure 3). These results 231 indicate substantially more differential NFKB1 binding than chromatin accessibility in AD 232 subjects compared to matched controls. 233 We next sought to identify AD-and control-specific NFKB1 binding events by 238 performing a pairwise assessment of NFKB1 peaks in cases vs controls (see Methods). 239 This procedure identified shared, control-specific, and AD-specific NFKB1 peaks. An 240 exemplary pair shown is in Figure 4A, with all pairs shown in Supplemental Figure 4. 241 Strikingly, NFkB binding sites were more strongly enriched in the AD-specific NFKB1 242 ChIP-seq peaks compared to the matched control in five of the pairs (Supplemental 243 Figure 4 and Supplemental Table 7, example pair shown in Figure 4B). We defined 244 those NFKB1 peaks that were AD-specific or control-specific in three or more subject 245 pairs as "consistently AD-specific" and "consistently control-specific" peaks, respectively 246 (Supplemental Figure 3 C-D). In total, we identified 143 and 80 AD-specific and control-247 specific NFKB1 ChIP-seq peaks, respectively. Motif enrichment analysis revealed that 248 NFKB binding motifs were also the most highly enriched motif class within consistently 249 AD-specific NFKB1 peaks (Figure 4 C-D and Supplemental Table 9). In strong contrast, 250 consistently control-specific peaks were not enriched for NFKB motifs, instead enriching 251 for a wide range of motif classes (Supplemental Figure 5 A-B and Supplemental Table  252 9). Collectively, these results indicate that AD-specific NFKB ChIP-seq peaks strongly 253 enrich for NFKB1 motifs, while control-specific NFKB peaks surprisingly do not. Comparison across data types confirms strong concordance between RNA-seq, 276

ATAC-seq, and NFKB1 ChIP-seq 277
We next measured gene expression levels in CD3/CD28-stimulated CD4+ T cells 278 from each subject in this study, with the goal of integrating these data with chromatin 279 accessibility and NFKB1 binding. In a case-control pairwise analysis, 15 genes were 280 expressed at least 1.5-fold higher in the stimulated CD4+ T cells from the patient with AD 281 compared to the matched control, and 16 genes were expressed 1.5-fold lower in the 282 case compared to the matched control (Supplemental Table 10). These 31 genes were 283 enriched for AD-related processes such as the "regulation of immune system processes", 284 "lymphocyte activation", and "cytokine-mediated signaling pathway" GO biological 285 processes as well as "cytokine receptor binding", "nitric oxide synthase binding", and 286 "RNA-polymerase II-specific DNA-binding transcription factor binding" GO molecular 287 functions (Supplemental Figure 6). 288 The 100 kB region of DNA around AD-specific gene sets widely overlapped (94.7-289 100%) the ATAC-seq peaks in the six subjects with AD (Supplemental Table 8). There  Table 8). 295 In five of the six pairs, AD-specific NFKB1 ChIP-seq peaks overlapped a large proportion 296 of the AD-specific genes (42.1-73.4%) (Supplemental Table 8). Collectively, these data 297 indicate strong agreement between AD-and control-specific gene expression, chromatin 298 accessibility, and NFKB1 binding. 299

Allele-dependent chromatin accessibility at AD risk loci 300
Increasing evidence points to an important role for allele-dependent gene 301 regulatory mechanisms in many diseases [21,[25][26][27]. To identify such events, we 302 performed whole genome sequencing of all subjects to identify the alleles present at AD 303 genetic risk variants (Supplemental Table 11). We integrated these data with the 304 functional genomics data produced in this study using the Measurement of Allelic Ratios 305 Informatics Operator (MARIO) method, which measures the allele-dependence of 306 sequencing reads at genetic variants that are heterozygous [21]. Collectively, there were 307 an average of 2.3 (0-5) heterozygous loci in AD cases and 2.6 (range 0-6) in controls 308 (Supplemental Table 11), providing 124 total opportunities to discover allele-dependent 309 ATAC-seq or NFKB1 peaks at AD genetic risk variants. 310 Sixty AD-associated variants are located within an ATAC-seq peak in at least one 311 subject and also heterozygous in that subject. Strikingly, 36 of these 60 (60%) variants 312 produced allele-dependent ATAC-seq peaks (Supplemental Table 12, Figure 5). 313 Collectively, the AD risk variants with allelic chromatin accessibility were found at nine 314 independent risk loci (31% of AD risk loci). At 16 of the AD risk variants that were 315 heterozygous and overlapped ATAC-seq peaks, we measured allelic imbalance across 316 multiple subjects. For example, at rs10791824 near the OVOL1 gene, we measured a 317 strong preference for the A allele across six individuals, with 85-100% of reads for all 318 subjects having an A (total of 112 vs 10, A vs T reads). 28 of the AD risk variants with 319 allelic ATAC were found to be eQTLs in stimulated CD4+ T cells based on DICE as 320 curated by the eQTL catalogue [28] (Supplemental Table 13); however, these 321 associations of allelic expression were not robust to multiple testing correction after 322 accounting for all of the eQTL measurements in that study (i.e. across many cell types 323 with and without stimulation). 324 In contrast to the large amount of observed allelic ATAC-seq peaks, only 6 unique 325 AD-associated variants were located within at least one NFKB1 ChIP-seq peak and also 326 heterozygous in any of the 12 subjects, and none of these demonstrated genotype-327 dependent activity. Future studies examining larger cohorts will be better powered to 328 potentially identify allelic NFKB1 binding activity. 329 Altogether, our data support a model in which stimulated peripheral blood CD4+ T 339 cells from patients with active AD have extensive differential chromatin accessibility and 340 NFKB1 binding relative to matched controls. We identify genotype-dependent chromatin 341 accessibility at 36 AD genetic risk variants. Collectively, this study highlights plausible 342 genetic risk mechanisms for AD through disease-specific epigenetic factors that are 343 enriched at AD genetic risk loci. Current databases of eQTLs are not specific to 344 participants with AD or other allergic diseases and they do not contain sufficient numbers 345 of participants to have the statistical power to robustly identify moderately sized eQTLs. 346 It will be important for the field to continue to curate control and AD-specific eQTL 347 datasets. It will also be important to perform molecular studies assessing genotype-348 dependent regulatory activity at AD risk variants in the context of CD4+ T cells to 349 definitively identify allelic transcriptional mechanisms at these loci. 350 In this study, we use a strong stimulation (CD3/CD28 crosslinking) to model 351 immune activation in patients. It is possible that differing levels of stimulation will reveal 352 additional disease specific differences in future studies. We used CD4+ T cells in our 353 functional genomic assays to support quantitative assessment of disease-specific and 354 genotype-dependent mechanisms. The disease and genotype-dependent effects 355 observed in this study can be refined and validated in very specific immune cell subsets 356 as the technology and analytical framework for quantitative comparisons at a single cell 357 level continue to mature. were processed and analyzed as described above for ATAC-seq. We also used publicly 468 available NFKB1 ChIP-seq datasets (GSE126505), which were processed using the 469 same analytical pipeline. allele-dependent mechanisms in our functional genomics datasets, we applied our 500 MARIO method [21]. In brief, MARIO identifies common genetic variants that are (1) 501 heterozygous in the assayed cell line (using NGS DNA sequencing data) and (2) located 502 within a peak in a given ChIP-seq or ATAC-seq dataset. MARIO then examines the 503 sequencing reads that map to each heterozygote within each peak for imbalance between 504 the two alleles. We report allelic accessibility and NFKB1 binding at AD genetic risk 505 variants in our ATAC-seq data with an Allelic Reproducibility Score (ARS) greater than or 506 equal to 0.4 which is considered significantly allelic [21]. 507

Data access 508
All raw and processed sequencing data generated in this study have been submitted to 509 the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under 510 accession number GSE184238. The reviewer token is urkzwwuszrsnrol.   n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a 17.2 (8-36) n/a 538 539