CHD8 Regulates Cellular Homeostasis and Neuronal Function Genes Across Multiple Models of CHD8 Haploinsufficiency

The packaging of DNA into chromatin determines the transcriptional potential of cells and is central to eukaryotic gene regulation. Recent sequencing of patient mutations has linked de novo loss-of-function mutations to chromatin remodeling factors with specific, causal roles in neurodevelopmental disorders. Characterizing cellular and molecular phenotypes arising from haploinsufficiency of chromatin remodeling factors could reveal convergent mechanisms of pathology. Chromodomain helicase DNA binding protein 8 (CHD8) encodes a chromatin remodeling factor gene and has among the highest de novo loss-of-function mutations rates in patients with autism spectrum disorder (ASD). Mutations to CHD8 are expected to drive neurodevelopmental pathology through global disruptions to gene expression and chromatin state, however, mechanisms associated with CHD8 function have yet to be fully elucidated. We analyzed published transcriptomic and epigenomic data across CHD8 in vitro and in vivo knockdown and knockout models to identify convergent mechanisms of gene regulation by CHD8. We found reproducible high-affinity interactions of CHD8 near promoters of genes necessary for basic cell functions and gene regulation, especially chromatin organization and RNA processing genes. Overlap between CHD8 interaction and differential expression suggests that reduced dosage of CHD8 directly relates to decreased expression of these genes. In addition, genes important for neuronal development and function showed consistent dysregulation, though there was a reduced rate and decreased affinity for CHD8 interactions near these genes. This meta-analysis verifies CHD8 as a critical regulator of gene expression and reveals a consistent set of high affinity CHD8 interaction targets observed across human and mouse in vivo and in vitro studies. Our findings highlight novel core functions of CHD8 and indicate direct and downstream gene regulatory impacts that are likely to be associated with neuropathology underlying CHD8-associated neurodevelopmental disorder.

. Analysis of all datasets was performed using the same pipeline with quality control steps 227 and study-specific exceptions for consistency and covariate and batch structure as described in 228 original publication ( Figure 1A). Unsurprisingly, relative gene expression levels varied widely 229 across studies, with principle components of variation dominated by species of origin and 230 experiment ( Figure 1B). However, pairwise comparisons between DEGs from individual 231 datasets revealed specific similarities in gene expression changes. For example, comparison of 232 DEGs at the p < 0.01 cutoff level between the Gompers et al. (2017) and the Sugathan et al. 233 (2014) datasets revealed a strong positive correlation in direction of differential gene expression, 234 where genes that were significantly up-or down-regulated in one dataset followed the same 235 pattern in the other ( Figure 1C). Further pairwise comparisons between studies and expression 236 for specific genes can be done using our interactive web browser available at 237 https://nordlab.shinyapps.io/rna_browser/. This interactive resource allows for analysis of 238 principle components, differential expression of individual genes, and overall differential 239 expression patterns for all included datasets (Supplemental Figure 1). New data from CHD8 240 models will be added to this site as they are published and available.

242
Considering expression of CHD8 itself, most knockdown and heterozygous knockout 243 models resulted in a 50-60% significant decrease in mRNA ( Figure 1D). However, published 244 data from some models only showed a subtle decrease or even a significant increase in CHD8. 245 We verified that these findings were consistent with originally published RNA-seq data. The To examine patterns of transcriptional dysregulation associated with knockdown or 266 heterozygous mutation to CHD8, we performed gene set enrichment analysis of Gene Ontology 267 (GO) terms including datasets with at least 500 differentially expressed genes (Figure 2). 268 Specific GO terms were chosen based on observed changes or interest to the field based on 269 previous findings (for example, "canonical Wnt signaling pathway"). The full list of GO terms and relative enrichment is also provided (Supplementary Figure 2). While relatively small 271 numbers of individual genes showed overlapping significant changes in expression across pairwise study comparisons, we found strong correlation in DEG functional groups across 273 studies at the gene set level. This analysis identified two general signatures of differential gene 274 expression across published models. The first signature was characterized by downregulation of 275 genes annotated to terms related to the regulation of chromatin, transcription, and RNA 276 processing, which we refer to as general cellular homeostasis (homeostasis) genes. As a whole, 277 these are genes that do not exhibit cell specificity and are necessary for basic cell functions, such 278 as chromatin organization, transcription and translation, and mitosis. This includes terms such as 279 "RNA splicing," "regulation of gene expression," and "cell cycle." The second signature 280 encompassed terms related to neuronal development, maturation, and function, including terms 281 associated with neural progenitor activity and lineage specification, synaptic function, and cell 282 adhesion. We refer to these genes as neuronal function (neuronal) genes, and these genes showed 283 both down-and up-regulation depending on the model. Examples of these terms include "neuron 284 differentiation," "axon guidance," and "cell adhesion."  In vitro models were more likely to have neuronal terms represented while in vivo models were 293 more likely to have both, or only homeostatic terms, represented. There is also some indication 294 that in vivo models of postnatal brain were more likely to have enrichment of neuronal terms 295 while models of embryonic brain were more likely to have enrichment of homeostasis terms, but 296 this remains a preliminary assessment requiring more robust data across developmental stages.

297
Hierarchical clustering of all RNA-seq datasets reinforced this pattern, though enrichment of 298 these signatures was weaker for datasets with fewer than 500 differentially expressed genes 299 (Supplementary Figure 3). We note that there were also GO terms enriched only in individual 300 datasets (Supplementary Figure 2). Overall, our results suggest that CHD8 knockdown or 301 heterozygous knockout consistently influences homeostatic and neuronal pathways, which are 302 likely to drive the cellular, anatomical, and behavioral pathology reported in studies of CHD8 303 haploinsufficiency.

307
We reanalyzed a total of 49 ChIP-seq sequencing libraries from 8 studies of CHD8 308 genomic interaction patterns ( Table 2). Analyzed datasets represented both neuronal and non-309 neuronal model systems. We included both in vivo tissue preparations and in vitro culture models 310 from neuronal and non-neuronal fate cells to allow additional examination of tissue or cell-type 311 specificity of CHD8 interactions. Half of the datasets were generated from bulk mouse tissue at 312 adult (3 studies; Gompers et al. 2017, Katayama et al. 2016, Platt et al. 2017) and embryonic (2 313 studies; Katayama et al. 2016, Cotney et al. 2015 timepoints allowing for investigation of CHD8 314 interactions in vivo across time. The remaining data were generated from cellular models, with 315 two studies using human neuronal lineage cells (Sugathan et al. 2014, Cotney et al. 2015, two 316 using mouse or human cancer cell lines (Ceballos-Chavez et al. 2015, Shen et al. 2015, and one 317 using mouse embryonic stem cells (de Dieuleveult et al. 2016). ChIP-seq data were analyzed using the same steps for immunoprecipitated, or experimental, and control data in our analysis 319 pipeline ( Figure 3A). There was large variation in number of called peaks, likely due to 320 experimental design and technical differences ( Figure 3B). Eleven of the control ChIP-seq 321 libraries were found to have more than 250 called peaks with strong promoter enrichment 322 ( Figure 3B-C), suggesting some level of technical artifact associated with chromatin preparation 323 (Marinov et al. 2014). Considering these experimental issues, control ChIP-seq libraries with 324 >250 peaks were included in the analysis to test for similarity between technical artifacts and 325 CHD8 immunoprecipitated signatures in these datasets.

327
Across all ChIP-seq datasets, CHD8 genomic interactions most commonly occurred near 328 promoters ( Figure 3C). Furthermore, binding to promoter-defined peaks tended to approach 329 100% as the number of called peaks decreased, suggesting that higher affinity interactions for 330 CHD8 are largely at promoters. Increased affinity and frequency of promoter interactions by 331 CHD8 was clearly evident in the coverage data signal for both mouse tissues ( Figure 4A) and    Given that CHD8 targets a consistent set of promoters with a high level of affinity as well 367 as an expanded set of loci with lower levels of affinity and CHD8 knockdown or heterozygous 368 knockout causes changes to gene expression, we tested whether CHD8 genomic interactions 369 directly relate to changes in gene expression in CHD8 models. Most genes with CHD8 370 interactions at or distal to the promoter did not exhibit significant changes in gene expression, 371 regardless of the study, suggesting that there are additional determinants regarding sensitivity of 372 regulatory target genes to CHD8 dosage. While we did observe consistent patterns of overlap 373 between CHD8 targets and downregulated DEGs, upregulated DEGs were not enriched across 374 studies for CHD8 genomic interactions. Regardless of the experiment, CHD8 interaction affinity 375 was also strongest for genes that were more highly expressed (Supplementary Figure 5).

377
For a subset of RNA-seq results, there was a strong overlap between CHD8 target genes 378 from the ChIP-seq data and downregulated DEGs involved with cellular homeostasis. For

389
Genes associated with cellular homeostasis also tend to be at the high end of transcript 390 expression level distributions, suggesting a relationship between highly expressed genes and 391 dosage-sensitive CHD8 regulatory function. However, high levels of expression alone did not 392 predict CHD8 interaction or DEG, indicating that expression level does not solely determine 393 CHD8 interactions or sensitivity of regulatory targets to reduced CHD8 dosage. This trend of 394 negative correlation between CHD8 interaction affinity and changes in gene expression was less 395 apparent with datasets having fewer than 500 differentially expressed genes and for the datasets 396 (including many generated from in vitro models) where homeostasis gene expression signatures 397 were not present (Supplementary Figure 6, Figure 2).

399
While our findings show model-specific variation, the patterns present across CHD8 400 studies suggest a consistent relationship where reduced expression of CHD8 leads to 401 downregulation of CHD8 target genes associated with the cellular homeostasis signature, such as 402 genes involved in cell cycle, chromatin organization, and RNA transcription and processing.

403
These changes are seemingly stronger in in vivo models representing early stages of brain 404 development, though they are still present in some models representing more mature brain tissue 405 and cell types (Figure 2). In contrast, the observed differential expression of neuronal 406 differentiation and neuronal function (e.g. synaptic) genes tends to occur in models representing 407 more mature neuronal tissue or cell types and differentiated culture models inherently containing 408 heterogenous cell populations at unknown stages of development.

411
This meta-analysis of published genomic datasets from in vitro and in vivo mouse and 412 human studies revealed both consistent and study-specific effects of CHD8 haploinsufficiency 413 on gene expression and largely concordant high-affinity CHD8 genomic interaction loci. We note a number of technical issues that impacted this meta-analysis, many of which 439 are associated with variation in methods and sequencing depth. Surprisingly, we found 440 considerable differences in CHD8 expression across models despite the common design of 441 testing the impacts of haploinsufficiency. Though we did not find an obvious correlation between 442 CHD8 transcript levels and up-or downregulated gene expression, it seems likely that 443 differences in experimental design, including CHD8 knockdown or knockout, contributed toward 444 meaningful variation between models. Changes to CHD8 dosage have been shown to have strong 445 and potentially opposing effects on cellular function. For instance, homozygous knockout of 446 CHD8 has been described to cause severe developmental arrest and widespread apoptosis 447 leading to early embryonic lethality (Nishiyama et al. 2009) while heterozygous mutation can 448 lead to increases in proliferation (Gompers et al. 2017 Despite the limitations of comparing genomic datasets across variable models, our 462 analysis challenges two simple models regarding pathological mechanisms of CHD8 463 haploinsufficiency. The first model the transcriptional signatures present across studies refute is 464 that pathology due to CHD8 haploinsufficiency is primarily due to alterations in patterning 465 during early brain development. While our meta-analysis clearly supports impacts to 466 proliferation and neuronal differentiation consistent with published findings on proliferation and 467 brain volume (Bernier et al. 2014, Gompers et al. 2017, Katayama et al. 2016, Platt et al. 2017 Durak et al. 2016), we also observed evidence of dysregulation of genes involved in mature 469 neuron function, including synaptic genes. This is consistent with observation that CHD8 is still 470 highly expressed in adulthood (Gompers et al. 2017, Platt et al. 2017, Maussion et al. 2015, that 471 mutations to CHD8 continue to lead to differential gene expression and behavioral phenotypes in 472 adult mice (Gompers et al. 2017, Katayama et al. 2016, Platt et al. 2017, and with limited 473 evidence of synaptic dysfunction associated with Chd8 haploinsufficiency (Platt et al. 2016).

474
Further work will be required to establish the role and requirement for CHD8 in mature neurons 475 and other cell types in the brain.

477
Second, the signatures present in this meta-analysis suggest that pathology observed in 478 CHD8 models and patients with CHD8 mutations is not due to targeted impacts to specific 479 populations of cell-types or due to impacts limited to specific brain regions. In this analysis of haploinsufficiency. Despite evidence that these genes are not high-affinity CHD8 targets, we did 485 observe enrichment of differentially expressed neuronal genes in the CHD8 interaction analysis.

486
One explanation for this finding is that CHD8 haploinsufficiency indirectly causes large-scale dosage. Our results raise two questions that could be addressed by application of RNA-seq and impacts of CHD8 haploinsufficiency in the developing and mature brain, and 2) Does CHD8 503 have context-dependent function in specific stages, cell types, and regions with regard to 504 genomic interaction patterns? Beyond addressing these two key issues, additional clarity 505 regarding the role of CHD8 in the brain will come from studies examining the molecular 506 interaction partners and impacts on chromatin, transcription, and RNA processing. As        Examples of CHD8 binding near promoters of select chromatin (ADNP, SUV420H1), RNA processing (TRA2B), and neuronal function (CALM2) genes in the mouse (top) and human (bottom) ChIP-seq datasets.  A metabolic process ion process c process anization e expression scription, DNA−templated proliferation gnaling pathway ation n development rix organization anization mbly ne all GTPase mediated signal transduction ne transport l death 0 2 4 6 8 10 Ceballos    RNA-seq datasets. Heat maps and scatter plots of gene expression changes are also available. All plots and tables generated using Shiny can be downloaded from the app. Datasets can be analyzed using pseudo counts or relative expression.
Supplementary Figure 2 (See Supplementary Figure File) Full list of terms from the gene ontology analysis using goseq. Terms were selected from this list to create Figure 2. All RNA-seq datasets were included in this figure. All terms included met an FDR < 0.05 cutoff.  No obvious primary motif associated with CHD8 binding. Each dataset was analyzed using HOMER to look for common motifs enriched in CHD8 ChIP-seq datasets. ELF1, ELK1, E2F, CTCF, and YY1 transcription factors were the motifs that were commonly represented across datasets. Ceballos  Chd8 regulates highly expressed genes. Each dataset is labelled showing changes in sequencing coverage according to changes in CHD8 binding affinity. The Chd8 ChIP-seq dataset used was from Platt et al. 2017. Full models for each dataset were chosen as they exhibited similar signal as the individual timepoint or brain region datasets.
Supplementary Figure 6 Remaining fold change plots from the CHD8 binding by differential gene expression comparison analysis. All datasets were analyzed using the Platt et al. 2017 Chd8 ChIP-seq dataset. Datasets are loosely organized based on overlap between downregulated genes, no clear trend, or upregulated genes from top to bottom, which sometimes spanned multiple rows.