Eight principal chromatin states functionally segregate the fly genome into developmental and housekeeping roles

Different chromatin forms, or states, represent a fundamental means of controlling gene regulation. Chromatin states have been studied through either the distribution of histone modifications (e.g. 1–5) or, more rarely, via the occupancy of chromatin proteins 6–8. However, these two approaches disagree on the nature and composition of active chromatin states 2,9, and modelling chromatin via both histone marks and chromatin proteins has been lacking. Here, combining protein and histone mark profiles, we show that chromatin in Drosophila melanogaster is organised into eight principal chromatin states that have consistent forms and constituents across cell types. These states form through the association of the Swi/Snf chromatin remodelling complex, Polycomb Group (PcG)/H3K27me3, HP1a/H3K9me3 or H3K36me3 complexes with either active complexes (RNA Pol/COMPASS/H3K4me3/NuRF) or repressive marks (histone H1 and nuclear lamin occupancy). Enhancers, core promoters, transcription factor motifs, and gene bodies show distinct chromatin state preferences that separate by developmental and housekeeping/metabolic gene ontology. Within the 3D genome, chromatin states add a further level of compartmentalisation through self-association of topologically associated domains (TADs) of the same state. Our results suggest that the epigenetic landscape is organised by the binding of chromatin remodellers and repressive complexes, and that through chromatin states the genome is fundamentally segregated into developmental and housekeeping/metabolic roles.

these states formed through the association of the Swi/Snf, HP1a/H3K9me3, PcG/H3K27me3 or H3K36me3 complexes with either active complexes (RNA Pol/COMPASS/H3K4me3/NuRF) or repressive marks (histone H1 and nuclear lamin occupancy) (Fig. 1e). These constituents have established interactions that suggest a physical basis for chromatin states 10–17, most notably the refractory relationships between PcG/H3K27me3 and H3K36me3 15, and between Swi/Snf and H3K9me3/HP1 heterochromatin 12.

These eight chromatin states shared three repressive states in common with earlier protein 6 and histone mark studies 2,4: Polycomb Group (PcG) facultative heterochromatin, HP1 constitutive heterochromatin, and a silent "Black" (or "low-signal") chromatin state. This latter state was enriched for the linker histone H1 and nuclear lamin occupancy, devoid of other profiled proteins or histone marks, and shared similar genomic coverage with earlier classifications of

For core promoters that were differentially activated by either a developmental enhancer (dEnh) or a housekeeping enhancer (hkEnh) in S2 cells, the chromatin state of the four active core promoter clusters strongly predicted enhancer-core promoter ontology.

CP classes identified in S2 cells via transcription cofactor binding 25 (Fig. S11). Assigning chromatin states to CPs, we observed a significant depletion of Yellow chromatin in two CP types, and a significant enrichment over a further two types (Fig. 2e; Fig. S12). This divergence became even more pronounced on CPs that were differentially activated by a developmental or housekeeping enhancer (Fig. 2e). Combined, these data suggest that chromatin states fundamentally separate enhancer and core promoter regulatory elements along developmental and housekeeping lines.
Importantly, our chromatin state models allow us to relate enhancers and core promoters identified from ectopic, transgenic constructs to the chromatin contexts at their native locus, demonstrating that regulatory links between enhancers and core promoters translate to differing chromatin states in vivo.

Figure 3: Chromatin states are differentially enriched for TF motifs across cell types. (a) For each cell type, genes were grouped by state, enriched TF motifs searched for using RcisTarget, and the proportion of genes in each state for each enriched motif recorded. High-confidence motifs were then clustered across all cell types and by chromatin state. Highlighted TF annotations are coloured by the chromatin state they are found enriched within. (b) (i) The binding of the TF Cropped (Crp) in NSCs, determined via NanoDam 28, is (ii) strongly and significantly enriched in Yellow, but not Yellow-R chromatin; peaks (FDR<0.01) are illustrated.

These data fit well with recent findings that the Swi/Snf remodelling complex is bound at and required for developmental enhancer activation in S2 cells, specifically affecting developmental gene expression when depleted 26. This study also looked at the effect of depleting other chromatin remodelling complexes, including NuRF and INO80, finding no ontological preference for these enhancers and, in concordance with our chromatin state modelling, observing that NuRF bound indiscriminately over both classes of promoters 26. However, this work failed to profile the Tip60/p400 remodelling complex that also associates with the Yellow chromatin component MRG15 27, and it is tempting to speculate that this complex may be a specific remodeller for housekeeping regulatory elements.

search for significant enrichment of TF binding motifs associated with genes in different chromatin states, we observed a clear segregation of TF motifs by chromatin state (Fig. 3a). In particular, clusters of specific TF motifs were observed for Yellow and PcG chromatin states from all cell types. Motifs for the cell-growth regulating TF Cropped (crp) 30, the insulator BEAF-32 31, and the polarity and proliferation regulating TF Zif 32 were enriched in Yellow chromatin; whereas motifs for classic cell-fate and segmentation regulators such as Kr 33, ems 34 and bowl 35 were enriched in PcG chromatin (Figs. 3a, S13).

We also observed cell-type- and state-specific enrichment for some factors, such as the

In contrast to dividing cell types, terminally-differentiated neurons showed repression of many developmental gene clusters by HP1 heterochromatin.

Notably different from all other clusters, a single group of genes was present in PcG chro-

Although not significantly enriched for specific ontologies, investigation of the highest-ranking genes transitioning from Yellow-R in NSCs to Yellow states in neurons revealed a number of genes involved in synaptic structure and signalling, with scRNA-seq data (reanalysed from 39) confirming the change in expression status between cell types (d).

data 6 (Fig. 6a), we adapted Chrom3D 48,49 to model the 3D genome of Kc167 cells (Fig. 6b(i)).

Chrom3D uses TADs as its fundamental modelling unit, relying upon statistically-significant pericentric heterochromatin regions into a single structure termed the chromocentre (Fig. 6b(ii)).

Using the significantly-enriched inter-TAD associations called via Chrom3D, we then looked at whether chromatin states associated on a TAD level within the 3D genome. Although Yellow-R TADs were not frequent enough to call enrichment for (reflecting the overall low extent of Yellow-R chromatin in Kc167 cells), for all other states we observed a significantly-enriched self-association of chromatin states (Fig. 6c,d; Fig. S15). We also found weaker,

PcG and Black TADs found predominantly within B compartments, and the remaining chromatin states enriched within A compartments (Fig. 6d). Surprisingly, within A compartments we observed a clear separation of states by ontology, with long-range TAD interactions divided between housekeeping/metabolic (Yellow, HP1-A, HP1) and developmental (Swi/Snf, Swi/Snf-R) roles (Fig. 6d). These data fit well with previous work in mammalian cells suggesting that B compartments were separated into two sub-compartments, and A compartments into three sub-

In conclusion, we find that chromatin is organised into eight principal states across

Details of all previously published datasets used in this study are provided in Table S1. Data were reprocessed as described below.

For the mature neuronal dataset, flies were allowed to lay at 18°C for 24 hours in food vials.

Following sample collection, genomic DNA was extracted, cut with DpnI and cut fragments isolated. DamID adaptors were ligated to the isolated DNA, fragments were digested with DpnII, and amplified via PCR before next-generation sequencing library preparation.

Following the DamID procedure, samples were prepared for NGS as previously published 59.

All binding data were transformed to a log2 binding enrichment ratio in order to fit Gaussian HMMs to ChIP-seq data.
For datasets with input controls, the input control reads were normalised to the binding data by the average sample / input ratio per bin, with bins in the highest two deciles of sample signal excluded. The final binding ratio for each bin was log2(sample / normalised_input). For datasets without input controls, bins with no reads were excluded and a noise floor was then determined by finding the arg max of the kernel density function for the data. The final binding ratio was log2(sample / noise_floor). In both cases, pseudocounts were added per bin during ratio calculation.

(Table S1) was converted to release 6 of the Drosophila genome, converted to GATC resolution and scaled.

Hi-C data reprocessing

High-resolution DpnII Hi-C reads from Kc167 cells (Table S1)

Transcription factor motif enrichment was performed using the RcisTarget R package 29 together with the dm6-5kb-upstream-full-tx-11species.mc8nr.feather motif rankings.
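The two log2 binding-ratio routes described earlier in this section (input-normalised, and noise-floor based) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the pseudocount value and where it is applied, the Gaussian kernel, the normal-reference bandwidth rule and the grid search for the density maximum are all assumptions made for illustration.

```python
import math


def log2_ratio_with_input(sample, control, pseudocount=1.0):
    """Per-bin log2(sample / normalised_input) for data with an input control."""
    # Rank bins by sample signal and drop the top two deciles before
    # estimating the average sample/input scaling factor.
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    kept = order[: len(order) * 8 // 10]
    ratios = [(sample[i] + pseudocount) / (control[i] + pseudocount) for i in kept]
    scale = sum(ratios) / len(ratios)
    # Assumption: the pseudocount is added to both counts before scaling.
    return [
        math.log2((s + pseudocount) / (scale * (c + pseudocount)))
        for s, c in zip(sample, control)
    ]


def noise_floor(values, bandwidth=None, grid_points=200):
    """Arg max of a Gaussian kernel density estimate, searched on a regular grid."""
    n = len(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    if bandwidth is None:
        # Assumed default: normal-reference rule, h = 1.06 * sd * n^(-1/5).
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)
    grid = [lo + (hi - lo) * k / (grid_points - 1) for k in range(grid_points)]

    def density(x):
        # Unnormalised density; the constant factor does not move the maximum.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return max(grid, key=density)


def log2_ratio_without_input(sample, pseudocount=1.0):
    """log2(sample / noise_floor) for bins with reads, keyed by bin index."""
    covered = {i: s for i, s in enumerate(sample) if s > 0}  # empty bins excluded
    floor = noise_floor(list(covered.values()))
    return {
        i: math.log2((s + pseudocount) / (floor + pseudocount))
        for i, s in covered.items()
    }
```

In this sketch a bin whose signal matches the scaled input, or sits at the noise floor, scores close to zero, which is what makes the transformed values suitable for a Gaussian emission model centred near zero for unbound regions.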

For each cell type studied, genes were divided by chromatin state and the cisTarget function was

Clustering of motifs was performed with ComplexHeatmap, using Spearman's rank correlation as the distance method between rows and columns, clustering rows and columns via centroid hierarchical clustering and splitting columns on chromatin state; given the very large number of

(RNA Polymerase) occupancy, and the final plot generated with ComplexHeatmap 63.
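As a small illustration of the Spearman-based distance mentioned for motif clustering, the Python sketch below computes a rank-correlation distance between two rows of a motif-by-state matrix. It assumes the distance is defined as 1 − ρ (Spearman's ρ with tied values given average ranks); the function names are hypothetical and the hierarchical clustering step itself is not reproduced here.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def spearman_distance(x, y):
    """Assumed distance: 1 - Spearman's rho, ranging from 0 to 2."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))
```

Under this definition two motifs whose enrichment rises and falls together across states have a distance near 0 regardless of scale, while perfectly opposed profiles have a distance of 2.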
Genomic features (exons, introns and intergenic regions) were obtained via the following method. The D. melanogaster release 6.34 GTF annotations were read as a GRanges object, filtered for exons, and set as unstranded. The exon GRanges object was reduced, and all non-exon regions taken as the gaps in this object. Separately, the R genomation package was used to read features from the same genome GTF file, identifying introns. Intergenic regions were taken to be the non-exon ranges with the intron ranges excluded.

All gene ontology (GO) term analysis was carried out via the clusterProfiler R package 68.
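The exon, non-exon and intergenic partition described above amounts to simple interval arithmetic. The Python sketch below illustrates the logic on a single chromosome, using half-open (start, end) coordinates; it is not the authors' R code, and the function names, the coordinate convention and the toy coordinates are assumptions for illustration only.

```python
def reduce_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gaps(intervals, chrom_length):
    """Regions of [0, chrom_length) not covered by any interval."""
    uncovered = []
    position = 0
    for start, end in reduce_intervals(intervals):
        if start > position:
            uncovered.append((position, start))
        position = max(position, end)
    if position < chrom_length:
        uncovered.append((position, chrom_length))
    return uncovered


def subtract(intervals, to_remove):
    """Parts of `intervals` that do not overlap anything in `to_remove`."""
    remaining = []
    removal = reduce_intervals(to_remove)
    for start, end in reduce_intervals(intervals):
        position = start
        for r_start, r_end in removal:
            if r_end <= position:
                continue  # removal lies entirely before the current position
            if r_start >= end:
                break  # removal lies entirely after this interval
            if r_start > position:
                remaining.append((position, r_start))
            position = max(position, r_end)
            if position >= end:
                break
        if position < end:
            remaining.append((position, end))
    return remaining


# Toy example on a hypothetical 1,000 bp chromosome.
exons = [(100, 200), (150, 300), (500, 600)]
introns = [(300, 500)]
non_exon = gaps(exons, 1000)              # [(0, 100), (300, 500), (600, 1000)]
intergenic = subtract(non_exon, introns)  # [(0, 100), (600, 1000)]
```

Reducing the exons first means that overlapping transcripts of the same or neighbouring genes are counted once, so the three feature classes do not overlap.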

Other bioinformatics analyses

Bioinformatics pipelines were accelerated using GNU Parallel 70. All other analyses were performed using R 71.

Software availability

All analysis code used in this project will be made available on GitHub at https://github.com/marshall-lab upon publication.

Data availability

Next-generation sequencing data generated in this study will be deposited in NCBI GEO prior to publication.

Acknowledgments

We thank Grace Jefferies, Victoria Roy and Elizabeth Read for technical assistance. We thank Ciarán O'Mara, Jake Newland and all members of the Marshall Group, past and present, for their helpful thoughts, comments, insights and discussions. We also thank Alison Bardin, Natalia