High-resolution mapping of regulatory element interactions and genome architecture using ARC-C

Interactions between cis-regulatory elements such as promoters and enhancers are important for transcription but global identification of these interactions remains a major challenge. Leveraging the chromatin accessiblity of regulatory elements, we developed ARC-C (accessible region chromosome conformation capture), which profiles chromatin regulatory interactions genome-wide at high resolution. Applying ARC-C to C. elegans, we identify ~15,000 significant interactions at 500bp resolution. Regions bound by transcription factors and chromatin regulators such as cohesin and condensin II are enriched for interactions, and we use ARC-C to show that the BLMP-1 transcription factor mediates interactions between its targets. Investigating domain level architecture, we find that C. elegans chromatin domains defined by either active or repressive modifications form topologically associating domains (TADs) and that these domains interact to form A/B (active/inactive) compartment structure. ARC-C is a powerful new tool to interrogate genome architecture and regulatory interactions at high resolution.

We applied ARC-C to C. elegans L3 chromatin, preparing libraries from three biological and two technical replicates. Data from all replicates were highly concordant (Supplementary Figure 1). To increase the power to profile interactions, all cis-informative reads were pooled, resulting in 12 million reads pairs for analysis (Supplementary Figure 1). As expected, cis-informative read pairs were enriched at regulatory elements (43.7% of reads overlap regulatory elements, which comprise 21.1% of the genome).
Previous studies of C. elegans genome topology using Hi-C identified large selfinteracting domains on the X chromosome regulated by the dosage compensation complex (DCC), which contain on average ~200 genes 7 . The DCC was shown to be enriched at recruitment element on X (rex) sites at domain boundaries, and 10 kb bins containing these sites were shown to physically interact with one another. We found that the ARC-C contact map recapitulated the X-chromosome domains mapped by Hi-C and more sensitively detected interactions between rex sites ( Figure   1d, e). ARC-C contact maps from the autosomes are also highly similar to Hi-C maps (Supplementary Figure 2). These results show that ARC-C can profile domain largescale domain architecture and suggest that it can improve detection of specific interactions between regulatory elements.
Previous Hi-C studies in C. elegans failed to identify smaller self-interacting topologically associating domains (TADs) similar to those seen in Drosophila and vertebrate genomes, which typically contain one to several genes [8][9][10][11] . Notably, the average C. elegans gene length of 5 kb (including intergenic regions) is similar to the resolution of Hi-C maps, which may have limited the ability to identify TADs.
We considered that genomic regions defined by chromatin modification domains might form TADs because in other animals, histone modification patterns within individual TADs is often relatively uniform 11 . In C. elegans, histone modification domains segment most of the genome into "active" domains of broadly active genes marked by H3K36me3 and other modifications associated with gene activity which alternate with domains of H3K27me3 covering genes that are inactive or have regulated expression 12,13 . Both domain types have a median gene number of three, with median lengths of 14 kb for active domains and 19 kb for H3K27me3 domains 12 .
To investigate whether these chromatin domains defined by histone modifications form TADs, we aggregated ARC-C signal across each type of aligned domain and their neighboring regions.
For both active and H3K27me3 domains, we found that ARC-C interaction frequency is higher within domains compared to interactions with neighboring chromatin, giving rise to a central square of higher signal. This indicates that the domains are spatially separated ( Figure 2). We further observed that active and H3K27me3 domains have "compartment" structure, in which domains of the same type (active or H3K27me3) interact more frequently with each other than with domains of the opposite type ( Figure 2). Supporting these analyses, we observed similar results using Hi-C data (Supplementary Figure 4). We conclude that C. elegans active and H3K27me3 chromatin domains form TADs and that these TADs are organized into a compartment structure similar to A/B compartments of other animals.
We next tested the ability of ARC-C to profile interactions between specific regulatory elements. Individual promoter and enhancer elements are small regions of DNA bound by transcription factors and from which transcription is initiated 14 . To identify chromatin interactions between such elements at high resolution, we separated the genome into 500bp bins and identified those with significantly enriched interactions, taking into account distance and coverage biases (see Methods). This identified Table S2).
Promoters are most prevalent among the significant interactions, accounting for 62% of interacting elements, and they are involved in 86% of the interactions (Figure 3a).
Half of the significant interactions are relatively short range (within 20 kb), and in this size range there are similar numbers of P-P and P-E interactions ( Figure 3b).
However, at longer distances, promoter-promoter interactions predominate, which may indicate functional differences. We also observed that genes connected by promoter-promoter interactions had correlated gene expression ( Figure 3c). The correlation is strongest for pairs with highly regulated expression (i.e, those with high coefficients of variation of gene expression (CV), suggesting that such genes are in proximity with each other when expressed.
To identify proteins that are candidates for mediating interactions in C. elegans, we  Table S3). Although the binding sites for only three proteins showed significant enrichment for interactions in the Hi-C map, the trends were similar to those observed using ARC-C (Supplementary Figure 3; Table S4). As To evaluate the ability of ARC-C to detect changes in regulatory interactions, we chose to analyse BLMP-1, one TF for which binding sites significantly interact ( Figure   3d). blmp-1 encodes a spatially restricted transcription factor important for hypodermal, vulval, and gonadal development [17][18][19] . We performed ARC-C in L3 stage blmp-1 mutants and compared chromatin interactions at pairs of TF and chromatin regulator binding sites with those of L3 stage wild-type. For this analysis, we also included direct BLMP-1 targets, defined as the subset of BLMP-1 binding regions associated with a gene that was upregulated or downregulated in blmp-1 mutants. In wild-type, interaction strength among the up or down BLMP-1 target regions is similar to that of the full set of BLMP-1 binding sites ( Figure 3d; Table S5). Strikingly, the only sites to show significantly reduced interactions in the ARC-C map from blmp-1 mutants were the BLMP-1 down regulated targets (Figure 3d). These results indicate that BLMP-1 mediates interactions between targets that require it for expression.
Overall, the results show that ARC-C has power to map interactions between regulatory elements at 500bp resolution and to detect changes in mutants.
In its present form, ARC-C works well for profiling chromatin interactions in relatively small genomes, as sequencing 200 million fragments per duplicate library produces enough cis-informative read pairs for profiling architecture and regulatory element interactions. However, for application to larger mammalian genomes, an enrichment step for ligation events (e.g., through biotin tagging) would be necessary.
Here we used ARC-C in whole animals, so the cell types from which the detected interactions came are unknown. In addition, interactions that occur in a small number of cells are likely to have been missed. The future application of ARC-C to specific purified cells would address these issues, allowing in vivo investigation of cell-type specific architecture.
In conclusion, ARC-C provides a new ability to assay genome topology together with high-resolution mapping of regulatory interactions in a genome-wide assay, which should accelerate studies of transcriptional regulation and the relationship with genome architecture.

Worm growth
Strains were grown in liquid culture at 20°C using standard S-basal medium with HB101 bacteria. Animals were first grown to the adult stage, bleached to obtain embryos, and the embryos hatched without food in M9 buffer for 24 hrs at 20°C to obtain synchronized starved L1 larvae. L1 larvae were grown in a further liquid culture at 20°C then harvested at the L3 stage. Worms were collected, washed in M9 buffer, floated on sucrose, washed again in M9, then frozen into small pellets by dripping worm slurry into liquid nitrogen, which was stored at -80°C until use.
Replicate peaks were combined using IDR 25 .

Differential expression analysis of blmp-1 mutant
Raw RNA-seq data of wild-type and blmp-1 mutant at L3 stage were obtained from GEO (GSE55225) 17 . Reads were aligned to genome assembly ce10 with gene annotation WS235 using STAR 26 . Read counts per gene were generated by featureCounts and genes differentially expressed in blmp-1 relative to wild-type identified using DESeq2 27 requiring adjusted p < 0.05 and absolute fold change > 1.5. Differentially expressed genes for which a BLMP-1 ChIP-seq peak overlapped one of its assigned regulatory elements were defined as upregulated "up" or downregulated "down" BLMP-1 targets.

ARC-C library preparation
Frozen worm pellets were ground into a fine powder in which worms were broken into ~10 fragments. 1 ml of worm powder was then fixed in 10 ml of 1% formaldehyde in PBS at room temperature (RT) for 10 min with gentle shaking then quenched for 5 min with a final concentration of 125mM glycine. Fixed worm fragments were then washed with Buffer A (340mM sucrose, 15mM Tris-HCl pH7.5, 2mM MgCl2, 0.5mM spermidine, 0.15 mM spermine, 1mM DTT, protease inhibitors), resuspended in 7 ml Buffer A, then the material dounced 20 strokes in a 7 ml stainless steel tissue grinder (VWR-#432-5005). The material was spun at 100g for 5 minutes and the supernatant containing nuclei was transferred to a new tube and kept. The pellet was resuspended in Buffer A and again dounced with 20 strokes.
After spinning, the two supernatants containing nuclei were pooled.  Contact maps were made from informative read pairs by binning the genome into fixed-width non-overlapping bins and counting the number of read pairs between each pair of bins. The maps were then normalised by matrix balancing using the Knight-Ruiz algorithm 28 . For aggregated contact analysis (see below), the map was further divided by the spline-smoothed average contact frequency given distance from the diagonal to remove background slope of contact frequency.

Processing Hi-C data
Raw FASTQ files of wild-type mixed embryo Hi-C data 7 were downloaded from SRA (SRX77040) and processed using HiCUP v0.5.9 29 , which filters for same fragment (circularised, dangling ends, internal), religation, wrong size, contiguous sequence and removed duplicate read pairs. We additionally required mapQ>=30 and number of mismatches <= 2 from both reads, that none of the reads overlapped modENCODE blacklisted regions, and a minimum distance of 600bp between the two read pairs to be consistent with the processing of ARC-C data. In the end, 25,460,294 read pairs passed all filters, out of which 17,100,808 have both reads mapping to the same chromosome.

Calling significant interactions
We first segmented the genome into bins of ~500bp. We first took annotated regulatory elements 23 (n=42,245) and expanded them to 500bp or until neighbouring intervals began to touch; a small number of elements that were within 100bp were merged first. The rest of the genome was covered with evenly placed 500bp nonoverlapping fixed-width intervals, hence the entire genome was covered by a we used the reciprocal of (coffpeak) 0.87 as the correction factor.

Aggregated contact analyses
A contact is defined as the region in the contact map that connects a pair of genomic locations. Aggregated contact analysis is a method of visualising the average contact frequency of a group of many contacts together with local contact frequency 15 . We applied this method to both small genomic regions, such as nuclear factor ChIP-seq binding sites (NFBS), and to larger intervals, such as chromatin domains. We normalised contact maps using matrix balancing 32 to account for coverage bias and removed distance dependent background. In NFBS analysis, we used contact maps of 1kb resolution. For each NF, up to 50,000 contacts were randomly sampled from all possible cis-contacts among its binding sites within a distance range of 20kb to 1Mb, and local maps of 21 by 21 bins centered at the contacts were extracted and aggregated. For the case of BLMP-1 regulated targets, all possible cis-contacts between BLMP-1 binding sites that involves a BLMP-1 regulated target were aggregated (i.e. at least one end of the contact is at a BLMP-1 regulated target). The log2 fold change of the central point over the mean of the rest of the points in the aggregated map was calculated to measure the relative increase in contact frequency over local background. To assess statistical significance while controlling for accessibility and the distance between the pair of NFBSs, 1000 sets of random contacts were generated, each containing the same number of contacts with matching accessibility and distance as the NFBS contacts. Each of the 1000 random sets was aggregated and a relative increase in contact frequency calculated in the same way as the NFBS set, forming a distribution of values against which the NFBS value was compared and a p-value generated, which was corrected for multiple testing using the FDR method. Datasets are from 21, 33-38 . Processed and curated modENCODE ChIP-seq peaks were obtained from 23 . We removed intervals that are "HOT" (highly occupied target) as such binding events are thought to represent nonsequence specific TF binding or ChIP artifacts 36, 39 . This was defined as the top 20% of peak intervals ranked by the number factors in which the interval is called (effectively removing intervals called in 12 or more factors). We only considered datasets having at least 300 peaks following filtering.
For domain analyses, we used contact maps of 5kb resolution. The active and H3K27me3 domains are L3 domains from 12 . Regions annotated as active and border were merged to generate the active domains used here; the H3K27me3 domains are those termed "regulated." We tested for TADs by assessing intra-domain contacts.
For each type of domain (active or H3K27me3), maps of each contact region containing a domain of at least 5kb together with up to 25kb of the flanking domains of opposite type were extracted and aggregated. Where neighboring domains were < 25 kb, only the domain was extracted. The aggregated contact was scaled to a square of 9 by 9 bins and the flanking intervals were scaled to 5 bins wide. The log2 fold change of the mean of the central 9 by 9 square over the mean of the four neighbouring 5 by 10 rectangles on top, bottom, left and right was calculated to measure the relative increase in contact frequency with p-values generated by t-test.
We tested for compartments using the same approach, by assessing all possible pairs of inter-domain contacts in the range of 50kb to 2Mb.

Aggregated contact analysis of blmp-1 mutant ARC-C data
We used the procedure described above for wild-type to measure the relative contact frequency between NFBSs in blmp-1 ARC-C data. To normalize for a  (Table S5).   Regulatory elements 23 (protein coding promoters, red; unassigned promoters, yellow; enhancers, green; unknown activity, blue) and genes are displayed below. (d)

Expression correlation between interacting promoters
Comparison of ARC-C and Hi-C X chromosome contact maps. Hi-C data are from 7 .
(e) Circos plots of cis-informative read pairs from ARC-C and Hi-C maps in 300kb region of chromosome X (1,200,000-1,500,000 bp). rex sites are indicated in red.
Reads were downsampled in both plots for clarity.