Modeling population size independent tissue epigenomes by ChIL-seq with single-thin sections

Recent advances in omics studies have enabled analysis at the single-cell level; however, methods for analyzing the whole cell of large organs and tissues remain challenging. Here, we developed a method named tsChIL to understand the diverse cellular dynamics at the tissue level using high-depth epigenomic data. tsChIL allowed the analysis of a single tissue section and could reproducibly acquire epigenomic profiles from several types of tissues, based on the distribution of target epigenomic states, tissue morphology, and number of cells. The proposed method enabled the independent evaluation of changes in cell populations and gene activation of cells in regenerating skeletal muscle tissues, using a statistical model of RNA polymerase II distribution on gene loci. Thus, the integrative analysis by tsChIL can elucidate in vivo cell-type dynamics of tissues.


27
Tissues are terminally differentiated cells formed from stem cells, followed by cell-type 28 conversion and functional arrangement of cell types to the specified spatial localization.

29
Presently, the composition of the cells playing different functions and the mechanism by which 30 each type is formed have been elucidated. This allowed us to understand the biological function process. Therefore, we focused on the use of tissue sections that are free from enzymatic 92 treatment biases for epigenomic analysis and developed an experimental procedure using a 93 single, very small, and thin tissue sections. We then optimized the ChIL for tissue (Fig. 1A), 94 based on our previously reported sc-epigenomic analysis tools 7,8 .

96
Since the reports on analysis using microtissue sections are limited, and all of them require 97 multiple tissue slices to obtain the required cell number [15][16][17][18][19][20] (Table S1). We therefore focused on 98 preparing frozen, unfixed tissue sections to equalize the fixation conditions. On plates, unfixed 99 tissue thin sections are fixed with paraformaldehyde then permeabilized, followed by blocking.

100
Immunostaining is then performed by reacting with primary antibodies against the target 101 molecules on chromatin. Then, a fluorescent-labeled ChIL-probe attached with secondary 102 antibodies was used to obtain the tissue localization of the target by imaging at the subcellular 103 level. Subsequently, Tn5 transposase inserts an artificial sequence containing a T7 promoter 104 into the genomic region near the target. In vitro transcription of the genome sequence near the 105 target protein, starting from the T7 promoter, was performed, and the reverse-transcribed DNA 106 was sequenced using a next-generation sequencer. Compared with conventional epigenomic 107 analysis methods for FFPE and fresh frozen tissue slices, this method enabled uniform fixation 108 conditions for the analysis of micro-thin slices. Therefore, using the highly efficient ChIL method, 109 we attempted to analyze tissues with an input size of 3 mm × 3 mm × 10 μm. Thus, we designed 110 tsChIL as a high-precision method for analyzing the epigenetic information of a group of cells on 111 a tissue section of the target, following the spatial distribution of the specific epigenetic status.

113
To evaluate the designed tsChIL experimental procedure, the levels of the enhancer marker of 114 histone modification H3K27ac and the recruitment of RNA Polymerase II (RNAPII), an indicator 115 of transcription, were detected in three different tissues: liver, heart (left ventricle), and testis.

116
Most of the cells were hepatocytes, comprising 70-80% of the liver. The H3K27ac signal 117 visualized by the ChIL-probe was uniformly distributed across cells on the sections.

118
Subcellularly, the co-localization of H3K27ac and RNAPII in euchromatin regions 119 (Hoechst-negative) was observed (Fig. 1B). In the testis, which consists of cells at multiple 120 differentiation stages, the RNAPII signal was strongly distributed and localized in cells with high 121 transcriptional activity, especially near the outer periphery of the seminiferous tubule 27 (Fig. 1C), a region where cells in the early stages of sperm differentiation are located (Fig. 1C). Meanwhile, 123 the heart was co-stained using laminin and the ChIL probe to distinguish the cell boundary 124 regions and visualize the basement membrane (Fig. 1D). S5P signal showed a localization to 125 the low Hoechst-dense region of the cell nucleus in which transcription may active, suggesting 126 that immunostaining with ChIL probe was a valid histological staining method at the subcellular 127 level ( Fig. 1B-D, Fig. S1).

129
To validate the feasibility of tsChIL for sensitive and accurate epigenomic analysis, we performed 130 tsChIL-seq using a single thin section containing 1,000-10,000 cells (Table 1), which was 131 generally assumed as a low number of cells in culture 7,8 . The number of cells used was less than 132 that of conventional epigenomic methods used especially for tissue analysis (Table S1).

133
Furthermore, the genome-wide analysis was performed by ChIL reaction on single sections of 134 the sections that showed in Figure 1B-D. In the representative visualized epigenomic data in 135 liver ( Fig. 1E), the accumulation of H3K27ac and RNAPII at the Alb locus, a hepatocyte marker, 136 was observed. The former showed an activated upstream enhancer region, whereas the latter 137 was highly transcriptional activity at the locus. The transcription of Alb was also confirmed using 138 RNA-seq with different serial slices. These results indicate that tsChIL enables the simultaneous 139 acquisition of both the tissue distribution of the epigenomic status and the genome-wide 140 epigenomic data using a single tissue section containing a small number of cells (10 3 to 10 4 cells 141 in 10 mm 2 area).

143
Next, to evaluate the genome-wide distribution of the signals obtained using the tsChIL 144 procedure proposed above, we examined the specificity of the signal localization among different 145 tissues and antibodies and the reproducibility of signal localization of the same tissue and 146 antibody. First, to estimate the appropriate number of reads for ChIL-seq with tissues, we 147 obtained 480 M reads from RNAPII ChIL-seq in muscle tissue and evaluated the library 148 complexity 28 (i.e. the prediction curve of usable reads). As seen in Figure 1F, the number of total 149 usable reads was starting to move away from the black line at approximately 10 7 , indicating a 150 decreasing percentage of usable reads. Therefore, we determined that approximately 10 7 reads 151 is a good cost-balanced number of the required reads in the case wherein the number of cells 152 per section is < 10 4 . To obtain a ChIL signal with sufficiently high signal-to-noise ratio, we 153 acquired an average of approximately 14 M reads (Table S2), which is comparable to the 154 number of reads in the ENCODE tissue ChIP-seq (10 M-20 M) 3 .

156
With this number of reads, the tsChIL-seq data from the liver, heart, and testis were obtained, 157 and the genome-wide localization of each data set is shown in Figure 1G. In all tissues and H3K27ac and RNAPII S5P antibodies, signals were concentrated around the coding regions 159 (promoters and gene body) compared with the no-antibody (herein, No Ab; without primary 160 antibody) controls (53%-59% and 41%-48%, respectively). The results showed that the genomic 161 sequences were selectively extracted from the transcriptionally activated regions of the genome.

162
In Figure 1H, we describe the correlation matrix of the signal levels on the whole genome to  Table S3). These results suggest that tsChIL-seq can 169 capture the epigenomic differences between different tissues and is technically reproducible.

172
We next assessed the ability of tsChIL for low-input epigenomic analysis of tissues. First, we 173 performed tsChIL using thinly sectioned tissues from the liver, heart, and testis, and the identified 174 enhancers were compared by matching references 3 ( Fig. 2A). According to the odds ratio (i.e., 175 specificity, the detailed definition is described in Method), each H3K27ac ChIL-seq signal 176 preferentially captured the corresponding tissues-specific enhancer (Liver: 33.5, Heart: 27.1,

177
Testis: 4.1; The other odd ratios are listed in Table S4). Therefore, we successfully detected 178 tissue-specific enhancers using tsChIL with lower input compared to the previous reports that 179 utilized 500 µg chromatin equivalent to 10 7 −10 8 cells.

181
Next, we examined the enrichment of the H3K27ac signal on representative tissue-specific 182 enhancers, including the liver, heart, and testis. We focused on the enhancer region of Rxra 183 genes 29 specifically expressed in liver tissues, Gnat3 cardiac muscle-specific gene retinoic acid 184 receptor, and Eps8 expressed in the blood-testis barrier (BTB) 30 . H3K27ac signal enrichments 185 on each tissue-specific enhancer were observed on the IGV screen shot (Fig. 2B). In contrast,

186
all Actb-expressing tissues showed the ubiquitous enrichment of H3K27ac.

188
We further evaluated the enrichment of the regulatory sequence in extracted enhancers using  Table S5). The enrichment of known liver-specific TF-binding motifs,

191
Rxra, Hnf4a, Nr2f6, and others were observed in the H3K27ac tsChIL-seq data obtained from 192 the liver. This data is consistent with the liver-specific regulatory sequences registered as open 193 chromatin regions detected using ATAC-seq with mouse liver tissues in the database 31 .
Klf12 than others; Sox5 and androgen receptor (AR) binding motifs were enriched in the 196 testis-H3K27ac signal, which was consistent with previous studies reporting that AR binds to the 197 androgen responsible element (ARE) on regulatory sequences with histone acetyltransferase to 198 regulate gene expression 32,33 . These data support that H3K27ac tsChIL can identify 199 cis-regulatory elements following the extraction of tissue-specific enhancers.  Figure 2C. In addition, the core transcription factor 210 Hnf4α 36 , which activates the genes by itself, was included in the top rank (1.6 to 3.5%).

211
Furthermore, the SEs featuring each tissue were identified. In the heart (left ventricle), Ablim1 212 expressed in the left ventricle and involved in left-right axis formation 37 , was detected, whereas 213 in the testis, SEs were identified in the vicinity of Crem, which is involved in spermatogenesis 38 .

215
Finally, to validate the function of the SEs identified in the liver using this method, we performed 216 tsChIL targeting Hnf4α, which showed a high specificity score (deviation-Z) in liver SEs. Hnf4α is 217 known to be an important nuclear receptor during hepatocyte differentiation 39 , and has been 218 shown to contribute to SE formation as a core transcription factor, along with RXRα 29 .

219
Immunostaining with the ChIL Probe showed that the HNF4 was distributed throughout most

224
Using the gene sets of SEs and TEs neighboring genes obtained in Figure 2D, gene sets 225 enrichment analysis (GSEA) 40 demonstrated that the hits of the ChIL-Hnf4α peaks against liver 226 enhancers scored as high as 0.72 in the enrichment score (Fig. 2G, top). Particularly, Hnf4α was 227 bound to 76.4-78.8% of the SEs (Fig. 2G bottom). In contrast, in the negative controls of the 228 heart-and testis-specific SEs, the number of SEs bound by Hnf4α was approximately 0.5 in the enrichment score and the percentage of Hnf4α bound to the heartand testis-specific SEs was at 230 a random chance level (24.4-34.2%).

232
In summary, the data from tsChIL-H3K27ac demonstrated that the regulatory candidate 233 transcription factor Hnf4α obtained from the cis-element refinement selectively binds to the 234 liver-specific SE region of the Hnf4a locus. Hnf4α could be validated to provide positive feedback 235 that binds to the SE region of its own Hnf4a locus. Our data indicated that tsChIL is useful for the 236 regulatory analysis of enhancers, including transcription factors and SEs, using low number of 237 cells.

tsChIL-RNAPII peaks detected the majority of active genes in tissue 240
Transcriptome information is obtained by evaluating the binding position of RNAPII using 241 epigenomic analysis. Here, we detected the active genes based on the binding of RNAPII on the 242 genome using tsChIL. In Figure 3A, we plotted the cumulative number of consumed reads of the 243 detected genes in RNA-seq and RNAPII tsChIL in the order of their read counts. Due to the wide 244 dynamic range of RNA-seq data, high copy-number mitochondrial-derived RNAs (e.g.,

245
mitochondrial ribosomal RNAs) and highly expressed genes that characterize each tissue (Alb in 246 liver, Myh6 in the heart, Prm1 in testes), consumed 80% reads on a small number of highly 247 expressed genes (whose expression can be confirmed; Liver 5%, Heart 1%, Testis 11%). The 248 identification of weakly expressed genes and rare populations in bulk tissue RNA-seq is 249 generally hard to obtain because the top 10% genes spends 80% of its reads in even at the  Thus, the genes were divided into five groups based on their expression levels from RNA-seq, 262 and the correlation of each tsChIL RNAPII signal with their expression levels was examined (Fig.   263   3B). In the high-expression group in all tissues, the intensity of the RNAPII signal in the TSS was 264 highly correlated with its expression level. In the 75 th -100 th percentile group, a high accumulation of RNAPII in the gene body region was also detected, suggesting a movement of RNAPII to the 266 locus upon transcriptional activation. Here, we showed that tsChIL-RNAPII demonstrated a 267 preference for capturing highly expressed genes in tissues. Subsequently, we assessed the 268 overlap between RNA-seq-confirmed genes (TPM > 0) and tsChIL-RNAPII peaks. tsChIL peaks 269 captured approximately 30% (Testis slightly lower, approximately 20%) of the active genes (TPM 270 > 0), whereas false positives were almost absent (Fig. 3C). In addition, tsChIL peaks stably 271 detected approximately 40-50% of the genes expressed in RNA-seq, independent of the TPM 272 threshold for defining the expressed genes in RNA-seq (Fig. S5). These results suggest that the 273 peak region is likely to capture genes with high expression because the region with high signal 274 counts was judged to be the peak region 43 . In all tissues, the expression levels of the genes in 275 Common were higher than those in RNA-seq group as expected (Fig. 3D). Acta1 (which is highly expressed in skeletal muscle) and Cd68 (a surface marker of 316 macrophages) (Fig. 4B). The Cd68 locus showed an overall increase in the RNAPII signal from 317 day 0 to day 3, whereas Acta1 showed an overall decrease. These results indicate the rapid 318 increase in immune cells and the decrease in skeletal muscle cells during the early stages of 319 injury (days 2-3) as shown in Figure 4C. In Acta1, however, the RNAPII signal is more 320 concentrated near the transcriptional end site (TES) than the transcriptional start site (TSS). We    Figure 4E shows the estimated values of the mean RNAPII levels at TSS and TR, along with the 337 confidence intervals. We then compared the tissue-wide expression levels of the corresponding 338 genes (Fig. 4F). Surprisingly, the tissues-wide expression of Acta1 and Cd68 were synchronized  The traveling ratio (or pausing index), a concise measure of RNA polymerase II dynamics, which 407 was originally introduced in the ChIP-chip as a measure of the degree of transcriptional 408 elongation 45,46 ; and used in GRO-seq 55 and ChIP-seq 56 . We found that the shape of the immunoprecipitation. In contrast, ChIL-seq, on which tsChIL is based, achieves a higher genome 428 coverage of at least 90% for histone modifications at the single-cell level. Accordingly, the 429 acquired data was assumed to be a sum of the deeply profiled cells. Thus, we believe that the 430 acquisition of such high-depth epigenome data will continue to be necessary for the modeling 431 compositions of tissues as shown in our framework. These high-depth data are expected to be 432 provided not only by ChIL-seq, but also by other single-cell epigenomic analysis methods; thus, 433 other methods can be integrated to our analysis framework.

435
tsChIL showed great potential to replace ChIP-seq, which has been the standard method of 436 epigenomic analysis for tissues. In this paper, the high reproducibility of tsChIL, both technically 437 and biologically, was demonstrated. Furthermore, tsChIL achieved comparable performance 438 while using fewer cells than ChIP-seq (~1/10,000 of required cell), and parameters, such as 439 fixation conditions, can be monitored based on the quality of immunostaining images. These Eight-week-old C57BL/6N mice were used as replicates for this study. The liver, left ventricle and 457 testis were prepared from male, and tibialis anterior (TA) muscles were from female mise.

458
Tissues were freshly frozen using isopentane chilled with LN2 and stored at −80ºC. Muscle 459 regeneration studies were performed as previously reported, except for the injection of CTX into 460 the TA muscle 59 . Injured and intact TA muscles were sampled from five mice at day 0, 3, 5, 7,    Peaks of tsChIL-H3K27ac were called using MACS2 65 with the option: callpeak --call-summits 503 --nomodel --nolambda -q 0.05. Tissue specificities of the peaks were evaluated using the odds 504 ratio in the known tissue-specific enhancer lists 3 . The odds ratio is defined as (p/(1-p))/(q/(1-q)),

505
where p is the proportion of hits in the target tissue and q is the proportion of hits to the other 506 tissues in the enhancer lists. ChromVAR 66 analysis was performed using consensus peaks of 507 each tissue. The consensus peaks were constructed by taking the intersection of the peaks of 508 three biological replicates. Typical and super enhancer candidates were called using HOMER 67 509 finePeaks with the option: -style super -superSlope -1000 -gsize 3e9. The pre-ranked GSEA 40 510 was performed using tag (read) count-ordered enhancer peaks. Then, the peaks were marked 511 by a binary indicator overlapping with tsChIL-Hnf4a peaks (called by MACS2 as described above 512 with the option: -q 1e-5).

515
Aggregation plots of the gene expression percentile groups were created using agplus 68 . The 516 gene groups were divided according to the TPM of the bulk RNA-seq analysis of each tissue 517 (liver, heart, and testis).

524
The offset term M i is the total reads (in millions) of the replicate i, and s ij is the indicator variable 525 that the read count y ij is either TSS (s ij = 0) or TES (s ij = 1). Since the offsetting is equivalent to 526 the CPM normalization of the mean count, exp(β 0 ) and exp(β 1 ) can be referred to as the mean 527 CPM at TSS and the magnification factor of TES to TSS (i.e., the traveling ratio) of the gene, 528 respectively. The model evaluates variance and can thus estimate the confidence intervals of the 529 traveling ratio by utilizing all replicates (5 in our case) that have different total sequenced reads.

530
We assumed that the contrasts X -Y (e.g., fold-changes of TR between day 3 and day 0) follow a

835
Volcano plots of the contrasts (day 3 vs. day 0 after CTX injury) for TR (top) and TSS (bottom).

837
Significant changes that satisfy |log 2 FC| > 1 (twofold) and FDR < 0.1 are in red. Genes that have 838 the top 10 p-values are labelled.