ABSTRACT
Inhibition of RNA synthesis caused by DNA damage-impaired RNA polymerase II (Pol II) elongation is found to conceal a local increase in de novo transcription, slowly progressing from Transcription Start Sites (TSSs) to gene ends. Although this increase is associated with accelerated repair of Pol II-encountered lesions and limited mutagenesis, it is still unclear how the mechanism is maintained during recovery from genotoxic stress. Here we uncover a surprisingly widespread gain in chromatin accessibility and preservation of the active histone mark H3K27ac after UV irradiation. We show that the concomitant increase in Pol II release from promoter-proximal pause (PPP) sites of most active genes, PROMoter uPstream Transcripts (PROMPTs) and enhancer RNAs (eRNAs) favors unrestrained initiation, as demonstrated by the synthesis of short nascent RNAs, including TSS-associated RNAs (start-RNAs). Accordingly, drug inhibition of the transition into elongation restored the UV-reduced levels of pre-initiating Pol II at TSSs. Continuous engagement of new Pol II thus ensures maximal transcription-driven DNA repair of active genes and non-coding regulatory loci. Together, our results reveal an unanticipated layer regulating the UV-triggered transcriptional response and provide physiologically relevant support for the emerging concept that the transcription initiation rate is determined by Pol II pause-release dynamics.
Introduction
Initiation of RNA polymerase II (Pol II) transcription at transcription start sites (TSSs) and promoter-proximal pause (PPP) release into productive elongation are ubiquitous and crucial steps regulating the transcription of protein-coding genes and long non-coding RNAs1,2 (together called mRNAs in this manuscript). The same holds for the transcription of regulatory non-coding regions expressing enhancer RNAs (eRNAs) bidirectionally from enhancer TSSs (eTSSs)3–5 and for PROMoter uPstream Transcripts, or upstream antisense RNAs (collectively called PROMPTs herein), which are produced antisense to the mRNA at promoters that do not initiate a second stable transcript in very close proximity and in the opposite direction6. Unlike mRNAs, eRNAs and PROMPTs are short, and their detection can be technically challenging because they are unstable as a result of high early-termination rates and increased susceptibility to degradation by the RNA exosome6,7.
Initiation of transcription by Pol II in all the above regions depends on the efficient assembly of the pre-initiation complex (PIC) upstream of TSSs, on TFIIH-dependent promoter opening and on phosphorylation of the serine 5 residue (S5P) in the C-terminal domain (CTD) of Pol II8,9. After synthesis of ∼30–60 nucleotides of initiation-associated (or TSS-associated) RNAs, so-called start-RNAs10,11, Pol II is paused at PPP sites by the negative elongation factors DSIF and NELF2,12. Signal-regulated phosphorylation of these factors and of the serine 2 residue (S2P) of the Pol II CTD by P-TEFb is required for productive elongation13–15. It has recently emerged that, if this step does not occur rapidly, start-RNAs are terminated14,16, implying that Pol II turnover at PPP sites is high and that Pol II engaged in early transcription is replenished by the continuous re-entry of pre-initiating Pol II into PICs16,17.
The integrity of the genetic information encoded in the DNA sequence is persistently challenged by a variety of genotoxic perturbations18. A plethora of DNA Damage Response (DDR) mechanisms have evolved to guarantee the detection and removal of different types of DNA lesions, limiting the probability of mutagenesis by adjusting to the cell’s status and need for efficient recovery from DNA damage19–21. Nucleotide Excision Repair (NER) plays a vital role in sensing and removing a large panel of bulky helix-distorting DNA adducts, such as Cyclobutane Pyrimidine Dimers (CPDs) induced by ultraviolet (UV) light and benzo[a]pyrene guanine adducts induced by cigarette smoke19,22. Transcription-Coupled NER (TC-NER) is promptly triggered by elongating Pol II molecules encountering DNA adducts and speeds up excision and repair in expressed loci23,24. In comparison, the second NER sub-pathway, Global Genome NER (GG-NER), operates throughout the entire genome but recognizes helix distortions more stochastically19,22,25,26. Importantly, considering all the classes of transcripts defined above, the transcribed regions27 potentially scanned by TC-NER are estimated to cover more than 50% of the genome, thus qualifying transcription as a major driving force in safeguarding genomic stability.
Although TC-NER depends on the lesion-sensing potential of elongating Pol II molecules, transcription elongation has been shown to be transiently inhibited after UV irradiation28–30 because a proportion of Pol II molecules stall at encountered DNA damage28,31. Moreover, the depletion of the pre-initiating hypo-phosphorylated Pol II (Pol II-hypo) isoform from chromatin shortly after UV irradiation28,32,33 has led to the assumption that new transcription initiation events are transiently and globally repressed32–37. On the other hand, recent reports28,29,38 have revealed a functionally essential, stress-dependent global increase in 5’ nascent RNA (nRNA) activity that depends on the UV-induced rise in active P-TEFb levels39,40 and on the rapid dissociation of the NELF complex41. The ensuing fast and global release of de novo Pol II elongation waves from PPP sites into gene bodies boosts lesion-sensing activity and accelerates removal of DNA adducts by TC-NER in virtually all active mRNA genes28. Together, these findings raise the possibility that, contrary to previous beliefs, UV might not severely affect transcription initiation.
Taking also into consideration recent evidence supporting a model in which a given Pol II molecule disengages from the DNA template after damage recognition34,35,42, it is tempting to assume that ensuring continuity in transcription initiation may benefit the repair process. We thus hypothesized that the apparent loss of pre-initiating Pol II may not be due to the absence of Pol II recruitment at TSSs, but rather to a decrease in the dwell time of the Pol II-hypo isoform at TSSs, as suggested by the concomitant increase in S5P- and S2P-Pol II downstream of TSSs28. In this way, cells would be able to continuously feed the global release of lesion-scanning enzymes into transcribable sequences and guarantee the detection of more lesions along the DNA template strands.
Herein, we deciphered chromatin dynamics genome-wide upon UV damage and found a significant gain in accessibility (Assay for Transposase-Accessible Chromatin using sequencing, ATAC-seq) at the TSSs of virtually all active regulatory regions controlling mRNA, PROMPT and eRNA expression. This phenomenon was accompanied by the maintenance of active histone marks (H3K27ac) and the lack of deposition of transcriptional silencing modifications (H3K27me3), and correlated with the influx of Pol II into productive elongation. The paradoxical decrease in pre-initiating Pol II-hypo at these TSSs upon UV was elucidated by revealing that Pol II-hypo levels could be restored when PPP release was drug-inhibited. Accordingly, preserved production of start-RNAs after UV stress underlay the increased production of nRNA and was prevented only after inhibition of transcription initiation. The identified genome-wide dependence of the initiation rate on promoter-proximal pause-release dynamics explains the seamless recruitment and initiation of Pol II upon UV, in turn enabling efficient repair of all sequences encoding active regulatory regions and mRNAs.
Results
Chromatin accessibility increases at active regulatory regions upon UV irradiation
To characterise the impact that UV might have on the chromatin landscape of transcriptional regulatory regions, and how this could be linked to the widespread PPP release of elongating Pol II and the local increase in nRNA production downstream of TSSs28,29,38, we first determined genome-wide changes in chromatin accessibility. We applied the omni-ATAC-seq protocol43 to our system, in which human skin fibroblasts synchronized in early G1 are irradiated with mild doses of UV-C (see Methods and also28). We reproducibly measured chromatin accessibility before (NO UV) and after (+UV) irradiation during the early phase of recovery (Supplementary Fig. 1a) and mapped a total of 105,574 Accessible Regions (ARs) across conditions. ARs were enriched at promoters and at intragenic or intergenic regions with transcriptional regulatory function (TSSs, TSS flanks and enhancers according to ChromHMM annotation; Fig. 1a-c, Supplementary Fig. 1b-d, Methods). Interestingly, we observed a widespread increase (up to 1.71 average Fold Change (FC)) in chromatin accessibility after stress at 97.9% of promoter-, 94.6% of intragenic- and 94.4% of intergenic-ARs (Fig. 1b-d, Supplementary Fig. 1d, e).
We then selected Differentially Accessible Regions (DARs) by applying stringent thresholds for both FC (Log2 FC > 1) and P-value (P < 0.001) and found that 6410 loci showed particularly increased chromatin accessibility upon UV (DAR-gain) (Fig. 1e, top panel). DAR-gain loci at promoter regions represented 13.3% of all promoter ARs (Fig. 1e, lower panel), pointing to potentially functionally relevant chromatin opening at TSS regions. DAR-gain loci located at intragenic and intergenic sites (Fig. 1e) were linked to genes if they overlapped functional enhancers defined in FANTOM5 (Methods). Genes associated with DAR-gain loci (identified at either their promoter or their enhancers) were enriched (adjusted (adj.) P < 0.05) in a number of biological pathways previously associated with DDR processes, including cellular response to stress, DNA repair, transcription regulation by TP53 and cell cycle checkpoints (Supplementary Fig. 2). In addition, we identified a broad range of other significant GO categories (163 in total, Supplementary Fig. 2), a result in line with the previously reported global PPP release of elongating Pol II waves at all active gene bodies upon UV irradiation28.
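The DAR-gain selection above reduces to a two-threshold filter. The Python sketch below is a minimal illustration, assuming each AR already carries a +UV/NO UV log2 FC and a P-value from the differential-accessibility test; the `AccessibleRegion` record and the helper names are hypothetical and do not reproduce the pipeline described in Methods.

```python
from dataclasses import dataclass


@dataclass
class AccessibleRegion:
    """Hypothetical record for one accessible region (AR)."""
    region_id: str
    category: str   # "promoter", "intragenic" or "intergenic"
    log2_fc: float  # log2(+UV / NO UV) accessibility
    p_value: float  # from the differential-accessibility test


def select_dar_gain(regions, min_log2_fc=1.0, max_p=0.001):
    """Keep ARs with Log2 FC > 1 and P < 0.001 (strict thresholds, as in the text)."""
    return [r for r in regions if r.log2_fc > min_log2_fc and r.p_value < max_p]


def percent_dar_gain(regions, category):
    """Percentage of ARs in one category that qualify as DAR-gain."""
    in_category = [r for r in regions if r.category == category]
    if not in_category:
        return 0.0
    return 100.0 * len(select_dar_gain(in_category)) / len(in_category)


ars = [
    AccessibleRegion("ar1", "promoter", 1.5, 1e-4),
    AccessibleRegion("ar2", "promoter", 0.8, 1e-5),   # fails the FC threshold
    AccessibleRegion("ar3", "promoter", 2.0, 0.01),   # fails the P threshold
    AccessibleRegion("ar4", "intergenic", 1.2, 1e-6),
]
print([r.region_id for r in select_dar_gain(ars)])  # ['ar1', 'ar4']
print(round(percent_dar_gain(ars, "promoter"), 1))  # 33.3
```

Applied per category, the same percentage is what the 13.3% figure for promoter ARs expresses.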
Chromatin marks associated with transcription status remain stable after damage
A number of studies have demonstrated that the turnover, modification and/or degradation of histones around damage sites represent essential steps in conserved pathways that help cells deal with genotoxic stress44–46. However, especially in the case of UV-C-induced DNA damage, little is known about the post-translational modifications (PTMs) of histones around transcriptional regulatory regions. To better interpret the increase in chromatin accessibility and clarify its possible impact on genome-wide transcription dynamics, we studied the differential presence of two histone PTMs representative of the transcription status of the associated chromatin: the silencing mark H3K27me3 and the activation mark H3K27ac47–49.
We conducted ChIP-seq experiments with antibodies specific for these histone PTMs in NO UV and +UV conditions and focused our analysis on TSSs of mRNAs and on a robust set of eTSSs that are known to be functional and potentially transcribed in the investigated cell type according to the FANTOM5 database. We used the ChIP-seq data generated in this study (H3K27ac and H3K27me3), as well as previously published ChIP-seq data (Pol II-ser2P28) from the steady-state (NO UV) condition, to define subsets of Active (H3K27ac and Pol II-ser2P peaks over the TSS), Repressed (H3K27me3 peaks over the TSS) and Inactive loci (no H3K27ac, H3K27me3 or Pol II peak detected over the TSS) in our cell system (Fig. 2a, see Methods). We then related the UV-induced changes in histone marks and Pol II at these regions to the ATAC-seq results. Upon UV, chromatin accessibility increased at all active TSSs, which corresponded largely to the promoters identified above (compare Fig. 1a and Fig. 2a, and see Methods), as well as at FANTOM5-annotated active eTSSs (Fig. 2b, ATAC, 95% Confidence Interval (CI) excludes 0). This opening contrasted sharply with the UV-induced global loss of Pol II-hypo at these TSSs and eTSSs (Fig. 2a, b, Pol II-hypo, 95% CI excludes 0; Supplementary Fig. 3).
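The locus classification can be written as a small decision rule. This is a sketch under stated assumptions: `classify_locus` is a hypothetical helper taking peak presence over the TSS as booleans, Pol II presence is represented only by the Pol II-ser2P call, and the handling of loci carrying both active and repressive marks ("Ambiguous") or only part of the active signature ("Unassigned") is our assumption, since the text does not specify it.

```python
def classify_locus(h3k27ac: bool, h3k27me3: bool, pol2_ser2p: bool) -> str:
    """Assign a TSS/eTSS to Active, Repressed or Inactive from NO UV peak calls.

    Hypothetical helper: each flag says whether a peak of that mark overlaps the TSS.
    Pol II presence is simplified to the Pol II-ser2P peak call.
    """
    active = h3k27ac and pol2_ser2p
    repressed = h3k27me3
    if active and repressed:
        return "Ambiguous"   # assumption: conflicting marks, excluded from all subsets
    if active:
        return "Active"
    if repressed:
        return "Repressed"
    if not (h3k27ac or pol2_ser2p):
        return "Inactive"    # no H3K27ac, H3K27me3 or Pol II peak
    return "Unassigned"      # e.g. H3K27ac without Pol II-ser2P


print(classify_locus(True, False, True))    # Active
print(classify_locus(False, True, False))   # Repressed
print(classify_locus(False, False, False))  # Inactive
```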
Strikingly, we also found that H3K27ac levels were highly stable (Fig. 2a, b, 95% CI includes 0, and Supplementary Fig. 3), and we observed no exchange of H3K27ac for H3K27me3 in response to UV at these active TSSs and eTSSs. Reciprocally, there was no exchange of H3K27me3 for H3K27ac, and no gain of Pol II, at repressed loci (Fig. 2a, Supplementary Fig. 3). Accordingly, the results of our genome-wide analysis were consistent with biochemical evidence obtained by histone acid extraction followed by Western Blot analysis, showing that the global levels of H3K27me3 and H3K27ac remain fairly stable during the full period of recovery from UV stress (Supplementary Fig. 3c, d).
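Statements such as "95% CI excludes 0" or "includes 0" refer to an interval around the mean per-locus change. The Methods may use a different estimator; as one common option, the sketch below computes a percentile bootstrap interval for the mean log2 FC (+UV/NO UV). The function names, number of resamples, seed and example values are illustrative assumptions, not the authors' data or code.

```python
import random
from statistics import mean


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-locus log2 FC values."""
    rng = random.Random(seed)
    n = len(values)
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def ci_excludes_zero(values, **kwargs):
    """True when the whole interval lies on one side of 0."""
    lower, upper = bootstrap_mean_ci(values, **kwargs)
    return lower > 0 or upper < 0


atac_like = [0.6, 0.8, 0.7, 0.9, 0.5, 0.75, 0.65, 0.85]        # consistent gain
h3k27ac_like = [-0.3, 0.2, 0.1, -0.1, 0.3, -0.2, 0.05, -0.05]  # no net change
print(ci_excludes_zero(atac_like), ci_excludes_zero(h3k27ac_like))
```

A fixed seed keeps the interval reproducible between runs; with thousands of loci, as in the genome-wide analysis, the interval becomes narrow.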
We therefore conclude that the depletion of detectable Pol II-hypo at TSSs and eTSSs is not caused by repression of these loci through tri-methylation of H3K2750,51 or by loss of the activating histone mark H3K27ac48.
Chromatin opening parallels Pol II transition into elongation upon UV irradiation
To elucidate the functional advantage associated with increased chromatin accessibility in response to UV, we performed a thorough integrative analysis of our data together with previously published datasets (Pol II-ser2P from28 and CAGE-seq from4, see Methods). First, we customised a genome annotation that unambiguously pinpoints the TSSs of mRNAs, PROMPTs and eRNAs that do not overlap with regions possibly being transcribed through from neighboring or overlapping genes, promoters or enhancers (see Methods). We then established three categories (Fig. 3a-c), following previously suggested models52: first, active bidirectional promoter regions, which include the TSSs of mRNA-mRNA pairs transcribed in opposite directions (Fig. 3a); second, active unidirectional promoters, which include the TSS of one mRNA gene (+ or -) for which we could associate an expressed PROMPT in the antisense direction (Fig. 3b); third, active intergenic (as opposed to intragenic) enhancers, chosen to avoid potential contamination by interfering reads derived from overlapping transcription of other active elements (Fig. 3c). Importantly, PROMPT and enhancer transcriptional activity was defined from available Cap Analysis Gene Expression (CAGE) data for skin and dermal fibroblasts (FANTOM5 consortium, see Methods), which accurately determine the transcript start position (5’ end), abundance and directionality of Pol II transcription in our model (Fig. 3a-c, CAGE). TSS loci were sorted by interCAGE distance, which we defined as the distance separating the summits of CAGE signals detected on the (+) and (-) strands (Fig. 3a, b and Methods). This allowed us to identify regions with overlapping (convergent, CONV) or non-overlapping (divergent, DIV) transcription (Fig. 3a, b). By focusing on the latter category, we could study the transcription dynamics at play in each direction without having to deal with potential interference.
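The interCAGE distance and the CONV/DIV split can be illustrated with a small sketch. It assumes, as a hypothetical convention not stated in the text, that the distance is signed as the (+) strand summit coordinate minus the (-) strand summit coordinate. Under this convention, positive values mean the two TSSs point away from each other (divergent, non-overlapping), and zero or negative values mean the transcripts overlap (convergent). Coordinates and helper names are illustrative.

```python
def inter_cage_distance(plus_summit: int, minus_summit: int) -> int:
    """Signed distance between CAGE summits (hypothetical sign convention).

    Positive: the (-) strand TSS lies upstream (left) of the (+) strand TSS, so the
    two transcripts run away from each other without overlapping.
    """
    return plus_summit - minus_summit


def classify_pair(plus_summit: int, minus_summit: int) -> str:
    """DIV for non-overlapping divergent pairs, CONV for overlapping ones."""
    return "DIV" if inter_cage_distance(plus_summit, minus_summit) > 0 else "CONV"


# (plus_summit, minus_summit) genomic coordinates for three hypothetical loci
pairs = [(10_250, 10_100), (20_000, 20_080), (30_400, 30_050)]
for plus, minus in sorted(pairs, key=lambda p: inter_cage_distance(*p)):
    print(inter_cage_distance(plus, minus), classify_pair(plus, minus))
```

Sorting by the signed distance reproduces the kind of ordering used for the heatmaps, with overlapping (CONV) loci at one end and increasingly separated divergent (DIV) loci at the other.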
Using this set-up, we discovered that the UV-dependent increase in chromatin accessibility (Fig. 3a-c, ATAC) was paralleled by the transition of Pol II into active elongation (Fig. 3a-c, Pol II-ser2P), not only at flanking mRNAs (Fig. 3d, e) but also at adjacent PROMPT and eRNA sequences (Fig. 3e, f), as shown by the loss of Pol II reads at TSSs and the gain of reads in downstream regions. These results were confirmed quantitatively by showing that the Escape Index (EI) of elongating Pol II (the inverse of the Pausing Index11) increased in the +UV condition compared with NO UV for 90.1% of bidirectional promoters (Fig. 3g, Chi-square test P = 5.1 × 10−266), as well as for 70.1% of PROMPTs (Fig. 3h, Chi-square test P = 4.5 × 10−89) and 68.6% of eRNAs (Fig. 3i, Chi-square test P = 2.5 × 10−44). We conclude that the PPP release of Pol II upon genotoxic stress is synchronously triggered at all active transcription units and coincides with increased chromatin breathing. These data extend the previously characterised transcription-driven genome surveillance mechanism28 to essentially all active gene regulatory regions and give mechanistic insight into the synergy between the increase in chromatin accessibility and the transcriptional response observed upon UV.
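The Escape Index and the accompanying test can be sketched as follows. Assumptions: EI is computed as gene-body read density divided by promoter-proximal (pause-window) density, i.e. the inverse of the Pausing Index, and the Chi-square test is taken as a goodness-of-fit test of the number of loci with increased versus non-increased EI against an even 50:50 split (1 degree of freedom). The exact windows and test design are described in Methods and may differ; all helper names are hypothetical.

```python
import math


def escape_index(gene_body_density: float, pause_density: float) -> float:
    """EI = gene-body density / promoter-proximal density (1 / Pausing Index)."""
    return gene_body_density / pause_density


def fraction_increased(ei_no_uv, ei_uv):
    """Fraction of loci whose EI is higher after UV."""
    pairs = list(zip(ei_no_uv, ei_uv))
    return sum(after > before for before, after in pairs) / len(pairs)


def chi_square_even_split(n_increased: int, n_total: int):
    """Chi-square goodness of fit of increased vs not-increased counts to 50:50."""
    expected = n_total / 2
    n_other = n_total - n_increased
    chi2 = ((n_increased - expected) ** 2 + (n_other - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of Chi-square, 1 df
    return chi2, p_value


print(escape_index(2.0, 8.0))                            # 0.25
print(fraction_increased([1, 1, 1, 1], [2, 0.5, 3, 1]))  # 0.5
print(chi_square_even_split(60, 100))
```

With 1 degree of freedom the Chi-square survival function equals `erfc(sqrt(chi2 / 2))`, so no external statistics library is needed.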
DRB rescues the post-UV detection of Pol II in PIC
We noted that although 63.65% of the transcribed genome shows reduced transcription activity (fraction of transcriptome coverage with Log2 FC (+UV/NO UV) < 0, see Methods), a local increase in nRNA synthesis downstream of the TSSs of all active genes is detected during the UV-recovery phase28–30,38. This observation, combined with the above findings on UV-induced chromatin opening around virtually all active TSSs, PROMPTs and eTSSs, is hardly compatible with the previously suggested model of UV-induced global inhibition of transcription initiation. We thus searched for alternative explanations for the reduced Pol II-hypo levels at active TSSs/eTSSs that show increased accessibility after UV.
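The 63.65% figure is a length-weighted fraction of transcribed sequence whose signal decreases after UV. A minimal sketch follows, assuming the transcriptome has been split into intervals, each carrying a length and read counts in the two conditions. The interval scheme, the pseudocount of 1 and the helper names are hypothetical choices.

```python
import math


def log2_fc(count_uv: float, count_no_uv: float, pseudocount: float = 1.0) -> float:
    """log2(+UV / NO UV) with a pseudocount to avoid division by zero."""
    return math.log2((count_uv + pseudocount) / (count_no_uv + pseudocount))


def fraction_coverage_decreased(intervals):
    """Length-weighted fraction of transcribed sequence with log2 FC < 0.

    intervals: iterable of (length_bp, count_uv, count_no_uv) tuples.
    """
    total = 0
    decreased = 0
    for length, uv, no_uv in intervals:
        total += length
        if log2_fc(uv, no_uv) < 0:
            decreased += length
    return decreased / total if total else 0.0


bins = [(100, 3, 9), (300, 12, 10), (100, 0, 5)]
print(fraction_coverage_decreased(bins))  # 0.4
```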
We performed a set of experiments to determine whether Pol II is actually recruited to TSSs upon UV (using its -hypo isoform as a proxy). First, as depicted in Fig. 4a, we irradiated cells with a mild UV dose and left them to recover for 2 hours, the time at which Pol II-hypo levels have been shown to be severely depleted28,32,33. We then applied, or not, DRB, an inhibitor of the release of elongating Pol II from PPP sites (see Methods). Cells were crosslinked 2 hours after the addition of DRB (or DMSO for control cells). In accordance with previous reports, in cells crosslinked 2 h after UV irradiation in the absence of DRB (+UV / X 2 h), or in cells crosslinked 4 h after UV irradiation and incubated with DMSO for the last 2 h (+UV / −DRB / X 4 h), we detected only minimal levels of pre-initiating Pol II in total chromatin extracts and at TSSs, PROMPTs and eTSSs, as revealed by Western Blot analysis (Fig. 4b) and ChIP-seq (Fig. 4c, d), respectively. In contrast, when cells had been incubated with DRB for the last 2 h before being crosslinked 4 h after UV irradiation (+UV / +DRB / X 4h), we observed a significant rescue of pre-initiating Pol II (-hypo) levels in total chromatin (Fig. 4b, two-sided Student’s t test P = 0.0055 compared to “+UV/−DRB/X 4h” and P = 0.0156 compared to “+UV/X 2h”). The restoration of pre-initiating Pol II levels was even more pronounced when we focused on occupancy at active TSSs, PROMPTs and eTSSs, where average read densities detected by Pol II-hypo ChIP-seq after DRB treatment (+UV / +DRB / X 4h) matched the control NO UV levels (NO UV / +DRB / X 4h) (Fig. 4c, d). Therefore, by blocking the stress-triggered transition of Pol II molecules from PPP sites into elongation two hours post UV, when the pre-UV Pol II-hypo levels were almost completely depleted, we were able to reveal the underlying continuous de novo recruitment of Pol II-hypo molecules into PICs.
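The P-values above come from two-sided Student’s t tests. For reference, a standard-library-only sketch of the unpaired, equal-variance (Student’s) test is given below. Whether the comparisons were paired or unpaired is not stated here, so the unpaired form is an assumption, and the helper names are hypothetical. The two-sided P-value is obtained by integrating the t density with Simpson’s rule.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution (lgamma avoids overflow for large df)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def student_t_test(x, y):
    """Unpaired, equal-variance (Student's) t test; returns (t, two-sided P)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, two_sided_p(t, df)


print(student_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

In practice a statistics package would normally be used; the hand-written integration only serves to make the computation explicit.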
We also applied DRB just before and for two hours after UV (Supplementary Fig. 4a) and found only a limited loss of pre-initiating Pol II in chromatin extracts upon UV (Supplementary Fig. 4b, c, two-sided Student’s t test P = 0.0145). This result was corroborated by ChIP-qPCR experiments (performed on the same chromatin extracts used above), as DRB prevented the UV-induced reduction in Pol II-hypo occupancy at promoter/TSS-proximal regions of six active genes (Supplementary Fig. 4d, two-sided Student’s t test P = 0.002 for DMSO versus P = 0.3138 (non-significant) for DRB).
We thus conclude that the genome-wide UV-induced PPP-release of Pol II molecules into elongation accelerates the transition into initiation of the next-to-be recruited Pol II-hypo molecules, limiting the dwell time of this isoform at essentially all active TSSs, PROMPTs and eTSSs.
Increased nRNA synthesis from active TSSs upon UV irradiation
Having established that UV irradiation does not inhibit the recruitment of Pol II-hypo into PICs, we next examined newly synthesized nRNA molecules at TSSs to determine whether these post-UV recruited pre-initiating Pol II molecules actively proceed into initiation. We took advantage of our own and others’ nRNA-seq data28,30 and examined whether the previously characterized global increase of EU- or BrU-labelled RNA reads at the beginning of genes (see Supplementary Fig. 4 in28) could originate from increased Pol II initiation at active TSSs (Fig. 5a, b and Supplementary Fig. 5a, b), as suggested before30. In particular, at unidirectional promoters we confirmed that nRNA synthesis was increased in the mRNA direction, but we also found a concomitant increase of nRNA production in the antisense, PROMPT direction. Similarly, we found widespread gains in intensity for eRNAs, which emanate equally in both directions from active eTSSs (Fig. 5a, b and Supplementary Fig. 5a, b). Detecting labelled nRNA even for short transcripts such as PROMPTs and eRNAs confirms active labelling close to TSSs and validates that regions directly downstream of TSSs are transcribed de novo during the post-UV period. Taken together, these data demonstrate that the continuous recruitment of Pol II-hypo molecules (see Fig. 4) and their fast transition into initiation and productive elongation (see Fig. 3) during the post-UV recovery period are accompanied by nascent RNA synthesis.
To further verify initiation activity during UV recovery, we exploited the possibility of tracking start-RNAs, which directly report the amount of dynamically engaged Pol II located within the initially transcribed sequence (approximately the first 100 nucleotides11). We followed the experimental procedure depicted in Fig. 5c and applied, or not, transcription elongation (DRB) or initiation (triptolide, TRP) inhibitors 2 h post UV. For each condition, we isolated small RNAs by size selection (<200 nucleotides) and ligated an RNA-DNA linker to their 3’ ends. Reverse Transcription (RT) was performed using a universal primer annealing to the linker sequence, as previously described7. Locus-specific qPCR reactions were then performed to quantitatively compare start-RNA levels at representative active loci for which we had identified Pol II-ser2P ChIP-seq or nRNA-seq signal (see Methods). Start-RNAs were detected after UV treatment, validating that initiation still occurs during the UV-recovery phase (Fig. 5d, +UV / −DRB). A similar result was obtained in the presence of the transcription elongation inhibitor (Fig. 5d, +UV / +DRB). In contrast, inhibiting transcription initiation with TRP led to a clear reduction of start-RNAs (Fig. 5d, +UV/+TRP, two-sided Student’s t-test P = 0.0037 compared to “NO UV/+DRB”, P = 0.0016 compared to “+ UV/−DRB”, P = 0.0009 compared to “+ UV/+DRB”), providing further evidence for the non-stop recruitment and functional engagement of Pol II at TSSs after UV irradiation.
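The text does not state how the qPCR signals were normalized. One widely used option is the comparative Ct (2^-ΔΔCt) method, sketched below under two assumptions: approximately 100% amplification efficiency and a hypothetical normalizer (reference) amplicon. Treat it as an illustration of relative quantification, not as the procedure used in this study.

```python
def relative_level(ct_target: float, ct_ref: float,
                   ct_target_ctrl: float, ct_ref_ctrl: float,
                   efficiency: float = 2.0) -> float:
    """Comparative Ct (2^-ddCt) relative quantification.

    ct_target / ct_ref: start-RNA amplicon and hypothetical normalizer in the sample;
    *_ctrl: the same amplicons in the control condition (e.g. NO UV).
    efficiency=2.0 assumes perfect doubling per cycle.
    """
    delta_sample = ct_target - ct_ref
    delta_control = ct_target_ctrl - ct_ref_ctrl
    return efficiency ** -(delta_sample - delta_control)


# One cycle later than control after normalization -> half the control level
print(relative_level(25.0, 20.0, 24.0, 20.0))  # 0.5
```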
Equal levels of Pol II-hypo at PICs prime for uniform TC-NER
Next, we took advantage of XR-seq (eXcision-Repair sequencing) data24, which precisely and exclusively pinpoint the location and levels of transcription-dependent repair (TC-NER pathway) when the assay is performed in GG-NER-deficient cells (Xeroderma Pigmentosum (XP)-C cells). Given the strand specificity of the assay, we considered only the excision of CPDs from the template (non-coding) strand (TS) of mRNAs, PROMPTs and eRNAs, which corresponds to the + (blue) or the – (red) strand of the genome (Fig. 6a, b) depending on transcript orientation. Upon comparison with CAGE, we found that the onset of TC-NER coincided with the location of CAGE reads, confirming that TC-NER (triggered by damage-arrested Pol II molecules24) and CAGE4 accurately locate active TSSs (Fig. 6b (compare with Fig. 3a, left), Fig. 6c, d). As expected, repair efficiency was equal in each direction for bidirectional active promoters (Fig. 6b-d, Fig. 6e, XR-seq (XP-C)). This result was also in line with Pol II-hypo ChIP-seq data showing equivalent Pol II recruitment at PICs (Supplementary Fig. 6a) and with CAGE data indicating balanced production of capped mRNAs (Fig. 6e, CAGE, boxes centered around Log2 FC = 0). Nevertheless, the variability between the two directions was markedly lower for TC-NER (XR-seq (XP-C)) and Pol II-hypo than for CAGE (Fig. 6e, proportion of non-significant F-Tests: P = 0; Supplementary Fig. 6b, top panel, proportion of non-significant F-Tests: P = 0).
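The direction comparison in Fig. 6e rests on per-locus log2 FCs between the two strands and on a comparison of their variances. A minimal sketch follows, with hypothetical helper names and a pseudocount of 1 as an assumption. The F statistic is shown as the ratio of sample variances, without the P-value computation.

```python
import math
from statistics import variance


def direction_log2_fc(plus_counts, minus_counts, pseudocount=1.0):
    """Per-locus log2 FC between the two transcription directions."""
    return [math.log2((p + pseudocount) / (m + pseudocount))
            for p, m in zip(plus_counts, minus_counts)]


def variance_ratio(a, b):
    """F statistic as the larger over the smaller sample variance."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


xr_fc = direction_log2_fc([3, 7, 5], [3, 1, 7])       # hypothetical XR-seq counts
cage_fc = direction_log2_fc([15, 0, 31], [1, 31, 3])  # hypothetical CAGE counts
print(variance_ratio(xr_fc, cage_fc))
```

A large ratio with the CAGE variance on top corresponds to the situation described above: both signals are centered near Log2 FC = 0, but CAGE scatters much more between directions.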
We then investigated the repair of PROMPTs and enhancers, a phenomenon previously observed but poorly explained24,53. We quantified strand-specific repair upstream and downstream of unidirectional promoters and found stronger than expected repair activity at unambiguously resolved divergent PROMPTs (Fig. 6b-d, DIV). Indeed, XR-seq read density was not correlated with steady-state CAGE levels at those loci (Pearson Correlation Coefficient = 0.1343). Moreover, the FC of TC-NER reads between mRNAs and PROMPTs was much smaller than for CAGE (Fig. 6f, 95% CI excludes 0), thus matching the UV-independent uniformity of Pol II-hypo (Supplementary Fig. 6a, b). Similarly, TC-NER levels on the TS of eRNAs were higher than anticipated. Indeed, the density of eRNA XR-seq reads was similar to that of mRNAs (Fig. 6b-d) and contrasted with the very low CAGE signal detected at these loci (Fig. 3a-c). Therefore, balanced Pol II-hypo loading into PICs at all classes of transcripts allows for equal initiation events and mirrors the homogeneous XR-seq levels detected in these regions. Taken together, our results demonstrate that widespread continual initiation and release of Pol II waves into productive elongation maximize repair activity at all kinds of active regulatory regions (mRNAs, PROMPTs, enhancers), regardless of transcript expression levels prior to UV.
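The Pearson coefficient quoted above (0.1343) measures linear association between per-locus XR-seq density and CAGE level. A dependency-free sketch follows. Whether raw or log-transformed values were correlated is not stated, so the input scale is left to the caller, and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


xr_density = [5.1, 4.8, 5.3, 5.0, 4.9]  # hypothetical per-locus values
cage_level = [0.2, 9.0, 1.5, 30.0, 3.1]
print(round(pearson_r(xr_density, cage_level), 3))
```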
Continuous transcription initiation maximises transcription-driven repair
We next assessed the biological purpose of continuous transcription initiation from active regulatory regions during the UV-recovery period. We have reported previously28 that, in the absence of a UV-triggered PPP release of elongating Pol II waves, the Pol II molecules already elongating prior to UV cannot repair the entire transcribed genome. Thus, we and others propose that sending new Pol II molecules to detect the next lesions in line on the TS is of pivotal importance28,42.
To delineate this concept further, we quantified excision activity at thymidine dimers (TTs) over time (see Methods and28) using XR-seq data from XP-C cells irradiated under conditions that allow PPP release of Pol II for only a short period of time (DRB2 experiment from42) (compare “DRB2 +0.5 h” with “DRB2 +1 h” in Supplementary Fig. 7a-c). Critically, we found that the number of excision events in clusters I and II (further downstream of TSSs) 1 hour after UV (DRB2 +1 h) did not match the levels in clusters 0 and I (further upstream) at DRB2 +0.5 h (before the asterisk in Supplementary Fig. 7c). Considering that only one Pol II molecule can be accommodated per PPP site at each active mRNA, PROMPT and eRNA allele at the time of irradiation, one can postulate that the extent of ongoing repair activity observed downstream of TSSs without DRB, 1 hour after UV, results from concurrent Pol II recruitment and initiation.
When we mapped XR-seq reads at TTs 1 h, 4 h and 8 h after UV in WT cells (data obtained from Adar et al.53, see Methods), we confirmed that substantial transcription-dependent excision activity was maintained for a significant proportion of lesions located directly downstream of active gene TSSs (cluster 0), even at late time points during the recovery process (Fig. 7a-c). Notably, lesions located more distally to TSSs on the TS of active genes (from 32.5 kb to 1 Mb) also appeared to be recognized and excised more efficiently as time passes (Fig. 7a-c, clusters III-IV and V-VI). These results reveal that a large extent of the transcription-driven repair activity is due to the ongoing entry of recycling Pol II molecules at TSSs. Our analysis highlights the advantage of a continuously supplied transcription-dependent repair process over the slower and less efficient lesion detection of GG-NER, which was detected at significantly lower levels in all clusters, as shown for inactive genes (Fig. 7a-c).
Discussion
In this study, we provide quantitative insights into the molecular processes underlying the major transcription-coordinated cellular response that is activated in human cells upon genotoxic stress28–30,38,41,54. The establishment of precise maps of chromatin state allowed us to query in detail the impact of transcription on DNA repair activities at important functional regions, including PROMPT and eRNA loci. Our results support a model of continuous transcription initiation that can promptly feed the widespread UV-triggered escape of Pol II into elongation, enabling efficient DNA lesion scanning of the whole transcribed genome.
The finding that the increase in chromatin accessibility parallels the conservation of H3K27ac-modified nucleosomes at the flanks of already open regions in response to mild doses of UV-C irradiation is compatible with reports showing that nucleosome accessibility can increase significantly without changes in nucleosome occupancy during rapid transcriptional induction55. Notably, the maintenance of H3K27ac at these sites prevents the imposition of repressive tri-methylation at active loci (see Fig. 2), in accordance with the rule that H3K27ac and H3K27me3 are mutually exclusive56. Moreover, active transcription at these loci is consistent with prior reports suggesting that increases in gene expression are associated with surges in chromatin accessibility57,58 and that nRNA inhibits51,59 the recruitment of the H3K27me3-catalysing Polycomb Repressive Complex 2 (PRC2) at active genes. Our data contrast with the drastic chromatin remodeling observed in mice at a later time during recovery (6 h) when much higher doses of UV-B were used60. This suggests that when cells deal with unmanageable levels of damage, they need to implement completely different expression changes required for the associated fate of programmed death, a protective mechanism limiting the risk of malignant transformation61–63.
Our analysis takes advantage of a high-resolution strand-specific map of TSSs for coding and non-coding (enhancer and PROMPT) loci and supports the idea that bidirectional transcription of divergent RNAs arises from two distinct hubs of transcription initiation (PICs) located within a single nucleosome-depleted region (NDR)9,64–66. Indeed, for bidirectional mRNAs and mRNA-PROMPT pairs, Pol II-hypo binds at both edges of highly accessible regions (see ATAC-seq vs Pol II-hypo in Supplementary Fig. 6a, c), which correspond to single NDRs flanked by H3K27ac nucleosomes (see arrows in Supplementary Fig. 6c). These observations also extend the evidence supporting the claim that enhancer and PROMPT PICs are organized similarly to gene PICs9,15. In addition, the observed differences in transcript levels between PROMPTs and mRNAs (see Fig. 5) are probably not due to differences in Pol II-hypo recruitment (see Supplementary Fig. 6b, bottom), but rather to differences in the frequency of premature termination at PPP sites and/or in the degradation of PROMPT RNAs by the RNA exosome. Interestingly, the latter is known to be inhibited upon UV stress67,68 (see below).
By uncoupling the TSSs of mRNA genes from those of PROMPTs and enhancers, we reveal that P-TEFb-dependent release of elongating Pol II from PPP sites extends to all actively transcribed regions (see Fig. 3). A growing number of studies have reported data suggesting that (i) UV irradiation preferentially inhibits elongation rather than transcription initiation28–30,38, (ii) P-TEFb and NELF are important regulators of the UV response41,54,69 and (iii) although elongation gradually decelerates as Pol II encounters DNA lesions, significant initiation/early elongation activity (assessed by nRNA-seq) is observed in the first thousand bases of actively transcribed regions28,29,38, a feature that has also been used to identify active TSSs genome-wide after UV30. These features are consistent with our finding that new Pol II-hypo molecules are constantly recruited to PICs post-UV (see Fig. 4) and that they promptly proceed into initiation of start-RNAs and subsequently into elongation of longer nRNAs (see Fig. 5). Considering that Pol II ChIP density depends, among other factors, on the residence time of the epitope at a given genomic locus70, and that Pol II molecules recruited into the PIC are readily phosphorylated28,32,33, we propose that the rapid exchange of Pol II isoforms after UV irradiation is a plausible cause of the decreased ability to detect Pol II-hypo molecules at TSSs upon UV (see Figs. 2 and 4 and ref. 28). Such a model explains previously published data concerning the presence upon UV of (i) PIC/basal transcription factors in nuclear extracts33 or upstream of genes' TSSs (TFIIB)37 and (ii) nRNAs at the beginning of genes28–30,38.
We note that excision fragments (from XR-seq) are distributed more homogeneously between the sense (mRNA) and antisense (PROMPT) strands of unidirectional TSSs and at enhancers (Fig. 6) than would be predicted from CAGE levels. This finding reinforces the possibility that efficient repair at stable and unstable transcripts is primed by the uniform recruitment of Pol II-hypo at all classes of PICs in the steady state (see Supplementary Fig. 6). Remarkably, UV induces the continuous and uniform transition of this Pol II into initiation (see Fig. 4) and constantly feeds PPP release of DNA lesion-sensing Pol II into the transcriptome (see Figs. 3 and 5). This concept was further validated by re-analysing an experiment mapping repair after inhibition of PPP release, and thus, a fortiori, of Pol II initiation42. We find a drastic impairment of CPD excision at TTs located on the TS upstream of the previously calculated Pol II wave front (WF) (low XR-seq signal for DRB vs DMSO in Supplementary Fig. 7d), and a decrease in the percentage of excision of damaged TTs located in clusters 0-II (see Supplementary Fig. 7a-c, DRB vs DMSO), suggesting that post-UV initiation is crucial for the repair of sequences located directly downstream of TSSs. Critically, uniform and continuous recruitment of Pol II at TSSs is further shown to maintain this process through recovery, as ongoing TC-NER activity is persistently detected directly downstream of TSSs at 4 and 8 hours after UV (see Fig. 7, TTs on active genes).
In view of these results, and of the fact that nRNA-seq signal has been detected continuously between 2 and 12 h post-UV29, we propose that maintaining initiation is necessary to allow the sensing of the next lesion in line, thus maximizing the probability of quickly repairing all lesions located on the actively transcribed TS. Indeed, over time, more lesions are removed further downstream, towards the ends of long genes (Fig. 7, clusters III to VI, 32.5 kb to 1 Mb).
Increased TC-NER at regulatory regions has also been observed in E. coli71 and is compatible with the idea that the act of antisense transcription exerts a meaningful biological function72 conserved through evolution. Indeed, these DNA sequences may serve as binding sites for transcription factors or encode target sites for RNA-binding proteins, enabling accurate regulation of topologically associated mRNA genes66,73. Given the effect of DNA repair on the landscape of somatic mutations in cancer tissues19,20, surveillance of these vital sequences is likely to affect cellular fitness. We propose that our model could account for the low levels of substitutions recently observed upstream of genes' TSSs and around DNase hypersensitive (DHS) sites74–76.
Recent advances in the field of transcription regulation indicate that activation of paused genes is mediated by switching Pol II at PPP sites from a premature termination state to a processive elongation state13,16,17, implying that continuous initiation is required for fast transcriptional induction16. Our results, showing that persistent initiation guarantees prolonged transcription-coupled NER, are functionally linked to the observation that the DNA damage-triggered, widespread PPP release of a given Pol II is sufficient to drive immediate initiation of the next Pol II (see Figs. 4 and 5 and Supplementary Figs. 4 and 5). In other words, the clearance rate of Pol II from TSSs depends strongly on PPP status. These findings support the emerging concept that Pol II pausing has an inhibitory effect on initiation77–79 and highlight how this mechanism can function across the whole transcriptome. At the same time, our results provide a new physiological rationale for why cells could benefit from firing initiation continuously, as the balance between promoter-proximal termination and escape into elongation allows dynamic responses to stimuli.
Materials and Methods
Cell culture and treatments
Cells used in this study were VH10 hTERT-immortalized normal human skin fibroblasts. Unless stated otherwise, cells were cultured, synchronized by low-serum starvation and released into full medium as described previously28. When applied, 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB; Calbiochem) and triptolide (TRP; Invivogen) were used at final concentrations of 100 μM and 125 nM, respectively, and were added directly to the growth medium at the indicated times. Cells were irradiated with mild doses of UV-C (254 nm, TUV lamp, Philips) (15 J/m² unless otherwise stated).
ChIP-seq
ChIP-seq was performed as previously described28 with minor changes. Cells were mock-treated (NO UV) or UV-irradiated (+UV) (Fig. 2; the dose used for H3K27me3 was 20 J/m²), or treated as indicated on the timeline (Fig. 4). Formaldehyde was added to the cells at a final concentration of 1% for 12 min at 4°C. The reaction was quenched by adding glycine (final concentration 125 mM) for 5 min. Cells were washed 3 times with cold PBS, collected in PBS containing 1 mM EDTA, 0.5 mM EGTA and 1 mM PMSF, and pelleted in aliquots of ∼2 × 10⁷ cross-linked cells.
Pellets were resuspended in Chro-IP lysis buffer (50 mM Hepes-KOH pH 8.0, 1 mM EDTA, 0.5 mM EGTA, 140 mM NaCl, 10% glycerol, 0.5% IGEPAL, 0.25% Triton X-100, 1 mM PMSF and a mix of protease inhibitors (Roche)). After 10 min of rotation at 4°C, the cell suspension was centrifuged (10 min, 2,800 rpm, 4°C) and the supernatant was kept as the soluble fraction. The cell pellet was then washed with wash buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA, 0.5 mM EGTA, 200 mM NaCl, 1 mM PMSF, 10 mM NaPy and protease inhibitors). The cell suspension was rotated for 10 min at 4°C and centrifuged as above. The cell pellet was resuspended in RIPA buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA, 0.5 mM EGTA, 140 mM NaCl, 1% Triton X-100, 0.1% Na-deoxycholate, 0.1% SDS, 1 mM PMSF, 10 mM NaPy and protease inhibitors). Samples were sonicated in a Bioruptor water-bath sonicator (Diagenode) on the "high" setting with cycles of 30 s on and 30 s off, for a total duration of 25 min. Samples were then centrifuged for 10 min at 10,000 rpm at 4°C, and the supernatant was kept as the chromatin fraction (Input). Chromatin immunoprecipitation (ChIP) was performed by incubating equal amounts of sheared chromatin from irradiated and non-irradiated cells with the appropriate antibody at 4°C overnight. The antibodies used for ChIP were: H3K27ac (ab4729, Abcam), H3K27me3 (07-449, Millipore) and 8WG16 (Pol II-hypo) (05-952, Millipore). Protein A (for H3K27ac and H3K27me3) or Protein G (for 8WG16) Dynabeads (ThermoFisher Scientific) were blocked overnight at 4°C in RIPA buffer without protease inhibitors, NaPy and PMSF, in the presence of bovine serum albumin (BSA, 30 μg/ml).
The next day, beads and immunoprecipitated chromatin were co-incubated for 3 h at 4°C, and the beads were then washed sequentially twice with RIPA, three times with RIPA containing 0.3 M NaCl, once with LiCl buffer (0.25 M LiCl, 10 mM Tris-HCl pH 8.0, 1 mM EDTA, 0.5 mM EGTA, 0.5% Triton X-100, 0.5% sodium deoxycholate) and twice with TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8.0). Chromatin immuno-complexes were eluted by two rounds of incubation at 65°C for 20 min in 1% SDS and 100 mM NaHCO3 with vigorous vortexing. Finally, Input and immunoprecipitated chromatin were de-crosslinked by overnight incubation at 65°C in the presence of 200 mM NaCl. Proteinase K treatment (0.1 μg/μl in 0.5% SDS) was then applied for 1 h at 55°C, and DNA was purified with AMPure XP beads (Agencourt) according to the manufacturer's protocol.
The primers used for ChIP-qPCR experiments were the following (5' to 3'; F, forward; R, reverse; ChIA_neg, negative-control primer pair): SSBP1_F: GTGAGGGAGGAAGGGATAGC, SSBP1_R: AGGGCCAGACACCTACACAG, OSBPL9_F: ATTGGCGGCTCCCAAGAT, OSBPL9_R: GCATTGTAGTCCAGCACGAA, TRPM7_F: CCCAGGGAAACCTTCTCAG, TRPM7_R: TCGCACAATTATGAAAGACTCG, MYC_F: ACTCAGTCTGGGTGGAAGGTATC, MYC_R: GGAGGAATGATAGAGGCATAAGGAG, AKNA_F: CCGTTCCAATCCCTTACC, AKNA_R: TGGAACAAAGAATTCACAGG, APRT_F: GCCTTGACTCGCACTTTTGT, APRT_R: TAGGCGCCATCGATTTTAAG, ChIA_neg_F: AGTCTGAGCTTTGTGGACAGC, ChIA_neg_R: CCCTCCCAGTATACAGTCTTGC. qPCR, library preparation and next-generation sequencing were performed as previously described28.
Histone acid extraction
Cells grown to confluence in 10 cm plates, synchronized and released as described above, were irradiated with 15 J/m². At the recovery times indicated in Supplementary Fig. 3c, d, cells were washed 3 times with cold PBS, harvested and centrifuged at 2,000 rpm for 5 min. The supernatant was discarded, and the cell pellet was washed with 10 volumes of cold PBS and centrifuged at 2,000 rpm for 5 min. The supernatant was removed and the cell pellet was resuspended in 10 volumes of lysis buffer (10 mM Hepes pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM DTT, 1.5 mM PMSF); sulfuric acid was added to a final concentration of 0.2 M, and the suspension was incubated on ice for 30 min before centrifugation at 10,080 × g for 10 min at 4°C. The supernatant was then collected and TCA was added to a final concentration of 20%. Samples were vortexed, kept on ice for 1 h and centrifuged for 15 min at 14,000 rpm at 4°C. The supernatant was discarded and the pellet was washed with 1 ml of ice-cold (−20°C) acetone. After centrifugation at 14,000 rpm for 5 min at 4°C, the acetone was removed carefully using a centrifugal evaporator, and the pellet was resuspended in TE buffer and stored at −80°C.
Western Blot analysis
Western blot analysis of equal amounts of crosslinked chromatin extracts or of histone extracts was performed as described28. Antibodies used for western blot analysis were: anti-H3K27ac (ab4729, Abcam), anti-H3K27me3 (07-449, Millipore), 8WG16 (05-952, Millipore), anti-elongating RNA Pol II (ab5095, Abcam), anti-Lamin B1 (ab65986, Abcam), anti-histone H4 (ab10158, Abcam) and anti-histone H3 (ab1791, Abcam). Time points of analysis are indicated in the figures (Fig. 4b and Supplementary Fig. 3c, d).
Assay for Transposase Accessible Chromatin (ATAC)-seq
ATAC-seq (nuclei preparation, transposition and amplification of transposed fragments for library preparation) was performed using the Nextera DNA Library Prep Kit (Illumina, Inc.) and primers as described in Corces et al.43, with minor modifications: (i) 70,000 cells were used per experimental condition and (ii) the DNase treatment of cells in culture medium before the transposition reaction was skipped. The UV dose applied for ATAC-seq experiments was 15 J/m², and treated cells were left to recover for 2 h before harvesting.
Start-RNAs isolation and qPCRs
To isolate small RNAs (shorter than 200 nucleotides), we used the Qiagen miRNeasy Mini Kit and RNeasy MinElute Cleanup Kit according to the manufacturer's instructions. To monitor the efficiency of the different enzymatic reactions, we included a spike-in RNA oligonucleotide of known sequence (oGAB11: rArGrUrCrArCrUrUrArGrCrGrArUrGrUrArCrArCrUrGrArCrUrGrUrG, synthesized and purified by IDT). After purification, small RNAs and spike-in molecules were ligated to IDT DNA linker 1 (/5rApp/CTGTAGGCACCATCAAT/3ddC/). Specifically, samples were denatured for 2 min at 80°C and then placed immediately on ice. Ligation mix (4.8 μl 50% PEG, 2 μl 10x RNA ligase buffer, linker, RNase-free H2O and 0.5 μl truncated RNA ligase (NEB, Cat. No. M0351S)) was added to a final volume of 20 μl. The reaction was incubated for 3 h at 37°C. After H2O was added to a final volume of 100 μl, ethanol precipitation (3 volumes of 100% EtOH, 1/10 volume of 3 M NaAc pH 5.2 and 10 μg of glycogen (ThermoFisher Scientific, Cat. No. AM9510)) was performed overnight at −80°C. RNA was resuspended in 10 μl, and reverse transcription (RT) was performed with primer oLSC003 (/5Phos/TCGTATGCCGTCTTCTGCTTG/iSp18/CACTCA/iSp18/AATGATACGGCGACCACCGATCCGACGATCATTGATGGTGCCTACAG) according to the Invitrogen SuperScript II (Cat. No. 18064014) instructions. qPCR was performed using gene-specific forward primers (sequences 5' to 3': OSBPL9, ATTGGCGGCTCCCAAGAT; SSBP1, GTGAGGGAGGAAGGGATAGC; IFIT1, TCTCAGAGGAGCCTGGCTAA; KPNA6, ATTTGGCGAGAGCCTGTCT) and one common reverse primer (oNTI230: 5'-AATGATACGGCGACCACCGA-3'), which anneals to the RT primer oLSC003 sequence.
Read alignment, normalization, peak calling and differential accessibility analysis
For all next-generation sequencing (NGS) data analyses, in-house scripts and pipelines were developed to automate the analysis and process the data consistently (see below for details). Code is available upon request. Sequencing data and generated wig profiles are available on Gene Expression Omnibus (GEO) (accession ID: GSE125181). Short-read quality control, data filtering, alignment and wig profile generation were performed essentially as described previously28, with minor modifications.
ChIP-seq data for Pol II-ser2P and Pol II-hypo (Figs. 2 and 3 and Supplementary Figs. 3 and 6) were obtained from ref. 28 for NO UV and for 2 h and 1.5 h post-UV (8 J/m²), respectively. nRNA-seq data were obtained from refs. 28 and 30 and processed as described in ref. 28 (Fig. 5 and Supplementary Fig. 5). CAGE alignments were obtained from FANTOM5 (see below).
For H3K27ac and H3K27me3 ChIP-seq alignment files, peak calling was performed using SICER version 1.1 (ref. 80) with window size = 400 bp and gap parameter = 1, while FDR and log2 fold-change cutoffs were set to 0.01 and 1.5, respectively. For ATAC-seq alignment files, peak calling was performed using MACS2 (ref. 81). Because ATAC-seq fragment lengths are variable, the peak caller was run several times with different parameters to maximize the sensitivity of open chromatin detection. Specifically, the runs with --nomodel --shift 100 --extsize 200; --broad --nomodel --shift 100 --extsize 200 --keep-dup all; --nomodel --shift 37 --extsize 73; --broad --nomodel --shift 37 --extsize 73 --keep-dup all; and --nomodel --shift 75 --extsize 150 --keep-dup all were combined, and detected peaks were filtered using FDR < 0.05 and fold change > 1. Differential accessibility analysis was performed with the DiffBind R package (https://www.bioconductor.org/packages//2.10/bioc/html/DiffBind.html), using the combined peak sets as a consensus open chromatin reference. Differentially accessible regions (DARs) were detected and filtered using fold-change (log2 FC ≥ 1) and p-value (p ≤ 0.001) thresholds.
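How the MACS2 runs were "combined" is not specified. A minimal sketch, assuming that combining means pooling the peaks from all runs, keeping those that pass the FDR < 0.05 and fold change > 1 filters, and merging overlapping intervals into one consensus set. The `Peak` tuple and `consensus_peaks` function are hypothetical names for illustration, not part of the authors' pipeline.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int          # 0-based, inclusive
    end: int            # exclusive
    fdr: float
    fold_change: float


def consensus_peaks(runs: List[List[Peak]], max_fdr: float = 0.05,
                    min_fc: float = 1.0) -> List[Tuple[str, int, int]]:
    """Pool peaks from several peak-calling runs, filter them, and merge overlaps."""
    # Keep only peaks passing the FDR and fold-change thresholds, from every run.
    kept = [p for run in runs for p in run
            if p.fdr < max_fdr and p.fold_change > min_fc]
    kept.sort(key=lambda p: (p.chrom, p.start, p.end))

    merged: List[Tuple[str, int, int]] = []
    for p in kept:
        if merged and merged[-1][0] == p.chrom and p.start <= merged[-1][2]:
            # Overlapping or book-ended peak on the same chromosome: extend the interval.
            chrom, start, end = merged[-1]
            merged[-1] = (chrom, start, max(end, p.end))
        else:
            merged.append((p.chrom, p.start, p.end))
    return merged
```

Merging, rather than intersecting, favours sensitivity, which matches the stated aim of running several parameter sets; an intersection would instead keep only regions detected by every run.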
Read density plots
ATAC-seq, ChIP-seq, nRNA-seq, CAGE-seq and XR-seq data were subjected to read density analysis after read-depth normalisation of all samples per experiment. Heatmaps and average density profiles were computed as previously described28 around the genomic regions of interest indicated in the figures. Heatmaps were generated directly with the software from matrices of binned read densities (bin size is indicated in the figures) for all considered individual items (n; metagenes). Read density matrices were also imported into custom R and Python scripts for (i) plotting average density profiles (smoothed with a moving window of the indicated bin size) and (ii) determining read densities per genomic category.
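The binned, depth-normalized matrices and smoothed average profiles described above can be sketched as follows. This is a minimal illustration, assuming reads are summarized by one genomic position each on a single chromosome and that normalization is to reads per million (RPM); it ignores strand, and the function names are hypothetical rather than taken from the authors' in-house scripts.

```python
import numpy as np


def rpm_binned_matrix(read_positions, total_reads, centers, flank=2000, bin_size=50):
    """Return an (n_regions x n_bins) matrix of reads per million per bin around each center."""
    edges = np.arange(-flank, flank + bin_size, bin_size)
    positions = np.sort(np.asarray(read_positions))
    rows = []
    for c in centers:
        # Number of reads in each half-open bin [c + edges[i], c + edges[i + 1]).
        idx = np.searchsorted(positions, c + edges, side="left")
        rows.append(np.diff(idx))
    counts = np.array(rows, dtype=float)
    return counts * 1e6 / total_reads


def average_profile(matrix, window=3):
    """Column-wise mean across regions, smoothed with a centred moving average.

    np.convolve(..., mode="same") zero-pads, so the outermost bins are attenuated;
    `window` should be odd and not larger than the number of bins.
    """
    mean = matrix.mean(axis=0)
    kernel = np.ones(window) / window
    return np.convolve(mean, kernel, mode="same")
```

Normalizing by total mapped reads makes samples from the same experiment comparable bin by bin, which is what the heatmaps and average profiles require.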
Construction of mRNA-TSSs, PROMPT-TSSs and eTSS annotation
To annotate transcription start sites (TSSs), all known protein-coding and non-coding RNA hg19 RefSeq transcripts (release 86) were downloaded from the UCSC Table Browser (http://genome-euro.ucsc.edu/cgi-bin/hgTables). A biotype was assigned to each transcript using BioMart (www.biomart.org), and all small non-coding RNAs were excluded. For gene models containing multiple alternative transcripts, TSSs within a 100 bp neighborhood were clustered together and only the longest transcript was kept, resulting in 30,473 transcripts. Transcripts were then separated into 3 groups based on their transcriptional activity. TSS coordinates were extended by 2 kb in each direction and tested for overlap with the Pol II-ser2P -UV, H3K27ac -UV and H3K27me3 -UV peak sets. Regions overlapping with both a Pol II-ser2P -UV and an H3K27ac -UV peak were classified as active; those overlapping with an H3K27me3 -UV peak, but with neither a Pol II-ser2P -UV nor an H3K27ac -UV peak, were classified as repressed; and those that did not overlap with any of these peak sets were classified as inactive. Any region overlapping with both H3K27ac -UV and H3K27me3 -UV peaks was excluded from further analysis. This resulted in 15,819 active, 2,943 repressed and 7,608 inactive transcripts. To further classify active TSSs by transcription directionality, the annotation was split into unidirectional and bidirectional references. All active transcript pairs with opposite directions of transcription and −2 kb ≤ TSSdistance ≤ +2 kb, where TSSdistance = TSS coordinate (forward strand) − TSS coordinate (reverse strand) (inter-CAGE distance), were classified as bidirectional, while the remaining annotations were classified as unidirectional. Bidirectional pairs were further divided into convergent pairs (TSSdistance ≤ 100 bp) and divergent pairs (TSSdistance > 100 bp).
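The classification rules above can be written out as a small decision function. This is a sketch under stated assumptions: intervals are half-open `(start, end)` pairs on one chromosome, the ±2 kb window includes both ends, and TSSs that match none of the stated rules (for example, a Pol II-ser2P peak without H3K27ac) are returned as `None`, since the text does not say how they were handled. All names (`overlaps`, `classify_tss`, `classify_pair`) are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def overlaps(window: Interval, peaks: List[Interval]) -> bool:
    """True if the window overlaps at least one peak."""
    return any(s < window[1] and window[0] < e for s, e in peaks)


def classify_tss(tss: int, pol2_ser2p: List[Interval], h3k27ac: List[Interval],
                 h3k27me3: List[Interval], ext: int = 2000) -> Optional[str]:
    """Assign 'active', 'repressed' or 'inactive' from -UV peak sets; None if excluded."""
    window = (tss - ext, tss + ext + 1)
    ser2p = overlaps(window, pol2_ser2p)
    ac = overlaps(window, h3k27ac)
    me3 = overlaps(window, h3k27me3)
    if ac and me3:
        return None  # both marks present: excluded from the analysis
    if ser2p and ac:
        return "active"
    if me3 and not ser2p and not ac:
        return "repressed"
    if not (ser2p or ac or me3):
        return "inactive"
    return None  # partial signal: not covered by the stated rules (assumption)


def classify_pair(tss_forward: int, tss_reverse: int,
                  max_dist: int = 2000, convergent_max: int = 100) -> str:
    """Classify two opposite-strand active TSSs using TSSdistance = forward - reverse."""
    d = tss_forward - tss_reverse
    if not -max_dist <= d <= max_dist:
        return "unpaired"  # treated as unidirectional
    return "convergent" if d <= convergent_max else "divergent"
```

With this sign convention, a positive TSSdistance means the reverse-strand TSS lies upstream of the forward-strand TSS, so the two transcripts point away from each other (divergent).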
To optimize the classification of convergent and divergent transcript pairs, TSS coordinates were redefined by scanning a radius of 250 bp for the nucleotide with the maximum sense CAGE signal. Any bidirectional pair without a significant CAGE peak in this region was excluded from the analysis. This finally resulted in 12,859 unidirectional transcripts and 2,822 active bidirectional TSS pairs, of which 1,806 were classified as divergent and 1,016 as convergent.
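The TSS refinement step amounts to an argmax of the sense CAGE signal within ±250 bp. A minimal sketch, assuming a per-nucleotide sense-strand CAGE count array indexed by genomic coordinate; the `min_signal` threshold is a hypothetical stand-in for the unspecified significance criterion, and `refine_tss` is an illustrative name.

```python
import numpy as np


def refine_tss(tss: int, cage_signal: np.ndarray, radius: int = 250,
               min_signal: float = 1.0):
    """Move a TSS to the nucleotide with maximal sense CAGE signal within +/- radius.

    Returns the refined coordinate, or None when the window maximum is below
    `min_signal` (hypothetical proxy for a non-significant CAGE peak).
    """
    lo = max(0, tss - radius)
    hi = min(len(cage_signal), tss + radius + 1)
    window = cage_signal[lo:hi]
    best = int(np.argmax(window))  # first position of the maximum if tied
    if window[best] < min_signal:
        return None
    return lo + best
```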
To obtain a complete overview of the non-coding antisense transcription events occurring around mRNA TSSs, we also annotated upstream antisense (uaRNA) and downstream antisense (daRNA) transcripts (collectively referred to as PROMPTs in this paper for convenience). Only active unidirectional mRNA TSSs were used. For genes annotated with more than one mRNA transcript, only the leftmost TSS (for + strand genes) or the rightmost TSS (for − strand genes) was considered. The antisense CAGE peak with the highest summit in the region from 2 kb upstream to 1 kb downstream of each unidirectional TSS was considered the main PROMPT TSS for further analyses. The same procedure was repeated for the inactive transcript set to estimate the background distribution of the highest CAGE summits. Putative PROMPT CAGE summits higher than the mean of this background distribution were considered active. This resulted in 5,366 pairs of active unidirectional TSSs and PROMPT TSSs, which were categorized into 1,444 divergent and 3,922 convergent pairs as described above. Focusing on the divergent loci allowed the dynamics of transcription to be studied in each direction without interference from the opposite direction. The analysis was therefore focused on upstream antisense RNAs, which correspond to the original definition of PROMPTs6.
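The PROMPT calling logic (strongest antisense summit in a strand-aware window, then comparison with a background mean from inactive genes) can be sketched as below. Assumptions: summits are given as `(position, height)` tuples on the strand opposite to the gene, and inactive genes without any antisense summit simply do not contribute to the background (the text does not say how they were treated). Function names are hypothetical.

```python
import numpy as np


def strongest_antisense_summit(tss, strand, antisense_summits,
                               upstream=2000, downstream=1000):
    """Highest antisense CAGE summit from -upstream to +downstream of a TSS.

    antisense_summits: list of (position, height) tuples. Returns a tuple or None.
    """
    if strand == "+":
        lo, hi = tss - upstream, tss + downstream
    else:  # minus-strand gene: "upstream" lies at higher genomic coordinates
        lo, hi = tss - downstream, tss + upstream
    in_window = [s for s in antisense_summits if lo <= s[0] <= hi]
    return max(in_window, key=lambda s: s[1]) if in_window else None


def call_active_prompts(active_hits, inactive_hits):
    """Keep summits at active genes that exceed the mean summit height at inactive genes."""
    background = np.mean([h for _, h in inactive_hits])
    return [hit for hit in active_hits if hit[1] > background]
```

Using the inactive gene set as background calibrates the threshold to the CAGE signal expected near promoters that are not transcribed, instead of fixing an arbitrary read cutoff.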
To annotate enhancer transcription start sites (eTSSs), all 65,423 human enhancers from phases 1 and 2 of the FANTOM5 project were downloaded from http://fantom.gsc.riken.jp/5/datafiles/phase2.2/extra/Enhancers/human_permissive_enhancers_phase_1_and_2.bed.gz, and the center of each annotation was considered the corresponding transcription start site. Enhancers were separated into 6,766 active, 4,730 repressed and 39,227 inactive following the pipeline described above. Active intergenic enhancers were analyzed further, and all eTSSs within 10 kb of nearby active transcripts, or within 2 kb of a neighboring eTSS, were excluded. The remaining intergenic eTSSs were extended by 1 kb in both directions, and the maximum sense and antisense CAGE summit heights were determined for each reference. This procedure was repeated for the inactive enhancer set, and background distributions of the highest inactive sense and antisense CAGE summits were estimated as described above. Finally, putative intergenic sense and antisense CAGE summits higher than the means of the background distributions were considered active. This resulted in 1,228 active intergenic eTSSs.
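The isolation filter for intergenic eTSSs (nothing transcribed within 10 kb, no other eTSS within 2 kb) can be sketched as follows. Assumptions: all coordinates are on one chromosome, transcripts are `(start, end)` intervals, the distance thresholds are inclusive, and both members of a close eTSS pair are removed. The function names are hypothetical, and the quadratic loop is for clarity, not speed.

```python
def distance_to_interval(pos: int, start: int, end: int) -> int:
    """0 if pos lies inside [start, end], otherwise the distance to the nearest end."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def isolated_etss(etss, transcripts, gene_gap=10_000, etss_gap=2_000):
    """Keep eTSSs farther than gene_gap from active transcripts and etss_gap from other eTSSs."""
    kept = []
    for i, pos in enumerate(etss):
        near_gene = any(distance_to_interval(pos, s, e) <= gene_gap for s, e in transcripts)
        near_etss = any(abs(pos - other) <= etss_gap
                        for j, other in enumerate(etss) if j != i)
        if not (near_gene or near_etss):
            kept.append(pos)
    return kept
```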
Promoter Escape Indices analysis
Promoter escape analysis was performed for a subset of active unidirectional and bidirectional transcripts, PROMPTs and active enhancers. To avoid including Pol II-ser2P reads mapped to overlapping promoters and gene bodies, only the following were considered: active divergent unidirectional transcript-PROMPT pairs with TSSdistance > 100 bp (TSSdistance = TSS coordinate (forward reference) − TSS coordinate (reverse reference)), active divergent bidirectional transcript pairs with TSSdistance > 100 bp, and active intergenic enhancers with no active transcripts within 10 kb and no other eTSSs within 2 kb. For TSSs and PROMPT TSSs, promoter escape indices (EI) were calculated as previously defined28, by dividing the average coverage (in rpm) in the gene body (Db) by the average coverage in the promoter-proximal region (Dp). Db was measured from 101 bp to 2 kb downstream of the TSS for genes longer than 2 kb, or from 101 bp downstream of the TSS to the TTS for shorter genes; Dp was measured from 250 bp upstream to 100 bp downstream of the TSS.
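The escape index defined above, EI = Db / Dp, can be computed from a coverage track as in this sketch. It assumes a per-nucleotide RPM coverage array already oriented in the direction of transcription (for minus-strand genes, the reversed track), with the TSS at `tss_index`; the function name and the `nan` return for degenerate cases are illustrative choices, not part of the published definition.

```python
import numpy as np


def escape_index(profile: np.ndarray, tss_index: int, gene_length: int) -> float:
    """Promoter escape index EI = Db / Dp.

    profile: per-nucleotide coverage (rpm) oriented in the direction of transcription
    tss_index: TSS position in `profile` (needs >= 250 nt upstream)
    gene_length: TSS-to-TTS distance in nucleotides
    """
    # Dp: promoter-proximal region, -250 .. +100 bp (inclusive).
    dp = profile[tss_index - 250: tss_index + 101].mean()
    # Db: gene body from +101 bp to +2 kb, or to the TTS for genes shorter than 2 kb.
    body_end = min(2000, gene_length)
    body = profile[tss_index + 101: tss_index + body_end + 1]
    if body.size == 0 or dp == 0:
        return float("nan")  # gene too short, or no promoter-proximal signal
    return float(body.mean() / dp)
```

A low EI indicates Pol II accumulating at the promoter-proximal region relative to the gene body (strong pausing); an increase after UV would reflect enhanced release into elongation.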
For enhancer escape analysis, EI was calculated analogously as Df/De, where the density of reads at enhancer flanks (Df) was calculated over the regions from −2 kb to −100 bp upstream and from +100 bp to +2 kb downstream of the eTSS, and the density of reads at the enhancer TSS (De) was calculated over the region from 100 bp upstream to 100 bp downstream of the eTSS.
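The enhancer variant pools both flanks because eRNAs are transcribed bidirectionally. A minimal sketch, assuming a per-nucleotide coverage array with at least 2 kb on each side of the eTSS; the windows mirror the stated ranges, including the shared ±100 bp boundary positions, and the function name is hypothetical.

```python
import numpy as np


def enhancer_escape_index(profile: np.ndarray, etss_index: int) -> float:
    """Enhancer escape index EI = Df / De from per-nucleotide coverage."""
    flanks = np.concatenate([
        profile[etss_index - 2000: etss_index - 100 + 1],  # -2 kb .. -100 bp
        profile[etss_index + 100: etss_index + 2000 + 1],  # +100 bp .. +2 kb
    ])
    center = profile[etss_index - 100: etss_index + 100 + 1]  # -100 .. +100 bp
    return float(flanks.mean() / center.mean())
```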
Estimation of the proportion of the inhibited transcriptome upon UV irradiation
To calculate the percentage of the normally transcribed genome showing transcription inhibition, previously published nRNA-seq data (NO UV and +UV 2 h from ref. 28) were used to determine the actively transcribed regions where the signal ratio log2 FC (+UV/NO UV) < 0. All active transcripts longer than 100 kb were trimmed to 100 kb and divided into 1 kb genomic bins. Read-depth-normalized, exon-free nRNA-seq reads from each of the two conditions were counted in each bin of each transcript, and for each bin position n ∈ {1, 2, …, 100} the average log2 FC (+UV/NO UV) was calculated over the set of active transcripts. This resulted in a vector of 100 values, with log2 FC (+UV/NO UV) ≥ 1 for the first 28 bins, implying transcription clearance over the first 28 kb of the active transcriptome upon UV damage, and log2 FC (+UV/NO UV) < 0 for the last 72 bins, implying transcription inhibition upon UV damage. To calculate the total proportion of the active transcriptome where transcription was inhibited, the coverage (in bp) of all normally actively transcribed elements (see above for the definition of these loci) located beyond 28 kb from the TSS was summed and divided by the total length of all actively transcribed elements, resulting in 63.65%.
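The two steps of this calculation (a per-bin mean log2 fold change across transcripts, then the fraction of transcribed bases lying past the point where that mean turns negative) can be sketched as below. Assumptions, all labelled hypothetical: bins beyond the end of shorter transcripts are marked `NaN`, a pseudocount of 1 avoids division by zero, the switch point is the first bin with a negative mean, and full (untrimmed) transcript lengths are used for the fraction.

```python
import numpy as np


def mean_log2fc_per_bin(uv: np.ndarray, nouv: np.ndarray,
                        pseudocount: float = 1.0) -> np.ndarray:
    """Average log2(+UV / NO UV) per bin position across transcripts.

    uv, nouv: (n_transcripts x n_bins) normalized counts per 1 kb bin,
    with NaN for bins lying beyond the end of shorter transcripts.
    """
    fc = np.log2((uv + pseudocount) / (nouv + pseudocount))
    return np.nanmean(fc, axis=0)


def inhibited_fraction(mean_fc: np.ndarray, lengths, bin_size: int = 1000) -> float:
    """Fraction of transcribed bases at or beyond the first bin with mean log2 FC < 0."""
    negative = np.flatnonzero(mean_fc < 0)
    if negative.size == 0:
        return 0.0
    switch_bp = int(negative[0]) * bin_size          # distance from the TSS, in bp
    lengths = np.asarray(lengths, dtype=float)       # full transcript lengths
    beyond = np.clip(lengths - switch_bp, 0, None)   # bases downstream of the switch
    return float(beyond.sum() / lengths.sum())
```

With a switch at bin 28 and 1 kb bins, `switch_bp` is 28,000, so only the portions of transcripts extending beyond 28 kb contribute to the inhibited fraction.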
Nucleotide Excision Repair data meta-analysis
Strand-specific genome-wide maps of nucleotide excision repair of UV-induced DNA damage (CPDs) in XP-C mutants lacking global-genome nucleotide excision repair (GG-NER-deficient, TC-NER-proficient) were obtained from Hu et al.24 (data for Fig. 6; GEO accession number GSE67941) and Chiou et al.42 (data for Supplementary Fig. 7; GEO accession number GSE106823). XR-seq data for wild-type cells (used for Fig. 7; GEO accession number GSE76391) were obtained from Adar et al.53. Sequence Read Archive (SRA) datasets were downloaded from Gene Expression Omnibus using the SRA Toolkit prefetch command (https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/) and converted to fastq files using fastq-dump. Fastq quality control, data filtering and short-read alignment were performed as above. For the meta-analysis, read counts were normalised to equal read depth. Heatmap read density matrices and average read density plots were computed as described in the section 'Read density plots'. Read density matrices were calculated for both strands separately when indicated. The ratio of XR-seq reads between directions and the variability between directions were calculated as described in the legend of Fig. 6. 'S-F' scores and quantification of reads around TT loci were performed as in Lavigne et al.28, with cluster borders defined previously.
FANTOM5 Cap-analysis of Gene Expression (CAGE) sequencing data meta-analysis
The FANTOM5 strand-specific CAGE-seq alignment files of normal dermal fibroblast primary cells (6 donors; source codes 11269-116G9, 11346-117G5, 11418-118F5, 11450-119A1, 11454-119A5 and 11458-119A9) and normal skin fibroblasts (2 donors; source codes 11553-120C5 and 11561-120D4) were downloaded from ftp://ftp.biosciencedbc.jp/archive/fantom5/datafiles/phase2.2/basic/human.primary_cell.hCAGE and combined. Heatmap read density matrices and average read density plots were computed as described in the section 'Read density plots'. Read density matrices were calculated for both strands separately.
Reactome pathway analysis
DAR-gain regions were associated with active transcripts either directly, by searching for genomic overlaps with annotated promoters (2,767 DAR-gain at promoters), or by searching for genomic overlaps with intergenic/intragenic FANTOM5 enhancers (820 DAR-gain at enhancers). Target genes of DAR-gain at enhancers were defined using predicted enhancer-promoter associations retrieved from the FANTOM5 project site ('enhancer.binf.ku.dk/presets/human.associations.hdr.txt.gz'), which yielded an additional 1,268 gene targets. A total of 3,284 unique gene names were finally used for Reactome pathway enrichment analysis with the R package ReactomePA (https://bioconductor.org/packages/release/bioc/html/ReactomePA.html).
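The gene list used for enrichment is a de-duplicated union of two sources, which is why the promoter-linked and enhancer-linked counts do not add up to the final total. A minimal sketch, assuming promoter hits are already gene names and enhancer-promoter associations are held in a dictionary from enhancer ID to target gene names; the function name and data layout are hypothetical.

```python
from typing import Dict, Iterable, List


def reactome_input_genes(promoter_genes: Iterable[str], enhancer_ids: Iterable[str],
                         enhancer_targets: Dict[str, List[str]]) -> List[str]:
    """Union of genes with a DAR-gain at their promoter and predicted targets of DAR-gain enhancers."""
    genes = set(promoter_genes)
    for enh in enhancer_ids:
        # Enhancers without a predicted association contribute no targets.
        genes.update(enhancer_targets.get(enh, ()))
    return sorted(genes)
```

ReactomePA's `enrichPathway()` works on Entrez gene IDs, so a list of gene symbols like this one would need to be converted before enrichment.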
Data Availability
The data reported in this manuscript have been deposited with the Gene Expression Omnibus under accession code GSE125181, and will be released upon publication.
Author Contributions
M.F. and M.D.L. designed the study and were responsible for the interpretation of the results. M.F. directed the study and obtained financial support. M.D.L., M.F. and A.L. wrote the manuscript, and all authors edited it. A.L. performed the experimental part of the study. D.K. performed the statistical and bioinformatics analyses. M.D.L. contributed significantly to the bioinformatics analysis of the data. All authors discussed the results and reviewed, commented on and approved the final version of the manuscript. A.L. and D.K. contributed equally to the paper.
Competing financial interests
The authors declare no competing financial interests.
Acknowledgements
We thank Pantelis Hatzis, Mihalis Verykokakis and members of the Fousteri lab for critical discussions and reading of the manuscript. We thank Vladimir Benes and the Genecore facility (EMBL, Germany) for the special care they take in sequencing our NGS libraries. This work was funded by a European Research Council grant to M.F. (Agreement 309612, TransArrest) and by <Matching Funds> to M.F. funded by national sources.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.