Abstract
DNA double-strand breaks are among the most toxic lesions that can occur in a genome and their faithful repair is thus of great importance. Recent findings have uncovered a role for local transcription that initiates at the break and forms a non-coding transcript, called damage-induced long non-coding RNA or dilncRNA, which helps to coordinate the DNA transactions necessary for repair. We provide nascent RNA sequencing-based evidence that dilncRNA transcription by RNA polymerase II is more efficient if the DNA break occurs in an intron-containing gene in Drosophila. The spliceosome thus stimulates recruitment of RNA polymerase to the break, rather than the annealing of sense and antisense RNA. In contrast, RNA polymerase III nascent RNA libraries did not contain reads corresponding to the cleaved loci. Furthermore, selective inhibition of RNA polymerase III did not reduce the yield of damage-induced siRNAs (derived from the dilncRNA in Drosophila) and the damage-induced siRNA density was unchanged downstream of a T8 sequence, which terminates RNA polymerase III transcription. We thus found no evidence for a participation of RNA polymerase III in dilncRNA transcription and damage-induced siRNA generation in flies.
Introduction
The siRNA silencing system in Drosophila helps to fend off viral infections [1], but also contributes to the control of transposon mobilization in somatic cells [2]. In both cases, the trigger for siRNA generation is double-stranded RNA (dsRNA). During viral infection, this likely stems from replication intermediates, while for genome surveillance convergent transcription must occur. For multi-copy sequences, this convergent transcription can also be envisaged to occur in trans, i.e. at different instances of the same sequence. A particular form of dsRNA generation has been identified in Drosophila at transcribed DNA double-strand breaks [3]. The genetic requirements indicate an involvement of the spliceosome and this appears to be true for the surveillance of high-copy sequences as well [4]. Intriguingly, stalled spliceosomes can recruit RNA-dependent RNA polymerase (RdRP) to transposon mRNAs in the pathogenic yeast Cryptococcus neoformans [5]. For organisms that lack an RdRP gene, however, induction of convergent transcription must happen at the DNA. Thus, while the role of small RNAs in Drosophila DNA repair appears to be limited at best [6], their induction at a transcribed double-strand break may reveal mechanistic aspects of transposon recognition in flies.
DNA double-strand breaks (DSB) are highly toxic genome lesions that need to be faithfully repaired. A finely orchestrated series of molecular interactions is initiated once a DSB has been detected and signaling events recruit repair factors, modify local chromatin structure and mitigate access between transcription and DNA repair proteins [7]. Many studies have concluded that a relatively large region around the DSB is transcriptionally silenced in a reversible manner, presumably to avoid conflicts between transcription and repair [8]. In recent years, however, antisense transcription that initiates at the DNA break has been observed [9–11]. In the context of DNA repair, this transcription seems to fine-tune the dose of single-strand binding proteins such as RPA that initially associate with the 3’->5’ resected break [9]. Furthermore, damage-induced small RNAs derived from these antisense transcripts have been observed in Neurospora, Arabidopsis and human as well as Drosophila cell lines [3, 12–14]. This has provided sequencing-based evidence of DNA break-induced antisense transcription. Recently, break-induced transcription was also directly observed in human cells using single-molecule microscopy experiments [11].
While there is thus little doubt that a non-coding transcript initiates at the break (often referred to as damage-induced long non-coding RNA or dilncRNA), we still do not have a comprehensive understanding of its biogenesis, in particular regarding whether differences exist between transcribed (i.e. within transcriptionally active genes) and non-transcribed breaks. In vivo, DNA breaks occur in a chromatin-context and the mechanisms of dilncRNA generation may differ depending on the local chromatin state, which determines the accessibility for RNA polymerases. Furthermore, low-level transcription by RNA polymerase II may be much more pervasive due to e.g. inefficient termination, promoter- and enhancer-associated, unstable transcripts etc. [15–18] and reviewed in [19–21]. Plants have even devoted the function of two polymerase II related, multi-subunit polymerases, RNA polymerase IV and IVb/V, to pervasive genome surveillance [22–25]. Their non-coding transcripts can activate a number of cellular responses to cope with transposon invasion, viral infection and also DNA breaks [14].
RNA polymerase I transcribes the rDNA and is largely confined to the nucleolus [26], whereas RNA polymerase III generates a series of non-coding transcripts. This polymerase also functions in certain cases to detect aberrant DNA: it transcribes AT-rich linear DNA that may be cytoplasmic [27, 28] or nuclear in the case of herpesviruses [29–31]. The resulting pol-III transcripts then activate the cellular interferon response via RIG-I, an RNA helicase recognizing 5’-triphosphate-containing RNA in a double-stranded configuration [32]. Furthermore, RNA polymerase III can transcribe transposon-derived Alu elements and may even determine new integration sites for Ty1 in budding yeast [33]. The transcriptional landscape of both RNA pol II and pol III is thus complex and dynamic.
At least at a non-transcribed DNA end, RNA polymerase II can initiate at the DNA break to generate a dilncRNA. This model is supported, for example, by studies using RNA polymerase II specific inhibitors [34], chromatin-immunoprecipitation [35] and by the detection of dilncRNAs associated with RNA polymerase phosphorylated at tyrosine-1 within the CTD repeats in a metagene-analysis [36]. Recruitment of RNA polymerase II to the DNA end can involve the Mre11-Rad50-Nbs1 complex (MRN-complex) [37]. Transcription initiation at DNA breaks has been reconstituted in vitro with linear DNA, purified RNA polymerase II and the MRN-complex [37]. It appears that initial unwinding of the DNA end by the MRN complex promotes transcription initiation by RNA polymerase II. This view has been challenged, however, by observations that claim recruitment of RNA polymerase III – also with the help of the MRN complex - to double-strand breaks in cultured human cells [38]. There are thus opposing views about which RNA polymerase generates the transcript that initiates at the DNA end, and there may be more than one answer to this question.
In Drosophila, the dilncRNA originating from a DNA break is converted into damage-induced siRNAs if the break occurs in actively transcribed genes. The convergent transcripts form dsRNA, which is processed by the canonical RNAi machinery into Ago2-loaded siRNAs capable of silencing cognate transcripts [3, 6]. While their physiological role – if any – remains unclear [6, 39], the siRNAs are much more stable than the original dilncRNA and thus can serve as a convenient proxy of dilncRNA transcription [3, 40]. Since they are loaded into Ago2, 2’-O-methyl modified and capable of cleaving cognate mRNAs [3, 6], they can even serve as a reporter system for dilncRNA generation [4]. In Drosophila, the damage-induced siRNA response starts very close to the break and covers the gene – including introns – up to the transcription start site (TSS). This argues that the dilncRNA initiates in direct vicinity of the break and is processively transcribed at least up to the TSS. Yet, neighboring genes were not affected [4]. In contrast, transfection of a promoterless, linear PCR product of ~ 2 kb into cultured Drosophila cells did not produce any corresponding siRNAs [3]; a mere resection at either end is thus not sufficient to generate enough dilncRNA in each orientation for dsRNA formation. Results from a genome-wide screen in Drosophila cells suggest that spliceosomes assembled on the normal transcript can stimulate the generation of corresponding damage-induced siRNAs. This was corroborated by the observation that DNA breaks upstream of a gene’s first intron or anywhere within intron-less genes produce few siRNAs upon damage [4].
In this manuscript we address the question whether the spliceosome acts upstream or downstream of the dilncRNA induction. In a downstream involvement, the spliceosome would serve as an RNA chaperone and promote the annealing of the coding (sense) and non-coding (antisense) transcripts, thus boosting siRNA generation. An upstream action implies that the spliceosome can stimulate the generation of dilncRNAs, i.e. the initiation of transcription at the break, and thereby increase the amount of dsRNA generated and ultimately the siRNA yield. The two mechanisms can be distinguished by examining nascent transcription at DNA breaks in intron-containing and intronless genes. If the spliceosome acts as an RNA chaperone, the amount of dilncRNA should be comparable between intron-containing and intronless genes while a regulatory role should lead to more dilncRNA for the intron-containing gene than for its intronless counterpart. We thus measured the antisense RNA generation via nascent transcript sequencing after inducing a single DSB in an intron-containing or an intron-less gene. We observed that a DSB downstream of introns leads to higher levels of antisense transcription, arguing that the spliceosome stimulates dilncRNA production. Furthermore, it appears that in Drosophila cells it is RNA polymerase II that transcribes the dilncRNA.
Results
The aim of our study was to measure the rate of antisense transcription at a transcribed DNA break for an intron-containing and an intronless gene. Furthermore, we wanted to determine which RNA polymerase is recruited for this purpose in Drosophila. Incorporation of labeled nucleotide analogs such as 4SU (4SU-Seq) or biotinylated NTPs (PRO-Seq) makes it possible to measure nascent transcriptomes with high sensitivity but cannot distinguish between RNA polymerases. While specific inhibitor treatments are available, they have the caveat that inhibition of RNA polymerase II will also abrogate transcription of the normal mRNA transcripts, which recruits the spliceosome and may thus participate in induction of antisense transcription at intron-containing genes. Yet, this is precisely what we wanted to test.
We therefore established a nascent RNA sequencing strategy based on polymerase-specific immunoprecipitation (nascent elongating transcript sequencing or NET-seq [41, 42]). In short, we lysed cultured Drosophila S2-cells harboring epitope-tags on RNA polymerase II or III (introduced via genome editing) and washed out cytoplasmic and soluble nuclear components. Then, a brief digestion with benzonase liberated chromatin-associated material (“input” in our figures), from which we could subsequently immunopurify tagged polymerases (“IP” in our figures). The short RNA stump protected by the polymerase during the benzonase treatment can directly enter our established small RNA sequencing library pipeline because benzonase products carry a 5’-monophosphate (see supplementary Figure S1 for an outline of our cell fractionation and NET-seq procedure). To verify our protocol, we sequenced both the input material for the IP (roughly speaking chromatin-associated RNA) and the polymerase-associated transcripts after immunoprecipitation.
Validation of the NET-Seq procedure
We first examined the highly transcribed, protein-coding actin gene act5C. The profile of matching reads from the input material is dominated by the exonic portions of the gene, consistent with the notion that splicing can occur co-transcriptionally before release from the chromatin. Nonetheless, a certain level of intronic reads is already visible and demonstrates that the material also contains nascent transcripts. The nascent, RNA polymerase II associated reads sequenced after specific immunoprecipitation (IP) show a much stronger proportion of these intronic reads (Fig. 1A, top panel). In comparison, the RNA polymerase III IP only showed non-specific background (distribution essentially unchanged, Fig. 1A, middle panel). Many genes show RNA polymerase II pausing shortly after transcript initiation. In Drosophila, this phenomenon was first comprehensively described in ChIP-Seq and PRO-seq experiments [43, 44]. Accordingly, promoter-proximal pausing is evident in the PRO-Seq trace for act5C as well as in our nuclear RNA sample (input) and particularly in the RNA polymerase II associated, nascent transcripts. When comparing our NET-Seq results for this highly abundant mRNA with published results of a nascent RNA labeling approach (PRO-seq), it appears that our libraries still contain a moderate overrepresentation of exonic reads [45], presumably reflecting a higher background level in our NET-seq approach.
For a global perspective, we also mapped reads onto precompiled transcript classes (Flybase genome release 6.19) and determined the recovery (ratio of IP versus input after normalization to total genome-matching reads in each library) for RNA polymerase II and III. The CDS collection corresponds to the protein-coding part of the transcriptome (start to stop) and the recovery was clearly greater in the pol-II IP than in the pol-III IP (Fig. S2 A, pol-II IP n=6, pol-III IP n=4). The intronic part of the transcriptome also showed a preferential recovery with pol-II, but a certain number of introns also trended towards a high recovery in both the pol-II and the pol-III IP (Fig. S2 B). Manual inspection of an arbitrary subset usually indicated the presence of non-coding RNAs such as snRNAs or snoRNAs in these introns.
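The recovery defined above can be written out as a short calculation. The following is a minimal sketch, not the analysis script used for the figures; the function name `recovery` and all read counts are hypothetical and serve only to illustrate the normalization.

```python
def recovery(ip_count, ip_total, input_count, input_total):
    """Ratio of IP to input signal for one transcript class.

    Each count is first divided by the total number of genome-matching
    reads in its own library, so that libraries of different depth
    become comparable.
    """
    if ip_total <= 0 or input_total <= 0:
        raise ValueError("library totals must be positive")
    if input_count == 0:
        return None  # recovery is undefined without any input signal
    ip_norm = ip_count / ip_total
    input_norm = input_count / input_total
    return ip_norm / input_norm


# Hypothetical counts (not measured data): 200 reads in an IP library of
# 1 million genome-matching reads versus 100 reads in an input library
# of 2 million genome-matching reads.
example = recovery(ip_count=200, ip_total=1_000_000,
                   input_count=100, input_total=2_000_000)
print(f"recovery = {example:.2f}")
```

A value above 1 indicates that the transcript class is enriched by the IP relative to the chromatin-associated input; the sketch returns `None` rather than dividing by zero when a class has no input reads.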
To verify successful IP for RNA polymerase III, we analyzed the read distribution along the non-coding 7SK RNA locus (Fig. 1B). While RNA polymerase II associated nascent transcripts did not show a particular enrichment of signal along the locus (top panel), the corresponding reads were enriched after IP of RNA polymerase III (middle panel). Note that the 7SK RNA can be associated with RNA polymerase II while the CTD is phosphorylated by pTEF-b; this associated 7SK RNA could thus have co-purified and augmented the 7SK-mapping read number. However, this does not appear to contribute substantially to the RNA polymerase II IP signal. As expected, the PRO-Seq procedure also captured transcription of the RNA polymerase III transcribed 7SK locus (bottom panel). When we mapped the reads onto the Flybase collection of tRNA sequences, we found a preferential recovery for at least a subset of the tRNAs in the RNA polymerase III IP (Fig. S2 C). This is also visible when we mapped the reads onto the Flybase collection of “all transcripts”, which despite its name only comprises the protein-coding and lncRNAs. Essentially all of these are transcribed by RNA polymerase II but the Ntl locus is a notable exception (Fig. S2 D). This transcript appears pol-III transcribed according to our analysis, overlaps with an intron-containing Tyr-GTA tRNA gene and direct visualization of the mapping traces revealed that the read-counts mapped to the Ntl locus almost exclusively localize to the tRNA portion (Fig. S2 E).
Our NET-Seq libraries are contaminated by abundant cytoplasmic non-coding RNAs. This is illustrated with the help of the bantam locus (Fig. 1C). The 23 nt small RNA is one of the most abundant miRNAs in S2-cells and it is nucleolytically processed from a much larger primary transcript by Drosha and Dicer-1. The mature miRNA is cytoplasmic, yet our nuclear RNA fraction still contained a substantial amount of bantam reads (top and middle panel, input). While the IP procedure decreased this contamination, it did not remove the bantam reads completely (top and middle panel, IP). However, in the case of RNA polymerase II the nascent RNA reads indicate that larger precursor ncRNAs are transcribed (top panel, IP). This is consistent with the PRO-Seq reads from the locus (bottom panel). The three example loci for Fig. 1 were chosen because the published PRO-Seq reads can be represented at roughly comparable ppm-scales, hence their transcriptional output should be, as a first approximation, of comparable magnitude. Our own NET-Seq data for act5C and 7SK can indeed also be displayed with comparable scales, but the bantam locus required different scaling due to the cytoplasmic contamination. We also observed a substantial amount of mature ribosomal RNA reads in our libraries both before and after IP (23%-72% of total genome-matching reads, with no obvious enrichment of unprocessed precursor transcripts). For these RNAs, no interpretation of our sequencing data should be attempted. This also limits conclusions about highly abundant RNAs transcribed by RNA polymerase III such as 5S rRNA. For most other transcripts, we conclude that our nascent RNA sequencing data successfully capture polymerase-specific profiles. Since our question focuses on the induced antisense transcription at DNA breaks, an RNA species that is neither cytoplasmic nor highly abundant, we conclude that the NET-Seq libraries are suitable for our analysis.
A DSB downstream of introns shows higher dilncRNA transcription activity
We generated sequencing libraries after employing our established Cas9/CRISPR system to cleave in the intron-containing gene CG15098 and, separately, in the intronless gene tctp [4]. As before, the DNA breaks had been induced by transfection of a corresponding sgRNA expression cassette into cells that stably express the Cas9 protein. The majority of the cells were harvested and processed for NET-seq libraries 2 or 3 days after transfection. The remaining cells were processed for a T7 endonuclease assay, demonstrating that the targeted loci were indeed cleaved with comparable efficiency (see also supplementary Fig. 1). In our experiments, libraries from the tctp-cut provide the “uncut” control for the CG15098 locus and vice-versa. This comparison ensures that any effects not specific to the cut locus or due to Cas9 activation per se will be accounted for.
We mapped the NET-seq libraries onto the respective loci and calculated the number of sense and antisense-matching reads. Figure 2 shows traces for one NET-Seq replicate mapped to CG15098 (left side) and tctp (right side). For CG15098, IP of RNA polymerase II associated, nascent transcripts led to an enrichment of antisense reads relative to input (Fig. 2A). In contrast, the antisense reads did not increase for the cut tctp locus, consistent with the low amounts of siRNAs generated upon cleavage of this locus [4]. There was no indication for a prominent signal in the RNA polymerase III NET-seq libraries of either locus (Fig. 2B).
To obtain a quantitative view of the replicate data, we normalized the number of antisense reads to the total transcriptional activity of the locus in each library [i.e. antisense / (sense + antisense)] (Fig. 2C). There was a significant increase of antisense reads for cut vs. uncut CG15098 (p=0.012, t-test unpaired, unequal variance, n=3) while no significant differences were observed for the neighboring CG15099 (p=0.640, n=3) or act5C, which resides on a different chromosome (p=0.644, n=3). We also normalized the antisense reads to the total number of genome-matching reads in each library (Supplementary Figure 3). In each of the three replicate experiments, the amount of CG15098 antisense-matching nascent, RNA polymerase II associated reads was higher in the cut state than in the uncut state (p=0.034, paired t-test, n=3). This was not the case for the CG15099 gene (p=0.273, n=3) or the act5C gene (p=0.675, n=3); there were too few tctp antisense-matching reads for an analogous comparison. Finally, our input material also showed a consistently higher amount of antisense-matching reads for CG15098 in the cut state in each replicate (p=0.072, paired t-test, n=3). In agreement with the visual inspection (Fig. 2B), the read quantification did not provide any indication that RNA polymerase III is contributing to antisense transcription (Supplementary Figure 3, bottom row).
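For illustration, the normalization and the two test statistics mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the read counts are invented for the example and are not our data, and only the t statistic and degrees of freedom are computed (a p-value additionally requires the t distribution, e.g. from a statistics library).

```python
from math import sqrt
from statistics import mean, stdev, variance


def antisense_fraction(sense, antisense):
    """Antisense reads normalized to total activity of the locus."""
    total = sense + antisense
    if total == 0:
        raise ValueError("locus has no mapped reads")
    return antisense / total


def welch_t(a, b):
    """Unpaired t statistic with unequal variances (Welch) and its
    Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def paired_t(a, b):
    """Paired t statistic on the per-replicate differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical (sense, antisense) read counts for three replicates
cut = [antisense_fraction(s, a) for s, a in [(900, 100), (850, 150), (880, 120)]]
uncut = [antisense_fraction(s, a) for s, a in [(980, 20), (970, 30), (975, 25)]]
print(welch_t(cut, uncut))
print(paired_t(cut, uncut))
```

The unpaired variant corresponds to the comparison of antisense fractions, while the paired variant matches the replicate-wise comparison of the cut and uncut state within each experiment.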
We conclude that induction of a DNA double-strand break in the intron-containing CG15098 gene stimulates antisense transcription by RNA polymerase II. For the intronless tctp gene, we detected no or only few antisense reads, so a statistical analysis is not appropriate. Our observations are thus consistent with the notion that a lower antisense transcription activity for the intronless gene (this study) correlates with fewer DNA-damage induced siRNAs [4]. It therefore appears that the role of the spliceosome is to stimulate dilncRNA transcription, rather than to promote annealing of the sense and antisense RNA strands.
No evidence for participation of RNA polymerase III in the biogenesis of damage-induced siRNAs
The recent description of MRN-dependent RNA polymerase III recruitment to DNA breaks in human cell lines [38] clearly differs from our observation of a predominant – if not exclusive – role of RNA polymerase II in dilncRNA generation (Fig. 2). It is certainly conceivable that mechanistic differences exist between humans and flies (as is the case for the subsequent processing into siRNAs, see [39]), but we wanted to confirm our observation with an independent approach. We thus turned to our established dual luciferase reporter system, which relies on the silencing activity of damage-induced siRNAs generated from a co-transfected, linearized plasmid (Fig. 3A, right side). With this assay, we had previously screened and detected a role for the MRN-complex in promoting siRNA generation, presumably by preparing the DNA end for RNA polymerases that initiate transcription at the break [4]. The inhibitor Mirin can block the access of Mre11 to dsDNA ends and thus all nucleolytic activities, while its derivative PFM-01 selectively blocks DNA access to the endonuclease active site [46]. Addition of Mirin (25 μM final concentration) clearly reduced the amount of damage-induced siRNAs generated (p=0.05, t-test, unequal variance, n=3), while PFM-01 (25 μM) had essentially no effect (Fig. 3A). This supports the notion that the initial unwinding of the double-stranded DNA by Mre11 is important for dilncRNA generation, rather than endonucleolytic cleavage and resection that exposes single-stranded DNA with a 3’-end [37].
Importantly, addition of the selective RNA polymerase III inhibitor ML-60218 at a concentration of 10 μM - the highest concentration that still produced acceptable levels of luciferase readings - did not lead to a de-repression of Renilla luciferase (Fig. 3A). This is consistent with our genome-wide RNAi screen where no RNA polymerase III subunit scored as a hit [4] and it also confirms the undetectable dilncRNA transcription in our RNA pol-III NET-seq libraries.
We had previously determined that the damage-induced siRNA response starts in close proximity to the break and extends all the way until the transcription start site [3, 4, 40]. The corresponding dilncRNA transcripts thus arise over a stretch of more than 1 kb (e.g. 4.5 kb in the case of CG18273, see supplementary Figures in [4]). This would be unusually long for an RNA polymerase III transcript and random pol-III termination sequences might occur along the way. Indeed, inspection of the CG15098 locus revealed a serendipitous stretch of eight Adenosines in the second intron. For an RNA polymerase acting in antisense orientation, this corresponds to a T8-sequence preceded by a potential secondary structure element (see Fig. 3B), which should terminate most RNA polymerase III transcription complexes [47]. However, the siRNA read density we observed was similar before and after this pol-III termination site (Fig. 3B). We do note that there is a paucity of siRNA reads in a ~ 20 nt window surrounding the A8/T8 sequence; most likely this is for technical reasons given the short, homopolymeric sequence stretch (e.g. Illumina-sequencing or PCR polymerase drop-off).
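The strand logic of this argument can be made explicit with a short sequence scan. The sketch below is illustrative only: the function names are hypothetical and the toy sequence is invented, not the CG15098 intron. A run of adenosines on the sense strand is reported because an antisense-running polymerase would encounter it as a run of thymidines in its non-template strand.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_runs(seq, base, min_len=8):
    """Return (start, end) of every run of `base` with at least
    `min_len` residues; coordinates are 0-based, end-exclusive."""
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}", re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def antisense_t_runs(sense_seq, min_len=8):
    """Positions on the sense strand that appear as T runs to a
    polymerase transcribing in antisense orientation."""
    return find_runs(sense_seq, "A", min_len)


# Toy sequence (hypothetical) with a single A8 stretch
toy_sense = "GCGC" + "A" * 8 + "CGT"
print(antisense_t_runs(toy_sense))
print(find_runs(reverse_complement(toy_sense), "T"))
```

Comparing siRNA read density in windows on either side of each reported position then corresponds to the comparison made in Fig. 3B.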
Taken together, we conclude that RNA polymerase II can be recruited to a DNA double-strand break and that this is fostered by the spliceosome and the action of the MRN complex. While it is unlikely that RNA polymerase III functionally contributes to dilncRNA transcription in Drosophila, our observations cannot exclude that RNA polymerase III is recruited to sites of DNA damage without subsequently engaging in processive transcription of the dilncRNA.
Discussion
The observation that splicing stimulates the generation of siRNAs at a transcribed DNA double-strand break prompted the question of the underlying mechanism. For example, the spliceosome’s role could be to serve as a kind of RNA chaperone that fosters the annealing of sense and antisense transcript, thus promoting the formation of the dsRNA precursor for siRNAs at intron-containing genes. We now present evidence that the rate of antisense transcription differs between DSBs in intron-containing and intronless genes. The spliceosome therefore influences – one way or another – the recruitment of an RNA polymerase to what normally is the non-template strand (Figure 4).
Furthermore, we demonstrate that the antisense running polymerase is RNA polymerase II in Drosophila. This has important implications for how the antisense transcript initiates since it could be the very same polymerase that synthesizes both sense and antisense transcript. In this most rudimentary form of “recruitment”, stalling of the splicing reaction could e.g. contribute to post-transcriptional modifications on RNA polymerase II that promote direct re-initiation upon a run-off at the break – a “U-turn” movement, essentially. However, it is currently unclear whether a run-off will occur at a DSB in vivo or whether the polymerase stalls when it encounters the break. As long as the transcript is not cleaved and removed, this creates an R-loop behind the polymerase with concomitant exposure of the non-template strand. This stretch of single-stranded DNA could serve as a landing site for another RNA polymerase complex and transcription thus initiates in the antisense orientation. In this case, the role of the stalled spliceosome could be to prevent transcript termination and release, thus extending the lifetime of the R-loop that may contribute to DNA damage signaling. Alternatively or in addition, signaling events that include or emanate from spliceosome components [48] could foster polymerase recruitment to the nearby single-stranded DNA.
The currently available data cannot distinguish between the U-turn model and more elaborate forms of recruitment to the exposed non-template strand. Our small RNA sequencing data provided siRNA reads starting at a distance of only a few nucleotides from the break. We did not find any reads that connect the sense and antisense strands (data not shown). Such reads are expected to be rare and difficult to map bioinformatically, and our protocol for generating small RNA sequencing libraries is not tailored for these “connector-RNAs”. Furthermore, one would estimate that modifications of RNA polymerase II associated with initiation are most prevalent in the vicinity of the core promoter initiation site. A “U-turn” move, i.e. re-initiation of the same polymerase, might thus be more efficient at the beginning of a transcription unit. Yet, we found only inefficient siRNA generation when a DSB was introduced in proximity to the transcription initiation site [4]. Finally, it is not obvious why a U-turn move in the context of an R-loop would need the support of the MRN-complex to fray the DNA end. Clearly, further mechanistic studies are needed to determine how bi-directional transcription by RNA polymerase II is orchestrated and what the fate of the potentially stalled, sense-running RNA polymerase II complex may be.
By now, several publications provide independent evidence of RNA polymerase II as an enzyme capable of transcribing the dilncRNA. This includes biochemical reconstitutions [37], in vitro analysis with inhibitors [34], ChIP with qPCR [34] and metagene analysis after ChIP-Seq [36]. A single-molecule study is also suggestive of RNA polymerase II according to the reported speed [11], but the MS2 stem-loop employed as a reporter can in principle also be transcribed by RNA polymerase III [49]. We now add our direct observation of polymerase-associated, nascent transcripts only in RNA polymerase II NET-seq and the lack of effect for the pol-III inhibitor ML-60218 on damage-induced siRNA accumulation. It should be emphasized that differences between organisms may exist: if the primary purpose is to generate a transcript, then the polymerase type could easily be swapped during the course of evolution. In plants, for example, genetic analysis has pinpointed a function of the plant-specific RNA polymerase IV in dilncRNA transcription [14]. While not all of the published experiments can exclude a concomitant function of more than one RNA polymerase – i.e. RNA polymerase II (or IV in plants) and RNA polymerase III – in dilncRNA generation, the recent description of RNA polymerase III as the exclusive source of dilncRNA in cultured human cell lines is surprising [38]. The situation is further complicated by the discovery that repair of transcribed genes by homologous recombination is fostered upon the establishment of mixed DNA/RNA displacement loops involving the normal transcript that runs sense towards the break [50]. A parallel comparison of the diverse experimental systems seems necessary to distinguish between technical and true biological differences; the latter may prove invaluable to further our understanding of the molecular mechanisms that lead to dilncRNA transcription.
Drosophila core promoters show strong inherent directionality and, unlike yeast or vertebrates, flies often do not generate a divergent, unstable transcript in the direction opposite to the respective gene [44, 51]; nonetheless, bi-directional regulatory elements sometimes exist [52]. A genome-wide analysis of spontaneous (i.e. without induced DNA damage) antisense transcription rates is not straightforward, since introns in particular often harbor transcription units that can be in opposite orientation to the host genes. Furthermore, cryptic transcription may continue far beyond the annotated poly-A site [17], which complicates the analysis of rare transcriptional events. Nonetheless, our NET-seq data appear by and large consistent with the notion of unidirectional core promoter activity. We refrain from drawing any explicit conclusions due to our limited sequencing depth. We therefore cannot directly determine how far DNA break-induced dilncRNA transcription extends in Drosophila. For human cells, a distance of roughly 2 kb was proposed in one study based on ChIP [35]. The uniform distribution of Drosophila damage-induced siRNAs along the targeted gene (up to ~4.5 kb in the case of CG18273) suggests a high processivity of transcription up to the transcription start site of the targeted gene [4]. Because only dsRNA is processed into siRNAs, we cannot track the dilncRNA beyond this point via its small RNA descendants. It is nonetheless tempting to speculate that the same mechanism that confers uni-directionality to many promoters might also terminate the dilncRNA transcription in flies; further experiments are needed to test this hypothesis.
Materials and Methods
NET-Seq procedure
Cell culture
Drosophila S2-cells with stable expression of cas9 protein (clone 5-3) were cultured and transfected as previously described [53]. We further modified this cell line by introducing a twin V5-tag at the C-terminus of the largest subunit of RNA polymerase II (PolR2A, CG1554) and III (PolR3A, CG17209), followed by clonal selection as described [54]. For the NET-seq experiments, we transfected a 30 ml culture of cells expressing tagged RNA polymerase with guideRNA vectors targeting CG15098 or tctp. The sgRNA expression cassettes were first generated by PCR, then blunt-end cloned into pJet1.2 to yield pRB59 (CG15098) and pRB60 (tctp). The target sites were 5’-TCCAGTGTAGCTTCCCGTT-3’ for CG15098 and 5’-ATATCTAATTTCTTTTTAC-3’ for tctp as described [4].
Cell lysis
48 or 36 hours after transfection, the cells were harvested (density 4-5 x 10^6 cells/ml), resuspended in 500 μl of lysis buffer (10 mM HEPES/KOH pH 7.5, 1.5 mM MgCl2, 1 mM DTT, 10 mM EDTA, 10% glycerol and 1% Tergitol-type NP40 (Sigma NP40S) supplemented with protease inhibitors (Roche complete without EDTA)) and incubated for 10 minutes on ice. The nuclei were then pelleted by centrifugation at 5000 x g for 5 minutes and the supernatant (mostly cytosol) was discarded. The pellet was resuspended in lysis buffer without EDTA but containing 1 M urea, incubated for 5 minutes on ice and again pelleted at 5000 x g for 5 minutes. The urea washing step was carried out twice in total, then the nuclei were resuspended in 110 μl of lysis buffer without EDTA and without urea. To digest the chromatin, 250 U of benzonase (Merck Millipore E1014, 90% purity grade) were added and the resuspended nuclei were incubated at 37°C for 3 minutes in a heating block. The digestion was stopped by adding EDTA and NaCl to final concentrations of 10 mM and 500 mM, respectively. The insoluble fraction was pelleted by centrifugation at 16000 x g for 5 minutes and the supernatant was used as input material for the immunoprecipitation.
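The stop step requires EDTA and NaCl to reach their final concentrations at the same time, so each added volume dilutes the other. The following minimal sketch shows one way to work out the volumes. The stock concentrations (0.5 M EDTA, 5 M NaCl) and the function name are assumptions for illustration and are not taken from the protocol; the small volume of benzonase is ignored.

```python
def spike_in_volumes(start_volume_ul, additions):
    """Volumes (in microliters) of each stock needed to reach all targets at once.

    additions maps a compound name to (stock_molar, target_molar).
    Assumes the sample contains none of the added compounds beforehand.
    """
    # Each stock must make up the fraction target/stock of the FINAL volume.
    fractions = {name: target / stock
                 for name, (stock, target) in additions.items()}
    total_fraction = sum(fractions.values())
    if total_fraction >= 1:
        raise ValueError("targets cannot be reached with these stocks")
    # final = start + sum(fraction_i * final)  =>  final = start / (1 - sum)
    final_volume = start_volume_ul / (1 - total_fraction)
    return {name: frac * final_volume for name, frac in fractions.items()}


# Hypothetical stocks: 0.5 M EDTA and 5 M NaCl, added to 110 microliters of nuclei.
volumes = spike_in_volumes(110, {"EDTA": (0.5, 0.010), "NaCl": (5.0, 0.500)})
for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} ul")
```

With these assumed stocks the calculation gives 2.5 μl of EDTA and 12.5 μl of NaCl, for a final volume of 125 μl.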
Immunoprecipitation
20 μl of magnetic beads (Dynabeads protein G, Invitrogen 10004D) were washed 3 times with 200 μl of IP buffer (25 mM HEPES/KOH pH 7.5, 150 mM NaCl, 12.5 mM MgCl2, 1 mM DTT, 1% Tergitol-type NP40, 0.1% Empigen (Sigma 30326) supplemented with Roche complete protease inhibitors without EDTA), then 1 μl of V5 antibody was coupled by rotation at 4°C overnight. On the following day, the beads were washed 3x with 300 μl of IP buffer, then the input material was added and incubated with agitation for 60 minutes at 4°C. After separation of the unbound supernatant, the beads were washed 5x with 200 μl of IP buffer. The immunopurified RNA polymerase complexes were then digested with proteinase K to liberate the associated nucleic acids and RNA was prepared by TRIZOL extraction and precipitation.
Library generation and data analysis
RNA fragments with a size of 20-28 nt were PAGE-purified to select for the fragments that were protected from benzonase digestion by the polymerase. Since benzonase products harbor 5’-phosphorylated ends, the RNA fragments were processed for library generation as described [55] without further treatment. The libraries were sequenced in-house on an Illumina HiSeq1500 instrument and the reads were processed with custom Perl and Bash scripts for mapping with Bowtie [56] to the indicated references. During mapping, no mismatches were tolerated and each read was reported only once. If multiple perfectly matching sequences exist in the reference, Bowtie assigns the read randomly to one of them. After mapping, the results were further processed with BEDtools [57] and custom R scripts or the IGV genome browser [58] for data visualization.
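The mapping policy described above (no mismatches, one reported position per read, random choice among equally good positions) can be illustrated with a toy sketch. This is not the Bowtie algorithm and not the deposited scripts; it uses a naive string search, and all function names as well as the example sequences are hypothetical.

```python
import random

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_perfect_hits(read, reference):
    """List every (start, strand) where the read matches without mismatch."""
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = reference.find(query)
        while start != -1:
            hits.append((start, strand))
            start = reference.find(query, start + 1)
    return hits


def map_reads(reads, reference, seed=0):
    """Report each mappable read once; ties are broken at random."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    placements = []
    for read in reads:
        hits = find_perfect_hits(read, reference)
        if hits:  # reads without a perfect match are dropped
            start, strand = rng.choice(hits)
            placements.append((read, start, strand))
    return placements


# Toy reference with one repeated sequence (positions 0 and 13).
reference = "GATTACAGGCCTTGATTACAAC"
for placement in map_reads(["GATTACA", "AAGGCCT", "CCCCCCC"], reference):
    print(placement)
```

A consequence of this policy is that read counts at repeated sequences are distributed by chance over all copies, which should be kept in mind when interpreting coverage at multi-copy loci.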
Luciferase assay
The luciferase assay for the detection of DNA-break induced siRNAs has been previously described [4]. Briefly, 25 ng of pRB2 (firefly-luciferase, circular), 10 ng of pRB1 (Renilla-luciferase, circular) and 40 ng of pRB4 (truncated Renilla luciferase, linearized with EcoRI) were transfected per well of a 96-well plate using Fugene-HD (Promega). Inhibitors were added 2 hr prior to transfection in a volume of 1 μl DMSO (volume identical for all compounds and controls). The luciferase assay was performed 96 hr after transfection using the Dual-Glo Luciferase assay system (Promega E2920) in a Tecan M-1000 plate reader. Data analysis was carried out using Microsoft Excel.
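The calculation itself is described in [4] and is not restated here. As a generic illustration of how dual-luciferase readings are commonly normalized, the sketch below divides the Renilla signal by the firefly signal for each well and expresses each treatment relative to the DMSO control. The choice of ratio, the averaging over replicate wells, the function name and the example numbers are all assumptions, not values or procedures from this study.

```python
from statistics import mean


def relative_reporter_activity(wells, control="DMSO"):
    """Normalize dual-luciferase readings to a control treatment.

    wells maps a treatment name to a list of (renilla, firefly) readings,
    one tuple per replicate well.
    """
    # Per-well Renilla/firefly ratio corrects for transfection efficiency.
    ratios = {treatment: mean(renilla / firefly for renilla, firefly in readings)
              for treatment, readings in wells.items()}
    baseline = ratios[control]
    # Express every treatment relative to the solvent control.
    return {treatment: ratio / baseline for treatment, ratio in ratios.items()}


# Made-up readings for illustration only.
example = {
    "DMSO": [(100.0, 1000.0), (120.0, 1000.0)],
    "inhibitor": [(220.0, 1000.0), (220.0, 1000.0)],
}
for treatment, value in relative_reporter_activity(example).items():
    print(f"{treatment}: {value:.2f}")
```

In this reading, a value above 1 for a compound would indicate less silencing of the Renilla reporter than in the DMSO control.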
Accession numbers
The sequencing reads from this study are available at the European Nucleotide Archive with the accession number PRJEB12939.
Custom Perl, Bash and R scripts have been deposited on GitHub: https://github.com/Foerstemann/small_RNA_seq_analysis.git
Funding
This study has been supported by DFG grant FO-360/9-1 to KF.
Supplementary Figure legends
Acknowledgements
We are grateful for the sequencing support by the laboratory of functional genome analysis (LaFuGa) at the Gene Center Munich.