R-loops and Topoisomerase 1 facilitate formation of transcriptional DSBs at gene bodies of hypertranscribed genes

DNA double-stranded breaks (DSBs) are considered the most deleterious type of DNA damage. Although cytotoxic levels of DSBs occur mainly due to external factors, such as ionizing irradiation and UV radiation, DSBs can occur naturally under physiological conditions through cell’s natural processes such as transcription and replication. However, how physiological DSBs are generated during transcription is still poorly understood. Here, we mapped DSBs using in-suspension break labeling in situ and sequencing (sBLISS) and re-analysis of ChIP-seq and DRIP-seq data, along with immunofluorescence staining for DNA damage markers in MCF-7 cells. Mapping of DSBs revealed their occurrence at highly expressed genes that are enriched with Topoisomerase 1 (TOP1), the main enzyme responsible for resolving positive and negative supercoiling, and with R-loops. Depleting R-loops and TOP1 decreased DSBs specifically at highly expressed and TOP1/R-loops-enriched genes implying that TOP1 and R-loops are the main causes of transcriptional DSBs. These findings propose a model in which DSBs in highly expressed genes, mainly oncogenes, are caused by resolving of R-loops and the catalytic activity of TOP1. Our data also reveal that these genes, despite being highly fragile, are covered by γ-H2AX suggesting efficient repair signaling. Our findings underscore the critical roles of TOP1 and R-loops in regulating DSBs in hypertranscribed oncogenes implicated in carcinogenesis.


Introduction
Double-stranded breaks (DSBs) are considered to be the most deleterious type of DNA damage, and thus must be repaired to preserve genome integrity. Although cytotoxic levels of DSBs occur mainly by external factors such as ionizing radiation and UV radiation, DSBs can occur naturally under physiological conditions through the cell's natural processes such as transcription and replication (Wilhelm, Said and Naim, 2020).
Transcription can cause DSBs in replicating cells due to collision with the replication machinery, especially when transcription and replication have opposite directions (Mirkin and Mirkin, 2005;Helmrich, Ballarino and Tora, 2011;Hamperl et al., 2017). This collision (also called transcription-replication conflict) happens when the replication machinery, mainly DNA polymerases, and the transcriptional machinery, mainly RNA polymerases, encounter each other in close proximity within the gene. Since cancer cells are highly active exhibiting hypertranscription and high replication levels, they are expected to exhibit a high occurrence of such conflicts (Helmrich, Ballarino and Tora, 2011;Hamperl et al., 2017). This collision between transcription and replication was found to occur mainly at transcription termination site (TTS), and to be prevented by the activity of topoisomerase 1 (TOP1) (Promonet et al., 2020). Elevated transcription was found to strongly correlate with DSBs (Lensing et al., 2016). However, how transcription by itself causes DSBs is still under investigation.
During transcription elongation, the DNA unwinds around the transcribing RNA Pol II (Choder and Aloni, 1988), which increases the chance of newly transcribed RNA to anneal to its template (García-Muse and Aguilera, 2019), forming a three-stranded nucleotide structure composed of a DNA:RNA hybrid and a dislocated single-stranded DNA, known as an R-loop. Very short R-loops (<10 bp) are important for various cellular processes and are formed during transcription and removed afterward by the enzyme RNase H1 (Cerritelli and Crouch, 2009) and RNA-DNA helicases (Aguilera and Gómez-González, 2017). Longer R-loops on the other hand, are known to be harmful to genome integrity due to generating DSBs. The specific mechanism by which R-loops generate DSBs is still poorly understood. A proposed mechanism is that they expose the displaced ssDNA to endonucleases (Sollier and Cimprich, 2015), making it more susceptible to single-stranded breaks (SSBs). Another observation is that R-loop structures can be recognized by specific endonucleases such as XPG, XPF, and FEN1 (Cristini et al., 2019), which cleave the ssDNA hybridized to the RNA in the R-loop leaving a SSB.
However, how these SSBs turn into DSBs is unknown.
Transcription elongation causes negative and positive supercoiling behind and ahead of the transcribing RNA PolII, respectively (Liu and Wang, 1987;Ma and Wang, 2016). This supercoiling has to be resolved to preserve efficient transcription, especially at highly transcribed genes (Ma and Wang, 2016). TOP1 is the main enzyme responsible for resolving positive and negative supercoiling; TOP1 cuts the DNA single-strandedly through its catalytic domain allowing the DNA to relax, followed by re-ligation of the cut strand (Pommier et al., 2016(Pommier et al., , 2022. Optimally, TOP1 forms a transient Topoisomerase cleavage complex (TOP1cc) which cuts the DNA, and this transient complex is rapidly reversed and TOP1 is released, eventually leading to no persistent DNA damage.
However, if TOP1 is trapped at its catalytic state, either by physiological or pathological causes, the ssDNA would remain cut, and TOP1cc that is bound to the DNA should be removed through the Tyrosyl-DNA Phosphodiesterase 1 (TDP1) excision pathway for the break to be repaired (Cristini et al., 2016;Pommier et al., 2022).
DSBs can be detected using various methods, including antibodies against markers of DNA damage like anti-γ-H2AX which is helpful to detect changes in DNA damage with different treatments and conditions, however, it fails to locate the site of these breaks in the genome. Using those antibodies for chromatin immunoprecipitation sequencing assays (ChIP-seq) can detect the location of those breaks (DeCaprio and Kohl, 2020), but because of the nature of γ-H2AX distribution, which can be as wide as several kilobasepairs around the break, it is less efficient to detect the site of the break with high resolution. For these reasons, we used the newly established technique, break labeling in situ and sequencing (sBLISS) (Bouwman et al., 2020), which detects DSBs in a single nucleotide resolution. sBLISS can detect the levels of DSBs across the genome, with high discrimination between DSBs occurring at proximal genomic locations and has also been confirmed to detect physiological DSBs (Hazan et al., 2019), which makes it a very powerful technique for studying transcriptional DSBs.
Here we used multiple techniques, including sBLISS, ChIP-seq, and DRIP-seq data to uncover a mechanism in which transcription, R-loops, and TOP1 altogether facilitate physiological DSBs formation at active regions and especially at highly transcribed genes.

Physiological breaks are enriched at highly transcribed genes
In our previous work (Hazan et al., 2019), we found physiological DSBs to be highly enriched at active regions such as promoters, insulators, and super-enhancers using the BLISS method. To further uncover the mechanism regulating physiological DSBs, we mapped DSBs distribution across the genome (breakome) using the newly developed sBLISS method (Bouwman et al., 2020) in untreated MCF-7 breast cancer luminal cells.
Comparing the observed breaks to those that are expected at different chromatin states, we found that the observed:expected ratio is higher at active promoters, strong enhancers, transcription elongation, and transcription transition chromatin states, as opposed to repressed regions, where the ratio was significantly lower ( Figure 1A). The increased break enrichment relative to expected at active but not at repressed regions further confirms the involvement of transcription in the formation of DSBs in these regions.
To further test this, relative gene expression was measured using the RNA sequencing technique CEL-seq (Hashimshony et al., 2012), and relative gene expression for each gene was correlated with its break density. Interestingly, relative gene expression was found to be positively correlated with break density ( Figure 1B and C, Supplementary Figure 1A, B and D-K), indicating that higher expressed genes tend to have higher break density. Moreover, gene ontology analysis of the top 400 broken genes in MCF-7 showed enrichment in biological processes such as translation, gland morphogenesis, and response to estradiol, which are biological processes that are expected to be highly active in breast epithelial tissue (Supplementary Figure 1C), indicating that active genes tend to exhibit high break densities.
To investigate these transcriptional DSBs, the top 1000 expressed and the lower 1000 expressed genes were selected from our CEL-seq, and for each gene, the ratio of observed to expected breaks was calculated ( Figure 1D). As predicted, highly expressed genes displayed a higher ratio (~7 folds) compared to low expressed genes (<1) ( Figure   1E). When examining break density distribution across the gene body of these genes ( Figure 1F), we observed an elevated break density at the gene body of the highly transcribed genes compared to low transcribed genes, with two distinct peaks at transcription start site (TSS) and transcription termination site (TTS).
These data clearly suggest that physiological DSBs are more enriched at active regions, specifically at highly transcribed genes, and point towards transcription as a direct or indirect cause of physiological DSBs. The distribution of DSBs in breast cancer MCF7 cells along ChromHMM-defined chromatin states of HMEC. Bar height represents break enrichment relative to other chromatin states. B. A zoomed log scaled scatter plot showing a positive correlation between break density and expression levels; each dot represents a gene, and for each gene break density and expression were measured and normalized to gene size. C. Mean expression (TPM) positively correlates with break density, genes were grouped into 22 groups with increasing break enrichment, the 22nd group being the group with the highest mean break enrichment. D. Ratio of observed breaks vs expected breaks at highly and low expressed genes. E. Plot showing break density at gene body of the 1000 most transcribed genes (continuous line) and 1000 least transcribed genes (dashed line). sBLISS experiments were done in duplicates, and CEL-seq was done in quadruplicates.

Transcriptional DSBs are associated with TOP1
Previously, TOP1 was found to overlap regulatory elements such as insulators, promoters, and enhancers, especially at those which exhibited high break density (Baranello et al., 2016;Hazan et al., 2019). To directly test the involvement of TOP1 in transcriptional DSBs formation, we correlated TOP1 levels and expression levels.
Interestingly, the expression of moderately and highly expressed genes revealed a positive correlation with TOP1 levels (Figure 2A, B, Supplementary Figure 2A). Moreover, we found high enrichment of breaks at TOP1 peaks across the genome ( Figure 2C). Furthermore, we compared the ratio of the observed breaks to the expected breaks between high and low TOP1 enriched genes and found a greater ratio in high TOP1 genes (~9 folds) compared to low TOP1 genes ( Figure 2C). To further confirm the significance of TOP1, we knocked down TOP1, using siRNA (Supplementary Figure 2B), and examined the consequences on DNA breaks using immunofluorescence staining. As seen in Fig   Transcriptional DSBs are associated with R-loops R-loops are known for their harmful effect on genome integrity, and their formation is known to be facilitated by the transcriptional machinery (Aguilera and García-Muse, 2012;García-Muse and Aguilera, 2019). Given the connection between TOP1 and R-loops, which was further validated by correlating TOP1 levels with R-loops levels (Supplementary Figure 2C), we next set to test the involvement of R-loops in the formation of transcriptional DSBs.
Correlating R-loops levels, using published DRIP-seq data done on untreated MCF-7 cells, and expression revealed a positive correlation in the top 50% expressed genes ( Figure 3A and B, Supplementary Figure 2D). When mapping break enrichment at Rloops peaks, we observed elevated break enrichment at these peaks ( Figure 3C).
Furthermore, comparing observed to expected breaks ratio between high and low R-loops genes revealed that genes enriched with R-loops have a higher ratio (~9 folds) compared to those deprived of R-loops ( Figure 3D).
To test the effect of R-loops depletion on global DSBs, the enzyme RNase H1 which degrades the RNA moiety of R-loops (García-Muse and Aguilera, 2019) was overexpressed in MCF-7 cells, and immunofluorescence staining was performed for 53BP1 and γ-H2AX ( Figure 3E, F). Interestingly, cells overexpressing RNase H1 had fewer γ-H2AX and 53BP1 foci per nucleus compared to control cells, indicating the involvement of R-loops in the formation of physiological DSBs. correlation between expression and R-loops levels for the top 50% expressed genes, each dot represents a gene. DRIP data was extracted from GSE81851. B. Mean expression (TPM) correlates with R-loops levels for the top 50% expressed genes. C. Heatmap of break density at R-loops regions across the genome. D. Ratio of observed breaks to expected breaks for low R-loops and high R-loops genes. E. Representative images of immunofluorescent staining of MCF-7 EV and MCF-7 overexpressing RNase H1 for γ-H2AX and 53BP1. F. Number of γ-H2AX and 53BP1 foci per nucleus, for MCF-7 EV and MCF-7 RNase H1. sBLISS and immunofluorescence experiments were done in duplicates.

TOP1 knockdown and RNase H1 overexpression decrease break enrichment specifically at the gene body of highly transcribed genes
To investigate the contribution of TOP1 and R-loops in transcriptional DSBs formation, TOP1 was depleted in MCF-7 cells overexpressing RNase H1 or control cells (supplementary Figure 3A, B, and C), and changes in DSBs were mapped by sBLISS ( Figure 4A). We first examined break enrichment at different chromatin states and found that regions associated with transcription elongation and transcription transition chromatin states, but not active promoters, had fewer breaks in the treated samples compared to the control ( Figure 4B). To investigate the effect of RNase H1 overexpression and TOP1 knockdown on highly active regions, we looked specifically at the highest expressed genes selecting the top 1000 expressed genes from CEL-seq data that was done on those treated MCF7 cells. Remarkably, we found that highly transcribed genes exhibit a decrease in DSBs enrichment when TOP1 is depleted and/or RNase H1 is overexpressed ( Figure 4C, Supplementary Figure 3D). In contrast, no significant effect was detected in low expressed genes ( Figure 4C, Supplementary Figure 3D). In addition, genes enriched with R-loops displayed a significant decrease in break enrichment after RNase H1 overexpression and/or TOP1 knockdown, while no significant change was detected in genes deprived of R-loops ( Figure 4D). Similarly, genes enriched with TOP1 exhibited a significant decrease in break density as compared to those which are deprived of TOP1 ( Figure 4E). We next examined how the distribution of breaks changed across the gene body upon TOP1 depletion and/or RNase H1 overexpression and found that breaks density decreases in gene body and TTS, but not at TSS ( Figure 4F and Supplementary Figure 3E). These findings are consistent with the observation that TOP1 is catalytically active at gene bodies but not at TSS (Baranello et al., 2016).
Interestingly, both manipulations (RNase H1 OE and TOP1 KD) together did not give a complementary effect compared to each treatment separately. To further uncover this, we selected active genes that exhibited the most decrease in break density with each treatment separately (calculated by the average of the two separate treatments) and examined break density of the three groups. Interestingly, although genes were selected based on the separate manipulations (SR or TE), break density was lowest with the combined manipulations (TR) at these genes (

Fragility of estradiol responsive genes is mediated by R-loops and TOP1
To test if an increase in transcription would increase global DNA damage, transcription in MCF-7 cells was induced by treatment with estradiol (E2) followed by immunofluorescence staining for γ-H2AX and 53BP1 ( Figure 4A, B). As expected, we found that cells displayed more 53BP1 and γ-H2AX foci per nucleus when treated with E2 (increased transcription), and this increase is suppressed with RNase H1 overexpression or TOP1 knockdown, indicating that the increase in DNA damage upon E2 treatment is mediated by R-loops and TOP1. To further specify these results, we utilized sBLISS to map DSBs in control cells (NE/SN), cells treated with estradiol (EE/SE2), and cells overexpressing RNase H1 or knocked down for TOP1 and treated with estradiol (ER/TE2). We found that estrogen responsive genes, exhibiting significant increased expression upon E2 treatment (extracted from GSE27463) (Hah et al., 2011), displayed increased break density as opposed to down-regulated genes (Supplementary Figure 4 A-D). Importantly, the majority of these genes had decreased break density upon depletion R-loops or TOP1 (ER/TE2) as compared to estradiol only, indicating that the increase in transcriptional DSBs upon E2 treatment is mediated by R-loops and TOP1. Furthermore, selecting estrogen responsive genes which exhibited increased break density following estradiol induction showed strongly that the fragility of these genes was highly attenuated by the depletion of R-loops or TOP1 (Figure 5 E, F), further confirming the involvement of TOP1 and R-loops in transcriptional DSBs in general, and specifically estrogen associated DSBs. Moreover, disease related pathway analysis for these estrogen responsive genes using DisGenNET (Piñero et al., 2021)   Representative images of immunofluorescent staining of MCF-7 EV non treated, MCF-7 EV treated with estradiol (E2), and MCF-7 cells overexpressing RNase H1 and treated with E2, for γ-H2AX and 53BP1. B. Quantification of number of γ-H2AX and p53BP1 foci per nucleus, for MCF-7 EV NT and MCF-7 EV treated with E2, and MCF-7 cells overexpressing RNase H1 treated with E2. C. Representative images of immunofluorescent staining of MCF-7 siSc non treated, MCF-7 siSc treated with estradiol (E2), and MCF-7 cells knocked down for TOP1 and treated with E2, for γ-H2AX and 53BP1. D. Quantification of number of γ-H2AX and p53BP1 foci per nucleus, for MCF-7 siSc NT and MCF-7 siSc treated with E2, and MCF-7 cells knocked down for TOP1 and treated with E2. E. Color coded heatmap showing the change in break density between control (NE), cells incubated with estradiol (EE), cells incubated with estradiol and overexpressing RNase H1 (ER), for estrogen responsive genes with the highest positive differential break density. F. Color coded heatmap showing the change in break density between control (SN), cells incubated with estradiol (SE2), cells knocked down for TOP1 (TN), for estrogen responsive genes with the highest positive differential break density. sBLISS and immunofluorescence experiments were done in duplicates.

DSBs at highly transcribed genes might limit transcription
To make sure the decrease in breaks enrichment at highly expressed genes was not due to decreased transcriptional levels caused by the treatment itself, we examined the change in normalized expression counts of the top 1000 expressed genes with the different treatments using CEL-seq (Supplementary Figure 5 A, B). Surprisingly, cells overexpressing RNase H1 (SR) or depleted for TOP1 (TE), but not with both treatments (TR), had more genes with increased normalized expression counts indicating increased expression of the top 1000 expressed genes with treatments (62.6%, 67.7%, and 47.3%

TOP1 and R-loops are the main causes of DSBs at highly transcribed genes
To further investigate the contribution of TOP1 and R-loops in the formation of DSBs of highly expressed genes at physiological conditions, we compared different gene sets with similar expression levels in untreated MCF-7 cells. Interestingly, TOP1 and R-loops density in break-prone genes is greater than in break-deprived genes although both groups exhibited similar expression levels ( Figure 6A, B). These results indicate that TOP1/R-loops density dictates the level of DSBs formation at actively transcribed genes.
Moreover, break density in highly expressed genes that are deprived of R-loops/TOP1 is lower as compared to similarly expressed genes with high R-loops/TOP1 genes, and have slightly lower breaks as compared to low expressed genes ( Figure 6C). These data clearly suggest that TOP1 and R-loops are contributing the major source of DSBs at highly expressed genes.

Transcriptional DSBs can cause mutations at cancer genes
Endogenous DSBs are a source of genomic instability and mutations that contributes to cancer (Tubbs and Nussenzweig, 2017). To investigate the involvement of transcriptional DSBs in cancer driver mutations, we examined break density change with each treatment, at well-established cancer driver mutated genes (Nik-Zainal et al., 2016;Hu et al., 2020).
Interestingly, from the 97 genes that have breakome data, 77 genes (79%) exhibited a decrease in break density with TOP1 knockdown and/or RNase H overexpression ( Figure   7A), with 62 genes (80%) having a decrease in both treatments. Sorting these genes according to their break enrichment shows that most of these genes exhibited a decrease in break density with RNase H1 overexpression and/or TOP1 knockdown, especially at those genes with the highest break density in the control ( Figure 7A). Moreover, these genes have a significant decrease in the total relative breaks with each treatment but not both treatments ( Figure 7B). These results indicate that TOP1 and R-loops mediated transcriptional DSBs can be a cause of mutations at cancer genes.

γ-H2AX is enriched at highly transcribed genes, mainly oncogenes
To investigate repair signaling of transcriptional DSBs, we performed chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) using a γ-H2AX antibody, known to mark DNA damage and repair sites (Mah, El-Osta and Karagiannis, 2010), in MCF7 cells without exposing them to external DNA damage. The distribution of γ-H2AX highly overlapped with RNA polymerase II (Pol II) ( Figure 8A) and correlated with transcription level and was enriched in highly expressed genes ( Figure 8B). These findings imply that γ-H2AX may be required for high transcriptional activity, as previously suggested (Seo et al., 2012). Next, we compared γ-H2AX coverage between different gene sets and found that break-prone genes display a much higher γ-H2AX enrichment as compared to break-deprived genes ( Figure 8C), further validating the role of γ-H2AX in DNA damage repair. Moreover, γ-H2AX was only enriched at highly expressed genes ( Figure 8D), consistent with DSBs data generated by sBLISS ( Figure 1E and Supplementary Figure 3E).
To further uncover the dynamics of DSBs at well established oncogenes, we investigated break enrichment at known oncogenes that are highly active in MCF7 cells. Previously, systematic screening of genes required for tumor survival have defined a dependency score for each gene in ~500 cancer cell-lines, including MCF-7 (Tsherniak et al., 2017).
We found that many MCF7-dependent oncogenes like ESR1, GATA3, NRAS, CCND1, ACTB, MYC, HES1, EGR1 and TFF1 were very enriched with DSBs (top 5% broken genes) (Supplementary Table 1 Table 3), we hypothesized that these genes must indeed be actively repaired. To test this hypothesis, we looked at γ-H2AX enrichment at these oncogenes. Interestingly, we found that these MCF7dependent oncogenes were indeed marked by high levels of γ-H2AX (Fig. 8E). A zoomout view demonstrates the specificity of γ-H2AX enrichment around these genes as well as other well-characterized oncogenes like XBP1, FOS and miR21 (Supplementary To test whether the decreased breaks observed at these oncogenes are also accompanied by a decrease in γ-H2AX levels, we performed ChIP-qPCR for selected genes that are most enriched with γ-H2AX ( Figure 8F) To determine how these transcriptional DSBs are being repaired, we analyzed the breakome of MCF-7 cells depleted for RAD51 (HR) or XRCC4 (NHEJ), two major proteins involved in DSB repair pathways. We found that while RAD51 depletion slightly increased break density at both TSS and gene bodies (Figure 8  DSBs produced by transcription-blocking TOP1 lesions to be triggered by DNA-PK which recruits NHEJ factors (Cristini et al., 2016). These results imply that transcriptional DSBs caused by TOP1 and R-loops at gene bodies might be repaired by the NHEJ pathway. Oncogene hypertranscription by RNA PolII increases the negative supercoiling (-SC) behind the transcribing RNA PolII. This -SC increases the demand for the catalytic activity of TOP1 to resolve it, which eventually leads to the stabilization of TOP1cc and formation of a persistent SSB. The increased -SC caused by the hypertranscription also facilitate the formation of R-loops that can turn into a SSB, which together with the SSB caused by TOP1cc, could result in a DSB. To ensure efficient and sustained hypertranscription of oncogenes, oncogenes are enriched with the mark γ-H2AX to ensure strong DNA damage response (DDR) and efficient repair of transcriptional DSBs. The illustration was generated using BioRender. sBLISS and ChIP-seq experiments were done in duplicates.

Discussion
Our data show that highly transcribed genes overlap both TOP1 and R-loops, and that Rloops and TOP1 genomic levels can predict the damage at these regions. Moreover, we showed that TOP1 knockdown or RNase H1 overexpression significantly decreases transcriptional DSBs at highly expressed genes, including oncogenes. These findings propose a model (Figure 8 K) in which DSBs in oncogenes are mainly caused by the accumulation of R-loops and the catalytic activity of TOP1. Our data also reveal that these oncogenes, despite being highly broken (fragile), are also highly expressed, and this high expression is mediated by efficient repair signaling as indicated by γ-H2AX accumulation and involvement of XRCC4.

Transcriptional DSBs at TSS are different from gene body
Our data propose a model in which transcriptional DSBs at TSS are produced and processed differently from transcriptional DSBs at gene body. TOP1 and R-loops depletion reduced DSBs only at gene bodies of the highly expressed genes, and not at TSS, consistent with the finding that DSBs at TSS are produced by high-mobility group AT-hook 2 protein (HMGA2) which is required for transcription initiation (Dobersch et al., 2021). This is also consistent with our previous observation that the pattern of DSBs is behaving differently at regulatory element including promoters, strong enhancers and insulators (Hazan et al., 2019). Moreover, TOP1 and R-loop depletion slightly increased break density at promoters and enhancers, implicating that they might serve another function at these regions.
Our data also indicate that not only the origin of DSBs at promoters are different from gene body but also how these DSBs are processed, consisting with a recent report showing that promoters are prone to oxidative damage and are protected by NuMA (Ray et al., 2022). These observations are proposing a model inwhich DNA damage and repair are subjected to different 'rules' at different regions across the gene.

The coordination between TOP1 and R-loops
The mechanism by which TOP1 and R-loops cause DSBs has been previously explored (Cristini et al., 2019). A mechanism was proposed in which TOP1 and R-loops cause DSBs through the induction of two adjacent SSBs, one produced by TOP1cc processing, and the other by the activity of endonucleases which recognizes R-loops structures. In our data, although the general view of highly expressed genes points towards the previously proposed mechanism, a more focussed view on genes with the highest decreased breaks with the separate manipulations showed that these genes had the lowest break enrichment with both RNase H1 OE and TOP1 KD (Supplementary Figure   3 F, H, and J), which was also validated by γ-H2AX ChIP-qPCR ( Figure 8F). These findings imply that R-loops and TOP1 cause DSBs independently of each other. A possible mechanism could be that R-loops and TOP1 themselves could accumulate SSBs at both strands leading to DSBs. Another posibility is that SSBs induced by TOP1 and R-loops could be adjacent to SSBs induced by other agents such as oxidative damage (Ray et al., 2022).

Estrogen responsiveness and DSBs
Previously, R-loops have been proposed to be the main cause of estrogen-induced DNA damage (Stork et al., 2016), pointing towards a role of R-loops in transcriptional DSBs.
Our sBLISS data on E2 responsive genes after TOP1 KD or RNase H1 OE shows induction of breaks at these genes after estradiol treatment, further validating a causing effect of transcription on DSBs formation. Furthermore, TOP1 KD or RNase H1 OE significantly attenuated the effect of estradiol, further validating the involvement of TOP1 and R-loops in transcriptional DSBs formation.

The repair of transcriptional DSBs
Our results show that despite many of MCF7 dependent oncogenes are enriched in the break density list, they are also highly expressed, implying that these regions are repaired very efficiently. Our findings suggest that highly expressed genes including oncogenes, are covered by the histone mark γ-H2AX, indicating that DDR signaling is very active in these regions likely to ensure repair of the DSBs resulting from the hypertranscription of these genes. We also showed that depletion of XRCC4 increased DSBs at the gene body of highly expressed genes as compared to RAD51 knockdown, indicating that the NHEJ repair pathway might be the main pathway for transcriptional DSBs repair at gene bodies, while HR might be more prominent at TSS.

Method details cell culture
MCF7 (HTB-22) were grown in RPMI supplemented with 10% (vol/vol) FBS (GIBCO), glutamine, and penicillin/streptomycin (Beit-Haemek). Cells were grown at 37°C under a humidified atmosphere with 5% CO2. Cells were routinely authenticated by STR profiling, tested for mycoplasma, and cell aliquots from early passages were used.
Cells were seeded to reach a confluency of 60-70% on the day of transfection.
Transfection solution was prepared by mixing 1.5ml serum/antibiotic-free RPMI with 30μL lipofectamine and incubating for 5 minutes, before adding a mixture of 1.5ml RPMI and 0.6nmoles of siRNA followed by 20 minutes incubation at RT. The mixture was then added to the cells in a 10cm plate cultured in antibiotic/serum-free media. The media was replaced by fully supplemented media after 5-6 hours, and cells were incubated in the incubator for 48 hours.
For RNase H1 overexpression, RNase H1 plasmid and its empty control plasmid containing only GFP, a kind gift from Dr. Sara Selig (Molecular Medicine Laboratory, Rambam Health Care Campus and Rappaport Faculty of Medicine, Technion, Haifa 31096, Israel), were used. GFP expression was used as an indicative of RNase H1 overexpression. The plasmid was prepared as described in (Sagie et al., 2017).

Immunofluorescence assay
Cells were seeded on coverslips to a 50-70% confluency at the time of fixation. Fixation was done using 4% paraformaldehyde for 10 minutes, followed by three washes with 1X PBS. Cells were then permeabilized with 0.25% Triton X for 7 minutes followed by another three washes with 1X PBS. This was followed by blocking with 10% goat serum for 1 hour. Cells were then stained with anti-γ-H2AX antibody (abcam, ab26350) using 1:1000 dilution, and anti-53BP1 anti-body (abcam, ab36823) using 1:500 dilution, and incubated in a humidity chamber in the cold room overnight. After three washes with PBS, cells were incubated with the secondary antibodies in a humidity chamber for one hour at room temperature, and then stained with Hoechst stain (1:2000 dilution in 1X PBS) followed by two washes with PBS. Coverslips were then mounted, and immunofluorescent pictures were taken using confocal microscopy.

CEL-seq
Total RNA was extracted using TRI reagent (BioLabs) according to the manufacturer specifications. RNA concentration was measured using Qubit Flex Fluorometer (ThermoFisher Scientific). RNA sequencing libraries were prepared using the CEL-Seq2 protocol in the Technion genomic Center (TGC), as published by (Hashimshony et al., 2012), with one modification; instead of single-cells as input, 2 ng purified RNA was taken as input for library preparation. The CEL-Seq2 libraries were analyzed for average fragment size using Agilent 2200 TapeStation (Agilent) and concentration was measured using Qubit Flex Fluorometer (ThermoFisher Scientific). The libraries were sequenced on the Illumina NextSeq 2000 sequencer (Illumina), 12 bases for read 1 and 88 bases for read 2. Demultiplexing was performed in two steps. First, Illumina demultiplexing was performed using bcl2fastq Illumina software with the following parameters: barcodemismatches =1, minimum-trimmed-read-length = 0, and mask-short-adapter-reads = 0.

Transcription induction using Estradiol
MCF-7 cells were hormonally starved by growing in RPMI media supplemented with 10% charcoal stripped fetal bovine serum for 48 hours. Transcription was induced by replacing the media with RPMI media supplemented with charcoal stripped FBS and 100nM β-Estradiol (Sigma-Aldrich) (pre-dissolved in ethanol), and incubated for 1 hour.
In-suspension break labelling in situ and sequencing (sBLISS) sBLISS was performed as described in (Bouwman et al., 2020). Briefly, 10 6 cells were fixed in 2% paraformaldhyde in 10%FCS/PBS for 10min at room temperature. Fixation was quenched with 125 mM glycine for 5min at RT, and another 5min on ice, followed by two washes in ice cold PBS. The cells were lysed for 60min on ice and the nuclei were permeabilized for 60min at 37°c. Then, nuclei were washed twice with CutSmart Buffer supplemented with 0.1% Triton X-100 (CS/TX100), and DSB ends were in situ blunted with NEB's Quick Blunting Kit for 60min at RT. Blunted nuclei were washed twice with 1x CS/TX100 before proceeding with in situ ligation of sBLISS adapters to the blunted DSB ends. Adaptor ligation was performed with T4 DNA Ligase for 20-24h at 16C and supplemented with BSA and ATP. After ligation, nuclei were washed twice with 1x CS/TX100 and genomic DNA was extracted with Proteinase K at 55°c for 14-18h while shaking at 800rpm. Afterward, Proteinase K was heat-inactivated for 10 min at 95°c, followed by extraction using Phenol:Chloroform:Isoamyl Alcohol, Chloroform, and ethanol precipitation. The purified DNA was sonicated in 100 μL ultra pure water using Covaris M220 for 60s. Sonicated samples were concentrated with AMPure XP beads (Beckman Coulter) and fragment sizes were assessed using a BioAnalyzer 2100 (Agilent Technologies) to range from 300bp to 800bp with a peak around 400-600bp. The sonicated DNA was in vitro transcribed using MEGAscript T7 Kit for 14 h at 37°c. After RNA purification and ligation of the 3 ' -Illumina adaptor, the RNA was reverse transcribed.
The final step of library indexing and amplification was performed using NEBNext® Ultra™ II Q5® Master Mix.

Chromatin immunoprecipitation (ChIP) and ChIP-seq
MCF7 cells (~10 6 ) were crosslinked with 1% formaldehyde (methanol free, Thermo Scientific 28906) for 10 min at room temperature and quenched with glycine, 125 mM final concentration. Fixed cells were washed twice in PBS and incubated in sonication buffer (0.5M NaCl, 0.5%SDS RIPA buffer containing PMSF, protease and phosphatase inhibitors) for 30min on ice. Cells were sonicated using Covaris M220 for 10min to produce chromatin fragments of ~200-300 bp. The sheared chromatin was centrifuged 10min at maximum speed. From the supernatant, 50 μl were saved as input DNA and the rest was diluted in RIPA buffer without NaCl and SDS. The chromatin was immunoprecipitated by incubation with 5 μg of gamma-H2AX (phospho S139; Abcam, AB2893) for 5 mg of TOP1 antibodies (ab3825, ab85038) antibody pre mixed with 100 μl Dynabeads Protein G (ThermoFisher, 10004D) and incubated over-night at 4°C with rotation. Immunoprecipitates were washed twice with RIPA buffer + 150mM NaCl, twice with RIPA buffer + 300 mM NaCl, and twice with TE buffer. The chromatin was eluted from the beads with 200 μl of direct elution buffer (10mM Tris pH 8, 0.3M NaCl, 5mM EDTA, 0.5% SDS) and incubated overnight at 65°C to reverse the cross-linking. Samples were treated with RNAse for 1h at 37 °C and with proteinase K at 55 °C for 2 h. Custom python and R scripts are applied to identify read start position and convert bam files to break bigwig format for downstream analysis. A custom R script is applied to discard a blacklist of positions (mainly within centromeres). For most analyses, breaks are aggregated over genomic tiles using R tileGenome utility.

CEL-Seq analysis
Quality control is applied on CEL-Seq fastq files with trim_galore to remove residual adapters, polyA tails, trim reads to base quality of at least 20 and filter out short reads with size smaller than 20 bp. Initial and final sample qualities are evaluated with fastqc.
Quality processed fastq files are then aligned to GRCh38 transcriptome using salmon aligner in it mapping mode while enabling the write_mapping key to create a sam file of transcripts. The sam file is deduplicated with a custom python script which scans for primary alignments only and deduplicates with respect to the transcript and Unique Molecular Identifiers (UMI). Non-duplicated primary alignments and all their associated secondary alignments are kept. Salmon aligner is executed again in its alignment mode, with sam deduplicated file as input resulting in a de-duplicated transcript count table.

DRIP-Seq analysis
DRIP-Seq peak file is downloaded from GEO datasets -GSE81851. The peaks selected for analysis are for T0-Input after liftover to GRCh38.

TOP1 ChIP-Seq analysis
MCF7 TOP1 ChIP-Seq treated and input fastq files are quality controlled with trim_galore to remove residual adapters, trim reads to base quality of at least 20 and filter out short reads with size smaller than 20 bp. Initial and final sample qualities are evaluated with fastqc. The quality processed fastq files are aligned to GRCh38 genome with hisat2 followed by sorting and indexing with samtools. The utility bamCompare of deepTools is applied on the bam files to create a peak bigwig file. This operation is set to ignore duplicates, discard a blacklist of regions, normalize by RPKM, and compare by ratio in tiles of 50 bp γ-H2AX ChIP-seq data processing The sequenced files, containing 84bp single-ended reads, were trimmed for Illumina adapter and for base quality, keeping Phred score above 20, with trim galore utility of Babraham Bioinformatics. Reads were then aligned to the human genome (GRCh38) with Bowtie2 (v2.2.9) (Langmead and Salzberg, 2012) keeping the default end-to-end scoring coefficients. Coverage files, both bigwig and bedgraph, were created with the bamCompare utility of deeptools (Ramírez et al., 2016) by comparing the treated BAM file with the control input file while ignoring duplicates. We have also created our own black list of regions, mainly consisting of chromosomal centromeres, and filtered the results accordingly.
Peak calling was performed with the MACS2 tool (Zhang et al., 2008). Prior to applying peak calling, filtered bed files were created from the aligned SAM files (for both treated and control) using a custom script. The filter kept reads exhibiting high mapping quality of 42 (highest figure for for Bowtie2) and an average base quality of at least 32. For peak calling, the treated file was compared the control file while ignoring duplicates, setting fold limits at their default of 5-50 but lowering q-value to a stricter figure of 0.01.

Algorithms used for plotting the data
Heatmaps were plotted using the following pipeline. First deepTools computeMatrix is applied with center reference-point. Next, an R script discards outlier spikes from the matrix.
Finally, deepTools plotHeatmap is performed. In two-part plots, each part is defined as an independent region.
Regions of chromatin states are based on GSE57498 for HMEC cell line. For each of the regions, breaks are counted within that region and compared between treated samples or between treated sample and an expected count. The expected count is calculated as the proportion of breaks expected due to regions size relative to overall genome size.
Plots are created with base R functions.
Figures showing break percentage at high expression/ R-loops/ TOP1 were generated using the following pipeline, Downsized breaks are counted across the gene body for all genes-of-interest (e.g., 1000 topmost expressed genes) and summed up for each of the treated samples. Downsizing is performed to reduce possible bias related to library preparation. An expected break value is also calculated based on the relative occupancy of the genes-of-interest relative to overall genome size. Bar-plots are created with base R functions.

Data availability
Raw data are available upon request.