RBMX enables productive RNA processing of ultra-long exons important for genome stability

Previously we showed that the germline-specific RNA binding protein RBMXL2 is essential for male meiosis, where it represses cryptic splicing patterns (1). Here we find that its ubiquitously expressed paralog RBMX helps underpin human genome stability by preventing non-productive splicing. In particular, RBMX blocks selection of aberrant splice and polyadenylation sites within some ultra-long exons that would interfere with genes needed for normal replication fork activity. Target exons include one within the ETAA1 (Ewing's Tumour-Associated Antigen 1) gene, where RBMX collaborates with its interaction partner Tra2β to enable full-length exon inclusion by blocking selection of an aberrant 3ʹ splice site. Our data reveal a novel group of RNA processing targets potently repressed by RBMX, and help explain why RBMX is associated with gene expression networks in cancer, replication and sensitivity to genotoxic drugs.

Introduction

[…] expression of genes that are key for genome stability. Importantly, our study provides molecular insights into how ultra-long exons are processed during RNA maturation.

Global identification of a novel panel of RBMX-regulated RNA processing events

The sequence similarity between RBMX and RBMXL2 (34) […]

In order to detect a wide range of transcriptome changes in RBMX targets we analysed our RNA-seq data using two bioinformatics programs, SUPPA2 and MAJIQ. SUPPA2 uses estimates of whole-isoform expression to detect global changes in RNA processing patterns (35). SUPPA2 analysis predicted 6708 differentially processed RNA isoforms upon RBMX knock-down. Strikingly, Gene Ontology (GO) analysis revealed that approximately 15% of the significantly enriched pathways were related to DNA replication, DNA repair and cell division, while others involved RNA processing, cellular response to stress and other stimuli […]

We also analysed the RNA-seq data using the MAJIQ bioinformatic tool, which detects local splicing variations from RNA-seq data (36). To identify the RNA processing patterns that most strongly depend on RBMX, we visually inspected the predicted RBMX targets on the IGV genome browser (37). This visual search detected 155 strong changes in RNA processing, including splice site selection, differential selection of terminal exons and alternative polyadenylation (polyA) sites, in addition to exon skipping (Figure 1B and Figure 1 - Source Data 2). Most of these RNA processing events (80%) were predicted to be repressed by RBMX (Figure 1C). Importantly, comparison with publicly available RNA-seq data (26) showed that while 48% of the same splicing events that we identified in MDA-MB-231 cells also switched mRNA processing after RBMX depletion in HEK293 cells ( […]

The above data showed that genes involved in replication fork activity were globally enriched amongst genes showing strong splicing changes after RBMX depletion.
Replication fork accuracy is critical for genome stability, and one of the most strongly RBMX-dependent RNA processing patterns was for the ETAA1 (Ewing's Tumour-Associated Antigen 1) gene. […]

The full-length ETAA1 protein is 926 amino acids long. Splicing selection of the ETAA1 exon 5-internal 3ʹ splice site produces an mRNA isoform predicted to encode an ETAA1 protein isoform of just 202 amino acids (Figure 2D). Although the ETAA1 exon 5-internal 3ʹ splice site is annotated on Ensembl (v94), it is rarely selected in cells treated with control siRNAs (Figure 2A). Confirming that correct expression of ETAA1 protein depends on RBMX, Western blot analysis with an antibody specific to ETAA1 protein showed strong reduction of the full-length ETAA1 protein after RBMX depletion (Figure 2E). Such a short ETAA1 protein would lack RPA binding motifs (Figure 2D) and thus be unable to operate similarly to the full-length ETAA1 protein isoform. Hence normal ETAA1 gene function relies on the RNA processing activity of RBMX.

[…] with either control siRNA ("Control siRNA") or two separate siRNAs against RBMX ("RBMX siRNA1" and "RBMX siRNA2"). The strong upstream 3ʹ splice site and the two weak downstream 3ʹ splice sites on ETAA1 exon 5 are shown. The HEK293 RNA-seq data are from (26).

RBMX cooperates with Tra2β to suppress cryptic splicing within ETAA1 exon 5

Two possible mechanistic models could explain the different use of 3ʹ splice sites within ETAA1 exon 5 in RBMX-depleted cells: RBMX could normally promote recognition of the strong upstream splice site; or RBMX could normally prevent usage of the weak downstream splice site. In order to distinguish between these two possibilities, we performed a minigene assay. Briefly, a fragment of ETAA1 exon 5 that spanned the weak internal 3ʹ splice site and flanking genomic regions (but not the stronger upstream 3ʹ splice site) was cloned into an expression plasmid between two β-globin exons (47) (Figure 3A and […]

The above data showed that RBMX prevents cryptic mRNA processing of the ultra-long ETAA1 exon 5, which at 2111 nt is considerably longer than the 129 nt median size of human exons. Further examination revealed that RBMX also controls productive splicing patterns of […]

Furthermore, visual inspection of an alignment file that compares the RNA-seq reads from cells treated with RBMX siRNA with those from cells treated with control siRNA, generated using the bamCompare tool from deepTools v3.5.0 (58), confirmed a reduction of RNA-seq reads after this putative polyA site upon RBMX depletion (Figure 5C). Consistent with this, RT-PCR analysis showed that the relative abundance of a PCR product spanning the premature termination site, normalised over a region upstream, was significantly reduced in RBMX-depleted cells compared to control (Figures 5D, E). This suggests that RBMX prevents premature transcription termination within BRCA2 exon 11.
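The normalisation described for the BRCA2 RT-PCR can be illustrated with a minimal Python sketch. It assumes that each sample yields one signal for the product spanning the premature termination site and one for the upstream control product. The function name and all numerical values are hypothetical and are not data from this study.

```python
from statistics import mean


def normalised_abundance(spanning: float, upstream: float) -> float:
    """Signal of the site-spanning product divided by the upstream control product."""
    if upstream <= 0:
        raise ValueError("upstream control signal must be positive")
    return spanning / upstream


# Hypothetical signals (arbitrary units) for three replicates,
# given as (spanning product, upstream control product).
control = [(8.0, 10.0), (9.0, 10.0), (7.0, 10.0)]
knockdown = [(4.0, 10.0), (5.0, 10.0), (3.0, 10.0)]

control_ratios = [normalised_abundance(s, u) for s, u in control]
knockdown_ratios = [normalised_abundance(s, u) for s, u in knockdown]

# A value below 1 means that fewer transcripts extend past the site after knock-down.
relative_change = mean(knockdown_ratios) / mean(control_ratios)
print(f"relative abundance (knock-down / control): {relative_change:.2f}")
```

Normalising within each sample before comparing conditions keeps differences in total gene expression from being mistaken for differences in termination.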

In order to better understand the impact of RBMX knock-down on premature transcription termination, we used the IGV genome browser (37), annotation of previously mapped polyA sites (54) and the bamCompare comparative track (58). We visually defined a termination window (TW) within these genes where RNA-seq tracks drop in RBMX knock-down compared to control. We then quantified RNA-seq reads upstream ("before") and downstream ("after") of the TW (Figure 5F […]
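One way to compare reads "before" and "after" a termination window is sketched below in Python. It assumes that read counts are first converted to reads per kilobase so that regions of different length are comparable. That length normalisation, the function names, the coordinates and the counts are all illustrative assumptions, not the procedure or data of this study.

```python
def read_density(read_count: int, start: int, end: int) -> float:
    """Reads per kilobase for a region with 1-based inclusive coordinates."""
    length = end - start + 1
    if length <= 0:
        raise ValueError("region end must not precede region start")
    return read_count / (length / 1000)


def after_before_ratio(before_count, before_region, after_count, after_region):
    """Read density downstream of the TW relative to the density upstream of it."""
    before = read_density(before_count, *before_region)
    after = read_density(after_count, *after_region)
    return after / before


# Hypothetical gene: 10 kb "before" region, 10 kb "after" region.
before_region = (1, 10_000)
after_region = (12_001, 22_000)

control_ratio = after_before_ratio(2000, before_region, 1800, after_region)
knockdown_ratio = after_before_ratio(2000, before_region, 600, after_region)

# A lower ratio after knock-down is the pattern expected for premature termination.
print(f"control: {control_ratio:.2f}, knock-down: {knockdown_ratio:.2f}")
```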

The above data showed that RBMX is required for productive RNA processing of genes important for replication fork activity, including ETAA1, REV3L, ATRX, FANCM and BRCA2. However, depletion of RBMX in U2OS cells caused no defects in S phase of the cell cycle […]

Here we have tested the hypothesis that RBMX controls genome stability via RNA processing. Supporting this, global analyses of RBMX-controlled mRNA processing patterns in human breast cancer cells show RBMX suppresses the use of splicing and polyadenylation sites within key genes that are crucial for genome stability (Figures 7A, B). This conclusion changes the way that we think about RBMX and DNA damage control, from a purely structural role at sites of replication fork stalling or DNA damage (15,21), to include an earlier role in gene expression patterns that regulate genome maintenance. Moreover, this better understanding of RBMX-controlled RNA processing patterns provides new molecular insights through which RBMX could operate as a tumour suppressor (4-8), and within gene expression networks in cancer cells (3).

The RBMX-regulated RNA processing events identified in this study have largely distinct properties compared with previously reported targets (26,27) in that they: (1) include a wider spectrum of RBMX-regulated events than just skipped exons; (2) are largely suppressed by RBMX; and (3) seem to be regulated by RBMX largely independently of m6A RNA modification. The RNA processing defects detected in this study are conceptually similar to those detected in the mouse testis after the genetic deletion of the RBMX paralog Rbmxl2 (34), which showed increased use of weak splice sites that would poison gene expression. Hence, although the actual regulated genes are different between human breast cancer cells and mouse testis, RBMX and Rbmxl2 share similar predominantly repressive activities that are important for productive gene expression.

Replication fidelity makes a key contribution to genome stability, and depletion of RBMX causes defective ATR activation in response to replication fork stalling (15). Our data here reveal that amongst the strongest defects in RNA processing patterns in response to RBMX depletion are six genes that encode key replication fork proteins (these are ETAA1, REV3L, BRCA2, ATRX, GEN1 and FANCM, Figures 7A, B). Most importantly, these include ETAA1 protein, which associates with single-strand DNA at stalled replication forks to activate ATR kinase in an independent and parallel pathway to TOPBP1 (17,45,46). ETAA1 is crucial for replication fork activity: cells directly depleted for ETAA1 protein (which also becomes virtually undetectable after RBMX protein depletion) become hypersensitive to replication stress and exhibit genome instability (45). RBMX is also required for productive expression of REV3L, which encodes the catalytic component of DNA polymerase ζ.
This polymerase is used to bypass sites of DNA adduct incorporation, or difficult-to-replicate DNA (51). The large exon in REV3L that is disrupted by RBMX depletion encodes a 1386 amino acid disordered peptide stretch important for efficient polymerase ζ activity, and inactivation of REV3L causes genomic instability (51). Furthermore, RBMX represses an exitron within ATRX, a gene encoding a protein that stabilises stalled replication forks (67). RBMX is similarly important for full-length expression of the FANCM gene, which encodes a DNA translocase that remodels stalled replication forks to facilitate activation of the ATR/ATRIP kinase complex (68). RBMX promotes full-length UTR expression from the GEN1 gene, which encodes a protein that resolves stalled replication forks (69). RBMX is also critical for full-length expression of the BRCA2 gene, which protects stalled replication forks from degradation (70,71). Finally, we detect a subtle upregulation of other genes involved in DNA replication and DNA damage control after RBMX depletion, which likely represents a cellular response to increased DNA replication fork stalling (66). Previous studies have shown that Chk1 kinase inhibits the E2F6 transcriptional repressor to promote upregulation of cell-cycle transcriptional programmes in response to replication stress (66). However, further analysis will be required to clarify the mechanisms of compensation that maintain replication fork stability in the absence of RBMX. We also cannot exclude that shorter proteins are made from truncated mRNAs after RBMX depletion that might interfere with the function of the full-length protein isoforms.
These LSVs were then manually inspected using the RNA-seq data from the second RNA sequencing of biological replicates for both RBMX-depleted and control cells, by visual analysis on the UCSC browser (81) to identify consistent splicing changes that depend on RBMX expression. The triplicate RNA-seq samples were further analysed for splicing variations using SUPPA2 (35), which identified 6702 differential splicing isoforms with p-value < 0.05. Predicted splicing changes were confirmed by visual inspection of RNA-seq reads using the UCSC (81) and IGV (37) […]

RNA was extracted using a standard Trizol extraction protocol and DNase treated using the DNA-free kit (Invitrogen). The RNA from siRNA-treated cells was extracted using standard Trizol RNA extraction (Life Technologies) following the manufacturer's instructions. cDNA was synthesized from 500 ng total RNA in 10 μl reactions using the Superscript VILO cDNA synthesis kit (Invitrogen) following the manufacturer's instructions. To analyse the splicing profiles of the alternative events, primers were designed using Primer 3 Plus and the predicted PCR products were confirmed using the UCSC In-Silico PCR tool. The ETAA1 transcript isoform containing the long exon 5 was amplified by RT-PCR using primers 5'-GCTGGACATGTGGATTGGTG-3' and 5'-GTGCTCCAAAAAGCCTCTGG-3', while the ETAA1 transcript isoform containing the short exon 5 was amplified using primers 5'-GCTGGACATGTGGATTGGTG-3' and 5'-GTGGGAGCTGCATTTACAGATG-3'. RT-PCR with this second primer pair could in principle also amplify a 2313 bp product from the ETAA1 transcript isoform containing the long exon 5; however, PCR conditions were chosen to selectively analyse shorter fragments.
The BRCA2 transcript isoform encompassing the putative polyA site within exon 11 and a control fragment upstream of this site were amplified by multiplex RT-PCR using a forward primer 5'-TCAGGTAGACAGCAGCAAGC-3' and two reverse primers, respectively 5'-TCCCTCCTTCATAAACTGGCC-3' and 5'-AACCCCACTTCATTTTCATCTGTT-3'. All PCR reactions were performed using the GoTaq® G2 DNA polymerase kit from Promega. All PCR products were examined using the QIAxcel® capillary electrophoresis system 100 (Qiagen).
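As a quick sanity check on the primer sequences listed above, the following Python sketch reports the length and GC content of each primer, together with its reverse complement (useful when locating a reverse primer on the sense strand). The dictionary keys are labels introduced here for illustration only. The sequences are copied from the text, and the sketch makes no claim about how the primers were designed or validated.

```python
# Primer sequences as given in the text; the labels are illustrative only.
PRIMERS = {
    "ETAA1_forward": "GCTGGACATGTGGATTGGTG",
    "ETAA1_long_exon5_reverse": "GTGCTCCAAAAAGCCTCTGG",
    "ETAA1_short_exon5_reverse": "GTGGGAGCTGCATTTACAGATG",
    "BRCA2_forward": "TCAGGTAGACAGCAGCAAGC",
    "BRCA2_reverse_1": "TCCCTCCTTCATAAACTGGCC",
    "BRCA2_reverse_2": "AACCCCACTTCATTTTCATCTGTT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(f"{name}\t{len(seq)} nt\tGC {gc_fraction(seq):.0%}\t{reverse_complement(seq)}")
```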

Transcription termination analyses
Termination windows (TW) for all genes in Figure 5 - Source Data 1, which appear prematurely terminated upon treatment with RBMX siRNA, were defined as the regions where RNA-seq reads drop on tracks from RBMX-depleted cells but not on tracks from control cells. Confirmation of TW was carried out by visual inspection of a comparative alignment track generated using the bamCompare tool from deepTools v3.5.0 (58) on the IGV browser (37). Subsequently, a .SAF annotation file was built to define the regions "before" and "after" TW for each premature transcription termination event. Specifically, regions "before" were defined from the Transcription Start Site (TSS) to the TW Start coordinate and regions "after" were defined from the TW End coordinate to the Gene End coordinate. Strand (+/-), TSS and Gene End annotations were obtained from UCSC (81). The .SAF file was used as index to […]

Gene Set Enrichment Analysis (GSEA) was performed using the Broad Institute GSEA software v.3.0 (65). Genes identified by RNA-seq were ranked using log10(p-value) with a negative sign for down-regulated genes and positive sign for up-regulated genes. Enrichment was queried for REACTOME pathways using the pre-ranked tool of GSEA software with 1000 permutations.
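The "before" and "after" regions described in the transcription termination analyses above can be written out as a SAF table, the tab-delimited annotation format with columns GeneID, Chr, Start, End and Strand that is read by tools such as featureCounts. The Python sketch below assumes that TW Start and TW End are given in the direction of transcription, so that for minus-strand genes the TSS has the highest coordinate. The class, the function names, the "_before"/"_after" ID suffixes and the example genes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TerminationEvent:
    gene: str
    chrom: str
    strand: str    # "+" or "-"
    tss: int       # transcription start site
    gene_end: int  # annotated gene end
    tw_start: int  # TW boundary nearest the TSS
    tw_end: int    # TW boundary nearest the gene end


def saf_rows(ev: TerminationEvent):
    """Return (GeneID, Chr, Start, End, Strand) rows for the two regions."""
    def row(label, a, b):
        lo, hi = sorted((a, b))  # SAF requires Start <= End on either strand
        return (f"{ev.gene}_{label}", ev.chrom, lo, hi, ev.strand)

    return [row("before", ev.tss, ev.tw_start), row("after", ev.tw_end, ev.gene_end)]


def to_saf(events) -> str:
    """Serialise events as SAF text with a header line."""
    lines = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for ev in events:
        lines.extend("\t".join(map(str, r)) for r in saf_rows(ev))
    return "\n".join(lines) + "\n"


# Hypothetical events, one per strand (coordinates are invented).
events = [
    TerminationEvent("GENE_A", "chr1", "+", 1000, 9000, 4000, 4500),
    TerminationEvent("GENE_B", "chr2", "-", 9000, 1000, 6000, 5500),
]
print(to_saf(events), end="")
```

Sorting each coordinate pair keeps Start no greater than End, so the same definition of "before" and "after" works for genes on both strands.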