The RNA-binding protein SFPQ preserves long-intron splicing and regulates circRNA biogenesis

Circular RNAs (circRNAs) represent an abundant and conserved class of non-coding RNAs; however, the principles of their biogenesis are not yet fully understood. To elucidate features important for circRNA production, we performed global analyses of RNA-binding proteins associating with the flanking introns of circRNAs, and we identified two factors, SFPQ and NONO, to be highly enriched with circRNAs. Using transient knockdown of both proteins in two human cell lines followed by total RNAseq, we found a subclass of circRNAs with distal inverted Alu elements and long introns to be highly deregulated upon SFPQ knockdown. In addition, SFPQ depletion leads to increased intron retention with concomitant induction of cryptic splicing, prevalent for long introns, causing in some cases premature transcription termination and polyadenylation. While SFPQ depletion has an overall negative effect on circRNA production, premature termination is not the main causative explanation. Instead, our data suggest that aberrant splicing in the upstream and downstream regions of circRNA-producing exons is critical for shaping the circRNAome, and specifically, we observe a conserved impact of missplicing in the immediate upstream region to drive circRNA biogenesis. Collectively, our data show that SFPQ plays an important role in maintaining intron integrity by ensuring accurate splicing of long introns, and disclose novel features governing Alu-independent circRNA production.

condensates known as paraspeckles (Clemson et al , 2009;Fox et al , 2018) , where they play a pivotal role in cellular mechanisms ranging from regulation of transcription by interaction with the C-Terminal Domain (CTD) of RNA polymerase II (Buxadé et al , 2008;Rosonina et al , 2005;Urban et al , 2000) , pre-mRNA splicing (Emili et al , 2002;Ito et al , 2008;Kameoka et al , 2004;Peng et al , 2002) and 3'end processing (Kaneko et al , 2007;Rosonina et al , 2005) to nuclear retention (Zhang & Carmichael, 2001) and nuclear export of RNA (Furukawa et al , 2015) . Recently, SFPQ has been implicated in ensuring proper transcription elongation of neuronal genes (Takeuchi et al , 2018) representing an interesting link to circRNAs, as these are highly abundant in neuronal tissues and often derive from neuronal genes (Rybak-Wolf et al , 2015) .
Here, we show that SFPQ depletion leads to specific deregulation of circRNAs with long flanking introns devoid of proximal inverted Alu elements. Moreover, we show that long introns in particular are prone to intron retention and alternative splicing with concomitant premature termination. While premature termination is not the main driver of circRNA deregulation, we provide evidence for a complex interplay between upstream (acting positively on circRNA production) and downstream features (acting negatively) that collectively govern the production of individual circRNAs in the absence of SFPQ. This not only elucidates a conserved role for SFPQ in circRNA regulation but also identifies upstream alternative splicing as an approach towards circRNA production.

The DALI circRNAs are defined by long flanking introns and distal inverted Alu elements
To stratify circRNAs by their inverted Alu-element dependencies, we characterized the circRNAome in two of the main ENCODE cell lines, HepG2 and K562 (Supplementary Table 1). Using the joint prediction of two circRNA detection algorithms, ciri2 and find_circ, we identified 3044 and 7656 circRNAs in HepG2 and K562, respectively. While proximal inverted Alu elements (IAEs) are important for the biogenesis of a subset of circRNAs (Jeck et al., 2013; Ivanov et al., 2015), we and others have shown that long flanking introns associate with circRNA loci, particularly for the conserved and abundant circRNAs (Stagsted et al., 2019; Westholm et al., 2014), and the biogenesis of this group of circRNA species is largely unresolved. To focus our analysis on the non-Alu, long-intron fraction of circRNAs, we subgrouped circRNAs based on their IAE distance and flanking intron length using median distance and length as cutoffs (Fig. 1A-C), and we observed that these two features show interdependent distributions, where approximately 70% of the top 1000 expressed circRNAs group as either Distal-Alu-Long-Intron (DALI) circRNAs or Proximal-Alu-Short-Intron (PASI) circRNAs (Fig. 1D). Apart from long flanking introns and distal IAEs, DALI circRNAs show higher overall expression compared to PASI circRNAs and longer genomic lengths, but a similar distribution of mature lengths (Supplementary Fig. 1A-C). Moreover, almost half of a previously characterized subgroup of circRNAs, the AUG circRNAs (Stagsted et al., 2019), derive from DALI circles (Supplementary Fig. 1B), and interestingly, when filtering circRNAs for conservation (in mouse and human), 69-72% of conserved circRNAs are DALI circRNAs (Fig. 1E). This finding suggests that the IAE-dependent biogenesis pathway is not relevant for the most conserved and abundant circRNAs and that other factors must be involved.

SFPQ and NONO are specifically enriched in the introns flanking DALI circRNAs
In order to identify RNA-binding proteins that could drive circRNA formation, we used the extensive ENCODE eCLIP data (ENCODE Project Consortium, 2012) (Supplementary Table 2). We scrutinized the immediate flanking regions of the 1000 most highly expressed circRNAs in HepG2 and K562 with the assumption that factors directly involved in backsplicing are likely to bind in the vicinity of the backsplicing sites. We extracted an eCLIP enrichment score using Wilcoxon rank-sum tests between the number of eCLIP reads aligned to circRNA flanking regions (upstream and downstream) compared to flanking regions of host exons, i.e. other exons from the circRNA-expressing genes. We found SFPQ to be the protein most highly enriched for circRNA binding in HepG2 cells, while NONO, a known interaction partner of SFPQ (Dong et al., 1993), was enriched for circRNA binding in K562 cells (Fig. 2A-B; to our knowledge, eCLIP datasets on SFPQ in K562 and NONO in HepG2 are not available). Comparing DALI and PASI circRNAs shows that SFPQ is DALI circRNA specific, both upstream and downstream of the circularizing exons (Fig. 2C, p≤1.2e-16), whereas NONO associates with circRNA loci more generally and with the upstream regions of DALI circRNAs specifically (Fig. 2D). SFPQ, like circRNAs, is known to associate with long introns (Iida et al., 2020; Takeuchi et al., 2018). To exclude that the observed enrichment is merely a bias from the flanking intron length, we extracted a subset of annotated splice acceptor (SA) and splice donor (SD) pairs sampled to match the expression level (linear spliced reads) and flanking intron lengths of DALI circRNAs (denoted 'DALI-like exons') (Supplementary Fig. 2A-H). This analysis shows that both SFPQ and NONO were significantly more enriched around circRNA exons compared to sampled DALI-like exons (Supplementary Fig. 2E-H).
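As an illustration of the rank-sum comparison described above, the following is a minimal sketch and not the authors' implementation. It assumes that per-region eCLIP read counts are already available as two lists and uses the normal approximation without tie correction. The function name `rank_sum_test` and the example counts are hypothetical, and the exact form of the enrichment score used in the study is not specified in the text.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    x: read counts in circRNA flanking regions (assumed input)
    y: read counts in flanking regions of other host-gene exons (assumed input)
    Returns (z, p); z > 0 means x tends to be larger than y.
    """
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    pos = 0
    while pos < len(pooled):
        end = pos
        # Extend over a block of tied values.
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # average 1-based rank of the block
        for k in range(pos, end + 1):
            ranks[pooled[k][1]] = avg_rank
        pos = end + 1
    n1, n2 = len(x), len(y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2      # Mann-Whitney U for x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided p-value
    return z, p

# Hypothetical counts, for illustration only.
circ_flanks = [5, 6, 7, 8, 9]
host_flanks = [1, 2, 3, 4, 0]
z, p = rank_sum_test(circ_flanks, host_flanks)
```

For real data with many ties or small samples, an exact or tie-corrected test from a statistics library would be preferable to this approximation.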
To validate the binding of SFPQ and NONO on nascent circRNA transcripts, we conducted RNA immunoprecipitation (RIP) qPCR in HepG2 (

SFPQ and NONO depletion represses DALI circRNA production
To study the impact of SFPQ and NONO on circRNA production, we depleted SFPQ and NONO in HepG2 and HEK293T cells using two different siRNAs for each target  5B and F) confirmed repressed expression of DALI circRNAs and unchanged PASI circRNAs expression relative to host gene levels. Finally, to support a direct role for SFPQ in circRNA formation, we overlaid the results from SFPQ-depleted HepG2 cells with the SFPQ eCLIP data and observed a significant association between SFPQ binding in the flanking regions of DALI circRNAs, as expected, but also a clear association with deregulated circRNAs compared to unchanged circRNAs ( Fig. 3J, p < 1e-17, Wilcoxon rank-sum tests ).
In addition, we examined previously published total RNAseq from SFPQ conditional knock-out (KO) mouse brain (Takeuchi et al , 2018) (Supplementary Table 5

SFPQ depletion affects alternative splicing and intron retention in long genes
Next, to understand the impact of SFPQ and NONO on transcription and splicing in general, we used the RNAseq data to investigate SFPQ/NONO-sensitive mRNAs. Here, we found that SFPQ-depletion triggers a general repression of long genes (stratified by median gene length, Fig. 4A). The read distribution of highly repressed genes showed a peculiar expression profile with unaffected read densities at the genic 5'ends but with dramatic reduction at the 3'end in HepG2 cells ( Supplementary Fig. 7A-D) indicating that the transcription machinery drops off mid gene. This prompted us to survey genes globally for a 'drop-off' phenotype. Thus, we subgrouped genes into their expression profile by slicing each gene into 20 equally sized bins and conducting differential gene expression on all bins.
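The binning step described above can be sketched as follows. This is a minimal illustration under stated assumptions: gene coordinates are treated as half-open intervals, bins are ordered from the 5' to the 3' end of the gene, and the function name `gene_bins` is hypothetical rather than taken from the authors' scripts.

```python
def gene_bins(start, end, strand="+", n_bins=20):
    """Split the half-open interval [start, end) into n_bins near-equal bins.

    Bins are returned in 5'->3' order, so the list is reversed for
    minus-strand genes (an assumption about orientation handling).
    """
    length = end - start
    if length < n_bins:
        raise ValueError("gene is shorter than the number of bins")
    # Integer edges; the first edge is start and the last edge is end.
    edges = [start + (length * i) // n_bins for i in range(n_bins + 1)]
    bins = list(zip(edges[:-1], edges[1:]))
    return bins[::-1] if strand == "-" else bins

# Example: a 100-nt gene on the plus strand gives twenty 5-nt bins.
bins = gene_bins(0, 100)
```

Each bin can then be quantified per sample and passed to a differential expression tool as if it were a separate feature, which is what allows a 5'-to-3' 'drop-off' profile to be detected.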
To subgroup genes with similar profiles in an unsupervised manner, we clustered the log2 fold changes across genes into five categories, denoted kc1-5, using k-means clustering (Fig. 4B). Here, kc5 but also kc3 and kc4 showed 'drop-off' effects, although to different degrees, and interestingly, the effect correlates with gene length (Fig. 4B-C). We obtained almost identical results from SFPQ-depleted HEK293T cells (Supplementary Fig. 8A-C) and mouse brain (Supplementary Fig. 8F-H).
Upon inspection of the downregulated genes in our samples, we found an upregulation of alternative splicing in the SFPQ KD samples (Supplementary Fig. 7). We classified all alternative splicing events as either inclusion or skipping relative to their respective canonical isoform (Fig. 4D) and performed differential expression analysis using DESeq2. This showed an extensive change (mostly upregulation) of alternative splicing events correlating with intron length in HepG2, HEK293T and mouse brain (Fig. 4E, Supplementary Fig. 8D and I). Of the 2106 significantly deregulated inclusion events in HepG2, more than 96% are upregulated and of these, 76% are not annotated by gencode (Fig. 4F; in HEK293T: 95% upregulated, 78% unannotated; in mouse: 90% upregulated, 88% unannotated; data not shown), and consequently, we suggest that these events are mostly cryptic or aberrant splicing. Furthermore, analyzing the levels of intron retention by quantifying unspliced intronic reads shows a very similar intron-length-dependent pattern with significant retention of long introns upon SFPQ depletion (Fig. 4G, Supplementary Fig. 8E and J). Consistently, we find a clear correlation between exon inclusion and intron retention (Fig. 4H), and a clear enrichment of SFPQ eCLIP signal in regions subjected to alternative splicing and intron retention (Fig. 4I). As an example, for DENND1A, we observe a previously unannotated splicing event joining exon eight to an alternative splice acceptor dinucleotide (AG) residing in intron eight of this gene (Fig. 4J), which is only detectable upon SFPQ knockdown (Fig. 4K). In DENND1A, this cryptic event marks the transition from unaffected to repressed state, as quantification of the upstream region shows modest to no effect between control and knockdown, whereas the downstream region is highly suppressed (Fig. 4K-M).
Collectively, this suggests that intron retention and alternative splicing are conserved effects of SFPQ depletion, and that SFPQ plays a vital role in splicing integrity for long introns in particular.

SFPQ depletion results in premature termination events
In order for alternative splicing to result in premature termination of transcription, the alternative/cryptic-included exons need to harbor a polyA-signal that can serve as a functional terminator of transcription. To investigate the magnitude of polyA-signal appearance in SFPQ knockdown samples, we subjected SFPQ and NONO-depleted HEK293T cells to 3'end quantSeq ( Supplementary Fig. 9A, Supplementary Table 6).
Putative polyA-signals were retrieved using the MACS2 callpeak algorithm, and to further increase the signal-to-noise ratio, we characterized each peak by the presence of a bona fide polyA-signal (PAS: AAUAAA or AUUAAA). Furthermore, for each quantseq peak, we also extracted the highest prevalence of A's in all possible 15-nucleotide windows to reduce non-polyA-tail artefacts in the samples. The fraction of PAS-containing peaks dropped markedly when A-stretches of 14 or 15 nucleotides were found (Supplementary Fig. 9B), suggesting that these A-rich peaks are likely polyA-tail-independent artefacts, and they were thus removed from the analysis. The remaining peaks were classified as PAS sites, and for all PASs, the genic origin was annotated, and the differential usage was determined by DESeq2. This showed a clear enrichment of intronic PAS usage and a repression of exonic PAS usage upon SFPQ knockdown (Fig. 5A). As before, NONO depletion only showed a modest effect.
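The A-stretch filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each peak is represented by a DNA-alphabet sequence (so the PAS hexamers appear as AATAAA or ATTAAA), and the function names and the choice of which sequence to extract around a peak are hypothetical.

```python
PAS_MOTIFS = ("AATAAA", "ATTAAA")  # DNA form of AAUAAA / AUUAAA

def max_a_in_window(seq, window=15):
    """Highest number of A's found in any window of the given length."""
    seq = seq.upper()
    if len(seq) <= window:
        return seq.count("A")
    count = seq[:window].count("A")
    best = count
    # Slide the window one nucleotide at a time, updating the count.
    for i in range(window, len(seq)):
        count += (seq[i] == "A") - (seq[i - window] == "A")
        best = max(best, count)
    return best

def has_pas(seq):
    """True if the sequence contains one of the two consensus hexamers."""
    seq = seq.upper()
    return any(motif in seq for motif in PAS_MOTIFS)

def is_a_rich_artefact(seq, threshold=14):
    """Flag peaks with 14 or 15 A's in some 15-nt window for removal."""
    return max_a_in_window(seq) >= threshold

# Hypothetical peak sequences, for illustration only.
internal_priming = "GCGC" + "A" * 15 + "GCGC"
plausible_pas = "AATAAAGCGCGCGCGCGCGC"
```

Here `is_a_rich_artefact(internal_priming)` is true, whereas `plausible_pas` passes the filter and contains a consensus hexamer.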
As an upstream termination impacts downstream elements, we determined the relative genic  Fig. 9D). For DENND1A, where cryptic splicing marks the transition from unaffected to repressed state, we also observe a clear PAS with a consensus polyA signal (Fig. 5C). This was validated using polyA enrichment, where the alternative transcript is oligo-dT purified as effectively as GAPDH only upon SFPQ knockdown. Collectively, this suggests that a notable fraction of genes exhibit alternative splicing and premature termination upon SFPQ knockdown with increased probability for longer introns, underscoring, once again, the importance of SFPQ in gene expression.

circRNA deregulation is not explained by premature termination
If SFPQ depletion results in a widespread increase in premature termination, the observed deregulation of circRNAs in our dataset could simply be explained by incomplete transcription and not as a biogenesis effect per se. This notion is consistent with the fact that circRNAs in general, and DALI circRNAs in particular, associate with long flanking introns prone to alternative splicing and premature termination. However, not all circRNAs were depleted upon SFPQ knockdown, and particularly in mouse brain, the DALI circRNAs were affected in both directions (i.e. up- and downregulated). To test whether the deregulation of circRNAs is driven by premature termination, we stratified circRNAs by their host gene clusters. This showed that while most circRNAs derive from kc1, 2 and 4, roughly the same expression profile is observed across all clusters (Supplementary Fig. 10A-F). In addition, comparing backsplicing to linear splicing from the circRNA-producing loci, no clear correlation was observed, suggesting that the circRNA deregulation is not a mere consequence of transcription levels (Supplementary Fig. 10G-I). Finally, when counting the prevalence of significant intronic quantseq PASs (Fig. 5F) or alternative splicing events (Fig. 5G) upstream of the SD, there is no significant difference between up- and downregulated circRNAs, and for alternative splicing no difference between DALI and PASI circRNAs, whereas premature termination is more prominent upstream of DALI circRNAs.
Collectively, we argue that premature termination is not the main driver of circRNA deregulation.

Extracting features important for circRNA biogenesis
But what, then, is the underlying explanation for the deregulated expression of DALI circRNAs upon SFPQ depletion? As no single feature captures the circRNA deregulation accurately, we turned to multivariate regression analysis. Here, we collected a number of genic features (up- and downstream intron lengths, IAE distance, annotated distance to promoter and termination, and genomic length of circRNA), and differential expression data upon SFPQ depletion (linear up- and downstream splicing, flanking alternative splicing, upstream alternative splicing, up- and downstream intron retention) (Fig. 6A). Pairwise correlation of all features shows modest redundancy, but for certain combinations, such as 5' linear splicing (5'S) and 3' linear splicing (3'S), we find a high level of positive interdependence (Supplementary Fig. 11 and 12), whereas intron retention generally correlates negatively with linear splicing (5'IR vs 5'S and 3'IR vs 3'S). In fact, linear splicing correlates negatively with all other features included in both HepG2 cells and mouse brain (Supplementary Fig. 11 and 12).
Splitting the quantified circRNAs into train and test sets (80:20 ratio), we trained a generalized linear model (GLM) against the observed circRNA log2 fold change. As all features were standardized, the resulting coefficients serve as a proxy for feature importance. Here, ranking features by coefficient, it is evident for both HepG2 and mouse brain that 5' features generally correlate positively with circRNA production, whereas 3' features (and IAE distance) correlate negatively (Fig. 6B and D). As seen in both HepG2 and mouse brain, certain features, such as 5'IR (upstream intron retention), 5' intron (upstream intron length) and 5'CSA (upstream cryptic SA usage), are highly distinctive for upregulated circRNA ( Fig. 6C 6B and D). This was also observed in HEK293T, although less convincingly, partly due to low sequencing depth in these samples (Supplementary Fig. 13A-D). Also, while the performance of the model on the test set is modest but significant (Supplementary Fig. 14A and C, Pearson correlations: 0.38 (HepG2, p=2.3e-13) and 0.39 (mouse brain, p=6.0e-41)), we observe convergence between the HepG2 and mouse brain-derived coefficients, suggesting that the obtained features are conserved aspects of SFPQ-mediated circRNA regulation. Here, the most notable difference between HepG2 and mouse brain is the estimated intercept term (Supplementary Fig. 14C), which we interpret as the difference in cellular context, suggesting that the overall impact of SFPQ on circRNA expression may depend on other unidentified factors. In conclusion, the aberrant splicing and intron retention caused by SFPQ depletion impact circRNA production and explain in part the DALI-specific deregulation observed.

Discussion:
The biogenesis of circRNAs is typically ascribed to the presence of proximal inverted repeat elements positioning the two splice sites involved in backsplicing into close proximity. For example, Alu elements are frequently found close to circRNA exons (Ivanov et al., 2015b; Jeck et al., 2013), but since they are primate-specific (Lander et al., 2001), the biogenesis of most highly expressed and conserved circRNAs cannot be explained by the presence of such elements (Stagsted et al., 2019). Additionally, by RNA association and dimerization, RNA-binding proteins have a similar ability to juxtapose splice sites destined for backsplicing, although this currently only seems to apply to a few specific cases (Conn et al., 2015; Errichelli et al., 2017; Ashwal-Fluss et al., 2014; Kramer et al., 2015). Here, we attempted to further disclose the impact of RBPs on circRNA biogenesis and to reveal features important for backsplicing. First, we subgrouped circRNAs into the Alu-dependent subset characterized by proximal IAEs and short flanking introns, termed PASI circRNAs, and the Alu-independent circRNAs with distal IAEs co-occurring with long flanking introns, the DALI circRNAs. By utilizing the extensive eCLIP resource made available by the ENCODE consortium, we identified SFPQ and NONO as two potentially interesting candidates, both associating significantly with the DALI circRNA-producing loci. In both HepG2 and HEK293T cell lines as well as in mouse brain conditional knockouts, we observed a general deregulation of DALI circRNAs upon depletion of SFPQ and only subtle effects upon NONO KD, possibly due to the concomitant upregulation of SFPQ in these samples. Thus, we mostly focused our analysis on SFPQ. Here, apart from dramatic changes in DALI circRNA expression, we observed across all samples (HepG2, HEK293T and mouse brain) that the absence of SFPQ results in aberrant splicing with extensive induction of cryptic splice acceptor sites, particularly in long introns.
This correlates with a similar increase in intron retention, suggesting that these two phenotypes are closely coupled. Consistently, recent studies have shown SFPQ to associate with long introns (Iida et al., 2020; Takeuchi et al., 2018) and to be vital in regulating alternative splicing of long target genes, ultimately affecting neural differentiation (Luisier et al., 2018) and axon development (Thomas-Jinu et al., 2017). Interestingly, circRNAs tend to originate from longer genes, especially neuronal genes (Ragan et al., 2019; Rybak-Wolf et al., 2015; Szabo et al., 2015; You et al., 2015), and exons prone to circularize are more frequently flanked by longer introns than non-circularized exons (Jeck et al., 2013), supporting SFPQ-sensitive regulation of circRNAs, in particular DALI circRNAs.
Using quantseq analysis, we found SFPQ-depletion to activate the use of intronic polyA-sites, which in many cases overlap with aberrant cryptic splicing thereby resulting in premature termination. Consistently, we also observe decreasing signal across the gene body in the absence of SFPQ. A model in which SFPQ facilitates the recruitment of CDK9 to the CTD of RNA polymerase II to maintain transcription elongation was recently proposed to explain the 'drop-off' effect seen upon SFPQ depletion (Takeuchi et al , 2018;Hosokawa et al , 2019) . Instead, we claim that this is partly explained by the induced cryptic splicing and subsequent premature termination, emphasizing that transcription is highly coupled to splicing. SFPQ was initially found to be associated with the polypyrimidine tract, aid in the assembly of the spliceosome and be critical for the second catalytic step in splicing (Ajuh et al , 2000;Makarov et al , 2002;Gozani et al , 1994;Patton et al , 1993) , supporting its role in splicing fidelity.
While DALI circRNAs in HepG2, HEK293T, and mouse brain are generally sensitive to SFPQ depletion, premature transcription termination fails to explain the observed circRNA levels. Instead, upstream intron length and aberrant splicing in the immediate upstream region have stimulating effects on circRNA biogenesis, while cryptic events downstream, although less clearly, show a more detrimental impact. We speculate that SFPQ plays an imperative role in splicing fidelity, a role that becomes increasingly important with intron length. In the absence of SFPQ, whether through reduced recruitment to RNA polymerase II by e.g. Dido3 (Mora Gallardo et al., 2019) or through increased sequestering in paraspeckles, splicing malfunctions, resulting in intron retention and reduced backsplicing. With the persistent presence of intronic sequences, cryptic and aberrant splicing become more likely, and cryptic exon inclusions in AU-rich introns will in many cases contain a PAS and thus cause premature termination. Consistent with this, RNA polymerase has been shown to be stalled at AT-rich sequences (Henriques et al., 2013; Palangat et al., 2004), thus allowing a window of opportunity for splicing- and possibly cleavage-directed transcription termination.
For circRNAs, the splice-sites involved in backsplicing must be protected from linear splicing as backsplicing occurs less effectively than canonical splicing. In particular, the SA has to remain unspliced until the RNA polymerase reaches the downstream SD. This can be facilitated by a fast polymerase elongation rate (Zhang et al , 2016) or by the lack of spliceosomal components (Liang et al , 2017) . While SFPQ-depletion generally induces cryptic splicing imposing additional splice site competition, it also potentially eliminates upstream linear splicing and thus uncouples the SA from any upstream SD. This potentially exposes the upstream SA for backsplicing consistent with the observed positive impact of cryptic events (5' intron retention and 5'cryptic SA). In addition, we also observe that the mere length of the 5' intron has an important predictive value for circRNA formation. This, we hypothesize, is due to the high correlation with aberrant splicing and intron retention; both features are limited to detection in RNAseq. Supposedly, many of the cryptic events are not detectable in a steady state sequencing approach as they are likely unstable and subjected to nuclear quality control or nonsense mediated decay (NMD), and therefore intron length may serve as a useful proxy for aberrant splicing upon SFPQ-depletion. Furthermore, SFPQ is often described in various protein complexes with some comprising FUS and the nuclear resolvase DHX9. Like SFPQ, FUS has been shown to act in various processes within the cell, such as transcription regulation and RNA metabolism (Lagier-Tourenne et al , 2012;Kwiatkowski et al , 2009;Vance et al , 2009) but also to associate with the 5'ss of long introns (Nakaya et al , 2013;Lagier-Tourenne et al , 2012) , especially those flanking circularizing exons, and hereby regulate circRNA biogenesis (Errichelli et al , 2017) . 
DHX9 has, on the other hand, been shown to unwind intronic base pairing and thereby reduce the production of Alu -dependent circRNAs (Aktaş et al , 2017;Errichelli et al , 2017) . Both proteins show interesting circRNA regulation abilities which could act cooperatively with SFPQ and thus affect the fate of DALI circRNAs upon SFPQ depletion.
In conclusion, we show that SFPQ is a key regulator of DALI circRNA production by controlling and enforcing accurate long intron splicing. This highlights the complex and intricate relationship between splicing in general and backsplicing in particular. Furthermore, SFPQ has been associated with diverse neurological diseases, such as ALS (Thomas-Jinu et al., 2017; Luisier et al., 2018) and FTLD (Ishigaki et al., 2017), and may prove to be critical for maintaining the circRNAome in these and other neurodegenerative pathologies. And while cryptic splicing is negligible in steady-state scenarios, it is interesting to speculate whether upstream cryptic splicing is generally involved in DALI circRNA production, providing a useful tool for manipulating circRNA production without impacting host gene expression.

and subsequent centrifugation at 1200 rpm at 4°C for 4 min. 66.6% of the harvested cells were used for RNA isolation, which was carried out using TRIzol® Reagent (Thermo Fisher Scientific) according to the manufacturer's protocol. Except for RNA used for RNAseq and RIP, one μg RNA was subjected to DNase I treatment (Thermo Fisher Scientific #EN0521) according to standard protocol prior to subsequent analysis. The remaining cells (33.3%) were used for protein isolation; after centrifugation, the cell pellets were resuspended in 2xSDS loading buffer [125 mM Tris-HCl pH 6.8, 20% glycerol, 5% SDS, and 0.2 M DTT] and boiled at 95°C for 5 min.

RT-PCR and RT-qPCR
One μg of DNase-treated total RNA was reverse transcribed using the M-MLV Reverse Transcriptase kit (Thermo Fisher Scientific) according to the manufacturer's protocol, using random hexamers to prime the reaction. For RT-PCR, the reaction was conducted with 30 cycles of PCR with or without RT enzyme (primers listed in Supplementary Table 7). The products were visualized by 1% agarose gel electrophoresis and verified using Sanger sequencing. For quantitative PCR, cDNA was mixed with Platinum® SYBR® Green I Master kit (Invitrogen) and run on a LightCycler 480 II instrument (Roche). The reactions were carried out in technical triplicates. The obtained Ct values for each triplicate were transformed (2^-Ct) and averaged (σ). All samples were normalized to GAPDH. The results were visualized using GraphPad (Prism 7) with individual biological replicates shown and the mean plotted as a bar. For statistical analysis, Student's two-tailed t-test was used. P-values below 0.05 (p<0.05) were considered significant. All statistical analyses were performed in GraphPad Prism.
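The Ct transformation and GAPDH normalization described above can be written out as a short calculation. This is a minimal sketch assuming a perfect amplification efficiency of 2 per cycle. The function name and the Ct values are hypothetical.

```python
from statistics import mean

def relative_expression(target_ct, reference_ct):
    """Transform each technical replicate as 2^-Ct, average, and normalize.

    target_ct: Ct values of the technical triplicate for the target
    reference_ct: Ct values of the technical triplicate for the reference
                  gene (GAPDH in the text)
    """
    target = mean(2.0 ** -ct for ct in target_ct)
    reference = mean(2.0 ** -ct for ct in reference_ct)
    return target / reference

# Hypothetical Ct values: the target amplifies two cycles later than the
# reference, which corresponds to a four-fold lower relative level.
level = relative_expression([22.0, 22.0, 22.0], [20.0, 20.0, 20.0])
```

Note that averaging after the 2^-Ct transformation, as stated in the text, is not the same as averaging the Ct values first. The two approaches diverge when technical replicates differ.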

Poly(A) enrichment was performed using the NEBNext® Poly(A) mRNA Magnetic Isolation Module (New England BioLabs® Inc.) according to the manufacturer's protocol with five μg total RNA from CTRL-KD or SFPQ-KD in HEK293T cells used as input.

RNA sequencing
For total RNA sequencing of HEK293T, RNA from SFPQ, NONO, and CTRL KD (using two different siRNAs for each condition with biological duplicates) was rRNA depleted using RiboCop rRNA Depletion Kit V1.2 (Lexogen) according to the manufacturer's protocol. Subsequent cDNA libraries were prepared using SENSE Total RNA-Seq Library Prep Kit (Lexogen) following the manufacturer's protocol.
For 3'end sequencing, cDNA libraries from HEK293T were prepared using QuantSeq 3' mRNA-Seq Library Prep Kit (Lexogen). For both methods, RNA quality was determined using the BioAnalyzer RNA nanochip (Agilent) and library concentration was quantified with KAPA Library Quant KIT RT-qPCR (Roche). Total RNAseq was done as 100nt paired-end sequencing and performed using the Illumina platform (HiSEQ4000, BGI, Copenhagen), while for 3'end sequencing, 75nt single-end sequencing was performed at MOMA (Aarhus University Hospital) on a NextSeq500. For total RNA sequencing of HepG2 cells, library preparation and sequencing were performed at BGI (Copenhagen) using BGIseq.

RNA-immunoprecipitation (RIP)
RIP was performed as previously described (Rinn et al. 2007) with some modifications to immunoprecipitate endogenous SFPQ and NONO. HepG2 and HEK293T cells were grown to confluence in 15 cm² dishes. Cells were harvested by trypsinization and resuspended in 2 ml PBS, 2 ml nuclear isolation buffer (1.28 M sucrose; 40 mM Tris-HCl pH 7.5; 20 mM MgCl2; 4% Triton X-100) and 6 ml water on ice for 20 min (with frequent mixing). Nuclei were pelleted by centrifugation at 2,500 g for 15 min. The nuclear pellet was resuspended in 1 ml RIP buffer (150 mM KCl, 25 mM Tris-HCl pH 7.4, 5 mM EDTA, 0.5% Triton X-100 and 5 mM dithiothreitol (DTT)) supplemented with Ribolock (Thermo Fisher Scientific) and proteinase inhibitor cocktail (Roche). Resuspended nuclei were split into two fractions of 500 μl each (for Mock and IP). Nuclear membrane and debris were pelleted by centrifugation at 13,000 rpm for 20 min. Antibody to SFPQ (P2860 Sigma), NONO (ab70335 Abcam) or FLAG epitope (Mock IP, F1804 Sigma) was added to the supernatant (2.5 μg) and incubated for 4 hrs at 4°C with gentle rotation. 20 μl of protein A/G beads were added and incubated for 1 hr at 4°C with gentle rotation. Beads were isolated using a magnet, the supernatant was removed, and the beads were resuspended in 500 μl RIP buffer; this was repeated for a total of 5 RIP washes.

Western blotting
Cells were harvested in 1xPBS and centrifuged at 1200 rpm at 4°C for 5 min. For cell lysis, the cell pellet was collected and resuspended in 2xSDS loading buffer [125 mM Tris-HCl pH 6.8, 20% glycerol, 5% SDS, and 0.2 M DTT] and boiled at 95°C for 5 min before loading 1% on a 10% Tris-Glycine SDS-PAGE gel (Thermo Fisher Scientific), which was run for approximately 1.5 hours at 125 V. The proteins were transferred to an Immobilon-P Transfer Membrane (EMD Millipore) by wet-blotting overnight at 4°C at 25 V. Subsequently, the membrane was pre-blocked for 1 hr at RT with 10% skim milk, followed by 1 hr incubation with primary antibody (Supplementary Table 7) and 1 hr with secondary antibody. After each antibody incubation, the membrane was rinsed 3x 5 min in 1xPBS+0.05% Tween20 followed by a 1x 5 min wash with 1xPBS. The protein bands were developed using SuperSignal West Femto Maximum Sensitivity Substrate kit (Thermo Fisher Scientific) and Amersham Hyperfilm ECL (GE Healthcare) or Medical film (MG-SR plus, Konica Minolta).
CircRNAs were predicted and quantified using CIRI2 (Gao, Wang, & Zhao, 2015) and find_circ v1.2 (https://github.com/marvin-jens/find_circ) with default settings, and only predictions shared by both tools were kept for analysis, using the CIRI2 quantification. circRNAs were annotated using annotate_circ.py (the python scripts used are available at github.com/ncrnalab/pyutils). Flanking intron length was defined as the mean distance to the flanking exons according to the GENCODE annotation (in case of multiple annotated flanking introns, the mean length was used), and IAE distance is the shortest possible Alu-mediated inverted repeat distance based on RepeatMasker (UCSC Genome Browser). For mouse, the IAE distance is the shortest possible distance involving B1, B2 or B4 elements. DALI and PASI circRNAs were classified based on the median flanking intron length and median IAE distance in the sample. If no flanking introns were annotated, the circRNA was classified as 'other'. Furthermore, circRNAs were classified as conserved if both splice sites coincided exactly with previously detected mouse circRNAs (Stagsted et al, 2019) converted to hg19 coordinates using the liftOver tool (UCSC Genome Browser). Flanking linear spliced reads from the circRNA-producing loci were extracted using get_flanking_spliced_reads.py.
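The median-based classification can be illustrated with a minimal Python sketch. It is not taken from annotate_circ.py. It assumes that DALI denotes circRNAs above both sample medians (long flanking introns, distal inverted Alu elements), that PASI denotes circRNAs at or below both medians, and that mixed cases fall into 'other'; the function name `classify_circrnas` and the record fields `intron_len` and `iae_dist` are hypothetical.

```python
from statistics import median


def classify_circrnas(circs):
    """Label each circRNA as 'DALI', 'PASI' or 'other'.

    circs: list of dicts with hypothetical keys 'id', 'intron_len'
    (mean flanking intron length, or None if no flanking introns are
    annotated) and 'iae_dist' (shortest inverted Alu element distance).
    """
    # Medians are computed within the sample, over annotated circRNAs only.
    annotated = [c for c in circs
                 if c["intron_len"] is not None and c["iae_dist"] is not None]
    med_len = median(c["intron_len"] for c in annotated)
    med_iae = median(c["iae_dist"] for c in annotated)

    labels = {}
    for c in circs:
        if c["intron_len"] is None or c["iae_dist"] is None:
            labels[c["id"]] = "other"   # no annotated flanking introns
        elif c["intron_len"] > med_len and c["iae_dist"] > med_iae:
            labels[c["id"]] = "DALI"    # assumption: above both medians
        elif c["intron_len"] <= med_len and c["iae_dist"] <= med_iae:
            labels[c["id"]] = "PASI"    # assumption: at or below both medians
        else:
            labels[c["id"]] = "other"   # assumption: mixed cases
    return labels
```

Because the thresholds are recomputed per sample, the same circRNA may receive different labels in different datasets under this scheme.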

Cryptic/alternative splicing and intron retention
First, all spliced reads were extracted from bam-files using get_spliced_reads.py, requiring at least an 8-nucleotide match on each exon. Then, separately for each splice donor and acceptor, all possible conjoining splice sites were extracted and counted using get_alternative_splicing.py. For each splice site, the most abundant splicing event across all samples was denoted as canonical, whereas all other splicing events from that particular splice site were classified as 'inclusion' if shorter, or 'skipping' if longer, than the canonical event.
Based on the output from the alternative splicing analysis, for each splice site, the intronic region of the shortest alternative event was quantified using featureCounts (as above) but with [-minOverlap 5] to avoid quantification of any overlapping spliced reads.
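The classification rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of get_alternative_splicing.py: the function name `classify_junctions`, the junction tuple layout `(chrom, strand, donor, acceptor)` and the use of read counts already summed across samples are all hypothetical, and ties in abundance are resolved arbitrarily by taking the first maximum.

```python
from collections import defaultdict


def classify_junctions(junction_counts):
    """Classify splicing events per splice donor and per splice acceptor.

    junction_counts: dict mapping (chrom, strand, donor, acceptor) to the
    spliced-read count summed across all samples.
    Returns a dict mapping (site_type, junction) to 'canonical',
    'inclusion' or 'skipping'.
    """
    labels = {}
    # Index 2 of the junction tuple is the donor, index 3 the acceptor.
    for site_type, anchor_idx in (("donor", 2), ("acceptor", 3)):
        groups = defaultdict(list)
        for junction, count in junction_counts.items():
            site = (junction[0], junction[1], junction[anchor_idx])
            groups[site].append((junction, count))
        for events in groups.values():
            # The most abundant event from this splice site is canonical.
            canonical, _ = max(events, key=lambda e: e[1])
            canonical_len = abs(canonical[3] - canonical[2])
            for junction, _ in events:
                length = abs(junction[3] - junction[2])
                if junction == canonical:
                    label = "canonical"
                elif length < canonical_len:
                    label = "inclusion"   # shorter than canonical
                else:
                    label = "skipping"    # longer than canonical
                labels[(site_type, junction)] = label
    return labels
```

Note that a junction is evaluated twice, once from its donor and once from its acceptor, so it can be canonical from one side and alternative from the other.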

Differential gene expression
First, in all analyses, low-expressed entries, defined by mean counts across all samples <1 and expression in fewer than three samples, were discarded. Then, analysis of differential gene expression was performed using DESeq2 (v1.24.0; Love et al, 2014) with the design formula ~ treatment, where treatment denotes the knockdown/knockout target. For mRNA and circRNA expression, the raw counts were merged and analysed in bulk. For conditional analyses, such as circ vs linear, alternative vs canonical splicing, and intron-retention vs intron-splicing, raw counts for each type were combined in one expression matrix with the associated design formula ~ treatment * type, where type denotes circular or linear splicing (in the case of circ vs linear). The log2FoldChange and adjusted p-values from the interaction term (treatment:type) were used in subsequent analyses. For binned analysis of transcripts, each locus was sliced and re-annotated as 20 equally sized bins irrespective of exon-intron structure, and this annotation was then used in the featureCounts quantification. After differential expression analysis by DESeq2, genes were subgrouped into five k-means clusters based on the DESeq2-derived log2FoldChange of all 20 bins.
An adjusted p-value below 0.05 was considered significant. All statistics were conducted in R (v3.6.3) and visualizations were done in R using ggplot2 (v3.3.0) and GraphPad.
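Two of the preprocessing steps described above, the low-expression filter and the slicing of each locus into 20 bins, can be sketched in Python. This is a minimal illustration and not the pipeline's actual code. The helper names `keep_entry` and `slice_locus` are hypothetical, 'expressed' is assumed to mean a count above zero, the filter is read literally (an entry is discarded only if both conditions hold), coordinates are assumed half-open, and the 5'-to-3' bin numbering on the minus strand is an assumption.

```python
def keep_entry(counts, min_mean=1.0, min_samples=3):
    """Return False for low-expressed entries.

    An entry is discarded when its mean count across all samples is
    below min_mean AND it is expressed (count > 0, an assumption) in
    fewer than min_samples samples.
    """
    mean = sum(counts) / len(counts)
    expressed = sum(1 for c in counts if c > 0)
    return not (mean < min_mean and expressed < min_samples)


def slice_locus(start, end, strand="+", n_bins=20):
    """Split the half-open interval [start, end) into n_bins bins.

    Bins ignore exon-intron structure and differ in size by at most one
    base. Returns (bin_number, bin_start, bin_end) tuples, numbered from
    the 5' end of the locus (assumption for minus-strand loci).
    """
    length = end - start
    if length < n_bins:
        raise ValueError("locus is shorter than the number of bins")
    # Integer edges keep the bins contiguous and non-overlapping.
    edges = [start + (length * i) // n_bins for i in range(n_bins + 1)]
    bins = [(edges[i], edges[i + 1]) for i in range(n_bins)]
    if strand == "-":
        bins.reverse()
    return [(i + 1, s, e) for i, (s, e) in enumerate(bins)]
```

Each bin could then be written out as a separate annotation feature, so that reads are counted per bin rather than per gene.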

Data Accessibility
Sequencing data will be uploaded to the Gene Expression Omnibus (GEO), and scripts for RNAseq data processing are available at GitHub: github.com/ncrnalab/pyutils

Author contributions
LVWS performed western blot and library preparation. ETL performed RIP analysis. LVWS and ETL performed knockdown and qRT-PCR. TBH performed RNAseq analyses and supervised the project. LVWS, ETL and TBH wrote the manuscript.

Conflict of interest
The authors declare no conflict of interest.