Diminished miRNA activity is associated with aberrant cytoplasmic intron retention in ALS pathogenesis

Intron retention (IR) is now recognized as a dominant splicing event during motor neuron (MN) development, however the role and regulation of intron-retaining transcripts (IRTs) localized to the cytoplasm remain particularly understudied. By resolving the spatiotemporal dynamics of IR underlying distinct stages of MN lineage restriction, we identify a cytoplasmic group of IRTs that is not associated with reduced expression of their own genes but instead with an upregulation of predicted target genes of specific miRNAs, the motifs of which are enriched within the intronic sequences of this group. Next, we show that ALS-causing VCP mutations lead to a selective increase in IR of this particular class of introns. This in turn temporally coincides with an increase in the expression level of predicted target genes of these miRNAs, providing a potential mechanistic insight into ALS pathogenesis. Altogether, we propose a novel role for the cytoplasmic intronic sequences in regulating miRNA activity through miRNA sequestration, which potentially contributes to ALS pathogenesis.


INTRODUCTION
Intron retention, a mode of alternative splicing whereby one or more introns are retained within a mature polyadenylated mRNA, has been greatly understudied in mammalian systems and for a long time mostly considered as a product of inefficient or mis-splicing. With advances in detection strategies, IR became recognised as a more widespread and regulated process than previously thought, and the idea that IR could even functionally modulate cellular processes has come into focus, with its role(s) in cellular physiology beginning to unfold (1,2) .
Neural cells exhibit a higher proportion of retained introns compared to other cell types and there is an expanding body of evidence demonstrating a functional role for intron retention (IR) both in neuronal development and homeostasis (1,(3)(4)(5) . Transcripts that exhibit IR often remain in the nucleus, mostly considered to be a means of reducing the expression levels of transcripts not required for cellular physiology at a particular stage (2,6,7) . Some of these transcripts would eventually be degraded by the nuclear exosome, while specific signals could stimulate splicing of the retained intron in others, resulting in export of the fully spliced mRNA into the cytoplasm and its subsequent translation (5) .
Indeed, nuclear detention of intron-retaining transcripts (IRTs) provides a powerful mechanism to hold gene expression in a suppressed but poised state that allows rapid protein production if and when an appropriate stimulus is received (4,5,(8)(9)(10) .
Although the stable cytoplasmic localisation of intronic sequences in neurons has been reported since 2013 (11) , there has been limited investigation into the possible role of cytoplasmic IRTs. This has presumably been overlooked in part due to detection limitations, but also due to a notion that these transcripts would likely contain premature translation termination codons (PTCs) and as such, be degraded by nonsense mediated mRNA decay (NMD) (12) . Whilst examples of IR coupled with NMD have been found to downregulate gene expression, such as in granulocyte development (13) , these transcripts can encounter other fates in the cell (14) .
Indeed, one of the few studies focussing on cytoplasmic IR in neurons showed an 'addressing' function for intronic RNA sequences, determining the spatial localization of their host transcripts within cellular compartments such as dendrites (15) .
Another speculated function of IR has arisen following the identification of miRNA binding motifs within the retained intronic sequences. This offers an intriguing route through which miRNA-directed degradation pathways might regulate abundance of IRTs; alternatively, the retained introns themselves may serve as miRNA sinks, or even encode novel miRNAs termed mirtrons (16,17) .
Altogether, despite advances in our understanding of IR in neuronal cells, much remains unanswered.
The importance of investigating the roles of IR has been further corroborated by studies that demonstrate its relevance across a diverse range of neurodegenerative diseases (18)(19)(20)(21) . One such example is amyotrophic lateral sclerosis (ALS) (20) , a rapidly progressive and incurable disease, which leads to selective degeneration of motor neurons (MNs). ALS is characterised by protein inclusions and axonal degeneration, and is often associated with RNA processing defects. ALS-causing mutations occur in numerous genes encoding crucial regulators of RNA-processing, which are normally expressed throughout development. Despite the growing number of causative gene mutations being identified in ALS, the precise aetiology remains unknown and early molecular pathogenic events remain poorly understood. We previously made the novel discovery that aberrant IR is a widespread phenomenon in ALS (20) , which was corroborated by subsequent studies (21,22) . Moreover, we went on to demonstrate aberrant cytoplasmic IR as a widespread molecular phenomenon in VCP-related ALS (23) . We showed that ALS-related aberrant cytoplasmic IRTs have conspicuously high affinity for RNA binding proteins (RBPs), including those that are mislocalized in ALS and proposed that a subset of cytoplasmic intronic sequences serve as 'blueprints' for the hallmark protein mislocalization events in ALS (24,25) . This raises an exciting possibility that intronic RNA sequences play additional significant roles beyond their recognized nuclear function. Nevertheless, the role and physiological relevance of cytoplasmic IR during neuronal development and disease still remains largely unresolved.
Against this background we sought to characterise the spatiotemporal dynamics of IRTs by re-analysing RNA-seq data from nuclear and cytoplasmic fractions of patient-specific hiPSCs undergoing motor neurogenesis. We first show that retained introns exhibit compartment-specific features including their dynamics, biological pathways, and molecular characteristics during this process.
We reveal a specific class of retained introns in the cytoplasm that is not associated with gene expression changes but exhibits high miRNA binding potential, which is functionally validated by identifying an altered expression profile of the predicted miRNA target genes. We finally analyze this class of retained introns in stem cell models of familial ALS and find evidence for a functional depletion of specific miRNAs, possibly as a result of cytoplasmic intronic sequences-mediated sequestration, which has potential implications for ALS pathogenesis and the development of therapies in this devastating and incurable disease.

Compliance with ethical standards
Informed consent was obtained from all patients and healthy controls in this study.
Experimental protocols were all carried out according to approved regulations and guidelines by UCLH's National Hospital for Neurology and Neurosurgery and UCL's Institute of Neurology joint research ethics committee (09/0272).

RNA-sequencing data
We obtained paired-end polyA stranded RNAseq libraries prepared from fractionated nucleus and cytoplasm obtained from 6 distinct stages of motor neuron differentiation from control and VCP mu samples (iPSC, and days 3, 7, 14, 21 and 35; Supplementary Table S1 ) from previously published study ( GSE152983 ) (23) .
We also obtained paired-end RNA sequencing reads derived from one independent study on familial form o f ALS caused by mutant SOD1 (n=5; 2 patient-derived SOD1A4V and 3 isogenic control MN samples where the mutation has been corrected; Hb9 FACS purified MNs, GSE54409 (26) .

Splicing analysis
The identification of all classes of alternative splicing (AS) events in motor neuron differentiation was performed with the Vertebrate Alternative Splicing and GO terms and terms which contain fewer than 5 significant genes.

Mapping and analysis of CLIP data
To identify RBPs that bind to retained introns, we examined iCLIP data for 21 RBPs (33) , and eCLIP data from K562 and HepG2 cells for 112 RBPs available from ENCODE (34,35) . Before mapping the reads, adapter sequences were removed using Cutadapt v1.9.dev1 and reads shorter than 18 nucleotides were dropped from the analysis. Reads were mapped with STAR v2.4.0i (36) to UCSC hg19/GRCh37 genome assembly. The results were lifted to hg38 using liftOver (37) . To quantify binding to individual loci, only uniquely mapping reads were used.

Analysis of cis-acting features
MaxEntScan ( lowest (most negative scores in both correlation and projection) motifs were selected for each singular vector using K-mean clustering.

MiRNA expression analysis
Total RNA including small RNAs was extracted from "patterned" precursor motor Relative miRNA expression levels between control and mutant cells were quantified using ΔΔCT Method, with U6 snRNA as a reference gene for normalisation. Data was plotted using RStudio software.

Nuclear and cytoplasmic IR affect two functionally divergent mRNA subsets
We previously reported a transient IR programme early during human motor neurogenesis using whole-cell RNA-sequencing data (20) . In line with our previous study (20) , IR was the predominant mode of splicing during neurodevelopment, accounting for 64% and 49% of the included AS events in the nucleus (638 events) and the cytoplasm (541 events) respectively, indicating that cytoplasmic IRTs are more abundant than previously recognized ( Fig. 1B ). Further examining the distributions of percent intron retention (PIR) during MN differentiation in the nucleus and the cytoplasm for 211,501 events revealed that IR exhibits distinct dynamics in the two compartments ( Fig. 1C ). In particular, the nuclear compartment exhibits the highest level of PIR at the hiPSC stage, while the cytoplasmic compartment exhibits the highest level of PIR at DIV=14, which is reminiscent of the early wave of IR we previously reported (20) .
Notably the cytoplasmic increase in PIR early during differentiation is likely explained by a change in the subcellular localisation of (some) IRTs rather than a modulation of the splicing given the coincident stable level of PIR in the nucleus.
Genes related to RNA processing and splicing are among the most affected by IR (13,(44)(45)(46)(47)(48) and we previously showed that IR early during MN development specifically affects RNA processing related biological pathways (20) . Here we find that cytoplasmic (but not nuclear) IR affects essential genes concerned with mRNA metabolism. In contrast we find that genes targeted by alternative exons (AltEx) are enriched in similar biological pathways in the nucleus and the cytoplasm as shown by Gene Ontology (GO) function analysis ( Fig. 1D ). These findings indicate that the previously reported wave of IR during MN differentiation, using whole-cell RNA-sequencing, likely reflected signals from cytoplasmic IRTs.
Prior studies reported specific features associated with retained introns including higher GC content, lower intron length and enrichment in RBP binding motifs compared to non-retained introns (1,47,49,50) . In line with these studies, we find that the PIR negatively correlates with the intron length and positively correlates with the GC content and the enrichment in crosslink events for 131 RBPs for which CLIP data were available (33)(34)(35)   stochastically, but appears to target a specific set of introns in each gene ( Supplementary Fig. 2A ), indicating that additional layer(s) of specific regulation must underlie the regulation of these 9 distinct IR programmes as previously suggested (47) . Previous studies suggest that cis-regulatory elements bound by trans-acting factors such as RBPs are likely to play a crucial role in regulating IR (50)(51)(52) . Thus we next sought to test whether the 9 spatiotemporally distinct classes of IR we identified are associated with different combinations of trans-acting factors that could regulate them. We achieved this by using the publicly available CLIP data to evaluate the crosslink events for 131 RBPs mapping to 5 regions we defined in relation to the acceptor and donor splice sites, namely the last 30 nucleotides (nts) of exonic sequence upstream of the 5' splice site (R1), the first 30nts of intronic sequence downstream of the 5' splice site (R2), the 30nts in the middle of the intron (R3), the last 30nts of intronic sequence upstream of the 3' splice site (R4), and the first 30nts of exonic sequence downstream of the 3' splice site (R5) ( Fig. 2B , upper and Supplementary Tables S12-S17 ). First, looking at the fractions of regions which are mapped by at least one crosslink event for each RBP, we find that the R1 and R5 exonic regions, sequences of which are the most evolutionarily conserved ( Supplementary Fig. 2B ), exhibit the highest frequency in crosslink events across the 131 RBPs irrespective of the IR grouping, with the exception of the C1 group ( Fig. 2B , lower ). These results, which are in line with previous studies showing that the splicing machinery is more likely to form across the exonic regions than across the introns for similarly long introns (>250 nts), indicate that the 9 groups of introns bear similar chances of splicing complex formation with respect to their R1 and R5 exonic regions (53,54) . This is further supported by the finding that identically optimal splicing signals are detected among the 9 groups of introns, with the exception of the C4 group ( Supplementary   Fig. 2C ) as failure in splice site recognition (50)(51)(52) or decreased expression levels of splicing factors (47) have also been proposed to underlie IR.
Although RBPs exhibit similarly high frequency of binding to the R1 and R5 exonic regions across the majority of the 9 groups of introns, we indeed noticed that the R2, R3 and R4 intronic regions display large variability in the percentages of crosslink events across the different spatiotemporal IR dynamics ( Fig. 2B , lower ).
Next, looking at the enrichment of RBPs mapping to each of these regions further revealed that the C3, C4 and C5 groups of cytoplasmic retained introns is indeed specifically enriched in RBPs binding to the R2, R3 and R4 intronic regions as opposed to the R1 and R5 regions which display as much RBP binding as the full set of introns ( Fig. 2C ) Fig. 2D ). Different combinations of RBPs have been shown to coordinately regulate functionally coherent "networks" of exons and introns (55) . Thus, using unsupervised hierarchical clustering of the 9 groups of introns based on their 131 RBP enrichment scores profiles we finally showed that {C3, C4, C5} form a coherent regulated group of introns in respect to all selected regions except the R1 exonic region ( Fig. 2D ).
Altogether, this analysis identifies a coherent regulated supergroup of retained introns which exhibits specific elements in their intronic regions that are not necessarily related to splicing efficiency, but rather may perform an additional role in the regulation of mRNA metabolism.  3G and Supplementary Figs. 5,6 ). This result indicates that the decrease in IR from DIV=7 to DIV=14 correlates with an increase in miRNA activity, supporting the hypothesis that cytoplasmic retained introns reduce miRNA activity potentially by sequestering them, as previously shown for long non-coding RNA (lncRNA) (68) .  Fig. 4A and Supplementary Fig. 4C ). Most notably VCP-driven changes in cytoplasmic IR are 1) unidirectional i.e. we only detect increases in IR in VCP mu samples compared to control samples irrespective of PIR dynamics, and 2) the VCP mutation specifically affects groups of introns in which the PIR exhibits a large decrease from DIV=7 to DIV=14. This is in contrast to those groups of introns where the PIR increases from DIV=7 to DIV=14, such as C2 and C6, where we find similar increase in control and VCP mu samples ( Supplementary Figs. 4C ). These results suggest that VCP mutations enhance the cytoplasmic stability of IRTs, rather than affecting nuclear export, which would equally impact C1, C2, C5 and C6.

VCP mutation-related transient accumulation of cytoplasmic
Having found that VCP mutations lead to large perturbations of the C5 group of cytoplasmic IRTs at DIV=14, which we have shown above to associate with decreased activity of specific miRNAs, we next sought to test whether the increase in cytoplasmic IR of the C5 group in VCP mutants correlates with a reduction in miRNA activity by looking at the changes in gene expression of their predicted target genes between VCP and control samples. This analysis revealed that the increase in IR in VCP mutant cultures correlates with a decrease in miR-4519, miR-1976 and miR-4716-5p activities as predicted by the up-regulation of their respective target genes at DIV=14 ( Fig. 4B ). Additionally, these changes are not explained by a change in the expression levels of these miRNAs which are not significantly different between VCP and control cultures at this time point ( Supplementary Figure 4D ). Notably, the predicted activities of miR-485-5p and miR-4267, whose target gene expression profiles did not correlate with IR level in control samples over time, also do not correlate with increase in C5 IR in VCP samples, thus supporting the hypothesis that the activities of the same miRNAs correlate to IR in both VCP and control samples over time ( Supplementary Fig. 6 ).
Noting that we have studied a relatively rare form of familial ALS (fALS) caused by gene mutations in VCP (selected as it exhibits the pathological hallmark of TDP-43 nuclear-to-cytoplasmic mislocalisation), we next sought to understand the generalizability of the association between increase in IR in ALS samples and decrease in miRNA activity. To this end, we chose to study one of the most common forms of fALS (SOD1), which in contrast does not exhibit the pathological hallmark of TDP-43 nuclear-to-cytoplasmic mislocalisation. We first looked at the PIR of the C5 group in SOD1 mutant hiPSC-derived MNs, revealing a statistically significant increase in IR in SOD1 ( Fig. 4C ). Next, looking at the changes in gene expression of the miRNA predicted target genes between SOD1 mutant MNs and their isogenic controls, further showed a decrease in miR-4519, miR-1976 and miR-4716-5p predicted activities ( Fig. 4D ). These findings further substantiate the relevance of the correlation between increased IR and decreased miRNA activity.
Altogether these findings support the hypothesis that the cytoplasmic pool of C5 introns leads to a reduction in miRNA activity, potentially through direct binding and sequestration, which may have important roles in ALS pathogenesis, and indeed implications for new therapeutic strategies.

DISCUSSION
Neuronal biology relies on complex regulation of gene expression and mRNA metabolism. Alternative splicing has been shown to play a key role in this process and IR is now recognized as the dominant mode of splicing during MN development (1,20) , including cytoplasmic IR, which we recently showed to affect >100 transcripts during neuronal development (23) . Because nuclear IR has been the focus of most previous studies, the regulation and role of cytoplasmic IRTs remain unclear. The objective of this study was twofold: to deepen our understanding of the role(s) of cytoplasmic IR in normal cellular physiology by resolving the spatiotemporal dynamics of IR underlying distinct stages of MN lineage restriction, and to decipher whether specific classes of IRTs become dysregulated in the context of disease by systematically examining the influence of ALS-causing VCP mutations on this process. In order to achieve this we re-analyzed nuclear and cytoplasmic RNA-sequencing data from a time course of patient-specific iPSCs differentiating into spinal MNs.
We first show that nuclear and cytoplasmic IR target distinct classes of mRNA associated with particular dynamics, biological pathways and molecular characteristics. Specifically, we find that the sequences of the retained introns that localise to the cytoplasm are evolutionarily more conserved and exhibit a higher capacity for RBP binding compared to the nuclearly detained introns. This argues against the hypothesis that cytoplasmic intron-containing pre-mRNAs simply 'leak' from the nucleus (69) , which is also further excluded by polyA selection during library preparation, and suggests that 1) cytoplasmic localisation signals for these IRTs are contained in the intronic sequences, and 2) cytoplasmic IRTs likely serve a biological function that has yet to be discovered.
We next show that MN differentiation exhibits complex IR spatiotemporal dynamics captured by 9 distinct IR programmes, 3 which are nuclearly detained and 6 that localise to the cytoplasm. Given the time and cell compartment specificity of these programmes, they are expected to associate with distinct complex regulation. IR has been previously proposed to be the consequence of globally inefficient splicing (47,70) , that could be linked to several mechanisms including the occupancy of MeCP2 near the splice junction (71) , the expression of PRMT5 (7) , and relatively weak splice sites (1) . Here we find that the 9 groups of introns exhibit similar 5' and 3' maximum entropy scores as well as similarly high RBP binding in their exonic regions juxtaposed to the splice sites where the splicing machinery is more likely to form (53,54) as opposed to the intronic regions. These findings indicate that an overall change in splicing efficiency during MN differentiation is unlikely to be the dominant regulatory factor for most of these IR programmes. Furthermore, IR during MN differentiation does not occur stochastically, but appears to target a specific set of introns in each gene, and thus an additional layer of specific regulation must underlie the regulation of these 9 distinct IR programmes as previously suggested (47      1F,H,J . Tables S4-S12 | Lists of the 9 groups of retained introns (N1, N2 N3 , C1, C2, C3, C4, C5, C6 ) associated with the distinct spatiotemporal dynamics during MN differentiation reported on Fig. 2A.

ACKNOWLEDGMENTS
The authors wish to thank the patients for fibroblast donations. We also thank Anob M Chakrabarti for sharing BED files of aligned CLIP data. We are grateful for the