Human 5’-tailed Mirtrons are Processed by RNaseP

Approximately a thousand microRNAs (miRNAs) are documented from human cells. A third appear to transit non-canonical pathways that typically bypass processing by Drosha, the dedicated nuclear miRNA-producing enzyme. The largest class of non-canonical miRNAs are mirtrons, which bypass Drosha and instead mature through spliceosome activity. While mirtrons are found in several configurations, the vast majority of human mirtron species are 5’-tailed. For these mirtrons, a 3’ splice site defines the 3’ end of their hairpin precursor, while a “tail” of variable length separates the 5’ base of the hairpin from the nearest splice site. How this tail is removed is not understood. Here we examine sequence motifs in 5’-tailed mirtrons and their interactions with RNA turnover processes to characterize their biogenesis. Through studying the high-confidence 5’-tailed mirtron hsa-miR-5010, we identify RNaseP as necessary and sufficient for “severing” the 5’ tail of this mirtron. Further, depletion of RNaseP activity globally decreased 5’-tailed mirtron expression, implicating this endoribonuclease in biogenesis of the entire class. Moreover, as 5’-tailed mirtron biogenesis appears to be connected to tRNA processing, we examined tRNA fragments (tRFs) and found a strong correlation between their accumulation and 5’-tailed mirtron abundance. This suggests that dysregulation of tRNA processing seen in cancers may also impact expression of the ∼400 5’-tailed mirtrons encoded in the human genome.

SUMMARY
Abundant non-canonical human miRNAs referred to as tailed mirtrons are processed by RNaseP, which “severs” tail nucleotides to yield a precursor hairpin suitable for Dicer processing. Biogenesis of these miRNAs is correlated with tRFs, which are also products of RNaseP processing.


INTRODUCTION
microRNAs (miRNAs) are small non-coding ~22 nucleotide (nt) RNAs processed from stem-loop structures [1]. Biogenesis is initiated by the microprocessor, which contains Drosha, an RNase III enzyme that crops a ~70 nt precursor hairpin from primary transcripts and leaves a 2-nt 3' overhang. The overhangs are recognized by Exportin-5 (XPO5), which carries pre-miRNAs to the cytoplasm for final processing by Dicer, another RNase III enzyme, after which they are loaded into effector Argonaute (Ago) proteins. Once recruited by Ago, miRNAs typically base pair with mRNAs via nucleotides at positions 2-9 of their 5' end, inducing degradation or translational repression [2].

While most miRNAs mature through the standard Drosha-Dicer-Ago pathway, numerous additional routes have been identified that take advantage of alternate RNA processing enzymes [3]. Indeed, miRNAs have been found to reside in most major non-coding RNA (ncRNA) varieties such as tRNAs, rRNAs, and snoRNAs. Each of these ncRNAs is cut from longer transcripts, and these cuts are repurposed for creation of non-canonical miRNAs.
Among these, the most prevalent variety is tRFs, which are found in numerous cell types as a sign of stress and are diagnostic of an oncogenic phenotype [4,5]. This makes many tRNAs dual-function molecules with both tRNA and miRNA capabilities, such as tRNA-Ile, which can form a 110 bp hairpin encoding hsa-miR-1983, which is also a mature tRNA

To investigate biogenesis of 5'-tailed mirtrons, we first sought to identify shared motifs in these miRNAs. To do this, we applied the seqlogo algorithm to the first 20 bases of small RNAs produced from human 5'-tailed mirtron hairpins (Fig 1A-B) [26]. Comparing all species, similar motifs are apparent. 5' arms are composed primarily of G residues, while 3' arms are typically C and U residues, likely representing intronic polypyrimidine (ppy) tracts. This pattern was likewise noted in prior efforts to assess sequence elements in mammalian mirtrons [13]. This suggests that G-rich stretches arising adjacent to ppy elements might be sufficient to lead to 5'-tailed mirtron formation. Also, the higher-confidence G quartet at the beginning of the 5' arm suggests a major role for hairpin structure in biogenesis of 5'-tailed mirtrons. To test this, we compared the prevalence of G quartets in all human introns with that in 5'-tailed mirtron-encoding introns and found significantly higher G quartet occurrence in mirtrons (Fig 1C). Thus, biogenesis of 5'-tailed mirtrons appears linked to RNA structure, specifically polyG tracts. To further connect 5'-tailed mirtron biogenesis to nucleotide content, we compared the free energy (ΔG) and GC content of 767 canonical miRNAs and 404 5'-tailed mirtrons (Fig 1D). As a class, 5'-tailed mirtrons show much higher GC content than most miRNAs. Interestingly, this does not seem to correlate with a substantially different energy profile.
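The motif and composition analyses above reduce to simple counting. The sketch below is not the seqlogo implementation cited [26]; it assumes RNA sequences given as plain strings, builds a per-position base frequency matrix over the first 20 nt (the input a sequence logo is drawn from), converts each column to information content in bits, and computes GC content. As a hypothetical stand-in for the G quartet criterion, whose exact definition is not given here, it counts runs of four or more consecutive G residues.

```python
import math
import re
from collections import Counter

BASES = "ACGU"

def normalize(seq):
    """Uppercase and convert the DNA alphabet (T) to RNA (U)."""
    return seq.upper().replace("T", "U")

def position_frequency_matrix(seqs, length=20):
    """Base frequencies at each of the first `length` positions (input to a sequence logo)."""
    seqs = [normalize(s) for s in seqs]
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs if len(s) > i and s[i] in BASES)
        total = sum(counts.values())
        matrix.append({b: counts[b] / total if total else 0.0 for b in BASES})
    return matrix

def information_content(column):
    """Bits at one logo position: log2(4) minus Shannon entropy, no small-sample correction."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy

def gc_content(seq):
    """Fraction of G + C residues in a sequence."""
    seq = normalize(seq)
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def count_g_runs(seq, min_run=4):
    """Non-overlapping runs of >= min_run consecutive G (hypothetical G quartet proxy)."""
    return len(re.findall("G{%d,}" % min_run, normalize(seq)))
```

Comparing gc_content across canonical miRNA and mirtron hairpins gives a class-level comparison like Fig 1D; ΔG would require an RNA folding tool and is not computed here.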
If mirtron hairpin energy were greatly divergent from that of conserved miRNAs, this might lead to incompatibilities in Ago loading.

Considering the bias of 5'-tailed mirtrons for G quartets and high GC content, we suspected that part of their biogenesis might be linked to being challenging substrates for 1/2 depleted cells, 5'-tailed mirtrons are expressed at a significantly higher level than in control (Fig 1E-F). In addition to higher overall mirtron expression, species that were completely absent in control conditions appeared after XRN1/2 knockdown. Initial reporting of these datasets revealed a similar situation; however, this study benefits from a formalized annotation of 5'-tailed mirtrons [13].

These data suggest that exoribonuclease activity antagonizes mirtron production, which led us to investigate the characteristics of mirtrons that only accumulate in XRN1/2 knockdown datasets. A notable feature of XRN-eliminated mirtrons is the presence of heterogeneous isoforms: over half were represented by two or more isoforms (Fig 1G). This imprecise processing is consistent with a biogenesis mechanism that is not dedicated to mirtron production, unlike Drosha, which reliably generates precise hairpin ends. Focusing on a well-expressed mirtron, hsa-miR-3620, greater heterogeneity is seen for 5p arm small RNAs (Fig 1H). Moreover, the greatest isoform diversity is seen at the 5' end of the 5p read, which is problematic because shifting the 5' end changes a miRNA's seed and thus its targets. This also hints that 5'-tail removal is not occurring in a deliberate fashion, suggests a role for XRN activity in limiting 5'-tailed mirtron accumulation, and implies that this class of miRNA requires features that allow evasion of 5'-3' exoribonuclease activity.

Processing of the 5'-tailed mirtron, hsa-miR-5010
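Isoform heterogeneity of the kind described above can be summarized per hairpin arm by counting distinct read boundaries. In this minimal sketch, each read is assumed to be a (start, end) pair of plus-strand coordinates, so start is the 5' end; the function names are illustrative, and only the two-isoform threshold is taken from the text.

```python
from collections import Counter

def isoform_summary(reads):
    """Count distinct isoforms and distinct 5'/3' ends among (start, end) reads.

    Assumes plus-strand coordinates, so `start` is the 5' end of each read.
    """
    if not reads:
        raise ValueError("no reads supplied")
    isoforms = Counter(reads)
    start_counts = Counter(start for start, _ in reads)
    return {
        "n_isoforms": len(isoforms),
        "n_5p_ends": len(start_counts),
        "n_3p_ends": len({end for _, end in isoforms}),
        # Share of reads carrying the dominant 5' end; lower values mean a more variable seed.
        "dominant_5p_fraction": max(start_counts.values()) / len(reads),
    }

def fraction_heterogeneous(reads_by_mirtron, min_isoforms=2):
    """Fraction of mirtrons represented by at least `min_isoforms` distinct isoforms."""
    if not reads_by_mirtron:
        return 0.0
    hetero = sum(
        1 for reads in reads_by_mirtron.values()
        if reads and isoform_summary(reads)["n_isoforms"] >= min_isoforms
    )
    return hetero / len(reads_by_mirtron)
```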

To dissect the biogenesis of 5'-tailed mirtrons, we examined hsa-miR-5010, which is located in ATP6V0A1 (Sup Fig 1). hsa-miR-5010 is one of the most highly expressed 5'-tailed mirtrons and exhibits relatively consistent 5p arm processing. As a point of comparison, we also examined the mouse 3'-tailed mirtron mmu-miR-668, which is likewise a high-confidence tailed mirtron (Sup Fig 1) [12,28]. While there is a homolog of mmu-miR-668 in humans that likewise resides in a large miRNA cluster, it is a 3'-tailed mirtron only in mice [28]. Based on the tendency of 5'-tailed mirtrons to harbor polyG tracts, we investigated the effect of adding polyG tracts into the tails of hsa-miR-5010 and mmu-miR-668. After each mirtron was cloned into an expression vector, either a 12G element or a mixed-identity 20-nucleotide insert was placed in the tail of each mirtron (Fig

Increased hsa-miR-5010 expression after polyG insertion was verified using a luciferase assay to assess target silencing (Fig 2C). For the polyG construct we observed a significant reduction in reporter expression, while very little was seen for the WT and insert constructs. Together these data suggest that mammalian 5'-tailed and 3'-tailed mirtrons are produced by very different mechanisms. The presence of additional nucleotides in mmu-miR-668 (3'-tailed) resulted in inactivation, similar to what was observed for miR-1017 in Drosophila [11]. Inhibition of 3'-5' exoribonucleases by tail inserts negatively impacted biogenesis. hsa-miR-5010, in contrast, is enhanced specifically by a sequence element inhibitory to exoribonucleases. This is consistent with the effects of XRN1/2 knockdown on 5'-tailed mirtron expression and suggests that this type of mirtron is produced by endoribonuclease-mediated severing of tails from hairpins.
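Dual-luciferase readouts such as those in Fig 2C are typically reported as a Renilla/firefly ratio normalized to a reference construct. The sketch below assumes that layout (in psiCHECK-2, firefly luciferase serves as the internal control) and uses hypothetical replicate values; which construct serves as the reference, and the statistics applied, are not specified here.

```python
from statistics import mean

def relative_luciferase(renilla, firefly, ref_renilla, ref_firefly):
    """Per-replicate Renilla/firefly ratios, normalized to the mean ratio of a reference construct.

    Values below 1 indicate reduced Renilla reporter expression relative to the reference.
    """
    if len(renilla) != len(firefly) or len(ref_renilla) != len(ref_firefly):
        raise ValueError("each Renilla reading needs a matching firefly reading")
    ref = mean(r / f for r, f in zip(ref_renilla, ref_firefly))
    return [(r / f) / ref for r, f in zip(renilla, firefly)]
```

For example, relative_luciferase([2, 4], [1, 2], [4, 4], [1, 1]) returns [0.5, 0.5], a 50% reduction relative to the reference.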
Considering the structure of tailed mirtrons, where a double-stranded RNA is connected to an unpaired region, they are reminiscent of immature tRNAs. Excision of tRNAs from precursors is carried out by two endoribonucleases: the 5' leader is cut by RNaseP and the 3' trailer by RNaseZ [29]. In the case of 5'-tailed mirtrons, the clear candidate would be RNaseP. This enzymatic activity is carried out by a multi-subunit complex that coordinates the activity of a ribozyme. To assess the role of RNaseP in mirtron expression, siRNAs targeting the Rpp30 subunit of RNaseP were transfected into HEK cells, followed by a second transfection with WT hsa-miR-5010 constructs (Sup Fig 3). Small RNA sequencing libraries were then created to assess differences in expression and processing. These data show that expression of hsa-miR-5010 in Rpp30 knockdown is decreased by half compared to control libraries, implicating RNaseP as a factor necessary for 5'-tailed mirtron expression (Fig 2D). This led us to evaluate differences in RNAs produced from hsa-miR-5010 in control and Rpp30 knockdown libraries. We detected three different types of endings at the 5' end of mature hsa-miR-5010 (Fig 2E). Most of the reads (94%) in control started with "AG" (position 2), and when a "U" is added to the 3' end by uridylation, a 2-nt overhang is obtained that resembles a perfect Drosha product. By contrast, only 4% of the reads start with "CAG". In Rpp30 knockdown libraries, however, 5' processing is perturbed: a shift in the first base is observed and the proportion of "AG" reads is reduced to 75%. This suggests that Rpp30 knockdown has affected the 5' end processing of hsa-miR-5010. A similar analysis was applied to isoforms differing at the 3' end, but no significant difference was detected, further implicating Rpp30 in shaping the 5p arm of 5'-tailed mirtrons (Sup Fig 5).
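The 5' end breakdown above (94% "AG" versus 4% "CAG" in control) amounts to classifying each read by its leading nucleotides. A minimal sketch, assuming reads are given as RNA or DNA strings and that the prefixes of interest are supplied by the caller; reads matching none are pooled as "other". The function name is illustrative.

```python
from collections import Counter

def five_prime_end_profile(reads, prefixes=("AG", "CAG")):
    """Fraction of reads starting with each prefix; unmatched reads are counted as 'other'.

    Prefixes are tested in the given order, so list longer prefixes first if one prefix
    is contained at the start of another (e.g. 'CAGG' before 'CAG').
    """
    counts = Counter()
    for read in reads:
        read = read.upper().replace("T", "U")
        label = next((p for p in prefixes if read.startswith(p)), "other")
        counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in (*prefixes, "other")}
```

Comparing the "AG" fraction between control and Rpp30 knockdown libraries is then a direct readout of perturbed 5' processing.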
To verify the role of RNaseP in processing hsa-miR-5010, we sought to reconstitute 5'-tail removal using immunopurified complexes (Fig 2F). Antibodies against Rpp20, another RNaseP subunit, were used to isolate complexes. In vitro synthesized hsa-miR-5010 primary intron transcripts were incubated with these isolated complexes or with beads bound to mock anti-rabbit IgG antibodies. Incubation led to significant accumulation of the ~85 nt hairpin of hsa-miR-5010 in the RNaseP IP, but not in the IgG control. In whole lysate incubation, the RNA is degraded by cellular RNases. Immunoprecipitation efficiency was verified by RT-PCR of the RNA subunit of RNaseP, which was greatly enriched in the Rpp20 IP condition relative to control IgG (Fig 2E). These results indicate not only that RNaseP is necessary for processing of hsa-miR-5010, but also that it is sufficient to recapitulate removal of the tail of this 5'-tailed mirtron.

To assess the effects of Rpp30 loss on 5'-tailed mirtrons as a class, we compared the expression of all 5'-tailed mirtrons after Rpp30 knockdown to levels found in control libraries (Fig 3A). We find that after knockdown of Rpp30 statistically significant species. This is apart from ΔG values, which were not predictive of significant downregulation of 5'-tailed mirtrons in Rpp30 knockdown libraries. This reinforces our observation that strength of RNA fold does not typically track with expression. The exception is the handful of outliers, which do exhibit lower ΔG values. However, outliers do not respond consistently to Rpp30 knockdown, suggesting that these changes might relate more to host gene expression changes than to changes in biogenesis.

Given the apparent connection of 5'-tailed mirtrons to RNaseP activity, we sought to correlate changes in mirtron expression with biogenesis of tRFs (Fig 3B,C). In small RNA populations, a significant group appear to be derived from pieces of tRNAs [30]. tRFs are not simply degradation products, but are actively loaded into Ago proteins [31-33]. Production of tRFs occurs through heterogeneous pathways, with most segments of tRNA cloverleaf structures giving rise to regulatory small RNAs [34]. While the function of tRFs is controversial, their expression has been correlated with an oncogenic phenotype [34]. This suggests that cancer cells have dysregulated tRNA biogenesis, which might also impact 5'-tailed mirtrons. Indeed, after surveying public small RNA sequencing data from several cancer types, we generally observe greater expression of 5'-tailed mirtrons relative to control tissues (Fig 3B). In all cases, the mirtrons were either more abundant in cancer or equally distributed between cancer and normal tissue; no decrease in cancer was observed.
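A per-species comparison between Rpp30 knockdown and control libraries, as in Fig 3A, can be expressed as log2 fold changes of normalized expression. This sketch assumes RPM-normalized values keyed by mirtron name, adds a pseudocount (an arbitrary choice here) so that species absent in one condition stay finite, and reports the fraction falling at or below a threshold; it does not reproduce the statistical test used in the study.

```python
import math

def log2_fold_changes(knockdown, control, pseudocount=1.0):
    """log2((knockdown + pc) / (control + pc)) for every species seen in either condition."""
    species = sorted(set(knockdown) | set(control))
    return {
        name: math.log2((knockdown.get(name, 0.0) + pseudocount)
                        / (control.get(name, 0.0) + pseudocount))
        for name in species
    }

def fraction_below(fold_changes, threshold=-1.0):
    """Fraction of species whose log2 fold change is at or below `threshold` (-1 = halved)."""
    if not fold_changes:
        return 0.0
    return sum(lfc <= threshold for lfc in fold_changes.values()) / len(fold_changes)
```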
Intriguingly, the degree to which 5'-tailed mirtrons are elevated tracks almost exactly with the abundances of tRFs. The exception in this situation is lung cancer, where mirtrons were enriched to the same degree as tRFs. Even with this outlier, the correlation between 5'-tailed mirtrons and tRFs was clear (Fig 3C). These results suggest a compelling correlation between changes in tRNA processing and 5'-tailed mirtrons, consistent with a shared biogenesis mechanism.
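The correlation in Fig 3C pairs, for each cancer type, the change in 5'-tailed mirtron abundance with the change in tRF abundance. A Pearson coefficient over such paired values can be sketched as follows; the choice of Pearson rather than another statistic is an assumption, as is the input format (one relative-difference value per cancer type for each RNA class).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

On Python 3.10 and later, statistics.correlation computes the same quantity.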

DISCUSSION
In this study, we identified how a large class of human non-canonical miRNAs, 5'-tailed mirtrons, are processed (Fig 4) [14,35]. This class of miRNA exhibits distinct features such as high G content on 5' arms that appears to complement 3' splice site-adjacent ppy tracts. Prior reports on mirtron features have likewise noted high GC content [15,36]. Their accumulation appears to be strongly antagonized by the activity of 5'-3' exoribonucleases, such that when this enzymatic activity is genetically depleted, expression of 5'-tailed mirtrons increases ~4-fold. Moreover, following XRN1/2 knockdown, 5 times as many mirtron species become detectable. Linked to inhibition of 5'-3' processing, we also observe extreme 5' arm read heterogeneity. Together these data suggest an antagonistic role for 5' turnover pathways in 5'-tailed mirtron biogenesis. Consistent with this, insertion of a 12G tract into a high-confidence human 5'-tailed mirtron, hsa-miR-5010, led to a nearly 4-fold increase in mature RNA expression. This is the opposite of what was observed when testing sequence requirements for miR-1017, a 3'-tailed mirtron encoded in the Drosophila genome. For miR-1017, exoribonuclease activity of the RNA exosome was essential for 3' tail removal. Interestingly, when we performed a similar test with a mammalian 3'-tailed mirtron, mmu-miR-668, a similar result was found. Thus, both vertebrate and invertebrate 3'-tailed mirtrons likely share a biogenesis mechanism: exoribonuclease processing. These results also suggest that for mammalian 5'-tailed mirtrons, tail removal is likely carried out by an endoribonuclease. Together they indicate that a fundamentally different biogenesis mechanism underlies 5'-tailed and 3'-tailed mirtrons.

In our studies we implicate the activity of RNaseP in the processing of 5'-tailed mirtrons.
This ribozyme-containing complex has a role in the maturation of tRNAs by removing 5' leader sequences from precursors. However, studies have found RNaseP to interact with and cleave a variety of additional transcripts such as pre-rRNA and snoRNAs, along with other nuclear RNAs such as the HRA1 antisense RNA [37][38][39]. Thus, it is not unreasonable to expect this complex to moonlight with mirtron primary intron transcripts. Moreover, the binding affinity of RNaseP is biased toward nucleotide homopolymers, with greatest preference for polyG, a defining characteristic of 5'-tailed mirtrons. This, together with the tRNA-like single-stranded to double-stranded structure of 5'-tailed mirtrons, further reinforces confidence in RNaseP as the likely 5'-tail "severing" enzyme.

One unclear aspect of 5'-tailed mirtron biology is their hyperabundance in mammalian genomes. RNaseP is a universal enzyme that is found in all cells and would be able to participate in 5'-tailed mirtron biogenesis. Likewise, as 5'-tailed mirtrons appear to arise from accumulation of polyG tracts adjacent to the universal eukaryotic ppy tract splicing element, they should arise at the same rate across many kingdoms [40]. What then leads to their radical accumulation in mammalian genomes? The answer is likely related to the extreme intron length seen in mammals and their more elaborate alternative splicing patterns [41][42][43]. We propose two possible scenarios. First, due to the massive length of mammalian introns, exact placement of strong splicing elements such as the ppy tract is highly favored. A polyG tract could then form through neutral evolutionary processes to the point that a viable pre-mirtron hairpin arises, which then becomes subject to negative selection. However, there is significant constraint on the ppy tract, such that 5'-tailed mirtron 3p arms are unable to acquire hairpin-disrupting mutations.
Harboring the 5'-tailed mirtron is preferable to degrading the strength of the ppy tract. In the second scenario, the 5' arm polyG sequence has a role in modulating splicing as a cis-element. Here, by forming a hairpin that occludes the ppy tract, it might lead to alternative 3' splice site choice, thereby contributing to the greater splicing complexity favored by mammalian gene expression programs. The G-quartets could also be cis-elements that influence alternative splicing through recruitment of hnRNPF [44]. In this second scenario, 5'-tailed mirtron expression is an unintended outcome of a different, favorable arrangement. Considering this case, perhaps 5'-tailed mirtrons can be written off as little more than genomic detritus. They may shape genomes by disfavoring polyG tracts near splice-adjacent ppy elements, and thereby apply constraints on ppy neighborhoods. However, the most interesting aspect of 5'-tailed mirtron function may lie outside normal cell physiology. 5'-tailed mirtrons are potential regulatory molecules that are diverted from Ago loading by multiple mechanisms, such as XRN activity and inhibitory nucleotide addition, and these mechanisms could become corrupted to serve the subversive genetics of cancer cells. Indeed, we find a trend towards increased 5'-tailed mirtron expression in cancer-derived small RNA datasets. A similar, and possibly mechanistically related, situation is seen with tRFs, which are also amplified in cancer cells. Further, components of RNaseP are upregulated in nearly all cancer types [45,46]. Both types of non-canonical miRNAs are excellent candidates for exploitation by cancer cells to induce survival-promoting gene expression changes; as excreted RNAs, they could also be used to modulate the function of stromal cells within the tumor microenvironment.

MATERIALS AND METHODS
Cell Culture, cloning and transfection. Rpp30 (10 nM) were transfected into HEK 293 cells using Lipofectamine 3000 reagent according to the manufacturer's instructions (Invitrogen). Target sequences are available on www.idtdna.com. Two days after transfection, total RNA was extracted and used for qRT-PCR or RNA sequencing (Sup Table 1).

To evaluate functionality of mirtron constructs, a dual reporter luciferase assay was used. Briefly, the hsa-miR-5010 hairpin was cloned downstream of Renilla into the psiCHECK2 vector (Sup Table 1

Synthetic full-length hsa-miR-5010 intron RNA was generated by in vitro transcription with the MEGAscript® T7 Kit (Invitrogen) from a full-length intron PCR product amplified with a T7 promoter-encoding forward primer (Sup Table 1). RNAs were radiolabeled by addition of [α-32P]UTP to the reaction. using a magnetic stand. The enzyme assay reaction was performed at 37 °C for 15 minutes in incubation buffer (5% PEG, 10 mM MgCl2, 50 mM Tris-HCl, 100 mM NH4Cl) by incubating RNAs with RNaseP-bound or control IgG beads. Beads were pelleted, the supernatant was loaded on an 8% urea-acrylamide gel, and the gel was exposed to a phosphor imaging plate (Fujifilm) and read on a Typhoon FLA 7000. In a separate run, beads were subjected to organic extraction followed by RT-PCR of the RNaseP RNA subunit.

Public data from NCBI were used to analyze 5'-tailed mirtron and tRF expression (Sup Table 2). Cancer-normal pairs were from the same studies. Alignment and read counting were performed as described above. tRF coordinates for hg19 were obtained from http://gtrnadb.ucsc.edu. Counts were normalized to reads per million mapped reads (RPM) in the corresponding library. The relative difference was then calculated as (cancer - norm)/(cancer + norm).

processed by RNaseP and produce miRNAs or undergo turnover by XRN exoribonuclease activity and will not be targeted by RNaseP.
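The RPM normalization and relative-difference steps in the public-data analysis are short enough to state directly. This sketch assumes raw counts and total mapped reads per library are already available; the function names are illustrative, and treating a feature absent from both libraries as zero change is an assumption.

```python
def rpm(count, total_mapped_reads):
    """Reads per million mapped reads for one feature in one library."""
    if total_mapped_reads <= 0:
        raise ValueError("library must have mapped reads")
    return count * 1_000_000 / total_mapped_reads

def relative_difference(cancer, norm):
    """(cancer - norm) / (cancer + norm): +1 if only in cancer, -1 if only in normal, 0 if equal."""
    total = cancer + norm
    if total == 0:
        return 0.0  # absent from both libraries; treated as no change (assumption)
    return (cancer - norm) / total
```

Because the index is bounded between -1 and 1, it places cancer-normal differences from different studies on a common scale.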