Summary
One of the biggest surprises since the sequencing of the human genome has been the discovery of thousands of long noncoding RNAs (lncRNAs)1–6. Although lncRNAs and mRNAs are similar in many ways, they differ in subcellular localization: lncRNAs are more nuclear-enriched, and in several cases exclusively nuclear7,8. Yet the RNA sequences that determine nuclear localization remain poorly understood9–11. To systematically dissect the lncRNA sequences that impart nuclear localization, we developed a massively parallel reporter assay (MPRA). Previous MPRAs12–15 identified motifs important for transcriptional regulation; we instead adapted this approach to identify sequences sufficient for RNA nuclear enrichment across 38 human lncRNAs. Using this approach, we identified 109 unique, conserved nuclear enrichment regions originating from 29 distinct lncRNAs. We also discovered two shorter motifs within these nuclear enrichment regions. We further used single-molecule RNA fluorescence in situ hybridization (smRNA-FISH) to validate that several of these regions are sufficient to impart nuclear localization. Taken together, these results provide a first systematic insight into the sequence elements responsible for the nuclear enrichment of lncRNA molecules.