The identification and functional annotation of RNA structures conserved in vertebrates

  1. Jan Gorodkin1,2
  1. 1Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, DK-1870 Frederiksberg, Denmark;
  2. 2Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, DK-1870 Frederiksberg, Denmark;
  3. 3Copenhagen Diabetes Research Center (CPH-DIRECT), Herlev University Hospital, DK-2730 Herlev, Denmark;
  4. 4Department of Cellular and Molecular Medicine (ICMM), Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark;
  5. 5Department of Obesity Biology and Department of Molecular Genetics, Novo Nordisk A/S, DK-2880 Bagsværd, Denmark;
  6. 6Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark;
  7. 7Allen Institute for Brain Science, Seattle, Washington 98109, USA;
  8. 8School of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA;
  9. 9Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
  1. 10 These authors contributed equally to this work.

  • 11 Present address: Microbial Biotechnology and Biorefining, National Food Institute, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark

  • Corresponding author: gorodkin{at}rth.dk
  • Abstract

    Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human–mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3′ ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.208652.116.

    • Freely available online through the Genome Research Open Access option.

    • Received April 19, 2016.
    • Accepted May 4, 2017.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

    | Table of Contents
    OPEN ACCESS ARTICLE

    Preprint Server