Annotating RNA motifs in sequences and alignments

Paul P. Gardner, Hisham Eldai
doi: https://doi.org/10.1101/011197
Paul P. Gardner
1School of Biological Sciences, University of Canterbury, Christchurch, New Zealand.
2Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand.
  • For correspondence: paul.gardner@canterbury.ac.nz
Hisham Eldai
1School of Biological Sciences, University of Canterbury, Christchurch, New Zealand.

Abstract

RNA performs a diverse array of important functions across all cellular life. These functions include roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment, and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of an RNA and the determination of its function. We present a bioinformatic approach to characterise RNA motifs, which are the central building blocks of RNA structure. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterised RNAs. Moreover, we introduce a new profile-based database of RNA motifs, RMfam, and illustrate its application for investigating the evolution and functional characterisation of RNA.

All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam

1. INTRODUCTION

Characterising functional RNAs is an extraordinarily difficult task. Even highly transcribed RNAs from model organisms have remained uncharacterised for decades after their discovery. A specific example is the 6S sRNA, which was discovered in 1971. The 6S sRNA is conserved across Bacteria and is highly expressed in stationary-phase cells [1, 2]. But the role of 6S as a regulator of RNA polymerase remained an enigma for almost three decades [3]. Likewise, Y RNA, which was discovered in 1981, is broadly conserved across metazoans and is highly expressed [4]. It took two and a half decades before Y RNAs were shown to be essential for the initiation of DNA replication [5]. However, the mechanism for Y RNA function still remains unclear. These and similar examples show that it is remarkably difficult to functionally characterise RNAs, even after decades of work.

A new generation of tools for RNA discovery is now available thanks to powerful new sequencing technologies. Entire transcriptomes from species at different life stages, tissue types and conditions can be studied with RNA-seq [6, 7, 8]. The total complement of RNA structures encoded in transcriptomes is also accessible with SHAPE-seq [9], and functional regions of entire bacterial genomes can be probed with techniques like TraDIS and Tn-seq [10, 11]. The data obtained by these tools are unearthing novel RNAs at an unprecedented rate, many of which are evolutionarily conserved, highly expressed, activated under specific conditions, essential and fold into conserved secondary structures. Annotation efforts such as those by the Rfam consortium [12, 13] are useful. However, many RNAs are not found in this database and many that have been curated remain uncharacterised [8]. To make sense of the volumes of transcriptome data that are now being generated, annotating these data and functionally characterising the cohort of RNAs of Unknown Function (RUFs) is critical. A complication for such work is that evolutionary turnover and sequence variation can both be high for ncRNAs [14, 15]. Consequently, homology searches and other sequence-alignment based analyses can be very challenging.

Many RNAs contain functional structures that recur both within and across different RNA families. These motifs provide signatures that can identify functional components of RNA sequences. The motifs that have been characterised to date are involved in a diverse range of functions. These include imparting structural stability, facilitating interactions with other biomolecules, specifying cellular localisation and coordinating gene regulatory signals [16, 17, 18, 19].

A number of publications detail bioinformatic methods for the de novo discovery of RNA motifs from RNA primary sequences [20, 21]. There are also tools that can screen RNA secondary structures [22] and RNA tertiary structures [23]. The de scito (knowledge-based) approaches for the annotation of RNA motifs include sequence and structure descriptors [24, 25], primary and secondary structure-based profile methods for specific motifs [26, 27] and even methods that combine primary, secondary and tertiary data [19]. We complement these approaches by introducing a resource that identifies a range of previously characterised RNA motifs in RNA sequences and alignments using covariance models (CMs) [28, 29, 30, 31, 32].

We present 34 alignments, consensus structures and corresponding probabilistic models of published RNA motifs. We call this resource RMfam, or RNA Motif Families (all associated data and computer code are freely available from our repository hosted on GitHub: http://github.com/ppgardne/RMfam). These have been used to predict approximately 1,900 conserved motifs in the Rfam (v11.0) alignments of RNA families, many of which are confirmed in the published literature. Finally, we show examples of the applicability of our approach for studying RNA function, evolution and alignment curation.

2. MATERIALS & METHODS

2.1 Distinction between Rfam and RMfam

The Rfam database collects and curates “seed alignments” of RNA families. These are non-coding RNAs, cis-regulatory elements and self-splicing introns. The alignments are manually constructed and annotated with consensus secondary structures, and used to seed probabilities for covariance models (CMs) for each family. The Rfam CMs are widely used for genome annotation projects to identify RNA loci (e.g. [33]). A requirement before each family can pass Rfam quality-control is that it is specific. In other words, there exists a bit score threshold for each CM that distinguishes between sequence matches that are related to the family and obvious false-positive matches. Consequently, many RNA motifs are not included in Rfam as they lack the required specificity [34, 35, 36, 12, 13].

2.2 What is an RNA motif?

For the purposes of this work an RNA motif is a non-trivial, recurring RNA sequence and/or secondary structure that can be predominantly described by local sequence and secondary structure elements. The motif is generally not restricted to a particular family or taxonomic group. Note that in other contexts, such as structural biology, a more general definition of motif is frequently used, e.g. [37].

Accurate probabilistic methods for annotating structured RNAs in DNA sequences, hidden Markov models (HMMs) and covariance models (CMs), are now available [28, 29, 30, 31, 32, 38]. From a given alignment, probabilistic models of conserved sequence (HMMs) and conserved sequence plus secondary structure (CMs) can be built and used to filter large numbers of sequences for candidate homologous and/or analogous regions [39]. CMs cater to the characteristics of RNA sequence evolution that are imposed by basepairing (i.e. variation tends to preserve basepairing). As a result, the accuracy of CMs is greater than that of alternative approaches [40]. The computational speed of CMs has tended to be poor; however, considerable effort has been expended on improving the speed of the approach while maintaining its accuracy. Improvements include using HMMs as pre-filters to accelerate CMs, query-dependent banding and Dirichlet mixture priors [41, 39, 42, 38, 43].
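The point that variation tends to preserve basepairing can be illustrated with a toy example. The sketch below is not part of RMfam or Infernal: the alignment is invented and `pair_column_summary` is a hypothetical helper. It shows a pair of columns where every single column is highly variable, yet every sequence still forms a canonical pair, which is the kind of signal a CM can model and a sequence-only HMM cannot.

```python
# Toy illustration only: invented sequences and a hypothetical helper function.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),  # wobble pairs are counted as canonical here
}


def pair_column_summary(alignment, i, j):
    """Summarise columns i and j (0-based) of an ungapped toy alignment."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    n_paired = sum(pair in CANONICAL_PAIRS for pair in pairs)
    return {
        "fraction_paired": n_paired / len(pairs),      # how often i and j can pair
        "distinct_pairs": len(set(pairs)),             # how many different pairs occur
        "distinct_in_column_i": len({p[0] for p in pairs}),
    }


# A small hairpin: columns 0 and 8 pair, columns 3-5 form the loop.
toy_alignment = [
    "GGGAAACCC",
    "AGGAAACCU",
    "CGGAAACCG",
    "UGGAAACCA",
]

stem = pair_column_summary(toy_alignment, 0, 8)
loop = pair_column_summary(toy_alignment, 3, 4)
```

In this toy case column 0 contains all four nucleotides, so it carries no sequence conservation, but columns 0 and 8 always pair. The loop columns are perfectly conserved in sequence and never pair.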

Figure 1.

In the above plots we assess the accuracy of motif annotation and test whether annotating alignments instead of sequences improves the prediction accuracy. We have applied three different benchmarks (described in the text). In sub-figure A we show a ROC plot for pooled RMfam annotations. This plots the sensitivity versus specificity of all the motif annotations on sequences or alignments at different score thresholds. The 'x' symbols mark where on each curve the maximum Matthews correlation coefficient (MCC) is located, and the corresponding bit scores are indicated. In sub-figure B we illustrate the maximum MCC of CM annotation for each motif from the three different benchmarks.

RMfam sequences, structures and alignments were collated from a variety of heterogeneous and sometimes overlapping data repositories [12, 23, 44, 27, 45, 46, 47, 48, 37, 49, 50, 51]. Where possible we sourced data from publicly accessible RNA motif resources, including the FR3D MotifLibrary [37], the models supplied with RMDetect [19], the comparative RNA website [47] and SCOR [46]. We also used information from specialised resources such as the k-turn structural database [44] and SRPDB [52], and generated our own alignments for motifs such as the Shine-Dalgarno and Rho-independent terminators based upon the context of genome annotations (e.g. [27]). RNAFrabase was frequently used as a source of RNA secondary structure annotations derived from PDB structures [53, 54]. Finally, where necessary, we extracted sequences from publications. This was often a manual effort that involved transcribing sequences and structures from figures in published manuscripts. Where possible, these were mapped to PDB (downloaded June 2014) nucleotide sequences [55, 56, 57], the EMBL nucleotide archive [58] and Rfam (v11.0) [12, 13]. The provenance of each dataset is stored in the corresponding Stockholm alignment. Each of these motifs was then passed through quality-control steps, in which the sensitivity and specificity of the resulting motif were assessed (see Figures 1 and S10-S43). If a motif failed these steps (e.g. the CM could not identify member sequences or the false-positive rate was extremely high), it was not included in the database. Each motif is also assigned a curated score threshold. This threshold (in bits) provides a reasonable distinction between true and false matches.

2.3 Benchmarking motif annotations

In the following we briefly describe the benchmarks we have used to evaluate our motif annotations. These are described in further detail and with more elaborate results in the Supplementary Materials.

In order to determine the accuracy of our approach we ran a series of three benchmarks. These were evaluated on individual motifs (see Figures 1B and S10-S43), as well as on the collective RMfam results (see Figures 1A and S9). The first uses “RMfam sequences” which are taken from the seed alignments. Ten shuffled sequences, with identical dinucleotide distributions, were generated for each RMfam seed sequence [59]. Together these serve as positive and negative controls for our test.
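The negative controls above depend on shuffling a sequence while keeping its dinucleotide counts exactly unchanged. The following is a minimal sketch of one standard way to do this, an Euler-tour shuffle in the style of Altschul and Erickson. It is an assumption that this resembles the procedure cited as [59]; the function names and the example sequence are hypothetical and are not taken from the RMfam repository.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides as (first, second) tuples."""
    return Counter(zip(seq, seq[1:]))


def _leads_to_last(vertex, last_edge, last):
    """Follow the chosen final edges from `vertex`; True if they reach `last` without a cycle."""
    seen = set()
    while vertex != last:
        if vertex in seen:
            return False
        seen.add(vertex)
        vertex = last_edge[vertex]
    return True


def dinucleotide_shuffle(seq, rng):
    """Return a shuffled copy of `seq` with identical dinucleotide counts."""
    if len(seq) < 3:
        return seq
    # Each adjacent pair (a, b) is a directed edge a -> b in a multigraph.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    last = seq[-1]
    # Pick, for every symbol except the final one, the edge it will use last.
    # Resample until these edges form a tree leading to the final symbol,
    # which guarantees the walk below uses every edge exactly once.
    while True:
        last_edge = {v: rng.choice(nxt) for v, nxt in successors.items() if v != last}
        if all(_leads_to_last(v, last_edge, last) for v in last_edge):
            break
    # Shuffle the remaining edges of each symbol and put the chosen edge at the end.
    order = {}
    for v, nxt in successors.items():
        rest = list(nxt)
        if v != last:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v != last:
            rest.append(last_edge[v])
        order[v] = rest
    # Walk the edges in the new order, starting from the original first symbol.
    used = defaultdict(int)
    out = [seq[0]]
    for _ in range(len(seq) - 1):
        cur = out[-1]
        out.append(order[cur][used[cur]])
        used[cur] += 1
    return "".join(out)


# Ten shuffled negative controls for one (invented) native sequence.
rng = random.Random(42)
native = "GGGAAACCCUUAGCAUCGAUCGGAUUACG"
controls = [dinucleotide_shuffle(native, rng) for _ in range(10)]
```

A plain mononucleotide shuffle would change the dinucleotide composition, which can bias structure-aware scores. That is why the controls preserve dinucleotide counts.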

We constructed two further tests based upon Rfam (v11.0) families. We identified Rfam families where there exists good evidence (primarily based upon literature) that a motif is conserved in the family of related sequences (Supplementary Table 1). These serve as positive controls for two further tests. For the “Rfam sequences” benchmark we randomly selected at least five sequences from each Rfam seed alignment (if fewer than five sequences were available, then all were included). We generated ten shuffled versions of each sequence; all had an identical di-nucleotide distribution to the native sequence. These sequences were all annotated with RMfam motifs, and their CM scores were recorded and used to evaluate the accuracy of the annotations. Finally, for an “Rfam alignments” benchmark, we evaluated the accuracy of RMfam annotations in an alignment context. Each Rfam alignment was filtered, removing sequences more than 90% identical. The remaining sequences were annotated with RMfam CMs, retaining only those that cover more than 10% of the seed sequences and more than two Rfam seed sequences. The summary statistic we use for this final benchmark is a “sum-bits” score, which is the sum of the bit scores for each match in all the sequences in a seed.
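The “sum-bits” statistic and its two retention rules can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the hit tuples, motif names and the function `sum_bits` are hypothetical, and the input is assumed to have already been filtered to remove sequences more than 90% identical.

```python
from collections import defaultdict


def sum_bits(hits, n_seqs, min_fraction=0.10, min_seqs=2):
    """Sum bit scores per motif over one seed alignment.

    hits   -- iterable of (sequence_id, motif_id, bit_score) tuples
    n_seqs -- number of sequences left in the seed after redundancy filtering

    A motif is kept only if it matches MORE than `min_seqs` sequences and
    MORE than `min_fraction` of the seed sequences, as described in the text.
    """
    total = defaultdict(float)
    matched = defaultdict(set)
    for seq_id, motif, bits in hits:
        total[motif] += bits          # every match contributes its bit score
        matched[motif].add(seq_id)    # coverage counts each sequence once
    return {
        motif: total[motif]
        for motif in total
        if len(matched[motif]) > min_seqs
        and len(matched[motif]) / n_seqs > min_fraction
    }


# Invented example: ten seed sequences and two candidate motifs.
example_hits = [
    ("seq1", "GNRA", 10.0),
    ("seq2", "GNRA", 12.5),
    ("seq3", "GNRA", 8.0),
    ("seq1", "k-turn", 15.0),
    ("seq4", "k-turn", 14.0),
]
scores = sum_bits(example_hits, n_seqs=10)
```

In this invented example the GNRA motif is retained with a sum-bits score of 30.5, while the k-turn is dropped because it matches only two sequences.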

The accuracy metrics that we report here are the Matthews correlation coefficient (MCC) [60], sensitivity and specificity.
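For reference, these three metrics can be computed from bit scores of positive and negative control sequences at a given threshold, and the threshold that maximises the MCC can be found by scanning the observed scores. The sketch below uses the standard definitions. The function names and scores are hypothetical, and the curated RMfam thresholds were not necessarily chosen this way.

```python
import math


def confusion(pos_scores, neg_scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called matches."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    fn = len(pos_scores) - tp
    tn = len(neg_scores) - fp
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC); undefined ratios are reported as 0.0."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


def best_threshold(pos_scores, neg_scores):
    """Return the observed score that gives the highest MCC (lowest such score on ties)."""
    candidates = sorted(set(pos_scores) | set(neg_scores))
    return max(candidates, key=lambda t: metrics(*confusion(pos_scores, neg_scores, t))[2])


# Invented bit scores for native (positive) and shuffled (negative) sequences.
positives = [20.0, 18.0, 15.0]
negatives = [10.0, 8.0, 5.0]
threshold = best_threshold(positives, negatives)
```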

The CMs built from RNA motifs tend to be short and contain little sequence information. In RMfam the mean sequence length is just 34.3 nucleotides and the mean number of basepairs is 10.9. Therefore, scans of large sequence databases with these models result in a number of false positives. We propose that annotating sequence alignments of ncRNAs has the potential to improve the specificity of our predictions. This assumes that evolutionarily conserved motifs are more likely to be correct. In theory this approach could be extended to genome alignments of e.g. transcribed regions.

3. RESULTS

In this study we present 34 RMfam alignments and probabilistic models of published RNA motifs (all freely available from our repository hosted on GitHub: http://github.com/ppgardne/RMfam). These have been used to predict approximately 2,500 conserved motifs in the Rfam (v11.0) seed alignments, many of which are confirmed in the published literature. Furthermore, our permutation tests have shown that both the sensitivity and specificity of this approach are remarkably high given the short motifs we use (see Figures 1 and S9-S44).

3.1 Function

One of the most labour intensive stages of RNA research is identifying the function of newly discovered RNAs. In order to illustrate the utility of RMfam for this task we show the matches between a model of the CsrA-binding site and two RNA families of unknown function, TwoAYGGAY and Bacillaceae-1 (Rfam IDs RF01731 and RF01690, see Figure 2). CsrA is a bacterial RNA binding protein that regulates the translation and stability of mRNAs [62]. It binds mRNAs carrying CsrA binding motifs, physically occluding ribosome-binding sites. This binding can itself be regulated by competition between the mRNAs and highly expressed sRNAs that host numerous CsrA binding sites. However, this class of sRNA (CsrB, CsrC, RsmX, RsmY and RsmZ) has only been identified in Gammaproteobacteria [63, 64]. The ncRNAs, TwoAYGGAY and Bacillaceae-1, were initially discovered in a large-scale bioinformatic screen [65]. Some further analysis identified two tandem-GAs in one of the stems that characterise the structure of TwoAYGGAY [19]. The matches between these families and the CsrA binding motif were discovered in this work and provide a testable hypothesis for further validation that there are also CsrA binding sRNAs in Clostridia (TwoAYGGAY), and Bacillales and Lactobacillales (Bacillaceae-1). The validation of these predictions is a work in progress with our collaborators.

Figure 2.

The secondary structures and sequence conservation of the CsrA binding motif and two new candidate CsrA binding sRNAs, TwoAYGGAY and Bacillaceae-1, illustrated with R2R [61]. These families each have two strong matches to the CsrA-binding motif. This new evidence provides a strong case that these RNAs regulate the activity of the regulatory protein, CsrA, by sequestering this nucleotide-binding protein. The “core” of the TwoAYGGAY structure is shown; the Rfam (v11.0) model contains a further external stem that is not well conserved. Also, the reverse-complement (RevComp) of Bacillaceae-1 is illustrated. This strand has the matches to the CsrA binding motif, and the original discoverers of this ncRNA are not confident of the strand (personal communication, Weinberg Z).

3.2 Evolution

Non-coding RNAs are remarkably tolerant of genetic variation, as is evident from the wide degree of sequence variation that can be found between evolutionarily related ncRNAs [66, 67, 68, 15]. However, structure frequently constrains the evolution of RNA sequences. That said, structures can also be dynamic. For example, motifs that confer structural stability can be exchanged over time, resulting in a rich and complex evolutionary history. Studying the gain and loss of RNA motifs over evolutionary time-scales can therefore help characterise the dynamic evolution of RNA sequences and structures.

Figure 3.

The Lysine riboswitch has substituted different motifs through its evolution. On the left is a representation of the consensus Lysine riboswitch secondary structure [61]. This has been annotated with the motifs that RMfam most frequently annotates in the Lysine Rfam (v11.0) seed alignment; the percentage of seed sequences hosting each motif is also indicated. On the right is an annotated species taxonomy that illustrates the phylogenetic nature of the motif distributions. We have also annotated each tip with the motifs hosted in the P2 and P4 stems. The red, blue, green, black and yellow boxes illustrate kink-turn, U-turn, sarcin-ricin loop, GNRA tetraloop and the T-loop, respectively.

A good example of this is the Lysine riboswitch, which we describe in further detail for illustrative purposes. As illustrated in Figure 3, many motifs may be exchanged, e.g. the U-turn motif with a k-turn in the P2 stem or the T-loop and the GNRA tetraloop in stem P4. Interestingly, the motif distributions are relatively clade-like, with closely related riboswitches more likely to share motifs, e.g. the GNRA tetraloop found in the Pasteurellales and Vibrionales taxonomic groups. This type of annotation information is valuable for researchers investigating the structure and evolution of RNA families.

3.3 Curation

Another use of the results presented in this work is of importance for the curators of RNA alignments and sequences [12, 69, 70]. Until now it has been difficult to analyse the evolutionary conservation of motifs in the context of an alignment, although some progress has been made when crystallographic data is available, e.g. the RNASTAR collection of structural RNA alignments [70]. With the help of RMfam, malformed alignments can be detected and corrected where conserved RNA motifs are incorrectly aligned. We illustrate an example of this for the Rfam (v11.0) 5S rRNA alignment that contains a misaligned, yet highly conserved sarcin-ricin motif (see Figure S45), and for the Rfam RsmY alignment, which is a CsrA binding sRNA. The RsmY alignment has a mis-annotated consensus structure that does not include a further CsrA binding motif (see Figure S46). These motifs generally occur in pairs, as CsrA is a homodimeric protein, with each half of the protein binding a motif [71, 72].

4. DISCUSSION & CONCLUSION

The chief motivation for this work is to functionally characterise novel ncRNAs. Our vision for the RMfam resource is to annotate RNAs of unknown function (e.g. [8]). These motif annotations will help develop further functional hypotheses and accelerate experimental characterisation.

In this work, we have shown that RMfam is surprisingly accurate. Despite the fact that the average RMfam motif consists of just 34.3 nucleotides and 10.9 basepairs, we show that the covariance models are specific enough to distinguish between motif-hosting sequences and negative control sequences (See Figures 1 and S10-S43). Our approach shows improved performance when evolutionary information encoded in Rfam sequence alignments is incorporated into the predictions. We hypothesise that annotated genome alignments may be a useful source of motifs and we will investigate this idea further in future. As a discovery tool this resource has already made some useful predictions. We have predicted the existence of two new CsrA binding ncRNAs, potentially the first of this class of regulatory molecules to be found outside of the Gammaproteobacteria. However, further work needs to be carried out to validate this claim.

4.1 Future work and potential applications

We have identified some future developments and applications for the RMfam resource. We plan to continue developing the accuracy of the motif annotation tools as well as increase the access to RMfam annotations via other databases and expand the number of motifs included in RMfam. Furthermore, it may be possible to boost the accuracy of RNA secondary structure prediction tools by constraining these with predicted motifs. We elaborate further on these ideas below.

The Lysine riboswitch example raises the possibility that certain types of motif are preferentially exchanged during the evolution of ncRNAs. Do stable hairpin motifs such as the GNRA and T-loops replace each other more frequently than we expect by chance? This would blur the lines between our understanding of homologous and analogous structures [73]. Another possibility is that certain motifs co-occur more frequently than we expect. For example, are k-turns more frequently closed by U-turns than we expect? If correct, these enrichments of favoured exchanges and co-occurrences could be used to increase our confidence in motif annotations and could assist with the design of functional RNAs.

Typical RNA structure prediction methods do not incorporate information about RNA motifs. We propose that RMfam predictions can be used as constraints for existing RNA structure prediction software, thus improving the accuracy of structure prediction tools, which can often be inaccurate [74]. This approach is analogous to the fragment-library approach that is frequently used for tertiary structure prediction [75].

Another application for RMfam covariance models is as a pre-filter to accelerate more complex methods, for example, the Bayesian network approach implemented in RMDetect [19].

Increasing the access of motif annotations is another goal of the authors. We are active in the Rfam and RNAcentral consortia, both of which curate non-coding RNAs, the former ncRNA alignments and the latter ncRNA sequences [12, 69, 13]. Our results show that curators can benefit greatly from motif annotations (see Figures S44-S45) and it is likely that RMfam annotations will be incorporated into these databases in future releases.

New technologies such as the sequencing of cross-linked RNA and protein are a potential source of new RNA-protein motifs. In the future we will mine these datasets [76, 77, 78] for new additions to the RMfam database. Furthermore, we will continue to add new motifs to RMfam as they are published.

Finally, as previously mentioned, the specificity of the RMfam annotations is generally low. However, incorporating the genomic and taxonomic context of annotations into the predictions may result in performance gains. For example, Shine-Dalgarno and rho-independent terminators are generally located in bacterial sequences and at the extremities of annotated genes. A probabilistic incorporation of contextual information will likely result in further performance gains.

In summary, we have developed a resource for annotating diverse sets of RNA motifs in nucleotide sequences and alignments. We have demonstrated its accuracy using benchmarks and its utility for alignment curation and evolutionary analyses, and shown that it has some promise for the prediction of RNA function.

5.0.1 Conflict of interest statement

None declared.

5. ACKNOWLEDGEMENTS

This work would not be possible without input from a large community of RNA researchers that openly share their results. It has benefited from many discussions with members of the Xfam Consortium, the RNA Ontology Consortium and attendees of the 2012 Benasque Meeting on RNA. A special thanks to Lars Barquist, Elena Rivas, Rob Knight, Eric Westhof, Zasha Weinberg, Anthony Poole, Peter Fineran and two anonymous reviewers for their valuable contributions.

PPG is supported by a Rutherford Discovery Fellowship from Government funding, administered by the Royal Society of New Zealand.

References

  1. [1].↵
    G G Brownlee. Sequence of 6S RNA of E. coli. Nat New Biol, 229(5):147–9, Feb 1971.
    OpenUrlCrossRefPubMedWeb of Science
  2. [2].↵
    J E Barrick, N Sudarsan, Z Weinberg, W L Ruzzo, and R R Breaker. 6S RNA is a widespread regulator of eubacterial RNA polymerase that resembles an open promoter. RNA, 11(5):774–84, May 2005.
    OpenUrlAbstract/FREE Full Text
  3. [3].↵
    K M Wassarman and G Storz. 6S RNA regulates E. coli RNA polymerase activity. Cell, 101(6):613–23, Jun 2000.
    OpenUrlCrossRefPubMedWeb of Science
  4. [4].↵
    M R Lerner, J A Boyle, J A Hardin, and J A Steitz. Two novel classes of small ribonucleoproteins detected by antibodies associated with lupus erythematosus. Science, 211(4480):400–2, Jan 1981.
    OpenUrlAbstract/FREE Full Text
  5. [5].↵
    C P Christov, T J Gardiner, D Szüts, and T Krude. Functional requirement of noncoding Y RNAs for human chromosomal DNA replication. Mol Cell Biol, 26(18):6993–7004, Sep 2006.
    OpenUrlAbstract/FREE Full Text
  6. [6].↵
    Z Wang, M Gerstein, and M Snyder. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet, 10(1):57–63, Jan 2009.
    OpenUrlCrossRefPubMedWeb of Science
  7. [7].↵
    T T Perkins, R A Kingsley, M C Fookes, P P Gardner, K D James, L Yu, S A Assefa, M He, N J Croucher, D J Pickard, D J Maskell, J Parkhill, J Choudhary, N R Thomson, and G Dougan. A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet, 5(7):e1000569, Jul 2009.
    OpenUrlCrossRefPubMed
  8. [8].↵
    S. Lindgreen, S. Ugur Umu, A. Sook-Wei Lai, H. Eldai, W. Liu, S. McGimpsey, N. Wheeler, P. J. Biggs, N. R. Thomson, L. Barquist, A. M. Poole, and P. P. Gardner. Robust identification of noncoding RNA from transcriptomes requires phylogenetically-informed sampling. ArXiv e-prints, June 2014.
  9. [9].↵
    E Westhof and P Romby. The RNA structurome: high-throughput probing. Nat Methods, 7(12):965–7, Dec 2010.
    OpenUrlCrossRefPubMedWeb of Science
  10. [10].↵
    L Barquist, G C Langridge, D J Turner, M D Phan, A K Turner, A Bateman, J Parkhill, J Wain, and P P Gardner. A comparison of dense transposon insertion libraries in the Salmonella serovars Typhi and Typhimurium. Nucleic Acids Res, 41(8):4549–64, Apr 2013.
    OpenUrlCrossRefPubMedWeb of Science
  11. [11].↵
    L Barquist, C J Boinett, and A K Cain. Approaches to querying bacterial genomes with transposon-insertion sequencing. RNA Biol, 10(7):1161–9, Jul 2013.
    OpenUrlCrossRefPubMedWeb of Science
  12. [12].↵
    P P Gardner, J Daub, J Tate, B L Moore, I H Osuch, S Griffiths-Jones, R D Finn, E P Nawrocki, D L Kolbe, S R Eddy, and A Bateman. Rfam: Wikipedia, clans and the decimal release. Nucleic Acids Res, 39(Database issue):D141–5, Jan 2011.
    OpenUrlCrossRefPubMedWeb of Science
  13. [13].↵
    S W Burge, J Daub, R Eberhardt, J Tate, L Barquist, E P Nawrocki, S R Eddy, P P Gardner, and A Bateman. Rfam 11.0: 10 years of RNA families. Nucleic Acids Res, 41(Database issue):D226–32, Jan 2013.
    OpenUrlCrossRefPubMedWeb of Science
  14. [14].↵
    S R Eddy. Non-coding rna genes and the modern rna world. Nat Rev Genet, 2(12):919–29, Dec 2001.
    OpenUrlCrossRefPubMedWeb of Science
  15. [15].↵
    M P Hoeppner, P P Gardner, and A M Poole. Comparative analysis of rna families reveals distinct repertoires for each domain of life. PLoS Comput Biol, 8(11):e1002752, Nov 2012.
    OpenUrlCrossRefPubMed
  16. [16].↵
    D Gautheret, D Konings, and R R Gutell. A major family of motifs involving G.A mismatches in ribosomal RNA. J Mol Biol, 242(1):1–8, Sep 1994.
    OpenUrlCrossRefPubMedWeb of Science
  17. [17].↵
    R R Gutell, J J Cannone, D Konings, and D Gautheret. Predicting U-turns in ribosomal RNA with comparative sequence analysis. J Mol Biol, 300(4):791–803, Jul 2000.
    OpenUrlCrossRefPubMedWeb of Science
  18. [18].↵
    F Jossinet and E Westhof. Sequence to Structure (S2S): display, manipulate and interconnect RNA data from sequence to structure. Bioinformatics, 21(15):3320–1, Aug 2005.
    OpenUrlCrossRefPubMedWeb of Science
  19. [19].↵
    J A Cruz and E Westhof. Sequence-based identification of 3D structural modules in RNA with RMDetect. Nat Methods, 8(6):513–21, Jun 2011.
    OpenUrlCrossRefPubMedWeb of Science
  20. [20].↵
    Z Yao, Z Weinberg, and W L Ruzzo. CMfinder–a covariance model based RNA motif finding algorithm. Bioinformatics, 22(4):445–52, Feb 2006.
    OpenUrlCrossRefPubMedWeb of Science
  21. [21].↵
    J Gorodkin, S L Stricklin, and G D Stormo. Discovering common stem-loop motifs in unaligned RNA sequences. Nucleic Acids Res, 29(10):2135–44, May 2001.
    OpenUrlCrossRefPubMedWeb of Science
  22. [22].↵
    M Höchsmann, T Töller, R Giegerich, and S Kurtz. Local similarity in RNA secondary structures. Proc IEEE Comput Soc Bioinform Conf, 2:159–68, 2003.
    OpenUrlPubMed
  23. [23].↵
    M Sarver, C L Zirbel, J Stombaugh, A Mokdad, and N B Leontis. FR3D: finding local and composite recurrent structural motifs in RNA 3D structures. J Math Biol, 56(1-2):215–52, Jan 2008.
    OpenUrlCrossRefPubMedWeb of Science
  24. [24].↵
    SR Eddy. RNABOB: a program to search for RNA secondary structure motifs in sequence databases, 1996.
  25. [25].↵
    T J Macke, D J Ecker, R R Gutell, D Gautheret, D A Case, and R Sampath. Rnamotif, an rna secondary structure definition and search algorithm. Nucleic Acids Res, 29(22):4724–35, Nov 2001.
    OpenUrlCrossRefPubMedWeb of Science
  26. [26].↵
    M Naville, A Ghuillot-Gaudeffroy, A Marchais, and D Gautheret. ARNold: a web tool for the prediction of Rho-independent transcription terminators. RNA Biol, 8(1):11–3, 2011.
    OpenUrlCrossRefPubMedWeb of Science
  27. [27].↵
    P P Gardner, L Barquist, A Bateman, E P Nawrocki, and Z Weinberg. RNIE: genome-wide prediction of bacterial intrinsic terminators. Nucleic Acids Res, 14(39):5845–5852, 2011.
    OpenUrl
  28. [28].↵
    S R Eddy and R Durbin. RNA sequence analysis using covariance models. Nucleic Acids Res, 22(11):2079–88, Jun 1994.
    OpenUrlCrossRefPubMedWeb of Science
  29. [29].↵
    Y Sakakibara, M Brown, R Hughey, I S Mian, K Sjölander, R C Underwood, and D Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res, 22(23):5112–20, Nov 1994.
    OpenUrlCrossRefPubMedWeb of Science
  30. [30].↵
    D. Haussler, A. Krogh, I.S. Mian, and K. Sjolander. Protein modeling using hidden markov models: analysis of globins. In System Sciences, 1993, Proceeding of the Twenty-Sixth Hawaii International Conference on, volume i, pages 792–802, jan 1993.
  31. [31].↵
    A. Krogh. Hidden Markov models for labelled sequences. Proceedings of the 12th IAPR International Conference on Pattern Recognition, 2:140–144, 1994.
    OpenUrl
  32. [32].↵
    R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis. Press, Cambridge U., 1998.
  33. [33].↵
    P. P. Gardner, M. Fasold, Burge S W, Ninova M, Hertel J, Kehr S, Steeves T E, Griffiths-Jones S, and Stadler P F. Conservation and losses non-coding RNA associated loci in avian genomes. ArXiv e-prints, June 2014.
  34. [34].↵
    S Griffiths-Jones, A Bateman, M Marshall, A Khanna, and S R Eddy. Rfam: an RNA family database. Nucleic Acids Res, 31(1):439–41, Jan 2003.
    OpenUrlCrossRefPubMedWeb of Science
  35. [35].↵
    S Griffiths-Jones, S Moxon, M Marshall, A Khanna, S R Eddy, and A Bateman. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res, 33(Database issue):D121–4, Jan 2005.
  36. [36].↵
    P P Gardner, J Daub, J G Tate, E P Nawrocki, D L Kolbe, S Lindgreen, A C Wilkinson, R D Finn, S Griffiths-Jones, S R Eddy, and A Bateman. Rfam: updates to the RNA families database. Nucleic Acids Res, 37(Database issue):D136–40, Jan 2009.
  37. [37].↵
    A I Petrov, C L Zirbel, and N B Leontis. WebFR3D–a server for finding, aligning and analyzing recurrent RNA 3D motifs. Nucleic Acids Res, 39(Web Server issue):W50–5, Jul 2011.
  38. [38].↵
    Z Weinberg and W L Ruzzo. Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics, 22(1):35–9, Jan 2006.
  39. [39].↵
    E P Nawrocki, D L Kolbe, and S R Eddy. Infernal 1.0: inference of RNA alignments. Bioinformatics, 25(10):1335–7, May 2009.
  40. [40].↵
    E K Freyhult, J P Bollback, and P P Gardner. Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res, 17(1):117–125, Jan 2007.
  41. [41].↵
    D L Kolbe and S R Eddy. Fast Filtering for RNA Homology Search. Bioinformatics, Sep 2011.
  42. [42].↵
    E P Nawrocki and S R Eddy. Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol, 3(3):e56, Mar 2007.
  43. [43].↵
    Z Weinberg and W L Ruzzo. Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy. Bioinformatics, 20 Suppl 1:i334–41, Aug 2004.
  44. [44].↵
    K T Schroeder, S A McPhee, J Ouellet, and D M Lilley. A structural database for k-turn motifs in RNA. RNA, 16(8):1463–8, Aug 2010.
  45. [45].↵
    P S Klosterman, D K Hendrix, M Tamura, S R Holbrook, and S E Brenner. Three-dimensional motifs from the SCOR, structural classification of RNA database: extruded strands, base triples, tetraloops and U-turns. Nucleic Acids Res, 32(8):2342–52, 2004.
  46. [46].↵
    M Tamura, D K Hendrix, P S Klosterman, N R Schimmelman, S E Brenner, and S R Holbrook. SCOR: Structural Classification of RNA, version 2.0. Nucleic Acids Res, 32(Database issue):D182–4, Jan 2004.
  47. [47].↵
    J J Cannone, S Subramanian, M N Schnare, J R Collett, L M D’Souza, Y Du, B Feng, N Lin, L V Madabusi, K M Müller, N Pande, Z Shang, N Yu, and R R Gutell. The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics, 3:2, 2002.
  48. [48].↵
    C Zhong and S Zhang. Clustering RNA structural motifs in ribosomal RNAs using secondary structural alignment. Nucleic Acids Res, 40(3):1307–17, Feb 2012.
  49. [49].↵
    I López de Silanes, M Zhan, A Lal, X Yang, and M Gorospe. Identification of a target RNA motif for RNA-binding protein HuR. Proc Natl Acad Sci U S A, 101(9):2987–92, Mar 2004.
  50. [50].↵
    C L Zirbel, J E Sponer, J Sponer, J Stombaugh, and N B Leontis. Classification and energetics of the base-phosphate interactions in RNA. Nucleic Acids Res, 37(15):4898–918, Aug 2009.
  51. [51].↵
    W W Grabow, Z Zhuang, Z N Swank, J E Shea, and L Jaeger. The right angle (RA) motif: a prevalent ribosomal RNA structural pattern found in group I introns. J Mol Biol, 424(1-2):54–67, Nov 2012.
  52. [52].↵
    M A Rosenblad, N Larsen, T Samuelsson, and C Zwieb. Kinship in the SRP RNA family. RNA Biol, 6(5):508–16, 2009.
  53. [53].↵
    M Popenda, M Blazewicz, M Szachniuk, and R W Adamiak. RNA FRABASE version 1.0: an engine with a database to search for the three-dimensional fragments within RNA structures. Nucleic Acids Res, 36(Database issue):D386–91, Jan 2008.
  54. [54].↵
    M Popenda, M Szachniuk, M Blazewicz, S Wasik, E K Burke, J Blazewicz, and R W Adamiak. RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures. BMC Bioinformatics, 11:231, 2010.
  55. [55].↵
    W F Bluhm, B Beran, C Bi, D Dimitropoulos, A Prlic, G B Quinn, P W Rose, C Shah, J Young, B Yukich, H M Berman, and P E Bourne. Quality assurance for the query and distribution systems of the RCSB Protein Data Bank. Database (Oxford), 2011:bar003, 2011.
  56. [56].↵
    P W Rose, B Beran, C Bi, W F Bluhm, D Dimitropoulos, D S Goodsell, A Prlic, M Quesada, G B Quinn, J D Westbrook, J Young, B Yukich, C Zardecki, H M Berman, and P E Bourne. The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res, 39(Database issue):D392–401, Jan 2011.
  57. [57].↵
    Jun 2014. ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt.gz.
  58. [58].↵
    R Leinonen, R Akhtar, E Birney, J Bonfield, L Bower, M Corbett, Y Cheng, F Demiralp, N Faruque, N Goodgame, R Gibson, G Hoad, C Hunter, M Jang, S Leonard, Q Lin, R Lopez, M Maguire, H McWilliam, S Plaister, R Radhakrishnan, S Sobhany, G Slater, P Ten Hoopen, F Valentin, R Vaughan, V Zalunin, D Zerbino, and G Cochrane. Improvements to services at the European Nucleotide Archive. Nucleic Acids Res, 38(Database issue):D39–45, Jan 2010.
  59. [59].↵
    C Workman and A Krogh. No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution. Nucleic Acids Res, 27(24):4816–22, Dec 1999.
  60. [60].↵
    B W Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta, 405(2):442–51, Oct 1975.
  61. [61].↵
    Z Weinberg and R R Breaker. R2R - software to speed the depiction of aesthetic consensus RNA secondary structures. BMC Bioinformatics, 12:3, 2011.
  62. [62].↵
    C Lucchetti-Miganeh, E Burrowes, C Baysse, and G Ermel. The post-transcriptional regulator CsrA plays a central role in the adaptation of bacterial pathogens to different stages of infection in animal hosts. Microbiology, 154(Pt 1):16–29, Jan 2008.
  63. [63].↵
    C Valverde, M Lindell, E G Wagner, and D Haas. A repeated GGA motif is critical for the activity and stability of the riboregulator RsmY of Pseudomonas fluorescens. J Biol Chem, 279(24):25066–74, Jun 2004.
  64. [64].↵
    A Toledo-Arana, F Repoila, and P Cossart. Small noncoding RNAs controlling pathogenesis. Curr Opin Microbiol, 10(2):182–188, Apr 2007.
  65. [65].↵
    Z Weinberg, J X Wang, J Bogue, J Yang, K Corbino, R H Moy, and R R Breaker. Comparative genomics reveals 104 candidate structured RNAs from bacteria, archaea, and their metagenomes. Genome Biol, 11(3):R31, 2010.
  66. [66].↵
    J Leonardi, J A Box, J T Bunch, and P Baumann. TER1, the RNA subunit of fission yeast telomerase. Nat Struct Mol Biol, 15(1):26–33, Jan 2008.
  67. [67].↵
    C J Webb and V A Zakian. Identification and characterization of the Schizosaccharomyces pombe TER1 telomerase RNA. Nat Struct Mol Biol, 15(1):34–42, Jan 2008.
  68. [68].↵
    P P Gardner, A Bateman, and A M Poole. SnoPatrol: how many snoRNA genes are there? J Biol, 9(1):4, 2010.
  69. [69].↵
    A Bateman, S Agrawal, E Birney, E A Bruford, J M Bujnicki, G Cochrane, J R Cole, M E Dinger, A J Enright, P P Gardner, D Gautheret, S Griffiths-Jones, J Harrow, J Herrero, I H Holmes, H D Huang, K A Kelly, P Kersey, A Kozomara, T M Lowe, M Marz, S Moxon, K D Pruitt, T Samuelsson, P F Stadler, A J Vilella, J H Vogel, K P Williams, M W Wright, and C Zwieb. RNAcentral: A vision for an international database of RNA sequences. RNA, 17(11):1941–6, Nov 2011.
  70. [70].↵
    J Widmann, J Stombaugh, D McDonald, J Chocholousova, P Gardner, M K Iyer, Z Liu, C A Lozupone, J Quinn, S Smit, S Wikman, J R Zaneveld, and R Knight. RNASTAR: an RNA STructural Alignment Repository that provides insight into the evolution of natural and artificial RNAs. RNA, 18(7):1319–27, Jul 2012.
  71. [71].↵
    O Duss, E Michel, M Yulikov, M Schubert, G Jeschke, and F H Allain. Structural basis of the non-coding RNA RsmZ acting as a protein sponge. Nature, 509(7502):588–92, May 2014.
  72. [72].↵
    M Schubert, K Lapouge, O Duss, F C Oberstrass, I Jelesarov, D Haas, and F H Allain. Molecular basis of messenger RNA recognition by the specific bacterial repressing clamp RsmA/CsrA. Nat Struct Mol Biol, 14(9):807–13, Sep 2007.
  73. [73].↵
    L Barquist, S W Burge, and P P Gardner. RNA Structure Determination, chapter Building non-coding RNA families. Submitted to Methods in Molecular Biology, 2012.
  74. [74].↵
    P P Gardner and R Giegerich. A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics, 5:140, Sep 2004.
  75. [75].↵
    K T Simons, C Kooperberg, E Huang, and D Baker. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol, 268(1):209–25, Apr 1997.
  76. [76].↵
    S Granneman, G Kudla, E Petfalski, and D Tollervey. Identification of protein binding sites on U3 snoRNA and pre-rRNA by UV cross-linking and high-throughput analysis of cDNAs. Proc Natl Acad Sci U S A, 106(24):9613–8, Jun 2009.
  77. [77].↵
    A C Jungkamp, M Stoeckius, D Mecenas, D Grün, G Mastrobuoni, S Kempa, and N Rajewsky. In vivo and transcriptome-wide identification of RNA binding protein target sites. Mol Cell, 44(5):828–40, Dec 2011.
  78. [78].↵
    D Ray, H Kazan, K B Cook, M T Weirauch, H S Najafabadi, X Li, S Gueroussov, M Albu, H Zheng, A Yang, H Na, M Irimia, L H Matzat, R K Dale, S A Smith, C A Yarosh, S M Kelly, B Nabet, D Mecenas, W Li, R S Laishram, M Qiao, H D Lipshitz, F Piano, A H Corbett, R P Carstens, B J Frey, R A Anderson, K W Lynch, L O Penalva, E P Lei, A G Fraser, B J Blencowe, Q D Morris, and T R Hughes. A compendium of RNA-binding motifs for decoding gene regulation. Nature, 499(7457):172–7, Jul 2013.
Posted November 06, 2014.