Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine

BMC Bioinformatics. 2005 Dec 29:6:310. doi: 10.1186/1471-2105-6-310.

Abstract

Background: MicroRNAs (miRNAs) are a group of short (approximately 22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology.

Results: A set of novel features based on local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs from those of pseudo pre-miRNAs. A support vector machine (SVM) is applied to these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and viruses, without utilizing any comparative genomics information.
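
The abstract does not spell out how the local structure-sequence features are encoded. As a rough illustration only, the sketch below assumes one plausible encoding: each run of three adjacent positions is described by its paired/unpaired pattern together with the middle nucleotide, giving 4 x 8 = 32 normalized frequencies. The function name `triplet_features`, the dot-bracket input, and the normalization are assumptions made for this example, not details taken from the abstract.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STATES = "(."  # "(" = paired, "." = unpaired (assumed two-state encoding)

# 4 nucleotides x 2^3 structure patterns = 32 feature keys, e.g. "G((." or "A..."
FEATURE_KEYS = [n + "".join(s) for n in NUCLEOTIDES for s in product(STATES, repeat=3)]


def triplet_features(sequence, structure):
    """Return a 32-dimensional vector of normalized triplet frequencies.

    sequence  -- RNA (or DNA) string; T is treated as U
    structure -- dot-bracket string of the same length
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    seq = sequence.upper().replace("T", "U")
    # Treat both bracket directions as simply "paired".
    paired = structure.replace(")", "(")

    counts = dict.fromkeys(FEATURE_KEYS, 0)
    # Slide over every interior position; the key is the middle nucleotide
    # plus the structure states of its left neighbor, itself, and its right neighbor.
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as N
            counts[key] += 1

    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in FEATURE_KEYS]


if __name__ == "__main__":
    # Toy hairpin: a 3-bp stem closing a 3-nt loop (hypothetical example input).
    vector = triplet_features("GGGAAACCC", "(((...)))")
    for key, value in zip(FEATURE_KEYS, vector):
        if value > 0:
            print(key, round(value, 3))
```

Vectors of this kind, computed for known pre-miRNAs and for pseudo hairpins, could then be passed to any standard SVM implementation for training; that step is omitted here to keep the sketch dependency-free.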

Conclusion: The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Computational Biology / methods*
  • Computer Simulation
  • False Positive Reactions
  • Genome
  • Genomics
  • Humans
  • MicroRNAs / classification*
  • MicroRNAs / genetics*
  • Models, Statistical
  • Molecular Sequence Data
  • Nucleic Acid Conformation*
  • RNA, Messenger / metabolism
  • Software
  • Species Specificity

Substances

  • MicroRNAs
  • RNA, Messenger