detectIR: a novel program for detecting perfect and imperfect inverted repeats using complex numbers and vector calculation

PLoS One. 2014 Nov 19;9(11):e113349. doi: 10.1371/journal.pone.0113349. eCollection 2014.

Abstract

Inverted repeats are abundant in both prokaryotic and eukaryotic genomes and can form DNA secondary structures (hairpins and cruciforms) that are involved in many important biological processes. Bioinformatics tools for efficient and accurate detection of inverted repeats are desirable, because existing tools are often less accurate and time-consuming, and some cannot handle genome-scale input data. Here, we present detectIR, a MATLAB-based program for detecting perfect and imperfect inverted repeats that uses complex numbers and vector calculation and accepts genome-scale input. detectIR adopts a novel algorithm that converts the conventional sequence string comparison used in inverted repeat detection into vector calculation of complex numbers, allowing non-complementary pairs (mismatches) in the pairing stem and a non-palindromic spacer (loop or gap) in the middle of inverted repeats. Compared with existing popular tools, our program achieves significantly higher accuracy and efficiency. In comparisons using genome sequence data from HIV-1, Arabidopsis thaliana, Homo sapiens and Zea mays, detectIR finds many inverted repeats missed by existing tools, whose outputs often contain many invalid cases. detectIR is open source, and its source code is freely available at: https://sourceforge.net/projects/detectir.
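To make the idea of replacing string comparison with complex-number arithmetic concrete, here is a minimal Python sketch. It is not detectIR's MATLAB code. It assumes a hypothetical encoding in which each base is the negative of its Watson-Crick complement (A = 1, T = -1, C = i, G = -i), so a complementary pair sums to exactly zero and any nonzero sum marks a mismatch. The names `ENCODING`, `encode`, `pair_mismatches` and `find_inverted_repeats` are illustrative only. The scan is brute force: the spacer between the two arms is never compared, so it may be non-palindromic, and a candidate is kept when its stem has at most `max_mismatches` non-complementary pairs.

```python
# Hypothetical base encoding (an assumption for illustration, not taken from
# detectIR): each base maps to the negative of its Watson-Crick complement,
# so a complementary pair sums to exactly 0.
ENCODING = {"A": 1 + 0j, "T": -1 + 0j, "C": 1j, "G": -1j}


def encode(seq: str) -> list[complex]:
    """Convert a DNA string into a vector of complex numbers."""
    return [ENCODING[base] for base in seq.upper()]


def pair_mismatches(left: list[complex], right: list[complex]) -> int:
    """Count non-complementary pairs between a left arm and the reversed right arm."""
    # Element-wise sum of the left arm and the right arm read backwards:
    # a zero entry means the two paired bases are complementary.
    pair_sums = [a + b for a, b in zip(left, reversed(right))]
    return sum(1 for s in pair_sums if s != 0)


def find_inverted_repeats(seq: str, arm_len: int, max_spacer: int, max_mismatches: int):
    """Brute-force scan for inverted repeats (hypothetical helper).

    Returns (start, spacer_len, mismatches) for every candidate whose two
    arms of length arm_len, separated by 0..max_spacer unpaired bases,
    contain at most max_mismatches non-complementary pairs.
    """
    vec = encode(seq)
    hits = []
    for start in range(len(vec)):
        for spacer in range(max_spacer + 1):
            end = start + 2 * arm_len + spacer
            if end > len(vec):
                break  # longer spacers would also run past the sequence end
            left = vec[start:start + arm_len]
            right = vec[end - arm_len:end]
            # The spacer itself is never compared, so it need not be palindromic.
            mismatches = pair_mismatches(left, right)
            if mismatches <= max_mismatches:
                hits.append((start, spacer, mismatches))
    return hits


if __name__ == "__main__":
    # ACG + AAA (spacer) + CGT forms a perfect 3-bp stem around a 3-base loop.
    print(find_inverted_repeats("ACGAAACGT", arm_len=3, max_spacer=3, max_mismatches=0))
```

Running the script prints `[(0, 3, 0)]`. Raising `max_mismatches` to 1 also reports the imperfect candidate `(1, 1, 1)`, in which CGA pairs with ACG except for one A/A mismatch. The nested loops here are for clarity only; an implementation built on vector calculation would compute the pair sums for many candidates at once as array operations.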

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Arabidopsis / genetics
  • Base Sequence
  • Computational Biology / methods*
  • Genome
  • HIV-1 / genetics
  • Humans
  • Inverted Repeat Sequences / genetics*
  • Software*
  • Zea mays / genetics