Genome-wide analysis of mobile genetic element insertion sites

Nucleic Acids Res. 2011 Sep 1;39(16):6864-78. doi: 10.1093/nar/gkr337. Epub 2011 May 23.

Abstract

Mobile genetic elements (MGEs) account for a significant fraction of eukaryotic genomes and are implicated in altered gene expression and disease. We present an efficient computational protocol for MGE insertion site analysis. ELAN, the suite of tools described here, uses standard techniques to identify different MGEs and their distribution on the genome. One component, DNASCANNER, analyses known insertion sites of MGEs for the presence of signals that are based on a combination of local physical and chemical properties. ISF (insertion site finder) is a machine-learning tool that incorporates information derived from DNASCANNER. ISF uses a support vector machine to classify a given DNA sequence as a potential insertion site or not. We have studied the genomes of Homo sapiens, Mus musculus, Drosophila melanogaster and Entamoeba histolytica via a protocol whereby DNASCANNER is used to identify a common set of statistically important signals flanking the insertion sites in the various genomes. These signals are used in ISF for insertion site prediction, and the current accuracy of the tool is over 65%. We find similar signals at gene boundaries and splice sites. Together, these data suggest a common insertion mechanism that operates in a variety of eukaryotes.
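
As a rough illustration of the kind of position-wise signal described above, the sketch below averages a sliding-window sequence property over a set of aligned flanking sequences. It is not the DNASCANNER implementation: windowed GC fraction is used here only as a simple stand-in for the physical and chemical properties the abstract mentions, and the function names (gc_profile, mean_profile), the window size, and the toy sequences are all assumptions made for this example.

```python
from typing import List


def gc_profile(seq: str, window: int = 4) -> List[float]:
    """Return the GC fraction of every sliding window of `seq`.

    GC fraction is an assumed stand-in property for this sketch; the
    properties actually used by DNASCANNER are not given in the abstract.
    """
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def mean_profile(flanks: List[str], window: int = 4) -> List[float]:
    """Average the windowed profiles position by position.

    All sequences must have the same length, i.e. they are assumed to be
    already aligned on the insertion site.
    """
    if not flanks:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in flanks}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    profiles = [gc_profile(s, window) for s in flanks]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# Hypothetical toy flanking sequences, not data from the study.
toy_flanks = ["GCGCAAAATTTT", "GGCCATATATAT", "CCGGTTTTAAAA"]
profile = mean_profile(toy_flanks, window=4)
```

A profile like this, computed separately for known insertion sites and for background sequence, is one plausible form of feature vector that could be passed to a support vector machine classifier such as the one ISF is described as using.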

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • DNA / chemistry
  • Entamoeba histolytica / genetics
  • Genome, Human
  • Genomics / methods
  • Humans
  • Long Interspersed Nucleotide Elements
  • Mice
  • Retroelements*
  • Sequence Analysis, DNA
  • Software*

Substances

  • Retroelements
  • DNA