PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies

Nucleic Acids Res. 2012 Sep;40(16):e126. doi: 10.1093/nar/gks406. Epub 2012 May 14.

Abstract

Prophages are phages in lysogeny that are integrated into, and replicated as part of, the host bacterial genome. These mobile elements can have a tremendous impact on their bacterial hosts' genomes and phenotypes, which may lead to strain emergence and diversification, increased virulence or antibiotic resistance. However, finding prophages in microbial genomes remains a problem with no definitive solution. The majority of existing tools rely on detecting genomic regions enriched in protein-coding genes with known phage homologs, which hinders the de novo discovery of phage regions. In this study, a weighted phage detection algorithm, PhiSpy, was developed based on seven distinctive characteristics of prophages, i.e. protein length, transcription strand directionality, customized AT and GC skew, the abundance of unique phage words, phage insertion points and the similarity of phage proteins. The first five characteristics are capable of identifying prophages without any sequence similarity with known phage genes. PhiSpy locates prophages by ranking genomic regions enriched in distinctive phage traits, which leads to the successful prediction of 94% of prophages in 50 complete bacterial genomes with a 6% false-negative rate and a 0.66% false-positive rate.
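
To illustrate one of the composition-based characteristics named above, the following is a minimal sketch of AT and GC skew computed over fixed-size windows of a sequence. The abstract does not specify how PhiSpy's "customized" skew is calculated, so this uses the standard textbook definitions, (G - C)/(G + C) and (A - T)/(A + T), as an assumption. The function names, window size and step are hypothetical and are not taken from PhiSpy.

```python
def skew(seq, a, b):
    """Return (count(a) - count(b)) / (count(a) + count(b)).

    Returns 0.0 when neither base occurs, to avoid division by zero.
    Standard skew definition; NOT PhiSpy's customized variant.
    """
    seq = seq.upper()
    na, nb = seq.count(a), seq.count(b)
    total = na + nb
    return (na - nb) / total if total else 0.0


def gc_skew(seq):
    """GC skew of a sequence: (G - C) / (G + C)."""
    return skew(seq, "G", "C")


def at_skew(seq):
    """AT skew of a sequence: (A - T) / (A + T)."""
    return skew(seq, "A", "T")


def windowed_skews(seq, window, step):
    """Yield (start, gc_skew, at_skew) for each full window of the sequence.

    `window` and `step` are illustrative parameters; trailing bases that do
    not fill a complete window are ignored.
    """
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_skew(chunk), at_skew(chunk)
```

A region whose skew values differ markedly from those of the surrounding windows is a candidate for closer inspection. In a weighted approach such as the one described, a signal of this kind would be combined with the other characteristics rather than used alone.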

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Base Composition
  • Codon
  • DNA, Bacterial / chemistry
  • Genome, Bacterial*
  • Prophages / genetics*
  • Transcription, Genetic
  • Viral Proteins / genetics

Substances

  • Codon
  • DNA, Bacterial
  • Viral Proteins