Protein sequence similarity searches using patterns as seeds

Nucleic Acids Res. 1998 Sep 1;26(17):3986-90. doi: 10.1093/nar/26.17.3986.

Abstract

Protein families often are characterized by conserved sequence patterns or motifs. A researcher frequently wishes to evaluate the significance of a specific pattern within a protein, or to exploit knowledge of known motifs to aid the recognition of greatly diverged but homologous family members. To assist in these efforts, the pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains. PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence. The random distribution of PHI-BLAST alignment scores is studied analytically and empirically. In many instances, the program is able to detect statistically significant similarity between homologous proteins that are not recognizably related using traditional single-pass database search methods. PHI-BLAST is applied to the analysis of CED4-like cell death regulators, HS90-type ATPase domains, archaeal tRNA nucleotidyltransferases and archaeal homologs of DnaG-type DNA primases.
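The seeding step described above can be illustrated with a minimal sketch. It assumes the pattern is written in a PROSITE-like syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats). The function names `prosite_to_regex` and `find_seeds`, and the toy sequences, are hypothetical and are not part of PHI-BLAST itself. The sketch covers only the location of pattern occurrences; the construction of local alignments around each seed and the statistical evaluation of their scores are not shown.

```python
import re

# One pattern element: a residue set, an excluded set, or a single letter,
# optionally followed by a repeat count such as (4) or (2,4).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-like pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core in ("x", "X"):
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)

def find_seeds(query: str, pattern: str, database: dict) -> list:
    """Return (name, start, matched_text) for each pattern hit in the database.

    The query must itself contain the pattern, mirroring the requirement
    that the input pattern occurs in the input sequence.
    """
    regex = prosite_to_regex(pattern)
    if re.search(regex, query) is None:
        raise ValueError("query does not contain the pattern")
    # A lookahead lets overlapping occurrences be reported as separate seeds.
    scanner = re.compile("(?=(" + regex + "))")
    seeds = []
    for name, sequence in database.items():
        for m in scanner.finditer(sequence):
            seeds.append((name, m.start(1), m.group(1)))
    return seeds

if __name__ == "__main__":
    toy_db = {"hitA": "AAAGPPGSGKSTL", "miss": "MKLVVVAAA"}
    for seed in find_seeds("MGPPGSGKSTL", "[AG]-x(4)-G-K-[ST]", toy_db):
        print(seed)
```

Each reported seed marks a position in a database sequence from which an alignment to the query could then be extended.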

Publication types

  • Comparative Study

MeSH terms

  • Adenosine Triphosphatases
  • Algorithms*
  • Amino Acid Sequence*
  • Archaeal Proteins
  • Caenorhabditis elegans Proteins*
  • Calcium-Binding Proteins
  • DNA Primase
  • Databases, Factual
  • HSP90 Heat-Shock Proteins
  • Helminth Proteins
  • Pattern Recognition, Automated*
  • RNA Nucleotidyltransferases
  • Software*

Substances

  • Archaeal Proteins
  • Caenorhabditis elegans Proteins
  • Calcium-Binding Proteins
  • Ced-4 protein, C elegans
  • HSP90 Heat-Shock Proteins
  • Helminth Proteins
  • DNA Primase
  • RNA Nucleotidyltransferases
  • Adenosine Triphosphatases