GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences

J Mol Biol. 1999 Apr 9;287(4):797-815. doi: 10.1006/jmbi.1999.2583.

Abstract

A new protein fold recognition method is described which is both fast and reliable. The method uses a traditional sequence alignment algorithm to generate alignments, which are then evaluated by a method derived from threading techniques. As a final step, each threaded model is evaluated by a neural network in order to produce a single measure of confidence in the proposed prediction. The speed of the method, along with its sensitivity and very low false-positive rate, makes it ideal for automatically predicting the structure of all the proteins in a translated bacterial genome (proteome). The method has been applied to the genome of Mycoplasma genitalium, and analysis of the results shows that as many as 46% of the proteins derived from the predicted protein coding regions have a significant relationship to a protein of known structure. In some cases, however, only one domain of the protein can be predicted, giving a total coverage of 30% when calculated as a fraction of the number of amino acid residues in the whole proteome.
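
The two coverage figures above differ because one counts proteins and the other counts residues: a protein with a single matched domain counts fully toward the first figure but only partly toward the second. The following is a minimal sketch of that arithmetic, not the authors' code. The `Protein` record, the `coverage` function, and the toy values are all hypothetical and chosen only for illustration; they are not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    """Hypothetical record for one predicted protein (not from the paper)."""
    name: str
    length: int            # total number of residues
    matched_residues: int  # residues covered by a confident match, 0 if none


def coverage(proteins):
    """Return (fraction of proteins with any match, fraction of residues matched)."""
    n_hit = sum(1 for p in proteins if p.matched_residues > 0)
    by_protein = n_hit / len(proteins)
    total_residues = sum(p.length for p in proteins)
    by_residue = sum(p.matched_residues for p in proteins) / total_residues
    return by_protein, by_residue


# Toy proteome (illustrative values only): orf2 has just one domain matched.
toy = [
    Protein("orf1", 300, 300),
    Protein("orf2", 500, 150),
    Protein("orf3", 200, 0),
    Protein("orf4", 400, 0),
]

by_protein, by_residue = coverage(toy)
print(f"{by_protein:.0%} of proteins, {by_residue:.0%} of residues")
```

In this toy set, half of the proteins have a match, but the residue-level coverage is lower because only 150 of the 500 residues in `orf2` are covered. This is the same effect the abstract describes for partially predicted proteins.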

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Genome*
  • Molecular Sequence Data
  • Neural Networks, Computer
  • Open Reading Frames
  • Protein Conformation*
  • Protein Folding*
  • Reproducibility of Results
  • Sequence Alignment / methods*
  • Sequence Homology, Amino Acid