Protein structural motif prediction in multidimensional phi-psi space leads to improved secondary structure prediction

J Comput Biol. 2006 Oct;13(8):1489-502. doi: 10.1089/cmb.2006.13.1489.

Abstract

A significant step towards establishing the structure and function of a protein is the prediction of the local conformation of the polypeptide chain. In this article, we present systems for the prediction of three new alphabets of local structural motifs. The motifs are built by applying multidimensional scaling (MDS) and clustering to pair-wise angular distances for multiple phi-psi angle values collected from high-resolution protein structures. The predictive systems, based on ensembles of bidirectional recurrent neural network architectures, and trained on a large non-redundant set of protein structures, achieve 72%, 66%, and 60% correct motif prediction on an independent test set for di-peptides (six classes), tri-peptides (eight classes) and tetra-peptides (14 classes), respectively, 28-30% above baseline statistical predictors. We then build a further system, based on ensembles of two-layered bidirectional recurrent neural networks, to map structural motif predictions into a traditional 3-class (helix, strand, coil) secondary structure. This system achieves 79.5% correct prediction using the "hard" CASP 3-class assignment, and 81.4% with a more lenient assignment, outperforming a sophisticated state-of-the-art predictor (Porter) trained in the same experimental conditions. The structural motif predictor is publicly available at: http://distill.ucd.ie/porter+/.
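The first stage described above, pair-wise angular distances between sets of phi-psi angles, can be illustrated with a minimal sketch. The abstract does not state the exact distance used, so the wrap-around Euclidean form below is an assumption, and the names `angle_diff`, `angular_distance`, and `pairwise_distances` are hypothetical helpers, not part of the published system.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (range 0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def angular_distance(u, v):
    """Distance between two equal-length tuples of dihedral angles.

    Assumption: a Euclidean norm over per-angle wrap-around differences.
    The published method may use a different metric.
    """
    if len(u) != len(v):
        raise ValueError("fragments must contain the same number of angles")
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(u, v)))

def pairwise_distances(fragments):
    """Symmetric matrix of angular distances between all fragments."""
    n = len(fragments)
    return [[angular_distance(fragments[i], fragments[j]) for j in range(n)]
            for i in range(n)]

# Illustrative di-peptide fragments as (phi1, psi1, phi2, psi2) in degrees.
# These values are made up for the example, not taken from the paper's data.
fragments = [
    (-60.0, -45.0, -60.0, -45.0),    # roughly helical
    (-120.0, 130.0, -120.0, 130.0),  # roughly extended
    (-65.0, -40.0, -120.0, 130.0),   # mixed
]
matrix = pairwise_distances(fragments)
```

A matrix like this would be the input to the multidimensional scaling and clustering steps; handling the 360-degree periodicity matters because angles such as 170 and -170 are close in conformation even though their raw values differ by 340.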

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs*
  • Computational Biology / methods*
  • Databases, Protein
  • Peptides / chemistry*
  • Protein Structure, Secondary
  • Proteins / chemistry*

Substances

  • Peptides
  • Proteins