Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences

Nucleic Acids Res. 2008 May;36(9):3025-30. doi: 10.1093/nar/gkn159. Epub 2008 Apr 4.

Abstract

Compared with the protein sequences available for different organisms, the number of known protein-protein interactions (PPIs) is still very limited, so many computational methods have been developed to facilitate the identification of novel PPIs. Methods that use only protein sequence information are more universally applicable than those that depend on additional information or predictions about the proteins. In this article, a sequence-based method is proposed that combines a new feature representation using auto covariance (AC) with a support vector machine (SVM). AC accounts for the interactions between residues a certain distance apart in the sequence, so this method adequately takes the neighbouring effect into account. When applied to the PPI data of the yeast Saccharomyces cerevisiae, the method achieved a very promising prediction result. An independent data set of 11,474 yeast PPIs was used to evaluate the prediction model, and the prediction accuracy was 88.09%. The performance of this method is superior to that of existing sequence-based methods, so it can be a useful supplementary tool for future proteomics studies. The prediction software and all data sets used in this article are freely available at http://www.scucic.cn/Predict_PPI/index.htm.
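
The AC idea described above can be illustrated with a minimal sketch. It is not the authors' software. It assumes a single residue descriptor (Kyte-Doolittle hydropathy, chosen here only for illustration), the common textbook form of auto covariance (the mean product of mean-centred descriptor values at positions `lag` residues apart), and a protein pair encoded by concatenating the two per-protein vectors. The abstract does not specify the descriptors, the maximum lag, or the pair encoding, and the names `auto_covariance`, `encode_protein` and `encode_pair` are hypothetical.

```python
from typing import List, Sequence

# Kyte-Doolittle hydropathy index: an illustrative descriptor only,
# not necessarily one of the descriptors used in the article.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(values: Sequence[float], max_lag: int) -> List[float]:
    """Return AC(lag) for lag = 1..max_lag.

    AC(lag) = 1/(n - lag) * sum_i (x_i - mean) * (x_{i+lag} - mean)
    """
    n = len(values)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(values)")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode_protein(sequence: str, max_lag: int) -> List[float]:
    """Map a sequence to descriptor values, then to a fixed-length AC vector."""
    values = [HYDROPATHY[residue] for residue in sequence.upper()]
    return auto_covariance(values, max_lag)


def encode_pair(seq_a: str, seq_b: str, max_lag: int) -> List[float]:
    """Assumed pair encoding: concatenate the AC vectors of both proteins."""
    return encode_protein(seq_a, max_lag) + encode_protein(seq_b, max_lag)


if __name__ == "__main__":
    # Toy sequences, not real yeast proteins.
    features = encode_pair("MKTAYIAKQRQISFVKSHF", "MSDNGELEDKPPAPPVRMS", max_lag=5)
    print(len(features), features)
```

Because the vector length depends only on `max_lag` and the number of descriptors, proteins of different lengths map to inputs of the same size, which is what allows a standard classifier such as an SVM to be trained on labelled interacting and non-interacting pairs.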

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry
  • Artificial Intelligence*
  • Computational Biology / methods
  • Protein Interaction Mapping / methods*
  • Saccharomyces cerevisiae Proteins / chemistry
  • Saccharomyces cerevisiae Proteins / metabolism
  • Sequence Analysis, Protein / methods*

Substances

  • Amino Acids
  • Saccharomyces cerevisiae Proteins