Kernel methods for predicting protein-protein interactions

Bioinformatics. 2005 Jun:21 Suppl 1:i38-46. doi: 10.1093/bioinformatics/bti1016.

Abstract

Motivation: Despite advances in high-throughput methods for discovering protein-protein interactions, the interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions.

Results: We present a kernel method for predicting protein-protein interactions using a combination of data sources, including protein sequences, Gene Ontology annotations, local properties of the network, and homologous interactions in other species. Whereas protein kernels proposed in the literature provide a similarity between single proteins, prediction of interactions requires a kernel between pairs of proteins. We propose a pairwise kernel that converts a kernel between single proteins into a kernel between pairs of proteins, and we illustrate the kernel's effectiveness in conjunction with a support vector machine classifier. Furthermore, we obtain improved performance by combining several sequence-based kernels based on k-mer frequency, motif and domain content and by further augmenting the pairwise sequence kernel with features that are based on other sources of data. We apply our method to predict physical interactions in yeast using data from the BIND database. At a false positive rate of 1% the classifier retrieves close to 80% of a set of trusted interactions. We thus demonstrate the ability of our method to make accurate predictions despite the sizeable fraction of false positives that are known to exist in interaction databases.

Availability: The classification experiments were performed using PyML available at http://pyml.sourceforge.net. Data are available at: http://noble.gs.washington.edu/proj/sppi.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Computational Biology / methods*
  • Databases, Protein
  • False Positive Reactions
  • Fungal Proteins / chemistry
  • Models, Theoretical
  • Protein Binding
  • Protein Interaction Mapping / methods*
  • Protein Structure, Tertiary
  • ROC Curve
  • Software

Substances

  • Fungal Proteins