Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition

J Theor Biol. 2013 Feb 7:318:1-12. doi: 10.1016/j.jtbi.2012.10.033. Epub 2012 Nov 5.

Abstract

The type information of un-annotated membrane proteins provides an important hint for their biological functions. The experimental determination of membrane protein types, despite being more accurate and reliable, is not always feasible due to the costly laboratory procedures, thereby creating a need for the development of bioinformatics methods. This article describes a novel computational classifier for the prediction of membrane protein types using proteins' sequences. The classifier, comprising a collection of one-versus-one support vector machines, makes use of the following sequence attributes: (1) the cationic patch sizes, the orientation, and the topology of transmembrane segments; (2) the amino acid physicochemical properties; (3) the presence of signal peptides or anchors; and (4) the specific protein motifs. A new voting scheme was implemented to cope with the multi-class prediction. Both the training and the testing sequences were collected from SwissProt. Homologous proteins were removed such that there is no pair of sequences left in the datasets with a sequence identity higher than 40%. The performance of the classifier was evaluated by a Jackknife cross-validation and an independent testing experiments. Results show that the proposed classifier outperforms earlier predictors in prediction accuracy in seven of the eight membrane protein types. The overall accuracy was increased from 78.3% to 88.2%. Unlike earlier approaches which largely depend on position-specific substitution matrices and amino acid compositions, most of the sequence attributes implemented in the proposed classifier have supported literature evidences. The classifier has been deployed as a web server and can be accessed at http://bsaltools.ym.edu.tw/predmpt.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry*
  • Animals
  • Chemistry, Physical
  • Computational Biology / methods
  • Hydrophobic and Hydrophilic Interactions
  • Membrane Proteins / chemistry*
  • Protein Sorting Signals*
  • Protein Structure, Tertiary*
  • Sequence Analysis, Protein / methods
  • Support Vector Machine

Substances

  • Amino Acids
  • Membrane Proteins
  • Protein Sorting Signals