Predicting DNA-binding proteins: approached from Chou's pseudo amino acid composition and other specific sequence features

Amino Acids. 2008 Jan;34(1):103-9. doi: 10.1007/s00726-007-0568-2. Epub 2007 Jul 12.

Abstract

DNA-binding proteins play a pivotal role in gene regulation, so an automated and efficient method for the timely identification of novel DNA-binding proteins is vitally important. In this study, we propose a method that predicts DNA-binding proteins from their primary sequences alone. Proteins were encoded by auto-cross covariance transform, pseudo-amino acid composition, and dipeptide composition, both individually and in different combinations of the three encodings; the resulting feature matrices were then supplied to support vector machine classifiers to predict DNA-binding proteins. All models were trained and validated with the jackknife cross-validation test. Comparing the performance of these alternative models, the best result was obtained with pseudo-amino acid composition, which gave an overall accuracy of 96.6% and a sensitivity of 90.7%. The results suggest that novel DNA-binding proteins can be predicted efficiently using only their primary sequences.
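
As an illustration of one of the encodings named above, the following minimal Python sketch computes a dipeptide composition vector and the two reported metrics. The abstract gives no formulas, so the code assumes the commonly used definitions: 400 dipeptide frequencies over the 20 standard amino acids, accuracy as the fraction of correct predictions, and sensitivity as the fraction of true DNA-binding proteins recovered. The function names and the choice to skip non-standard residues are hypothetical and are not taken from the paper; the support vector machine and the jackknife (leave-one-out) loop are not shown.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional list of dipeptide frequencies.

    Assumption: pairs containing a non-standard residue (e.g. 'X')
    are skipped rather than counted.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def accuracy_and_sensitivity(y_true, y_pred):
    """Compute overall accuracy and sensitivity for binary labels.

    Label 1 marks a DNA-binding protein, label 0 a non-binding one.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity
```

In a jackknife test, each protein would in turn be held out, the classifier trained on the remaining proteins, and the held-out prediction collected before computing these metrics over the full set.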

MeSH terms

  • Amino Acids / chemistry*
  • Amino Acids / metabolism*
  • Chemical Phenomena
  • Chemistry, Physical
  • Computational Biology
  • DNA-Binding Proteins / chemistry*
  • DNA-Binding Proteins / metabolism*
  • Sequence Analysis, Protein

Substances

  • Amino Acids
  • DNA-Binding Proteins