Abstract
Motivation: Deciphering the interface residues of protein complexes provides insight into the functions of the proteins and hence into the overall cellular machinery. Computational methods have been devised to predict interface residues from amino acid sequence information, but these methods have been applied mainly to prokaryotic protein complexes. Since the composition and rate of evolution of the primary sequence differ between prokaryotes and eukaryotes, it is important to develop a method specifically for eukaryotic complexes.
Results: Here we report CoRNeA, a new hybrid pipeline for predicting protein-protein interaction interfaces from amino acid sequence information. CoRNeA combines co-evolution, machine learning (random forest) and network analysis, and is trained specifically on eukaryotic protein complexes. We use conservation, structural and contact-potential features as the major feature groups to train the random forest classifier. Because the amino acid sequence encodes information for the folding of each protein and not only for its interface propensities, we also incorporate the intra-protein contact information of the individual proteins to eliminate false positives from the predictions. Our predictions on example datasets show that CoRNeA not only enhances the prediction of true interface residues but also significantly reduces the false positive rate.