More challenges for machine-learning protein interactions

Tobias Hamp; Burkhard Rost

doi:10.1093/bioinformatics/btu857

Bioinformatics. 2015 May 15;31(10):1521-5. doi: 10.1093/bioinformatics/btu857. Epub 2015 Jan 12.

Authors

Tobias Hamp¹, Burkhard Rost¹

Affiliation

¹ Department of Informatics, Bioinformatics and Computational Biology I12, Technische Universität München, 85748 Garching/Munich, Germany.

PMID: 25586513
DOI: 10.1093/bioinformatics/btu857

Abstract

Motivation: Machine learning may be the most popular computational tool in molecular biology. Providing sustained performance estimates is challenging. The standard cross-validation protocols usually fail in biology. Park and Marcotte found that even refined protocols fail for protein-protein interactions (PPIs).

Results: Here, we sketch additional problems for the prediction of PPIs from sequence alone. First, it matters not only whether proteins A or B of a target interaction A-B are similar to proteins of training interactions (positives), but also whether A or B are similar to proteins of non-interactions (negatives). Second, training on multiple interaction partners per protein did not improve performance for new proteins (not used to train). On the contrary, a strictly non-redundant training that ignored good data slightly improved the prediction of difficult cases. Third, which prediction method appears to be best crucially depends on the sequence similarity between the test and the training set, on how many true interactions should be found, and on the expected ratio of negatives to positives. The correct assessment of performance is the most complicated task in the development of prediction methods. Our analyses suggest that PPIs square the challenge for this task.
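The dependence on the expected ratio of negatives to positives can be illustrated with a short calculation. The sketch below is not taken from the paper: the function name `precision_at_ratio` and the example rates are assumptions chosen for illustration. It uses only the standard identity that, for a fixed true-positive rate and false-positive rate, precision equals TPR / (TPR + FPR × r), where r is the number of negatives per positive.

```python
def precision_at_ratio(tpr: float, fpr: float, neg_per_pos: float) -> float:
    """Expected precision for a fixed operating point at a given class ratio.

    tpr: fraction of true interactions that are predicted (recall).
    fpr: fraction of non-interactions wrongly predicted as interactions.
    neg_per_pos: assumed number of negatives per positive in the target data.
    All names and values here are illustrative, not from the paper.
    """
    if not (0.0 <= tpr <= 1.0 and 0.0 <= fpr <= 1.0):
        raise ValueError("tpr and fpr must lie in [0, 1]")
    if neg_per_pos < 0:
        raise ValueError("neg_per_pos must be non-negative")

    # Per positive in the data: tpr true positives and
    # fpr * neg_per_pos false positives are predicted.
    predicted = tpr + fpr * neg_per_pos
    if predicted == 0:
        return 0.0  # nothing predicted; report 0 by convention
    return tpr / predicted


if __name__ == "__main__":
    # Hypothetical operating point: 50% recall at a 1% false-positive rate.
    for ratio in (1, 10, 100, 1000):
        p = precision_at_ratio(tpr=0.5, fpr=0.01, neg_per_pos=ratio)
        print(f"negatives per positive = {ratio:>4}: precision = {p:.3f}")
```

The same operating point that looks nearly perfect on a balanced test set yields far lower precision when negatives greatly outnumber positives, which is one reason a ranking of methods obtained under one assumed ratio need not carry over to another.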

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence*
Computational Biology / methods*
Humans
Protein Interaction Mapping / methods*
Proteins / metabolism*
Saccharomyces cerevisiae Proteins / metabolism
Sequence Analysis, Protein

Substances

Proteins
Saccharomyces cerevisiae Proteins