Abstract
Motivation Predicting how mutations affect protein biophysical properties remains a significant challenge in computational biology. In recent years, numerous predictors, primarily deep learning models, have been developed to address this problem; however, they still suffer from limited interpretability and limited accuracy.
Results We show that a simple evolutionary score, based on the log-odds ratio (LOR) of wild-type and mutant residue frequencies in evolutionarily related proteins and scaled by the residue's relative solvent accessibility (RSA), performs on par with or slightly better than most of the benchmarked predictors, many of which are considerably more complex. The evaluation is performed on mutations from the ProteinGym collection of deep mutational scanning datasets, which measure various properties such as stability, activity or fitness. This raises further questions about what these complex models actually learn and highlights their limitations in predicting mutational landscapes.
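To make the idea of an RSA-scaled log-odds score concrete, the sketch below computes one for a single substitution from one column of a multiple sequence alignment. It is a minimal illustration under stated assumptions, not the RSALOR package or its API. The following choices are ours: the helper names (`residue_frequencies`, `log_odds`, `rsa_lor`) are hypothetical; frequencies are unweighted, ignore gaps and use a small pseudocount; the LOR is taken as log-odds(mutant) minus log-odds(wild type), so that negative values indicate a predicted deleterious substitution; and RSA is clipped to [0, 1] and applied as the weight (1 - RSA), so buried positions count more than exposed ones.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(column, pseudocount=0.01):
    """Frequencies of the 20 standard amino acids in one MSA column.

    Assumptions: no sequence weighting, gaps and non-standard symbols are
    ignored, and a pseudocount keeps every frequency strictly in (0, 1).
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log_odds(p):
    """Natural-log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def rsa_lor(column, wt, mt, rsa, pseudocount=0.01):
    """Hypothetical RSA-scaled log-odds ratio for the substitution wt -> mt.

    Sign convention (assumption): negative means the mutant residue is rarer
    than the wild type among homologs, i.e. predicted deleterious.
    """
    freqs = residue_frequencies(column, pseudocount)
    lor = log_odds(freqs[mt]) - log_odds(freqs[wt])
    # Clip RSA to [0, 1]; buried residues (low RSA) get a weight close to 1.
    burial = 1.0 - min(max(rsa, 0.0), 1.0)
    return burial * lor


if __name__ == "__main__":
    # Toy alignment column: alanine is conserved, tryptophan never observed.
    column = "AAAAAAAAAV"
    print(rsa_lor(column, "A", "W", rsa=0.1))  # buried position
    print(rsa_lor(column, "A", "W", rsa=0.9))  # exposed position
```

With this toy column, the same A to W substitution gets a strongly negative score at a buried position and a score closer to zero at an exposed one. This reflects the intuition that conservation matters most where the residue is packed in the protein core.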
Availability The RSALOR model is available as a user-friendly Python package that can be installed from PyPI. The code is freely available at https://github.com/3BioCompBio/RSALOR.
Contact Matsvei.Tsishyn{at}ulb.be, Fabrizio.Pucci{at}ulb.be
Competing Interest Statement
The authors have declared no competing interest.