Abstract
The relationship between pH and enzyme catalytic activity, especially the optimal pH (pHopt) at which an enzyme functions, is critical for biotechnological applications. Computational methods that predict pHopt would therefore enhance enzyme discovery and design by facilitating accurate identification of enzymes that function optimally at specific pH levels and by elucidating sequence-function relationships. In this study, we propose and evaluate various machine-learning methods for predicting pHopt, conducting extensive hyperparameter optimization and training over 11,000 model instances. Our results demonstrate that models using language model embeddings markedly outperform other methods in predicting pHopt. We present EpHod, the best-performing model, and make it publicly available to researchers. From sequence data, EpHod directly learns structural and biophysical features that relate to pHopt, including the proximity of residues to the catalytic center and the accessibility of solvent molecules. Overall, EpHod is a promising advance in pHopt prediction and could speed up the development of enzyme technologies.
Competing Interest Statement
DM is an advisor for Dyno Therapeutics, Octant, Jura Bio, Tectonic Therapeutics, and Genentech, and a cofounder of Seismic Therapeutics. CS is an advisor for CytoReason Ltd. GTB is an advisor for Bluestem Biosciences and Samsara Eco. The remaining authors declare no competing interests.
Footnotes
We have revised the entire paper based on constructive reviewer feedback.