Abstract
The link between a protein’s primary sequence and its thermal stability and temperature-dependent activity is central to understanding protein folding, stability, and evolution. However, this relationship can be difficult to quantify because of the size of sequence space and the complexity of protein folding. Fortunately, evolution naturally explores both sequence space and temperature space through organismal adaptation to various thermal niches. Here, we use machine learning, in the form of multilayer perceptrons, to predict the originating species’ optimal growth temperatures from a protein family’s primary sequences. Trained machine learning models outperformed linear regressions in predicting the originating species’ growth temperature, achieving a root mean squared error of 3.34 °C. Notably, the models are protein-family-specific, and the predicted organismal growth temperatures are correlated with the proteins’ melting temperatures and temperatures of optimal activity. Therefore, this method provides a new tool for quickly predicting an organism’s optimal growth temperature in silico, which can serve as a convenient proxy for protein stability and temperature-dependent activity.
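As a minimal sketch of the two quantities the abstract relies on, the Python below turns an aligned sequence into a numeric feature vector and computes the root mean squared error between predicted and observed growth temperatures. The abstract does not state how sequences are encoded, so the one-hot encoding here is an assumption, and the names `AMINO_ACIDS`, `one_hot`, and `rmse` are hypothetical helpers rather than the authors' code.

```python
import math

# Assumption: the 20 standard residues; gaps and unknown symbols encode as all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(aligned_seq):
    """Flatten an aligned sequence into a 0/1 feature vector (20 slots per column)."""
    features = []
    for residue in aligned_seq.upper():
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features

def rmse(predicted, observed):
    """Root mean squared error, in the same units as the inputs (here, degrees C)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Feature vectors of this kind could be passed to either a linear regression or a multilayer perceptron, and `rmse` applied to held-out predictions gives a figure comparable in kind to the 3.34 °C reported above.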