Abstract
The optimal growth temperature (OGT) of an organism is a useful indicator of the stability of the enzymes encoded in its genome. However, OGT is difficult to determine experimentally for microorganisms that cannot be cultivated. Here, we report the development of a machine learning model that accurately predicts OGT directly from proteome-wide 2-mer (dipeptide) amino acid composition. We use this model to predict OGTs for 1,438 microorganisms. In a subsequent step, we combine OGT data with the amino acid composition of individual enzymes to develop a second machine learning model that predicts enzyme temperature optima (Topt). On a dataset of 2,609 enzymes, the resulting model estimates Topt substantially better than OGT alone. Finally, we predict Topt for 6.5 million enzymes, covering 4,447 EC numbers, and make the resulting dataset available to researchers, enabling simple identification of enzymes that are potentially functional at extreme temperatures.
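To make the input features concrete, the following is a minimal sketch of how a proteome-wide 2-mer amino acid composition could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function name `dipeptide_composition`, the restriction to the 20 standard amino acids, and the normalization to relative frequencies are all choices made here for the example.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted;
# 2-mers containing any other symbol (e.g. X, U, *) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 features


def dipeptide_composition(sequences):
    """Return the relative frequency of each of the 400 dipeptides,
    pooled over all protein sequences of one proteome."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of length 2 along the sequence (overlapping 2-mers).
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    # Normalize by the number of counted standard 2-mers.
    total = sum(counts[d] for d in DIPEPTIDES)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical toy "proteome" of two short peptides.
features = dipeptide_composition(["MKVL", "MKAA"])
```

A 400-element vector of this kind, computed per organism, could serve as input to a regression model for OGT; the same kind of composition computed for a single enzyme, combined with the OGT of its source organism, would correspond to the inputs described for the Topt model.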