Abstract
Protein engineering holds significant promise for designing proteins with customized functions, yet the vast space of possible mutations, set against limited wet-lab capacity, constrains the discovery of optimal sequences. To address this, we present µProtein, a framework that accelerates protein engineering by combining µFormer, a deep learning model for accurate prediction of mutational effects, with µSearch, a reinforcement learning algorithm that efficiently navigates the protein fitness landscape using µFormer as an oracle. µProtein leverages single-point mutation data to predict optimal sequences carrying complex, multi-point mutations, which it achieves by modeling epistatic interactions and by using a two-step, multi-round search strategy. In benchmark testing, µProtein achieved state-of-the-art results. In wet-lab experiments, it identified multi-point mutants of the enzyme β-lactamase with strong gain of function, including variants that increased bacterial growth rate by up to 2000-fold and surpassed the highest activity level previously reported, despite being trained solely on single-site mutation data. These results demonstrate µProtein’s ability to discover impactful mutations across the vast protein sequence space, offering a robust and efficient approach to protein optimization.
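To make the idea of oracle-guided, multi-round search concrete, here is a toy sketch. It is not µFormer or µSearch. The oracle is a hypothetical stand-in: additive single-site effects plus a few random pairwise epistatic terms. The search swaps in a plain beam search in place of reinforcement learning. Each round adds one more mutation to the best candidates found so far, so mutants with several mutations are assembled from single-site scores together with the epistasis the oracle models. All names (`make_toy_oracle`, `multi_round_search`, the 10-residue sequence) are illustrative assumptions.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_oracle(wild_type, seed=0):
    """Hypothetical stand-in for a learned fitness predictor (NOT µFormer):
    additive single-site effects plus random pairwise epistatic terms."""
    rng = random.Random(seed)
    single = {(i, aa): rng.gauss(0.0, 1.0)
              for i in range(len(wild_type)) for aa in AMINO_ACIDS
              if aa != wild_type[i]}
    keys = sorted(single)
    pairs = {}
    for _ in range(3 * len(wild_type)):
        (i, a), (j, b) = rng.sample(keys, 2)
        if i != j:  # epistasis only between different positions
            pairs[frozenset([(i, a), (j, b)])] = rng.gauss(0.0, 2.0)

    def oracle(mutations):
        # mutations: frozenset of (position, amino_acid); wild type scores 0
        score = sum(single[m] for m in mutations)
        for m1, m2 in itertools.combinations(mutations, 2):
            score += pairs.get(frozenset([m1, m2]), 0.0)
        return score

    return oracle, single


def multi_round_search(wild_type, oracle, rounds=3, beam_width=5):
    """Beam search (a simple substitute for an RL policy): each round
    extends every kept variant by one mutation at an unused position."""
    beam = [frozenset()]  # round 0: the unmutated wild type
    best_per_round = []
    for _ in range(rounds):
        candidates = set()
        for variant in beam:
            used = {pos for pos, _ in variant}
            for i, wt_aa in enumerate(wild_type):
                if i in used:
                    continue
                for aa in AMINO_ACIDS:
                    if aa != wt_aa:
                        candidates.add(variant | {(i, aa)})
        # Rank by oracle score; break ties deterministically by mutation list.
        beam = sorted(candidates, key=lambda v: (-oracle(v), sorted(v)))[:beam_width]
        best_per_round.append(beam[0])
    return best_per_round


wt = "MKTAYIAKQR"  # toy sequence, not β-lactamase
oracle, single = make_toy_oracle(wt)
history = multi_round_search(wt, oracle)
for n, best in enumerate(history, start=1):
    muts = ",".join(f"{wt[i]}{i + 1}{aa}" for i, aa in sorted(best))
    print(n, muts, round(oracle(best), 2))
```

Because the toy oracle includes pairwise terms, the best multi-point variant is not necessarily the union of the best single mutants. That gap is the kind of epistatic effect a learned oracle would need to capture for such a search to pay off.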
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
We strengthened the contribution of reinforcement learning (RL) to the framework and revised Figure 1.