Abstract
Protein engineering holds significant promise for designing proteins with customized functions, yet the vast space of potential mutations, set against limited laboratory capacity, constrains the discovery of optimal sequences. To address this, we present the μProtein framework, which accelerates protein engineering by combining μFormer, a deep learning model for accurate mutational effect prediction, with μSearch, a reinforcement learning algorithm designed to efficiently navigate the protein fitness landscape using μFormer as an oracle. By modeling epistatic interactions and applying a multi-step search strategy, μProtein leverages single-mutation data to predict optimal sequences carrying complex, multi-amino-acid mutations. In addition to achieving state-of-the-art performance on benchmark datasets, μProtein, trained solely on single-mutation data, identified high-gain-of-function multi-point mutants of the enzyme β-lactamase that surpassed the highest known activity level in wet-lab experiments. These results demonstrate μProtein's capability to discover impactful mutations across the vast protein sequence space, offering a robust and efficient approach to protein optimization.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Revised manuscript; benchmark and evaluation on μSearch