Abstract
Protein engineering plays a pivotal role in designing novel proteins with desired functions, yet the rugged fitness landscape of proteins within their mutant space presents a major challenge, limiting the effective discovery of optimal sequences. To address this, we introduce µFormer, a deep learning framework that combines a pre-trained protein language model with custom-designed scoring modules to predict the mutational effects of proteins. µFormer achieves state-of-the-art performance in predicting high-order mutants, modeling epistatic interactions, and handling insertions. By integrating µFormer with a reinforcement learning framework, we enable efficient exploration of vast mutant spaces, encompassing trillions of mutation candidates, to design protein variants with enhanced activity. Remarkably, we successfully predicted mutants that exhibited a 2000-fold increase in bacterial growth rate due to enhanced enzymatic activity. These results highlight the effectiveness of our approach in identifying impactful mutations across diverse protein targets and fitness metrics, offering a powerful tool for optimizing proteins with significantly higher success rates.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Contributing authors: 767780791shr{at}gmail.com; guoqingliu{at}microsoft.com; haiguang.liu{at}microsoft.com; chuancao{at}microsoft.com; fusongju{at}microsoft.com; lijuwu{at}microsoft.com; taoqin{at}microsoft.com; tie-yan.liu{at}microsoft.com
We performed wet-lab experiments and included the experimental results in the new manuscript.
2 Retrieved on 12/30/2022
Data Availability
The FLIP benchmark is publicly available at https://github.com/J-SNACKKB/FLIP/. The ProteinGym data collections can be accessed via https://github.com/OATML-Markslab/ProteinGym (see footnote 2). The list of extended-spectrum beta-lactamases can be found at http://bldb.eu/ [42].