Abstract
ProteinMPNN is a key component of many protein design pipelines: it identifies amino acid (AA) sequences that fold into given 3D protein backbone structures. We explore ProteinMPNN in the context of designing therapeutic proteins that must avoid triggering unwanted immune reactions. More specifically, we focus on intracellular proteins, which must evade detection by Cytotoxic T-lymphocytes (CTLs) that detect their presence via the MHC Class I (MHC-I) pathway. To reduce the visibility of designed proteins to this component of the immune system, we develop a framework that uses Direct Preference Optimization (DPO), a large language model (LLM) tuning method, to guide ProteinMPNN toward designs with fewer predicted MHC-I epitopes. Our goal is to design proteins with low MHC-I immune-visibility while preserving the original structure and function. For our assessment, we first use AlphaFold to predict the 3D structures of the designed protein sequences. We then use TM-score, which measures the structural alignment between the predicted design and the original protein, to evaluate fidelity to the original protein structure. We find that our LLM-based tuning method effectively reduces MHC-I visibility without compromising structural similarity to the original protein.
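The DPO step described above can be sketched for a single preference pair. This is a minimal illustration, not the authors' implementation. It assumes that a preference pair is formed by ranking two ProteinMPNN designs for the same backbone by their number of predicted MHC-I epitopes, and that a design's log-probability is the sum of its per-residue log-probabilities. The names `count_epitopes`, `build_preference_pair`, `sequence_logprob`, `dpo_loss`, and the default `beta = 0.1` are hypothetical choices for illustration; only the loss formula is the standard DPO objective.

```python
import math
from typing import Callable, Sequence, Tuple


def sequence_logprob(residue_logprobs: Sequence[float]) -> float:
    """Log-probability of a whole designed sequence under an auto-regressive
    decoder (assumption): the sum of its per-residue log-probabilities."""
    return float(sum(residue_logprobs))


def build_preference_pair(
    seq_a: str, seq_b: str, count_epitopes: Callable[[str], int]
) -> Tuple[str, str]:
    """Return (preferred, rejected): the design with fewer predicted MHC-I
    epitopes is preferred. `count_epitopes` is a hypothetical stand-in for an
    MHC-I epitope predictor; ties go to seq_a."""
    if count_epitopes(seq_a) <= count_epitopes(seq_b):
        return seq_a, seq_b
    return seq_b, seq_a


def dpo_loss(
    logp_policy_w: float,
    logp_policy_l: float,
    logp_ref_w: float,
    logp_ref_l: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair: -log(sigmoid(m)), where m is beta times how much
    more the tuned policy favours the preferred sequence (w) over the rejected
    one (l), relative to the frozen reference model."""
    m = beta * ((logp_policy_w - logp_ref_w) - (logp_policy_l - logp_ref_l))
    # -log(sigmoid(m)) == softplus(-m), written in a numerically stable form
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
```

While the tuned model still matches the reference (margin 0), the loss equals log 2, about 0.693. Raising the policy's log-probability of the low-epitope design relative to the reference lowers the loss. The reference term keeps the tuned model anchored to the original ProteinMPNN, which is one plausible reason such tuning can reduce epitopes without drifting far from sequences that fold into the target backbone.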
Competing Interest Statement
The authors have declared no competing interest.
Acronyms
- AA: amino acid
- Ab: antibody
- AR: auto-regressive
- CTL: Cytotoxic T-lymphocyte
- DPO: Direct Preference Optimization
- GAN: Generative Adversarial Network
- LLM: large language model
- MD: Molecular Dynamics
- MHC-II: MHC Class II
- MHC-I: MHC Class I
- ML: machine learning
- MPNN: message passing neural network
- NLP: Natural Language Processing
- PPO: Proximal Policy Optimization
- PWM: position weight matrix
- RBF: radial basis function
- RL: reinforcement learning
- RLHF: reinforcement learning from human feedback
- RNA: ribonucleic acid
- SOTA: state of the art
- VAE: Variational Autoencoder