Abstract
Major histocompatibility complex (MHC) molecules bind peptides derived from exogenous antigens and present them on the cell surface, allowing the immune system (T cells) to detect them. Elucidating this presentation process is essential for regulating and potentially manipulating the cellular immune system [1]. Predicting whether a given peptide will bind to an MHC molecule is an important step in this process and has motivated many computational approaches. NetMHCpan [2], a pan-specific model that predicts the binding of peptides to any MHC molecule, is one of the most widely used methods; it treats the task as a binary classification problem and solves it with a shallow neural network. The success of AI methods in various applications, including protein structure determination, and especially of pretrained models based on Natural Language Processing (NLP), motivated us to explore their use for this problem as well. Specifically, we fine-tuned these large deep learning models on a dataset of peptide-MHC sequences. Using the standard metrics in this area and the same training and test sets, we show that our model outperforms NetMHCpan4.1, which has been shown to outperform all earlier methods [2].
Competing Interest Statement
The authors have declared no competing interest.