Abstract
At the core of our immunological system lies a group of proteins named Major Histocompatibility Complex (MHC), to which epitopes (also proteins sometimes named antigenic determinants), bind to eliciting a response. These responses are extremely varied and of widely different nature. For instance, Killer and Helper T cells are responsible for, respectively, counteracting viral pathogens and tumorous cells. Many other types exist, but their underlying structure can be very similar due to the fact that they all are proteins and bind to the MHC receptor in a similar fashion. With this framework in mind, being able to predict with precision the structure of a protein that will elicit a specific response in the human body represents a novel computational approach to drug discovery. Although many machine learning approaches have been used, no attempt to solve this problem using Recurrent Neural Networks (RNNs) exist. We extend the current efforts in the field by applying a variety of network architectures based on RNNs and word embeddings (WE). The code is freely available and under current development at https://github.com/carlomazzaferro/mhcPreds