%0 Journal Article %A Rohit Bhattacharya %A Ashok Sivakumar %A Collin Tokheim %A Violeta Beleva Guthrie %A Valsamo Anagnostou %A Victor E. Velculescu %A Rachel Karchin %T Evaluation of machine learning methods to predict peptide binding to MHC Class I proteins %D 2017 %R 10.1101/154757 %J bioRxiv %P 154757 %X Binding of peptides to Major Histocompatibility Complex (MHC) proteins is a critical step in immune response. Peptides bound to MHCs are recognized by CD8+ (MHC Class I) and CD4+ (MHC Class II) T-cells. Successful prediction of which peptides will bind to specific MHC alleles would benefit many cancer immunotherapy applications. Currently, supervised machine learning is the leading computational approach to predict peptide-MHC binding, and a number of methods, trained using results of binding assays, have been published. Many clinical researchers are dissatisfied with the sensitivity and specificity of currently available methods and the limited number of alleles for which they can be applied. We evaluated several recent methods to predict peptide-MHC Class I binding affinities and a new method of our own design (MHCnuggets). We used a high-quality benchmark set of 51 alleles, which has been applied previously. The neural network methods NetMHC, NetMHCpan, MHCflurry, and MHCnuggets achieved similar best-in-class prediction performance in our testing, and of these methods MHCnuggets was significantly faster. MHCnuggets is a gated recurrent neural network, and the only method to our knowledge which can handle peptides of any length, without artificial lengthening and shortening. Seventeen alleles were problematic for all tested methods. Prediction difficulties could be explained by deficiencies in the training and testing examples in the benchmark, suggesting that biological differences in allele-specific binding properties are not as important as previously claimed. 
Advances in accuracy and speed of computational methods to predict peptide-MHC affinity are urgently needed. These methods will be at the core of pipelines to identify patients who will benefit from immunotherapy, based on tumor-derived somatic mutations. Machine learning methods, such as MHCnuggets, which efficiently handle peptides of any length will be increasingly important for the challenges of predicting immunogenic response for MHC Class II alleles. Author Summary: Machine learning methods are a popular approach for predicting whether a peptide will bind to Major Histocompatibility Complex (MHC) proteins, a critical step in activation of cytotoxic T-cells. The input to these methods is a peptide sequence and an MHC allele of interest, and the output is the predicted binding affinity. MHC Class I and II proteins bind peptides of 8-11 amino acids and 16-26 amino acids respectively. This has been an obstacle for machine learning, because the methods used to date can only handle fixed-length inputs. We show that a recently developed technique known as gated recurrent neural networks can handle peptides of variable length and predict peptide-MHC binding as well as or better than existing methods, at substantially faster speeds. Our results have implications for the hundreds of MHC alleles that cannot be predicted with current methods. %U https://www.biorxiv.org/content/biorxiv/early/2017/07/27/154757.full.pdf