Abstract
Machine learning prediction of the interaction between major histocompatibility complex class I (MHC I) proteins and their small peptide ligands is important for vaccine design and other applications in adaptive immunity. We describe and benchmark a new open-source MHC I binding prediction package, MHCflurry. The software is a collection of allele-specific binding predictors that incorporates a novel neural network architecture and adheres to software development best practices. MHCflurry outperformed the standard predictors NetMHC 4.0 and NetMHCpan 3.0 on a benchmark of MHC ligands identified by mass spectrometry and showed competitive accuracy on a benchmark of affinity measurements. The accuracy improvement was due to substantially better prediction of non-9-mer peptide ligands, which offset a narrowly lower accuracy on 9-mers. MHCflurry was on average 8.6 times faster than NetMHC and 44 times faster than NetMHCpan, and its prediction speed increases further when a graphics processing unit (GPU) is available. MHCflurry is freely available to use, retrain, or extend, includes Python library and command-line interfaces, and may be installed using standard package managers.