Abstract
Epitope-based vaccines are promising therapeutic modalities for infectious diseases and cancer, but identifying immunogenic epitopes is challenging. The vast majority of prediction methods are sequence-based and do not incorporate large-scale structural data or biochemical properties across each peptide-MHC (pMHC) complex. We present ImmunoStruct, a deep-learning model that integrates sequence, structural, and biochemical information to predict multi-allele class-I pMHC immunogenicity. By leveraging a multimodal dataset of ∼27,000 pMHC complexes that we generated with AlphaFold, we demonstrate that ImmunoStruct improves immunogenicity prediction performance and interpretability beyond existing methods, across infectious disease epitopes and cancer neoepitopes. We further show strong alignment with in vitro assay results for a set of SARS-CoV-2 epitopes. This work also presents a new architecture that incorporates equivariant graph processing and multimodal data integration for this long-standing task in immunotherapy.
Competing Interest Statement
The authors have declared no competing interest.