RT Journal Article
SR Electronic
T1 Prediction of celiac disease associated epitopes and motifs in a protein
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.07.26.501507
DO 10.1101/2022.07.26.501507
A1 Tomer, Ritu
A1 Patiyal, Sumeet
A1 Dhall, Anjali
A1 Raghava, Gajendra P. S.
YR 2022
UL http://biorxiv.org/content/early/2022/07/27/2022.07.26.501507.abstract
AB Celiac disease (CD) is an autoimmune gastrointestinal disorder in which gluten triggers an immune-mediated enteropathy. Gluten immunogenic peptides can trigger immune responses that damage the small intestine. HLA-DQ2 and HLA-DQ8 are the major alleles that bind to epitopes (antigenic regions) of gluten and induce celiac disease. CD-associated epitopes therefore need to be identified in protein-based foods and therapeutics. Predicting CD-associated epitopes/peptides is also required for developing antigen-based immunotherapy against celiac disease. In this study, we developed computational tools to predict CD-associated epitopes and motifs. The dataset used for training, testing, and evaluation contains experimentally validated CD-associated and non-CD-associated peptides. Our analysis supports the existing hypothesis that proline (P) and glutamine (Q) are highly abundant in CD-associated peptides. A model based on the density of P and Q in peptides was developed for predicting CD-associated peptides; it achieved a maximum AUROC of 0.98. We discovered CD-associated motifs (e.g., QPF, QPQ, PYP) that occur specifically in CD-associated peptides. We also developed machine-learning-based models using peptide composition, which achieved a maximum AUROC of 0.99. Finally, we developed an ensemble method that combines the motif-based approach with the machine-learning-based models. The ensemble model predicts CD-associated motifs with 100% accuracy on an independent dataset that was not used for training.
The best models and motifs have been integrated into a web server and a standalone software package, “CDpred”. We hope this server will assist the scientific community in predicting, designing, and scanning CD-associated peptides, as well as in locating CD-associated motifs in a protein/peptide sequence (https://webs.iiitd.edu.in/raghava/cdpred/).

Key Points
- Celiac disease is one of the prominent autoimmune diseases.
- Gluten immunogenic peptides are responsible for celiac disease.
- Mapping of celiac disease associated epitopes and motifs on a protein.
- Identification of proline- and glutamine-rich regions.
- A web server and software package for predicting CD-associated peptides.

Author’s Biography
Ritu Tomer is currently pursuing a Ph.D. in Computational Biology in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Sumeet Patiyal is currently pursuing a Ph.D. in Computational Biology in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Anjali Dhall is currently pursuing a Ph.D. in Computational Biology in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Gajendra P. S. Raghava is currently Professor and Head of the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.

Competing Interest Statement
The authors have declared no competing interest.

Abbreviations
CD: Celiac disease
HLA: Human leukocyte antigen
CXCR3: C-X-C chemokine receptor 3
tTG: Tissue transglutaminase
IgA: Secretory immunoglobulin A
IEDB: Immune Epitope Database
AUROC: Area under the receiver operating characteristic curve
DT: Decision Tree
RF: Random Forest
SVC: Support Vector Classifier
XGB: XGBoost
LR: Logistic Regression
ET: Extra Trees classifier
KNN: k-Nearest Neighbors
GNB: Gaussian Naive Bayes
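The P and Q density model mentioned in the abstract can be sketched as follows. The abstract does not define how density is computed or what cut-off is applied, so this minimal sketch assumes density is simply the fraction of residues that are proline or glutamine. The function names and the 0.5 threshold are hypothetical and are not CDpred’s actual parameters.

```python
def pq_density(peptide: str) -> float:
    """Fraction of residues in `peptide` that are proline (P) or glutamine (Q).

    Assumption: "density" is taken to mean this simple fraction.
    """
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must be non-empty")
    return sum(1 for aa in seq if aa in "PQ") / len(seq)


def predict_cd_by_density(peptide: str, threshold: float = 0.5) -> bool:
    """Hypothetical classifier: label a peptide CD-associated when its
    P+Q density meets `threshold` (0.5 is illustrative, not CDpred's value)."""
    return pq_density(peptide) >= threshold
```

For example, `pq_density("PQPQLPY")` gives 5/7 (about 0.714), because five of its seven residues are P or Q. Since the density is a continuous score, peptides can also be ranked by it directly to compute an AUROC without fixing any threshold.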
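Scanning a sequence for CD-associated motifs such as QPF, QPQ, and PYP can be illustrated with a simple exact-match search. This is a hypothetical sketch: only these three motifs come from the abstract, the full motif set used by CDpred is not given here, and the function name is illustrative.

```python
# Only the three example motifs named in the abstract; CDpred's full set is not listed.
CD_MOTIFS = ("QPF", "QPQ", "PYP")


def scan_motifs(sequence: str, motifs=CD_MOTIFS) -> list[tuple[str, int]]:
    """Return (motif, 1-based start position) for every exact match,
    including overlapping matches, sorted by position."""
    seq = sequence.upper()
    hits = []
    for motif in motifs:
        k = len(motif)
        # Slide a window of the motif's length over the whole sequence.
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == motif:
                hits.append((motif, i + 1))
    return sorted(hits, key=lambda hit: hit[1])
```

For instance, `scan_motifs("LQPFPQPQLPYPQ")` returns `[("QPF", 2), ("QPQ", 6), ("PYP", 10)]`. Overlapping matches are reported deliberately, because motifs in proline- and glutamine-rich repeats frequently overlap (e.g., "QPQPQ" contains QPQ at positions 1 and 3).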
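The composition-based machine learning models need a fixed-length feature vector for each peptide. A common choice, assumed here because the abstract only says “peptide composition”, is amino acid composition (AAC): the percentage of each of the 20 standard residues. The function name is hypothetical.

```python
# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(peptide: str) -> list[float]:
    """Amino acid composition: percentage of each standard residue.

    Non-standard letters count toward the length but get no column,
    so the vector sums to less than 100 if any are present.
    """
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must be non-empty")
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
```

Each 20-dimensional vector can then be passed to a classifier such as the random forest (RF) or support vector classifier (SVC) named in the abbreviations. The exact features and hyperparameters behind the reported AUROC of 0.99 are not specified here.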
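The abstract states that the ensemble combines the motif-based approach with machine learning models but does not give the combination rule. One plausible rule, assumed here purely for illustration, is to call a peptide CD-associated whenever it contains a known motif and otherwise defer to the model’s score. The motif list (the three examples from the abstract), the 0.5 threshold, and the `ml_score` callable are all hypothetical.

```python
from typing import Callable

# Hypothetical motif list: only the three examples named in the abstract.
CD_MOTIFS = ("QPF", "QPQ", "PYP")


def ensemble_predict(peptide: str,
                     ml_score: Callable[[str], float],
                     threshold: float = 0.5) -> bool:
    """Hypothetical ensemble: any motif hit gives a positive call;
    otherwise compare the ML model's score with `threshold`."""
    seq = peptide.upper()
    if any(motif in seq for motif in CD_MOTIFS):
        return True
    return ml_score(seq) >= threshold
```

A motif hit overrides a low model score: `ensemble_predict("AGKLQPFW", ml_score=lambda s: 0.1)` returns `True` because the peptide contains QPF. This design trades some specificity for sensitivity on motif-bearing peptides; a weighted combination of motif and model scores would be an equally plausible alternative.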