Abstract
Phage therapy is a viable alternative to antibiotics for treating microbial infections, particularly for managing drug-resistant strains of bacteria. One of the major challenges in designing phage-based therapy is identifying the most appropriate phage to treat a given bacterial infection. In this study, we attempt to predict phage-host interactions with high accuracy in order to identify the best phage for treating a bacterial infection. All models were developed on a training dataset containing 826 phage-host interactions and evaluated on a validation dataset comprising 1201 phage-host interactions. Firstly, alignment-based models were developed using similarity between phage and phage (BLASTPhage), host and host (BLASTHost), and phage and CRISPR (CRISPRPred); across five taxonomic levels, these achieved accuracies of 42.4%-66.2% for BLASTPhage, 55%-78.4% for BLASTHost, and 43.7%-80.2% for CRISPRPred. Secondly, alignment-free models were developed using machine learning techniques. Thirdly, hybrid models were developed by integrating the alignment-free models with similarity scores, achieving a maximum performance of 60.6%-93.5%. Finally, an ensemble model was developed that combines the hybrid and alignment-based models. Our ensemble model achieved the highest accuracy, 67.9%, 80.6%, 85.5%, 90% and 93.5% at the Genus, Family, Order, Class and Phylum levels, respectively, which is better than existing methods. To serve the scientific community, we have developed a webserver named PhageTB and a standalone software package (https://webs.iiitd.edu.in/raghava/phagetb/).
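The accuracies above are reported separately at each of five taxonomic levels. As a minimal sketch of how such per-level accuracy could be computed, the Python snippet below compares a predicted host lineage against the true host lineage for each phage. The function name `accuracy_by_level`, the dictionary layout, and the placeholder taxon labels (`G1`, `F1`, and so on) are illustrative assumptions and are not taken from PhageTB.

```python
from typing import Dict, List

# The five taxonomic levels at which accuracy is reported.
LEVELS = ["genus", "family", "order", "class", "phylum"]


def accuracy_by_level(
    predicted: List[Dict[str, str]],
    actual: List[Dict[str, str]],
) -> Dict[str, float]:
    """Return the fraction of phages whose predicted host matches the
    true host at each taxonomic level (hypothetical helper)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one phage-host pair is required")

    result = {}
    for level in LEVELS:
        hits = sum(
            1
            for p, a in zip(predicted, actual)
            if p.get(level) is not None and p.get(level) == a.get(level)
        )
        result[level] = hits / len(actual)
    return result


# Toy data with placeholder taxon labels (not real organisms).
toy_predicted = [
    {"genus": "G1", "family": "F1", "order": "O1", "class": "C1", "phylum": "P1"},
    {"genus": "G2", "family": "F1", "order": "O1", "class": "C1", "phylum": "P1"},
    {"genus": "G4", "family": "F2", "order": "O2", "class": "C1", "phylum": "P1"},
    {"genus": "G6", "family": "F4", "order": "O4", "class": "C2", "phylum": "P2"},
]
toy_actual = [
    {"genus": "G1", "family": "F1", "order": "O1", "class": "C1", "phylum": "P1"},
    {"genus": "G3", "family": "F1", "order": "O1", "class": "C1", "phylum": "P1"},
    {"genus": "G5", "family": "F3", "order": "O3", "class": "C1", "phylum": "P1"},
    {"genus": "G7", "family": "F5", "order": "O5", "class": "C3", "phylum": "P1"},
]

acc = accuracy_by_level(toy_predicted, toy_actual)
for level in LEVELS:
    print(f"{level}: {acc[level]:.2f}")
```

In this toy example a prediction that misses the genus can still count as correct at the family level or above, which is one reason accuracy tends to rise from Genus to Phylum.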
Key Points
Phage therapy provides an alternative for managing drug-resistant strains of bacteria.
Prediction of bacterial strains that can be treated by a given phage.
Alignment-based, alignment-free and ensemble models have been developed.
Prediction of appropriate phage/virus that can lyse a given strain of bacteria.
Webserver and standalone package provided to predict phage-host interactions.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Mailing Address of Authors
Suchet Aggarwal: suchet18105{at}iiitd.ac.in
Anjali Dhall: anjalid{at}iiitd.ac.in
Sumeet Patiyal: sumeetp{at}iiitd.ac.in
Shubham Choudhury: shubhamc{at}iiitd.ac.in
Akanksha Arora: akankshaar{at}iiitd.ac.in
Gajendra P.S. Raghava: raghava{at}iiitd.ac.in
Authors’ Biographies
1. Suchet Aggarwal is pursuing a B.Tech. in Computer Science and Engineering in the Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, New Delhi, India.
2. Anjali Dhall is currently pursuing a Ph.D. in Computational Biology in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
3. Sumeet Patiyal is currently pursuing a Ph.D. in Computational Biology in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
4. Shubham Choudhury is currently pursuing a Ph.D. in Computational Biology in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
5. Akanksha Arora is currently pursuing a Ph.D. in Computational Biology in the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
6. Gajendra P. S. Raghava is currently Professor and Head of the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.