RT Journal Article SR Electronic T1 PhageAI - Bacteriophage Life Cycle Recognition with Machine Learning and Natural Language Processing JF bioRxiv FD Cold Spring Harbor Laboratory SP 2020.07.11.198606 DO 10.1101/2020.07.11.198606 A1 Piotr Tynecki A1 Arkadiusz Guziński A1 Joanna Kazimierczak A1 Michał Jadczuk A1 Jarosław Dastych A1 Agnieszka Onisko YR 2020 UL http://biorxiv.org/content/early/2020/07/12/2020.07.11.198606.abstract AB Background: As antibiotic resistance becomes a major problem in the treatment of infections, bacteriophages (also known as phages) appear to be an alternative. However, to be used in therapy, their life cycle must be strictly lytic. With the growing popularity of Next Generation Sequencing (NGS) technology, this information can be obtained from the genome sequence. A number of tools are available to help determine the phage life cycle. However, there is still no consensus approach to this problem, especially in the absence of well-defined open reading frames. A new tool is needed to overcome this limitation. Results: We developed a novel tool, called PhageAI, that provides access to more than 10 000 publicly available bacteriophages and differentiates between their major types of life cycle: lytic and lysogenic. The tool includes a life cycle classifier that achieved 98.90% accuracy on a validation set and 97.18% average accuracy on a test set. We adopted nucleotide sequence embedding based on Word2Vec with the Skip-gram model and a linear Support Vector Machine with 10-fold cross-validation for supervised classification. PhageAI is free of charge and available at https://phage.ai/. PhageAI is a REST web service and is also available as a Python package. Conclusions: Machine learning and Natural Language Processing make it possible to extract information from bacteriophage nucleotide sequences for life cycle prediction tasks. 
The PhageAI tool classifies phages as either virulent or temperate with higher accuracy than any existing method and provides an interactive 3D visualization to help interpret the model's classification results. Competing Interest Statement: In accordance with PhageAI - Bacteriophage Life Cycle Recognition with Machine Learning and Natural Language Processing policy, the authors report that the PhageAI platform was developed by the authors and that Proteon Pharmaceuticals is the owner of the platform. Abbreviations: AUC, Area Under Curve; ML, Machine Learning; MLP, Multi-layer Perceptron classifier; NGS, Next Generation Sequencing; NLP, Natural Language Processing; RFECV, Feature ranking with recursive feature elimination and cross-validated selection of the best number of features; ROC AUC, area under an ROC curve; SVM, Support Vector Machine; UMAP, Uniform Manifold Approximation and Projection
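
The abstract describes embedding nucleotide sequences with a Word2Vec Skip-gram model before classification. As a rough illustration of what that implies for the input data, the sketch below turns a sequence into overlapping k-mer "words" and lists the (center, context) pairs a Skip-gram model would train on. This is not the authors' code: the k-mer tokenization, the function names `kmer_tokens` and `skipgram_pairs`, and the values of `k` and `window` are all assumptions made for illustration, since the record does not state how sequences are tokenized.

```python
from typing import List, Tuple


def kmer_tokens(sequence: str, k: int = 6) -> List[str]:
    """Split a nucleotide sequence into overlapping k-mers.

    Assumption: k-mers play the role of "words"; the record does not
    specify the tokenization scheme or the value of k.
    """
    seq = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for tokens within `window` positions.

    These are the training examples a Skip-gram model learns from:
    each center token is used to predict its neighbours.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    tokens = kmer_tokens("ATGCGTAC", k=3)
    print(tokens)
    print(skipgram_pairs(tokens, window=1))
```

In a full pipeline, pairs like these would be consumed by a Word2Vec implementation, and the resulting per-sequence vectors would be passed to a linear SVM evaluated with 10-fold cross-validation, as the abstract states; those steps are omitted here because the record gives no further detail about them.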