TY  - JOUR
T1  - Optimal feature selection and software tool development for bacteriocin prediction
JF  - bioRxiv
DO  - 10.1101/2022.09.29.510068
SP  - 2022.09.29.510068
AU  - Suraiya Akhter
AU  - John Miller
Y1  - 2022/01/01
UR  - http://biorxiv.org/content/early/2022/09/30/2022.09.29.510068.abstract
N2  - Antibiotic resistance is a major public health concern around the globe. As a result, researchers continually search for new compounds from which to develop antibiotic drugs that can combat antibiotic-resistant bacteria. Bacteriocins are promising antimicrobial agents for fighting antibiotic resistance because of their narrow killing spectrum. Sequence matching methods are widely used to identify bacteriocins by comparing candidates with known bacteriocin sequences; however, these methods often fail to detect new bacteriocin sequences because of the sequences' high diversity. A machine learning approach can help find new, highly dissimilar bacteriocins for developing highly effective antibiotic drugs. The aim of this work is to identify optimal sets of features and to develop a machine learning-based software tool for predicting bacteriocin protein sequences with high accuracy. We extracted potential features from known bacteriocin and non-bacteriocin sequences by considering the physicochemical and structural properties of the protein sequences. We then reduced the feature set using statistical justifications and a recursive feature elimination technique. Finally, we built support vector machine (SVM) and random forest (RF) models using the selected features; these models achieve an accuracy of up to 95.54%. We compared the performance of our method with a popular sequence matching-based approach and a deep learning-based method. We also developed a software tool called Bacteriocin Prediction (BacPred) that implements the prediction model using the optimal set of features obtained from this study. The software package and its user manual are available at https://github.com/suraiya14/ML_bacteriocins/BacPred. Competing Interest Statement: The authors have declared no competing interest.
ER  - 