PT - JOURNAL ARTICLE AU - Avdesh Mishra AU - Reecha Khanal AU - Md Tamjidul Hoque TI - AIRBP: Accurate identification of RNA-binding proteins using machine learning techniques AID - 10.1101/2020.03.10.985416 DP - 2020 Jan 01 TA - bioRxiv PG - 2020.03.10.985416 4099 - http://biorxiv.org/content/early/2020/03/10/2020.03.10.985416.short 4100 - http://biorxiv.org/content/early/2020/03/10/2020.03.10.985416.full AB - Motivation Identification of RNA-binding proteins (RBPs) that bind to ribonucleic acid molecules, is an important problem in Computational Biology and Bioinformatics. It becomes indispensable to identify RBPs as they play crucial roles in post-transcriptional control of RNAs and RNA metabolism as well as have diverse roles in various biological processes such as splicing, mRNA stabilization, mRNA localization, and translation, RNA synthesis, folding-unfolding, modification, processing, and degradation. The existing experimental techniques for identifying RBPs are time-consuming and expensive. Therefore, identifying RBPs directly from the sequence using computational methods can be useful to efficiently annotate RBPs and assist the experimental design. In this work, we present a method, called AIRBP, which is designed using an advanced machine learning technique, called stacking, to effectively predict RBPs by utilizing features extracted from evolutionary information, physiochemical properties, and disordered properties. Moreover, our method, AIRBP is trained on the useful feature-subset identified by the evolutionary algorithm (EA).Results The results show that AIRBP attains Accuracy (ACC), F1-score, and MCC of 95.38%, 0.917, and 0.885, respectively, based on the benchmark dataset, using 10-fold cross-validation (CV). Further evaluation of AIRBP on independent test set reveals that it achieves ACC, F1-score, and MCC of 93.04%, 0.943, and 0.855, for Human test set; 91.60%, 0.942 and 0.789 for S. cerevisiae test set; and 91.67%, 0.953 and 0.594 for A. thaliana test set, respectively. These results indicate that AIRBP outperforms the current state-of-the-art method. Therefore, the proposed top-performing AIRBP can be useful for accurate identification and annotation of RBPs directly from the sequence and help gain valuable insight to treat critical diseases.Availability Code-data is available here: http://cs.uno.edu/~tamjid/Software/AIRBP/code_data.zip