RT Journal Article
SR Electronic
T1 Deep learning models for identification of splice junctions across species
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.06.13.448260
DO 10.1101/2021.06.13.448260
A1 Aparajita Dutta
A1 Kusum Kumari Singh
A1 Ashish Anand
YR 2021
UL http://biorxiv.org/content/early/2021/06/14/2021.06.13.448260.abstract
AB Deep learning models such as convolutional neural networks (CNN) and recurrent neural networks (RNN) have frequently been used to identify splice sites in genome sequences. Most deep learning applications identify splice sites in a single species. Furthermore, these models generally identify and interpret only canonical splice sites. However, a model that identifies both canonical and non-canonical splice sites across multiple species with comparable accuracy is more generalizable and robust. We select several state-of-the-art CNN and RNN models and compare their performance in identifying novel canonical and non-canonical splice sites in Homo sapiens, Mus musculus, and Drosophila melanogaster. The RNN-based model SpliceViNCI outperforms its counterparts in identifying splice sites both in multiple species and in unseen species. SpliceViNCI maintains its performance when trained on imbalanced data, which makes it more robust. We observe that all the models perform better when trained on more than one species, and SpliceViNCI outperforms its counterparts when trained on such an augmented dataset. We further extract and compare the features learned by SpliceViNCI when trained on single and multiple species, and we validate the extracted features against knowledge from the literature. Competing Interest Statement: The authors have declared no competing interest.