PT - JOURNAL ARTICLE
AU - Aparajita Dutta
AU - Kusum Kumari Singh
AU - Ashish Anand
TI - Deep learning models for identification of splice junctions across species
AID - 10.1101/2021.06.13.448260
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.06.13.448260
4099 - http://biorxiv.org/content/early/2021/06/14/2021.06.13.448260.short
4100 - http://biorxiv.org/content/early/2021/06/14/2021.06.13.448260.full
AB - Deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have frequently been used to identify splice sites from genome sequences. Most deep learning applications identify splice sites in a single species. Furthermore, the models generally identify and interpret only the canonical splice sites. However, a model capable of identifying both canonical and non-canonical splice sites from multiple species with comparable accuracy is more generalizable and robust. We select several state-of-the-art CNN and RNN models and compare their performance in identifying novel canonical and non-canonical splice sites in Homo sapiens, Mus musculus, and Drosophila melanogaster. The RNN-based model named SpliceViNCI outperforms its counterparts in identifying splice sites both in multiple species and in unseen species. SpliceViNCI maintains its performance when trained with imbalanced data, making it more robust. We observe that all the models perform better when trained with more than one species. SpliceViNCI also outperforms its counterparts when trained with such an augmented dataset. We further extract and compare the features learned by SpliceViNCI when trained with single and multiple species. We validate the extracted features with knowledge from the literature. Competing Interest Statement: The authors have declared no competing interest.