Abstract
We describe ReorientExpress (https://github.com/comprna/reorientexpress), a method for reference-free orientation of transcriptomic long sequencing reads. ReorientExpress uses deep learning to correctly predict the orientation of the majority of reads, in particular when trained on a closely related species or used in combination with read clustering. ReorientExpress enables long-read transcriptomics in non-model organisms and in samples without a genome reference, without requiring additional technologies.
Footnotes
We provide extra analyses and information about the methods used. We have included a convolutional neural network for comparison. We have also described how the accuracy depends on the type of transcript from which the read originates. Additionally, we have characterized the sequence motifs that contribute to the prediction of read orientation and have identified similarities with motifs for RNA-protein interactions.
List of Abbreviations
- ONT: Oxford Nanopore Technologies
- DRS: Direct RNA sequencing
- cDNA: complementary DNA
- k-mer: length k oligomer
- DNN: Deep Neural Network
- MLP: Multi-layer perceptron
- CNN: Convolutional Neural Network
- IVT: In vitro transcribed
- PWM: Position Weight Matrix