RT Journal Article
SR Electronic
T1 Novel transformer networks for improved sequence labeling in genomics
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 836163
DO 10.1101/836163
A1 Jim Clauwaert
A1 Willem Waegeman
YR 2020
UL http://biorxiv.org/content/early/2020/02/28/836163.abstract
AB In genomics, a wide range of machine learning methodologies are used to annotate biological sequences for the detection of positions of interest such as transcription start sites, translation initiation sites, methylation sites, splice sites and promoter start sites. In recent years, this area has been dominated by convolutional neural networks, which typically outperform previously designed methods as a result of automated scanning for influential sequence motifs. However, these architectures do not allow for efficient processing of the full genomic sequence. As an improvement, we introduce transformer architectures for whole-genome sequence labeling tasks. We show that these architectures, recently introduced for natural language processing, allow for fast processing of long DNA sequences. We apply existing networks and introduce an optimized method for the calculation of attention from input nucleotides. To demonstrate this, we evaluate our architecture on several sequence labeling tasks and find that it achieves state-of-the-art performance when compared with specialized models for the annotation of transcription start sites, translation initiation sites and 4mC methylation in E. coli, substantiating the robustness of the newly introduced framework.