Abstract
Gene structural annotation is a critical step in obtaining biological knowledge from genome sequences, yet it remains a major challenge in genomics projects. Current de novo Hidden Markov Models are limited in their capacity to model biological complexity, while current pipelines are resource-intensive and their results vary in quality with the available extrinsic data. Here, we build on our previous work in applying Deep Learning to gene calling to make a fully applicable, fast and user-friendly tool for predicting primary gene models from DNA sequence alone. The quality is state-of-the-art, with predictions scoring closer by most measures to the references than to predictions from other de novo tools. Helixer’s predictions can be used as is or integrated into pipelines to boost quality further. Moreover, there is substantial potential for further improvements and advancements in gene calling with Deep Learning.
Helixer is open source and available at https://github.com/weberlab-hhu/Helixer
A web interface is available at https://www.plabipd.de/helixer_main.html
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Fixed typos in affiliations. Fixed exact wording of affiliations & institute naming. More precise assignment of dual affiliations. Cleaned up extraneous '.' characters in title & affiliations. Updated relevant acknowledgement text to exactly match the HHU HPC's template.