RT Journal Article
SR Electronic
T1 Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 219287
DO 10.1101/219287
A1 Jens Keilwagen
A1 Frank Hartung
A1 Michael Paulini
A1 Sven O. Twardziok
A1 Jan Grau
YR 2017
UL http://biorxiv.org/content/early/2017/11/14/219287.abstract
AB Motivation: Genome annotation is of key importance in many research questions. The identification of protein-coding genes is often based on transcriptome sequencing data, ab-initio or homology-based prediction. Recently, it was demonstrated that intron position conservation improves homology-based gene prediction, and that experimental data improves ab-initio gene prediction. Results: Here, we present an extension of the gene prediction tool GeMoMa that utilizes amino acid sequence conservation, intron position conservation and optionally RNA-seq data for homology-based gene prediction. We show on published benchmark data for plants, animals and fungi that GeMoMa performs better than the gene prediction programs BRAKER1, MAKER2, and CodingQuarry, and purely RNA-seq-based pipelines for transcript identification. In addition, we demonstrate that using multiple reference organisms may help to further improve the performance of GeMoMa. Finally, we apply GeMoMa to four nematode species and to the recently published barley reference genome, indicating that current annotations of protein-coding genes may be refined using GeMoMa predictions. Availability: GeMoMa has been published under GNU GPL3 and is freely available at http://www.jstacs.de/index.php/GeMoMa. Contact: jens.keilwagen{at}julius-kuehn.de