Abstract
Despite the availability of large omics datasets, establishing a reliable gene annotation remains challenging for eukaryotic genomes. Here, we used the reference genome of the major fungal wheat pathogen Zymoseptoria tritici (isolate IPO323) as a case study to develop methods to improve eukaryotic gene prediction. Four previous IPO323 annotations identified 10,933 to 13,260 gene models, but only one third of these coding sequences (CDS) have identical structures. To resolve these discrepancies and improve gene models, we generated full-length transcripts using long-read sequencing. This dataset was used together with other evidence (RNA-Seq transcripts and protein sequences) to generate novel ab initio gene models. The best structure among novel and existing gene models was selected according to transcript and protein evidence using InGenAnnot, a novel bioinformatics suite. Overall, 13,414 re-annotated gene models (RGMs) were predicted, including 671 new genes, of which 53 encoded effector candidates. This process corrected many of the errors (15%) observed in previous gene models (coding sequence fusions, false introns, missing exons). While fungal genomes have poor annotations of untranslated regions (UTRs), our Iso-Seq long-read sequences outlined 5' and 3' UTRs for 73% of the RGMs. Alternative transcripts were identified for 13% of RGMs, mostly due to intron retention (75%), likely corresponding to unprocessed pre-mRNAs. A total of 353 genes displayed alternative transcripts with combinations of previously predicted or novel exons. Long non-coding transcripts (lncRNAs) and double-stranded RNAs from two fungal viruses were also identified. Most lncRNAs corresponded to antisense transcripts of genes (52%). lncRNAs that were up- or down-regulated during infection were enriched in antisense transcripts (70%), suggesting their involvement in the control of gene expression.
Our results showed that combining different ab initio gene predictions and evidence-driven curation using InGenAnnot improved the quality of gene annotations of a compact eukaryotic genome. Our analysis also provided new insights into the transcriptional landscape of Z. tritici, helping develop an increasingly complex picture of its biology.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Add Gert H.J. as author.