Benchmarking Hayai-Annotation Plants: A Re-evaluation Using Standard Evaluation Metrics

The rapid growth of next-generation sequencing (NGS) technology has led to a surge in the determination of whole genome sequences in plants. This has created a need for functional annotation of newly predicted gene sequences in the assembled genomes. To address this, “Hayai-Annotation Plants” was developed as a gene functional annotation tool for plant species. In this report, we compared Hayai-Annotation Plants with Blast2GO and TRAPID, focusing on the three primary gene-ontology (GO) domains: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). Using the Arabidopsis thaliana GO annotation as a benchmark, we evaluated each tool using two approaches: the area under the precision-recall curve (AUC-PR) and the metrics used at the critical assessment of functional annotation (CAFA). In the latter case, a CAFA-evaluator, was used to determine the F-score, weighted F-score, and S-score for each domain. Hayai-Annotation Plants showed better performances in all three GO domains. Our results thus reaffirm the effectiveness of Hayai-Annotation Plants for functional gene annotation in plant species. In this era of extensive whole genome sequencing, Hayai-Annotation Plants will serve as a valuable tool that facilitates simplified and accurate gene function annotation for numerous users, thereby making a significant contribution to plant research.


Introduction
The advancement of next-generation sequencing (NGS) technology has led to the construction of numerous whole genome sequences in plants as well as humans and other organisms. As a result, there is an increasing need for functional annotation for newly predicted gene sequences on the assembled genomes. Functional annotation of genes involves comparing query sequences with databases containing gene sequences and their functional information, such as OrthoDB (https://www.orthodb.org/), UniProt (https://www.uniprot.org/), and RefSeq (https://www.ncbi.nlm.nih.gov/refseq/). While referencing multiple databases enables more accurate functional annotation, the computational demands are high. This has made it challenging for many researchers who lack access to large-scale computer servers to perform high-precision functional annotation.
In order to achieve fast and accurate gene function annotation, we have developed "Hayai-Annotation Plants," an R package designed for the ultrafast and comprehensive functional annotation of plants (Ghelfi et al. 2018). Hayai-Annotation Plants employs the browser interface of the R package "Shiny", alongside the cost-free edition of USEARCH (32-bit) (Edgar 2010) for sequence alignment. Hayai-Annotation Plants leverages UniProtKB data (Embryophyta taxonomy) to assign protein names, gene ontology (GO) terms and codes, EC numbers, protein existence levels, and evidence types. For obtaining GO codes, the Gene Ontology Annotation (UniProt-GOA) database was chosen due to its focus on delivering superior GO annotation quality (Camon et al. 2004). Details of the design of Hayai-Annotation Plants are given in our previous paper (Ghelfi et al. 2018), where the package was introduced.
It is worth noting that the remarkable speed of Hayai Annotation is largely due to its implementation of USEARCH, a renowned software package known for its fast and accurate sequence alignment capabilities. In our previous report (Ghelfi et al. 2018), we also evaluated and compared the accuracy of GO assignments annotated by Hayai-Annotation Plants and two other software programs: Blast2GO (Conesa and Götz 2008) and TRAPID (Van Bel et al. 2013). These software tools use different GO graph structures in annotation; Hayai-Annotation Plants and Blast2GO assign child GO terms, whereas TRAPID often reports a parental term rather than the most specific term (Van Bel and Vandepoele 2020).
Van Bel and Vandepoele (2020) pointed out that the comparison performed in our original paper on Hayai-Annotation Plants did not consider the GO ontology (Ashburner et al. 2000), even though GO ontology would have been particularly relevant to that analysis since the software tools analyzed use different levels of GO annotation. In consideration of this important limitation, we therefore re-evaluated the GO annotations of Hayai Annotation Plants by calculating the precision, recall and F max for each GO domain independently (Clark and Radivojac 2013).
In the present study, we re-examined the GO assignments by Hayai-Annotation Plants in comparison to Blast2GO and TRAPID, with particular attention to the full spectrum of the GO ontology. Specifically, we focused on ontology terms with direct relationships within the three primary domains: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). By benchmarking against the gold standard GO annotation of Arabidopsis thaliana, we evaluated each software tool with two approaches: the area under the precision-recall curve (AUC-PR) and the standard metrics used in the critical assessment of functional annotation (CAFA). AUC-PR is a metric used to assess the performance of binary classification models, particularly in cases where the data is imbalanced (Sofaer et al. 2019). CAFA is a community-driven initiative that aims to assess and enhance the accuracy of computational methods used for predicting protein function (Zhou et al. 2019). We used CAFA-evaluator (https://github.com/BioComputingUP/CAFA-evaluator), which is the official tool for the CAFA challenges. Our results showed that Hayai-Annotation Plants exhibited better performance in each domain in this study.

Data source
The GO annotation of A. thaliana downloaded from the Gene Ontology Consortium website (July 2019 from http://current.geneontology.org/products/pages/downloads.html) was considered the gold standard annotation.

First method: Evaluation using AUC-PR
The evaluation AUC-PR was based on in-house scripts to assign the direct ontology of GO, which was obtained by extracting all relations defined as "is_a" in the go-basic obo file (downloaded in July 2019 from http://current.geneontology.org/ontology/subsets/goslim_plant.obo).
The sampling method employed the function "randomNumbers" from the R-package "random" (https://cran.r-project.org/web/packages/random/index.html) with 1,000 of the In order to compare the GO IDs predictions we calculated the F-score, weighted Fscore and S-score and the precision-recall and remaining uncertainty-misinformation curves using the CAFA-evaluator (https://github.com/BioComputingUP/CAFA-evaluator). The software was executed using the same parameters as applied for the CAFA5 challenge (Kaggle; https://www.kaggle.com/competitions/cafa-5-protein-function-prediction), except that the "-no_orphans" parameter was added, which excludes the orphan term, for example the GO roots.
The gold standards file used in the CAFA-evaluator was generated by using an inhouse python script to extract GO annotation from the go_arabidopsis.gaf file. The information accretion file was generated by using the Bioconductor SemDist

First method: Evaluation using AUC-PR
The  (Table 1). Each analysis was performed with 1,000 genes with three replications.
Precision and recall were based on the gold-standard GO annotation of A. thaliana (goa_arabidopsis.gaf). An analysis of variance (ANOVA) and Tukey's HSD test were applied to compare the AUC-PR means for each condition. Superscript letters indicate significant differences (level of significance α = 0.05).
Regarding the separation of test data versus training data, we divided 2,000 annotated genes of Arabidopsis thaliana (Araport11) into two groups of 1,000 genes each.
We tested four values of sequence alignment in Hayai-Annotation Plants: 60%, 70%, 80%, and 90%. The results showed that, except for the CC domain, there was no significant difference among the different parameters selected for comparison (Table 2).

Second method: Standard CAFA evaluation metrics
The same set of re-annotations of A. thaliana performed by Hayai-Annotation Plants and two benchmarks, BLAST2GO and TRAPID, was evaluated by using CAFAevaluator. The GO annotation of A. thaliana from the Gene Ontology Consortium website was considered as a gold standard annotation for the purpose of our comparison. Using the CAFA-evaluator, three predictors were calculated for each GO domain independently, the F-score, weighted F-score and S-score, along with precision-recall and remaining uncertainty-misinformation curves as described in the CAFA. Hayai-Annotation Plants showed better performance when compared with our benchmarks in all GO domains in all calculated scores, as shown in Figure 1, Figure 2 and Figure 3.

Discussion and Conclusion
The results using AUC-PR showed that Hayai-Annotation Plants is a reliable tool to assign GO annotations, because it did not differ significantly (α = 0.05) from the benchmark Blast2GO in all GO domains except for CC, in which Hayai-Annotation Plants showed better performances. The lower AUC-PR values observed with TRAPID were obtained because this tool annotates mainly the parental terms, and lower values would thus be expected compared to software tools that prioritize child GO terms, since we considered the complete GO set of information to calculate precision and recall.
In our previous publications about Hayai-Annotation Plants, we considered only child GO terms and disregarded parental terms to evaluate the performance of all tested software tools. In response to the observations by Van Bel and Vandepoele (2020), we realized that the comparison of tools in our previous publications was insufficient.
Therefore, we utilized the AUC-PR comparison data described in this paper as a response to their comments and submitted it to the journal where the report of Van Bel and Vandepoele (2020) was published. Unfortunately, we were unable to secure the opportunity to publish the results of this analysis in the same journal where the comments were featured.
As a result, despite our reservations about the situation, the paper commented upon by Van Bel and Vandepoele (2020) has been retracted.
Therefore, we considered it insufficient to rely solely on the evaluation using AUC-PR and conducted further assessment using the standard CAFA evaluation metric. The results again showed that Hayai-Annotation Plants is a reliable tool to assign GO annotations, because, in all GO domains, it showed better performance. In the review on gene functional annotation utilizing the Gene Ontology (GO) model, Zao et al. (2020) emphasize the significance of considering both the flat and hierarchical relationships among GO terms. In the context of Hayai-Annotation Plants, a more focused functional annotation is conducted by employing child terms from the GO to provide annotations. In contrast, TRAPID employs a higher level of GO terms for its annotations compared to Hayai-Annotation Plants. While comparing tools that utilize annotations from different levels of the GO hierarchy presents challenges, our re-evaluation, as mentioned earlier, has indeed validated the high reliability of functional annotation achieved by Hayai-Annotation Plants. Evaluating tools with different underlying systems in a definitive manner is exceedingly challenging, and the outcomes often vary based on the evaluation methods employed. What we aim to benchmark in this paper is not that the accuracy of TRAPID is inferior to that of Hayai-Annotation Plants, but rather that Hayai-Annotation Plants is a tool with high reliability and accuracy.
In essence, previous manuscripts concerning Hayai-Annotation Plants only carried out comparisons from a certain perspective, which might have raised issues if only the numerical data were extracted for comparison. However, this should not undermine the accuracy and reliability of Hayai-Annotation Plants as a functional annotation tool. The functionality of Hayai-Annotation Plants is publicly available through the following site (https://github.com/aghelfi/Hayai-Annotation-Plants), and analysis on the cloud is also feasible at http://pgdbjsnp.kazusa.or.jp/app/hayai2. Moreover, the Plant GARDEN platform has integrated the results of gene sequence re-annotation using a customized version of Hayai-Annotation Plants, further highlighting its capabilities (Ichihara et al. 2023). In the present era of extensive whole genome sequencing, Hayai-Annotation Plants will serve as a valuable tool that facilitates simplified and accurate gene function annotation for numerous users and will thereby make a significant contribution to plant research.