Abstract
Genome annotation is an important but challenging task. Accurate identification of short interspersed nuclear elements (SINEs) is particularly difficult because they lack highly conserved sequences. AnnoSINE is state-of-the-art software for annotating SINEs in plant genomes, but its homology-based module is not available for animals, and it is computationally inefficient for large genomes. We therefore propose AnnoSINE_v2, which extends accurate SINE annotation to animal genomes with greatly improved computational efficiency. Our results show that AnnoSINE_v2 achieves an F1-score more than 20% higher than existing tools on animal genomes and can process complex genomes, such as human and zebrafish, that were beyond the capabilities of AnnoSINE_v1. AnnoSINE_v2 is freely available on Conda and GitHub: https://github.com/liaoherui/AnnoSINE_v2.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
1. We have included SINE-Scan and RepeatMasker in the benchmark experiment to evaluate the performance of AnnoSINE_v2 in annotating animal genomes.
2. We added experiments to investigate the influence of parameters on the performance of both AnnoSINE_v2 and SINE-Finder.
3. In Fig. 3, we have included the actual execution time of AnnoSINE_v1. We have also compared the execution time of AnnoSINE_v1 and AnnoSINE_v2 using one core, as well as the execution time of AnnoSINE_v2 with different numbers of cores.
4. We have evaluated the performance of AnnoSINE_v2 under different modes. The results demonstrate that the tool's performance does not rely solely on the animal SINE pHMM library but also benefits from the structure-based method employed in the tool.
5. We revised some sentences in the main article to avoid confusion.
Abbreviations
- SINEs: Short interspersed nuclear elements
- pHMMs: Profile hidden Markov models
- TPR: True positive rate
- FPR: False positive rate
- TSD: Target site duplications