Abstract
Motivation More than 8% of the human genome is derived from endogenous retroviruses (ERVs). In recent years, an increasing number of human diseases have been found to be associated with ERVs. However, it is still challenging to accurately detect the full spectrum of polymorphic (unfixed) ERVs using next-generation sequencing (NGS) data.
Results We designed a new tool, ERVcaller, to detect and genotype unfixed transposable element (TE) insertions, including ERVs, in the human genome. We evaluated the tool using both simulated and real benchmark whole-genome sequencing datasets. At sequencing depths above 5X, ERVcaller achieved >97% sensitivity and >99% precision for detecting, and >96% accuracy for genotyping, the simulated HERV-K insertions. Compared with four existing tools, ERVcaller consistently showed the highest sensitivity and precision for detecting unfixed ERV insertions, especially at low sequencing depths. ERVcaller also determined ERV breakpoints most precisely, at single-nucleotide resolution. Applying ERVcaller to a subset of the 1000 Genomes Project samples, we detected 100% of the known unfixed ERV insertions and 95% of the other known unfixed TE insertions, and recovered almost all of the known genotypes (100% for ERVs and 98% for other TEs). In conclusion, ERVcaller can identify and genotype TE insertions from NGS data with high sensitivity and precision. The tool can also be applied broadly to other species.
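For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how sensitivity, precision, and genotyping accuracy could be computed from a simulated truth set and a set of calls. It is not ERVcaller's evaluation code: the function name `evaluate_calls`, the breakpoint matching window of 10 bp, the genotype strings, and the example sites are all assumptions made for illustration, and genotyping accuracy is assumed here to be measured over correctly detected insertions only.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, breakpoint position)


def evaluate_calls(truth: Dict[Site, str], calls: Dict[Site, str],
                   window: int = 10) -> Tuple[float, float, float]:
    """Return (sensitivity, precision, genotype_accuracy).

    A call counts as a true positive if it lies on the same chromosome and
    within `window` bp of a simulated insertion (hypothetical tolerance).
    Each call can be matched to at most one simulated insertion.
    """
    matched = {}   # truth site -> matched call site
    used = set()   # call sites already assigned to a truth site
    for t_site in sorted(truth):
        chrom, pos = t_site
        candidates = [c for c in calls
                      if c not in used and c[0] == chrom
                      and abs(c[1] - pos) <= window]
        if candidates:
            # Pick the call whose breakpoint is closest to the true one.
            best = min(candidates, key=lambda c: abs(c[1] - pos))
            matched[t_site] = best
            used.add(best)

    tp = len(matched)
    sensitivity = tp / len(truth) if truth else 0.0   # TP / (TP + FN)
    precision = tp / len(calls) if calls else 0.0     # TP / (TP + FP)
    correct = sum(truth[t] == calls[c] for t, c in matched.items())
    genotype_accuracy = correct / tp if tp else 0.0
    return sensitivity, precision, genotype_accuracy


# Made-up example data: four simulated insertions, three calls.
truth = {("chr1", 1000): "0/1", ("chr1", 5000): "1/1",
         ("chr2", 300): "0/1", ("chr3", 42): "0/1"}
calls = {("chr1", 1002): "0/1", ("chr1", 5000): "0/1",
         ("chr2", 9000): "0/1"}

sens, prec, acc = evaluate_calls(truth, calls)
print(f"sensitivity={sens:.2f} precision={prec:.2f} genotype_accuracy={acc:.2f}")
```

In this toy example two of the four simulated insertions are recovered, one call is a false positive, and one of the two recovered insertions has the wrong genotype.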
Availability www.uvm.edu/genomics/software/ERVcaller.html
Contact dawei.li@uvm.edu
Supplementary information Supplementary data are available at Bioinformatics online.