TY  - JOUR
T1  - LTR_retriever: a highly accurate and sensitive program for identification of LTR retrotransposons
JF  - bioRxiv
DO  - 10.1101/137141
SP  - 137141
AU  - Shujun Ou
AU  - Ning Jiang
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/05/12/137141.abstract
N2  - Long terminal repeat retrotransposons (LTR-RTs) are prevalent in most plant genomes. Identification of LTR-RTs is critical for achieving high-quality gene annotation. The sequences of LTR-RTs are diverse among species, yet the structure of the element is well conserved. Based on the conserved structure, multiple programs were developed for de novo identification of LTR-RTs. Most of these programs are associated with low specificity, and excessive curation is required since false positives are very detrimental for downstream analyses. Here we report LTR_retriever, a multithreading empowered Perl program that identifies LTR retrotransposons and generates high-quality LTR libraries from genomes with various assembly qualities. LTR_retriever demonstrated significant improvements by achieving high levels of sensitivity, specificity, accuracy, and precision, which are 91.7%, 96.9%, 95.7%, and 90.0%, respectively, in rice (Oryza sativa). Besides LTR-RTs with canonical ends (TG.CA), LTR_retriever also identifies non-canonical LTRs accurately. A scan of 50 public plant genomes identified seven non-canonical types of LTRs. LTR_retriever is also compatible with long-read sequencing technologies. With 40k self-corrected PacBio reads equivalent to 4.5X of genome coverage in Arabidopsis, the quality of constructed LTR library surpasses that constructed from the genome alone. LTR_retriever has demonstrated the highest performance with great flexibility for automatically retrieving LTR-RTs.
ER  - 