TIR-Learner, a New Ensemble Method for TIR Transposable Element Annotation, Provides Evidence for Abundant New Transposable Elements in the Maize Genome

Mol Plant. 2019 Mar 4;12(3):447-460. doi: 10.1016/j.molp.2019.02.008. Epub 2019 Feb 23.

Abstract

Transposable elements (TEs) make up a large and rapidly evolving proportion of plant genomes. Among Class II DNA TEs, TIR elements are flanked by characteristic terminal inverted repeat sequences (TIRs). TIR TEs may play important roles in genome evolution, including generating allelic diversity, inducing structural variation, and regulating gene expression. However, TIR TE identification and annotation have been hampered by the lack of effective tools, resulting in erroneous TE annotations and a significant underestimation of the proportion of TIR elements in the maize genome. This problem has greatly limited our understanding of the impact of TIR elements on plant genome structure and evolution. In this paper, we propose a new method for TIR element detection and annotation. The new pipeline combines the advantages of current homology-based annotation methods with powerful de novo machine-learning approaches, greatly increasing the efficiency and accuracy of TIR element annotation. The results show that both the copy number and the genome proportion of TIR elements in maize are much larger than current annotations indicate. In addition, elements of some TIR superfamilies are depleted in centromeric and pericentromeric regions, whereas others show no such bias. Finally, the incorporation of machine-learning techniques has enabled the identification of large numbers of new DTA (hAT) family elements, which have all the hallmarks of bona fide TEs yet lack high homology to currently known DTA elements. Together, these results provide new tools for TE research and new insight into the impact of TIR elements on maize genome diversity.
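The defining structural feature of TIR elements, flanking terminal inverted repeats, can be illustrated with a minimal Python sketch. It is not the TIR-Learner pipeline, which combines homology-based annotation with machine-learning classification. The sketch only checks whether the 5' end of a candidate sequence approximately matches the reverse complement of its 3' end. The function names and the thresholds min_tir_len and max_mismatches are hypothetical choices made for illustration.

```python
# Minimal illustrative sketch (NOT TIR-Learner's algorithm): test whether a
# candidate element is flanked by terminal inverted repeats (TIRs), i.e.
# whether its first k bases are approximately the reverse complement of its
# last k bases. Thresholds below are hypothetical, chosen for illustration.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tir_length(seq: str, max_mismatches: int = 1) -> int:
    """Length of the longest terminal inverted repeat, tolerating up to
    max_mismatches mismatched positions (hypothetical tolerance).

    rc[i] is the complement of seq[len(seq) - 1 - i], so comparing
    seq[i] with rc[i] walks inward from both ends at once.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    mismatches = 0
    best = 0
    for i in range(len(seq) // 2):
        if seq[i] != rc[i]:
            mismatches += 1
            if mismatches > max_mismatches:
                break
        else:
            best = i + 1  # the TIR extends only through matching positions
    return best


def has_tirs(seq: str, min_tir_len: int = 10, max_mismatches: int = 1) -> bool:
    """True if the sequence carries TIRs of at least min_tir_len bases."""
    return tir_length(seq, max_mismatches) >= min_tir_len


if __name__ == "__main__":
    tir = "CAGGGATGAA"                                # hypothetical 10-bp TIR
    element = tir + "A" * 8 + reverse_complement(tir)
    print(tir_length(element))  # 10
    print(has_tirs(element))    # True
    print(has_tirs("A" * 20))   # False
```

A real annotation tool would additionally need to scan a whole genome for candidate boundaries and weigh other evidence; this sketch only makes the "inverted repeat at both ends" criterion concrete.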

Keywords: TIR transposable element; annotation pipeline; machine learning; maize genomes.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • DNA Transposable Elements*
  • Gene Dosage
  • Genome, Plant*
  • Inverted Repeat Sequences
  • Machine Learning
  • Molecular Sequence Annotation / methods*
  • Zea mays / genetics*

Substances

  • DNA Transposable Elements