NLR-parser: rapid annotation of plant NLR complements

Bioinformatics. 2015 May 15;31(10):1665-7. doi: 10.1093/bioinformatics/btv005. Epub 2015 Jan 12.

Abstract

Motivation: The repetitive nature of plant disease resistance genes encoding nucleotide-binding leucine-rich repeat (NLR) proteins hampers their prediction with standard gene annotation software. The Motif Alignment and Search Tool (MAST) has previously been reported as a tool to support annotation of NLR-encoding genes. However, the decision whether a motif combination represents an NLR protein was entirely manual.

Results: The NLR-parser pipeline takes the MAST output for six-frame translated amino acid sequences and filters it for predefined, biologically curated motif compositions. Input sequences can be derived from, for example, raw long-read sequencing data or contigs and scaffolds from plant genome projects. The output is a tab-separated file reporting the start and frame of the first NLR-specific motif, whether the identified sequence is a TNL or CNL, and whether it is potentially full or fragmented. In addition, the NB-ARC domain sequence that is output can be used directly for phylogenetic analyses. In contrast to other prediction software, the highly complex NB-ARC domain is described in detail using several individual motifs.
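
The six-frame translation that precedes the MAST search can be illustrated with a minimal Python sketch. This is not the NLR-parser code: the function names, the frame labels ("+1" to "-3") and the use of "X" for codons containing ambiguous bases are assumptions made for illustration, and the translation uses the standard genetic code only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna: str) -> dict:
    """Translate a nucleotide sequence in all six reading frames.

    Frame labels are hypothetical: '+1'..'+3' for the forward strand,
    '-1'..'-3' for the reverse complement.
    """
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six translated sequences would then be searched for motifs, which is why the output records the frame in which the first NLR-specific motif was found.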

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arabidopsis / genetics*
  • Arabidopsis Proteins / genetics*
  • Gene Expression Regulation, Plant
  • Genome, Plant
  • Immunity, Innate / genetics*
  • Leucine-Rich Repeat Proteins
  • Molecular Sequence Annotation*
  • Plant Diseases / genetics*
  • Plant Diseases / immunology
  • Proteins / genetics*
  • Repetitive Sequences, Amino Acid / genetics
  • Sequence Analysis, DNA / methods*
  • Software

Substances

  • Arabidopsis Proteins
  • Leucine-Rich Repeat Proteins
  • Proteins