Abstract
Background Long non-coding RNAs (lncRNAs) are under-studied and under-annotated in plants. In mammals, lncRNA loci are nearly as ubiquitous as protein-coding genes, and their expression has been shown to vary widely between individuals of the same species. Using Arabidopsis thaliana as a model, we aimed to determine the true scope of lncRNA transcription across plants from different geographic regions and to study the natural variability of lncRNA expression.
Results Using RNA-seq data spanning hundreds of natural lines and several developmental stages, we created a more comprehensive lncRNA annotation and identified over 10,000 new loci, three times as many as in the current public annotation. Although lncRNA loci are ubiquitous in the genome, most appear to be actively silenced, and both their expression and their levels of repressive chromatin vary extremely between natural lines. This was particularly pronounced for intergenic lncRNAs (lincRNAs): pieces of transposable elements (TEs), present in 50% of lincRNA loci, are associated with increased silencing and variation, and such lincRNAs tend to be targeted by the TE silencing machinery.
Conclusion lncRNAs are ubiquitous in the A. thaliana genome but largely silenced, and their expression varies widely between lines. This variability is largely caused by high epigenetic variability at non-coding loci, especially those containing pieces of transposable elements. We created the most comprehensive A. thaliana lncRNA annotation to date and improved our understanding of plant lncRNA biology.
Competing Interest Statement
The authors have declared no competing interest.