PT - JOURNAL ARTICLE
AU - Stephen J. Bush
AU - Charity Muriuki
AU - Mary E. B. McCulloch
AU - Iseabail L. Farquhar
AU - Emily L. Clark
AU - David A. Hume
TI - Assembly and validation of conserved long non-coding RNAs in the ruminant transcriptome
AID - 10.1101/253997
DP - 2018 Jan 01
TA - bioRxiv
PG - 253997
4099 - http://biorxiv.org/content/early/2018/01/25/253997.short
4100 - http://biorxiv.org/content/early/2018/01/25/253997.full
AB - mRNA-like long non-coding RNAs (lncRNA) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. In many cases, therefore, lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNA, we compared de novo assembled lncRNA derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNA assembled in cattle and human. Few lncRNA could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNA assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNA to identify a consensus set of ruminant lncRNA and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. The majority of lncRNA were encoded by single exons, and expressed at < 1 TPM. In sheep, 20-30% of lncRNA had expression profiles significantly correlated with neighbouring protein-coding genes, suggesting association with enhancers. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNA in other species.