Abstract
The “dark transcriptome” can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated ORFs in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 48% of the transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins (“orphan-ORFs”); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph, a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally-testable hypotheses about candidate genes.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Jing Li: jingli{at}iastate.edu
Urminder Singh: usingh{at}iastate.edu
Zebulun Arendsee: zbwrnz{at}gmail.com
Eve S Wurtele: evewurtele{at}gmail.com
Add ribo-seq analysis. Revise acknowledgements.