Abstract
Motivation Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution although the physiological basis of this assumption remain unclear.
Results In this study, RNA-seq data were obtained from Arabidopsis thaliana under eight conditions (21–27 replicates), and the characteristics of gene-dependent distribution profiles of gene expression levels were analyzed. For A. thaliana and Saccharomyces cerevisiae, the distribution profiles could be described by a Gauss-power mixing distribution derived from a simple model of a stochastic transcriptional network containing a feedback loop. The distribution profiles of gene expression levels were roughly classified as Gaussian, power law-like containing a long tail, and mixed. The fitting function predicted that gene expression levels with long-tailed distributions would be strongly influenced by feedback regulation. Thus, the features of gene expression levels are correlated with their functions, with the levels of essential genes tending to follow a Gaussian distribution and those of genes encoding nucleic acid-binding proteins and transcription factors exhibiting long-tailed distributions.
Availability Fastq files of RNA-seq experiments were deposited into the DNA Data Bank of Japan Sequence Read Archive as accession no. DRA005887. Quantified expression data are available in supplementary information.
Contact awa{at}hiroshima-u.ac.jp