A two-parameter generalized Poisson model to improve the analysis of RNA-seq data

Nucleic Acids Res. 2010 Sep;38(17):e170. doi: 10.1093/nar/gkq670. Epub 2010 Jul 29.

Abstract

Deep sequencing of RNAs (RNA-seq) has been a useful tool to characterize and quantify transcriptomes. However, there are significant challenges in the analysis of RNA-seq data, such as how to separate signals from sequencing bias and how to perform reasonable normalization. Here, we focus on a fundamental question in RNA-seq analysis: the distribution of the position-level read counts. Specifically, we propose a two-parameter generalized Poisson (GP) model to the position-level read counts. We show that the GP model fits the data much better than the traditional Poisson model. Based on the GP model, we can better estimate gene or exon expression, perform a more reasonable normalization across different samples, and improve the identification of differentially expressed genes and the identification of differentially spliced exons. The usefulness of the GP model is demonstrated by applications to multiple RNA-seq data sets.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Exons
  • Gene Expression Profiling / methods*
  • Humans
  • Mice
  • Models, Statistical*
  • Poisson Distribution
  • RNA Splicing
  • Sequence Analysis, RNA / methods*