RNA-seq differential expression studies: more sequence or more replication?

Bioinformatics. 2014 Feb 1;30(3):301-4. doi: 10.1093/bioinformatics/btt688. Epub 2013 Dec 6.

Abstract

Motivation: RNA-seq is replacing microarrays as the primary tool for gene expression studies. Many RNA-seq studies have used insufficient biological replicates, resulting in low statistical power and inefficient use of sequencing resources.

Results: We show the explicit trade-off between more biological replicates and deeper sequencing in increasing power to detect differentially expressed (DE) genes. In the human cell line MCF7, adding more sequencing depth after 10 M reads gives diminishing returns on power to detect DE genes, whereas adding biological replicates improves power significantly regardless of sequencing depth. We also propose a cost-effectiveness metric for guiding the design of large-scale RNA-seq DE studies. Our analysis showed that sequencing less reads and performing more biological replication is an effective strategy to increase power and accuracy in large-scale differential expression RNA-seq studies, and provided new insights into efficient experiment design of RNA-seq studies.

Availability and implementation: The code used in this paper is provided on: http://home.uchicago.edu/∼jiezhou/replication/. The expression data is deposited in the Gene Expression Omnibus under the accession ID GSE51403.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • MCF-7 Cells
  • Reproducibility of Results
  • Sample Size
  • Sequence Analysis, RNA / methods*