Polyester: simulating RNA-seq datasets with differential transcript expression

Bioinformatics. 2015 Sep 1;31(17):2778-84. doi: 10.1093/bioinformatics/btv272. Epub 2015 Apr 28.

Abstract

Motivation: Statistical methods development for differential expression analysis of RNA sequencing (RNA-seq) requires software tools to assess accuracy and error rate control. Since true differential expression status is often unknown in experimental datasets, artificially constructed datasets must be used, obtained either from costly spike-in experiments or by simulating RNA-seq data.

Results: Polyester is an R package designed to simulate RNA-seq data, beginning with an experimental design and ending with collections of RNA-seq reads. Its main advantage is the ability to simulate reads that reflect isoform-level differential expression across biological replicates for a variety of experimental designs. Data generated by Polyester are a reasonable approximation of real RNA-seq data, and standard differential expression workflows can recover the differential expression that the user sets in the simulation.
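
The idea of starting from an experimental design and a user-set differential expression signal can be illustrated with a toy sketch. It is written in Python and is not Polyester's API or implementation. It draws per-transcript read counts for replicates in two groups, scaling a baseline mean by a fold-change matrix and adding replicate-to-replicate variability with a gamma-Poisson (negative binomial) mixture. The function names, the baseline means, the fold-change values and the dispersion rule (size = mean / 3) are all assumptions chosen for illustration. The sketch stops at counts and does not generate read sequences.

```python
import random


def draw_poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals before time lam."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def draw_negative_binomial(mean, size, rng):
    """Negative binomial draw with variance mean + mean**2 / size.

    Implemented as a gamma-Poisson mixture: the Poisson rate is itself
    gamma distributed with shape `size` and scale `mean / size`.
    """
    if mean <= 0:
        return 0
    rate = rng.gammavariate(size, mean / size)
    return draw_poisson(rate, rng)


def simulate_counts(baseline_means, fold_changes, reps_per_group, seed=0):
    """Simulate a transcripts x samples table of read counts (hypothetical helper).

    baseline_means: expected reads per transcript before any fold change.
    fold_changes:   one row per transcript, one multiplier per group.
    reps_per_group: number of biological replicates in each group.
    """
    rng = random.Random(seed)
    # Group label for every sample, e.g. [0, 0, 0, 1, 1, 1].
    groups = [g for g, n in enumerate(reps_per_group) for _ in range(n)]
    counts = []
    for base, fc_row in zip(baseline_means, fold_changes):
        row = []
        for g in groups:
            mean = base * fc_row[g]
            size = mean / 3.0  # assumed dispersion rule, for illustration only
            row.append(draw_negative_binomial(mean, size, rng))
        counts.append(row)
    return groups, counts


if __name__ == "__main__":
    # Three transcripts, two groups of three replicates each.
    # Transcript 0 is up 4x in group 1, transcript 2 is up 4x in group 0.
    groups, counts = simulate_counts(
        baseline_means=[300, 300, 300],
        fold_changes=[[1, 4], [1, 1], [4, 1]],
        reps_per_group=[3, 3],
        seed=1,
    )
    print("groups:", groups)
    for i, row in enumerate(counts):
        print("transcript", i, row)
```

Because the fold changes are set by the user, the true differential expression status of every transcript is known, which is what allows a downstream differential expression workflow to be scored against the simulation.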

Availability and implementation: Polyester is freely available from Bioconductor (http://bioconductor.org/).

Contact: jtleek@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binomial Distribution
  • Chromosomes, Human, Pair 22 / genetics*
  • Computational Biology / methods*
  • Europe
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation
  • Genetics, Population
  • Haplotypes / genetics
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Protein Isoforms
  • RNA / genetics
  • Sequence Analysis, RNA / methods*
  • Software*

Substances

  • Protein Isoforms
  • RNA