Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels

Bioinformatics. 2012 Apr 15;28(8):1086-92. doi: 10.1093/bioinformatics/bts094. Epub 2012 Feb 24.

Abstract

Motivation: High-throughput sequencing has made the analysis of new model organisms more affordable. Although assembling a new genome can still be costly and difficult, it is possible to use RNA-seq to sequence mRNA. In the absence of a known genome, it is necessary to assemble these sequences de novo, taking into account possible alternative isoforms and the dynamic range of expression values.

Results: We present a software package named Oases designed to heuristically assemble RNA-seq reads in the absence of a reference genome, across a broad spectrum of expression values and in the presence of alternative isoforms. It achieves this by using an array of hash lengths, a dynamic filtering of noise, a robust resolution of alternative splicing events and the efficient merging of multiple assemblies. It was tested on human and mouse RNA-seq data and is shown to improve significantly on the transABySS and Trinity de novo transcriptome assemblers.
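
As a rough illustration of two of the ideas named above (an array of hash lengths and the merging of multiple assemblies), the following Python sketch counts k-mers at several values of k and then merges contig sets by discarding redundant sequences. It is not the Oases implementation: the function names `kmer_counts` and `merge_contigs`, the toy reads and the containment-based merge rule are all assumptions made for illustration, and reverse complements, noise filtering and splicing resolution are ignored.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def merge_contigs(assemblies):
    """Merge contigs from several assemblies (hypothetical merge rule).

    Exact duplicates are dropped, as are contigs that occur as a
    substring of a longer contig that has already been kept.
    """
    # Longest first, so shorter contigs are checked against longer ones.
    unique = sorted(
        {c for contigs in assemblies for c in contigs},
        key=lambda c: (-len(c), c),
    )
    kept = []
    for contig in unique:
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


# Toy input: one k-mer table per hash length (values chosen arbitrarily).
reads = ["ACGTACGT", "CGTACGTT"]
hash_lengths = [3, 5]
tables = {k: kmer_counts(reads, k) for k in hash_lengths}

# Pretend each hash length produced its own set of contigs.
merged = merge_contigs([["ACGTACGT", "CGTA"], ["ACGTACGT", "TTTT"]])
```

In this sketch a small k gives higher k-mer counts, which helps connect reads from weakly expressed transcripts, while a larger k yields more specific k-mers, which is the usual motivation for combining assemblies built at several hash lengths.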

Availability and implementation: Oases is freely available under the GPL license at www.ebi.ac.uk/~zerbino/oases/.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alternative Splicing
  • Animals
  • Gene Expression Profiling*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Mice
  • RNA, Messenger / genetics
  • Sequence Analysis, RNA / methods*
  • Software*

Substances

  • RNA, Messenger