Transcriptomics in the RNA-seq era

doi:10.1016/j.cbpa.2012.12.008

Current Opinion in Chemical Biology

Volume 17, Issue 1, February 2013, Pages 4-11

https://doi.org/10.1016/j.cbpa.2012.12.008

The transcriptomics field has developed rapidly with the advent of next-generation sequencing technologies. RNA-seq has now displaced microarrays as the preferred method for gene expression profiling.

The comprehensive nature of the data generated has been a boon for transcript identification, but analysis challenges remain. Key among these is the development of suitable metrics for comparing expression levels and of methods for identifying differentially expressed genes (and exons). Several approaches have been developed, but as yet no consensus exists on the best pipeline to use.

De novo transcriptome approaches are increasingly viable for organisms lacking a sequenced genome. The reduction in starting RNA required has enabled the development of new applications such as single cell transcriptomics.

The emerging picture of mammalian transcription is complex, and further refinement is expected as epigenomic data generated by projects such as ENCODE are integrated.

Highlights

► Transcriptomics is a ‘true’ ‘omics technology.
► Bioinformatics methods are developing rapidly but lack a consensus approach.
► De novo transcriptome assembly and single cell RNA-seq are now viable protocols.
► Several recent ‘discoveries’ have turned out to be artifacts (‘MacArthur's Law’).
► ENCODE project results will enhance our understanding of transcriptional control.

Introduction

The ‘transcriptome’ is defined as ‘the complete complement of mRNA molecules generated by a cell or population of cells’. The term was first proposed by Charles Auffray in 1996 [1] and first used in a scientific paper in 1997 [2]. Unlike many of the technologies that have acquired the ‘-ome’ appendage, the ‘transcriptome’ has a long pedigree and certainly meets the requirements of a true ‘omics technology [3].

The last couple of years have seen intense development of transcriptomic applications and the supplanting of microarrays by RNA-seq as the technology of choice for gene expression analysis. However, the volume of data these technologies produce has created problems of data management and storage and poses novel analytical challenges.

Although the transcriptome can encompass many species of RNA (miRNA, snoRNA, etc.), this review will focus mainly on mRNAs, specifically mammalian mRNAs. Readers can find good reviews of the advances made in nonmammalian and noneukaryotic transcriptomics elsewhere [4, 5].

In contemporary multidisciplinary projects, global transcription profiling is frequently the first ‘omics technology to be applied. It generates information about which genes are expressed and at what level, and can also provide information about the different transcript isoforms used. A preliminary analysis via microarray or RNA-seq can indicate the appropriateness or usefulness of other ‘omics technologies such as proteomics, glycomics or metabolomics. It can be a relatively cheap way of identifying the subsets of samples that are most likely to generate results with those technologies. It can also indicate modifications that should be made to capture protocols for technologies such as proteomics, where the biochemical idiosyncrasies of particular proteins or protein families can make it difficult to isolate the proteins or metabolites that the RNA-seq data have indicated to be of potential interest.

One example of this type of multidisciplinary approach can be found in our own work. For the past five years our reproductive biology cluster has been profiling different tissues of the female bovine reproductive tract under different conditions of pregnancy status, stage of the estrous cycle or embryo development. In each case the initial RNA-seq experiment is complemented by additional profiling with proteomics, metabolomics, or glycomics. Each ‘omics technology helps to piece together a complex biological picture, for example: how the endometrial tissue can support embryo growth and implantation (proteomics analysis of histotroph [6] following RNA-seq of endometrium [7] and embryo [8]); how enzymes expressed in follicular tissue can support the development of oocytes before ovulation (RNA-seq of theca and granulosa cells [9] followed by metabolomic profiling of follicular fluid [10]); or exactly how the modulation of glycosylation enzymes affects cervical mucus structure and generates a permissive or hostile environment for sperm or bacterial transit (glycomic profiling of cervical mucus following RNA-seq of cervical tissue [11]).

Brief history of transcriptomics

The first efforts at profiling mammalian transcriptomes started in 1991 with the publication of a human EST database compiled by a group from the NIH led by J. Craig Venter [12]. This database consisted of just 609 cDNA clones with an average length of 397 ± 99 bases. It represented one of the earliest applications of the then newly developed automated Sanger sequencing technology. This technology enabled methods such as SAGE (Serial Analysis of Gene Expression) which were one of the first

Bioinformatics challenges

The first major bioinformatics problem posed by the emergence of RNA-seq was the alignment of reads to a reference genome. Given that an RNA-seq sample can contain millions (even tens of millions) of reads, alignment speed has been the primary performance metric by which these tools have been judged. This has led to the displacement of the original cohort of aligners by tools based on the Burrows-Wheeler transform, such as Bowtie [24] and SOAP [25].
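To make the idea behind Burrows-Wheeler based alignment concrete, the following is a minimal Python sketch, not the implementation used by Bowtie or SOAP. It builds the transform naively by sorting all rotations of a short reference and then counts exact occurrences of a read by backward search. The function names `bwt` and `count_occurrences` and the toy sequences are illustrative assumptions; production aligners use compressed, sampled indexes and tolerate mismatches, none of which is shown here.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via naive rotation sorting (toy sizes only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel  # the sentinel sorts before A, C, G, T in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count exact matches of pattern in the original text by backward search."""
    counts = Counter(bwt_text)
    # first_index[c]: row where the block of rotations starting with c begins.
    first_index = {}
    total = 0
    for ch in sorted(counts):
        first_index[ch] = total
        total += counts[ch]

    def rank(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_text[0:i]; real indexes precompute this.
        return bwt_text[:i].count(ch)

    lo, hi = 0, len(bwt_text)  # half-open range of matching rows
    for ch in reversed(pattern):
        if ch not in first_index:
            return 0
        lo = first_index[ch] + rank(ch, lo)
        hi = first_index[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "GATTACAGATTACA"  # hypothetical toy reference
    index = bwt(reference)
    print(index)
    print(count_occurrences(index, "TTAC"))
```

The search loop runs once per read base regardless of reference length, which is the property that makes this family of indexes attractive when tens of millions of reads must be placed. The `rank` helper here is linear in the index size; the speed of real tools depends on replacing it with precomputed occurrence tables.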

The early years of

Conclusions

Five years into the next-generation sequencing revolution, RNA-seq has been widely adopted and has effectively displaced microarrays for gene expression analysis. Unfortunately, RNA-seq has not been the panacea for the problems of gene expression analysis that some may have hoped: artifacts and biases exist that still need to be identified and controlled for.

The last two years have seen an explosion of RNA-seq analysis approaches. The next few years will hopefully see consensus emerge on the best

Conflict of interest

None declared.

Acknowledgements

PM is funded through a grant from Science Foundation Ireland (07/SRC/B1156). The author would like to thank Professor Alex Evans for very constructive criticism during the drafting of this review.

Glossary

cDNA: Complementary DNA, synthesized from mRNA using reverse transcriptase. This is the starting material typically used in next-generation sequencing or gene expression microarray protocols for measuring RNA levels.
De novo assembly: Constructing a transcriptome in the absence of an assembled genome sequence for the organism.
DGE: Digital Gene Expression. An alternative protocol for measuring gene expression. It is a version of the SAGE protocol adapted for use with next-generation sequencers.
ENCODE

References (84)

V.E. Velculescu et al.
Characterization of the yeast transcriptome
Cell
(1997)
N.J. Croucher et al.
Studying bacterial transcriptomes using RNA-seq
Curr Opin Microbiol
(2010)
U. Mader et al.
Comprehensive identification and quantification of microbial transcriptomes by genome-wide unbiased methods
Curr Opin Biotechnol
(2011)
J. Loven et al.
Revisiting global gene expression analysis
Cell
(2012)
F. Tang et al.
Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA-Seq analysis
Cell Stem Cell
(2010)
G. Pietu et al.
The Genexpress IMAGE knowledge base of the human brain transcriptome: a prototype integrated resource for functional and computational genomics
Genome Res
(1999)
J. Eisen
Badomics words and the power and peril of the ome-meme
Gigascience
(2012)
M.P. Mullen et al.
Proteomic characterization of histotroph during the preimplantation phase of the estrous cycle in cattle
J Proteome Res
(2012)
N. Forde et al.
Evidence for an early endometrial response to pregnancy in cattle: both dependent upon and independent of interferon tau
Physiol Genomics
(2012)
S. Mamo et al.
RNA sequencing reveals novel gene clusters in bovine conceptuses associated with maternal recognition of pregnancy and implantation
Biol Reprod
(2011)
S.W. Walsh et al.
Effect of the metabolic environment at key stages of follicle development in cattle: focus on steroid biosynthesis
Physiol Genomics
(2012)
K. Bender et al.
Metabolite concentrations in follicular fluid may explain differences in fertility between heifers and lactating cows
Reproduction
(2010)
K. Pluta et al.
Molecular aspects of mucin biosynthesis and mucus formation in the bovine cervix during the periestrous period
Physiol Genomics
(2012)
M.D. Adams et al.
Complementary DNA sequencing: expressed sequence tags and human genome project
Science
(1991)
V.E. Velculescu et al.
Serial analysis of gene expression
Science
(1995)
M. Schena et al.
Quantitative monitoring of gene expression patterns with a complementary DNA microarray
Science
(1995)
M.N. Bainbridge et al.
Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach
BMC Genomics
(2006)
A. Mortazavi et al.
Mapping and quantifying mammalian transcriptomes by RNA-Seq
Nat Methods
(2008)
M. Sultan et al.
A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome
Science
(2008)
B.T. Wilhelm et al.
Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution
Nature
(2008)
D. Parkhomchuk et al.
Transcriptome analysis by strand-specific sequencing of complementary DNA
Nucleic Acids Res
(2009)
Y. Katz et al.
Analysis and design of RNA sequencing experiments for identifying isoform regulation
Nat Methods
(2010)
F. Ozsolak et al.
Direct RNA sequencing
Nature
(2009)
L. Mamanova et al.
FRT-seq: amplification-free, strand-specific transcriptome sequencing
Nat Methods
(2010)
J.W. Li et al.
SEQanswers: an open access community for collaboratively decoding genomes
Bioinformatics
(2012)
B. Langmead et al.
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
Genome Biol
(2009)
R. Li et al.
SOAP2: an improved ultrafast tool for short read alignment
Bioinformatics
(2009)
D.B. Allison et al.
Microarray data analysis: from disarray to consolidation and consensus
Nat Rev Genet
(2006)
C. Trapnell et al.
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
Nat Biotechnol
(2010)
S. Pepke et al.
Computation for ChIP-seq and RNA-seq studies
Nat Methods
(2009)
D. Risso et al.
GC-content normalization for RNA-Seq data
BMC Bioinformatics
(2011)
K.D. Hansen et al.
Removing technical variability in RNA-seq data using conditional quantile normalization
Biostatistics
(2012)
W. Zheng et al.
Bias detection and correction in RNA-Sequencing data
BMC Bioinformatics
(2011)
J.H. Bullard et al.
Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments
BMC Bioinformatics
(2010)
M.D. Robinson et al.
A scaling normalization method for differential expression analysis of RNA-seq data
Genome Biol
(2010)
B. Li et al.
RNA-Seq gene expression estimation with read mapping uncertainty
Bioinformatics
(2010)
P.L. Auer et al.
Statistical design and analysis of RNA sequencing data
Genetics
(2010)
M.D. Robinson et al.
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data
Bioinformatics
(2010)
S. Anders et al.
Differential expression analysis for sequence count data
Genome Biol
(2010)
T.J. Hardcastle et al.
baySeq: empirical Bayesian methods for identifying differential expression in sequence count data
BMC Bioinformatics
(2010)
Y. Di et al.
The NBP negative binomial model for assessing differential gene expression from RNA-seq
Stat Appl Genet Mol Biol
(2011)
S. Tarazona et al.
Differential expression in RNA-seq: a matter of depth
Genome Res
(2011)
