Transcriptomics in the RNA-seq era
Highlights
► Transcriptomics is a ‘true’ ‘omics technology.
► Bioinformatics methods are developing rapidly but lack a consensus approach.
► De novo transcriptome assembly and single cell RNA-seq are now viable protocols.
► Several recent ‘discoveries’ have turned out to be artifacts: ‘MacArthur's Law’.
► ENCODE project results will enhance our understanding of transcriptional control.
Introduction
The ‘transcriptome’ is defined as ‘the complete complement of mRNA molecules generated by a cell or population of cells’. The term was first proposed by Charles Auffray in 1996 [1] and first used in a scientific paper in 1997 [2]. Unlike many of the technologies that have acquired the ‘-ome’ appendage, the ‘transcriptome’ has a long pedigree and certainly meets the requirements of a true ‘omics technology [3].
The last couple of years have seen intense development of transcriptomic applications and the supplanting of microarrays by RNA-seq as the technology of choice for gene expression analysis. However, the volume of data produced by these technologies has created problems of data management and storage, and has also posed novel analytical problems.
Although the transcriptome can encompass many species of RNA (miRNA, snoRNA, etc.), this review will focus mainly on mRNAs, specifically mammalian mRNAs. Readers can find good reviews of the advances that have been made in nonmammalian and noneukaryotic transcriptomics elsewhere [4, 5].
In contemporary multidisciplinary projects, global transcription profiling is frequently the first ‘omics technology to be applied. It generates information about which genes are expressed and at what level, and can also provide information about the different transcript isoforms used. A preliminary analysis via microarray or RNA-seq can indicate the appropriateness or usefulness of other ‘omics technologies such as proteomics, glycomics or metabolomics. It can be a relatively cheap way of identifying the subsets of samples that are most likely to generate results in other ‘omics technologies. It can also indicate where capture protocols should be modified for technologies such as proteomics, where the biochemical idiosyncrasies of particular proteins or protein families can make it difficult to isolate proteins or metabolites that the RNA-seq data have indicated to be of potential interest.
One example of this type of multidisciplinary approach can be found in our own work. For the past five years our reproductive biology cluster has been profiling different tissues of the female bovine reproductive tract under different conditions of pregnancy status, stage of the estrous cycle or embryo development. In each case the initial RNA-seq experiment is complemented by additional profiling with proteomics, metabolomics, or glycomics. Each ‘omics technology helps to piece together a complex biological picture, for example: how the endometrial tissue can support embryo growth and implantation (proteomics analysis of histotroph [6] following RNA-seq of endometrium [7] and embryo [8]); how enzymes expressed in follicular tissue can support the development of oocytes before ovulation (RNA-seq of theca and granulosa cells [9] followed by metabolomic profiling of follicular fluid [10]); or exactly how the modulation of glycosylation enzymes affects cervical mucus structure and generates a permissive or hostile environment for sperm or bacterial transit (glycomic profiling of cervical mucus following RNA-seq of cervical tissue [11]).
Brief history of transcriptomics
The first efforts at profiling mammalian transcriptomes started in 1991 with the publication of a human EST database compiled by a group from the NIH led by J. Craig Venter [12]. This database consisted of just 609 cDNA clones with an average length of 397 ± 99 bases. It represented one of the earliest applications of the then newly developed automated Sanger sequencing technology. This technology enabled methods such as SAGE (Serial Analysis of Gene Expression) which were one of the first
Bioinformatics challenges
The first major bioinformatics problem posed by the emergence of RNA-seq was the alignment of the reads to a reference genome. Given that the number of reads in an RNA-seq sample can be of the order of millions (even tens of millions), alignment speed has been the primary performance metric by which these tools have been judged. This has led to the displacement of the original cohort of aligners by tools based on the Burrows-Wheeler transform, such as Bowtie [24] and SOAP [25].
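To illustrate why the Burrows-Wheeler transform lends itself to fast read lookup, the following is a minimal Python sketch of the transform and of exact-match counting by backward search. It is not how Bowtie or SOAP are implemented: production aligners use compressed, sampled indexes and tolerate mismatches, whereas this toy version builds the transform by sorting all rotations and handles exact matches only. The function names (`bwt`, `build_index`, `count_matches`) and the example sequence are hypothetical and chosen for illustration.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform of text, with a '$' sentinel appended."""
    s = text + "$"  # '$' sorts before A, C, G, T in ASCII order
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def build_index(bwt_str):
    """Build the two lookup tables needed for backward search."""
    counts = Counter(bwt_str)
    # first_pos[c]: number of characters in the text that sort before c
    first_pos, total = {}, 0
    for c in sorted(counts):
        first_pos[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first_pos, occ


def count_matches(pattern, bwt_str, first_pos, occ):
    """Count exact occurrences of pattern, scanning it right to left."""
    lo, hi = 0, len(bwt_str)  # current range of sorted rotations
    for ch in reversed(pattern):
        if ch not in first_pos:
            return 0
        lo = first_pos[ch] + occ[ch][lo]
        hi = first_pos[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGTTACG"  # toy 'genome', an assumption for illustration
transformed = bwt(reference)
first_pos, occ = build_index(transformed)
n_hits = count_matches("ACG", transformed, first_pos, occ)
```

Each character of the read narrows the range of matching rotations with two table lookups, so the work per read depends on the read length rather than on the size of the reference. That property is what makes this family of indexes attractive when tens of millions of reads must be placed.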
The early years of
Conclusions
Five years into the next-generation sequencing revolution, RNA-seq has been widely adopted and has effectively displaced microarrays for gene expression analysis. Unfortunately, RNA-seq has not been the panacea for the problems of gene expression analysis that some may have hoped: artifacts and biases exist that still need to be identified and controlled for.
The last two years have seen an explosion of RNA-seq analysis approaches. The next few years will hopefully see consensus emerge on the best
Conflict of interest
None declared.
Acknowledgements
PM is funded through a grant from Science Foundation Ireland (07/SRC/B1156). The author would like to thank Professor Alex Evans for very constructive criticism during the drafting of this review.
Glossary
- cDNA
- Complementary DNA, synthesized from mRNA using reverse transcriptase. This is the starting material typically used in next-generation sequencing or gene expression microarray protocols for measuring RNA levels.
- De novo assembly
- Constructing a transcriptome in the absence of an assembled genome sequence for the organism.
- DGE
- Digital Gene Expression. An alternative protocol for measuring gene expression. It is a version of the SAGE protocol adapted for use with next-generation sequencers.
- ENCODE
References (84)
- et al. Characterization of the yeast transcriptome. Cell (1997)
- et al. Studying bacterial transcriptomes using RNA-seq. Curr Opin Microbiol (2010)
- et al. Comprehensive identification and quantification of microbial transcriptomes by genome-wide unbiased methods. Curr Opin Biotechnol (2011)
- et al. Revisiting global gene expression analysis. Cell (2012)
- et al. Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA-Seq analysis. Cell Stem Cell (2010)
- et al. The Genexpress IMAGE knowledge base of the human brain transcriptome: a prototype integrated resource for functional and computational genomics. Genome Res (1999)
- Badomics words and the power and peril of the ome-meme. Gigascience (2012)
- et al. Proteomic characterization of histotroph during the preimplantation phase of the estrous cycle in cattle. J Proteome Res (2012)
- et al. Evidence for an early endometrial response to pregnancy in cattle: both dependent upon and independent of interferon tau. Physiol Genomics (2012)
- et al. RNA sequencing reveals novel gene clusters in bovine conceptuses associated with maternal recognition of pregnancy and implantation. Biol Reprod (2011)