Power analysis of single-cell RNA-sequencing experiments

Abstract

Single-cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, thereby revealing new cell types and providing insights into developmental processes and transcriptional stochasticity. A key question is how the variety of available protocols compare in terms of their ability to detect and accurately quantify gene expression. Here, we assessed the protocol sensitivity and accuracy of many published data sets, on the basis of spike-in standards and uniform data processing. For our workflow, we developed a flexible tool for counting the number of unique molecular identifiers (https://github.com/vals/umis/). We compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation. Our analysis provides an integrated framework for comparing scRNA-seq protocols.
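The idea behind counting unique molecular identifiers can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the umis tool: the function name `count_umis` and the input layout (one `(cell_barcode, gene, umi)` tuple per aligned read) are assumptions, and no UMI error correction is attempted. Reads that share a cell, a gene, and a UMI are treated as amplification duplicates of a single molecule.

```python
from collections import defaultdict


def count_umis(records):
    """Count distinct UMIs per (cell, gene) pair.

    `records` is an iterable of (cell_barcode, gene, umi) tuples,
    one per aligned read (a hypothetical layout for this sketch).
    """
    seen = defaultdict(set)
    for cell, gene, umi in records:
        # A set keeps each UMI once, so duplicated reads collapse.
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


reads = [
    ("AAAC", "ERCC-00002", "ACGTA"),
    ("AAAC", "ERCC-00002", "ACGTA"),  # duplicate read of the same molecule
    ("AAAC", "ERCC-00002", "TTGCA"),
    ("GGTT", "ERCC-00002", "ACGTA"),  # same UMI, different cell
]
counts = count_umis(reads)
```

In this toy input, the first cell has three reads but only two distinct UMIs for the spike-in, so the duplicated read does not inflate the count.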

Figure 1: Strategy for scRNA-seq protocol comparison.
Figure 2: Performance metrics for scRNA-seq protocols.
Figure 3: Performance metrics after accounting for sequencing depth.
Figure 4: Effects of various factors on performance metrics.

Accession codes

Primary accessions

ArrayExpress

Referenced accessions

ArrayExpress

European Nucleotide Archive

Gene Expression Omnibus

Sequence Read Archive

References

  1. Macaulay, I.C. & Voet, T. Single cell genomics: advances and future perspectives. PLoS Genet. 10, e1004126 (2014).

  2. Stegle, O., Teichmann, S.A. & Marioni, J.C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).

  3. Wu, A.R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods 11, 41–46 (2014).

  4. Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Preprint at http://biorxiv.org/content/early/2016/06/29/035758/ (2016).

  5. External RNA Controls Consortium. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6, 150 (2005).

  6. Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).

  7. Munro, S.A. et al. Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures. Nat. Commun. 5, 5125 (2014).

  8. Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2, 666–673 (2012).

  9. Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166 (2014).

  10. Viphakone, N., Voisinet-Hakil, F. & Minvielle-Sebastia, L. Molecular dissection of mRNA poly(A) tail length control in yeast. Nucleic Acids Res. 36, 2418–2433 (2008).

  11. Grün, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640 (2014).

  12. Walker, E. & Nowacki, A.S. Understanding equivalence and noninferiority testing. J. Gen. Intern. Med. 26, 192–196 (2011).

  13. SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat. Biotechnol. 32, 903–914 (2014).

  14. Kapteyn, J., He, R., McDowell, E.T. & Gang, D.R. Incorporation of non-natural nucleotides into template-switching oligonucleotides reduces background and improves cDNA synthesis from very small RNA samples. BMC Genomics 11, 413 (2010).

  15. Mahata, B. et al. Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep. 7, 1130–1142 (2014).

  16. Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33, 155–160 (2015).

  17. Pollen, A.A. et al. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nat. Biotechnol. 32, 1053–1058 (2014).

  18. Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014).

  19. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).

  20. Zeisel, A. et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).

  21. Jaitin, D.A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).

  22. Ferreira, T. et al. Silencing of odorant receptor genes by G protein βγ signaling ensures the expression of one odorant receptor per olfactory sensory neuron. Neuron 81, 847–859 (2014).

  23. Owens, N.D.L. et al. Measuring absolute RNA copy numbers at high temporal resolution reveals transcriptome kinetics in development. Cell Rep. 14, 632–647 (2016).

  24. Llorens-Bobadilla, E. et al. Single-cell transcriptomics reveals a population of dormant neural stem cells that become activated upon brain injury. Cell Stem Cell 17, 329–340 (2015).

  25. Fan, X. et al. Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos. Genome Biol. 16, 148 (2015).

  26. Dang, Y. et al. Tracing the expression of circular RNAs in human pre-implantation embryos. Genome Biol. 17, 130 (2016).

  27. Velten, L. et al. Single-cell polyadenylation site mapping reveals 3′ isoform choice variability. Mol. Syst. Biol. 11, 812 (2015).

  28. Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17, 77 (2016).

  29. Paul, F. et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–1677 (2015).

  30. Macosko, E.Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).

  31. Klein, A.M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).

  32. Macaulay, I.C. et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat. Methods 12, 519–522 (2015).

  33. Scialdone, A. et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85, 54–61 (2015).

  34. Padovan-Merhar, O. et al. Single mammalian cells compensate for differences in cellular volume and DNA copy number through independent global transcriptional mechanisms. Mol. Cell 58, 339–352 (2015).

  35. Sansom, S.N. et al. Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia. Genome Res. 24, 1918–1931 (2014).

  36. Wilson, N.K. et al. Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell 16, 712–724 (2015).

  37. Streets, A.M. et al. Microfluidic single-cell whole-transcriptome sequencing. Proc. Natl. Acad. Sci. USA 111, 7048–7053 (2014).

  38. Guo, F. et al. The transcriptome and DNA methylome landscapes of human primordial germ cells. Cell 161, 1437–1452 (2015).

  39. Zheng, G.X.Y. et al. Massively parallel digital transcriptional profiling of single cells. Preprint at http://biorxiv.org/content/early/2016/07/26/065912/ (2016).

  40. Brennecke, P. et al. Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in medullary thymic epithelial cells. Nat. Immunol. 16, 933–941 (2015).

  41. Patro, R., Duggal, G., Love, M.I., Irizarry, M.A. & Kingsford, C. Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference. Preprint at http://biorxiv.org/content/early/2016/08/30/021592/ (2015).

  42. Srivastava, A., Sarkar, H., Gupta, N. & Patro, R. RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes. Bioinformatics 32, i192–i200 (2016).

  43. Bray, N.L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).

  44. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  45. Carpenter, B., Gelman, A., Hoffman, M., Lee, D. & Goodrich, B. Stan: A probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).

Acknowledgements

We are grateful to O. Stegle and J.K. Kim for helpful discussions and comments on the manuscript. We thank M. Lynch for support with the C1 experiments, X. Chen for discussions on spike-ins, and M. Quail for help with 10× Chromium experiments. We extend our gratitude to S. Linnarsson and A. Zeisel for invaluable support in implementing STRT-seq in our laboratory and for help with sequencing the STRT library. We also thank D. Grün for sharing smFISH molecule counts. Finally we thank R. Kirchner for many improvements to the umis tool. This study was supported by Cancer Research UK grant C45041/A14953 to A.C. and C.L.; European Research Council project 677501–ZF_Blood to A.C.; a core support grant from the Wellcome Trust and MRC to the Wellcome Trust–Medical Research Council Cambridge Stem Cell Institute; ERC grant ThSWITCH to S.A.T. (grant 260507); and a Lister Institute Research Prize to S.A.T. K.N.N. was supported by the Wellcome Trust Strategic Award 'Single cell genomics of mouse gastrulation'. We thank P. Liu (Wellcome Trust Sanger Institute) for providing cells.

Author information

Contributions

V.S. and S.A.T. conceived the study. V.S. and L.-H.L. annotated and processed all data. V.S. conceived and implemented the umis tool. V.S. conceived and performed the performance modeling of the data. V.S., R.J.M., and K.N.N. designed the in-house experiments. K.N.N. optimized and implemented the protocols. The degradation experiments were designed by V.S., I.C.M., R.J.M., and K.N.N., who performed the experiments. I.C.M. and C.L. performed zebrafish Smart-seq2 experiments under the supervision of A.C. V.S. and L.-H.L. designed the degradation model, and L.-H.L. implemented the model. V.S., K.N.N., and S.A.T. wrote the manuscript.

Corresponding authors

Correspondence to Valentine Svensson or Sarah A Teichmann.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Comparison and overview of spike-in sets.

ERCC spike-ins consist of 92 very distinct sequences based on bacterial genes, logarithmically distributed across 22 abundance levels (in Mix 1), with poly-A tails ranging from 20 to 26 base pairs in length. SIRV spike-ins are 69 sequences modeled after the sequences and splicing patterns of 7 human genes. In Mix 2, which we used, the SIRV molecules are present at 4 abundance levels, with virtual alternative isoforms from each gene present at each abundance level. All SIRV molecules have poly-A tails that are 30 base pairs long.

Supplementary Figure 2 UMI efficiency as an alternative metric of sensitivity.

(A) Assuming that UMI counts measure the fraction of input molecules successfully captured by the RNA-sequencing process, the efficiency corresponds, in log-log space, to the offset from perfect correspondence between input molecules and counted UMIs. (B) With the exception of data from the MARS-Seq protocol, spike-in detection limits correspond well with UMI efficiency measures. Unlike UMI efficiency, however, the spike-in detection limit can also be used for coverage-based data quantified by TPM. (C) Using UMI counts as a quantitative measurement assumes that efficiency is the only factor determining differences between real counts and observed counts. However, fitting a model with a free exponent on the number of input molecules shows that the exponent is < 1 in almost all cases. This means that UMI counts underestimate the expression of highly expressed genes. (D) The saturation of UMI counts can be partially explained by short UMIs: if an experiment uses UMIs that are too short, the number of distinct observable UMIs eventually plateaus. However, even for very long UMIs, such as 10 base pairs, the mean molecule exponent is 0.8, indicating that some additional unexplained factor is causing a saturation of UMI counts. (E) Averaged efficiency comparison of endogenous genes and ERCC spike-ins. The data from Grün et al. included smFISH measurements for 9 genes under the same experimental conditions as the single-cell RNA-seq data. Assuming a 100% capture rate for smFISH, we can compare average smFISH counts with average UMI counts. Round markers correspond to the median value across cells, and bars correspond to the 95% confidence interval across cells. The smFISH counts suggest that UMI efficiency for endogenous transcripts is on the order of 5–10% on average, whereas ERCC spike-in UMI counts correspond to 0.5–1% efficiency on average.
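The relationships in panels A, C, and D can be made concrete with a small Python sketch. It is an illustration under stated assumptions rather than the model fitted in the paper: the function names are hypothetical, the fit is an ordinary least-squares line in log10-log10 space (UMIs = efficiency × input^exponent), and the saturation formula assumes that each molecule draws its UMI uniformly at random from all 4^L possible sequences. The example data are synthetic and noise-free.

```python
import math


def fit_power_law(input_molecules, umi_counts):
    """Least-squares fit of log10(UMIs) = log10(efficiency) + exponent * log10(input)."""
    xs = [math.log10(x) for x in input_molecules]
    ys = [math.log10(y) for y in umi_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    # The intercept is the offset from perfect correspondence in log-log space.
    efficiency = 10 ** (mean_y - exponent * mean_x)
    return efficiency, exponent


def expected_distinct_umis(n_molecules, umi_length):
    """Expected number of distinct UMIs among n molecules (uniform-sampling assumption)."""
    pool = 4 ** umi_length
    return pool * (1 - (1 - 1 / pool) ** n_molecules)


# Synthetic spike-in data generated with 5% efficiency and an exponent of 0.8.
inputs = [1, 10, 100, 1000, 10000]
observed = [0.05 * m ** 0.8 for m in inputs]
efficiency, exponent = fit_power_law(inputs, observed)

short_umis = expected_distinct_umis(1000, 4)   # only 4**4 = 256 possible UMIs
long_umis = expected_distinct_umis(1000, 10)   # 4**10 possible UMIs
```

With 4-base UMIs, 1,000 captured molecules can never yield more than 256 distinct UMIs, so the count plateaus, whereas with 10-base UMIs collisions are rare and almost every molecule is counted. An exponent of exactly 1 would recover the pure-efficiency model of panel A.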

Supplementary Figure 3 Trace plots from Bayesian models of degradation.

The posterior samples of the model parameters from Stan (Carpenter et al., ref. 45) for both the ERCC and SIRV analyses show very narrow confidence intervals and good correspondence between the different sampling chains. The SIRV-based model is slightly noisier, which can be expected, because quantifying isoform-level expression when multiple isoforms are present is a harder problem than quantifying expression of the unique ERCC sequences. For the ERCC model, the mode of the degradation rate parameter p is 19%, and for the SIRV model it is 18.5%.
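Agreement between sampling chains, judged visually in trace plots, is often summarized numerically. The following Python sketch computes the classic (non-split) Gelman-Rubin statistic for one parameter; values close to 1 indicate that the chains agree. It is not taken from the paper's analysis: the function name is hypothetical, the draws are simulated rather than real posterior samples, and Stan itself reports a refined variant of this diagnostic.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equally long chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(0)

# Simulated draws for a rate parameter: four chains exploring the same region.
mixed = [[rng.gauss(0.19, 0.005) for _ in range(500)] for _ in range(4)]

# Two of four chains stuck around a different value, i.e. poor correspondence.
stuck = [
    [rng.gauss(center, 0.005) for _ in range(500)]
    for center in (0.19, 0.19, 0.25, 0.25)
]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Chains that overlap as described in the caption would behave like the first simulated set, while disagreement between chains drives the statistic well above 1.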

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 (PDF 417 kb)

Supplementary Table 1

Descriptive summaries of the public studies used for the comparison (XLSX 11 kb)

Supplementary Table 2

Full data table of technical parameters for each sample used for comparison and generation of all figures (CSV 9363 kb)

Supplementary Software

Umis version 0.3.0, which we used for processing all UMI data. See https://github.com/vals/umis for updated versions (ZIP 30 kb)

Cite this article

Svensson, V., Natarajan, K., Ly, LH. et al. Power analysis of single-cell RNA-sequencing experiments. Nat Methods 14, 381–387 (2017). https://doi.org/10.1038/nmeth.4220

  • DOI: https://doi.org/10.1038/nmeth.4220
