Integrating single-cell transcriptomic data across different conditions, technologies, and species

Abstract

Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.
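
The abstract does not spell out the algorithm behind "common sources of variation". One standard technique for this kind of problem is canonical correlation analysis, and the sketch below illustrates that idea on toy data, assuming two gene-by-cell matrices that share the same gene rows. The function names (standardize_genes, shared_components), the simulated data, and the choice of a plain SVD on the cell-by-cell cross-product are assumptions made for illustration; this is not the Seurat implementation.

```python
import numpy as np


def standardize_genes(m):
    """Center and scale each gene (row) across the cells (columns) of one data set."""
    centered = m - m.mean(axis=1, keepdims=True)
    sd = centered.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave constant genes at zero instead of dividing by zero
    return centered / sd


def shared_components(x, y, k=2):
    """Return k paired per-cell vectors for two gene-by-cell matrices with matching gene rows."""
    xs, ys = standardize_genes(x), standardize_genes(y)
    # SVD of the cell-by-cell cross-product: paired singular vectors capture
    # expression patterns that vary in the same way in both data sets.
    u, s, vt = np.linalg.svd(xs.T @ ys, full_matrices=False)
    return u[:, :k], vt[:k].T, s[:k]


# Toy data (hypothetical): 50 genes, two cell groups present in both data sets.
rng = np.random.default_rng(0)
signature = np.r_[np.full(25, 3.0), np.zeros(25)]  # group A is high on genes 0-24
labels_a = np.array([0] * 20 + [1] * 20)           # data set 1: 20 A cells, 20 B cells
labels_b = np.array([0] * 15 + [1] * 25)           # data set 2: 15 A cells, 25 B cells


def simulate(labels, offset, scale):
    """Simulate expression with a data-set-specific shift and scale (a crude batch effect)."""
    base = np.where(labels[None, :] == 0, signature[:, None], signature[::-1][:, None])
    return scale * (base + rng.normal(0, 0.5, base.shape)) + offset


x = simulate(labels_a, offset=0.0, scale=1.0)
y = simulate(labels_b, offset=5.0, scale=2.0)

# First pair of shared components: one value per cell in each data set.
u, v, s = shared_components(x, y, k=1)
```

In this toy setting the per-gene standardization removes the shift and scale that differ between the two data sets, so the first pair of vectors is expected to separate the two cell groups in the same direction in both. Real data sets would need gene selection, more components, and a further alignment step that this sketch leaves out.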

Figure 1: Overview of Seurat alignment of single-cell RNA-seq data sets.
Figure 2: Integrated analysis of resting and stimulated PBMCs.
Figure 3: Comparative analysis of mouse hematopoietic progenitors across scRNA-seq technologies.
Figure 4: Joint identification of cell types across human and mouse islet scRNA-seq atlases.
Figure 5: Benchmarking alignment and batch correction methods.

Accession codes

Primary accessions

ArrayExpress

Gene Expression Omnibus

Acknowledgements

We thank members of the Satija laboratory, as well as P. Roelli, M. Stoeckius, G. Fishell, C. Desplan, R. Bonneau, E. Macosko, and A. Corvelo for their valuable feedback, and F. Hamey, H.M. Kang, and J. Ye for assistance with published data sets. This work was supported by an NIH New Innovator Award (1DP2HG009623-01) and R01 (5R01MH071679-12) to R.S. and an NSF Graduate Fellowship (DGE1342536) to A.B.

Author information

Contributions

A.B. and R.S. conceived the research. A.B., P.H., and R.S. implemented the alignment procedure, performed all data analysis, and wrote the manuscript. E.P. performed the PBMC validation experiments, and P.S. performed the ddSeq experiments.

Corresponding author

Correspondence to Rahul Satija.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 (PDF 8713 kb)

Life Sciences Reporting Summary (PDF 179 kb)

Supplementary Dataset 1

Cell metadata for IFNB response analysis (TXT 681 kb)

Supplementary Dataset 2

Cell metadata for murine hematopoiesis analysis (TXT 83 kb)

Supplementary Dataset 3

Cell metadata for cross-species pancreatic islet analysis (TXT 545 kb)

Supplementary Dataset 4

This table contains a summary of the data distributions and statistical details related to the manuscript figures (XLSX 32 kb)

Supplementary Software

Source code and installation instructions for software used in described analyses (ZIP 810 kb)

About this article

Cite this article

Butler, A., Hoffman, P., Smibert, P. et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 36, 411–420 (2018). https://doi.org/10.1038/nbt.4096

  • DOI: https://doi.org/10.1038/nbt.4096
