  • Brief Communication
Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data

Abstract

t-distributed stochastic neighbor embedding (t-SNE) is widely used for visualizing single-cell RNA-sequencing (scRNA-seq) data, but it scales poorly to large datasets. We dramatically accelerate t-SNE, obviating the need for data downsampling, and hence allowing visualization of rare cell populations. Furthermore, we implement a heatmap-style visualization for scRNA-seq based on one-dimensional t-SNE for simultaneously visualizing the expression patterns of thousands of genes. Software is available at https://github.com/KlugerLab/FIt-SNE and https://github.com/KlugerLab/t-SNE-Heatmaps.

Fig. 1: Identifying subpopulations in a large dataset by using marker genes.
Fig. 2: Schematic and demo of t-SNE heatmaps.

Code availability

FIt-SNE is available at https://github.com/KlugerLab/FIt-SNE. The code for all experiments is available on request and will be publicly available at https://github.com/KlugerLab/FIt-SNE-paper on publication.

Data availability

The dataset of 1.3 million mouse brain cells and the FACS-purified PBMCs of Zheng et al. (ref. 23) can be downloaded from the 10X Genomics website (https://support.10xgenomics.com/single-cell-gene-expression/datasets/). Two other public scRNA-seq datasets from the NCBI Gene Expression Omnibus (GEO) were used: Hrvatin et al. (GSE102827) and Shekhar et al. (GSE81905).

References

  1. Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Nat. Protoc. 13, 599–604 (2018).

  2. 10X Genomics. Transcriptional profiling of 1.3 million brain cells with the chromium single cell 3′ solution. SequMed BioTechnology http://www.sequmed.com/Private/Files/20170726/6363668905396462451645665.pdf (2017).

  3. Tasic, B. et al. Nature 563, 72–78 (2018).

  4. van der Maaten, L. J. Mach. Learn. Res. 15, 3221–3245 (2014).

  5. Yianilos, P. N. in Proc. Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’93) 311–321 (Society for Industrial and Applied Mathematics, 1993).

  6. Bernhardsson, E. Annoy: approximate nearest neighbors in C++/Python optimized for memory usage and loading/saving to disk. GitHub https://github.com/spotify/annoy (2017).

  7. Linderman, G. C. & Steinerberger, S. arXiv Preprint at http://arXiv.org/1706.02582 (2017).

  8. Belkina, A. C. et al. bioRxiv Preprint at https://www.biorxiv.org/content/early/2018/10/24/451690 (2018).

  9. Kobak, D. & Berens, P. bioRxiv Preprint at https://www.biorxiv.org/content/early/2018/10/25/453449 (2018).

  10. Cheng, Y., Wong, M. T., van der Maaten, L. & Newell, E. W. J. Immunol. 196, 924–932 (2015).

  11. Galili, T., O’Callaghan, A., Sidi, J. & Sievert, C. Bioinformatics 34, 1600–1602 (2017).

  12. Shekhar, K. et al. Cell 166, 1308–1323 (2016).

  13. van der Maaten, L. & Hinton, G. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  14. Barnes, J. & Hut, P. Nature 324, 446–449 (1986).

  15. Dahlquist, G. & Björck, Å. Numerical Methods in Scientific Computing Vol. 1 (Society for Industrial and Applied Mathematics, Philadelphia, 2008).

  16. Trefethen, L. N. Approximation Theory and Approximation Practice (SIAM, Philadelphia, 2013).

  17. Abramowitz, M. & Stegun, I. A. in Handbook of Mathematical Functions: With Formulas, Graphs and Mathematical Tables (Dover Publications, Mineola, 1965).

  18. Trefethen, L. N. & Weideman, J. A. C. J. Approx. Theory 65, 247–260 (1991).

  19. Halko, N., Martinsson, P.-G., Shkolnisky, Y. & Tygert, M. SIAM J. Sci. Comput. 33, 2580–2594 (2011).

  20. Halko, N., Martinsson, P.-G. & Tropp, J. A. SIAM Rev. 53, 217–288 (2011).

  21. Witten, R. & Candes, E. Algorithmica 72, 264–281 (2015).

  22. Li, H. et al. ACM Trans. Math. Software 43, 28 (2017).

  23. Zheng, G. X. Y. et al. Nat. Commun. 8, 14049 (2017).

  24. Wolf, F. A., Angerer, P. & Theis, F. J. Genome Biol. 19, 15 (2018).

  25. Hrvatin, S. et al. Nat. Neurosci. 21, 120–129 (2018).

  26. Erichson, N. B., Voronin, S., Brunton, S. L. & Kutz, J. N. arXiv Preprint at http://arXiv.org/1608.02148 (2016).

Acknowledgements

The authors thank V. Rokhlin, D. Kobak, M. Tygert and J. Zhao for many useful discussions. The authors also thank J. Spilden and I. Taylor for help with testing FIt-SNE on their CyTOF and scRNA-seq datasets.

G.C.L. was supported in part by NIH grants F30HG010102, 1R01HG008383-01A1 and US NIH MSTP Training Grant T32GM007205. M.R. was supported in part by AFOSR grant no. FA9550-16-10175 and NIH grant no. 1R01HG008383-01A1. S.S. was supported in part by the NSF (DMS-1763179) and the Alfred P. Sloan Foundation. Y.K. was supported in part by NIH grant no. 1R01HG008383-01A1.

Author information

Contributions

G.C.L., M.R., J.G.H., S.S. and Y.K. conceived and designed the project. G.C.L. implemented the method. All authors wrote and edited the manuscript.

Corresponding author

Correspondence to Yuval Kluger.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Accuracy of approximation to the repulsive term.

Accuracy of computing \(F_{\mathrm{rep},i}\) using FFT-accelerated Interpolation-based (FI) t-SNE compared with the Barnes-Hut (BH) t-SNE implementation over 1,000 iterations.

Supplementary Figure 2 The populations identified in Fig. 1 are apparent by embedding using exact nearest neighbors (VP trees) and approximate nearest neighbors (ANN).

1.3 million mouse brain cells were embedded using FIt-SNE with ANN and with VP trees; a random subset of 100,000 embedded cells is shown.

Supplementary Figure 3 FIt-SNE of 1.3 million mouse brain cells using exact nearest neighbors (VP trees) vs. FIt-SNE of same cells using approximate nearest neighbors (ANNOY).

1.3 million mouse brain cells were embedded using t-SNE with ANN and with VP trees; a random subset of 100,000 embedded cells is shown, colored by Louvain clustering in the original high-dimensional space. The 1N error is computed as the proportion of cells for which the nearest neighbor in the embedding is a member of the same cluster.
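The nearest-neighbor statistic described in this caption can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function name `one_nn_agreement` is hypothetical, and the brute-force distance computation is an assumption chosen for brevity. It computes the proportion exactly as worded in the caption, that is, the fraction of cells whose nearest embedded neighbor carries the same label.

```python
import numpy as np


def one_nn_agreement(embedding, labels):
    """Fraction of points whose nearest neighbor in the embedding shares their label.

    embedding: array of shape (n_points, n_dims), e.g. 2D t-SNE coordinates.
    labels:    array of shape (n_points,), e.g. cluster assignments.
    """
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    # Brute-force pairwise squared distances: O(n^2) memory, small inputs only.
    diff = embedding[:, None, :] - embedding[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point must not count as its own neighbor
    nearest = d2.argmin(axis=1)
    return float(np.mean(labels[nearest] == labels))


# Toy example: two well-separated groups of three points each.
Y = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
clusters = np.array([0, 0, 0, 1, 1, 1])
score = one_nn_agreement(Y, clusters)
```

The quadratic memory cost makes this version unsuitable for millions of cells; at that scale a tree-based or approximate nearest-neighbor search would be needed.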

Supplementary Figure 4 FIt-SNE of purified peripheral blood mononuclear cell (PBMC) populations using exact nearest neighbors (VP trees) vs. FIt-SNE of same cells using approximate nearest neighbors (ANNOY).

64,664 purified PBMCs of Zheng et al. (ref. 23) were embedded using t-SNE with ANN and with VP trees. The 1N error is computed as the proportion of cells for which the nearest neighbor in the embedding is a member of the same population.

Supplementary Figure 5 FIt-SNE of mouse cortical cells using exact nearest neighbors (VP trees) vs. FIt-SNE of same cells using approximate nearest neighbors (ANNOY).

48,266 cells from Hrvatin et al. (ref. 25) were embedded using t-SNE with ANN and with VP trees and labeled with the subtypes assigned in that paper. The 1N error is computed as the proportion of cells for which the nearest neighbor in the embedding is a member of the same subtype.

Supplementary Figure 6 The importance of early exaggeration when embedding large datasets.

1.3 million mouse brain cells were embedded using the default of 250 early exaggeration iterations (left) and using 2,000 early exaggeration iterations (right). Cells are colored by Louvain clustering in the original high-dimensional space (independent of the t-SNE). Many clusters are broken up when the number of early exaggeration iterations is insufficient, for example the six clusters highlighted (bottom).
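The role of the iteration count can be made concrete with a minimal sketch. It assumes the common t-SNE convention that the attractive term is multiplied by a constant factor during the first iterations and by 1 afterwards. The function name and the factor of 12 are illustrative assumptions, not values taken from this caption; 250 and 2,000 are the two settings compared above.

```python
def exaggeration_at(iteration, n_exaggeration_iters=250, factor=12.0):
    """Multiplier applied to the attractive term at a 0-based iteration.

    n_exaggeration_iters: number of initial iterations run with exaggeration.
    factor: assumed exaggeration multiplier (hypothetical default).
    """
    return factor if iteration < n_exaggeration_iters else 1.0


# With 250 exaggerated iterations, iteration 1000 is already unexaggerated;
# with 2,000 exaggerated iterations, clusters are still being pulled together.
short_schedule = exaggeration_at(1000, n_exaggeration_iters=250)
long_schedule = exaggeration_at(1000, n_exaggeration_iters=2000)
```

Under this schedule the only difference between the two panels is how long the stronger attraction is applied before the optimization continues with the unmodified objective.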

Supplementary Figure 7 t-SNE heatmap of retinal bipolar cells from Shekhar et al. (ref. 12).

Genes presented are the 25 genes most associated with each marker gene and cluster metagene (denoted by blue). The heatmap is interactive, allowing users to zoom into a region of interest (see Supplementary Fig. 8).
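One plausible way to turn a one-dimensional embedding into heatmap columns is to bin cells along the embedding axis and summarize each gene per bin. The sketch below is an assumption-laden illustration rather than the published procedure: the function name, the equal-width bins, the default of 100 bins and the use of the mean (rather than, say, the sum) are all choices made here for clarity.

```python
import numpy as np


def bin_expression_by_1d_embedding(coords, expression, n_bins=100):
    """Average expression per gene within equal-width bins of a 1D embedding.

    coords:     shape (n_cells,), position of each cell in the 1D embedding.
    expression: shape (n_genes, n_cells), expression values.
    Returns an array of shape (n_genes, n_bins); empty bins are left at 0.
    """
    coords = np.asarray(coords, dtype=float)
    expression = np.asarray(expression, dtype=float)
    edges = np.linspace(coords.min(), coords.max(), n_bins + 1)
    # Use interior edges only so indices fall in 0..n_bins-1
    # and the maximum coordinate lands in the last bin.
    bin_idx = np.digitize(coords, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=n_bins)
    heat = np.zeros((expression.shape[0], n_bins))
    for g in range(expression.shape[0]):
        heat[g] = np.bincount(bin_idx, weights=expression[g], minlength=n_bins)
    nonempty = counts > 0
    heat[:, nonempty] /= counts[nonempty]
    return heat


# Toy example: four cells, two genes, two bins.
coords = np.array([0.0, 0.1, 0.9, 1.0])
expr = np.array([[1.0, 3.0, 10.0, 20.0],
                 [0.0, 0.0, 4.0, 4.0]])
heat = bin_expression_by_1d_embedding(coords, expr, n_bins=2)
```

Each row of `heat` can then be drawn as one heatmap row, so that cells that are close in the one-dimensional embedding share a column.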

Supplementary Figure 8 t-SNE heatmap of retinal bipolar cells from Shekhar et al. (ref. 12), zoomed into region of interest.

Zooming into a section of the t-SNE heatmap in Supplementary Fig. 7.

Supplementary Figure 9 Standard heatmap of retinal bipolar cells from Shekhar et al. (ref. 12).

Using the same genes (rows) in the same ordering as Fig. 2e and Supplementary Fig. 7, cells were clustered using hierarchical clustering (columns), for comparison to Fig. 2e.

Supplementary Figure 10 An illustration of the algorithm.

Both the intervals on the left are \((z_0, z_0 + R)\), and both the intervals on the right are \((y_0, y_0 + R)\). In the lower intervals, the white squares denote the locations \(z_j\) and \(y_i\), and in the upper intervals the white circles indicate the locations of the equispaced nodes \(\tilde{Z}_{i}\) and \(\tilde{Z}_{j}\). The arrows illustrate how a point \(z_j\) communicates with a point \(y_i\).
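The communication pattern in this illustration (source points to source nodes, source nodes to target nodes, target nodes to target points) can be sketched in one dimension. The code below is a simplified illustration under stated assumptions, not the FIt-SNE implementation: it uses Lagrange interpolation on p equispaced nodes that include the interval endpoints, a kernel \(1/(1 + |y - z|^2)\) chosen for illustration, and a dense node-to-node product. With equispaced nodes and a translation-invariant kernel that node-to-node matrix is Toeplitz, which is what makes an FFT-based product possible; the FFT step is omitted here. All function names are hypothetical.

```python
import numpy as np


def lagrange_matrix(points, nodes):
    """V[i, l] = value at points[i] of the l-th Lagrange basis polynomial on nodes."""
    points = np.asarray(points, dtype=float)
    V = np.ones((len(points), len(nodes)))
    for l, node_l in enumerate(nodes):
        for m, node_m in enumerate(nodes):
            if m != l:
                V[:, l] *= (points - node_m) / (node_l - node_m)
    return V


def kernel(y, z):
    """Illustrative kernel 1 / (1 + |y - z|^2), evaluated for all pairs."""
    return 1.0 / (1.0 + (y[:, None] - z[None, :]) ** 2)


def interpolated_kernel_sum(y, z, q, y0, z0, R, p=10):
    """Approximate sum_j kernel(y_i, z_j) * q_j via interpolation nodes."""
    y_nodes = np.linspace(y0, y0 + R, p)   # equispaced nodes, target interval
    z_nodes = np.linspace(z0, z0 + R, p)   # equispaced nodes, source interval
    w = lagrange_matrix(z, z_nodes).T @ q  # spread source weights onto nodes
    u = kernel(y_nodes, z_nodes) @ w       # node-to-node interaction (dense)
    return lagrange_matrix(y, y_nodes) @ u # interpolate back to the targets


rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, 200)   # sources in (z0, z0 + R)
y = rng.uniform(2.0, 3.0, 150)   # targets in (y0, y0 + R)
q = rng.uniform(0.5, 1.5, 200)   # positive weights attached to the sources
approx = interpolated_kernel_sum(y, z, q, y0=2.0, z0=0.0, R=1.0, p=10)
exact = kernel(y, z) @ q         # direct O(N * M) summation for comparison
rel_err = np.max(np.abs(approx - exact) / np.abs(exact))
```

The direct sum costs one kernel evaluation per source-target pair, whereas the interpolated version touches each point only through its p basis values, so the pairwise work is confined to the small node-to-node matrix.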

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Tables 1–3

Reporting Summary

Source data

About this article

Cite this article

Linderman, G.C., Rachh, M., Hoskins, J.G. et al. Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nat Methods 16, 243–245 (2019). https://doi.org/10.1038/s41592-018-0308-4

  • DOI: https://doi.org/10.1038/s41592-018-0308-4
