TY - JOUR
T1 - Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning
JF - bioRxiv
DO - 10.1101/052225
SP - 052225
AU - Bo Wang
AU - Junjie Zhu
AU - Emma Pierson
AU - Daniele Ramazzotti
AU - Serafim Batzoglou
Y1 - 2017/01/01
UR - http://biorxiv.org/content/early/2017/02/28/052225.abstract
N2 - Single-cell RNA-seq technologies enable high throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. Here, we propose a novel similarity-learning framework, SIMLR (single-cell interpretation via multi-kernel learning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization applications. Benchmarking against state-of-the-art methods for these applications, we used SIMLR to re-analyse seven representative single-cell data sets, including high-throughput droplet-based data sets with tens of thousands of cells. We show that SIMLR greatly improves clustering sensitivity and accuracy, as well as the visualization and interpretability of the data. Abbreviations: SIMLR, Single-cell Interpretation via multi-kernel enhanced similarity learning; mESCs, mouse embryonic stem cells; NMI, Normalized mutual information; NNE, Nearest neighbor error; MDS, Multidimensional scaling; FA, Factor analysis; PCA, Principal component analysis; PPCA, Probabilistic principal components analysis; ZIFA, Zero-inflated factor analysis; SNE, Stochastic neighbor embedding; t-SNE, t-distributed stochastic neighbor embedding.
ER -