TY  - JOUR
T1  - Embedding to Reference t-SNE Space Addresses Batch Effects in Single-Cell Classification
JF  - bioRxiv
DO  - 10.1101/671404
SP  - 671404
AU  - Pavlin G. Poličar
AU  - Martin Stražar
AU  - Blaž Zupan
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/06/14/671404.abstract
N2  - Dimensionality reduction techniques, such as t-SNE, can construct informative visualizations of high-dimensional data. When working with multiple data sets, a straightforward application of these methods often fails; instead of revealing underlying classes, the resulting visualizations expose data set-specific clusters. To circumvent these batch effects, we propose an embedding procedure that takes a t-SNE visualization constructed on a reference data set and uses it as a scaffold for embedding new data. The new, secondary data is embedded one data point at a time. This prevents any interactions between instances in the secondary data and implicitly mitigates batch effects. We demonstrate the utility of this approach with an analysis of six recently published single-cell gene expression data sets containing up to tens of thousands of cells and thousands of genes. In these data sets, the batch effects are particularly strong, as the data come from different institutions and were obtained using different experimental protocols. The visualizations constructed by our proposed approach are cleared of batch effects, and the cells from secondary data sets correctly co-cluster with cells from the primary data sharing the same cell type.
ER  - 