RT Journal Article SR Electronic T1 Exploring Single-Cell Data with Deep Multitasking Neural Networks JF bioRxiv FD Cold Spring Harbor Laboratory SP 237065 DO 10.1101/237065 A1 Amodio, Matthew A1 van Dijk, David A1 Srinivasan, Krishnan A1 Chen, William S A1 Mohsen, Hussein A1 Moon, Kevin R A1 Campbell, Allison A1 Zhao, Yujiao A1 Wang, Xiaomei A1 Venkataswamy, Manjunatha A1 Desai, Anita A1 V., Ravi A1 Kumar, Priti A1 Montgomery, Ruth A1 Wolf, Guy A1 Krishnaswamy, Smita YR 2019 UL http://biorxiv.org/content/early/2019/01/03/237065.abstract AB Biomedical researchers are generating high-throughput, high-dimensional single-cell data at a staggering rate. As the costs of data generation decrease, experimental design is moving towards the measurement of many different single-cell samples in the same dataset. These samples can correspond to different patients, conditions, or treatments. While scaling methods to datasets of this size is a challenge on its own, dealing with large-scale experimental design presents a whole new set of problems, including batch effects and sample comparison issues. Currently, there are no computational tools that can both handle large amounts of data in a scalable manner (many cells) and at the same time deal with many samples (many patients or conditions). Moreover, data analysis currently involves the use of different tools that each operate on their own data representation, which does not guarantee a synchronized analysis pipeline. For instance, data visualization methods can be disjoint from and mismatched with the clustering method. To address these problems, we present SAUCIE, a deep neural network that leverages the high degree of parallelization and scalability offered by neural networks, as well as the deep representation of data that they can learn, to perform many single-cell data analysis tasks, all on a unified representation. A well-known limitation of neural networks is their lack of interpretability. 
Our key contribution here is a set of newly formulated regularizations (penalties) that render the features learned in hidden layers of the neural network interpretable. When large multi-patient datasets are fed into SAUCIE, the various hidden layers contain denoised and batch-corrected data, a low-dimensional visualization, and an unsupervised clustering, as well as other information that can be used to explore the data. We show this capability by analyzing a newly generated 180-sample dataset consisting of T cells from dengue patients in India, measured with mass cytometry. We show that SAUCIE, for the first time, can batch-correct and process this 11-million-cell dataset to identify cluster-based signatures of acute dengue infection and create a patient manifold, stratifying immune response to dengue on the basis of single-cell measurements.