PT - JOURNAL ARTICLE
AU - Andrew Butler
AU - Rahul Satija
TI - Integrated analysis of single cell transcriptomic data across conditions, technologies, and species
AID - 10.1101/164889
DP - 2017 Jan 01
TA - bioRxiv
PG - 164889
4099 - http://biorxiv.org/content/early/2017/07/18/164889.short
4100 - http://biorxiv.org/content/early/2017/07/18/164889.full
AB - Single cell RNA-seq (scRNA-seq) has emerged as a transformative tool to discover and define cellular phenotypes. While computational scRNA-seq methods are currently well suited for experiments representing a single condition, technology, or species, analyzing multiple datasets simultaneously raises new challenges. In particular, traditional analytical workflows struggle to align subpopulations that are present across datasets, limiting the possibility for integrated or comparative analysis. Here, we introduce a new computational strategy for scRNA-seq alignment, utilizing common sources of variation to identify shared subpopulations between datasets as part of our R toolkit Seurat. We demonstrate our approach by aligning scRNA-seq datasets of PBMCs under resting and stimulated conditions, hematopoietic progenitors sequenced across two profiling technologies, and pancreatic cell ‘atlases’ generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across datasets, and can identify subpopulations that could not be detected by analyzing datasets independently. We anticipate that these methods will serve not only to correct for batch or technology-dependent effects, but also to facilitate general comparisons of scRNA-seq datasets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution. Availability: Installation instructions, documentation, and tutorials are available at http://www.satijalab.org/seurat