Abstract
Single-cell RNA-Seq (scRNA-Seq) enables the systematic molecular characterization of heterogeneous tissues at an unprecedented resolution and scale. However, it is currently unclear how to establish formal cell type definitions, which impedes the systematic analysis of scRNA-Seq data across experiments and studies. To address this challenge, we have developed Moana, a hierarchical machine learning framework that enables the construction of robust cell type classifiers from heterogeneous scRNA-Seq datasets. To demonstrate Moana’s capabilities, we construct cell type classifiers for human immune cells that accurately distinguish between closely related cell types in the presence of experimental perturbations and systematic differences between scRNA-Seq protocols. We show that Moana is generally applicable and scales to datasets with more than ten thousand cells, thus enabling the construction of tissue-specific cell type atlases that can be directly applied to analyze new scRNA-Seq datasets. A Python implementation of Moana can be found at https://github.com/yanailab/moana.