RT Journal Article
SR Electronic
T1 Superscan: Supervised Single-Cell Annotation
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.05.20.445014
DO 10.1101/2021.05.20.445014
A1 Shasha, Carolyn
A1 Tian, Yuan
A1 Mair, Florian
A1 Miller, Helen E.R.
A1 Gottardo, Raphael
YR 2021
UL http://biorxiv.org/content/early/2021/05/22/2021.05.20.445014.abstract
AB Automated cell type annotation of single-cell RNA-seq data has the potential to significantly improve and streamline single cell data analysis, facilitating comparisons and meta-analyses. However, many of the current state-of-the-art techniques suffer from limitations, such as reliance on a single reference dataset or marker gene set, or excessive run times for large datasets. Acquiring high-quality labeled data to use as a reference can be challenging. With CITE-seq, surface protein expression of cells can be directly measured in addition to the RNA expression, facilitating cell type annotation. Here, we compiled and annotated a collection of 16 publicly available CITE-seq datasets. This data was then used as training data to develop Superscan, a supervised machine learning-based prediction model. Using our 16 reference datasets, we benchmarked Superscan and showed that it performs better in terms of both accuracy and speed when compared to other state-of-the-art cell annotation methods. Superscan is pre-trained on a collection of primarily PBMC immune datasets; however, additional data and cell types can be easily added to the training data for further improvement. Finally, we used Superscan to reanalyze a previously published dataset, demonstrating its applicability even when the dataset includes cell types that are missing from the training set. Competing Interest Statement: R.G. has received consulting income from Illumina and declares ownership in Ozette Technologies and minor stock ownerships in 10X Genomics.