TY - JOUR
T1 - Non-negative Independent Factor Analysis for single cell RNA-seq
JF - bioRxiv
DO - 10.1101/2020.01.31.927921
SP - 2020.01.31.927921
AU - Weiguang Mao
AU - Maziyar Baran Pouyan
AU - Dennis Kostka
AU - Maria Chikina
Y1 - 2020/01/01
UR - http://biorxiv.org/content/early/2020/02/02/2020.01.31.927921.abstract
N2 - Motivation: Single cell RNA sequencing (scRNA-seq) enables transcriptional profiling at the level of individual cells. With the emergence of high-throughput platforms, datasets comprising tens of thousands or more cells have become routine, and the technology is having an impact across a wide range of biomedical subject areas. However, scRNA-seq data are high-dimensional and affected by noise, so scalable and robust computational techniques are needed for meaningful analysis, visualization and interpretation. Specifically, a range of matrix factorization techniques have been employed to aid scRNA-seq data analysis. In this context we note that sources contributing to biological variability between cells can be discrete (or multi-modal, for instance cell-types), or continuous (e.g. pathway activity). However, no current matrix factorization approach is set up to jointly infer such mixed sources of variability. Results: To address this shortcoming, we present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that combines features of complementary approaches like Independent Component Analysis (ICA), Principal Component Analysis (PCA), and Non-negative Matrix Factorization (NMF). NIFA simultaneously models uni- and multi-modal latent factors and can thus isolate discrete cell-type identity and continuous pathway-level variations into separate components. Similar to NMF, NIFA constrains factor loadings to be non-negative in order to increase biological interpretability.
We apply our approach to a range of datasets where cell-type identity is known, and we show that NIFA-derived factors outperform results from ICA, PCA and NMF in terms of cell-type identification and biological interpretability. Studying an immunotherapy dataset in detail, we show that NIFA identifies biomedically meaningful sources of variation, derive an improved expression signature for regulatory T-cells, and identify a novel myeloid cell subtype associated with treatment response. Overall, NIFA is a general approach advancing scRNA-seq analysis capabilities and it allows researchers to better take advantage of their data. NIFA is available at https://github.com/wgmao/NIFA. Contact: mchikina{at}pitt.edu
ER -