PT - JOURNAL ARTICLE
AU - Weiguang Mao
AU - Maziyar Baran Pouyan
AU - Dennis Kostka
AU - Maria Chikina
TI - Non-negative Independent Factor Analysis for single cell RNA-seq
AID - 10.1101/2020.01.31.927921
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.01.31.927921
4099 - http://biorxiv.org/content/early/2020/02/02/2020.01.31.927921.short
4100 - http://biorxiv.org/content/early/2020/02/02/2020.01.31.927921.full
AB - Motivation: Single-cell RNA sequencing (scRNA-seq) enables transcriptional profiling at the level of individual cells. With the emergence of high-throughput platforms, datasets comprising tens of thousands or more cells have become routine, and the technology is having an impact across a wide range of biomedical subject areas. However, scRNA-seq data are high-dimensional and affected by noise, so scalable and robust computational techniques are needed for meaningful analysis, visualization and interpretation. Specifically, a range of matrix factorization techniques have been employed to aid scRNA-seq data analysis. In this context we note that sources contributing to biological variability between cells can be discrete (or multi-modal, for instance cell types) or continuous (e.g. pathway activity). However, no current matrix factorization approach is set up to jointly infer such mixed sources of variability.
Results: To address this shortcoming, we present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that combines features of complementary approaches like Independent Component Analysis (ICA), Principal Component Analysis (PCA), and Non-negative Matrix Factorization (NMF). NIFA simultaneously models uni- and multi-modal latent factors and can thus isolate discrete cell-type identity and continuous pathway-level variations into separate components. Similar to NMF, NIFA constrains factor loadings to be non-negative in order to increase biological interpretability.
We apply our approach to a range of datasets where cell-type identity is known, and we show that NIFA-derived factors outperform results from ICA, PCA and NMF in terms of cell-type identification and biological interpretability. Studying an immunotherapy dataset in detail, we show that NIFA identifies biomedically meaningful sources of variation, derive an improved expression signature for regulatory T-cells, and identify a novel myeloid cell subtype associated with treatment response. Overall, NIFA is a general approach that advances scRNA-seq analysis capabilities and allows researchers to take better advantage of their data. NIFA is available at https://github.com/wgmao/NIFA.
Contact: mchikina@pitt.edu