TY  - JOUR
T1  - Scalable latent-factor models applied to single-cell RNA-seq data separate biological drivers from confounding effects
JF  - bioRxiv
DO  - 10.1101/087775
SP  - 087775
AU  - Florian Buettner
AU  - Naruemon Pratanwanich
AU  - John C. Marioni
AU  - Oliver Stegle
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/11/15/087775.abstract
N2  - Single-cell RNA-sequencing (scRNA-seq) allows heterogeneity in gene expression levels to be studied in large populations of cells. Such heterogeneity can arise from both technical and biological factors, thus making decomposing sources of variation extremely difficult. We here describe a computationally efficient model that uses prior pathway annotation to guide inference of the biological drivers underpinning the heterogeneity. Moreover, we jointly update and improve gene set annotation and infer factors explaining variability that fall outside the existing annotation. We validate our method using simulations, which demonstrate both its accuracy and its ability to scale to large datasets with up to 100,000 cells. Moreover, through applications to real data we show that our model can robustly decompose scRNA-seq datasets into interpretable components and facilitate the identification of novel sub-populations.
ER  - 