RT Journal Article
SR Electronic
T1 A Joint Model of RNA Expression and Surface Protein Abundance in Single Cells
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 791947
DO 10.1101/791947
A1 Adam Gayoso
A1 Romain Lopez
A1 Zoë Steier
A1 Jeffrey Regier
A1 Aaron Streets
A1 Nir Yosef
YR 2019
UL http://biorxiv.org/content/early/2019/10/07/791947.abstract
AB Cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) combines unbiased single-cell transcriptome measurements with surface protein quantification comparable to flow cytometry, the gold standard for cell type identification. However, current analysis pipelines cannot address the two primary challenges of CITE-seq data: combining both modalities in a shared latent space that harnesses the power of the paired measurements, and handling the technical artifacts of the protein measurement, which is obscured by non-negligible background noise. Here we present Total Variational Inference (totalVI), a fully probabilistic end-to-end framework for normalizing and analyzing CITE-seq data, based on a hierarchical Bayesian model. In totalVI, the mRNA and protein measurements for each cell are generated from a low-dimensional latent random variable unique to that cell, representing its cellular state. totalVI uses deep neural networks to specify conditional distributions. By leveraging advances in stochastic variational inference, it scales easily to millions of cells. Explicit modeling of nuisance factors enables totalVI to produce denoised data in both domains, as well as a batch-corrected latent representation of cells for downstream analysis tasks.