Abstract
Single-cell RNA sequencing (scRNA-seq) enables transcriptional profiling at the resolution of individual cells. These experiments measure features at the level of transcripts, but biological processes of interest often involve the coordinated activity of many individual transcripts, so it can be difficult to extract interpretable insights directly from transcript-level cell profiles. Latent representations that capture biological variation in a smaller number of dimensions are therefore useful for interpreting many experiments. Variational autoencoders (VAEs) have emerged as a tool for scRNA-seq denoising and data harmonization, but the correspondence between latent dimensions in these models and generative factors remains unexplored. Here, we explore training VAEs with modifications to the objective function (i.e., β-VAE) to encourage disentanglement and make latent representations of scRNA-seq data more interpretable. Using simulated data, we find that VAE latent dimensions correspond more directly to data generative factors when using these modified objective functions. Applied to experimental data from stimulated peripheral blood mononuclear cells, these objectives yield better correspondence of latent dimensions to experimental factors and cell identity programs, but impaired performance on cell type clustering.
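As a minimal sketch of the kind of objective modification referred to above, the β-VAE loss weights the KL divergence term of the standard VAE objective by a factor β, with β = 1 recovering the standard VAE. The function names below are hypothetical, and the sketch assumes a diagonal Gaussian posterior with a standard normal prior. The reconstruction term is passed in as a precomputed number because the abstract does not specify the likelihood used for the counts.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions.

    Assumption: diagonal Gaussian posterior and standard normal prior.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar)
    )


def beta_vae_loss(recon_nll, mu, logvar, beta=1.0):
    """Negative ELBO for one cell, with the KL term weighted by beta.

    recon_nll: reconstruction negative log-likelihood, computed elsewhere
               under whatever count likelihood the model uses (hypothetical input).
    beta:      KL weight; beta = 1 is a standard VAE, beta > 1 is the
               setting usually associated with encouraging disentanglement.
    """
    return recon_nll + beta * gaussian_kl(mu, logvar)
```

For example, a posterior with mean 1 and unit variance in a single latent dimension has a KL term of 0.5, so `beta_vae_loss(10.0, [1.0], [0.0], beta=4.0)` adds a penalty of 2.0 to the reconstruction term rather than the 0.5 added when β = 1.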
Publication Status
This pre-print represents the final output of a preliminary research direction and will not be updated or published in an archival journal. We are happy to discuss future directions we believe to be promising with any interested researchers.