%0 Journal Article
%A Treppner, Martin
%A Haug, Stefan
%A Köttgen, Anna
%A Binder, Harald
%T Designing Single Cell RNA-Sequencing Experiments for Learning Latent Representations
%D 2022
%R 10.1101/2022.07.08.499284
%J bioRxiv
%P 2022.07.08.499284
%X To investigate the complexity arising from single-cell RNA-sequencing (scRNA-seq) data, researchers increasingly resort to deep generative models, specifically variational autoencoders (VAEs), which are trained by variational inference techniques. Similar to other dimension reduction approaches, this allows encoding the inherent biological signals of gene expression data, such as pathways or gene programs, into lower-dimensional latent representations. However, the number of cells necessary to adequately uncover such latent representations is often unknown. Therefore, we propose a single-cell variational inference approach for designing experiments (scVIDE) to determine statistical power for detecting cell group structure in a lower-dimensional representation. The approach is based on a test statistic that quantifies the contribution of every single cell to the latent representation. Using a smaller scRNA-seq data set as a starting point, we generate synthetic data sets of various sizes from a fitted VAE. Employing a permutation technique for obtaining a null distribution of the test statistic, we subsequently determine the statistical power for various numbers of cells, thus guiding experimental design. We illustrate with several data sets from various sequencing protocols how researchers can use scVIDE to determine the statistical power for cell group detection within their own scRNA-seq studies. We also consider the setting of transcriptomics studies with large numbers of cells, where scVIDE can be used to determine the statistical power for sub-clustering. For this purpose, we use data from the human KPMP Kidney Cell Atlas and evaluate the power for sub-clustering of the epithelial cells contained therein. 
To make our approach readily accessible, we provide a comprehensive Jupyter notebook at https://github.com/MTreppner/scVIDE.jl that researchers can use to design their own experiments based on scVIDE. Competing Interest Statement: The authors have declared no competing interest.
%U https://www.biorxiv.org/content/biorxiv/early/2022/07/10/2022.07.08.499284.full.pdf