Abstract
A variety of experimental and computational methods have been developed to demultiplex samples from pooled individuals in a single-cell RNA sequencing (scRNA-Seq) experiment, but these require either adding information (such as hashtag barcodes) or measuring information (such as genotypes) prior to pooling. We introduce scSplit, which utilises genetic differences inferred from scRNA-Seq data alone to demultiplex pooled samples. scSplit also extracts a minimal set of high-confidence presence/absence genotypes in each cluster, which can be used to map clusters to original samples. Using a range of simulated, merged individual-sample, and pooled multi-individual scRNA-Seq datasets, we show that scSplit is highly accurate and concordant with demuxlet predictions. Furthermore, scSplit predictions are highly consistent with the known truth in a cell-hashing dataset. We also show that multiplexed scRNA-Seq can be used to reduce batch effects caused by technical biases. scSplit is ideally suited to samples for which external genome-wide genotype data cannot be obtained (for example, non-model organisms), or for which it is impossible to obtain unmixed samples directly, such as mixtures of genetically distinct tumour cells or mixed infections. scSplit is available at: https://github.com/jon-xu/scSplit