Abstract
Combining single-cell cytometry datasets increases the analytical flexibility and statistical power of data analyses. However, in many cases the full potential of co-analyses is not realized because of technical variance between data from different experimental batches. Here, we present cyCombine, a method to robustly integrate cytometry data from different batches, experiments, or even different experimental techniques, such as CITE-seq, flow cytometry, and mass cytometry. We demonstrate that cyCombine maintains the biological variance and the structure of the data while minimizing the technical variance between datasets. cyCombine does not require technical replicates across datasets, and its computation time scales linearly with the number of cells, allowing massive datasets to be integrated. Robust, accurate, and scalable integration of cytometry data enables the combination of multiple datasets for primary data analyses and the validation of results using public datasets.
Competing Interest Statement
CJW holds equity in BioNTech, Inc. and receives research funding from Pharmacyclics. The remaining authors declare no conflict of interest.
Footnotes
Funding
This work was funded by the Independent Research Fund Denmark (grant 8048-00078B to LRO) and the National Institutes of Health (grants P30AR070253 and U01AI138318 to JAL; P01CA206978 and UG1 CA233338 to CJW). SHG was supported by the Kay Kendall Leukaemia Fund.