Abstract
To achieve accurate and reproducible cytometry data analysis, we benchmarked 19 machine learning algorithms for supervised and unsupervised cell classification. The underlying data encompassed 138 million cells from seven independent datasets spanning conventional flow cytometry, spectral flow cytometry and mass cytometry. We found that tree-based classifiers, and in particular decision trees, outperformed other approaches in classification accuracy, speed and memory use. Decision trees achieved high accuracy even for cell populations with frequencies below 1%. We validated our decision tree-based approach in a clinical setting using diagnostic blood T cell phenotyping of 107 patients. Automatic quantification of CD4 helper T cell phenotypes achieved 99% accuracy relative to manual expert assessment. Finally, we combined automated data transformation, supervised and unsupervised gating, an application programming interface and a user-friendly desktop application into FACSPy and FACSPyUI, a fast and scalable open-source toolbox for the analysis and visualization of cytometry data.
Competing Interest Statement
The authors have declared no competing interest.