SUMMARY
Single-cell technologies have advanced our understanding of the molecular and cellular heterogeneity underlying disease. As single-cell datasets grow in scale, linking cell-level phenotypic alterations to clinical outcomes becomes increasingly challenging. To address this, we introduce CellPhenoX, an eXplainable machine learning method that identifies cell-specific phenotypes influencing clinical outcomes. CellPhenoX integrates classification models, explainable AI techniques, and a statistical framework to generate interpretable, cell-specific scores that uncover cell populations associated with relevant clinical phenotypes and interaction effects. We demonstrate the performance of CellPhenoX across diverse single-cell study designs, including simulations, binary disease-control comparisons, and multi-class studies. Notably, CellPhenoX identified an activated monocyte phenotype in COVID-19 whose expansion correlated with disease severity after adjusting for covariates and interaction effects. It also uncovered an inflammation-associated fibroblast gradient in ulcerative colitis. We anticipate that CellPhenoX can detect clinically relevant phenotypic changes in single-cell data with multiple sources of variation, paving the way for translating single-cell findings into clinical impact.
Competing Interest Statement
The authors have declared no competing interest.