Abstract
Gene-by-gene differential expression analysis is a popular supervised approach for analyzing single-cell RNA sequencing (scRNA-seq) data. However, because scRNA-seq studies profile large numbers of cells, such analyses often report many differentially expressed genes with extremely small p-values but minimal effect sizes, which complicates interpretation. To address this challenge, we developed Supervised Deep Learning with gene ANnotation (SDAN). SDAN uses a graph neural network to integrate gene annotation with gene expression data and identify gene sets that classify cells, and in turn the individuals from whom those cells were derived. We demonstrate the utility of SDAN by identifying gene sets associated with severe COVID-19, dementia, and cancer patients' responses to immunotherapy.
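To make the sample-size problem concrete, here is a minimal illustrative sketch, not part of SDAN. It computes the two-sided p-value for a fixed, tiny mean difference between two groups of cells as the number of cells grows. The function name `two_sample_z_pvalue` and the effect size of 0.05 standard deviations are hypothetical choices for illustration. The sketch assumes unit variance in both groups, equal group sizes, independent cells, and a normal (z-test) approximation. Under these assumptions the test statistic is z = d / sqrt(2/n), so the p-value shrinks toward zero as n grows even though the effect size d stays the same.

```python
import math


def two_sample_z_pvalue(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value for a mean difference of `effect_size` standard
    deviations between two groups of `n_per_group` cells each.

    Hypothetical helper for illustration. Assumes unit variance in both
    groups, independent cells, and n large enough for the normal (z-test)
    approximation to hold.
    """
    # Standard error of the difference in means: sqrt(1/n + 1/n) = sqrt(2/n).
    z = effect_size / math.sqrt(2.0 / n_per_group)
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    d = 0.05  # a "minimal" shift of 0.05 standard deviations (illustrative)
    for n in (100, 1_000, 10_000, 100_000):
        print(f"n per group = {n:>7,}  p = {two_sample_z_pvalue(d, n):.3g}")
```

With roughly a hundred cells per group, this effect is nowhere near significant. With about a hundred thousand cells per group, the same effect yields a p-value many orders of magnitude below conventional thresholds. This is why the p-value alone says little about biological relevance at scRNA-seq scale.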
Competing Interest Statement
The authors have declared no competing interest.