Abstract
Unsupervised feature selection, or gene filtering, is a common preprocessing step to reduce the dimensionality of single-cell RNA sequencing (scRNAseq) datasets. Existing gene filters operate on each scRNAseq dataset in isolation from other datasets. When jointly analyzing multiple datasets, however, there is a need for gene filters that are tailored to comparative analysis. In this work, we present a method for ranking the relevance of genes for comparing trajectory datasets. Our method is unsupervised, i.e., it does not assume that the cell metadata are known. Using the top-ranking genes significantly improves performance compared to methods not tailored to comparative analysis. We demonstrate the effectiveness of our algorithm on previously published datasets from studies on preimplantation embryo development, neurogenesis, and cardiogenesis.