RT Journal Article
SR Electronic
T1 Information Theoretic Feature Selection Methods for Single Cell RNA-Sequencing
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 646919
DO 10.1101/646919
A1 Umang Varma
A1 Justin Colacino
A1 Anna Gilbert
YR 2019
UL http://biorxiv.org/content/early/2019/05/24/646919.abstract
AB Single cell RNA-sequencing (scRNA-seq) technologies have generated an expansive amount of new biological information, revealing new cellular populations and hierarchical relationships. A number of technologies complementary to scRNA-seq rely on the selection of a small number of marker genes (or features) to accurately differentiate cell types within a complex mixture of cells. In this paper, we benchmark differential expression methods against information-theoretic feature selection methods to evaluate the ability of these algorithms to identify small and efficient sets of genes that are informative about cell types. Unlike differential expression methods, which are strictly binary and univariate, information-theoretic methods can be used in any combination of binary or multiclass and univariate or multivariate settings. We show that, for some datasets, information-theoretic methods can reveal genes that are both distinct from those selected by traditional algorithms and at least as informative about the class labels. We also present detailed and principled theoretical analyses of these algorithms. All information-theoretic methods in this paper are implemented in our PicturedRocks Python package, which is compatible with the widely used scanpy package.