Information Theoretic Feature Selection Methods for Single Cell RNA-Sequencing

Umang Varma; Justin Colacino; Anna Gilbert

doi:10.1101/646919

Abstract

Single cell RNA-sequencing (scRNA-seq) technologies have generated an expansive amount of new biological information, revealing new cellular populations and hierarchical relationships. A number of technologies complementary to scRNA-seq rely on the selection of a smaller number of marker genes (or features) to accurately differentiate cell types within a complex mixture of cells. In this paper, we benchmark differential expression methods against information-theoretic feature selection methods to evaluate the ability of these algorithms to identify small and efficient sets of genes that are informative about cell types. Unlike differential expression methods, which are strictly binary and univariate, information-theoretic methods can be used in any combination of binary or multiclass and univariate or multivariate settings. We show that, for some datasets, information-theoretic methods can reveal genes that are both distinct from those selected by traditional algorithms and as informative about the class labels, if not more so. We also present detailed and principled theoretical analyses of these algorithms. All information-theoretic methods in this paper are implemented in our PicturedRocks Python package, which is compatible with the widely used scanpy package.

Footnotes

https://github.com/umangv/picturedrocks
https://github.com/umangv/picturedrocksbenchmarks
³ In practice, it can be faster to represent (a₀, a₁, …, a_{r−1}) as a₀ + a₁k + … + a_{r−1}k^{r−1}, where k = max_i(a_i) + 1, but we avoid this conversion for the sake of readability. See our source code for such an implementation.
⁴ It should be noted, however, that no algorithm can truly overcome all the weaknesses below, because doing so would require inferring a joint probability distribution over a large number of random variables, which is meaningless without an enormous number of samples and is also computationally intractable.
⁵ In cases of extreme synergy, such as the current example (where individual features provide no information on their own), even the J_cmi objective will fail to choose an optimal set of features because of the shortsightedness of a greedy approach. In less extreme cases, it is still beneficial for the objective function to recognize such synergy.
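
The tuple encoding described in footnote 3 and the synergy described in footnote 5 can be illustrated together with a minimal pure-Python sketch. It is not the PicturedRocks implementation: the function names `encode_rows`, `entropy_bits`, and `mutual_information_bits` are hypothetical, and the XOR labels are an assumed toy case of "extreme synergy" that may differ from the example used in the paper.

```python
from collections import Counter
from math import log2


def encode_rows(rows):
    """Map each tuple (a_0, ..., a_{r-1}) to a_0 + a_1*k + ... + a_{r-1}*k**(r-1)."""
    rows = [tuple(row) for row in rows]
    if any(a < 0 for row in rows for a in row):
        raise ValueError("entries must be non-negative integers")
    # k is one more than the largest entry, so equal-length tuples
    # that differ anywhere receive different integer codes.
    k = max((a for row in rows for a in row), default=0) + 1
    return [sum(a * k**i for i, a in enumerate(row)) for row in rows]


def entropy_bits(values):
    """Empirical Shannon entropy (in bits) of a sequence of hashable values."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information_bits(x_rows, y):
    """Empirical I(X; Y) = H(X) + H(Y) - H(X, Y) for discrete samples."""
    x = encode_rows(x_rows)
    # The joint variable (X, Y) is encoded with the same base-k trick.
    xy = encode_rows(list(zip(x, y)))
    return entropy_bits(x) + entropy_bits(y) - entropy_bits(xy)


# Assumed toy data: two binary features and an XOR label.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in samples]

codes = encode_rows(samples)  # [0, 2, 1, 3] with k = 2
mi_first = mutual_information_bits([(a,) for a, _ in samples], labels)
mi_second = mutual_information_bits([(b,) for _, b in samples], labels)
mi_pair = mutual_information_bits(samples, labels)

print(codes)
print(mi_first, mi_second, mi_pair)
```

On this toy data each feature alone has zero mutual information with the label, while the pair carries one full bit, so a greedy selector that scores features one at a time has no reason to pick either feature first. Python integers do not overflow, but a fixed-width array implementation of the same encoding would need k^r to fit in the chosen integer type.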