RT Journal Article
SR Electronic
T1 Optimal marker gene selection for cell type discrimination in single cell analyses
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 599654
DO 10.1101/599654
A1 Bianca Dumitrascu
A1 Soledad Villar
A1 Dustin G. Mixon
A1 Barbara E. Engelhardt
YR 2019
UL http://biorxiv.org/content/early/2019/04/04/599654.abstract
AB Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of disassociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers to identify and differentiate specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene transcript markers that jointly optimize cell label recovery using label-aware compressive classification methods, resulting in a substantially more robust and less redundant set of markers than existing methods. When applied to a data set given a hierarchy of cell type labels, the markers found by our method enable the recovery of the label hierarchy through a computationally efficient and principled optimization.