TY  - JOUR
T1  - Optimal marker gene selection for cell type discrimination in single cell analyses
JF  - bioRxiv
DO  - 10.1101/599654
SP  - 599654
AU  - Bianca Dumitrascu
AU  - Soledad Villar
AU  - Dustin G. Mixon
AU  - Barbara E. Engelhardt
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/04/04/599654.abstract
N2  - Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of disassociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers to identify and differentiate specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene transcript markers that jointly optimize cell label recovery using label-aware compressive classification methods, resulting in a substantially more robust and less redundant set of markers than existing methods. When applied to a data set given a hierarchy of cell type labels, the markers found by our method enable the recovery of the label hierarchy through a computationally efficient and principled optimization.
ER  - 