TY  - JOUR
T1  - Feature extraction approach in single-cell gene expression profiling for cell-type marker identification
JF  - bioRxiv
DO  - 10.1101/686659
SP  - 686659
AU  - Nigatu A. Adossa
AU  - Leif Schauser
AU  - Vivi G. Gregersen
AU  - Laura L. Elo
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/06/28/686659.abstract
N2  - Background: Recent advances in single-cell gene expression profiling have revolutionized the understanding of the molecular processes underlying developmental cell and tissue differentiation, enabling the discovery of novel cell types and of molecular markers that characterize developmental trajectories. Common approaches for identifying marker genes rely on pairwise statistical testing for differential gene expression between cell types in heterogeneous cell populations. This is challenging because sample sizes and variances differ between groups, which reduces statistical power and inflates type I error rates. Results: We developed an alternative feature extraction method, Marker gene Identification for Cell-type Identity (MICTI), that encodes cell-type-specific expression information for each gene in every single cell. The approach identifies the features (genes) that are specific to a given cell type in a heterogeneous cell population. To validate it, we used (i) simulated single-cell RNA-seq data, (ii) human pancreatic islet single-cell RNA-seq data, and (iii) a simulated mixture of human single-cell RNA-seq data from immune-related cells, specifically B cells, CD4+ memory cells, CD8+ memory cells, dendritic cells, fibroblast cells, and lymphoblast cells. In all three cases, we identified established cell-type-specific markers. Conclusions: Our approach is a fast and efficient alternative to differential expression analysis for identifying molecular markers in heterogeneous single-cell RNA-seq data. Abbreviations: MICTI, Marker gene Identification for Cell-type Identity; DE, Differential Expression; MAST, Model-based Analysis of Single-cell Transcriptomics; ROTS, Reproducibility-Optimized Test Statistic; BPSC, Beta-Poisson model for Single-Cell RNA-seq data analyses; EC2, Elastic Compute Cloud; DGE, Differential Gene Expression; TPM, Transcripts Per Million mapped reads; RPKM, Reads Per Kilobase per Million mapped reads; GEO, Gene Expression Omnibus; UMI, Unique Molecular Identifier; SC3, Single-Cell Consensus Clustering; LDA, Latent Dirichlet Allocation; PCA, Principal Component Analysis; ICA, Independent Component Analysis; TF-IDF, Term Frequency Inverse Document Frequency; NMF, Non-negative Matrix Factorization; t-SNE, t-Distributed Stochastic Neighbor Embedding; MST, Minimum Spanning Tree; TSCAN, Tools for Single-Cell Analysis; NB, Negative Binomial; FACS, Fluorescence-Activated Cell Sorting; PCR, Polymerase Chain Reaction; cDNA, complementary DNA; scRNA-seq, Single-Cell RNA Sequencing.
ER  -