PT - JOURNAL ARTICLE
AU - Eric Van Buren
AU - Ming Hu
AU - Chen Weng
AU - Fulai Jin
AU - Yan Li
AU - Di Wu
AU - Yun Li
TI - TWO-SIGMA: a novel TWO-component SInGle cell Model-based Association method for single-cell RNA-seq data
AID - 10.1101/709238
DP - 2019 Jan 01
TA - bioRxiv
PG - 709238
4099 - http://biorxiv.org/content/early/2019/07/22/709238.short
4100 - http://biorxiv.org/content/early/2019/07/22/709238.full
AB - Two key challenges in the analysis of single cell RNA-seq (scRNA-seq) data are excess zeros due to “drop-out” events and substantial overdispersion due to stochastic and systematic differences. Association analysis of scRNA-seq data is further confronted with the possible dependency introduced by measuring multiple single cells from the same biological sample. To address these three challenges, we propose TWO-SIGMA: a TWO-component SInGle cell Model-based Association method. The first component models the drop-out probability with a mixed-effects logistic regression, and the second component models the (conditional) mean read count with a mixed-effects negative binomial regression. Our approach simultaneously allows for overdispersion and accommodates dependency in both drop-out probability and mean mRNA abundance at the gene level, leading to improved statistical power while still providing highly interpretable coefficient estimates. Simulation studies and real data analysis show advantages in type-I error control, power enhancement, and parameter estimation over alternative approaches including MAST and a zero-inflated negative binomial model without random effects. TWO-SIGMA is implemented in the R package “twosigma” available at https://github.com/edvanburen/twosigma.
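
The two components described in the abstract can be illustrated with a minimal Python sketch of a zero-inflated negative binomial probability for a single cell. This is not the API of the "twosigma" R package, and several details are assumptions made only for illustration: the random effects are taken to be sample-level random intercepts (`a_i` for drop-out, `b_i` for the mean), the mean uses a log link, and the negative binomial is parameterized by a mean `mu` and size `phi` so that the variance is `mu + mu**2 / phi`. All function and parameter names are hypothetical.

```python
import math


def sigmoid(t):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-t))


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def nb_pmf(y, mu, phi):
    """Negative binomial pmf with mean mu and size phi (assumed parameterization)."""
    log_pmf = (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (phi + mu))
        + y * math.log(mu / (phi + mu))
    )
    return math.exp(log_pmf)


def two_component_pmf(y, z, x, alpha, beta, a_i, b_i, phi):
    """P(Y = y) for one cell, conditional on the sample's random intercepts.

    Component 1 (logistic): drop-out probability, logit(p) = z.alpha + a_i.
    Component 2 (negative binomial): conditional mean, log(mu) = x.beta + b_i.
    """
    p_drop = sigmoid(dot(z, alpha) + a_i)
    mu = math.exp(dot(x, beta) + b_i)
    # Drop-out contributes a point mass at zero; other counts come from the NB part.
    point_mass = p_drop if y == 0 else 0.0
    return point_mass + (1.0 - p_drop) * nb_pmf(y, mu, phi)


# Hypothetical covariates (intercept plus one binary covariate) and coefficients.
z = x = [1.0, 1.0]
alpha = [-1.0, 0.5]
beta = [1.0, 0.3]
prob_zero = two_component_pmf(0, z, x, alpha, beta, a_i=0.2, b_i=-0.1, phi=2.0)
```

Under these assumptions, the probability of observing a zero exceeds the drop-out probability alone, because the negative binomial component can also produce zeros. Cells from the same sample share `a_i` and `b_i`, which is one way the within-sample dependency mentioned in the abstract could enter both components.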