PT - JOURNAL ARTICLE
AU - Tallulah S. Andrews
AU - Martin Hemberg
TI - Modelling dropouts allows for unbiased identification of marker genes in scRNASeq experiments
AID - 10.1101/065094
DP - 2016 Jan 01
TA - bioRxiv
PG - 065094
4099 - http://biorxiv.org/content/early/2016/07/21/065094.short
4100 - http://biorxiv.org/content/early/2016/07/21/065094.full
AB - Single-cell RNASeq (scRNASeq) differs from bulk RNASeq in that a large number of genes have zero reads in some cells, but relatively high expression in the remaining cells. We propose that these zeros, or dropouts, are due to failure of the reverse transcription, and we model the process using the Michaelis-Menten (MM) equation. We show that the MM equation provides an equivalent or superior fit to existing scRNASeq datasets compared to other models. In addition, identifying genes significantly to the right of the MM curve is a fast and accurate method to distinguish differentially expressed genes without prior identification of subpopulations of cells. We applied our method to a mouse preimplantation dataset and demonstrate that clustering the selected genes identifies biologically meaningful clusters. Furthermore, this feature selection makes it possible to overcome batch effects and cluster cells from five different datasets by their biological groups rather than by experimental origin.
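
The abstract describes fitting a Michaelis-Menten curve to dropout rates and selecting genes that lie to the right of it. The record does not give the functional form or the significance test, so the following is only a minimal sketch under stated assumptions: the dropout rate is taken to be 1 - S / (K + S), where S is a gene's mean expression and K is a single fitted constant; K is chosen by a simple least-squares grid search; and a fixed residual margin stands in for a proper statistical test. All function names and the `margin` parameter are hypothetical and are not taken from the authors' software.

```python
import numpy as np


def gene_summaries(counts):
    """Per-gene mean expression and dropout rate from a genes x cells matrix."""
    counts = np.asarray(counts, dtype=float)
    return counts.mean(axis=1), (counts == 0).mean(axis=1)


def mm_dropout_rate(mean_expr, K):
    """Expected dropout rate under the ASSUMED MM form: 1 - S / (K + S)."""
    S = np.asarray(mean_expr, dtype=float)
    return 1.0 - S / (K + S)


def fit_K(mean_expr, dropout_rate, grid=None):
    """Choose K by least squares over a log-spaced grid (illustrative only)."""
    if grid is None:
        grid = np.logspace(-2, 3, 501)
    obs = np.asarray(dropout_rate, dtype=float)
    sse = [np.sum((obs - mm_dropout_rate(mean_expr, K)) ** 2) for K in grid]
    return float(grid[int(np.argmin(sse))])


def right_of_curve(mean_expr, dropout_rate, K, margin=0.2):
    """Flag genes whose observed dropout rate exceeds the curve by > margin.

    The curve decreases with expression, so a gene above it at a given
    expression level is also to the right of it at a given dropout rate.
    The fixed margin is a stand-in for a significance test (assumption).
    """
    obs = np.asarray(dropout_rate, dtype=float)
    return (obs - mm_dropout_rate(mean_expr, K)) > margin


if __name__ == "__main__":
    # Toy data: six genes follow the curve with K = 2; the last gene has a
    # high mean but many zeros, as expected for a marker of a subpopulation.
    means = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 8.0])
    drops = np.append(mm_dropout_rate(means[:6], 2.0), 0.8)
    K_hat = fit_K(means, drops)
    print("fitted K:", K_hat)
    print("flagged genes:", np.flatnonzero(right_of_curve(means, drops, K_hat)))
```

Because the selection uses only each gene's mean expression and fraction of zeros, it needs no cell labels, which matches the abstract's point that subpopulations do not have to be identified beforehand.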