RT Journal Article
SR Electronic
T1 Correcting batch effects in single-cell RNA sequencing data by matching mutual nearest neighbours
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 165118
DO 10.1101/165118
A1 Haghverdi, Laleh
A1 Lun, Aaron T. L.
A1 Morgan, Michael D.
A1 Marioni, John C.
YR 2017
UL http://biorxiv.org/content/early/2017/07/18/165118.abstract
AB The presence of batch effects is a well-known problem in experimental data analysis, and single-cell RNA sequencing (scRNA-seq) is no exception. Large-scale scRNA-seq projects that generate data from different laboratories and at different times are rife with batch effects that can fatally compromise integration and interpretation of the data. In such cases, computational batch correction is critical for eliminating uninteresting technical factors and obtaining valid biological conclusions. However, existing methods assume that the composition of cell populations is either known or the same across batches. Here, we present a new strategy for batch correction based on the detection of mutual nearest neighbours in the high-dimensional expression space. Our approach does not rely on pre-defined or equal population compositions across batches, only requiring that a subset of the population be shared between batches. We demonstrate the superiority of our approach over existing methods on a range of simulated and real scRNA-seq data sets. We also show how our method can be applied to integrate scRNA-seq data from two separate studies of early embryonic development.
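
The abstract's central idea, detecting mutual nearest neighbours between batches, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mutual_nearest_neighbours`, the parameter `k`, the use of plain Euclidean distance, and the toy coordinates are all assumptions made for illustration, and the later step of using the pairs to correct the batch effect is not shown. A cell in one batch and a cell in the other form a pair only when each is among the other's `k` nearest neighbours.

```python
import numpy as np


def mutual_nearest_neighbours(batch_a, batch_b, k=3):
    """Return sorted (i, j) pairs where cell i of batch_a and cell j of
    batch_b are each among the other's k nearest neighbours.

    Illustrative only: Euclidean distance on raw coordinates is an assumption.
    """
    # Pairwise squared Euclidean distances, shape (n_a, n_b).
    diff = batch_a[:, None, :] - batch_b[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # Never ask for more neighbours than the other batch has cells.
    k_ab = min(k, dist.shape[1])
    k_ba = min(k, dist.shape[0])

    # For each cell in A, the indices of its nearest cells in B, and vice versa.
    nn_ab = np.argsort(dist, axis=1)[:, :k_ab]
    nn_ba = np.argsort(dist.T, axis=1)[:, :k_ba]

    a_to_b = {(i, int(j)) for i in range(dist.shape[0]) for j in nn_ab[i]}
    b_to_a = {(int(i), j) for j in range(dist.shape[1]) for i in nn_ba[j]}

    # A pair is "mutual" only if it appears in both directions.
    return sorted(a_to_b & b_to_a)


# Toy data (hypothetical): batch A holds two populations, batch B only one,
# offset by a small "batch effect".
batch_a = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
batch_b = np.array([[1.0, 0.2], [1.0, 1.2]])

pairs = mutual_nearest_neighbours(batch_a, batch_b, k=1)
print(pairs)  # [(0, 0), (1, 1)]
```

In this toy case the third cell of batch A, which belongs to a population absent from batch B, still has a nearest neighbour in B, but that neighbour's own nearest cell lies elsewhere, so no pair is formed. This mirrors the abstract's requirement that only a subset of the population be shared between batches.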