ABSTRACT
Batch effect correction is an essential step in the integrative analysis of multiple single-cell RNA-seq (scRNA-seq) datasets. One state-of-the-art strategy for batch effect correction is the unsupervised or supervised detection of mutual nearest neighbors (MNNs). However, both kinds of methods detect MNNs across batches only in the uncorrected data, where a large batch effect may distort the MNN search. To address this issue, we present iSMNN, a batch effect correction approach that iteratively refines supervised MNNs across the data after each round of correction. Our benchmarking on both simulated and real datasets shows that iterative MNN refinement improves correction performance. Compared to popular alternative methods, iSMNN better mixes cells of the same cell type across batches. In addition, iSMNN facilitates the identification of differentially expressed genes (DEGs) relevant to the biological function of certain cell types. These results indicate that iSMNN will be a valuable method for integrating multiple scRNA-seq datasets and can facilitate biological and medical studies at the single-cell level.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Yuchen Yang is a Research Assistant Professor in the Department of Pathology and Laboratory Medicine and McAllister Heart Institute at University of North Carolina at Chapel Hill.
Gang Li is a PhD candidate in the Department of Statistics and Operations Research at University of North Carolina at Chapel Hill.
Yifang Xie is a postdoctoral research fellow in the Department of Pathology and Laboratory Medicine at University of North Carolina at Chapel Hill.
Li Wang is an Associate Professor in the Frontier Science Center for Immunology and Metabolism at Wuhan University.
Yingxi Yang is an undergraduate student in the Department of Statistics at Sun Yat-sen University.
Jiandong Liu is an Associate Professor in the Department of Pathology and Laboratory Medicine and McAllister Heart Institute at University of North Carolina at Chapel Hill.
Li Qian is an Associate Professor in the Department of Pathology and Laboratory Medicine and McAllister Heart Institute at University of North Carolina at Chapel Hill.
Yun Li is an Associate Professor in the Departments of Genetics, Biostatistics and Computer Science at University of North Carolina at Chapel Hill.
Key Points
Mutual nearest neighbor (MNN) detection has been recognized as a sensible approach for batch effect correction in single-cell RNA-sequencing (scRNA-seq) data. Among MNN-based methods, the supervised version (e.g., as implemented in our SMNN method) explicitly leverages cell type or state label information and demonstrates superior performance over its unsupervised counterpart.
However, SMNN searches for MNNs in the original expression matrices. In the presence of a substantial batch effect, the number of MNNs found can be rather small, which may lead to insufficient or inaccurate correction.
To address this issue, we propose iSMNN, which performs iterative MNN refinement and batch effect correction. By iteratively refining MNNs on partially batch-corrected data, iSMNN improves correction accuracy compared to methods that apply a single round of correction to the original data.
In both simulations and real datasets, iSMNN shows clear advantages over two state-of-the-art batch effect correction methods: it better mixes cells of the same cell type across batches and more effectively recovers cell-type-specific features.
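The iterative idea summarized in the key points above can be illustrated with a minimal Python sketch. This is not the iSMNN implementation. It assumes two batches that are already embedded in a shared feature space, cell type labels known for both, and a deliberately simplified correction step in which the second batch is translated by the mean difference vector across the current MNN pairs. The function names, the parameters `k`, `n_iter` and `tol`, and the stopping rule are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_nearest_neighbors(ref, query, k):
    """Return (i, j) pairs where ref[i] and query[j] are in each other's k nearest neighbors."""
    # Pairwise squared Euclidean distances, shape (n_ref, n_query).
    d = ((ref[:, None, :] - query[None, :, :]) ** 2).sum(axis=2)
    k_q = min(k, query.shape[0])
    k_r = min(k, ref.shape[0])
    nn_of_ref = np.argsort(d, axis=1)[:, :k_q]       # nearest query cells per ref cell
    nn_of_query = np.argsort(d, axis=0)[:k_r, :].T   # nearest ref cells per query cell
    ref_to_query = np.zeros(d.shape, dtype=bool)
    ref_to_query[np.arange(ref.shape[0])[:, None], nn_of_ref] = True
    query_to_ref = np.zeros(d.shape, dtype=bool)
    query_to_ref[nn_of_query, np.arange(query.shape[0])[:, None]] = True
    # A pair is "mutual" only if it appears in both neighbor lists.
    return np.argwhere(ref_to_query & query_to_ref)


def supervised_mnn_pairs(ref, query, ref_labels, query_labels, k):
    """Search MNNs only between cells that share a cell type label (the supervised step)."""
    ref_labels = np.asarray(ref_labels)
    query_labels = np.asarray(query_labels)
    pairs = []
    for label in np.intersect1d(ref_labels, query_labels):
        r_idx = np.flatnonzero(ref_labels == label)
        q_idx = np.flatnonzero(query_labels == label)
        local = mutual_nearest_neighbors(ref[r_idx], query[q_idx], k)
        # Map indices within the label subset back to indices in the full batch.
        pairs.append(np.column_stack([r_idx[local[:, 0]], q_idx[local[:, 1]]]))
    return np.vstack(pairs) if pairs else np.empty((0, 2), dtype=int)


def iterative_supervised_mnn(ref, query, ref_labels, query_labels,
                             k=5, n_iter=5, tol=1e-6):
    """Alternate between MNN search on the partially corrected data and correction.

    Assumption: the correction is one global translation per iteration,
    which is far simpler than the corrections used by real MNN-based methods.
    """
    corrected = np.asarray(query, dtype=float).copy()
    history = []  # number of MNN pairs found in each iteration
    for _ in range(n_iter):
        pairs = supervised_mnn_pairs(ref, corrected, ref_labels, query_labels, k)
        history.append(len(pairs))
        if len(pairs) == 0:
            break
        # Mean batch vector across the current MNN pairs.
        shift = (ref[pairs[:, 0]] - corrected[pairs[:, 1]]).mean(axis=0)
        corrected += shift
        if np.linalg.norm(shift) < tol:
            break
    return corrected, history
```

In this sketch, the first round of MNNs is found on the uncorrected data, where pairs tend to sit at the facing edges of the two batches and so underestimate the batch offset. Each later round repeats the search on the partially corrected data, which is the refinement step that a single-round method omits.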