bioRxiv
New Results

iSMNN: Batch Effect Correction for Single-cell RNA-seq data via Iterative Supervised Mutual Nearest Neighbor Refinement

Yuchen Yang, Gang Li, Yifang Xie, Li Wang, Yingxi Yang, Jiandong Liu, Li Qian, Yun Li
doi: https://doi.org/10.1101/2020.11.09.375659
Yuchen Yang
1Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
2McAllister Heart Institute, University of North Carolina, Chapel Hill, NC 27599, USA
Gang Li
3Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, USA
Yifang Xie
1Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
2McAllister Heart Institute, University of North Carolina, Chapel Hill, NC 27599, USA
Li Wang
4Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, Hubei 430071, China
Yingxi Yang
5Department of Statistics, Sun Yat-sen University, Guangzhou, Guangdong 510275, China
Jiandong Liu
1Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
2McAllister Heart Institute, University of North Carolina, Chapel Hill, NC 27599, USA
Li Qian
1Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
2McAllister Heart Institute, University of North Carolina, Chapel Hill, NC 27599, USA
  • For correspondence: li_qian@med.unc.edu yunli@med.unc.edu
Yun Li
6Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
7Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
8Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
  • For correspondence: li_qian@med.unc.edu yunli@med.unc.edu

ABSTRACT

Batch effect correction is an essential step in the integrative analysis of multiple single-cell RNA-seq (scRNA-seq) datasets. One state-of-the-art strategy for batch effect correction is unsupervised or supervised detection of mutual nearest neighbors (MNNs). However, both kinds of methods detect MNNs across batches only on uncorrected data, where a large batch effect may distort the MNN search. To address this issue, we present iSMNN, a batch effect correction approach that iteratively refines supervised MNNs on data after correction. Our benchmarking on both simulated and real datasets showed the advantages of iterative MNN refinement for correction performance. Compared to popular alternative methods, iSMNN better mixes cells of the same cell type across batches. In addition, iSMNN can facilitate the identification of differentially expressed genes (DEGs) relevant to the biological function of certain cell types. These results indicate that iSMNN will be a valuable method for integrating multiple scRNA-seq datasets, facilitating biological and medical studies at the single-cell level.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Yuchen Yang is a Research Assistant Professor in the Department of Pathology and Laboratory Medicine and McAllister Heart Institute at University of North Carolina at Chapel Hill.

    Gang Li is a PhD candidate in the Department of Statistics and Operations Research at University of North Carolina at Chapel Hill.

    Yifang Xie is a postdoctoral research fellow in the Department of Pathology and Laboratory Medicine at University of North Carolina at Chapel Hill.

    Li Wang is an Associate Professor in the Frontier Science Center for Immunology and Metabolism at Wuhan University.

    Yingxi Yang is an undergraduate student in the Department of Statistics at Sun Yat-sen University.

    Jiandong Liu is an Associate Professor in the Department of Pathology and Laboratory Medicine and McAllister Heart Institute at University of North Carolina at Chapel Hill.

    Li Qian is an Associate Professor in the Department of Pathology and Laboratory Medicine and McAllister Heart Institute at University of North Carolina at Chapel Hill.

    Yun Li is an Associate Professor in the Departments of Genetics, Biostatistics and Computer Science at University of North Carolina at Chapel Hill.

  • Key Points

    • Mutual nearest neighbor (MNN) detection has been recognized as a sensible approach for batch effect correction in single-cell RNA-sequencing (scRNA-seq) data. Among MNN-based methods, the supervised version (e.g., implemented in our SMNN method) explicitly leverages cell type or state label information and demonstrates superior performance over its unsupervised counterpart.

    • However, SMNN searches MNNs from the original expression matrices. The number of MNNs can be rather small in the presence of substantial batch effect, which may lead to insufficient or inaccurate correction.

    • To address this issue, here we propose iSMNN, which performs iterative MNN refinement and batch effect correction. With MNNs iteratively refined on partially corrected data, iSMNN improves correction accuracy compared with methods that perform a single round of correction on the original data.

    • Our iSMNN method shows clear advantages over two state-of-the-art batch effect correction methods and can better mix cells of the same cell type across batches and more effectively recover cell-type specific features, in both simulations and real datasets.

  • https://github.com/yycunc/iSMNN
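
The iterative refinement described in the Key Points can be illustrated with a toy sketch. This is a deliberately simplified illustration under stated assumptions, not the code in the repository above: MNNs are searched within each shared cell-type label, the correction is a single global shift equal to the mean difference over the current MNN pairs, and the number of iterations is fixed. All function names, parameters, and data here are hypothetical.

```python
import math


def k_nearest(cell, pool, candidates, k):
    """Indices (from `candidates`) of the k cells in `pool` closest to `cell`."""
    return set(sorted(candidates, key=lambda j: math.dist(cell, pool[j]))[:k])


def supervised_mnn(ref, ref_labels, tgt, tgt_labels, k=1):
    """Find MNN pairs, searching only within cells that share a label."""
    pairs = []
    for label in sorted(set(ref_labels) & set(tgt_labels)):
        ref_idx = [i for i, l in enumerate(ref_labels) if l == label]
        tgt_idx = [j for j, l in enumerate(tgt_labels) if l == label]
        r2t = {i: k_nearest(ref[i], tgt, tgt_idx, k) for i in ref_idx}
        t2r = {j: k_nearest(tgt[j], ref, ref_idx, k) for j in tgt_idx}
        pairs += [(i, j) for i in ref_idx for j in tgt_idx
                  if j in r2t[i] and i in t2r[j]]
    return pairs


def iterative_correction(ref, ref_labels, tgt, tgt_labels, k=1, n_iter=3):
    """Alternate MNN search and correction; MNNs are re-searched on corrected data."""
    corrected = [tuple(cell) for cell in tgt]
    dim = len(ref[0])
    pair_counts = []
    for _ in range(n_iter):
        pairs = supervised_mnn(ref, ref_labels, corrected, tgt_labels, k)
        pair_counts.append(len(pairs))
        if not pairs:
            break
        # Simplifying assumption: one global shift, the mean pair difference.
        shift = [sum(ref[i][d] - corrected[j][d] for i, j in pairs) / len(pairs)
                 for d in range(dim)]
        corrected = [tuple(c[d] + shift[d] for d in range(dim)) for c in corrected]
    return corrected, pair_counts


# Toy data (hypothetical): the target batch is the reference shifted by (1, 3).
ref = [(0.0, 0.0), (1.5, 0.0), (6.0, 0.0), (12.0, 0.0)]
tgt = [(1.0, 3.0), (2.5, 3.0), (7.0, 3.0), (13.0, 3.0)]
labels = ["A", "A", "B", "B"]

corrected, pair_counts = iterative_correction(ref, labels, tgt, labels)
print(pair_counts)  # [3, 4, 4]
print(corrected == ref)  # True
```

In this toy case the first search on uncorrected data finds only three pairs, one of which matches the wrong cells, so the first correction removes the batch shift only partially. The second search, run on the partially corrected data, recovers all four correct pairs and the remaining shift is removed.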

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted January 12, 2021.
Subject Area

  • Bioinformatics