A test metric for assessing single-cell RNA-seq batch correction

Maren Büttner; Zhichao Miao; F Alexander Wolf; Sarah A Teichmann; Fabian J Theis

doi:10.1038/s41592-018-0254-1

Nat Methods. 2019 Jan;16(1):43-49. doi: 10.1038/s41592-018-0254-1. Epub 2018 Dec 20.

Authors

Maren Büttner^#¹, Zhichao Miao^#^{2,3}, F Alexander Wolf¹, Sarah A Teichmann^{4,5,6}, Fabian J Theis^{7,8}

Affiliations

¹ Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
² European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK.
³ Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
⁴ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK. st9@sanger.ac.uk.
⁵ Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK. st9@sanger.ac.uk.
⁶ Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge, UK. st9@sanger.ac.uk.
⁷ Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany. fabian.theis@helmholtz-muenchen.de.
⁸ Department of Mathematics, Technische Universität München, Munich, Germany. fabian.theis@helmholtz-muenchen.de.

^# Contributed equally.

PMID: 30573817
DOI: 10.1038/s41592-018-0254-1

Abstract

Single-cell transcriptomics is a versatile tool for exploring heterogeneous cell populations, but as with all genomics experiments, batch effects can hamper data integration and interpretation. The success of batch-effect correction is often evaluated by visual inspection of low-dimensional embeddings, which are inherently imprecise. Here we present a user-friendly, robust and sensitive k-nearest-neighbor batch-effect test (kBET; https://github.com/theislab/kBET) for quantification of batch effects. We used kBET to assess commonly used batch-regression and normalization approaches, and to quantify the extent to which they remove batch effects while preserving biological variability. We also demonstrate the application of kBET to data from peripheral blood mononuclear cells (PBMCs) from healthy donors to distinguish cell-type-specific inter-individual variability from changes in relative proportions of cell populations. This has important implications for future data-integration efforts, central to projects such as the Human Cell Atlas.
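
The abstract names a k-nearest-neighbor batch-effect test but does not describe its mechanics. The Python sketch below illustrates the general idea only, under stated assumptions: each cell's k nearest neighbors are compared with the global batch composition using a Pearson chi-squared test, and the fraction of rejected cells is reported. It is not the kBET package's implementation. The function name `kbet_rejection_rate`, its parameters, the brute-force neighbor search, the restriction to exactly two batches, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def kbet_rejection_rate(points, batches, k=10, alpha=0.05):
    """Fraction of cells whose k-nearest-neighbor batch mix differs from the global mix.

    Illustrative sketch: brute-force neighbors, exactly two batches (1 degree of freedom).
    """
    labels = sorted(set(batches))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two batches")
    n = len(points)
    # Global batch frequencies give the expected counts in every neighborhood.
    expected = [k * batches.count(lab) / n for lab in labels]

    rejected = 0
    for i, p in enumerate(points):
        # Brute-force k nearest neighbors by squared Euclidean distance (self excluded).
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])),
        )[:k]
        observed = [sum(batches[j] == lab for j in neighbors) for lab in labels]
        # Pearson chi-squared statistic against the global composition.
        stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Survival function of the chi-squared distribution with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(stat / 2))
        if p_value < alpha:
            rejected += 1
    return rejected / n


# Toy data (assumed, not from the paper): two batches of 100 cells in 2-D.
rng = random.Random(0)
batches = ["a"] * 100 + ["b"] * 100
mixed = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
# Shift batch "b" far along the first axis to mimic a strong batch effect.
shifted = [(x + (8 if b == "b" else 0), y) for (x, y), b in zip(mixed, batches)]

print("well mixed:", kbet_rejection_rate(mixed, batches))
print("batch shifted:", kbet_rejection_rate(shifted, batches))
```

In this toy setting a rejection rate near 0 suggests well-mixed batches and a rate near 1 suggests a strong batch effect. A practical implementation would more likely use an efficient neighbor search, support more than two batches, and test only a subsample of cells.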

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Cluster Analysis
Sequence Analysis, RNA / methods*
Single-Cell Analysis / methods*