bioRxiv
New Results

ComBat-Seq: batch effect adjustment for RNA-Seq count data

Yuqing Zhang, Giovanni Parmigiani, W. Evan Johnson
doi: https://doi.org/10.1101/2020.01.13.904730
Yuqing Zhang
¹ Division of Computational Biomedicine, Boston University School of Medicine
² Graduate Program in Bioinformatics, Boston University
Giovanni Parmigiani
³ Department of Data Sciences, Dana-Farber Cancer Institute
⁴ Department of Biostatistics, Harvard T.H. Chan School of Public Health
W. Evan Johnson
¹ Division of Computational Biomedicine, Boston University School of Medicine
² Graduate Program in Bioinformatics, Boston University
⁵ Department of Biostatistics, Boston University School of Public Health
For correspondence: wej@bu.edu
Abstract

Integrating batches of genomic data can increase statistical power in differential expression analysis, but this benefit is often hindered by batch effects: unwanted variation in the data caused by differences in technical factors across batches. It is therefore critical to address batch effects effectively in genomic data. Many existing batch adjustment methods assume continuous, bell-shaped Gaussian distributions for the data. However, in RNA-Seq studies, where the data are skewed, over-dispersed counts, this assumption is not appropriate and may lead to erroneous results. Negative binomial regression models have been used to better capture the properties of counts. We developed a batch correction method, ComBat-Seq, that uses negative binomial regression. ComBat-Seq retains the integer nature of RNA-Seq count data, so the batch-adjusted data remain compatible with common differential expression software packages that require integer counts. We show in realistic simulations that data adjusted by ComBat-Seq yield better statistical power and better control of false positives in differential expression than data adjusted by other available methods. We further demonstrate, in a real data example, that ComBat-Seq successfully removes batch effects and recovers the biological signal in the data.
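The abstract's key practical point is that an adjustment built on the negative binomial model can return integer counts. One way to achieve this, sketched below under stated assumptions, is to map each observed count to the same cumulative probability under a "batch-free" negative binomial distribution. This is a simplified illustration, not the ComBat-Seq implementation: it assumes a mean/dispersion parameterization with Var(X) = mu + phi * mu², assumes mu > 0 and phi > 0, and assumes the batch-specific and batch-free parameters have already been estimated (the regression fitting step is not shown). All function names are hypothetical.

```python
import math


def nb_pmf(k: int, mu: float, phi: float) -> float:
    """Negative binomial P(X = k) with mean mu and dispersion phi.

    Assumed parameterization: size r = 1/phi, so Var(X) = mu + phi * mu**2,
    i.e. the counts are over-dispersed relative to a Poisson (Var = mu).
    """
    r = 1.0 / phi
    p = r / (r + mu)  # "success" probability in the (r, p) form
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def nb_cdf(k: int, mu: float, phi: float) -> float:
    """P(X <= k), accumulated term by term from k = 0."""
    total = 0.0
    for i in range(k + 1):
        total += nb_pmf(i, mu, phi)
    return total


def nb_quantile(q: float, mu: float, phi: float, max_k: int = 100_000) -> int:
    """Smallest integer k with P(X <= k) >= q (capped at max_k as a guard)."""
    k = 0
    total = nb_pmf(0, mu, phi)
    while total < q and k < max_k:
        k += 1
        total += nb_pmf(k, mu, phi)
    return k


def adjust_count(y: int, mu_batch: float, phi_batch: float,
                 mu_target: float, phi_target: float) -> int:
    """Hypothetical helper: move count y from its batch-specific NB
    distribution to the same quantile of a batch-free NB distribution.
    The result is an integer because it is a quantile of a discrete law."""
    q = nb_cdf(y, mu_batch, phi_batch)
    q = min(q, 1.0 - 1e-12)  # avoid chasing a cumulative sum that never reaches 1.0
    return nb_quantile(q, mu_target, phi_target)


if __name__ == "__main__":
    # A count of 20 from a batch whose mean is inflated (mu = 20) is mapped
    # onto a batch-free distribution with mu = 10; the output is a smaller integer.
    print(adjust_count(20, 20.0, 0.1, 10.0, 0.1))
```

Because the output is a quantile of a discrete distribution, it is always a non-negative integer, which is the property that keeps adjusted data usable by count-based differential expression tools. When the batch-specific and batch-free parameters coincide, the mapping leaves the count unchanged.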

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted January 14, 2020.

Subject Area

  • Bioinformatics