New Results

How to normalize metatranscriptomic count data for differential expression analysis

Heiner Klingenberg, Peter Meinicke
doi: https://doi.org/10.1101/134650
Heiner Klingenberg
Abteilung für Bioinformatik, Institut für Mikrobiologie und Genetik, Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
Peter Meinicke
Abteilung für Bioinformatik, Institut für Mikrobiologie und Genetik, Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
  • For correspondence: peter@gobics.de

ABSTRACT

BACKGROUND Differential expression analysis based on RNA-Seq count data has become a standard tool in transcriptomics. Several studies have shown that prior normalization of the data is crucial for reliable detection of transcriptional differences. It remains unclear whether and how the transcriptomic approach can be used for differential expression analysis in metatranscriptomics. The potential side effects of applying transcriptomic tools directly to metatranscriptomic count data have not yet been studied.

METHODS We propose a model for differential expression in metatranscriptomics that explicitly accounts for variations in the taxonomic composition of transcripts across different samples. As a main consequence, the correct normalization of metatranscriptomic count data requires the taxonomic separation of the data into organism-specific bins. The taxon-specific scaling of organism profiles then yields a valid normalization and allows the scaled profiles to be recombined into a metatranscriptomic count matrix. This matrix can then be analyzed with statistical tools for transcriptomic count data. For taxon-specific scaling and recombination of scaled counts we provide a simple R script.
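The bin, scale, and recombine steps described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' R script. The function names, the toy counts, and the choice of scaling factor (each sample's library size divided by the geometric mean of that organism's library sizes) are all assumptions made for the example; the abstract does not specify which estimator is used.

```python
from math import exp, log


def library_sizes(profile):
    """Column sums of a features x samples count matrix given as a list of rows."""
    return [sum(column) for column in zip(*profile)]


def scale_profile(profile):
    """Scale every sample (column) of one organism's profile to a common library size.

    Assumption: the scaling factor of a sample is its library size divided by the
    geometric mean of this organism's library sizes across all samples.
    """
    sizes = library_sizes(profile)
    if any(size <= 0 for size in sizes):
        raise ValueError("each sample needs at least one count for this organism")
    geo_mean = exp(sum(log(size) for size in sizes) / len(sizes))
    factors = [size / geo_mean for size in sizes]
    return [[count / f for count, f in zip(row, factors)] for row in profile]


def taxon_specific_scaling(profiles_by_taxon):
    """Scale each organism-specific bin separately, then sum the bins per feature."""
    scaled = [scale_profile(profile) for profile in profiles_by_taxon.values()]
    n_features = len(scaled[0])
    n_samples = len(scaled[0][0])
    combined = [[0.0] * n_samples for _ in range(n_features)]
    for profile in scaled:
        for i, row in enumerate(profile):
            for j, value in enumerate(row):
                combined[i][j] += value
    return combined


# Hypothetical data: two functional features (rows), two samples (columns).
# Organism A doubles its abundance in sample 2, organism B halves it,
# but neither organism changes its relative expression of the two features.
profiles = {
    "organism_A": [[10, 20], [30, 60]],
    "organism_B": [[50, 25], [50, 25]],
}
combined = taxon_specific_scaling(profiles)
for row in combined:
    print([round(value, 2) for value in row])
```

In this toy case the recombined matrix has identical columns, reflecting that no organism changed its expression pattern. Count-based statistical tools typically expect integers, so a rounding step may be needed before passing such a matrix on.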

RESULTS When transcriptomic tools for differential expression analysis are applied directly to metatranscriptomic data, the organism-independent (global) scaling of counts implies a high risk of falsely predicted functional differences. In simulation studies we show that incorrect normalization not only tends to lose significant differences but, in particular, can produce a large number of false positives. In contrast, taxon-specific scaling can equalize the variation of relative library sizes from different organisms and therefore reliably detects significant differences in all simulations. On real metatranscriptomic data, the results from taxon-specific and global scaling can differ substantially. In our study, global scaling yields a high number of extra predictions that are not supported by single-transcriptome analyses. Inspection of the scaling error suggests that these extra predictions may actually correspond to artifacts of an incorrect normalization.
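A toy calculation may help show why global scaling can suggest functional differences that no organism actually exhibits. The counts below are hypothetical and are not taken from the paper's simulations or data. For simplicity, scaling is reduced here to dividing by the library size, which is an assumption made for illustration only.

```python
def proportions(counts):
    """Divide each count by the library size (the sum of all counts)."""
    total = sum(counts)
    return [count / total for count in counts]


# Hypothetical counts for two functional features, per organism and sample.
# Organism A doubles in abundance from sample 1 to sample 2, organism B halves.
organism_a = {"sample1": [10, 30], "sample2": [20, 60]}
organism_b = {"sample1": [50, 50], "sample2": [25, 25]}

# Within each organism, the relative expression is the same in both samples.
within_a = {sample: proportions(counts) for sample, counts in organism_a.items()}
within_b = {sample: proportions(counts) for sample, counts in organism_b.items()}

# Global scaling: pool the counts of both organisms first, then scale once.
pooled = {
    sample: [a + b for a, b in zip(organism_a[sample], organism_b[sample])]
    for sample in organism_a
}
global_scaled = {sample: proportions(counts) for sample, counts in pooled.items()}

print("organism A:", within_a)
print("organism B:", within_b)
print("global:    ", global_scaled)
```

Here the proportion of the first feature stays at 0.25 within organism A and at 0.5 within organism B, yet after pooling and global scaling it shifts from roughly 0.43 to roughly 0.35. The apparent change comes entirely from the shift in taxonomic composition.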

CONCLUSIONS As in transcriptomics, proper normalization of count data is essential for differential expression analysis in metatranscriptomics. Our model implies a taxon-specific scaling of counts for normalization of the data. Applying taxon-specific scaling removes taxonomic composition variations from functional profiles and thereby prevents false predictions caused by incorrect normalization.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted May 05, 2017.
Subject Area

  • Bioinformatics