New Results

Pathway analysis in metabolomics: pitfalls and best practice for the use of over-representation analysis

Cecilia Wieder, Clément Frainay, Nathalie Poupin, Pablo Rodríguez-Mier, Florence Vinson, Juliette Cooke, Rachel PJ Lai, Jacob G Bundy, Fabien Jourdan, Timothy Ebbels
doi: https://doi.org/10.1101/2021.05.24.445406
Cecilia Wieder
1 Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
Clément Frainay
4 Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
Nathalie Poupin
4 Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
Pablo Rodríguez-Mier
4 Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
Florence Vinson
4 Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
Juliette Cooke
4 Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
Rachel PJ Lai
3 Department of Infectious Disease, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
Jacob G Bundy
2 Section of Biomolecular Medicine, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
Fabien Jourdan
4 Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
5 MetaToul-MetaboHUB, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
Timothy Ebbels
1 Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
  • For correspondence: t.ebbels@imperial.ac.uk

Abstract

Over-representation analysis (ORA) is one of the most common pathway analysis approaches used for the functional interpretation of metabolomics datasets. Despite the widespread use of ORA in metabolomics, the community lacks guidelines detailing its best-practice use. Many factors have a pronounced impact on the results, but to date their effects have received little systematic attention in the field. We developed in silico simulations using five publicly available datasets and illustrated that changes in parameters, such as the background set, differential metabolite selection methods, and pathway database choice, could all lead to profoundly different ORA results. The use of a non-assay-specific background set, for example, resulted in large numbers of false-positive pathways. Pathway database choice, evaluated using three of the most popular metabolic pathway databases (KEGG, Reactome, and BioCyc), led to vastly different results in both the number and function of significantly enriched pathways. Factors specific to metabolomics data, such as the reliability of compound identification and assay chemical bias, also affected ORA results. Simulated metabolite misidentification rates as low as 4% resulted in both the gain of false-positive pathways and the loss of truly significant pathways across all datasets. Our results have several practical implications for ORA users, as well as for those using alternative pathway analysis methods. We offer a set of recommendations for the use of ORA in metabolomics, alongside a set of minimal reporting guidelines, as a first step towards the standardisation of pathway analysis in metabolomics.
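
To make the background-set point concrete, the following is a minimal sketch of ORA written as a one-sided hypergeometric test, which is a common way to implement it (equivalent to a one-sided Fisher's exact test). It is not the authors' code. The function name `ora_p_value`, the compound identifiers, and all set sizes are hypothetical toy values chosen for illustration, not data from the five datasets.

```python
from math import comb


def ora_p_value(pathway, differential, background):
    """One-sided hypergeometric p-value for over-representation.

    All three arguments are iterables of compound identifiers.
    Pathway and differential lists are restricted to the background,
    so the choice of background changes every count in the test.
    """
    background = set(background)
    pathway = set(pathway) & background
    differential = set(differential) & background

    N = len(background)               # compounds in the background
    K = len(pathway)                  # pathway compounds in the background
    n = len(differential)             # differential compounds
    k = len(pathway & differential)   # differential compounds in the pathway

    # P(X >= k) when n compounds are drawn without replacement from N
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical toy data (assumption): the assay measures 50 compounds,
# while a generic, non-assay-specific background lists 1000 compounds.
assay_background = [f"c{i}" for i in range(50)]
generic_background = [f"c{i}" for i in range(1000)]

# A 40-compound pathway of which only 10 members are measured by the assay.
pathway = [f"c{i}" for i in range(10)] + [f"c{i}" for i in range(100, 130)]

# 10 differential compounds, 5 of which belong to the pathway.
differential = [f"c{i}" for i in range(5)] + [f"c{i}" for i in range(10, 15)]

p_assay = ora_p_value(pathway, differential, assay_background)
p_generic = ora_p_value(pathway, differential, generic_background)

print(f"assay-specific background: p = {p_assay:.3g}")
print(f"generic background:        p = {p_generic:.3g}")
```

In this toy setting the same overlap of five compounds yields a much smaller p-value against the generic background than against the assay-specific one, because compounds the assay could never have detected inflate the background. This is one mechanism by which a non-assay-specific background could produce false-positive pathways.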

Author summary

Metabolomics is a rapidly growing field of study involving the profiling of small molecules within an organism. It allows researchers to understand the effects of biological status (such as health or disease) on cellular biochemistry, and has wide-ranging applications, from biomarker discovery and personalised medicine in healthcare to crop protection and food security in agriculture. Pathway analysis helps to understand which biological pathways, representing collections of molecules performing a particular function, are involved in response to a disease phenotype, or drug treatment, for example. Over-representation analysis (ORA) is perhaps the most common pathway analysis method used in the metabolomics community. However, ORA can give drastically different results depending on the input data and parameters used. In this work, we have established the effects of these factors on ORA results using computational simulations applied to five real-world datasets. Based on our results, we offer the research community a set of best-practice recommendations applicable not only to ORA but also to other pathway analysis methods to help ensure the reliability and reproducibility of results.

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted May 24, 2021.

Subject Area

  • Bioinformatics