bioRxiv
New Results

ZODIAC: database-independent molecular formula annotation using Gibbs sampling reveals unknown small molecules

Marcus Ludwig, Louis-Félix Nothias, Kai Dührkop, Irina Koester, Markus Fleischauer, Martin A. Hoffmann, Daniel Petras, Fernando Vargas, Mustafa Morsy, Lihini Aluwihare, Pieter C. Dorrestein, Sebastian Böcker
doi: https://doi.org/10.1101/842740
Marcus Ludwig
1Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
Louis-Félix Nothias
2Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, USA
3Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, USA
Kai Dührkop
1Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
Irina Koester
2Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, USA
4Scripps Institution of Oceanography, University of California San Diego, La Jolla, USA
Markus Fleischauer
1Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
Martin A. Hoffmann
1Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
5International Max Planck Research School “Exploration of Ecological Interactions with Molecular and Chemical Techniques”, Max Planck Institute for Chemical Ecology, Jena, Germany
Daniel Petras
2Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, USA
3Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, USA
4Scripps Institution of Oceanography, University of California San Diego, La Jolla, USA
Fernando Vargas
3Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, USA
6Division of Biological Science, University of California San Diego, La Jolla, USA
Mustafa Morsy
7Department of Biological and Environmental Sciences, University of West Alabama, Livingston, USA
Lihini Aluwihare
4Scripps Institution of Oceanography, University of California San Diego, La Jolla, USA
Pieter C. Dorrestein
2Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, USA
3Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, USA
Sebastian Böcker
1Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
  • For correspondence: sebastian.boecker@uni-jena.de

1 Abstract

The confident high-throughput identification of small molecules remains one of the most challenging tasks in mass spectrometry-based metabolomics. SIRIUS has become a powerful tool for the interpretation of tandem mass spectra and shows outstanding performance in identifying the molecular formula of a query compound, which is the first step of structure identification. Nevertheless, identifying molecular formulas remains highly challenging both for large compounds above 500 Daltons and for novel molecular formulas. Here, we present ZODIAC, a network-based algorithm for the de novo estimation of molecular formulas. ZODIAC reranks SIRIUS’ molecular formula candidates, combining fragmentation tree computation with Bayesian statistics using Gibbs sampling. Through careful algorithm engineering, ZODIAC’s Gibbs sampling is fast in practice. ZODIAC decreases incorrect annotations 16.2-fold on a challenging plant extract dataset with most compounds above 700 Daltons; we then show improvements on four additional, diverse datasets. Our analysis led to the discovery of compounds with novel molecular formulas such as C24H47BrNO8P, which, as of today, is not present in any publicly available molecular structure database.
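
To make the idea of reranking candidates by Gibbs sampling over a network more concrete, the following is a toy sketch only. It is not the ZODIAC implementation, and the abstract does not specify the scoring model. The function name `gibbs_rerank`, the per-candidate log-scores, the pairwise `edge_score`, and the example data are all assumptions made for illustration. Each compound has a few candidate formulas with a prior log-score, and candidates of different compounds that are deemed compatible receive a bonus. The sampler repeatedly redraws one compound's formula conditioned on the current choices for all others, and the visit frequencies estimate marginal probabilities.

```python
import math
import random
from collections import Counter


def gibbs_rerank(candidates, edge_score, n_sweeps=4000, burn_in=1000, seed=0):
    """Toy Gibbs sampler (hypothetical, for illustration only).

    candidates: dict compound -> {formula: prior log-score}
    edge_score: function (formula_a, formula_b) -> pairwise log-weight
    Returns dict compound -> {formula: estimated marginal probability}.
    """
    rng = random.Random(seed)
    compounds = sorted(candidates)
    # Start from the best-scoring candidate of every compound.
    state = {c: max(candidates[c], key=candidates[c].get) for c in compounds}
    counts = {c: Counter() for c in compounds}

    for sweep in range(n_sweeps):
        for c in compounds:
            formulas = list(candidates[c])
            log_w = []
            for f in formulas:
                # Conditional log-weight: own prior plus compatibility with
                # the formulas currently assigned to all other compounds.
                s = candidates[c][f]
                for other in compounds:
                    if other != c:
                        s += edge_score(f, state[other])
                log_w.append(s)
            m = max(log_w)  # subtract the maximum for numerical stability
            weights = [math.exp(x - m) for x in log_w]
            state[c] = rng.choices(formulas, weights=weights, k=1)[0]
        if sweep >= burn_in:
            for c in compounds:
                counts[c][state[c]] += 1

    kept = n_sweeps - burn_in
    return {c: {f: counts[c][f] / kept for f in candidates[c]}
            for c in compounds}


# Hypothetical toy input: compound "A" slightly prefers the second candidate
# a priori, but its first candidate differs from a candidate of "B" by H2O,
# which this toy model rewards with an assumed bonus of 2.0.
candidates = {
    "A": {"C6H12O6": 0.0, "C7H16S2": 0.2},
    "B": {"C6H10O5": 0.0, "C2H14N6S": 0.0},
}
compatible = {frozenset(("C6H12O6", "C6H10O5"))}


def edge_score(fa, fb):
    return 2.0 if frozenset((fa, fb)) in compatible else 0.0


marginals = gibbs_rerank(candidates, edge_score)
for compound, dist in marginals.items():
    print(compound, dist)
```

In this toy setting the network evidence outweighs the small prior advantage of the unrelated candidate, so the mutually consistent pair ends up with the highest estimated marginals. The prior scores, the bonus value, and the notion of compatibility are placeholders and would need to be replaced by an actual scoring model.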

Footnotes

  • https://bio.informatik.uni-jena.de/

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted November 16, 2019.

Subject Area

  • Bioinformatics