bioRxiv
New Results

A Deep Learning Genome-Mining Strategy Improves Biosynthetic Gene Cluster Prediction

Geoffrey D. Hannigan, David Prihoda, Andrej Palicka, Jindrich Soukup, Ondrej Klempir, Lena Rampula, Jindrich Durcak, Michael Wurst, Jakub Kotowski, Dan Chang, Rurun Wang, Grazia Piizzi, Daria J. Hazuda, Christopher H. Woelk, Danny A. Bitton
doi: https://doi.org/10.1101/500694
Geoffrey D. Hannigan
1 Merck Exploratory Science Center, Merck Research Laboratories, Cambridge, Massachusetts, USA
David Prihoda
2 Big Data Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic
Andrej Palicka
3 AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic
Jindrich Soukup
4 Data Science, MSD Czech Republic s.r.o., Prague, Czech Republic
Ondrej Klempir
5 Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic
Lena Rampula
6 NLP, MSD Czech Republic s.r.o., Prague, Czech Republic
Jindrich Durcak
5 Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic
Michael Wurst
3 AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic
Jakub Kotowski
3 AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic
Dan Chang
7 Genetics & Pharmacogenomics, Merck & Co., Inc., Boston, Massachusetts, USA
Rurun Wang
1 Merck Exploratory Science Center, Merck Research Laboratories, Cambridge, Massachusetts, USA
Grazia Piizzi
1 Merck Exploratory Science Center, Merck Research Laboratories, Cambridge, Massachusetts, USA
Daria J. Hazuda
1 Merck Exploratory Science Center, Merck Research Laboratories, Cambridge, Massachusetts, USA
8 Infectious Diseases and Vaccine Research, Merck Research Laboratories, Merck & Co., Inc., West Point, Pennsylvania, USA
Christopher H. Woelk
1 Merck Exploratory Science Center, Merck Research Laboratories, Cambridge, Massachusetts, USA
For correspondence: danny.bitton@merck.com, christopher.woelk@merck.com
Danny A. Bitton
5 Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic

Abstract

Natural products represent a rich reservoir of small-molecule drug candidates used as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by enzymes encoded in co-localized groups of genes termed Biosynthetic Gene Clusters (BGCs). The growing number of complete microbial genomes and related resources has driven the development of BGC prediction algorithms, although their precision and their ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers more accurate BGC identification and an improved ability to extrapolate and identify novel BGC classes compared with existing tools. We supplemented this with downstream random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable BGCs that may code for natural products with novel biological activities. The improved accuracy and classification ability of DeepBGC represent a significant step forward for in silico BGC identification.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted December 18, 2018.
Subject Area

  • Bioinformatics