Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

Automated assembly of molecular mechanisms at scale from text mining and curated databases

View ORCID ProfileJohn A. Bachman, View ORCID ProfileBenjamin M. Gyori, View ORCID ProfilePeter K. Sorger
doi: https://doi.org/10.1101/2022.08.30.505688
John A. Bachman
Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for John A. Bachman
Benjamin M. Gyori
Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Benjamin M. Gyori
  • For correspondence: peter_sorger@hms.harvard.edu lsp-papers@hms.harvard.edu benjamin_gyori@hms.harvard.edu
Peter K. Sorger
Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Peter K. Sorger
  • For correspondence: peter_sorger@hms.harvard.edu lsp-papers@hms.harvard.edu benjamin_gyori@hms.harvard.edu
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Preview PDF
Loading

ABSTRACT

The analysis of ‘omic data depends heavily on machine-readable information about protein interactions, modifications, and activities. Key resources include protein interaction networks, databases of post-translational modifications, and curated models of gene and protein function. Software systems that read primary literature can potentially extend and update such resources while reducing the burden on human curators, but machine-reading software systems have a high error rate. Here we describe an approach to precisely assemble molecular mechanisms at scale using natural language processing systems and the Integrated Network and Dynamical Reasoning Assembler (INDRA). INDRA identifies overlaps and redundancies in information extracted from published papers and pathway databases and uses probability models to reduce machine reading errors. INDRA enables the automated creation of high-quality, non-redundant corpora for use in data analysis and causal modeling. We demonstrate the use of INDRA in extending protein-protein interaction databases and explaining co-dependencies in the Cancer Dependency Map.

Competing Interest Statement

PKS is a co-founder and member of the BOD of Glencoe Software, a member of the BOD for Applied Biomath, and a member of the SAB for RareCyte, NanoString and Montai Health; he holds equity in Glencoe, Applied Biomath and RareCyte. PKS is a consultant for Merck and the Sorger lab has received research funding from Novartis and Merck in the past five years. PKS declares that none of these activities have influenced the content of this manuscript. JAB is currently an employee of Google, LLC. BMG declares no outside interests.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Back to top
PreviousNext
Posted August 31, 2022.
Download PDF
Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Automated assembly of molecular mechanisms at scale from text mining and curated databases
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Automated assembly of molecular mechanisms at scale from text mining and curated databases
John A. Bachman, Benjamin M. Gyori, Peter K. Sorger
bioRxiv 2022.08.30.505688; doi: https://doi.org/10.1101/2022.08.30.505688
Reddit logo Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Automated assembly of molecular mechanisms at scale from text mining and curated databases
John A. Bachman, Benjamin M. Gyori, Peter K. Sorger
bioRxiv 2022.08.30.505688; doi: https://doi.org/10.1101/2022.08.30.505688

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Systems Biology
Subject Areas
All Articles
  • Animal Behavior and Cognition (4683)
  • Biochemistry (10360)
  • Bioengineering (7675)
  • Bioinformatics (26336)
  • Biophysics (13528)
  • Cancer Biology (10686)
  • Cell Biology (15440)
  • Clinical Trials (138)
  • Developmental Biology (8497)
  • Ecology (12821)
  • Epidemiology (2067)
  • Evolutionary Biology (16857)
  • Genetics (11399)
  • Genomics (15478)
  • Immunology (10617)
  • Microbiology (25217)
  • Molecular Biology (10223)
  • Neuroscience (54468)
  • Paleontology (401)
  • Pathology (1668)
  • Pharmacology and Toxicology (2897)
  • Physiology (4342)
  • Plant Biology (9247)
  • Scientific Communication and Education (1586)
  • Synthetic Biology (2558)
  • Systems Biology (6781)
  • Zoology (1466)