bioRxiv
New Results

Gene regulation network inference using k-nearest neighbor-based mutual information estimation - Revisiting an old DREAM

Lior I. Shachaf, Elijah Roberts, Patrick Cahan, Jie Xiao
doi: https://doi.org/10.1101/2021.12.20.473242
Lior I. Shachaf
1 Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218, USA
  • For correspondence: shachaflior@jhu.edu
Elijah Roberts
1 Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218, USA
2 10x Genomics, 6230 Stoneridge Mall Road, Pleasanton, CA 94588-3260, USA
Patrick Cahan
3 Institute for Cell Engineering, Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, 733 N. Broadway, Baltimore, MD 21205, USA
Jie Xiao
4 Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, 725 N. Wolfe Street, WBSB 708, Baltimore, MD 21205

Abstract

Background: A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, because of an elaborate gene regulatory network (GRN) present in every cell. Over the past twenty years, many groups have worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights into the participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline because it can detect any correlation (linear or non-linear) between any number of variables (n dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurements of gene expression levels) is sensitive to data size, correlation strength, and underlying distributions, and often requires laborious and, at times, ad hoc optimization.

Results: In this work, we first show that estimating the MI of bi- and tri-variate Gaussian distributions using k-nearest neighbor (kNN) MI estimation results in a significant error reduction compared with commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov-Stögbauer-Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Finally, through extensive in-silico benchmarking, we show that a new inference algorithm, CMIA (Conditional Mutual Information Augmentation), inspired by CLR and combined with the KSG-MI estimator, outperforms commonly used methods.
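The comparison between kNN-based and fixed-binning MI estimation on a bivariate Gaussian can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the first of the two estimators proposed by Kraskov et al. (the abstract does not say which variant is used), a plug-in histogram estimator for the fixed-binning baseline, and the analytic value MI = -0.5 ln(1 - rho^2) for a bivariate Gaussian with correlation rho. All function names, the choice k = 4, the bin count, and the sample size are hypothetical. Values are in nats.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_table(n_max):
    """Return psi(m) for integers m = 1..n_max using psi(m+1) = psi(m) + 1/m."""
    psi = [0.0] * (n_max + 1)  # index 0 is unused
    psi[1] = -EULER_GAMMA
    for m in range(1, n_max):
        psi[m + 1] = psi[m] + 1.0 / m
    return psi


def ksg_mi(xs, ys, k=4):
    """kNN MI estimate (KSG estimator 1, max-norm), brute-force O(N^2)."""
    n = len(xs)
    psi = digamma_table(n)
    total = 0.0
    for i in range(n):
        dx = [abs(xs[i] - xs[j]) for j in range(n) if j != i]
        dy = [abs(ys[i] - ys[j]) for j in range(n) if j != i]
        # Max-norm distance to the k-th nearest neighbor in the joint space.
        eps = sorted(max(a, b) for a, b in zip(dx, dy))[k - 1]
        # Count marginal neighbors strictly closer than that distance.
        n_x = sum(1 for a in dx if a < eps)
        n_y = sum(1 for b in dy if b < eps)
        total += psi[n_x + 1] + psi[n_y + 1]
    return psi[k] + psi[n] - total / n


def fixed_bin_mi(xs, ys, bins=10):
    """Plug-in MI estimate from an equal-width 2D histogram."""
    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        key = (bin_index(x, x_lo, x_hi), bin_index(y, y_lo, y_hi))
        joint[key] = joint.get(key, 0) + 1
    for (bx, by), c in joint.items():
        px[bx] = px.get(bx, 0) + c
        py[by] = py.get(by, 0) + c
    # p_xy * ln(p_xy / (p_x * p_y)) written in terms of raw counts.
    return sum(c / n * math.log(c * n / (px[bx] * py[by]))
               for (bx, by), c in joint.items())


def sample_bivariate_gaussian(n, rho, seed=0):
    """Draw n pairs from a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho * rho) * z)
    return xs, ys


rho = 0.6
xs, ys = sample_bivariate_gaussian(600, rho)
print("analytic MI:  ", -0.5 * math.log(1.0 - rho ** 2))
print("KSG estimate: ", ksg_mi(xs, ys, k=4))
print("fixed binning:", fixed_bin_mi(xs, ys, bins=10))
```

The brute-force neighbor search keeps the sketch dependency-free; for realistic expression datasets a spatial index (for example a k-d tree) would be the more usual choice.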

Conclusions: Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction, which combines CMIA and the KSG-MI estimator, achieves an improvement of 20-35% in precision-recall measures over the current gold standard in the field. This new method will enable researchers to discover new gene interactions or choose gene candidates for experimental validation.
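As a rough illustration of what a precision-recall measure scores in this setting, the sketch below ranks predicted edges by confidence and computes average precision against a gold-standard edge set. Average precision is one common step-wise approximation of the area under the precision-recall curve; the exact scoring procedure used in the benchmarks is not described here and may differ. The function name, the toy edge scores, and the gold-standard set are all hypothetical.

```python
def average_precision(ranked_edges, gold_edges):
    """Mean of the precision values at each rank where a true edge is recovered.

    Dividing by the total number of gold edges means that true edges
    missing from the ranking lower the score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in gold_edges:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gold_edges) if gold_edges else 0.0


# Hypothetical edge confidence scores (e.g. from an MI-based inference method).
scores = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7, ("C", "D"): 0.4}
gold = {("A", "B"), ("B", "C")}  # hypothetical true regulatory edges

ranked = sorted(scores, key=scores.get, reverse=True)
print(average_precision(ranked, gold))
```

Here the two true edges are recovered at ranks 1 and 3, so the score is the mean of 1/1 and 2/3.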

Competing Interest Statement

The authors have declared no competing interest.

  • List of abbreviations

    GRN: Gene regulatory network
    ODE: Ordinary differential equations
    MI: Mutual information
    PDF: Probability density functions
    FB: Fixed (width) binning
    AP: Adaptive partitioning
    kNN: k-nearest neighbor
    KDE: Kernel density estimator
    CLR: Context likelihood of relatedness
    CMIA: Conditional mutual information augmentation
    KSG: Kraskov-Stögbauer-Grassberger
    RL: Relevance networks
    ARACNE: Algorithm for the Reconstruction of Accurate Cellular Networks
    SA-CLR: Synergy-Augmented CLR
    ML: Maximum likelihood
    MM: Miller-Madow
    KL: Kozachenko-Leonenko
    TC: Total correlation
    MI3: Three-way MI
    II: Interaction information
    CMI: Conditional mutual information
    DREAM: Dialogue for reverse engineering assessments and methods
    AUPR: Area under precision-recall curve
    CMI2rt: Luo et al. inference algorithm named MI3
    DPI: Data Processing Inequality
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted December 21, 2021.

    Subject Area

    • Bioinformatics