RT Journal Article
SR Electronic
T1 Gene regulation network inference using k-nearest neighbor-based mutual information estimation - Revisiting an old DREAM
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.12.20.473242
DO 10.1101/2021.12.20.473242
A1 Lior I. Shachaf
A1 Elijah Roberts
A1 Patrick Cahan
A1 Jie Xiao
YR 2021
UL http://biorxiv.org/content/early/2021/12/21/2021.12.20.473242.abstract
AB Background: A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past twenty years, many groups have worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline, as it can detect any correlation (linear and non-linear) between any number of variables (n dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurements of gene expression levels) is sensitive to data size, correlation strength and underlying distributions, and often requires laborious and, at times, ad hoc optimization. Results: In this work, we first show that estimating the MI of bi- and tri-variate Gaussian distributions using k-nearest neighbor (kNN) MI estimation results in significant error reduction compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the kNN-based Kraskov-Stögbauer-Grassberger (KSG) MI algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR).
Finally, through extensive in silico benchmarking, we show that a new inference algorithm, CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods. Conclusions: Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction, which combines CMIA and the KSG-MI estimator, achieves an improvement of 20-35% in precision-recall measures over the current gold standard in the field. This new method will enable researchers to discover new gene interactions or choose gene candidates for experimental validation. Competing Interest Statement: The authors have declared no competing interest. Abbreviations: GRN, Gene regulatory network; ODE, Ordinary differential equations; MI, Mutual information; PDF, Probability density functions; FB, Fixed (width) binning; AP, Adaptive partitioning; kNN, k-nearest neighbor; KDE, Kernel density estimator; CLR, Context likelihood of relatedness; CMIA, Conditional mutual information augmentation; KSG, Kraskov-Stögbauer-Grassberger; RL, Relevance networks; ARACNE, Algorithm for the Reconstruction of Accurate Cellular Networks; SA-CLR, Synergy-Augmented CLR; ML, Maximum likelihood; MM, Miller-Madow; KL, Kozachenko-Leonenko; TC, Total correlation; MI3, Three-way MI; II, Interaction information; CMI, Conditional mutual information; DREAM, Dialogue for reverse engineering assessments and methods; AUPR, Area under precision-recall curve; CMI2rt, Luo et al. inference algorithm named MI3; DPI, Data Processing Inequality.
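
Illustrative note (not part of the original record): the comparison described in the abstract's Results, kNN-based versus fixed-binning MI estimation on a bivariate Gaussian, can be sketched roughly as follows. This is a minimal, brute-force sketch and not the authors' implementation. The function names (digamma_int, ksg_mi, binned_mi, sample_gaussian_pairs), the sample size, k = 3, the 10 bins and the correlation 0.8 are all assumptions chosen for illustration. The kNN estimator follows the first KSG formula, I = psi(k) + psi(N) - <psi(n_x + 1) + psi(n_y + 1)>, and the reference value is the analytic MI of a bivariate Gaussian, -0.5 ln(1 - rho^2), in nats.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg_mi(x, y, k=3):
    """Brute-force O(N^2) KSG (first formula) estimate of I(X;Y) in nats."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point in the joint space.
        dists = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count marginal neighbors lying strictly within eps of point i.
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


def binned_mi(x, y, bins=10):
    """Plug-in (maximum likelihood) MI estimate in nats using fixed-width bins."""
    n = len(x)

    def bin_index(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins
        # Clamp the maximum value into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    ix, iy = bin_index(x), bin_index(y)
    joint, px, py = {}, [0] * bins, [0] * bins
    for a, b in zip(ix, iy):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] += 1
        py[b] += 1
    # sum over occupied cells of p_ab * log(p_ab / (p_a * p_b))
    return sum(
        c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items()
    )


def sample_gaussian_pairs(n, rho, seed=0):
    """Draw n pairs from a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(u)
        ys.append(rho * u + math.sqrt(1.0 - rho ** 2) * v)
    return xs, ys


rho = 0.8  # assumed correlation, for illustration only
x, y = sample_gaussian_pairs(500, rho)
true_mi = -0.5 * math.log(1.0 - rho ** 2)  # analytic MI of a bivariate Gaussian
ksg_est = ksg_mi(x, y, k=3)
binned_est = binned_mi(x, y, bins=10)
print(f"analytic MI: {true_mi:.3f} nats")
print(f"KSG (k=3):   {ksg_est:.3f} nats")
print(f"fixed bins:  {binned_est:.3f} nats")
```

The fixed-binning estimate depends on the assumed bin count, whereas the kNN estimate depends on k; varying bins and k in this sketch is one way to see the sensitivity to tuning that the abstract mentions.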