bioRxiv
New Results

Nearest-neighbor Projected-Distance Regression (NPDR) for detecting network interactions with adjustments for multiple tests and confounding

Trang T. Le, Bryan A. Dawkins, Brett A. McKinney
doi: https://doi.org/10.1101/861492
Trang T. Le
1Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104
Bryan A. Dawkins
2Department of Mathematics, University of Tulsa, Tulsa, OK 74104
Brett A. McKinney
2Department of Mathematics, University of Tulsa, Tulsa, OK 74104
3Tandy School of Computer Science, University of Tulsa, Tulsa, OK 74104
  • For correspondence: brett-mckinney@utulsa.edu

Abstract

Machine learning feature selection methods are needed to detect complex interaction-network effects in high-dimensional data under complicated modeling scenarios, such as GWAS, gene expression, eQTL, and structural/functional neuroimaging studies with case-control or continuous outcomes. However, many machine learning methods have limited ability to control false discoveries and adjust for covariates. To address these challenges, we develop a new feature selection technique called Nearest-neighbor Projected-Distance Regression (NPDR), which calculates the importance of each predictor using generalized linear model (GLM) regression of distances between nearest-neighbor pairs projected onto the predictor dimension. NPDR captures the underlying interaction structure of the data using nearest neighbors in high dimensions, handles both dichotomous and continuous outcomes and predictor data types, statistically corrects for covariates, and permits statistical inference and penalized regression. We use realistic simulations with interactions and other effects to show that NPDR achieves better precision-recall than standard Relief-based feature selection and random forest importance, with the additional benefits of covariate adjustment and multiple testing correction. Using RNA-Seq data from a study of major depressive disorder (MDD), we show that NPDR with covariate adjustment removes spurious associations due to confounding. We also apply NPDR to eQTL data to identify potentially interacting variants that regulate transcripts associated with MDD, and we demonstrate NPDR’s utility for GWAS and continuous outcomes.
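
The core idea of regressing on projected distances between nearest-neighbor pairs can be illustrated with a minimal sketch. This is not the authors' implementation or the API of the npdr package: the function name `npdr_scores`, the parameter `k`, the Manhattan metric for neighbor finding, and the use of ordinary least squares (a GLM with identity link) for a continuous outcome are all assumptions made for illustration. Covariate adjustment, dichotomous outcomes, and multiple testing correction are omitted.

```python
import numpy as np

def npdr_scores(X, y, k=5):
    """Illustrative sketch: one t-statistic per feature, obtained by
    regressing neighbor-pair outcome differences on the neighbor-pair
    distance projected onto that feature (continuous outcome only)."""
    n, p = X.shape
    # Pairwise Manhattan distances in the full feature space (assumed metric).
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    np.fill_diagonal(D, np.inf)            # an instance is not its own neighbor
    nbrs = np.argsort(D, axis=1)[:, :k]    # indices of the k nearest neighbors
    i_idx = np.repeat(np.arange(n), k)     # first member of each pair
    j_idx = nbrs.ravel()                   # second member of each pair
    dy = np.abs(y[i_idx] - y[j_idx])       # outcome difference for each pair
    scores = np.empty(p)
    for a in range(p):
        # Distance between the pair projected onto feature a.
        dx = np.abs(X[i_idx, a] - X[j_idx, a])
        A = np.column_stack([np.ones_like(dx), dx])   # intercept + slope
        beta, *_ = np.linalg.lstsq(A, dy, rcond=None)
        resid = dy - A @ beta
        sigma2 = resid @ resid / (len(dy) - 2)        # residual variance
        se = np.sqrt(sigma2 * np.linalg.inv(A.T @ A)[1, 1])
        scores[a] = beta[1] / se                      # slope t-statistic
    return scores

# Hypothetical usage on simulated data where only feature 0 drives the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)
print(np.argsort(npdr_scores(X, y))[::-1])  # features ranked by score
```

Because each feature receives a regression coefficient with a standard error, the scores can in principle be converted to p-values and corrected for multiple tests, which is the property the abstract emphasizes over Relief-style importance scores.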

Footnotes

  • https://insilico.github.io/npdr/

  • https://github.com/lelaboratoire/npdr-paper

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted December 03, 2019.
Subject Area

  • Genetics