New Results

Gene-set Enrichment with Regularized Regression

Tao Fang (1, 2), Iakov Davydov (1), Daniel Marbach (1), Jitao David Zhang (1)
doi: https://doi.org/10.1101/659920

1 Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland
2 European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
For correspondence: jitao_david.zhang@roche.com

Abstract

Motivation Canonical methods for gene-set enrichment analysis assume independence between gene-sets. In practice, heterogeneous gene-sets from diverse sources are frequently combined, which produces gene-sets with overlapping genes. Such overlaps compromise statistical modelling and complicate the interpretation of results.
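
The Jaccard index is one simple way to quantify how much two gene-sets overlap. It is the number of shared genes divided by the size of their union. The Python sketch below is only an illustration and is not part of gerr. The gene-set names and memberships are hypothetical, made up for this example, although the gene symbols themselves are real.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared genes divided by the size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets: dict) -> dict:
    """Jaccard index for every unordered pair of named gene-sets."""
    return {
        (n1, n2): jaccard(gene_sets[n1], gene_sets[n2])
        for n1, n2 in combinations(sorted(gene_sets), 2)
    }


# Hypothetical gene-sets from two sources describing largely the same biology.
gene_sets = {
    "source1_cell_cycle": {"CDK1", "CCNB1", "CCNA2", "PLK1", "BUB1"},
    "source2_mitosis": {"CDK1", "CCNB1", "PLK1", "BUB1", "AURKA"},
    "source1_lipid_metabolism": {"FASN", "SCD", "HMGCR", "LDLR"},
}

for (n1, n2), j in pairwise_overlap(gene_sets).items():
    print(f"{n1} vs {n2}: {j:.2f}")
```

A Jaccard index close to 1 means that two sets contain largely the same genes. Separate per-set tests of such sets will therefore tend to be driven by the same hits.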

Results We rephrase gene-set enrichment as a regression problem. Given some genes of interest (e.g. a list of hits from an experiment) and gene-sets (e.g. functional annotations or pathways), we aim to identify a sparse list of gene-sets for the genes of interest. In a regression framework, this amounts to identifying a minimal set of gene-sets that best predicts whether a given gene belongs to the genes of interest. To accommodate redundancy between gene-sets, we propose regularized regression techniques such as the elastic net. We report that regression-based results are consistent with those of established gene-set enrichment methods but are more parsimonious and interpretable.
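
The sketch below shows the regression formulation under several assumptions. Each gene is one observation. The predictors form a binary gene-by-gene-set membership matrix, and the response is 1 if the gene is among the genes of interest and 0 otherwise. The code fits an elastic-net-penalized logistic regression and reports the gene-sets with non-zero coefficients.

This is a minimal numpy sketch using proximal gradient descent, not the gerr implementation. The following choices are all assumptions made for illustration:
- the solver;
- the fixed penalty values `lam` and `alpha`;
- the restriction to non-negative coefficients, so that only enrichment can be selected;
- the toy data.

```python
import numpy as np


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, n_iter=5000):
    """Elastic-net logistic regression fitted by proximal gradient descent.

    Minimizes  mean logistic loss
               + lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)
    subject to beta >= 0 (an assumption of this sketch: enrichment only).
    The intercept is not penalized.
    """
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])  # column 0 is the intercept
    # Step size 1/L, where L bounds the curvature of the smooth part:
    # the logistic-loss Hessian is bounded by Xb'Xb / (4n); ridge adds lam*(1-alpha).
    L = 0.25 * np.linalg.norm(Xb, 2) ** 2 / n + lam * (1 - alpha)
    step = 1.0 / L
    w = np.zeros(p + 1)  # w[0] = intercept, w[1:] = gene-set coefficients
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        grad = Xb.T @ (prob - y) / n
        grad[1:] += lam * (1 - alpha) * w[1:]  # gradient of the ridge part
        z = w - step * grad
        w[0] = z[0]  # unpenalized intercept: plain gradient step
        # Proximal step for the L1 term plus the non-negativity constraint:
        # soft-threshold, then clip at zero.
        w[1:] = np.maximum(z[1:] - step * lam * alpha, 0.0)
    return w[0], w[1:]


# Hypothetical toy data: 100 genes and three gene-sets, encoded as a binary
# membership matrix (rows = genes, columns = gene-sets).
names = ["set_A", "set_B", "set_C"]
X = np.zeros((100, 3))
X[0:20, 0] = 1   # set_A: genes 0-19
X[10:40, 1] = 1  # set_B: genes 10-39, shares genes 10-19 with set_A
X[50:70, 2] = 1  # set_C: genes 50-69, none of them genes of interest
y = X[:, 0].copy()  # genes of interest: exactly the members of set_A

intercept, beta = fit_elastic_net_logistic(X, y)
selected = [name for name, b in zip(names, beta) if b > 0]
print({name: round(float(b), 3) for name, b in zip(names, beta)})
print("selected gene-sets:", selected)
```

In this toy example, set_B shares ten genes with set_A and therefore with the genes of interest. Once set_A is in the model, those hits are already explained, so set_B's coefficient is shrunk to exactly zero. set_C contains none of the genes of interest and also stays at zero. In practice the penalty strength would typically be tuned, for example by cross-validation, rather than fixed by hand.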

Availability We implement the model in gerr (gene-set enrichment with regularized regression), an R package freely available at https://github.com/TaoDFang/gerr and submitted to Bioconductor. Code and data required to reproduce the results of this study are available at https://github.com/TaoDFang/GeneModuleAnnotationPaper.

Contact Jitao David Zhang (jitao_david.zhang{at}roche.com), Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124, 4070 Basel, Switzerland.

Footnotes

  • We provided more details about the simulation study with GO gene-sets and made other modifications to the references and discussion. The methodology and software described remain largely unchanged.


Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted August 28, 2019.
Subject Area

  • Bioinformatics