PT - JOURNAL ARTICLE
AU - Tao Fang
AU - Iakov Davydov
AU - Daniel Marbach
AU - Jitao David Zhang
TI - Gene-set enrichment with regularized regression
AID - 10.1101/659920
DP - 2019 Jan 01
TA - bioRxiv
PG - 659920
4099 - http://biorxiv.org/content/early/2019/06/04/659920.short
4100 - http://biorxiv.org/content/early/2019/06/04/659920.full
AB - Motivation: Canonical methods for gene-set enrichment analysis assume independence between gene-sets. While the assumption may be reasonable when redundancy is low, it breaks down when gene-sets overlap or are redundant with each other. In practice, heterogeneous gene-sets from different sources are often used, leading to hit gene-sets that partially or fully overlap, which compromises statistical modelling and complicates the interpretation of results. Results: We rephrase gene-set enrichment as a regression problem by treating genes-of-interest membership as a binary target variable, and gene-set membership as binary independent variables. The goal is to identify a minimal set of gene-sets that best predicts whether or not a gene belongs to a set of genes of interest. To accommodate redundancy between gene-sets, we propose to solve the problem with regularized regression techniques such as the elastic net. We found that regression-based results are consistent with established methods, but much sparser and therefore more interpretable. Availability: We implement the model in an R package, gerr (gene-set enrichment with regularized regression), which is freely available at https://github.com/TaoDFang/gerr and has been submitted to Bioconductor. The scripts and the data used in this paper are available at https://github.com/TaoDFang/GeneModuleAnnotationPaper. Contact: Jitao David Zhang (jitao_david.zhang@roche.com), Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland.
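
The regression framing described in the abstract can be illustrated with a minimal sketch. The example below is not the gerr implementation and does not reproduce its API. It assumes a least-squares (Gaussian) loss, unstandardized predictors, hand-picked penalty values (lam = 0.05, alpha = 0.5), and hypothetical toy gene and gene-set names. Each gene is one observation, the target is 1 if the gene is a gene of interest, and each gene-set contributes one binary predictor column. The elastic net is fitted with plain cyclic coordinate descent.

```python
# Toy sketch (not the gerr implementation): gene-set enrichment phrased as
# elastic-net regression, fitted by cyclic coordinate descent.
# Gene names, gene-set names, and penalty values are hypothetical.

def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X, y, lam, alpha, n_sweeps=200):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)."""
    n, p = len(X), len(X[0])
    # Centre the columns and the target so the unpenalised intercept is implicit.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for beta = 0
    col_var = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of column j with the residual that excludes column j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_var[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_var[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_means))
    return intercept, beta


genes = [f"g{i}" for i in range(12)]
genes_of_interest = set(genes[:4])
gene_sets = {
    "broad_superset": set(genes[:9]),  # contains every gene of interest, plus five more
    "focused": set(genes[:4]),         # matches the genes of interest exactly
    "unrelated": {"g9", "g10"},        # shares no gene with the genes of interest
}

# Binary target: is the gene one of the genes of interest?
y = [1.0 if g in genes_of_interest else 0.0 for g in genes]
# Binary predictors: one membership column per gene-set.
names = list(gene_sets)
X = [[1.0 if g in gene_sets[s] else 0.0 for s in names] for g in genes]

intercept, beta = fit_elastic_net(X, y, lam=0.05, alpha=0.5)
coefs = dict(zip(names, beta))
# Gene-sets with a positive coefficient are reported as enriched in this sketch.
selected = [s for s in names if coefs[s] > 0]
print(coefs)
print("selected gene-sets:", selected)
```

In this toy setting only the focused gene-set keeps a nonzero coefficient. The broad superset overlaps with it but adds no predictive value once the focused set is in the model, so the L1 part of the penalty sets its coefficient to exactly zero. Other penalty values or other overlap patterns can give different selections.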