RT Journal Article
SR Electronic
T1 Gene-set Enrichment with Regularized Regression
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 659920
DO 10.1101/659920
A1 Fang, Tao
A1 Davydov, Iakov
A1 Marbach, Daniel
A1 Zhang, Jitao David
YR 2019
UL http://biorxiv.org/content/early/2019/08/28/659920.abstract
AB Motivation: Canonical methods for gene-set enrichment analysis assume independence between gene-sets. In practice, heterogeneous gene-sets from diverse sources are frequently combined and used, resulting in gene-sets with overlapping genes. They compromise statistical modelling and complicate interpretation of results. Results: We rephrase gene-set enrichment as a regression problem. Given some genes of interest (e.g. a list of hits from an experiment) and gene-sets (e.g. functional annotations or pathways), we aim to identify a sparse list of gene-sets for the genes of interest. In a regression framework, this amounts to identifying a minimum set of gene-sets that optimally predicts whether any gene belongs to the given genes of interest. To accommodate redundancy between gene-sets, we propose regularized regression techniques such as the elastic net. We report that regression-based results are consistent with established gene-set enrichment methods but more parsimonious and interpretable. Availability: We implement the model in gerr (gene-set enrichment with regularized regression), an R package freely available at https://github.com/TaoDFang/gerr and submitted to Bioconductor. Code and data required to reproduce the results of this study are available at https://github.com/TaoDFang/GeneModuleAnnotationPaper. Contact: Jitao David Zhang (jitao_david.zhang@roche.com), Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland.