New Results

A simple new approach to variable selection in regression, with application to genetic fine-mapping

Gao Wang, Abhishek Sarkar, Peter Carbonetto, Matthew Stephens
doi: https://doi.org/10.1101/501114
Gao Wang
1 Department of Human Genetics, The University of Chicago, Chicago, IL, 60637, USA
Abhishek Sarkar
1 Department of Human Genetics, The University of Chicago, Chicago, IL, 60637, USA
Peter Carbonetto
1 Department of Human Genetics, The University of Chicago, Chicago, IL, 60637, USA
2 Research Computing Center, The University of Chicago, Chicago, IL, 60637, USA
Matthew Stephens
1 Department of Human Genetics, The University of Chicago, Chicago, IL, 60637, USA
3 Department of Statistics, The University of Chicago, Chicago, IL, 60637, USA
  • For correspondence: mstephens@uchicago.edu

Abstract

We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model, the “Sum of Single Effects” (SuSiE) model, which comes from writing the sparse vector of regression coefficients as a sum of “single-effect” vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure, Iterative Bayesian Stepwise Selection (IBSS), which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods, but instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under the SuSiE model. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a Credible Set of variables for each selection. Our methods are particularly well-suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine-mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and illustrate their application to fine-mapping genetic variants influencing alternative splicing in human cell lines. We also discuss the potential and challenges for applying these methods to generic variable selection problems.
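The following is a minimal Python sketch of the idea described in the abstract, not the authors' implementation. It assumes centered data with no all-zero columns, a fixed and known residual variance `sigma2`, a fixed normal prior variance `sigma0_2` on each single effect, and a uniform prior over which variable carries each effect. The function names (`single_effect_regression`, `ibss`, `credible_set`) and all parameter names are illustrative choices made here. Each single effect is fitted to the residuals left by the others, yielding a probability vector over variables rather than a single selected variable.

```python
import numpy as np

def single_effect_regression(X, r, sigma2, sigma0_2):
    """Fit a regression in which exactly one variable has a non-zero effect.

    Returns alpha (posterior probability that each variable is the one
    with the effect) and mu1 (posterior mean effect given that variable).
    Assumes centered X with no all-zero columns (illustrative assumption).
    """
    xtx = np.sum(X * X, axis=0)              # x_j' x_j for each column
    betahat = X.T @ r / xtx                  # per-variable least squares
    s2 = sigma2 / xtx                        # variance of each estimate
    z2 = betahat ** 2 / s2
    # Log Bayes factor for each variable under a N(0, sigma0_2) prior.
    log_bf = 0.5 * np.log(s2 / (s2 + sigma0_2)) \
        + 0.5 * z2 * sigma0_2 / (s2 + sigma0_2)
    w = np.exp(log_bf - log_bf.max())        # stable normalization
    alpha = w / w.sum()                      # uniform prior over variables
    post_var = 1.0 / (1.0 / s2 + 1.0 / sigma0_2)
    mu1 = post_var / s2 * betahat
    return alpha, mu1

def ibss(X, y, L=5, sigma2=1.0, sigma0_2=1.0, n_iter=100, tol=1e-6):
    """Iteratively refit each of L single effects to the residuals
    obtained after removing the current estimates of the others."""
    p = X.shape[1]
    alpha = np.full((L, p), 1.0 / p)
    mu = np.zeros((L, p))
    for _ in range(n_iter):
        alpha_old = alpha.copy()
        for l in range(L):
            # Posterior mean coefficients of all effects except effect l.
            b_others = np.sum(alpha * mu, axis=0) - alpha[l] * mu[l]
            r = y - X @ b_others
            alpha[l], mu[l] = single_effect_regression(X, r, sigma2, sigma0_2)
        if np.max(np.abs(alpha - alpha_old)) < tol:
            break
    return alpha, mu

def credible_set(alpha_l, coverage=0.95):
    """Smallest set of variables whose probabilities sum to >= coverage."""
    order = np.argsort(-alpha_l)
    cum = np.cumsum(alpha_l[order])
    k = min(int(np.searchsorted(cum, coverage)) + 1, len(order))
    return sorted(order[:k].tolist())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((300, 20))
    X -= X.mean(axis=0)
    b = np.zeros(20)
    b[[3, 11]] = [1.5, -1.0]                 # two simulated true effects
    y = X @ b + 0.5 * rng.standard_normal(300)
    y -= y.mean()
    alpha, mu = ibss(X, y, L=2, sigma2=0.25)
    for l in range(alpha.shape[0]):
        print(l, credible_set(alpha[l]))
```

Each row of `alpha` is the distribution over variables for one single effect, and `credible_set` summarizes that row as a set of candidate variables. In this sketch the variances are held fixed for brevity; estimating them is outside its scope.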

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Funding information: This work was supported by NIH grant HG002585 and by a grant from the Gordon and Betty Moore Foundation.

  • Added a section on identifiability

  • https://github.com/stephenslab/susie-paper

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted June 01, 2020.
Subject Area

  • Bioinformatics