bioRxiv
New Results

Integrative approaches to improve the informativeness of deep learning models for human complex diseases

Kushal K. Dey, Samuel S. Kim, Steven Gazal, Joseph Nasser, Jesse M. Engreitz, Alkes L. Price
doi: https://doi.org/10.1101/2020.09.08.288563
Kushal K. Dey
1Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
  • For correspondence: kshldey@gmail.com
Samuel S. Kim
1Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
2Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
Steven Gazal
1Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
Joseph Nasser
3Broad Institute of MIT and Harvard, Cambridge, MA, USA
Jesse M. Engreitz
3Broad Institute of MIT and Harvard, Cambridge, MA, USA
Alkes L. Price
1Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
3Broad Institute of MIT and Harvard, Cambridge, MA, USA
4Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA

Abstract

Deep learning models have achieved great success in predicting genome-wide regulatory effects from DNA sequence, but recent work has reported that SNP annotations derived from these predictions contribute limited unique information for human complex disease. Here, we explore three integrative approaches to improve the disease informativeness of allelic-effect annotations (predicted difference between reference and variant alleles) constructed using several previously trained deep learning models: DeepSEA, Basenji and DeepBind (and a related machine learning model, deltaSVM). First, we employ gradient boosting to learn optimal combinations of deep learning annotations, using fine-mapped SNPs and matched control SNPs (on held-out chromosomes) for training. Second, we improve the specificity of these annotations by restricting them to SNPs implicated by (proximal and distal) SNP-to-gene (S2G) linking strategies, e.g., prioritizing SNPs involved in gene regulation. Third, we predict gene expression (and derive allelic-effect annotations) from deep learning annotations at SNPs implicated by S2G linking strategies, generalizing the previously proposed ExPecto approach, which incorporates deep learning annotations based on distance to TSS. We evaluated these approaches using stratified LD score regression, using functional data in blood and focusing on 11 autoimmune diseases and blood-related traits (average N = 306K). We determined that the three approaches produced SNP annotations that were uniquely informative for these diseases/traits, even though linear combinations of the underlying DeepSEA, Basenji, DeepBind and deltaSVM blood annotations were not uniquely informative for these diseases/traits. Our results highlight the benefits of integrating SNP annotations produced by deep learning models with other types of data, including data linking SNPs to genes.
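
As an illustration only, the following minimal Python sketch shows one way an allelic-effect annotation and its restriction to S2G-linked SNPs could be computed. It is not the authors' implementation: the SNP identifiers, the prediction values, the function names, and the choice of the maximum absolute difference across features as the aggregation are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple


def allelic_effect(ref_pred: List[float], alt_pred: List[float]) -> float:
    """Summarize the predicted difference between reference and variant alleles.

    Assumption for this sketch: aggregate per-feature differences by taking
    the maximum absolute difference across features.
    """
    return max(abs(alt - ref) for ref, alt in zip(ref_pred, alt_pred))


def restrict_to_s2g(annotation: Dict[str, float], s2g_linked: Set[str]) -> Dict[str, float]:
    """Keep annotation values only at SNPs linked to a gene; set all others to 0."""
    return {snp: (value if snp in s2g_linked else 0.0) for snp, value in annotation.items()}


# Hypothetical model predictions per SNP: (reference-allele, variant-allele) feature vectors.
predictions: Dict[str, Tuple[List[float], List[float]]] = {
    "snpA": ([0.25, 0.5], [0.75, 0.5]),
    "snpB": ([0.5, 0.5], [0.5, 0.25]),
    "snpC": ([0.0, 1.0], [0.0, 1.0]),
}

# Allelic-effect annotation for every SNP.
annotation = {snp: allelic_effect(ref, alt) for snp, (ref, alt) in predictions.items()}

# Hypothetical set of SNPs implicated by an S2G linking strategy.
s2g_linked = {"snpA"}
restricted = restrict_to_s2g(annotation, s2g_linked)

print(annotation)
print(restricted)
```

The restricted annotation is nonzero only where both the sequence-based prediction and the SNP-to-gene link are present, which is the sense in which restriction increases specificity.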

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Following reviewer response, we have expanded the set of models from 2 deep learning models (DeepSEA and Basenji) to 4 deep learning/machine learning-based sequence models (DeepSEA, Basenji, DeepBind, deltaSVM). We have also updated the text to clarify the comparisons across methods and the features underlying the performance of these methods in greater detail.

  • https://github.com/kkdey/Imperio

  • https://alkesgroup.broadinstitute.org/LDSCORE/DeepLearning/Dey_DeepBoost_Imperio/

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted August 13, 2021.
Subject Area

  • Genetics