PT  - JOURNAL ARTICLE
AU  - Kushal K. Dey
AU  - Samuel S. Kim
AU  - Steven Gazal
AU  - Joseph Nasser
AU  - Jesse M. Engreitz
AU  - Alkes L. Price
TI  - Integrative approaches to improve the informativeness of deep learning models for human complex diseases
AID  - 10.1101/2020.09.08.288563
DP  - 2021 Jan 01
TA  - bioRxiv
PG  - 2020.09.08.288563
4099  - http://biorxiv.org/content/early/2021/08/13/2020.09.08.288563.short
4100  - http://biorxiv.org/content/early/2021/08/13/2020.09.08.288563.full
AB  - Deep learning models have achieved great success in predicting genome-wide regulatory effects from DNA sequence, but recent work has reported that SNP annotations derived from these predictions contribute limited unique information for human complex disease. Here, we explore three integrative approaches to improve the disease informativeness of allelic-effect annotations (predicted difference between reference and variant alleles) constructed using several previously trained deep learning models: DeepSEA, Basenji and DeepBind (and a related machine learning model, deltaSVM). First, we employ gradient boosting to learn optimal combinations of deep learning annotations, using fine-mapped SNPs and matched control SNPs (on held-out chromosomes) for training. Second, we improve the specificity of these annotations by restricting them to SNPs implicated by (proximal and distal) SNP-to-gene (S2G) linking strategies, e.g. prioritizing SNPs involved in gene regulation. Third, we predict gene expression (and derive allelic-effect annotations) from deep learning annotations at SNPs implicated by S2G linking strategies, generalizing the previously proposed ExPecto approach, which incorporates deep learning annotations based on distance to TSS. We evaluated these approaches using stratified LD score regression, using functional data in blood and focusing on 11 autoimmune diseases and blood-related traits (average N = 306K). We determined that the three approaches produced SNP annotations that were uniquely informative for these diseases/traits, despite the fact that linear combinations of the underlying DeepSEA, Basenji, DeepBind and deltaSVM blood annotations were not uniquely informative for these diseases/traits. Our results highlight the benefits of integrating SNP annotations produced by deep learning models with other types of data, including data linking SNPs to genes. Competing Interest Statement: The authors have declared no competing interest.