bioRxiv
New Results

Genome-wide regulatory model from MPRA data predicts functional regions, eQTLs, and GWAS hits

Yue Li, Alvin Houze Shi, Ryan Tewhey, Pardis C. Sabeti, Jason Ernst, Manolis Kellis
doi: https://doi.org/10.1101/110171
Yue Li (1, 2)
Alvin Houze Shi (1, 2)
Ryan Tewhey (2, 3)
Pardis C. Sabeti (2, 3)
Jason Ernst (4)
Manolis Kellis (1, 2)
1 Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA
2 The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, USA
3 Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
4 Department of Biological Chemistry, University of California, 615 Charles E Young Dr South, Los Angeles, California 90095, USA
For correspondence (Yue Li, Manolis Kellis): liyue@mit.edu manoli@mit.edu

Abstract

Massively parallel reporter assays (MPRAs) provide unprecedented opportunities to test thousands of candidate sequences for regulatory activity. However, MPRAs assay only a subset of the genome, which limits their applicability for genome-wide functional annotation. To overcome this limitation, we used existing MPRA datasets to train a machine learning model that uses DNA sequence information, regulatory motif annotations, evolutionary conservation, and epigenomic information to predict genomic regions that show enhancer activity when tested by MPRA. We used the resulting model to generate global predictions of regulatory activity at single-nucleotide resolution across 14 million common variants. We find that genetic variants with stronger predicted regulatory activity show significantly lower minor allele frequency, indicative of evolutionary selection within the human population. They also show higher overlap with eQTL annotations across multiple tissues relative to background SNPs, indicating that their perturbation in vivo more frequently results in changes in gene expression. In addition, they are more frequently associated with trait-associated SNPs from genome-wide association studies (GWAS), enabling us to prioritize genetic variants that are more likely to be causal based on their predicted regulatory activity. Lastly, we use our model to compare MPRA inferences across cell types and platforms and to prioritize the assays most predictive of MPRA results, including cell-dependent DNase hypersensitivity sites and transcription factors known to be active in the tested cell types. Our results indicate that high-throughput testing of thousands of putative regions, coupled with regulatory predictions across millions of sites, presents a powerful strategy for systematic annotation of genomic regions and genetic variants.
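To make the eQTL comparison concrete, the following minimal sketch shows one way the overlap of high-scoring variants with eQTL annotations could be compared against a background set. It is an illustration under stated assumptions, not the procedure used in this work: the function name `fold_enrichment`, the `top_fraction` cutoff, and the toy scores and labels are all hypothetical, and the abstract does not specify how overlap was quantified.

```python
from typing import Sequence


def fold_enrichment(scores: Sequence[float],
                    is_eqtl: Sequence[bool],
                    top_fraction: float = 0.1) -> float:
    """Ratio of the eQTL rate among top-scoring variants to the overall rate.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(scores) != len(is_eqtl) or not scores:
        raise ValueError("scores and is_eqtl must be non-empty and equal length")
    # Number of variants treated as "high predicted activity" (at least one).
    n_top = max(1, int(len(scores) * top_fraction))
    # Rank variant indices by predicted regulatory activity, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:n_top]
    top_rate = sum(is_eqtl[i] for i in top) / n_top
    background_rate = sum(is_eqtl) / len(is_eqtl)
    if background_rate == 0:
        raise ValueError("no eQTLs in the background set")
    return top_rate / background_rate


# Toy, hand-made data (not from the paper): 10 variants, 4 of them eQTLs.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_is_eqtl = [True, True, False, True, False, False, True, False, False, False]
toy_result = fold_enrichment(toy_scores, toy_is_eqtl, top_fraction=0.2)
print(toy_result)
```

A value above 1 means the top-scoring variants overlap eQTLs more often than the background does. A real analysis would also need a significance test and a background matched for properties such as allele frequency, which this sketch omits.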

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted February 20, 2017.
Subject Area

  • Bioinformatics