bioRxiv
New Results

Completing the ENCODE3 compendium yields accurate imputations across a variety of assays and human biosamples

Jacob Schreiber, Jeffrey Bilmes, William Stafford Noble
doi: https://doi.org/10.1101/533273
Jacob Schreiber
1 Paul G. Allen School of Computer Science and Engineering, University of Washington
Jeffrey Bilmes
2 Department of Electrical Engineering, University of Washington
1 Paul G. Allen School of Computer Science and Engineering, University of Washington
William Stafford Noble
3 Department of Genome Sciences, University of Washington
1 Paul G. Allen School of Computer Science and Engineering, University of Washington

Abstract

Motivation: Recent efforts to describe the human epigenome have yielded thousands of uniformly processed epigenomic and transcriptomic data sets. These data sets characterize a rich variety of biological activity in hundreds of human cell lines and tissues (“biosamples”). Understanding these data sets, and specifically how they differ across biosamples, can help explain many cellular mechanisms, particularly those driving development and disease. However, primarily because of cost, the number of assays that can be performed is limited, leaving many combinations of assay and biosample unmeasured. Previously described imputation approaches, such as Avocado, have sought to overcome this limitation by predicting the outcomes of genome-wide epigenomic experiments using learned associations among the available epigenomic data sets. Yet these previous imputations have focused primarily on measurements of histone modification and chromatin accessibility, even though other types of biological activity, such as transcription and protein binding, are also crucially important.

Results: We applied Avocado to a data set of 3,814 tracks derived from the ENCODE compendium, spanning 400 human biosamples and 84 assays. The resulting imputations cover measurements of chromatin accessibility, histone modification, transcription, and protein binding. We demonstrate the quality of these imputations by comprehensively evaluating the model’s predictions and by showing that its protein binding predictions significantly outperform the top models in an ENCODE-DREAM challenge. Additionally, we show that new assays and biosamples can be added efficiently to a pre-trained Avocado model, which achieves high accuracy at predicting protein binding even with only a single track of training data.
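The abstract does not spell out how a new biosample is added to a pre-trained model, so the following is only a minimal sketch of the general idea under a loudly stated assumption: it swaps Avocado's own model for a simplified linear CP (PARAFAC) tensor factorization, in which each signal value is a sum over latent factors of biosample, assay, and genomic-position components. The pre-trained assay and position factors are held fixed, and only a latent vector for the new biosample is fit from a single observed track by ridge-regularized least squares; that vector is then used to impute every other assay. The functions `fit_new_biosample` and `impute_tracks` are hypothetical and are not part of the Avocado API, and the factors below are random stand-ins rather than trained values.

```python
import numpy as np


def fit_new_biosample(track, assay_factors, position_factors, assay_idx, ridge=1e-3):
    """Estimate a latent vector for a new biosample from one observed track.

    ASSUMPTION: a simplified linear CP (PARAFAC) model, not Avocado's own model.
    The signal for biosample b, assay a, position p is modeled as
    sum_k B[b, k] * A[a, k] * P[p, k]. The pre-trained assay factors A and
    position factors P stay fixed; only the new biosample's k-vector is fit,
    by ridge-regularized least squares.
    """
    # Design matrix: one row per genomic position, one column per latent factor.
    X = position_factors * assay_factors[assay_idx]  # (n_positions, k)
    k = X.shape[1]
    # Solve (X^T X + ridge * I) b = X^T y for the biosample vector b.
    return np.linalg.solve(X.T @ X + ridge * np.eye(k), X.T @ track)


def impute_tracks(biosample_vec, assay_factors, position_factors):
    """Predict all assays for one biosample; returns (n_assays, n_positions)."""
    return (assay_factors * biosample_vec) @ position_factors.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, n_assays, n_positions = 4, 6, 500
    # Stand-ins for pre-trained factors (random here, purely illustrative).
    A = rng.normal(size=(n_assays, k))
    P = rng.normal(size=(n_positions, k))
    true_b = rng.normal(size=k)
    # A single observed track (assay 0) for the new biosample.
    observed = (A[0] * true_b) @ P.T
    b_hat = fit_new_biosample(observed, A, P, assay_idx=0)
    imputed = impute_tracks(b_hat, A, P)  # predictions for all 6 assays
    truth = (A * true_b) @ P.T
    print("max abs error:", np.abs(imputed - truth).max())
```

In this sketch, freezing the shared factors is what keeps the update cheap: only k numbers are estimated for the new biosample, so one track spanning many genomic positions is enough to determine them, provided the observed assay's factors are nonzero. Adding a new assay would be symmetric under the same assumption: hold the biosample and position factors fixed and fit only the assay's vector.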

Availability: Tutorials and source code are available under an Apache 2.0 license at https://github.com/jmschrei/avocado.

Contact: william-noble{at}uw.edu or jmschr{at}cs.washington.edu

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted January 29, 2019.
Subject Area: Bioinformatics