bioRxiv
New Results

A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types

Maxwell W. Libbrecht, Oscar Rodriguez, Zhiping Weng, Jeffrey A. Bilmes, Michael M. Hoffman, William S. Noble
doi: https://doi.org/10.1101/086025
Maxwell W. Libbrecht
1Department of Computer Science and Engineering, University of Washington
Oscar Rodriguez
2Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai
Zhiping Weng
3Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School
Jeffrey A. Bilmes
4Department of Electrical Engineering, University of Washington
Michael M. Hoffman
5Princess Margaret Cancer Centre, Department of Medical Biophysics, Department of Computer Science, University of Toronto
William S. Noble
6Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington
  • For correspondence: noble@gs.washington.edu

Abstract

Semi-automated genome annotation methods such as Segway enable understanding of chromatin activity. Here we present chromatin state annotations of 164 human cell types using 1,615 genomics data sets. To produce these annotations, we developed a fully automated annotation strategy in which we train separate unsupervised annotation models on each cell type and use a machine learning classifier to automate the state interpretation step. Using these annotations, we developed a measure of the importance of each genomic position called the “conservation-associated activity score,” which we use to aggregate information across cell types into a multi-cell type view. The aggregated conservation-associated activity score provides a measure of importance directly attributable to a specific activity in a specific set of cell types. In contrast to evolutionary conservation, this measure is not biased to detect only elements shared with related species. Using the conservation-associated activity score, we combined all our annotations into a single, cell type-agnostic encyclopedia that catalogs all human transcriptional and regulatory elements, enabling easy and intuitive interpretation of the effect of genome variants on phenotype, such as in disease-associated, evolutionarily conserved, or positively selected loci. These resources, including cell type-specific annotations, encyclopedia, and a visualization server, are available at http://noble.gs.washington.edu/proj/encyclopedia.
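
The automated state interpretation step described in the abstract can be illustrated with a small sketch. The abstract does not say which classifier or which features are used, so everything here is an assumption made for illustration: the nearest-centroid rule, the two-value feature vectors, the term names, and the function names `fit_centroids` and `interpret_states` are all hypothetical and are not the authors' method. The idea shown is only that integer state labels from an unsupervised model can be mapped to human-readable terms by a classifier trained on previously interpreted states.

```python
from math import dist

# Hypothetical training examples: a feature vector summarizing one
# unsupervised state (for instance, mean signal of two made-up tracks)
# paired with the term a human curator gave that state.
TRAINING = [
    ((5.0, 0.2), "Promoter"),
    ((4.6, 0.4), "Promoter"),
    ((0.3, 3.9), "Transcribed"),
    ((0.1, 4.3), "Transcribed"),
    ((0.1, 0.1), "Quiescent"),
    ((0.2, 0.0), "Quiescent"),
]


def fit_centroids(examples):
    """Average the feature vectors of each term (nearest-centroid training)."""
    sums = {}
    for features, term in examples:
        total, count = sums.get(term, ([0.0] * len(features), 0))
        sums[term] = ([t + f for t, f in zip(total, features)], count + 1)
    return {
        term: tuple(t / count for t in total)
        for term, (total, count) in sums.items()
    }


def interpret_states(state_features, centroids):
    """Assign each integer state the term whose centroid is closest."""
    return {
        state: min(centroids, key=lambda term: dist(features, centroids[term]))
        for state, features in state_features.items()
    }


if __name__ == "__main__":
    centroids = fit_centroids(TRAINING)
    # States from a newly trained model for one cell type (toy values).
    new_states = {0: (4.8, 0.3), 1: (0.0, 0.2), 2: (0.2, 4.0)}
    for state, term in sorted(interpret_states(new_states, centroids).items()):
        print(state, term)
```

Because each cell type gets its own unsupervised model, state 0 in one cell type need not mean the same thing as state 0 in another; a shared classifier of this general kind is what would make the labels comparable across cell types.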

Author Summary

Genome annotation algorithms are an effective class of tools for understanding the function of the genome. These algorithms take as input a set of genome-wide measurements of the activity at each base pair in a given tissue, such as where a given protein is binding or how accessible the DNA is to being read by a protein. The genome is then partitioned into segments, and each segment is assigned a label such that positions with the same label exhibit similar patterns in the input data. Such annotations are widely used, for example to understand the mechanism by which a given genetic variant has its effect. Here we present, to our knowledge, the most comprehensive set of genome annotations created so far, encompassing 164 human cell types and including 1,615 genomics data sets. These comprehensive annotations are made possible by a strategy that automates the previously manual interpretation step. Furthermore, we present several methodological innovations that make these genome annotations more useful.
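
The output format described above, a partition of the genome into labeled segments, can be sketched as follows. This is a minimal illustration and not Segway's implementation: the function name `labels_to_segments`, the `bin_size` and `start` parameters, and the toy label sequence are assumptions. It takes one label per position (or per fixed-size bin) and merges consecutive positions that share a label into half-open `(start, end, label)` segments.

```python
from itertools import groupby


def labels_to_segments(labels, bin_size=1, start=0):
    """Collapse a per-bin label sequence into (start, end, label) segments.

    Coordinates are half-open: a segment covers positions start..end-1.
    """
    segments = []
    position = start
    for label, run in groupby(labels):
        # Each run is a maximal stretch of consecutive bins with one label.
        length = sum(1 for _ in run) * bin_size
        segments.append((position, position + length, label))
        position += length
    return segments


if __name__ == "__main__":
    # Toy example: six bins of 100 bp each, labeled with integer states.
    for segment in labels_to_segments([0, 0, 1, 1, 1, 0], bin_size=100):
        print(segment)
```

Note that the same label can appear in several non-adjacent segments; the partition groups positions by similarity of their input signal, not by location.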

Footnotes

  • Terminology changed and additional experimental results added.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted July 01, 2019.
Subject Area

  • Genomics