RT Journal Article
SR Electronic
T1 Multi-scale deep tensor factorization learns a latent representation of the human epigenome
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 364976
DO 10.1101/364976
A1 Jacob Schreiber
A1 Timothy Durham
A1 Jeffrey Bilmes
A1 William Stafford Noble
YR 2019
UL http://biorxiv.org/content/early/2019/04/11/364976.abstract
AB The human epigenome has been experimentally characterized by measurements of protein binding, chromatin accessibility, methylation, and histone modification in hundreds of cell types. The result is a huge compendium of data, consisting of thousands of measurements for every basepair in the human genome. These data are difficult to make sense of, not only for humans, but also for computational methods that aim to detect genes and other functional elements, predict gene expression, characterize polymorphisms, etc. To address this challenge, we propose a deep neural network tensor factorization method, Avocado, that compresses epigenomic data into a dense, information-rich representation of the human genome. We use data from the Roadmap Epigenomics Consortium to demonstrate that this learned representation of the genome is broadly useful: first, by imputing epigenomic data more accurately than previous methods, and second, by showing that machine learning models that exploit this representation outperform those trained directly on epigenomic data on a variety of genomics tasks. These tasks include predicting gene expression, promoter-enhancer interactions, replication timing, and an element of 3D chromatin architecture. Our findings suggest the broad utility of Avocado’s learned latent representation for computational genomics and epigenomics.