PT - JOURNAL ARTICLE
AU - Schreiber, Jacob
AU - Durham, Timothy
AU - Bilmes, Jeffrey
AU - Noble, William Stafford
TI - Multi-scale deep tensor factorization learns a latent representation of the human epigenome
AID - 10.1101/364976
DP - 2019 Jan 01
TA - bioRxiv
PG - 364976
4099 - http://biorxiv.org/content/early/2019/04/11/364976.short
4100 - http://biorxiv.org/content/early/2019/04/11/364976.full
AB - The human epigenome has been experimentally characterized by measurements of protein binding, chromatin accessibility, methylation, and histone modification in hundreds of cell types. The result is a huge compendium of data, consisting of thousands of measurements for every basepair in the human genome. These data are difficult to make sense of, not only for humans, but also for computational methods that aim to detect genes and other functional elements, predict gene expression, characterize polymorphisms, etc. To address this challenge, we propose a deep neural network tensor factorization method, Avocado, that compresses epigenomic data into a dense, information-rich representation of the human genome. We use data from the Roadmap Epigenomics Consortium to demonstrate that this learned representation of the genome is broadly useful: first, by imputing epigenomic data more accurately than previous methods, and second, by showing that machine learning models that exploit this representation outperform those trained directly on epigenomic data on a variety of genomics tasks. These tasks include predicting gene expression, promoter-enhancer interactions, replication timing, and an element of 3D chromatin architecture. Our findings suggest the broad utility of Avocado’s learned latent representation for computational genomics and epigenomics.