Zero-shot imputations across species are enabled through joint modeling of human and mouse epigenomics

Jacob Schreiber; Deepthi Hegde; William Noble

doi:10.1101/801183

ABSTRACT

Recent large-scale efforts to characterize functional activity in human have produced thousands of genome-wide experiments that quantify various forms of biochemistry, such as histone modifications, protein binding, transcription, and chromatin accessibility. Although these experiments represent a small fraction of the experiments that could be performed, they nonetheless make human more comprehensively characterized than any other species. We propose an extension to the imputation approach Avocado that enables the model to leverage genome alignments and the large number of human genomics data sets when making imputations in other species. We found that this extension not only improves imputation of mouse functional experiments but also allows the extended model to make accurate imputations for protein binding assays that have been performed in human but not in mouse. This ability to make “zero-shot” imputations greatly increases the utility of such imputation approaches and enables comprehensive imputations to be made for species even when experimental data are sparse.

CCS CONCEPTS

• Computing methodologies → Neural networks; Factorization methods; • Applied computing → Bioinformatics; Genomics.

ACM Reference Format

Jacob Schreiber, Deepthi Hegde, and William Noble. 2020. Zero-shot imputations across species are enabled through joint modeling of human and mouse epigenomics. In ACM-BCB 2020: 11th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, Sept 21–24, 2020, Virtual. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/1122445.1122456

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

jmschr{at}uw.edu, deepthimhegde{at}gmail.com, william-noble{at}uw.edu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions{at}acm.org.
This draft, which was submitted to ACM-BCB 2020, refocuses the results sections by greatly expanding the cross-validation and zero-shot analyses and removing the section about learning embeddings.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.