Sequential regulatory activity prediction across chromosomes with convolutional neural networks

  Jasper Snoek (3)

  1. Calico Labs, South San Francisco, California 94080, USA
  2. Department of Computer Science, Harvard University, Cambridge, Massachusetts 02138, USA
  3. Google Brain, Cambridge, Massachusetts 02142, USA
  • Corresponding author: drk{at}calicolabs.com
    Abstract

    Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type–specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. Using convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well with causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.227819.117.

    • Freely available online through the Genome Research Open Access option.

    • Received July 17, 2017.
    • Accepted March 23, 2018.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
