Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data

  Jonathan K. Pritchard 1,3,5

  1 Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA;
  2 Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, Illinois 60637, USA;
  3 Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois 60637, USA
  4 These authors contributed equally to this work.

    Abstract

    Accurate functional annotation of regulatory elements is essential for understanding global gene regulation. Here, we report a genome-wide map of 827,000 transcription factor binding sites in human lymphoblastoid cell lines, comprising sites that correspond to 239 position weight matrices of known transcription factor binding motifs and 49 novel sequence motifs. To generate this map, we developed a probabilistic framework that integrates cell- or tissue-specific experimental data, such as histone modifications and DNase I cleavage patterns, with genomic information, such as gene annotation and evolutionary conservation. Comparison to empirical ChIP-seq data suggests that our method is highly accurate, yet has the advantage of targeting many factors in a single assay. We anticipate that this approach will be a valuable tool for genome-wide studies of gene regulation in a wide variety of cell types or tissues under diverse conditions.
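    To make the idea of integrating genomic information with experimental data concrete, the sketch below combines a sequence-based prior with a DNase I read-count likelihood to give a posterior probability that a motif instance is bound. It is an illustration only and is not the CENTIPEDE implementation: the logistic prior weights, the two Poisson read rates, and all function names are assumptions chosen for the example and do not come from the article.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def poisson_log_pmf(k: int, rate: float) -> float:
    """Log probability of observing k events under a Poisson(rate) model."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def prior_log_odds(pwm_score: float, conservation: float,
                   weights=(-4.0, 0.3, 2.0)) -> float:
    """Prior log odds of binding from genomic information alone.

    The weights (intercept, PWM score, conservation) are hypothetical
    values for illustration, not fitted parameters.
    """
    b0, b_pwm, b_cons = weights
    return b0 + b_pwm * pwm_score + b_cons * conservation


def posterior_bound(pwm_score: float, conservation: float, dnase_reads: int,
                    rate_bound: float = 20.0,
                    rate_unbound: float = 2.0) -> float:
    """Posterior probability that a site is bound.

    Assumes (for illustration) that DNase I read counts around the site
    are Poisson with a higher rate at bound sites than at unbound sites.
    """
    log_likelihood_ratio = (poisson_log_pmf(dnase_reads, rate_bound)
                            - poisson_log_pmf(dnase_reads, rate_unbound))
    return logistic(prior_log_odds(pwm_score, conservation)
                    + log_likelihood_ratio)


if __name__ == "__main__":
    # Same motif match and conservation, different accessibility.
    for reads in (1, 8, 30):
        print(reads, posterior_bound(10.0, 0.5, reads))
```

    Working in log odds keeps the prior and the data terms additive, so a further data type (for example, a histone modification signal) could be added as one more log-likelihood-ratio term under the same assumptions.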

    Footnotes

    • 5 Corresponding authors.

      E-mail rpique{at}uchicago.edu.

      E-mail jdegner{at}uchicago.edu.

      E-mail gilad{at}uchicago.edu.

      E-mail pritch{at}uchicago.edu.

    • [Supplemental material is available for this article. The regulatory map for lymphoblast cell lines and the source code for CENTIPEDE are available at http://centipede.uchicago.edu.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.112623.110.

    • Received July 7, 2010.
    • Accepted November 1, 2010.

    Freely available online through the Genome Research Open Access option.
