Combinatorial binding predicts spatio-temporal cis-regulatory activity

Abstract

Development requires the establishment of precise patterns of gene expression, which are primarily controlled by transcription factors binding to cis-regulatory modules. Although transcription factor occupancy can now be identified at genome-wide scales, decoding this regulatory landscape remains a daunting challenge. Here we used a novel approach to predict spatio-temporal cis-regulatory activity based only on in vivo transcription factor binding and enhancer activity data. We generated a high-resolution atlas of cis-regulatory modules describing their temporal and combinatorial occupancy during Drosophila mesoderm development. The binding profiles of cis-regulatory modules with characterized expression were used to train support vector machines to predict five spatio-temporal expression patterns. In vivo transgenic reporter assays demonstrate the high accuracy of these predictions and reveal an unanticipated plasticity in transcription factor binding leading to similar expression. This data-driven approach does not require previous knowledge of transcription factor sequence affinity, function or expression, making it widely applicable.
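
As a rough illustration of the idea in the abstract (binding profiles in, expression class out), the following is a minimal sketch and not the authors' implementation. It assumes each CRM is encoded as a vector of binding signals over (factor, time window) conditions and trains one linear SVM per class by subgradient descent on the hinge loss. The factor names, time windows, class labels, linear kernel, and all binding values are hypothetical placeholders chosen for the example. The abstract does not specify any of them.

```python
# Hypothetical feature layout: one value per (factor, time window) condition.
# The factor names and windows are placeholders, not the paper's data set.
CONDITIONS = [(tf, win) for tf in ("TF_A", "TF_B", "TF_C")
              for win in ("early", "late")]


def encode(binding):
    """Map a {(factor, window): signal} dict onto a fixed-length vector."""
    return [float(binding.get(cond, 0.0)) for cond in CONDITIONS]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Fit w, b by subgradient descent on the L2-regularised hinge loss.

    y holds +1/-1 labels. The linear kernel and this toy trainer are
    assumptions made for illustration only.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary SVM per expression class (one-vs-rest)."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_linear_svm(X, y)
    return models


def predict(models, x):
    """Return the class whose SVM gives the largest decision value."""
    def score(cls):
        w, b = models[cls]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


# Synthetic training CRMs with made-up binding profiles and class labels.
train = [
    ({("TF_A", "early"): 1.0}, "pattern_1"),
    ({("TF_B", "early"): 1.0, ("TF_B", "late"): 1.0}, "pattern_2"),
    ({("TF_C", "late"): 1.0}, "pattern_3"),
]
X = [encode(profile) for profile, _ in train]
labels = [lab for _, lab in train]
models = train_one_vs_rest(X, labels)

# A new CRM that shares its informative binding event with the third example.
query = encode({("TF_C", "late"): 1.0, ("TF_A", "late"): 1.0})
print(predict(models, query))
```

The one-vs-rest layout is a design choice made for this sketch: it lets each expression class have its own classifier, so a real data set could in principle assign a CRM to more than one class by thresholding each decision value instead of taking the maximum.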

Figure 1: Generating a high-resolution atlas of mesodermal CRMs.
Figure 2: ChIP peaks are within 100 bp of transcription factor motifs, which are globally conserved.
Figure 3: ChIP CRMs act as discrete functional units.
Figure 4: Predicting CRM activity using a machine-learning approach.
Figure 5: Validation of CRM spatio-temporal predictions in vivo.

Accession codes

Primary accessions

ArrayExpress

Data deposits

All ChIP data are available in ArrayExpress under accession numbers E-TABM-648, E-TABM-649, E-TABM-650, E-TABM-651 and E-TABM-652, and the array design under A-AFFY-53. The CRM coordinates and transcription factor occupancy are available at http://furlonglab.embl.de/.

Acknowledgements

We are grateful to M. Leptin for providing an independent assessment of the expression patterns driven by tested CRMs. We thank H. Gustafson for fly work, J. de Graaf for array hybridizations, S. Müller for embryo injections, and R. Bourgon for sharing code on signal peak identification. We thank all members of the Furlong laboratory for discussions and comments on the manuscript. This work was supported by a grant to E.E.M.F. and by a fellowship to R.P.Z. from the Human Frontier Science Program.

Author Contributions M.B. performed ChIP experiments. R.P.Z., E.E.M.F. and C.G. generated CAD. R.P.Z. performed transgenic reporter experiments including in situ hybridizations and imaging. C.G. performed ChIP data analysis and motif analysis. J.G. devised the statistical and SVM analyses. E.E.M.F., R.P.Z., C.G. and J.G. formulated the hypotheses, designed experiments and wrote the manuscript.

Author information

Corresponding author

Correspondence to Eileen E. M. Furlong.

Supplementary information

Supplementary Information

This file contains Supplementary Methods, Supplementary Tables 1-3 and 12 (for Supplementary Tables 4-11 see separate files s2-s9), Supplementary Figures 1-15 with Legends and Supplementary References. (PDF 14712 kb)

Supplementary Table 4

This file contains CAD entries in tabular text format. The source, name and coordinates of each CAD entry are given with anatomy ontology terms and cross references to FlyBase, PubMed, REDfly and FLDb (Furlong Db). An 'NR' key next to the source name indicates that the entry was modified during the CAD building process; in these cases, references to the original entries are available in the cross references (using REDfly and FLDb references). The file also contains embedded formatting comments. The associated CAD archive contains the various CAD input files as well as CAD in GFF format. (TXT 103 kb)

Supplementary Table 5

This file contains CRM Atlas in tabular format. The file provides ID, location and binding events for each CRM Atlas entry. (TXT 482 kb)

Supplementary Table 6

This file contains Regions reported by TileMap before cut-off selection. (TXT 7430 kb)

Supplementary Table 7

This file contains TileMap regions used to build the CRM Atlas, together with peak position and height. (TXT 1122 kb)

Supplementary Table 8

This file contains Training set for the Support Vector Machine. (TXT 29 kb)

Supplementary Table 9

This file contains Support Vector Machine predictions. (TXT 1922 kb)

Supplementary Table 10

This file contains Initial and Optimized Position Weight Matrices. (TXT 1 kb)

Supplementary Table 11

This file contains Cloned CRMs. (TXT 2 kb)

Supplementary Data

This contains Supplementary File 1, which was added on 25 Mar 2010. (ZIP 9 kb)

About this article

Cite this article

Zinzen, R., Girardot, C., Gagneur, J. et al. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462, 65–70 (2009). https://doi.org/10.1038/nature08531

  • DOI: https://doi.org/10.1038/nature08531
