Experimental validation of predicted mammalian erythroid cis-regulatory modules

  Hao Wang 1,2,
  Ying Zhang 1,3,
  Yong Cheng 1,2,
  Yuepin Zhou 1,2,
  David C. King 1,4,
  James Taylor 1,5,
  Francesca Chiaromonte 1,6,
  Jyotsna Kasturi 1,5,
  Hanna Petrykowska 1,2,
  Brian Gibb 1,2,
  Christine Dorman 1,2,
  Webb Miller 1,5,7,
  Louis C. Dore 8,
  John Welch 8,
  Mitchell J. Weiss 8, and
  Ross C. Hardison 1,2,9

  1 Center for Comparative Genomics and Bioinformatics of the Huck Institutes of Life Sciences,
  2 Department of Biochemistry and Molecular Biology,
  3 Intercollege Graduate Degree Program in Genetics,
  4 Intercollege Graduate Degree Program in Integrative Biosciences,
  5 Department of Computer Science and Engineering,
  6 Department of Statistics, and
  7 Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA;
  8 Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA

Abstract

Multiple alignments of genome sequences are helpful guides to functional analysis, but predicting cis-regulatory modules (CRMs) accurately from such alignments remains an elusive goal. We predict CRMs for mammalian genes expressed in red blood cells by combining two properties gleaned from aligned, noncoding genome sequences: a positive regulatory potential (RP) score, which detects similarity to patterns in alignments distinctive for regulatory regions, and conservation of a binding site motif for the essential erythroid transcription factor GATA-1. Within eight target loci, we tested 75 noncoding segments by reporter gene assays in transiently transfected human K562 cells and/or after site-directed integration into murine erythroleukemia cells. Segments with a high RP score and a conserved exact match to the binding site consensus are validated at a good rate (50%–100%, with rates increasing at higher RP), whereas segments with lower RP scores or nonconsensus binding motifs tend to be inactive. Active DNA segments were shown to be occupied by GATA-1 protein by chromatin immunoprecipitation, whereas sites predicted to be inactive were not occupied. We verify four previously known erythroid CRMs and identify 28 novel ones. Thus, high RP in combination with another feature of a CRM, such as a conserved transcription factor binding site, is a good predictor of functional CRMs. Genome-wide predictions based on RP and a large set of well-defined transcription factor binding sites are available through servers at http://www.bx.psu.edu/.
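
The two-part prediction rule described in the abstract (a positive RP score combined with a conserved exact match to the GATA-1 binding site consensus) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the article: the consensus is taken to be WGATAR, the commonly cited GATA-1 motif; "conserved" is simplified to mean an exact match in the same alignment columns of every aligned species; the RP score is treated as a precomputed number with a cutoff of 0; and the function names are hypothetical.

```python
import re

# Assumption: WGATAR (W = A/T, R = A/G) as the GATA-1 consensus.
GATA_FWD = re.compile(r"[AT]GATA[AG]")
# Reverse complement of WGATAR is YTATCW (Y = C/T, W = A/T).
GATA_REV = re.compile(r"[CT]TATC[AT]")
MOTIF_LEN = 6


def conserved_gata_columns(alignment):
    """Return start columns where every aligned row matches the motif exactly.

    `alignment` is a list of equal-length aligned sequences (gaps as '-').
    A window containing a gap cannot match, so gapped sites are skipped.
    """
    rows = [row.upper() for row in alignment]
    width = min(len(row) for row in rows)
    hits = []
    for start in range(width - MOTIF_LEN + 1):
        windows = [row[start:start + MOTIF_LEN] for row in rows]
        if all(GATA_FWD.fullmatch(w) for w in windows) or all(
            GATA_REV.fullmatch(w) for w in windows
        ):
            hits.append(start)
    return hits


def predict_crm(rp_score, alignment, rp_threshold=0.0):
    """Hypothetical rule: RP above threshold AND a conserved exact motif match."""
    return rp_score > rp_threshold and bool(conserved_gata_columns(alignment))


# Toy two-species alignment with a motif conserved at column 2.
toy_alignment = ["CCAGATAAGG", "TCTGATAGGC"]
print(conserved_gata_columns(toy_alignment))
print(predict_crm(0.05, toy_alignment))
```

A segment with a negative RP score, or one whose motif is disrupted in any aligned species, is rejected by this sketch, which mirrors the abstract's observation that segments with lower RP scores or nonconsensus motifs tend to be inactive. It does not reproduce how the RP score itself is computed.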

Footnotes

  • 9 Corresponding author.

    9 E-mail rch8{at}psu.edu; fax (814) 863-7024.

  • [Supplemental material is available online at www.genome.org. The expression profile data obtained during MEL cell differentiation have been submitted to GEO under accession no. GSE2217.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5353806

    • Received April 6, 2006.
    • Accepted June 7, 2006.
  • Freely available online through the Genome Research Open Access option.
