All and only CpG containing sequences are enriched in promoters abundantly bound by RNA polymerase II in multiple tissues

BMC Genomics. 2008 Feb 5:9:67. doi: 10.1186/1471-2164-9-67.

Abstract

Background: The promoters of housekeeping genes are well bound by RNA polymerase II (RNAP) in different tissues. Although these promoters are known to contain CpG islands, the specific DNA sequences associated with high RNAP binding to housekeeping promoters have not been described.
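As background on the term, a CpG island is commonly defined by the Gardiner-Garden and Frommer (1987) criteria: a stretch of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. The Python sketch below applies those thresholds to a single sequence window. It is illustrative only: the function names are hypothetical, and the study may have used a different island definition (stricter variants exist).

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    # "CG" cannot overlap itself, so str.count gives the full CpG count
    cpg = s.count("CG")
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently: C * G / N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq: str, min_len: int = 200,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Hypothetical helper applying Gardiner-Garden/Frommer-style thresholds."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and oe > min_obs_exp


print(is_cpg_island("CG" * 100))  # True
print(is_cpg_island("AT" * 100))  # False
```

A real island caller slides such a window along the genome and merges passing windows; this sketch only scores one window.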

Results: ChIP-chip experiments in three mouse tissues (liver, heart ventricles, and primary keratinocytes) indicate that 94% of promoters show similar RNAP binding across all three tissues, whether they are well bound or poorly bound. Using all 8-bp sequences (8-mers) as a test set, we identified the DNA sequences that are enriched in promoters of housekeeping genes, focusing on those that are preferentially localized in the proximal promoter. We observe a bimodal distribution. Virtually all sequences enriched in promoters with high RNAP binding values contain a CpG dinucleotide. These results suggest that only transcription factor binding sites (TFBS) that contain a CpG dinucleotide are involved in RNAP binding to housekeeping promoters, whereas TFBS that lack a CpG are involved in regulated promoter activity. The abundant 8-mers that are preferentially localized in proximal promoters and show the greatest enrichment in RNAP-bound promoters are all variants of six known CpG-containing TFBS: ETS, NRF-1, BoxA, SP1, CRE, and E-Box. The frequency of these six DNA motifs predicts housekeeping promoters as accurately as the presence of a CpG island, suggesting that they are the structural elements critical for CpG island function. EMSA experiments show that methylation of the CpG in the ETS, NRF-1, and SP1 motifs prevents DNA binding in nuclear extracts from both keratinocytes and liver.
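The 8-mer screen described above can be illustrated with a minimal Python sketch. It enumerates all 4^8 = 65,536 8-mers, flags those containing a CpG, and scores each 8-mer by the fraction of high-RNAP promoters containing it (on either strand) divided by the fraction of low-RNAP promoters containing it, with a pseudocount. The scoring ratio, the pseudocount, and all function names are assumptions for illustration, not the authors' published statistic, and the sketch omits the proximal-promoter localization filter.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def all_kmers(k: int = 8) -> list[str]:
    # Every DNA word of length k: 4**k in total
    return ["".join(p) for p in product("ACGT", repeat=k)]


def has_cpg(kmer: str) -> bool:
    # CG is its own reverse complement, so this test is strand-independent
    return "CG" in kmer


def fraction_containing(kmer: str, promoters: list[str], pseudo: float = 1.0) -> float:
    """Smoothed fraction of promoters containing kmer on either strand."""
    rc = revcomp(kmer)
    hits = sum(1 for s in promoters if kmer in s or rc in s)
    # The pseudocount keeps the ratio finite when a k-mer is absent
    return (hits + pseudo) / (len(promoters) + 2 * pseudo)


def enrichment(kmer: str, high: list[str], low: list[str], pseudo: float = 1.0) -> float:
    """Hypothetical score: presence in high-RNAP vs low-RNAP promoters."""
    return fraction_containing(kmer, high, pseudo) / fraction_containing(kmer, low, pseudo)


def rank_kmers(high: list[str], low: list[str], k: int = 8, top: int = 10):
    """Return the top-scoring (score, k-mer, contains_CpG) tuples."""
    scored = [(enrichment(km, high, low), km, has_cpg(km)) for km in all_kmers(k)]
    scored.sort(reverse=True)
    return scored[:top]


high = ["TTCCGGAAGT", "AACCGGAAGC"]  # toy "well-bound" promoters
low = ["TTTTTTTTTT", "AAAAAAAAAA"]   # toy "poorly bound" promoters
print(enrichment("CCGGAAGT", high, low))  # 2.0
```

Counting presence per promoter, rather than total occurrences, keeps a single repetitive promoter from dominating a k-mer's score; on real data the third field of each ranked tuple shows whether the top hits are CpG-containing.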

Conclusion: In general, TFBS that lack a CpG are involved in regulated gene expression, while TFBS that contain a CpG are involved in constitutive gene expression; some CpG-containing sequences are also involved in inducible and tissue-specific gene regulation. These CpG-containing TFBS are not bound when the CpG is methylated. Unmethylated CpG dinucleotides in the TFBS of CpG islands allow transcription factors to find their binding sites, which occur only in promoters, in turn localizing RNAP to promoters.

MeSH terms

  • Animals
  • Base Sequence
  • Binding Sites / genetics
  • Cells, Cultured
  • Chromatin Immunoprecipitation
  • CpG Islands*
  • DNA / genetics
  • DNA / metabolism
  • DNA Methylation
  • Gene Expression Regulation
  • Histones / metabolism
  • Keratinocytes / metabolism
  • Liver / metabolism
  • Mice
  • Myocardium / metabolism
  • Promoter Regions, Genetic*
  • RNA Polymerase II / metabolism*
  • Tissue Distribution
  • Transcription Factors / metabolism

Substances

  • Histones
  • Transcription Factors
  • DNA
  • RNA Polymerase II