Large-scale human promoter mapping using CpG islands

Nat Genet. 2000 Sep;26(1):61-3. doi: 10.1038/79189.

Abstract

Vertebrate genomic DNA is generally CpG depleted, possibly because cytosines are methylated at 80% of CpG dinucleotides and methylated cytosines frequently mutate to thymine, converting CpG to TpG dinucleotides. There are, however, genomic regions of high G+C content (CpG islands) in which CpGs occur significantly more often, close to the expected frequency, and in which the level of methylation is significantly lower than in the genome overall. CpG islands are longer than 200 bp, have a G+C content above 50%, and have a CpG frequency at least 0.6 of that statistically expected. Approximately 50% of mammalian gene promoters are associated with one or more CpG islands. Although biologists often intuitively use CpG islands to identify the 5' ends of genes, this practice has not been rigorously quantified. We have determined the features that discriminate promoter-associated CpG islands from non-associated ones. This led to an effective algorithm for large-scale promoter mapping (with 2-kb resolution) with a rate of false-positive promoter predictions much lower than previously obtained. Using this algorithm, we correctly discriminated approximately 85% of the CpG islands within an interval (-500 to +1500) around a transcriptional start site (TSS) from those that lie further away from TSSs. We also correctly mapped approximately 93% of the promoters containing CpG islands.
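
The three CpG island criteria stated in the abstract (length over 200 bp, G+C content over 50%, observed/expected CpG frequency of at least 0.6) can be illustrated with a minimal sketch. It is not the promoter-mapping algorithm of the paper, which the abstract does not describe. The function names `cpg_stats` and `looks_like_cpg_island` are hypothetical, and the observed/expected ratio is assumed to follow the commonly used Gardiner-Garden and Frommer form, (number of CpG x sequence length) / (number of C x number of G).

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed formula: observed CpG count divided by the count expected
    # from the C and G frequencies, i.e. (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three thresholds quoted in the abstract to one sequence."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n > min_len and gc_fraction > min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    cg_rich = "CG" * 150   # 300 bp, all G+C
    at_rich = "AT" * 150   # 300 bp, no G+C
    print(cpg_stats(cg_rich), looks_like_cpg_island(cg_rich))
    print(cpg_stats(at_rich), looks_like_cpg_island(at_rich))
```

In practice such a check would be applied to sliding windows along a genomic sequence instead of a whole sequence at once; the window size and step are further choices the abstract does not specify.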

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Chromosome Mapping / methods*
  • CpG Islands*
  • DNA Methylation
  • Databases, Factual
  • Humans
  • Models, Genetic
  • Promoter Regions, Genetic*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Software