CpG mutation rates in the human genome are highly dependent on local GC content

Mol Biol Evol. 2005 Mar;22(3):650-8. doi: 10.1093/molbev/msi043. Epub 2004 Nov 10.

Abstract

CpG dinucleotides mutate at a high rate because cytosine is vulnerable to deamination, cytosines in CpG dinucleotides are often methylated, and deamination of 5-methylcytosine (5mC) produces thymine. Previous experiments have shown that DNA melting is the rate-limiting step in cytosine deamination. Here we show, through the analysis of human single-nucleotide polymorphisms (SNPs), that the mutation rate produced by 5mC deamination is highly dependent on local GC content. In fact, linear regression analysis showed that the log10 of the 5mC mutation rates (inferred from SNP frequencies) had slopes of -3 when plotted against the GC content of neighboring sequences. This is the ideal slope that would be expected if the correlation between CpG underrepresentation and GC content had been caused solely by DNA melting. Moreover, the same result was obtained regardless of the SNP locations (all SNPs versus only SNPs in noncoding intergenic regions, excluding CpG islands) and regardless of the lengths over which GC content was calculated (SNP sequences with a modal length of 564 bp versus genomic contigs with a modal length of 163 kb). Several alternative interpretations are discussed.
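
The regression described above can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the GC values and rates below are synthetic, constructed to follow a slope of -3 exactly, and the helper name `fit_line`, the intercept of -0.5, and the choice to express GC content as a fraction between 0 and 1 are all assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, illustrative values only (NOT data from the study).
# GC content is assumed to be expressed as a fraction of bases.
gc_fraction = [0.35, 0.40, 0.45, 0.50, 0.55]

# Hypothetical relative rates built so that log10(rate) = -0.5 - 3 * GC.
relative_rate = [10 ** (-0.5 - 3.0 * gc) for gc in gc_fraction]

# Regress log10(rate) on GC content, as described in the abstract.
log_rate = [math.log10(r) for r in relative_rate]
slope, intercept = fit_line(gc_fraction, log_rate)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Under these assumptions, a slope of -3 means that raising the local GC fraction by 0.1 multiplies the inferred rate by 10^(-0.3), roughly halving it.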

Publication types

  • Comparative Study

MeSH terms

  • Animals
  • Base Composition
  • CpG Islands / genetics*
  • DNA Methylation
  • Genome, Human*
  • Humans
  • Mutation*
  • Pan troglodytes
  • Polymorphism, Single Nucleotide*