Error-prone polymerase activity causes multinucleotide mutations in humans

  Rasmus Nielsen (2,3,4)

  1. Department of Mathematics, University of California Berkeley, Berkeley, California 94703, USA;
  2. Department of Integrative Biology, University of California Berkeley, Berkeley, California 94703, USA;
  3. Department of Statistics, University of California Berkeley, Berkeley, California 94703, USA;
  4. Center for Bioinformatics, University of Copenhagen, 2200 Copenhagen, Denmark

  Corresponding author: kharris{at}math.berkeley.edu

Abstract

About 2% of human genetic polymorphisms have been hypothesized to arise via multinucleotide mutations (MNMs), complex events that generate SNPs at multiple sites in a single generation. MNMs have the potential to accelerate the pace at which single genes evolve and to confound studies of demography and selection that assume all SNPs arise independently. In this paper, we examine clustered mutations that are segregating in a set of 1092 human genomes, demonstrating that the signature of MNM becomes enriched as large numbers of individuals are sampled. We estimate the percentage of linked SNP pairs that were generated by simultaneous mutation as a function of the distance between affected sites and show that MNMs exhibit a high percentage of transversions relative to transitions, findings that are reproducible in data from multiple sequencing platforms and cannot be attributed to sequencing error. Among tandem mutations that occur simultaneously at adjacent sites, we find an especially skewed distribution of ancestral and derived alleles, with GC → AA, GA → TT, and their reverse complements making up 27% of the total. These mutations have been previously shown to dominate the spectrum of the error-prone polymerase Pol ζ, suggesting that low-fidelity DNA replication by Pol ζ is at least partly responsible for the MNMs that are segregating in the human population. We develop statistical estimates of MNM prevalence that can be used to correct phylogenetic and population genetic inferences for the presence of complex mutations.
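To make the tandem-mutation bookkeeping concrete, here is a minimal Python sketch of how one might pool each dinucleotide mutation with its reverse complement and tabulate the resulting spectrum. It is an illustration only, not the authors' pipeline: the function names (`canonical_tandem`, `tandem_spectrum`, `is_transversion`) and the toy input are hypothetical, and the choice of the lexicographically smaller pair as the canonical form is an arbitrary assumption.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_transversion(ancestral, derived):
    """True when a single-base change swaps a purine for a pyrimidine or vice versa."""
    return (ancestral in PURINES) != (derived in PURINES)


def canonical_tandem(ancestral, derived):
    """Pool a dinucleotide mutation with its reverse complement.

    The lexicographically smaller (ancestral, derived) pair is used as the
    class label; this convention is an assumption made for illustration.
    """
    forward = (ancestral, derived)
    reverse = (reverse_complement(ancestral), reverse_complement(derived))
    return min(forward, reverse)


def tandem_spectrum(mutations):
    """Fraction of tandem mutations falling in each reverse-complement class."""
    counts = Counter(canonical_tandem(a, d) for a, d in mutations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy input (made up, not data from the paper): (ancestral, derived) dinucleotides.
toy = [("GC", "AA"), ("GC", "TT"), ("GA", "TT"), ("AT", "CG")]
spectrum = tandem_spectrum(toy)
```

In this toy input, GC → AA and GC → TT are reverse complements of one another and therefore fall into the same class, which is the pooling implied by "and their reverse complements" above.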

Footnotes

  • Received December 5, 2013.
  • Accepted May 28, 2014.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
