What can we learn about the distribution of fitness effects of new mutations from DNA sequence data?

Philos Trans R Soc Lond B Biol Sci. 2010 Apr 27;365(1544):1187-93. doi: 10.1098/rstb.2009.0266.

Abstract

We investigate several questions concerning the inference of the distribution of fitness effects (DFE) of new mutations from the distribution of nucleotide frequencies in a population sample. If a fixed sequencing effort is available, we find that the optimum strategy is to sequence a modest number of alleles (approx. 10). If full genome information is available, the accuracy of parameter estimates increases as the number of alleles sequenced increases, but with diminishing returns. It is unlikely that the DFE for single genes can be reliably estimated in organisms such as humans and Drosophila, unless genes are very large and we sequence hundreds or perhaps thousands of alleles. We consider models involving several discrete classes of mutations in which the selection strength and density apportioned to each class can vary. Models with three classes fit almost as well as four-class models unless many hundreds of alleles are sequenced. Large numbers of alleles need to be sequenced to accurately estimate the distribution's mean and variance. Estimating complex DFEs may therefore be difficult. Finally, we examine models involving slightly advantageous mutations. We show that the distribution of the absolute strength of selection is well estimated if mutations are assumed to be unconditionally deleterious.
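
To make the link between a discrete-class DFE and the distribution of nucleotide frequencies concrete, the following is a minimal sketch and not the method used in the paper. It assumes the standard diffusion (Poisson random field) approximation, in which the density of derived-allele frequency x for mutations with scaled selection coefficient S is proportional to (1 - exp(-S(1 - x))) / ((1 - exp(-S)) x (1 - x)). The function names, the scaling of S, the parameter theta, and the three-class DFE in the usage example are all hypothetical choices made for illustration.

```python
import math


def selection_weight(x, S):
    """Return (1 - exp(-S(1-x))) / (1 - exp(-S)); the neutral limit is 1 - x."""
    if abs(S) < 1e-8:
        return 1.0 - x
    if S > 0:
        return math.expm1(-S * (1.0 - x)) / math.expm1(-S)
    # For S < 0, rewrite the ratio so that no exponent is positive (avoids overflow).
    return math.exp(S * x) * math.expm1(S * (1.0 - x)) / math.expm1(S)


def expected_sfs(n, S, theta=1.0, steps=2000):
    """Expected number of sites with i = 1..n-1 derived alleles in a sample of n.

    Assumption: binomial sampling from the diffusion density above, integrated
    with a simple midpoint rule. Strong selection (large |S|) concentrates the
    density near x = 0 or x = 1 and would need more steps than the default.
    """
    h = 1.0 / steps
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * h
            # The x(1-x) in the density's denominator is folded into the exponents.
            total += x ** (i - 1) * (1.0 - x) ** (n - i - 1) * selection_weight(x, S)
        sfs.append(theta * math.comb(n, i) * total * h)
    return sfs


def mixture_sfs(n, classes, theta=1.0):
    """Expected SFS under a discrete DFE given as (S, proportion) pairs.

    The proportions are assumed to sum to 1.
    """
    mixed = [0.0] * (n - 1)
    for S, proportion in classes:
        for j, value in enumerate(expected_sfs(n, S, theta)):
            mixed[j] += proportion * value
    return mixed


if __name__ == "__main__":
    # Hypothetical three-class DFE: neutral, mildly and strongly deleterious.
    three_class = [(0.0, 0.2), (-5.0, 0.3), (-50.0, 0.5)]
    for i, count in enumerate(mixture_sfs(10, three_class), start=1):
        print(i, round(count, 4))
```

Under these assumptions, the neutral class reproduces the familiar theta / i spectrum, and deleterious classes lower the expected count in every frequency class. Inference would then compare such expected spectra with the observed one, which is where the sample-size considerations discussed in the abstract come in.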

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Animals
  • Base Sequence*
  • Computer Simulation
  • DNA / genetics*
  • Genetic Fitness*
  • Humans
  • Models, Genetic*
  • Mutation*
  • Polymorphism, Genetic

Substances

  • DNA