Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study

Mol Ecol. 2005 Jul;14(8):2611-20. doi: 10.1111/j.1365-294X.2005.02553.x.

Abstract

The identification of genetically homogeneous groups of individuals is a long-standing issue in population genetics. A recent Bayesian algorithm implemented in the software STRUCTURE allows the identification of such groups. However, the ability of this algorithm to detect the true number of clusters (K) in a sample of individuals when patterns of dispersal among populations are not homogeneous has not been tested. The goal of this study is to carry out such tests, using various dispersal scenarios from data generated with an individual-based model. We found that in most cases the estimated 'log probability of data' does not provide a correct estimation of the number of clusters, K. However, using an ad hoc statistic ΔK based on the rate of change in the log probability of data between successive K values, we found that STRUCTURE accurately detects the uppermost hierarchical level of structure for the scenarios we tested. As might be expected, the results are sensitive to the type of genetic marker used (AFLP vs. microsatellite), the number of loci scored, the number of populations sampled, and the number of individuals typed in each sample.
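
The abstract does not state the formula for the ad hoc statistic, so the following is only a minimal sketch of how such a rate-of-change statistic can be computed. It assumes the commonly cited form: the absolute second-order difference of the mean log probability of data, L(K), divided by the standard deviation of L(K) across replicate runs. The function name `delta_k`, the input layout, and the numbers in the example are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def delta_k(log_probs):
    """Compute a Delta-K style statistic for each interior K.

    log_probs maps K -> list of log probability of data values, one per
    replicate run at that K (hypothetical input layout).
    Assumed form: |mean L(K+1) - 2*mean L(K) + mean L(K-1)| / sd of L(K).
    """
    means = {k: mean(v) for k, v in log_probs.items()}
    result = {}
    for k in sorted(log_probs):
        # The second-order difference needs both neighbours, so the
        # smallest and largest K tested get no value.
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = stdev(log_probs[k])  # requires at least two replicate runs
        if sd == 0:
            continue  # statistic is undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Made-up replicate values for K = 1..5, for illustration only.
example = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-800.0, -801.0, -799.0],
    4: [-790.0, -795.0, -785.0],
    5: [-785.0, -795.0, -775.0],
}
dk = delta_k(example)
best_k = max(dk, key=dk.get)  # K with the largest statistic
```

In this made-up example the mean log probability keeps rising up to K = 5, while the statistic peaks at K = 3, where the rate of improvement changes most sharply. The sketch uses per-K means rather than pairing individual runs across K values, which is one possible reading of the statistic.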

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Computer Simulation
  • Gene Frequency
  • Genetics, Population*
  • Microsatellite Repeats / genetics
  • Models, Genetic*
  • Polymorphism, Restriction Fragment Length
  • Population Dynamics
  • Sample Size
  • Software*