Introduction

A primary aim of population genetics is the measurement of genetic diversity and the characterisation of its hierarchical distribution among individuals, populations, or groups of populations. For molecular markers with a clear genetic interpretation such as microsatellites, isozymes and DNA sequences, widely used measures of diversity include allelic richness (A), gene diversity (He – ‘expected heterozygosity’, see eg Hartl and Clark, 1997), and, for DNA sequence data, the proportion of pairwise site differences (π). Several different methods have been used to quantify genetic differentiation among groups, most notably differentiation statistics related to FST (Wright, 1951; Weir and Cockerham, 1984). Although the utility of FST may be limited for highly diverse loci (eg Nagylaki, 1998), and its interpretation in terms of simple population genetic models is questionable (Whitlock and McCauley, 1999), FST and related measures continue to be quoted almost universally in studies of population genetic structure.

FST and the other summary statistics cited above have been very widely employed to quantify patterns of genetic variation in diploid organisms. However, many organisms are polyploid. Indeed, although polyploidy is particularly common among plants, fish, and amphibians, it is also found among birds, mammals, and many invertebrates (Leitch and Bennett, 1997; Otto and Whitton, 2000; Legatt and Iwama, 2003).

Polyploids are often conceptually divided into two groups: autopolyploids and allopolyploids (reviewed by Ramsey and Schemske, 2002). Autopolyploids are derived from the duplication of a single genome and – at least initially – there is no significant differentiation between duplicate genomes. Conversely, allopolyploids are derived from interspecific hybridisation, and therefore comprise two (or more) differentiated genomes. However, as species boundaries are rarely clear-cut, auto- and allopolyploidy actually represent opposite ends along a spectrum of intergenome differentiation, with ‘hybrids’ between differentiated lineages from within a single species forming the middle ground.

Polyploids can be further divided by their mode of inheritance. In newly formed autopolyploids, duplicated chromosomes do not have unique partners at meiosis. Chromosomes either pair at random or form multivalents (described as ‘polysomic’ inheritance), such that a tetraploid individual carrying alleles ABCD can form gametes AB, AC, AD, BC, BD, and CD (reviewed by Bever and Felber, 1992; Olson, 1997; Ronfort et al, 1998). Furthermore, recombination in multivalents can lead to sister chromatids segregating together, giving rise to AA, BB, CC, and DD gametes (‘double reduction’, see discussion in Ronfort et al, 1998). Unlike newly formed autopolyploids, allopolyploids have differentiated pairs of chromosomes, which are often able to pair normally, as in the diploid progenitors (described as ‘disomic’ inheritance). As with auto- and allopolyploidy, disomic and polysomic inheritance constitute the extremes of a continuum. This is both because allopolyploid hybrids between close relatives may allow for some multivalent formation, and because autopolyploids diploidise over time, eventually developing fully disomic inheritance (Wolfe, 2001; Ramsey and Schemske, 2002). Indeed, intermediate modes of inheritance may be probabilistic, with particular loci having a higher or lower probability of pairing with a partially differentiated partner (the ‘pairing preference’, eg Wu et al, 2001).
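The gamete classes listed above for an ABCD tetraploid can be enumerated mechanically. The following is a minimal sketch only: the function name `polysomic_gametes` and its `double_reduction` flag are illustrative assumptions, and the code lists which gametes are possible without attaching any probabilities to them.

```python
from itertools import combinations

def polysomic_gametes(genotype, double_reduction=False):
    """List the diploid gametes a tetraploid can form under polysomic inheritance.

    `genotype` is a string of single-character alleles, e.g. "ABCD".
    With double_reduction=True, gametes carrying two copies of the same
    allele (sister chromatids segregating together) are also included.
    """
    # Every unordered pair of chromosome copies is a possible gamete.
    gametes = {"".join(sorted(pair)) for pair in combinations(genotype, 2)}
    if double_reduction:
        gametes |= {allele * 2 for allele in set(genotype)}
    return sorted(gametes)

print(polysomic_gametes("ABCD"))
print(polysomic_gametes("ABCD", double_reduction=True))
```

Without double reduction the call returns the six heterozygous gametes named in the text; with it, the four homozygous gametes are added.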

Unfortunately, the quantification of genetic diversity and population differentiation for organisms with polyploid genomes can be much more difficult than for diploids (Figure 1). In a polyploid, and unlike the diploid case, multiple alleles can be present in more than one copy. For example, a diploid carrying alleles A and B at a locus must have one copy of each, whereas a tetraploid carrying alleles A and B (‘allelic phenotype’ AB) may have any one of three different genotypes: AAAB, AABB, or ABBB.
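To make the ambiguity concrete, the genotypes compatible with a given allelic phenotype can be listed by brute force. This is an illustrative sketch; the helper names `allelic_phenotype` and `compatible_genotypes` are assumptions introduced here, and alleles are assumed to be single characters.

```python
from itertools import combinations_with_replacement

def allelic_phenotype(genotype):
    """Reduce a genotype such as 'AABB' to the set of alleles carried, 'AB'."""
    return "".join(sorted(set(genotype)))

def compatible_genotypes(phenotype, ploidy):
    """List every unordered genotype of the given ploidy that shows `phenotype`."""
    alleles = sorted(set(phenotype))
    return [
        "".join(g)
        for g in combinations_with_replacement(alleles, ploidy)
        if set(g) == set(alleles)  # every allele in the phenotype must be present
    ]

print(compatible_genotypes("AB", 2))  # diploid: a single genotype
print(compatible_genotypes("AB", 4))  # tetraploid: three genotypes
print(len(compatible_genotypes("fms", 6)))  # hexaploid carrying three alleles
```

The number of compatible genotypes grows quickly with ploidy and with the number of alleles carried, which is why dosage is so hard to score in higher-order polyploids.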

Figure 1

Isozyme banding patterns (allelic phenotypes) for glucose-6-phosphate isomerase (PGI, E.C. 5.3.1.9) in the weedy annual plant, Mercurialis annua L. (Euphorbiaceae). Left to right are: (a, d, g) diploid M. annua; (b, e, h) allohexaploid M. annua displaying cryptic disomy; and (c, f, i) allohexaploid M. annua showing fixed heterozygosity. The rows are: (a–c) gel photo; (d–f) interpretation of band presence and approximate intensity; and (g–i) allelic interpretation. PGI functions as a homodimer. Thus, when two alleles with different electrophoretic mobilities are present, three bands are visible because an interallele (hetero)dimer with intermediate mobility is formed (grey lines in the lower panels). For example, panel a shows three alleles (f, m, and s): lane one is genotype mm, lane two is ms, lane three is ss, and lane five is fm. When two different alleles are present in equal copy number (as in diploid heterozygotes), the three bands are expected to have intensity ratios of 1:2:1, as seen in panel a lanes two, four, five, seven, and nine. Hexaploid gels (b and c) may be considerably more complex. In panel b, lane one is a homozygote (with six copies of allele m), whereas lane four is a straightforward heterozygote (alleles f and s). Problems arise when more than two alleles are present, such as in panel b, lanes two and five, which are both heterozygous with alleles f, m and s (note that the fs heterodimer has identical mobility to the mm homodimer; superimposed hetero- and homodimers are indicated by dotted lines in panels g–i). The difference between these two lanes is in their allele copy number; lane two has more copies of allele s than allele f (it may have genotype fmmsss), whereas lane five probably has more copies of allele m (eg fmmmms). Similarly, lanes six and seven also differ only in copy number; lane six has approximately equal numbers of alleles f and m, whereas lane seven is ‘unbalanced’ towards m.
This illustrates the two fundamental problems met when scoring polyploid genotypes: first, it is difficult to know the exact dosage (is lane seven ffmmmm or fmmmmm?), and second, given that these individuals display disomic inheritance, it is impossible to assign alleles to isoloci (if lane seven does have alleles ffmmmm, is it genotype ff, mm, mm, or fm, fm, mm?). Our solution is to avoid both issues by merely recording which alleles an individual carries: its ‘allelic phenotype’. Thus, in panel b, the allelic phenotypes are: m, fms, fm, fs, fms, fm, fm, fmsv, fmv, and fs, and in panel c all phenotypes are fms.

In some species, and particularly for low-order polyploids (eg tetraploids), it is possible to estimate this allele copy number (‘dosage’) on the basis of band intensity or electropherogram peak height (eg Arft and Ranker, 1998; Prober et al, 1998; Young et al, 1999; Hardy and Vekemans, 2001; Nassar et al, 2003). When this is the case, and inheritance is polysomic, the genotype follows directly from the allelic phenotype as it does in diploids. Consequently, extensions of standard diploid summary statistics such as He and FST can be used to quantify genetic diversity and differentiation (Nei, 1987; Ronfort et al, 1998; Thrall and Young, 2000), and computer programs are available to conduct analyses (SPAGEDi – Hardy and Vekemans, 2002; AUTOTET – Thrall and Young, 2000). However, in higher order polyploids, the genotype can rarely be inferred from gel banding patterns or electropherograms (eg Kahler et al, 1980; Krebs and Hancock, 1989; Brochmann et al, 1992).

A further difficulty arises when the target polyploid population displays disomic inheritance, because it is then often not clear which alleles are associated with which of the duplicate loci (homeologous loci, or ‘isoloci’). This problem applies both to autopolyploids that have become diploidised, and to allopolyploids in which alleles from each of the two parental species segregate in a diploid manner at different isoloci (see Figure 1 for an illustration). In both cases, although the genetic segregation is effectively diploid, it is typically very difficult or impossible to know at which of two or more isoloci a particular allele, scored as a band on a gel, is segregating. A tetraploid genotype that produces two distinct bands on a gel, for example, may be homozygous for different alleles at each of its two isoloci, or heterozygous at one or both loci. At one extreme, all individuals may have the same heterozygous genotype (heterozygosity is ‘fixed’; Figure 1c), whereas at the other extreme, several alleles may be shared among isoloci, making disomic inheritance superficially appear polysomic (Figure 1b); this latter situation has been termed ‘cryptic disomy’ (De Silva et al, 2005).

In situations where allele dosage can be scored for populations with disomic inheritance, the underlying allele frequencies can sometimes be estimated for different isoloci using the superficial genotypes, and these estimates used to calculate genetic diversity statistics (Waples, 1988; Hedrick et al, 1991; Bouza et al, 2001). Recently, De Silva et al (2005) have provided a sophisticated approach to estimating allele frequencies in both polysomic and disomic polyploids when allele dosage cannot be scored. However, in order to provide good estimates of allele frequencies with these methods, populations must be assumed to be at equilibrium, and independent estimates of the selfing rate are required.

An alternative approach has been to interpret polyploid gel banding patterns as allelic phenotypes (Figure 1), and to calculate simple summary statistics on the basis of gel phenotypic diversity without recourse to a full genetic interpretation (eg Jain and Singh, 1979; Gaur et al, 1980; Murdy and Carter, 1985; Bayer and Crawford, 1986; Chung et al, 1991; Brochmann et al, 1992; Rogers, 2000; Berglund and Westerbergh, 2001). With this approach, diversity can be measured in terms of the total number of different banding or allelic phenotypes in the population, or by calculating statistics similar to He (Nei's gene diversity, the probability that two alleles sampled at random are different) on the basis of allelic phenotype frequencies. Two such statistics have been widely used: HPhen, which is calculated as one minus the sum of squared phenotype frequencies, and is thus analogous to He (Yunus et al, 1991; Meerts et al, 1998), and HSW, which is a Shannon–Weaver diversity index of phenotypes (eg Jain and Singh, 1979; Gaur et al, 1980; Chung et al, 1991). Both these measures can be used to calculate population differentiation as the ratio of between-population to species-wide diversity, analogous to FST. However, because they treat gel phenotypes only as being either identical or different, they do not make use of all the information present on a gel; for example, they do not recognise the greater similarity of phenotypes that share more bands over those that share fewer.
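The two frequency-based statistics can be sketched as follows. This is an illustration under stated assumptions rather than any published implementation: the function names are introduced here, phenotypes are written as strings of single-character alleles, natural logarithms are assumed for the Shannon–Weaver index, and no small-sample correction is applied.

```python
from collections import Counter
from math import log

def phenotype_frequencies(phenotypes):
    """Relative frequency of each distinct allelic phenotype in a sample."""
    n = len(phenotypes)
    return {p: c / n for p, c in Counter(phenotypes).items()}

def h_phen(phenotypes):
    """One minus the sum of squared phenotype frequencies."""
    return 1.0 - sum(f * f for f in phenotype_frequencies(phenotypes).values())

def h_sw(phenotypes):
    """Shannon-Weaver index of phenotype frequencies (natural log assumed)."""
    return -sum(f * log(f) for f in phenotype_frequencies(phenotypes).values())

sample = ["fm", "fm", "fms", "fs"]
print(h_phen(sample), h_sw(sample))

# Phenotypes count only as identical or different: a pair sharing two bands
# contributes exactly as much diversity as a pair sharing none.
print(h_phen(["fm", "fms"]) == h_phen(["fm", "s"]))
```

The final comparison illustrates the limitation noted above: band sharing between non-identical phenotypes is ignored.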

Recently, Meirmans and van Tienderen (2004) and Meirmans (2004) have used several measures of interindividual similarity, including (1) the number of steps to convert one phenotype into the other, and (2) a measure related to the Dice coincidence index (Dice, 1945), calculated as the number of shared bands (alleles) between two individuals, divided by the total number of bands present. Bruvo et al (2004) took a similar approach for microsatellite data using the number of stepwise mutations that separate allelic phenotypes. However, the behaviour of none of these measures has been compared with that of FST.

Here, we introduce a new simple measure of allelic-phenotype diversity devised for use in allopolyploid species. This diversity measure (denoted H′) accounts for the fact that allelic phenotypes may share differing numbers of bands (alleles). Specifically, H′ is defined as the average number of alleles by which pairs of individuals differ at a single locus; thus

$$H' = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k} x_{ijk}$$

where n is the total number of individuals and xijk equals one if allele k is carried by either individual i or individual j (but not by both), and is otherwise equal to zero. Although devised for use with allopolyploids, it may be possible also to apply this statistic to other forms of polyploid (see Discussion).
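A direct translation of this verbal definition is given below as a minimal sketch; it is not the FDASH code. The function name `h_prime` is an assumption, as is the representation of each allelic phenotype as a string of single-character alleles. Because pairs of an individual with itself contribute zero, averaging over unordered pairs gives the same value as averaging over ordered pairs.

```python
from itertools import combinations

def h_prime(phenotypes):
    """Average number of alleles by which pairs of individuals differ at one locus."""
    allele_sets = [frozenset(p) for p in phenotypes]
    n = len(allele_sets)
    if n < 2:
        raise ValueError("at least two individuals are required")
    # The symmetric difference holds alleles carried by one individual but not both.
    total = sum(len(a ^ b) for a, b in combinations(allele_sets, 2))
    return total / (n * (n - 1) / 2)

# fm vs fms differ by one allele, fm vs fs by two, fms vs fs by one.
print(h_prime(["fm", "fms", "fs"]))
```

Unlike the frequency-based statistics, this measure distinguishes a pair of phenotypes that share most of their bands from a pair that shares none.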

Based on this measure of diversity, we define a differentiation statistic, F′ST, as (H′T−H′S)/H′T. A computer program, ‘FDASH’, which uses allelic phenotype data to calculate the above statistics, is available from the authors upon request. Below, we compare the statistical behaviour of H′ and F′ST with that of HPhen, HSW, and their associated differentiation statistics, PFST and SWFST. Because it is not clear how statistics derived from allelic phenotype data will respond to demographic processes such as migration, we use coalescent simulations in a preliminary exploration of their behaviour under the simple ‘island model’ of population structure. Although the island model is an unrealistic caricature of a subdivided population (Whitlock and McCauley, 1999), the expectation for FST under a given migration rate is known, and it thus provides firm ground on which to test a new statistic. We also consider the extent to which the statistics are affected by the polyploid level and the degree of differentiation between isoloci, for example, the degree of divergence between the parental species of an allopolyploid population.
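The differentiation statistic can be sketched in the same way. The text does not spell out how H′S and H′T are averaged, so two assumptions are made here and should be read as such: H′S is taken as the unweighted mean of H′ within each deme, and H′T as H′ for the pooled sample. The function names are illustrative and this is not the FDASH implementation.

```python
from itertools import combinations

def h_prime(phenotypes):
    """Average number of alleles by which pairs of individuals differ."""
    sets = [frozenset(p) for p in phenotypes]
    n = len(sets)
    total = sum(len(a ^ b) for a, b in combinations(sets, 2))
    return total / (n * (n - 1) / 2)

def f_prime_st(demes):
    """(H'T - H'S) / H'T for a list of demes, each a list of allelic phenotypes.

    Assumption: H'S is the unweighted mean of within-deme H', and H'T is
    H' calculated on the pooled sample.
    """
    h_s = sum(h_prime(d) for d in demes) / len(demes)
    h_t = h_prime([p for d in demes for p in d])
    return (h_t - h_s) / h_t

demes = [["fm", "fm", "f"], ["ms", "ms", "s"]]
print(f_prime_st(demes))                    # partial differentiation
print(f_prime_st([["f", "f"], ["m", "m"]]))  # no variation within demes
```

In the second call there is no diversity within either deme, so all diversity lies between demes and the statistic equals one.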

Methods

Model

We assumed an island model of population structure. Under this simple model, a (meta)population is divided into d discrete subpopulations or demes, each of size N. Each generation, a proportion m of the individuals in each deme are replaced by migrants drawn randomly from the rest of the metapopulation. Migration is haploid, as by pollen in a diploid organism. For m much greater than μ, the mutation rate to new alleles, the expected value of FST is 1/(1+4Nm); this provides a simple expectation against which to compare the differentiation statistics derived from the phenotype-based diversity measures considered here. In an infinite-allele framework, the sharing of alleles between isoloci must be owing to common ancestry; for example, in an ancestor of the parental taxa (in the case of allopolyploidy) or before the onset of disomic inheritance (in the case of a now-diploidised autopolyploid).
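The island-model expectation and its inverse are simple enough to state as code. This is a sketch of the formula 1/(1+4Nm) only; the function names are assumptions, and the approximation applies when migration is much more frequent than mutation.

```python
def expected_fst(deme_size, migration_rate):
    """Island-model expectation FST = 1 / (1 + 4Nm), valid when m >> mutation rate."""
    return 1.0 / (1.0 + 4.0 * deme_size * migration_rate)

def migration_parameter(fst):
    """Invert the expectation: the value of 4Nm implied by a given FST."""
    return 1.0 / fst - 1.0

# Differentiation falls as the number of migrants per generation rises.
for four_nm in (0.1, 1.0, 10.0):
    print(four_nm, 1.0 / (1.0 + four_nm))

print(expected_fst(250, 0.001))  # 4Nm = 1
print(migration_parameter(0.5))
```

The inverse function is convenient for choosing migration rates that span a target range of expected FST values.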

We used coalescence-based simulations (eg Hudson, 1990; Nordborg and Donnelly, 1997; Nordborg, 2001) to compare genotype- and phenotype-based statistics in terms of their response to polyploid level, their deviation from the expectation of FST, and their variance. In particular, we followed Wakeley (2001) and Wakeley and Aliacar (2001) in separating the coalescent process into two parts: an evolutionarily rapid ‘scattering phase’, in which lineages coalesce within demes or migrate out of them, and a slow ‘collecting phase’, in which lineages from different demes first migrate into common demes and then eventually coalesce at their common ancestor. The collecting phase is just a neutral coalescent, with the effective population size scaled to account for population structure (Wakeley and Aliacar, 2001; Rousset, 2003).

We modelled populations of allopolyploids with disomic inheritance by explicitly recognising that the multiple genetically distinct pairs of homeologous loci or isoloci share a common ancestral locus in the distant past. Thus, we considered an initial sample of lineages at time zero consisting of 2x-ploid individuals, with x=2 for tetraploids, x=3 for hexaploids, and so on, and recorded migration and coalescence events as the simulation proceeded backwards in time towards increasingly more inclusive common ancestors. The sample thus passed through the scattering phase and entered the collecting phase with x simultaneous coalescent processes, one for each independent isolocus. After a given point in time, coalescence was then allowed to occur between lineages from different isoloci. This threshold determines the extent to which isoloci share alleles by descent; it corresponds either to the speciation event that separated the two parental species of the simulated allopolyploid population or to the point at which polysomic inheritance became disomic through diploidisation. If the threshold is ancient, isoloci will share no alleles and the markers will be effectively diploid (ie a paleopolyploid); in contrast, if the threshold is recent, then isoloci will share alleles, and banding patterns may look superficially like polysomic inheritance (cryptic disomy).

In each simulation run, a given number of 2x-ploid individuals were sampled from each of several demes (see below). A genealogy for the sample was simulated, and a Poisson-distributed number of mutations was applied to each branch, with the parameter proportional to branch length and mutation rate, and the mutation process following assumptions of the infinite-alleles model (as appropriate, for example, for isozymes). Finally, the allelic state of each of the sampled alleles was identified, and diversity and differentiation statistics were calculated for the sample.
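The mutation step can be illustrated on a small, hand-written genealogy. The sketch below is not the simulation code used in this study: the tree representation, the function names, and the use of Knuth's multiplication method for Poisson sampling are all assumptions made for illustration. Under the infinite-alleles model every mutation creates a novel allele, so a branch carrying one or more mutations gives its descendant node a new allelic state.

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson-distributed integer with mean `lam` (Knuth's method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def assign_alleles(tree, root, mutation_rate, rng):
    """Map each node to an allele id under the infinite-alleles model.

    `tree` maps a node to a list of (child, branch_length) pairs.
    """
    next_allele = 0
    allele = {root: next_allele}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, length in tree.get(node, []):
            if poisson(mutation_rate * length, rng) > 0:
                next_allele += 1          # any mutation yields a novel allele
                allele[child] = next_allele
            else:
                allele[child] = allele[node]
            stack.append(child)
    return allele

# Hypothetical genealogy of four sampled alleles (t1..t4).
tree = {"r": [("a", 1.0), ("b", 1.0)],
        "a": [("t1", 0.5), ("t2", 0.5)],
        "b": [("t3", 0.5), ("t4", 0.5)]}
states = assign_alleles(tree, "r", mutation_rate=0.5, rng=random.Random(1))
print({tip: states[tip] for tip in ("t1", "t2", "t3", "t4")})
```

Allelic phenotypes for 2x-ploid individuals would then be obtained by grouping the tip states belonging to each individual across its x isoloci.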

We calculated differentiation statistics based on phenotype frequencies, as described above. We also calculated a genotype-based estimate of FST (θ), following Weir (1996), with multilocus (ie multi-isolocus) estimates calculated as a ratio of averages. For comparison, the same genotype-based statistic was also calculated as if the polyploid had polysomic inheritance, that is, a single locus with four alleles rather than two isoloci with two alleles each (as described by Ronfort et al, 1998). We calculated the expectation of FST for the island model as 1/(1+4Nm). To assess the quality of FST estimators, we used the mean square error of estimates, calculated as the sum of squared bias and the variance (ie bias² + var) (Balloux and Goudet, 2002).
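The mean square error criterion can be written out as a short sketch. The function name is an assumption, and the population (rather than sample) variance is assumed so that bias² + var equals the mean squared deviation of the estimates from the expected value exactly.

```python
from statistics import fmean, pvariance

def mean_square_error(estimates, expected):
    """Squared bias plus variance of a set of replicate estimates."""
    bias = fmean(estimates) - expected
    return bias ** 2 + pvariance(estimates)

replicates = [0.10, 0.20, 0.30]   # hypothetical replicate estimates
target = 0.25                     # hypothetical expected value
print(mean_square_error(replicates, target))

# Equivalent direct calculation: mean squared deviation from the target.
print(fmean((e - target) ** 2 for e in replicates))
```

An estimator can therefore score poorly either because it is biased or because it is highly variable across replicates.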

Model parameters

For all simulations, the population comprised 500 demes each of 250 (polyploid) individuals, and samples consisted of 25 individuals drawn from each of 10 demes. For each parameter combination, the simulation was repeated 20 000 times to estimate statistic means and variances. To examine the effect of ploidy on H′T, F′ST, and the other phenotype-based statistics, we simulated diploid, tetraploid, and hexaploid populations with three different levels of divergence between isoloci. These were chosen such that low divergence (0.01 × 2Ne generations) resulted in most alleles occurring at all isoloci (cryptic disomy), and high divergence (100 × 2Ne generations) resulted in alleles almost never occurring at multiple isoloci (paleopolyploidy). Following an initial search of parameter space, we chose a migration rate (m=0.0062) and a mutation rate (μ=5.7 × 10⁻⁵) that yielded numbers of observed alleles (A=1.99) and values of population differentiation (FST=0.197) corresponding to the means reported for isozymes in outcrossing plants (Hamrick and Godt, 1990). To examine the relative utility of genotype- and phenotype-based differentiation statistics across a range of migration rates, we ran the simulation for tetraploids only using a single level of divergence between isoloci (2Ne generations) and the same parameters otherwise, with expected FST values between 0.025 and 0.995.

Results

The response to increasing polyploidy

As expected, the genotype-based genetic diversity statistic H, calculated as an average across isoloci for the whole population (ie Nei's gene diversity; allele dosage scored and alleles attributed to isoloci), did not vary with increasing polyploid level or increasing differentiation between isoloci (Figure 2a). In contrast, the genetic diversity statistics based on phenotype data (a function of diversity across multiple isoloci) increased with the level of polyploidy and the degree of differentiation between isoloci. This was true of both phenotype diversity measures calculated from phenotype frequencies (HSW and HPhen; Figure 2c and e, respectively), and the measure of diversity based on allele differences (H′; Figure 2g). The genotype-based differentiation statistic, θ (Weir and Cockerham, 1984), calculated across isoloci, did not vary greatly with polyploid level or differentiation between isoloci (Figure 2b). For this intermediate level of differentiation, there was almost no variation in the phenotypic differentiation statistics based on phenotype frequency (SWFST and PFST), or allele differences (F′ST), (Figures 2d, f, and h, respectively).

Figure 2

Genetic diversity (a, c, e, g) and differentiation (b, d, f, h) statistics for polyploids with disomic inheritance, under an island model of population structure. Samples of 250 individuals (25 each from 10 demes) were drawn from a structured population of 500 demes each of 250 individuals; values are the average of 20 000 replicates. Statistics were calculated from genotypic data (H; a and b), Shannon–Weaver diversity (HSW; c and d), phenotype frequencies (HPhen; e and f) and allele differences (H′; g and h), and are plotted with respect to polyploid level (2x–6x) and differentiation between isoloci (low, medium, and high; see main text). Phenotype-based diversity increases with polyploid level and differentiation between isoloci, whereas at this intermediate migration rate, differentiation statistics are largely unaffected by polyploidy.

The response to increasing migration rate

Qualitative effects

Three of the differentiation statistics deviated qualitatively from the expected FST in their response to migration rate (Figure 3). The differentiation statistics calculated from allelic-phenotype diversity (SWFST and PFST) did not approach zero as the migration rate increased (Figure 3a), although under the parameters explored here the discrepancy from expectation was extremely small for PFST (Figure 3a). This was not seen for F′ST (Figure 3a, grey line). When the polysomic genotype-based estimate θ (Ronfort et al, 1998) was calculated, as might be performed erroneously in the case of cryptic disomy, it did not tend towards one as the migration rate decreased (Figure 3b).

Figure 3

(a) To illustrate a qualitative deviation from expected FST, differentiation statistics based on phenotype frequencies were plotted against the migration parameter 4Nm. A log scale is used for FST to highlight the fact that SWFST (and possibly also PFST) is not asymptotic to zero as migration rates increase (see main text for details). This does not seem to happen for F′ST (grey line), which was otherwise similar to PFST. (b) As expected, a qualitative deviation is also seen if polysomic θ (Ronfort et al, 1998) is calculated for a disomic system (dashed line versus solid line). This is because apparent heterozygosity, actually owing to differences between isoloci, is treated as if it were genuine diversity, thereby inflating subpopulation diversity, and leading to low estimates of differentiation (see main text). For both (a) and (b), samples of 250 individuals (25 each from 10 demes) were drawn from an island model of population structure, with 500 demes, each of 250 tetraploid individuals, and values given are the average of 20 000 replicates, calculated on the basis of coalescent simulations.

Phenotypes in place of genotypes

The use of phenotypic data also led to quantitative differences in measures of differentiation. To examine the relative loss of information associated with the use of allelic phenotype data in place of genotype data, the phenotype-based differentiation statistic, F′ST, was considered as an estimator of expected FST. As would be predicted, the tetraploid genotypic estimate, which requires alleles to be assigned to isoloci (two isoloci, dashed line in Figure 4), was always better than the diploid genotypic estimate (one locus, dot-dash line in Figure 4), reflecting the information gained from using two loci for the estimate in place of one. Under the parameters examined here, when differentiation was low (ie expected value of FST<ca. 0.5) genotype-based multilocus θ appeared to be a better estimator of FST than was the phenotype-based F′ST. However, when differentiation was high, F′ST and θ were very similar (Figure 4).

Figure 4

To illustrate quantitative deviations from expected FST, the mean square error of genotype-based (θ) and phenotype-based (F′ST) differentiation statistics is plotted for a range of expected FST values (high to low migration rates). For high migration rates, allelic phenotype-based estimates of FST appear marginally worse than genotype-based estimates. Indeed, estimates are apparently worse than the single-locus case, despite the increase in information available from an additional isolocus (solid line versus dot-dash line). However, when the migration rate is low, phenotype- and genotype-based differentiation statistics are approximately equal in their ability to estimate FST (solid versus dashed line). Statistics were calculated from 20 000 replicates; samples were of 250 individuals (25 from each of 10 demes) drawn from an island model of population structure with 500 demes each of 250 tetraploid individuals. For other parameters, see Methods.

Discussion

The principal aim of our study was to compare statistics based on phenotype frequencies (HPhen and HSW), with one based on phenotypic similarity (H′), and to illustrate the extent to which these statistics are informative about population structure. Our results suggest that allelic phenotype-based diversity statistics for polyploids with disomic inheritance may depend strongly on details such as the polyploid level (number of isoloci) and the differentiation between isoloci (Figure 2). In contrast, differentiation statistics did not appear to be strongly affected by polyploid level (Figure 2). Differentiation statistics, when calculated as though inheritance were polysomic, and when calculated from allelic phenotype diversity, differed qualitatively from the island-model expectation of FST (Figures 2 and 3a). Below, we discuss the likely reason for these effects, and the implications for quantifying diversity and differentiation in polyploids with disomic inheritance.

Appropriate models of inheritance

When fixed heterozygosity is identified for multiple loci in a polyploid population, it is clear that inheritance must be disomic (Figure 1c). However, when isoloci share a large proportion of their alleles, the great inter-individual variation in the number of distinct alleles can make gel banding patterns look superficially polysomic (ie disomic inheritance is cryptic; Figure 1b). It is tempting to analyse such data using computer packages intended for autopolyploids (eg SPAGEDi: Ronfort et al, 1998; Hardy and Vekemans, 2002), without confirming that inheritance is polysomic. However, this procedure is inappropriate, because the apparent excess of heterozygotes (owing to disomic inheritance) will artificially inflate within-population diversity (HS), so that it is nonzero even when there are no differences between individuals within populations. Thus, the polyploid analogue of θ, calculated under the assumption that inheritance is polysomic (Ronfort et al, 1998), may be very small in a polyploid population with disomic inheritance, even when migration rates are almost zero (Figure 3b).
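The inflation of within-population diversity is easiest to see in the extreme case of fixed heterozygosity. The sketch below uses a hypothetical deme in which every tetraploid individual is ff at one isolocus and mm at the other; the function name and data layout are assumptions, and gene diversity is calculated as the simple 1 − Σp² without sample-size correction.

```python
from collections import Counter

def gene_diversity(alleles):
    """1 minus the sum of squared allele frequencies (no small-sample correction)."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())

# Hypothetical deme: 25 identical individuals, each (isolocus 1, isolocus 2).
deme = [("ff", "mm")] * 25

# Treated as polysomic: all four allele copies are pooled into one locus.
pooled = [a for individual in deme for genotype in individual for a in genotype]
print(gene_diversity(pooled))

# Treated correctly as disomic: diversity is calculated within each isolocus.
per_isolocus = [
    gene_diversity([a for individual in deme for a in individual[i]])
    for i in range(2)
]
print(per_isolocus)
```

Pooling the isoloci yields a within-deme diversity of one half even though all individuals are genetically identical, whereas each isolocus analysed separately shows no diversity at all.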

Utility of phenotype-based estimates of genetic diversity in disomic polyploids

Phenotype-based diversity statistics depend strongly on the number of isoloci (ie the polyploid level) and the degree of differentiation between isoloci (Figure 2). This is because phenotype-based diversity statistics simultaneously record the diversity at several duplicate isoloci. If isoloci shared no alleles, the overall phenotype diversity would be an additive function of diversity at each of the (diploid) isoloci, and thus should increase with the polyploid level (Figure 2). By contrast, genetic differentiation statistics do not vary much with polyploid level, given that other population parameters are the same (Figure 2). This is because differentiation statistics, such as FST, depend essentially on the ratio of within-population diversity to total diversity, and they will thus be affected relatively little by factors that simultaneously increase both. This means that, although direct comparisons of diversity statistics such as H′ and HSW cannot be made between polyploid levels, comparisons of differentiation statistics derived from them are likely to be informative.

Some of the phenotype-based differentiation statistics behave unexpectedly in response to migration, that is, they differ qualitatively from expected FST or genotype-based statistics (Figure 3). In particular, the differentiation statistic derived from the Shannon–Weaver diversity of phenotype frequencies (SWFST) does not approach zero with increasing migration. We believe this is an effect of finite sample size; if FST is considered as a standardised variance in allele frequencies between populations (eg Weir, 1996), the variance in phenotype frequencies will be larger for a given sample size than the variance in allele frequencies. This is because alleles will be distributed differently between individuals in different samples, and unless the sample is very large, many rare phenotypes will not be included. The effect is strong for SWFST because HSW weights rare phenotypes disproportionately highly. However, there is also some suggestion that under high migration rates a small effect is seen for PFST (dashed line, Figure 3a). We therefore suggest that inference regarding relative migration rates, when based on differentiation statistics calculated from phenotype frequencies, should be treated with some caution, as differentiation statistics may be appreciably greater than zero even under panmictic gene flow. F′ST, the differentiation statistic based on allele differences, does not appear to suffer from this limitation.

Although polyploidy presents a number of challenges, the concomitant increase in the number of loci in principle has the potential to provide more information for making inferences about population processes, such as migration. It is therefore interesting to ask whether the information gain associated with the availability of more (iso)loci outweighs the information lost through the use of allelic phenotype data in place of genetic data. The results of our simulations suggest that gains do indeed tend to balance the losses. Thus, under the parameters we examined, there was an overall loss in information when migration rates were high (ie with low differentiation), but no appreciable loss when migration rates were low (Figure 4).

Conclusions

We have made a preliminary investigation of the behaviour of phenotype-based statistics in a simple island model, for outcrossing polyploids with disomic inheritance. Our study suggests that for many purposes, the diversity statistic H′ is an informative way of summarising genetic diversity in disomic polyploids, and that the differentiation statistic derived from it (F′ST) behaves in a very similar way to other more widely used differentiation statistics. Furthermore, F′ST seems very little affected by polyploid level in disomic polyploids, so that comparisons between polyploid levels, and potentially among species that differ in ploidy, are most likely to be viable.

The statistics we have introduced here are, of course, purely descriptive. However, differences in diversity and differentiation might be tested for statistical significance using randomisation procedures. For example, by randomising population samples between ‘treatments’, we were able to infer a difference in HS and FST between different sexual systems in allohexaploid Mercurialis annua (Obbard et al, 2006). For some of the polyploid statistics discussed here, this randomisation procedure has been implemented in our program ‘FDASH’ (available on request).
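A generic randomisation test of this kind might look as follows. This is a hedged sketch, not the procedure implemented in FDASH: the function name, the two-sided absolute-difference test statistic, and the default of 999 permutations are all assumptions. Here `statistic` is any function that maps a list of population samples to a single number (for example, a within-population diversity or a differentiation statistic).

```python
import random

def randomisation_test(group_a, group_b, statistic, n_permutations=999, seed=0):
    """Permutation p-value for a difference in `statistic` between two treatments.

    Populations are shuffled between treatments, keeping the group sizes fixed.
    """
    rng = random.Random(seed)
    observed = abs(statistic(group_a) - statistic(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistic(pooled[:n_a]) - statistic(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The observed arrangement is counted as one of the permutations.
    return (extreme + 1) / (n_permutations + 1)

def mean(values):
    return sum(values) / len(values)

# Hypothetical per-population diversity values for two treatments.
treatment_1 = [0.10, 0.12, 0.11, 0.09]
treatment_2 = [0.30, 0.28, 0.31, 0.33]
print(randomisation_test(treatment_1, treatment_2, mean))
```

For a differentiation statistic, each element of a group would be a whole population sample and `statistic` would recalculate the statistic across the populations assigned to that treatment.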

The statistics F′ST and H′ were devised for use in polyploids with disomic inheritance (eg most allopolyploids); thus our simulations were limited to a disomic model of inheritance and assumed an infinite-allele model of mutation. However, although information from allele frequencies will be lost, H′ and F′ST should also capture essential information regarding genetic diversity in polyploids with other modes of inheritance, such as polysomic polyploids, and cases for which allele sharing may not be owing to common ancestry, such as microsatellite markers (for an application see Refoufi and Esnault, 2006). In particular, it is worth noting that patterns of allele sharing under polysomic inheritance are superficially similar to disomic inheritance (as in Figure 2, ‘low divergence’), so long as there is little divergence between isoloci. This would suggest that H′ and F′ST may be applicable to alternative modes of inheritance, although further work is required to test this. It will also be important to determine how these statistics behave under more complex population models, such as those that include selfing or metapopulation processes. The coalescent approach adopted here will be ideally suited to making these extensions (Nordborg and Donnelly, 1997; Wakeley and Aliacar, 2001).