Introduction

Parentage inferences from molecular marker data have been widely applied to the studies of social behaviour/organization, reproductive success, mating systems, dispersal and spatial genetic structure in natural populations (Hughes, 1998; Coltman et al., 1999; Garant et al., 2001; Avise et al., 2002; Robledo-Arnuncio and Gil, 2005; Bretman and Tregenza, 2005). Several classes of statistical methods are proposed to perform parentage analyses using data of various genetic markers (Marshall et al., 1998; Jones and Ardren, 2003). To evaluate the statistical power of a parentage analysis and characterize the informativeness of markers, the parentage exclusion probability (PE) is usually calculated. It is defined as the average capability of any marker system to exclude a ‘random’ individual from parentage when the other parent (its genotype) is either known or unknown, or to exclude a ‘random’ pair of individuals as both parents of an offspring. A high PE value indicates that the marker system is highly informative for parentage analysis and that the parentage analysis using the marker system is highly powerful.

PE calculation was first described by Wiener et al. (1930) for biallelic loci, and was subsequently extended to loci with any number of codominant alleles (Jamieson, 1965; Selvin, 1980; Ohno et al., 1982; Chakraborty et al., 1988; Dodds et al., 1996; Weir 1996, p 209; Jamieson and Taylor, 1997) and to dominant loci (Chakraborty et al., 1974; Gerber et al., 2000). The probability of excluding a relative of (rather than a random individual unrelated to) the true father from paternity when the maternal genotype is known was also derived (Salmon and Brocteur, 1978; Thompson and Meagher, 1987; Double et al., 1997; Fung et al., 2002; Hu et al., 2005).

In all previous studies, however, PE has invariably been calculated assuming that a single offspring is genotyped to infer its parent or pair of parents. A single offspring genotype contains only one paternal and one maternal allele at an autosomal diploid locus, and has no information about the other paternal and maternal alleles. The probability that at least one copy of each parental allele is represented in a set of n offspring is $1-2^{1-n}$, and the potential of the parental genotype being fully inferable increases rapidly with n. Therefore, genotyping multiple offspring increases parentage exclusion probability for any given marker system. Some statistical methods have been developed to use the genotypes of multiple offspring in inferring their common parentage (Emery et al., 2001; Jones, 2001; Sieberts et al., 2002). Calculating PE for multiple offspring helps in determining more accurately the power of a parentage analysis, in screening markers by their informativeness, and in deciding on the appropriate numbers of offspring and markers to be genotyped in designing a parentage assignment experiment.
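As a quick check of the expression $1-2^{1-n}$, the following minimal sketch compares it with a direct enumeration of the $2^n$ equally likely transmission patterns from one heterozygous parent. The function names are illustrative and do not come from the paper.

```python
from itertools import product


def prob_both_alleles_seen(n):
    """Closed form 1 - 2**(1 - n): both alleles of a heterozygous parent
    appear at least once among n offspring."""
    return 1.0 - 2.0 ** (1 - n)


def prob_both_alleles_seen_enumerated(n):
    """Same quantity, obtained by listing every transmission pattern.

    Each offspring receives allele 0 or allele 1 of the parent with
    probability 1/2, so all 2**n patterns are equally likely.
    """
    patterns = list(product((0, 1), repeat=n))
    hits = sum(1 for pattern in patterns if 0 in pattern and 1 in pattern)
    return hits / len(patterns)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, prob_both_alleles_seen(n), prob_both_alleles_seen_enumerated(n))
```

With a single offspring the probability is zero, and it already exceeds 0.99 at n=8, which is why a modest number of full siblings recovers most of the parental genotype information.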

Recently, various statistical methods have also been proposed to infer sibships in a one-generation sample of individuals, using the genotypes of these individuals at a number of marker loci (Painter, 1997; Almudevar and Field, 1999; Thomas and Hill, 2000, 2002; Smith et al., 2001; Beyer and May, 2003; Wang, 2004; Butler et al., 2004). These methods allow the inference of sibships consisting of an arbitrary number of individuals by exclusion (Smith et al., 2001; Butler et al., 2004) or likelihood (Thomas and Hill, 2000; Wang, 2004) approaches. Although two individuals cannot be excluded from a full sibship using any number of autosomal markers, three or more non-siblings can be excluded from a full sibship using autosomal markers with three or more codominant alleles (Almudevar and Field, 1999; see below). Sibship exclusion probability (SE) can be defined as the average capability of any marker system to exclude a group of ‘random’-unrelated individuals from a full-sib family. Similar to PE, SE can be calculated to evaluate the informativeness of markers in and the power of a sibship analysis. A formula for SE was derived by Almudevar and Field (1999).

In this paper, I derive equations for parentage and sibship exclusion probabilities when an arbitrary number of individuals are involved. The equations for parentage exclusion probabilities reduce to previous ones when only one offspring is genotyped and used in parentage analysis. The equations for sibship exclusion probabilities are more explicit and easier to calculate than previous ones (Almudevar and Field, 1999). I show that, for both parentage and sibship inferences, the power of analysis and amount of marker information increase rapidly with an increasing number of individuals involved in the exclusion analysis. Finally, I consider the inference of half-sib (HS), grandparent–grandchild and avuncular relationships, which have the same IBD (identity by descent) sharing between pairs of individuals and are thus indistinguishable using unlinked autosomal markers (Epstein et al., 2000; McPeek and Sun, 2000). I show that when one or more full siblings of each of the two individuals are also genotyped for some unlinked autosomal markers and included in the relationship analysis, however, the three relationships can be easily discriminated with a high statistical power.

Assumptions

I consider the use of autosomal diploid markers with an arbitrary number of codominant alleles in parentage or sibship analyses. The markers are assumed to follow Mendelian inheritance without mutations and genotyping errors, to be in Hardy–Weinberg equilibrium and to be unlinked and in linkage equilibrium. The allele frequencies of a marker are known. These assumptions were also made explicitly or implicitly in previous calculations of PE.

Parentage exclusion probability

In classical parentage analyses, an individual is excluded as the parent of an offspring if the offspring's genotype at some locus cannot be generated from the genotype of the individual as a parent following Mendelian inheritance and barring mutations. The average probability (PE) that a ‘random’ individual unrelated to the true parent of an offspring is excluded from the parentage of the offspring using the information from a marker can be calculated, which depends on the allele frequencies of the marker only and indicates the informativeness (capability) of the marker in a parentage analysis. PE also measures the power of a parentage analysis using a given set of markers.

Three cases of parentage exclusion can be distinguished in practice. An individual is excluded from parentage of an offspring when the other parent (its genotype) is either known or unknown, or a pair of individuals is excluded as both parents of an offspring. Traditionally, PE is calculated for the three cases assuming a single offspring is genotyped to infer its parentage. I extend previous studies by considering an arbitrary number of full-sib offspring being genotyped to infer their parentage.

Parentage exclusion probability when one parent is known, PE1

Without loss of generality, I suppose a number of n (≥1) full-sib offspring and their mother are genotyped at an autosomal locus with k codominant alleles ($A_u$) of known frequencies ($p_u$, u=1, 2,…, k). The problem is to obtain the probability that a random male unrelated to the true parents is excluded from paternity of the n offspring, using the offspring and mother genotypes and the allele frequencies of the marker. Table 1 lists all possible mother–father–offspring genotype combinations and the corresponding excluded paternal genotypes given mother and offspring genotypes. It also shows, for each combination, the probabilities of the mother genotype, father genotype, offspring genotypes (conditional on parents’ genotypes), and the excluded paternal genotype given the mother's and offspring's genotypes. PE1 is obtained by summing, over all combinations, the product of the joint probability of a mother–father–offspring genotype combination and the corresponding exclusion probability. For illustration, consider row 2 of Table 1. The joint probability of an $A_uA_u$ mother (with probability $p_u^2$, column 2), an $A_vA_y$ (v≠y) father (with probability $2p_vp_y$, column 4) and n $A_uA_v$ offspring (with probability $(1/2)^n$ conditional on parental genotypes, column 6) is $p_u^2 \cdot 2p_vp_y \cdot (1/2)^n = 2^{1-n}p_u^2p_vp_y$, and this mother–offspring genotype combination excludes all males that do not have an $A_v$ allele. Such males have a frequency of $(1-p_v)^2$ (column 8), so that the combination listed in row 2 of Table 1 has a combined exclusion probability of $2^{1-n}p_u^2p_vp_y(1-p_v)^2$. Adding these probabilities for all mother–father–offspring genotype combinations listed in the table leads, after some tedious algebra, to

where

and , the sum of the sth power of allele frequencies.

Table 1 Paternity exclusion configurations of multiple offspring genotypes at a k-codominant allele locus: mother known

For a single offspring (n=1), (1) simplifies to

which is the same as derived earlier (Jamieson and Taylor, 1997). For a given number of offspring (n) and a given number of alleles (k) at a locus, PE1 increases as the allele frequencies become increasingly even. The maximum value is reached when all k alleles have an equal frequency of 1/k,

which again reduces to

when n=1 as derived previously (Weir, 1996). PE1 is also a monotonically increasing function of the number of offspring (n), and the maximum value when n → ∞ is

The maximum value computed by (5) is generally quickly attained with an increasing n, except when the marker is very uninformative (that is, few alleles with uneven frequencies).
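The summation described for PE1 (joint probability of a mother–father–offspring combination times the frequency of excluded males) can also be carried out numerically. The sketch below is a brute-force illustration under the stated assumptions (Hardy–Weinberg proportions, no mutation or genotyping error) and is not the closed-form equation (1). The function name and the inclusion–exclusion shortcut for the offspring configurations are choices made here.

```python
from itertools import combinations, product
from math import comb


def pe1_enumerated(p, n):
    """Brute-force paternity exclusion probability with the mother known.

    p : list of allele frequencies at one codominant locus
    n : number of full-sib offspring (n >= 1)
    """
    k = len(p)
    # Hardy-Weinberg frequencies of unordered candidate-father genotypes.
    candidates = [((x, y), p[x] * p[y] * (1 if x == y else 2))
                  for x in range(k) for y in range(x, k)]
    # Probability that n offspring show exactly a given set of r of the four
    # equally likely (maternal allele, paternal allele) transmissions,
    # obtained by inclusion-exclusion over subsets of that set.
    exact = {r: sum((-1) ** (r - t) * comb(r, t) * (t / 4) ** n
                    for t in range(r + 1))
             for r in range(1, 5)}
    pe = 0.0
    # Ordered alleles: mother carries (a, b), true father carries (c, d).
    for a, b, c, d in product(range(k), repeat=4):
        parents = p[a] * p[b] * p[c] * p[d]
        transmissions = [(m, f) for m in (a, b) for f in (c, d)]
        for r in range(1, 5):
            for observed in combinations(transmissions, r):
                excluded = 0.0
                for (x, y), freq in candidates:
                    # A candidate is compatible with an offspring {m, f} if he
                    # can supply f, or if the mother can supply f and he m.
                    compatible = all(
                        f in (x, y) or (f in (a, b) and m in (x, y))
                        for m, f in observed)
                    if not compatible:
                        excluded += freq
                pe += parents * exact[r] * excluded
    return pe


if __name__ == "__main__":
    equal = [0.25] * 4  # hypothetical locus with four equally frequent alleles
    for n in (1, 2, 4, 8):
        print(n, round(pe1_enumerated(equal, n), 4))
```

The run time grows as k^4 times the number of genotypes, so this is only practical for checking an analytical expression at loci with a handful of alleles.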

Parentage exclusion probability when no parent is known, PE2

An individual can be excluded from the parentage of a group of offspring if the genotype of any offspring at a locus cannot be generated from the genotype of the individual. Without loss of generality, I assume a number of n (≥1) full-sib offspring are genotyped at an autosomal locus with k codominant alleles to infer their paternity without knowledge of their maternal genotype. The (average) probability that a random male unrelated to the true parents is excluded from paternity of the n offspring, using the offspring genotypes and allele frequencies of the marker, can be derived similarly to PE1,

where a, b, c and d are as defined in (1). As is expected, (6) reduces, for a single offspring (n=1), to

as derived previously (Jamieson and Taylor, 1997). Similar to PE1, PE2 increases as the allele frequencies become increasingly even for a given number of offspring (n) and a given number of alleles at a locus. The maximum value is reached when all k alleles have frequency 1/k,

When n=1, (8) reduces to

as derived previously (Jamieson and Taylor, 1997). Like PE1, PE2 is also a monotonically increasing function of the number of offspring (n), and the maximum value when n → ∞ is

Exclusion probability of a pair of individuals as parents, PE3

For a group of n (≥1) full-sib offspring, we may be interested in inferring the pair of parents that have produced them. A pair of individuals can be excluded as both parents of the n offspring if the offspring genotypes at a locus cannot be explained fully by those of the two individuals as parents. The average probability of excluding two random individuals, who are unrelated between themselves and to the true parents, as parents of the n offspring using an autosomal marker with k codominant alleles is

which can be derived using an approach similar to that in deriving PE1 and PE2. In (11), a, b, c and d are as defined in (1). For the case of a single offspring (n=1), (11) is simplified to

as derived earlier (Jamieson and Taylor, 1997). Similar to PE1 and PE2, the maximum value of PE3 is reached when all k alleles at a locus have frequency 1/k,

When n=1, (13) reduces to

as derived earlier (Jamieson and Taylor, 1997). Like PE1 and PE2, PE3 is also a monotonically increasing function of the number of offspring (n), and its maximum value when n → ∞ is

Sibship exclusion probability

A group of individuals are excluded from comprising a full sibship if their genotypes at a locus cannot be generated by any pair of parental genotypes. For an autosomal codominant locus, sibship exclusion is warranted, for example, when the individuals display five or more alleles. Like parentage exclusion, we can calculate the average probability, SE, of excluding a group of n-unrelated individuals as full siblings using a k-allele codominant marker. Therefore, SE signifies the capability of a marker in a sibship analysis, and the statistical power of a sibship analysis using a given set of markers.
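The exclusion rule in this section can be applied literally by enumeration: a group is excluded when no pair of parental genotypes can generate every genotype in the group. The sketch below computes SE that way for small n and k. It is a brute-force illustration with names chosen here, not the analytical formula derived in the Appendix.

```python
from itertools import combinations_with_replacement, product


def sibship_exclusion(p, n):
    """Probability that n unrelated individuals are excluded as full siblings.

    p : list of allele frequencies at one codominant locus
    n : number of unrelated individuals tested as a putative full sibship
    """
    k = len(p)
    genotypes = list(combinations_with_replacement(range(k), 2))
    # Hardy-Weinberg genotype frequencies.
    freq = {g: (p[g[0]] ** 2 if g[0] == g[1] else 2 * p[g[0]] * p[g[1]])
            for g in genotypes}
    # Every set of offspring genotypes that one parental pair can produce.
    families = [{tuple(sorted((x, y))) for x in g1 for y in g2}
                for g1, g2 in combinations_with_replacement(genotypes, 2)]
    se = 0.0
    for group in product(genotypes, repeat=n):
        distinct = set(group)
        # Excluded if no single parental pair covers all observed genotypes.
        if not any(distinct <= family for family in families):
            weight = 1.0
            for g in group:
                weight *= freq[g]
            se += weight
    return se


if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]  # hypothetical three-allele locus
    for n in (2, 3, 4):
        print(n, round(sibship_exclusion(p, n), 4))
```

The enumeration visits every ordered group of n genotypes, so the cost rises steeply with n and k. It is meant for checking small cases such as pairs, trios and quadruplets.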

For an autosomal diploid locus with k codominant alleles, the average probability of excluding n-unrelated individuals as full siblings can be derived (Appendix) as

where (note cstcts) for . SE simplifies greatly in the following special cases.

(1) n=2 or k=2

It can be shown that SE≡0 when either n=2 or k=2. A pair of individuals (n=2) is never excluded from being full siblings no matter how polymorphic a marker is, and biallelic loci (k=2) do not allow sibship exclusion regardless of n.

(2) n=3 and n=4

For trios and quadruplets of unrelated individuals, (16) reduces to

respectively, where

(3) Equal allele frequency

Like PE, SE increases for given values of n (>2) and k (>2) when allele frequencies become increasingly even. The maximum SE is attained when all alleles have the same frequency of 1/k,

Exclusion probabilities for multiple loci and multiple tests

For a number of L independent loci, the cumulative exclusion probability is calculated as

$$P = 1 - \prod_{l=1}^{L} (1 - P_l)$$

where $P_l$ is the exclusion probability for locus l calculated by (1), (6), (11) or (16).
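A minimal sketch of combining per-locus exclusion probabilities, assuming the usual product form for independent loci: a false relationship escapes exclusion overall only if it escapes at every locus. The function name and the example values are hypothetical.

```python
from math import prod


def cumulative_exclusion(per_locus):
    """Combine per-locus exclusion probabilities for independent loci.

    The cumulative non-exclusion probability is the product of the
    per-locus non-exclusion probabilities (1 - P_l).
    """
    non_exclusion = prod(1.0 - p_l for p_l in per_locus)
    return 1.0 - non_exclusion


if __name__ == "__main__":
    # Hypothetical per-locus values, for illustration only.
    print(cumulative_exclusion([0.5, 0.5]))        # two moderately informative loci
    print(cumulative_exclusion([0.8, 0.7, 0.6]))   # three more informative loci
```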

The above calculations are for a single test. In almost all practical analyses, however, a large number of groups of individuals are tested for parentage or sibship, and the aim is ideally to exclude all false parentage or sibship relationships. For a given marker system, this aim is obviously more difficult to achieve with a larger number of tests. For a number of M independently replicated tests, the number of non-exclusions of a false relationship, m, is roughly binomially distributed, $m \sim \mathrm{Binomial}(M, 1-P)$, with a mean of $M(1-P)$ and a variance of $MP(1-P)$. Using a given set of markers, the number of non-excluded false parentage (or sibship) events is therefore expected to increase linearly with the number of tests M. The probability that exclusions occur in all of the M tests (that is, perfect exclusion of all false relationships) is $P^M$, which decreases exponentially with M.
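The binomial summary above translates directly into a few lines of code. This is a sketch under the paragraph's own independence assumption; the function name, the dictionary keys and the input values are illustrative.

```python
def multiple_test_summary(P, M):
    """Summarise M independent tests, each excluding a false relationship
    with probability P.

    Returns the mean and variance of the number of non-exclusions and the
    probability that every false relationship is excluded.
    """
    q = 1.0 - P  # non-exclusion probability of a single test
    return {
        "mean_non_exclusions": M * q,
        "variance_non_exclusions": M * P * q,
        "prob_all_excluded": P ** M,
    }


if __name__ == "__main__":
    # Hypothetical marker set with P = 0.99, tested 10, 100 and 1000 times.
    for M in (10, 100, 1000):
        print(M, multiple_test_summary(0.99, M))
```

The expected number of surviving false relationships grows in proportion to M, while the chance of a perfectly clean result shrinks geometrically.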

Theoretical examples of exclusion probabilities

Parentage (sibship) exclusion probabilities depend on the allele frequencies at a locus and the number of full-sib offspring (the number of unrelated individuals) that are genotyped for determining their parentage (sibship), n. The effects of allele frequencies on exclusion probabilities are well known. As a numerical example to illustrate the effect of n, I calculated exclusion probabilities using a locus with k alleles in triangular frequencies of $p_u = 2u/[k(k+1)]$ for $u = 1, 2, \ldots, k$. The changes of PE1, PE2, PE3 and SE with n for a locus with k=5 alleles are shown in Figure 1. For any given value of n, PE3 is the largest and PE2 is the smallest among the three parentage exclusion probabilities. PE1 is always larger than PE2 because the extra maternal information allows more exclusions of false fathers. All four exclusion probabilities increase with an increasing value of n and level off quickly, at about n=4–7. For this particular locus, the minimum values of PE1, PE2, PE3 and SE are 0.53, 0.35, 0.71 and 0.00, respectively, when n=1, and the maximum values are 0.81, 0.66, 0.93 and 1.00, respectively, when n>7.
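The single-offspring values quoted above (0.53, 0.35 and 0.71) can be reproduced with a short script. Two assumptions are made here: the triangular distribution is taken to mean allele frequencies proportional to 1, 2, …, k, and the n=1 expressions are written in the power-sum form commonly attributed to Jamieson and Taylor (1997). Neither the expressions nor the function names are taken verbatim from this text.

```python
def triangular_frequencies(k):
    """Allele frequencies proportional to 1, 2, ..., k (assumed definition)."""
    total = k * (k + 1) / 2
    return [u / total for u in range(1, k + 1)]


def power_sum(p, s):
    """Sum of the s-th powers of the allele frequencies."""
    return sum(q ** s for q in p)


def single_offspring_exclusion(p):
    """Return (PE1, PE2, PE3) for one genotyped offspring (n = 1).

    Assumed forms: one parent known, no parent known, and a candidate
    parent pair, each expressed through power sums of allele frequencies.
    """
    a2, a3, a4, a5, a6 = (power_sum(p, s) for s in range(2, 7))
    pe1 = 1 - 2 * a2 + a3 + 2 * a4 - 3 * a5 - 2 * a2 ** 2 + 3 * a2 * a3
    pe2 = 1 - 4 * a2 + 2 * a2 ** 2 + 4 * a3 - 3 * a4
    pe3 = 1 + 4 * a4 - 4 * a5 - 3 * a6 - 8 * a2 ** 2 + 8 * a2 * a3 + 2 * a3 ** 2
    return pe1, pe2, pe3


if __name__ == "__main__":
    p = triangular_frequencies(5)
    print([round(v, 2) for v in single_offspring_exclusion(p)])
```

Agreement with the quoted n=1 values supports both assumptions for this locus. It does not check the multi-offspring equations, which are not reproduced here.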

Figure 1

Exclusion probabilities as a function of the number (n) of full-sib offspring in a parentage analysis or of unrelated individuals in a sibship analysis. A single locus having five codominant alleles with frequencies in a triangular distribution is used in the calculation.

Increasing n is more beneficial for less informative markers. Figure 2 plots exclusion probabilities (PE1) for multiple offspring (n>1) relative to that for a single offspring (n=1) as a function of n. A locus with k (=2,4,6,8,10) equally frequent alleles was used in calculating PE1 from (3). As can be seen, a smaller k leads to a faster increase in PE1 with n. While the relative exclusion probability for n=10 is only about 120% for a highly informative locus with k=10 alleles, it is about 200% for a much less informative biallelic locus (k=2). Therefore, less informative markers are more efficiently compensated by sampling and genotyping multiple offspring in parentage analyses. The same conclusion is true for PE2 and PE3, and for loci with any allele frequency distributions.

Figure 2

Paternity exclusion probabilities for multiple offspring (n>1) relative to those for a single offspring (n=1). The maternal genotypes are assumed known, and a locus with k alleles of an equal frequency is used in calculating PE1. The five lines correspond to k=2, 4, 6, 8, 10 as indicated in the graph.

Because parentage exclusion probabilities increase with both the number of loci and the number of full-sib offspring, one may want to know in a practical parentage analysis whether it is more rewarding to genotype more loci or more offspring for a given cost or alternatively whether it is more economical to genotype more loci or more offspring to achieve a given statistical power. Figure 3 shows the exclusion probabilities for L=1 and n=2, 4, 6 relative to those for L=2 and n=1, as a function of the number of alleles (k) per locus. The allele frequencies are assumed to be in a triangular distribution. As can be seen, L=1 and n=6 results in similar (for PE1 and PE3) or even larger (for PE2) parentage exclusion probabilities than L=2 and n=1. While one should always try to use as many markers as possible in a parentage analysis for maximal power, he/she can also improve the analysis power by genotyping more offspring. The development of markers such as microsatellites in relationship analyses for a given species is expensive, and furthermore, the number of informative markers available may be limited. In such situations, therefore, the power of parentage analyses can be improved substantially by genotyping more offspring in a litter.

Figure 3

Parentage exclusion probabilities with a single locus and multiple offspring relative to those with two loci and a single offspring. It is assumed that a locus has 2–20 codominant alleles with frequencies in a triangular distribution, and the number (n) of full-sib offspring used in parentage analyses is 2, 4 or 6. The upper, middle and lower panels are for relative parentage exclusion probabilities PE1, PE2 and PE3, respectively.

Practical examples of exclusion probabilities

The allele frequencies of seven microsatellite loci in Atlantic salmon were published in Villanueva et al. (2002). The parentage and sibship exclusion probabilities for the loci are calculated using (1), (6), (11) and (16) and are listed in Table 2. The most informative locus for parentage or sibship inference is locus 1 which has 14 alleles, rather than locus 2 that has 21 alleles. This is because the allele frequencies of locus 1 are more even than those of locus 2. The rank order of the informativeness among the seven markers changes only slightly, depending on the specific exclusion probability being computed and compared. Locus 2 gives slightly larger values of PE1 and PE2 but smaller values of PE3 and SE than locus 3. For each of the seven loci, its exclusion power increases rapidly with an increasing number of full-sib offspring genotyped and included in a relationship analysis.

Table 2 Exclusion probabilities calculated for seven microsatellite loci in Atlantic salmon

The combined exclusion probabilities over the seven loci are very high, close to the maximum value of 1. However, this does not necessarily mean that all false relationships can be excluded in a given parentage analysis. The accuracy of a parentage analysis depends not only on how informative the markers are, as indicated by exclusion probabilities, but also on the total number of tests to be carried out in the analysis. As an example, consider the exclusion of paternity for offspring with known maternal genotypes. The combined PE1 for the seven microsatellites is $1 - 2.00 \times 10^{-4} = 0.9998$ when a single offspring (n=1) is tested for paternity. If a sample of N offspring–mother pairs and N candidate fathers unrelated to any of the N offspring is obtained, and each candidate is tested for paternity of each offspring, there would be a total of $N^2$ tests. These $N^2$ tests are not truly independent, because an individual appears in multiple tests. However, as an approximation, the following calculation is conducted under the assumption of independent tests. The probability of complete exclusion of the $N^2$ false offspring–father relationships is $0.9998^{N^2}$, which is 0.923 for N=20, 0.135 for N=100 and 0.000335 for N=200. With an increasing sample size, the exclusion power of the seven microsatellites diminishes exponentially. If there are four full-sib offspring instead of only one offspring in each of the N mother–offspring groups, the combined PE1 for the seven microsatellites is increased to $1 - 3.13 \times 10^{-7} = 0.999999687$. The probability of complete exclusion of the $N^2$ false offspring–father relationships becomes $0.999999687^{N^2}$, which is 0.9999 for N=20, 0.9969 for N=100 and 0.9876 for N=200.

The above numerical examples illustrate that a high exclusion probability may still result in a low probability of complete exclusion of all false parentage if the sample size is large, and that the use of multiple full-sib offspring can increase the power dramatically. In a similar context, we should realize the importance of recording exclusion probabilities with a sufficient number of significant digits. In the above numerical example of N offspring–mother pairs (n=1) and N candidate fathers with N=100, the probability of complete exclusion of the $10^4$ false offspring–father relationships is 0.135 if PE1=0.9998 but becomes 0.368 if PE1=0.9999 and 0.050 if PE1=0.9997. A tiny change in exclusion probability can translate to a substantial alteration in the measurements of the overall power of a parentage analysis. For this reason, it is more convenient to calculate and record non-exclusion probabilities rather than exclusion probabilities.
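One way to act on this recommendation is to store the non-exclusion probability q = 1 − PE and evaluate the probability of complete exclusion in log space. The sketch below does this with the standard-library `math.log1p`. The function name is illustrative, and the values of q are those quoted in the text for the seven salmon microsatellites.

```python
import math


def prob_all_excluded(q, n_tests):
    """Probability that all n_tests independent false relationships are
    excluded, given a per-test non-exclusion probability q.

    Evaluated as exp(n_tests * log(1 - q)); log1p keeps precision when q
    is very small.
    """
    return math.exp(n_tests * math.log1p(-q))


if __name__ == "__main__":
    # q = 2.00e-4 (one offspring) and q = 3.13e-7 (four full sibs).
    for q in (2.00e-4, 3.13e-7):
        for N in (20, 100, 200):
            print(q, N, prob_all_excluded(q, N * N))
    # Sensitivity to the fourth decimal of PE1 with N = 100 (10**4 tests).
    for q in (1e-4, 2e-4, 3e-4):
        print(q, round(prob_all_excluded(q, 10 ** 4), 3))
```

Recording q also avoids exclusion probabilities being rounded to 1 when many loci or many offspring are combined.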

Distinguishing half-sib, avuncular and grandparent–grandoffspring relationships

In a parentage analysis, the use of multiple full-sib offspring increases dramatically the probability of excluding a false parent or a false pair of parents as shown above. Analysing trios rather than pairs of individuals simultaneously for genealogical relationships in a likelihood framework also increased the power substantially (Sieberts et al., 2002). It is well known that the three relationships, half-sib (HS), avuncular and grandparent–grandoffspring (GG), between a pair of individuals cannot be distinguished using unlinked autosomal markers and can be distinguished with very low power using linked autosomal markers (Epstein et al., 2000). In this section, I show that when one or both individuals in the pair have one or more relatives (for example, fullsibs) and the genotype data of the two individuals and their relatives are analysed jointly, these three relationships can be easily differentiated using unlinked autosomal markers. Avuncular refers to any of the four combinations of aunt–nephew, aunt–niece, uncle–nephew and uncle–niece, and I consider aunt–niece (AN) as an example.

Suppose a pair of individuals, A and B, may have a HS, AN or GG relationship, and n1–1 full siblings to A and n2–1 full siblings to B are also sampled and genotyped at an autosomal marker with k codominant alleles. Here I consider the likelihood of these n1+n2 individuals falling into the three possible pedigrees as depicted in Figure 4. When n1=n2=1, this reduces to inferring HS, AN and GG relationships between a pair of individuals, A and B. Notice that any pair of individuals taken separately from clusters 1 with n1 individuals and 2 with n2 individuals has the same relationship for the HS or AN pedigree, but can have one of two possible relationships, grandparent–grandoffspring or grandaunt–grandniece, for the GG pedigree if n1>1 (Figure 4).

Figure 4

Pedigrees involving aunt–niece (AN), half-sib (HS) and grandparent–grandoffspring (GG) relationships. Males and females are indicated by squares and circles, respectively, and individuals that are sampled and unsampled are indicated by solid and broken lines, respectively.

The likelihoods of the two full-sib clusters with n1 and n2 individuals falling into the HS, AN and GG pedigrees in Figure 4 are

where

is the probability of observing the genotype of offspring j in cluster i (i=1,2; j=1, …, ni), given parental genotypes Guv and Gwx. Pr(GijGuw)=1 if the genotype of offspring j in cluster i has both alleles u and w and Pr(GijGuw)=0 if otherwise. Note that in LGG, t indexes the two alleles, c and d, in the genotype of the grandparent of full-sib cluster 2.

It can be shown that $L_{HS} = L_{AN} = L_{GG}$ when n1=n2=1, $L_{HS} = L_{GG} \neq L_{AN}$ when n1=1 and n2>1, and $L_{HS} = L_{AN} \neq L_{GG}$ when n1>1 and n2=1, indicating that these three relationships cannot all be distinguished, no matter how many markers are used, when n1=1 and/or n2=1. If both n1>1 and n2>1, however, the three likelihood values are different for an autosomal marker and therefore the three relationships can be differentiated. As a numerical example, the seven microsatellite markers in Atlantic salmon listed in Table 2 are utilized to distinguish the three relationships when the two full-sib clusters have various sizes. The HS, AN or GG pedigrees depicted in Figure 4 are simulated and the genotypes of the two full-sib clusters at the seven microsatellite loci are generated following Mendelian segregation. $L_{HS}$, $L_{AN}$ and $L_{GG}$ are then calculated from the genotype data, and the relationship between the two clusters of full siblings is inferred as the one with the maximum likelihood. Whenever two or three relationships have the same maximum likelihood, one of them is chosen at random, with equal probability, as the inferred relationship. Each pedigree is simulated 100 000 times for a given value of n1 or n2, assuming n1=n2. The rates at which an actual relationship is inferred as HS, AN and GG are plotted against n1 (or n2) in Figure 5. When n1=n2=1, the three relationships are indistinguishable regardless of the actual relationship, resulting in a correct classification rate of 1/3. With an increasing value of n1 (=n2), however, the relationship-misclassification rate decreases rapidly for all three simulated pedigrees. Even with n1=n2=3, the misclassification rate is only 0.10, 0.10 and 0.08 for the simulated relationship of AN, HS and GG, respectively. The statistical power of the analysis using merely seven microsatellites is extremely high compared with that of the analysis using pairs of individuals but hundreds of linked markers (Epstein et al., 2000).
Using 399 autosomal markers (each with four equally frequent alleles) spaced at 10-cM intervals across the human genome, a pair of individuals with GG, HS and AN relationships is still assigned an incorrect relationship at a rate of 0.28, 0.63 and 0.38, respectively (Epstein et al., 2000). The comparison highlights the impact of analysing simultaneously multiple individuals with partially known or completely unknown relationships. When the relationships among the n1+n2 sampled individuals in Figure 4 are completely unknown but are confined to a few candidates such as GG, HS, AN and full-sib (FS), it is still possible to reconstruct the pedigree using unlinked autosomal markers in a likelihood approach (Sieberts et al., 2002; Wang, 2004) if n1>1 and n2>1. The accuracy of the inferences can be smaller than that shown in Figure 5, but the difference in accuracy should diminish rapidly with an increasing amount of marker information.
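To illustrate why adding full siblings separates the three pedigrees, the sketch below evaluates single-locus likelihoods by summing over the genotypes of the unsampled individuals. It is a brute-force stand-in, not the likelihood expressions given for Figure 4. The pedigree layouts are assumptions made here: HS shares one parent between the clusters, AN makes an unsampled full sibling of cluster 1 a parent of cluster 2, and GG makes the first member of cluster 1 a grandparent of cluster 2.

```python
from itertools import product


def founders(p):
    """Ordered-allele genotypes of a random individual with HWE probabilities."""
    k = len(p)
    return [((x, y), p[x] * p[y]) for x, y in product(range(k), repeat=2)]


def sibs_prob(cluster, par1, par2):
    """Pr(all genotypes in cluster | two parents); genotypes are sorted tuples."""
    pr = 1.0
    for g in cluster:
        hits = sum(1 for x in par1 for y in par2 if tuple(sorted((x, y))) == g)
        pr *= hits / 4.0
    return pr


def lik_hs(c1, c2, p):
    """P1 x P2 -> cluster 1; P2 x P3 -> cluster 2 (P2 is the shared parent)."""
    f = founders(p)
    total = 0.0
    for g2, w2 in f:
        s1 = sum(w * sibs_prob(c1, g, g2) for g, w in f)
        s2 = sum(w * sibs_prob(c2, g2, g) for g, w in f)
        total += w2 * s1 * s2
    return total


def lik_an(c1, c2, p):
    """G1 x G2 -> cluster 1 and an unsampled sib Q; Q x R -> cluster 2."""
    f = founders(p)
    total = 0.0
    for (g1, w1), (g2, w2) in product(f, repeat=2):
        s1 = sibs_prob(c1, g1, g2)
        if s1 == 0.0:
            continue
        for q in product(g1, g2):  # four equally likely genotypes of Q
            s2 = sum(w * sibs_prob(c2, q, r) for r, w in f)
            total += w1 * w2 * s1 * 0.25 * s2
    return total


def lik_gg(c1, c2, p):
    """G1 x G2 -> cluster 1; c1[0] x S -> unsampled Q; Q x R -> cluster 2."""
    f = founders(p)
    fam1 = sum(w1 * w2 * sibs_prob(c1, g1, g2)
               for (g1, w1), (g2, w2) in product(f, repeat=2))
    down = 0.0
    for s, ws in f:
        for q in product(c1[0], s):  # Q gets one allele from c1[0], one from S
            down += ws * 0.25 * sum(w * sibs_prob(c2, q, r) for r, w in f)
    return fam1 * down


if __name__ == "__main__":
    p = [0.1, 0.2, 0.3, 0.4]  # hypothetical four-allele locus
    pair = ([(0, 1)], [(1, 2)])
    clusters = ([(0, 1), (0, 1)], [(2, 2), (3, 3)])
    for c1, c2 in (pair, clusters):
        print(lik_hs(c1, c2, p), lik_an(c1, c2, p), lik_gg(c1, c2, p))
```

For a single pair the three likelihoods coincide. In the second example the two siblings in cluster 2 reveal both alleles of each of their parents, which rules out HS and GG at this locus and leaves only AN with a non-zero likelihood.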

Figure 5

The effect of the number of full siblings on the accuracy of distinguishing aunt–niece, grandparent–grandoffspring and half-sib relationships. Lines marked by AN, GG and HS show the proportions of the actually simulated aunt–niece (top), grandparent–grandoffspring (middle) or half-sib (bottom) relationships between the two full-sib clusters being inferred as AN, GG and HS relationships, respectively. The numbers of individuals in the two full-sib clusters are assumed to be the same (n1=n2) as shown on the x axis. The data are simulated using the seven microsatellite markers in Atlantic salmon.

Discussion

In marker-assisted parentage analyses, it is common that a mother and a number of her offspring (or fertilized eggs/seeds) are genotyped to infer their paternity (Kichler et al., 1999; Adams et al., 2005; Bretman and Tregenza, 2005; Chapple and Keogh, 2005; Gosselin et al., 2005; Madsen et al., 2005). Some of the offspring sampled from a mother may be full siblings fathered by the same male, and their genotypes can be used jointly to infer the paternity more accurately. Although the full- and HS relationships among the offspring from a mother are usually unknown, they can be identified and utilized to infer their paternity by some recently developed statistical methods (Emery et al., 2001; Sieberts et al., 2002). In such cases, therefore, parentage exclusion probabilities calculated using previous equations assuming a single offspring underestimate the statistical power of parentage analyses and undervalue the amount of information of markers. The equations derived in this study allow more accurate determination of marker information and of the power of parentage analyses. In addition, they can be used to guide experimental designs of parentage analyses in selecting markers and determining the number of offspring to be sampled and genotyped.

Recently, exclusion (Smith et al., 2001; Butler et al., 2004) and likelihood (Thomas and Hill, 2000; Wang, 2004) approaches have been developed to infer sibships in a sample of individuals using their marker genotypes without parental information. To assess the informativeness of markers in, and the power of, a sibship analysis, Almudevar and Field (1999) derived equations for the probabilities of excluding n unrelated individuals, or n–1 full siblings and 1 unrelated (or HS) individual, as comprising a full sibship. In the present study, I derived a formula for sibship exclusion probability (SE) which reduces to very simple forms in some special cases (equations 17–19). I showed that, for a given marker system, SE increases very rapidly with n, indicating that a group of unrelated individuals is much more easily excluded from a full sibship if the group size (n) is large. In other words, an inferred sibship of n individuals becomes increasingly reliable with an increasing value of n, regardless of the methodology (exclusion or likelihood) used in a sibship analysis using a given marker system. The implication for sibship analyses is that the statistical power would be low if most sibships are small in size (say, n<4). In such a case, more informative markers are required to attain sufficient statistical power. In contrast, when most sibships in a sample of individuals are large, it is easy to infer the sibships with even a small number of markers.

Traditionally, genealogical relationships or relatedness are inferred between a pair of individuals. Although simple to implement, the pairwise approach suffers from a number of drawbacks. First, valuable information may be lost in breaking the sampled individuals into pairs and considering each in isolation (Sieberts et al., 2002; Wang, 2004). All individuals in a sample may provide direct and indirect information concerning the relationship of a dyad, especially those closely related to the dyad. In diploid species, for example, sibship exclusion of a group of n individuals is impossible if n=2 but is feasible if n>2 from codominant marker data, as is shown by the present study. Indeed, more accurate relationship inferences are achieved by analysing trios rather than pairs of individuals (Sieberts et al., 2002). This investigation further demonstrates that HS, avuncular and GG relationships can be easily discriminated using unlinked markers when three or more related individuals are analysed jointly. If only a pair of individuals is analysed, however, the three relationships are indistinguishable using unlinked markers, and are only marginally differentiated using linked markers (Epstein et al., 2000). Second, the inferred pairwise relationships are not guaranteed to be mutually compatible. Among three individuals, for example, two dyads may be inferred as full sibs and the other dyad as non-full sibs by pairwise methods. The three inferred pairwise relationships are obviously incompatible. In a pairwise parentage analysis, a male and a female may be inferred independently as the father and mother of an offspring, respectively. When the trio is considered jointly, however, the two adults may be incompatible as both parents of the offspring. Third, pairwise approaches infer direct relationships at the lowest level, between a pair of individuals. Such pairwise relationships suffice in some instances in which they are used, for example, to avoid mating between relatives in managing conservation populations (Herbinger et al., 1995). In most cases, however, knowledge of higher order relationships is desirable, which requires all the individuals in a sample to be allocated into various genetic groups (Smith et al., 2001). Further information may be lost in subsequent analyses, such as estimating heritability (Thomas and Hill, 2000), if only pairwise relationships are inferred and used. Although it is possible to first infer pairwise relationships and then cluster them into genetic groups (Blouin et al., 1996; Beyer and May, 2003), such a two-step procedure does not exploit the marker information fully and has to resort to some heuristic rules to resolve the conflicts among some pairwise relationships. This study highlights the great benefits of jointly analysing multiple related individuals (for example, in inferring parentage or distinguishing GG, HS and AN relationships) or multiple unrelated individuals (for example, in sibship analyses) to infer their relationships.

I wish to emphasize that parentage (sibship) exclusion probabilities measure adequately the informativeness of markers and the power of parentage (sibship) analyses only when the exclusion approach is adopted in relationship inferences. For other approaches such as likelihood, these probabilities serve these purposes only approximately. In general, a set of markers with a high cumulative exclusion probability and thus a high power in relationship exclusion analyses is also highly informative and gives a high statistical power in likelihood analyses. However, exceptions do exist. For example, biallelic dominant markers such as AFLPs do not allow paternity exclusion in the absence of maternal genotypes (Chakraborty et al., 1974; Gerber et al., 2000). No matter how many such loci are used, therefore, PE2≡0 and the paternity exclusion analysis is powerless. These markers are nevertheless informative and can be used to infer paternity in the likelihood framework. Similarly, biallelic codominant markers such as SNPs are completely uninformative in sibship exclusion analyses (SE=0) but provide information to differentiate sibship from other relationships by likelihood (Wang, 2006). Alternative measures of informativeness have been proposed to quantify the information content of markers in inferring genealogical relationships; these apply to all kinds of markers (dominant or codominant, two or more alleles per locus) and relationships, and allow for genotyping errors (Wang, 2006).

In the derivation, I followed previous studies in assuming that the markers are in Hardy–Weinberg equilibrium (HWE). It should be noted that this assumption may be violated in real populations, leading to an under- or over-estimation of the exclusion probabilities. A number of conditions are required for a population to reach and remain in HWE (Crow and Kimura, 1970). Deviation from HWE results when, for example, the marker is under direct or indirect selection, the population size is small, or mating is not random with respect to kin (for example, because of inbreeding avoidance or population subdivision). Whatever the cause of the deviation, its impact on exclusion probability can be formulated using Wright's (1965) statistic FIS, denoted by f, following the same approach as adopted in deriving (1), (6), (11) and (16). The formulas become quite complicated, however. For the case of paternity exclusion probability with a known mother, for example, the formula can be derived as

where f1=1−f and f2=2−f, and a, b, c and d are as defined in (1). When the marker is in HWE so that f=0, the above formula reduces to (1) as expected. The impact of f on PE1 is shown in Figure 6 for a locus with five codominant alleles of equal frequency. It can be seen that the effect of the deviation from HWE on PE1 is relatively small in magnitude, and that its direction depends on n. When n is small, inbreeding (positive f) leads to an increase in PE1, whereas when n is large, inbreeding results in a decrease in PE1. As an empirical example, consider the marker with 14 codominant alleles with frequencies listed in the first row of Table 2. When n=1, the values of PE1 are 0.7759, 0.7873 and 0.7985 with f=–0.1, 0 and 0.1, respectively. When n=4, the values of PE1 become 0.9430, 0.9336 and 0.9251 with f=–0.1, 0 and 0.1, respectively.
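The kind of calculation summarized above can be sketched by direct enumeration. The code below is an illustrative sketch only and does not implement the closed-form formula of this study. It assumes that the genotypes of the mother, the true father and the alleged father are drawn independently from the same distribution, with homozygote frequency p_i^2 + f p_i(1−p_i) and heterozygote frequency 2 p_i p_j(1−f), and that the n full sibs are produced by Mendelian segregation. The function names (genotype_freqs, offspring_dist, pe1) are hypothetical.

```python
from itertools import combinations_with_replacement


def genotype_freqs(p, f=0.0):
    """Genotype frequencies with inbreeding coefficient f (f = 0 gives HWE)."""
    freqs = {}
    for i, j in combinations_with_replacement(range(len(p)), 2):
        if i == j:
            freqs[(i, j)] = p[i] ** 2 + f * p[i] * (1 - p[i])
        else:
            freqs[(i, j)] = 2 * p[i] * p[j] * (1 - f)
    return freqs


def offspring_dist(mum, dad):
    """Mendelian distribution of offspring genotypes for one parental pair."""
    dist = {}
    for a in mum:
        for b in dad:
            g = tuple(sorted((a, b)))
            dist[g] = dist.get(g, 0.0) + 0.25
    return dist


def pe1(p, n, f=0.0):
    """Probability that a random male is excluded as the father,
    given the mother's genotype and n full-sib offspring."""
    freqs = genotype_freqs(p, f)
    total = 0.0
    for mum, pm in freqs.items():
        for dad, pd in freqs.items():
            kids = offspring_dist(mum, dad)
            for male, pa in freqs.items():
                possible = set(offspring_dist(mum, male))
                # q: chance that one true offspring is compatible with the alleged father
                q = sum(pr for g, pr in kids.items() if g in possible)
                # the male escapes exclusion only if all n offspring are compatible
                total += pm * pd * pa * (1 - q ** n)
    return total


if __name__ == "__main__":
    p = [0.2] * 5  # five equally frequent alleles, as assumed for Figure 6
    for n in (1, 4):
        print(n, [round(pe1(p, n, f), 4) for f in (-0.1, 0.0, 0.1)])
```

With five equally frequent alleles, f=0 and n=1, the enumeration gives about 0.595, which is the classical single-offspring value for such a locus. Values for other f depend on the independence assumption stated above and need not match those obtained from the formula in the text.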

Figure 6

The effect of deviation from Hardy–Weinberg equilibrium (f) on the paternity exclusion probability (PE1) when a known mother and n full-sib offspring are used in a paternity analysis. A single locus with five codominant alleles of equal frequency is assumed in the calculation.

The assumption of linkage equilibrium (LE) is required to calculate multi-locus exclusion probabilities simply from single-locus values. Unlike HWE, it is difficult to relax this assumption in deriving the exclusion probabilities. However, as with HWE, slight deviations from LE should have only a small effect on exclusion probabilities. To investigate quantitatively the impact of deviation from LE, a further simulation study should be conducted.
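Under LE, exclusion events at different loci are treated as independent, so a candidate escapes exclusion overall only if it escapes at every locus. A minimal sketch of this standard combination rule is given below; the function name combined_exclusion is hypothetical, and the single-locus values in the example are illustrative rather than taken from this study.

```python
from math import prod


def combined_exclusion(single_locus_pe):
    """Multi-locus exclusion probability from single-locus values,
    assuming independence among loci (linkage equilibrium)."""
    # probability of escaping exclusion at every locus
    escape_all = prod(1.0 - pe for pe in single_locus_pe)
    return 1.0 - escape_all


if __name__ == "__main__":
    # three hypothetical loci with illustrative single-locus values
    print(combined_exclusion([0.60, 0.45, 0.70]))
```

When loci are in linkage disequilibrium, exclusion events are no longer independent and this product rule holds only approximately.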