  • Review Article
Genetic relatedness analysis: modern data and new challenges

An Author Correction to this article was published on 22 December 2021

This article has been updated

Key Points

  • Population and quantitative genetics theory is built with parameters that describe relatedness, and estimation of these parameters from genetic markers enables progress in fields as disparate as plant breeding, human disease gene mapping and forensic science.

  • Relatedness can be described by the probabilities that two individuals share zero, one or two pairs of alleles that are identical-by-descent. More probabilities are needed if the individuals are inbred, meaning that their parents were related.

  • Alternative hypotheses about the relationship between two individuals can be evaluated by dividing the probability of the observed genotypes of the individuals under one hypothesis by the probability of the genotypes under the other. The ratio of probabilities is called the likelihood ratio. In paternity testing, it is called the paternity index.

  • The probabilities of patterns of identity-by-descent can be estimated by the method of maximum likelihood.

  • Even for individuals whose parents are not related, and who are therefore not inbred, account needs to be taken of 'background relatedness' that is due to evolutionary history in a population.

  • Even though the probabilities of identity-by-descent are defined by the family and population relatedness of two individuals, there is variation in actual identity-by-descent along the genome. This reflects the differences in actual genealogies at different loci, and it is influenced by recombination along with mutation and natural selection.

  • Relationship is best estimated by highly polymorphic markers, to minimize the ambiguity between identity-in-state and identity-by-descent. However, reliable estimates can be obtained with a sufficiently large number of biallelic SNPs.
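The likelihood-ratio calculation described in the Key Points can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes non-inbred individuals, Hardy-Weinberg proportions, known allele frequencies and no background relatedness. The function names and the allele frequencies below are hypothetical and chosen only for the example.

```python
from itertools import product

# Probabilities (k0, k1, k2) that two non-inbred individuals share
# zero, one or two pairs of identical-by-descent (IBD) alleles.
IBD_PROBS = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-child": (0.0, 1.0, 0.0),
    "full-siblings": (0.25, 0.5, 0.25),
    "half-siblings": (0.5, 0.5, 0.0),
}


def _genotype(a, b):
    """Represent a genotype as a sorted pair so allele order is ignored."""
    return tuple(sorted((a, b)))


def pair_probs_given_ibd(g1, g2, freqs):
    """Probability of genotype g1 for individual 1 and g2 for individual 2,
    given that they share 0, 1 or 2 pairs of IBD alleles.

    Computed by brute-force enumeration over alleles, assuming
    Hardy-Weinberg proportions (an assumption of this sketch).
    """
    g1, g2 = _genotype(*g1), _genotype(*g2)
    alleles = list(freqs)
    p0 = p1 = p2 = 0.0
    for a, b in product(alleles, repeat=2):
        if _genotype(a, b) != g1:
            continue
        p_g1 = freqs[a] * freqs[b]
        # Two IBD pairs: individual 2 carries copies of both alleles.
        if g2 == g1:
            p2 += p_g1
        for c in alleles:
            # One IBD pair: individual 2 carries a copy of one allele of
            # individual 1 (each with probability 1/2) plus an independent allele c.
            for shared in (a, b):
                if _genotype(shared, c) == g2:
                    p1 += 0.5 * p_g1 * freqs[c]
            # No IBD pairs: both alleles of individual 2 are independent draws.
            for d in alleles:
                if _genotype(c, d) == g2:
                    p0 += p_g1 * freqs[c] * freqs[d]
    return p0, p1, p2


def genotype_pair_prob(g1, g2, freqs, k):
    """Probability of the genotype pair under IBD probabilities k = (k0, k1, k2)."""
    return sum(ki * pi for ki, pi in zip(k, pair_probs_given_ibd(g1, g2, freqs)))


def likelihood_ratio(g1, g2, freqs, k_numerator, k_denominator):
    """Ratio of genotype-pair probabilities under two hypothesised relationships."""
    return (genotype_pair_prob(g1, g2, freqs, k_numerator)
            / genotype_pair_prob(g1, g2, freqs, k_denominator))


# Hypothetical biallelic locus; both individuals are homozygous for the rarer allele.
freqs = {"A": 0.1, "B": 0.9}
lr_sibs = likelihood_ratio(("A", "A"), ("A", "A"), freqs,
                           IBD_PROBS["full-siblings"], IBD_PROBS["unrelated"])
paternity_index = likelihood_ratio(("A", "A"), ("A", "A"), freqs,
                                   IBD_PROBS["parent-child"], IBD_PROBS["unrelated"])
print(lr_sibs, paternity_index)  # approximately 30.25 and 10
```

For unlinked loci, a multilocus likelihood ratio would be the product of the single-locus ratios. Sharing a rare allele gives a larger ratio than sharing a common one, which is one way to see why highly polymorphic markers are more informative.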

Abstract

Individuals who belong to the same family or the same population are related because of their shared ancestry. Population and quantitative genetics theory is built with parameters that describe relatedness, and the estimation of these parameters from genetic markers enables progress in fields as disparate as plant breeding, human disease gene mapping and forensic science. The large numbers of multiallelic microsatellite loci and biallelic SNPs that are now available have markedly increased the precision with which relationships can be estimated, although they have also revealed unexpected levels of genomic heterogeneity of relationship measures.

Figure 1: Complete set of identity-by-descent measures.
Figure 2: Likelihood ratios for putative full-siblings.
Figure 3: Effect of background relatedness on coancestry estimates.
Figure 4: Variation in estimated coancestries along a chromosome.

Acknowledgements

This work was supported in part by grants from the National Institutes of Health, the National Institute of Justice and the National Science Foundation. We are grateful to W.G. Hill and the reviewers for helpful comments.

Author information

Corresponding author

Correspondence to Bruce S. Weir.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Centre d'Etude du Polymorphisme Humain

Homepage for the Department of Biostatistics, University of Washington

International HapMap Project

Nature Reviews Genetics audio supplement

The a-China DNA Project

Glossary

Additive variance

The portion of the variance of a quantitative trait that is due to the single effects of alleles at the loci that influence the trait.

Dominance variance

The portion of the variance of a quantitative trait that is due to the interaction of the two alleles that an individual carries at the loci that influence the trait.

Affected-relative linkage studies

Studies that aim to estimate the degree of linkage between a disease and a marker locus on the basis of the marker genotypes of relatives who have the disease.

Microsatellite

Also known as a short tandem repeat. A class of repetitive DNA that is made up of repeats that are 2–5 nucleotides in length. The number of these repeats is usually extremely variable in a population.

Linkage disequilibrium

The non-random association of alleles at different loci, whether or not the loci are linked.

Minisatellite

A region of DNA in which repeat units of 10–50 bp are tandemly arranged in arrays that are 0.5–30 kb in length.

Association study

A study that aims to identify the joint occurrence of two genetically encoded characteristics in a population. Often, an association between a genetic marker and a phenotype (for example, a disease) is assessed.

Inbreeding coefficient

The probability that an individual carries two identical-by-descent alleles at a locus.

Coancestry coefficient

The probability that two alleles at a locus, one taken at random from each of two individuals, are identical-by-descent. It is also called the coefficient of parentage or coefficient of consanguinity.
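For non-inbred individuals, the coancestry coefficient follows directly from the probabilities of sharing zero, one or two pairs of identical-by-descent alleles. If one allele is drawn at random from each individual, the two drawn alleles are identical-by-descent with probability 1/4 when one pair is shared and 1/2 when two pairs are shared. The short sketch below assumes non-inbred individuals and no background relatedness; the function name `coancestry` is hypothetical.

```python
def coancestry(k):
    """Coancestry coefficient for two non-inbred individuals.

    k = (k0, k1, k2) are the probabilities of sharing zero, one or two
    pairs of identical-by-descent alleles.
    """
    k0, k1, k2 = k
    if abs(k0 + k1 + k2 - 1.0) > 1e-9:
        raise ValueError("IBD probabilities must sum to 1")
    return 0.25 * k1 + 0.5 * k2


# Standard pedigree values for some common relationships.
relationships = {
    "parent-child": (0.0, 1.0, 0.0),
    "full-siblings": (0.25, 0.5, 0.25),
    "half-siblings": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

for name, k in relationships.items():
    print(f"{name}: {coancestry(k)}")
```

Parent-child and full-sibling pairs have the same coancestry (1/4) but different identity-by-descent probabilities, so the three-probability description carries more information than the single coefficient.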

Unordered genotypes

The probability of unordered genotypes does not require specifying which genotype belongs to which individual (for example, which is for the parent and which is for the child). By contrast, the probability of ordered genotypes requires this information.

Likelihood ratio

The ratio of two probabilities for the same observations, calculated under alternative hypotheses. In the context of relatedness analysis, the likelihood ratio is formed by dividing the probability of the observed pair of genotypes using the identical-by-descent probabilities for one possible relationship by the probability of the genotypes using identical-by-descent probabilities for the other possible relationship. The likelihood ratio is a continuous variable that can take any non-negative value, and values greater than one support the relationship used for the numerator.

CODIS forensic set

A set of 13 highly polymorphic and essentially unlinked microsatellite markers that were developed by the US Federal Bureau of Investigation for human identification purposes.

Bayesian (framework)

An inference framework in which the posterior probability of a parameter depends explicitly on its prior probability, reflecting some previous belief about this parameter.

Maximum likelihood (method)

The process of estimating parameters by choosing their values to maximize the probability of some observed data.
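As an illustration of the maximum likelihood method applied to relatedness, the sketch below estimates the three identity-by-descent probabilities for a pair of non-inbred individuals. It uses an expectation-maximization (EM) iteration, which is one standard way to maximize a mixture likelihood and is an assumption of this example, not necessarily the optimization used in the literature. The inputs are, for each locus, the probabilities of the observed genotype pair given zero, one or two shared pairs of identical-by-descent alleles; the numbers shown are hypothetical and the loci are assumed to be independent.

```python
import math


def log_likelihood(k, locus_probs):
    """Log-likelihood of k = (k0, k1, k2), treating loci as independent."""
    return sum(
        math.log(sum(ki * pi for ki, pi in zip(k, probs)))
        for probs in locus_probs
    )


def estimate_ibd_probs(locus_probs, n_iter=1000, tol=1e-10):
    """EM estimate of (k0, k1, k2).

    locus_probs: one (P0, P1, P2) triple per locus, where Pi is the
    probability of the observed genotype pair given i shared IBD pairs.
    """
    k = [1.0 / 3.0] * 3  # start from equal weights
    for _ in range(n_iter):
        totals = [0.0, 0.0, 0.0]
        for probs in locus_probs:
            # E-step: posterior weight of each IBD state at this locus.
            weighted = [ki * pi for ki, pi in zip(k, probs)]
            denom = sum(weighted)
            for i in range(3):
                totals[i] += weighted[i] / denom
        # M-step: new estimate is the average posterior weight over loci.
        new_k = [t / len(locus_probs) for t in totals]
        converged = max(abs(a - b) for a, b in zip(k, new_k)) < tol
        k = new_k
        if converged:
            break
    return k


# Hypothetical per-locus conditional probabilities (illustrative values only).
toy_loci = [
    (1e-4, 1e-3, 1e-2),
    (0.0324, 0.09, 0.18),
    (0.0081, 0.0, 0.0),
    (0.05, 0.12, 0.0),
]
k_hat = estimate_ibd_probs(toy_loci)
print(k_hat, log_likelihood(k_hat, toy_loci))
```

With only a handful of loci the estimate is very imprecise, and the maximum often lies on the boundary of the parameter space; the sketch is meant to show the mechanics of the method, not a usable estimator.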

Bayes theorem

The means of going from a probability of one event, given another, to the probability of the second event, given the first. It is often used to express the (posterior) probability of a hypothesis, given some data, as being proportional to the probability of the data, given the hypothesis, multiplied by the (prior) probability of the hypothesis.
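When only two hypotheses are being compared, Bayes theorem can be written in odds form: the posterior odds equal the likelihood ratio multiplied by the prior odds. The sketch below applies this to a relationship hypothesis. The function name `posterior_probability`, the likelihood ratio of 30.25 and the prior of 0.5 are illustrative assumptions, not values from the article.

```python
def posterior_probability(likelihood_ratio, prior):
    """Posterior probability of the numerator hypothesis.

    likelihood_ratio: P(data | H1) / P(data | H2)
    prior: prior probability of H1, with H2 the only alternative.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Illustrative values: a likelihood ratio of 30.25 with equal prior odds.
print(posterior_probability(30.25, 0.5))   # about 0.968
# The same evidence with a sceptical prior gives a much weaker conclusion.
print(posterior_probability(30.25, 0.01))
```

The second call shows why the likelihood ratio and the prior are usually reported separately: the same genetic evidence leads to different posterior probabilities under different prior beliefs.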

Prior probability

The probability of an event or hypothesis before consideration of some data that will alter the probability of that event or hypothesis.

Posterior probability

The probability of an event or hypothesis after consideration of some data that have altered the probability of that event or hypothesis.

Population substructure

The existence of groups of individuals within a population that have some degree of reproductive isolation from the rest of the population, and for which the allele frequencies are likely to be different from the population as a whole.

Kin selection

William D. Hamilton's theory to explain the evolution of the hallmark of social life: altruistic cooperation (carrying out functions that are costly to the individual but that benefit others). By helping a relative, an individual increases its fitness by increasing the number of copies of its genes in the population.

About this article

Cite this article

Weir, B., Anderson, A. & Hepler, A. Genetic relatedness analysis: modern data and new challenges. Nat Rev Genet 7, 771–780 (2006). https://doi.org/10.1038/nrg1960

  • DOI: https://doi.org/10.1038/nrg1960
