Inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs

https://doi.org/10.1016/j.fsigen.2007.06.008

Abstract

Tests that infer the ancestral origin of a DNA sample have considerable potential as forensic tools that can help to guide criminal investigations. We have developed a single-tube 34-plex SNP assay for the assignment of ancestral origin by choosing ancestry-informative markers (AIMs) that exhibit highly contrasting allele frequency distributions between the three major population-groups. To predict ancestral origin from the resulting profiles, we developed a classification algorithm based on maximum likelihood. Two populations each from the African, European and East Asian groups were sampled to provide training sets for the algorithm, which was then tested using the CEPH Human Genome Diversity Panel. We detected negligible theoretical and practical error for assignments to one of the three groups analyzed, with consistently high classification probabilities, even when using reduced subsets of SNPs. This study shows that, by choosing SNPs exhibiting marked allele frequency differences between population-groups, a practical forensic test for assigning the most likely ancestry can be achieved with a single multiplexed assay.

Introduction

Ancestry-informative markers (AIMs), which can indicate the likely population of origin of a DNA sample when the source individual is unknown or unable to declare their ancestry, are an increasingly important part of genetic association studies, but to date they have not been developed into a practical forensic test. A range of DNA polymorphisms with potential use as AIMs are available, including autosomal and Y-chromosome short tandem repeats (STRs) and mitochondrial sequence variation (mtDNA) [1], [2], [3], [4], [5], but these have limitations. Microsatellites do not exhibit large enough contrasts in allele frequencies between populations to be especially useful in sets of fewer than 50 loci, mainly due to their mutational instability. Y-chromosome loci and mtDNA variation, while phylogeographically informative [6], [7], [8], are haploid, so they require very large databases to gauge population variability properly; in addition, there is a risk of encountering intact lineages atypical of the population [9]. Autosomal SNPs have emerged as among the best ancestry markers owing to their stability, their density of distribution and the full range of allele frequency patterns they show across populations. Since the overwhelming majority of human genetic diversity worldwide takes the form of geographic clines rather than clades [10], [11], [12], it is essential to find the small number of SNPs that show the most pronounced allele frequency discontinuities between continental regions, in order to create marker sets with population “diagnostic” genotypes [13]. One approach to locating such SNPs is to examine gene variation that has been subjected to strong regional positive selection in the recent past, creating localized adaptations [14], [15], [16]. Well-documented examples [reviewed in 17] include SLC45A2 and depigmentation in Europe [18], DARC and Plasmodium vivax resistance in sub-Saharan Africa [19], and LCT, implicated in pastoralist adaptation in Northern Europe [20].
We examined these genes and others to collect highly regionalized SNP variation.
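To illustrate the selection criterion described above, the following Python sketch ranks candidate SNPs by the absolute allele frequency difference (δ) between each pair of population-groups and keeps the most contrasting SNPs for every pair. The group labels, SNP names, frequencies and the helper `best_per_pair` are hypothetical; δ is one common AIM-ranking statistic, assumed here for illustration rather than taken from the study.

```python
from itertools import combinations

GROUPS = ("AFR", "EUR", "EAS")
PAIRS = list(combinations(GROUPS, 2))  # (AFR, EUR), (AFR, EAS), (EUR, EAS)


def delta(freqs, a, b):
    """Absolute allele frequency difference between groups a and b for one SNP."""
    return abs(freqs[a] - freqs[b])


def best_per_pair(table, k):
    """For each pair of groups, return the k SNPs with the largest delta.

    table maps SNP name -> {group: frequency of the reference allele}.
    Selecting per pair keeps the panel balanced, so that every pair of
    groups is separated by its own strongly contrasting markers.
    """
    chosen = {}
    for a, b in PAIRS:
        ranked = sorted(table, key=lambda snp: delta(table[snp], a, b), reverse=True)
        chosen[(a, b)] = ranked[:k]
    return chosen


# Hypothetical allele frequencies, for illustration only (not study data).
candidates = {
    "snpA": {"AFR": 0.05, "EUR": 0.95, "EAS": 0.50},
    "snpB": {"AFR": 0.10, "EUR": 0.15, "EAS": 0.90},
    "snpC": {"AFR": 0.02, "EUR": 0.50, "EAS": 0.98},
}

for pair, snps in best_per_pair(candidates, k=1).items():
    print(pair, snps)
```

Ranking by pair rather than by a single overall score means that a SNP such as the hypothetical snpB, which separates only one group from the other two, is still retained for the contrast it resolves best.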

In this study we aimed to develop a suitably powerful single-tube SNP test, based on the most informative AIMs available and showing the least error, with the following goals: (1) to select SNPs that, in the first instance, gave a clear differentiation of sub-Saharan African, European and East Asian population-groups; (2) to validate allele frequencies to ensure that within-population-group variation was a minor proportion of total variability; (3) to balance the chromosomal distribution of the final set to avoid linkage disequilibrium between SNP pairs; and (4) to establish a straightforward Bayesian system for predicting ancestral origin and to estimate the misclassification rate, both by statistical means and by testing the CEPH human genome diversity cell line panel (CEPH-HGDP), which comprises samples of confirmed geographic origin [21].
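Goal (2) can be illustrated with a Nei-style hierarchical partition of gene diversity, H = 2p(1 − p), for a single SNP. This is a minimal sketch under stated assumptions: equal weighting of populations and groups, two populations per group and hypothetical allele frequencies. The statistic actually used in the study may differ (for example, an AMOVA or F-statistics).

```python
from statistics import mean


def gene_diversity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def partition_diversity(groups):
    """Split the total gene diversity of one SNP into hierarchical components.

    groups maps a population-group label to the allele frequencies of its
    sampled populations. Populations and groups are weighted equally.
    Returns the fraction of total diversity found among groups, among
    populations within groups, and within populations.
    """
    pop_freqs = [p for pops in groups.values() for p in pops]
    group_means = [mean(pops) for pops in groups.values()]

    h_s = mean(gene_diversity(p) for p in pop_freqs)    # within populations
    h_g = mean(gene_diversity(p) for p in group_means)  # within groups
    h_t = gene_diversity(mean(group_means))             # total
    if h_t == 0.0:
        raise ValueError("SNP is monomorphic across all groups")
    return {
        "among_groups": (h_t - h_g) / h_t,
        "among_pops_within_groups": (h_g - h_s) / h_t,
        "within_pops": h_s / h_t,
    }


# Hypothetical frequencies for two populations per group (not study data).
example = {"AFR": [0.10, 0.20], "EUR": [0.85, 0.90], "EAS": [0.50, 0.55]}
print(partition_diversity(example))
```

A marker meeting goal (2) would show an "among_pops_within_groups" share that is small relative to the "among_groups" share, as in this invented example.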

A forensic test handling single profiles requires a fast and flexible alternative to the widely used genetic clustering algorithm STRUCTURE [22], one that offers simpler classifications in real time. Therefore, the final stage of development of the test outlined here was the incorporation of the classification algorithm into an open-access web portal to allow simple analysis of SNP profiles, including those with partial data. This portal was enhanced to allow analysis of a user's custom populations and SNP markers with the same Bayesian classification algorithm and error estimation systems.
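The classification of a single, possibly partial, profile can be sketched as follows. This is a hypothetical implementation, not the portal's code. It assumes Hardy–Weinberg genotype probabilities, independent (unlinked) SNPs, a frequency floor of 0.001 to avoid zero likelihoods for alleles unseen in a training set, and equal prior probabilities for the groups, in which case the Bayesian posterior is simply the normalized likelihood. SNPs with missing genotypes are skipped.

```python
import math


def genotype_prob(g, p, floor=1e-3):
    """P(genotype | allele frequency) under Hardy-Weinberg; g = allele copies (0, 1, 2)."""
    p = min(max(p, floor), 1.0 - floor)  # avoid log(0) for alleles unseen in training data
    return math.comb(2, g) * p**g * (1.0 - p) ** (2 - g)


def classify(profile, freqs):
    """Assign a profile to the group with the highest likelihood.

    profile: dict SNP -> genotype (0, 1, 2), or None if the SNP failed.
    freqs:   dict group -> dict SNP -> allele frequency.
    Returns (best group, posteriors under equal priors, LR of best vs runner-up).
    """
    loglik = {
        group: sum(
            math.log(genotype_prob(g, table[snp]))
            for snp, g in profile.items()
            if g is not None  # partial profiles: missing SNPs are skipped
        )
        for group, table in freqs.items()
    }
    top = max(loglik.values())
    weights = {grp: math.exp(ll - top) for grp, ll in loglik.items()}  # numerically stable
    total = sum(weights.values())
    posteriors = {grp: w / total for grp, w in weights.items()}
    ranked = sorted(loglik, key=loglik.get, reverse=True)
    lr = math.exp(loglik[ranked[0]] - loglik[ranked[1]])
    return ranked[0], posteriors, lr


# Hypothetical training-set frequencies (not study data).
freqs = {
    "AFR": {"s1": 0.05, "s2": 0.10, "s3": 0.02},
    "EUR": {"s1": 0.95, "s2": 0.15, "s3": 0.50},
    "EAS": {"s1": 0.50, "s2": 0.90, "s3": 0.98},
}
profile = {"s1": 2, "s2": 0, "s3": None}  # s3 treated as a failed SNP
best, posteriors, lr = classify(profile, freqs)
print(best, {g: round(v, 3) for g, v in posteriors.items()}, round(lr, 1))
```

Reporting the likelihood ratio against the runner-up group alongside the posterior gives a simple measure of how decisively a partial profile is assigned.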

Section snippets

Population samples

Training sets for the classification algorithm were created for each population-group by combining two population samples: sub-Saharan Africans (60 Mozambican and 60 Somali), Europeans (60 Galician from NW Spain and 60 Danish) and East Asians (60 Mainland Chinese and 60 Taiwanese). In all cases informed consent was obtained. Except for the Somalis, who were resident in Denmark, samples were collected in the corresponding geographic region. The CEPH-HGDP panel comprising 1064 samples from 51

Patterns of SNP variability

The allele frequency distributions for the 34 SNPs in the three population-groups studied are outlined in Fig. 2. To compare the training-set and CEPH-HGDP frequencies, the populations from each source were pooled separately according to their population-group affiliation and arranged in paired plots. Allele frequencies for the 58 populations studied are listed in Table S2 in the online supplementary data. All populations were in Hardy–Weinberg equilibrium, and pairwise analysis did not detect linkage disequilibrium within the
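A Hardy–Weinberg check of the kind reported above can be sketched with the standard chi-square goodness-of-fit test on genotype counts (1 degree of freedom). The counts below are hypothetical, and the study may have used an exact test instead; for small samples or rare alleles an exact test is generally preferable.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Takes the observed counts of the AA, AB and BB genotypes and returns
    (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p_value


# Hypothetical genotype counts for 120 individuals (not study data).
print(hwe_chi2(30, 60, 30))  # proportions exactly as expected under HWE
print(hwe_chi2(50, 20, 50))  # strong heterozygote deficit
```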

Discussion

An ancestral origin test limited to a single multiplex runs the risk of failing to adequately differentiate the population-groups it analyzes. Thirty-four SNPs is an upper limit for a primer extension assay, but marker numbers can be extended using systems such as Genplex dye-linked oligo-ligation, which has proven forensic performance [30]. Nevertheless, tests using small numbers of SNPs must still maximize allele frequency differences between groups to have any chance of success on a broad enough scale. The
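The trade-off between marker number and allele frequency contrast can be explored by simulation. The sketch below is a hypothetical Monte Carlo estimate, not the error estimation system used in the study: it draws genotypes under Hardy–Weinberg and linkage equilibrium from assumed group frequencies, assigns each simulated individual by maximum likelihood and reports the fraction misassigned. The three-SNP base panel is invented for illustration, and repeating it is treated as adding further unlinked SNPs with the same frequencies.

```python
import math
import random


def simulate_error(panel, n_per_group=2000, seed=1):
    """Monte Carlo estimate of the misclassification rate for a SNP panel.

    panel: list of dicts, one per SNP, mapping group -> allele frequency.
    Frequencies must lie strictly between 0 and 1 (no log(0) guard here).
    Genotypes are drawn under Hardy-Weinberg and linkage equilibrium, then
    assigned to the group with the highest log-likelihood.
    """
    rng = random.Random(seed)
    groups = list(panel[0])
    # Pre-compute log P(genotype | group) for genotypes 0, 1, 2 at each SNP.
    logp = [
        {grp: [math.log(math.comb(2, g) * f[grp] ** g * (1 - f[grp]) ** (2 - g))
               for g in range(3)]
         for grp in groups}
        for f in panel
    ]
    errors = 0
    for true_grp in groups:
        for _ in range(n_per_group):
            # Each genotype is the number of allele copies drawn in two trials.
            geno = [sum(rng.random() < f[true_grp] for _ in range(2)) for f in panel]
            scores = {grp: sum(lp[grp][g] for lp, g in zip(logp, geno)) for grp in groups}
            if max(scores, key=scores.get) != true_grp:
                errors += 1
    return errors / (n_per_group * len(groups))


# Hypothetical panel: each SNP contrasts a different pair of groups.
base = [
    {"AFR": 0.1, "EUR": 0.9, "EAS": 0.5},
    {"AFR": 0.1, "EUR": 0.5, "EAS": 0.9},
    {"AFR": 0.5, "EUR": 0.1, "EAS": 0.9},
]
print("1 SNP:  ", simulate_error(base[:1]))
print("12 SNPs:", simulate_error(base * 4))
```

Comparing the two estimates shows the misclassification rate falling sharply as strongly differentiated SNPs are added; the exact values depend on the random seed and the assumed frequencies.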

Acknowledgements

The African–American panel was supplied by Peter Vallone and John Butler at NIST, and the authors are indebted to them for making these samples available. The work was supported by the European Commission GROWTH program, SNPforID project, contract G6RD-CT-2002-00844. Funding from Xunta de Galicia (PGIDTIT06PXIB228195PR) and a grant from the Ministerio de Educación y Ciencia (project BIO2006-06178), given to MVL, supported this project. The ‘Ramón y Cajal’ Spanish programme from the Ministerio de

References (30)

  • T.E. King et al. Africans in Yorkshire? The deepest-rooting clade of the Y phylogeny within an English genealogy. Eur. J. Hum. Genet. (2007)
  • D. Serre et al. Evidence for gradients of human genetic diversity within and among continents. Genome Res. (2004)
  • A. Manica et al. Geography is a better determinant of genetic differentiation than ethnicity. Hum. Genet. (2005)
  • N.A. Rosenberg et al. Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet. (2005)
  • N.A. Rosenberg et al. Genetic structure of human populations. Science (2002)