Inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs
Introduction
Ancestry-informative markers (AIMs) can indicate the likely population of origin of a DNA sample when the source individual is unknown or unable to declare their ancestry. They are an increasingly important part of genetic association studies but to date have not been developed into a practical forensic test. A range of DNA polymorphisms have potential as AIMs, including autosomal and Y-chromosome short tandem repeats (STRs) and mitochondrial sequence variation (mtDNA) [1], [2], [3], [4], [5], but these have limitations. Microsatellites do not exhibit large enough contrasts in allele frequencies between populations to be especially useful in sets of fewer than 50 loci, mainly due to their mutational instability. Y-chromosome loci and mtDNA variation, while phylogeographically informative [6], [7], [8], are haploid, so very large databases are required to properly gauge population variability, and there is a risk of finding intact lineages atypical of the population [9]. Autosomal SNPs have emerged as amongst the best ancestry markers due to their stability, density of distribution and full range of allele frequency patterns amongst populations. Since the overriding majority of human worldwide genetic diversity takes the form of geographic clines rather than clades [10], [11], [12], it is essential to find the small number of SNPs that show the most pronounced allele frequency discontinuities between continental regions to create marker sets with population “diagnostic” genotypes [13]. One approach to locating such SNPs is to examine gene variation that has been subjected to strong regional positive selection in the recent past, creating localized adaptations [14], [15], [16]. Well-documented examples [reviewed in 17] include SLC45A2 and de-pigmentation in Europe [18], DARC and Plasmodium vivax resistance in sub-Saharan Africa [19] and LCT, implicated in pastoralist adaptation in Northern Europe [20].
We examined these genes and others to collect highly regionalized SNP variation.
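As a rough illustration of what "pronounced allele frequency discontinuities" means in practice, the sketch below ranks candidate SNPs by the largest absolute allele frequency difference between any two population-groups. This is only one simple informativeness measure and is not necessarily the criterion used in this study. The SNP names, group labels and frequencies are hypothetical placeholders, not data from the paper.

```python
from itertools import combinations

# Hypothetical allele frequencies per population-group (illustrative only,
# not values reported in this study).
freqs = {
    "SNP_A": {"AFR": 0.02, "EUR": 0.97, "EAS": 0.05},
    "SNP_B": {"AFR": 0.45, "EUR": 0.50, "EAS": 0.55},
    "SNP_C": {"AFR": 0.99, "EUR": 0.01, "EAS": 0.02},
}


def max_delta(group_freqs):
    """Largest absolute allele frequency difference between any two groups."""
    return max(
        abs(group_freqs[a] - group_freqs[b])
        for a, b in combinations(group_freqs, 2)
    )


def rank_snps(freqs):
    """Return SNP names sorted from most to least differentiated."""
    return sorted(freqs, key=lambda snp: max_delta(freqs[snp]), reverse=True)


ranking = rank_snps(freqs)
for snp in ranking:
    print(snp, round(max_delta(freqs[snp]), 2))
```

A SNP such as the hypothetical SNP_B, with similar frequencies in every group, falls to the bottom of the ranking and would contribute little to a small multiplex.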
In this study we aimed to develop a suitably powerful single-tube SNP test that showed the least error and was based on the most informative AIMs available. We set four goals: (1) to select SNPs that, in the first instance, gave a clear differentiation of sub-Saharan African, European and East Asian population-groups; (2) to validate allele frequencies to ensure that within population-group variation was a minor proportion of total variability; (3) to balance the chromosome distribution of the final set to avoid linkage disequilibrium between SNP pairs; and (4) to establish a straightforward Bayesian system for predicting ancestral origin and to estimate the misclassification rate, both by statistical means and by testing the CEPH human genome diversity cell line panel (CEPH-HGDP), which comprises samples of confirmed geographic origin [21].
A forensic test handling single profiles requires a fast and flexible alternative to the widely used genetic clustering algorithm STRUCTURE [22] to offer easier classifications in real time. Therefore, the final stage of development of the test outlined here was the incorporation of the classification algorithm into an open access web portal to allow simple analysis of SNP profiles, including those with partial data. This portal was enhanced to allow analysis of a user's custom populations and SNP markers with the same Bayesian classification algorithm and error estimation systems.
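The general idea of a Bayesian classifier that tolerates partial profiles can be sketched as follows. This is a minimal naive Bayes illustration, not the authors' implementation or the portal's code. It assumes independent SNPs, Hardy–Weinberg genotype probabilities within each group and equal priors by default. The group labels, SNP names and allele frequencies are hypothetical.

```python
import math

# Hypothetical frequencies of the counted allele in each population-group
# (illustrative only, not values reported in this study).
ALLELE_FREQS = {
    "AFR": {"snp1": 0.05, "snp2": 0.90, "snp3": 0.10},
    "EUR": {"snp1": 0.95, "snp2": 0.20, "snp3": 0.15},
    "EAS": {"snp1": 0.10, "snp2": 0.15, "snp3": 0.90},
}


def genotype_prob(p, count):
    """Hardy-Weinberg probability of carrying `count` (0, 1 or 2) copies
    of an allele whose frequency is p."""
    q = 1.0 - p
    return (q * q, 2 * p * q, p * p)[count]


def classify(profile, freqs=ALLELE_FREQS, priors=None):
    """Return posterior probabilities per group for one SNP profile.

    `profile` maps SNP name -> allele count (0, 1, 2), or None when the
    genotype is missing. Missing SNPs are skipped, so partial profiles
    still yield a result. Frequencies of exactly 0 or 1 would need a
    pseudo-count before taking logs; this sketch assumes none occur.
    """
    groups = list(freqs)
    if priors is None:
        priors = {g: 1.0 / len(groups) for g in groups}  # equal priors (assumption)
    log_post = {}
    for g in groups:
        total = math.log(priors[g])
        for snp, count in profile.items():
            if count is None:
                continue  # no information from an untyped SNP
            total += math.log(genotype_prob(freqs[g][snp], count))
        log_post[g] = total
    # Normalize in log space to avoid underflow with many SNPs.
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


posterior = classify({"snp1": 2, "snp2": 0, "snp3": None})
best_group = max(posterior, key=posterior.get)
print(best_group, posterior)
```

Because each SNP contributes an independent term, dropping an untyped SNP simply removes its term from the sum, which is why a partial profile can still be classified, albeit with less certainty.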
Population samples
Training sets for the classification algorithm were created for each population-group by combining two population samples: sub-Saharan Africans (60 Mozambican and 60 Somali), Europeans (60 Galician from NW Spain and 60 Danish) and East Asians (60 Mainland Chinese and 60 Taiwanese). In all cases informed consent was obtained. Except for Somalis resident in Denmark, samples were collected in the corresponding geographic region. The CEPH-HGDP panel comprising 1064 samples from 51
Patterns of SNP variability
The allele frequency distributions for 34 SNPs in the three population-groups studied are outlined in Fig. 2. To compare the training set and CEPH-HGDP frequencies the populations from each were combined in their group affiliations separately and arranged in paired plots. Allele frequencies for the 58 populations studied are listed in Table S2 in online supplementary data. All populations were in Hardy–Weinberg equilibrium and pair-wise analysis did not detect linkage disequilibrium within the
Discussion
An ancestral origin test limited to a single multiplex runs the risk of failing to adequately differentiate the population-groups analyzed by the test. Thirty-four SNPs is an upper limit for a primer extension assay, but the number can be extended using systems such as Genplex dye-linked oligo-ligation, which has proven forensic performance [30]. Nevertheless, tests using small SNP numbers must still maximize allele frequency differences between groups to have any chance of success on a broad enough scale. The
Acknowledgements
The African–American panel was supplied by Peter Vallone and John Butler at NIST and the authors are indebted to them for making these samples available. The work was supported by the European Commission GROWTH program, SNPforID project, contract G6RD-CT-2002-00844. Funding from Xunta de Galicia (PGIDTIT06PXIB228195PR) and a grant from the Ministerio de Educación y Ciencia (project BIO2006-06178) given to MVL supported this project. The ‘Ramón y Cajal’ Spanish programme from the Ministerio de
References (30)
- et al. The distribution of human genetic diversity: a comparison of mitochondrial, autosomal and Y-chromosome data. J. Hum. Genet. (2000)
- et al. Inferring ethnic origin by means of an STR profile. Forensic Sci. Int. (2001)
- et al. Informativeness of genetic markers for inference of ancestry. Am. J. Hum. Genet. (2003)
- et al. The making of the African mtDNA landscape. Am. J. Hum. Genet. (2002)
- et al. Charting the ancestry of African–Americans. Am. J. Hum. Genet. (2005)
- et al. Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. (2002)
- et al. The SNPforID consortium, evaluation of the Genplex SNP typing system and a 49-plex forensic marker panel. Forensic Sci. Int. Genet. (2007)
- et al. A classifier for the SNP-based inference of ancestry. J. Forensic Sci. (2003)
- et al. Inferring the most likely geographical origin of mtDNA sequence profiles. Ann. Hum. Genet. (2005)
- et al. The human Y chromosome: an evolutionary marker comes of age. Nat. Rev. Genet. (2004)
- Africans in Yorkshire? The deepest-rooting clade of the Y phylogeny within an English genealogy. Eur. J. Hum. Genet.
- Evidence for gradients of human genetic diversity within and among continents. Genome Res.
- Geography is a better determinant of genetic differentiation than ethnicity. Hum. Genet.
- Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet.
- Genetic structure of human populations. Science