Forensic features and genetic legacy of the Baloch population of Pakistan and the Hazara population across Durand-line revealed by Y chromosomal STRs

Atif Adnan; Shao-Qing Wen; Allah Rakha; Rashed Alghafri; Shahid Nazir; Muhammad Rehman; Chuan-Chao Wang; Jie Lu

doi:10.1101/2020.11.21.392456

ABSTRACT

The Hazara population across the Durand Line has experienced extensive interaction with Central Asian and East Asian populations. Hazara individuals typically have Mongolian facial features and regard themselves as descendants of Genghis Khan’s army. The people who speak the Balochi language are called Baloch. Earlier collaborative worldwide analyses of Y-chromosomal haplotype diversity, based on rapidly mutating (RM) Y-STRs and on the PowerPlex Y23 System (Promega Corporation, Madison, USA), did not include the Baloch and Hazara populations of Pakistan or the Hazara population of Afghanistan. The available data cover few markers and samples and therefore describe these populations poorly. In the current study, we examined the Yfiler Plus PCR Amplification Kit loci in 260 unrelated Hazara individuals from Afghanistan and in 153 Hazara and 111 Baloch individuals from Baluchistan, Pakistan. For the combined Hazara populations from Afghanistan and Pakistan, 380 different haplotypes were observed at these 27 Y-STR loci, gene diversities ranged from 0.51288 (DYS389I) to 0.9257 (DYF387S1), and haplotype diversity was 0.9992 ± 0.0004. In the Baloch population, every individual had a unique 27-locus haplotype, and gene diversity ranged from 0.5718 (DYS460) to 0.9371 (DYF387S1). In the Hazara populations, twelve haplotypes were shared among 178 individuals, and just two of these twelve haplotypes were shared among 87 individuals. Pairwise Rst and Fst genetic distance analyses, a multidimensional scaling (MDS) plot, a neighbor-joining (NJ) tree, linear discriminant analysis (LDA), and median-joining networks (MJNs) shed light on the history of the Hazara and Baloch populations. Interestingly, null alleles with specific mutation patterns were observed at DYS448 in the Hazara populations.
Our results show that the Yfiler Plus PCR Amplification Kit marker set provides substantially greater discriminatory power than smaller Y-STR marker sets in the Baloch population of Pakistan and the Hazara populations across the Durand Line.

INTRODUCTION

Patterns of variation in human DNA usually reflect a balance between natural selection and neutral processes. Analysis of Y-chromosomal variants is very helpful for determining present and past patterns of gene flow between populations¹. Y-chromosome short tandem repeats (Y-STRs) play an important role in forensic molecular biology²⁻⁵. Y-STRs are typically used (i) to determine the male component of DNA mixtures with a high female DNA background, as is typical of materials from sexual assault cases⁶; (ii) to test paternal relationships between males, particularly in deficiency paternity cases in which the mother is not available⁷; (iii) in special missing-person cases or (iv) disaster-victim identification involving males⁸; and (v) for evolutionary studies, because male relatives share the same haplotype whereas haplotype distributions differ between individuals within a population group, or (vi) between geographic regions or ethnic groups. In general, more paternal lineages can be differentiated as the number of Y-STRs increases⁹, for example with the PowerPlex Y Kit (Promega) containing 12 Y-STRs¹⁰, the AmpFlSTR Yfiler PCR Amplification Kit (Life Technologies; hereafter Yfiler) containing 17 Y-STRs¹¹, or the PowerPlex Y23 Kit (Promega) containing 23 Y-STRs¹², relative to the initially proposed 9-locus haplotype¹³. Applied Biosystems subsequently developed the Yfiler Plus PCR Amplification Kit¹⁴, which provides enhanced discrimination power because it includes the Yfiler loci plus 10 additional Y-STRs, 6 of which are rapidly mutating (RM) Y-STRs. These RM Y-STRs have mutation rates of more than one mutation per 100 generations per locus (μ > 10⁻²), higher than those of all other commonly used Y-STRs.
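As a rough, illustrative calculation of why RM Y-STRs help to separate close paternal relatives, assume that mutations at different loci occur independently and take a per-locus rate of μ = 10⁻², the lower bound of the RM definition (an assumption for illustration, not a rate estimated in this study). For the k = 6 RM Y-STRs in the Yfiler Plus kit, the probability that a father and son differ at one or more of them is then

$$P(\geq 1\ \text{mutation}) = 1 - (1-\mu)^{k} = 1 - (1 - 0.01)^{6} \approx 0.0585$$

so roughly 6% of father–son pairs would be expected to differ at these six loci alone, and higher per-locus rates increase this proportion.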
Molecular and cytogenetic studies have revealed many structural variants within the human Y chromosome, including deletions¹⁵⁻¹⁷, duplications¹⁸⁻²⁰, and inversions¹⁹⁻²³. Null alleles (allele drop-out) are a well-known phenomenon that can occur with any PCR-based STR typing system; they can be caused by variants at primer-binding sites or by deletions within the target region²⁴,²⁵. DYS448 lies in the proximal part of the azoospermia factor c (AZFc) region, which is important in spermatogenesis and is made up of “ampliconic” repeats that act as substrates for nonallelic homologous recombination (NAHR). NAHR can delete large blocks of the Y chromosome that include DYS448²⁶. DYS448 null alleles are observed more commonly in Central and East Asian populations, and in the Hazara population of Pakistan their frequency exceeded 16%²⁷.

The Durand Line is a boundary established in the Hindu Kush in 1893, running through the tribal lands between Afghanistan and British India (modern-day Pakistan) and marking their respective spheres of influence. Named after Sir Mortimer Durand, the line, once recognized, settled the Indo-Afghan frontier problem for the rest of the British period; it is now the established border between Afghanistan and Pakistan. The origin of the Hazara population is disputed. The Hazara may be of Turko-Mongol ancestry and have been theorized to descend from an occupying army left in Afghanistan by Genghis Khan in the thirteenth century²⁸. The Hazara speak Persian with some Mongolian loanwords. The worldwide Hazara population is about 4.5 million. Afghanistan is considered the homeland of the Hazara (3 million), who form its third-largest ethnic group (9%) after the Pashtuns (42%) and Tajiks (27%)²⁹, whereas in Pakistan the Hazara are a distinct but small group comprising 0.08% of the total population (http://www.pbscensus.gov.pk). The tribes who speak the Balochi language are called Baloch³⁰. The Baloch make up 3.6% of the total Pakistani population (http://www.pbscensus.gov.pk) and are also found in neighboring areas of Iran and Afghanistan. The original Baloch homeland probably lay on the Iranian plateau. The Baloch are mentioned in Arabic chronicles of the 10th century, and the Seljuq invasion of Kerman in the 11th century started their eastward migration³⁰.

In this study, we investigated the Baloch and Hazara populations of Pakistan and the Hazara population of Afghanistan at 27 Y-STRs to characterize their genetic history and gene diversity. These data define the Hazara and Baloch populations more fully and supplement the Y-STR Haplotype Reference Database (YHRD).

2. RESULTS AND DISCUSSION

2.1 Allelic frequencies and Forensic parameters

We successfully genotyped 524 individuals from three ethnic groups (the Baloch population and the Hazara populations of Afghanistan and Pakistan) (Supplementary Table 1). Allele frequencies and gene diversity values for the Baloch ethnic group from Baluchistan, Pakistan, and the Hazara ethnic groups from Pakistan and Afghanistan are shown in Supplementary Table 2.

DYF387S1 showed the highest gene diversity (heterozygosity) in the Baloch population and in the Hazara populations from Afghanistan and Pakistan, with values of 0.9371, 0.9242, and 0.8792, respectively. Among single-copy Y-STR markers, DYS570 (0.8624) showed the highest and DYS437 (0.2383) the lowest gene diversity overall. Within the individual populations, the single-copy markers with the highest gene diversity were DYS570 (0.8624), DYS449 (0.8468), and DYS627 (0.7949), and those with the lowest were DYS460 (0.5718), DYS391 (0.3916), and DYS437 (0.2383), in the Baloch, Afghan Hazara, and Pakistani Hazara populations, respectively. In the pooled Hazara sample, DYF387S1 and DYS437 showed the highest and lowest gene diversity, at 0.9257 and 0.4053, respectively. Across the 27 Y-STRs, we observed 222, 240, and 188 alleles in the Baloch, Afghan Hazara, and Pakistani Hazara populations, respectively.

Allelic frequencies ranged from 0.0090 to 0.6036 in the Baloch population, 0.0038 to 0.6654 in the Hazara population from Afghanistan, and 0.0065 to 0.8627 in the Pakistani Hazara population.

We evaluated forensic parameters at seven marker-set levels (Table 2): the minimal haplotype (MHT; 9 loci: DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, and DYS385a/b); the 11-locus extended haplotype (MHT + DYS438 and DYS439); the PowerPlex Y12 loci (extended haplotype + DYS437); the 17 Yfiler loci (PowerPlex Y12 + DYS448, DYS456, DYS458, DYS635, and Y_GATA_H4); the Y21 loci (Yfiler + DYS481, DYS533, DYS570, and DYS576); the 27 Yfiler Plus loci (Y21 + DYF387S1, DYS449, DYS460, DYS518, and DYS627); and the 6 RM Y-STRs (DYS570, DYS576, DYF387S1, DYS449, DYS518, and DYS627). In the Baloch population of Pakistan, discrimination capacity (DC) ranged from 87.38% (MHT) to 100% (Yfiler Plus loci), random match probability (MP) from 0.0162 (MHT) to 0.009 (Yfiler Plus loci), and haplotype diversity (HD) from 0.9928 (MHT) to 1.0 (Yfiler Plus loci). In the Pakistani Hazara population, from the MHT to the Yfiler Plus loci, DC ranged from 47.06% to 99.35%, MP from 0.0745 to 0.0066, and HD from 0.9316 to 0.9999. In the Hazara population from Afghanistan, from the MHT to the Yfiler Plus loci, DC ranged from 41.15% to 88.46%, MP from 0.0329 to 0.0057, and HD from 0.9708 to 0.9937. For the pooled Hazara sample, from the MHT to the Yfiler Plus loci, DC ranged from 40.19% to 92%, MP from 0.0334 to 0.0032, and HD from 0.9689 to 0.9992. Notably, the six RM Y-STRs included in the Yfiler Plus kit detected high haplotype diversity on their own (Table 2).
We observed 101 different haplotypes (90.99%) among the 111 Baloch individuals, of which 95 (85.58%) were unique; 139 different haplotypes (90.85%) among the 153 Pakistani Hazara, of which 131 (85.62%) were unique; and 188 different haplotypes (72.31%) among the 260 Afghan Hazara, of which 152 (58.46%) were unique. These six RM Y-STRs showed almost the same diversity as the PowerPlex Y23 loci. Overall, the Yfiler Plus loci showed strong discrimination capacity, high haplotype diversity, and low random match probabilities, which makes the kit useful for forensic identification and paternity testing in the three ethnic groups (Baloch and Hazara from Pakistan and Hazara from Afghanistan).
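The parameters in Table 2 follow the standard formulas described in Section 3.7. A minimal Python sketch of how they can be computed from a list of haplotypes follows; it assumes each haplotype is stored as a tuple of allele calls and that DC is the number of different haplotypes divided by the sample size. It is an illustration with a hypothetical toy sample, not the software used in this study.

```python
from collections import Counter
from typing import Hashable, Sequence


def diversity(items: Sequence[Hashable]) -> float:
    """Unbiased diversity n / (n - 1) * (1 - sum of squared frequencies).

    For whole haplotypes this is haplotype diversity (HD); for the allele
    calls at a single locus it is that locus's gene diversity.
    """
    n = len(items)
    sum_sq = sum((c / n) ** 2 for c in Counter(items).values())
    return n / (n - 1) * (1 - sum_sq)


def match_probability(haplotypes: Sequence[Hashable]) -> float:
    """Random match probability (MP): sum of squared haplotype frequencies."""
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in Counter(haplotypes).values())


def discrimination_capacity(haplotypes: Sequence[Hashable]) -> float:
    """DC: number of different haplotypes divided by the sample size."""
    return len(set(haplotypes)) / len(haplotypes)


def unique_count(haplotypes: Sequence[Hashable]) -> int:
    """Number of haplotypes observed exactly once in the sample."""
    return sum(1 for c in Counter(haplotypes).values() if c == 1)


# Hypothetical toy sample: four men typed at DYS19, DYS390 and DYS391
sample = [(14, 24, 10), (14, 24, 10), (15, 23, 10), (16, 25, 11)]
print(round(diversity(sample), 4))                  # 0.8333 (HD)
print(match_probability(sample))                    # 0.375
print(discrimination_capacity(sample))              # 0.75
print(unique_count(sample))                         # 2
print(round(diversity([h[2] for h in sample]), 4))  # 0.5 (gene diversity at DYS391)
```

As a consistency check, a sample of 111 distinct haplotypes, as in the Baloch population at the 27 Yfiler Plus loci, gives MP = 1/111 ≈ 0.009 and HD = 1.0, matching the values reported above.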


Table 1:

Central, East, and South Asian populations used as reference populations in the LDA, NJ tree, and multidimensional scaling (MDS) analyses.


Table 2:

Forensic parameters at seven marker-set levels in the three ethnic groups

2.2 Phylogenetic analyses and Population comparisons

Because the anthropological and ethnohistorical relationships between the studied populations and the reference populations included in the analysis were already known, we used two different distance measures and judged them by their agreement with these a priori expectations. Fst is a standardized variance in haplotype frequencies and assumes that genetic drift is the agent differentiating populations. Rst is a standardized variance in allele size (repeat number) that accounts for both drift and mutation as causes of population differentiation, assuming a stepwise model in which each mutation creates a new allele by adding or deleting a single repeat unit. To assess the relationships between the three studied populations (Baloch, Pakistani Hazara, and Afghan Hazara) and the other relevant populations summarized in Table 1, we calculated pairwise genetic distances (Rst and Fst) and their p-values, which are shown in Supplementary Table 3. These Rst and Fst values were visualized as hierarchically clustered heat maps (Supplementary Figure 1 a & b). The resulting dendrograms summarize the organization of the data and can be compared with the NJ tree and MDS plot, and applying average-linkage dendrograms to Y-STR data provides a consistent basis for comparison. In the Rst-based heat map, the Hazara from Pakistan clustered more closely with Central and East Asian populations (e.g., Kazakh and Mongol), the Baloch population clustered with other Pakistani populations (e.g., Pathan and Sindhi), and the Hazara from Afghanistan clustered with local Afghan populations. In contrast, in the Fst-based heat map the Hazara population from Pakistan clustered tightly with local populations (e.g., Baloch, Arain, and Pathan), while the Hazara population from Afghanistan clustered with the Afghan Pathan and Northern Talysh populations.
The pattern of inter-population diversity from Rst agreed with anthropological knowledge, whereas the Fst-based pattern revealed unexpected and unconvincing population affinities, consistent with our previous study³¹. Pairwise Rst values between the Baloch and the other relevant populations ranged from −0.0402 to 0.1417: the Baloch population of Pakistan was closest to the Turks from Ardabil, Iran (−0.0402) and most distant from the Kazakhs from Gansu, China (0.1417). Both the Afghan Hazara (0.0009) and the Pakistani Hazara (0.0381) were closest to the Afghan population from Afghanistan. To investigate the paternal relationships among the three studied populations and the reference populations, we generated an MDS plot (Figure 1) from the pairwise Rst matrix in Supplementary Table 3. In the MDS plot, the Hazara population from Afghanistan lies close to the Afghan population from Afghanistan and the Pathan population from northern Afghanistan, similar to the results of another study³², whereas the Pakistani Hazara lie closer to the Kazakh and Mongolian populations, similar to our previous results²⁷,³³.
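To make the Fst/Rst contrast concrete, the Python sketch below computes a simple Hudson-style two-population estimator, 1 - H_within / H_between, with two interchangeable distances: a mismatch distance (whether two haplotypes differ at all), which behaves like Fst, and the sum of squared repeat-number differences, which behaves like Rst under the stepwise model. This is a simplified moment estimator for illustration, not the AMOVA-based Fst and Rst computed in Arlequin, and the toy haplotypes are hypothetical.

```python
from itertools import combinations, product
from statistics import mean
from typing import Callable, Sequence, Tuple

Haplotype = Tuple[int, ...]
Distance = Callable[[Haplotype, Haplotype], float]


def mismatch(a: Haplotype, b: Haplotype) -> float:
    """Fst-like distance: 1 if the haplotypes differ at any locus, else 0."""
    return float(a != b)


def squared_repeat_difference(a: Haplotype, b: Haplotype) -> float:
    """Rst-like distance: sum over loci of squared repeat-number differences."""
    return float(sum((x - y) ** 2 for x, y in zip(a, b)))


def fixation_index(pop1: Sequence[Haplotype], pop2: Sequence[Haplotype],
                   distance: Distance) -> float:
    """Hudson-style estimator 1 - H_within / H_between for one distance.

    Each population needs at least two haplotypes, and the mean
    between-population distance must be non-zero.
    """
    within = mean([
        mean([distance(a, b) for a, b in combinations(pop, 2)])
        for pop in (pop1, pop2)
    ])
    between = mean([distance(a, b) for a, b in product(pop1, pop2)])
    return 1 - within / between


# Hypothetical single-locus samples: all haplotypes are distinct, but the
# second population carries systematically longer alleles
pop_a = [(10,), (11,)]
pop_b = [(12,), (13,)]
print(fixation_index(pop_a, pop_b, mismatch))                            # 0.0
print(round(fixation_index(pop_a, pop_b, squared_repeat_difference), 3))  # 0.778
```

Because every toy haplotype is distinct, the mismatch-based value is 0, while the repeat-size-based value is about 0.78 because the second population's alleles are systematically longer. Moment estimators of this kind can also return small negative values, like the −0.0402 reported above, when within-population differences exceed between-population differences by chance.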

Figure 1:

Two-dimensional plot from multi-dimensional scaling analysis of Rst-values based on Yfiler haplotypes for the Baloch population of Pakistan and Hazara populations across the Durand line with reference populations.

According to Fst values, the Afghan Hazara population is closest to the Afghan population (0.0053), followed by the Hazara population from Balochistan, Pakistan (0.0057), and the Iranian population from Mashhad, Iran (0.0077). Evolutionary relationships between the Baloch and Hazara populations of Pakistan, the Hazara population of Afghanistan, and the reference populations were inferred from a neighbor-joining tree based on Fst values (Figure 2). In neighbor-joining trees, an admixed population lies on the path between its source populations³⁴. Overall, the NJ tree grouped the 62 populations into 14 clusters; the Baloch population fell in the second cluster together with Southwest Asian populations, and the Hazara populations from Pakistan and Afghanistan fell in the fourth cluster together with Afghan and Iranian populations. As with the heat maps, the Rst-based pattern of inter-population diversity was consistent with ethnohistorical and anthropological knowledge, whereas the Fst-based pattern showed surprising and implausible population affinities.
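For readers less familiar with the method, the Python sketch below implements the textbook Saitou–Nei neighbor-joining algorithm on a symmetric distance matrix such as a pairwise Fst matrix. It is not MEGA7's implementation; the internal node names (node1, node2, and so on, plus a final "centre" node) and the five-taxon example matrix are illustrative, and ties in the Q-criterion are broken by input order.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Edge = Tuple[str, str, float]  # (parent node, child node, branch length)


def neighbor_joining(labels: Sequence[str],
                     matrix: Sequence[Sequence[float]]) -> List[Edge]:
    """Build an unrooted NJ tree from a symmetric distance matrix (>= 3 taxa)."""
    # Dict-of-dicts copy of the distance matrix, keyed by node label
    d = {a: {b: matrix[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges: List[Edge] = []
    counter = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r_i: summed distance from i to every other active node
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        counter += 1
        u = f"node{counter}"
        # Branch lengths from the two joined nodes to the new internal node
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        # Distances from the new node to the remaining active nodes
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Attach the last three nodes to a single central node
    a, b, c = nodes
    edges += [
        ("centre", a, (d[a][b] + d[a][c] - d[b][c]) / 2),
        ("centre", b, (d[a][b] + d[b][c] - d[a][c]) / 2),
        ("centre", c, (d[a][c] + d[b][c] - d[a][b]) / 2),
    ]
    return edges


# Classic five-taxon additive example (hypothetical, not study data)
labels = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
for edge in neighbor_joining(labels, dist):
    print(edge)
```

Because the example matrix is additive, the sketch recovers the leaf branch lengths a = 2, b = 3, c = 4, d = 2, and e = 1 exactly. Real Fst matrices are rarely additive, and NJ can then produce short negative branch lengths, which some programs set to zero.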

Figure 2:

Neighbor-joining tree based on the Fst values between the Baloch population of Pakistan and Hazara populations across the Durand line with reference populations.

2.3 Inference of ancestry based on Y STRs

Y haplogroups were predicted with the online Y-DNA haplogroup predictor NEVGEN (http://www.nevgen.org/). C2 (previously known as the C3-Star cluster) was the most frequent haplogroup in the Pakistani and Afghan Hazara.

The median-joining network of haplotypes (Figure 3) showed a bulky central star-like cluster representing the predicted haplogroup C2 (M217) and another large cluster representing haplogroup R1a (M420), each comprising many identical or highly similar haplotypes. Such features are usually interpreted as signatures of past male-lineage expansions³⁵. Star-like clusters of haplogroup C2 (M217) haplotypes have been reported previously in Hazara, Mongol, and Kazakh populations²⁷,³³,³⁶, and this star cluster has been proposed to have originated in Mongolia about 1,000 years ago³⁶. The frequency of haplogroup R is 36.03% in the Baloch population, 22.22% in the Pakistani Hazara, and 21.15% in the Afghan Hazara. This haplogroup originated in North Asia about 27,000 years ago (http://isogg.org/tree/index.html). R is one of the most frequent haplogroups in Europe, with its branches reaching 80% of the population in some regions. One branch is believed to have originated in the Kurgan culture, which has been proposed as the culture of the first speakers of Indo-European languages and credited with the domestication of the horse³⁷. From somewhere in Central Asia, some descendants of the man carrying the M207 mutation on the Y chromosome headed south and arrived in India about 10,000 years ago³⁸; haplogroup R is one of the most frequent haplogroups in Pakistan and North India. Haplogroup L1 has a frequency of 22.5% in the Baloch population and 1.53% in the Afghan Hazara, compared with about 7–15% in populations of the Indian subcontinent³⁹,⁴⁰. Genetic studies suggest that L1 may be one of the original haplogroups of the creators of the Indus Valley Civilization⁴¹,⁴². Its frequency is about 28% in Pakistan and Baluchistan, from where the agricultural creators of this civilization emerged⁴³, and its origins have been traced to the rugged, mountainous Pamir Knot region in Tajikistan³⁸.

Figure 3:

The median-joining network of the Baloch population of Pakistan and Hazara populations across the Durand line based on 20 Y STRs.

In an earlier study³⁶, the star-cluster (C3) profile for DYS389I-DYS389b-DYS390-DYS391-DYS392-DYS393-DYS388-DYS425-DYS426-DYS434-DYS435-DYS436-DYS437-DYS438-DYS439 was 10-16-25-10-11-13-14-12-11-11-11-12-8-10-10. In the present study, the most frequent haplotype in the Pakistani Hazara population at loci DYS19-DYS389I-DYS389II-DYS390-DYS391-DYS392-DYS393-DYS437-DYS438-DYS439 was 15-13-29-24-10-11-13-14-11-12, observed in 43 individuals, followed by 14-13-29-24-8-11-13-14-11-11 (9 individuals) and 15-13-29-24-11-11-13-14-11-12 (8 individuals). In the Afghan Hazara, the haplotypes 16-13-29-25-10-11-13-14-10-10, 15-13-29-24-10-11-13-14-11-12, 14-12-28-23-10-11-12-15-9-11, 14-13-29-24-11-13-12-15-12-12, and 15-14-32-25-11-11-13-14-9-10 were observed in 30, 17, 15, 12, and 11 individuals, respectively. These haplotypes have previously been observed in Mongols and Kazakhs³⁵. At these 10 Y-STRs, allele ranges were similar in the Kazakh population from Kazakhstan, Central Asia³⁵, and nearly similar in the Mongol population from Inner Mongolia. Our earlier study³¹ showed that the Hazara have a close genetic affinity with Turkic-speaking (Kazakh, Kyrgyz, and Uyghur) and Mongolian peoples, and admixture and outgroup analyses further indicated that the Hazara derive about 57.8% of their gene pool from Mongolians.
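How often a haplotype recurs depends on which loci are compared: men who share the 10-locus haplotype above can still differ at the remaining Yfiler Plus loci. A minimal Python sketch of this counting step, assuming each profile is stored as a dict mapping locus names to integer alleles (a hypothetical layout, with made-up example values):

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# The ten loci used for the haplotype strings in the text, in the same order
LOCI = ["DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391",
        "DYS392", "DYS393", "DYS437", "DYS438", "DYS439"]


def most_common_haplotypes(profiles: Sequence[Dict[str, int]],
                           loci: Sequence[str] = LOCI,
                           top: int = 3) -> List[Tuple[str, int]]:
    """Project each profile onto `loci` and count identical haplotypes."""
    keys = ("-".join(str(p[locus]) for locus in loci) for p in profiles)
    return Counter(keys).most_common(top)


# Hypothetical profiles: the first two differ only at DYS570,
# which lies outside the 10-locus subset
base = dict(zip(LOCI, [15, 13, 29, 24, 10, 11, 13, 14, 11, 12]))
profiles = [
    {**base, "DYS570": 18},
    {**base, "DYS570": 19},
    {**base, "DYS19": 14, "DYS570": 18},
]
print(most_common_haplotypes(profiles))
# [('15-13-29-24-10-11-13-14-11-12', 2), ('14-13-29-24-10-11-13-14-11-12', 1)]
```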

We also tested an anecdotal hypothesis that Hazaras living in Pakistan are more endogamous, marrying only other Hazaras, whereas Hazaras across the Durand Line intermarry with other ethnic groups in Afghanistan. The gene diversity and F-statistics results are consistent with this hypothesis: all loci showed greater diversity in the Hazara population from Afghanistan than in the Hazara population from Pakistan (Figure 4), and F-statistics between the two Hazara populations showed differentiation at only four loci (DYS393, 0.05002; DYS449, 0.01694; DYF387S1, 0.00662; DYS385a/b, 0.00004) (Supplementary Table 4). These differences may reflect sampling effects, population diversity, or geographical barriers. Linear discriminant analysis (LDA), a transformation technique commonly used to explore genetic diversity, was performed on the Hazara, Central Asian, South Asian (including Baloch), East Asian, and Russian samples to explore their genetic affinities. Figure 5 shows all individual samples plotted on the first two LDA factors (F1 and F2); the LDA plot showed an association of the Hazara populations with East and Central Asian populations.

Figure 4:

Scatter plot of per-locus gene diversity (heterozygosity) for the three populations

Figure 5:

LDA Analysis between the Baloch population of Pakistan and Hazara populations across the Durand line, Central Asia, South Asia, Russia, and East Asian populations.

2.4 Physical characterization of DYS448 deletions

Using the Yfiler Plus kit, we observed a null allele at DYS448 in 29 individuals of the Hazara population from Afghanistan (Figure 6). Null alleles can be caused by deletions within the target region or by primer-binding-site variants that destabilize hybridization of at least one of the primers flanking the target region⁴⁴⁻⁴⁷. This phenomenon has been reported previously with other commercial kits⁴⁸⁻⁵³. The populations in the current study show the highest frequencies of the DYS448 null allele of any population reported to date (Table 3). The core repeat motif of the DYS448 locus is the hexanucleotide AGAGAT⁵⁴, and DYS448 has two polymorphic domains separated by an invariant 42-bp region.
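To illustrate how a DYS448 allele number relates to this two-domain structure, the Python sketch below counts AGAGAT units in the two longest uninterrupted runs of the motif and adds them. It assumes that the allele designation is the total number of repeats in both domains, and it uses a hypothetical 42-bp spacer of C bases in place of the real invariant sequence.

```python
import re

MOTIF = "AGAGAT"


def dys448_repeat_count(sequence: str) -> int:
    """Total AGAGAT units in the two longest uninterrupted motif runs.

    Assumes the allele number is the sum of the repeats in the two
    polymorphic domains on either side of the invariant N42 spacer.
    """
    runs = re.findall(f"(?:{MOTIF})+", sequence.upper())
    units = sorted((len(run) // len(MOTIF) for run in runs), reverse=True)
    return sum(units[:2])


# Hypothetical read: 11 repeats, a 42-bp placeholder spacer, then 8 repeats
spacer = "C" * 42  # stands in for the real invariant N42 sequence
read = MOTIF * 11 + spacer + MOTIF * 8
print(dys448_repeat_count(read))  # 19
```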

Figure 6:

Electropherogram of an individual showing null type at DYS448.


Table 3:

Frequencies of the null allele at DYS448 in various ethnic groups across continents

Among the 29 null alleles, long deletions covered at least the N42 region and the downstream AGAGAT core repeats, while smaller deletions also encompassed the upstream repeats (all alignments were relative to allele 20). The DYS448 null alleles in these 29 Afghan Hazara individuals were confirmed with the GoldenEye Y20 System kit, amplified with self-designed primers, and sequenced (Supplementary Table 5); the sequences were submitted to GenBank under accession numbers MN623385 to MN623413. In total, we observed 55 null alleles at DYS448 in the Hazara populations from Pakistan and Afghanistan. Interestingly, all 55 individuals with a DYS448 deletion belong to haplogroup C2, the most frequent haplogroup in Mongol and Kazakh populations. This high frequency of allele drop-out at DYS448 in the Hazara populations of Pakistan and Afghanistan supports a Kazakh and Mongol paternal origin. Whole-genome or Y-chromosome sequencing is required for further insight into this polymorphism. The DYS448 null allele is more frequent in Asia, particularly East and Central Asia, than in the rest of the world²⁶,⁴⁹, so kit manufacturers should pay special attention when designing DYS448 primers.

2.5 Concluding Remarks

Finally, our study demonstrates that the Yfiler Plus kit detects high haplotype diversity in the Baloch population of Pakistan and in the Hazara populations on both sides of the Durand Line (Pakistan and Afghanistan), two of which (Baloch and Afghan Hazara) had not previously been studied at the Yfiler Plus loci; the kit is therefore generally suitable for forensic casework in these groups. The recent inclusion of these data in the YHRD allows their widespread use for forensic and other purposes.

3. MATERIALS AND METHODS

3.1 Samples

A total of 524 blood samples were collected: 111 from Baloch individuals from Baluchistan, Pakistan; 153 from Hazara individuals from Hazara Town, Quetta, Baluchistan, Pakistan (participants in an earlier study²⁷ who agreed to the secondary use of their DNA samples); and 260 from Hazara individuals from Bamyan, Afghanistan. All participants were unrelated for at least three generations. All participants gave informed consent, either orally with a thumbprint (if they could not write) or in writing, after the study aims and procedures were carefully explained to them. This collaborative study was approved by the ethical review boards of China Medical University, Shenyang, Liaoning Province, People’s Republic of China (2019/067-P), the University of Health Sciences, Lahore, Pakistan (2017-CMU-1/14), and the Ministry of Public Health, Forensic Medicine Directorate, Kabul, Afghanistan (FC-2017-02). All experimental procedures were performed in accordance with the standards of the Declaration of Helsinki.

3.2. DNA extraction

Genomic DNA was extracted with the Axygen AxyPrep Blood Genomic DNA Miniprep Kit (Axygen Biosciences; CA, USA) according to the manufacturer’s protocol.

3.3 PCR Amplification

DNA was amplified with the Yfiler Plus PCR Amplification Kit (Thermo Fisher Scientific) on Applied Biosystems GeneAmp PCR System 9700 thermal cyclers. Amplification followed the manufacturer’s recommendations, except that half the recommended reaction volume (12.5 μl) was used.

3.4. Genotyping of 27 Y-STRs

PCR products were analyzed on an 8-capillary ABI 3500 DNA Genetic Analyzer with POP-4 polymer (Life Technologies) according to the manufacturer’s protocol. Genotypes were assigned with GeneMapper Software version 4.0 (Life Technologies) using the locus panel, allele bins, and allelic ladder supplied by the manufacturer. Genotype nomenclature followed the recommendations of the International Society for Forensic Genetics⁵⁵.

3.5. Confirmation of null alleles at DYS448

Samples with no allele call at DYS448 were re-amplified with the Goldeneye 20Y amplification kit (Goldeneye Technology Ltd.). Once the null allele had been confirmed with both kits (Yfiler Plus and GoldenEye 20Y), these samples were amplified and sequenced as described elsewhere²⁷.

3.6. Quality control

Our laboratory participated in and passed the 2015 YHRD quality assurance exercise. Haplotype data have been made available through the Y-chromosome Haplotype Reference Database (YHRD) under accession numbers YA004595 (Balochi; 61st release, 24 June 2019), YA004312-2 (Hazara, Pakistan), and YA004503 (Hazara, Afghanistan) (59th release, 1 November 2018). The sequences of the 29 samples with a null allele at DYS448 were submitted to GenBank under accession numbers MN623385 to MN623413 on 28 October 2019.

3.7. Statistical analysis

Allele and haplotype frequencies were computed by direct counting, and haplotype diversity (HD) was calculated as

$$HD = \frac{n}{n-1}\left(1 - \sum_{i} p_i^{2}\right)$$

where n is the number of males in the sample and p_i is the frequency of the i-th haplotype. Discrimination capacity (DC) was calculated as the ratio of the number of different haplotypes to the total number of haplotypes in the sample. Match probability (MP) was calculated as $MP = \sum_{i} p_i^{2}$. Genetic distances (Rst⁵⁶ and Fst⁵⁷⁻⁵⁹) between the reference populations and the studied populations were calculated on the overlapping STRs (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, and Y_GATA_H4) with Arlequin Software v3.5⁶⁰. We calculated both Rst and Fst because, under the generalized stepwise mutation model, Rst provides relatively unbiased estimates of migration rates and population divergence times, whereas Fst tends to overestimate population similarity, particularly when migration rates are low or divergence times are long⁵⁶. The populations were represented in reduced-dimensional space by multidimensional scaling (MDS) of the Rst values with IBM SPSS Statistics for Windows, Version 23.0 (IBM Corp., Armonk, NY, USA). Heat maps of the Rst and Fst values were generated in R v3.4.1 with the ggplot2 package.

Phylogenetic analysis

A neighbor-joining phylogenetic tree of the Hazara and reference populations was constructed from the Fst distance matrix with MEGA7⁶¹. We also predicted Y-SNP haplogroups from the Y-STR haplotypes (Yfiler loci) using the Y-DNA Haplogroup Predictor NEVGEN (http://www.nevgen.org), entering the 17 Yfiler loci in FTDNA order. Microvariant alleles were truncated to the next lower integer value, because the predictor’s database treats them in the same way. Haplotypes with null alleles or duplication variants in the Baloch or Hazara populations of Pakistan or Afghanistan were excluded from the analysis. The NEVGEN results were cross-checked with Athey’s Haplogroup Predictor (http://www.hprg.com/hapest5/index.html).
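A minimal Python sketch of the input preparation described above: truncating microvariants and excluding haplotypes with null alleles or duplications. The string conventions for nulls ("" or "0") and duplications ("11,12") are assumptions made for illustration, not the format of the study’s data files, and the sketch does not reproduce the FTDNA marker ordering.

```python
import math
from typing import Dict, Optional


def prepare_for_predictor(profile: Dict[str, str]) -> Optional[Dict[str, int]]:
    """Return integer alleles for haplogroup prediction, or None to exclude.

    Hypothetical conventions: a null allele is recorded as "" or "0", and a
    duplication as two comma-separated values such as "11,12". Multi-copy
    loci (e.g. DYS385a and DYS385b) are assumed to be stored as separate keys.
    """
    cleaned = {}
    for locus, call in profile.items():
        call = call.strip()
        if call in ("", "0") or "," in call:
            return None  # null allele or duplication: drop the whole haplotype
        # Microvariants such as "17.2" are truncated to the next lower integer
        cleaned[locus] = math.floor(float(call))
    return cleaned


print(prepare_for_predictor({"DYS19": "15", "DYS458": "17.2"}))
# {'DYS19': 15, 'DYS458': 17}
print(prepare_for_predictor({"DYS19": "15", "DYS448": "0"}))  # None
```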

Linear discriminant analysis

Linear discriminant analysis (LDA) was performed in R v3.4.1, with plots drawn using the ggplot2 package, for the Hazara (Pakistan), Hazara (Afghanistan), Central Asian, East Asian, Middle Eastern, and Southwest Asian (Baloch) samples⁶² on the overlapping STRs (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, and Y_GATA_H4). The multi-copy marker DYS385a/b, and haplotypes with null alleles or duplication variants in the Baloch or Hazara populations or in any reference population, were excluded from the analysis. For DYS389, DYS389I was subtracted from DYS389II, and the resulting DYS389II-I value was used in the analysis.
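The LDA itself was run in R. The Python sketch below only illustrates the preprocessing described in this paragraph: forming DYS389II-I, leaving out DYS385a/b, and excluding haplotypes with null alleles or duplications. The data layout (ints for alleles, None for a null allele, a tuple for a duplication) is a hypothetical convention, and the example allele values are made up.

```python
from typing import Dict, List, Optional

# Loci entering the LDA: the 15 overlapping STRs with DYS389I and DYS389II
# replaced by their difference, DYS389II-I
LDA_LOCI = ["DYS19", "DYS389II-I", "DYS390", "DYS391", "DYS392", "DYS393",
            "DYS437", "DYS438", "DYS439", "DYS448", "DYS456", "DYS458",
            "DYS635", "Y_GATA_H4"]


def lda_features(profile: Dict[str, object]) -> Optional[List[int]]:
    """Return the 14-value feature vector, or None if the haplotype is excluded.

    Hypothetical conventions: alleles are ints, a null allele is None and a
    duplication is a tuple such as (11, 12). DYS385a/b is not in LDA_LOCI,
    so it is ignored rather than causing exclusion.
    """
    needed = [locus for locus in LDA_LOCI if locus != "DYS389II-I"]
    needed += ["DYS389I", "DYS389II"]
    if any(not isinstance(profile.get(locus), int) for locus in needed):
        return None  # null allele, duplication or missing locus
    values = dict(profile)
    values["DYS389II-I"] = profile["DYS389II"] - profile["DYS389I"]
    return [values[locus] for locus in LDA_LOCI]


example = {"DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 24,
           "DYS391": 10, "DYS392": 11, "DYS393": 13, "DYS437": 14,
           "DYS438": 11, "DYS439": 12, "DYS448": 20, "DYS456": 15,
           "DYS458": 17, "DYS635": 23, "Y_GATA_H4": 12, "DYS385": (11, 14)}
print(lda_features(example))
# [15, 16, 24, 10, 11, 13, 14, 11, 12, 20, 15, 17, 23, 12]
print(lda_features({**example, "DYS448": None}))  # None
```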

The median-joining network

To define the genetic relationships among Baloch and Hazara individuals at 20 Y-STRs (DYS19, DYS389II-I, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS448, DYS456, DYS458, DYS635, Y_GATA_H4, DYS549, DYS460, DYS481, DYS533, DYS570, DYS576, DYS627), we used the stepwise mutation model and the median-joining/maximum-parsimony algorithm⁶³ in the program Network 5, as described at the Fluxus Engineering website (http://www.fluxus-engineering.com), with Y-STR weighting criteria as described previously²⁷. Haplotypes with null alleles or duplication variants in the Baloch or Hazara populations of Pakistan or Afghanistan were excluded from the analysis.
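Median-joining networks for Y-STRs are built from stepwise differences between haplotypes, in which each one-repeat change counts as one mutational step and loci can be weighted differently. The Python sketch below computes such a weighted stepwise distance matrix; it is not Network 5’s algorithm, and the locus weights in the example are hypothetical placeholders rather than the weighting criteria of reference 27.

```python
from typing import List, Mapping, Sequence


def weighted_stepwise_distance(h1: Mapping[str, int], h2: Mapping[str, int],
                               weights: Mapping[str, float]) -> float:
    """Sum over loci of weight * |repeat difference| between two haplotypes."""
    return sum(w * abs(h1[locus] - h2[locus]) for locus, w in weights.items())


def distance_matrix(haplotypes: Sequence[Mapping[str, int]],
                    weights: Mapping[str, float]) -> List[List[float]]:
    """Pairwise weighted stepwise distances between all haplotypes."""
    return [[weighted_stepwise_distance(a, b, weights) for b in haplotypes]
            for a in haplotypes]


# Hypothetical weights: rapidly mutating loci down-weighted relative to DYS19
weights = {"DYS19": 1.0, "DYS570": 0.5, "DYS576": 0.5}
haplotypes = [
    {"DYS19": 15, "DYS570": 18, "DYS576": 19},
    {"DYS19": 15, "DYS570": 20, "DYS576": 18},
    {"DYS19": 14, "DYS570": 18, "DYS576": 19},
]
print(distance_matrix(haplotypes, weights))
# [[0.0, 1.5, 1.0], [1.5, 0.0, 2.5], [1.0, 2.5, 0.0]]
```

Down-weighting fast-mutating loci in this way makes recurrent mutations at those loci contribute fewer steps, which is one common reason for weighting Y-STRs in network analyses.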

COMPETING FINANCIAL INTERESTS

None.

AUTHOR CONTRIBUTION

J.L. and A.A. designed this study. A.A., A.R., S.N., and M.R. collected the samples. A.A. performed the experiments and wrote the manuscript. A.A., J.L., A.R., S.N., R.A., S.W., and C.W. analyzed the results. A.A. and J.L. revised the manuscript. All authors reviewed the manuscript.

COMPLIANCE WITH ETHICAL STANDARDS

The study was approved (2019/067-P) by the ethical review board of China Medical University, Shenyang, Liaoning Province, People’s Republic of China, and was conducted in accordance with the standards of the Declaration of Helsinki. All participants were unrelated for at least three generations. All participants gave informed consent, either orally with a thumbprint (if they could not write) or in writing, after the study aims and procedures were carefully explained to them.

Electronic Supplementary Materials (ESM)

Supplementary Figure 1: Heatmap generated using Rst and Fst values.

Supplementary Table 1: Raw genotype data for the three ethnic groups typed with the Yfiler Plus kit

Supplementary Table 2: Allele frequencies and forensic parameters for the three ethnic groups

Supplementary Table 3: Pairwise Rst and Fst values between the three ethnic groups and the reference populations

Supplementary Table 4: F-statistics analysis between the Hazara populations from Pakistan and Afghanistan

Supplementary Table 5: Sequences of the flanking and repeat regions of the DYS448 locus in samples with null alleles

ACKNOWLEDGMENTS

We thank all volunteers who provided material and data for this project, especially Guanglin He and Abul-Hasan Fawad. This study was financially supported by the China Medical University postdoctoral research grant (100/1210619014).

References

1. Oppenheimer, S. Out-of-Africa, the peopling of continents and islands: tracing uniparental gene trees across the map. Philosophical Transactions of the Royal Society B: Biological Sciences 367, 770–784 (2012).
2. Adnan, A., Ralf, A., Rakha, A., Kousouri, N. & Kayser, M. Improving empirical evidence on differentiating closely related men with RM Y-STRs: A comprehensive pedigree study from Pakistan. Forensic Sci Int Genet 25, 45–51 (2016).
3. Adnan, A. et al. Population data of 17 Y-STRs (Yfiler) from Punjabis and Kashmiris of Pakistan. International Journal of Legal Medicine (2017) doi:10.1007/s00414-017-1611-9.
4. Adnan, A., Rakha, A., Lao, O. & Kayser, M. Mutation analysis at 17 Y-STR loci (Yfiler) in father-son pairs of male pedigrees from Pakistan. Forensic Science International: Genetics (2018) doi:10.1016/j.fsigen.2018.07.001.
5. Ballantyne, K. N. et al. Toward male individualization with rapidly mutating y-chromosomal short tandem repeats. Hum. Mutat. 35, 1021–1032 (2014).
6. Prinz, M., Boll, K., Baum, H. & Shaler, B. Multiplexing of Y chromosome specific STRs and performance for mixed samples. Forensic Science International 85, 209–218 (1997).
7. Ballantyne, K. N. & Kayser, M. Additional Y-STRs in Forensics: Why, Which, and When. Forensic Sci Rev 24, 63–78 (2012).
8. Calacal, G. C. et al. Identification of Exhumed Remains of Fire Tragedy Victims Using Conventional Methods and Autosomal/Y-Chromosomal Short Tandem Repeat DNA Profiling. The American Journal of Forensic Medicine and Pathology 26, 285–291 (2005).
9. Vermeulen, M. et al. Improving global and regional resolution of male lineage differentiation by simple single-copy Y-chromosomal short tandem repeat polymorphisms. Forensic Science International: Genetics 3, 205–213 (2009).
10. Krenke, B. E. et al. “Validation of a male-specific, 12-locus fluorescent short tandem repeat (STR) multiplex” [Forensic Sci. Int. 148 (1) (2005) 1–14], Forensic Science International 151, 111–124 (2005).
11. Mulero, J. J. et al. Development and validation of the AmpFlSTR Yfiler PCR amplification kit: a male specific, single amplification 17 Y-STR multiplex system. J. Forensic Sci. 51, 64–75 (2006).
12. Thompson, J. M. et al. Developmental validation of the PowerPlex® Y23 System: A single multiplex Y-STR analysis system for casework and database samples. Forensic Science International: Genetics 7, 240–250 (2013).
13. Kayser, M. et al. Evaluation of Y-chromosomal STRs: a multicenter study. International Journal of Legal Medicine 110, 125–133 (1997).
14. Gopinath, S. et al. Developmental validation of the Yfiler® Plus PCR Amplification Kit: An enhanced Y-STR multiplex for casework and database applications. Forensic Science International: Genetics 24, 164–175 (2016).
15. Jobling, M. A. et al. Recurrent duplication and deletion polymorphisms on the long arm of the Y chromosome in normal males. Hum. Mol. Genet. 5, 1767–1775 (1996).
16. Jobling, M. A. et al. Structural variation on the short arm of the human Y chromosome: recurrent multigene deletions encompassing Amelogenin Y. Human Molecular Genetics 16, 307–316 (2007).
17. Repping, S. et al. Polymorphism for a 1.6-Mb deletion of the human Y chromosome persists through balance between recurrent mutation and haploid selection. Nature Genetics 35, 247–251 (2003).
18. Bosch, E. & Jobling, M. A. Duplications of the AZFa region of the human Y chromosome are mediated by homologous recombination between HERVs and are compatible with male fertility. Hum. Mol. Genet. 12, 341–347 (2003).
19. Repping, S. et al. High mutation rates have driven extensive structural polymorphism among human Y chromosomes. Nature Genetics 38, 463–467 (2006).
20. Verma, R. S., Rodriguez, J. & Dosik, H. The clinical significance of pericentric inversion of the human Y chromosome: a rare ‘third’ type of heteromorphism. J. Hered. 73, 236–238 (1982).
21. Affara, N. A. et al. Variable transfer of Y-specific sequences in XX males. Nucleic Acids Res. 14, 5375–5387 (1986).
22. Bernstein, R., Wadee, A., Rosendorff, J., Wessels, A. & Jenkins, T. Inverted Y chromosome polymorphism in the Gujerati Muslim Indian population of South Africa. Hum. Genet. 74, 223–229 (1986).
23. Page, D. C. Sex reversal: deletion mapping the male-determining function of the human Y chromosome. Cold Spring Harb. Symp. Quant. Biol. 51 Pt 1, 229–235 (1986).
24. Budowle, B. et al. Null allele sequence structure at the DYS448 locus and implications for profile interpretation. Int. J. Legal Med. 122, 421–427 (2008).
25. Westen, A. A. et al. Analysis of 36 Y-STR marker units including a concordance study among 2085 Dutch males. Forensic Science International: Genetics 14, 174–181 (2015).
26. Balaresque, P. et al. Dynamic nature of the proximal AZFc region of the human Y chromosome: multiple independent deletion and duplication events revealed by microsatellite analysis. Human Mutation 29, 1171–1180 (2008).
27. Adnan, A. et al. Genetic characterization of Y-chromosomal STRs in Hazara ethnic group of Pakistan and confirmation of DYS448 null allele. International Journal of Legal Medicine (2018) doi:10.1007/s00414-018-1962-x.
28. Siddique, A. Afghanistan’s Ethnic Divides. www.cidobafpakproject.com (2012).
29. Afghanistan: a country study. (Claitor’s Pub. Division, c2001).
30. Dashti, N. The Baloch and Balochistan: a historical account from the beginning to the fall of the Baloch state. (Trafford, 2012).
31. He, G. et al. A comprehensive exploration of the genetic legacy and forensic features of Afghanistan and Pakistan Mongolian-descent Hazara. Forensic Science International: Genetics 42, e1–e12 (2019).
32. Haber, M. et al. Afghanistan’s ethnic groups share a Y-chromosomal heritage structured by historical events. PLoS ONE 7, e34288 (2012).
33. Adnan, A. et al. Phylogenetic relationship and genetic history of Central Asian Kazakhs inferred from Y-chromosome and autosomal variations. Mol Genet Genomics (2019) doi:10.1007/s00438-019-01617-0.
34. Kopelman, N. M., Stone, L., Gascuel, O. & Rosenberg, N. A. The behavior of admixed populations in neighbor-joining inference of population trees. Pac Symp Biocomput 273–284 (2013).
35. Tarlykov, P. V. et al. Mitochondrial and Y-chromosomal profile of the Kazakh population from East Kazakhstan. Croat. Med. J. 54, 17–24 (2013).
36. Zerjal, T. et al. The Genetic Legacy of the Mongols. The American Journal of Human Genetics 72, 717–721 (2003).
37. Smolenyak, M. & Turner, A. Trace your roots with DNA: using genetic tests to explore your family tree. (Rodale; Distributed to the trade by Holtzbrinck Publishers, 2004).
38. Wells, S. Deep ancestry: inside the genographic project. (National Geographic, 2007).
39. Basu, A., Sarkar-Roy, N. & Majumder, P. P. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. Proc Natl Acad Sci USA 113, 1594–1599 (2016).
40. Di Cristofaro, J. et al. Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge. PLoS ONE 8, e76748 (2013).
41. McElreavey, K. & Quintana-Murci, L. A population genetics perspective of the Indus Valley through uniparentally-inherited markers. Annals of Human Biology 32, 154–162 (2005).
42. Sengupta, S. et al. Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists. The American Journal of Human Genetics 78, 202–221 (2006).
43. Qamar, R. et al. Y-Chromosomal DNA Variation in Pakistan. The American Journal of Human Genetics 70, 1107–1124 (2002).
44. Chang, C.-W., Mulero, J. J., Budowle, B., Calandro, L. M. & Hennessy, L. K. Identification of a Novel Polymorphism in the X-Chromosome Region Homologous to the DYS456 Locus. Journal of Forensic Sciences 51, 344–348 (2006).
45. Collins, F. S., Brooks, L. D. & Chakravarti, A. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 8, 1229–1231 (1998).
46. Fredman, D. HGVbase: a curated resource describing human DNA variation and phenotype relationships. Nucleic Acids Research 32, D516–D519 (2004).
47. NCBI Resource Coordinators et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 46, D8–D13 (2018).
48. Chang, Y. M., Perumal, R., Keat, P. Y. & Kuehn, D. L. C. Haplotype diversity of 16 Y-chromosomal STRs in three main ethnic populations (Malays, Chinese and Indians) in Malaysia. Forensic Sci. Int. 167, 70–76 (2007).
49. Park, M. J. et al. Characterization of Deletions in the DYS385 Flanking Region and Null Alleles Associated with AZFc Microdeletions in Koreans. Journal of Forensic Sciences 53, 331–334 (2008).
50. Parkin, E. J. et al. Diversity of 26-locus Y-STR haplotypes in a Nepalese population sample: Isolation and drift in the Himalayas. Forensic Science International 166, 176–181 (2007).
51. Mizuno, N. et al. 16 Y chromosomal STR haplotypes in Japanese. Forensic Science International 174, 71–76 (2008).
52. Roewer, L. et al. Y-chromosomal STR haplotypes in Kalmyk population samples. Forensic Science International 173, 204–209 (2007).
53. Sánchez, C. et al. Haplotype frequencies of 16 Y-chromosome STR loci in the Barcelona metropolitan area population using Y-Filer™ kit. Forensic Science International 172, 211–217 (2007).
54. Redd, A. J. et al. Forensic value of 14 novel STRs on the human Y chromosome. Forensic Sci. Int. 130, 97–111 (2002).
55. Roewer, L. et al. DNA commission of the International Society of Forensic Genetics (ISFG): Recommendations on the interpretation of Y-STR results in forensic analysis. Forensic Science International: Genetics 48, 102308 (2020).
56. Slatkin, M. A measure of population subdivision based on microsatellite allele frequencies. Genetics 139, 457–462 (1995).
57. Weir, B. S. & Cockerham, C. C. Estimating F-Statistics for the Analysis of Population Structure. Evolution 38, 1358 (1984).
58. Michalakis, Y. & Excoffier, L. A generic estimation of population subdivision using distances between alleles with special reference for microsatellite loci. Genetics 142, 1061–1064 (1996).
59. Reynolds, J., Weir, B. S. & Cockerham, C. C. Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics 105, 767–779 (1983).
60. Excoffier, L. & Lischer, H. E. L. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources 10, 564–567 (2010).
61. Kumar, S., Stecher, G. & Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Molecular Biology and Evolution 33, 1870–1874 (2016).
62. R Core Team. R: A language and environment for statistical computing. (R Foundation for Statistical Computing, 2015).
63. Bandelt, H. J., Forster, P. & Rohl, A. Median-joining networks for inferring intraspecific phylogenies. Molecular Biology and Evolution 16, 37–48 (1999).
64. Igen’s, C. & Tillmar, A. O. Population genetics of 29 autosomal STRs and 17 Y-chromosomal STRs in a population sample from Afghanistan. International Journal of Legal Medicine 128, 279–280 (2014).
65. Achakzai, N. M. et al. Y-chromosomal STR analysis in the Pashtun population of Southern Afghanistan. Forensic Sci Int Genet 6, e103–e105 (2012).
66. Lacau, H. et al. Y-STR profiling in two Afghanistan populations. Leg Med (Tokyo) 13, 103–108 (2011).
67. Nothnagel, M. et al. Revisiting the male genetic landscape of China: a multi-center study of almost 38,000 Y-STR haplotypes. Human Genetics 136, 485–497 (2017).
68. Kwak, K. D. et al. Y-chromosomal STR haplotypes and their applications to forensic and population studies in east Asia. International Journal of Legal Medicine 119, 195–201 (2005).
69. Zhang, D. et al. RETRACTED ARTICLE: Y Chromosomal STR haplotypes in Chinese Uyghur, Kazakh and Hui ethnic groups and genetic features of DYS448 null allele and DYS19 duplicated allele. Int J Legal Med (2019) doi:10.1007/s00414-019-02049-6.
70. Shan, W. et al. Genetic polymorphism of 17 Y chromosomal STRs in Kazakh and Uighur populations from Xinjiang, China. Int J Legal Med 128, 743–744 (2014).
71. Wang, C. et al. Genetic polymorphisms of 27 Yfiler® Plus loci in the Daur and Mongolian ethnic minorities from Hulunbuir of Inner Mongolia Autonomous Region, China. Forensic Science International: Genetics 40, e252–e255 (2019).
72. Fu, X. et al. Genetic polymorphisms of 26 Y-STR loci in the Mongolian minority from Horqin district, China. International Journal of Legal Medicine 130, 941–946 (2016).
73. Ou, X. et al. Haplotype analysis of the polymorphic 40 Y-STR markers in Chinese populations. Forensic Science International: Genetics 19, 255–262 (2015).
74. Zhu, B. et al. Y-STRs haplotypes of Chinese Mongol ethnic group using Y-PLEX™ 12. Forensic Science International 153, 260–263 (2005).
75. Bian, Y. et al. Analysis of genetic admixture in Uyghur using the 26 Y-STR loci system. Scientific Reports 6 (2016).
76. Roewer, L., Willuweit, S., Stoneking, M. & Nasidze, I. A Y-STR database of Iranian and Azerbaijanian minority populations. Forensic Sci Int Genet 4, e53–55 (2009).
77. Sayyari, M., Salehzadeh, A., Tabatabaiefar, M. A. & Abbasi, A. Profiling of 17 Y-STR loci in Mazandaran and Gilan provinces of Iran. Turk J Med Sci 49, 1277–1286 (2019).
78. Sayyari, M., Salehzadeh, A., Tabatabaiefar, M. A. & Abbasi, A. Genetic polymorphisms of Y-chromosome short tandem repeats (Y-STRs) in a male population from Golestan province, Iran. Mol Biol Res Commun (2020) doi:10.22099/mbrc.2020.35547.1462.
79. Nasidze, I., Schädlich, H. & Stoneking, M. Haplotypes from the Caucasus, Turkey and Iran for nine Y-STR loci. Forensic Science International 137, 85–93 (2003).
80. Javed, F. et al. Male individualization using 12 rapidly mutating Y-STRs in Araein ethnic group and shared paternal lineage of Pakistani population. Int. J. Legal Med. 132, 1621–1624 (2018).
81. Ullah, I. et al. High Y-chromosomal Differentiation Among Ethnic Groups of Dir and Swat Districts, Pakistan. Ann. Hum. Genet. 81, 234–248 (2017).
82. Adnan, A. et al. Population data of 17 Y-STRs (Yfiler) from Punjabis and Kashmiris of Pakistan. International Journal of Legal Medicine (2017) doi:10.1007/s00414-017-1611-9.
83. Lee, E. Y. et al. Analysis of 22 Y chromosomal STR haplotypes and Y haplogroup distribution in Pathans of Pakistan. Forensic Science International: Genetics 11, 111–116 (2014).
84. Adnan, A. et al. Genetic structure and forensic characteristics of Saraiki population from Southern Punjab, Pakistan, revealed by 20 Y-chromosomal STRs. Int J Legal Med 134, 977–979 (2020).
85. Perveen, R., Shahid, A. A., Shafique, M., Shahzad, M. & Husnain, T. Genetic variations of 15 autosomal and 17 Y-STR markers in Sindhi population of Pakistan. International Journal of Legal Medicine 131, 1239–1240 (2017).
86. Zhabagin, M. et al. Development of the Kazakhstan Y-chromosome haplotype reference database: analysis of 27 Y-STR in Kazakh population. Int J Legal Med 133, 1029–1032 (2019).
87. Aliferi, A. et al. UK and Irish Y-STR population data—A catalogue of variant alleles. Forensic Science International: Genetics 34, e1–e6 (2018).
88. Ali, N., Coulson-Thomas, Y. M., Norton, A. L., Dixon, R. A. & Williams, D. R. Announcement of population data: genetic data for 17 Y-STR AmpFℓSTR® Yfiler™ markers from an immigrant Pakistani population in the UK (British Pakistanis). Forensic Sci Int Genet 7, e40–42 (2013).
89. Mizuno, N. et al. 16 Y chromosomal STR haplotypes in Japanese. Forensic Science International 174, 71–76 (2008).
90. Gutiérrez-Alarcón, A. B., Moguel-Torres, M., León-Jiménez, A. K., Cuéllar-Nevárez, G. E. & Rangel-Villalobos, H. Allele and haplotype distribution for 16 Y-STRs (AmpFlSTR® Y-filer™ kit) in the state of Chihuahua at North Center of Mexico. Legal Medicine 9, 154–157 (2007).

Posted November 22, 2020.

Subject Area

Evolutionary Biology

Subject Areas

All Articles

Animal Behavior and Cognition (5210)
Biochemistry (11740)
Bioengineering (8750)
Bioinformatics (29189)
Biophysics (14967)
Cancer Biology (12093)
Cell Biology (17410)
Clinical Trials (138)
Developmental Biology (9420)
Ecology (14178)
Epidemiology (2067)
Evolutionary Biology (18301)
Genetics (12239)
Genomics (16797)
Immunology (11865)
Microbiology (28070)
Molecular Biology (11583)
Neuroscience (60953)
Paleontology (451)
Pathology (1870)
Pharmacology and Toxicology (3238)
Physiology (4957)
Plant Biology (10425)
Scientific Communication and Education (1683)
Synthetic Biology (2884)
Systems Biology (7338)
Zoology (1651)

[1] 1.↵
Oppenheimer, S. Out-of-Africa, the peopling of continents and islands: tracing uniparental gene trees across the map. Philosophical Transactions of the Royal Society B: Biological Sciences 367, 770–784 (2012).
OpenUrl CrossRef PubMed

[2] 2.↵
Adnan, A., Ralf, A., Rakha, A., Kousouri, N. & Kayser, M. Improving empirical evidence on differentiating closely related men with RM Y-STRs: A comprehensive pedigree study from Pakistan. Forensic Sci Int Genet 25, 45–51 (2016).
OpenUrl

[3] 3.
Adnan, A. et al. Population data of 17 Y-STRs (Yfiler) from Punjabis and Kashmiris of Pakistan. International Journal of Legal Medicine (2017) doi:10.1007/s00414-017-1611-9.
OpenUrl CrossRef

[4] 4.
Adnan, A., Rakha, A., Lao, O. & Kayser, M. Mutation analysis at 17 Y-STR loci (Yfiler) in father-son pairs of male pedigrees from Pakistan. Forensic Science International: Genetics (2018) doi:10.1016/j.fsigen.2018.07.001.
OpenUrl CrossRef

[5] 5.↵
Ballantyne, K. N. et al. Toward male individualization with rapidly mutating y-chromosomal short tandem repeats. Hum. Mutat. 35, 1021–1032 (2014).
OpenUrl

[6] 6.↵
Prinz, M., Boll, K., Baum, H. & Shaler, B. Multiplexing of Y chromosome specific STRs and performance for mixed samples. Forensic Science International 85, 209–218 (1997).
OpenUrl PubMed

[7] 7.↵
Ballantyne, K. N. & Kayser, M. Additional Y-STRs in Forensics: Why, Which, and When. Forensic Sci Rev 24, 63–78 (2012).
OpenUrl

[8] 8.↵
Calacal, G. C. et al. Identification of Exhumed Remains of Fire Tragedy Victims Using Conventional Methods and Autosomal/Y-Chromosomal Short Tandem Repeat DNA Profiling: The American Journal of Forensic Medicine and Pathology 26, 285–291 (2005).
OpenUrl

[9] 9.↵
Vermeulen, M. et al. Improving global and regional resolution of male lineage differentiation by simple single-copy Y-chromosomal short tandem repeat polymorphisms. Forensic Science International: Genetics 3, 205–213 (2009).
OpenUrl

[10] 10.↵
Krenke, B. E. et al. “Validation of a male-specific, 12-locus fluorescent short tandem repeat (STR) multiplex” [Forensic Sci. Int. 148 (1) (2005) 1–14], Forensic Science International 151, 111–124 (2005).
OpenUrl CrossRef PubMed

[11] 11.↵
Mulero, J. J. et al. Development and validation of the AmpFISTR Yfiler PCR amplification kit: a male specific, single amplification 17 Y-STR multiplex system. J. Forensic Sci. 51, 64–75 (2006).
OpenUrl

[12] 12.↵
Thompson, J. M. et al. Developmental validation of the PowerPlex^® Y23 System: A single multiplex Y-STR analysis system for casework and database samples. Forensic Science International: Genetics 7, 240–250 (2013).
OpenUrl

[13] 13.↵
Kayser, M. et al. Evaluation of Y-chromosomal STRs: a multicenter study. International Journal of Legal Medicine 110, 125–133 (1997).
OpenUrl CrossRef PubMed Web of Science

[14] 14.↵
Gopinath, S. et al. Developmental validation of the Yfiler^® Plus PCR Amplification Kit: An enhanced Y-STR multiplex for casework and database applications. Forensic Science International: Genetics 24, 164–175 (2016).
OpenUrl

[15] 15.↵
Jobling, M. A. et al. Recurrent duplication and deletion polymorphisms on the long arm of the Y chromosome in normal males. Hum. Mol. Genet. 5, 1767–1775 (1996).
OpenUrl CrossRef PubMed Web of Science

[16] 16.
Jobling, M. A. et al. Structural variation on the short arm of the human Y chromosome: recurrent multigene deletions encompassing Amelogenin Y. Human Molecular Genetics 16, 307–316 (2007).
OpenUrl CrossRef PubMed Web of Science

[17] 17.↵
Repping, S. et al. Polymorphism for a 1.6-Mb deletion of the human Y chromosome persists through balance between recurrent mutation and haploid selection. Nature Genetics 35, 247–251 (2003).
OpenUrl CrossRef PubMed Web of Science

[18] 18.↵
Bosch, E. & Jobling, M. A. Duplications of the AZFa region of the human Y chromosome are mediated by homologous recombination between HERVs and are compatible with male fertility. Hum. Mol. Genet. 12, 341–347 (2003).
OpenUrl CrossRef PubMed Web of Science

[19] 19.↵
Repping, S. et al. High mutation rates have driven extensive structural polymorphism among human Y chromosomes. Nature Genetics 38, 463–467 (2006).
OpenUrl CrossRef PubMed Web of Science

[20] 20.↵
Verma, R. S., Rodriguez, J. & Dosik, H. The clinical significance of pericentric inversion of the human Y chromosome: a rare ‘third’ type of heteromorphism. J. Hered. 73, 236–238 (1982).
OpenUrl PubMed Web of Science

[21] 21.
Affara, N. A. et al. Variable transfer of Y-specific sequences in XX males. Nucleic Acids Res. 14, 5375–5387 (1986).
OpenUrl CrossRef PubMed Web of Science

[22] 22.
Bernstein, R., Wadee, A., Rosendorff, J., Wessels, A. & Jenkins, T. Inverted Y chromosome polymorphism in the Gujerati Muslim Indian population of South Africa. Hum. Genet. 74, 223–229 (1986).
OpenUrl PubMed Web of Science

[23] 23.↵
Page, D. C. Sex reversal: deletion mapping the male-determining function of the human Y chromosome. Cold Spring Harb. Symp. Quant. Biol. 51 Pt 1, 229–235 (1986).
OpenUrl Abstract/FREE Full Text

[24] 24.↵
Budowle, B. et al. Null allele sequence structure at the DYS448 locus and implications for profile interpretation. Int. J. Legal Med. 122, 421–427 (2008).
OpenUrl PubMed

[25] 25.↵
Westen, A. A. et al. Analysis of 36 Y-STR marker units including a concordance study among 2085 Dutch males. Forensic Science International: Genetics 14, 174–181 (2015).
OpenUrl

[26] 26.↵
Balaresque, P. et al. Dynamic nature of the proximal AZFc region of the human Y chromosome: multiple independent deletion and duplication events revealed by microsatellite analysis. Human Mutation 29, 1171–1180 (2008).
OpenUrl CrossRef PubMed Web of Science

[27] 27.↵
Adnan, A. et al. Genetic characterization of Y-chromosomal STRs in Hazara ethnic group of Pakistan and confirmation of DYS448 null allele. International Journal of Legal Medicine (2018) doi:10.1007/s00414-018-1962-x.
OpenUrl CrossRef

[28] 28.↵
Siddique, A. Afghanistan’s Ethnic Divides. www.cidobafpakproject.com (2012).

[29] 29.↵
Afghanistan: a country study. (Claitor’s Pub. Division, c2001).

[30] 30.↵
Dashti, N. The Baloch and Balochistan: a historical account from the beginning to the fall of the Baloch state. (Trafford, 2012).

[31] 31.↵
He, G. et al. A comprehensive exploration of the genetic legacy and forensic features of Afghanistan and Pakistan Mongolian-descent Hazara. Forensic Science International: Genetics 42, el–el2 (2019).
OpenUrl

[32] 32.↵
Haber, M. et al. Afghanistan’s ethnic groups share a Y-chromosomal heritage structured by historical events. PLoS ONE 7, e34288 (2012).
OpenUrl CrossRef PubMed

[33] 33.↵
Adnan, A. et al. Phylogenetic relationship and genetic history of Central Asian Kazakhs inferred from Y-chromosome and autosomal variations. Mol Genet Genomics (2019) doi:10.1007/s00438-019-01617-0.
OpenUrl CrossRef

[34] 34.↵
Kopelman, N. M., Stone, L., Gascuel, O. & Rosenberg, N. A. The behavior of admixed populations in neighbor-joining inference of population trees. Pac Symp Biocomput 273–284 (2013).

[35] 35.↵
Tarlykov, P. V. et al. Mitochondrial and Y-chromosomal profile of the Kazakh population from East Kazakhstan. Croat. Med. J. 54, 17–24 (2013).
OpenUrl

[36] 36.↵
Zerjal, T. et al. The Genetic Legacy of the Mongols. The American Journal of Human Genetics 72, 717–721 (2003).
OpenUrl CrossRef PubMed Web of Science

[37] 37.↵
Smolenyak, M. & Turner, A. Trace your roots with DNA: using genetic tests to explore your family tree. (Rodale□; Distributed to the trade by Holtzbrinck Publishers, 2004).

[38] 38.↵
Wells, S. Deep ancestry: inside the genographic project. (National Geographic, 2007).

[39] 39.↵
Basu, A., Sarkar-Roy, N. & Majumder, P. P. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. Proc Natl Acad Sci USA 113, 1594–1599 (2016).
OpenUrl Abstract/FREE Full Text

[40] 40.↵
Di Cristofaro, J. et al. Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge. PLoS ONE 8, e76748 (2013).
OpenUrl CrossRef PubMed

[41] 41.↵
Mcelreavey, K. & Quintana-Murci, L. A population genetics perspective of the Indus Valley through uniparentally-inherited markers. Annals of Human Biology 32, 154–162 (2005).
OpenUrl PubMed

[42] 42.↵
Sengupta, S. et al. Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists. The American Journal of Human Genetics 78, 202–221 (2006).
OpenUrl CrossRef PubMed Web of Science

[43] 43.↵
Qamar, R. et al. Y-Chromosomal DNA Variation in Pakistan. The American Journal of Human Genetics 70, 1107–1124 (2002).
OpenUrl CrossRef PubMed Web of Science

[44] 44.↵
Chang, C.-W., Mulero, J. J., Budowle, B., Calandro, L. M. & Hennessy, L. K. Identification of a Novel Polymorphism in the X-Chromosome Region Homologous to the DYS456 Locus. Journal of Forensic Sciences 51, 344–348 (2006).
OpenUrl PubMed

[45] 45.
Collins, F. S., Brooks, L. D. & Chakravarti, A. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 8, 1229–1231 (1998).
OpenUrl FREE Full Text

[46] 46.
Fredman, D. HGVbase: a curated resource describing human DNA variation and phenotype relationships. Nucleic Acids Research 32, 516D–519 (2004).
OpenUrl CrossRef PubMed Web of Science

[47] 47.↵
NCBI Resource Coordinators et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 46, D8–D13 (2018).
OpenUrl CrossRef PubMed

[48] 48.↵
Chang, Y. M., Perumal, R., Keat, P. Y. & Kuehn, D. L. C. Haplotype diversity of 16 Y-chromosomal STRs in three main ethnic populations (Malays, Chinese and Indians) in Malaysia. Forensic Sci. Int. 167, 70–76 (2007).
OpenUrl CrossRef PubMed Web of Science

[49] 49.↵
Park, M. J. et al. Characterization of Deletions in the DYS385 Flanking Region and Null Alleles Associated with AZFc Microdeletions in Koreans. Journal of Forensic Sciences 53, 331–334 (2008).
OpenUrl PubMed

[50] 50.
Parkin, E. J. et al. Diversity of 26-locus Y-STR haplotypes in a Nepalese population sample: Isolation and drift in the Himalayas. Forensic Science International 166, 176–181 (2007).
OpenUrl PubMed

[51] 51.
Mizuno, N. et al. 16 Y chromosomal STR haplotypes in Japanese. Forensic Science International 174, 71–76 (2008).
OpenUrl PubMed

[52] 52.
Roewer, L. et al. Y-chromosomal STR haplotypes in Kalmyk population samples. Forensic Science International 173, 204–209 (2007).
OpenUrl PubMed

[53] 53.↵
Sánchez, C. et al. Haplotype frequencies of 16 Y-chromosome STR loci in the Barcelona metropolitan area population using Y-Filer™ kit. Forensic Science International 172, 211–217 (2007).
OpenUrl PubMed

[54] 54.↵
Redd, A. J. et al. Forensic value of 14 novel STRs on the human Y chromosome. Forensic Sci. Int. 130, 97–111 (2002).
OpenUrl CrossRef PubMed Web of Science

[55] 55.↵
Roewer, L. et al. DNA commission of the International Society of Forensic Genetics (ISFG): Recommendations on the interpretation of Y-STR results in forensic analysis. Forensic Science International: Genetics 48, 102308 (2020).
OpenUrl

[56] 56.↵
Slatkin, M. A measure of population subdivision based on microsatellite allele frequencies. Genetics 139, 457–462 (1995).
OpenUrl Abstract/FREE Full Text

[57] 57.↵
Weir, B. S. & Cockerham, C. C. Estimating F-Statistics for the Analysis of Population Structure. Evolution 38, 1358 (1984).
OpenUrl CrossRef PubMed Web of Science

[58] 58.
Michalakis, Y. & Excoffier, L. A generic estimation of population subdivision using distances between alleles with special reference for microsatellite loci. Genetics 142, 1061–1064 (1996).
OpenUrl Abstract/FREE Full Text

[59] 59.↵
Reynolds, J., Weir, B. S. & Cockerham, C. C. Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics 105, 767–779 (1983).
OpenUrl Abstract/FREE Full Text

[60] 60.↵
Excoffier, L. & Lischer, H. E. L. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources 10, 564–567 (2010).
OpenUrl

[61] 61.↵
Kumar, S., Stecher, G. & Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Molecular Biology and Evolution 33, 1870–1874 (2016).
OpenUrl CrossRef PubMed

[62] 62.↵
R Core Team. R: A language and environment for statistical computing. (R Foundation for Statistical Computing, 2015).

[63] 63.↵
Bandelt, H. J., Forster, P. & Rohl, A. Median-joining networks for inferring intraspecific phylogenies. Molecular Biology and Evolution 16, 37–48 (1999).
OpenUrl CrossRef PubMed Web of Science

[64] 64.
Igen’s, C. & Tillmar, A. O. Population genetics of 29 autosomal STRs and 17 Y-chromosomal STRs in a population sample from Afghanistan. International Journal of Legal Medicine 128, 279–280 (2014).

[65] Achakzai, N. M. et al. Y-chromosomal STR analysis in the Pashtun population of Southern Afghanistan. Forensic Sci Int Genet 6, e103–105 (2012).

[66] Lacau, H. et al. Y-STR profiling in two Afghanistan populations. Leg Med (Tokyo) 13, 103–108 (2011).

[67] Nothnagel, M. et al. Revisiting the male genetic landscape of China: a multi-center study of almost 38,000 Y-STR haplotypes. Human Genetics 136, 485–497 (2017).

[68] Kwak, K. D. et al. Y-chromosomal STR haplotypes and their applications to forensic and population studies in east Asia. International Journal of Legal Medicine 119, 195–201 (2005).

[69] Zhang, D. et al. RETRACTED ARTICLE: Y Chromosomal STR haplotypes in Chinese Uyghur, Kazakh and Hui ethnic groups and genetic features of DYS448 null allele and DYS19 duplicated allele. Int J Legal Med (2019) doi:10.1007/s00414-019-02049-6.

[70] Shan, W. et al. Genetic polymorphism of 17 Y chromosomal STRs in Kazakh and Uighur populations from Xinjiang, China. Int J Legal Med 128, 743–744 (2014).

[71] Wang, C. et al. Genetic polymorphisms of 27 Yfiler® Plus loci in the Daur and Mongolian ethnic minorities from Hulunbuir of Inner Mongolia Autonomous Region, China. Forensic Science International: Genetics 40, e252–e255 (2019).

[72] Fu, X. et al. Genetic polymorphisms of 26 Y-STR loci in the Mongolian minority from Horqin district, China. International Journal of Legal Medicine 130, 941–946 (2016).

[73] Ou, X. et al. Haplotype analysis of the polymorphic 40 Y-STR markers in Chinese populations. Forensic Science International: Genetics 19, 255–262 (2015).

[74] Zhu, B. et al. Y-STRs haplotypes of Chinese Mongol ethnic group using Y-PLEX™ 12. Forensic Science International 153, 260–263 (2005).

[75] Bian, Y. et al. Analysis of genetic admixture in Uyghur using the 26 Y-STR loci system. Scientific Reports 6 (2016).

[76] Roewer, L., Willuweit, S., Stoneking, M. & Nasidze, I. A Y-STR database of Iranian and Azerbaijanian minority populations. Forensic Sci Int Genet 4, e53–55 (2009).

[77] Sayyari, M., Salehzadeh, A., Tabatabaiefar, M. A. & Abbasi, A. Profiling of 17 Y-STR loci in Mazandaran and Gilan provinces of Iran. Turk J Med Sci 49, 1277–1286 (2019).

[78] Sayyari, M., Salehzadeh, A., Tabatabaiefar, M. A. & Abbasi, A. Genetic polymorphisms of Y-chromosome short tandem repeats (Y-STRs) in a male population from Golestan province, Iran. Mol Biol Res Commun (2020) doi:10.22099/mbrc.2020.35547.1462.

[79] Nasidze, I., Schädlich, H. & Stoneking, M. Haplotypes from the Caucasus, Turkey and Iran for nine Y-STR loci. Forensic Science International 137, 85–93 (2003).

[80] Javed, F. et al. Male individualization using 12 rapidly mutating Y-STRs in Araein ethnic group and shared paternal lineage of Pakistani population. Int J Legal Med 132, 1621–1624 (2018).

[81] Ullah, I. et al. High Y-chromosomal Differentiation Among Ethnic Groups of Dir and Swat Districts, Pakistan. Ann Hum Genet 81, 234–248 (2017).

[82] Adnan, A. et al. Population data of 17 Y-STRs (Yfiler) from Punjabis and Kashmiris of Pakistan. International Journal of Legal Medicine (2017) doi:10.1007/s00414-017-1611-9.

[83] Lee, E. Y. et al. Analysis of 22 Y chromosomal STR haplotypes and Y haplogroup distribution in Pathans of Pakistan. Forensic Science International: Genetics 11, 111–116 (2014).

[84] Adnan, A. et al. Genetic structure and forensic characteristics of Saraiki population from Southern Punjab, Pakistan, revealed by 20 Y-chromosomal STRs. Int J Legal Med 134, 977–979 (2020).

[85] Perveen, R., Shahid, A. A., Shafique, M., Shahzad, M. & Husnain, T. Genetic variations of 15 autosomal and 17 Y-STR markers in Sindhi population of Pakistan. International Journal of Legal Medicine 131, 1239–1240 (2017).

[86] Zhabagin, M. et al. Development of the Kazakhstan Y-chromosome haplotype reference database: analysis of 27 Y-STR in Kazakh population. Int J Legal Med 133, 1029–1032 (2019).

[87] Aliferi, A. et al. UK and Irish Y-STR population data: A catalogue of variant alleles. Forensic Science International: Genetics 34, e1–e6 (2018).

[88] Ali, N., Coulson-Thomas, Y. M., Norton, A. L., Dixon, R. A. & Williams, D. R. Announcement of population data: genetic data for 17 Y-STR AmpFℓSTR® Yfiler™ markers from an immigrant Pakistani population in the UK (British Pakistanis). Forensic Sci Int Genet 7, e40–42 (2013).

[89] Mizuno, N. et al. 16 Y chromosomal STR haplotypes in Japanese. Forensic Science International 174, 71–76 (2008).

[90] Gutiérrez-Alarcón, A. B., Moguel-Torres, M., León-Jiménez, A. K., Cuéllar-Nevárez, G. E. & Rangel-Villalobos, H. Allele and haplotype distribution for 16 Y-STRs (AmpFℓSTR® Y-filer™ kit) in the state of Chihuahua at North Center of Mexico. Legal Medicine 9, 154–157 (2007).
