Abstract
Analyses of Y chromosome variations of extant people have resulted in two models for the paternal phylogenetic tree of modern humans with roots either in Africa or East Asia. These two trees are differentiated mainly by when and where their mega-haplogroups branched apart. This paper examines previously published Y chromosome sequencing data of 17 ancient samples to compare these two competing models. As ancient samples have had less time to evolve, they are expected to have mutated in some, but not all, of the sites that define present day haplogroups to which they belong. Indeed, most of the ancient DNAs here showed that expected pattern for both the terminal and the basal haplogroups to which they belong, all of the ones which are non-controversial or considered real by both of the two competing models followed that pattern. However, for basal haplogroups not shared by the two models, such expected pattern could be observed only if the haplogroups specific to the Asia rather than the Africa model are real, including ABCDE, ABDE, AB, A00-A1b. Another important point is that, if the mega-haplogroups of the Africa model were real, including BT, CT, CF and F, it would mean that numerous alleles would be shared between these haplogroups and several ancient A1b1b2 samples, which is unexpected and unseen in present day samples. Sharing alleles like this would also violate the infinite site assumption that makes the Africa rooting possible in the first place. Therefore, the data from ancient Y chromosomes confirm the actual existence of the haplogroups specific to the Asia model.
Two competing models of modern human origins termed “Multiregional” and “Recent Out-of-Africa” have long been accepted 1,2. The multiregional model considers extant people of any given region, e.g., East Asia, to be largely descended from ancient people living in the same region at ~200-2000 ky ago, such as Peking man. The model has support from fossils and cultural remains but molecular evidence has been lacking until recently 3. Analyses based on a more complete molecular evolutionary framework, the maximum genetic diversity (MGD) theory 4, suggest multiregional origins for autosomes but root both uniparental DNAs in East Asia 3. The rooting of mtDNA tree in Asia independently confirms an earlier paper 5, and has been verified by ancient mtDNAs findings that show the earlier appearance of haplogroup R compared to N 6,7. The Out of Africa model posits that modern humans originated in Africa and then migrated to Eurasia, largely replacing local archaic humans with limited genetic mixing 1,8,9. The rooting of uniparental DNAs in Africa relies on the assumption of neutral mutations throughout the entire genome 9–11, which is known to be a poor explanatory framework for evolutionary phenomena 4,12. The infinite site assumption emerges from the neutral framework, which states that mutations appear once in the evolutionary history, and the related inference of derived alleles underlies the Y tree topology and rooting of the Africa model 10. However, mutation saturation and natural selection is far more common than initially thought, which would invalidate the currently accepted inference of derived alleles 3,4,13,14. Certain haplogroups contain a large number of derived alleles that define other haplogroups, e.g., A has many derived alleles for BT (42.4% of informative sites) and A and B have many derived alleles for CT (18.9% informative sites), which violates the method underlying the Africa model in the first place 3,15. In contrast, haplogroups in the Asia model are defined by alleles shared by all members within a haplogroup, regardless of their derived status. The rooting in the Asia model of uniparental DNAs relies on the reasoning that the original haplotype should be the common type, since mutations leading to alternative types should be rare events 3,5–7. The ancestor type should have many alleles different from the outgroup to qualify as a modern type. Some of those alleles may revert back to archaic alleles as modern types migrated to new environments and admixed with archaic humans. Co-evolution with admixed autosomes may cause modern uniparental DNAs to mutate back to archaic alleles.
The biggest difference between the two competing models of Y tree topology comes from how they handle the mega-haplogroups in the Africa model upstream of G (Figure 1), such as BT or CT. This category includes the vast majority of haplotypes among present day diversity and is thought to result from new mutations since the original type. In contrast, the Asia model considers the alleles of mega-haplogroups like BT or CT to be the ancestor type carried by the first modern human individual. It is the non BT or non CT haplotypes such as A or AB that have acquired new mutations. We here used ancient DNAs to test these two competing models.
Only major branches and representative SNPs are shown with branch lengths not to scale. The tree topology was built without making use of any ancient DNAs. A. The Out of East Asia model. B. The Out of Africa model.
To confirm the expectation that ancient samples should have mutated in only a fraction of the sites that define the present day haplogroups to which they belong, we studied a total of 17 published ancient Y chromosome sequences of relatively high coverage (Table 1), including A1b1, B, C, E, H, I, and R haplogroups 16–23, for informative sites as found in the 1000 genomes project (1kGP) or in the Y-DNA haplogroup tree from the International Society of Genetic Genealogy (ISOGG, http://www.isogg.org, version 13.07) 24. Most of these ancient DNAs showed mutations in some but not all of the sites that define the present day haplotypes to which they belong (Figure 2A, Supplementary Table S1 and S2). For example, ATP12-1240 sample from Atapuerca Spain had 58 out of 67 informative sites mutated for I-M170 and 20 of 25 informative sites mutated for the basal haplogroup IJ-M429 21; I2966 sample from Malawi had 3 of 3 informative sites mutated for B2b1, 30 of 37 informative sites mutated for B2-M182 and 7 of 9 mutated for the basal branch B-M181 16. To exclude the possibility of sequencing or calling errors, we studied all samples for mutations in sites that define an irrelevant haplogroup A1a and found all to have essentially no mutations in A1a, as expected if there were essentially no calling errors (Figure 2B, Table 1). Only an extremely low rate of unexpected changes was found (9 out of 9882 A1a sites called for all samples), which may be either mutations or sequencing/calling errors.
A. The fractions of mutated sites among the informative sites defining a haplogroup were shown for ancient samples belonging to each haplogroup as shown. B. The fractions of non-mutated sites among the informative sites defining A1a haplogroup were shown for ancient samples belonging to each haplogroup as shown. Most of the listed haplogroups have only one ancient sample but some have more than one.
Haplogroup-defining SNPs were identified using ISOGG and the 1kGP dataset. The first number in the site # row refers to the number of haplotype-defining sites in ISOGG and the second number refers to sites found in the 1kGP. The total combined number of sites from ISOGG and 1kGP was used in the analysis. For numbers in the remaining cells, the first number refers to mutations in haplotypes of the Out of East Asia (OOEA) model, the second number refers to mutations in haplotypes of the Out of Africa (OOA) model, and the third number represents the total number of informative sites. Numbers in bold highlight cases where only a fraction of the sites defining a basal haplogroup were mutated.
Many but not all ancient DNAs showed the expected mutation pattern for basal branches specific to the Asia model such as A (A00-A1b), AB, ABDE, and ABCDE, and they all showed a near complete absence of mutations in the haplogroups to which they do not belong (Figure 3A and B, Table 1). For example, for the five samples belonging to the AB haplogroup, there were a total of 395 mutations among 491 informative sites that classify them as AB while the remaining 96 sites are shared with non AB haplogroups and represent the genotype of the original type. In contrast, essentially no ancient DNAs showed the expected mutation pattern for branches specific to the Africa model such as A1, A1b, BT, CT, CF, and F, and some showed unexpected mutations in the haplogroups to which they do not belong (Figure 3C and D, Table 1). For example, for the 12 samples belonging to the CT haplogroup, all 1473 informative sites would have mutated to CT-specific alleles if the CT branch was legitimate. There was only one case with the expected pattern, if any of the mega-haplogroup in the Africa model are real, where the I2966 B2b1 sample had 91 of 92 sites mutated to BT (Table 1). However, as this involved only 1 out of 92 sites, it may just be a sequencing/calling error. In addition, the five non-CT samples had 96 out of 491 informative sites for CT mutated to CT-specific alleles. As no A or B haplogroup today are known to share CT alleles in CT defining sites, the finding of ancient A or B samples carrying CT alleles is highly unusual and unexpected. It would mean that those sites have mutated more than once, which would violate the infinite site assumption that makes the Africa rooting possible in the first place.
Shown are the number of mutated, non-mutated, and total informative sites in ancient DNAs under the Out of East Asia model (A) or the Out of Africa model (C). Also shown are the fractions of mutated and non-mutated sites in ancient DNAs under the Out of East Asia model (B) or the Out of Africa model (D). The number following the haplogroup name shows the number of ancient samples carrying the haplogroup.
Ancient haplogroups with some sites not mutated in the basal and terminal haplogroups to which they belong may not have persisted until today. Relative to the 36000 year old C1b K14 sample that had some sites not mutated in ABCDE-M89 and C-M130 basal haplogroups, the similarly aged C1a2 Sunghir SI, SIII, and SIV samples all had mutations in all sites informative to ABCDE and C-M130, and may better qualify as ancestors of some present day lineages. Regardless, ancient samples with only some, but not all, sites mutated for a basal haplogroup could serve to show that the basal haplogroups as inferred by studying present day samples did indeed exist.
The mutation pattern in the ancient Y chromosomes as revealed here confirms the expectation that ancient haplogroups should mutate in only a fraction of the sites that define a haplogroup they belonged to. Two observations here confirm the Asia model and invalidate the Africa model. First, only haplogroups specific to the Asia model showed the expected mutation pattern in ancient samples. Second, the genetic reality that a haplogroup, be it ancient or present, should not carry mutations found in basal haplogroups to which they do not belong is only met by the Asia model but not the Africa model. For example, I9133 A1b1b12a sample showed numerous shared alleles with BT, CT, CF, and F of the Africa model, to which it does not belong. However, it didn’t share any alleles with any haplogroups specific to the Asia model such as A00A0, to which it does not belong. We conclude that the Asia rooting of the Y chromosome phylogenetic tree has been independently confirmed by ancient DNAs.
Materials and Methods
Whole genome sequencing data of ancient DNAs were downloaded from links provided by the previous publications. These ancient DNA samples were shown in Table 1. BWA 0.7.10 was used to aligned the fastq format to the human genome GRCh37 25. The mapped reads were then filtered and sorted using SAMtools 0.1.19 and Picardtools 1.107 (http://picard.sourceforge.net) 26. The Picardtools 1.107 was also used to remove duplicates. The SNPs of genotypes were called using GATK-3.2-2 software with mapping quality setting at Q=30 27. Haplogroups defining SNPs were identified using the ISOGG2018 Y chromosome tree (https://isogg.org) and the 1kGP dataset (http://www.intemationalgenome.org).
Additional Information
Competing Interests
The authors declare that they have no competing interests.
Author contributions
H.C. and Y.Z. performed data analysis. S.H. devised the project, analyzed the data, and wrote the manuscript. All authors edited the manuscript.
Acknowledgments
Supported by the National Natural Science Foundation of China grant 81171880 (S.H.).