Insight into the genomic history of the Near East from whole-genome sequences and genotypes of Yemenis

We report high-coverage whole-genome sequencing data from 46 Yemeni individuals as well as genome-wide genotyping data from 169 Yemenis from diverse locations. We use this dataset to define the genetic diversity in Yemen and how it relates to people elsewhere in the Near East. Yemen is a vast region with substantial cultural and geographic diversity, but we found little genetic structure correlating with geography among the Yemenis – probably reflecting continuous movement of people between the regions. African ancestry from admixture in the past 800 years is widespread in Yemen and is the main contributor to the country’s limited genetic structure, with some individuals in Hudayda and Hadramout having up to 20% of their genetic ancestry from Africa. In contrast, individuals from Maarib appear to have been genetically isolated from the African gene flow and thus have genomes likely to reflect Yemen’s ancestry before the admixture. This ancestry was comparable to the ancestry present during the Bronze Age in the distant Northern regions of the Near East. After the Bronze Age, the South and North of the Near East therefore followed different genetic trajectories: in the North the Levantines admixed with a Eurasian population carrying steppe ancestry whose impact never reached as far south as the Yemen, where people instead admixed with Africans leading to the genetic structure observed in the Near East today.


Text
Yemen's location and history suggest it has played an important role in early human migrations and thus is of considerable importance for understanding modern human genetic diversity. 1 But Yemen as well as the rest of the Near East have so far been poorly represented in databases that catalogue human variation: The 1000 Genomes Project did not include any Near Eastern samples among its many populations from around the globe, 2 and the Human Genome Diversity Project included only samples from the Northern region of the Near East. 3 Here, we continue our efforts 4; 5 to catalogue the diversity of genetically under-represented populations by genotyping and whole-genome sequencing individuals from Yemen, and provide the new data as a resource for future genetic studies in the region. We present an initial exploration of the data by asking how present-day Near Easterners are related to each other and to the ancient people who inhabited the Near East before them, focusing on the historical events and processes that have contributed to the genetic structure in Yemen today. Samples were recruited from diverse locations in Yemen ( Figure 1, Table 1) and DNA extraction from 3 ml blood performed in Yemen at Sanaa University. All subjects provided an informed consent form and the study was approved by the IRB of the Lebanese American University. Samples (n=169) were genotyped at the Wellcome Sanger Institute in the UK using the Illumina HumanOmni2.5-8 BeadChip which covers 2.5 million SNPs. In addition, 46 samples were whole-genome sequenced at >30× depth using the Illumina HiSeq X Ten and the reads were mapped to build 37 and processed as described previously. 6 The use of the samples in genetic studies was approved by the Human Materials and Data Management Committee at the Wellcome Sanger Institute (HMDMC approval number 14/072).
We merged the newly generated data with previously published data from modern and ancient populations creating two datasets for the analyses: 1) The geno set included the new Yemen genotype data merged with the Human Origin dataset 7-9 and with published ancient DNA data 4; 7; 8; 10-13 resulting in a set of 3060 samples and 211,966 SNPs. 2) The seq set included the new Yemen sequencing data merged with data from the Simons Genome Diversity Project 14 and with sequencing data from Lebanon 4 in addition to published ancient DNA data, 4; 7; 8; 10-13 resulting in a set of 731 individuals and 938,848 SNPs. In this work we refer to Yemen as being the South of the Near East, while Lebanon and the surrounding Levant regions are considered the North of the Near East. This differentiation is based on geography but also captures our previous knowledge of the genetic structure in the region. 5 The newly sequenced data contained 1,212,228 SNPs not previously reported in dbSNP build 151.
Among the novel SNPs we found 46,970 SNPs that were common (>5%) in the Yemeni population; annotation on build 37 using VEP v79 as described previously 15 showed these included 336 moderately common missense SNPs, 23 splice donor/acceptor SNPs and 10 stop gained SNPs at >2% frequency (Table 2) which could be relevant to future medical studies in the region.
Investigation of the demographic history of the Yemeni populations by applying multiple sequentially Markovian coalescent (MSMC) 16 analyses to the Yemen genomes showed a typical Eurasian population size change pattern characterized by a bottleneck around 50,000-60,000 years, followed by a population size expansion ( Figure 2). But the Yemen population size increase after the bottleneck appears to be limited compared with the increase which occurred in the North of the Near East ( Figure   3). To understand how the Yemenis were related to other populations in the Near East, we projected We observe on the PCA two genetic clines that affect the Near Easterners differently: 1) The African cline, which appears to be a major contributor to genetic diversity in Yemen and 2) The Eurasian cline, towards which the Northern Near Easterners appear shifted compared with the Yemenis. The Northern Near Easterners are themselves structured on the African cline with Palestinians, Jordanians, Syrians, and Lebanese Muslims having more African ancestry than Assyrians, Jews, Druze and Lebanese Christians (Figure 4) confirming our previous observation. 5 We investigated the time when admixture has occurred in North and South of the Near East by using linkage disequilibrium, 17; 18 using the Lebanese Muslims and the Yemenis to represent the two areas, and setting Africans, East Asians and West Eurasians in the geno set as references ( Figure 6). We found two significant admixture events in the Lebanese Muslims; the first occurred around 600BCE-500CE (Z=3.5) and the second occurred around 1580CE-1750CE (Z=3.4), confirming our previous results on the date of admixture in North of the Near East. 19 In contrast, we detect one significant admixture event in Yemen occurring 1190CE-1290CE (Z=14.6) and thus these results suggest that the shifts of the North and South of the Near East along the African cline could arise from independent events.
Next, we investigated genetic structure within Yemen and found it was loosely correlated with geography ( Figure 7): some clustering by region is visible, but many individuals from the same region do not cluster together. We interpret this structure as resulting from continuous movement of people between the different regions. However, we note that individuals from Hudayda and from Hadramout have increased African ancestry compared with most other Yemenis, while individuals from Maarib appear to be the least admixed clustering at the base of the African cline in Yemen. We confirm these results by running ADMIXTURE 20 on the Yemenis, Lebanese, Africans and East Asians found in the seq set using K=3 to account for the African, Asian and Near Eastern ancestry. The ADMIXTURE results We thus find that the genetic structure of present-day Yemenis can be accounted for by mixture of post-glacial Levantine and other Eurasian populations with Africans within the last 1000 years. In particular, we do not find evidence for African ancestry that might have remained specifically in this region since the major expansion out of Africa 50,000-60,000 years ago. 1 Our model of the genetic relationships in the Near East relies on the available genetic data today and thus will likely be refined in the future when, most importantly, ancient DNA data from the Southern regions of the Near East become available -work that has so far been hindered by the warm climate in the Near East negatively impacting the survival of ancient DNA.