Abstract
Farming was first introduced to southeastern Europe in the mid-7th millennium bce, brought by migrants from Anatolia who settled in the region before spreading throughout Europe. However, the dynamics of the interaction between the first farmers and the indigenous hunter-gatherers remain poorly understood because of the near absence of ancient DNA from the region. We report new genome-wide ancient DNA data from 204 individuals (65 Paleolithic and Mesolithic, 93 Neolithic, and 46 Copper, Bronze and Iron Age) who lived in southeastern Europe and surrounding regions between about 12,000 and 500 bce. We document that the hunter-gatherer populations of southeastern Europe, the Baltic, and the North Pontic Steppe were distinct from those of western Europe, with a West-East cline of ancestry. We show that the people who brought farming to Europe were not part of a single population, as early farmers from southern Greece are not descended from the Neolithic population of northwestern Anatolia that was ancestral to all other European farmers. The ancestors of the first farmers of northern and western Europe passed through southeastern Europe with limited admixture with local hunter-gatherers, but we show that some groups that remained in the region mixed extensively with local hunter-gatherers, with relatively sex-balanced admixture compared to the male-biased hunter-gatherer admixture that we show prevailed later in the North and West. After the spread of farming, southeastern Europe continued to be a nexus between East and West, with intermittent steppe ancestry, including in individuals from the Varna I cemetery and in individuals associated with the Cucuteni-Trypillian archaeological complex, up to 2,000 years before the Steppe migration that replaced much of northern Europe’s population.
Introduction
The southeastern quadrant of Europe was the beachhead in the spread of agriculture into Europe from its source in the Fertile Crescent of southwestern Asia. After the first appearance of agriculture in the 7th millennium bce1,2, southeastern Europe incubated a succession of Early Neolithic cultures prior to the spread of farming westward via a Mediterranean route and northwestward via a Danubian route, reaching both Iberia and Central Europe by 5600 bce.3,4 Ancient DNA studies have shown that the spread of farming across Europe was accompanied by a massive movement of people5–8 closely related to the farmers of northwestern Anatolia9–11, but nearly all the evidence about the ancestry of the first farmers in Europe is from central and far western Europe, with only three individuals reported from northern Greece9. In the millennia following the establishment of agriculture in the Balkan Peninsula, a series of complex societies formed, culminating in large tell settlements and sites like the mid-5th millennium bce necropolis at Varna, which has some of the earliest evidence of extreme inequality in wealth, with one individual (ANI152/grave 43) from whom we extracted DNA buried with more gold than is known from any site prior to that time. By the end of the 6th millennium bce, agriculture had reached eastern Europe, in the form of the Cucuteni-Trypillian complex in the area of present-day Moldova, Romania and Ukraine, with densely settled “mega-sites” in Ukraine housing hundreds, and perhaps thousands, of people.12 After around 4000 bce, these societies further transformed, the tell settlements were largely abandoned, and there is archaeological evidence of contact with nomadic pastoralist populations from the Eurasian steppe. However, the population movements that accompanied these events are not immediately evident from the archaeological record and remain largely unknown.
Results
We generated new genome-wide data from 204 ancient humans (195 reported for the first time), from the Balkan Peninsula, the Carpathian Basin, the North Pontic Steppe, and neighboring regions mostly dated to between 12,000 and 1000 bce (Figure 1A, Supplementary Data Table 1, Supplementary Information, section 1). To obtain genome-wide data in an efficient way, we enriched the DNA libraries in solution for sequences overlapping 1.24 million single nucleotide polymorphisms (SNPs) before sequencing.7,10,13 We filtered out individuals with low coverage (less than 15,000 SNPs covered by at least one sequence) or that had unexpected ancestry for their archaeological context and were not directly dated. We also report data for, but exclude from analysis, nine individuals that were first-degree relatives of others in the dataset, resulting in an analysis dataset of 195 individuals.
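The filtering rules in this paragraph can be expressed as a short sketch. This is only an illustration of the stated criteria, not the pipeline used in the study: the `Individual` record, its field names, and the `analysis_set` helper are hypothetical, and only the 15,000-SNP threshold is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_SNPS = 15_000  # coverage threshold stated in the text


@dataclass
class Individual:
    # Hypothetical record; the field names are illustrative assumptions.
    sample_id: str
    snps_covered: int            # SNPs covered by at least one sequence
    unexpected_ancestry: bool    # ancestry at odds with archaeological context
    directly_dated: bool         # has a direct date
    first_degree_relative_of: Optional[str] = None  # ID of a retained relative


def passes_qc(ind: Individual) -> bool:
    """Apply the two exclusion rules described in the paragraph."""
    if ind.snps_covered < MIN_SNPS:
        return False
    # Excluded only when ancestry is unexpected AND there is no direct date.
    if ind.unexpected_ancestry and not ind.directly_dated:
        return False
    return True


def analysis_set(
    individuals: List[Individual],
) -> Tuple[List[Individual], List[Individual]]:
    """Return (reported, analysed): relatives are reported but not analysed."""
    reported = [i for i in individuals if passes_qc(i)]
    analysed = [i for i in reported if i.first_degree_relative_of is None]
    return reported, analysed
```

Keeping the reported and analysed sets separate mirrors the distinction drawn in the text between individuals for whom data are reported and those retained for analysis.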
We applied principal component analysis (PCA; Figure 1B, Extended Data Figure 1), and supervised and unsupervised ADMIXTURE (Figure 1D, Extended Data Figures 2 and 3)14 to obtain an overview of this dataset in terms of population structure. We combined this genetic assessment of structure with archaeological and chronological information to cluster the individuals into populations. We used D-statistics to evaluate whether pairs of populations were consistent with being clades15 and the programs qpAdm and qpGraph to fit mixture proportions and admixture graphs to the data15. To investigate how these individuals fit into the wider context of European prehistory and present-day genetic diversity, we co-analyzed them with data from 265 previously reported ancient individuals9–11,16–28 as well as 799 present-day individuals genotyped on the Illumina “Human Origins” array24, and 300 high coverage genomes from the Simons Genome Diversity Project29 (SGDP).
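To make the D-statistics used throughout this section concrete, the following is a minimal sketch of D(W, X, Y, Z) computed from per-SNP allele frequencies, with a standard error from a delete-one-block jackknife. The function names are hypothetical, and the jackknife here is an unweighted simplification over equal-sized contiguous blocks; it is not the implementation used in the study. With this sign convention, a positive value indicates that X shares more alleles with Z than with Y, and a negative value that X shares more alleles with Y.

```python
import math


def d_statistic(w, x, y, z):
    """D(W, X, Y, Z) from per-SNP allele frequencies (sequences of floats)."""
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum(
        (a + b - 2 * a * b) * (c + d - 2 * c * d)
        for a, b, c, d in zip(w, x, y, z)
    )
    return num / den  # undefined if no SNP is informative (den == 0)


def _drop(values, lo, hi):
    """Return values with the block [lo, hi) removed."""
    return values[:lo] + values[hi:]


def jackknife_z(w, x, y, z, n_blocks):
    """Return (D, Z) using an unweighted delete-one-block jackknife.

    Assumption: SNPs are ordered along the genome, so contiguous index
    ranges approximate contiguous genomic blocks.
    """
    n = len(w)
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    full = d_statistic(w, x, y, z)
    loo = [
        d_statistic(_drop(w, lo, hi), _drop(x, lo, hi),
                    _drop(y, lo, hi), _drop(z, lo, hi))
        for lo, hi in zip(bounds, bounds[1:])
    ]
    mean = sum(loo) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((d - mean) ** 2 for d in loo)
    return full, full / math.sqrt(var)
```

The inputs must be lists or tuples of equal length. A production analysis would weight blocks by the number of SNPs they contain and define blocks by genetic or physical distance.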
Hunter-Gatherer substructure and transitions
We report new genome-wide data for 101 individuals from Paleolithic, Mesolithic and eastern European Neolithic contexts. In eastern Europe, the term “Neolithic” often refers to the presence of pottery30–32 (and may not include a transition to agricultural subsistence which is the key criterion for the use of these terms in western Europe). Because of the differences in meaning across our region of study, we avoid the use of “Neolithic” as a general term and use terms corresponding to economic subsistence strategy (either “hunter-gatherer” or “farmer”), or genetic ancestry (we use “hunter-gatherer ancestry” to refer to genetic ancestry derived from a population closely related to Mesolithic Europeans, and “farmer ancestry” to refer to ancestry derived from a population like northwestern Anatolian farmers).
Hunter-gatherers from central Europe have both western and eastern European hunter-gatherer ancestry (WHG and EHG) – a cline that is clearly visible in PCA (Figure 1B). This motivated us to investigate whether genetic population structure of Mesolithic and Early Neolithic hunter-gatherers in Europe was determined purely by physical distance. We fit admixture proportions with qpAdm and also a model estimating a spatially continuous migration surface under an isolation-by-distance model,33 and inferred a migration barrier separating populations that are predominantly WHG from EHG, but with some diffusion of ancestry across this boundary (Figure 2A). However, we also show that this frontier was not static, with dramatic local shifts in ancestry over time (Figure 2B) and substructure in phenotypically important variants (Supplementary Information, section 2).
From present-day Ukraine, our study reports new genome-wide data from 31 hunter-gatherers: five Mesolithic individuals dating from 9500–6000 bce, and 26 Neolithic and Copper Age (“Eneolithic” or “Chalcolithic”) individuals dating from ~6000–3500 bce. On the cline from western (WHG) to eastern hunter-gatherer ancestry (EHG, represented by individuals from Karelia and Samara), the Ukrainian Mesolithic individuals fall towards the eastern end, intermediate between EHG and hunter-gatherers from Sweden7. The Ukrainian Neolithic population has significant differences in ancestry compared to the Ukrainian Mesolithic population. A previous study of two individuals (for one of which we generated new data) from the Mesolithic Vasil’evka 3 and Neolithic Vovnigi 2 sites34 suggested that between the Ukrainian Mesolithic and Neolithic there was an increase in Ancient North Eurasian (ANE) ancestry related to the 24,000-year old Siberian Mal’ta 1.26 However, our larger sample shows the opposite – specifically that ANE ancestry decreases and WHG ancestry increases – as shown by the statistics D(Mbuti, X, Ukraine_Mesolithic, Ukraine_Neolithic), which is Z=−4.9 when X is the Mal’ta 1 individual and Z=9.1 for X=WHG (Supplementary Data Table 2).
Individuals associated with the Bronze Age Yamnaya Complex from Ukraine, like previously reported Yamnaya individuals from Samara7 and Kalmykia16, have little evidence of WHG ancestry, but do have a third source of ancestry related to hunter-gatherers from the Caucasus20 (CHG) and early Iranian farmers23,35 (Supplementary Data Table 3). Two Yamnaya individuals – one from Ozera in Ukraine and one from Bulgaria (I1917 and Bul4, both dated to ~3000 bce) – additionally show evidence of early European farmer related admixture, which is the first evidence of such ancestry in Yamnaya individuals (Figure 1B,D, Supplementary Data Table 2). Similarly, one Copper Age individual (I4110) dated to ~3600–3400 bce from Dereivka in Ukraine has both CHG and farmer ancestry (Figure 1D, Supplementary Data Table 2). This is by far the earliest appearance of farmer ancestry this far East in Eurasia, which was previously not known on the Steppe until the Srubnaya Complex after ~1800 bce.
At Zvejnieki in Latvia (17 newly reported individuals, added to 5 first reported in Ref. 34 for which we report additional data here) there is a transition in hunter-gatherer ancestry that is the opposite of that seen in Ukraine. Consistent with similar data from the Baltic States34,36,37, we find that Mesolithic and Early Neolithic individuals associated with the Kunda and Narva cultures had ancestry that was genetically intermediate between western and eastern hunter-gatherers (we estimate 70% WHG and 30% EHG, but perhaps with some differential relatedness to ANE, relative to EHG; Supplementary Data Table 3). However, there is a dramatic shift between the Early Neolithic and individuals of the Middle Neolithic Comb Ware Complex, who are almost entirely EHG in ancestry (we estimate 73% EHG, but two out of four individuals appear almost 100% EHG in PCA space). The most recent individual, associated with the Final Neolithic Corded Ware Complex, provides evidence of yet another transition, clustering closely with Yamnaya from Samara7, Kalmykia16 and Ukraine.
We report new data from hunter-gatherers from France, Sicily and Croatia, as well as higher coverage data from three previously published hunter-gatherers from France and Germany.18 The Sicilian and Croatian individuals dating to 12,000 and 6100 bce cluster closely with western hunter-gatherers, including individuals from Loschbour24 (Luxembourg, 6100 bce), Bichon20 (Switzerland, 11,700 bce), and Villabruna18 (Italy, 12,000 bce). These results demonstrate that the “western hunter-gatherer” population24 was widely distributed from the Atlantic seaboard of Europe in the West, to Sicily in the South, to the Balkan Peninsula in the Southeast, for at least six thousand years. This strengthens the evidence that the western hunter-gatherers represent a population that expanded from a southeastern European refugium following the last Ice Age around 15,000 years ago, in the process displacing or admixing with the existing population of western Europe.18,38
A particularly important hunter-gatherer population that we newly report in this study is from the Iron Gates region that straddles the border of present-day Romania and Serbia. This region is close to the route taken by farmer migrants on their way from the Balkans to central Europe, and the population is represented in our study by 35 individuals from the Iron Gates sites of Hajdučka Vodenica, Ostrovul Corbului, Padina, Schela Cladovei and Vlasac, one individual (I2534) with similar (but more EHG) hunter-gatherer ancestry from the Early Neolithic site of Măgura Buduiasca 250 km east of the Iron Gates, and two individuals with farmer ancestry from the Mesolithic-Neolithic site of Lepenski Vir. Modeling Iron Gates hunter-gatherers as a mixture of WHG and EHG (Supplementary Data Table 3) shows that they are – as expected given their geographic location and the hunter-gatherer ancestry cline – intermediate between WHG (87%) and EHG (13%). However, this qpAdm model does not fit well (p=0.0003, Supplementary Data Table 3) and we note that the Iron Gates hunter-gatherers carry mitochondrial haplogroups K1 (8/36) as well as other subclades of haplogroup U (27/36) and haplogroup H (1/36). This contrasts with WHG, EHG and Scandinavian hunter-gatherers, who almost all carry haplogroup U5 or U2. Therefore the Iron Gates hunter-gatherers have ancestry that is not present in WHG or EHG. This suggests either genetic contact between the ancestors of the Iron Gates population and hunter-gatherers from Anatolia, or that the Iron Gates population is related to the source population from which the WHG split off during a post-LGM re-expansion into Europe.
In contrast, the two individuals (I4665 and I4666, dated to 6205–5907 calBCE and 6222–5912 calBCE respectively) that we sampled from the Iron Gates site of Lepenski Vir are genetically farmers rather than hunter-gatherers, despite having been buried in the local Mesolithic tradition. Strontium isotope data show that many of the individuals buried after ~6100 bce at Lepenski Vir – including one of the two that we sampled (I4665, burial 54E) – were not originally from the Danube Gorges.39 These observations, combined with one individual from Padina (I5232), dated to 6061–5841 calBCE, who has both farmer and hunter-gatherer ancestry, demonstrate that the Iron Gates region was one where farmer and hunter-gatherer groups interacted both genetically and culturally, and provide a window into the first few generations of interactions between these disparate groups.
Population structure and transformation of farmers in southeast Europe
Neolithic populations from present-day Bulgaria, Croatia, Macedonia, Serbia and Romania cluster closely with the Neolithic populations of northwestern Anatolia (Figure 1). Modeling Balkan Neolithic populations as a mixture of Anatolia Neolithic, western hunter-gatherer and Ukraine Mesolithic, we estimate that Balkan Neolithic populations derive 98% (95% confidence interval [CI]: 97–100%) of their ancestry from populations related to those of the northwestern Anatolian Neolithic. A striking exception to the pattern of limited hunter-gatherer admixture in Balkan Neolithic populations is evident in 8 out of 9 individuals from Malak Preslavets, a site in present-day Bulgaria close to the Danube river.40 These individuals likely lived in the mid-6th millennium bce and have significantly more hunter-gatherer ancestry than other Balkan Neolithic individuals, as shown by PCA and ADMIXTURE as well as D-statistics and qpAdm modeling (Figure 1B,D, Extended Data Figures 1–3, Supplementary Data Tables 2–4). We find that a model of 82% (CI: 77–86%) Anatolian Neolithic, 15% (CI: 12–17%) WHG, and 4% (CI: 0–9%) EHG ancestry is a good fit to the data. This hunter-gatherer ancestry with a ~4:1 WHG:EHG ratio plausibly represents a contribution from local Balkan hunter-gatherers like those that we sampled from the Iron Gates. By the Late Mesolithic, hunter-gatherer populations in the Balkans were likely concentrated in sites along the coast and major rivers such as the Danube,41 which directly connects the Iron Gates with Malak Preslavets. This suggests a heterogeneous landscape of farmer populations with different proportions of hunter-gatherer ancestry during the early Neolithic, and that farmer groups with the most hunter-gatherer ancestry, like those at Malak Preslavets and, possibly, Lepenski Vir, were those that lived close to the highest densities of hunter-gatherers.
In the Balkans, Copper Age populations have significantly more hunter-gatherer ancestry than Neolithic populations as shown, for example, by the statistic D(Mbuti, WHG, Balkans_Neolithic, Balkans_Chalcolithic); Z=5.18 (Supplementary Data Table 2). This is consistent with changes in funeral rites42 and roughly contemporary with the “resurgence” of hunter-gatherer ancestry previously reported in central Europe and Iberia7,10,43.
We also report the first data from the Late Neolithic Globular Amphora Complex. Globular Amphora individuals from two sites in Poland and Ukraine form a tight genetic cluster, showing genetic homogeneity over a large distance (Figure 1B,D). We find that this population had more hunter-gatherer ancestry than Middle Neolithic groups from Central Europe7 (we estimate 25% [CI: 22–27%] WHG ancestry, similar to Chalcolithic Iberia). This finding further extends our knowledge of the variable landscape of hunter-gatherer and farmer admixture proportions in Europe (Supplementary Data Table 3). In east-central Europe, the Globular Amphora Complex immediately precedes the Corded Ware Complex that marks the first appearance of steppe ancestry in the region.7,16 The Globular Amphora abutted populations with steppe-influenced material cultures for hundreds of years and yet the individuals in our study have no evidence of steppe ancestry, suggesting that this persistent culture frontier corresponded to a genetic barrier.
The migrations from the Pontic-Caspian steppe associated with the Yamnaya Cultural Complex in the 3rd millennium bce made a profound contribution to the genetic ancestry of central Europe, contributing about 75% of the ancestry of individuals associated with the Corded Ware Complex and about 50% of the ancestry of succeeding material cultures such as the Bell Beaker Complex.7,16 In a few individuals from southeastern Europe, we find evidence of steppe-related ancestry far earlier (defined here as a mixture of EHG and CHG similar to the genetic signature of individuals of the later Yamnaya; Figure 1B,D). One individual (ANI163) from the Varna I cemetery dates to 4711–4550 bce, one (I2181) from nearby Smyadovo dates to 4550–4450 bce, and a third individual (I1927) from Verteba cave, associated with the Cucuteni-Trypillian complex, dates to 3619–2936 bce. These findings push back by almost 2000 years the first evidence of steppe ancestry this far West in Europe, demonstrating the resumption of genetic contact between southeastern Europe and the Steppe that also occurred in the Mesolithic. Other Copper Age (~5000–4000 bce) individuals from the Balkans have little evidence of steppe ancestry, but Bronze Age (~3400–1100 bce) individuals do (we estimate 30%; CI: 26–35%). The four latest Balkan Bronze Age individuals in our data (later than ~1700 bce) all have more steppe ancestry than earlier Bronze Age individuals (3200–2500 bce; Figure 1D), showing that the contribution of the Steppe to southeast European populations increased further during the Bronze Age.
New evidence about the spread of farming into, and throughout, Europe
This study resolves two open questions about the initial spread of farming into Europe. The first is the question of whether the first farmers of the Danubian Route that brought agriculture to northern Europe along the Danube River valley, and those that spread along the Mediterranean coast to Iberia and other southern European locations, were derived from a single ancestral population or instead represent separate migrations from different Anatolian sources. A challenge in studying the relationship among Early Neolithic populations is the different proportions of hunter-gatherer ancestry they carry, which obscures the more subtle differences between their farmer ancestries. One approach to this problem is to explicitly model both sources of ancestry using an admixture graph (Supplementary Information, section 3). We confirm that Mediterranean populations, represented in our study by individuals of the Impressa complex from Croatia and the Epicardial Early Neolithic from Spain7, are closely related to the Danubian population represented by the Linearbandkeramik (LBK) from central Europe7,44 and show that both groups are closely related to the Balkan Neolithic population. These three populations form a clade with Northwest Anatolians as an outgroup, consistent with a single migration from a population closely related to the northwestern Anatolian Neolithic farming population into the Balkan peninsula, which then split into two populations that followed the Danubian and Mediterranean routes.
A related question about the spread of farming into Europe concerns whether its initial arrival in present-day Greece and subsequent expansion was mediated by a single population migrating from Anatolia – as has been consistent with genetic data up until now9 – or whether there were multiple initial groups, as suggested by the archaeological record.45,46 We find that four southern Greek (Peloponnese) Neolithic individuals – three from Diros Cave and one from Franchthi Cave, plus one previously published individual from Diros27 – are not consistent with descending from the same source population as other European farmers. In PCA these individuals are outliers, shifted away from northwestern Anatolian and European Early Neolithic individuals in a direction opposite from WHG. D-statistics (Supplementary Data Table 2) show that in fact, these “Peloponnese Neolithic” individuals have less WHG-related ancestry than Anatolia Neolithic ones, and that they form an outgroup relative to Anatolian and Balkans Neolithic populations, suggesting an independent migration into Europe from a population that split off from the ancestors of the northwest Anatolian individuals from which we have data. Admixture graph modeling (Supplementary Information, section 3) supports this interpretation, confirming that their Near Eastern ancestry is derived from a lineage that is close, or basal, to the non-WHG component of Anatolian Neolithic ancestry. One possibility is that this independent migration is related to an earlier Aceramic Neolithic in Greece that was derived from the pre-pottery Neolithic (PPNB) of Cyprus and the Levant46. Under this model, the earliest Neolithic populations in Greece migrated from the Levant, perhaps via the southwestern Anatolian coast as early as 7000 bce,45,46 but the majority of Neolithic ancestry arrived around 500 years later via a route that passed through northwestern Anatolia.
The predictions of this hypothesis could be further tested with genome-wide data of Early Neolithic individuals from Cyprus, Crete and southwest Anatolia. Populations related to the Peloponnese Neolithic potentially made a small contribution to the ancestry of other Mediterranean Neolithic populations like Early Neolithic Iberia and Neolithic farmers from northern Greece9 but we do not strongly reject models without such a contribution (Supplementary Information, section 3).
Sex-biased admixture between hunter-gatherers and farmers
We provide the first evidence for sex-biased admixture between hunter-gatherers and farmers in Europe, showing that the Middle Neolithic “resurgence” of hunter-gatherer ancestry7,43 in central Europe and Iberia was driven more by male than by female hunter-gatherers (Figure 3B&C, Supplementary Data Table 5). One way of detecting this is to compare ancestry proportions on the autosomes and the X chromosome. Since males always inherit their X chromosome from their mothers, differences between ancestry on the autosomes and chromosome X imply sex-biased mixture. In the Balkan Neolithic there is no evidence of sex bias using ancestry estimates obtained from qpAdm (Z=−0.65 where a positive Z-score implies male hunter-gatherer bias), nor in the LBK and Iberian Early Neolithic (Z=−0.24 and 1.04, respectively). However, in the Middle Neolithic and later populations, this effect reverses. In the Balkan Copper Age there is weak evidence of bias (Z=1.77), but in the Iberian Copper Age and the central European Middle Neolithic there is clear bias in favor of male hunter-gatherer ancestry (Z=2.98 and 2.77, respectively). This result is independently supported by uniparental markers (Figure 3C). Proportions of typically hunter-gatherer mitochondrial haplogroups (haplogroup U)47 are low in all populations and within the intervals of genome-wide ancestry proportions. On the other hand, hunter-gatherer Y chromosomes (haplogroups I2, R1 and C1)18 are much more common: 6/7 in the Iberian Neolithic/Copper Age and 7/8 in Middle-Late Neolithic central Europe (Central_MN and Globular_Amphora). Under a single pulse model of admixture, the autosomal/X chromosome ancestry proportions imply that in the central European Middle Neolithic population that shows the strongest evidence of sex bias, 35–50% of the male ancestors were hunter-gatherers, compared to 0–5% of the female ancestors (Extended Data Figure 4).
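The logic of the autosome/X comparison can be illustrated with a small worked sketch. It assumes a single pulse of admixture and equal numbers of male and female ancestors, so that autosomal hunter-gatherer ancestry is the average of the male and female contributions, a = (m + f)/2, while X-chromosome ancestry weights females twice as heavily as males, x = (m + 2f)/3. Solving gives f = 3x − 2a and m = 4a − 3x. The function names and the simple Z-score below are illustrative assumptions, not the procedure used to produce Extended Data Figure 4.

```python
import math


def sex_specific_fractions(auto_hg, x_hg):
    """Solve a = (m + f) / 2 and x = (m + 2f) / 3 for (m, f).

    auto_hg: hunter-gatherer ancestry proportion on the autosomes
    x_hg:    hunter-gatherer ancestry proportion on the X chromosome
    Returns (male_fraction, female_fraction) of hunter-gatherer ancestors.
    Noisy inputs can yield values outside [0, 1]; they are not clipped here.
    """
    female = 3 * x_hg - 2 * auto_hg
    male = 4 * auto_hg - 3 * x_hg
    return male, female


def bias_z(auto_hg, auto_se, x_hg, x_se):
    """Z-score for autosomal minus X ancestry, treating the two estimates
    as independent (an assumption). Positive values indicate an excess of
    male hunter-gatherer ancestors."""
    return (auto_hg - x_hg) / math.sqrt(auto_se ** 2 + x_se ** 2)
```

For example, if 45% of male ancestors and 5% of female ancestors were hunter-gatherers, the model predicts 25% hunter-gatherer ancestry on the autosomes and about 18% on the X chromosome; `sex_specific_fractions(0.25, 0.55 / 3)` recovers the original 45% and 5%.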
The merging of hunter-gatherer and farmer populations was a dynamic process that unfolded over thousands of years, and proceeded in a profoundly different way in different parts of Europe. Our analysis shows that in some places – for example at Malak Preslavets in Bulgaria – there was extensive mixing between hunter-gatherers and farmers, likely driven by the high local hunter-gatherer population density. In other places – in particular in western, central and northern Europe – hunter-gatherers and farmers lived in close proximity for long periods of time with minimal mixture6,43,48. When they did finally mix, we find evidence that the hunter-gatherer admixture was male-biased, implying a different dynamic. Farming was initially unable to expand widely in central and northern Europe because early farming techniques were only suitable for specific regions within the loess belt of the northern European plain. Thus, northern and central European hunter-gatherers were protected from the demographic impact of farming migrations, resulting in persistent frontiers between farmers and hunter-gatherers.49,50 This may have given hunter-gatherers and farmers time to learn from each other and interact in a different way than during the more rapid expansion of the first farmers in the South.
No evidence of Copper Age Balkans-to-Anatolia migration
One version of the Steppe Hypothesis of Indo-European language origins suggests that Proto-Indo European languages developed in the steppe north of the Black and Caspian seas, and that the earliest known diverging branch – Anatolian – was spread into Asia Minor by movements of steppe peoples through the Balkan peninsula during the Copper Age around 4000 bce, as part of the same incursions from the steppe that coincided with the decline of the tell settlements.51 If this were correct, then one way to detect evidence of it would be the appearance of large amounts of characteristic steppe ancestry first in the Balkan Peninsula, and then in Anatolia. However, our genetic data do not support this scenario. While we find steppe ancestry in Balkan Copper Age and Bronze Age individuals, this ancestry is sporadic across individuals in the Copper Age, and at low levels in the Bronze Age. Moreover, while Bronze Age Anatolian individuals27 have CHG / Iran Neolithic related ancestry, they have neither the EHG ancestry characteristic of all steppe populations sampled to date20, nor the WHG ancestry that is ubiquitous in southeastern Europe in the Neolithic (Figure 1A, Supplementary Data Table 2, Supplementary Information section 1). This pattern is consistent with that seen in northwestern Anatolia11 and later in Copper Age Anatolia23, suggesting continuing migration into Anatolia from the East rather than from Europe.
An alternative hypothesis is that the ultimate homeland of Proto-Indo European languages was in the Caucasus or in Iran. In this scenario, westward movement contributed to the dispersal of Anatolian languages, and northward movement and mixture with EHG was responsible for the formation of the population associated with the Yamnaya complex. These steppe pastoralists plausibly spoke a “Late Proto-Indo European” language that is ancestral to many of the non-Anatolian branches of the Indo-European language family.52 On the other hand, our data could still be consistent with the Steppe-Balkans-Anatolia route hypothesis, albeit with constraints. It remains possible that populations dating to around 1600 bce in the regions where the Indo-European Luwian, Hittite and Palaic languages were spoken did have European hunter-gatherer ancestry. However, our results would require that such ancestry was not ubiquitous in Bronze Age Anatolia, and was perhaps tightly linked to Indo-European speaking groups. We predict that additional insight about the genetic origins of the potential speakers of early Indo-European languages will be obtained when ancient DNA data become available from additional sites in this key period in Anatolia and the Caucasus.
Discussion
Our study shows that southeastern Europe consistently served as a genetic contact zone between different populations. This role likely contributed to the extraordinary series of cultural innovations that characterize the region, from the elegant figurines of the Neolithic to the ornaments and precious metalwork of Varna. Before the arrival of farming, this region saw constant interaction between highly diverged groups of hunter-gatherers, and this interaction continued, perhaps accelerating, after the arrival of farming. We find evidence that some early farmers from Greece derived ancestry from a different source compared to the one that contributed the majority of ancestry of all other farmers in Europe. In eastern Europe we document the appearance of CHG/Iranian Neolithic ancestry north of the Black Sea, and its eventual extension as far north as the Baltic. In some ways, this expansion parallels the expansion of Anatolian farmer ancestry into western Europe although it is less dramatic, and several thousand years later. These expansions set up the two, largely separate, populations in western and eastern Europe that would come together in the Final Neolithic and Early Bronze Age to form the ancestry of present-day Europe.
This study describes key ancestral components that contributed to present-day West Eurasian genetic diversity. However, the more recent processes that created present-day southeastern European populations are unknown, and understanding this will require dense sampling of Bronze Age, Iron Age, Roman, and Medieval groups and comparison to present-day populations. At the most ancient end of our time series, while information about hunter-gatherer population structure in northern and western Europe now extends back throughout the Upper Paleolithic,18 we have little data about how these populations fit into a wider Eurasian context, and more data from hunter-gatherer populations in Anatolia, the Near East and East Asia will be needed to resolve that question. Finally, many questions about the nature of the interactions between populations remain unresolved. For example, we report evidence for sex-bias in one particular set of interactions between hunter-gatherers and farmers, and other interactions may have had similar dynamics37,53. However, many more examples of such interactions need to be collected before it will become possible to make generalizable claims about the patterns of sex-biased interactions among human populations as they came into contact and mixed during prehistory.
Methods
Ancient DNA Analysis
We extracted DNA and prepared next-generation sequencing libraries in four different dedicated ancient DNA laboratories (Adelaide, Boston, Budapest, and Tuebingen). Sample powder was also generated in a fifth laboratory (Dublin) and sent to Boston for DNA extraction and library preparation (Supplementary Data Table 1).
Two samples were processed at the Australian Centre for Ancient DNA, Adelaide, Australia, according to previously published methods7 and sent to Boston for subsequent screening, 1240k capture and sequencing.
Seven samples were processed as previously described28 at the Institute of Archaeology RCH HAS, Budapest, Hungary, and amplified libraries were sent to Boston for screening, 1240k capture and sequencing.
Seventeen samples were processed at the Institute for Archaeological Sciences of the University of Tuebingen and at the Max Planck Institute for the Science of Human History in Jena, Germany. Extraction54 and library preparation55,56 followed established protocols. We performed in-solution capture as described below (“1240k capture”) and sequenced on an Illumina HiSeq 4000 or NextSeq 500 for 76 bp, either single- or paired-end.
The remaining 195 samples were processed at Harvard Medical School, Boston, USA. From about 75mg of sample powder from each sample (extracted in Boston or University College Dublin, Dublin, Ireland), we extracted DNA following established methods54 replacing the column assembly with the column extenders from a Roche kit.57 We prepared double barcoded libraries with truncated adapters from between a ninth and a third of the DNA extract. Most libraries included in the nuclear genome analysis (90%) were subjected to partial (“half”) Uracil-DNA-glycosylase (UDG) treatment before blunt end repair. This treatment reduces by an order of magnitude the characteristic cytosine-to-thymine errors of ancient DNA data58, but works inefficiently at 5’ ends,56 and thereby leaves a signal of characteristic damage at the terminal ends of ancient sequences. Some libraries were not UDG treated (“minus”). For some samples we increased coverage by preparing additional libraries from the existing DNA extract using the partial UDG library preparation, but replacing the MinElute column cleanups in between enzymatic reactions with magnetic bead cleanups, and the final PCR cleanup with SPRI bead cleanup.59,60
We screened all libraries from Adelaide, Boston and Budapest by enriching for the mitochondrial genome plus about 3000 (50 in an earlier, unpublished, version) nuclear SNPs using a bead-capture61 but with the probes replaced by amplified oligonucleotides synthesized by CustomArray Inc. After the capture, the adapter sites were completed by PCR, and thereby dual index combinations62 were attached to each enriched library. We sequenced the products of between 100 and 200 libraries together with the non-enriched libraries (shotgun) on an Illumina NextSeq500 using v2 150 cycle kits for 2x76 cycles and 2x7 cycles.
In Boston, we performed two rounds of in-solution enrichment (“1240k capture”) for a targeted set of 1,237,207 SNPs using previously reported protocols.7,13,24 When we enriched additional libraries to increase coverage, multiple libraries from the same sample were pooled in equimolar ratios before the capture. All sequencing was performed on an Illumina NextSeq500 using v2 150 cycle kits for 2x76 cycles and 2x7 cycles. We attempted to sequence each enriched library up to the point where we estimated that it was economically inefficient to sequence further. Specifically, we iteratively sequenced more and more from each individual and only stopped when we estimated that the expected increase in the number of targeted SNPs hit at least once would be less than about one for every 100 new read pairs generated. After sequencing, we removed individuals with evidence of contamination based on mitochondrial DNA polymorphism63 or difference in PCA space between damaged and undamaged reads64, a high rate of heterozygosity on chromosome X despite being male64,65, or an atypical ratio of X to Y sequences. We report, but do not analyze, data from 17 individuals that had low coverage (less than 15,000 SNPs hit on the autosomes), were first-degree relatives of others in the dataset, or were undated and had unusual ancestry for their archaeological context.
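The stopping rule described above can be sketched as a simple marginal-yield check. This is only an illustration: the text does not say how the expected increase was estimated, so the sketch uses the yield observed in the most recent sequencing round as a stand-in. The function names and all numbers are hypothetical and are not taken from the study.

```python
def marginal_snp_yield(prev_pairs, prev_snps, cur_pairs, cur_snps):
    """New targeted SNPs hit at least once per additional read pair.

    Uses the most recent sequencing round as a proxy for the expected
    yield of the next round (an assumption of this sketch).
    """
    new_pairs = cur_pairs - prev_pairs
    if new_pairs <= 0:
        raise ValueError("the current round must add read pairs")
    return (cur_snps - prev_snps) / new_pairs


def should_stop_sequencing(prev_pairs, prev_snps, cur_pairs, cur_snps,
                           threshold=1 / 100):
    """Stop once fewer than about one new SNP is gained per 100 read pairs."""
    yield_per_pair = marginal_snp_yield(prev_pairs, prev_snps,
                                        cur_pairs, cur_snps)
    return yield_per_pair < threshold


# Hypothetical library: the last million read pairs added only 5,000 SNPs.
stop_now = should_stop_sequencing(1_000_000, 400_000, 2_000_000, 405_000)
# Hypothetical library: the last million read pairs added 100,000 SNPs.
keep_going = not should_stop_sequencing(1_000_000, 300_000, 2_000_000, 400_000)
```

A forecast based on a fitted saturation curve would be closer to an "expected increase" than this last-round proxy, but the threshold comparison would be the same.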
After removing a small number of sites that failed to capture, we were left with a total of 1,233,013 sites of which 32,670 were on chromosome X and 49,704 were on chromosome Y, with a median coverage at targeted SNPs on the 204 new individuals of 0.73 (range 0.017-9.2; Supplementary Table 1). We generated “pseudo-haploid” calls by selecting a single read randomly for each individual at each SNP. Thus, there is only a single allele from each individual at each site, but adjacent alleles might come from either of the two haplotypes of the individual. We merged the newly reported data with previously reported data from 266 other ancient individuals9–11, 16–28, making pseudo-haploid calls in the same way at the 1240k sites for individuals that were shotgun sequenced rather than captured.
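As a minimal sketch of the pseudo-haploid calling step, the following picks one read allele at random at each SNP. The input layout (a dictionary mapping a SNP identifier to the list of alleles seen in reads covering that site), the function name, and the SNP identifiers are assumptions made for illustration, not the pipeline used in the study.

```python
import random


def pseudo_haploid_calls(reads_by_snp, seed=0):
    """Return one randomly chosen read allele per SNP.

    reads_by_snp maps a SNP identifier to the alleles observed in the
    reads covering that site. SNPs with no reads are reported as None.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    calls = {}
    for snp_id, alleles in reads_by_snp.items():
        calls[snp_id] = rng.choice(alleles) if alleles else None
    return calls


# Hypothetical read pileup for one individual at three SNPs.
example_pileup = {
    "snp_hypothetical_1": ["A", "A", "G"],  # heterozygous-looking site
    "snp_hypothetical_2": ["C"],            # single read
    "snp_hypothetical_3": [],               # not covered
}
calls = pseudo_haploid_calls(example_pileup)
```

Because each site is sampled independently, neighbouring calls can come from different parental haplotypes, which is the property noted in the text.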
Using the captured mitochondrial sequence from the screening process, we called mitochondrial haplotypes. Using the captured SNPs on the Y chromosome, we called Y chromosome haplogroups for males by restricting to sequences with mapping quality >30 and bases with base quality >30. We determined the most derived mutation for each individual, using the nomenclature of the International Society of Genetic Genealogy (http://www.isogg.org) version 11.110 (21 April 2016).
Population genetic analysis
To analyze these ancient individuals in the context of present day genetic diversity, we merged them with the following two datasets:
300 high coverage genomes from a diverse worldwide set of 142 populations sequenced as part of the Simons Genome Diversity Project29 (SGDP merge).
799 West Eurasian individuals genotyped on the Human Origins array24, with 597,573 sites in the merged dataset (HO merge).
We computed principal components of the present-day individuals in the HO merge and projected the ancient individuals onto the first two components using the “lsqproject: YES” option in smartpca (v15100)66 (https://www.hsph.harvard.edu/alkes-price/software/).
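The idea behind a least-squares projection can be illustrated as follows: for an individual with many missing genotypes, the coordinates on the first two components are chosen to best fit the genotypes that were observed, ignoring the missing sites. This is a conceptual sketch only and is not smartpca's implementation; the normalization of genotypes, the function name, and the toy loadings are assumptions.

```python
def lsq_project(genotype, loadings):
    """Least-squares coordinates of one sample on two principal axes.

    genotype: list of normalized genotype values, with None for missing.
    loadings: list of (v1, v2) SNP loadings for the two axes.
    Solves the 2x2 normal equations using observed SNPs only.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for g, (v1, v2) in zip(genotype, loadings):
        if g is None:  # skip SNPs with no call for this individual
            continue
        a11 += v1 * v1
        a12 += v1 * v2
        a22 += v2 * v2
        b1 += v1 * g
        b2 += v2 * g
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("too few observed SNPs to determine both coordinates")
    c1 = (b1 * a22 - b2 * a12) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return c1, c2


# Toy loadings and a sample whose true coordinates are (2, -1);
# the second SNP is missing and does not affect the fit.
toy_loadings = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
toy_genotype = [2.0, None, 1.0, 3.0]
pc1, pc2 = lsq_project(toy_genotype, toy_loadings)
```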
We ran ADMIXTURE (v1.3.0) in both supervised and unsupervised mode. In supervised mode we used only the ancient individuals, on the full set of SNPs, and the following population labels fixed (so, for k=4, we used labels 1–4 in the list below, for example):
Anatolia_Neolithic
WHG
EHG
Yamnaya
Ukraine_Mesolithic
SHG
For unsupervised mode we used the HO merge, including 799 present-day individuals. We flagged individuals that were genetic outliers based on PCA and ADMIXTURE, relative to other individuals from the same time period and archaeological culture.
We computed D-statistics using qpDstat (v710), and fitted admixture proportions with qpAdm (v610) and admixture graphs with qpGraph (v6021)15 (https://github.com/DReichLab/AdmixTools), in each case using the SGDP merge. We computed standard errors for Z scores with the default block jackknife parameters. For qpAdm we used the following set of eight populations as outgroups or “right populations”:
Mbuti.DG
Ust_Ishim_HG_published.DG
Mota.SG
MA1_HG.SG
Villabruna
Papuan.DG
Onge.DG
Han.DG
For some analyses (Extended Data Table 4) we used an extended set of 14 right populations, including additional Upper Paleolithic European individuals18:
ElMiron
Mota.SG
Mbuti.DG
Ust_Ishim_HG_published.DG
MA1_HG.SG
AfontovaGora3
GoyetQ116-1_published
Villabruna
Kostenki14
Vestonice16
Karitiana.DG
Papuan.DG
Onge.DG
Han.DG
For 40 Mesolithic individuals we estimated an effective migration surface using the software EEMS (https://github.com/dipetkov/eems)33. We computed pairwise differences between individuals using the bed2diffs2 program provided with EEMS. We set the number of demes to 400 and defined the outer boundary of the region by the polygon (in latitude-longitude co-ordinates) [(66,60), (60,10), (45,−15), (35,−10), (35,60)]. We ran the MCMC ten times with different random seeds, each time with one million burn-in and four million regular iterations, thinned to one in ten thousand.
To analyze potential sex bias in admixture, we used qpAdm to estimate admixture proportions on the autosomes (default option) and on the X chromosome (option “chrom: 23”). We computed Z scores for the difference between the autosomes and the X chromosome as $Z = \frac{p_A - p_X}{\sqrt{\sigma_A^2 + \sigma_X^2}}$, where $p_A$ and $p_X$ are the hunter-gatherer admixture proportions on the autosomes and the X chromosome, and $\sigma_A$ and $\sigma_X$ are the corresponding jackknife standard deviations. Thus, a positive Z-score means that there is more hunter-gatherer admixture on the autosomes than on the X chromosome and thus the hunter-gatherer admixture was male-biased. Because X chromosome standard errors are high and qpAdm results can be sensitive to which population is first in the list of outgroup populations, we checked that the patterns we observe were robust to cyclic permutation of the outgroups. To compare frequencies of hunter-gatherer uniparental markers we counted the individuals with mitochondrial haplogroup U and Y chromosome haplogroups C2, I2 and R1, which are all common in Mesolithic hunter-gatherers but rare or absent in the Anatolian Neolithic. The Iron Gates hunter-gatherers also carry H and K1 mitochondrial haplogroups, so the proportion of haplogroup U represents the minimum maternal hunter-gatherer contribution. We computed binomial confidence intervals for the proportion of markers using the Agresti-Coull method67,68 implemented in the binom package in R.
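The two calculations in this paragraph can be sketched in a few lines. The Z score below assumes the usual form for a difference of two independent estimates, that is, the difference in admixture proportions divided by the square root of the sum of the squared standard errors. The interval is the textbook Agresti-Coull formula written out directly rather than a call to the R binom package. Function names and input values are hypothetical and are not results from the study.

```python
import math


def sex_bias_z(p_auto, se_auto, p_x, se_x):
    """Z score for autosomal minus X-chromosome admixture proportion.

    Positive values mean more hunter-gatherer ancestry on the autosomes
    than on the X chromosome, consistent with male-biased admixture.
    """
    return (p_auto - p_x) / math.sqrt(se_auto ** 2 + se_x ** 2)


def agresti_coull_interval(successes, n, z=1.959963984540054):
    """Agresti-Coull binomial interval (default z gives about 95%).

    The interval is returned untruncated; clip to [0, 1] if required.
    """
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width


# Hypothetical qpAdm estimates: 30% +/- 3% on autosomes, 10% +/- 4% on X.
z_example = sex_bias_z(0.30, 0.03, 0.10, 0.04)

# Hypothetical count: 5 of 10 individuals carry a hunter-gatherer marker.
ci_low, ci_high = agresti_coull_interval(5, 10)
```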
Given autosomal and X chromosome admixture proportions, we estimated the proportion of male and female hunter-gatherer ancestors by assuming a single pulse model of admixture. If the proportions of male and female ancestors that are hunter-gatherer are given by $m$ and $f$, respectively, then the proportions of hunter-gatherer ancestry on the autosomes and the X chromosome are given by $\frac{m+f}{2}$ and $\frac{m+2f}{3}$, respectively. We approximated the sampling error in the observed admixture proportions by the estimated jackknife sampling error, and computed the likelihood surface for $(m,f)$ over a grid ranging from (0,0) to (1,1).
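A minimal sketch of such a grid evaluation is given below. It assumes the standard single-pulse expectations, (m+f)/2 for the autosomes and (m+2f)/3 for the X chromosome, and treats the two observed proportions as independent Gaussian estimates with the jackknife standard errors as standard deviations. The independence assumption, the grid resolution, the function name, and the input values are choices made for this illustration rather than details given in the text.

```python
def sex_bias_likelihood_grid(p_auto, se_auto, p_x, se_x, steps=101):
    """Log-likelihood of (m, f) on a regular grid from (0, 0) to (1, 1).

    m, f: proportions of male and female ancestors that are hunter-gatherer.
    Returns the surface as a dict keyed by (m, f) and the grid maximum.
    """
    grid = [i / (steps - 1) for i in range(steps)]
    surface = {}
    best = None
    for m in grid:
        for f in grid:
            expected_auto = (m + f) / 2    # autosomes: equal parental share
            expected_x = (m + 2 * f) / 3   # X: two thirds from female ancestors
            loglik = -0.5 * (((p_auto - expected_auto) / se_auto) ** 2
                             + ((p_x - expected_x) / se_x) ** 2)
            surface[(m, f)] = loglik
            if best is None or loglik > best[0]:
                best = (loglik, m, f)
    return surface, best[1], best[2]


# Hypothetical estimates: 25% +/- 2% on autosomes, 20% +/- 4% on X.
surface, m_hat, f_hat = sex_bias_likelihood_grid(0.25, 0.02, 0.20, 0.04)
```

With these hypothetical inputs the maximum lies at m = 0.4 and f = 0.1, the point where both expectations match the observed proportions exactly.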
Direct AMS 14C Bone Dates
We report new direct AMS 14C bone dates in this study from multiple AMS radiocarbon laboratories. In general, bone samples were manually cleaned and demineralized in weak HCl and, in most cases (PSU, UCIAMS, OxA), soaked in an alkali bath (NaOH) at room temperature to remove contaminating soil humates. Samples were then rinsed to neutrality in Nanopure H2O and gelatinized in HCl.69 The resulting gelatin was lyophilized and weighed to determine percent yield as a measure of collagen preservation (% crude gelatin yield). Collagen was then directly AMS 14C dated (Beta, AA) or further purified using ultrafiltration (PSU/UCIAMS, OxA, Poz, Wk, MAMS).70 It is standard in some laboratories (PSU/UCIAMS, OxA, Wk) to use stable carbon and nitrogen isotopes as an additional quality control measure. For these samples, the %C, %N and C/N ratios were evaluated before AMS 14C dating. C/N ratios for well-preserved samples fall between 2.9 and 3.6, indicating good collagen preservation.71
All 14C ages were δ13C-corrected for mass dependent fractionation with measured 13C/12C values72 and calibrated with OxCal version 4.2.3 73 using the IntCal13 northern hemisphere calibration curve.73
Acknowledgments
We thank David Anthony, Iosif Lazaridis, and Mark Lipson for comments on the manuscript; Bastien Llamas and Alan Cooper for supporting laboratory work. Support for this project was provided by the Human Frontier Science Program fellowship LT001095/2014-L to I.M.; by DFG grant AL 287 / 14–1 to K.W.A.; by Irish Research Council grant GOIPG/2013/36 to D.F.; by MEN-UEFISCDI grant, Partnerships in Priority Areas Program – PN II (PN-II-PT-PCCA-2013-4-2302) to C.L.; by Croatian Science Foundation grant IP-2016-06-1450 to M.N.; by European Research Council grant ERC StG 283503 and Deutsche Forschungsgemeinschaft DFG FOR2237 to K.H.; by ERC starting grant ADNABIOARC (263441) to R.P.; and by US National Science Foundation HOMINID grant BCS-1032255, US National Institutes of Health grant GM100233, and the Howard Hughes Medical Institute to D.R.