Population history from the Neolithic to present on the Mediterranean island of Sardinia: An ancient DNA perspective

Recent ancient DNA studies of western Eurasia have revealed a dynamic history of admixture, with evidence for major migrations during the Neolithic and Bronze Age. The population of the Mediterranean island of Sardinia has been notable in these studies – Neolithic individuals from mainland Europe cluster more closely with Sardinian individuals than with all other present-day Europeans. The current model to explain this result is that Sardinia received an initial influx of Neolithic ancestry and then remained relatively isolated from expansions in the later Neolithic and Bronze Age that took place in continental Europe. To test this model, we generated genome-wide capture data (approximately 1.2 million variants) for 43 ancient Sardinian individuals spanning the Neolithic through the Bronze Age, including individuals from Sardinia’s Nuragic culture, which is known for the construction of numerous large stone towers throughout the island. We analyze these new samples in the context of previously generated genome-wide ancient DNA data from 972 ancient individuals across western Eurasia and whole-genome sequence data from approximately 1,500 modern individuals from Sardinia. The ancient Sardinian individuals show a strong affinity to western Mediterranean Neolithic populations and we infer a high degree of genetic continuity on the island from the Neolithic (around fifth millennium BCE) through the Nuragic period (second millennium BCE). In particular, during the Bronze Age in Sardinia, we do not find significant levels of the “Steppe” ancestry that was spreading in many other parts of Europe at that time. We also characterize subsequent genetic influx between the Nuragic period and the present. We detect novel, modest signals of admixture between 1,000 BCE and present-day, from ancestry sources in the eastern and northern Mediterranean. Within Sardinia, we confirm that populations from the more geographically isolated mountainous provinces have experienced elevated levels of genetic drift and that northern and southwestern regions of the island received more gene flow from outside Sardinia. Overall, our genetic analysis sheds new light on the origin of Neolithic settlement on Sardinia, reinforces models of genetic continuity on the island, and provides enhanced power to detect post-Bronze-Age gene flow. Together, these findings offer a refined demographic model for future medical genetic studies in Sardinia.

the initial insights these studies reveal, none of them analyze genome-wide autosomal data, which 130 has proven to be of great use for studies of population history (Pickrell and Reich, 2014). 131 To provide a more detailed perspective on Sardinian population history, we generated genome-132 wide data from the skeletal remains of 43 Sardinian individuals radiocarbon dated to between 133 4,100-1,000 BCE. We analyzed their genetic variation in the context of reference panels of ancient 134 and contemporary individuals. Our goal was to investigate three aspects of Sardinian popula-

144
Ancient DNA from Sardinia 145 We organized a collection of skeletal remains from: 1) previously excavated samples throughout 146 Sardinia, in part drawing from samples initially used for isotopic analysis in Lai et al. (2013, 147 Supp. Info. 1), and 2) the Seulo caves of central Sardinia (Skeates et al., 2013, Supp. Info. 2). 148 We generated and sequenced DNA libraries enriched for reads overlapping the complete mito- sites covered at least once per individual. We obtained age estimates for each individual by either 154 direct radiocarbon dating (n " 29), using previously reported radiocarbon dates (n " 10), or 155 using a combination of archaeological context and radiocarbon dates from the same burial site 156 (n " 4, Fig. 1). The estimated ages in our sample range from 4,100 years BCE to 1,000 years 157 BCE (Fig. 1, Supp. Mat. 1A). To facilitate analyses of temporal structure within ancient Sar-158 dinia, we pragmatically grouped the data into three broad periods: Neolithic and Early Copper 159 Age ('Sar-NECA', 4,100-3,000 BCE, n " 4), Early Middle Bronze Age ('Sar-EMBA', 2,500-1,500 160 BCE, n " 24) and Nuragic ('Sar-Nur', 1,500-1,000 BCE, n " 15). Uniparentally inherited markers 163 We were able to infer mitochondrial haplogroups for all 43 ancient Sardinian individuals (Supp. Mat. 1E), 164 including a subset (n " 10) previously reported by Olivieri et al. (2017). We confirm the obser-165 vation that ancient Sardinian mtDNA haplotypes belong almost exclusively to macro-haplogroups 166 HV (n " 16), JT (n " 17) and U (n " 9), a composition broadly similar to other European 167 Our genome-wide data allowed us to assign Y haplogroups for 25 ancient Sardinian individ-169 uals. More than half of them consist of R1b-V88 (n " 10) or I2-M223 (n " 7) (Sup. Fig. 3, 170 Supp. Mat. 1B). In our reference data set, these two Y-haplogroups appear first in Balkan 171 hunter-gatherer and Balkan Neolithic individuals, and also in more recent western Neolithic 172 populations. Francalacci (Fig. 3). Furthermore, we did not observe temporal substructure within the ancient Sardinian 234 individuals in the top two PCs -they form a coherent cluster (Fig. 2). In stark contrast,

238
In the presence of significant influx, differential genetic affinity of a test population x would   Fig. 9).

250
From the Nuragic to present-day Sardinia 251 Our results demonstrate that ancient Sardinian individuals are genetically closest to contem-252 porary Sardinian individuals among all the ancient individuals analyzed (Fig. 3), and relative 253 to other European populations, there is lower differentiation between present-day and ancient 254 individuals (Supp. Fig. 7). However, we also find multiple lines of evidence for appreciable gene 255 flow into Sardinia after the Nuragic period.   individuals as one source, yield highly significant negative values, indicating admixture (Fig. 4).

263
Using qpAdm we find that models of continuity from Nuragic Sardinia to present-day Sardinian

287
We also considered three-way models of admixture with qpAdm to further refine the geo-  and as such they can serve as single-source proxies in two-way admixture models with Nuragic 299 Sardinia (Table 1D).   Fig. 14) . only slightly weaker than most other provinces (Fig. 4, Supp. Fig. 12). Together, these results 318 suggest high levels of drift specific to Ogliastra (likely also driving the first two PCs of present-day 319 Sardinian variation), but simultaneously also less admixture than other Sardinian provinces.

320
In the previous section, we reported finding that many non-Sardinian modern populations x are test non-Sardinian modern populations, Fig. 4, Sup. Mat. 2D). Interestingly, the northern 324 province Olbia (north-east) and to some degree also Sassari (north-west) have the highest affinity 325 to most tested populations (Fig. 4). A three-way admixture model fit with qpAdm finds a similar the island (Olbia, Sassari, Supp. Fig. 12). In addition, we observed a marked shift of individuals 331 from Olbia and Sassari towards continental populations in the PCA (Fig. 4).    individuals of a similar time period (Fig. 2).

423
The history of gene flow into Sardinia is also relevant to understanding its relationship to high fraction of EEF ancestry (e.g., Fig. 5). This shared ancestry component likely contributes 433 to the high pairwise outgroup-f 3 (Fig. 3) between Basque and Sardinians, and explains how both 434 populations share a genetic affinity despite their geographic separation.

435
Overall, we find that genome-wide ancient DNA provides unique insights into the population 436 history of Sardinia. We do not detect any significant admixture from the the Neolithic period (average 1.6%) and nuclear contamination ă6% (average 1.1%). 497 We next generated genotype calls that were used for downstream population genetic analyses.

498
To account for sequencing errors we first removed any reads that overlapped a SNP on the capture 499 array with a base quality score less than 20. We also removed the last 3-bp on both sides of

510
To minimize technology-specific batch effects in genotype calls and thus downstream population 511 genetic inference, we focused on previously published ancient samples that had undergone the 512 capture protocol on the same set of SNPs targeted in our study. We processed these samples 513 through the same pipeline and filters described above, resulting in a dataset of 972 ancient sam-514 ples. Throughout our analysis, we used a subset of n "1,013,439 variants that was created by 515 removing SNPs missing in more than 80% of all ancients individuals (Sardinian and reference 516 dataset) with at least 60% of all captured SNPs covered.

517
This ancient dataset spans a wide geographic distribution and temporal range. Ancient 518 individuals are associated with a variety of different cultures, which provides rich context for 519 interpreting downstream results. Our reference ancient dataset is comprised of many individuals 520 sampled from a particular geographic locale, such as Germany or Hungary, in a transect of 521 multiple cultural changes through time (Fig. 2). For the PCA (Fig. 2) (1,577 individuals from the SardiNIA project). For both datasets, we normalized the genotype 541 matrix by mean-centering and scaling the genotypes at each SNP using the inverse of the square-542 root of heterozygosity (Patterson et al., 2006). We additionally filtered out rare variants with 543 minor allele frequency (p min ă 0.05).

544
To assess population structure in the ancient individuals, we projected them onto our pre-

549
We also projected a number of out-sample sub-populations from Sardinia onto our PCs.

550
Reassuringly, these out-of-sample Sardinian individuals project very close to Humans Origins 551 Sardinian individuals (Fig. 2). Moreover, the test-set Sardinia individuals with grand-parental 552 ancestry from Southern Italy cluster with reference individuals with ancestry from Sicily.

554
We applied ADMIXTURE to an un-normalized genotype matrix of ancient and modern samples replicates of ADMIXTURE for values of K " 2, . . . , 11. We display results for the replicate that 560 reached the highest log-likelihood after the algorithm converged (Supp. Fig. 8).

561
Estimation of f -statistics 562 We measured similarity between groups of individuals through computing an outgroup-f 3 statis- . We fixed the ancestral allele counts to n " 10 6 to avoid finite sample size correction 571 when calculating outgroup f 3 .

572
The f 3 -and f 4 -statistics that test for admixture were computed with scikit-allel using 573 average_patterson_f3 and average_patterson_d. We estimated standard errors with a block 574 jack-knife over 1000 markers (blen=1000). When analyzing ancient individuals that were repre-575 sented as pseudo-haploid genotypes, we analyzed only one allele to avoid an artificial appearance 576 of genetic drift -that could for instance mask a negative f 3 signal of admixture.

577
To measure pairwise genetic differentiation between two populations, we estimated average pair-579 wise F ST and standard error via block-jackknife over 1000 markers, using average_patterson_fst 580 from the package scikit-allel. When analyzing ancient individuals that were represented 581 as pseudo-haploid genotypes, we analyzed only one allele to avoid artificial genetic drift. For 582 this analysis, we removed first degree relatives within each population. Another estimator, 583 average_hudson_fst gave highly correlated results (r 2 " 0.89), differing mostly for populations 584 with very low sample size (n ď 5) and did not change any qualitative conclusions.

585
Estimation of admixture proportions with qpAdm 586 We estimated admixture fractions of a selected target population as well as model consistency for To determine the haplotype branch of the Y chromosome of male ancient individuals, we analyzed 596 informative SNPs on the Y-haplotype tree. For reference, we used markers from https:// 597 isogg.org/tree/ (Version: 13.238, 2018). We merged this data-set with our set of calls and 598 identified markers available in both to create groups of equivalent markers for sub-haplogroups.

599
Our targeted sequencing approach yielded calls for up to 32,681 such Y-linked markers per