Sex-biased admixture and assortative mating shape genetic variation and influence demographic inference in admixed Cabo Verdeans

Abstract

Genetic data can provide insights into population history, but first, we must understand the patterns that complex histories leave in genomes. Here, we consider the admixed human population of Cabo Verde to understand the patterns of genetic variation left by social and demographic processes. First settled in the late 1400s, Cabo Verdeans are admixed descendants of Portuguese colonizers and enslaved West African people. We consider Cabo Verde’s well-studied historical record alongside genome-wide SNP data from 563 individuals from 4 regions within the archipelago. We use genetic ancestry to test for patterns of nonrandom mating and sex-specific gene flow, and we examine the consequences of these processes for common demographic inference methods and genetic patterns. Notably, multiple population genetic tools that assume random mating underestimate the timing of admixture, but incorporating nonrandom mating produces estimates more consistent with historical records. We consider how admixture interrupts common summaries of genomic variation such as runs of homozygosity. While summaries of runs of homozygosity may be difficult to interpret in admixed populations, differentiating runs of homozygosity by length class shows that runs of homozygosity reflect historical differences between the islands in their contributions from the source populations and postadmixture population dynamics. Finally, we find higher African ancestry on the X chromosome than on the autosomes, consistent with an excess of European males and African females contributing to the gene pool. Considering these genomic insights into population history in the context of Cabo Verde’s historical record, we can identify how assumptions in genetic models impact inference of population history more broadly.

The challenges of inferring human demographic history are particularly apparent on short timescales of tens of generations, where changes in allele frequencies may be difficult to observe and biased by dynamics not captured by classic models. Admixed populations provide an opportunity to examine evolutionary processes on short timescales using admixture linkage-disequilibrium (LD) structure rather than potentially small changes in allele frequencies. Understanding the patterns generating the distribution of ancestry within an admixed population is also important as genomic data has shown how widespread admixture is

consider patterns of genetic variation alongside historical records that document many aspects of Cabo Verdean history, including settlement patterns, timing, and sociocultural dynamics. Specifically, we use distributions of genetic ancestry to test for ancestry-assortative mating and sex-specific gene flow. We examine the consequences of these processes for genetic variation, such as patterns of homozygosity, and for common demographic inference methods. In turn, by elucidating the patterns of genetic variation social processes leave in human genomes for a population with a well-studied historical and ethnographic record, we can better use population-genomic methods to explore the history of populations without extensive records.

Cabo Verde, and historical records document differences in population sizes, mating patterns, and social customs by island population (see Methods for historical data sources).

The settlement of the islands was influenced by island geography and ecology, and is often divided into three different temporal stages, which are associated with changes in the economy

Historical records

Throughout our analyses, we draw comparisons between genetic results and historical records. We used primary historical documents, mainly historical letters to the Portuguese Crown,

The resulting local ancestry calls with all AFR and EUR reference panels correlated closely with calls using GWD and IBS as reference panels (Supp Fig 2C). We also performed local ancestry calling using ELAI, a method that performs both phasing and local ancestry assignment (Guan 2014). Again using the IBS and GWD reference genotypes described above, we ran ELAI under a two-way admixture model using the following parameters: -mg (number of generations) 20, -s (EM steps) 30, -C (upper clusters) 2, and -c (lower clusters) 10. The resulting local ancestry calls from ELAI correlated closely with calls from RFMix (Supp Fig 2D).
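For readers reproducing this step, the ELAI settings listed above can be collected into an argument list. The sketch below is illustrative only: the four flags and their values come from the text, whereas the executable name `elai` and the placeholder for input and output arguments are assumptions that should be checked against the ELAI manual.

```python
# Sketch: assemble the ELAI settings stated in the text into an argument list.
# Only the four flags below come from the text; the executable name and the
# caller-supplied input/output arguments are hypothetical placeholders.

ELAI_SETTINGS = {
    "-mg": 20,  # number of generations
    "-s": 30,   # EM steps
    "-C": 2,    # upper clusters (two-way admixture model)
    "-c": 10,   # lower clusters
}


def build_elai_args(io_args, executable="elai"):
    """Return the argument list: executable, I/O arguments, then settings."""
    args = [executable, *io_args]
    for flag, value in ELAI_SETTINGS.items():
        args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    # "<input/output arguments>" stands in for genotype and position files.
    print(" ".join(build_elai_args(["<input/output arguments>"])))
```

Keeping the settings in one mapping makes it easy to record exactly which values were used for each run.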

Inference of admixture timing

We applied three distinct strategies for estimating the timing of the onset of admixture in Cabo

Tests for ancestry-assortative mating

We tested for ancestry-assortative mating over the last generation using ANCESTOR (James Y.

Using genome-wide SNP data from 563 individuals, we examined four island regions of Cabo Verde (Fig 1; Supp Fig 1), which differ in their settlement histories. The partitioning of the considered island regions was also supported by genetic patterns. Namely, these regions showed quantitatively distinct distributions of IBD tracts, as we explore below, and were supported by clustering patterns in PCA (Supp Fig 1), in which the three northwestern islands

The process of admixture can influence measures such as IBD and ROH that are often used to inform inference of population history, and theoretical expectations for these measures are less clear for admixed populations compared to homogeneous populations. Thus, we use multiple methods to examine relatedness in Cabo Verde, and we use this case study to underscore the need for further empirical and theoretical work to understand the dynamics of IBD and ROH in admixed populations. Patterns of IBD within and between populations provide opportunities to examine common ancestry based on the number and sizes of segments of IBD. Using these summaries of IBD to examine relatedness between and within regions of Cabo Verde, we found that patterns of shared ancestry reflect the successive settlement history of the islands. Notably, Santiago has the lowest mean number and total length of IBD segments between individuals (Supp Fig 4-5). In contrast, we observe the highest levels of IBD within and between the Northwest Cluster and Boa Vista.
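The IBD summaries described here (mean number and mean total length of shared segments per pair of individuals, within and between regions) can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function name, the input layout, and the region labels are hypothetical, and averages are taken over all possible pairs, including pairs that share no segment.

```python
from collections import Counter, defaultdict
from itertools import combinations_with_replacement


def ibd_summary(region_of, segments):
    """Mean IBD segment count and mean total IBD length per pair of
    individuals, for every within-region and between-region comparison.

    region_of: dict mapping individual ID -> region label
    segments:  iterable of (id_a, id_b, length_cM), one tuple per segment
    """
    sizes = Counter(region_of.values())
    count = defaultdict(int)
    total = defaultdict(float)
    for a, b, length in segments:
        key = tuple(sorted((region_of[a], region_of[b])))
        count[key] += 1
        total[key] += length

    summary = {}
    for r1, r2 in combinations_with_replacement(sorted(sizes), 2):
        if r1 == r2:
            n_pairs = sizes[r1] * (sizes[r1] - 1) // 2
        else:
            n_pairs = sizes[r1] * sizes[r2]
        if n_pairs == 0:
            continue  # a single-member region has no within-region pairs
        key = (r1, r2)
        summary[key] = (count[key] / n_pairs, total[key] / n_pairs)
    return summary


# Toy input with made-up values (not study data).
toy_regions = {"a1": "region_A", "a2": "region_A",
               "b1": "region_B", "b2": "region_B"}
toy_segments = [("a1", "a2", 12.0), ("a1", "a2", 8.0),
                ("b1", "b2", 5.0), ("a1", "b1", 4.0)]
toy_summary = ibd_summary(toy_regions, toy_segments)
```

Each entry of `toy_summary` maps a pair of region labels to a (mean segment count, mean total length) tuple, which is the kind of quantity that could then be used as an edge weight in a network of regions.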

To summarize patterns of IBD within and between the four island regions, we built a network

Fig 6B). This observation led us to hypothesize that other sociocultural processes beyond

To test for ancestry-assortative mating within Cabo Verde, we examined whether the genomic ancestries of individuals in mating pairs correlate with each other. To this end, we applied ANCESTOR to computationally infer the parental ancestry proportions that likely preceded the ancestry haplotypes we observe today. We repeated this analysis for each chromosome independently to prevent uncertainty introduced by matching interchromosomal haplotypes.
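The idea behind this test can be illustrated with a small sketch: given inferred ancestry proportions for the two parents of each individual, compute their correlation and assess it by shuffling one parent's values. This is not ANCESTOR itself, and it is not necessarily the statistical test used in the study; the function names, the input lists, and the permutation scheme are assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def parental_ancestry_test(parent1, parent2, n_perm=2000, seed=0):
    """Observed correlation between parental ancestry proportions and a
    one-sided permutation p-value for a positive correlation."""
    if len(parent1) != len(parent2) or len(parent1) < 3:
        raise ValueError("need two equal-length inputs with >= 3 pairs")
    observed = pearson(parent1, parent2)
    rng = random.Random(seed)
    shuffled = list(parent2)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling one parent breaks the pairing, mimicking random mating.
        rng.shuffle(shuffled)
        if pearson(parent1, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Running the function once per chromosome, as in the per-chromosome analysis described above, would give one correlation and one p-value for each chromosome.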

As an example, the inferred ancestries of the parental haplotypes that likely preceded chromosome 7 are significantly positively correlated for all islands except for Boa Vista (Fig 2A). We observed similar results across the full set of autosomes (Fig 2B). We found that the inferred

estimates of ancestry-assortative mating strength (Fig 3). The LAD-based method produced older estimates of admixture timing under a model including both assortative mating and constant migration (Fig 3, Supp Fig 8), with increasingly older timing estimates as assortative mating strength increases (Supp Fig 8). Under the same set of migration rates (0 or 1% per generation), when we varied the strength of ancestry-assortative mating from a situation of random mating to a situation of strong ancestry-assortative mating, the models that considered substantial ancestry-assortative mating (parental correlation in ancestry > 0.3), yielded admixture timing estimates closest to the historical estimates (Fig 3, Supp Fig 8). We note that

Assuming a generation time of 25 years, decay rates estimated with ALDER suggest admixture timing in the mid to late 1700s (Fig 3; all timing estimates are shown in Supp Table 1).
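The conversion from an LD decay rate to a calendar date can be sketched as follows. Under a single-pulse model with random mating, admixture LD is expected to decay approximately as exp(-n d), where n is the number of generations since admixture and d is genetic distance in Morgans. The example fits n by log-linear regression on a noise-free synthetic curve and converts it to a year using the 25-year generation time stated above. The reference year, the synthetic curve, and the function names are illustrative assumptions, and ALDER's own fitting procedure is more involved than this.

```python
from math import exp, log


def fit_decay_rate(distances_morgans, ld_values):
    """Least-squares slope of log(LD) against genetic distance.
    For LD ~ A * exp(-n * d), the negated slope estimates n (generations)."""
    logs = [log(v) for v in ld_values]
    k = len(distances_morgans)
    mean_d = sum(distances_morgans) / k
    mean_l = sum(logs) / k
    num = sum((d - mean_d) * (l - mean_l)
              for d, l in zip(distances_morgans, logs))
    den = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -num / den


def generations_to_year(generations, reference_year, years_per_generation=25):
    """Calendar year implied by a number of generations before reference_year."""
    return reference_year - generations * years_per_generation


# Synthetic curve with a known decay rate of 10 generations (not study data).
distances = [0.005 * i for i in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
ld_curve = [0.02 * exp(-10 * d) for d in distances]

n_hat = fit_decay_rate(distances, ld_curve)
year = generations_to_year(n_hat, reference_year=2010)  # 2010 is hypothetical
```

With this synthetic input the fit recovers roughly 10 generations, which maps to about 1760 under the assumed reference year. The same arithmetic shows how sensitive a calendar date is to the assumed generation time: each additional year per generation shifts the date by n years.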

MultiWaver chose a multi-wave admixture model for Santiago (Fig 3A) and the Northwest Cluster (Fig 3C), but selected an isolation model for Fogo (Fig 3B) and Boa Vista (Fig 3D).

The estimates of admixture timing that do not explicitly consider ancestry-assortative mating,

both ancestry-assortative mating and constant migration (Fig 3, Supp Fig 8). While multiple complex demographic factors likely impact these estimates of admixture timing (see "Patterns of genetic variation and admixture in Cabo Verde influence demographic inference" in the Discussion), the evidence of ancestry-assortative mating in Cabo Verde (Fig 2, Supp Fig 7) and the LAD-based inferences incorporating assortative mating (Fig 3, Supp Fig 8) suggest that explicitly accounting for ancestry-assortative mating improves estimates of admixture timing, while the assumption of random mating leads to underestimation of the age of admixture.

significance values for total ROH and by length class are provided in Supp Fig 11). Breaking total ROH into shorter and long ROH, we see that this negative correlation is driven primarily by shorter ROH (Fig 4; Supp Fig 11B-C, Supp Fig 9).

Despite recent colonization bottlenecks, we observed that some admixed individuals in Cabo Verde present lower levels of ROH per genome than either of the source population proxies.
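Splitting ROH into shorter and long classes, as in the comparison above, can be sketched as below. The length threshold separating the two classes is not stated in this passage, so it is left as a required parameter. The function name, the input layout, and the toy values are hypothetical.

```python
def roh_by_length_class(roh_segments, long_threshold_bp):
    """Total ROH length per individual, split into 'shorter' and 'long'.

    roh_segments:      iterable of (individual_id, start_bp, end_bp)
    long_threshold_bp: segments at least this long are counted as 'long'
    """
    totals = {}
    for ind, start, end in roh_segments:
        length = end - start
        if length <= 0:
            raise ValueError(f"non-positive ROH length for {ind}")
        per_ind = totals.setdefault(ind, {"shorter": 0, "long": 0})
        cls = "long" if length >= long_threshold_bp else "shorter"
        per_ind[cls] += length
    return totals


# Toy input with made-up coordinates and an arbitrary 1 Mb threshold.
toy_roh = [
    ("ind1", 1_000_000, 1_400_000),  # 0.4 Mb, counted as shorter
    ("ind1", 5_000_000, 7_500_000),  # 2.5 Mb, counted as long
    ("ind2", 2_000_000, 2_300_000),  # 0.3 Mb, counted as shorter
]
toy_totals = roh_by_length_class(toy_roh, long_threshold_bp=1_000_000)
```

Keeping the two classes separate per individual is what allows each class to be correlated with ancestry independently, rather than relying on total ROH alone.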

Specifically, our classification of ROH into shorter and long ROH (Fig 4; Supp Fig 9)

To further understand population structure in Cabo Verde, we next consider how autosomal and X-chromosome ancestry and genetic variation patterns can be used to infer the sex-specific demographic history of the last ~20 generations since founding. Some earlier analyses of post-admixture population structure using Y-chromosome diversity suggested sex-biased demographic processes (Beleza et al. 2012). Here, we examine autosomal and X-chromosome ancestry to test for sex-biased migration. On all islands, autosomal versus X chromosome ancestry patterns suggest that male and female contributions differ significantly by source population (Fig 5A). Specifically, there is higher West African ancestry on the X chromosome

Given our observation that source populations make distinct contributions to ROH in Cabo Verde (Fig 4), we next investigate differences in autosomal vs X chromosome ROH content.

Differences in autosomal vs X chromosome ROH may arise in part due to the source populations contributing in a sex-biased manner. Notably, this effect would be seen most in shorter ROH, since shorter ROH reflect the homozygosity of older haplotypes and background relatedness from the source populations. However, it is challenging to disentangle processes shaping X vs autosomal ROH, due to the smaller effective population size of the X chromosome.

Our results suggest that sex-biased admixture processes in Cabo Verde are reflected in ROH, with lower levels of shorter ROH on the X chromosome than on autosomes (Fig 6A). European individuals have higher levels of ROH than West African individuals, and the higher contributions of African X chromosomes (vs European X chromosomes) may drive the lower levels of short ROH in Cabo Verde on the X chromosome vs the autosomes (Fig 6A). Long ROH reflects different dynamics, potentially including the reduced post-admixture population size and other sex-specific processes.
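The logic of the X-versus-autosome ancestry comparison used in this section can be made concrete with a simple expectation. Females carry two X chromosomes and males one, so about two thirds of X chromosomes descend through females, whereas autosomes are inherited equally through both sexes. In a simplified single-pulse model with no later migration, if a source population supplies a fraction s_f of founding females and s_m of founding males, the expected autosomal ancestry from that source is (s_f + s_m)/2 and the expected long-run X-chromosome ancestry is (2 s_f + s_m)/3. The sketch below is an assumption-laden illustration, not the model fitted in this study, and the parameter values are arbitrary.

```python
def expected_ancestry(s_f, s_m):
    """Expected (autosomal, X-linked) ancestry from one source population,
    given its share of founding females (s_f) and founding males (s_m)."""
    for s in (s_f, s_m):
        if not 0.0 <= s <= 1.0:
            raise ValueError("contributions must lie in [0, 1]")
    autosomal = (s_f + s_m) / 2
    x_linked = (2 * s_f + s_m) / 3
    return autosomal, x_linked


def x_ancestry_trajectory(s_f, s_m, generations):
    """X-chromosome ancestry in females and males over time after one pulse.
    Daughters inherit one X from each parent; sons inherit their mother's X."""
    h_f, h_m = s_f, s_m
    trajectory = []
    for _ in range(generations):
        h_f, h_m = (h_f + h_m) / 2, h_f
        trajectory.append((h_f, h_m))
    return trajectory


# Arbitrary illustration: a source contributing 80% of founding females
# and 20% of founding males.
auto, x_chrom = expected_ancestry(0.8, 0.2)
final_f, final_m = x_ancestry_trajectory(0.8, 0.2, 30)[-1]
```

When a source contributes more females than males, as in this arbitrary example, its expected X-chromosome ancestry exceeds its autosomal ancestry, which is the direction of the pattern reported for West African ancestry.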

In this study, we leveraged patterns of genetic variation in Cabo Verde to infer the demographic history of the past ~20 generations. We found that distinct genetic patterns of four island regions within the archipelago reflect the colonization history of the islands, including island-specific settlement timing, admixture dynamics, mating patterns, and sex-biased demography.

Together these results demonstrate how patterns of ancestry and genetic variation are shaped by social and demographic forces on short timescales. By better understanding how complex population histories generate genetic variation, we can improve interpretation of inference from populations without historical records.

Verde, despite the highly racially stratified, slavery-based system that characterized the first settlement stage (Cabral 2001). The second and third stages were carried out by mostly already-admixed individuals who had become a significant group within Cabo Verdean society and who migrated from the southern to the northern islands (Correia e Silva 2001). We found that the staggered settlement history and the island-specific population dynamics shaped patterns of ancestry and genetic variation within Cabo Verde.

Fig 1), IBD (Fig 1, Supp Fig 3-5

Despite only a few hundred years of unique population histories among the islands, we also found that patterns of genetic variation reflect island-specific patterns in post-admixture

The island of Fogo stands out as having unique social and historical processes, even though it was settled shortly after Santiago. Fogo's society was, since its origins, a conservative rural society whose main economic activity was to produce goods to trade on the African coast. The

admixture timing using local ancestry disequilibrium, we observed that Fogo has greater levels of local ancestry disequilibrium than the other islands. Higher local ancestry disequilibrium may be driven by ancestry-assortative mating. Additionally, it may be that Fogo has experienced stronger founder effects, which would decrease the number of ancestral lineages.

Indeed, the higher levels of shorter ROH within Fogo (Fig 4; Supp Fig 9) are consistent with founder effects increasing background relatedness and thus increasing shorter ROH. Though

ALDER is frequently applied in cases where admixture is not strictly instantaneous.

MultiWaver inferred more complex admixture models for two out of the four island regions, but the consideration of additional migration parameters in the calculation of the admixture times led to results similar to our LAD model with random mating. We also note that MultiWaver may not be able to accurately infer demographic histories that deviate from its pre-defined admixture models. In general, all available demographic inference methods must make simplifying assumptions about demographic history. Our findings suggest that two simplifying assumptions (random mating and single-pulse admixture) push inferred estimates of admixture timing to appear more recent than the historical estimates.

While allowing both assortative mating and constant admixture yielded admixture timing estimates that were most consistent with the historical records, it is important to note that none of the inferred ranges of admixture timing captured the historically documented onset of admixture. This may be due to additional simplifying assumptions underlying the demographic inference methods, such as assumptions about the stationarity of effective population size, the composition of the sample (i.e., that the sampled individuals are a random sample from the population), or the neutrality of evolution. All of the timing methods we used placed admixture timing for the different islands closer together than historical dates of settlement, consistent with historical expectations that the initial admixture in the southern islands was significant, and that many individuals who occupied the northern and eastern islands during the second and third settlement stages of Cabo Verde were already admixed. For example, the serial founding of the islands may explain why estimates of admixture timing for Boa Vista were closer to historical records. While Boa Vista was founded most recently, it was founded by

This type of serial founding scenario is common throughout recent human migration, underscoring that settlement patterns, in addition to settlement timing, are critical components of accurately inferring human demographic history.

We found that Cabo Verde's island-specific demographic history and admixture dynamics have important genomic consequences, as observed with ROH. Despite the relatively recent colonization of the islands, some individuals presented even lower overall levels of ROH than African reference populations. We found that low overall levels of ROH in Cabo Verde are driven by shorter length ROH. This observation is consistent with the idea that shorter ROH can be attributed to older shared ancestors from the source populations, and these tracts can be

(Fig 4). In contrast, Santiago has both the oldest population and the largest population size, and has comparatively low levels of both shorter and long ROH. These observations suggest that more work, both empirical and theoretical, is needed to understand the interacting forces of local ancestry and ROH.

In sum, we provide insights into the population history of Cabo Verde and demonstrate how admixed populations can provide powerful test cases for understanding demographic processes and genomic consequences in recent human history. We show that patterns of shared ancestry between and within the islands (quantified with IBD and kinship estimates) reflect serial founder effects as well as settlement patterns such as post-admixture nonrandom mating. We find that accounting for nonrandom mating allows us to improve inference of admixture timing and better contextualize genomic consequences of admixture dynamics, such as ROH. We find that differences in ancestry on the X chromosome vs the autosomes reflect sex-biased demographic processes. Given the ubiquity of admixture throughout modern human populations, these results provide important, generalizable considerations for the study of recent human evolution.

The de-identified local ancestry calls, ROH calls, and IBD calls will be made publicly available

(2013). However, the original genotype data will be made available upon signing a material transfer agreement assuring that the data will only be used in accordance with the restrictions of the informed consent, and agreeing that the data will be destroyed after the research is