The genomic footprint of social stratification in admixing American populations

Cultural and socioeconomic differences stratify human societies and shape their genetic structure beyond the sole effect of geography. Although mating is limited by the permeability of sociocultural stratification, most demographic models in population genetics assume random mating. Taking advantage of the correlation between sociocultural stratification and the proportion of genetic ancestry in admixed populations, we sought to infer the former process in the Americas. To this aim, we define a mating model where the individual proportions of the genome inherited from Native American, European and sub-Saharan African ancestral populations constrain the mating probabilities through ancestry-related assortative mating and sex bias parameters. We simulate a wide range of admixture scenarios under this model. Then, we train a deep neural network and obtain good performance in predicting mating parameters from genomic data. Our results show how population stratification shaped by racial and gender hierarchies has constrained the admixture processes in the Americas since the European colonisation and the subsequent Atlantic slave trade.

Thus, when information on the genetic ancestry of mating couples is not available, it is still possible to infer ancestry-related assortative mating through the comparison of the genetic ancestry of the two homologous chromosomes of the individuals Norris et al. (2019). Nevertheless, despite these efforts, we still lack a rigorous and robust method to shed light onto the patterns of ancestry-related non-random mating. More specifically, we are in need of a comprehensive model of ancestry-related assortative mating and sex bias, two parameters which have rarely been jointly modelled in population genetics.

Among human populations, the admixing populations from the Americas are of special interest in admixture studies. We consider them as admixing populations, because their genetics is shaped by an ongoing admixture process of three differentiated continental ancestries that started five centuries ago, constrained by a strong social structure.

At the end of the 15th century, European powers initiated the colonisation process in the lands inhabited by Native Americans. In this frame, European colonisers enslaved more than 10 million people brought from sub-Saharan Africa Eltis (2018). As a result of this historical event, the popu- [...]

In Latin America, in addition to segregation, European colonial powers and creole elites implemented eugenicist policies under the frame of mestizaje/mestiçagem. Mestizaje/mestiçagem refers to the process of admixture of Native American, European and sub-Saharan ancestries in the context of the European colonisation, and is therefore associated with mixture across hierarchical differences understood as "racial", differences of class and differences of gender. Since the mid-nineteenth century, Latin American nation-building elites have aimed to associate mestizaje/mestiçagem with an equalising process, by claiming that it overcomes and blurs the socioeconomic differences related to "race". However, critics have argued that the mestizaje/mestiçagem notion of hybridity inherently entails the idea of its constitutive origins and the hierarchies that order those origins. In this sense, mestizaje/mestiçagem attaches greater value to the interactions that move towards whiteness and masculinity and lower value to those that move towards blackness or indigeneity, and femininity. Wade (2017, 2020); Abel (2021).

By analysing the impact of the European colonisation on the population structure through mating, we aim to evaluate the stratification related to genetic ancestry, not only by quantifying the population subdivision but also by measuring the genetic ancestry asymmetry between males and females in mating. Following this approach, we conceptualise a novel mechanistic mating model that explicitly integrates ancestry-related assortative mating and sex bias jointly, through an intersectional approach derived from the interrelated hierarchies observed in the admixture process Crenshaw (1989, 1991); Wade (2017). We consider a three-way admixture scenario mirroring the demography of the admixing American populations. We build and train a deep neural network to infer non-random mating parameters using extensive synthetic data. We apply this network to genomic data from admixing American populations sequenced as part of the 1000 Genomes Project Consortium et al. (2015) and quantify the extent of ancestry-related assortative mating and sex bias. Finally, we discuss racial and gender hierarchies as inferred from their footprint on genetic structure.

Results and discussion

We report our results in three sections: (i) the novel mating model and framework for simulations, (ii) the performance of the neural network, and (iii) the inference of the ancestry-related mating probabilities for admixing American populations.

An ancestry-related mating model

We present an admixture model defined by the mating probabilities of all possible male and female couples, set by their ancestry proportion difference. For each ancestry, the ancestry-related sex bias and the ancestry-related assortative mating parameters determine the mating probability of each couple as a function of the difference in the ancestry proportion between male and female. We assume that the differences in the ancestry proportions within the mating couples follow a Normal distribution that translates into the mating probabilities (Figure 1).
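As an illustration of how a Normal distribution over ancestry differences can translate into mating probabilities, the following minimal sketch treats a single ancestry. It is not the model implementation: the function names are hypothetical, the difference is taken as male minus female, the variance is assumed to be exactly 1 / assortative mating, and probabilities are obtained by simple normalisation over all possible couples.

```python
import math


def mating_weight(diff, sex_bias, assortative_mating):
    """Unnormalised mating weight for one couple and one ancestry.

    diff: ancestry proportion difference within the couple
          (assumed here to be male minus female).
    sex_bias: expected value of that difference.
    assortative_mating: positive value; assumed equal to 1 / variance.
    """
    variance = 1.0 / assortative_mating  # assumption: proportionality constant of 1
    return math.exp(-((diff - sex_bias) ** 2) / (2.0 * variance))


def mating_probabilities(males, females, sex_bias, assortative_mating):
    """Normalise the weights over every possible male-female couple."""
    weights = {
        (i, j): mating_weight(m - f, sex_bias, assortative_mating)
        for i, m in enumerate(males)
        for j, f in enumerate(females)
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Two males and two females with ancestry proportions 0.1 and 0.9.
males, females = [0.1, 0.9], [0.1, 0.9]
strong = mating_probabilities(males, females, sex_bias=0.0, assortative_mating=50.0)
weak = mating_probabilities(males, females, sex_bias=0.0, assortative_mating=1.0)
```

With sex bias at zero, the dissimilar couple `(0, 1)` receives a larger share of the mating probability under the weaker assortative mating value, which is the behaviour the model description implies.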

Therefore, when the sex bias parameter is close to zero, a couple with substantial differences in their ancestry proportions has a higher probability of mating if the assortative mating parameter is lower (Figure 1A). Figure 1B shows how a sample [...]

Figure 1. A: Sex bias and assortative mating values that modulate the mating probabilities in a simulation example of 19 generations from the colonisation of America to the present. The mating probability for a given couple is set as a function of the differences in the genetic ancestry proportions for each ancestry. We assume the mating probability follows a three-dimensional normal distribution. In this normal distribution, the sex bias parameter sets the expected value and the assortative mating parameter is inversely proportional to its variance. B: Ancestry proportions of mating couples at generations 4 and 7 in ternary plots (top) and barplots (bottom) based on the mating probabilities defined in A. In the top plots, each arrow represents a couple. The arrow tail and head coordinates in the ternary plots show the ancestry proportions of the female and the male, respectively. In the bottom plots, the barplots represent male and female ancestry proportions, linked by curved lines reflecting mating. Red, yellow and blue correspond to ancestries 1, 2 and 3. The arrows in the ternary plot and the lines between barplots representing a mating couple are coloured with the colour corresponding to the predominant ancestry in both male and female, and they are depicted in black if it differs between them.
We focus on the case of three-way ancestry, a model that describes the admixture of the pop- [...]

The mean across individuals summarises the 4-dimensional matrix in a population 3-dimensional matrix, which is used as the input of the neural network. The neural network has four fully connected layers that split into a branch for each parameter, each one made of a last hidden layer connected to the output layer.
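The branched architecture described above can be sketched as a forward pass in plain Python. This is an untrained illustration under stated assumptions, not the network used in the study: the hidden width, the ReLU activation, the random initialisation, the flattened input and the parameter labels (`SB_1`, `AM_1`, and so on) are all hypothetical choices made for the example.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply_dense(layer, x, relu=True):
    """Affine transform followed by an optional ReLU (assumed activation)."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def build_network(n_inputs, parameter_names, hidden=16, seed=0):
    """Four shared fully connected layers, then one branch per parameter."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [hidden] * 4
    trunk = [dense(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    # Each branch: a last hidden layer connected to a single-output layer.
    branches = {
        name: (dense(hidden, hidden, rng), dense(hidden, 1, rng))
        for name in parameter_names
    }
    return trunk, branches


def predict(network, flat_profile):
    """Map a flattened population profile to one prediction per parameter."""
    trunk, branches = network
    x = flat_profile
    for layer in trunk:
        x = apply_dense(layer, x)
    return {
        name: apply_dense(out_layer, apply_dense(last_hidden, x), relu=False)[0]
        for name, (last_hidden, out_layer) in branches.items()
    }


names = ["SB_1", "SB_2", "AM_1", "AM_2", "AM_3"]  # hypothetical labels
network = build_network(n_inputs=12, parameter_names=names)
predictions = predict(network, [0.1] * 12)
```

Sharing the first layers lets every parameter draw on the same learned features, while the separate branches allow each output to be fitted on its own scale.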

To infer all parameters in our model, we train a deep neural network for each population. By exploring the entire parameter space of the sex bias and assortative mating parameters (and the gene flow rate at generation 10 for the Two Pulses model), we feed simulated continuous ancestry tract length profiles to a deep neural network consisting of fully connected layers (Figure 2C).

The network learns the weights for all parameters without overfitting over 40 epochs, as shown by the decay of the loss function (mean squared error) (Figure 3-Figure Supplement 1). We observe low mean squared error on the testing set for all parameters (Figure 3). Similarly, we [...]

[Figure: Mean Squared Error by parameter (Assortative Mating, Sex Bias, Gene flow rate at gen 10) for the One pulse and Two pulses models]

The trained network exhibits better predictions for [...] parameters than for [...] and gene flow rate at generation 10 parameters across all ancestries, populations, and migration models. Interestingly, the higher complexity of the Two Pulses migration model does not produce higher mean squared error or lower R² values for any of the tested parameters. In fact, the mean of the mean squared error for the Two Pulses model is only marginally higher than the mean of the mean squared error for the simpler One Pulse model (Figure 3, Figure 3-Figure Supplement 1, Figure 3-Figure Supplement 2).

The Native American and sub-Saharan genetic ancestries respectively shape the mating probabilities in Latin American and African American populations

We sought to test the occurrence and extent of assortative mating and sex bias in the admixing [...] Figure 4A).

In the Two Pulses model, we allow for an additional migration pulse at generation 10 and we [...]

We tackle the analysis of the ancestry-related non-random mating driven by social structure through a mating model, which allows us not only to globally evaluate the forces that modulate population structure but also to study the effect of these population dynamics at the individual level. Our results show evidence of ancestry-related sex bias and assortative mating in American admixed populations. In Latin Americans, the proportion of Native American ancestry of men and women shapes the mating probabilities and, therefore, the genetic structure of the population. By contrast, in African Americans, the sub-Saharan African ancestry modulates mating. Below, we discuss the significance of these results and the importance of our approach in studying social stratification. Finally, we evaluate the performance of our pipeline in discerning between migration and assortative mating and we explore how next steps could incorporate more complex admixture scenarios.

Social stratification by racial and gender hierarchies

The ultimate aim of our approach is to infer social stratification in the Americas from the analysis of the population genetic structure. To this end, we model ancestry-related assortative mating mediating the extent of the effect of ancestry-related sex bias. We infer how they shape the mating dynamics of the population and we evaluate how these two dimensions act at the individual level, thanks to the mating model framework.

We have defined this model from an intersectional perspective, which understands that racial, [...]

In conclusion, an interdisciplinary approach that incorporates up-to-date insights from social sciences is essential to conceptualise population genetic models that aim to evaluate genetic struc- [...]

[...] as a realisation of the event [...], between a female and a male.

We calculate the probability of mating between a female and a male as [...] where [...] and [...] are the probabilities of either a female or a male to start a mating event that will have one child as the outcome, and: [...] where μ is the vector of the expected means of the ancestry proportion differences within the couple for each ancestry, which define the sex bias for each ancestry. The diagonal of Σ is the vector σ², where the variance of the ancestry proportion difference is inversely proportional to the assortative mating parameter for each ancestry (Figure 1).

The sum of the mean vector (i.e. the sum of the sex bias parameters for all the ancestries) is zero [...] where the term for ancestries 1 and 2 can be defined by the variances of the three-dimensional multivariate normal distribution, including the variance for ancestry 3: [...]

The mating model for three ancestries is set by the sex bias for ancestries 1 and 2 and the assortative mating for ancestries 1, 2 and 3. Therefore, for ancestries [...]

The continuous ancestry tract lengths profile

The continuous ancestry tract length profile is a statistic that is commonly used to date admixture events, assuming random mating. However, here we exploited the information summarised with this statistic to assess the gene flow related to both migration and assortative mating. In addition, we included the population continuous ancestry tract length profiles of both autosomes and the X chromosome to provide the neural network with information that can be used to predict sex bias.
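To illustrate why X-chromosome profiles can carry information on sex bias, the sketch below compares the expected autosomal and X-chromosome ancestry from one source population after a single generation of admixture. It relies on the inheritance fact that mothers transmit two thirds of the X-chromosome gene pool and fathers one third, and it assumes an equal sex ratio among offspring. The function name and the single-generation setting are simplifications introduced for this example and are not part of the model.

```python
def expected_ancestry(female_fraction, male_fraction):
    """Expected ancestry from one source after one generation of admixture.

    female_fraction: proportion of mothers drawn from the source population.
    male_fraction: proportion of fathers drawn from the source population.
    Returns (autosomal, x_chromosome) expected ancestry proportions.
    """
    # Autosomes: both sexes contribute equally.
    autosomal = 0.5 * female_fraction + 0.5 * male_fraction
    # X chromosome: 2/3 maternal and 1/3 paternal, assuming an equal sex ratio.
    x_chromosome = (2.0 / 3.0) * female_fraction + (1.0 / 3.0) * male_fraction
    return autosomal, x_chromosome


# Male-biased contribution from the source: 20% of mothers, 60% of fathers.
autosomal, x_chromosome = expected_ancestry(0.2, 0.6)
```

In this example the X-chromosome proportion is lower than the autosomal one, the signature of a male-biased contribution; with equal female and male fractions the two values coincide.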

While both sexes contribute equally to the autosomal gene pool, females and males contribute 2/3 and 1/3, respectively, to the X chromosome gene pool. This asymmetric inheritance between autosomes and the X chromosome, combined with local ancestry information, is highly informative of the complexity of sex-biased admixture histories Goldberg and Rosenberg (2015).

We calculated the continuous ancestry tract length profile on simulated data, for each individ- [...]

We considered a tract the concatenation of contiguous 0.1 cM fragments with more than 0.9 of posterior probability of being inherited from one of the three ancestries. We used the same [...]
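The tract definition given above (contiguous 0.1 cM fragments with posterior probability above 0.9 for one ancestry) can be sketched as follows. The input format, the function name and the choice to let low-confidence fragments interrupt a tract are assumptions made for illustration; the ancestry labels are placeholders.

```python
def call_tracts(posteriors, window_cm=0.1, threshold=0.9):
    """Return (ancestry, length_cM) tracts from per-fragment posteriors.

    posteriors: list of dicts mapping ancestry label -> posterior probability,
    one dict per contiguous fragment along a chromosome.
    """
    tracts = []
    current, n_windows = None, 0
    for window in posteriors:
        ancestry, prob = max(window.items(), key=lambda kv: kv[1])
        # Fragments at or below the threshold are left unassigned (assumption).
        label = ancestry if prob > threshold else None
        if label == current:
            n_windows += 1
            continue
        if current is not None:
            tracts.append((current, round(n_windows * window_cm, 6)))
        current, n_windows = label, 1
    if current is not None:
        tracts.append((current, round(n_windows * window_cm, 6)))
    return tracts


example = [
    {"NAT": 0.95, "EUR": 0.03, "AFR": 0.02},
    {"NAT": 0.92, "EUR": 0.05, "AFR": 0.03},
    {"NAT": 0.50, "EUR": 0.45, "AFR": 0.05},  # ambiguous fragment
    {"EUR": 0.97, "NAT": 0.02, "AFR": 0.01},
]
tracts = call_tracts(example)
```

Counting the resulting tract lengths per ancestry, separately for autosomes and the X chromosome, would then give one possible form of a tract length profile.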

Acknowledgments

We thank Sarah Abel and Andres Ruiz-Linares for carefully reading the manuscript and for their insightful discussion regarding the implications of the findings. We would also like to thank Flora Jay for her valuable feedback on the methods.
