Abstract
The distribution of deleterious genetic variation across human populations is a key issue in evolutionary biology and medical genetics. However, the impact of different modes of subsistence on recent changes in population size, patterns of gene flow, and deleterious mutational load remains to be fully characterized. We addressed this question by generating 300 high-coverage exome sequences from various populations of rainforest hunter-gatherers and neighboring farmers from the western and eastern parts of the central African equatorial rainforest. We show here, by model-based demographic inference, that the effective population size of African populations remained fairly constant until recent millennia, during which the populations of rainforest hunter-gatherers have experienced a ∼75% collapse and those of farmers a mild expansion, accompanied by a marked increase in gene flow between them. Despite these contrasting demographic patterns, African populations display limited differences in the estimated distribution of fitness effects of new nonsynonymous mutations, consistent with purifying selection against deleterious alleles of similar efficiency in the different populations. This situation contrasts with what we detect in Europeans, who are subject to weaker purifying selection than African populations. Furthermore, the per-individual mutation load of rainforest hunter-gatherers was found to be similar to that of farmers, under both additive and recessive modes of inheritance. Together, our results indicate that differences in the subsistence patterns and demographic regimes of African populations have not resulted in large differences in mutational burden, and highlight the role of gene flow in reshaping the distribution of deleterious genetic variation across human populations.
Significance Statement
The last 100,000 years of human history have been characterized by important demographic events, including population splits, size changes and gene flow, potentially affecting the distribution of deleterious mutations across populations and, ultimately, disease risk. We sequenced the exomes of various African rainforest hunter-gatherer and sedentary farming populations, reconstructed their demographic histories, and explored the effects of differences in lifestyles and demography on mutational load. We found that the recent demographic histories of hunter-gatherers and farmers differed considerably, with population collapses for hunter-gatherers and population expansions for farmers. However, these contrasting pasts have not translated into major differences in the efficiency of purifying selection against deleterious alleles, leading to a similar mutational burden in the two groups.
Introduction
Human populations have undergone radical changes in size over the last 100,000 years, due to various range expansions, bottlenecks, and periods of rapid growth (1-3). An understanding of the ways in which these demographic changes have affected the ability of human populations to purge deleterious variants is crucial for the dissection of the genetic architecture of human diseases (4-8). Theoretical population genetic studies have shown that most new mutations resulting in amino acid substitutions are rapidly culled from populations through purifying selection (9), at a rate dependent on both the effective population size (Ne) and the distribution of selection coefficients (s) (10, 11). Indeed, new mutations with deleterious effects are less efficiently purged from populations with a small Ne, in which genetic drift has a stronger effect than selection.
Recent empirical studies based on population genomic-scale datasets have revealed an unsuspectedly large burden of rare and low-frequency amino acid-altering variants in the human genome (12-16). Population genetic studies have also reported differences in the number, frequency and distribution of putatively deleterious variants across populations, and it has been suggested that these differences result from demographic events, including explosive growth, founder events, bottlenecks, and inbreeding (8, 17-24). For example, the higher proportion of deleterious variants detected in non-Africans has been interpreted as the result of a decrease in selection efficacy after the out-of-Africa bottleneck and recent explosive growth (17, 24). It has also been suggested that the mutation load, i.e., the difference between the theoretical optimal fitness and actual fitness of a population, increases with distance from Africa due to serial founder effects (19).
By contrast, other studies have reported no detectable differences between populations for summary statistics approximating the per-individual mutation load, consistent with an equal efficiency of purifying selection across populations, at least for an additive dominance (semi-dominant) model (25, 26). Several factors underlie these apparently conflicting results, including differences in the statistics used to evaluate selection efficacy and to approximate mutation load, the methods used to assess the significance of population differences, and the choice of predictive algorithms for defining deleteriousness (27-30). Most studies have compared statistics across populations without integrating explicit models of demography and selection, despite the availability of methods for estimating the distribution of fitness effects of new nonsynonymous mutations (DFE) in a population, accounting for its demographic history (31, 32).
In addition to these methodological considerations, it remains largely unknown how differences in cultural practices between populations have affected their demographic regimes, and the efficiency of purifying selection. About 5% of human populations currently rely on modes of subsistence not associated with recent population growth, such as hunting and gathering (33). Africa has the largest group of hunter-gatherer populations, the rainforest hunter-gatherers (historically known as “pygmies”), who have traditionally lived in small, mobile groups scattered across the central African equatorial forest (34). By contrast, the neighboring sedentary populations practice agriculture, and are known to have recently expanded across sub-Saharan Africa (35). These contrasting lifestyles are associated with differences in the demographic histories of these populations, but some aspects of the genetic and demographic histories of rainforest hunter-gatherers and farmers remain unclear, given the highly heterogeneous nature of the genetic data used and the limited sample sizes included in earlier studies (36-41). Furthermore, signatures of recent demographic change can be detected only if the full site frequency spectrum is obtained from sequencing data for relatively large sample sizes, a condition not met by any of the previous studies of these populations.
In this study, we aimed to understand how differences in demographic history might have affected the efficacy of purifying selection and the corresponding mutational load in populations with different subsistence strategies. We generated 300 high-coverage exomes from various rainforest hunter-gatherer and farming populations from central Africa, a dataset that was supplemented with 100 exomes from a population of European descent. Using a coalescent-based composite likelihood approach, we first estimated the demographic parameters characterizing these populations in terms of changes in population size, split times and gene flow. We then compared the efficiency of selection to purge deleterious alleles and the estimated deleterious mutational load across populations, with the aim of improving our understanding of the effects of traditional lifestyle and demographic history on the mutational burden of contemporary human populations.
Results
Population Exome Sequencing Dataset
We constituted a unique collection of central African populations with different historical lifestyles: 100 Baka mobile rainforest hunter-gatherers (wRHG) and 100 Nzebi and Bapunu sedentary farmers (wAGR) from western central Africa (i.e., Gabon and Cameroon), and 50 BaTwa rainforest hunter-gatherers (eRHG) and 50 BaKiga farmers (eAGR) from eastern central Africa (i.e., Uganda) (Fig. 1A and Table S1). We first investigated the genetic structure, and potential substructure, of the study populations, using genome-wide SNP data for the 300 individuals (SI Appendix). ADMIXTURE (42) and principal component analysis (PCA) (43) separated African populations on the basis of mode of subsistence (AGR vs. RHG), before splitting RHG into western and eastern groups (Figs. 1B and S1 and S2) (36, 38, 39). The inbreeding coefficient (FIS) distributions and additional ADMIXTURE analyses provided no evidence of internal substructure within the groups studied (Figs. S3 and S4).
Fig. 1. The studied populations were African rainforest hunter-gatherers (RHG), neighboring agriculturalists (AGR) and Europeans (EUR). (A) Location of the sampled populations; (B) Estimation of ancestry proportions with the clustering algorithm ADMIXTURE using the SNP array data; (C) Watterson’s estimator θW; (D) Pairwise nucleotide diversity θπ; (E) Tajima’s D. (C, D, E) All neutrality statistics were calculated with exome sequencing data for 4-fold degenerate synonymous sites and confidence intervals were obtained by bootstrapping by site. Significance was assessed between wAGR/wRHG and eAGR/eRHG in (C) and (E), and for all comparisons in (D). ***: P-value < 10⁻³.
We then performed whole-exome sequencing for the entire collection of 300 unrelated individuals at high coverage (mean depth 68x). We identified 406,270 quality-filtered variants, including 67,037 newly identified variants (Table S1, Figs. S5 and S6). For calibration of the demographic inferences for African populations, and comparison with a well-studied population for mutation load (17, 25, 26), we supplemented our dataset with high-coverage exome sequences from 100 Belgians of European ancestry (EUR) generated with the same experimental and analytical procedures (44), yielding a final dataset of 488,653 SNPs.
We first sought to obtain a broad view of diversity in RHG and AGR populations, by calculating neutral diversity statistics for synonymous variants (4-fold degenerate variants). Watterson’s θ (θW) was significantly higher in the western and eastern AGR populations than in the RHG populations (P-value < 10⁻³; Fig. 1C and Table S2), due to the larger proportion of low-frequency variants, as demonstrated by the significantly more negative Tajima’s D value obtained (P-value < 10⁻³ for both comparisons; Fig. 1E and Table S2). However, western RHG had the highest pairwise nucleotide diversity θπ (P-value < 10⁻³ for all comparisons; Fig. 1D and Table S2), suggesting a large historical Ne for this population. These results indicate that the African populations studied have similar levels of genetic diversity, regardless of their historical mode of subsistence, but their different allele frequency distributions suggest contrasting demographic histories.
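As an illustration of how the three neutrality statistics relate to the site frequency spectrum, the following minimal sketch computes θW, θπ and Tajima’s D from an unfolded SFS using the standard textbook formulas. It is not the pipeline used in this study: the function name and the input layout (counts of sites by derived allele count, for segregating sites only) are assumptions made for the example, and the per-site normalization by the number of callable 4-fold degenerate sites is omitted.

```python
from math import sqrt

def neutrality_stats(sfs):
    """Return (theta_W, theta_pi, Tajima's D) from an unfolded SFS.

    sfs[i-1] is the number of segregating sites at which the derived
    allele is carried by i of the n sampled chromosomes (i = 1..n-1).
    Values are totals over sites, not per-site estimates.
    """
    n = len(sfs) + 1                      # number of sampled chromosomes
    s = sum(sfs)                          # number of segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))

    theta_w = s / a1
    # Mean number of pairwise differences, summed over sites.
    theta_pi = sum(
        count * 2.0 * i * (n - i) / (n * (n - 1))
        for i, count in enumerate(sfs, start=1)
    )

    # Variance constants of Tajima's D.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * s + e2 * s * (s - 1)

    d = (theta_pi - theta_w) / sqrt(var) if var > 0 else float("nan")
    return theta_w, theta_pi, d
```

An excess of low-frequency variants inflates θW relative to θπ and therefore yields a negative D, which is the pattern described above for the AGR populations.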
Model-based Inference of Population Divergence Times, Size Changes and Gene Flow
We then evaluated how differences in subsistence strategies between RHG and AGR populations have affected their demographic histories. Demographic parameters were estimated by fitting models incorporating all the populations studied, including EUR, into pairwise (2D) site frequency spectra (SFS), with the coalescent-based composite likelihood approach fastsimcoal2 (45). Going forward in time, we assumed an early population size change for the ancestor of all populations, followed by size changes coinciding with population splits, and an additional population size change for EUR (Fig. S7). Because the chronology of divergences between these populations remains unknown, we formulated three branching models, each assuming that a different population (EUR, RHG or AGR) was the first to split off from the remaining groups (i.e., EUR-first, RHG-first, AGR-first, Fig. 2). Furthermore, admixture between wRHG and wAGR, eRHG and eAGR, and between eAGR and EUR has been documented (Fig. 1B) (41, 46, 47). We therefore estimated parameters by considering two epochs of continuous migration between population pairs, allowing for asymmetric gene flow.
Fig. 2. (A) EUR-first branching model, in which the European population diverged from African populations before the divergence of the ancestors of RHG (aRHG) and AGR (aAGR), (B) RHG-first branching model, in which the aRHG was the first to diverge from the other groups, and (C) AGR-first branching model, in which aAGR was the first to diverge from the other groups. We assumed an ancient change in the size of the ancestral population of all humans (ANC). We assumed that each subsequent divergence of populations was followed by an instantaneous change in effective population size (Ne). We also assumed that there were two epochs of migration between the following population pairs: wAGR/aAGR and wRHG/aRHG, eAGR/aAGR and eRHG/aRHG, and EUR and eAGR/aAGR. The figure labels correspond to the estimated parameters of the model (Table S4). Bold arrows indicate 2Nm > 1.
The three branching models produced non-significant differences in likelihood (P-value > 0.05 for all comparisons; Table S3), and all three models fitted both observed marginal 1D SFS and FST values well (Figs. S8 and S9). These models consistently provided similar estimates for key demographic parameters (Table S4). Our results suggest that the ancestors of the contemporary RHG, AGR and EUR populations diverged between 85 and 140 thousand years ago (kya), from an ancestral population that underwent demographic expansion between 173 and 191 kya (TANC) (Fig. 2). After the initial population splits, the Ne of AGR and RHG (NaAGR and NaRHG) remained within a range extending from 0.55 to 2.2 times the ancestral African Ne (NHUM), whereas EUR (NaEUR) experienced a decrease in Ne by a factor of three to seven (Tables S4 and S5). The ancestors of the wRHG and eRHG populations diverged 18 to 20 kya (TRHG), and underwent a decrease in Ne by a factor of 3.8 to 5.7 for the wRHG (NwRHG) and 7.1 to 11 for the eRHG (NeRHG), regardless of the branching model considered. The ancestors of the AGR (NaAGR) split into western and eastern populations 6.7 to 11 kya (TAGR), and underwent a mild expansion, by a factor of 2.3 to 3.1 for the wAGR (NwAGR) and 1.2 to 2.2 for the eAGR (NeAGR). The EUR population experienced a 7.1- to 8.3-fold expansion (NEUR) 12 to 22 kya (TEUR). The 95% confidence intervals of estimated parameters were generally wide (Table S4), owing to the complexity of the models and the limited number of 4-fold degenerate synonymous sites used. Nevertheless, we obtained significant support, based on ratios of ancestral to current Ne, for a recent bottleneck in both wRHG and eRHG populations, and for a recent expansion of wAGR and EUR populations, for all branching models (i.e., NaRHG/NRHG > 1, and NaAGR/NwAGR and NaEUR/NEUR < 1; P-value < 0.05; Table S5).
The estimated migration parameters were also mostly similar between branching models (Table S6). We accounted for differences in Ne between the recipient populations, by comparing the effective strength of migration (2Nm) between populations exchanging migrants. For the ancient migration epoch, we found evidence for migration between the ancestors of RHG and AGR (Table S4), although we were unable to determine its direction and strength with confidence. During the recent epoch, migration between RHG and AGR was confidently inferred to be strong and mostly symmetric across models (2Nm > 17 and 8 in western and eastern groups, respectively). Finally, we inferred that 2Nm was larger for migration from EUR to AGR than in the opposite direction, for both migration epochs (from 7- to 120-fold stronger across models and epochs; Table S4), consistent with back-to-Africa migrations (2, 46, 47).
Together, our analyses support a demographic history in which a large ancestral population of RHG continuously exchanged migrants with the ancestors of AGR until about 10,000-20,000 years ago, when the ancestors of the RHG and AGR populations experienced bottlenecks and expansions, respectively, and migration between these two groups increased markedly.
Comparing the Efficacy of Selection Across Populations
We then explored whether the different inferred demographic histories of RHG and AGR populations affected the efficiency with which deleterious alleles were purged by selection. We used a model-based inference of the distribution of fitness effects (DFE) of new nonsynonymous mutations across populations, using DFE-α (ver. 2.15), which explicitly incorporates models of nonequilibrium demography (48). We first fitted a three-epoch demographic model to the synonymous SFS per population (Table S7), yielding broadly consistent results with those generated by fastsimcoal2 based on 1D SFS (Table S8). We then fitted a gamma distribution DFE model to the nonsynonymous SFS, accounting for demography. The fit of the gamma DFE model to the data was good (Spearman’s ρ > 0.8 between the expected and observed SFS; Fig. S10).
The inferred parameters indicated an L-shaped DFE with a significantly lower mean, E(Nes), in Europeans than in Africans, but no significant difference between African populations (Table S7). However, it is difficult to estimate the E(Nes) parameter accurately, given that highly deleterious mutations with a strong impact on this parameter are unlikely to be detected, even in large samples (49). We therefore also summarized the DFE by computing the proportion of mutations assigned by the inferred gamma distribution into four Nes ranges (0-1, 1-10, 10-100 and >100, corresponding to neutral, weakly, moderately and strongly deleterious mutations, respectively), a summary much more accurately estimated by DFE-α (48). We estimated that 38.4% and ∼25% of new mutations were weakly-to-moderately deleterious (Nes 1-100) in Europeans and Africans (Fig. 3), respectively, consistent with weaker selection in Europeans (8, 17, 24, 28). By contrast, the differences in the density assigned to each Nes category were more limited for the RHG and AGR populations, and these differences were not statistically significant (Fig. 3). In particular, we obtained almost identical results with DoFE, an alternative method that does not assume an explicit demographic model and fits nuisance parameters to the synonymous SFS (50) (Table S7). Together, our findings indicate that the contrasting demographic regimes of RHG and AGR have not differentially affected the efficacy of selection in these populations.
Fig. 3. The inferred fraction of new mutations in different bins of selection strength (Nes = 0-1, 1-10, 10-100, >100) with DFE-α, assuming a three-epoch demography fitted for each population separately. We used non-CpG sites and confidence intervals were calculated by bootstrapping by site 100 times.
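The binning of a gamma-distributed DFE into the four Nes ranges used above can be sketched as follows. This is only an illustration of the summary, not of DFE-α itself: it assumes that the strength of selection Nes follows a gamma distribution described by a shape parameter and a mean, and the function names and any parameter values passed to them are hypothetical. The gamma cumulative distribution function is evaluated with the power series of the regularized lower incomplete gamma function, which is adequate for the moderate arguments arising here.

```python
from math import exp, lgamma, log

def gamma_cdf(x, shape, scale):
    """CDF of a gamma(shape, scale) distribution at x (power-series form)."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape            # n = 0 term of the series
    total = term
    for n in range(1, 10000):
        term *= z / (shape + n)   # ratio of successive series terms
        total += term
        if term < total * 1e-15:
            break
    # Prefactor z^shape * exp(-z) / Gamma(shape), computed in log space.
    return total * exp(shape * log(z) - z - lgamma(shape))

def dfe_bins(shape, mean, edges=(1.0, 10.0, 100.0)):
    """Proportion of new mutations with Nes in 0-1, 1-10, 10-100 and >100."""
    scale = mean / shape
    cdf = [0.0] + [gamma_cdf(e, shape, scale) for e in edges] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]
```

With a shape parameter well below one, most of the density sits near zero and in the far right tail, which is the L-shaped pattern described in the text.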
Estimating Differences in Mutational Burden Between Populations
The similarity of selection efficacy across African populations suggests that these populations have similar deleterious mutation loads. However, the models used by DFE-α assume that deleterious mutations are additive (i.e., semi-dominant) and do not consider individual genotype information, which is crucial for the estimation of recessive mutation load (25, 51). We thus examined the per-individual distribution of heterozygous (Nhet) and homozygous (Nhom) variants and their weighted sum (number of derived alleles, Nalleles=Nhet+2×Nhom; (19, 25)), these last two variables being monotonically related to the individual load under recessive and additive models of dominance, respectively (25, 30). These statistics were not affected by sequencing quality (Fig. S11). The data were partitioned into variants presumed to evolve neutrally (synonymous) and variants likely to be under selective constraints (nonsynonymous variants, variants with Genomic Evolutionary Rate Profiling-Rejected Substitution [GERP RS] scores greater than 4, and loss-of-function mutations), as suggested by their site frequency spectra (Fig. S12). We also performed simulations to predict Nhom and Nalleles under neutrality across our three branching models (Fig. S13). For both observed genotype counts in all site classes and simulated counts, we used ratios of genotype counts between populations to facilitate comparisons and to make use of the shared history of populations to increase precision.
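The per-individual statistics defined above are simple genotype tallies, and a minimal sketch may make the definitions concrete. The encoding of genotypes as the number of derived alleles carried at each site (0, 1 or 2, with None for a missing call) and both function names are assumptions made for this example.

```python
def genotype_counts(genotypes):
    """Return (Nhet, Nhom, Nalleles) for one individual.

    genotypes holds the number of derived alleles at each site
    (0, 1 or 2); None marks a missing genotype and is ignored.
    """
    n_het = sum(1 for g in genotypes if g == 1)
    n_hom = sum(1 for g in genotypes if g == 2)
    return n_het, n_hom, n_het + 2 * n_hom

def mean_count_ratio(pop_a, pop_b, which=2):
    """Ratio of mean per-individual counts between two populations.

    which selects the statistic: 0 = Nhet, 1 = Nhom, 2 = Nalleles.
    """
    mean_a = sum(genotype_counts(ind)[which] for ind in pop_a) / len(pop_a)
    mean_b = sum(genotype_counts(ind)[which] for ind in pop_b) / len(pop_b)
    return mean_a / mean_b
```

A ratio close to one for Nalleles, together with a ratio different from one for Nhom, would correspond to similar additive but different recessive load proxies between the two populations.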
Population differences in Nalleles values observed in site classes with large numbers of variant sites (synonymous, nonsynonymous and GERP RS >4) and simulated under neutrality did not exceed 2% and were not significant for any of the population pairs examined (Fig. 4 and Table S9). For LOF mutations, for which there were fewer variant sites, the Nalleles ratios between populations deviated from one by no more than 8%, and these deviations were not significant. Conversely, Nhom was more than 20% higher in Europeans than in Africans, for both simulated neutral and observed sites in all classes, probably due to the long-term bottleneck experienced by European populations. Much smaller differences in Nhom were observed between RHG and AGR populations; nonsynonymous Nhom was about 2-4% lower in wRHG than in other Africans, and ∼1-4% higher in eRHG than in other African populations, supporting the view that the recent demographic events experienced by these African populations had no major effect on their additive and recessive mutation loads.
Fig. 4. Between-population ratios of the per-individual counts of (A) homozygous genotypes (Nhom) and (B) numbers of alleles (Nalleles=Nhet+2×Nhom). These ratios were obtained through simulations assuming neutrality under the demographic model with the highest likelihood (RHG-first) (panel “Simulated neutral”), and were computed with the observed data for several site classes (panels “Synonymous”, “Nonsynonymous”, “GERP RS >4” and “LOF”). Confidence intervals were calculated by dividing the SNP data into 1000 blocks and carrying out bootstrap resampling of sites 1000 times. ***: P-value < 10⁻³.
We then investigated the extent to which the similarity of Nhom between RHG and AGR populations could be attributed to the strong gene flow inferred between them. We performed simulations under our best-fitting demographic models, with migration between populations set to zero. The ratio of Nhom between RHG and AGR populations was much higher for simulations without migration (Fig. S14) than for simulations with migration (Fig. S13), with the strongest effect being observed for eRHG (ratio of Nhom for eRHG relative to eAGR of 1.1 in the absence of migration versus 1.025 with migration). Thus, our simulations suggest that the Nhom for RHG would be higher if there were no gene flow between these populations and AGR populations. Moreover, as RHG individuals have various degrees of AGR ancestry (Fig. 1B), it is possible to test empirically whether the proportion of AGR ancestry is related to proxies of mutation load. We found a significant negative correlation between the nonsynonymous Nhom of eRHG individuals and their estimated AGR ancestry (P-value = 4×10⁻³; Fig. S15), suggesting that admixture can effectively reduce the recessive mutation load.
Comparing the Genomic Distribution of Deleterious Variants
Despite the observed similarity between populations in terms of per-individual estimates of mutation load, we explored possible differences between populations in the genomic distribution of putatively deleterious alleles, due to variable levels of inbreeding, leading to extended runs of homozygosity (ROH) (Fig. S16; SI Appendix), or fluctuating selective pressures across gene functional categories. We found that Nhom and Nalleles were higher in ROH than in other regions of the genome, but no significant differences in these patterns were found between populations (Fig. S17). However, the lower rates of recombination for ROH (P-value < 10⁻³; Fig. S18) suggest that the accumulation of deleterious variants in ROH is not due to inbreeding (52), but instead to the well-documented ‘Muller’s ratchet’ process (53), resulting in a lower efficiency of selection in non-recombining genomes.
We also found that some Gene Ontology (GO) categories accumulated higher densities of amino acid-altering variants (SI Appendix), but the heterogeneous distribution of Nhom and Nalleles across GO categories was virtually identical in RHG, AGR and EUR populations (Figs. S19 and S20). However, some clinically relevant, disease-related variants, most of them initially discovered in Europeans, were observed at different, measurable frequencies across populations (SI Appendix, Table S10). Together, our analyses indicate that, despite marked differences in the demographic regimes of the African populations examined, these populations have similar mutational burdens, regardless of the average dominance coefficient and the genomic location of the deleterious mutations.
Discussion
Our study sheds new light on the demographic parameters characterizing the history of African rainforest hunter-gatherers and farmers, and shows that their contrasting demographic histories have not led to major differences in the burden of deleterious variants. Over the years, a number of demographic models have been proposed for these populations, generating conflicting results (36-41, 54). Several studies have dated the divergence of the ancestors of RHG and AGR to ∼60 kya (36-39). A recent report based on 16 whole-genome sequences inferred a more ancient divergence, 90 to 155 kya, but had limited power to estimate recent population size changes and gene flow, owing to its small sample size (40). Our results suggest that RHG and AGR populations diverged between 97 and 140 kya, consistent with an ancient divergence between these human groups (40). The inclusion of Europeans in our analyses made it possible to test different models for the chronology of major population splits. The similar likelihoods obtained with all models suggest that the ancestors of the RHG, AGR and EUR populations diverged from each other at around the same time (∼85-140 kya), during a period of major climate change (e.g., megadroughts dated between 75 and 135 kya) (55) that probably promoted population isolation and ancient structure on the African continent (40, 54).
Several other important conclusions about the recent demographic history of African populations can be drawn from our analyses, owing to the combined use of sequencing data from large sample sizes and refined demographic modeling. Regardless of the branching model considered, our inferences indicate that the effective population size of RHG was at least as large as that of the ancestors of AGR for most of their evolutionary past. RHG groups were, therefore, probably more demographically successful, or more interconnected by gene flow, in the past than in more recent millennia, as has also been suggested for the Khoe-San hunter-gatherers of southern Africa (56). More recently, RHG and AGR populations have undergone very different demographic events, with RHG populations experiencing major size reductions and AGR populations mild expansions, accompanied by a marked increase in gene flow between them. Together, our results support the notion that the traditional mode of subsistence of human populations is correlated with differences in demographic success (33, 36, 37, 41, 57). They extend previous findings by providing the precise parameters characterizing the history of African RHG and AGR over the last 150,000 years.
The impact of population growth and decline on the efficiency of purifying selection and mutational burden in humans has been the subject of intense research over the last few years (8, 16-24). We directly inferred parameters for selection efficiency from the distribution of fitness effects of new nonsynonymous mutations, rather than by simulation-based approximation (17, 26). Our results show that Europeans experienced, on average, weaker purifying selection than Africans, but that the proportions of mutations assigned to different classes of fitness effects did not differ significantly between the African RHG and AGR populations, in either the western or eastern groups. Thus, despite the strong collapse of RHG populations, our findings suggest that the recent nature of this demographic event, together with the historically large Ne of these populations, has resulted in a selection efficiency similar to that estimated for the expanding farmers.
We also compared the number of deleterious alleles per individual across populations – a statistic that has been shown to be monotonically related to the additive mutation load – accounting for evolutionary sampling variance by resampling sites (29, 30). We found negligible differences in the per-individual number of deleterious alleles between African and European populations. Furthermore, our neutral simulations, which included a strong out-of-Africa bottleneck and a recent expansion for Europeans, reproduced the empirical distribution of synonymous alleles per individual across populations. This result validates our demographic inferences based on independent aspects of the data (i.e., the SFS), and suggests that changes in effective population size alone can account for the observed relative counts of alleles across populations. Additionally, the confidence intervals of ratios of per-individual genotype counts obtained from simulations matched those obtained by resampling sites (Table S9), indicating that experimental variation due to factors such as sequencing coverage had not inflated the uncertainty of our estimates. Together, these results are consistent with theoretical predictions that the demographic changes experienced by African and European populations are too recent to have had an impact on the mutational load of these populations under an additive model of dominance (25).
In a scenario in which deleterious variants are partially recessive, homozygous sites would be expected to make the major contribution to mutation load. The number of homozygous functional sites in Europeans was significantly larger than that in Africans, consistent with previous findings (28), but this excess would not necessarily result in a higher deleterious load in Europeans (30). Our new empirical data show that African RHG and AGR populations differ only slightly in terms of the number of homozygous functional sites, suggesting equivalent mutational loads in these populations, regardless of the dominance model assumed. Our study revealed differences in the demographic regimes of African populations with different subsistence strategies, but these changes in population size occurred too recently for a difference in mutational burden between populations to have become evident. Furthermore, the strong gene flow inferred between RHG and AGR populations in recent millennia may have attenuated the effect of the strong bottleneck experienced by hunter-gatherers on the efficiency of purifying selection and mutation load.
Extensive exchanges of migrants between human populations have often occurred (58), and future studies on the dynamics of mutation load in admixed populations should improve our understanding of the impact of gene flow on the distribution of deleterious variants, and, ultimately, the genetic architecture of human diseases.
Methods
Population and Individual Selection
In total, we included 317 individuals from the AGR and RHG populations of western and eastern central Africa, and 101 individuals of European ancestry (44) (SI Appendix, Table S1). Informed consent was obtained from all participants in this study, which was overseen by the institutional review board of Institut Pasteur, France (2011-54/IRB/6), the Comité National d’Ethique du Gabon, Gabon (No. 0016/2016/SG/CNE), the University of Chicago, United States (IRB 16986A), and Makerere University, Uganda (IRB 2009-137).
Exome Sequencing
We generated whole-exome sequencing data for African samples, which were processed together with European samples (44), with the Nextera Rapid Capture Expanded Exome kit, which delivers 62 Mb of genomic content. We mapped read pairs onto the GRCh37 human reference genome with BWA v.0.7.7 (59), and exome data were processed with GATK v.3.5 (60). Stringent quality control filters were applied, and we obtained a final dataset of 400 high-quality whole-exome sequences (SI Appendix).
Variant Annotation
We defined synonymous and nonsynonymous sites as being 4-fold and 0-fold degenerate, respectively, according to the genetic code. Nonsynonymous sites were further annotated with GERP RS scores (61). Loss-of-function (LOF) mutations were identified with the Ensembl Variant Effect Predictor (VEP version 84), and filtered for false positives with the Loss-of-Function Transcript Effect Estimator plugin (LOFTEE).
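The degeneracy-based definition of site classes can be illustrated with a short sketch based on the standard genetic code. It classifies a codon position as 4-fold degenerate when all three alternative bases leave the amino acid unchanged, and as 0-fold degenerate when every alternative base changes it. This is an assumed reimplementation for illustration only; the function name is hypothetical, and the handling of transcripts, strand and overlapping genes required in a real annotation is not covered.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def site_degeneracy(codon, pos):
    """Classify position pos (0, 1 or 2) of a codon by degeneracy.

    Returns "4-fold" if all three alternative bases are synonymous,
    "0-fold" if none is, and "other" for intermediate cases.
    """
    codon = codon.upper()
    ref_aa = CODON_TABLE[codon]
    synonymous_alternatives = sum(
        1
        for base in BASES
        if base != codon[pos]
        and CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == ref_aa
    )
    if synonymous_alternatives == 3:
        return "4-fold"
    if synonymous_alternatives == 0:
        return "0-fold"
    return "other"
```

For example, the third position of the alanine codon GCT is 4-fold degenerate, whereas its first two positions are 0-fold degenerate.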
Demographic Inference
Demographic parameters were estimated with the coalescent maximum-likelihood method fastsimcoal2 (45). For all point estimates, we performed 500,000 simulations, 30 cycles of the Expectation-Maximization (EM) algorithm and 100 replicate runs with different random starting values. Confidence intervals were obtained by bootstrapping by site 100 times.
Distribution of Fitness Effects
We inferred the distribution of fitness effects (DFE) of new nonsynonymous mutations with two independent methods implemented in DFE-α v.2.15 (48) and DoFE (50). Confidence intervals were calculated by bootstrapping by site 100 times.
Estimating Mutation Load
Mutation load was approximated using the number of putatively deleterious mutations per individual in several functional categories: nonsynonymous variants, variants with GERP RS greater than 4 (61), and LOF mutations. We counted the number of deleterious alleles in each individual as Nalleles=Nhet+2×Nhom, with Nhet and Nhom corresponding to the numbers of heterozygous and homozygous genotypes, respectively (19, 25). The significance of population differences in per-individual genotype counts was assessed by estimating the confidence interval of ratios between population pairs. Confidence intervals were calculated by paired bootstrapping: we split the SNP data into 1000 blocks and resampled, with replacement, 1000 times. This approach takes into account the variance introduced by demographic processes (29, 30).
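The paired block bootstrap described above can be sketched as follows. The sketch assumes that the data have already been summarized as one value per block and per population (the mean per-individual count contributed by the sites of that block); the function name, the percentile-based interval and the default arguments are choices made for this example rather than details taken from the analysis. The same resampled blocks are used for both populations in each replicate, which is what makes the bootstrap paired.

```python
import random

def block_bootstrap_ratio(block_a, block_b, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for a between-population ratio.

    block_a[k] and block_b[k] are the mean per-individual counts
    contributed by block k in populations A and B. The denominator
    population is assumed to have a positive total in every replicate.
    """
    if len(block_a) != len(block_b):
        raise ValueError("both populations must use the same blocks")
    rng = random.Random(seed)
    k = len(block_a)
    ratios = []
    for _ in range(n_boot):
        # Resample block indices with replacement, shared by A and B.
        idx = [rng.randrange(k) for _ in range(k)]
        num = sum(block_a[i] for i in idx)
        den = sum(block_b[i] for i in idx)
        ratios.append(num / den)
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    point = sum(block_a) / sum(block_b)
    return point, lower, upper
```

Under this scheme, a ratio would be called significantly different from one when the resulting interval excludes one.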
Additional methods for data analyses are presented in SI Materials and Methods.
Acknowledgments
We thank all of the participants for providing the DNA samples used in this study. We thank the Paleogenomics and Molecular Genetics Platform of the MNHN-Musée de l’Homme for technical assistance in DNA sample preparation. We thank Guillaume Laval and Laurent Excoffier for helpful discussions, and Nicolas Joly for help with computational resources. This work was supported by the Institut Pasteur, the Centre National de la Recherche Scientifique (CNRS), and the Agence Nationale de la Recherche (ANR) Grant “AGRHUM” (ANR-14-CE02-0003-01).