Different historical generation intervals in human populations inferred from Neanderthal fragment lengths and patterns of mutation accumulation

After the main out-of-Africa event, humans interbred with Neanderthals leaving 1-2% of Neanderthal DNA scattered in small fragments in all non-African genomes today1,2. Here we investigate the size distribution of these fragments in non-African genomes3. We find consistent differences in fragment length distributions across Eurasia with 11% longer fragments in East Asians than in West Eurasians. By comparing extant populations and ancient samples, we show that these differences are due to a different rate of decay in length by recombination since the Neanderthal admixture. In line with this, we observe a strong correlation between the average fragment length and the accumulation of derived mutations, similar to what is expected by changing the ages at reproduction as estimated from trio studies4. Altogether, our results suggest consistent differences in the generation interval across Eurasia, by up to 20% (e.g. 25 versus 30 years), over the past 40,000 years. We use sex-specific accumulations of derived alleles to infer how these changes in generation intervals between geographical regions could have been mainly driven by shifts in either male or female age of reproduction, or both. We also find that previously reported variation in the mutational spectrum5 may be largely explained by changes to the generation interval and not by changes to the underlying mutational mechanism. We conclude that Neanderthal fragment lengths provide unique insight into differences of a key demographic parameter among human populations over the recent history.

admixture event private to Asians has also been proposed 6,7 because Asian genomes carry 41 larger amounts of Neanderthal sequence compared to European genomes. However, Asian 42 genomes will also have more archaic fragments if a single gene flow common to Eurasians 43 was followed by dilution of Neanderthal content in Europeans due to subsequent admixture 44 with a population without Neanderthal admixture 2,8 . 45 46 An independent source of information for estimating differences in generation time is the rate 47 and spectrum of derived alleles accumulating in genomes over a given amount of time 9,10 . 48 Pedigree studies have shown that the yearly mutation rate slightly decreases when the 49 generation time increases because the mutational burst in the germline before puberty 50 represents a high proportion of new mutations in young parents 4 . Moreover, the relative 51 proportion of different mutational types depends on both the paternal and maternal age at 52 reproduction. This has been exploited to estimate differences in generation intervals for males 53 and females between Neanderthals and humans 9 . 54 55 Here we investigate archaic fragment length distributions among extant non-Africans 56 genomes from the Simon Genome Diversity Project (SGDP) 3 and high coverage ancient 57 genomes. We report strong evidence for a single Neanderthal admixture event shared by all 58 Eurasian and American individuals, enabling us to make use of archaic fragment length 59 distributions as a measure of generation intervals since admixture. Differences in estimated 60 generation intervals are mirrored by concordant patterns of mutation accumulation, and 61 suggest significant differences by up to 20% in the generation time interval experienced by 62 different Eurasian regions since their splits. 63 the Skov et al 11 approach and less likely to carry SNPs that directly classify them as closest 101 to the Vindija Neanderthal. 102 To compare shared fragments in terms of length, we only consider fragments in an East Asian 103 that overlap with regions in the genome of West Eurasians that contain archaic sequence and 104 vice versa (Extended Figure 1d, Extended Figure 2c, S6). We observe that shared fragments 105 in East Asian individuals are on the average 1.13 fold longer than in West Eurasians (P value 106 < 1e-5, permutation test, S2, Fig. 2b, Table S4)  Data2_mutationspectrum.txt). This was done by first removing all derived alleles observed in 151 the Sub Saharan Africa outgroup, excluding those individuals with detectable West Eurasian 152 ancestry 3 . Furthermore, we masked all genomic regions with evidence of archaic introgression 153 in any individual since archaic variants would not be found in Sub-Saharan genomes and they 154 would affect our results because they accumulated under a different mutational process 9 . 155 Masking those regions also ensures that this analysis is independent of the archaic fragment 156 length analysis above. After these procedures, we were left with ~20% of the callable genome 157 (S7). 158 159 Fig. 3a shows that the rate of accumulation of derived alleles is significantly different among 160 groups (P value = 2.8e-4, permutation test, S2, Table S6). West Eurasia has accumulated 161 1.09% more derived alleles than East Asia (P value = 1.18e-3, permutation test, S2) since the 162 Out-of-Africa event. However, this difference in the accumulation of derived alleles could only 163 have happened when West Eurasia and East Asia were separated, which is only a part of the 164 time since the Out-of-Africa (Fig. S3). If we assume >60,000 years for the out-of-Africa and a 165 West Eurasia/East Asia split of <40,000 years 2,19-21 the difference in the rate of derived allele 166 accumulation is at least 60,000/40,000*1.09%=1.64% while West-Eurasia and East Asia were 167 apart (SI8). Using the pedigree-based estimate of the relationships between mean parental 168 age and mutation rate per generation 4 (SI8), we estimate that this difference corresponds to a 169 2.68 or 3.39 years shorter generation interval in West Eurasia if East Asian mean generation 170 time was 28 or 32 years respectively (SI8). These are lower bounds of the inferred differences 171 in generation intervals since the difference between out-of-Africa and population split times is 172 minimized. 173 6 The age of parents at conception, and hence generation time, also impacts the frequency of 175 which types of single nucleotide mutations occur 4 . Thus, a shift in generation time is predicted 176 to change the spectrum of new mutations 9,10 and partially explain differences in mutation 177 spectrum described among human populations 5,22,23 . We calculated the relative frequencies 178 of the six different types of single nucleotide mutations depending on their ancestral and 179 derived allele (S7, Fig. S2, Table S7) and related that to the average Neanderthal fragment 180 length for each individual (Fig. 3b). We observe significant associations with average archaic 181 fragment lengths for all six types (Table S8). We further subdivide C>T mutations in three 182 types: CpG>TpG which present a distinct mutational process 24 (Fig. S2, Table S7), TCC>TTC, 183 which is in great excess in European genomes and has been studied as a population-specific 184 mutational signature 5,22 (Fig. S2, Table S7) and the rest, denoted as C>T' (Fig. 3b, Table S8). 185 We find that the frequency of CpG>TpG transitions depends the least on fragment length. 186

187
To investigate whether these correlations could be due to differences in generation time 188 between geographical regions, we reanalysed the proportion of DNM mutation types as a 189 function of mean parental ages in the deCODE trio data set 4,25 (Fig. 3b inserts, S9, Table S8). 190 Comparing the correlations from the SGDP data with the deCODE data we see a strong 191 correspondence for most mutational types: in all mutation types where correlations with either 192 dataset are significant, the direction of the effects are concordant (Figure 3b, Fig. S6). The 193 deCODE dataset has a slight bias towards probands having older fathers than mothers (mean 194 = 2.77 years, sd = 4.25, Fig. S5), and this could affect the response of mutation type fraction 195 depending on mean parental age. However, no major change in the correlation coefficients 196 was observed when only probands with similar parental ages were analysed (S9, Fig. S6). 197

198
Since there is no a priori reason to expect a relationship between archaic fragment lengths 199 and derived allele accumulation, we consider it likely that the same underlying factor has 200 affected both. The general correspondence of these correlations with those expected from 201 DNM studies supports our hypotheses that this causal element is a change in generation time. 202 More specifically, the matching decreasing correlation with parental age of TCC>TTC mutation 203 indicates that this mutation signature will increase when the mean parental age decreases. 204 Thus a considerable reduction in mean generation time in West Eurasians, as suggested in 205 this study, offers an alternative explanation to the excess of TCC>TTC mutations in that region 206 compared to the rest of the world 5,26 . 207 208 An increase in the mean generation interval can be due to an increase in paternal or maternal 209 age, or both. Anthropological studies suggest that males have generally been older than 7 with sedentary populations 27 . To gain insight into sex-specific changes to generation time 212 intervals we first compared the accumulation of derived mutations between autosomes, which 213 spend the same amount of evolutionary time in both sexes, and X chromosomes, which spend 214 ⅔ of the time in females while ⅓ in males (S10). Thus, an increase of the male-to-female 215 generation interval is expected to increase the X chromosome to autosomes (X-to-A) mutation 216 accumulation ratio 28 , although other factors such as reproductive variance and changes in 217 population size can also influence the ratio. Fig. 4a shows the X-to-A ratio of derived alleles 218 accumulated per base pair as a function of the mean archaic fragment length, as mean 219 generation time proxy, for the females in the SGDP data (S10). We observe that the X-to-A 220 ratio is significantly different among regions (P value = 3.6e-4, permutation test, S2). East 221 Asians have a higher X-to-A ratio compared to American and Central Asia and Siberia, with 222 similar Neanderthal fragment sizes, and higher than West Eurasians, with smaller Neanderthal 223 fragment sizes. This result is compatible with East Asians having a higher mean generation 224 time than West Eurasians primarily due to an increased paternal age at reproduction as 225 compared to Americans and Central Asia and Siberia where the age at reproduction of both 226 sexes are inferred to have increased similarly. 227 Another sex-specific mutation signature are C>G mutations in genomic regions with 228 clustered de novo mutations in old mothers 4,29 . This signature can be explored to compare 229 maternal ages among groups 9 . We estimated the proportion of derived C>G alleles to other 230 derived allele types in these genomic regions and contrasted it to the same ratio for the rest 231 of the genome, for each individual (S10). When samples are grouped in the 5 main regions, 232 the C>G ratio in DNM clusters differs significantly (P value = 2.77e-3, permutation test, S2), 233 increasing with increasing Neanderthal fragment length (Fig. 4b). Notably, America has a 234 higher ratio than Central Asia and Siberians for similar Neanderthal fragment lengths, 235 suggesting a relatively larger impact of old mothers to the overall mean generation time 236 throughout their history. This is in line with the X chromosome analysis in that longer 237 generation times in America were more driven by older mothers as compared to older fathers 238 We have shown that the length of Neanderthal fragments in modern human genomes can be 248 used to obtain meaningful information about a fundamental demographic parameter, the mean 249 generation interval. We estimate surprisingly large differences across eurasian and american 250 groups suggesting stable differences over tens of thousands of years. Our approach depends 251 on the assumption that archaic fragments trace back to a single Neanderthal admixture event 252 shared by all non-African populations, for which we provide further evidence. With an increasing number of sequenced ancient and modern genomes we anticipate that the 278 approach we present here can be used to obtain a fine-grained picture of shifts in generation 279 interval during the last 40,000 years that can be directly related to changes in population 280 densities, climate and culture.  To test if there are differences between two groups for a statistic (for example, average archaic 415 fragment length), we subtract the means of each group. In this case, since this test is a two-416 tailed hypothesis test, we multiply the obtained P value by two. When we test differences for 417 multiple populations, we compute the F statistic. 418

419
The code to compute the statistical significance is provided on the GitHub page. 420 The mean is very sensitive to outliers. In our case, very long archaic fragments, for example 488 in East Asians, could increase the mean and thus show an unrealistic pattern among regions. 489

-Identification of archaic fragments in non-African individuals and ancient samples
To avoid that, we use median instead because it is more robust to outliers. The method used in this study is able to find archaic fragments whose variation is not fully 494 The method used in this study, returns the archaic fragments found in a genome with an 506 associated mean posterior probability. We restricted archaic fragments compared to be of a 507 high confidence (mean posterior probability >= 0.9). 508 509 When we study the archaic fragment difference among individuals in Eurasia and America 510 applying the multiple filters explained above, we can see that the pattern observed using all 511 fragments holds (Extended Figure 1). We conclude that the difference in archaic fragment 512 length is genuine and not depending on the factors exposed above. 513 of the fragments found in the three ancient samples. For the archaic fragment length, the mean 524 and the of SE (S1) is provided. 525

-West Eurasia and East Asia fragment comparison 52527
In this study, we compare fragments in West Eurasians and East Asians. The more individuals 528 used to recover archaic fragments, the more undiscovered fragments can be found 9 . Thus, 529 the imbalance in the number of individuals in each region in the SGDP data (71 West  530 Eurasians and 45 East Asians) can potentially affect any comparison between the two regions. 531 Therefore, we downsample the number of individuals used in West Eurasians to 45 randomly 532 chosen individuals to make comparisons fair. 533 534 First, we join all overlapping fragments for each region, hereby "joined region fragments" 535 (Extended Figure 2). To do that, we used bedtools software 32 with the following command: 536 537 bedtools merge -i ind1_regx.bed ind2_regx.bed … indN_regx.bed > joined_regx.bed 538 539 where x denotes either West Eurasia or East Asia regions and N denotes the number of 540 individuals in the corresponding region. 541 542 Then, we compared how much archaic sequence the two regions share (Extended Figure 2). 543 For that, we call the intercept between the two joined sets of fragments. We refer to it as the 544 "shared joined region sequence". We use the following command: 545 546 bedtools intercept -a joined_regx.bed -b joined_regy.bed > shared_joined.bed 547 548 where x denotes either West Eurasia or East Asia and y denotes the other region different 549 than x. 550 551 It follows that the rest of the fragments not included in this set are the "private joined region 552 sequence". 553

554
The amount of sequence for shared, private and total joined region fragments are provided in 555 Table S3.  556   557 For each individual, we classified the fragments as shared depending upon if there was an 558 overlapping fragment in the other joined region fragments (Extended Figure 2). We name 559 these fragments as "shared individual fragments". To get them, we ran the following command: 560 561 beedtools intercept -u -a indn_regx.bed -b joined_regy.bed > shared_indn_regx.bed

563
It follows that the rest of the fragments not included in this set are the "private individual 564 fragments". 565 566 Summary statistics for shared and private individual fragments are provided in Table S4. 567 568 Finally, we calculated the number of individuals that have an overlapping archaic fragment in we first divided each fragment in the joined region fragments into 1 kb segments 571 (joined_regx_1kb.bed). Then, we counted the number of individuals with an overlapping 572 archaic fragment for each 1kb segment with the following command: 573 574 bedtools intersect -c -a joined_regx_1kb.bed -b ind1_regx.bed ind2_regx.bed … indN_regx.bed > 575 freq_regx.bed 576 577 Extended Figure 3 shows a summary of the joined region fragments, shared joined region 578 sequence and the archaic frequency for each region. 579 580 581 Shared joined region fragments filtering by archaic affinity 582 583 The collapsed East Asian archaic sequence (916,369,000 bp) is 1,06 fold greater than the 584 collapsed West Eurasian archaic sequence (866,945,000 bp) and more than half of the 585 sequence is shared between the two (485,255,000 bp, Table S5). We partially attribute this 586 difference to the fact that East Asians have a higher Denisova component than West 587 Eurasians 31 . To study that we repeated the analysis above filtering archaic fragments in each 588 individual (before collapsing) depending on which of the three archaic genomes (Vindija 589 Neanderthal genome 12 , Altai Neanderthal genome 33 , Denisova genome 34 ) share the most 590 variants to (below), following the methods in 9 . Some fragments do not share variants with any 591 of the 3 sequenced archaic genomes, and thus we classify them as unknown. There are also 592 instances in which an archaic fragment does not share more SNPs with one of the archaic 593 genomes but multiple, so we can't classify the affinity of the fragments; these fragments are 594 called ambiguous fragments. 595 596 1) Denisova fragments 597 598 We only include archaic fragments which share more variants to Denisova genome than any 599 of the two Neanderthal genomes. 600 601 2) nonDenisova fragments 602 603 In this analysis we exclude fragments used above from all the fragments. Thus, we include 604 The analysis was repeated with fragments that share more variation with Neanderthal than 619 with Denisova (Neanderthal fragments). In this case, we observe a 1.07 fold higher 620 Neanderthal content in the East Asian group. We attribute this to the fact that since West 621 Eurasia archaic fragments tend to be shorter, they do not contain enough SNPs to classify 622 them to the category that they belong to. Thus, they are going to be more often classified as 623 unknown compared to fragments in East Asia. Furthermore, the 11 method has higher false 624 negative rate with short fragments, which will artificially decrease the total number of 625 fragments in that region. 626   Table S6. 715 716 Finally, we classified loci in different mutation types depending on the derived allele nucleotide, 717 the ancestral allele nucleotide and their 5' and 3' nucleotide context. For example, as shown 718 by the diagram below, a derived allele T that had an ancestral allele C with the context G and 719 A (5' and 3' respectively) would be denoted as GCA>T. Because we do not make distinction 720 of the strand in which the mutation occurred, we collapsed strand-symmetric mutations. This 721 is the same as saying that GCA>T is equivalent to TGC>A. This way, we end up with 96 722 mutation types.  Table S7.  region and mutation type, the mean and the of SE (S1) is provided. 771

-Estimation of the different parental generation time in West Eurasia and East Asia 772 773
As described in the main text, West Eurasia individuals have accumulated 1.09% more derived 774 alleles than East Asians since the split with Africans (Out of Africa). Because we are only 775 interested in the proportion of derived alleles accumulated after the split of West Eurasians 776 and East Asians, we need to correct for the span of time since the Out of Africa event until the 777 split of the two Eurasian populations (Fig. S3). Thus, we need to assume dates for the split 778 between Africans and non-Africans and the split between Eurasians. 779 780 We note that in the literature dating the Out of Africa is widely discussed and controversial, 781 since it was not a clean split between non-Africans and Africans. Instead, from MCMC results 782 and cros coalescence rate analysis in 2,19 the authors note that there might have been a 783 gradual separation among African populations and between Africans and non-Africans. They 784 suggest that this process created population structure between 200,000 -100,000 years ago 785 within Africa and that the non-African group had more gene flow with certain African groups 786 (i.e., Yorubans) than others (i.e., San). After that, the rate increased, indicative of an 787 accelerated split between Africans and non-Africans which has the median divergence point 788 between 80,000 -60,000 years ago. Similarly, the split among Eurasians was not clean either.  shows conceptually that the mutation rate could only be different after the split between East 839 Asians and West Eurasians (blue and green terminal branches). However, the difference in 840 derived allele accumulation is calculated since the split with Africans for each group (cyan and  841 blue, cyan and green). 842

-Mutation spectrum correlation with mean parental age 843 844
The germline mutation spectrum is dependent on the parental sex and age at conception 4 . In 845 this study, we observe differences in the abundance of derived alleles accumulated after the 846 out of Africa event when stratified by mutation type (Fig. S2, Table S7). Here, we study to 847 which extent these differences can be explained by changes in generation time in C>T'). Thus, the total amount of derived alleles do not consider these 3 types. We correlated 862 the fraction of derived alleles of each type with the mean archaic fragment length as a proxy 863 of mean generation time (Fig. 3b). We obtained the linear model of such correlation for each 864 mutation type using the following R function (Table S8) We downloaded the set of DNM called in 25 and the additional proband information from the 871 supplementary data provided in the publication. We join both in order to compute the mean 872 parental age for each DNM for each proband. Indels are filtered out. Following the 873 methodology in a similar test in 4 , we aggregate all mutation counts for each of the 9 types of 874 all probands with the same mean parental age. We then compute the fraction of each mutation 875 type. In other words, for each mutation type and mean parental age we have a single mutation 876 fraction value. Those data points that were obtained aggregating information from less than 2 877 probands were discarded. We obtained linear models for each mutation type using the 880 lm(mutation_fraction~mean_parental_age, weights = n_probands) 881

882
The correlations between the slopes of both datasets is shown in Fig. S4. 883

884
The probands of the deCODE dataset have a bias towards fathers being older than mothers, 885 with a mean of 2.77 years and the largest difference of more than 40 years (Fig. S5a). To 886 study if the correlation of mutation spectrum with the mean parental age is affected by the 887 mentioned bias, we rerun the correlation test with the deCODE dataset with only probands 888 that have parents with an age difference of less than 4 years. This way, we retaining more 889 than 50% of the data (Fig. S5b)  Due to the inheritance pattern of the X chromosome -2 copies transmitted in females while 935 only 1 in males -compared to autosomes -2 copies in both females and males -, it is expected 936 that the X chromosome has ¾ the diversity of the autosomes. However, this can be altered if 937 the mutation rate changes disproportionately between females and males due to shifts in 938 generation time between sexes. For example, an increase in the male mean generation time 939 will decrease the yearly mutation rate in males and thus, proportionally less mutations are 940 going to be accumulated in autosomes compared the X chromosomes 28 . Therefore, the ratio 941 of derived allele accumulation between the X chromosome and the autosomes will reflect 942 variation on the generation time between males and females: higher values of the X-to-A ratio 943 will be indicative of longer generation times in males compared to females and vice versa. 944 Although here we only consider generation time differences to affect the ratio, there are other 945 factors that can perturbe this ratio such as reproductive variance between sexes 35 , 946 demographic events 36 or differences in selection 37 . 947

948
To investigate that, we obtained the number of derived alleles in the autosomes and X 949 chromosomes of the females of the SGDP data (Table S9) where denotes the number of derived alleles, the number of callable base pairs in either 955 (X chromosome) or (autosomes). We then correlated the ratio with the mean archaic 956 fragment length for each individual obtained in S3 (Fig. 4a). 957 958 C>G maternaly enriched regions 959 960 As described in 4 , there are regions of the genome in which DNM are clustered (cDNM). Those 961 regions appear to be enriched in C>G mutations which originate in the maternal lineage. They 962 also show that these clusters increase in number more rapidly with maternal than paternal age 963 at conception. 964 individuals. Summary statistics of the derived allele accumulation per region on the X 1040 chromosome of males. For each region, the mean and the of SE (S1) is provided. 1041