An integrative skeletal and paleogenomic analysis of prehistoric stature variation suggests relatively reduced health for early European farmers

Human culture, biology, and health were shaped dramatically by the onset of agriculture ~12,000 years before present (BP). Subsistence shifts from hunting and gathering to agriculture are hypothesized to have resulted in increased individual fitness and population growth as evidenced by archaeological and population genomic data alongside a simultaneous decline in physiological health as inferred from paleopathological analyses and stature reconstructions of skeletal remains. A key component of the health decline inference is that relatively shorter statures observed for early farmers may (at least partly) reflect higher childhood disease burdens and poorer nutrition. However, while such stresses can indeed result in growth stunting, height is also highly heritable, and substantial inter-individual variation in the height genetic component within a population is typical. Moreover, extensive migration and gene flow were characteristics of multiple agricultural transitions worldwide. Here, we consider both osteological and ancient DNA data from the same prehistoric individuals to comprehensively study the trajectory of human stature variation as a proxy for health across a transition to agriculture. Specifically, we compared ‘predicted’ genetic contributions to height from paleogenomic data and ‘achieved’ adult osteological height estimated from long bone measurements on a per-individual basis for n=160 ancient Europeans from sites spanning the Upper Paleolithic to the Iron Age (~38,000-2,400 BP). We found that individuals from the Neolithic were shorter than expected (given their individual polygenic height scores) by an average of −4.47 cm relative to individuals from the Upper Paleolithic and Mesolithic (P=0.016). The average osteological vs. expected stature then increased relative to the Neolithic over the Copper (+2.67 cm, P=0.052), Bronze (+3.33 cm, P=0.032), and Iron Ages (+3.95 cm, P=0.094). These results were partly attenuated when we accounted for genome-wide genetic ancestry variation in our sample (which we note is partly duplicative with the individual polygenic score information). For example, in this secondary analysis Neolithic individuals were −3.48 cm shorter than expected on average relative to individuals from the Upper Paleolithic and Mesolithic (P=0.056). We also incorporated observations of paleopathological indicators of non-specific stress that can persist from childhood to adulthood in skeletal remains (linear enamel hypoplasia, cribra orbitalia, and porotic hyperostosis) into our model. Overall, our work highlights the potential of integrating disparate datasets to explore proxies of health in prehistory.


Introduction
The agricultural revolution -beginning ~12,000 BP (years before present) in Mesopotamia 1,2 and then spreading [3][4][5] or occurring independently 6,7 across much of the inhabited planet -precipitated profound changes to human subsistence, social systems, and health. Seemingly paradoxically, the agricultural transition may have presented conflicting biological benefits and costs for early farming communities 8,9 . Specifically, demographic reconstructions from archaeological and population genetic records suggest that the agricultural transition led to increased individual fitness and population growth 6,10-12 , likely due in part to new food production and storage capabilities. Yet, bioarchaeological analyses of human skeletal remains from this cultural period suggest simultaneous declines in individual physiological well-being and health, putatively from i) nutritional deficiency and/or ii) increased pathogen loads as a function of greater human population densities, sedentary lifestyles, and proximity to livestock 9,13-18 .
To date, anthropologists have used two principal approaches to study health across the foragingto-farming transition in diverse global regions 13,19,20 . The first approach involves identifying paleopathological indicators of childhood stress that persist into adult skeletal remains. For example, porotic hyperostosis (porous lesions on the cranial vault) and cribra orbitalia (porosity on the orbital roof) reflect a history of bone marrow hypertrophy or hyperplasia resulting from one or more periods of infection, metabolic deficiencies, malnutrition, and/or chronic disease [21][22][23][24][25][26] . Meanwhile, linear enamel hypoplasia (transverse areas of reduced enamel thickness on teeth) occurs in response to similar childhood physiological stressors (e.g., disease, metabolic deficiencies, malnutrition, weaning) that disrupt enamel formation in the developing permanent dentition [27][28][29][30] . Broadly, these paleopathological indicators of childhood stress tend to be observed at higher rates among individuals from initial farming communities relative to earlier periods, potentially reflecting their overall "poorer" health 14,[31][32][33][34][35][36] .
A second approach uses skeleton-based estimates of achieved adult stature as a proxy for health during childhood growth and development [37][38][39] . Since stature is responsive to the influences of nutrition and disease burden alongside other factors, relatively short "height-for-age" (or "stunting") has been used as an indicator of poorer health in both living and bioarchaeological contexts [39][40][41][42][43] . When studying the past, individual stature can be estimated from long bone measurements and regression equations [44][45][46][47] . Using these methods, multiple prior studies have reported a general profile of relatively reduced stature for individuals from early agricultural societies in Europe 15,36,[48][49][50] , North America 51-53 , the Levant 16,32 , and Asia 54,55 . For example, estimated average adult mean statures for early farmers are ~10 cm shorter relative to those for preceding hunter-gatherers in both Western Europe (females -8 cm; males -14 cm) 49,50 and the Eastern Mediterranean (females -11 cm; males -8 cm) 56 . This pattern is not universal, as a few studies do not report such changes 57,58 ; the variation could be informative with respect to identifying potential underlying factors 59 .
However, in addition to environmental effects like childhood nutrition and disease, inherited genetic variation can have an outsized impact on terminal stature, with ~80% of the considerable degree of height variation within many modern populations explainable by heritable genetic variation [60][61][62][63] . Moreover, migration and gene flow likely accompanied many subsistence shifts in human prehistory. For example, there is now substantial paleogenomic evidence of extensive population turnover across prehistoric Europe [64][65][66][67][68][69] . Therefore, from osteological studies alone we are unable to quantify the extent to which temporal changes in height reflect variation in childhood health versus changes/differences in the frequencies of alleles associated with height variation.
In this study we have performed a combined analysis of ancient human paleogenomic and osteological data where both are available from the same n=160 prehistoric European individuals representing cultural periods from the Upper Paleolithic (~38,000 BP) to the Iron Age (~2,400 BP). This approach allows us to explore whether 'health', as inferred from the per-individual difference between predicted genetic contributions to height and osteological estimates of achieved adult height, changed over the Neolithic cultural shift to agriculture in Europe. When craniodental elements were preserved and available for analysis (n=91 of the 160 individuals), we also collected porotic hyperostosis, cribra orbitalia, and linear enamel hypoplasia paleopathological data in order to examine whether patterns of variation between osteological height and genetic contributions to height are explained in part by the presence/absence of these indicators of childhood or childhood-inclusive stress.
Subsistence strategies in the Upper Paleolithic and Mesolithic focused on gathering, collecting, and hunting food. The Neolithic is marked by the emergence of plant cultivation and animal domestication (to varying degrees and tempos of integration), long-term settlements, larger populations, and increased social complexity -processes which then intensified and expanded in subsequent periods 70 . The overlapping dates among the different cultural periods reflect both geographical variation in the timing of cultural change and the potential for co-occurrence of multiple cultural traditions within a single region.

Confirmation of an average osteological stature dip in the Neolithic
We used an osteometric board to newly estimate the lengths of preserved long bones for n=86 of the n=160 total individuals (53.7%) in our database (SI Appendix, Table S1). We also recorded published and unpublished (previously collected) long bone length estimates for an additional n=54 individuals (33.7%). In these cases we estimated osteological stature from the long bone length data 44 . Finally, for n=20 individuals only pre-calculated terminal height estimates were available (12.5%).
We observed osteological stature variation among cultural periods ( Figure 1B). Reconstructions of osteological stature for females and males are lower during the initial shift to farming during the Neolithic compared with earlier and post-Neolithic periods. Specifically, individuals from the pre-Neolithic periods (female average stature = 156.7 cm ± 6.8 (s.d.) cm, males=168.8 ± 7.4 cm) were ~4 cm taller on average than those from the Neolithic (females=150. 3 44 . C) Polygenic height scores generated using genome-wide association summary statistics for height-associated single nucleotide polymorphisms and individual ancient DNA genotype data. D,E) The relationship between polygenic height score and estimated osteological stature (cm) for females, males, and for the full sample with height differences from mean stature calculated separately for females (mean=156.45 ± 4.32 cm) and males (mean=167.04 ± 1.95 cm), respectively. F) Residuals of the relationship between polygenic height score and osteological height with sex as a covariate for all individuals, by cultural period. Mean and median are represented by the black and blue dashed lines, respectively. Skeletal illustration by Katharine Thompson.  Table S2a, and S2b).
The overall pattern from our data roughly parallels previously-published reports. Specifically: a) stature decreased slightly from the Upper Paleolithic to the Mesolithic 35,36,71 ; b) marked stature reduction occurred during the initial agricultural transition in the Neolithic 14,16,34,55 (although this is not universal 70,[72][73][74]; and c) stature rebounded during subsequent post-Neolithic periods of agricultural intensification 14,16 .

Early farmers were relatively shorter than expected given their polygenic height scores
We next considered the osteological height estimates in the context of ancient DNA-based polygenic height scores for the same individuals. Using an established approach for working with ancient DNA genotype data 65,75,76 , we estimated a polygenic score for each prehistoric individual based on their available genome-wide genotypes in the context of results from a large-scale genome-wide association study of stature variation in modern Europeans 77 (data from: http://www.nealelab.is/uk-biobank/). For the results presented in the main text and figures, we used a version of the dataset in which all variants possibly affected by deamination-based ancient DNA damage [78][79][80] were masked. We also performed all analyses with the unmasked dataset and obtained similar results (SI Appendix, Fig. S1, Table S3).
While the polygenic scores that we estimated for the n=160 individuals were somewhat variable across cultural periods ( Figure 1C; SI Appendix, Table S4 and S5) as expected based on prior work 58 , we were most interested in using these data to begin to account for genetic contributions to achieved adult (osteological) height on a per-individual basis. Polygenic height scores and osteological estimates of stature were positively correlated for females (n=65; Fig. 1D; r 2 =0.023; P=0.117), males (n=95; Fig. 1D; r 2 =0.044; P=0.023), and for the combined dataset when calculating the difference between per-individual osteological height and mean stature separately for females (mean=156.45 ± 4.32 cm) and males (mean=167.04 ± 1.95 cm) ( Fig. 1E; r 2 =0.043 P=0.005). These results support the general biological plausibility of our integrative analysis of paleogenomic and osteological data. Importantly and expectedly, there is still considerable inter-individual variation in the relationship between polygenic height score and achieved adult stature, which could reflect any combination of incomplete genetic information, long bone measurement error, polygenic height score or stature estimate error, and the effects of childhood nutrition, disease, and other environmental variables on growth. Accordingly, we next analyzed the residuals from the combined-sex osteological stature and ancient DNA-based polygenic score model (Fig. 1E) to test whether individuals tended to have taller or shorter adult stature relative to expectations given their individual polygenic scores across the different cultural periods. These residuals are expressed in +/-cm from predicted stature per individual.
When using our ancient DNA-based approach to partly account for the predicted contribution of genetic variation to adult stature, we observed that individuals from the Neolithic were indeed osteologically shorter than expected (i.e. based on their polygenic height scores, and in the context of our overall sample) compared to individuals from other cultural periods ( Fig. 1F; SI Appendix, Table S6). Specifically, pre-Neolithic individuals (average residual = +1.99 ± 6.8 cm) were +4.47 cm taller than expected on average relative to Neolithic individuals (average residual = -2.48 ± 9.9 cm; P=0.016; FDR=0.064). The average osteological vs. genetic height score residual then increased steadily in the Copper Age (+2.67 cm relative to the Neolithic; P=0.052; We confirmed that these results cannot be explained by geographic variation (latitude and longitude) in our sample (SI Appendix, Fig. S2, Table S7). We also obtained similar results when we separately analyzed females and males (SI Appendix, Fig. S3, Table S6) and when we separately analyzed the lengths of individual long bones as opposed to the reconstructed stature estimates (for the femur and radius, in particular, although there are sample size limitations) (SI Appendix, Fig. S4 and S5, Table S8).
In contrast, our results were partially muted when we included variables explicitly reflecting genetic ancestry in our model. Our primary approach (as above) already accounts for ancestry variation via individual-level calculations of polygenic scores. However, polygenic height scores explain only a proportion of total heritable variation [81][82][83] . Therefore, we repeated our hypothesis tests after conducting a multi-dimensional scaling (MDS) analysis with the genome-wide genotype data for all n=160 ancient individuals to then including the first four MDS axes from this analysis as factors in an updated linear model, following Cox et al. 84 . After doing so, pre-Neolithic individuals (average residual = +2.31 ± 6.8 cm) were +3.48 cm taller than expected on average relative to Neolithic individuals (average residual = -1.16 ± 6.9 cm; P=0.056, FDR=0.222). The trend towards increasing average values through the Copper Age (+0.88 cm relative to the Neolithic; P=0.503; FDR=0.671), Bronze Age (+1.46 cm; P=0.310; FDR=0.620), and Iron Age (+0.62 cm; P=0.700; FDR=0.770) still exists but is considerably less pronounced (SI Appendix, Fig. S6, Table S9).

Paleopathological indicators of non-specific stress
Adverse early life conditions may negatively impact adult stature. To begin to investigate whether individual-level early life effects on prehistoric stature could be identified, we incorporated observations of paleopathological indicators of non-specific stress that can persist from childhood to adult skeletal remains into our analytical model. To do so, we characterized the presence/absence of one or more of cribra orbitalia (porosity on the orbital roof), porotic hyperostosis (porosity on the cranial vault), and linear enamel hypoplasia (reduced areas of enamel thickness) for n=91 of the n=160 (56.9%) individuals in our study (n=75 newly characterized; n=16 published/previously characterized SI Appendix, Table S10).
For 53 of these 91 individuals (58.2%), crania were sufficiently complete for assessment of the presence/absence of all three stress indicators ( Fig. 2A). Of this subsample, one or more stress indicators were present for 39 (73.6%) of the individuals, two or more indicators were observed in 18 (34%) individuals, and all three paleopathological indicators were present in only two (3.8%) individuals. Thus, stresses on health were relatively common overall in prehistoric Europe.
A striking 92.9% (13/14) of Neolithic individuals had one or more stress indicators ( Fig. 2A). While the proportion of Copper Age individuals with one or more stress indicators (10/18; 55.6%) was lower compared to that for the Neolithic (Fisher's exact test; P=0.044), the Neolithic result is not unique, with one or more stress indicators also recorded for all but one individual in the Bronze Age sample (11/12; 91.7%).
Considering the larger dataset of n=91 individuals with presence/absence data for at least one stress indicator to maximize sample sizes, we observed a distinct difference between the Neolithic and Bronze Age patterns (Fig. 2B). Specifically, porotic hyperostosis is common in the Neolithic

Proportion of individuals with stress indicator
Proportion of individuals with LEH, CO, or PH across cultural periods (n=91 evaluated; affected/observed) Proportion of individuals with 1+, 2+, or 3 non-specific stress indicators across cultural periods (n=53; affected/observed) Proportion of individuals with 1+, 2+ or 3 indicators

C.
Presence/absence of paleopathological indicators of stress and the relationship between osteological height and polygenic height score

D.
Presence/absence of cribra orbitalia and the relationship between osteological height and polygenic height score across cultural period We next tested whether the presence of paleopathological indicators of stress is predictive of individual-level deviations from the overall relationship between osteological stature and polygenic height score estimates. Based on the subset of individuals with presence/absence data for all three paleopathological indicators, the 39 individuals with one or more stress indicator were -1.36 cm shorter than expected on average compared to the 14 individuals without any stress indicators (SI Appendix, Fig. S7, Table S11). With the moderate magnitude of this difference and the relatively small sample size available for this analysis, this result was not unlikely based on chance expectations (t-test; P=0.448). The 18 individuals with two or more stress indicators were -1.80 cm shorter than expected on average relative to individuals with no stress indicators (P=0.411).
Using our larger dataset, we next separately analyzed the effect of each paleopathological stress indicator on the osteological stature-polygenic height score relationship (Fig. 2C). We found that the n=21 individuals with cribra orbitalia were slightly but not significantly shorter (-1.52 cm) than expected on average compared to the n=53 individuals without cribra orbitalia (P=0.361; FDR=0.968). Linear enamel hypoplasia and porotic hyperostosis presence/absence were negligibly associated with osteological stature vs. polygenetic height score residual variation ( Fig.  2D; SI Appendix, Table S12a). These patterns did not change appreciably when excluding the few individuals with active cribra orbitalia or porotic hyperostosis lesions from their respective analyses (active lesions reflect adult stress while potentially masking evidence of lesions from childhood; SI Appendix, Fig. S8, Table S12b).
Finally, we investigated the relationship between cribra orbitalia, porotic hyperostosis, and linear enamel hypoplasia presence/absence and osteological vs. polygenic height score residual in the context of cultural period (SI Appendix, Fig. S9, Table S13, Table S14). Of note: the n=6 Neolithic individuals with cribra orbitalia were -4.44 cm shorter than expected on average compared to the n=13 individuals from the same cultural period without cribra orbitalia (Fig 2D; P=0.306). This preliminary but suggestive effect was nearly absent in the Copper Age (-0.015 cm difference; P=0.994), the only other period with sufficient presence/absence sample sizes for analysis.

Discussion
Bioarchaeologists have equated repeated observations of relatively shorter average adult statures in the Neolithic to a likely general health decline for individuals during this cultural period 13,14,16,34,55,85 . Combinations of reduced nutritional diversity, unpredictable food availability (e.g., crop failure, storage loss), and increased infectious disease burden may have negatively impacted childhood health and growth 18,86,87 . Understandably, those prior studies did not account for the contribution of inter-individual variation in the contribution of heritable genetic factors to adult stature. Yet this consideration is especially important in light of updated understandings of considerable migration and gene flow processes associated with various farming transitions [88][89][90][91] .
In our study we sampled 160 prehistoric European individuals for whom both genome-wide ancient DNA data and intact long bones were available for analysis, making it possible for the first time to test whether Neolithic individuals were still osteologically shorter than expected when accounting (at least partly) for individual-level genetic contributions to height. Using this approach we found that the average Neolithic farmer was indeed relatively shorter than expected compared to pre-Neolithic individuals (Fig. 1F). Average osteological versus expected stature then increased over each post-Neolithic cultural period. This gradual recovery may reflect a history of continued (although variable) cultural and technological innovations that ameliorated and/or overpowered the initial nutritional and disease stressors faced by the earliest farmers [66][67][68][69]92,93 .
Our framework is related to but differs from that of a previous study by Cox and colleagues 58 , who compared population-level osteological height estimates (n=1,159 total individuals; the osteological data were from Niskanen et al. 94 and ancient DNA-based polygenic height scores (n=1,071) across prehistoric Europe. These two estimates were computed separately, i.e. typically not for the same individuals, thereby facilitating the large sample sizes. In contrast, our approach expressly considers individual-level dynamics in the relationship between these two variables, which is sample size restrictive yet potentially insightful. Interestingly, mean osteological stature estimates and polygenic height scores were both similar between the European Mesolithic and Neolithic 58,94 . This result is in contrast to our own osteological height estimate observations in this study and those of multiple previous bioarchaeological studies 15,36,48 , which again may reflect potentially interesting inter-population variability as part of the nuanced complexity underlying subsistence shifts 92 .
One important potential limitation of our study, aside from the number of individuals, is uncertainty regarding the ultimate portability of polygenic scores over genetic, geographic, and temporal distances 81,82,95,96 . Our analyses were also based on incomplete ancient DNA genotype data (however, note that we did exclude potential error from deamination-based ancient DNA damage by masking all potentially affected sites in the primary versions of our analyses). Yet the significant, positive relationship between polygenic height scores and estimated osteological statures across our overall sample (Fig. 1F) demonstrates the biological plausibility of our model. Moreover, our primary results were unchanged when we incorporated archaeological site latitude and longitude variables into our analyses.
However, even with complete genome-wide genotype data, polygenic height scores only capture a proportion of the heritable component of stature variation 83,[97][98][99] . Therefore, our primary analytical approach might incompletely capture the stature-relevant effects of any genetic ancestry variation across our sample. That is, with respect to our hypotheses, we would only be partially accounting for any cultural period-confounded migration/gene flow among populations with different genetic height profiles. For example, gene flow as a result of the spreading of the Yamnaya/Corded Ware cultures ("Steppe ancestries") starting ~4,000 to 5,000 years ago may have been associated with the introduction of relatively greater proportions of 'tall' alleles into various regions of Europe 58,93 . If this is a general phenomenon that extends to small effect and other loci not included in our individual-level polygenic height score calculations, then our cultural period-related inferences could be erroneous.
Accordingly, to help explore the potential effects of these processes on our results we conducted a parallel set of analyses in which we included factors reflecting genome-wide genetic ancestry (from the first four axes of a multi-dimensional scaling analysis) in our model. To provide a sense of the relative magnitudes of potential impact of these effects, we observed that 21.3% of the variation in the difference between individual and per-sex mean osteological stature is explained by a model including both individual polygenic risk scores and the MDS components (SI Appendix; Table S14). In contrast, individual-level polygenic height score alone (i.e. without MDS factors) explained 4.35% of the variation (Fig. 1E).
When including the MDS factors in the analysis, our downstream finding of shorter osteological statures than expected (i.e., based on polygenic height scores and MDS factors, and in the context of the overall sample) in the Neolithic relative to other cultural periods was in fact partly attenuated. While this approach represents an over-correction (or, more precisely, some level of double accounting) via the inclusion of correlated genetic signals that cannot be deconvoluted readily due to the phenotype's pervasive polygenicity, the findings nonetheless call for cautious interpretations of the main results in our study while awaiting expanded future sample sizes.
For a subset of the individuals in our study (n=91) we were additionally able to consider the extent to which three paleopathological markers of non-specific childhood and childhood-inclusive stress (linear enamel hypoplasia, cribra orbitalia, and porotic hyperostosis; each of which are maintained in the skeleton into adulthood) are associated with the relationship between osteological stature estimates and polygenic height scores. We observed a slight trend of relatively shorter than expected (given polygenic height score) adult statures among individuals with one or more childhood stress markers present (SI Appendix, Fig. S7). However, larger sample sizes will be necessary to more fully explore interplays among specific paleopathological indicators, osteological versus genetic height scores, and cultural periods. Still, our findings do at least suggest that factors underlying skeletal growth trajectories are separable, at least in part, from those leading to paleopathological indicators of stress. In particular, high rates of paleopathology are still observed in post-Neolithic cultural periods, for example in the Bronze Age, even after absolute osteological stature and actual versus expected stature averages have recovered.
In summary, we united previously disparate osteological and paleogenomic datasets for 160 prehistoric European individuals on a per-individual basis. Our results represent a novel advance in the study of whether and how a major cultural transition in human evolution affected physiological health. In particular, we show that the average Neolithic individual may have been relatively short even when correcting for expected individual genetic contributions to adult stature. This result may reflect reduced nutrition and/or increased infectious disease burden. We also preliminarily developed a framework for further consideration of these results in the context of particular paleopathological indicators of childhood stress. Looking forward, our model can be expanded in various dimensions, for example to different world regions or to more constrained spatial and temporal contexts, in order to further the study of emergent physiological trade-offs across periods of dramatic cultural or environmental change. Integrated osteological-genetic approaches will increasingly become important components of the toolkit for studying the dynamics of past human health.

Genotyping and imputation
We implemented GATK UnifiedGenotyper 123 followed by imputation to maximize the amount of genetic information for downstream analyses and since genotype likelihood scores can be generated for imputation. We opted to impute diploid genotypes and missing sites for the individuals in our data set (using the 1000 Genomes Project Consortium 124 reference panel) to minimize potential reference bias that may otherwise occur when using an alternative approach of randomly sampling one allele at each site 125 .

Assessing accuracy of imputation
We evaluated how sequence coverage may be impacting imputation accuracy (i.e., whether imputation is outperforming under high vs. low coverage conditions). We compared our imputed genotype data in the high coverage Loschbour individual (~16x) 104 with downsampled BAM files (using SAMtools -s parameter 122 ) from 5x coverage to 0.5x for chromosome 1 using SnpSift 127 . We obtained a concordance rate (in terms of total sites recovered) of approximately 97-99%, suggesting imputation accuracy is not dramatically lower in the low coverage imputed genotype data (SI Appendix, Table S16).
We also assessed the imputation accuracy of heterozygous sites by comparing the not imputed high coverage genotype data for Loschbour with each downsampled imputed BAM file from 0.1x to 3x coverage. At the lowest coverage of 0.3x, approximately 85% of heterozygous sites are recovered with subsequent increases of ~98% at 3x coverage (SI Appendix, Table S17).
For our data, polygenic height scores were estimated using PLINK 1.9 128 with clumping of independent SNPs. Clumping was used to identify the SNP with the lowest p-value in each LD block 128 . This approach retains SNPs with the strongest statistical evidence while reducing the correlation between the remaining SNPs 129 . Although all common SNPs could be used in polygenic scoring, clumping to remove SNPs that have limited statistical association is also a practical approach 129 . Clumping was performed at the genome-wide p-value 5e-08 using PLINK 1.9 parameters "--clump-r2 0.1", "--clump-kb 1000" with the 1000 Genomes "Europeans" reference population panel to retain the most correlated SNPs ("index SNPs") from the UK Biobank height summary statistics, which was then used to calculate the polygenic height scores. Polygenic scores were calculated using "--geno 0" to exclude missing genetic markers and the "--score" flag, extracting the specific index SNPs (github.com/smmarciniak/aDNA_osteo_height).

Osteological data collection
The 160 individuals in our data set have a broad geographical, temporal, and cultural period ranges. Radiocarbon or archaeologically calibrated dates, latitude/longitude coordinates, genetic sex, and archaeological/cultural period were obtained from the original paleogenomic and archaeological publications of the paleogenomic data used for this study (SI Appendix, Table S1).

Long bone measurement and stature reconstruction
Both newly-collected (n=86 individuals) and previously collected/published (n=54) osteological data were included in this study (SI Appendix, Table S1). For n=20 of the latter set of individuals, only pre-calculated terminal height estimates (based on unavailable long bone length measurements) were available (SI Appendix, Table S1). Only adult individuals were included in our study. For newly collected data this assessment was based on the complete fusion of all long bone epiphyses; we otherwise relied on classifications of 'adult' in the published record (which were likely based on the same criterion).
Permissions to collect new long bone measurement data were coordinated with researchers (coauthors on this publication) affiliated with the museums and university departments housing the various individuals. An osteometric board was used to measure the maximum length measurements (to the nearest millimeter) of the femur, tibia, radius, and humerus following standard osteological methods 44,130,131 . Intact long bones were selected, either the left or right side, depending on availability and preservation; if both sides were available and fully preserved, then both were measured.
We used a regression-based approach to reconstruct osteological stature from Ruff and colleagues 44 . These equations were developed using 501 individuals from across Europe ~7,000 BC-1900 AD, broadly approximating the geographical and temporal span of the individuals in our dataset. Sex-specific regression equations for the femur, tibia, humerus, and radius were used (SI Appendix, Table S2a), with standard error estimates ranging from 1.66% to 2.73% 44 . For the tibia, separate "north" and "south" equations are available 44 . Given the potential for migration occurring across the temporal and cultural periods in our data set we computed estimates from both equations and averaged them for the tibia-derived stature estimate for all individuals in our study (with available tibia measurements) regardless of geographic origin. When measurements from multiple different bones from the same individual were available, stature estimates derived from each of the different bones were estimated separately and then averaged to obtain a single point estimate per individual.

Paleopathological indicators of non-specific stress
For paleopathological evaluations, 75 individuals were newly characterized and 16 were published/previously characterized (SI Appendix, Table S10). Crania with at least one permanent incisor were examined to record the presence or absence and severity of three skeletal indicators of non-specific stress: porotic hyperostosis, cribra orbitalia, and linear enamel hypoplasia 21,22,27,28 . Cribra orbitalia was assessed according to Stuart-Macadam (1991) on a five-stage scale of severity (n=74 evaluated) and whether the lesions were healed or active (n=73) 21 ; porotic hyperostosis was evaluated on a three-stage scale (n=73) and whether the lesions were healed or active (n=72) 22 ; and linear enamel hypoplasia was assessed as present or absent (i.e., whether one or more linear bands of decreased enamel thickness were visible) (n=76) 27 . The differences in the number of individuals assessed for healed or active lesions is the exclusion of a previously published individual (n=1 Neolithic) 29 identified as having active lesions and n=1 Mesolithic individual for whom the nature of the porotic hyperostosis lesions was unspecified 132 .

Statistical analyses
Statistical analyses were conducted in RStudio (v1.2.5033). Our main linear model was generated using osteological height and genetic height scores with sex as a co-variate. Data normality was assessed using 'ggResidpanel' (v0.3.0) 133 (SI Appendix, Fig. S10). The residuals from this model were the basis for downstream analyses comparing patterns of stature variation across cultural periods as well as with the paleopathology data. Statistical analyses (t-tests) on the residuals from various linear models (including those below) were performed and the results are provided in the supplementary tables. False discovery rate calculations (p.adjust, method="fdr") were performed in R for each analysis set in order to help quantify the multiple testing effect.
We performed two additional analytical iterations of our linear model to evaluate consistency of downstream results. First, we included latitude and longitude as additional factors in the linear model framework described above. Second, we included factors related to genetic ancestry variation into our linear model, following the approach of Cox and colleagues 84 . Specifically, we performed a multi-dimensional scaling (MDS) analysis in PLINK (v1.9) 128 using the full genomewide SNP genotype data available for all of the individuals in our study. We generated the "plink.genome" file (plink --file <input_plink_files> --genome) which was used as input for the MDS analysis specifying 4 dimensions (plink --file <input_plink_files> --read-genome plink.genome -cluster --mds-plot 4). The first four axes (C1, C2, C3 and C4) were each included in the linear model.

Data availability
All osteological measurements, final stature estimates, and other skeletal individual-level information (e.g., ID, sex, radiocarbon dates, archaeological/ cultural period, geographical coordinates, publication sources for the ancient DNA data) are provided in Table S1 and additional supplementary tables. Although no new paleogenomic data were generated directly for this study, for n=28 individuals the analyzed ancient DNA data are from primary manuscripts currently in preparation or submitted for publication. References and accession numbers for these 28 individuals will be incorporated into updated versions of Table S1 as they are available. Scripts related to genotype calling, imputation, polygenic scoring, and statistical analyses are available at github.com/smmarciniak/aDNA_osteo_height. Figure S1. Linear regressions and residuals of osteological height and genetic height score with sex as a co-variate without deamination filtering. Figure S2. Residuals of osteological height and genetic height score with sex, latitude and longitude as co-variates. Figure S3. Residuals of osteological height and genetic height score with sex as a co-variate, plotting females and males separately. Figure S4. Replicability of the residuals of osteological height and genetic height score with sex as a co-variate using long bone lengths       Table S1. Description of individuals included in data set (n=160). Table S2a. Average osteological heights across cultural periods. Table S2b. Comparisons of the residuals from a linear model of osteological stature and sex. Table S3. Comparisons of the residuals from a linear model of osteological stature and polygenic height score with sex as a co-variate for data not filtered for deamination. Table S4. Polygenic height scores for deamination filtered and not deamination filtered data. Table S5. Polygenic height score t-test results. Table S6. Comparisons of the residuals from a linear model of osteological stature and polygenic height score with sex as a co-variate for deamination filtered data. Table S7. Comparisons of the residuals from a linear model of osteological stature and polygenic height score with sex, latitude and longitude as co-variates. Table S8. Comparisons of the residuals from a linear model of average long bone length and polygenic height score with sex as a co-variate for deamination-filtered data. Table S9. Comparisons of residuals from a linear model of osteological stature and polygenic height score with sex and ancestries as covariates. Table S10. Paleopathological summary for 91 individuals. Table S11. Comparison of residuals from the main linear model for individuals with 0, 1+ and 2+ indicators of paleopathological stress (n=53 individuals). Table S12a. Comparisons for individuals with LEH, cribra orbitalia or porotic hyperostosis with the residuals from a linear model of osteological and polygenic height score with sex as a covariate Table S12b. Comparisons for individuals with LEH and healed cribra orbitalia or porotic hyperostosis with the residuals from a linear model of osteological height and polygenic height score with sex as a covariate. Table S13a. Comparisons for individuals with LEH, cribra orbitalia or porotic hyperostosis with the residuals from a linear model of osteological and polygenic height score with sex as a covariate within cultural periods. Table S13b. Comparisons for individuals with LEH and healed cribra orbitalia or porotic hyperostosis with the residuals from a linear model of osteological and polygenic height score with sex as a co-variate within cultural periods. Table S14a. Comparisons for individuals with LEH, cribra orbitalia or porotic hyperostosis with the residuals from a linear model of osteological and polygenic height score with sex as a covariate across cultural periods. Table S14b. Comparisons for individuals with LEH and healed cribra orbitalia or porotic hyperostosis with the residuals from a linear model of osteological and polygenic height score with sex as a co-variate across cultural periods.

Supplementary Materials
Table S15. A model including both individual polygenic risk scores and MDS components Table S16. Assessing imputation accuracy in high coverage vs. low coverage paleogenomic data Table S17. Assessing imputation of heterozygote sites in imputed data.