Abstract
Existing models of human growth provide little insight into the mechanisms responsible for inter-individual and inter-population variation in children’s growth trajectories. Building on general theories linking growth to metabolic rates, we develop causal parametric models of height and weight growth incorporating a novel representation of human body allometry and a phase-partitioned representation of ontogeny. These models permit separation of metabolic causes of growth variation, potentially influenced by diet and disease, from allometric factors, potentially under strong genetic control. We estimate model parameters using a Bayesian multilevel statistical design applied to temporally-dense height and weight measurements of U.S. children, and temporally-sparse measurements of Indigenous Amazonian children. This facilitates a comparison of the metabolic and allometric contributions to observed cross-cultural variation in the growth trajectories of the two populations. These theoretical growth models constitute an initial step toward a better understanding of the causes of growth variation in our species, while potentially guiding the development of appropriate, and desired, healthcare interventions in societies confronting growth-related health challenges.
Short Summary
New causal models of human growth facilitate cross-cultural comparisons of metabolism and allometry.
Introduction
Between birth and adulthood, humans generally increase in height and weight following trajectories unique to our species, and notably different from those of other mammals (von Bertalanffy 1957; Bogin 1999; Hamada and Udono 2002). However, like many widely-distributed animals, the typical adult body form, and thus the typical shape of the growth trajectory, differs between populations in different parts of the world (Eveleth and Tanner 1990). Some aspects of body form are heavily influenced by genes (Lello et al. 2018; Zoccolillo et al. 2020). Thus, both drift and natural selection may contribute to this intra-species variation through their action on genes at the population level over generations in different terrestrial ecosystems (e.g., Allen’s and Bergmann’s Rules: Katzmarzyk and Leonard (1998)). Other important contributions to variation in human body form come from environmental factors, such as food availability, pathogen exposure, and chronic stress (Stewart et al. 2013; Li et al. 2020; Tanner et al. 1982; Bogin 2022), that directly, and differentially, affect the growth of individuals in different societies around the world. Distinguishing among these contributions to population-level variation in growth is a major challenge (Hruschka 2021), both for our understanding of human ontogeny, and for the provision of appropriate healthcare support to individuals whose particular patterns of growth result from exposure to health challenges that can be alleviated (e.g., stunting and wasting as indicators of malnutrition: World Health Organization and United Nations Children’s Fund (2009); de Onis and Branca (2016)). As an initial attempt to address this challenge, we develop new causal models of human growth that permit separation of different sources of growth variation.
Growth models
Over the last century, interest in human growth has resulted in a rich set of mathematical models to describe the non-trivially complex growth trajectories of children. Growth models are often classified as “parametric” or “non-parametric”. Parametric models define a specific range of possible forms for the growth trajectory, implemented as a mathematical relationship between a (relatively small) number of parameters, the values of which (and the shape of the resulting trajectory) may vary from person to person. Ideally, the mathematical relationship is derived from some causal theory of growth, and the parameter values thus have a mechanistic interpretation. However, for most popular parametric models (e.g., Jolicoeur et al. (1992)), generally neither is true (Hauspie and Molinari 2004; Gasser et al. 2004). Non-parametric models, e.g., splines, are explicitly descriptive, and can be made to fit growth data arbitrarily well. However, their (generally numerous) parameters are, by design, biologically uninterpretable, offering no causal insight. A third class, called “shape invariant” models, employs either a parametric (e.g., QEPS: Nierop et al. (2016)) or non-parametric (e.g., SITAR: Cole et al. (2010)) function to represent the population mean trajectory, and then uses a smaller set of individual-level parameters to account for each individual’s deviation from this mean trajectory (Gasser et al. 2004). All of these models are valuable tools for representing and comparing individual growth trajectories given the temporal resolution of most available data (where higher resolution data motivate structurally different models: Lampl (2012); Suki and Frey (2017)). Importantly, however, none of these models is derived from a causal theory of human growth, and they therefore provide little insight into the mechanisms through which individual characteristics (e.g., genes) and experiences (e.g., illness) result in an observed growth trajectory.
Though rarely applied specifically to humans, general causal models of organismal growth have a long history in biology. Pütter (1920) proposed a theory, further developed and popularized by von Bertalanffy (1938, 1957), such that organisms grow when the energy released by breaking down molecules acquired from the environment (via catabolism) exceeds the energy required for homeostasis (e.g., cellular maintenance). The excess energy can be used to synthesize new structural molecules (via anabolism) using both the products of catabolism and other substrates acquired from the environment. Because all living cells perform catabolic reactions, an organism’s total mean catabolic rate is a function of its mass. However, the production of the excess energy required for anabolic structural growth is a function of the surface area of the body’s interface with the environment through which structural and energy-containing molecules are absorbed. In recent years, this basic principle has been incorporated into more general theories, detailing how environmental substrates of a given energy density, acquired by cells and transformed into reserves, are then allocated to maintenance, growth, and reproduction (Kooijman 2001, 2010), how anabolic and catabolic processes (and therefore growth) depend on temperature (Gillooly et al. 2001), and how anabolism (and therefore growth) is limited by the structure of the supply network (e.g., capillaries) transporting environmental substrates to internal cells (West et al. 1997, 2001). We develop a causal theory of growth tailored to humans by combining the basal model of Pütter (1920) and von Bertalanffy (1938) with a novel representation of human body allometry, and a phase-partitioned representation of human ontogeny. This results in new parametric models of the metabolic processes and allometric relationships that underlie both the unique species-typical pattern, and the individual variations, of human growth.
Model estimation
We apply our new models to height and weight measurements from two populations that, on average, exhibit different patterns of growth: affluent U.S. children born in the late 1920s and Indigenous Matsigenka children from Amazonian Peru measured in 2017-2019. Importantly, these two datasets also differ in temporal resolution: each U.S. child was measured (at least) annually from birth to age 18 (Tuddenham and Snyder 1954), while most Matsigenka were measured (by the authors) only once or twice between the ages of one and 24. We believe the temporal sparsity of the Matsigenka dataset is typical of data feasibly collected in many rural, isolated, or marginalized societies around the world. Attention to such societies is essential, both to understand the range of growth variation found in our species, and, even more importantly, to provide desired and appropriate support to address growth-affecting health challenges faced by many of these populations.
To accommodate the differing temporal resolutions of the two datasets, we design a Bayesian multilevel (mixed-effects) estimation strategy (McElreath 2020; Vincenzi et al. 2020; Johnson 2015). Given our theoretical model, this allows us to quantify differences between U.S. and Matsigenka children in growth-phase-specific metabolic rates, plausibly resulting from population-level differences in diet and/or disease exposure, as well as differences in growth-phase-specific allometric relationships, potentially (though not necessarily: Tanner et al. (1982)) under strong genetic influence. An important goal of this approach is a tool to suggest when, and what form of, healthcare interventions would increase child well-being in a particular population, using data feasibly collected across a broad range of human societies.
Results
Theoretical model
Equations 1 and 2 present the growth functions we derived (see Methods) for height h and weight m, respectively, at total age t years since conception:
where H ≥ 0 is synthesized mass per unit surface (via anabolism), and K ≥ 0 is destructed mass per unit mass (via catabolism). The shape of the human body is stylized as a cylinder with height h, radius r, and substrate-absorbing (i.e., intestinal) surface area 2πrh. The parameter 0 < q < 1 represents the allometric relationship between radius and height as the body grows in both dimensions, such that r = h^q.
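The following is a reader's sketch of how closed-form growth functions can follow from the verbal description, and is not a reproduction of Equations 1 and 2. It assumes the mass balance described in Methods (anabolism proportional to absorbing surface s, catabolism proportional to mass m), the cylinder with r = h^q, a constant body density D (a symbol introduced here for illustration), and h(0) = 0 at conception.

```latex
\begin{align*}
m &= D\pi r^{2}h = D\pi h^{2q+1}, \qquad s = 2\pi r h = 2\pi h^{q+1},\\
\frac{dm}{dt} &= Hs - Km
\;\Longrightarrow\;
\frac{dh}{dt} = \frac{2H}{D(2q+1)}\,h^{1-q} - \frac{K}{2q+1}\,h,\\
u &= h^{q}: \quad \frac{du}{dt} = \frac{q}{2q+1}\left(\frac{2H}{D} - K u\right),\\
h(t) &= \left[\frac{2H}{DK}\left(1 - e^{-\frac{Kq}{2q+1}\,t}\right)\right]^{1/q},\\
m(t) &= D\pi\left[\frac{2H}{DK}\left(1 - e^{-\frac{Kq}{2q+1}\,t}\right)\right]^{2+1/q}.
\end{align*}
```

Under these assumptions, height approaches the asymptote (2H/(DK))^(1/q) as t grows, so a very small value of q implies a very large exponent 1/q.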
Human post-natal growth is traditionally divided into three phases, which tend to be characterized by distinctive hormonal profiles and growth rates (Karlberg 1989): 1. infancy, 2. childhood, and 3. adolescence. To represent these three phases, we follow Nierop et al. (2016) by modeling cumulative growth as the sum of three growth functions (i.e., without their model’s fourth “stop” function). In our model, each component function may have different values for the parameters H, K, and q, represented by subscripts 1 to 3. This yields the following nine-parameter composite growth models for height (Equation 3) and weight (Equation 4) at total age t:
Figure 1 presents a graphical illustration of these composite growth models for height and weight. Note that, in these models, the infant growth phase includes fetal growth beginning at conception. Importantly, the relative positions of the three component functions with respect to age are estimated empirically (with guidance from priors), and are not fixed a priori. Thus, use of these growth models does not require us to define the three growth phases in terms of age. Note also that the domains (x-axis ages) over which the three component functions change value (on the y-axis) differ between the height and weight models when fit to growth data. For instance, most height growth in the adolescent phase is predicted to occur between 12 and 17 years since conception, while most weight growth in the same phase is predicted to occur between ages 7 and 20. Similarly, the relative amounts of height and weight growth occurring in each phase are estimated to differ.
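The composite structure described above can be sketched in code as a sum of three component functions, one per growth phase. This is a minimal illustration only: the component form is an assumed closed-form solution of the mass balance for a cylinder with r = h^q, unit density, and h(0) = 0, and the (H, K, q) values are arbitrary placeholders, not estimates from this study.

```python
import math


def component_height(t, H, K, q):
    """Height contribution of one growth phase at total age t (years since conception).

    ASSUMPTION: closed form derived from dm/dt = H*s - K*m for a cylinder with
    r = h**q, unit density, and h(0) = 0. Illustrative only.
    """
    return ((2.0 * H / K) * (1.0 - math.exp(-K * q * t / (1.0 + 2.0 * q)))) ** (1.0 / q)


def composite_height(t, phases):
    """Cumulative height: the sum of the component functions for all phases."""
    return sum(component_height(t, H, K, q) for (H, K, q) in phases)


# Hypothetical (H, K, q) triples, one per phase; values are placeholders.
PHASES = [
    (2.0, 1.0, 0.50),  # phase 1: infancy (includes fetal growth)
    (1.5, 0.5, 0.45),  # phase 2: childhood
    (1.2, 0.3, 0.40),  # phase 3: adolescence
]

if __name__ == "__main__":
    for age in (0.75, 5.0, 10.0, 15.0, 20.0):
        parts = [component_height(age, *p) for p in PHASES]
        print(age, [round(x, 2) for x in parts], round(sum(parts), 2))
```

Because every component starts at conception, all three contribute at every age; no age boundaries between phases are hard-coded.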
Figure 1. Composite growth models and component functions. Component growth functions in the height (upper plot) and weight (lower plot) models correspond to the three phases of growth (infancy, childhood, and adolescence), and to the three terms in Equations 3 and 4, respectively. The infancy function has parameters H1, K1, and q1, while the childhood and adolescent functions contain parameters H2, K2, and q2, and H3, K3, and q3, respectively. The sum of these three component functions is the cumulative growth trajectory for height or weight (solid black lines) for total age since conception. Shown here are mean trajectories estimated for U.S. girls.
Such contrasting patterns of growth in height and weight are not unexpected. Height increases are primarily due to changes in bone structure, and thus height generally does not decrease between birth and adulthood. In contrast, weight changes result from bone development (contributing to increases) and soft tissue development (contributing to increases or decreases), both of which are controlled by relative rates of anabolism and catabolism. Metabolic parameters (K and H) will differ when estimated from height and weight trajectories to the extent that patterns of soft tissue (e.g., muscle and fat) development cause deviations from the constant relationship between height, intestinal surface area, and weight assumed by the theory (see Methods). In the composite models, these parameters influence the shape of the growth functions in each growth phase, which may therefore take different forms when fit to height and weight data from the same children.
Figure 2 compares the form of the composite height model to those of the shape-invariant SITAR model (Cole et al. 2010) and the parametric JPA-1 model (Jolicoeur et al. 1992), when fit to data from U.S. children. As shown in the figure, the composite height model does not fit the data as well as these other models, tending to underestimate height in early childhood, while overestimating it in infancy and late childhood. Due to such systematic errors in model fit, objective features of the growth trajectories estimated from these composite models (e.g., age at maximum growth velocity) should be interpreted with caution. Similarly, due to simplifying assumptions used in the derivation of the new theory (see Methods), objective values of estimated parameters (e.g., the rate of anabolism, H) must also be interpreted with caution. However, as argued in Appendix F (comparing residuals) and demonstrated below, model estimates show considerable promise for relative comparisons of individual and population-mean growth trajectories, metabolic rates, and allometric relationships.
Figure 2. Comparison of growth models. Four alternative models estimate a mean growth trajectory using height measurements from U.S. boys, treated as cross-sectional data. The composite model representing three growth phases (Composite-3) is that presented in the main text. A modification of this model to represent five distinct growth phases (Composite-5), the parametric JPA-1 model of Jolicoeur et al. (1992), and the spline-based SITAR model of Cole et al. (2010) are described in Appendices E.1–E.3. A more detailed comparison of these models is presented in Appendix F. Estimated height trajectories are ascending, and the corresponding velocity curves are all descending after age 14, from left to right.
As shown in Figure 2, model fit is considerably improved by representing five, rather than three, growth phases (i.e., adding more parameters, and hence flexibility, to the composite model: Appendix E.1). However, as we are currently unaware of a theoretical justification for more than three distinct phases, implementing a more complex model would sacrifice mechanistic understanding for the sake of better description, thereby decreasing its unique contribution relative to existing parametric and non-parametric growth models.
Empirical analysis
Figure 3 illustrates differences in mean posterior estimates of population-level and individual-level growth trajectories for U.S. and Matsigenka girls and boys between conception and adulthood. Note that the model predicts that population-mean differences in height trajectories are primarily the result of processes occurring during infancy and childhood in both boys and girls (compare the component trajectories for the respective growth phases). Differences in mean weight trajectories are predicted to result from processes occurring primarily during childhood (girls), or childhood and adolescence (boys).
Figure 3. Estimated height and weight trajectories. Shown are posterior estimates from the composite height (upper row) and weight (lower row) growth models fit to temporally-dense measurements from U.S. children (red), and temporally-sparse measurements from Matsigenka children (blue). Thin lines are mean posterior trajectories for each individual. Thick red and blue lines are mean posteriors for the mean cumulative trajectories of each ethnic group, as well as the trajectories within each of the three growth phases: infancy, childhood, and adolescence. Corresponding mean posterior mean cumulative velocity trajectories in green and orange are decreasing (after age 14) from left to right.
Figure 4 incorporates uncertainty in the estimated U.S. and Matsigenka mean growth trajectories in order to compute population-level contrasts of three important descriptive characteristics of these trajectories: maximum achieved height and weight, maximum achieved velocity, and the age at maximum velocity. As explained above, these characteristics are suitable for relative comparisons, though their objective values should be interpreted with caution.
Figure 4. Estimated characteristics of mean growth trajectories by ethnic group. Shown are 90% highest posterior density intervals (HPDI: McElreath (2020)) for posterior distributions of biologically-meaningful characteristics of the mean height trajectories (left two columns) and weight trajectories (right two columns) of U.S. (red) and Matsigenka (blue) children, by growth phase. Posterior density outside of this HPDI is shown as white tails on a distribution. Distribution means are shown as solid vertical lines. The 90% HPDI of the U.S. - Matsigenka contrast (difference) is shown in grey. HPDIs of contrasts that do not overlap zero (dotted vertical lines) indicate a detectable ethnic-group difference in estimated characteristics. Maximum height velocity during the adolescent growth phase could not be calculated because the estimated values of the q3 parameters are so small (Figure 5) that, when used in Appendix Equation A.13, they require computation of 2^20,000, which is beyond the capacity of the computer. Small objective values of q3 are likely an artifact of the fitting algorithm (see Discussion).
Note that, consistent with Figure 3, U.S. and Matsigenka children are, on average, often predicted to be reliably distinguishable on the basis of these growth characteristics. In particular, Matsigenka have lower estimated mean maximum height and weight and lower mean maximum velocity in nearly all growth phases for both sexes (except female infant weight velocity). In the absence of other information, this could be interpreted as evidence that Matsigenka are, on average, stunted (i.e., potentially malnourished) at all ages relative to U.S. children. However, as we show next, comparing model parameter estimates suggests a different interpretation.
Figure 5 presents a comparison of the population-mean parameter estimates for U.S. and Matsigenka children. As explained above, these estimates are suitable for relative comparisons, though their objective values should be interpreted with caution. Note that the parameter q2, representing the way body form changes during the childhood growth phase, is estimated to be consistently larger among Matsigenka than among U.S. children (hence, for a given h, r is larger because r = h^q for 0 < q < 1). According to the theory developed here, this suggests that, for a given height and density, Matsigenka children are heavier (where mass/density = volume = πr^2h) and/or have greater intestinal surface area (2πrh) than U.S. children. Contrapositively, for a given weight and density during the childhood growth phase, Matsigenka children are predicted to be shorter due to the fact that they are wider and/or that they have greater intestinal mass (and hence absorbing surface). This difference in body form, which is also apparent in the raw data (Appendix Figure A.25), is, assuming the theoretical model, independent of metabolic rates H and K, and is thus plausibly unrelated to any differences in nutrient intake or energy expenditure between the populations.
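The allometric argument can be checked with a small numerical sketch. The two q values and the height below are hypothetical placeholders (not estimates from Figure 5), density is taken as 1, and height is expressed in centimeters. Note that the ordering of r with q holds only when h exceeds 1 in the chosen units, which is an assumption of this illustration.

```python
import math


def radius(h, q):
    """Cylinder radius under the allometric relation r = h**q."""
    return h ** q


def volume(h, q):
    """Cylinder volume pi * r**2 * h (equals mass when density is 1)."""
    return math.pi * radius(h, q) ** 2 * h


def absorbing_surface(h, q):
    """Stylized intestinal surface area 2 * pi * r * h."""
    return 2.0 * math.pi * radius(h, q) * h


def height_for_volume(v, q):
    """Invert v = pi * h**(2q + 1) to find the height implied by a given volume."""
    return (v / math.pi) ** (1.0 / (2.0 * q + 1.0))


Q_LOW, Q_HIGH = 0.30, 0.35  # hypothetical allometric exponents
H_CM = 110.0                # hypothetical child height in cm (must exceed 1)

if __name__ == "__main__":
    for q in (Q_LOW, Q_HIGH):
        print(q, radius(H_CM, q), volume(H_CM, q), absorbing_surface(H_CM, q))
    # Same volume (weight at fixed density), larger q: the implied height is smaller.
    v = volume(H_CM, Q_LOW)
    print(height_for_volume(v, Q_LOW), height_for_volume(v, Q_HIGH))
```

The first loop shows the larger-q body being wider, heavier, and having more absorbing surface at a fixed height; the last line shows the contrapositive, a shorter body at a fixed volume.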
Figure 5. Estimates of mean parameter values by ethnic group. Shown are 90% highest posterior density intervals (HPDI) for posterior distributions of model parameters (Equations 3 and 4) underlying the mean height trajectories (left two columns) and weight trajectories (right two columns) of U.S. (red) and Matsigenka (blue) children, by growth phase. Plot features are analogous to Figure 4. Interpretation of parameters and their estimated values is only meaningful under the strict assumptions used to derive the theoretical growth models, explained in the main text.
As shown in Figure 5, according to the height model, during infancy the rates of catabolism (K1) and anabolism (H1) both tend to be higher among Matsigenka than among U.S. infants. This trend reverses in the childhood growth phase, according to the weight model. A trend toward higher anabolism (but not catabolism) among U.S. boys is predicted by the height model in childhood, and by the weight model in adolescence. As explained above, growth phases for the height and weight models are estimated to correspond to slightly different age ranges, and differences in parameter estimates between height and weight models are expected to result from the contribution of soft tissue development to weight, but not to height.
Discussion
We have developed new causal models of human height and weight growth and estimated their parameters using temporally-dense data from a population of U.S. children, and temporally-sparse data from a population of Indigenous Matsigenka children. These two ethnic groups tend to differ in terms of estimated mean rates of anabolism and catabolism, and the allometric relationship between body height and intestinal surface area, during three phases of growth between conception and adulthood. Below we present possible explanations for these differences, implications for health interventions, limitations of this new approach, and directions for future research.
Inter-group comparison and interventions
According to the height growth model, on average, Matsigenka infants are estimated to have higher rates of both anabolism (H1) and catabolism (K1) than U.S. infants (Figure 5, Infant, Height), in contrast to studies in other populations that find little cross-cultural variation in weight-adjusted infant energy expenditure (Prentice and Paul 2000). We note that, in the above analysis, U.S.-Matsigenka comparisons in the infant growth phase must be interpreted cautiously, as component growth functions are fit to very few measurements of Matsigenka children younger than two years (Figure 6). With this caveat in mind, the observed population-level differences could result from Matsigenka infants consuming more energy during the first years of life, perhaps relating to the timing and type of foods (supplemental to breast milk) introduced during infancy, practices that were changing rapidly in the U.S. of the late 1920s (Bentley 2014; Castilho and Barros Filho 2010; Stevens et al. 2009). However, this hypothesis requires future systematic study. At the same time, Matsigenka infants may also expend relatively more energy due to immune system activity in response to parasites or other pathogens in the tropical forest environment (Urlacher et al. (2018); Gurven et al. (2016); Garcia et al. (2020), though see Urlacher et al. (2021)), e.g., intestinal pathogens accompanying the introduction of solid food.
Figure 6. Height and weight measurements by ethnic group. Shown are the height (upper row) and weight (lower row) measurements of U.S. (red) and Matsigenka (blue) children used in this study. Measurements from the same individual at different ages are connected by lines. The temporally-dense U.S. data are taken from Tuddenham and Snyder (1954), while the temporally-sparse Matsigenka data were collected by the authors.
In the child growth phase, U.S. children are estimated to have, on average, higher rates of anabolism (H2) than Matsigenka children (Figure 5, except as estimated from female height). This could result from the fact that, after weaning, Matsigenka children consume foods produced through their parents’ daily fishing, hunting, gathering, and horticultural activities, that, on average, likely have higher fiber content and lower energy density than foods available to upper-class U.S. children in the 1930s, a time when commercially processed foods were gaining in popularity (Bentley 2014). Simultaneously, according to the weight model, U.S. children are estimated to have higher rates of catabolism (K2) than Matsigenka children (Figure 5, Child, Weight). This result could simply be an artifact of differences in body density, such that U.S. children tend to have higher proportions of body fat, and therefore lower density, than Matsigenka children. As discussed in Appendix G, for a given height, weight, age, and q, decreasing density (D) tends to result in a higher estimate for the catabolic rate parameter K (and/or a lower estimate for the anabolic rate H). A U.S.-Matsigenka difference in body density would be consistent with the documented positive relationship between adiposity and the adoption of a market-integrated diet and lifestyle (Urlacher et al. 2021; Bethancourt et al. 2019). Future incorporation of individual-level measures of body density D (e.g., derived from proportion body fat) into the empirical analysis may lead to more accurate estimates of metabolic parameters K and H.
As noted in the Results, one interpretation of the U.S.-Matsigenka difference in the childhood allometric parameter q2 (Figure 5) is that, for given rates of anabolism and catabolism, U.S. children tend, on average, to grow taller than Matsigenka children for a given weight, circumference, and/or intestinal surface area. This could be due to population-level differences in genes that influence body form, resulting from either drift or natural selection. For instance, there is genetic evidence that short stature may provide a selective advantage in tropical forest environments (Perry et al. 2014; Perry and Verdu 2017), though the nature of any such height adaptation, or linkage to another adaptive trait, is still poorly understood (Perry and Dominy 2009). Alternatively, the observed differences in q could result from Matsigenka having greater intestinal surface area than U.S. children for a given height (e.g., due to greater intestinal length, and/or a higher density of villi and microvilli: Crawley et al. (2014)). The length (and absorbing surface area) of the human small intestine shows considerable inter-individual variation and plasticity in response to diet and disease (Weaver et al. 1991). Thus, the development of greater intestinal surface area during childhood could, conceivably, be a plastic response to the lower energy density (e.g., higher fiber content) of typical Matsigenka foods. This would coincide with the negative relationship between gut surface area and dietary energy density (fauna > fruit > foliage) observed in other mammals (Chivers and Hladik 1980). Furthermore, this interpretation would be consistent with the lower rate of anabolism (H2) estimated for Matsigenka during the child growth phase (above). As shown in Appendix G, a larger value of q for Matsigenka is unlikely to be an artifact of their (presumably) greater body density: increasing density is expected to result in a decrease, rather than an increase, in the estimate of q.
Based on these hypothesized causes of growth differences between U.S. and Matsigenka children, a potential healthcare intervention to increase Matsigenka growth (if this is their wish) could aim to decrease pathogen load during infancy, and supplement the diet with more energy-dense food between weaning and puberty. According to the theory of growth developed here, one expected outcome of such an intervention would be Matsigenka who are both taller and heavier. However, if the estimated allometric differences (i.e., in q2) are under strong genetic control, then, for a given weight, supplemented Matsigenka children would still be shorter than U.S. children, on average. As shown above, compared to existing growth models, the promise of this causal modeling approach is a richer mechanistic understanding of observed differences in children’s growth trajectories. Such an understanding can inspire specific testable causal hypotheses, which, if supported, can inform more effective healthcare interventions.
Limitations and future directions
In developing this theory of human growth, we have made several key simplifying assumptions to facilitate tractable derivation of the mathematical models. These assumptions place important limits on interpretation of model estimates. For instance, the intestine is represented as the (partial) surface of a cylinder with length equivalent to standing height and width equivalent to body diameter. These assumptions are known to be highly inaccurate (Appendix Figure A.1), and they limit us to relative comparisons of parameter estimates, rather than interpretation of objective values. Given the above assumption, the relationship between body/intestinal radius and height is represented as a simple power function. While better than alternative simple functions (Appendix Figures A.2 and A.3), this assumption precludes any weight change resulting from soft tissue (e.g., fat and muscle) development that does not correspond to a height increase. This results in different parameter estimates by the height and weight models for the same individual (or population: Figure 5). Furthermore, body density is assumed to be constant within individuals during ontogeny, and across both individuals and populations. This precludes known differences in bone, fat, and muscle composition (e.g., between populations: Urlacher et al. (2021)), and thus, as illustrated above for the child growth phase, potentially distorts inter-group comparisons of metabolic parameters (K and H), as well as the allometric parameter (q, Appendix G). Fortunately, approximate individual-level measures of body density (e.g., derived from proportion body fat, and bone, muscle, and water mass) can, potentially, be obtained relatively easily using commercially-available portable electronic balances that record electrical impedance (e.g., Tanita BC-730F). Thus, future cross-cultural analyses should consider employing such density measures when fitting theoretical models to height and weight data.
An additional important assumption arises from the Bayesian framework used here to fit the theoretical models. The Hamiltonian Monte Carlo algorithm employed by the Stan platform (Stan Development Team 2022) to explore the posterior parameter space requires models to be continuous (differentiable) over their entire domain. This, in turn, requires us to assume that metabolic processes responsible for growth in all three developmental phases (i.e., infancy, childhood, and adolescence) begin at conception and continue until the end of life. It seems more plausible that the metabolic processes characterizing each growth phase are initiated by changing hormonal production at different points in development (Karlberg 1989), rather than at conception. Our assumption thus distorts objective estimates of parameter values in later growth phases, especially adolescence (where the component function requires a long left-side tail held near zero: Figure 1), and would likely obscure any general trend for metabolic rates to decrease between early childhood and adulthood (Pontzer et al. 2021).
As a consequence of these assumptions, even with a large amount of temporally-dense longitudinal data, growth trajectories and parameter values estimated from the composite growth models, in their present form, must be interpreted with caution. In this study, Matsigenka children are represented by a temporally-sparse dataset, and it will be of interest to know if and how model estimates of Matsigenka growth parameters change as more data are collected in coming years. Therefore, due to simplifying theoretical assumptions and temporally-sparse data, the above comparison of U.S. and Matsigenka children’s growth should be viewed as an illustration of the promise and potential of this approach, rather than a definitive analysis. Improvements to our understanding of human growth (e.g., the number and timing of growth phases, the relationship between intestinal surface area and body shape) are expected to manifest as improvements in the fit of theoretical models to data and in the plausibility of the resulting parameter estimates. In this way, the models developed here provide a baseline metric against which future improvements in theory can be judged.
Another important area for future work is the collection of child height and weight data in populations spanning the range of human variation. Hierarchical Bayesian statistical analysis techniques, like those employed here, allow estimation of individual and population-mean growth trajectories using temporally-sparse data of a form reasonably collected in many remote non-urban societies that have been largely neglected in clinical growth studies. While recognizing that more longitudinal data usually result in more accurate estimates, this analytical framework facilitates (at least preliminary) inter-ethnic comparisons of growth that include remote populations from which the longitudinal collection of height and weight measurements is challenging. By distinguishing growth mechanisms potentially responsive to healthcare interventions (e.g., metabolic rates) from those that may be less so (e.g., allometric relationships), the theoretical models presented here constitute an initial step toward the design of desired healthcare support strategies better tailored to the needs of particular communities, and, importantly, toward a better understanding of the causes of growth variation in our species.
Methods
Derivation of theoretical growth models
In the Pütter (1920) and von Bertalanffy (1938) theory of growth, mass increases when the rate of anabolism (using energy to construct molecules) is greater than the rate of catabolism (breaking down molecules to release energy). All cells perform catabolic reactions. Anabolism is contingent on the acquisition of molecules (containing both energy and building materials) from the environment. Only parts of the organism in contact with the environment can acquire such molecules. Therefore, anabolism is a function of an organism’s absorbing surface area (under the assumption of abundant food: Kooijman (2010)), while catabolism is a function of an organism’s mass (= volume × density):

$$\frac{dm}{dt} = Hs - Km \qquad (5)$$
where m is mass, s is surface area, H is synthesized mass per unit surface (via anabolism), and K is destructed mass per unit mass (via catabolism).
We are interested in the rate of longitudinal (i.e., height) growth per unit time. Assume the organism’s shape can be roughly approximated by a cylinder of radius r and length h (Appendix Figure A.1). Equation 5 can be used to approximate the rate of growth in mass:

$$\frac{d}{dt}\left(D \pi r^{2} h\right) = H\,(2 \pi r h) - K\,(D \pi r^{2} h) \qquad (6)$$
where D is density. Solving Equation 6 for the longitudinal growth rate yields:
Assuming a power law relationship between r and h such that r(t) = h(t)^q, where 0 < q < 1 (justified in Appendix A), and substituting into Equation 7 yields:
Under the same assumptions (see Appendix B), re-writing s in Equation 5 in terms of m yields the rate of growth in mass:
Assuming D = 0.001 kg/cm³ (with units facilitating the modeling of heights in cm and weights in kg) and solving the two previous differential equations for h(t) and m(t) yields Equations 1 and 2, respectively. Solutions are provided in Appendices A and B.
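The chain of steps in this derivation can be worked through as follows. This is a sketch reconstructed from the definitions in the text (cylinder mass m = Dπr²h, absorbing surface s = 2πrh, and r = h^q); the arrangement of terms and the way the constant of integration C enters are assumptions, and may differ from the published forms of Equations 7–9 and 1–2.

```latex
\begin{align*}
\frac{dm}{dt} &= Hs - Km, \qquad m = D\pi r^{2}h, \qquad s = 2\pi r h \\
D\pi\left(2rh\,\frac{dr}{dt} + r^{2}\,\frac{dh}{dt}\right) &= 2\pi r h H - K D \pi r^{2} h \\
\frac{dh}{dt} &= \frac{2Hh}{Dr} - Kh - \frac{2h}{r}\,\frac{dr}{dt} \\
\text{with } r = h^{q}: \quad \frac{dh}{dt} &= \frac{2H}{D(1+2q)}\,h^{1-q} - \frac{K}{1+2q}\,h \\
\text{with } s = 2\pi\left(\tfrac{m}{D\pi}\right)^{\frac{q+1}{2q+1}}: \quad \frac{dm}{dt} &= 2\pi H\left(\frac{m}{D\pi}\right)^{\frac{q+1}{2q+1}} - Km \\
h(t) &= \left[\frac{2H}{DK}\left(1 + C e^{-\frac{Kq}{1+2q}\,t}\right)\right]^{1/q}, \qquad m(t) = D\pi\,h(t)^{2q+1}
\end{align*}
```

Under this form, C = −1 gives h(0) = 0 and m(0) = 0, and height approaches (2H/(DK))^{1/q} as t grows large.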
Empirical analysis
We fit the composite growth models (Equations 3 and 4) to two datasets: 1) temporally-dense height and weight measures from 70 girls and 66 boys born in 1928 and 1929 to parents of relatively high socioeconomic status in Berkeley, California, U.S.A. (Tuddenham and Snyder 1954); and 2) temporally-sparse measures from 196 girls and 179 boys collected by C.R.M. and J.A.B. between 2017 and 2019 in four Matsigenka Native Communities inside Manu National Park in the lowland Amazonian region of Peru. Matsigenka in these remote communities are horticulturalists, gatherers, fishers, and hunters who produce most of their own food and have relatively little (though increasing) dependence on broader Peruvian society (Bunce and McElreath 2017; Revilla Minaya 2019; Shepard et al. 2010). We refer to these as the U.S. and Matsigenka datasets, respectively.
The U.S. dataset comprises data collected at regular intervals between birth and (at most) 21 years of age (shorter intervals before age two), with an average of 30 measurements per individual. Extrapolated measurements reported in Tuddenham and Snyder (1954) are excluded. To aid in fitting the theoretical models, we duplicated each individual’s last recorded height and weight measurements yearly until the age of 26. We also set each individual’s height and weight to zero at nine months prior to birth (i.e., conception).
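The two augmentation steps described above can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the function name, the tuple layout, age measured in years since birth, conception placed at −0.75 years, and the inclusive upper bound of 26 years are all assumptions made for the example.

```python
def augment_measurements(rows, final_age=26.0, conception_age=-0.75):
    """Pad one child's (age, height_cm, weight_kg) rows as described in the text.

    Hypothetical helper: ages are in years since birth, and conception is
    assumed to fall nine months (0.75 years) before birth.
    """
    rows = sorted(rows)
    last_age, last_height, last_weight = rows[-1]

    # Zero height and weight at conception.
    padded = [(conception_age, 0.0, 0.0)] + rows

    # Repeat the last recorded measurement once per year up to final_age.
    age = last_age + 1.0
    while age <= final_age:
        padded.append((age, last_height, last_weight))
        age += 1.0
    return padded


# Made-up measurements for one child, for illustration only.
example = [(0.0, 50.0, 3.4), (1.0, 75.0, 9.8), (21.0, 165.0, 58.0)]
for row in augment_measurements(example):
    print(row)
```

The padded rows at older ages pull each fitted trajectory toward a plateau at the last observed height and weight.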
The Matsigenka dataset comprises data collected, using an electronic balance (Tanita BC-351) and a stadiometer (Seca 213), from all individuals who were between two and 24 years of age and residing in the Matsigenka communities or attending one of three boarding secondary schools in neighboring communities at the time of our visits between 2017 and 2019. One Matsigenka community and the three boarding schools were visited twice in this period. Five one-year-old children who were eager to be measured are also included in the dataset. Due to our uncertainty about age in this population, ages are recorded to the nearest year. A maximum of three, and an average of 1.3, measurements per person were collected. The majority (70%) of Matsigenka children are represented by measurements at a single time point. As with the U.S. dataset, we set each individual’s height and weight to zero at conception. Raw data are plotted in Figure 6.
We fit the following models of observed height hjt and weight mjt of person j at time t to the combined U.S. and Matsigenka datasets:
where ηjt and μjt are Equations 3 and 4, respectively, which, when each parameter is indexed by j, represent the height and weight trajectories of individual j in the three growth phases. To represent the fact that observed heights and weights can never be negative, we take the log of both the observations and the mean of the Normal likelihood. As a consequence, variance due to measurement error around individual j’s actual height at time t is calculated as , where ση is the standard deviation on the log scale. Variance due to error in weight measurement is interpreted analogously. Females and males are fit with separate models.
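A minimal Python sketch of a Normal likelihood on the log scale, evaluated for a single observation, may help make this concrete. The function and argument names are hypothetical, and the sketch only evaluates the density of the logged observation; the authors' Stan implementation is not reproduced here.

```python
import math


def log_scale_normal_logpdf(observed, predicted, sigma):
    """Log density of ln(observed) under Normal(ln(predicted), sigma).

    observed and predicted must be strictly positive (e.g., cm or kg);
    sigma is the standard deviation on the log scale.
    """
    z = (math.log(observed) - math.log(predicted)) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


# On the log scale, equal proportional errors are penalized equally:
# observing 110 when 100 is predicted scores the same as 100 versus 110.
print(log_scale_normal_logpdf(110.0, 100.0, 0.05))
print(log_scale_normal_logpdf(100.0, 110.0, 0.05))
```

Because the error is proportional rather than absolute, the same log-scale standard deviation implies a larger spread in centimeters or kilograms for larger predicted values.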
We use a multilevel model design in a Bayesian framework (McElreath 2020), such that individuals’ height or weight trajectories in a given growth phase are estimated as the sum of ethnic group-level and person-level offsets to a baseline growth trajectory (i.e., random effects). We allow these offsets to covary across growth phases of an individual. Complete model structure, derivation, and priors are provided in Appendices C and D. Models were fit in R (R Core Team 2022) and Stan (Stan Development Team 2022) using the cmdstanr package (Gabry and Cesnovar 2021). Data and analysis scripts are provided at https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model.
Ethics statement
Research was conducted with authorization from the Servicio Nacional de Áreas Naturales Protegidas por el Estado del Perú (SERNANP). Verbal informed consent from adult Matsigenka participants, and informed assent from their minor children, was obtained following standards for the protection of research subjects implemented by the Department of Human Behavior, Ecology, and Culture at the Max Planck Institute for Evolutionary Anthropology. Results were presented to, and discussed with, Matsigenka participants in community-wide meetings prior to publication.
APPENDICES
Acknowledgments
We thank participants in the Matsigenka Native Communities of Tsirerishi, Tayakome, Sarigeminiki, and Yomibato. M. Navarra of Chaska Wasi (Salvación), Padres P. Rey and C. Llana of the Misión san Miguel Arcangel Shintuya, E. Herrera, K. Mejía, and M. Chinchiquiti of Maganiro Matsiguenka (Boca Manu), N. Oyeyoyeyo, R. Chiqueti and family of Tayakome, C. Huamantupa, J. Poma and family of Atalaya, V. Chávez, C. Flores, and F. Rayan of the Cocha Cashu Biological Station, N. Santullo and C. Matos of Rainforest Flow, J. Florez and E. Meza of SERNANP Cusco, O. Espinosa of the Pontificia Universidad Católica del Perú, M. Minaya, O. Revilla, N. Revilla, G. Lugon, and L. Revilla provided support with fieldwork. S. Atmaca, B. Beheim, A. Büchner, and A. Bublikova helped with data transcription. R. McElreath, B. Beheim, C. Ross, D. Lukas, W. Church, and other members of the Department of Human Behavior, Ecology, and Culture (HBEC) at the Max Planck Institute for Evolutionary Anthropology provided valuable comments and criticism on the theoretical models, statistical analysis, and/or earlier versions of this paper. All errors are ours. Funding was provided by the Max Planck Society.
Appendix A. Derivation of the height growth model
We require a geometrical representation of the human body that incorporates both height and intestinal surface area, and is simple enough to make Equation 5 in the main text solvable. We employ the special case where an everted tube becomes a solid cylinder of length h and radius r, such that intestinal surface area is represented by the external surface of the cylinder (minus the circular ends), and the length of the cylinder represents height (Figure A.1). Incorporating this geometry into Equation 5 yields Equation 6 in the main text. Solving the latter for the longitudinal growth rate yields Equation 7 in the main text.
Approximating the geometry of the human body. A) The body is represented as a tube of length (i.e., height) h. The nutrient-absorption surface (i.e., intestine) is on the inside and has radius r. The volume of the body is contained in the wall of the tube, which has thickness w. In reality, due to villi and microvilli, the absorbing surface area of the human digestive tract is more than an order of magnitude larger than the external (skin) surface area (Mosteller 1987; Helander and Fändriks 2014). Thus, a tube, where internal surface area is necessarily less than that of the exterior, is an imperfect representation. B) The tube is everted (turned inside out), such that the absorbing surface is now on the outside. If r and the solid volume are held constant, this requires the thickness of the new wall i to increase relative to w, such that . C) For convenience in the analysis that follows, we assume the special case where
, such that i = r, and the body can thus be approximated by a cylinder with height h, radius r, absorbing surface area 2πrh (i.e., not including the surface area of the cylinder’s ends), and volume πr2h. This implements an assumption that any increase in solid body volume (i.e., mass, given constant density) must be accompanied by a corresponding increase in absorbing surface area.
The theory of Pütter (1920) and von Bertalanffy (1938) makes different predictions depending on how an organism changes shape (or not) as it grows. If the cylindrical organism grows in only one dimension, e.g., it grows in length h but not in diameter, we can set radius r as a constant, which, from Equation 7, yields the rate of length growth:
Because H, D, r, and K are assumed to be constant over time, and can therefore be consolidated into a single constant B, this equation has the form:
whose solution is the exponential function:
where C is a constant of integration. Thus, under the above assumptions, such an organism would grow continuously and exponentially in length, quickly becoming very long while retaining the same diameter. Now assume that the cylindrical organism grows in size and maintains the same shape (i.e., body proportions). In other words, the ratio of its radius r to its length h at any time t is a positive constant F. Setting r(t) = F · h(t), and substituting into Equation 7, yields the rate of length growth:
Consolidating constant terms into constants G and J, this rate has the form:
whose solution is the negative exponential function described by von Bertalanffy (1938):
where C is a constant of integration, and G/J is a constant representing the asymptotic (e.g., maximum achieved) length. Thus, under the above assumptions, such an organism would grow at an ever-decreasing rate while maintaining the same shape, approaching a maximum size of length G/J and radius F · G/J. As was recognized early on (Pütter 1920), such a pattern of growth characterizes many fish species, and this model continues to be used as an excellent representation of fish growth (Essington et al. 2001; Vincenzi et al. 2020). One hypothesis for this model’s success is that a fish’s nutrient absorption surface (e.g., intestine) increases, like its external surface, at a constant proportion of the fish’s mass. In other words, the body proportions of most fish remain relatively constant between hatching and adulthood.
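The two limiting cases discussed above can be worked through explicitly. The following is a sketch derived from the cylinder geometry (m = Dπr²h, s = 2πrh) and the balance dm/dt = Hs − Km; the explicit expressions for B, G, and J are our reconstruction and may be arranged differently in the published equations.

```latex
\begin{align*}
\text{Constant } r: \quad \frac{dh}{dt} &= \left(\frac{2H}{Dr} - K\right)h = Bh
  &&\Rightarrow\quad h(t) = C e^{Bt} \\
r = Fh: \quad \frac{dh}{dt} &= \frac{2H}{3DF} - \frac{K}{3}\,h = G - Jh
  &&\Rightarrow\quad h(t) = \frac{G}{J} + C e^{-Jt}
\end{align*}
```

In the second case, with C < 0, length rises toward G/J = 2H/(DFK) and radius toward F · G/J = 2H/(DK).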
The growth of humans (and many other animals) falls between these two extreme patterns: The body changes in both size and shape over time, and no dimension can be reasonably assumed to be constant. Here, we are interested in three phases of growth, which roughly correspond to: 1) fetal period and infancy; 2) childhood; and 3) adolescence (inspired by Karlberg (1989), as explained in the main text). During each of these growth phases, the body changes shape in a specific way, namely, (as a rough approximation) height and circumference (=radius * 2π) both increase, but the ratio of circumference to height decreases over time, such that the body elongates.
Evidence for this comes from a comparison of height to both head circumference (Figure A.2) and waist circumference (Figure A.3). Note that, in these figures, circumference increases over time. Therefore r is not a constant, and, under the assumption that intestinal surface area is proportional to body circumference, we should not expect to observe pure exponential growth in height as described by Equation A.3. Similarly, the ratio circumference/height decreases over time. Therefore r/h is not a constant F, and, under the same assumptions of intestinal surface area, neither should we expect to see pure negative exponential growth in height as described by Equation A.6.
Changes in head circumference and height by age since birth. Reference trajectories of median height by age (top row) are taken from LMS values published by the CDC (https://www.cdc.gov/growthcharts/percentile_data_files.htm), derived from a cross-sectional sample of U.S. children. Reference trajectories of median head circumference by age (middle row) are taken from the LMS values of Schienkiewitz et al. (2011), from a nationally representative cross-sectional sample of German children. All reference trajectories are shown as black lines. Colored lines show predicted trajectories of head circumference and head circumference/height given median height trajectories, assuming the body is a cylinder with radius r and height h, and linear (height) growth is determined by Equation 7 in the main text, under three alternative relationships between r and h (dotted, dashed, and solid blue or red lines). Head circumference/height (bottom row) is calculated by dividing the respective median values for each age, thereby implementing a simplifying assumption that children with median height also have median head circumference. Functions relating head circumference to height were fit to simulated median head circumference trajectories using Rstan (Stan Development Team 2018). Data and analysis scripts are available at https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model.
Changes in waist circumference and height by age since birth. Reference trajectories of median height by age (top row) are taken from LMS values published by the CDC (https://www.cdc.gov/growthcharts/percentile_data_files.htm), derived from a cross-sectional sample of U.S. children. Reference trajectories of median waist circumference (middle row) and median waist circumference/height (bottom row) by age are taken from the LMS values of Sharma et al. (2015), using CDC data. All reference trajectories are shown as black lines. Colored lines show predicted trajectories of waist circumference and waist circumference/height given median height trajectories, assuming the body is a cylinder with radius r and height h, and linear (height) growth is determined by Equation 7, under three alternative relationships between r and h (dotted, dashed, and solid blue and red lines). In the bottom row, observed median waist circumference/height is shown as a black line. Grey, blue, and red lines represent waist circumference/height calculated by dividing the respective median values for each age, thereby implementing a simplifying assumption that children with median height also have median waist circumference. Functions relating waist circumference to height were fit (using Rstan (Stan Development Team 2018)) to simulated median waist circumference trajectories only until ages 14 and 16, for girls and boys, respectively (vertical dotted lines). After these ages, observed waist circumferences diverge notably from the predictions of all functions. Data and analysis scripts are available at https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model.
Given our simplifying assumption of a cylindrical human body (Figure A.1), r represents the intestinal radius, intestinal length is equal to height h, and thus intestinal surface area is equivalent to skin surface area. In reality, the absorbing surface area of the human digestive tract is more than an order of magnitude larger than the skin surface area (Mosteller 1987; Helander and Fändriks 2014), such that, if the digestive tract is represented with length h, its radius would need to be much larger than the radius r of the cylindrical body. However, our modest initial objective is only to represent the general relationship between intestinal surface area and height as both increase. Specifically, we hypothesize that height h and intestinal surface area (represented as 2πrh) both increase during childhood, but that the proportional increase in intestinal surface from one time step to the next is less than the square of the proportional increase in height. This might occur, for instance, if, during some phases of growth, the body elongates, such that leg length increases without a substantial increase in intestinal surface (likely related to trunk volume) (see Eveleth and Tanner (1990), p. 186). In the cylindrical representation, this occurs if r increases slower than h, analogous to the relationship between height and body radius (circumference) in Figures A.2 and A.3.
A power function is a simple representation of a relationship between height and radius, where a power can be chosen such that both increase over time, but the ratio r/h decreases over time. This is illustrated in Figures A.2–A.4.
The effects of different relationships between radius r and height h. When r is a power function of h (such that the power is between 0 and 1), both r and h increase over time (unlike when r is a constant), while the ratio r/h decreases over time (unlike when r is a constant multiple of h), such that the cylindrical body elongates.
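The contrast between the three relationships can be checked numerically. The short Python sketch below is illustrative only: the function name, the values of q, F, and the constant radius, and the example heights are all hypothetical, chosen just to show the qualitative pattern.

```python
def radius(h, relation, q=0.5, F=0.1, r_const=5.0):
    """Cylinder radius for a given height under one of three relationships."""
    if relation == "constant":
        return r_const        # r does not change with h
    if relation == "proportional":
        return F * h          # r is a constant multiple of h (shape preserved)
    if relation == "power":
        return h ** q         # r = h^q with 0 < q < 1 (body elongates)
    raise ValueError(f"unknown relation: {relation}")


heights_cm = [50.0, 100.0, 150.0]
for relation in ("constant", "proportional", "power"):
    radii = [radius(h, relation) for h in heights_cm]
    ratios = [r / h for r, h in zip(radii, heights_cm)]
    print(relation, [round(r, 2) for r in radii], [round(x, 3) for x in ratios])
```

Only the power relationship shows both an increasing radius and a decreasing ratio r/h as height increases.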
To implement this assumption, we set r(t) = h(t)^q, where 0 < q < 1. Substituting for r in Equation 7 yields Equation 8 in the main text. Substituting constants A = 2H/(D(2q + 1)) and B = K/(2q + 1), Equation 8 has the form:
This is solved as follows:
where C_0 is a constant of integration. Substituting u = A − Bh^q, and therefore du = −qBh^(q−1) dh, yields
Re-substituting for u yields
Re-substituting for A and B yields
where C’s are modified constants of integration. This height function has the following derivatives:
and the following characteristics:
under the conditions that H, K, D > 0, C < 0, , and
. Note that changing the value of the constant C serves only to shift the function horizontally on the x-axis, i.e., it contributes only to changing the age at maximum growth velocity, not the shape of the growth function.
For simplicity, in Equation A.8 we assume density D = 0.001 kg/cm³ (so that the model can be fit to height data in cm and weight data in kg) and the constant C = −1, yielding the growth function in Equation 1 in the main text. Given this simplifying assumption for C, we limit our analysis of q to the range , which tends to give reasonable human growth functions. Relaxing the assumption on C would potentially necessitate exploration of values of q within the entire range (0, 1).
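As a check on the derivation, a closed-form height function can be compared against the differential equation it is meant to solve. The Python sketch below assumes a single-phase solution of the form h(t) = [(2H/(DK))(1 + C·exp(−Kqt/(1 + 2q)))]^(1/q), which follows from the cylinder geometry with r = h^q but may be arranged differently from the published Equation 1. The parameter values are illustrative only and are not estimates from the paper.

```python
import math


def height(t, H, K, q, D=0.001, C=-1.0):
    """Single-phase height (cm) at time t since conception (assumed form)."""
    decay = math.exp(-K * q * t / (1.0 + 2.0 * q))
    return (2.0 * H / (D * K) * (1.0 + C * decay)) ** (1.0 / q)


def height_rate(h, H, K, q, D=0.001):
    """Right-hand side of the growth equation dh/dt for r = h**q."""
    return (2.0 * H / D * h ** (1.0 - q) - K * h) / (1.0 + 2.0 * q)


# Illustrative values only (not estimates from the paper).
H, K, q = 0.0065, 1.0, 0.5
t, dt = 2.0, 1e-5

# A central finite difference of the closed form should match the ODE.
numeric = (height(t + dt, H, K, q) - height(t - dt, H, K, q)) / (2.0 * dt)
print(numeric, height_rate(height(t, H, K, q), H, K, q))

# Height at conception and far along the trajectory (near the asymptote).
print(height(0.0, H, K, q), height(200.0, H, K, q))
```

With C = −1 the function starts at zero, consistent with setting height to zero at conception, and levels off at (2H/(DK))^(1/q).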
Appendix B. Derivation of the weight growth model
Here we again use Equation 5 to approximate the rate of increase in mass over time, and again assume the organism’s shape can be roughly approximated by a cylinder of radius r and length h (Figure A.1). If D is density, then the cylinder has mass

$$m = D \pi r^{2} h \qquad (A.14)$$

and surface area

$$s = 2 \pi r h \qquad (A.15)$$

such that, like the height function, at any time t, r(t) = h(t)^q, where 0 < q < 1. Substituting for h and solving Equation A.14 for r yields

$$r = \left(\frac{m}{D\pi}\right)^{\frac{q}{2q+1}}$$

Substituting for r in Equation A.15 yields

$$s = 2\pi\left(\frac{m}{D\pi}\right)^{\frac{q+1}{2q+1}}$$
Substituting s into Equation 5 yields Equation 9 in the main text. Substituting constant A = 2πH/(Dπ)^((q+1)/(2q+1)), Equation 9 has the form

$$\frac{dm}{dt} = A\,m^{\frac{q+1}{2q+1}} - Km$$
This is solved as follows:
where C0 is a constant of integration. Substituting , such that
and therefore
, yields
Re-substituting for u yields
Re-substituting for A yields
where C’s are modified constants of integration. This weight function has the following derivatives:
and maximum asymptotic weight
We have been unable to find a closed form solution to m″ (t) = 0, at which tmax is the age at maximum velocity and m′(tmax) is the maximum velocity. Instead, we numerically solve for these characteristics of the weight function using posterior estimates of H, K, and q, as shown in Figure 4 in the main text.
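One way to locate the age at maximum weight velocity numerically is a bracketing search on the velocity curve. The Python sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes a single-phase weight function of the form m = Dπh^(2q+1) implied by the cylinder geometry (the published Equation A.19 may be parameterized differently), uses a ternary search of our own choosing, and takes hypothetical search bounds and illustrative parameter values rather than posterior estimates.

```python
import math


def height(t, H, K, q, D=0.001):
    """Single-phase height (cm), assuming h(0) = 0 at conception."""
    decay = math.exp(-K * q * t / (1.0 + 2.0 * q))
    return (2.0 * H / (D * K) * (1.0 - decay)) ** (1.0 / q)


def weight_velocity(t, H, K, q, D=0.001):
    """dm/dt = H*s - K*m for the cylinder with r = h**q (kg per unit time)."""
    h = height(t, H, K, q, D)
    surface = 2.0 * math.pi * h ** (q + 1.0)
    mass = D * math.pi * h ** (2.0 * q + 1.0)
    return H * surface - K * mass


def age_at_max_velocity(H, K, q, lo=0.0, hi=60.0, tol=1e-6):
    """Ternary search for the peak of a single-peaked velocity curve."""
    while hi - lo > tol:
        left = lo + (hi - lo) / 3.0
        right = hi - (hi - lo) / 3.0
        if weight_velocity(left, H, K, q) < weight_velocity(right, H, K, q):
            lo = left
        else:
            hi = right
    return 0.5 * (lo + hi)


H, K, q = 0.0065, 1.0, 0.5   # illustrative values, not posterior estimates
t_max = age_at_max_velocity(H, K, q)
print(t_max, weight_velocity(t_max, H, K, q))
```

The search relies on the velocity curve of a single growth phase having one peak within the bracket; a composite trajectory with several peaks would need a separate bracket for each phase.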
As with the height model, for simplicity, in Equation A.19 we assume density D = 0.001 kg/cm³, , and the constant C = −1, yielding the growth function in Equation 2 in the main text.
Appendix C. Empirical estimation of the height model
C.1. Estimating the baseline trajectory
To fit the composite height model (Equation 3 in the main text) to the U.S. and Matsigenka height measurements, our strategy is to first derive a baseline human height trajectory from which individual U.S. and Matsigenka children’s height trajectories can diverge. Estimating a baseline human trajectory around which to center informative priors increases the efficiency of Stan’s Hamiltonian Monte Carlo fitting algorithm, which can then limit its search of the posterior parameter space to regions plausibly within the range of human variation in growth.
To derive the baseline trajectory, we fit the composite height model to U.S. children’s height measurements (treated as cross-sectional data), thereby approximating the mean height trajectory of these children. We use the U.S. dataset for this purpose only because there is so much data, not because U.S. children are necessarily representative humans. Rich datasets from other populations would serve just as well for the derivation of a baseline growth trajectory. Importantly, this analysis strategy makes no assumption that the baseline trajectory represents “healthy” or “normal” growth. Furthermore, we neither present nor interpret any comparison of individual or population-mean trajectories with this baseline trajectory.
We fit the nine-parameter composite height model to the U.S. dataset, performing separate analyses for females and males. To represent the fact that observed heights can never be negative, we take their log. The mean of the Normal likelihood (ln(ηt)) is the log of the composite height function for age t (Equation 3 in the main text), and the scale of the distribution (ση) reflects possible measurement error, as explained in the main text. Observed height at time t, ht, is modeled as:
To increase the efficiency of the fitting algorithm, parameters in the composite height model are transformed (e.g., exponentiated: Equations A.25 and A.26, or decomposed into sums: Equation A.27). The very narrow prior on the scale (Equation A.28) reflects an assumption that measurement error is generally very small. Weak priors on the transformed parameters (, and R’s) are chosen using prior predictive simulations, basing the approximate form and location of the three component functions for infant, child, and adolescent growth phases on the results of Nierop et al. (2016). The priors shown above were used to fit data from both girls and boys. Figure A.5 demonstrates that these priors allow the model considerable flexibility when fitting the data. The model was fit in R (R Core Team 2022) and Stan (Stan Development Team 2022) using the cmdstanr package (Gabry and Cesnovar 2021). Convergence was indicated by $\hat{R}$ values of 1.00 (McElreath 2020). This usually required four chains of 2000 samples each, half of which were warm-up. Data and analysis scripts are available at https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model.
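The convergence diagnostic commonly reported alongside Stan fits is the potential scale reduction statistic (R-hat), which compares between-chain and within-chain variance. The Python sketch below implements the classic, non-split version for a single parameter, purely to illustrate the idea; Stan and its interfaces report refined variants of this statistic, and the function name and toy chains here are hypothetical.

```python
def potential_scale_reduction(chains):
    """Classic (non-split) R-hat for equal-length chains of one parameter."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [sum(c) / n for c in chains]
    grand_mean = sum(chain_means) / m

    # Between-chain variance B and mean within-chain variance W.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1)
        for c, mu in zip(chains, chain_means)
    ) / m

    # Pooled variance estimate relative to the within-chain variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Toy chains: the first pair overlaps, the second pair sits far apart.
agreeing = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 2.0]]
disagreeing = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
print(potential_scale_reduction(agreeing))
print(potential_scale_reduction(disagreeing))
```

Values well above 1 indicate that chains have settled in different regions of the parameter space, whereas values near 1 are consistent with chains sampling the same distribution.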
Simulations from the priors of the baseline composite height model. Shown are 100 simulated height trajectories defined by the model in Equations A.23–A.27, given parameter values drawn from the prior distributions in Equations A.28 – A.31. Thick lines are the prior means. Left: component trajectories for the three growth phases (infancy: blue, childhood: red, adolescence: green). Right: cumulative height trajectories. Note that the variance in these weak priors allows the model to fit trajectories within (as well as outside of) the plausible range for humans.
Figures A.6, A.7, and A.8 show the estimated posterior mean baseline height trajectories for girls and boys.
Estimated mean baseline height trajectory for girls. Shown is the mean posterior for the population mean cumulative and component trajectories from the model in Equations A.23–A.31, fit to height measurements from U.S. girls, treated as cross-sectional data. The prior mean from Figure A.5 is shown in pink. The mean posterior velocity trajectory is shown in red.
Same as Figure A.6, but with modified scale for the velocity axis showing the maximum velocity during infancy.
Estimated mean baseline height trajectory for boys. This figure is analogous to Figure A.6, but the model is fit to height measurements from U.S. boys.
C.2. The multilevel model
We fit the following model of the height of person j at time t to the combined U.S. and Matsigenka datasets (females and males fit separately), with ethnic group-level (g) and person-level (p) offsets to the previously-estimated mean parameter values (, and R’s) for the baseline human height trajectory (above). We allow both sets of offsets to covary within and across growth phases.
where the subscript ETH[j] refers to the ethnicity (U.S. or Matsigenka) of individual j. In practice, the covariance matrices S and T are fit using Cholesky decomposition (McElreath 2020). The ethnic group-specific individual-level standard deviations (σp’s) are sampled from Gaussian distributions truncated at zero, whose means are the posterior means of the respective individual-level parameter standard deviations after fitting the above model to only the U.S. dataset (containing enough data to allow reasonable estimates of individual-level variance in parameter values). These same values are used for the means of the Exponential priors from which the group-level variances (σg’s) are sampled, given that we currently have no other way to estimate how much variation exists between different human populations. Standard deviations for the truncated Gaussian distributions are chosen based on prior predictive simulations (Figures A.9 and A.10), and must often be quite small in order to ensure model convergence. Priors on the correlation matrices R and U are chosen to bias against extreme correlations (McElreath 2020). In Equations A.42–A.47 values for priors used to fit males are shown beneath those used to fit females. In Equations A.48–A.52 male priors are shown to the right of female priors. Models were fit in R (R Core Team 2022) and Stan (Stan Development Team 2022) using the cmdstanr package (Gabry and Cesnovar 2021). Convergence was indicated by $\hat{R}$ values of 1.00. This usually required four chains of 4000 samples each, half of which were warm-up. Data and analysis scripts are available at https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model. Posterior estimates from these models are presented in Figures 3 and 5 in the main text.
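The role of the Cholesky decomposition can be illustrated with a small sketch: correlated offsets are built by multiplying independent standard-normal draws by the Cholesky factor of a correlation matrix and then scaling by the standard deviations. The Python code below uses only the standard library and is an illustration of this general technique, not the authors' Stan code; the function names, the two-parameter example, and the numerical values are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L @ L.T == matrix (symmetric positive definite)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                L[i][j] = (matrix[i][j] - partial) / L[j][j]
    return L


def correlated_offsets(sigmas, corr, rng):
    """Draw one offset vector with covariance diag(sigmas) @ corr @ diag(sigmas)."""
    L = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in sigmas]  # independent standard normals
    return [
        sigma * sum(L[i][k] * z[k] for k in range(i + 1))
        for i, sigma in enumerate(sigmas)
    ]


# Hypothetical example: offsets for two parameters with correlation 0.5.
corr = [[1.0, 0.5], [0.5, 1.0]]
sigmas = [0.1, 0.2]
print(correlated_offsets(sigmas, corr, random.Random(1)))
```

Sampling independent draws and transforming them in this way typically gives the sampler an easier geometry to explore than sampling from the full covariance matrix directly.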
Simulations from the female priors of the composite height model. Shown are 100 simulated height trajectories defined by the model in Equations A.32 – A.40, given parameter values drawn from the female prior distributions in Equations A.42 – A.52. Thick lines are the prior means. Left: component trajectories for the three growth phases (infancy: red, childhood: blue, adolescence: green). Right: cumulative height trajectories. The top row shows the prior variance between ethnic groups (Equations A.49–A.51), while the bottom row shows prior variance among individuals within an ethnic group (Equations A.45–A.47). Note that these informative priors bias the model toward trajectories within the plausible range for humans, increasing the efficiency of Stan’s Hamiltonian Monte Carlo fitting algorithm.
Simulations from the male priors of the composite height model. This figure is analogous to Figure A.9, using the male priors in Equations A.42 – A.52.
Appendix D. Empirical estimation of the weight model
D.1. Estimating the baseline trajectory
To fit the composite weight model (Equation 4 in the main text) to the U.S. and Matsigenka weight measurements, we follow the same strategy as described above for the height model, first estimating a baseline human weight trajectory using the U.S. dataset. Observed weight at time t, mt, is modeled as:
Weak priors on the transformed parameters (, and R’s) are chosen using prior predictive simulations, such that the approximate form and location of the three component functions for infant, childhood, and adolescent growth sum to make the composite weight trajectory look approximately human. The priors shown above were used to fit data from both girls and boys. Figure A.11 demonstrates that these priors allow the model considerable flexibility when fitting the data. The model was fit in R (R Core Team 2022) and Stan (Stan Development Team 2022) using the cmdstanr package (Gabry and Cesnovar 2021). Convergence was indicated by $\hat{R}$ values of 1.00 (McElreath 2020). This usually required four chains of 2000 samples each, half of which were warm-up. Data and analysis scripts are available at https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model.
Simulations from the priors of the baseline composite weight model. Shown are 100 simulated weight trajectories defined by the model in Equations A.53–A.57, given parameter values drawn from the prior distributions in Equations A.58 – A.61. Thick lines are the prior means. Left: component trajectories for the three growth phases (infancy: red, childhood: blue, adolescence: green). Right: cumulative weight trajectories. Note that the variance in these weak priors allows the model to fit trajectories within (as well as outside of) the plausible range for humans.
Figures A.12, A.13, and A.14 show the estimated posterior mean baseline weight trajectories for girls and boys.
Estimated mean baseline weight trajectory for girls. Shown is the mean posterior for the population mean cumulative and component trajectories from the model in Equations A.53–A.61, fit to weight measurements from U.S. girls, treated as cross-sectional data. The prior mean from Figure A.11 is shown in pink. The mean posterior velocity trajectory is shown in red.
Same as Figure A.12, but with modified scale for the velocity axis showing the maximum velocity during infancy.
Estimated mean baseline weight trajectory for boys. This figure is analogous to Figure A.12, but the model is fit to weight measurements from U.S. boys.
D.2. The multilevel model
We fit the following model of the weight of person j at time t to the combined U.S. and Matsigenka datasets (females and males fit separately), with ethnic group-level (g) and person-level (p) offsets to the previously-estimated mean parameter values (, and R’s) for the baseline human weight trajectory (above). We allow both sets of offsets to covary within and across growth phases.
where the subscript ETH[j] refers to the ethnicity (U.S. or Matsigenka) of individual j. In practice, the covariance matrices S and T are fit using Cholesky decomposition (McElreath 2020). The ethnic group-specific individual-level standard deviations (σp’s) are sampled from Gaussian distributions truncated at zero, whose means are the posterior means of the respective individual-level parameter standard deviations after fitting the above model to only the U.S. dataset (which contains enough data to allow reasonable estimates of individual-level variance in parameter values). These same values are used for the means of the Exponential priors from which the group-level variances (σg’s) are sampled, given that we currently have no other way to estimate how much variation exists between different human populations. Standard deviations for the truncated Gaussian distributions are chosen based on prior predictive simulations (Figures A.15 and A.16), and must often be quite small in order to ensure model convergence. Note, however, that despite these tight priors, there is enough information in the dataset for the model to represent the variance in individual- and group-level weight trajectories reasonably well (Figure 3 in the main text). Priors on the correlation matrices R and U are chosen to bias against extreme correlations (McElreath 2020). In Equations A.72–A.77, values for priors used to fit males are shown beneath those used to fit females. In Equations A.78–A.82, male priors are shown to the right of female priors. Models were fit in R (R Core Team 2022) and Stan (Stan Development Team 2022) using the cmdstanr package (Gabry and Cesnovar 2021). Convergence was indicated by R-hat values of 1.00. This usually required four chains of 4000 samples each, half of which were warm-up. Data and analysis scripts are available at https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model.
Posterior estimates from these models are presented in Figures 3 and 5 in the main text.
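The role of the Cholesky decomposition in fitting covarying offsets can be sketched as follows. This is an illustrative sketch only, not the Stan code used in the analysis: independent standard-normal draws are multiplied by the Cholesky factor of a correlation matrix and then scaled by standard deviations, which yields offsets with the desired covariance. The function names, the three-parameter example, and all numerical values (`sds`, `corr`) are hypothetical.

```python
import random


def cholesky(a):
    """Lower-triangular factor L of a symmetric positive-definite matrix a (a = L L^T)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (a[i][i] - s) ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def correlated_offsets(sds, corr, rng):
    """One draw of zero-mean offsets with standard deviations sds and correlations corr."""
    Lc = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in sds]  # independent standard-normal draws
    # Multiply by the Cholesky factor to induce correlation, then scale by the sds.
    return [sds[i] * sum(Lc[i][k] * z[k] for k in range(i + 1))
            for i in range(len(sds))]


# Hypothetical values for three parameters (not estimates from this study).
sds = [0.5, 0.2, 0.1]
corr = [[1.0, 0.3, -0.2],
        [0.3, 1.0, 0.1],
        [-0.2, 0.1, 1.0]]

rng = random.Random(42)
draws = [correlated_offsets(sds, corr, rng) for _ in range(20000)]
print(draws[0])
```

Working with the Cholesky factor rather than the covariance matrix itself means the sampler only ever handles independent standard-normal variables, which is generally easier for Hamiltonian Monte Carlo.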
Simulations from the female priors of the composite weight model. Shown are 100 simulated weight trajectories defined by the model in Equations A.62–A.70, given parameter values drawn from the female prior distributions in Equations A.72–A.82. Thick lines are the prior means. Left: component trajectories for the three growth phases (infancy: red, childhood: blue, adolescence: green). Right: cumulative weight trajectories. The top row shows the prior variance between ethnic groups (Equations A.79–A.81), while the bottom row shows prior variance among individuals within an ethnic group (Equations A.75–A.77). Note that these informative priors bias the model toward trajectories within the plausible range for humans, increasing the efficiency of Stan’s Hamiltonian Monte Carlo fitting algorithm.
Simulations from the male priors of the composite weight model. This figure is analogous to Figure A.15, using the male priors in Equations A.72 – A.82.
Appendix E. Empirical estimation of other growth models
E.1 5-phase composite model
For comparison, we modified the composite height model to include five distinct growth phases, rather than the three phases modeled in the main text. The 15 parameters of this model give it the flexibility to fit the data much better than the three-phase model with only nine parameters, as shown in Figure 2. However, we are currently unaware of a theoretical justification for modeling more than three phases of human growth, and present the following analysis merely for illustration. We fit the following model to height data from U.S. boys, treated as cross-sectional data, in order to estimate the mean height trajectory of this population. Observed height at time t, ht, is modeled as:
Weak priors on the transformed parameters (, and R’s) are chosen using prior predictive simulations, such that the approximate form and location of the five component functions sum to make the composite height trajectory look approximately human. Figure A.17 demonstrates that these priors allow the model considerable flexibility when fitting the data. The model was fit in R (R Core Team 2022) and Stan (Stan Development Team 2022) using the cmdstanr package (Gabry and Cesnovar 2021). Convergence was indicated by R-hat values of 1.00. This usually required four chains of 2000 samples each, half of which were warm-up. Data and analysis scripts are available at https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model. Figure A.18 shows the estimated posterior mean height trajectory for U.S. boys.
Simulations from the priors of the 5-phase composite height model. Shown are 100 simulated height trajectories defined by the model in Equations A.83–A.87, given parameter values drawn from the prior distributions in Equations A.88 – A.91. Thick lines are the prior means. Left: component trajectories for the five growth phases (infancy: blue, early-childhood: pink, mid-childhood: green, late-childhood: orange, adolescence: red). Right: cumulative height trajectories. Note that the variance in these weak priors allows the model to fit trajectories within (as well as outside of) the plausible range for humans.
Estimated mean height trajectory using the 5-phase composite model. Shown is the mean posterior for the population mean cumulative and component trajectories from the 5-phase composite model in Equations A.83–A.91, fit to height measurements from U.S. boys, treated as cross-sectional data. The mean posterior velocity trajectory is shown in red.
E.2 JPA-1 model
For comparison, we fit the seven-parameter JPA-1 model of Jolicoeur et al. (1992) to height data from U.S. and Matsigenka boys. We were unable to fit a multi-level model with random effects for individual and ethnic group, as we could not find priors that would allow such a model to converge. Therefore, we fit separate models to U.S. and Matsigenka measurements, treating the data as cross-sectional. Observed height at time t, ht, is modeled as:
Means of the priors on the seven parameters are taken from Table 4 of Jolicoeur et al. (1992), which reports parameter estimates after fitting the model to data from 13 French boys. Prior standard deviations are chosen using prior predictive simulations. Figure A.19 demonstrates that these priors allow the model considerable flexibility when fitting the data. The model was fit in R (R Core Team 2022) and Stan (Stan Development Team 2022) using the cmdstanr package (Gabry and Cesnovar 2021). Convergence was indicated by R-hat values of 1.00. This usually required four chains of 2000 samples each, half of which were warm-up. Data and analysis scripts are available at https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model. Figure A.20 shows the posterior mean height trajectories for U.S. and Matsigenka boys.
Simulations from the priors of the JPA-1 height model of Jolicoeur et al. (1992). Shown are 100 simulated height trajectories defined by the model in Equations A.92 and A.93, given parameter values drawn from the prior distributions in Equations A.94 – A.95. The thick line is the trajectory derived from the prior means. Note that these informative priors bias the model toward trajectories within the plausible range for humans, increasing the efficiency of Stan’s Hamiltonian Monte Carlo fitting algorithm.
Estimated mean height trajectories using the JPA-1 model of Jolicoeur et al. (1992). Shown are the mean posteriors for the population mean trajectories from the JPA-1 model in Equations A.92–A.95, fit (as separate models) to height measurements from U.S. (red) and Matsigenka (blue) boys, treated as cross-sectional data. The mean posterior velocity trajectories are shown decreasing (after age 14) from left to right. Note that velocity at conception (age = 0) is undefined in the JPA-1 model.
E.3 SITAR model
For comparison, we fit the SITAR model of Cole et al. (2010) to height data from U.S. and Matsigenka boys using the SITAR package in R. The model uses a spline with nine degrees of freedom to fit the mean height trajectory across both ethnic groups, as well as a fixed effect for ethnic group and three individual-level random effects (termed size, timing, and intensity) that relate individual trajectories to the estimated mean trajectory. We note that the SITAR package (Cole 2021) fitting algorithm terminated after four iterations. Two of these iterations produced singular-convergence and false-convergence warnings, and none of the iterations reached the stated convergence criterion of “tolerance” ≤ 1e-5. Therefore, this analysis should be interpreted with caution. Figure A.21 presents the estimated mean height trajectories for U.S. and Matsigenka boys.
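To make the roles of the three random effects concrete, the sketch below applies a size, timing, and intensity transformation to a mean curve, following our understanding of the general SITAR formulation in Cole et al. (2010): individual height = size + h((age − timing) · exp(intensity)), where h is the mean curve. The function `mean_curve` is a hypothetical smooth placeholder, not the nine-degree-of-freedom spline estimated here, and all numerical values are invented for illustration.

```python
import math


def mean_curve(age):
    """Hypothetical stand-in for the fitted mean curve h (NOT the estimated spline)."""
    return 50.0 + 125.0 / (1.0 + math.exp(-0.3 * (age - 10.0)))


def sitar_height(age, size, timing, intensity, h=mean_curve):
    """Individual height obtained by transforming the mean curve h.

    size:      vertical shift of the curve (cm)
    timing:    shift along the age axis (years); positive means later growth
    intensity: log-scale stretch of the age axis; positive means faster growth
    """
    return size + h((age - timing) * math.exp(intensity))


# A hypothetical child who is 3 cm taller than average and grows one year later.
for age in (5, 10, 15):
    print(age, round(mean_curve(age), 1), round(sitar_height(age, 3.0, 1.0, 0.0), 1))
```

With all three effects set to zero, an individual's trajectory coincides with the mean curve, which is why the quality of the estimated mean spline matters for every individual-level prediction.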
Estimated mean height trajectories using the SITAR model of Cole et al. (2010). Shown are the estimated group-mean trajectories from the SITAR model as implemented in the SITAR R package (Cole 2021), fit to height measurements from U.S. (red) and Matsigenka (blue) boys. The estimated group-mean velocity trajectories are shown decreasing (after age 14) from left to right. Note that the SITAR model estimates a large positive velocity (83 cm/year) at conception (age = 0), and a slightly negative velocity between ages 22 and 24.
Appendix F. Comparison of model fit
Comparing mean height trajectories as estimated by the three-phase composite model in the main text, the five-phase composite model (Appendix E.1), the JPA-1 model (Appendix E.2), and the SITAR model (Appendix E.3), we note that all four models exhibit shortcomings apparent upon visual inspection of the estimated mean trajectories and the residuals.
Figures A.8 and A.22 show that the three-phase composite model tends to underestimate mean U.S. male height (length) at birth and early childhood, while overestimating it in infancy and late childhood. Figures 2, A.18, A.21, and A.22 illustrate that both the five-phase composite model and the SITAR model fit the mean U.S. male height trajectory much better than the three-phase composite model. However, the five-phase composite model tends to underestimate mean height (length) at birth (seen most clearly in Figure A.22).
Figure A.21 shows that the SITAR model tends to underestimate the mean height of Matsigenka boys before age eight, and overestimate it at ages 15 and 16.
Figures A.20 and A.22 show that the JPA-1 model tends to overestimate mean U.S. male height at birth and mid-childhood.
Comparing the mean U.S. male height velocities estimated by these four models (Figure 2), the three- and five-phase composite models and the SITAR model are similar in describing two velocity peaks after age five: one in childhood and another during adolescence. Furthermore, the three-phase composite model and the SITAR model both describe Matsigenka boys as tending to reach their maximum height velocities during these two growth phases at earlier ages than do U.S. boys (Figures A.21 and 4). This contrasts with the JPA-1 model, which describes a single velocity peak during adolescence that U.S. boys tend to reach at earlier ages than do Matsigenka boys (Figure A.20).
Cross-model comparison of residuals from the estimated group mean height trajectory. Residuals are the measured heights of U.S. boys minus the predicted height of the mean U.S. boy at the corresponding age, taken from the posterior mean trajectory of the three-phase composite model in the main text (upper), the five-phase composite model described in Appendix E.1 (middle), and the JPA-1 model described in Appendix E.2 (lower). Bars represent the 90% HPDI of each residual estimate, and points represent the means. The bars for the JPA-1 model are too small to see, indicating that the model is very confident about its posterior estimate for the mean trajectory. If a model represents the group mean trajectory accurately, the residuals at each age should be distributed approximately symmetrically about zero.
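The residual summaries described in this caption can be sketched as follows: for one measurement, each posterior draw of the predicted mean height yields one residual draw, and these draws are summarized by their mean and 90% highest posterior density interval (HPDI), i.e., the narrowest interval containing 90% of the draws. This is a minimal sketch under stated assumptions, not the analysis script; the function name `hpdi`, the measured height, and the simulated posterior draws are all hypothetical.

```python
import math
import random


def hpdi(samples, prob=0.90):
    """Narrowest interval containing a fraction prob of the (non-empty) samples."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(prob * n)  # number of draws that must fall inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


rng = random.Random(7)
measured_height = 110.0  # hypothetical measurement (cm)
# Hypothetical posterior draws of the predicted mean height at the same age.
predicted = [rng.gauss(108.5, 0.4) for _ in range(4000)]
residuals = [measured_height - p for p in predicted]

lo, hi = hpdi(residuals, 0.90)
mean_residual = sum(residuals) / len(residuals)
print(round(mean_residual, 2), round(lo, 2), round(hi, 2))
```

A very narrow interval, such as those described for the JPA-1 model, reflects little posterior uncertainty about the mean trajectory rather than a small residual.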
The principal aim of the empirical analysis in the main text is to compare the mean height and weight trajectories of U.S. and Matsigenka children. Thus, it is of interest to know whether the inaccuracies in the estimated mean growth trajectories of the three-phase composite growth models with respect to observed heights and weights manifest similarly for the two ethnic groups. If the patterns of inaccurate prediction are different for U.S. and Matsigenka children, then it would be inappropriate to compare their population-mean growth trajectories using these models. Figure A.23 plots the posterior means of the residuals of the estimated mean U.S. and Matsigenka height and weight trajectories. Importantly, the patterns of asymmetry around zero for residuals at different ages of U.S. children are echoed for Matsigenka children, though with fewer data points. This suggests that, although the model-estimated mean trajectories should be interpreted with caution in objective terms, relative comparisons of estimated U.S. and Matsigenka growth trajectories are possible.
Figure A.24 plots the posterior means of the residuals of the estimated individual U.S. and Matsigenka height and weight trajectories (i.e., incorporating both group-level and individual-level offsets from the multi-level model). If a model represents individual growth trajectories accurately, the residuals at each age should be as close to zero as possible and distributed approximately symmetrically around zero. As is apparent, there is much room for improvement of the three-phase composite height and weight models. Importantly, the distributions of such individual-level residuals constitute one metric by which future improvements in within-sample model prediction (fit) can be judged (complementing metrics of out-of-sample model prediction, such as WAIC: McElreath (2020)). To improve the accuracy of future model estimates, we favor strategies that aim to refine the theory (described in the main text) upon which these growth models are based.
Figure A.25 plots weight for height for U.S. and Matsigenka children. Note that, starting at a height of approximately 130 cm, Matsigenka girls and boys tend to weigh more than U.S. children for a given height. This is consistent with higher model estimates of the q parameter for Matsigenka children in the child growth phase, shown in Figure 5 in the main text, and gives us confidence that the composite growth models derived here are reasonable representations of growth patterns in U.S. and Matsigenka children.
Residuals from estimated group mean height and weight trajectories. Residuals are the measured heights and weights of U.S. and Matsigenka children minus the mean posterior predicted height or weight of the mean U.S. or Matsigenka child at the corresponding age, taken from the posterior mean trajectory of the multi-level three-phase composite models in the main text. If a model represents the group mean trajectory accurately, the residuals at each age should be distributed approximately symmetrically about zero. Note that the pattern of asymmetries about zero for the U.S. estimates is echoed in the Matsigenka estimates, though with fewer data points. This suggests that, though, in objective terms, the model-estimated mean trajectories should be interpreted with caution, relative comparisons of estimated U.S. and Matsigenka growth trajectories are possible.
Residuals from the estimated individual mean height and weight trajectories. This figure is analogous to Figure A.23, except that residuals now represent differences between measured heights or weights and the mean posterior predicted height or weight of the corresponding individual U.S. or Matsigenka child at the corresponding age (i.e., incorporating both group-level and individual-level offsets from the multi-level model). If a model represents individual growth trajectories accurately, the residuals at each age should be as close to zero as possible and distributed approximately symmetrically about zero. This forms the basis for judging future improvements in model fit, resulting from refinement of the theory upon which the growth models are based.
Weight for height for U.S. and Matsigenka children. Note that, starting at a height of approximately 130 cm, Matsigenka girls and boys tend to weigh more than U.S. children for a given height. This is consistent with higher model estimates of the q parameter for Matsigenka children in the child growth phase, as shown in Figure 5 in the main text.
Appendix G. Effects of body density
In the models in the main text, we assume that everyone’s body has the same density: 0.001 kg/cm³ (i.e., the density of water at 4°C). However, it is likely that body density varies among individuals, and average body density varies among populations, due to differences in the proportions of bone, muscle, and fat (which have different densities). Thus, it is of interest to know how different values of body density D would affect estimates of metabolic parameters H and K, and the allometric parameter q, in the growth models in the main text (Equations 1 and 2, based on Appendix Equations A.8 and A.19, respectively).
From Appendix Equations A.8 and A.19, it is clear that, holding height, weight, and all other parameters constant, decreasing D will decrease the estimated value of the anabolic rate parameter H. The effect of decreasing D on the catabolic rate parameter K and the allometric parameter q is less clear from these equations. However, ignoring the C terms, it seems likely that, all else being equal, decreasing D will result in larger estimates for K and q.
Solving these equations for K and q is non-trivial. However, we have managed to estimate these parameters’ relationship with D over a relatively small portion of the models’ parameter spaces (containing some reasonably realistic parameter values). Figures A.26 and A.27 suggest that, as D decreases, estimates of K derived from the height and weight models (respectively) will increase. Similarly, Figures A.28 and A.29 suggest that, as D decreases, estimates of q derived from the height and weight models (respectively) will also increase. These relationships coincide with our intuitions, although a more comprehensive exploration of the parameter space is warranted.
In any case, it is clear that estimates of model parameters are sensitive to assumptions about body density. Thus, as discussed in the main text, to increase the accuracy of the estimated values of the metabolic and allometric parameters in these growth models, future empirical work should consider incorporating individual-level estimates of body density. This would mean fitting the growth models in Appendix Equations A.8 and A.19 to data comprising height and density, or weight and density, respectively, such that Dj is the density of individual j at the time of each height or weight measurement. Readily-available technology to potentially estimate individual body density in remote populations is discussed in the main text.
The effect of density D on estimates of the catabolic rate K, using the composite height model. For the given values of H and q, we solve Appendix Equation A.8 for K given a range of values of D at h(10) = 100 (i.e., height = 100 cm at 10 years of age since conception), assuming C = − 1. Although this figure shows only a small portion of the model’s parameter space, it supports the intuition, derived from Appendix Equation A.8, that, all else being equal, a decrease in body density will tend to result in higher estimates of K.
The effect of density D on estimates of the catabolic rate K, using the composite weight model. For the given values of H and q, we solve Appendix Equation A.19 for K given a range of values of D at m(10) = 25 (i.e., weight = 25 kg at 10 years of age since conception), assuming C = − 1. Like Appendix Figure A.26, this figure supports the intuition that, all else being equal, a decrease in body density will tend to result in higher estimates of K.
The effect of density D on estimates of the allometric parameter q, using the composite height model. For the given values of H and K, we solve Appendix Equation A.8 for q given a range of values of D at h(10) = 100 (i.e., height = 100 cm at 10 years of age since conception), assuming C = − 1. Although this figure shows only a small portion of the model’s parameter space, it supports the intuition, derived from Appendix Equation A.8, that, all else being equal, a decrease in body density will tend to result in higher estimates of q.
The effect of density D on estimates of the allometric parameter q, using the composite weight model. For the given values of H and K, we solve Appendix Equation A.19 for q given a range of values of D at m(10) = 25 (i.e., weight = 25 kg at 10 years of age since conception), assuming C = − 1. Like Appendix Figure A.28, this figure supports the intuition that, all else being equal, a decrease in body density will tend to result in higher estimates of q.
Footnotes
E-mail address: john_bunce{at}eva.mpg.de.
https://github.com/jabunce/bunce-fernandez-revilla-2022-growth-model