Abstract
Sex differences in the lifetime risk and expression of disease are well-known. Paradoxically, preclinical research targeted at improving treatment, increasing health span and reducing the financial burden of health care, has mostly been conducted on male animals and cells. Females are assumed to be the same or scaled versions of males, yet sex differences in the allometric relationship between phenotypic traits and body size, needed to evaluate the validity of this assumption, have not been established. We quantify allometry for 297 phenotypic traits in male and female mice, recorded in >2.1 million measurements from the International Mouse Phenotyping Consortium. We find sex differences in allometric parameters (slope, intercept, residual SD) are common. Thus, the allometric relationship varies between the sexes: females are not scaled males. Our results support a complex, trait-specific patterning of sex differences in phenotypic traits, promoting case-specific approaches to therapeutic intervention and drug dosage scaled by body weight.
Introduction
A historic use of male animals in preclinical research and male participants in clinical trials has resulted in a significant bias in healthcare systems around the world (Holdcroft, 2007). The knowledge available on many diseases, their manifestation, time course and the efficacy of treatment options, is highly skewed in favour of males. The need to reach parity of the sexes in biomedical research and to conduct sex-specific analysis of research results has been widely acknowledged (Mogil & Chanda, 2005; Rogers et al., 2008; Kim et al., 2010; Beery & Zucker, 2011; Klein et al., 2015). Efforts to address this issue initially resulted in legislative changes around clinical research, requiring female participants in government-funded clinical trials (e.g., NIH, 1993; Correa-de-Araujo, 2006; Klinge, 2008). Modest improvement to rebalancing representation of the sexes in clinical trials (Zucker & Beery, 2010; Mazure & Jones, 2015; Feldman et al., 2019) has been bolstered by recent revisions to government guidelines in the US for preclinical research, requiring biological sex to be included as a study variable (Clayton & Collins, 2014).
Basing healthcare decisions for women on research conducted on men (and vice versa, e.g., Wiemann et al., 2007) potentially has profound consequences (Kim et al., 2010; Oh et al., 2015; Tannenbaum et al., 2019). Studies have established that the nature of disease experience and benefits of treatment differ between men and women (Rahore et al., 2002; Gandhi et al., 2004; Canto et al., 2007; Whitley et al., 2009; Wallach et al., 2016; Mauvais-Jarvis et al., 2020). These differences manifest in major pillars of healthcare, impacting cost associated with care and its quality (Wainer et al., 2020). For example, sex differences in pharmacokinetics mean that therapeutic decisions based on studies with male subjects may lead to increased magnitudes of adverse drug reactions in women (Nakagawa & Kajiwara, 2015; Yu et al., 2016). Similarly, the broadly divergent behaviour of male (anti-inflammatory) and female (pro-inflammatory) immune systems translates to antibody response variability, with some vaccines resulting in a stronger immune response in males compared to females (Bouman & Heineman, 2005; Cook, 2008; Klein, 2013; Flanagan, 2014). Moreover, pathophysiological differences between the sexes lead to women being underdiagnosed or undertreated for leading causes of mortality, such as cardiovascular disease and Type 2 diabetes (Mauvais-Jarvis et al., 2020).
With the growing recognition of the importance of sex in biomedicine, a sharper focus on the topic has revealed that some of the initial assumptions and concerns surrounding use of female animals in preclinical research, such as their propensity for greater variation associated with the oestrous cycle (Shansky, 2019), lack empirical support (Mogil & Chanda, 2005; Prendergast et al., 2014; Zajitschek et al., 2020). Nevertheless, questions have been raised about the value of including female animals in preclinical research, citing a negative impact on the burden of evidence for therapeutic interventions (Fields, 2014) and a lack of clarity surrounding the extent to which sex differences may be explained by sex-linked variables, such as body mass index or body weight differences between the sexes (Richardson et al., 2015).
Building on empirical studies that have sought to establish the nature of sex differences in biomedicine and to clarify the assumptions surrounding preclinical (Mogil & Chanda, 2005; Becker et al., 2016; Karp et al., 2017; Zajitschek et al., 2020) and clinical (Campesi et al., 2021) research data collected on males and generalized to females, we here tackle the extent to which females can be considered ‘small’ males in biomedicine. This is a pervasive narrative that impacts research design.
We adopt the framework of static allometry, the measurement of trait covariation among individuals of different size at the same developmental stage, following Huxley (1924, 1932), who proposed an equation to model simple allometry. This equation expresses the growth of two traits, x and y, when regulated by a common growth parameter: y = ax^b, or equivalently log(y) = log(a) + b log(x), where the ratios between the components of the growth rates of y and x correspond to an intercept log(a) and a slope b (Pélabon et al., 2013). We quantify the relationship between phenotypic trait and body weight in males and females, statistically evaluating scenarios that describe the magnitude and patterning of sex differences across 297 traits in over 2 million measurements from mice in the International Mouse Phenotyping Consortium (IMPC, www.mousephenotype.org; Dickinson et al., 2016).
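As an illustration of Huxley's equation only (not of the mixed-effects models used in this study), the sketch below recovers a and b by ordinary least squares on log-transformed values. The function name `fit_allometry` and the data are hypothetical, generated from y = 2x^0.75 so that the expected answer is known.

```python
import math

def fit_allometry(sizes, traits):
    """Least-squares fit of log(y) = log(a) + b * log(x); returns (a, b)."""
    log_x = [math.log(x) for x in sizes]
    log_y = [math.log(y) for y in traits]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope b is the covariance of log(x) and log(y) over the variance of log(x)
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    return math.exp(log_a), b

# Hypothetical body weights (g) and trait values following y = 2 * x^0.75 exactly
weights = [18.0, 22.0, 26.0, 30.0, 34.0]
trait = [2.0 * w ** 0.75 for w in weights]
a_hat, b_hat = fit_allometry(weights, trait)
print(round(a_hat, 6), round(b_hat, 6))
```

On the log scale the slope b is the allometric coefficient, so fitting the sexes separately and comparing their slopes and intercepts is the simplest version of the comparison made here.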
By providing empirical data on static allometry across phenotypic traits that represent preclinical parameters (e.g., immunology, metabolism, morphology), we aim to clarify if, and the extent to which, trait values for males may be scaled to match those of females. That is, we tackle the assumption that females are small males and identify, for the first time, the trait-specific features of the allometric relationship. We discuss these data considering the discourse on the generalization of male data in preclinical research (Usui et al. 2021), as well as their evolutionary implications, leveraging a large, wildtype dataset to illuminate microevolutionary trends in static allometry. Consideration of the evolutionary context surrounding sex differences may augment understanding of how disease state phenotypes emerge or persist in a population (Morrow & Connallon, 2013; Morrow, 2015). Data on allometric scaling also relate to one of the most salient aspects of sex differences, those concerning adverse drug reactions (ADRs) and the so far unanswered question of whether weight-adjusted doses would suffice to offset the majority of sex-specific ADRs (Zucker & Prendergast, 2020).
Results
Data characteristics
Following initial data cleaning and filtering procedures, the dataset comprised 297 phenotypic traits with a median sample size of 1,585 mice per trait (n = 2,104,527). Representation of males and females was highly similar across most phenotypic traits, with fewer than 10% of traits (29/297) displaying greater than 5% difference in sample size between males and females. The traits were collated into nine functional groupings following Zajitschek et al. (2020) (see Methods): behaviour (57 traits, n = 484,207), eye (27 traits, n = 10,366), hearing (16 traits, n = 201,220), heart (27 traits, n = 196,777), hematology (25 traits, n = 300,699), immunology (79 traits, n = 89,952), metabolism (9 traits, n = 111,659), morphology (24 traits, n = 364,484), and physiology (33 traits, n = 345,163).
The 297 phenotypic traits were further filtered for non-independence of traits, so that p values were merged for traits that were related to one another, resulting in a reduced data set of 181 traits, with a median sample size of 4,044 individuals per trait.
Linear mixed-effects models for static allometry
Our linear mixed-effects models indicated that 8 out of 181 traits (4%) (13 / 297 traits for unmerged p-values) are associated with scenario A (different slope, same intercept, Fig. 1A, 1D); most of these traits belonged to immunology and heart functional groups. Note that the intercept for each sex was set so that we compared mean values for each sex for a given trait. Scenario B (same slope, different intercept, Fig. 1B, 1E) was supported for 70 / 181 (39%) traits (125 / 297 traits for unmerged p-values). For scenario C (different slope, different intercept, Fig. 1C, 1F), 69 / 181 (38%) traits were categorized as consistent (86 / 297 traits for unmerged p-values), and the remaining 34 / 181 (19%) traits showed no significant differences in slope and intercept between males and females. Overall, when a statistically significant difference in allometric pattern was present between the sexes, intercept-only differences were more common than slope-only differences (39% compared to 4% of traits); however, traits differing in both slope and intercept were similarly common (38%). Just under a fifth of traits showed no significant differences between males and females, indicating that, for most traits, sex differences in allometric patterning represent a significant source of variation in trait values.
Examples of scenarios of sex differences in a trait of interest ~ weight allometric relationship. Top row shows a hypothetical positive relationship between body weight and eye size and the bottom row negative relationship between body size and activity. Body weights are scaled and centred so that the intercept is at the trait mean represented by a grey dashed line. A) Different positive slopes for the sexes, but same intercepts. B) Same positive slopes for both sexes, but different intercepts. C) Different positive slopes for both sexes, and different intercepts. D) Different negative slopes for the sexes, but the same intercepts. E) Same negative slopes for both sexes, but different intercepts. F) Different negative slopes for both sexes and different intercepts.
Taken together, traits in all functional groups showed statistically significant (α = 0.05) sex differences. Slope differences between the sexes (scenario A) were most common in immunology and heart groups, while intercept differences (scenario B) were most common for traits in the behaviour and heart functional groups. Traits exhibiting both slope and intercept differences between the sexes (scenario C) were most commonly found in the metabolism and physiology functional groups. Non-significant differences in slope and intercept were most common among traits in the behaviour and morphology functional groups.
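The assignment of traits to scenarios A, B and C can be sketched as a simple decision rule on the two p-values for the sex difference in slope and in intercept. The function name `classify_scenario` is hypothetical and the rule is an assumption based on the scenario definitions above. The counts are the merged-trait counts reported in this section.

```python
def classify_scenario(p_slope, p_intercept, alpha=0.05):
    """Assign a trait to scenario A, B, C or 'none' from two sex-difference p-values."""
    slope_differs = p_slope < alpha
    intercept_differs = p_intercept < alpha
    if slope_differs and intercept_differs:
        return "C"      # different slope, different intercept
    if slope_differs:
        return "A"      # different slope, same intercept
    if intercept_differs:
        return "B"      # same slope, different intercept
    return "none"       # no significant sex difference in either parameter

# Merged-trait counts reported in the text (181 traits in total)
counts = {"A": 8, "B": 70, "C": 69, "none": 34}
total = sum(counts.values())
percentages = {k: round(100 * v / total) for k, v in counts.items()}
print(total, percentages)
```

The percentages computed from the counts match those quoted in the text (4%, 39%, 38% and 19%).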
Sex bias in allometric parameters
Sex bias in the slope and intercept values, in addition to the magnitude of variance (residual SD), showed considerable variability across functional groups, suggesting trait-specific patterning of sex differences. For scenario A, representing traits with significant differences in slope, most traits showed greater slope magnitudes for males (n = 6 traits), rather than for females (n = 4 traits) (Fig. 2A). For scenario B, females showed greater intercept magnitudes for morphology, immunology, eye and behaviour functional groups (n = 45 traits), whereas males showed greater intercepts for traits in physiology, metabolism, hematology, heart and hearing functional groups (n = 32 traits) (Fig. 2B). Overall sex bias (65 male traits: 60 female traits, Fig. 2B) was slightly greater for intercept differences, compared to slope differences (7 male traits: 6 female traits, Fig. 2A). Scenario C, which represents significant slope and intercept parameter differences between the sexes, was predominated by mixed bias across five out of nine functional groups (n = 24 traits), indicating that most functional groups contained traits that showed a mixture of directional differences in bias, comprising a combination of male bias in one parameter (slope or intercept) and female bias in the other parameter (slope or intercept) (Figure 2C). Immunology-related traits represent an exception under scenario C, whereby traits with significant differences between the sexes did not show a mixed bias for slope and intercept values. Across functional groups, male bias is slightly more common (5 groups) than female bias (4 groups) for statistically significant sex difference in residual SD, indicating that where traits show differences between the sexes, it is more common for males to be more variable than females, than vice versa (Figure 2D) (133 male traits: 71 female traits).
Sex biases for mouse phenotypic traits arranged in functional groups. Colours represent significant differences in trait values between the sexes (green – male biased, orange – female biased) for allometric slope (scenario A), intercept (scenario B) or slope and intercept, including traits with mixed (purple) significant differences (i.e. male-biased significant slope and female-biased significant intercept, or female-biased significant slope and male-biased significant intercept) (scenario C), and bias in statistically significant difference in variance (residual SD) between the sexes (D). The number of traits that are either female biased (relative length of orange bars) or male biased (relative length of green bars) are expressed as a percentage of the total number of traits in the corresponding group. Numbers inside the green bars represent the numbers of traits that show male bias within a given group of traits, values inside the orange bars represent the number of female biased traits, and those inside the purple bars represent a combination of female bias (for intercept or slope) and male bias (for intercept or slope).
Meta-analysis and meta-regression of sex differences in slope, intercept and variance
Multi-level meta-analysis of absolute values in allometric slope and intercept, and variance, revealed significant differences between the sexes (Fig. 3A – C), with the greatest effect size evident for intercept value (Fig. 3A). Across functional groups, there was variability in the magnitude of absolute difference between the sexes, both within parameters (i.e., intercept) and across parameters. For absolute differences in intercept, traits within the physiology functional group showed greatest model point estimate difference between males and females, whereas those within the hearing group showed the smallest magnitude of difference (Fig. 3D). For differences in slope, which showed lower inter-trait variability than differences in intercept, the largest model point estimate difference was observed for eye traits, and the smallest difference for hearing traits (Fig. 3E).
Orchard plots illustrating results of multivariate meta-analysis based on differences between male and female absolute values for allometric intercept (A, D), slope (B, E) and residual variance (SD) (C, F). Plots in greyscale (top row) show overall differences (A – C), and plots below, in colour, show separate results for each functional group (D – F). Orchard plots show model point estimate (black open ellipse) and associated confidence intervals (CIs) (thick black horizontal line), 95% prediction intervals (PIs) (thin black horizontal line), and individual effect sizes (filled ellipses), which are scaled by their precision, defined as: precision = 1 / Standard Error (SE) (see Nakagawa et al., 2021).
Similarly, for relative difference in residual SD, eye traits showed the largest amount of dimorphism, whereas heart and metabolism traits were most similar in SD values between the sexes (Fig. 3F). Overall, across all parameters (intercept, slope and SD), confidence intervals (CIs) for hearing traits were the only ones to consistently overlap with zero, showing no statistically significant difference between the sexes (Fig. 3D, E, F). For traits within a given functional group, there was considerable variability in the magnitude of difference between the sexes. For sex differences in intercept, inter-trait variability was highest for physiology, morphology and metabolism groups (Fig. 3D), whereas slope differences showed most inter-trait variability for eye and behaviour traits (Fig. 3E), and relative difference in SD was also most variable among traits in eye and behaviour groups (Fig. 3F).
Relationship between slope/intercept and residual variance
Tri-variate meta-regressions and ordinations of the relationships between slope, intercept and residual variance (Fig. 4) revealed weak correlations between either slope or intercept and residual variance (r = 0.07 – 0.19, Fig. 4A – B), indicating that a greater magnitude of difference between the sexes in either slope or intercept parameter is not strongly associated with greater trait variance. In contrast, absolute differences between the sexes in slope and intercept are strongly correlated (r = 0.56, Fig. 4C), indicating that in cases where there are significant differences in trait values for males and females, should a difference in intercept be present, this is likely accompanied by a difference in allometric slope.
Bivariate ordinations of log absolute difference between males and females for intercept and residual SD (A), slope and residual SD (B), and slope and intercept (C), for biological traits collated into nine functional groups (i.e., Trait types, represented as different circle colours). Individual effect sizes (circles) are scaled by their precision, defined as: precision = 1 / Standard Error.
Discussion
Most current medical guidelines are not sex-specific, being informed by preclinical studies that have been conducted only on male animals (Zucker & Beery, 2010; Kim et al., 2010; Zucker et al., 2021) under the assumption that the results are equally applicable to females, or that the female phenotype represents a smaller body size version of the male phenotype (Buch et al., 2019; Campesi et al., 2021). Our study sought to provide comprehensive assessment of this assumption for a large dataset of phenotypic traits in mice. We did not recover strong evidence for the validity of this assumption in a preclinical (mouse) model: we find that females are not ‘small’ males or, more accurately, not ‘scaled’ males.
In an era where personalised medicine interventions are within reach and patient-specific solutions represent a realisable frontier in healthcare (e.g., Jackson & Chester, 2014; Javaid & Haleem, 2018; Heath & Pechlivanoglou, 2022), it is now well recognised that sex-based data are much needed to advance care in an equitable and effective manner. The historic neglect of sex as a study variable means that the natural history and trajectory of treatment response in women remains opaque for many chronic diseases. As studies that illuminate the presence and importance of sex differences continue to emerge, many experimental set-ups that use both sexes continue to eschew downstream testing for sex differences, in part due to perceived inflation of sample size required for such analyses (Dayton et al., 2016; Buch et al., 2019; Arnegard et al., 2020; Woitowich et al., 2020).
Explicit male-female comparisons are needed to clarify the nature of sex differences (Garcia-Sifuentes & Maney, 2021; Zucker et al., 2021). Here we address this issue through a novel meta-analytical focus on identifying and characterising allometric scaling relationships for biological traits on a broad scale. We identify slope parameter (b) differences between the sexes as being common (Fig. 2C, Fig. 3E) and where present, often associated with significant differences in intercept value (Fig. 4C). We therefore demonstrate that the relationship between trait and body mass in mice differs fundamentally in mode (i.e., change in inter-trait covariance) between the sexes and that dimorphism cannot be fully explained by a magnitude shift in intercept value, as would be predicted should female phenotype represent a scaled version of male phenotype. For traits where there are significant differences in both slope and intercept between the sexes (Fig. 2C), it is common for a mixed scenario (male-biased significant slope and female-biased significant intercept, or female-biased significant slope and male-biased significant intercept; note that intercepts represent mean values for each sex) to occur. Therefore, for a given trait, a female value cannot be predicted based on an allometric coefficient extracted from regression data collected on males. Further, we find a male bias in residual SD for traits in morphology, immunology, hematology, hearing, and behaviour functional groups (5 out of 9 functional groups). However, we also find a weak correlation between difference in intercept and residual SD (Fig. 4A), meaning that allometric scaling differences alone do not explain increased residual SD in males compared to females. Or, put another way, among traits that show significant dimorphism in allometric relationships, males do not show greater variance than females just because they have greater body weights than females.
Our results complement recent evidence that supports a complex, trait-specific patterning of sex differences in markers routinely recorded in animal research (Rawlik et al., 2016; Karp et al., 2017; Zajitschek et al., 2020). Specifically, we build on previous studies using phenotypic traits from the International Mouse Phenotyping Consortium that have identified that sexual dimorphism is prevalent among phenotyping parameters (Karp et al., 2017), and moreover that, contrary to long-held assumption, neither females nor males show greater trait variability. We here show that the allometric relationship between trait value and body weight is dimorphic for most traits (75%), and these differences, where present, reflect trait-specific allometric patterns, involving both slope and intercept changes. As such, for slopes greater than zero, some trait values increase faster than body weight (positive allometry; b > 1) and some do not increase at the same rate as body weight (negative allometry; b < 1).
Sex-based scaling in biomedical studies
Our findings likely have implications for drug therapy, and specifically data surrounding the efficacy of drug dosing scaled by body weight. There exist known sex differences in drug prescription prevalence and usage patterns, as well as response to drug therapy (Watson et al., 2019; Madla et al., 2021). The same therapeutic regimen can elicit different responses due to sex-specific variance in pharmacokinetics and pharmacodynamics profiles (e.g., Yang et al., 2012; Zakiniaeiz et al., 2016), arising from underlying physiologic differences. These include, for example, significantly dimorphic traits captured among the physiology group in our analysis, such as iron (Jiang et al., 2019) and body temperature (van Hoof, 2015), among the morphology group, such as lean mass and fat mass (Madla et al., 2021), and among the heart functional group, such as QT interval (time between Q wave and T wave) (Regitz-Zagrosek & Kararigas, 2017). Population studies have revealed that there is a higher prevalence of use for most therapeutic drugs in women as compared to men (Fernandez-Liz et al., 2008; Watson et al., 2019). Further, women are 50 – 75% more likely to experience Adverse Drug Reactions (ADRs) (Rademaker, 2001), although these are not fully explained (Koren et al., 2012). Women may be at increased risk of ADRs because they are prescribed more drugs than men; however, women are usually prescribed drugs at the same dose as men, meaning that they receive a higher dose relative to body weight in most cases. Scaling of doses on a milligram/kilogram body weight basis has been recommended as a pathway to reducing ADRs (Zucker & Prendergast, 2020), particularly for drugs that exhibit a steep dose-response curve (Chen et al., 2020). Indeed, sex differences in ADRs have been argued to be the result of body weight rather than sex, per se (Richardson et al., 2015).
For both assertions to be supported, we would expect to observe a scenario (here, scenario B) whereby most or all phenotypic traits exhibit a scaled relationship between males and females, as a function of body weight. Our results do not provide overwhelming support in favour of scenario B, but rather support a sex- and trait-specific relationship between weight and phenotypic traits. This aligns more closely with evidence that weight-corrected pharmacokinetics are not directly comparable in men and women (Fadiran & Zhang, 2015; Zucker & Prendergast, 2020), and that many sex differences in ADRs persist after body weight correction (Greenblatt et al., 2014, 2019). Nevertheless, the Food and Drug Administration (FDA) has recommended dosage changes for women (e.g., sleep drug zolpidem; Farkas et al., 2013) and weight-adjusted dosing of some drugs, such as antifungal drugs and antihypertensive drugs, appears to ameliorate sex differences in pharmacokinetics (Guo et al., 2010; Jarugula et al., 2010). As such, we suggest that where there exists an association between sex and dose, dose-response curves are likely to be sex-specific and clarification of this relationship would be supported (e.g., using meta-analysis, Zhong et al., 2017) rather than using a scaled male-specific dose-response curve for females. Since many drugs are withdrawn from the market due to risks of ADRs in women, meta-analytic approaches to illuminating sex-specific dose-response curves represent a viable opportunity to reduce the number of ADRs and reach an important target set by precision medicine (Polasek et al., 2018).
Implications for allometric evolution
The study of allometry has a long history in evolutionary biology, established as a foundational descriptor of morphological variation at ontogenetic, population and evolutionary levels (Cheverud, 1982; Klingenberg, 1998). Allometry may channel phenotypic variation in fixed directions, defining scaling relationships that persist across large evolutionary timescales. For example, craniofacial variation among mammals has been observed to be constrained by allometry, such that small mammals have shorter faces than do larger ones (Cardini & Polly, 2013; Cardini et al., 2015). Conversely, allometry may facilitate morphological diversification, acting as a line of ‘evolutionary least resistance’, allowing for new morphotypes to originate relatively rapidly among closely related species (Porto et al., 2009; Pélabon et al., 2014). These pathways (allometric constraint vs allometric facilitation) may be a start point for exploring how sex differences in disease phenotypes arise, data that have been cited as a potential unexploited resource relevant for the development of new therapies (Arnold, 2010). Studies of static allometry, as examined herein, have revealed low levels of intraspecific variation in allometric slope, which explains only a small proportion of variation in size (Voje et al., 2014), compared to variation in allometric intercept (Bonduriansky, 2007). Moreover, traits under sexual selection have also revealed low magnitudes of allometric slope change under artificial selection experiments (Egset et al., 2012) and in wild populations (Egset et al., 2011), whereas intercept changes appear clear and heritable. 
These differences have historically been thought to be due to underlying features of the developmental system acting as an internal constraint (Huxley 1932; Gould, 1966), whereas more recent interpretations suggest that external constraint (selection) more likely acts to maintain slope invariance at the static level (Pélabon et al., 2013), which is consistent with data showing that variation occurs instead at the ontogenetic level, i.e. growth rate and ontogenetic allometric slope are evolvable (e.g., Wilson & Sánchez-Villagra, 2010; Klingenberg, 2010; Wilson, 2013). Broadly consistent with other static allometric studies, we find that where differences in allometry are present, significant intercept shifts alone are more common than are significant slope shifts (Fig. 2A compared to 2B). We focus explicitly on sex differences and observe that many traits show a combination of intercept and slope changes, as well as differences in residual variance. Aside from the evolutionary implications – that allometric slope likely does not have a high evolvability, or capacity to evolve – many of the traits examined here may show a low level of sex difference in slope because the sexes are both experiencing the same selective pressure to maintain functional size relationships across different body sizes.
Our meta-analytic results build a narrative of complexity in sex-based trait interactions and promote a case-specific approach to preclinical research that seeks to inform drug discovery, development and dosage. That females are not ‘small’ or ‘scaled’ males in a preclinical mouse model underscores the need to include female data from the earliest experimental stages. Our results evidence the plasticity of allometry at a microevolutionary scale, revealing a pathway for sex variation in phenotypic traits, which may influence study outcomes in biomedicine.
Methods
Data compilation and filtering
We conducted all data procedures, along with statistical analyses, in the R environment v. 4.1.1 (R Development Core Team, 2021). We compiled our data set from the International Mouse Phenotyping Consortium (IMPC) (www.mousephenotype.org, IMPC data release 10.1 June 2019), accessed in October 2019. These represent traits recorded in a high-throughput phenotyping setting whereby standard operating procedures (SOPs) are implemented in a pipeline concept. The phenotypic traits represent biomarkers used for the study of disease phenotypes (see Karp et al., 2017), collated into the following nine functional groups: behaviour, eye, hearing, heart, hematology, immunology, metabolism, morphology, and physiology, which are the IMPC’s original categorization (also previously used in Zajitschek et al., 2020). These groupings were assigned in relation to the description of the procedure undertaken for data point collection and following the categorisation of pipeline events at adult stage, detailed in the International Mouse Phenotyping Resource of Standardised Screens (IMPReSS, https://www.mousephenotype.org/impress/index).
For the initial dataset, data points were collated for adult wildtype mice only, filtering to include non-categorical phenotypic trait values for which covariate information on sex and body weight were available. This initial dataset comprised 2,866,345 data points for 419 traits. A series of data cleaning procedures were implemented to remove data points with missing body weight, zero values for a phenotypic trait and duplicated specimen IDs. Data filtering was conducted using the R package dplyr v.1.0.7 (Wickham et al., 2021). The resulting data set comprised 2,104,527 data points for 297 phenotypic traits, all of which had corresponding body weight data, enabling us to estimate an allometric relationship between a trait of interest and body weight. For each phenotypic trait, we had the following variables (covariates): phenotyping center name (location where experimental data were collected), external sample ID (animal ID), metadata group (identifier for experimental conditions in place during the experiment), sex (male / female), weight (body weight in grams), weight days old (day on which weight was recorded), procedure name (description of the experimental procedure as in IMPReSS), parameter name (description of the recorded parameter as in IMPReSS), and data point (phenotypic trait measurement – response variable).
Linear mixed-effects model for static allometry
The static form of allometry, the covariation of a trait with size as measured across a population of adults within a single species (Klingenberg, 1998), was quantified using a linear mixed-effects model approach (Laird & Ware, 1982). Within this framework, the relationship between phenotypic trait value and body weight, accounting for random effects associated with assignment to a metadata group and batch (defined as the date when the measurements are collected), was quantified for each of the 297 traits. Models were constructed using the function lme in the R package nlme v. 3.1-153 (Pinheiro et al., 2021) and applied to each phenotypic trait separately. We used the approach described by Nakagawa et al. (2017) that uses within-group centring (wgc) of the continuous predictor (i.e., weight); in this way, the intercept (x = 0) for each sex represents the population mean for that specific sex. Also, we calculated z-scores (z) from the response (y) so that all regression coefficients are directly comparable across different traits. The applied model was:
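The equation itself is not reproduced in this text. One form consistent with the description of the fixed effects, random effects and sex-specific residual variances given in this section is sketched below. It is an assumption, not the authors' exact specification. In particular, the subscripts g (metadata group) and b (batch) are introduced here for illustration, and placing the random slope for weight on metadata group is a guess.

```latex
z_{i} = \beta_{0} + \beta_{1}\,\mathrm{sex}_{i} + \beta_{2}\,\mathrm{wgc}_{i}
      + \beta_{3}\,(\mathrm{sex}_{i} \times \mathrm{wgc}_{i})
      + u_{g[i]} + v_{g[i]}\,\mathrm{wgc}_{i} + w_{b[i]} + \varepsilon_{i},
\qquad
\varepsilon_{i} \sim \mathcal{N}\!\left(0,\ \sigma^{2}_{\mathrm{sex}_{i}}\right)
```

Under this form, β1 is the sex difference in intercept (the difference in sex-specific means, because weight is centred within sex), β3 is the sex difference in slope, and the residual variance is allowed to differ between males and females.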
The random factor ‘batch’ labelled a cohort of mice that went through a procedure on the same day (see Karp et al., 2017), whereas ‘metadata group’ represented occasions when procedural parameters were changed (e.g., different instruments, different observers and different settings). These two random factors, along with the ‘weight’ random slopes, reduce Type I errors due to clustering (Schielzeth & Forstmeier, 2009). In addition, to estimate different residual variances for the two sexes, we modelled a group-wise heteroscedasticity structure, which was defined using the lme function’s argument weights = varIdent(form = ~1 | sex).
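The within-group centring of weight and the z-scoring of the response described for this model can be illustrated with a minimal sketch. It is written in Python for illustration and is not the authors' R code. The function names are hypothetical, and computing z-scores across both sexes within a trait, using the sample SD, is an assumption.

```python
from statistics import mean, stdev


def within_group_centre(weights, sexes):
    """Subtract the mean weight of each sex from the weights of that sex."""
    group_means = {
        s: mean([w for w, g in zip(weights, sexes) if g == s]) for s in set(sexes)
    }
    return [w - group_means[s] for w, s in zip(weights, sexes)]


def z_scores(values):
    """Standardise a trait to mean 0 and sample SD 1 (pooled across sexes, assumed)."""
    m = mean(values)
    sd = stdev(values)  # sample SD with an n - 1 denominator
    return [(v - m) / sd for v in values]


weights = [20, 30, 18, 22]
sexes = ["male", "male", "female", "female"]
centred = within_group_centre(weights, sexes)
```

After centring, a weight of zero corresponds to the mean body weight of each sex, which is why the sex-specific intercepts can be read as sex-specific trait means.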
For each phenotypic trait, model parameters (regression coefficients and variance components) were extracted for males and females using the R package broom.mixed v. 0.2.7 (Bolker & Robinson, 2021): slope, intercept, standard error (SE) of the slope, SE of the intercept and residual variance. Corresponding p values for the regression coefficients were extracted to assess the significance of sex differences in slope and intercept. Because the lme function did not provide statistical significance for differences in residual variances (standard deviations, SDs), we used the method developed by Nakagawa et al. (2015), the logarithm of the variability ratio, which compares the SDs of two groups, to obtain p values for residual SD differences (see also Senior et al., 2020).
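The logarithm of the variability ratio can be illustrated with a minimal sketch using the estimator and sampling variance given by Nakagawa et al. (2015) for two sample SDs. How the authors applied it to model-based residual SDs is not detailed here. The function names are hypothetical, and deriving the p value from a two-sided normal approximation is an assumption.

```python
import math
from statistics import NormalDist


def ln_vr(sd_m, n_m, sd_f, n_f):
    """Bias-corrected log variability ratio (male / female) and its sampling variance."""
    estimate = math.log(sd_m / sd_f) + 1 / (2 * (n_m - 1)) - 1 / (2 * (n_f - 1))
    variance = 1 / (2 * (n_m - 1)) + 1 / (2 * (n_f - 1))
    return estimate, variance


def two_sided_p(estimate, variance):
    """Two-sided p value from a normal approximation (an assumption of this sketch)."""
    z = estimate / math.sqrt(variance)
    return 2 * (1 - NormalDist().cdf(abs(z)))


est, var = ln_vr(sd_m=2.0, n_m=51, sd_f=1.0, n_f=51)
p_value = two_sided_p(est, var)
```

A positive estimate indicates greater variability in males, a negative estimate greater variability in females.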
We were aware that some of the 297 studied traits were strongly correlated (i.e., non-independent; e.g., traits from left and right eyes, and immunological assays with hierarchically clustered and overlapping cell types). Therefore, we collapsed the p values of these related traits into 181 p values, using the trait-grouping procedure of Zajitschek et al. (2020). We employed Fisher’s method with the adjustment proposed by Li and Ji (2005), as implemented in the R package poolr (Cinar & Viechtbauer, 2021), which models the correlation between traits; we set this correlation to 0.8.
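The basic idea of pooling p values with Fisher’s method can be illustrated with a minimal sketch. It implements only the unadjusted method, which assumes independent tests. The Li and Ji (2005) adjustment for correlated traits that the authors applied through poolr is deliberately not reproduced, so this sketch would overstate the evidence for correlated traits. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Unadjusted Fisher's method: X = -2 * sum(ln p), chi-square with 2k df."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # For even df = 2k the chi-square survival function has a closed form:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


pooled = fisher_combined_p([0.05, 0.05])
```

With a single p value the function returns that p value unchanged, which is a convenient sanity check.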
Static allometry hypotheses and sex bias in allometric parameters
Using parameters extracted from the above models, three scenarios were assessed (see Fig. 1), describing the form of sex differences in the static allometric relationship between phenotypic trait value and body weight. For a given trait, these were: a) males and females have significantly different slopes but share a similar intercept (Fig. 1A, 1D), b) males and females have significantly different intercepts but share a similar slope (Fig. 1B, 1E), c) males and females have significantly different slopes and intercepts (Fig. 1C, 1F). In addition, we assessed how many traits differed significantly in residual SDs between the sexes. For these classifications, we used p values from both the 297 traits and the 181 merged trait groups.
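The three-way classification can be expressed as a small decision rule, sketched here in Python. The function name and the significance threshold of 0.05 are assumptions; the threshold is not stated in this section.

```python
def classify_scenario(p_slope, p_intercept, alpha=0.05):
    """Assign a trait to scenario A, B or C from the sex-difference p values."""
    slope_differs = p_slope < alpha
    intercept_differs = p_intercept < alpha
    if slope_differs and intercept_differs:
        return "C"  # different slopes and different intercepts
    if slope_differs:
        return "A"  # different slopes, similar intercept
    if intercept_differs:
        return "B"  # different intercepts, similar slope
    return "none"  # no significant sex difference in slope or intercept


labels = [classify_scenario(0.01, 0.40), classify_scenario(0.30, 0.001)]
```

Differences in residual SDs are assessed separately and are not part of this rule.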
For scenarios A – C, which represent significant differences between male and female regression slope and / or intercept parameters, and for cases where sex differences in SDs were significant, data were collated into functional groupings (as listed above) to assess whether, and to what extent, sex bias in parameter values and variance was present across phenotypic traits. That is, when males and females differed significantly, we counted which sex displayed the greater parameter value (intercept, slope) and, separately, tallied which sex had the greater variance. Results were pooled for phenotypic traits within a functional group and visualised using the R package ggplot2 v. 3.3.5 (Wickham, 2016) for scenarios A – C, resulting in one set of comparisons for parameter values and one for variance (SD) values. We used only the data set with 297 traits for this step because the directionality of some trait values became meaningless once traits were merged (e.g., time spent in the light side versus the dark side), whereas merged p values remained meaningful because p values are not directional.
Meta-analysis of differences in slopes, intercepts and residual SDs
We were aware that our classification approach using p values is akin to vote counting, which has limitations (Gurevitch et al., 2018). Therefore, we conducted formal meta-analyses using the following effect sizes: 1) difference between intercepts (trait means for males and females), 2) difference between slopes and 3) difference between residual SDs. We used the corresponding SE or, more precisely, the square of the SE as the sampling variance. We were not able to compare the directionality of effect sizes among traits (e.g., latencies and body sizes); however, our main interest in this study was whether males and females differed in intercepts, slopes and residual SDs irrespective of direction. Therefore, we conducted meta-analyses of magnitudes by applying a transformation to the mean and sampling variance, which is assumed to follow a folded normal distribution (Morrissey, 2016, eq. 8), by using the formulas below:
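The formulas do not appear in this text. The expressions below are the standard mean and variance of a folded normal distribution, written in this section's symbols as a reconstruction. Treating ES as the mean and SE as the standard deviation of the underlying normal distribution, so that the second expression gives the square of SEfolded, is an assumption about how the authors' symbols map onto these formulas.

```latex
\begin{align}
ES_{\text{folded}} &= SE\,\sqrt{\frac{2}{\pi}}\,
  \exp\!\left(-\frac{ES^{2}}{2\,SE^{2}}\right)
  + ES\left[1 - 2\,\Phi\!\left(-\frac{ES}{SE}\right)\right] \\
SE_{\text{folded}}^{2} &= ES^{2} + SE^{2} - ES_{\text{folded}}^{2}
\end{align}
```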
Where Φ is the standard normal cumulative distribution function and ESfolded and SEfolded are, respectively, the transformed effect size (point estimate) and sampling variance, while ES and SE are the corresponding point estimate and sampling variance before transformation. Morrissey (2016) has shown that meta-analytic means using such a folding transformation are hardly biased. Therefore, these transformed variables were directly meta-analysed using the rma.mv function in the R package metafor (Viechtbauer, 2010). The intercept models (meta-analytic models) had three random factors: 1) functional group, 2) trait group and 3) effect size identifier (which is equivalent to residuals in a meta-analytic model; see Nakagawa & Santos, 2012), while in the meta-regression models, we fitted functional group as a moderator (see Fig. 3). The model structures for all three effect sizes were identical. We reported parameter estimates with 95% confidence intervals (CI) and 95% prediction intervals (PI), which were visualised using the R package orchaRd (Nakagawa et al., 2021). In a meta-analysis, the 95% PI represents the degree of heterogeneity as well as a likely range for the effect size of a future study. We considered an estimate statistically significant when its 95% CI did not span zero.
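The folding transformation can be checked numerically with a minimal sketch based on the standard mean and variance of a folded normal distribution. The function name is hypothetical, and the sketch assumes that the second argument is a standard error (so its square is the sampling variance) and returns the folded variance, not its square root.

```python
import math
from statistics import NormalDist


def fold(es, se):
    """Mean and variance of |X| when X ~ Normal(es, se**2)."""
    phi = NormalDist().cdf(-es / se)  # standard normal CDF at -es/se
    folded_mean = (
        se * math.sqrt(2 / math.pi) * math.exp(-(es ** 2) / (2 * se ** 2))
        + es * (1 - 2 * phi)
    )
    folded_var = es ** 2 + se ** 2 - folded_mean ** 2
    return folded_mean, folded_var


mean_zero, var_zero = fold(0.0, 1.0)  # half-normal case
mean_neg, var_neg = fold(-3.0, 1.0)
mean_pos, var_pos = fold(3.0, 1.0)
```

The sign of the original effect size does not matter after folding, and an estimate far from zero relative to its SE is left almost unchanged, which is the behaviour wanted for a meta-analysis of magnitudes.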
Correlations among differences in slopes, intercepts and residual SDs
We also quantified correlations among the three effect sizes, using a Bayesian tri-variate meta-analytic model implemented in the R package brms (Bürkner, 2017). We fitted functional grouping as a fixed effect and trait groups as a random effect using the function brm. Before fitting effect sizes to the model, we log-transformed ESfolded and transformed SEfolded accordingly using the delta method (e.g., Nakagawa et al., 2017). We used the default priors for all estimated parameters, with two chains, 1,000 warm-ups and 4,000 iterations. We assessed the convergence of the chains using the Gelman-Rubin statistic (Gelman & Rubin, 1992), which was 1 for all parameters (i.e., the chains had converged), and we also checked all effective sample sizes for posterior samples (all were over 800). We reported mean estimates (correlations among the three effect sizes) and 95% credible intervals (CI); if the 95% CI did not overlap with 0, we considered the parameter statistically significantly different from 0.
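The log transformation with a delta-method adjustment of the uncertainty can be illustrated with a minimal first-order sketch. The function name is hypothetical. The sketch assumes that the second argument is a standard error and that only the first-order term was used; whether the authors included higher-order terms is not stated here.

```python
import math


def log_transform(es_folded, se_folded):
    """Log-transform a positive effect size; first-order delta method for its SE.

    Var[ln(X)] is approximated by Var[X] / E[X]**2, so SE[ln(X)] ~ SE[X] / E[X].
    """
    if es_folded <= 0:
        raise ValueError("the folded effect size must be positive to take its log")
    return math.log(es_folded), se_folded / es_folded


log_es, log_se = log_transform(math.e, 0.5)
```

Folded effect sizes are positive by construction, so the logarithm is defined for every estimate entering the model.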
Data availability
The R code and data generated during this study are freely accessible on GitHub at <to be inserted>. An R Markdown file with the complete workflow for all analyses is provided in the supporting information, available at <to be inserted>.
Author contributions
LABW and SN designed the research; SN, LABW, SRKZ, ML and HH contributed to the conception and implementation of data analysis; JM contributed to data acquisition; LABW drafted the manuscript with contributions from SN and ML.
Competing interests
The authors declare no competing interests.
Acknowledgements
This research was supported by Australian Research Council grants DP200100361 awarded to SN and ML and FT200100822 awarded to LABW. Research reported in this publication was supported by the European Molecular Biology Laboratory core funding and the National Human Genome Research Institute of the National Institutes of Health under Award Number UM1HG006370. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.