Avoiding misinterpretation of regression lines in allometry: is sexual dimorphism in digit ratio spurious?

The statistical analysis of allometry (size-dependence of traits) is fraught with difficulty that is often underestimated. In light of some recent controversies about statistical methods and the resulting biological conclusions, I here discuss the interpretation of regression lines and show how to avoid spurious effects. General linear models based on ordinary least squares (OLS) regression are often used to quantify sexual dimorphism in a trait of interest that is modelled as a function of sex while controlling for size as a covariate. However, an analysis of artificially generated data where males and females differ in size only, but are otherwise built according to the same principles, shows that the OLS method induces a spurious dimorphism where there is none. Hence, OLS-based general linear models should not be regarded as a fail-proof tool that automatically provides the correct answer to whatever question one has in mind. Here I show how to avoid misinterpretation and how best to proceed in resolving the recent debate about sexual dimorphism in digit ratio, a trait that is thought to reflect sex-hormone levels during development. The limited data currently available to me suggest that the widely accepted sexual dimorphism in digit ratio might well be only a by-product of an allometric shift in shape, urgently calling for a re-examination in larger data sets on humans and other vertebrates.

The relative length of human index and ring fingers (digit ratio, 2D:4D) has received a lot of research attention, because it is known to be sexually dimorphic (smaller ratios in men than in women [1,2]) and has been suggested to reflect sex-hormone levels experienced during development [2-5]. Now Lolli and coworkers [6] have made a claim that, if true, would have major consequences for this research field. According to their analyses of data on human finger lengths, the apparently lower average digit ratio of men compared to women arises simply as an artefact of allometry: as hands get bigger, there is a general shift in shape such that digit ratio gets smaller. In consequence, there is no particular sexual dimorphism (apart from men being generally larger) that would require an additional explanation invoking specific sex-hormone effects on finger development.

This potentially controversial conclusion might, however, be questioned for methodological reasons. The study by Lolli and coworkers [6] investigates allometry and sexual dimorphism by regressing the length of the second finger 2D (dependent variable) over the length of the fourth finger 4D (predictor variable), yet the implemented method of ordinary least squares (OLS) regression fails to account for 'biological noise' (aka 'natural variation' [7] or 'biological deviance' [8]) in the predictor variable. An earlier paper [9] already analysed the very same data set using such OLS regression and was subsequently criticized for its inappropriate statistical approach [10]. Indeed, it is easy to show (see further below) that OLS regression systematically induces positive sexual dimorphism (relatively larger trait values in the larger sex) and is therefore not suitable for quantifying sexual dimorphism [10,11]. Yet, Lolli and coworkers [6] did not address these statistical concerns but rather followed the earlier study [9] (and others, e.g. [12]) in using OLS-based methods. Apparently this was also motivated by a recent methodological review by Kilmer [...] use. To bring more clarity into this debate, I would like to highlight two points:

(I) The interpretation of OLS regression lines in the study of allometry and sexual dimorphism is more difficult than is typically recognized [6,13]. OLS regression lines are suitable for predicting values of y from a given value of x, but they are not designed to automatically reflect the principles that underlie allometric relationships [7]. If we aim at finding out whether males and females differ in size only but are otherwise built according to the same general principles, then OLS regression lines will typically give the wrong answer. For correct interpretation of sexual dimorphism 'after accounting for differences in size', regression slopes need to be adjusted for biological noise in the predictor variable. In contrast, accounting only for technical measurement error in the predictor variable (as suggested by Kilmer and Rodríguez [13]) is not sufficient for removing the spurious dimorphism.

(II) I suggest a different way of analysing whether digit ratio is sexually dimorphic after accounting for body size. Rather than plotting one measure of size over another (2D over 4D, which calls for major axis regression [7,10,15,16]), I suggest plotting digit ratio directly over mean finger length and adjusting OLS regression slopes for the biological noise in the predictor. Intriguingly, such an analysis of the empirical data made available by Lolli and coworkers [6] indeed suggests that sexual dimorphism in digit ratio may simply emerge from an allometric change in shape with size.

Ratios are widely used in many disciplines (e.g. body mass index, waist to hip ratio, body condition index) because many ratios have advantageous properties, which however rarely get appreciated explicitly [15].
More frequently, the indiscriminate use of ratios has been criticized [16-19], typically because ratios often fail to achieve independence of body size. By their very nature, ratios (here 2D:4D) are strongly positively correlated with their numerator (here 2D) and strongly negatively correlated with their denominator (here 4D), yet these are mathematical necessities that do not reveal whether a ratio is size dependent. Ratios can in principle be independent of variation in body size (see below), but whether real data on human digit ratios show such independence of size (here average finger length) is a matter of empirical testing.

Does digit ratio shift continuously with size?

Figure 1a shows digit ratios calculated from the data of Lolli and coworkers [6] plotted over the mean length of the two fingers. Intriguingly, within each sex, digit ratio declines as fingers get longer, and the two OLS regression lines for males and females (the appropriateness of which I will address below) practically coincide. This suggests that, as fingers get longer on average, the ratio 2D:4D is shifting continuously, with 4D getting disproportionately longer and 2D lagging behind. Judging from the indicated OLS regression lines, there is apparently no need to invoke any sexual dimorphism in digit ratio beyond the allometric shift. This is because at the population-wide average finger length of 70.4 mm, men and women are predicted to have practically equal digit ratios (0.9890 and 0.9895, respectively), while without consideration of the allometric shift, the sexes are clearly different in their average digit ratio (men: 0.9828, women: 0.9925, t(830) = 3.68, p = 0.0003). Whether the data shown in Figure 1a are truly representative of human allometry will have to be shown in the future by additional analyses of larger data sets, but it seems possible that the phenomenon is more general.
The human digit ratio literature has also inspired similar research on numerous other tetrapod species, many of which have also been found to be sexually dimorphic for digit ratio (e.g. [20-27]). Hence, I examined my own data on digit ratio in zebra finches (2D:4D, based on toes of the right foot; see [28]). Figure 1b illustrates an interesting similarity with the human case, since within both males and females digit ratio continuously declines as digits get longer and the two OLS regression lines practically coincide. Since the toes of males are only slightly longer than those of females (8.14 mm versus 8.02 mm) and the regression line is relatively shallow, male digit ratios are shifted to only [...]

Previously [10], I argued that OLS regression can lead to erroneous conclusions about sexual dimorphism because it fails to acknowledge that there is biological noise in the predictor variable (e.g. when regressing 2D over 4D). If our interest lies in exploring sexual dimorphism, we should consider major axis regression (MA) or reduced major axis regression (RMA) when regressing one measure of size over another, because these methods assume equal amounts of statistical noise in the two variables, which appears most sensible for two digits with comparable properties [10]. Now in the above case (Fig. 1), I used OLS regression merely out of convenience, as is often done, but is this really appropriate, given that the x-axis of Figure 1 contains biological noise? Below I will show with simulated data that, in some sense, the regression lines in Figure [...]

Creating simulated data that include biological noise
Closely adhering to the simulations that I carried out previously [10], I created artificial data on finger lengths following specific predefined rules. In the first scenario, I assume isometry (as in [10]) simply to illustrate how one gets to the situation where ratios are size independent. In the second scenario, I incorporate an allometric shift with size (importantly, the same for the two sexes), and then examine which regression method allows retrieving the correct value for the slope of the underlying line around which the data have been generated. The data generation always consists of three steps: (1) generating between-individual variation in latent size of fingers (representing the sum of genetic and environmental effects that make the fingers of some individuals generally shorter or longer, thereby inducing a strong positive correlation between 2D and 4D across individuals, r = 0.9 in [6]), (2) applying underlying rules of isometry or allometric shift, and (3) adding individual biological noise to each finger separately (representing the sum of unique genetic and environmental effects on each finger, which induces the scatter around the regression lines in plots of 2D over 4D, and which is the main source of individual differences in digit ratio).

For the first scenario of isometry, latent finger lengths of 5,000 women were drawn from a normal distribution with a mean of 69 mm and a standard deviation (SD) of 5 mm. For 5,000 men I used a mean of 77 mm and also an SD of 5 mm. Next, for women I assumed perfect isometry with a slope of 1 (2D = 4D = latent size), corresponding to a digit ratio of 1. For men, I assumed an isometric slope of 0.985 (2D = 0.985 * 4D = 0.985 * latent size), corresponding to a digit ratio of 0.985. Finally, for each finger I drew biological noise from a normal distribution with a mean of zero and an SD of 2 mm. Yet, to reflect the fact that the amount of noise is typically proportional to size, I multiplied each value of noise by the relative latent size (i.e. by latent size / 73 mm). This means that the noise was on average 10% larger for an individual whose latent size was 10% above the population mean. After adding the created noise to each finger separately, I calculated for each individual its realized mean finger length ((2D + 4D) / 2) and its digit ratio (2D / 4D).

In the second scenario, latent sizes were created as above. Yet this time digit ratio was not assumed to be size-independent, but (inspired by Figure 1a) rather to change systematically with latent size, equally in the two sexes (expected digit ratio = 0.984 - 0.002 * (latent size - 73)). Using this equation, I calculated expected digit ratios for each individual from its latent size. The expected length of 2D was then determined as 2D = (2 * latent size * expected digit ratio) / (1 + expected digit ratio), and the expected length of 4D as 4D = 2D / expected digit ratio. After this, I again added biological noise to each finger as described above and calculated realized mean finger length and digit ratio.

[...] Indeed, the average slopes for males (-0.00182 ± 0.00001) and females (-0.00187 ± 0.00001) are clearly shallower than the slope of the underlying digit ratio change with latent size (-0.00200).

The reason behind this attenuation in slopes lies in the fact that the x-axis of Figure 2b shows the realized size (after adding biological noise to each finger), while the slope of the underlying relationship (-0.002, black line) refers to the change with latent size (before adding the noise). Which of these slopes is considered 'correct' depends on the purpose of line fitting.
The shallower OLS regression lines represent the optimal solution for predicting, for each sex, an individual's digit ratio from its realized mean finger length. However, these sex-specific lines are too shallow if one wants to retrieve the rules according to which the data had been generated (namely a universal rule that is shared between the sexes rather than two sex-specific rules).

[Figure 2: digit ratio plotted over mean finger length (mm) for females and males.]
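The three data-generation steps and the second scenario described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original simulation code: the function names, the random seed and the single large sample of 200,000 individuals per sex (used here instead of averaging over repeated samples of 5,000) are my own choices.

```python
import numpy as np

POP_MEAN = 73.0  # mm, midpoint of the female (69 mm) and male (77 mm) latent means


def simulate_sex(latent_mean, n, rng):
    """Return realized mean finger length and digit ratio for one sex (scenario 2)."""
    # Step 1: between-individual variation in latent finger size (SD = 5 mm).
    latent = rng.normal(latent_mean, 5.0, n)
    # Step 2: allometric rule shared by both sexes.
    expected_ratio = 0.984 - 0.002 * (latent - POP_MEAN)
    d2 = 2.0 * latent * expected_ratio / (1.0 + expected_ratio)
    d4 = d2 / expected_ratio
    # Step 3: independent noise per finger (SD = 2 mm), scaled to relative latent size.
    scale = latent / POP_MEAN
    d2 = d2 + rng.normal(0.0, 2.0, n) * scale
    d4 = d4 + rng.normal(0.0, 2.0, n) * scale
    return (d2 + d4) / 2.0, d2 / d4


def ols_slope(x, y):
    """Slope of the ordinary least squares regression of y on x."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


rng = np.random.default_rng(1)  # arbitrary seed (assumption)
slopes = {}
for sex, latent_mean in (("females", 69.0), ("males", 77.0)):
    size, ratio = simulate_sex(latent_mean, 200_000, rng)
    slopes[sex] = ols_slope(size, ratio)
    print(f"{sex}: OLS slope = {slopes[sex]:.5f} (underlying slope: -0.00200)")
```

In such a run, both sex-specific slopes should come out shallower than -0.002, because the regression is fitted against realized rather than latent size.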
Importantly, the shallower OLS lines in Figure 2b are not "flawed" since the x-axis of Figure 2b shows the realized mean finger length rather than latent size, but they can be misinterpreted. When researchers say "after accounting for variation in size" they often forget that they have accounted for 'realized size' but not for 'latent size' (which remains unknown in real data sets) and hence they cannot retrieve the rules from which the sexes were built.

The degree of attenuation of slopes caused by the biological noise in the x-axis is relatively straightforward to calculate for such simulated data (see [29]). For each sex, the between-individual variance in latent size is 25 mm² (the square of the SD (= 5 mm) of the normal distribution from which latent size was sampled). To each finger we added noise with a variance of 4 mm² (SD = 2 mm). Since the realized average finger length is calculated from two fingers ((2D + 4D) / 2), the noise variance in the averages is half of that in each finger (4 mm² / 2 = 2 mm²). Given the additivity of variances, the between-individual variance in realized mean finger length is 25 mm² + 2 mm² = 27 mm². Yet since the amount of noise was scaled to latent size, the totals are 25 mm² + 2.2 mm² = 27.2 mm² for males and 25 mm² + 1.8 mm² = 26.8 mm² for females. The ratio of variance in latent size to variance [...]
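The variance bookkeeping above can be checked with a short calculation. The sketch below assumes the classical attenuation (reliability ratio) formula, in which the expected OLS slope equals the underlying slope multiplied by var(latent size) / var(realized size); the function name and constants are illustrative, not taken from the original analysis.

```python
LATENT_VAR = 25.0       # mm^2, square of the 5 mm SD in latent size
NOISE_VAR_FINGER = 4.0  # mm^2, square of the 2 mm noise SD per finger
POP_MEAN = 73.0         # mm, value used to scale the noise
TRUE_SLOPE = -0.002     # change in digit ratio per mm of latent size


def attenuation(latent_mean):
    """Reliability ratio var(latent) / var(realized mean finger length)."""
    # Noise was multiplied by latent/73, so its variance scales with the
    # expected squared scale factor E[(latent/73)^2].
    scale_sq = (latent_mean**2 + LATENT_VAR) / POP_MEAN**2
    # Averaging two fingers halves the noise variance.
    noise_var_mean = NOISE_VAR_FINGER / 2.0 * scale_sq
    return LATENT_VAR / (LATENT_VAR + noise_var_mean)


for sex, latent_mean in (("females", 69.0), ("males", 77.0)):
    lam = attenuation(latent_mean)
    print(f"{sex}: reliability = {lam:.3f}, expected OLS slope = {TRUE_SLOPE * lam:.5f}")
```

Under these assumptions the reliability ratios come out at roughly 0.93 for females and 0.92 for males, in line with the simulated slopes reported above. Dividing an observed OLS slope by such a ratio is the usual way of undoing the attenuation, although for real data the amount of biological noise in the predictor has to be estimated first.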

Careful interpretation of regression lines is needed
The above simulation illustrates that OLS-based general linear models should not be regarded as a fail-proof tool that automatically provides the correct answer to whatever question one has in mind. If we set up a scenario where the sexes only differ in their size but otherwise are constructed according to identical rules, we would ideally like to be able to retrieve that information from our data analysis. Specifically, after correcting for variation in size, we would like to see identical intercepts for the two sexes. This was not the case here, because OLS slopes were systematically biased downwards as a function of the amount of biological noise in the x-axis. In the present case of Figures 1 and 2, the amount of downward bias in slopes is almost negligible (here about 7-8%), such that the resulting sexual dimorphism (women having slightly higher digit ratios than men at the population average finger length of 73 mm, see Fig. 2b) is even difficult to detect. Here, OLS regression fails to account for the biological noise in the x-axis, but the resulting difference in intercepts is really minimal (though statistically significant, see final paragraph). The resulting bias is minimal because, in this example, the y-axis contains much more biological noise than the x-axis (here making OLS regression much more sensible than RMA regression).

[...] in plots of one measure of size over another. To illustrate that point, I plotted the data from Figure 2b in such a way: once arbitrarily plotting 2D over 4D (Fig. 3a) and once arbitrarily plotting 4D over 2D (Fig. 3b). The OLS regression slopes show the expected bias, namely an apparent (deceptive) sexual dimorphism with the larger sex (males) showing the 'relatively' larger trait values (irrespective of what the trait is).
The lines suggest that males have relatively longer 2D than females (by 0.85 mm = 72.98 mm - 72.13 mm) after accounting for variation in 4D (Fig. 3a), and when plotting the very same data the other way around, the lines suggest that males have relatively longer 4D than females (by 1.50 mm = 74.50 mm - 73.00 mm) after accounting for variation in 2D (Fig. 3b). The fact that these two interpretations appear paradoxical (it is confusing when both fingers are relatively longer in males after accounting for size) illustrates that it is erroneous to interpret OLS regression lines in [...] be interpreted as proof of sexual dimorphism in y after accounting for variation in x (e.g. the erroneous claim that men have relatively larger brains than women after correcting for body size [10,30]). Given that the explicit modelling of biological noise in the x-axis is still far from commonplace, one should always be maximally cautious with formulations like 'after correcting for differences in body size…'.
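The paradox of both fingers appearing 'relatively longer' in males can be reproduced with simulated data generated under the shared allometric rule of the second scenario. The following is a minimal sketch, not the code behind Figure 3: the helper names, the seed and the choice of the pooled mean as the common comparison point are assumptions of mine, so the exact gaps will differ from the values quoted above.

```python
import numpy as np

POP_MEAN = 73.0  # mm


def simulate_fingers(latent_mean, n, rng):
    """Return realized (2D, 4D) lengths under the rule shared by both sexes."""
    latent = rng.normal(latent_mean, 5.0, n)
    expected_ratio = 0.984 - 0.002 * (latent - POP_MEAN)
    d2 = 2.0 * latent * expected_ratio / (1.0 + expected_ratio)
    d4 = d2 / expected_ratio
    scale = latent / POP_MEAN  # noise proportional to latent size
    d2 = d2 + rng.normal(0.0, 2.0, n) * scale
    d4 = d4 + rng.normal(0.0, 2.0, n) * scale
    return d2, d4


def ols_predict(x, y, x0):
    """Value of y predicted at x0 by the OLS regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope * x0 + intercept


rng = np.random.default_rng(2)  # arbitrary seed (assumption)
f2, f4 = simulate_fingers(69.0, 5000, rng)  # females
m2, m4 = simulate_fingers(77.0, 5000, rng)  # males

# Common comparison points: pooled means of the respective predictor.
x0_4d = np.mean(np.concatenate([f4, m4]))
x0_2d = np.mean(np.concatenate([f2, m2]))

# Male minus female prediction, once for 2D over 4D and once for 4D over 2D.
gap_2d = ols_predict(m4, m2, x0_4d) - ols_predict(f4, f2, x0_4d)
gap_4d = ols_predict(m2, m4, x0_2d) - ols_predict(f2, f4, x0_2d)
print(f"2D over 4D: males exceed females by {gap_2d:.2f} mm")
print(f"4D over 2D: males exceed females by {gap_4d:.2f} mm")
```

Both gaps should be positive even though the sexes were generated from identical rules, which is the deceptive dimorphism described above.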

Next steps for digit ratio research
For the interpretation of digit ratio as a marker of sex-hormone effects, it appears essential to know whether digit ratio is sexually dimorphic or not once accounting for any size-related shifts like in [...] other measures of size than mean finger length (e.g. body mass), which bears a risk of overlooking a size dependence because the covariate might contain additional biological noise that is not relevant for digit ratio. Hence regressing digit ratio over mean finger length appears most effective in identifying a direct dependency.

Sample size is important to consider if our aim is to reject the idea that there is an allometric shift in digit ratio as shown in Figure 1a. In this analysis, I have treated measurements from both hands as equivalent in order to maximise statistical power. Generally, I would recommend running separate analyses for each hand only if the patterns differ significantly between hands. In Figure 1a [...] between digit ratio and mean finger length, for a later meta-analytic summary.

If the patterns shown in Figure 1 should turn out to be representative of humans and also other animals, it is not easy to see how digit ratio could be a more informative indicator of sex-hormone effects than, for instance, mean digit length. Ignoring the allometric shift, the sex difference in human digit ratio is only about 0.26 within-sex standard deviations (Cohen's d calculated from Figure 1a), while the sex difference in mean finger length is about six times larger (Cohen's d = 1.56, Figure 1a); hence the latter appears to have greater potential for reflecting levels of sex hormones experienced during development.

Proponents of the idea that digit ratio is informative about sex-hormone effects face a two-fold challenge. First, they should demonstrate that sexual dimorphism in digit ratio exists independently of the dimorphism in size (Figure 2a versus Figure 2b), and that the trait is more informative than other sexually dimorphic characters. Second, when testing for the allometric shift, they should demonstrate the objectivity of their analyses in spite of a possible conflict of interest or at least some wishful thinking. To show that one has not, for example, selectively removed outliers that appear influential after inspecting the data (see top left corner of Fig. 1a), I suggest first depositing complete data sets that have been used in a previous publication (that allow reconstructing the results shown in that publication), and then analysing the complete data set, following the approach outlined above. This effectively eliminates most 'researcher degrees-of-freedom' [33] and thereby demonstrates a maximum of scientific objectivity [34].