Review Article
Human fluctuating asymmetry in relation to health and quality: a meta-analysis

https://doi.org/10.1016/j.evolhumbehav.2011.03.002

Abstract

Developmental instability (DI) reflects the inability of a developing organism to buffer its development against random perturbations, due either to frequent, large perturbations or to a poor buffering system. The primary measure used to assess DI experienced by an individual organism is fluctuating asymmetry (FA), asymmetry of bilateral features that are, on average in a population, symmetrical. A large literature on FA in humans in relation to measures of health and quality (close to 100 studies and nearly 300 individual effect size estimates) has accumulated. This paper presents the first quantitative meta-analysis of this literature. The mean effect size (scaled as Pearson r) was about 0.2. Effect sizes covaried negatively with sample size, consistent with effects of publication bias, the tendency for significant effects to be published. Conservative correction for this bias reduced the mean effect to about 0.1. Associations with FA underestimate effects of underlying DI due to imprecise measurement of the latter. A model-based best estimate of the mean effect of DI on outcomes is about 0.3, a theoretically meaningful, relatively large effect, albeit of moderate absolute size. The data are consistent, however, with a range of true effect sizes between 0.08 and 0.67, partly due to large study effects. Study-specific effect sizes in DI ranged between −0.2 and 1.0. A humbling and perhaps sobering conclusion is that, in spite of a large body of literature involving nearly 50 000 participants, we can only confidently state that there is, on average, a robust positive effect. An accurate estimate of that effect size was not possible, and between-study variation remained largely unexplained. We detected no robust variation across six broad categories of outcomes (health and disease, fetal outcomes, psychological maladaptation, reproduction, attractiveness and hormonal effects), though examination of narrower domains reveals some corrected effects close to 0.2 and others near zero. The meta-analysis suggests fruitful directions for future research and theory.

Introduction

Developmental instability (DI) reflects the inability of a developing organism to buffer its development against random perturbations, due either to frequent, large perturbations (e.g., frequent illness, many deleterious mutations, significant oxidative stress) or to a poor buffering system (e.g., a poorly co-adapted genome). In theory, DI may relate to poor adaptation, a conjecture tested through observation of associations between phenotypic outcomes of DI and poorly adapted outcomes including reduced fitness. The most common marker of DI used in research is fluctuating asymmetry (FA), random deviations from symmetry on bilateral features that are, on average, symmetric at the population level. The underlying idea about the use of FA as a measure of DI is that the two sides of a bilateral feature represent independent replicates of the same developmental events. Differences between sides, then, must reflect minor developmental “errors” or perturbations affecting one side and not the other (e.g., Klingenberg, 2003, Van Dongen, 2006). A review by Møller (1999) indicated that, on average across a variety of biological taxa, FA does relate to measures of stress and (inversely) fitness, health or “quality” (e.g., reproductive success). In the past 20 years, a large literature has explored associations between FA and phenotypic outcomes pertaining to human health and quality. Though qualitative reviews have been published (e.g., Møller, 2006, Thornhill and Møller, 1997), no systematic quantitative analysis of these associations reported in the literature has appeared. We performed such a meta-analysis.

Meta-analyses of associations have several advantages over qualitative reviews. First, they are quantitative. A meta-analysis not only examines whether associations in expected directions are generally reported in the literature; it yields an estimate of mean effect size (typically in the form of a Pearson product–moment correlation). It is important to note that the term “effect” here does not imply causation, but rather only covariation. Associations may be in expected directions but, on average, weak and not statistically robust, which a meta-analysis can reveal but a purely qualitative review cannot. Second, a meta-analysis is not selective. Suppose, for instance, that a study reported that two of nine variables of interest were significantly associated with FA. A qualitative review might note that the study yielded more significant associations than is expected by chance (22% vs. 5%). A meta-analysis, by contrast, would include the strength of association of every data point, both the significant and nonsignificant effects, from this study. Third, meta-analysis permits statistical examination of important moderators of effect sizes. Perhaps, for instance, some studies measure DI with more validity than others; a meta-analysis can examine whether a marker of this variation (e.g., number of bilateral traits used to assess FA) translates into a difference in effect size. Or perhaps some outcomes (e.g., birth defects) relate to FA, but others (e.g., adult disease) do not; again, a meta-analysis can potentially detect this difference. Fourth, meta-analysis can detect publication bias, the tendency for studies that show positive results to be published more frequently than studies that yield null results (see below).
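To make the notion of a mean effect size concrete, the following is a minimal sketch of one generic way to pool correlations: transform each r to Fisher's z, weight by n − 3, and back-transform. It is not the model used in this paper, and the function name `mean_effect_size` and the two (r, n) pairs are hypothetical illustrations only.

```python
import math


def mean_effect_size(studies):
    """Pool correlations with a fixed-effect Fisher z average.

    `studies` is an iterable of (r, n) pairs: a Pearson correlation and
    the sample size it is based on. Each z = atanh(r) is weighted by
    n - 3, the inverse of its approximate sampling variance.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        if n <= 3:
            raise ValueError("each study needs n > 3")
        weight = n - 3
        weighted_sum += weight * math.atanh(r)
        total_weight += weight
    return math.tanh(weighted_sum / total_weight)


# Hypothetical data: a small study with a large r, a large study with a small r.
pooled = mean_effect_size([(0.30, 40), (0.05, 300)])
print(round(pooled, 3))
```

Because the weights grow with sample size, the pooled value in this sketch sits much closer to the large study's correlation than to the small study's, and nonsignificant estimates contribute on the same footing as significant ones.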

We aim to summarize patterns in associations between FA and human health and quality using the flexibility of meta-analysis. Our main aim is to present average effect sizes. We also, however, compare effect sizes across different broad outcome categories.

Human DI has been associated with a heterogeneous set of outcomes (i.e., variables pertaining to health, fitness or quality with which FA has been associated in empirical research). In our examination of the literature (see Methods below), we found a total of 293 estimated associations with FA from 94 studies, which could be grouped into six broadly defined categories of outcome variables.

In theory, DI may be related to increased susceptibility to infectious disease (e.g., Thornhill & Gangestad, 2006), organic problems possibly due to congenital susceptibilities (e.g., heart defects; see Milne et al., 2003) or weak defenses against parasitism (e.g., Møller, 2006). We analyzed 37 effect sizes from nine studies measuring outcome variables in this category.

Developmental instability may be associated with low birth weight (a significant risk factor for many diseases), birth complications (e.g., leading to hospitalization of a newborn) or more serious congenital defects (e.g., Van Dongen, Cornille, & Lens, 2009a). In total, there were 36 such associations reported in 16 different studies.

Researchers have speculated that DI is implicated in neurodevelopmental psychiatric disorders (notably, schizophrenia; e.g., Yeo, Gangestad, Edgar, & Thoma, 1999) or personality variations associated with these disorders (e.g., schizotypy; e.g., Thoma et al., 2008). In addition, some have argued that DI may affect brain development and thereby reduce general intelligence (e.g., Furlow, Armijo-Prewitt, Gangestad, & Thornhill, 1997). In this category of outcome variables, we found 47 effect sizes from 24 studies.

In studies of nonhumans, FA has been found to relate to male mating success (typically measured in terms of quantity of sexual mates) and fecundity (Møller, 1999, Thornhill and Møller, 1998; see also Møller, Thornhill, & Gangestad, 2005). In humans, it has been argued that men of low DI are favored as sexual mates (perhaps for their genetic quality) and hence have relatively many partners (e.g., Thornhill & Gangestad, 1994), and that this translates to fecundity in natural-fertility populations (e.g., Waynforth, 1998). In total, we analyzed 48 effect sizes from 16 studies reporting outcome variables for males and females in this category.

Relatedly, it has been argued that individuals have evolved to find markers of DI sexually attractive — i.e., markers found in facial features (e.g., Gangestad, Simpson, Cousins, Garver-Apgar, & Christensen, 1994), bodily features (Brown et al., 2008), body scents (e.g., Gangestad & Thornhill, 1998), vocal qualities (e.g., Hughes, Harrison, & Gallup, 2002) and dance movements (e.g., Brown et al., 2005; but see Trivers, Palestis, & Zaatari, 2009). Fifty-nine estimates from 29 studies belonged to this category of outcome variables.

From a life history perspective, individuals of high quality or in better biological condition (see Rowe & Houle, 1996, for a discussion of this concept) may be able to invest more heavily in reproductive traits, which in many species, including humans, may be promoted through effects of reproductive hormones — primarily testosterone in the case of males and estrogen in the case of females (e.g., Ellison, 2003). Researchers have hence examined associations between FA and hormone levels (e.g., Jasiénska, Lipson, Ellison, Thune, & Ziomkiewicz, 2006) or sexually dimorphic traits promoted by hormones (e.g., facial masculinity or femininity; e.g., Gangestad & Thornhill, 2003a). We located 64 effect sizes from 20 studies in this category.

The first three categories relate largely to health outcomes and hence, perhaps, mortality risks, whereas the last three categories relate largely to reproduction (or would have ancestrally).

A very small additional category (two effect sizes from two studies) studied associations between FA and reduced genetic diversity (due to inbreeding), which, for the sake of completeness, we included in our full sample analyses (see below). In more restricted samples of outcome variables (which we describe below), there were no studies on genetic diversity.

Within our six outcome categories, some narrower domains have been studied intensively. We performed exploratory comparisons among them. More specifically, we compared effect sizes between infectious disease vs. major disease (within health and disease), fetal problems vs. maternal condition effects (within fetal outcomes), schizophrenia and related disorders vs. reduced intelligence (within psychological maladaptation), male number of sex partners vs. female number of sex partners (within reproductive outcomes), facial attractiveness vs. other forms of attractiveness (within attractiveness) and anatomical (e.g., facial) masculinity–femininity vs. reproductive hormone levels per se (within sexually dimorphic hormonal effects).

The main objective of this paper is to provide estimates of average effect sizes and to identify biologically relevant differences in effect sizes through comparisons among outcome categories. To do so, however, other sources of variation in effect sizes must be evaluated. These possible confounding effects include the adequacy of the health or quality measure, the number of traits measured and the strength of association between FA and DI, and publication bias. Thus, a large part of this paper deals with these issues, as outlined in the next subsections.

Many kinds of associations are reported in the literature. Some involve variables that, in theory, relate to health and quality pertinent in environments in which humans recently evolved (e.g., in the settings of traditional foragers). Others do not, and for multiple reasons. First, some outcomes may not be related to quality per se (e.g., the second to fourth digit ratio may not reflect quality or health, and the associations with FA have been recently questioned; Van Dongen, 2009, van Dongen et al., 2009c). Second, some outcomes may be products of modern environments (e.g., metabolic disease such as cardiovascular distress or diabetes, fecundity in populations with effective conceptive control and deliberate family planning). Third, some outcomes may apply to one sex and not the other (e.g., quality may be expected to relate to number of sexual partners for males but not females). Fourth, some asymmetries reported in the literature are dermatoglyphic asymmetries, which appear to be formed within the first half of gestation and, hence, may not tap individual quality particularly well. As one might expect stronger effect sizes in a restricted sample excluding these effects, we performed analyses on the total literature of effects as well as a restricted sample of effects. For the latter, we eliminated all effects that did not clearly tap a pertinent component of health or quality (see Appendix 1 on the journal's website).

Theoretical and empirical analyses show that FA is a fairly weak measure of underlying DI (Gangestad and Thornhill, 1999, Houle, 1997, Houle, 2000, Van Dongen, 1998, Whitlock, 1996, Whitlock, 1998). Associations between single-trait FA and other covariates hence underestimate, on average, associations with DI. The extent of underestimation depends on R, the developmental repeatability of the trait's FA, which can be estimated from the distributional characteristics of FA (e.g., leptokurtosis; Gangestad and Thornhill, 1999, Van Dongen, 1998, Whitlock, 1996, Whitlock, 1998). Specifically, the degree of attenuation is equal to the square root of R. In studies of humans, R appears to be approximately 0.04 to 0.07 (e.g., Gangestad and Thornhill, 1999, Gangestad and Thornhill, 2003b, Gangestad et al., 2001), which means that the correlation of a variable with a single trait's FA underestimates the correlation with underlying DI by a factor of about 4 (that is, 1/√R). Hence, if the correlation between a variable and DI is 0.4, the mean correlation (across random samples) with a single trait's FA would be about 0.1.
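The arithmetic behind these figures can be checked with a short sketch. The function names below are hypothetical; the only inputs taken from the text are the attenuation rule (√R), the range of R (0.04 to 0.07) and the example correlation with DI of 0.4.

```python
import math


def attenuation(repeatability):
    """Multiplier applied to a correlation with DI when single-trait FA
    stands in for DI: the square root of the repeatability R."""
    return math.sqrt(repeatability)


def expected_fa_correlation(r_with_di, repeatability):
    """Mean correlation expected with a single trait's FA, given the
    correlation with underlying DI."""
    return r_with_di * attenuation(repeatability)


for R in (0.04, 0.07):
    underestimation = 1 / attenuation(R)  # the 1/sqrt(R) factor
    print(R, round(underestimation, 2), round(expected_fa_correlation(0.4, R), 3))
```

Across the stated range of R, the underestimation factor runs from roughly 3.8 to 5, and a correlation of 0.4 with DI shrinks to roughly 0.08 to 0.11 with single-trait FA, in line with the rounded values of 4 and 0.1 given above.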

One can bolster the reliability and validity of a measure of DI by aggregating FA of multiple traits (Soule & Cuzin-Roudy, 1982; see Leung, Forbes, & Houle, 2000 for an evaluation of the usefulness of multiple-trait approaches), and many researchers have done so in their studies. That is, researchers have typically measured FA of multiple traits, summed or averaged it into an aggregate and then reported associations of outcome variables of interest with the aggregate rather than with FA of individual traits. The increase in validity that results from trait aggregation can be estimated from the Spearman–Brown prophecy formula developed within psychometrics (see Gangestad & Thornhill, 1999). Specifically, the validity of an aggregate is estimated to be (n×R)/(1+(n−1)×R) (where n is the number of traits aggregated and R is the repeatability for a single trait). One important note is that the only variation that aggregates in a composite is the variance due to DI that is shared across the traits. If all variance due to DI in a composite is unique to individual traits, then aggregation does nothing to increase the validity of FA to measure DI. The proportion of variance in FA due to DI shared across traits (“organism-wide” DI, as opposed to trait-specific DI) is estimated by the mean correlation in FA across traits (see Van Dongen & Lens, 2000). Studies on humans per se indicate that the mean correlation of FA between independently developing traits is about 0.025 (Gangestad & Thornhill, 1999) to 0.045 (Gangestad et al., 2001), though higher values have been reported in some studies (e.g., Van Dongen, Wijnaendts, Ten Broek, & Galis, 2009b). If one were to aggregate FA of eight traits, then the validity of the composite for measuring shared DI would be approximately 0.4 to 0.5, and the attenuation of the correlation would be closer to a factor of 2 rather than 4 (i.e., if the correlation with DI is 0.4, then the mean correlation with an FA composite across samples would be close to 0.2). Most studies of human FA have used a fairly narrow range of number of traits (6 to 10), though the full range is substantially greater. Because one should expect studies using a large number of traits to yield greater effect sizes than those using a small number of traits (Gangestad and Thornhill, 1999, Van Dongen and Lens, 2000), we examined the effect of number of traits aggregated in a composite FA score on effect size in the meta-analysis.
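The eight-trait example can be reproduced with a minimal sketch. It rests on one assumption not spelled out in the text: that the quoted 0.4 to 0.5 is the square root of the Spearman–Brown value, in parallel with √R for a single trait. The function names are hypothetical, and the mean between-trait correlations of 0.025 and 0.045 are used as the single-trait input.

```python
import math


def spearman_brown(n_traits, r_single):
    """Spearman-Brown value for a composite of n traits:
    (n * R) / (1 + (n - 1) * R)."""
    return (n_traits * r_single) / (1 + (n_traits - 1) * r_single)


def composite_attenuation(n_traits, r_single):
    """Assumed attenuation multiplier for a composite: the square root
    of the Spearman-Brown value (an assumption, see the lead-in)."""
    return math.sqrt(spearman_brown(n_traits, r_single))


for r_single in (0.025, 0.045):
    multiplier = composite_attenuation(8, r_single)
    print(r_single, round(multiplier, 2), round(0.4 * multiplier, 2))
```

Under that assumption the multiplier for eight traits comes out near 0.41 and 0.52, so a correlation of 0.4 with DI would appear as roughly 0.17 to 0.21 with the composite, consistent with the "close to 0.2" stated above.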

Two additional points pertain to the weak associations between FA and DI. First, if two traits are developmentally integrated, such that the development of FA of one necessitates the development of FA of the other, aggregation of traits does little to increase the validity of measuring organism-wide variation. Though most studies have measured multiple traits that are developmentally independent (e.g., many bodily traits such as ear asymmetry and wrist asymmetry), some studies have focused on asymmetry in very specific regions, notably facial asymmetry and dental asymmetry. Aggregation in these studies should have a weaker effect; one facial asymmetry, for instance, is often not independent of another facial asymmetry. (If the face extends wider to one side from the center of the eyes, it likely extends wider to the same side from the center of the nose.) Hence, we also separately examined effects of number of traits in studies that used multiple independently developing traits (which we refer to as “body FA”). The presumed developmental integration in facial and dental traits is expected to yield not only lower mean effect sizes compared to body FA, but also weaker associations between number of traits and effect sizes (which we tested with the interaction between trait type and number of traits measured).

Second, the methods of disattenuation we have discussed assume that DI causes both FA and the outcome of interest, thereby driving the association between these variables. These methods are not appropriate if measured asymmetry itself causes the outcome of interest. For most associations reported in the literature, this is highly unlikely to be the case. For instance, asymmetries of the ears, elbows and fingers are highly unlikely to directly affect scent, vocal qualities, masculinization, reduced intelligence or sperm quality. In select instances, however, asymmetries may affect measured outcomes. For instance, facial asymmetries may directly affect facial attractiveness. To assess these effects, we compared effect sizes for facial attractiveness between body and facial FA.

Publication bias is the tendency of studies showing significant associations (and perhaps particularly ones in predicted directions) to be more likely to be published compared to studies yielding nonsignificant associations. It has a number of causes: Reviewers are more likely to find positive results worthy of publication, editors are hence more likely to accept papers reporting positive results, and authors are accordingly more likely to submit papers reporting positive results. Although publication bias in no way implies deliberate intention of reviewers, editors or authors to deceive, it does lead to distortion of mean effect size in the literature. Published effect sizes are stronger than true mean effects (e.g., Møller & Jennions, 2001). Though publication bias may not operate to the same extent across all literatures (e.g., see Koricheva, 2003), there are good conceptual reasons to think that, to some extent, it affects most literatures.

One way to assess publication bias is to examine the association between sample size and reported correlations across studies (on the FA literature in general, see Palmer, 1999, Palmer, 2000). Publication bias can lead to a negative association between effect size and sample size for at least two reasons. First, the threshold effect size that separates significant from nonsignificant findings diminishes as a function of sample size (more specifically, it is close to inversely proportional to the square root of sample size). Second, small samples yielding nonsignificant results are less likely to be published than large samples yielding nonsignificant results. If, in fact, an association with sample size exists, one can estimate the “true” mean effect size in a number of ways. The “trim and fill” method imputes “missing” (unpublished) studies (typically, of relatively small sample size) that yield a reasonable and uniform mean effect size across sample sizes. Alternatively, one can look for a sample size at which effect size appears to asymptote, and estimate true effect size at that sample size. Finally, one can calculate mean effect sizes in studies with sample sizes that exceed this threshold. We used all three methods and report mean effect sizes correcting for sample size as well as uncorrected mean effect sizes. It seems likely that “true” mean effect sizes fall somewhere in between.
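How the significance threshold shrinks with sample size can be illustrated with a rough sketch. It uses the Fisher z approximation for a two-tailed test at the 5% level rather than the exact t-based critical value, and the function name `critical_r` and the chosen sample sizes are hypothetical.

```python
import math

Z_CRIT = 1.959964  # two-tailed 5% critical value of the standard normal


def critical_r(n):
    """Approximate smallest |r| reaching two-tailed p < .05 with n cases.

    Fisher's z = atanh(r) has a standard error of about 1 / sqrt(n - 3),
    so the threshold on the r scale is tanh(Z_CRIT / sqrt(n - 3)).
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    return math.tanh(Z_CRIT / math.sqrt(n - 3))


for n in (20, 50, 100, 400):
    print(n, round(critical_r(n), 3))
```

Under this approximation, a study of 20 participants needs a correlation above roughly 0.44 to reach significance, whereas a study of 400 needs only about 0.10. If nonsignificant results go unpublished, small published studies will therefore tend to report larger effects.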

One must be careful, however, not to equate an association of sample size with effect sizes with an effect of publication bias, particularly when outcome variables are highly heterogeneous. An association of sample size with effect size can exist for reasons other than publication bias (e.g., Thornhill, Møller, & Gangestad, 1999). For example, studies with large sample sizes may be more likely to employ relatively weak methods for measurement (e.g., short questionnaire responses), or studies examining associations with variables that are more expensive to measure often have relatively small sample sizes. In the FA literature, there exists a great deal of heterogeneity in outcome measures and methods of measurement. For instance, two studies examined FA in relation to sperm quality, three studies examined FA in relation to scent attractiveness, and one study examined FA in relation to vocal attractiveness, outcome measures relatively expensive to acquire. All of these studies had sex-specific sample sizes no greater than 100, below the mean of 163 (Table 1). Studies with large sample sizes tended to examine FA in relation to questionnaire self-reports or facial attractiveness. We examined the distribution of effect sizes with sample size for signatures of publication bias, such as clustering of effect sizes just beyond the threshold of significance. Furthermore, we examined the association of sample size with effect size in a subset of studies where outcomes were measured in a very similar way, with sources of covariation with sample size other than publication bias thereby minimized.

Publication bias can also generate an association between year of publication and effect size (e.g., Jennions & Møller, 2002). Prior to the publication of positive results, publication bias against null results may be particularly strong. Once positive results appear, publication bias may weaken, as both reviewers and editors may see value in publication of failures to replicate. In the broader FA literature, Tomkins & Simmons (2003) showed that associations between FA and sexual selection decreased in strength as year of publication increased. We similarly examined effects of year of publication in the human FA literature.

In sum, we conducted a meta-analysis on reported associations between human FA and measures of health and quality. We computed average effect sizes and compared those among outcome categories. Several confounding factors could affect those estimates. Therefore, we examined the impact of number of traits aggregated on effect size. We also examined effects on effect size of two potential markers of publication bias: sample size and year of publication. We did so within a set representing the full literature, as best as we could identify, as well as a restricted sample of effects clearly pertinent to assessing the effect of DI on health and quality. Within the latter set, we separately examined effects within a subsample of studies that measured body FA (asymmetries of multiple, independently developing bodily features).

Literature search

Literature cited by Thornhill and Møller, 1997, Møller, 2006, as well as our own familiarity with the literature, was used as a starting point. We then performed searches using Web of Science to find more recent and additional studies, based on the keyword “developmental (in)stability” or “fluctuating asymmetry” in combination with “human” or “Homo sapiens.” The search was finalized in December 2009. We examined all relevant papers for pertinent effect sizes. Based on reported correlation

Descriptive statistics

Descriptive statistics for the three data sets are provided in Table 1. The three data sets did not show marked differences in sample size, number of traits, year of publication and distribution of estimates over our outcome categories.

Factors associated with effect size

Before discussing mean estimates, we examine factors that were systematically associated with effect size. Across all analyses, we found that just two factors were. First and most notably, sample size significantly predicted effect size in all three data sets

Discussion

We performed a meta-analysis on the literature reporting associations of human FA and markers of health and disease, fetal outcomes, psychological maladaptation, reproductive outcomes, attractiveness and hormonal effects. We estimated mean effect sizes. We also examined associations of effect size with number of traits in composite measures, sample size and publication year. A number of important conclusions, as well as outstanding questions, follow from the results of this meta-analysis.

References (63)

  • D.A. Puts. Mating context and menstrual phase affect women's preferences for male voice pitch. Evolution and Human Behavior (2005)
  • D.A. Puts. Beauty and the beast: Mechanisms of sexual selection in humans. Evolution and Human Behavior (2010)
  • R. Thornhill et al. Facial sexual dimorphism, developmental stability, and susceptibility to disease in men and women. Evolution and Human Behavior (2006)
  • R.A. Yeo et al. The evolutionary-genetic underpinnings of schizophrenia: The developmental instability model. Schizophrenia Research (1999)
  • W.M. Brown et al. Dance reveals symmetry especially in young men. Nature (2005)
  • W.M. Brown et al. Fluctuating asymmetry and preferences for sex-typical bodily characteristics. Proceedings of the National Academy of Sciences USA (2008)
  • S. Duval et al. A non-parametric “Trim and Fill” method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association (2000)
  • P.T. Ellison. Energetics and reproductive effort. American Journal of Human Biology (2003)
  • F.B. Furlow et al. Fluctuating asymmetry and psychometric intelligence. Proceedings of the Royal Society of London B (1997)
  • B. Furlow et al. Fluctuating asymmetry and human violence. Proceedings of the Royal Society of London B (1998)
  • S.W. Gangestad et al. A latent variable model of developmental instability in relation to men's number of sex partners. Proceedings of the Royal Society of London B (2001)
  • S.W. Gangestad et al. Changes in women's mate preferences across the ovulatory cycle. Journal of Personality and Social Psychology (2007)
  • S.W. Gangestad et al. Women's preferences for male behavioral displays change across the menstrual cycle. Psychological Science (2004)
  • S.W. Gangestad et al. Menstrual cycle variation in women's preferences for the scent of symmetrical men. Proceedings of the Royal Society of London B (1998)
  • S.W. Gangestad et al. Individual differences in developmental precision and fluctuating asymmetry: A model and its implications. Journal of Evolutionary Biology (1999)
  • S.W. Gangestad et al. Fluctuating asymmetry, developmental instability, and fitness: Toward model-based interpretation
  • D. Houle. Comment on 'A meta-analysis of the heritability of developmental stability' by Møller & Thornhill. Journal of Evolutionary Biology (1997)
  • D. Houle. A simple model of the relationship between asymmetry and developmental stability. Journal of Evolutionary Biology (2000)
  • M.D. Jennions et al. Relationships fade with time: A meta-analysis of temporal trends in publication in ecology and evolution. Proceedings of the Royal Society Series B (2002)
  • E. Kizilkaya et al. Asymmetry of the height of the ethmoid roof in relationship to handedness. Laterality (2006)
  • C.P. Klingenberg. Developmental instability as a research tool: Using patterns of fluctuating asymmetry to infer the developmental origins of morphological integration