Abstract
Metabolism and evolution are closely connected: if a mutation incurs extra energetic costs for an organism, there is a baseline selective disadvantage that may or may not be compensated for by other adaptive effects. A long-standing, but to date unproven, hypothesis is that this disadvantage is equal to the fractional cost relative to the total resting metabolic expenditure. This hypothesis has found a recent resurgence as a powerful tool for quantitatively understanding the strength of selection among different classes of organisms. Our work explores the validity of the hypothesis from first principles through a generalized metabolic growth model, versions of which have been successful in describing organismal growth from single cells to higher animals. We build a mathematical framework to calculate how perturbations in maintenance and synthesis costs translate into contributions to the selection coefficient, a measure of relative fitness. This allows us to show that the hypothesis is an approximation to the actual baseline selection coefficient. Moreover we can directly derive the correct prefactor in its functional form, as well as analytical bounds on the accuracy of the hypothesis for any given realization of the model. We illustrate our general framework using a special case of the growth model, which we show provides a quantitative description of overall metabolic synthesis and maintenance expenditures in data collected from a wide array of unicellular organisms (both prokaryotes and eukaryotes). In all these cases we demonstrate that the hypothesis is an excellent approximation, allowing estimates of baseline selection coefficients to within 15% of their actual values. Even in a broader biological parameter range, covering growth data from multicellular organisms, the hypothesis continues to work well, always within an order of magnitude of the correct result. Our work thus justifies its use as a versatile tool, setting the stage for its wider deployment.
Introduction
Discovering optimality principles in biological function has been a major goal of biophysics [1–6], but the competition between genetic drift and natural selection means that evolution is not purely an optimization process [7–9]. A necessary complement to elucidating optimality is clarifying under what circumstances selection is actually strong enough relative to drift in order to drive systems toward local optima in the fitness landscape. In this work we focus on one key component of this problem: quantifying the selective pressure on the extra metabolic costs associated with a genetic variant. We validate a long hypothesized relation [10–12] between this pressure and the fractional change in the total resting metabolic expenditure of the organism.
The effectiveness of selection versus drift hinges on two non-dimensional parameters [13]: i) the selection coefficient $s$, a measure of the fitness of the mutant versus the wild-type: mutants will have on average $1+s$ offspring relative to the wild-type per wild-type generation time; ii) the effective population size $N_e$ of the organism, the size of an idealized, randomly mating population that exhibits the same decrease in genetic diversity per generation due to drift as the actual population (with size $N$). For a deleterious mutant ($s < 0$) where $|s| \gg N_e^{-1}$, natural selection is dominant, with the probability of the mutant fixing in the population exponentially suppressed. In contrast, if $|s| \ll N_e^{-1}$, drift is dominant, with the fixation probability being approximately the same as for a neutral mutation [7]. Thus the magnitude of $N_e^{-1}$ determines the “drift barrier” [14], the critical minimum scale of the selection coefficient for natural selection to play a non-negligible role.
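The two regimes can be illustrated numerically. The sketch below is not taken from the text: it assumes the standard haploid Wright-Fisher diffusion approximation for the fixation probability of a single new mutant with $N_e = N$, and the value of $s$ and the function name `fixation_ratio` are hypothetical choices for illustration.

```python
import math

def fixation_ratio(s, n):
    """Fixation probability of a single new mutant relative to a neutral one.

    Assumption (not from the text): haploid Wright-Fisher diffusion
    approximation with N_e = N, u = (1 - exp(-2s)) / (1 - exp(-2 N s)),
    compared against the neutral fixation probability 1/N.
    """
    x = 2.0 * n * s
    if abs(x) < 1e-12:
        return 1.0          # neutral limit
    if x < -700.0:
        return 0.0          # exp(-x) would overflow; the ratio is negligible
    u = math.expm1(-2.0 * s) / math.expm1(-x)
    return u * n

s = -1e-6  # hypothetical deleterious selection coefficient
for n_e in (1e4, 1e6, 1e9):
    ratio = fixation_ratio(s, n_e)
    print(f"N_e = {n_e:.0e}: N_e|s| = {n_e * abs(s):.0e}, "
          f"fixation relative to neutral = {ratio:.3g}")
```

Under these assumptions the same $s$ is nearly neutral when $N_e|s| \ll 1$ and is strongly suppressed when $N_e|s| \gg 1$, which is the sense in which $N_e^{-1}$ sets the drift barrier.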
The long-term effective population size $N_e$ of an organism is typically smaller than the instantaneous actual $N$, and can be estimated empirically across a broad spectrum of life: it varies from as high as $10^9$–$10^{10}$ in many bacteria, to $10^6$–$10^8$ in unicellular eukaryotes, down to $\sim 10^6$ in invertebrates and $\sim 10^4$ in vertebrates [12, 15]. The corresponding six orders of magnitude variation in the drift barrier has immense ramifications for how we understand selection in prokaryotes versus eukaryotic organisms, particularly in the context of genome complexity [16–18]. For example, consider a mutant with an extra genetic sequence relative to the wild-type. We can separate $s$ into two contributions, $s = s_c + s_a$ [12]: $s_c$ is the baseline selection coefficient associated with the metabolic costs of having this sequence, i.e. the costs of replicating it during cell division, synthesizing any associated mRNA / proteins, as well as the maintenance costs associated with turnover of those components; $s_a$ is the correction due to any adaptive consequences of the sequence beyond its baseline metabolic costs. For a prokaryote with a low drift barrier $N_e^{-1}$, even the relatively low costs associated with replication and transcription are often under selective pressure [11, 12], unless $s_c < 0$ is compensated for by an $s_a > 0$ of comparable or larger magnitude [19]. For the much greater costs of translation, the impact on growth rates of unnecessary protein production is large enough to be directly seen in experiments on bacteria [1, 20]. In contrast, for a eukaryote with sufficiently high $N_e^{-1}$, the same $s_c$ might be effectively invisible to selection, even if $s_a = 0$. Thus even genetic material that initially provides no adaptive advantage can be readily fixed in a population, making eukaryotes susceptible to non-coding “bloat” in the genome. But this also provides a rich palette of genetic materials from which the complex variety of eukaryotic regulatory mechanisms can subsequently evolve [12, 21].
Part of the explanatory power of this idea is the fact that the $s_c$ of a particular genetic variant should in principle be predictable from underlying physical principles. In fact, a very plausible hypothesis is that $s_c \approx -\delta C_T/C_T$, where $C_T$ is the total resting metabolic expenditure of an organism per generation time, and $\delta C_T$ is the extra expenditure of the mutant versus the wild-type. This relation can be traced at least as far back as the famous “selfish DNA” paper of Orgel and Crick [10], where it was mentioned in passing. But its true usefulness was only shown more recently, in the notable works of Wagner [11] on yeast and Lynch & Marinov [12] on a variety of prokaryotes and unicellular eukaryotes. By doing a detailed biochemical accounting of energy expenditures, they used the relation to derive values of $s_c$ that provided intuitive explanations of the different selective pressures faced by different classes of organisms. The relation provides a Rosetta stone, translating metabolic costs into evolutionary terms. And its full potential is still being explored, most recently in describing the energetics of viral infection [22].
Despite its plausibility and long pedigree, to our knowledge this relation has never been justified in complete generality from first principles. We do so through a general bioenergetic growth model, versions of which have been applied across the spectrum of life [23–25], from unicellular organisms to complex vertebrates. We show that the relation is universal to an excellent approximation across the entire biological parameter range.
Growth model
Let $\Pi(m(t))$ [unit: W] be the average power input into the resting metabolism of an organism (the metabolic expenditure after locomotion and other activities are accounted for [24]). $\Pi(m(t))$ can be an arbitrary function of the organism’s current mass $m(t)$ [unit: g] at time $t$. This power is partitioned into maintenance of existing biological mass (i.e. the turnover energy costs associated with the constant replacement of cellular components lost to degradation), and growth of new mass (i.e. synthesis of additional components during cellular replication) [26]. Energy conservation implies
$$\Pi(m(t)) = B(m(t))\,m(t) + E(m(t))\,\frac{dm}{dt}. \tag{1}$$
Here $B(m(t))$ [unit: W/g] is the maintenance cost per unit mass, and $E(m(t))$ [unit: J/g] is the synthesis cost per unit mass. We allow both these quantities to be arbitrary functions of $m(t)$.
Though we will derive our main result for the fully general model of Eq. (1), we will also explore a special case: $\Pi(m(t)) = \Pi_0 m^{\alpha}(t)$, $B(m(t)) = B_m$, $E(m(t)) = E_m$, with scaling exponent $\alpha$ and constants $\Pi_0$, $B_m$, and $E_m$ [25]. Allometric scaling of $\Pi(m(t))$ with $\alpha = 3/4$ across many different species was first noted in the work of Max Kleiber in the 1930s [27], and with the assumption of time-independent $B(m(t))$ and $E(m(t))$ leads to a successful description of the growth curves of many higher animals [23, 24]. However, recently there has been evidence that $\alpha = 3/4$ may not be universal [28, 29]. Higher animals still exhibit $\alpha < 1$ (with debate over $\alpha = 2/3$ versus $3/4$ [30]), but unicellular organisms have a broader range $\alpha \lesssim 2$. Thus we will use the model of Ref. [25] with an arbitrary species-dependent exponent $\alpha$. While the resulting description is reasonable as a first approximation, particularly for unicellular organisms, one can easily imagine scenarios where the exponent and maintenance costs might vary between different developmental stages [31]. For the case of maintenance in endothermic animals, which in our approach includes all non-growth-related expenditures, more energy per unit mass is allocated to heat production as the organism matures [32], effectively increasing the cost of maintenance. In the Supplementary Information (SI) Sec. V [33] we show how the generalized model works in this scenario, using experimental growth data from two endothermic bird species [34]. Thus it is useful to initially consider the model in complete generality.
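As a worked illustration of the special case, assume the energy balance $\Pi = Bm + E\,dm/dt$ implied by the definitions of $B$ and $E$ above, and substitute the allometric forms. The resulting growth law, the condition for positive growth, the asymptotic mass $M$ for $\alpha < 1$, and the exponential solution at $\alpha = 1$ follow directly; the symbol $M$ is introduced here for illustration only.

```latex
\begin{align}
  \frac{dm}{dt} &= \frac{\Pi_0 m^{\alpha} - B_m m}{E_m},
  &
  \frac{dm}{dt} > 0 \;&\Longleftrightarrow\; \Pi_0 m^{\alpha-1} > B_m, \\
  % alpha < 1: growth slows and stops at the mass where input equals maintenance
  M &= \left(\frac{\Pi_0}{B_m}\right)^{1/(1-\alpha)}
  & &(\alpha < 1), \\
  % alpha = 1: growth is exponential provided Pi_0 > B_m
  m(t) &= m_0 \exp\!\left[\frac{(\Pi_0 - B_m)\,t}{E_m}\right]
  & &(\alpha = 1).
\end{align}
```

For $\alpha > 1$ the inequality is reversed in $m$: growth is positive only above the threshold mass $(B_m/\Pi_0)^{1/(\alpha-1)}$, so the birth mass must already exceed it.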
Baseline selection coefficient for metabolic costs
To derive an expression for $s_c$ for the growth model of Eq. (1), we first focus on the generation time $t_r$, since this will be affected by alterations in metabolic costs. $t_r$ is the typical age of reproduction, defined explicitly for any population model in SI Sec. I, where we relate it to the population birth rate $r$ through $r = \ln(R_b)/t_r$ [35, 36]. Here $R_b$ is the mean number of offspring per individual. Let $\epsilon = m_r/m_0$ be the ratio of the mass $m_r = m(t_r)$ at reproductive maturity to the birth mass $m_0 = m(0)$. For example, in the case of symmetric binary fission of a unicellular organism, $R_b \approx \epsilon \approx 2$ (see SI Sec. III for a discussion of $\epsilon$ in more general models of cell size homeostasis). Since $m(t)$ is a monotonically increasing function of $t$ for any physically realistic growth model, we can invert Eq. (1) to write the infinitesimal time interval $dt$ associated with an infinitesimal increase of mass $dm$ as $dt = dm\,E(m)/G(m)$, where $G(m) \equiv \Pi(m) - B(m)m$ is the amount of power channeled to growth, and we have switched variables from $t$ to $m$. Note that $G(m)$ must be positive over the $m$ range to ensure that $dm/dt > 0$. Integrating $dt$ gives us an expression for $t_r$,
$$t_r = \int_{m_0}^{\epsilon m_0} dm\,\frac{E(m)}{G(m)}. \tag{2}$$
If we are interested in finding $s_c$ for a genetic variation, we can focus on the additional metabolic costs due to that variation. For the purposes of calculation, this means treating the mutation as if it does not alter biological function in any other respect, including the ability of the organism to assimilate energy for its resting metabolism through uptake of nutrients or foraging. If the mutation actually had only metabolic cost effects, the full selection coefficient would be $s = s_c$. However, mutations can generically affect both metabolic costs and power input (and/or other adaptive aspects), so $s = s_c + s_a$, with a correction term $s_a$ due to the adaptive effects [12]. In the latter case $s_c$ can still be calculated as shown below (ignoring adaptive effects) and interpreted as the baseline contribution to selection due to metabolic costs. While we do not focus on $s_a$ here, our theory can be readily extended to consider adaptive contributions as well, as illustrated in SI Sec. VII, including aspects like spare respiratory capacity. This broader formalism is summarized in Fig. S3.
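The generation-time integral, $t_r = \int_{m_0}^{\epsilon m_0} dm\,E(m)/G(m)$ with $G(m) = \Pi(m) - B(m)m$, is straightforward to evaluate numerically. The following is a minimal sketch for the allometric special case, using the interspecies fits for $E_m$ and $B_m$ quoted later in the text; the value of `pi0`, the birth mass `m0`, and the helper names are hypothetical. For $\alpha = 1$ it is checked against the closed form $t_r = E_m \ln\epsilon/(\Pi_0 - B_m)$.

```python
import math

E_M = 2600.0   # synthesis cost per unit mass, J/g (fit quoted in the text)
B_M = 7e-3     # maintenance cost per unit mass, W/g (fit quoted in the text)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def generation_time(pi0, alpha, m0=1.0, eps=2.0, e_m=E_M, b_m=B_M):
    """t_r = integral of E/G from m0 to eps*m0 for Pi(m) = pi0 * m**alpha."""
    def integrand(m):
        growth_power = pi0 * m**alpha - b_m * m   # G(m), must stay positive
        if growth_power <= 0:
            raise ValueError("G(m) <= 0: the organism cannot grow at this mass")
        return e_m / growth_power
    return simpson(integrand, m0, eps * m0)

pi0 = 0.5  # hypothetical power-input prefactor, W/g for alpha = 1
t_numeric = generation_time(pi0, alpha=1.0)
t_exact = E_M * math.log(2.0) / (pi0 - B_M)   # closed form for alpha = 1
print(f"t_r (quadrature) = {t_numeric:.3f} s, t_r (closed form) = {t_exact:.3f} s")
```

The guard on `growth_power` mirrors the requirement that $G(m)$ be positive over the whole mass range.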
Proceeding with the $s_c$ derivation, the products of the genetic variation (i.e. extra mRNA transcripts or translated proteins) may alter the mass of the mutant, which we denote by $\tilde{m}(t)$. The left-hand side of Eq. (1) remains $\Pi(m(t))$, where $m(t)$ is now the unperturbed mass of the organism (the mass of all the pre-variation biological materials). The power input $\Pi(m(t))$ depends on $m(t)$ rather than $\tilde{m}(t)$ since only $m(t)$ contributes to the processes that allow the organism to process nutrients, in accordance with the assumption that power input is unaltered in order to calculate $s_c$. It is also convenient to express our dynamics in terms of $m(t)$ rather than $\tilde{m}(t)$, since the condition defining reproductive time $t_r$ remains unchanged, $m(t_r) = \epsilon m_0$, or in other words when the unperturbed mass reaches $\epsilon$ times the initial unperturbed mass $m_0$. Thus Eq. (1) for the mutant takes the form
$$\Pi(m(t)) = \tilde{B}(m(t))\,m(t) + \tilde{E}(m(t))\,\frac{dm}{dt},$$
where $\tilde{B} = B + \delta B$ and $\tilde{E} = E + \delta E$ are the mutant maintenance and synthesis costs. For simplicity, we assume the perturbations $\delta B$ and $\delta E$ are independent of $m(t)$, though this assumption can be relaxed. In SI Sec. IV, we show a sample calculation of $\delta B$ and $\delta E$ for mutations in E. coli and fission yeast involving short extra genetic sequences transcribed into non-coding RNA. This provides a concrete illustration of the framework we now develop.
Changes in the metabolic terms will perturb the generation time, $\tilde{t}_r = t_r + \delta t_r$, and consequently the birth rate, $\tilde{r} = r + \delta r$. The corresponding baseline selection coefficient $s_c$ can be exactly related to $\delta t_r/t_r$, the fractional change in $t_r$, through
$$s_c = R_b^{-\frac{\delta t_r/t_r}{1 + \delta t_r/t_r}} - 1$$
(see SI Sec. I). This relation can be approximated as $s_c \approx -\ln(R_b)\,\delta t_r/t_r$ when $|\delta t_r/t_r| \ll 1$, the regime of interest when making comparisons to drift barriers $N_e^{-1} \ll 1$. In this regime $s_c/\ln(R_b) \approx \delta r/r$, the fractional change in birth rate. While we focus here on the simplest case of exponential population growth, where $r$ is time-independent, we generalize our approach to density-dependent growth models, where $r$ varies between generations, in SI Sec. VI.
$\delta t_r/t_r$ can be written in a way that directly highlights the contributions of $\delta E$ and $\delta B$ to $s_c$. To facilitate this, let us define the average of any function $F(m(t))$ over a single generation time $t_r$ as $\langle F \rangle \equiv t_r^{-1}\int_0^{t_r} dt\,F(m(t))$. Changing variables from $t$ to $m$, like we did above in deriving Eq. (2), we can write this equivalently as $\langle F \rangle = \int_{m_0}^{\epsilon m_0} dm\,F(m)\,p(m)$, where $p(m) \equiv t_r^{-1} E(m)/G(m)$. The value $p(m)\,dm$ is just the fraction of the generation time that the organism spends growing from mass $m$ to mass $m + dm$. Expanding Eq. (2) for $t_r$ to first order in the perturbations $\delta E$ and $\delta B$, the coefficient
$$s_c \approx -\ln(R_b)\left(\sigma_E \frac{\delta E}{\langle E \rangle} + \sigma_B \frac{\delta B}{\langle B \rangle}\right), \tag{3}$$
with positive dimensionless prefactors
$$\sigma_E = \langle E \rangle \langle E^{-1} \rangle, \qquad \sigma_B = \langle B \rangle \langle \Theta^{-1} \rangle.$$
Here $\Theta(m) \equiv G(m)/m$, and $F^{-1}(m) \equiv 1/F(m)$ for any $F$. The magnitude of $\sigma_B$ versus $\sigma_E$ describes how much fractional increases in maintenance costs matter for selection relative to fractional increases in synthesis costs. We see that both prefactors are products of time averages of functions related to metabolism. See SI Sec. II for a detailed derivation of Eq. (3), and also Eq. (4) below.
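The first-order expansion can be checked numerically. The sketch below assumes the prefactors take the form $\sigma_E = \langle E\rangle\langle E^{-1}\rangle$ and $\sigma_B = \langle B\rangle\langle\Theta^{-1}\rangle$, with $\langle F\rangle = \int dm\,F(m)\,E(m)/[t_r G(m)]$, which is what expanding the generation-time integral to first order in $\delta E$ and $\delta B$ gives. It compares them against finite-difference changes in $t_r$ for the allometric model. The values of `PI0`, `ALPHA`, `M0`, and the perturbation size are hypothetical.

```python
import math

E_M, B_M = 2600.0, 7e-3        # J/g and W/g (fits quoted in the text)
PI0, ALPHA = 0.05, 0.75        # hypothetical power-law input, W/g^alpha
M0, EPS, R_B = 1.0, 2.0, 2.0   # hypothetical birth mass (g), mass ratio, offspring number

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def growth_power(m, b_m=B_M):
    return PI0 * m**ALPHA - b_m * m          # G(m)

def generation_time(b_m=B_M, e_m=E_M):
    return simpson(lambda m: e_m / growth_power(m, b_m), M0, EPS * M0)

def average(f):
    """<F> = integral of F(m) p(m) dm, with p(m) = E / (t_r G(m))."""
    t_r = generation_time()
    return simpson(lambda m: f(m) * E_M / growth_power(m), M0, EPS * M0) / t_r

sigma_e = average(lambda m: E_M) * average(lambda m: 1.0 / E_M)
sigma_b = average(lambda m: B_M) * average(lambda m: m / growth_power(m))  # m/G = 1/Theta

# Finite-difference check: fractional change in t_r per fractional change in cost.
rel = 1e-4
t0 = generation_time()
fd_b = (generation_time(b_m=B_M * (1 + rel)) - t0) / t0 / rel
fd_e = (generation_time(e_m=E_M * (1 + rel)) - t0) / t0 / rel

s_c = -math.log(R_B) * (sigma_e * rel + sigma_b * rel)
print(f"sigma_E = {sigma_e:.6f} (finite difference {fd_e:.6f})")
print(f"sigma_B = {sigma_b:.6f} (finite difference {fd_b:.6f})")
print(f"s_c for a {rel:.0e} fractional rise in both costs: {s_c:.3e}")
```

Agreement between each prefactor and its finite-difference estimate is the numerical counterpart of the statement that $\delta t_r/t_r$ is linear in the fractional cost changes to first order.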
Relating the baseline selection coefficient to the fractional change in total resting metabolic costs
The final step in our theoretical framework is to connect the above considerations to the total resting metabolic expenditure $C_T$ of the organism per generation time $t_r$, given by $C_T = \zeta \int_0^{t_r} dt\,\Pi(m(t)) = \zeta t_r \langle \Pi \rangle$. To compare with the experimental data of Ref. [12], compiled in terms of phosphate bonds hydrolyzed [P], we add the prefactor $\zeta$, which converts from units of J to P. Assuming an ATP hydrolysis energy of 50 kJ/mol under typical cellular conditions, we set $\zeta = 1.2 \times 10^{19}$ P/J. The genetic variation discussed above perturbs the total cost, $\tilde{C}_T = C_T + \delta C_T$, and the fractional change $\delta C_T/C_T$ can be expressed in a form analogous to Eq. (3), namely
$$\frac{\delta C_T}{C_T} = \sigma'_E \frac{\delta E}{\langle E \rangle} + \sigma'_B \frac{\delta B}{\langle B \rangle}, \qquad \sigma'_E = \frac{\langle E \rangle \langle \Pi E^{-1} \rangle}{\langle \Pi \rangle}, \qquad \sigma'_B = \frac{\langle B \rangle \langle \Pi \Theta^{-1} \rangle}{\langle \Pi \rangle}, \tag{4}$$
where again the prefactors are expressed in terms of time averages over metabolic functions. The connection between $s_c$ and $\delta C_T/C_T$ can be constructed by comparing Eq. (3) with Eq. (4). We see that $s_c = -\ln(R_b)\,\delta C_T/C_T$ for all possible perturbations $\delta E$ and $\delta B$ only when $\sigma_E = \sigma'_E$ and $\sigma_B = \sigma'_B$. We derive strict bounds on the differences between the prefactors (SI Sec. II), which show that the relation is exact when: i) $\Pi(m)$ is a constant independent of $m$; and/or ii) $E(m)$ and $\Theta(m)$ are independent of $m$. Outside these cases, the relation is an approximation. To see how well it holds, it is instructive to investigate the allometric growth model described earlier, where $\Pi(m(t)) = \Pi_0 m^{\alpha}(t)$, $E(m(t)) = E_m$, $B(m(t)) = B_m$.
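The two exactness conditions can be seen in a small numerical experiment. The sketch assumes $\sigma_B = \langle B\rangle\langle\Theta^{-1}\rangle$ and $\sigma'_B = \langle B\rangle\langle\Pi\Theta^{-1}\rangle/\langle\Pi\rangle$ (the forms obtained by expanding $t_r$ and $C_T$ to first order in $\delta B$) and evaluates both in the allometric model. `PI0` and the birth mass are hypothetical, and $\alpha = 0$ and $\alpha = 1$ correspond to conditions i) and ii) respectively.

```python
E_M, B_M = 2600.0, 7e-3   # J/g and W/g (fits quoted in the text)
PI0 = 0.05                # hypothetical power-input prefactor, W/g^alpha
M0, EPS = 1.0, 2.0        # hypothetical birth mass (g) and mass ratio

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def prefactors(alpha):
    """Return (sigma_B, sigma_B') for Pi(m) = PI0 * m**alpha."""
    power = lambda m: PI0 * m**alpha                 # Pi(m)
    growth = lambda m: power(m) - B_M * m            # G(m)
    weight = lambda m: E_M / growth(m)               # proportional to p(m)
    inv_theta = lambda m: m / growth(m)              # 1 / Theta(m)
    a, b = M0, EPS * M0
    sigma_b = B_M * simpson(lambda m: inv_theta(m) * weight(m), a, b) / simpson(weight, a, b)
    sigma_b_prime = (B_M * simpson(lambda m: power(m) * inv_theta(m) * weight(m), a, b)
                     / simpson(lambda m: power(m) * weight(m), a, b))
    return sigma_b, sigma_b_prime

discrepancy = {}
for alpha in (0.0, 1.0, 2.0):
    sigma_b, sigma_b_prime = prefactors(alpha)
    discrepancy[alpha] = abs(1.0 - sigma_b_prime / sigma_b)
    print(f"alpha = {alpha}: sigma_B = {sigma_b:.4f}, sigma_B' = {sigma_b_prime:.4f}, "
          f"|1 - sigma_B'/sigma_B| = {discrepancy[alpha]:.2e}")
```

With constant $\Pi$ ($\alpha = 0$) the weighting by $\Pi$ cancels, and with constant $\Theta$ ($\alpha = 1$) both averages reduce to $B_m/\Theta$, so the discrepancy vanishes up to round-off in those two cases.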
Testing the relation in an allometric growth model
We use model parameters based on the metabolic data of Ref. [12], covering a variety of prokaryotes and unicellular eukaryotes. These data consist of two quantities, $C_G$ and $C_M$, which reflect the growth and maintenance contributions to $C_T$. Using Eq. (1) to decompose $\Pi(m(t))$, we can write $C_T = C_G + t_r C_M$, where $C_G = \zeta E_m m_0 (\epsilon - 1)$ is the expenditure for growing the organism, and $C_M = \zeta \langle B_m m \rangle = \zeta B_m \langle m \rangle$ is the mean metabolic expenditure for maintenance per unit time. $C_G$ and $C_M$ scale linearly with cell volume (SI Sec. III), and best fits to the data, shown in Fig. 1, yield global interspecies averages: $E_m = 2{,}600$ J/g and $B_m = 7 \times 10^{-3}$ W/g. As discussed in the SI, these values are remarkably consistent with earlier, independent estimates for unicellular and higher organisms [24, 25, 37, 38].
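The decomposition $C_T = C_G + t_r C_M$ and the conversion factor $\zeta$ can be checked with a few lines of arithmetic. The sketch below takes $C_G = \zeta E_m m_0(\epsilon - 1)$, which follows from integrating the synthesis term $E_m\,dm/dt$ over one generation, and uses the $\alpha = 1$ allometric model. The birth mass `M0` and the one-hour target generation time are hypothetical illustration values, not fitted quantities.

```python
import math

N_A = 6.02214076e23           # Avogadro constant, 1/mol
ZETA = N_A / 50e3             # phosphate bonds per joule at 50 kJ/mol ATP hydrolysis

E_M, B_M = 2600.0, 7e-3       # J/g and W/g (fits quoted in the text)
M0, EPS = 1e-12, 2.0          # hypothetical birth mass (g) and mass ratio
T_TARGET = 3600.0             # hypothetical generation time, s
PI0 = B_M + E_M * math.log(EPS) / T_TARGET   # alpha = 1 prefactor giving t_r = T_TARGET

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

growth = lambda m: (PI0 - B_M) * m            # G(m) for alpha = 1
dt_dm = lambda m: E_M / growth(m)             # time spent per unit mass gained

t_r = simpson(dt_dm, M0, EPS * M0)
mean_mass = simpson(lambda m: m * dt_dm(m), M0, EPS * M0) / t_r   # <m>

c_g = ZETA * E_M * M0 * (EPS - 1.0)           # growth expenditure, P
c_m = ZETA * B_M * mean_mass                  # maintenance expenditure rate, P/s
c_t = ZETA * simpson(lambda m: PI0 * m * dt_dm(m), M0, EPS * M0)  # zeta * t_r * <Pi>

print(f"zeta = {ZETA:.3e} P/J, t_r = {t_r:.1f} s")
print(f"C_G = {c_g:.3e} P, t_r * C_M = {t_r * c_m:.3e} P, C_T = {c_t:.3e} P")
```

For this fast-dividing example the growth term dominates $C_T$, in line with the small maintenance prefactor expected at short generation times.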
Since $E(m(t)) = E_m$ is a constant in the allometric growth model, $\sigma_E = 1$ from Eq. (3), and $\sigma_E = \sigma'_E$ holds exactly from Eq. (4). So the only aspect of the approximation that needs to be tested is the similarity between $\sigma_B$ and $\sigma'_B$. Fig. 2A shows $\sigma_B$ versus $\sigma'_B$ for the range $\alpha = 0$–$3$, which includes the whole spectrum of biological scaling [28] up to $\alpha = 2$, plus some larger $\alpha$ for illustration. For a given $\alpha$, the coefficient $\Pi_0$ has been set to yield a certain division time $t_r = 1$–$40$ hr, encompassing both the fast and slow extremes of typical unicellular reproductive times. In all cases $\sigma'_B$ is in excellent agreement with $\sigma_B$. For the range $\alpha \le 2$ the discrepancy is less than 15%, and it is in fact zero at the special points $\alpha = 0, 1$. Clearly the approximation begins to break down at $\alpha \gg 1$, but it remains sound in the biologically relevant regimes. Note that $\sigma_B$ values for $t_r = 1$ hr are $\sim 0.01$, reflecting the minimal contribution of maintenance relative to synthesis costs in determining the selection coefficient for fast-dividing organisms. This limit is consistent with microbial metabolic flux theory [39], where maintenance is typically neglected, so $s_c = -\ln(R_b)\,\delta C_T/C_T$ exactly (since only $\sigma_E = \sigma'_E = 1$ matters). As $t_r$ increases, so does $\sigma_B$ and hence the influence of maintenance costs, so by $t_r = 40$ hr, $\sigma_B$ is comparable to $\sigma_E$.
To make a more comprehensive analysis of the validity of the relation, we do a computational search for the worst case scenarios: for each value of $\alpha$ and $\epsilon$, we can numerically determine the set of other growth model parameters that gives the largest discrepancy $|1 - \sigma'_B/\sigma_B|$. Fig. 2B shows a contour diagram of the results on a logarithmic scale, $\log_{10}|1 - \sigma'_B/\sigma_B|$, as a function of $\alpha$ and $\epsilon$. Estimated values for $\alpha$ and $\epsilon$ from the growth trajectories of various species are plotted as symbols to show the typical biological regimes. While the maximum discrepancies are smaller for the parameter ranges of unicellular organisms (circles) compared to multicellular ones (triangles), in all cases the discrepancy is less than 50%. To observe a serious error ($\sigma'_B$ a different order of magnitude than $\sigma_B$), one must go to the large $\alpha$, large $\epsilon$ limit (top right of the diagram), which no longer corresponds to biologically relevant growth trajectories.
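A scan of this kind can be sketched in a few dozen lines. This is not the authors' code: it assumes $\sigma_B = \langle B\rangle\langle\Theta^{-1}\rangle$ and $\sigma'_B = \langle B\rangle\langle\Pi\Theta^{-1}\rangle/\langle\Pi\rangle$, sets the birth mass to 1 (any other choice can be absorbed into $\Pi_0$), and solves for $\Pi_0$ by bisection so that $t_r$ hits a target value. The grid of $\alpha$ values and the helper names are illustrative choices.

```python
import math

E_M, B_M, EPS = 2600.0, 7e-3, 2.0   # J/g, W/g, mass ratio at division

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def generation_time(pi0, alpha):
    return simpson(lambda m: E_M / (pi0 * m**alpha - B_M * m), 1.0, EPS)

def solve_pi0(alpha, t_target):
    """Bisection (in log space) for the Pi_0 that gives t_r = t_target."""
    lo = B_M * max(1.0, EPS ** (1.0 - alpha)) * (1.0 + 1e-6)  # just above G(m) > 0 limit
    hi = 1e3
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if generation_time(mid, alpha) > t_target:
            lo = mid        # growth too slow: need more power input
        else:
            hi = mid
    return math.sqrt(lo * hi)

def prefactors(pi0, alpha):
    power = lambda m: pi0 * m**alpha
    weight = lambda m: E_M / (power(m) - B_M * m)       # proportional to p(m)
    inv_theta = lambda m: m / (power(m) - B_M * m)      # 1 / Theta(m)
    sigma_b = (B_M * simpson(lambda m: inv_theta(m) * weight(m), 1.0, EPS)
               / simpson(weight, 1.0, EPS))
    sigma_b_prime = (B_M * simpson(lambda m: power(m) * inv_theta(m) * weight(m), 1.0, EPS)
                     / simpson(lambda m: power(m) * weight(m), 1.0, EPS))
    return sigma_b, sigma_b_prime

results = {}
for hours in (1.0, 40.0):
    for alpha in (0.0, 0.5, 1.0, 1.5, 2.0):
        pi0 = solve_pi0(alpha, hours * 3600.0)
        sigma_b, sigma_b_prime = prefactors(pi0, alpha)
        results[(alpha, hours)] = (sigma_b, abs(1.0 - sigma_b_prime / sigma_b))
        print(f"t_r = {hours:4.0f} hr, alpha = {alpha}: sigma_B = {sigma_b:.4f}, "
              f"discrepancy = {results[(alpha, hours)][1]:.3f}")
```

At $\alpha = 1$ the result can be compared with the closed form $\sigma_B = B_m t_r/(E_m \ln\epsilon)$, which makes the growth of the maintenance contribution with generation time explicit.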
Validity of the relation in more complex growth scenarios
Going beyond the simple allometric model, SI Sec. V analyzes avian growth data, where the metabolic scaling exponent varies between developmental stages. We find $\sigma_E = \sigma'_E = 1$ and the discrepancy $|1 - \sigma'_B/\sigma_B| \le 30\%$. SI Sec. VI considers density-dependent growth, illustrated by examples of bacteria competing for a limited resource in a chemostat and predators competing for prey. Remarkably, when these systems approach a stationary state in total population and resource/prey quantity, we find $\sigma_E = \sigma'_E = 1$ and $\sigma_B = \sigma'_B = (B_m \ln R_b)/(E_m d \ln \epsilon)$, where $d$ is the dilution rate in the chemostat, or the predator death rate. The simple expression for $\sigma_B$ allows straightforward estimation of the maintenance contribution to selection. For the chemostat, that contribution can be tuned experimentally through the dilution rate $d$.
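As a quick illustration of how the stationary-state expression $\sigma_B = (B_m \ln R_b)/(E_m d \ln\epsilon)$ responds to the dilution rate, the following sketch evaluates it with the interspecies fits for $B_m$ and $E_m$. The dilution rates and the function name `sigma_b_stationary` are hypothetical.

```python
import math

def sigma_b_stationary(b_m, e_m, d, r_b=2.0, eps=2.0):
    """Maintenance prefactor at stationarity: (B_m ln R_b) / (E_m d ln eps).

    b_m in W/g, e_m in J/g, d in 1/s; r_b and eps default to binary fission.
    """
    return b_m * math.log(r_b) / (e_m * d * math.log(eps))

B_M, E_M = 7e-3, 2600.0                      # fits quoted in the text
for d_per_hour in (1.0, 0.1, 0.02):          # hypothetical chemostat dilution rates
    d = d_per_hour / 3600.0                  # convert to 1/s
    print(f"d = {d_per_hour:5.2f} /hr -> sigma_B = {sigma_b_stationary(B_M, E_M, d):.4f}")
```

Because $\sigma_B \propto 1/d$, slowing the dilution rate tenfold raises the maintenance contribution tenfold in this expression.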
Conclusion
We thus reach the conclusion that the baseline selection coefficient for metabolic costs can be reliably approximated as $s_c \approx -\ln(R_b)\,\delta C_T/C_T$. As in the original hypothesis [10–12], $-\delta C_T/C_T$ is the dominant contribution to the scale of $s_c$, with corrections provided by the logarithmic factor $\ln(R_b)$. Our derivation puts the relation for $s_c$ on a solid footing, setting the stage for its wider deployment. It deserves a far greater scope of applications beyond the pioneering studies of Refs. [11, 12, 22]. Knowledge of $s_c$ can also be used to deduce the adaptive contribution $s_a = s - s_c$ of a mutation, which has its own complex connection to metabolism [40] (see also SI Sec. VII). The latter requires measurement of the overall selection coefficient $s$, for example from competition/growth assays, and the calculation of $s_c$ from the relation, assuming the underlying energy expenditures are well characterized. The $s_c$ relation underscores the key role of thermodynamic costs in shaping the interplay between natural selection and genetic drift. Indeed, it gives impetus to a major goal for future research: a comprehensive account of those costs for every aspect of biological function, and how they vary between species, what one might call the “thermodynome”. Relative to its more mature omics brethren (the genome, proteome, transcriptome, and so on), the thermodynome is still in its infancy, but fully understanding the course of evolutionary history will be impossible without it.
The authors thank M. Lynch for useful correspondence, and B. Kuznets-Speck, C. Weisenberger, and R. Snyder for feedback. E.I. acknowledges support from Institut Curie.
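In practice the relation reduces to a one-line calculation once the fractional cost is known. The sketch below applies $s_c \approx -\ln(R_b)\,\delta C_T/C_T$ and compares $|s_c|$ with the drift barrier $N_e^{-1}$. The fractional cost and the two effective population sizes are hypothetical illustration values, and the function names are not from the text.

```python
import math

def baseline_selection_coefficient(fractional_cost, r_b=2.0):
    """s_c ~ -ln(R_b) * (delta C_T / C_T), valid for small fractional costs."""
    return -math.log(r_b) * fractional_cost

def strength_vs_drift(s_c, n_e):
    """N_e |s_c|: values well above 1 indicate selection, well below 1 drift."""
    return abs(s_c) * n_e

s_c = baseline_selection_coefficient(1e-7)   # hypothetical delta C_T / C_T
for label, n_e in (("bacterium-like", 1e9), ("vertebrate-like", 1e4)):
    print(f"{label}: s_c = {s_c:.2e}, N_e|s_c| = {strength_vs_drift(s_c, n_e):.2e}")

# Given a measured overall coefficient s (hypothetical here), the adaptive part follows.
s_measured = 2e-7
s_a = s_measured - s_c
print(f"s_a = s - s_c = {s_a:.2e}")
```

The same fractional cost lies far above the drift barrier for the large population and far below it for the small one, which is the contrast between prokaryotes and eukaryotes described in the Introduction.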
Footnotes
Added SI Sec. VII comparing metabolic and adaptive contributions to selection, including discussion of spare respiratory capacity. Fixed reference numbering between main text and SI.
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].
- [71].
- [72].
- [73].
- [74].
- [75].
- [76].
- [77].
- [78].
- [79].
- [80].
- [81].
- [82].