Resource allocation to cell envelopes and the scaling of bacterial growth rate

Although various empirical studies have reported a positive correlation between the specific growth rate and cell size across bacteria, it is currently unclear what causes this relationship. We conjecture that such scaling occurs because smaller cells have a larger surface-to-volume ratio and thus have to allocate a greater fraction of the total resources to the production of the cell envelope, leaving fewer resources for other biosynthetic processes. To test this theory, we developed a coarse-grained model of bacterial physiology composed of the proteome that converts nutrients into biomass, with the cell envelope acting as a resource sink. Assuming resources are partitioned to maximize the growth rate, the model yields expected scalings. Namely, the growth rate and ribosomal mass fraction scale negatively, while the mass fraction of envelope-producing enzymes scales positively with surface-to-volume. These relationships are compatible with growth measurements and quantitative proteomics data reported in the literature.

Given that the surface-to-volume ratio increases with decreasing cell size, small cells will have a larger fraction of their mass sequestered in membranes and walls than their larger counterparts. Therefore, as the cell size decreases, more resources have to be invested in the production of the cell envelope, implying that fewer resources can be invested in the biosynthetic processes that replicate the other features of the cell. It was initially suggested that this constraint might impose a limit on the smallest size that a cell can attain [10]. This idea has been further used to explain the lower limit on the size of photoautotrophic organisms and why phytoplankton growth rates appear to increase with cell volume [11,12]. Our goal is to formalize this theory and investigate whether it can explain the growth scaling in heterotrophic bacteria. We start by imagining the fastest-growing cell as a bag of self-replicating ribosomes [13]. However, the growth rate is depressed below this ideal state because the cell has to invest resources in (1) machinery that acquires and converts nutrients to fuel ribosomes, and (2) the cell envelope, which is necessary for the cell to maintain its proper shape. As the cell becomes smaller, a larger fraction of the resources is diverted to the envelope, and fewer resources are left for ribosomes, which, in turn, means that cells grow more slowly.

We formalize the aforementioned verbal argument in Sections 3.1 and 3.2, and then use this framework to obtain a simple analytical solution for the maximal attainable growth rate given a particular cell size (Section 3.3). These predictions are tested using quantitative proteomic data of the model bacterium, Escherichia coli (Section 4.1). Lastly, the cross-species scaling expectations are compared against data on bacterial growth rates, cell sizes, and proteome compositions (Sections 4.2 and 4.3).
3 Materials and methods

3.1 Derivation of the steady-state growth rate

In our model, a cell is composed of two metabolite species and three protein species (Fig 1). The external nutrient concentration is assumed to be constant, thus mimicking the nutrient-replete conditions when bacteria are grown in the lab. These nutrients are taken up and converted into building block $b$, which corresponds to amino acids. Although we refer to species $l$ as lipids, this group includes all molecules used in the construction of the cell envelope, such as peptides and saccharides (including components of membrane lipoproteins and peptidoglycan). We use upper-case symbols to refer to absolute abundances, and lower-case symbols for relative abundances or concentrations. All chemical reactions obey Michaelis-Menten kinetics, and we assume that the half-saturation constants ($K_M$) for all reactions are identical. We ultimately focus on the special case when all reactions are operating at saturation ($K_M = 0$) because this permits a simple analytical solution for the steady state of various cellular features, and we only vary $K_M$ to compare this limiting behavior to a more general case when cellular enzymes are not saturated. Table 1 outlines the meaning of symbols that are used throughout the main text.

Our model is similar to those proposed in [14], with two important differences. First, the cell divides once a critical volume is reached, rather than once a critical abundance of a particular division-controlling protein is reached. We chose this model because it was the easiest to deal with analytically, and other cell division mechanisms are unlikely to affect the overall scaling patterns, as the envelope imposes a burden regardless of the exact molecular details underlying the cell division. Second, we introduced the degradation of macromolecules, and the production of an additional metabolite, which is the envelope component.
Perhaps most importantly, we obtained a simple analytical solution for the maximal growth rate in the limit of saturation, whereas the model in [14] is explored only numerically. This allows one to quickly derive relationships between proteomic mass fractions and the growth rate under various perturbations, and use these to infer model parameters. The model is also similar to the one reported in [15], as these authors also include protein degradation, but theirs posits second-order mass-action kinetics, whereas we let reactions run according to Michaelis-Menten kinetics. Lastly, we show that their analytical solution for the maximal growth rate and ours coincide in the limit of no envelope burden and enzyme saturation.

Fig 1. The key components and processes in the model (left pane). Nutrients are taken up from the external environment via the building block producer ($P_b$). The produced block is then converted into lipids and other cell envelope components via the lipid producer ($P_l$), and into proteins by the ribosome ($R$). Envelope and protein synthesis can be inhibited by antibiotics, whereas nutrient uptake can be reduced by growing the culture on a poor carbon source. Envelope and proteins are degraded at rates $d_l$ and $d_p$, respectively. Building blocks are represented as grey arrows flowing through protein machinery. Envelope components are depicted in the membrane. Outline of the hypothesis (right pane).
The time evolution of metabolite concentrations is:

The production of $b$ occurs at rate $k_n$, which represents a pseudo-first-order rate constant that depends on the nutrient status of the environment inhabited by the cell. Rate constants $k_n$, $k_l$, and $k_t$ are the turnover numbers for the reactions of nutrient processing, envelope synthesis, and translation, while $m_s$ is the size of an envelope unit expressed as the number of building blocks. Envelope components are eliminated from the cell by degradation at rate $d_l$, and by dilution due to growth at rate $\lambda(t)$. Note that the growth rate is time-dependent because it is a function of state variables (i.e., molecular abundances). We neglect the degradation of free amino acids and focus only on the turnover of cell envelope components.

Protein species $p_b$, $p_l$, and $r$ are produced by ribosomes, which represent the autocatalytic part of the cell. All proteins have associated degradation rates $d_p$, with the exception of ribosomes, which are reported to be remarkably stable in exponentially growing cells [16]. Sizes of the metabolic proteins ($P_b$ and $P_l$) and the ribosome are $m_p$ and $m_r$, respectively. The time-evolution equations are identical in form to the equations for metabolites:

with the only notable difference that, given the finite ribosomal pool, ribosomes have to be partitioned between different protein species; this is denoted by $\phi_x$, which represents the fraction of the total ribosomal concentration that is allocated to translation of protein species $x$. We will assume that each molecular species contributes to volume in proportion to its size in amino acids. For example, ribosomes are about 20x larger than other metabolic proteins, contributing 20x more to the cell volume.
Concentrations of each species are:

$$x(t) = \frac{X(t)}{V(t)}, \quad x \in \{b, l, p_b, p_l, r\}, \quad X \in \{B, L, P_b, P_l, R\} \tag{3a}$$

$$V(t) = m_r R(t) + m_p P_b(t) + m_p P_l(t) + B(t) \tag{3b}$$

Table 1. Symbols used throughout the main text.

Symbol        Meaning
$p_b(t)$      The relative abundance of building block-producing enzyme at time $t$
$p_l(t)$      The relative abundance of envelope-producing enzyme at time $t$
$r(t)$        The relative abundance of ribosomes at time $t$
$l(t)$        The relative abundance of envelope component
$b(t)$        The relative abundance of building blocks
$P_b(t)$      The absolute abundance of building block producer
$P_l(t)$      The absolute abundance of envelope producer
$R(t)$        The absolute abundance of ribosomes
$L(t)$        The absolute abundance of envelope component
$B(t)$        The absolute abundance of building blocks
$\phi_b$      The ribosome fraction allocated to translation of building block producers
$\phi_l$      The ribosome fraction allocated to translation of envelope producers
$\lambda(t)$  The specific growth rate at time $t$

The volume $V$ corresponds to the internal volume of the cell where the chemical reactions take place; including the envelope in the volume of the cell would imply that one can slow down chemical reactions by, say, increasing the thickness of the cell wall. As this is a nonsensical conclusion, we assume that $L$ does not contribute to cytoplasmic volume. The cell is assumed to grow exponentially, such that $\dot{V} = \lambda(t) V(t)$. Given that both volume and concentrations are time-dependent, the left-hand side of equations (1a-2c) will take the form:

Substituting Eq (4a) in Eq (1a-2c) and rearranging yields:

When cells are in a steady state, all cellular components grow exponentially at a constant rate $\tilde{\lambda}$, which is the steady-state growth rate. We use this property to find the steady state of the dynamical system. More precisely, we have:

Note that $X(0) = k\tilde{X}$, where $\tilde{X}$ is the abundance at cell division time when the cell is in the steady state, and $k$ is 1/2 if the abundance doubles over the life cycle.
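The definitions in Eq 3a-3b can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names and the example abundances are hypothetical, while the sizes $m_r = 7336$ and $m_p = 325$ amino acids are the values quoted in the parameterization part of the text.

```python
# Illustrative sketch of Eq 3a-3b. Example abundances are made-up numbers,
# not data from the paper; function names are hypothetical.
M_R = 7336   # ribosome size in amino acids (value quoted in the text)
M_P = 325    # metabolic protein size in amino acids (value quoted in the text)


def cytoplasmic_volume(R, P_b, P_l, B, m_r=M_R, m_p=M_P):
    """Eq 3b: volume in amino-acid equivalents; the envelope L is excluded."""
    return m_r * R + m_p * P_b + m_p * P_l + B


def concentrations(abundances, m_r=M_R, m_p=M_P):
    """Eq 3a: divide every absolute abundance X by the cytoplasmic volume V."""
    V = cytoplasmic_volume(abundances["R"], abundances["P_b"],
                           abundances["P_l"], abundances["B"], m_r, m_p)
    return {name.lower(): X / V for name, X in abundances.items()}


example = {"R": 10_000, "P_b": 1_000_000, "P_l": 200_000,
           "B": 5_000_000, "L": 20_000_000}
conc = concentrations(example)
print(conc)
```

Because $L$ is left out of $V$, changing the envelope abundance in `example` rescales only `conc["l"]` and leaves every other concentration unchanged, which mirrors the argument for excluding the envelope from the reaction volume.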
Substituting Eq 6a in the system (5a-5e) leads to:

An intuitive interpretation of these equations is that the net rate at which molecules are synthesized and degraded ultimately equals the rate at which they are diluted by growth. Note that $\tilde{V}$ is the steady-state cell volume, which is by definition the critical volume at which the cell divides. To simplify downstream expressions, let $\kappa_n = k_n/m_p$, $\kappa_t = k_t/m_r$, and $\kappa_l = k_l/m_p$. These are rate constants scaled by the size of the corresponding enzyme, capturing how fast an enzyme is relative to its size. Solving for molecular abundances yields:

By definition, proteome mass fractions are:

By replacing the steady-state abundances in Eq 9 with the solutions in Eq 8a-8e, we have:

In the limit when there is no protein degradation ($d_p = 0$), the proteomic mass fractions are equal to the fraction of ribosomes allocated to the synthesis of a particular protein component. This result was originally obtained in [17]. However, we are still lacking the solution for the steady-state growth rate $\tilde{\lambda}$.

Given that volume increases exponentially and from the definition of the cell volume (Eq 3b), we have:

where Eq 11b has been obtained by applying Eq 6a to the molecular abundances in Eq 11a. The steady-state growth rate is equal to the rate at which the building blocks are generated from the acquired nutrients minus the building blocks that are diverted into cell envelope synthesis (and thus do not contribute to cytoplasmic volume production) or are dissipated through protein degradation occurring at rate $d_p$. Note that diversion to lipids and other envelope constituents does not explicitly enter the growth rate because $L$ does not contribute to the cytoplasmic volume. However, lipid production does affect the growth rate by altering the amount of available building blocks $\tilde{B}$, and by diverting proteome allocation to $P_l$. Finally, substituting equations 8a-8e into 11b, we retrieve a cubic polynomial:

$$a_0 + a_1 \tilde{\lambda} + a_2 \tilde{\lambda}^2 + a_3 \tilde{\lambda}^3 = 0 \tag{12}$$

with the following coefficients:

Eq 12 has two non-zero roots:

In the case of the positive branch, $\tilde{\lambda} = 0$ when $\phi_r = 0$ and $\tilde{\lambda} = \kappa_t$ when $\phi_r = 1$. This is clearly incorrect, given that $\tilde{\lambda}$ should be zero both when the cell does not have ribosomes (because there is nothing to build the cell) and when the cell contains only ribosomes without any other protein components (because there is nothing to deliver building blocks to the ribosomes). In contrast, the negative branch attains values of zero both when $\phi_r = 0$ and when $\phi_r = 1$, so we take this solution as biologically meaningful. Finally, we obtain the steady-state abundances by substituting Eq 14 in equations 8a-8e.
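The numerical check described next (integrate until the critical volume is reached, halve the abundances, restart) can be sketched on a deliberately reduced toy system. The code below is not the system of Eq 5a-5e: it assumes saturated kinetics, no degradation, no envelope, and no free building-block pool, so that ribosomes and one lumped protein class are the only state variables. All parameter values except the two molecular sizes are hypothetical. In this toy, the growth rate should settle at $\kappa_t \phi_r$, the saturation-limit result stated later in the text, and the ribosomal mass fraction should settle at $\phi_r$, as expected when $d_p = 0$.

```python
import math

# Toy stand-in, NOT the paper's Eq 5a-5e: saturated kinetics, no degradation,
# no envelope, no free building-block pool. Parameter values are illustrative.
KAPPA_T = 2.0          # translation rate per unit ribosome size (1/h), hypothetical
PHI_R = 0.3            # fraction of ribosomes translating ribosomes, hypothetical
M_R, M_P = 7336, 325   # ribosome and metabolic-protein sizes (aa), from the text
V_CRIT = 1.0e9         # critical division volume (aa equivalents), hypothetical


def simulate_lineage(n_divisions=12, dt=1e-4):
    """Euler-integrate one lineage; halve abundances whenever V reaches V_CRIT."""
    R, P = 1.0e4, 0.0                  # arbitrary initial abundances
    t, last_division, interdivision_times = 0.0, None, []
    while len(interdivision_times) < n_divisions:
        dR = KAPPA_T * PHI_R * R                         # ribosomes make ribosomes
        dP = KAPPA_T * (M_R / M_P) * (1.0 - PHI_R) * R   # ribosomes make proteins
        R, P, t = R + dR * dt, P + dP * dt, t + dt
        if M_R * R + M_P * P >= V_CRIT:                  # division: equipartition
            R, P = R / 2.0, P / 2.0
            if last_division is not None:
                interdivision_times.append(t - last_division)
            last_division = t
    return interdivision_times, M_R * R / (M_R * R + M_P * P)


times, ribosome_mass_fraction = simulate_lineage()
growth_rate = math.log(2) / times[-1]   # steady-state estimate from the last cycle
print(growth_rate, KAPPA_T * PHI_R)     # the two values should be close
print(ribosome_mass_fraction, PHI_R)    # likewise
```

The first few interdivision times differ from the later ones because the lineage starts far from its steady-state composition; only the last cycle is used for the estimate.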

The analytical solution for the steady-state growth rate $\tilde{\lambda}$ (Eq 14) is validated by comparison with numerical integration of equations 5a-5e (Fig 2; upper row). We assign initial abundances and integrate the system of ODEs until the cell volume reaches the critical division volume. Next, we halve the abundances and restart the integration process. This emulates the process of cell division when cellular content is equipartitioned among the daughter cells. After a short out-of-equilibrium phase, the cell lineage settles into a steady state that matches the analytical solution for $\tilde{\lambda}$ and the steady-state abundances.

Although the explicit solution for the maximal $\tilde{\lambda}$ is prohibitively difficult to obtain, we can gain some insight by examining boundary cases when the cell does not have a membrane ($\phi_l = 0$) and cellular processes are infinitely fast. When $\kappa_n \to \infty$, $\tilde{\lambda} \to \kappa_t/(1 + K_M)$, and, conversely, as $\kappa_t \to \infty$ we have $\tilde{\lambda} \to \kappa_n - d_p$. These two asymptotic results occur because a cell that instantaneously converts nutrients to building blocks will saturate the downstream translation machinery, and the growth rate will be determined by the rate at which ribosomes operate. Conversely, when ribosomes are infinitely fast, the growth rate is set by how fast the building blocks are supplied to the protein synthesis machinery.

Our working hypothesis is that the steady-state growth rate $\tilde{\lambda}$ is a function of the ribosome partitioning parameters $\phi$, and that the cell's task is to find the optimal partitioning across different proteome components such that $\tilde{\lambda}$ is maximized. This maximization is assumed to be achieved by homeostatic mechanisms that constantly take input from the environment and adjust proteome composition accordingly.
Note, however, that the cell could attain maximal growth even if the partitioning parameters were hard-coded in the genome and thus could not be actively adjusted (i.e., when homeostatic mechanisms are absent). In that case, the process of finding the peak is governed by natural selection and other evolutionary forces. Hence, our model holds regardless of whether the actual cells have regulatory mechanisms or not.
The maximization is subject to the finite ribosomal pool (c1), and the cell has to satisfy the constraint based on the shape of the cell (c2):

Maximize $\tilde{\lambda}$ subject to:

Note that $V$ is the cytoplasmic volume, and thus $\Pi$ is the ratio of cell surface area to cytoplasmic volume. Parameters $\gamma$ and $\beta$ are unit conversion factors corresponding to the number of amino acids per unit of cell volume (molecules/µm$^3$), and the number of lipids/envelope components per unit of cell surface area (molecules/µm$^2$); these values are reported in Table 2. To intuitively understand the optimization problem, consider a landscape of ribosomal partitioning parameters and the resulting growth rates (Fig 2; bottom row). In the absence of the geometric constraint (c2), the cell maximizes growth rate by (1) completely abolishing expression of the envelope-producing enzyme ($\phi_l = 0$), and (2) optimally expressing ribosomes ($0 < \phi_r < 1$). This is because the envelope-producing enzyme acts as a burden that diverts resources from other proteome components, and because the cell has to balance the production of proteins (for which high ribosomal expression is required) and the production of building blocks that fuel translation (for which low ribosomal expression is required). These two aspects explain why the growth rate is a monotonically decreasing function of $\phi_l$, and a non-monotonic function of $\phi_r$.

Now suppose that the cell maximizes the growth rate, while at the same time having to maintain a surface-to-volume ratio dictated by the cell's geometry; out of all partitioning parameters, only a subset satisfies this constraint, and these parameters fall onto the red line in Fig 2. For example, a cell that has a low expression of the envelope-producing enzyme (the left-most grey point in the landscape) will have a high growth rate but will not have enough envelope to cover its cytoplasm, thus making it a non-viable option. On the other hand, a cell that has a highly expressed envelope producer (the right-most grey point) will make too much membrane relative to its volume, thus causing the cell to wrinkle. However, a cell with intermediate values of $\phi_l$ will produce just enough envelope to cover its cytoplasm (the intermediate grey point), and the maximal growth rate is achieved by the adjustment of partitioning parameters along this line (black point in the landscape).

The developed model assumes that the S/V is constant across the cell cycle, which is not strictly true, given that the cell has to change its shape and size as it grows. Unfortunately, we currently cannot derive a more general model which accounts accurately for changes in shape over the cell cycle of an individual cell. However, the variation in S/V across the cell cycle is much smaller than the cross-species range of S/V that we focus on (Section S1.4), so it should not strongly affect the reported derivations.

Although obtaining the analytical solution for the maximal steady-state growth rate is prohibitively difficult, one can obtain this property for a special case when all enzymes are saturated, such that chemical reactions obey first-order mass-action kinetics. Indeed, most E. coli enzymes have $K_M$ lower than their substrates' concentrations [18]. This reduces the nonlinear system of Eq 7a-7e to a system of linear equations that can be readily solved for the partition parameters $\phi$:

The steady-state growth rate is then obtained from Eq 11b by setting $K_M = 0$:

Substituting Eq 17a in Eq 16a-16d:

By substituting the abundances in Eq 9 with Eq 18b-18d, it immediately follows that the mass fractions of protein sectors are:

Note that substitution of Eq 18a-18d in Eq 17a yields $\tilde{\lambda} = \kappa_t \phi_r$, meaning that the growth rate is simply proportional to the fraction of ribosomes that are allocated to translating ribosomes. The cell, however, cannot allocate the entirety of its ribosomes to this task because this would halt the production of other necessary protein components. Therefore, to find the optimal partitioning of ribosomes that maximizes the growth rate, one has to impose additional algebraic constraints. The first algebraic constraint reflects the fact that maximal growth is achieved when the building block influx matches the outflux.
If the influx is higher than the outflux, then building blocks unnecessarily accumulate in the cell, and if the outflux is higher, then the building block pool will become completely depleted and the chemical reactions will halt. Because $B$ is being produced and consumed at matching rates, the building block pool does not grow over time, and we retrieve the constraint by setting the left-hand side of Eq 7a to zero and $K_M = 0$. The second expression imposes a constraint on the amount of resources that has to be diverted into the cell envelope to ensure a proper cell shape:

To obtain optimal partitioning parameters $\phi$ that satisfy these constraints, we substitute $\tilde{P}_b$, $\tilde{P}_l$, and $\tilde{R}$ with solutions 18a-18d. While there are three $\phi$ parameters, there are only two degrees of freedom given that $\phi_b + \phi_l + \phi_r = 1$. Thus, we have a system of two linear equations in two unknowns, which we solve to obtain the optimal partitioning of the proteome such that the flux of resources is balanced between catabolic and anabolic processes, and the cell has a proper shape given its size:

Substituting $\phi_r^{opt}$ and $\phi_l^{opt}$ in Eq 18a-18d, one retrieves molecular abundances in terms of model parameters. Substitution of the same into Eq 19 allows one to obtain expressions for optimal proteome mass fractions. Finally, substitution in the formula for $\tilde{\lambda}$ (Eq 17a) yields the maximal steady-state growth rate $\tilde{\lambda}_{max}$:

where $\epsilon = m_s \beta/\gamma$ is the resource cost of a unit of S/V, or the resource investment in a unit of surface area per unit of cytoplasmic volume. For example, if $\epsilon = 2$, then each added unit of surface area requires twice as many amino acids as an added unit of cytoplasmic volume.

Intuitively, the first term in Eq 22a is the steady-state growth rate in the limit of no additional resource sink, and the second term represents a deviation from the maximal achievable growth due to cell envelope production. The parameter $\Theta$ is the bioenergetic cost of cell envelope production, which depends not only on the actual amount of resources that go into this cellular feature ($\epsilon\Pi$) but also on the rates of all cellular processes. Previous theoretical developments represent a special case of Eq 22a. In the limit of no investment into the cell envelope ($\Pi = 0$), Eq 22a reduces to the result reported in [15], and taking this further to the case without protein degradation ($d_p = 0$) yields the result in [17].

While the explicit solution for $\Theta$ appears complicated, some insight can be gained by looking at the special case when there is no degradation ($d_l = d_p = 0$):

The part $\epsilon\Pi$ corresponds to the resource cost of producing the entire envelope structure. Parameter $\epsilon$ can be interpreted as the envelope cost of a cell with $\Pi = 1$ µm$^{-1}$. This is because $\gamma$ is the total number of amino acids per unit volume and $m_s \beta$ is the total number of amino acid equivalents required to build a unit of the cell surface. Given that $\Pi$ is the surface-to-cytoplasmic volume ratio, the whole term $\epsilon\Pi$ is the cost of producing the surface relative to the whole amino acid budget of the cytoplasmic volume. The fractional term in Eq 23 captures the cost of producing the enzyme machinery that builds the actual envelope. One cellular process supplies the building blocks at the per unit proteome rate $\kappa_n$, and two other cellular processes compete for this common pool at the per unit proteome rates of $\kappa_l$ and $\kappa_t$. For instance, if $\kappa_l$ is decreased, the cell maximizes growth rate by overexpressing the envelope-producing machinery in order to compensate for the low per unit proteome rate, thus increasing the total cost of the envelope.

The expressions for proteomic mass fractions are retrieved by replacing the partition parameters in Eq 19 with the optimal partition that maximizes the growth rate (Eq 21a and Eq 21b):

One can immediately see that the ribosomal mass fraction is a monotonically decreasing function, and the envelope-producer mass fraction a monotonically increasing function, of the surface-to-volume ratio $\Pi$.

The analytical solution for $\tilde{\lambda}_{max}$, and the optimal $\Phi_R$ and $\Phi_L$ in the saturation limit, was cross-validated by comparison to the numerically maximized $\tilde{\lambda}$ under the algebraic constraint of the finite ribosomal pool (c1) and the geometric constraint on the cell shape (c2), as described in Section 3.2. The optimization was performed using the Nelder-Mead algorithm. To ensure that the optimizer obtains the global maximum, we repeated the optimization 20 times for each set of model parameters. Each iteration started by randomly seeding the points of the polytope. We take the highest $\tilde{\lambda}$ as the solution of the optimization problem. We generally find an excellent correspondence between analytics and numerics for both the growth rate and proteome composition (right panel in Fig 3). When enzymes in the model operate far from the saturation limit (i.e., when $K_M \gg 0$), the analytics break down (see inset). Note that a large $K_M$ has a similar effect on scaling as a reduction in the nutrient quality of the media. Therefore, although the analytic solution neglects the presence of the aqueous phase (metabolites and associated water molecules) inside the cell, the incorporation of this property affects the intercept but not the overall scaling pattern. For a fuller analysis of the growth-impeding effects of the cell envelope, see Section S1.1.

Eq 22a has six parameters ($\kappa_n$, $\kappa_l$, $\kappa_t$, $\epsilon$, $d_l$, $d_p$) and one variable ($\Pi$).
Our goal was to constrain the rates of chemical reactions to the values occurring in Escherichia coli, and then ask how the growth rate would scale if the variable $\Pi$ was altered. That is, we are interested in understanding how the growth rate would scale if all bacterial species were biochemically identical to an E. coli cell, and only differed in cell size.
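To see how strongly $\Pi$ responds to size alone, one can compute the surface-to-volume ratio of idealized shapes. The sketch below is an assumption-laden illustration rather than the procedure used in the paper: it treats the cell as a sphere or a spherocylinder, ignores envelope thickness (so surface area and cytoplasmic volume share one boundary), and uses hypothetical function names and dimensions.

```python
import math


def sv_sphere(radius_um):
    """S/V (per um) of a sphere: (4*pi*r^2) / ((4/3)*pi*r^3) = 3/r."""
    return 3.0 / radius_um


def sv_spherocylinder(radius_um, length_um):
    """S/V (per um) of a cylinder of total length `length_um` with hemispherical caps."""
    if length_um < 2 * radius_um:
        raise ValueError("total length must be at least the diameter")
    side = length_um - 2 * radius_um   # length of the cylindrical part
    surface = 2 * math.pi * radius_um * side + 4 * math.pi * radius_um ** 2
    volume = math.pi * radius_um ** 2 * side + (4 / 3) * math.pi * radius_um ** 3
    return surface / volume


# Hypothetical radii spanning small to large cells.
for r in (0.2, 0.6, 2.0):
    print(f"r = {r} um  ->  S/V = {sv_sphere(r):.2f} per um")
```

A tenfold increase in radius lowers S/V tenfold in the spherical case, which is the lever the scaling argument relies on. A sphere of radius 0.6 µm has an S/V of 5 per µm, numerically the same as the value of $\Pi$ that the text takes from Table 2 for E. coli (assuming that value is in µm$^{-1}$), although E. coli is rod-shaped.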

We estimated the resource cost of the envelope in a cell with a unit S/V, $\epsilon$, by assuming that the resource investment is equal to the mass $M$ of a structure. Given that our model consists of the whole envelope (lipids, peptides, and sugar attachments) with mass $M_{env}$ and proteins of mass $M_{prot}$, that the envelope and proteins account for roughly 30% and 55% of total cell mass $M_T$ (see [19], and Chapter 2 in [20]), and that $\Pi = 5$ (Table 2):

This is a crude estimate because it does not include the direct costs of envelope production and proteome components (i.e., resources needed to convert one chemical compound into another one), but these usually account for a small fraction of the total costs [21], and small changes in $\epsilon$ do not affect our conclusion.

The degradation rates are estimated separately from the literature. We assumed that the ribosomal degradation rate is zero, and we justify this assumption with three observations. First, ribosomal RNA is remarkably stable in the exponential and stationary phases, and the degradation occurs only when the culture transitions between these two growth stages [16,22]. Second, almost all ribosomal proteins have degradation rates close to zero [23]. Third, ribosomal proteins are mutually exchangeable when damaged, meaning that replacement is favored over degradation and re-synthesis [24]. The protein degradation rate was set to 0.05 per hour [25]. We also tried out estimates for various protein classes [26] but found little variation in the scaling pattern (Fig S1.7). The envelope degradation rate $d_l$ is difficult to estimate exactly due to the diverse components that make up this structure, but was set to 1 per hour [27]. That study reports a hypermetric scaling of the peptidoglycan turnover with the growth rate ($d_l = 0.7 \times (\ln 2/\tau)^{1.38}$). Assuming that an E. coli cell divides in 30 minutes, we have $d_l \approx 1$. Unfortunately, this is a very crude estimate based on the peptidoglycan degradation rate in B. subtilis, where peptidoglycan constitutes a large part of the envelope. Peptidoglycan degradation rates are somewhat lower in E. coli [28]; on the other hand, that species has an outer membrane containing polysaccharides, and we do not know how fast this component is degraded.
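The envelope turnover estimate can be reproduced directly from the quoted scaling relation. The snippet below is a minimal check, assuming that the logarithm in the relation is the natural logarithm and that the doubling time $\tau$ is expressed in hours; the function name is hypothetical.

```python
import math


def envelope_turnover_rate(doubling_time_h, prefactor=0.7, exponent=1.38):
    """d_l = prefactor * (ln 2 / tau)^exponent, with tau in hours (natural log assumed)."""
    growth_rate = math.log(2) / doubling_time_h
    return prefactor * growth_rate ** exponent


d_l = envelope_turnover_rate(0.5)   # 30-minute doubling time
print(round(d_l, 2))                # prints 1.1
```

The result, roughly 1.1 per hour, is consistent with rounding $d_l$ to 1 per hour.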

The rate constants of chemical reactions ($\kappa_t$, $\kappa_l$, $\kappa_n$) were inferred from Escherichia coli proteomic data across different growth conditions, leveraging growth laws derived in Section S1.2. We ignore the intercepts of the regressions (Eq S1.11a-S1.13b) and infer parameters strictly from the slopes of these relationships. The slope of the regression of the growth rate on the mass fraction of ribosomal proteins allows one to compute $\kappa_t$ from Eq S1.11a. Similarly, the value of $\kappa_l$ can be computed from the slope of the regression of the growth rate on the mass fraction of envelope-producing enzymes and Eq S1.11b. Our data for growth rates across different bacterial species are normalized to 20 °C using a $Q_{10}$ correction with a coefficient of 2.5. Hence, we also normalized the inferred rate constants using the same method. Estimates of rate constants for E. coli are reported in Table 2.

Table 2. Parameterization of the model. Standard errors were obtained using error propagation (see Section S1.9). † Surface-to-volume ratio represents the average across different dimensions of E. coli in our dataset. * Calculated in the text. All rate parameters are Q10-corrected as described in the text. • Rate constants estimated from the noted study.
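For readers unfamiliar with the $Q_{10}$ correction, the sketch below shows the standard form of the rescaling, in which a rate changes by a factor of $Q_{10}$ for every 10 °C. It assumes this standard form is what the normalization uses; the function name and the 37 °C measurement temperature in the example are hypothetical.

```python
def q10_normalize(rate, measured_temp_c, reference_temp_c=20.0, q10=2.5):
    """Rescale a rate measured at `measured_temp_c` to `reference_temp_c`.

    Standard Q10 form: rate_ref = rate * q10 ** ((T_ref - T_measured) / 10).
    """
    return rate * q10 ** ((reference_temp_c - measured_temp_c) / 10.0)


# Hypothetical example: a rate of 1.0 per hour measured at 37 degrees C.
print(q10_normalize(1.0, 37.0))
```

With a coefficient of 2.5, a rate measured at 37 °C shrinks to roughly a fifth of its value when expressed at 20 °C, so the correction is far from negligible when comparing species grown at different temperatures.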
Because $\kappa_n$ captures both the intrinsic efficiency of metabolism to convert nutrients into building blocks and the nutrient state of the external environment, there will be one $\kappa_n$ for every medium that E. coli is reared in. Intuitively, one would expect minimal media to have lower $\kappa_n$ than rich media, as the latter contain better quality nutrients and thus lead to more amino acids being generated per unit time per molecule of $P_b$. The parameter $\kappa_n$ is calculated from the slopes of ribosomal mass fraction across growth conditions with varying concentrations of translation inhibitor, after plugging in values of $\kappa_l$, $d_p$, and $\Pi$ in Eq S1.12a. The values are computed for the following media: M63 with glycerol, M63 with glucose, casamino acids with glycerol, casamino acids with glucose, rich defined media with glycerol, and rich defined media with glucose.
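The back-of-the-envelope argument in the paragraph that follows (about $10^9$ amino acid-equivalents per cell, roughly $3 \times 10^6$ metabolic proteins, $k_n = 2 \times 325$ amino acids per hour, and about 25 amino acids per ribosome per second) can be checked with a few lines of arithmetic. All inputs are the figures quoted in the text; the variable names are hypothetical, and the implied $\kappa_t$ is computed here from $\kappa_t = k_t/m_r$ without any temperature correction.

```python
AMINO_ACIDS_PER_CELL = 1e9     # amino acid-equivalents needed to build one cell
METABOLIC_PROTEINS = 3e6       # copies of metabolic proteins per cell
KAPPA_N = 2.0                  # per hour, the value used in the text's example
M_P = 325                      # amino acids per metabolic protein
M_R = 7336                     # amino acids per ribosome
K_T_PER_SECOND = 25.0          # amino acids per ribosome per second

k_n = KAPPA_N * M_P            # amino acids per protein per hour
hours_to_rebuild = AMINO_ACIDS_PER_CELL / (METABOLIC_PROTEINS * k_n)
minutes_to_rebuild = 60.0 * hours_to_rebuild

kappa_t = K_T_PER_SECOND * 3600.0 / M_R   # per hour, since kappa_t = k_t / m_r

print(f"{minutes_to_rebuild:.0f} min, kappa_t = {kappa_t:.1f} per hour")
```

The first number comes out at about 31 minutes, in line with the roughly 30-minute division time quoted for E. coli under favorable conditions.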

One can intuitively explain the estimated rate constants by using the logic outlined in [30]. It takes roughly $10^9$ glucose molecules to produce the entire carbon skeleton of an E. coli cell. Given that an amino acid has, on average, about the same number of carbon atoms as a single glucose molecule (5.35 C-atoms), one could say that the cell requires $10^9$ amino acid-equivalents for its construction. Therefore, with roughly $3 \times 10^6$ copies of metabolic proteins in the cell, each building block producer making $k_n = \kappa_n m_p = 2 \times 325$ amino acids per hour, it will take about 30 minutes for the metabolic proteins to generate enough amino acids to replace the cell. This is also the experimentally measured cell division time of an E. coli cell grown under favorable conditions. For the protein synthesis rate (i.e., the elongation rate) $\kappa_t$, note that each ribosome converts $\kappa_t \times m_r$ amino acids into proteins (where the length of the ribosome $m_r$ is 7336 amino acids). This means that the inferred translation rate $k_t$ in our model is about 25 amino acids per ribosome per second, which is close to (albeit slightly larger than) empirical values for an E. coli cell [20]. The length of a metabolic protein was taken to be the median length of an E. coli protein (325 aa from [31]), and the ribosome length is set to the total number of amino acids in ribosomal proteins (7336 aa from [32]).

4 Results

The model of the cell developed in Section 3 purports to explain two phenomena: (1) the optimal partitioning of the proteome when the same bacterial species is reared under different growth conditions; and (2) the scaling of the growth rate and proteome composition across bacterial species of different shapes and sizes. Both sets of predictions are tested in the ensuing sections. Applying the developed theory to data on growth and proteome composition of the model organism Escherichia coli, we first show that our model yields a good qualitative description of physiological responses to changes in the environment. We then use this correspondence between theory and data to estimate the parameter values that yield a good quantitative fit as well (Section 4.1).

By parameterizing our model with values obtained from E. coli, one can address how the growth rate is expected to scale with the shape and size of the cell (Section 4.2). This implicitly assumes that all bacterial species are biochemically identical to E. coli, such that the rates of all chemical reactions are the same. Lastly, in an attempt to simplify the expectation and remove dependence on parameters that are difficult to estimate, we look at the special case when degradation is absent and the envelope is costly enough that $\lambda_{\max}$ is inversely proportional to $\Pi$ (Section 4.3).

The internal homeostatic mechanisms allow the cell to allocate the proteome to different cellular tasks such that the growth rate is maximized. While E. coli cells achieve this via the alarmone ppGpp [33], our model is agnostic about the exact mechanism and simply assumes that such a resource-tuning strategy exists. One can express the proteome mass fraction of a particular component as a linear function of the growth rate when the latter is altered via changes in growth conditions (derived in Section S1.2). Given that there are three cellular processes in our representation of the cell (building block production, cell envelope synthesis, and protein synthesis), we ask how the proteome composition changes when each process is perturbed. More precisely, we are interested in how the proteome composition changes as the growth rate is modulated by changing the values of $\kappa_n$, $\kappa_l$, or $\kappa_t$.

The first expectation is that the ribosomal mass fraction of the proteome, $\Phi_R$, scales linearly with the growth rate across different nutrient conditions, $\lambda_N$ (Eq 27). The cell re-balances building block supply and biosynthesis demand by allocating proteome to the limiting process, which is, in this case, protein synthesis. The proteomic data for E. coli qualitatively corroborate this expectation (Fig 4, upper left panel). Data from different studies show a small amount of variation, but the overall trend is strong. The second expectation is that the envelope-producer fraction increases with the growth rate under nutrient perturbation (Eq 28). The slope captures the fact that faster growth implies that a cell has to produce its envelope faster. Given that each envelope-producer operates at a constant rate, the only way this constraint can be met is by allocating more enzymes to this task.

When inferring parameters, one must account for the fact that $\Pi$ might depend on $\lambda$. $\Pi$ is largely independent of the growth conditions when the latter are varied with a translation-inhibiting antibiotic (Fig S1.5). However, $\Pi$ decreases with the growth rate when the bacteria are cultured with different carbon sources [34-36]. We account for this confounding factor by plugging the empirically derived relationship $\Pi \sim \lambda_N$ into Eq 28, and then infer $\kappa_l$ using the non-linear least squares method (see Section S1.5 for details). While the proteomic data on the envelope-producer component are less accurate than those on ribosomal proteins (probably owing to the low abundance of envelope-producers), the predicted positive scaling still occurs in the two most comprehensive proteomic studies (pink circles and orange squares in lower left panel of Fig 4).

Figure 4. Each line corresponds to a different medium, as outlined in the corner legend. The lower right panel contrasts changes in $\Phi_R$ across nutrient conditions (orange points) with $\Phi_R$ changes under envelope-producer inhibitors (black and grey dots), with data from [36]. Lines signify the ordinary least squares fit to the data from [17] (upper right panel) and [29] (all other panels).
The third expectation concerns the scaling of the ribosomal mass fraction under translation inhibition (Eq 29). As the concentration of a translation-inhibiting drug is increased, so is the fraction of inhibited ribosomes. The cell compensates for this inhibition by overexpressing ribosomes, which causes a negative scaling between $\Phi_R$ and $\lambda_T$. We see a good fit to data (upper right panel in Fig 4), and similar relationships can be obtained by using the alternative source of data from [36] (Fig S1.5).

368
The fourth prediction is that the response of $\Phi_R$ to nutrient quality perturbation is the same as the response to envelope-synthesis inhibition. This follows from the fact that the relationship between $\Phi_R$ and the growth rate under envelope-synthesis inhibition, $\lambda_L$, is identical to Eq 27. Thus, the prediction is that points from the two types of perturbation experiments ought to fall onto the same line. Indeed, the treatment of E. coli culture with fosfomycin (peptidoglycan synthesis inhibitor) and triclosan (fatty acid synthesis inhibitor) reduces both the growth rate and the ribosomal mass fraction (lower-right panel in Fig 4). One can intuitively understand this behavior by noting that the cell responds to the envelope-synthesis inhibition by overexpressing envelope-producers to compensate for the poisoning of the enzymes, which reduces the resources available for ribosomal proteins and thus reduces their expression.

The theory is tested by comparing the expected to the observed scaling relationships using growth, cell shape, and proteomic data across heterotrophic bacteria. Details on data collection and normalization are reported in Sections S1.6 and S1.7. Testing the theory requires specifying model parameters. Because we are dealing with a somewhat metabolically homogeneous set of species (i.e., all are heterotrophs), we assume that all the cellular reaction rates are identical, and the only feature that varies across bacteria is $\Pi$. Furthermore, most of the maximal growth rates in our dataset were measured in media containing yeast extract, so we assume that $\kappa_n$ is fixed across species as well. The prerequisite for these assumptions to hold is that the type of metabolism of a bacterial species does not correlate with cell size and shape. That is, the variation in metabolism and external environment might affect dispersion around the scaling but does not affect the scaling itself.

388
The model is determined by six parameters: rate constants of three types of enzymes ($\kappa_n$, $\kappa_l$, $\kappa_t$), degradation rates ($d_p$, $d_l$), and the resource cost of a unit of S/V. The degradation rates of lipids, peptidoglycan, and proteins are set to values determined in pulse-chase experiments reported in the literature (see Section 3.4). We take the energetic cost of the envelope from our earlier calculation (Eq 26). Rate constants are not only related to turnover numbers of the respective enzyme classes but also depend on other cellular properties that are not explicitly accounted for in the model, such as the nutrient status of the environment, the number of enzymes belonging to a particular sector, the topology of a particular biochemical pathway, the amount of nucleic acid in the cell, and so on. Thus, by estimating rates from the empirical relationships we derived in Section 4.1, one can "collect" the effects of the cellular properties that are not explicitly accounted for into these three parameters.

399
The protein synthesis rate, $\kappa_t$, can be estimated from the slope of Eq 27 if the degradation rate $d_p$ is known. The envelope synthesis rate, $\kappa_l$, is retrieved from the slope in Eq 30 if the envelope cost $\Pi$ is known. Finally, the nutrient processing rate $\kappa_n$ is inferred from the slopes of changes in $\Phi_R$ as the growth rate is perturbed by the addition of the translation-inhibiting drug chloramphenicol (Eq 29), after plugging in all of the previously estimated parameters. Because $\kappa_n$ also depends on the nutrient quality, the values were inferred for various media ranging from nutrient-depleted (M63 with glycerol) to nutrient-replete (RDM with glucose). All rate constants were estimated by fitting physiological scaling relationships to E. coli proteomic data. We first classified each protein into one of the three sectors based on its function (for details, see Section S1.7), and then pooled their individual masses to compute the total mass fractions $\Phi_R$ and $\Phi_L$. The surface-to-volume ratio was calculated from the linear dimensions of the cell (Section S1.3). Cytoplasmic volume was calculated after first subtracting twice the thickness of the bacterial envelope from the linear dimensions of the cell, with the envelope thickness taken to be 30 nm [37].

We find that the curves parameterized with $\kappa_n$ from minimal (black) to rich media (orange) capture most of the variation in growth rate and proteome composition data (Fig 5). Hence, in principle, one could attribute the scatter along the y-axis to differences in the efficiency with which bacteria take up and convert nutrients into building blocks. Furthermore, $\Phi_L$ is expected to have a slightly positive and $\Phi_R$ a negative scaling with $\Pi$. The prediction is not unambiguously corroborated by the data. A large amount of variation in proteomic data around the expectation might reflect growth and temperature differences which we do not know how to correct for. Also, $\Phi_L$ is a low-abundance proteome sector, as it accounts for no more than a few percent of total proteome mass. These two effects might jointly make the data unreliable.

Figure 5. The comparison of expected and observed scaling relationships. Upper row: scaling of growth rate with S/V across bacterial species (left pane), and scaling in long-term evolution experiments (right pane); colored curves denote expected theoretical scaling given by Eq 22a and parameterized with $\kappa_n$ estimated from various media, as reported in Table 2. Lower row: scaling of envelope-producer (left pane) and ribosomal mass fractions (right pane).
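The geometric step described above (surface-to-volume from the linear dimensions of the cell, with cytoplasmic volume obtained after subtracting twice the envelope thickness) can be sketched as follows. This is a minimal illustration that assumes a spherocylindrical cell (a cylinder with hemispherical caps); the shape formulas actually used are given in Section S1.3 and may differ. The function names and the example dimensions are hypothetical.

```python
import math

ENVELOPE_THICKNESS_UM = 0.03  # 30 nm, the envelope thickness taken from [37]


def spherocylinder_surface(length, width):
    """Surface area (um^2) of a cylinder with hemispherical caps.

    ASSUMPTION: the cell is a spherocylinder; `length` is the total
    pole-to-pole length and `width` is the diameter, both in um.
    """
    return math.pi * width * length


def spherocylinder_volume(length, width):
    """Volume (um^3) of the same assumed spherocylinder."""
    return math.pi * width ** 2 / 4.0 * (length - width / 3.0)


def surface_to_volume(length, width, thickness=ENVELOPE_THICKNESS_UM):
    """Return (S/V_tot, S/V_cyt) in 1/um.

    The cytoplasmic volume is computed after subtracting twice the
    envelope thickness from each linear dimension, as described in the text.
    """
    surface = spherocylinder_surface(length, width)
    v_tot = spherocylinder_volume(length, width)
    v_cyt = spherocylinder_volume(length - 2.0 * thickness,
                                  width - 2.0 * thickness)
    return surface / v_tot, surface / v_cyt


# Hypothetical rod-shaped cell: 2 um long, 1 um wide.
sv_tot, sv_cyt = surface_to_volume(length=2.0, width=1.0)
print(f"S/V_tot = {sv_tot:.2f} per um, S/V_cyt = {sv_cyt:.2f} per um")
```

Because the envelope occupies part of the total volume, the ratio computed with cytoplasmic volume is always somewhat larger than the one computed with total volume, and the difference grows as cells become smaller.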
A large variation in growth rate data across bacteria might also stem from variation in morphology (e.g., motility structures), metabolism (e.g., aerobic or anaerobic capabilities), or environment (e.g., from oceanic sediments to the digestive tract of mammals). Ideally, one would want to compare the growth rates across species that share the same type of metabolism and the same environment and differ only in shape or size. This way, all of the morphological, metabolic, and environmental factors would be controlled, and the entirety of variation in growth could be attributed to variation in size. We attempted to control for these issues by using data from experimental evolution studies, where cells were kept in approximately exponential phase for a long period of time. In the wild, cell size and shape are determined by a myriad of ecological factors [38]. When these cells are taken out of their natural habitat and propagated in a nutrient-rich medium, the ecological factors dictating the size of the cell are removed. For instance, large cells might be deleterious in situ owing to size-selective predation, but they do not suffer the same disadvantage when reared in the laboratory setting where predators are absent. Under laboratory selection for fast growth, a mutant with lower S/V has a growth advantage over the wild-type due to lower relative investment in the envelope, and will spread and take over the population. In this situation, our theory predicts three outcomes. First, increasing growth rate should be accompanied by a reduction in S/V. Second, the mass fraction of envelope-producers should decrease because envelope synthesis becomes less of an obstacle as the cell size increases. Third, given a diminished need for synthesizing envelope-producers, a larger mass fraction of the proteome can be allocated to the ribosomal proteins.

To test the first prediction, we analyzed the scaling of the growth rate with S/V in Lenski's long-term evolution experiment (LTEE) with Escherichia coli and in a more recent experimental evolution of Mycoplasma mycoides. First, we used S/V in combination with the relative fitness data for four LTEE time points spanning 50,000 generations [39]. Given that the relative fitness is the ratio of Malthusian parameters of the evolved to the ancestral line determined from head-to-head competition (see [40]), we converted this metric to the growth rate by multiplying it by the growth rate of the ancestral strain (see $V_{\max}$ in [41]). However, because the competition experiment lasts for one day, competing strains enter stationary phase after a few hours, so the competitive advantage of the evolved line might reflect not only differences in the growth rates but also the ability to survive in stationary phase. Thus, we sought to cross-validate our approach by including data from [42], which reported the growth rates and cell volumes across the LTEE. Cell volumes were converted to S/V using the empirical scaling $S = 2\pi V^{2/3}$ reported to hold across many bacterial species [43]. Second, we collected relative fitness and cell size of wild-type Mycoplasma mycoides and M. mycoides whose genome has been minimized by removal of non-essential genes [44]. Comparing LTEE data with our analytical model (Fig 5), we find that the observed scaling falls very near to the expected scaling for E. coli grown in media with similar composition (M63) to that used in the LTEE (Davis broth with glucose). Note that both indirect estimates (red points) and direct measurements (green points) of the growth rate fall onto the same line. A similar trend is observed in the experimental evolution of M. mycoides.
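The two conversions used in this paragraph can be written out explicitly. The sketch below assumes only what the text states: relative fitness is multiplied by the growth rate of the ancestral strain, and cell volume is mapped to surface area through the empirical scaling $S = 2\pi V^{2/3}$ from [43]. The function names and the numerical inputs are hypothetical placeholders, not values from the LTEE.

```python
import math


def growth_rate_from_relative_fitness(relative_fitness, ancestral_growth_rate):
    """Convert relative fitness (ratio of Malthusian parameters of the
    evolved to the ancestral line) into an absolute growth rate."""
    return relative_fitness * ancestral_growth_rate


def surface_to_volume_from_volume(volume):
    """S/V implied by the empirical scaling S = 2*pi*V**(2/3).

    Equivalent to 2*pi*V**(-1/3); `volume` in um^3 gives S/V in 1/um.
    """
    surface = 2.0 * math.pi * volume ** (2.0 / 3.0)
    return surface / volume


# Hypothetical inputs, for illustration only.
example_rate = growth_rate_from_relative_fitness(1.5, 0.8)
example_sv = surface_to_volume_from_volume(1.0)
print(f"growth rate = {example_rate:.2f} per hour, S/V = {example_sv:.2f} per um")
```

Under this scaling S/V is proportional to $V^{-1/3}$, so an eightfold increase in cell volume halves the surface-to-volume ratio.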

The latter two predictions were tested using data on the changes in the absolute transcript abundance over the course of the long-term experimental evolution in E. coli [45]. To convert transcript abundance into protein abundance, we first computed the number of proteins per transcript for each E. coli gene using data in [46], and then multiplied the transcript abundance of each gene in the long-term evolution experiment by the corresponding conversion factor. One can then obtain the mass fraction of a particular protein by multiplying its abundance by its molecular mass and normalizing by the total mass of the proteome. We find that the average ribosomal mass fraction slightly increases (0.210 ± 0.02 vs. 0.218 ± 0.01), while the average envelope-producer mass fraction decreases (0.034 ± 0.001 vs. 0.028 ± 0.0008) during 50,000 generations of the experiment. This roughly coincides with the predicted scaling (orange and blue lines in Figure 5). It is currently unclear why the mass fractions do not adhere to the same line as the growth rate (black line). Perhaps this reflects differences between strains, given that Lenski used derivatives of the B strain while the expected scalings were plotted using parameters inferred from experiments with BW25113 and a strain derived from MG1655.

Given the uncertainty of model parameters (especially envelope degradation), we attempted to simplify the theoretical expectation by focusing on the simplest possible case when degradation is absent, and the costs of the envelope are sufficiently high that Eq 22a can be approximated as $\lambda_{\max} \propto \Pi^{-1}$, Eq 21a as $\Phi_R \propto \Pi^{-1}$, and Eq 21b as $\Phi_L \propto \Pi^{0}$.
While this scenario might not be the most realistic, the benefit is that it does not require any further specification of model parameters because the prediction is that the slopes of the regressions $\log_{10}(\lambda_{\max}) \sim \log_{10}(\Pi)$ and $\log_{10}(\Phi_R) \sim \log_{10}(\Pi)$ should be $-1$, and the slope of $\log_{10}(\Phi_L) \sim \log_{10}(\Pi)$ should be zero. Assuming that the shape is constant across bacteria, similar scalings can be obtained with cell volume as the independent variable, given that $\Pi \propto V^{-1/3}$. One expects $\log_{10}(\lambda_{\max}) \sim 0.33 \log_{10}(V)$ and $\log_{10}(\Phi_R) \sim 0.33 \log_{10}(V)$, while $\Phi_L$ remains independent of $V$, as with S/V. We use these expectations as the null hypothesis for the slopes in the regression analysis. In addition to scaling expectations, we also wish to test whether surface-to-volume is a better predictor of the growth rate than the cell volume. To this end, we performed OLS regression with either S/V or V as the independent variable, and the growth rate or proteomic mass fractions of various bacterial species as the dependent variable (Table 3). While our theory relates the growth rate to the ratio of the surface area to cytoplasmic volume ($S/V_{cyt}$), we also examined whether the dependent variables scale with the ratio of surface area to total cell volume ($S/V_{tot}$). In total, the analysis included the growth rate data from 229 species, and ribosomal and envelope-producer mass fractions from 41 and 30 species, respectively. A graphical representation of the regression analysis is reported in Section S1.11.
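The regression test described above can be sketched as follows: fit an ordinary least squares line to log-transformed data and compare the slope against its null value ($-1$ for S/V, $1/3$ for volume). This is a minimal standard-library illustration, not the analysis pipeline behind Table 3. The function name and the synthetic data are hypothetical, and converting the $t$ statistic into a $p$-value (a $t$ distribution with $n-2$ degrees of freedom) is left to a statistics library.

```python
import math


def loglog_slope_test(x, y, null_slope):
    """OLS fit of log10(y) on log10(x).

    Returns (slope, standard_error, t_statistic), where the t statistic
    measures the departure of the fitted slope from `null_slope`.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, ly))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = (slope - null_slope) / std_err
    return slope, std_err, t_stat


# Synthetic (hypothetical) data: growth rate proportional to 1 / (S/V),
# multiplied by a symmetric scatter term so that residuals are non-zero.
sv = [2.0, 4.0, 8.0, 16.0, 32.0]
scatter = [1.1, 0.9, 1.0, 0.9, 1.1]
growth = [3.0 / p * s for p, s in zip(sv, scatter)]
# Volumes implied by S/V = 2*pi*V**(-1/3), i.e. constant cell shape.
volume = [(2.0 * math.pi / p) ** 3 for p in sv]

slope_sv, se_sv, t_sv = loglog_slope_test(sv, growth, null_slope=-1.0)
slope_v, se_v, t_v = loglog_slope_test(volume, growth, null_slope=1.0 / 3.0)
print(f"slope vs S/V: {slope_sv:.3f}, slope vs V: {slope_v:.3f}")
```

The synthetic example also illustrates why the two null slopes are linked: when $\Pi \propto V^{-1/3}$, a slope of $-1$ against S/V is the same statement as a slope of $1/3$ against volume.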

We cross-validated these results by analyzing a recently published study [47], which contains data on linear dimensions of the cell and the minimal doubling time. Although we do not find any correlation between cell volume and the growth rate, we find a very weak negative relationship (adjusted $R^2 = 0.08$) between S/V and the growth rate (Fig S1.11), with a scaling exponent of $-0.97$ which is not significantly different from $-1$ ($p = 0.92$). Given that bacterial species with large S/V (i.e., small coccal or thin helical cells) are frequently parasitic, it is possible that the pattern is mainly driven by large-S/V species growing more slowly because of their natural habitat and not because of trade-offs in investments. If that is the case, then excluding parasitic species from the dataset would eliminate the negative scaling. To control for this confounding factor, we separated data into free-living and host-associated species and then performed the regression analysis (see Section S1.12). We find a very weak negative relationship between S/V and the growth rate in both of these datasets and in the pooled data, indicating that the preponderance of small-celled species among pathogens is unlikely to account for the scaling pattern.

Thus far, we have assumed that all species have an envelope that is ∼30 nm thick, like that of E. coli [37]. However, larger bacterial species may have thicker envelopes that require higher resource investments. If resource investments stay the same across the size range, then the variation in growth cannot be explained by the invariant S/V. To account for this possibility, we collected data on envelope thicknesses across 45 species and used these values to compute $\Pi$ (see Section S1.10). We find that $\lambda_{\max}$ scales with $\Pi$ with exponent $-1.15$ (Fig S1.8), which is not significantly different from $-1$ ($p = 0.422$). Similarly, $\lambda_{\max}$ scales with $V_{cyt}$ with exponent 0.37 (Table S1.6), which is not significantly different from 0.33 ($p = 0.749$). Therefore, the inclusion of envelope thickness data does not alter the conclusions reached by using fixed thickness.

Three conclusions are reached from the preceding analysis. First, $\Pi$ is a moderate predictor of $\lambda_{\max}$, and it accounts for roughly a quarter of growth rate variation in the whole dataset. Second, $S/V_{cyt}$ is a slightly better predictor of growth rate than $S/V_{tot}$. Third, the proteome composition qualitatively (but not quantitatively) fits the expected pattern. The ribosomal mass fraction of the proteome scales negatively with $\Pi$, albeit with a slope that is shallower than the expected $-1$, and the envelope-producer mass fraction appears independent of $\Pi$, which is in accordance with theory.

While the central trait in the preceding analysis is the growth rate, one can also obtain the expression for the metabolic rate, and the scaling of this feature can be further used to test our theory. The total metabolic rate of the cell in the steady state, $Q$, is the sum of the rates of all cellular reactions (Eq 31), where $Q_x$ corresponds to the number of ATPs hydrolyzed by process $x$, which can be amino acid production, cell envelope production, translation, protein degradation, or degradation of the cell envelope. Substituting the optimal abundances for $P_b$, $P_l$, $R$, and $L$ yields the steady-state metabolic rate of the cell as a function of the rate constants and other model parameters. The full expression is cumbersome and is only reported in the supplemental Mathematica notebook. The steady-state metabolic rate $Q$ depends on the parameters reported in Table 2 and on additional parameters indicating the amount of ATP used by each reaction. These are estimated in supplemental Section S1.14 and reported in Table S1.9. The expected scaling of the steady-state metabolic rate is tested by using the measurements of the metabolic rate over the course of long-term experimental evolution in E. coli [48]. Note that the cell volume measured in the original study was converted to S/V using the empirical scaling $S = 2\pi V^{2/3}$.

The analysis yields three insights. First, the observed scaling roughly matches the expected scaling relationship (Fig 6, left pane), meaning that one can predict the metabolic rate from our model without any additional fitting parameters. Second, both growth rate and metabolic rate exhibit the scaling of an organism that was reared in the same medium. That is, growth and metabolic rate in the LTEE fall close to the black line in Fig 5 and Fig 6, meaning that a single nutritional capacity parameter $\kappa_n$ can account for the full behavior of the model. Third, the theory suggests the mechanism behind the scaling of metabolic rate. Namely, the smaller ancestor has a larger S/V and thus invests a larger portion of its metabolism in envelope synthesis, which is an energetically less demanding process than protein synthesis (middle pane). As the S/V decreases, the relative energy expenditure due to envelope synthesis decreases, and a greater portion of the metabolic rate is accounted for by the elevated rate of protein synthesis (right pane), which is a more energetically demanding process. In other words, with the reduction of S/V, the cell shifts from a cheap to an energetically expensive process, resulting in an increase in overall energy consumption.

Perhaps the good fit between the theory and measurements is surprising, given that many cellular processes that require energy (such as signaling, motility, synthesis of nucleic acids, and so on) are not explicitly accounted for in our model. Note, however, that we have parameterized Eq 31 using rate constants that were inferred from the changes in proteomic composition with the growth rate (Section 3.4), and that in doing so, all of the aspects of the cell that were not explicitly accounted for in the model were "collected" in these three rate constants (namely, $\kappa_n$, $\kappa_t$, and $\kappa_l$).

Figure 6. The scaling of the metabolic rate with S/V in the long-term experimental evolution of E. coli. Left pane: comparison of measured metabolic rate and theoretical expectation (Eq 31) for different values of nutritional capacity $\kappa_n$ inferred earlier. Points are metabolic rate measurements reported in [48]. Middle pane: the energy expenditure of protein and cell envelope synthesis. The contribution of these two processes to metabolic rate was obtained by setting $Q_l = Q_{dp} = Q_{dl} = 0$ (for protein synthesis), and $Q_t = Q_{aa} = Q_{dp} = Q_{dl} = 0$ (for cell envelope synthesis). Right pane: expected partitioning of the total metabolic rate among processes involved in protein and cell envelope synthesis.
However, our theory cannot explain the scaling of metabolic rate across diverse prokaryotes, given that the cross-species metabolic rate scales with an exponent of 2 with the cell volume [49,50], which is much larger than the exponent of 1/3 observed in the long-term experimental evolution of E. coli. It is currently unclear what causes this discrepancy.

Motivated by the observation that larger bacteria tend to grow faster, we proposed a theory that explains this relationship in terms of a trade-off between investment into surface features and the biosynthetic machinery that builds the cell. By formalizing this verbal statement into a quantitative model, we obtained predictions that we sought to test with published data. Three conclusions are reached. First, the model recapitulates the previously reported physiological responses ($\Phi_R \sim \lambda_N$ and $\Phi_R \sim \lambda_T$) and uncovers new ones ($\Phi_L \sim \lambda_N$ and $\Phi_R \sim \lambda_L$). Second, it reveals that the natural variable that governs the scaling of the growth rate with bacterium size is the ratio of the cell surface area to volume and not the cell volume itself. And third, the model correctly predicts the negative scaling of ribosomal content with S/V. On the other hand, the purported mild positive scaling of $\Phi_L$ is not unambiguously corroborated by proteomic data.

We attempted to control for factors that might affect the scaling. By cross-validating with independently published data, we find that our conclusions are not an artifact of the dataset. By performing separate regressions with data on host-associated and free-living species, we excluded the possibility that the overall scaling is caused by small-celled species being dominated by organisms selected for slow growth to avoid excessively damaging their host. By including data on envelope thickness when calculating $S/V_{cyt}$, we excluded the possibility that the envelope becomes thicker with cell size, thus increasing the relative investment into the envelope as cells become larger [51,52]. By comparing growth rates across experimental evolution studies where fast growth is selected, we controlled for other factors (such as the environment that bacteria inhabit) that confound the scaling of the growth rate. Indeed, scaling identical to that in cross-species comparisons holds there as well.

Based on the preceding analysis, we propose the following scenario for the origins of the scaling of bacterial size and growth. Bacteria inhabit diverse environments that favor various cell shapes and sizes, due to size-selective predation, swimming efficiency in viscous environments, maximization of nutrient uptake rate, enhancing dispersal, and so on [38,53]. The shape and size, in turn, set the maximal growth rate that a bacterium can achieve via the investment trade-off between biosynthetic and envelope-producing processes. While our analysis was restricted to one particular theory, one might propose alternative explanations. These are now discussed.

5.1 Gene-repertoire hypothesis
It was proposed that larger bacteria have larger genomes and a greater gene repertoire, which allows them to metabolize a more diverse set of nutrients (or use them more efficiently) compared to their smaller counterparts [3]. There are a few problems with this idea. First, it is difficult to test this hypothesis because it does not offer a quantitative prediction on how genome length or gene number scaling should translate to the scaling of the growth rate. For example, should one expect the growth rate to scale with an identical exponent as gene number? If so, the data in [3] refute this hypothesis, given that the growth rate scales with the

Internal diffusion constraints hypothesis
An alternative hypothesis is that larger cells grow faster because their cytoplasm is less crowded, thus alleviating constraints from the internal diffusion of macromolecules [42]. According to this hypothesis, growth is proportional to the abundance of metabolic proteins, implying that faster growth can be achieved only by increasing their abundance. However, increased abundance of effectors can slow down intracellular diffusion and thus reduce the encounter rates between cellular components. To offset this problem, an increase in the total mass of these effectors has to be accompanied by a faster increase of volume, such that the density of the cell decreases. On the contrary, volume and mass scale proportionally in our data set (slope of 0.93 ± 0.084 with adjusted $R^2 = 0.88$ is not significantly different from unity, $p = 0.412$), meaning that larger cells are not less dense. Furthermore, if selection optimizes the density of cytoplasm such that the rates of cellular reactions are maximized [54,55], it is not clear why a decrease in density would not increase the mean time required for two proteins to collide with one another and, thus, decrease the rate of cellular reactions.

A link between SMK law and cross-species growth rate scaling?
Finally, it is widely known that faster-growing cells tend to have larger volumes (colloquially referred to as the SMK or Schaechter-Maaløe-Kjeldgaard law), e.g., when growth is altered by rearing bacteria in media of differing nutrient quality [56], and one might argue that the scaling of growth rate across bacteria and across time in lines selected for fast growth is driven by the same underlying mechanism. The correlation between size and growth within species is explained by the threshold initiator model, which assumes that the cell divides once a critical threshold of a particular division protein is reached [35]. According to this model, the cell has to balance between the synthesis of division protein and ribosomes. When the cell is reared in nutrient-rich conditions, it allocates a greater proteome mass fraction to ribosomal proteins to support fast growth, and fewer resources are left for the division protein. Therefore, the cell produces division protein more slowly than it grows in size, causing the average cell to be larger [57].

It is possible to co-opt this model in an attempt to explain the cross-species correlation between size and growth. Here, the idea is that fast-growing species have a more efficient metabolism than their slow-growing counterparts and thus allocate more resources toward ribosomes, leading to a reduction in the rate of synthesis of the division protein. There are two problems with this explanation. First, it is unclear why selection would make the metabolism of some bacteria efficient but not that of others. Perhaps this might be attributed to differences in the environments that they inhabit. For example, bacteria living in other organisms might not be selected for fast growth as a mechanism for preventing harm to the host, whereas selection in free-living species relentlessly promotes faster growth. We show that this is not so, and that the negative correlation between S/V and the growth rate holds both among host-associated bacteria and among free-living species (Section S1.12). To illustrate this point, consider Streptococcus pyogenes, which inhabits the throat and divides in 25 minutes, and Treponema denticola, which inhabits the mouth and divides in 20 hours. One is left wondering why selection in two species living in similar environments yields dissimilar metabolic efficiencies. Second, the SMK-based theory does not explain the difference in exponents between cross-species and within-species scaling. That is, the within-species scaling exponent of growth rate with S/V is $-1.68$ and not $-1$, as is the case for the cross-species scaling.

Perhaps one could argue that the observed cross-species scaling is caused by the fact that small-S/V species require richer media, so that the perceived higher growth rate is merely the consequence of a correlation between nutrient concentration and S/V. However, most of the maximal growth rates in our data were measured in media of similar nutrient quality containing yeast extract and peptone. Furthermore, many of the large-S/V species (like spirochetes and mollicutes) require rich media with bovine serum albumin.

One may also use the SMK-based model to explain correlated changes in cell size with the growth rate in lines experimentally evolved for faster growth. Cells start with a fairly inefficient metabolism, and over the course of experimental evolution, their metabolic capabilities are gradually improved. Over time, this causes an increasing re-allocation of the proteome from metabolic proteins and division protein to ribosomal proteins, thus inflating the cell. This hypothesis predicts that if a faster-growing evolved line is forced to divide more slowly such that it has the same growth rate as the ancestor, it would also have the same cell size as the ancestor. However, rearing an evolved population in a chemostat at a lower dilution rate still yields larger cells relative to the ancestor [58], implying that the increase in cell size cannot be explained only as a by-product of the physiological response to a change in growth rate.

One can speculate about alternative explanations for the growth scaling pattern that have not been entertained in the literature. Two such scenarios come to mind. First, small-celled bacterial species have a smaller number of proteins in the cell, meaning that the noise in gene expression and in the partitioning of molecules at cell division is exacerbated. This stochasticity can upset the optimal stoichiometry of biosynthetic pathways and, in turn, cause a reduction in the growth rate. Second, following the reasoning proposed in [59], fast-growing cells may be bigger because this reduces the noise in gene expression of metabolic proteins. The idea is that external ecological factors (such as availability of nutrients) determine the cell's growth rate independent of the cell size. Faster-growing cells have a greater mass fraction allocated to ribosomal proteins, implying that a smaller mass fraction is allocated to other metabolic proteins. If the cell volume does not change, the faster-growing cell will experience greater noise in the expression of metabolic proteins, owing to the low copy number of this sector. Hence, fast growers might have evolved larger sizes to dampen this stochasticity. A problem with both of these hypotheses is that one would intuitively expect the growth rate to scale with the volume of the cell. In contrast, judging from our preceding analysis, surface-to-volume seems to be a stronger predictor.

The theory developed here conflicts with three previously published conclusions. First, contrary to the assumption that the selection coefficient (i.e., the evolutionary cost) of a particular cellular feature is directly proportional to the fraction of total resources invested in the production of the trait [4,60], here we show that this is not necessarily true, and that it can also depend on the rates of cellular reactions (Eq S1.2-S1.4). This is because the costs of the trait also have to include the costs of the machinery that builds the trait (in this case, the envelope-producing enzyme), and the amount of enzyme produced depends on how fast it operates relative to other proteome components: if the enzyme is slow, then a large fraction of the proteome has to be allocated to it to achieve an appropriate flux. Second, although a recently published analysis [61] concluded that there is no correlation between cell diameter and growth rate across prokaryotes, our analysis reveals that growth rate correlates with S/V, at least when one focuses on bacteria for which both linear dimensions are available.

Third, it was argued that the observed scaling of various bacterial features necessarily imposes an upper limit on how large a bacterium can be [62]. According to this view, the ribosomes have to replicate themselves and all other proteins within the cell doubling time. So, if ribosomes, other proteins, and doubling time scale with different exponents, it is possible for a cell to reach a particular volume at which it does not have enough ribosomes to replicate the entire proteome within the doubling time inferred from a specific power function, an event deemed the "ribosome catastrophe". However, given that many functions can fit the same data, it is unclear whether fitting pure power functions is adequate. Indeed, our model can explain the scaling of both proteome and growth rate without any fundamental cell size limit. Of course, there might be many reasons why bacteria cannot evolve extreme cell sizes, but inferring this limit from the scaling laws within extant bacterial species may be a questionable approach.

Our model also offers a causal explanation for the correlation between genomic features associated with translation and growth rate. For example, faster-growing bacteria tend to have a greater number of rRNA operons [63]. According to our theory, bacteria grow slowly because they cannot allocate enough resources to ribosomes, given that other surface-related constraints have to be satisfied. Suppose the high copy number of rRNA genes is caused by the need to meet the high demand for ribosomes. In that case, species with large S/V will have no selective advantage in possessing additional gene copies, meaning that these copies will be purged by selection to reduce the costs of replicating the added DNA. The same explanation holds true for tRNA genes. Unfortunately, our theory does not yield a precise expectation for the scaling of genomic features because our model does not include the genome as a resource component, so this hypothesis cannot be tested rigorously at the moment. Nonetheless, we find a negative log-log scaling between the S/V of a species and its rRNA ($p < 10^{-9}$, adjusted $R^2 = 0.18$) and tRNA ($p < 10^{-13}$, adjusted $R^2 = 0.26$) gene number (see Section S1.13 in S1 Appendix for details).

In addition to explaining the scaling of growth rate across bacterial species, the developed theory may also explain the increase in cell size with increasing growth rate in experimentally evolved lines. Mechanistically, a genotype with an increased cell size will reduce the metabolic burden of synthesizing the cell envelope and, hence, confer a higher growth rate. The cell size in long-term experimental evolution in E. coli exhibits a step-like trajectory, where relatively long intervals of cell volume constancy are interrupted by short bursts of inflation of cell volume [64]. Interestingly, both relative fitness and cell size increase in roughly three bursts in the first 1500 generations, and these bursts appear to coincide with each other [65]. It is possible that the high correlation between size and fitness in the early stages of the experiment is caused by the mechanism described above. This does not mean that other factors, such as enhanced nutrient import [66,67], do not improve fitness, but rather that the reduction in envelope-related costs might be one of them.

It is widely known that smaller eukaryotes tend to grow faster than larger ones. However, this trend does not hold in bacteria, where small-celled species grow more slowly. We propose that small bacteria, compared to their larger counterparts, have to invest a greater fraction of the total resources in the cell envelope owing to their large surface-to-volume ratio, leaving fewer resources for the internal biosynthetic processes that build the cell. By representing the cell as being composed of proteins (that convert nutrients into biomass) and cell envelope, we find that cells with large surface-to-volume ratios grow more slowly because they have to invest more resources in the production of the cell envelope and the enzyme machinery that builds this structure, thus leaving fewer resources for the ribosomes that replicate the cell. These predictions are corroborated by comparison with growth rate data across more than 200 bacterial species.

The model presented here is not universal, as it cannot account for the scaling of growth across the entire Tree of Life, most notably eukaryotes. We would expect the growth rate to monotonically increase as S/V decreases, but we know that this is not true because eukaryotes with much smaller S/V have growth rates that decrease with size [68]. Even large bacteria, like Metabacterium polyspora with a volume of 480 µm$^3$, have doubling times measured in days [69]. Similarly, although our theory can explain size changes in E. coli lines selected for fast growth, it fails to account for size changes in eukaryotes under the same selective pressure. For example, propagation of Kluyveromyces marxianus in a pH-auxostat leads to an increase in S/V, which contradicts our theory [70]. Therefore, the scaling law proposed here breaks at some point, as other constraints become more dominant. Although advances in integrating multiple physical constraints acting at once have been made [71], the theory that mechanistically unifies them and derives expected scaling laws remains to be developed.

impeding effects of cell envelope, derivation of physiological scaling laws, corrections for variation in S/V across cell cycle and growth conditions, data collection procedures, the sensitivity of the scaling to variation in protein degradation rates, correction for cell envelope thickness, cross-validation of the scaling trends with independent dataset. The method for obtaining errors of inferred parameters is described.

S1 File. Notebooks and scripts. Contains Mathematica notebooks for reproduction of the entire theoretical derivation, spreadsheets with raw data, and R scripts used for data processing.

The authors declare that they have no competing interests.

Supplementary Information - S1 Appendix

S1.1 The growth-impeding effects of cell envelope
We are ultimately interested in how the growth rate ($\lambda_{max}$) scales with the cell size across bacterial species with different shapes. We start by investigating the cell without degradation processes. In the limit of saturation kinetics, our model yields a closed-form solution for the maximal attainable growth given a particular cell size and shape. For simplicity, let us consider the case when degradation is absent ($d_p = 0$, $d_l = 0$), so that Eq S1.9a simplifies to Eq S1.1.

This expression has a simple, intuitive interpretation. The first term corresponds to Monod's law [1], meaning that the growth rate is a hyperbolic function of the nutrient quality ($\kappa_n$) and protein synthesis rate ($\kappa_t$). As more nutrients are supplied (i.e., $\kappa_n$ is increased), the production of building blocks becomes less limiting than the downstream step of incorporating those components into the biomass. In the limit of infinite resource concentration ($\kappa_n \to \infty$), the rate of growth will equal the rate at which those resources are converted into proteins by ribosomes ($\lambda_{max} = \kappa_t$). Likewise, in the limit of infinitely fast translation ($\kappa_t \to \infty$) that instantaneously converts amino acids into biomass, the rate of growth will be identical to the rate at which nutrients are assimilated and supplied to ribosomes ($\lambda_{max} = \kappa_n$).

The second term in Eq S1.1 captures the impediment of the growth rate caused by the production of the cell envelope, which acts as a resource sink. Here, two important insights emerge. First, the impediment term is a monotonically increasing function of $\Pi$, meaning that either smaller or elongated cells ought to have low growth rates. For example, two cells with identical cell volumes should have different growth rates if one is round and the other is elongated.
Second, the growth rate impediment depends not only on the bioenergetic costs of the envelope, set by the $\Pi$ term, but also on the rates of cellular reactions. The term $\Theta$ can be intuitively interpreted as the time required to produce a unit of surface area relative to the time needed for the production of a unit of proteome. Because $\kappa$ is the rate of the biochemical step, $1/\kappa$ is the mean time to completion of that step. If the rate of the envelope-producing enzyme is low such that the time to build an envelope is large, then the cell has to reallocate more of its proteome toward envelope producers.

Therefore, the envelope has two kinds of growth-impeding effects, or costs: (1) structure-related, via allocation of resources to the envelope itself; and (2) machinery-related, via allocation of resources to enzymes that build the envelope. Both of these features divert resources away from processes that replicate the cell and thus impede growth. The costs are computed by comparing the growth rates with and without a particular feature. This is simply the definition of a selection coefficient when reproduction occurs in continuous time [2]. For instance, the total growth impediment caused by the envelope is given by Eq S1.2.

The structure-related cost is obtained by comparing the growth rate of the cell lacking both structure and machinery ($\Pi \to 0$) to that of the cell with structure and without machinery ($\kappa_l \to \infty$). Intuitively, the proteome fraction allocated to envelope producers tends to zero when the envelope synthesis rate is infinitely fast. Likewise, the machinery-related cost is retrieved by contrasting the growth rate of the cell with structure and without machinery ($\kappa_l \to \infty$) with that of the cell that has both of these components (wild type growth rate).
In the first case, the sole cause of the difference in growth rates is the presence of the structure, while in the second case, it is the presence of machinery (Eq S1.3 and S1.4).

Three interesting consequences emerge from expressions S1.2-S1.4. First, the machinery costs decrease as the envelope synthesis rate $\kappa_l$ increases (because $\partial s_M / \partial \kappa_l < 0$). The faster the envelope producer is, the fewer proteins are needed to achieve the same net rate. Second, the structure cost depends not only on the resources needed for the envelope as a structure ($\Pi$) but also on nutrient processing and protein synthesis rates. Envelope synthesis competes with protein synthesis for the common building block pool, so even instantaneous production of the envelope diverts resources away from protein synthesis and thus slows down growth. Third, the total cost increases with $\Pi$ (because $\partial s_T / \partial \Pi > 0$), and structure costs eventually come to dominate the total costs while the machinery cost approaches zero (as $\Pi \to \infty$, $s_S \to s_T$ and $s_M \to 0$).

The inclusion of protein degradation in the model does not change the previously reached conclusions significantly. Letting $d_l = 0$ and $d_p > 0$ in Eq S1.9a yields Eq S1.5. Relative to the no-degradation model, the growth rate is further impaired in two ways (solid and dashed lines in Fig S1.1). First, the maximal attainable growth rate in the absence of envelope is lowered because resources are now partly dissipated via protein degradation (purple lines). Second, the overall growth scaling is reduced because nutrient-processing and envelope-producing enzymes have to be overexpressed relative to the base scenario without degradation (orange lines) to compensate for the lowered flux caused by the continual degradation of machinery. Finally, note that $\kappa_n > d_p$ is required for the growth rate to be positive. If this condition is not met, the cell dissipates resources faster than it assimilates them, ultimately leading to complete destruction.
Despite these quantitative differences, one still expects the growth rate to eventually approach inverse scaling with $\Pi$, as in the no-degradation case.

For envelope degradation alone ($d_p = 0$, $d_l > 0$), the growth rate is:

$$\lambda_{max} = \frac{\kappa_t \kappa_n}{\kappa_n + \kappa_t} \, \frac{1}{1 + \Theta}, \qquad \Theta = \frac{(\kappa_n + \kappa_l)\left(\kappa_n \kappa_t + d_l (\kappa_n + \kappa_t)\right)\Pi}{(\kappa_n + \kappa_t)\left(\kappa_n \kappa_l - d_l (\kappa_n + \kappa_l)\Pi\right)} \tag{S1.6}$$

Two consequences are immediately clear. First, the asymptotic growth as cells become large ($\Pi \to 0$) is the same as in the no-degradation model. An infinitely large cell (i.e., $\Pi \to 0$) still has to recycle proteins, but it does not have to recycle envelope because S/V approaches zero, implying that the asymptotic growth rate is larger in the envelope-degradation case than in the protein-degradation case. Second, for every piece of envelope that is added, envelope producers have to be overexpressed relative to the no-degradation case to meet an ever-increasing degradation demand. More formally, from Eq S1.6, $\partial \Theta / \partial \Pi > 0$ (i.e., costs increase with $\Pi$) and $\partial^2 \Theta / \partial \Pi^2 > 0$ (i.e., costs increase increasingly fast). This behavior ultimately leads to a critical surface-to-volume ratio $\Pi_{crit}$ where the entire proteome is devoted to building and recycling envelope, and no resources are left for ribosomes. We find this point by setting $\lambda_{max} = 0$ in Eq S1.6 and solving for $\Pi$ (Eq S1.7). To prove that our interpretation is correct, set $d_p = 0$ in the expression for the optimal fraction of ribosomes allocated to ribosome translation (Eq S1.8a), then substitute $\Pi_{crit}$, and lastly simplify to zero. The value of $\Pi_{crit}$ increases with nutrient quality $\kappa_n$ and envelope synthesis rate $\kappa_l$ because faster rates mean that fewer resources have to be allocated to the respective proteome component to achieve the same total flux.

Moreover, it decreases with envelope degradation rate $d_l$ and costs as this requires heavier processing and synthesis machinery investment, thus shifting the critical point to smaller cells. Finally, allowing for both protein and envelope degradation (Eq S1.9a) only exacerbates the envelope burden described here (red solid line).
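The envelope-degradation case lends itself to a quick numerical check. The Python sketch below evaluates Eq S1.6 exactly as it is printed in this section (protein degradation absent) and locates the critical surface-to-volume ratio as the value of Π at which the denominator of Θ vanishes. The function names and the parameter values in the usage note are illustrative assumptions rather than quantities inferred in this study, and any cost prefactor multiplying Π is assumed to be absorbed into Π.

```python
def pi_crit(kn, kl, dl):
    """Critical Pi at which the denominator of Theta in Eq S1.6 reaches zero.

    Our reading of Eq S1.6: solve kn*kl - dl*(kn + kl)*Pi = 0 for Pi.
    Requires dl > 0.
    """
    return kn * kl / (dl * (kn + kl))


def theta(kn, kt, kl, Pi, dl=0.0):
    """Impediment term Theta of Eq S1.6 (envelope degradation only, d_p = 0)."""
    numerator = (kn + kl) * (kn * kt + dl * (kn + kt)) * Pi
    denominator = (kn + kt) * (kn * kl - dl * (kn + kl) * Pi)
    return numerator / denominator


def growth_rate(kn, kt, kl, Pi, dl=0.0):
    """Maximal growth rate from Eq S1.6; zero at or beyond the critical Pi."""
    if dl > 0 and Pi >= pi_crit(kn, kl, dl):
        return 0.0
    monod = kt * kn / (kn + kt)  # envelope-free, Monod-like first term
    return monod / (1.0 + theta(kn, kt, kl, Pi, dl))
```

With the arbitrary values κ_n = 2, κ_t = 3, κ_l = 4, the envelope-free growth rate is 1.2, and with d_l = 0.5 the growth rate falls to zero at Π = 8/3.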

S1.2 Derivation of physiological scaling laws
In the main text, we derived the optimal partitioning of ribosomes (Eq S1.8a and S1.8b), which gives the maximal growth rate (Eq S1.9a). The optimal mass fractions of the proteome are then obtained by replacing the $\varphi$ parameters in the expressions for the proteome mass fractions (Eq S1.10). Now we have all the ingredients to derive relationships between the growth rate and proteome mass fractions when cellular processes are perturbed. These regularities are usually referred to in the literature as growth laws.

The growth rate and proteome composition depend on the external environment in which the cell is reared. We obtain this dependence by modulating each of the three rate constants $\kappa_n$, $\kappa_t$, and $\kappa_l$ and deriving the response that the cell elicits. We obtain the changes in proteome composition when the nutrient quality of the media is varied in three steps. First, Eq S1.9a is solved for $\kappa_n$, which is now a function of the growth rate that is modulated by changing the amount of nutrients in the medium ($\lambda_N$). Second, this result is placed in Eq S1.8a and S1.8b to obtain the optimal ribosomal allocation. Third, these expressions are plugged into the formulae for the proteome mass fractions (Eq S1.10). Rearrangement gives Eq S1.11a and S1.11b.

The growth rate can also be altered by changing the concentration of a translation inhibitor ($\lambda_T$). Thus, by solving Eq S1.9a for $\kappa_t$ and following the same procedure outlined above, we get the optimal partitioning when the growth rate is varied by a translation-inhibiting antibiotic (Eq S1.12a and S1.12b). Finally, when the growth is perturbed via changes in the concentration of an envelope synthesis inhibitor ($\lambda_L$), we solve Eq S1.9a for $\kappa_l$ and obtain Eq S1.13a and S1.13b. Thus, the mass fractions are a linear function of the growth rate, when the latter is perturbed either by

S1.11a): if $\kappa_t$ is small, then modulating growth by rearing the cells with a better carbon source requires a faster increase in ribosomal mass fraction relative to the case when the protein synthesis rate is large. A similar interpretation holds for the other growth laws: if the step is slow, then the cell has to allocate resources to it faster to meet the same increase in growth rate. There are generally three types of growth laws, illustrated in Fig S1.2: proteomic changes when nutrients are varied (Eq S1.11a, S1.11b), when the translation rate is varied (Eq S1.12b, S1.12a), and when the envelope synthesis rate is varied (Eq S1.13b, S1.13a).
Upper row: Proteomic changes when nutrient quality and protein synthesis rate are modulated. Diamond and circle symbols represent numerically computed optimal proteome mass fractions, and lines denote the corresponding analytic solutions. Black line: $\Phi_L$ (Eq S1.12b); colored solid lines: $\Phi_R$ (Eq S1.12a); colored dashed lines: $\Phi_B$ (the rest of the proteome mass). Lower row: Proteomic changes when nutrient quality and envelope synthesis rate are modulated. Black line: $\Phi_R$ (Eq S1.13a); colored dashed lines: $\Phi_L$ (Eq S1.13b); colored solid lines: $\Phi_B$ (the rest of the proteome). Colors correspond to the different $\kappa_n$ values reported in the legend.

S1.3 Formulas for surface area and volume of bacteria
In the main text, we use the formulas in Table S1.1 when calculating the surface area and volume of the cell:

Shape      Surface area    Volume
Capsule    $\pi D L$       $\pi (D/2)^2 (L - D/3)$

We see that $\Delta\Pi$ is at least an order of magnitude smaller than $\Pi$ across species with different. Therefore, although S/V varies across the cell cycle, this variation is much smaller than the variation in S/V across bacterial species, and one can assume it to be fixed for a given species.
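The capsule formulas of Table S1.1 can be turned into a small helper for computing surface-to-volume ratios. The sketch below is only an illustration: the function names are ours, and we assume that D is the cell diameter and L is the total cell length including the hemispherical caps, which is the reading under which the two formulas reduce to those of a sphere when L = D.

```python
import math


def capsule_surface_area(D, L):
    """Surface area of a capsule-shaped cell: pi * D * L."""
    return math.pi * D * L


def capsule_volume(D, L):
    """Volume of a capsule-shaped cell: pi * (D/2)^2 * (L - D/3)."""
    return math.pi * (D / 2.0) ** 2 * (L - D / 3.0)


def surface_to_volume(D, L):
    """Surface-to-volume ratio; D and L must be in the same length unit."""
    if D <= 0 or L < D:
        raise ValueError("expected 0 < D <= L (L is the total cell length)")
    return capsule_surface_area(D, L) / capsule_volume(D, L)
```

For a sphere of unit diameter, `surface_to_volume(1.0, 1.0)` gives 6 (in inverse length units), and elongating the cell to L = 3 at the same diameter lowers the ratio to 4.5.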
Another potential limitation of our model is that it does not include the synthesis of septum, which itself

The derived growth laws assume that the $\Pi$ of a given species is constant and not subject to change. However, the linear dimensions of the cell change across growth conditions. For example, cells tend to be larger and have smaller $\Pi$ when reared in nutrient-rich media [9], and the cell size can change in non-intuitive ways when the culture is treated with various antibiotics [10-12]. Given that we are inferring the capacity parameters ($\kappa_n$, $\kappa_t$, and $\kappa_l$) from the slopes of the growth laws, it is possible that variable $\Pi$ confounds our estimates. To account for this growth dependency, we establish empirical relationships between $\Pi$ and $\lambda_{max}$ from published studies, and then substitute them into the previously derived growth laws.

The capacity parameters are inferred from three different equations. Parameters $\kappa_l$ and $\kappa_n$ are estimated from Eq 26 and 27 in the main text, and these depend on $\Pi$, while $\kappa_t$ is inferred from Eq 25 and is independent of $\Pi$.

Cells reared in rich media have smaller $\Pi$, and this pattern holds across three experimental studies [9,10,12] (Eq S1.14-S1.16). Substituting $\Pi$ in Eq 28 of the main text for Eq S1.14-S1.16, one obtains expressions for $\Phi_L$ as a function of the nutrient-modulated growth rate $\lambda_N$, where we parameterize $d_l$ and , and infer $\kappa_l$ by non-linear least squares analysis using the R function 'nls'. The values of $\kappa_l$ are similar regardless of the source of $\Pi(\lambda_N)$ (Table S1.2), and the overall fit to the data seems almost identical (Figure S1.4). In the main text, we use the $\kappa_l$ inferred from Volkmer's study.

Second, it is possible that $\Pi$ changes when growth is perturbed with a translation-inhibiting antibiotic. To assess this, we collected data from three studies and found that surface-to-volume is mainly independent of the growth rate, with only two out of 11 examined conditions resulting in a correlation between $\Pi$ and the translation-perturbed growth rate. Thus, one can assume that $\Pi$ is fixed when inferring $\kappa_n$, which is the approach we take in the main text.
Figure S1.4: Scaling of the envelope-producer mass fraction $\Phi_L$ as a function of the nutrient-modulated steady state growth rate $\lambda_N$. In the main text, we use $\Pi(\lambda_N)$ inferred from the Volkmer2011 data, but note that using the alternative datasets Basan2015 or Si2017 yields almost identical scaling. Pink circles denote data from [13].
Figure S1.6: The effects of accounting for the dependence of $\Pi$ on growth conditions using different datasets on scaling patterns. Each line denotes $\kappa_n$ and $\kappa_l$ being inferred from a dataset in the legend in the upper right panel. Parameters $\kappa_l$ and $\kappa_n$ are inferred from $\Phi_L \sim \lambda_N$ and $\Phi_R \sim \lambda_T$, respectively, and we account for changes in $\Pi$ across nutrient- or chloramphenicol-perturbed growth rate using data on cellular dimensions from [9,10,12]. The thick blue line denotes scaling using the mean of the $\kappa_n$ values across conditions in Table S1.3.

Growth rate and cell size data were obtained from the literature. Because most of the species were reared in nutrient-rich medium, often containing yeast extract, under optimal temperature, pH, and salinity, we assumed that the reported values correspond to the maximal growth that a species can attain. Given that different species are grown at disparate temperatures, we normalized growth rates to 20 °C using a $Q_{10}$ correction with a $Q_{10}$ coefficient of 2.5, to exclude the potential confounding effect of temperature on growth scaling. The linear dimensions of the cell were also obtained from the literature, and were used to calculate cell volume and surface area. Each cell is classified into one of three categories based on its shape: spheres, rods, and helices.
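The temperature normalization described above amounts to multiplying each measured rate by $Q_{10}^{(20 - T)/10}$. A minimal Python sketch follows; the function names are illustrative assumptions, while the reference temperature of 20 °C and the $Q_{10}$ value of 2.5 are those stated in the text.

```python
def q10_factor(T, T_ref=20.0, q10=2.5):
    """Multiplicative correction that maps a rate measured at T (deg C) to T_ref."""
    return q10 ** ((T_ref - T) / 10.0)


def normalize_growth_rate(rate, T, T_ref=20.0, q10=2.5):
    """Growth rate measured at temperature T, expressed at the reference temperature."""
    return rate * q10_factor(T, T_ref, q10)
```

For a culture grown at 37 °C the factor is about 0.2106, which matches the correction c used in Section S1.9.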

Because we are interested in whether variation in growth rate can be solely explained in terms of variation in cell size and shape, we work with chemoorganoheterotrophic species. This ensures that variation in growth is not caused by differences in the type of metabolism that species have. However, even chemoorganoheterotrophs may generate energy in a variety of ways. One of the biggest differences is between oxidative phosphorylation and substrate-level phosphorylation (i.e., fermentation), the latter having a lower energetic yield. Hence, one can argue that obligate fermenters ought to grow more slowly owing to a slower rate of energy extraction from nutrients. In our model, this would translate to an organism having a lower $\kappa_n$.

If the growth rate of E. coli under anaerobic conditions is 63% of the growth rate under aerobic ones [14], one can assume that the $\kappa_n$ of anaerobes is also 63% of the $\kappa_n$ of aerobes. Formally, let $f$ be the growth rate as a function of the nutritional capacity $\kappa_n$ (Eq S1.9a). Then the growth rate of an anaerobe ($\lambda^*_{max}$) is going to be a function of the anaerobic nutritional capacity ($\kappa^*_n$). Let the anaerobic $\kappa_n$ be a fraction $\alpha = 0.63$ of the aerobic $\kappa_n$. The two growth rates are then related through a conversion factor $a$ that depends on the model parameters and is given by:

$$a = \frac{\left(\kappa_l (d_p - \kappa_n)\kappa_t + d_l \Pi (\kappa_l + \kappa_n)(d_p + \kappa_t)\right)\left(\kappa_l (\alpha\kappa_n + \kappa_t) + \Pi (\kappa_l + \alpha\kappa_n)(d_p + \kappa_t)\right)}{\left(\kappa_l (\kappa_n + \kappa_t) + \Pi (\kappa_l + \kappa_n)(d_p + \kappa_t)\right)\left(\kappa_l (d_p - \alpha\kappa_n)\kappa_t + d_l \Pi (\kappa_l + \alpha\kappa_n)(d_p + \kappa_t)\right)} \tag{S1.30}$$

Therefore, to compare aerobes and anaerobes on the same footing, the growth rate of anaerobes obtained from the literature should be multiplied by the factor $a$. The term $a$ can be biologically interpreted as the fold-increase in growth rate that anaerobes would experience if they were to switch to oxidative phosphorylation as the means of generating energy. More precisely, $a$ is obtained by taking the ratio of the aerobic growth rate (with nutritional capacity $\kappa_n$) to the anaerobic growth rate (with nutritional capacity $0.63\kappa_n$). Each

information on their energy metabolism from the literature. Lastly, when we could not find data on the type of energy-generation pathways, we simply assumed that those species respire.

S1.7 Proteomic data analysis
We use proteomic data to (1) infer the rate constants via physiological scaling laws, and (2) test the predictions on the scaling of the proteomic composition with the surface-to-volume ratio of the cell.

As outlined in Section S1.2, one needs to co-measure the growth rate across different conditions and the proteome mass fraction allocated to the three sectors to infer capacities from the slopes of the regressions (Eq S1.11a-S1.13b). To this end, we use two types of studies: those that directly quantified the absolute abundance of each protein [13, 26-30], and those that indirectly measured the ribosomal mass fraction from the total RNA-to-total protein ratio [3, 12, 31-33]. We use [13] as the primary source because it has the highest coverage of the E. coli proteome (in excess of 90% of the proteome is detected). To estimate the mass fractions belonging to nutrient-processing, lipid-producing, or ribosomal proteins, we classified the total mass (in units of fg/cell) of each reported protein into one of the three possible groups based on its designation in the KEGG BRITE database [34]. If a protein's assigned function in the BRITE database had the keywords "fatty acid biosynthesis", "lipopolysaccharide biosynthesis", or "peptidoglycan biosynthesis", we classified the protein as contributing to $\Phi_L$. If a protein's designation contained the keyword "ribosomal protein", we classified it as contributing to $\Phi_R$. All other proteins were grouped in the $\Phi_B$ fraction. After binning, we calculate each mass fraction by dividing the total mass of proteins per cell in that class by the total mass of the proteome per cell. Ribosomal mass fractions from indirect sources are obtained by multiplying the total RNA/total protein ratio by a conversion constant as reported in [33]. Some studies reported relative protein abundances, and we converted these into relative mass fractions by multiplying each entry by the molecular mass of the given protein.
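The keyword-based binning described above can be sketched in a few lines of Python. This is an illustration of the procedure and not the R scripts distributed in S1 File: the function names and the toy input are assumptions, matching is done on lowercase text, and we assume that the envelope keywords are checked before the ribosomal keyword.

```python
# Keywords quoted in the text for the envelope-producer sector (Phi_L).
ENVELOPE_KEYWORDS = (
    "fatty acid biosynthesis",
    "lipopolysaccharide biosynthesis",
    "peptidoglycan biosynthesis",
)


def classify(designation):
    """Assign a protein to sector 'L', 'R', or 'B' from its BRITE designation."""
    text = designation.lower()
    if any(keyword in text for keyword in ENVELOPE_KEYWORDS):
        return "L"
    if "ribosomal protein" in text:
        return "R"
    return "B"  # everything else is pooled in the remaining sector


def mass_fractions(proteins):
    """Mass fractions per sector from (designation, mass in fg/cell) pairs."""
    totals = {"L": 0.0, "R": 0.0, "B": 0.0}
    for designation, mass in proteins:
        totals[classify(designation)] += mass
    proteome_mass = sum(totals.values())
    return {sector: mass / proteome_mass for sector, mass in totals.items()}
```

As a usage example with made-up masses, `mass_fractions([("Ribosomal protein L2", 30.0), ("Peptidoglycan biosynthesis; MurA", 10.0), ("Glycolysis", 60.0)])` assigns 0.3 to R, 0.1 to L, and 0.6 to B.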

To test the prediction on the scaling of proteome composition with S/V, we collected quantified proteomes of bacterial species with S/V ranging from less than 5 µm$^{-1}$ in Lactococcus lactis to more than 50 µm$^{-1}$ in Spiroplasma poulsonii. In addition to proteomic studies, we also collected data on ribosome abundance and converted them to proteome mass fractions in the following way. First, the total mass of ribosomes was calculated from the ribosome abundance, with $m_{aa} = 110$ g/mol being the molar mass of an average amino acid and $N_{Avg} = 6.022 \times 10^{23}$ molecules/mol,

$$M_{prot} = 0.54\, M_{cell} \tag{S1.32b}$$

For cross-species data, we used proteomes from PaxDB [15], a recent compendium of proteomes across 100 species [16], a collection of ribosomal abundances from [17] and [19], and a number of additional quantitative proteomic studies not reported in the above-mentioned databases: Borrelia burgdorferi [18], Treponema pallidum [25], Polynucleobacter asymbioticus [23], Mesoplasma florum [20], and Spiroplasma poulsonii [24]. In total, we have quantified $\Phi_R$ and $\Phi_L$ for 40 and 30 bacterial species, respectively.

S1.8 Sensitivity of the growth rate scaling to variation in $d_p$

Because protein degradation rates can vary depending on the protein class in question, we examined how the growth rate scales with $\Pi$ under various estimates of $d_p$ obtained from Table 2 in [36]. There is almost no variation in the scaling pattern. Degradation rates were calculated by dividing $\log(2)$

Figure S1.7: The growth rate scaling is robust to variation in protein degradation rates. All other rate constants are as reported in the main text, and $\kappa_n$ was taken for the medium with casamino acids and glucose.

S1.9 Error estimates of inferred model parameters
The standard error of the rate constants inferred from the correlation between growth rate and proteomic mass fractions was calculated using the propagation of errors [37]. If $Z$ is a function of $k$ independent parameters (i.e., $Z = f(z_1, z_2, ..., z_k)$), then the variance of $Z$ can be expressed in terms of the variances of the $k$ independent parameters $z$ as:

$$\sigma_Z^2 = \sum_{i=1}^{k} \left(\frac{\partial f}{\partial z_i}\right)^2 \sigma_{z_i}^2$$

Let $c = Q_{10}^{(20 - T)/10} = 0.2106$ be the temperature correction for biological rates, with $Q_{10} = 2.5$ and $T = 37$ °C.

One can compute the variances of the inferred parameters from their respective formulae (Table S1.5), and the standard error of the mean is then $\sigma / \sqrt{n}$, where $n$ is the number of data points included in the regression.

The standard error of $\kappa_l$ is reported directly from the non-linear least squares fit used to obtain the estimate.
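The first-order propagation of errors used in this section sums, over the independent parameters, the squared partial derivative of f multiplied by the variance of that parameter. The Python sketch below applies this rule numerically; it is an illustration rather than the code used for Table S1.5, and the function names, the central-difference derivative, and the step size are assumptions.

```python
import math


def propagate_variance(f, values, variances, rel_step=1e-6):
    """First-order variance of Z = f(z_1, ..., z_k) for independent z_i.

    Partial derivatives are approximated by central finite differences.
    """
    var_z = 0.0
    for i, (z, var) in enumerate(zip(values, variances)):
        h = rel_step * max(abs(z), 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = z + h
        lower[i] = z - h
        dfdz = (f(*upper) - f(*lower)) / (2.0 * h)
        var_z += dfdz ** 2 * var
    return var_z


def standard_error(variance, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return math.sqrt(variance) / math.sqrt(n)
```

For example, for a product Z = z_1 z_2 with z = (2, 3) and variances (0.01, 0.04), the rule gives 3² × 0.01 + 2² × 0.04 = 0.25, which the numerical version reproduces.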

Table S1.5 (columns: Parameter; Parameter estimate; Parameter variance; Regression)

larger-celled organisms might experience larger turgor pressure, so it has been suggested that cell walls have to become thicker to prevent an increase in stress on the wall. Therefore, the thickness of the cell envelope should scale proportionally with the diameter of the cell, implying that one should observe 1/3 power scaling between thickness and cell volume. More precisely, the stress $\sigma$ exerted on the cell wall depends on the envelope thickness $h$ and the volume of the cell $V$ [38]. Hence, as the cell volume increases, the benefit of producing less surface area relative to the cell's volume may be canceled by that surface being thicker. Although we see a weak positive correlation, the scaling exponent is $\sim 1/10$ and thus much smaller than the expected 1/3 (left panel in Fig S1.8). This conclusion is not affected by the two outlier points with the thinnest envelopes, and the slope is almost identical when these are excluded.

Next, to correct for the difference in the cell envelope across species, we obtained the species-specific $\Pi$ by first subtracting $2h$ from both the width and the length of the cell and then calculating the volume and the surface area. This gives us S/V$_{cyt}$ while accounting for the fact that the cell envelope thickness might differ from the assumed 30 nm. The correction shifts points toward larger (smaller) $\Pi$ when the envelope is more (less) than 30 nm thick, as these cells ought to invest more (less) resources per unit of surface-to-volume. Note that the correction also moves points along the y-axis because the correction for anaerobic metabolism depends on $\Pi$. We find that the transformation leaves the points largely unaltered and still following the trend expected from our theory (Fig S1.8).

Table S1.6: Regression analysis of cell envelope-corrected data. Each element reports the regression coefficient and its standard error. Parentheses slope in the null hypothesis.

S1.11 Regression analysis in the limit of no degradation and large envelope costs
Figures S1.9 and S1.10 plot the regression lines reported in Table 3 in the main text.

Spirochaetes), and that selection does not maximize growth to preserve the host. In that case, parasites might have lost the ability to achieve fast growth even outside of the host in a nutrient-replete environment.

Although our dataset lacks environmental information, we looked at an independently collected sample that does include it [39] and found a negative correlation between Π and the growth rate, both in the pooled data and when the data were separated into free-living and host-associated organisms (Fig S1.11). This scaling also holds when the regression is applied separately to the free-living species (slope −0.87) and to the host-associated species (slope −1.44); neither slope is significantly different from −1. The same conclusions hold if S:V_tot is used as the independent variable (see Table S1.7).

The scaling exponent is not significantly different from −1 (Table S1.7), indicating that the negative scaling does not arise because the high-S:V region is dominated by host-associated organisms that might have been selected for slow growth.

[...] Genome database [40]. We find a negative correlation between S:V_cyt and both tRNA and rRNA gene copy number, and a positive correlation between the growth rate and the same genomic features (Fig S1.12).
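The comparison of a fitted scaling exponent against −1 (as in Table S1.7) can be illustrated with a minimal sketch. It assumes an ordinary least squares fit on log-transformed values, which is our reading of the regression setup rather than a confirmed detail; the function name `ols_slope_test` and the data points are hypothetical and chosen only for illustration.

```python
import math

def ols_slope_test(x, y, null_slope=-1.0):
    """Fit y = a + b*x by OLS and return (b, se_b, t), where t tests b against null_slope."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate a slope standard error")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(ssr / (n - 2) / sxx)  # zero if the fit is perfect
    t_stat = (slope - null_slope) / se_slope
    return slope, se_slope, t_stat

# Made-up log10(S:V) and log10(growth rate) values, NOT data from the study
log_sv = [0.0, 1.0, 2.0, 3.0]
log_growth = [0.1, -1.1, -1.9, -3.1]
slope, se_slope, t_stat = ols_slope_test(log_sv, log_growth)
print(f"slope = {slope:.3f}, SE = {se_slope:.3f}, t vs -1 = {t_stat:.3f}")
```

The t statistic would then be compared against a Student t distribution with n − 2 degrees of freedom; a small absolute value means the slope cannot be distinguished from −1.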
Figure S1.12: Size features as predictors of bacterial genomic features. Black line denotes OLS regression reported in Table S1.8.

Three conclusions are reached. First, the number of tRNA and rRNA gene copies increases with the growth rate, and both of these features decrease with S:V. Second, cytoplasmic volume V is a worse predictor of tRNA and rRNA gene repertoire than S:V. Third, S:V accounts for more variation in the number of tRNA genes than in the number of rRNA genes (Table S1.8).

S1.14 Parameterization for the metabolic rate scaling

For parameterization of the metabolic rate equation, we use previously estimated model parameters (Table 2 in the main text), with additional parameters retrieved from the literature (Table S1.9). Specifically, one needs to know not only the rates of chemical reactions but also how much ATP is consumed by these transformations. The cost is defined as the sum of the number of ATP molecules that have to be hydrolyzed to carry out a particular conversion (e.g., from a building block into a unit of cell envelope) and the number of ATP molecules that could have been synthesized from the NADH oxidized in the same process. These correspond, respectively, to the direct cost and the opportunity cost of missed synthesis, as outlined in [41]. We assume that degradation of cell envelope constituents does not require energy, because we could not find data on the ATP dependence of cell wall hydrolases and other degradative enzymes.

Table S1.9: Parameterization of the metabolic rate equation. † Estimated in the text. • Assumed to be zero due to lack of data. * Total protein concentration (molecules/µm³) taken from [46], and the volume of E. coli taken as the average across measurements in [9].
Parameter m_s is the total number of building block molecules required to produce a single unit of cell envelope. [...]

$$\rho_l = n_l/n_l, \quad \rho_{pgn} = n_{pgn}/n_l, \quad \rho_{lps} = n_{lps}/n_l, \quad \rho_{aa} = n_{aa}/n_l \tag{S1.36}$$

We can now calculate the number of ATP hydrolysis events required to convert building blocks into a cell envelope unit:

$$Q_l = 3\rho_l c_l + \rho_{pgn} c_{pgn} + \rho_{lps} c_{lps} + \rho_{aa} c_{aa} \tag{S1.37}$$

Lastly, the total number of building blocks needed for production of a single cell envelope unit is computed using the costs of each constituent obtained from [41,43]:

$$m_s = \frac{\left[(27 + 4)\rho_{aa} + 223\rho_{pgn} + 232 \times 3\rho_l + 2243\rho_{lps}\right]\ \mathrm{ATP}}{27\ \mathrm{ATP/aa}} \tag{S1.38}$$

which is reported in Table S1.9.
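As a rough illustration of how Eqs. S1.37 and S1.38 combine the constituent ratios, the sketch below evaluates both expressions for a given set of inputs. The function name `envelope_unit_costs`, the dictionary keys, and all numeric inputs are hypothetical placeholders; they are not the values behind Table S1.9, which are not reproduced here.

```python
def envelope_unit_costs(rho, c):
    """Return (Q_l, m_s) for one cell envelope unit.

    rho: constituent counts relative to n_l (Eq. S1.36), keys "l", "pgn", "lps", "aa".
    c:   ATP hydrolysis events per constituent, same keys.
    """
    # Eq. S1.37: ATP hydrolysis events needed to assemble one envelope unit
    q_l = (3 * rho["l"] * c["l"]
           + rho["pgn"] * c["pgn"]
           + rho["lps"] * c["lps"]
           + rho["aa"] * c["aa"])
    # Eq. S1.38: total ATP cost of the constituents, converted into
    # building-block (amino acid) equivalents at 27 ATP per amino acid
    atp_total = ((27 + 4) * rho["aa"]
                 + 223 * rho["pgn"]
                 + 232 * 3 * rho["l"]
                 + 2243 * rho["lps"])
    m_s = atp_total / 27
    return q_l, m_s

# Placeholder inputs for illustration only (NOT the values used in the study)
rho_example = {"l": 1.0, "pgn": 0.1, "lps": 0.05, "aa": 10.0}
c_example = {"l": 1.0, "pgn": 1.0, "lps": 1.0, "aa": 1.0}
q_l, m_s = envelope_unit_costs(rho_example, c_example)
print(f"Q_l = {q_l:.2f} ATP per unit, m_s = {m_s:.2f} building blocks per unit")
```

Keeping the ratios and per-constituent costs in dictionaries makes it straightforward to swap in measured compositions once they are available.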