The origin of universal cell shape variability in a confluent epithelial monolayer

Cell shape is fundamental in biology. The average cell shape can influence crucial biological functions, such as cell fate and division orientation. But cell-to-cell shape variability is often regarded as noise. In contrast, recent works reveal that shape variability in diverse epithelial monolayers follows a nearly universal distribution. However, the origin and implications of this universality are unclear. Here, assuming contractility and adhesion are crucial for cell shape, characterized via aspect ratio (AR), we develop a mean-field analytical theory for shape variability. We find that a single parameter, $\alpha$, containing all the system-specific details, describes the probability distribution function (PDF) of AR; this leads to a universal relation between the standard deviation and the average of AR. The PDF for the scaled AR is not strictly but almost universal. The functional form is not related to jamming, contrary to common beliefs, but a consequence of a mathematical property. In addition, we obtain the scaled area distribution, described by the parameter $\mu$. We show that $\alpha$ and $\mu$ together can distinguish the effects of changing physical conditions, such as maturation, on different system properties. The theory is verified in simulations of two distinct models of epithelial monolayers and agrees well with existing experiments. We demonstrate that in a confluent monolayer, average shape determines both the shape variability and dynamics. Our results imply the cell shape variability is inevitable, where a single parameter describes both statics and dynamics and provides a framework to analyze and compare diverse epithelial systems.

D'Arcy Thompson argued, in his book On Growth and Form, physical principles could explain tissue packing and cell shape [1]. Shape formation of tissues and organs during embryogenesis is a long-standing, fascinating problem of developmental biology. Since cells are the functional units of a tissue, shapes in the organs must originate at the cellular level [2][3][4][5]. Cell shapes are vital in both health and disease. As cancer progresses [6,7], as asthma advances [7][8][9][10], as wounds heal [11,12], as an embryo develops [10,13], cells progressively change their shape. Besides, cell shape may influence crucial biological functions, such as cell growth or selective programmed cell death (apoptosis) [14], the orientation of the mitotic plane [3,15,16], stem cell lineage [17,18], terminal differentiation [19,20], and division-coupled interspersion in many mammalian epithelia [21]. Moreover, the nuclear positioning mechanism in neuroepithelia depends on cell shape variation [22]. Thompson regarded cell-to-cell shape variability as a biologically unimportant noise [1]; however, it is now known that shape variability is not an exception but a fundamental property of a confluent cellular monolayer [23]. In a seminal work, Atia et al [10] showed that cell shape variability, quantified by the aspect ratio (AR), follows virtually the same distribution across different epithelial systems. But, the origin of this near-universal behavior, and whether it is precisely universal, remains unclear.
Reference [10] argued that this behavior in an epithelial monolayer is related to the jamming transition. Previous works have established the similarity in dynamics between cellular monolayers and glassy systems [8,24,25]. The glassy dynamics in confluent systems has been ar- * ssadhukhan@tifrh.res.in † saroj@tifrh.res.in gued, via vertex model (VM) simulations, to be controlled by a rigidity transition akin to the jamming transition [8,26]. In a jammed granular system, the statistics of tessellated volume, x, follows a k-Gamma distribution, P (x, k), where k is a parameter. From this association, Ref. [10] conjectured the viability of describing the AR distribution via P (x, k). However, such an association has several problems: (1) The glassy dynamics in a cellular monolayer is also reasonably described by Voronoi and cellular Potts models (CPM) [27,28]. In these models, in contrast to the VM, there is no rigidity transition [28,29]. Hence, the relevance of jamming physics to an epithelial system remains unclear.
(2) More important, the experiments and simulations in Ref. [10] show that this distribution persists even in the fluid regime, where jamming physics is not applicable even within the VM.
(3) The analytical derivation of the k-Gamma distribution in granular packing [30] relies on the fact that x is additive, whereas, as the authors of Ref. [10] rightly point out, AR, r, is not. Thus, there exists no rigorous basis for the applicability of the k-Gamma distribution [10].
Yet, Ref. [10] and several subsequent works [31][32][33][34] have shown that the probability distribution function (PDF) of scaled AR, r s , in diverse biological and model systems is described by P (r s , k), and the value of k is almost universal, around 2.5. Furthermore, the standard deviation, sd, vs the mean AR,r, follows a universal relation [10,33,35]. What is the origin of this universality? What determines the value of k ∼ 2.5? How is the PDF of r related to the microscopic properties of a system? Answers to questions like these are crucial for deeper insights into the cell shape variability and unveiling the implications of the universality. However, it requires an analytical theory that is rare in this field due to the inherent complexity of the problem and the presence of many-body interactions.
In this work, we develop a mean-field analytical theory for cell shape variability. We find that the AR distribution is described by a single parameter, α, containing all the system-specific parameters, leading to a universal relation between sd andr. On the other hand, the PDF of r s is not strictly, but virtually, universal; k ∼ 2.5 is a direct consequence of a mathematical property. We also obtain the PDF for the scaled area, a, and show that it is not universal, in contrast to what has been proposed elsewhere [36]. We demonstrate that simultaneous measurements of the PDFs for a and r can reveal the effects of changing physical conditions, such as maturation, on the individual model parameters. We have verified our theory via simulations of two distinct models of a confluent epithelial monolayer: the discrete lattice-based CPM on square and hexagonal lattices and the continuous VM (see Supplementary Material (SM), Sec. S3 for details). Moreover, comparisons with existing experimental data on a wide variety of epithelial systems show excellent agreements. We illustrate that the same parameter describes both statics and dynamics, governs the origin and aspects of universality, and, thus, provides a framework to analyze and compare diverse epithelial systems.
Analytical theory for the shape variability Simplified model systems, representing cells as polygons, have been remarkably successful in describing both the static and dynamic aspects of an epithelial monolayer [8,13,26,[37][38][39][40]. The energy function, H, governing these models is where N is the number of cells, the first term constrains area, A i , to a target area A 0 , determined by cell height and cell volume, with strength λ A . Experimentally, it has been shown cell height remains almost constant in an epithelial monolayer [13]. The second term describes cortical contractility and adhesion [13,26,41]. It constrains the perimeter, P i , to the target perimeter P 0 with strength λ P . The energy function, Eq. (1), can be numerically studied [42] via different confluent models, such as the VM [13,26], the Voronoi model [27,37], or more microscopic models such as the CPM [43][44][45] and the phase-field model [34,46,47]. We aim to develop a mean-field theory for a microscopic version, as schematically shown in Fig. 1(a), to calculate the shape variability. The mechanical properties of a cell, for most practical purposes, are governed by a thin layer of cytoplasm just below the cell membrane, the cortex. Therefore, the crucial contribution should come from the perimeter term. We describe the cell perimeter via n points, where l j is the infinitesimal line-element between jth and (j + 1)th points (Fig. 1a). Now, AR and area can vary independently. We assume that the area constraint is satisfied and ignore the first term in Eq. (1). This assumption is motivated by our simulation results, where the AR distribution is nearly independent of λ A (Fig. 2f).
Next, consider the perimeter term: λ P P 2 i − 2λ P P 0 P i ; the first term represents contractility, and the second, effective adhesion; we have ignored the constant part. Exact description of adhesion is complex [23,48,49]. The VM represents it as a line tension: ij Λ ij ij , where ij is the length between two consecutive vertices, i and j, and Λ ij gives the line tension. Since the degrees of freedom in the VM are the vertices, considering Λ ij constant, we obtain Eq. (1). The constant Λ ij implies a regular cell perimeter between vertices. However, the entire cell boundary is in contact with other cells, and the perimeter is irregular in experiments [10,49]. We take a more general description, where the tension in a lineelement l j (Fig. 1a) is proportional to l j with strength P 0 . Thus, the adhesion part becomes −λ PK P 0 j l 2 j , whereK is a constant. On the other hand, the contractile term is proportional to ( j l j ) 2 . To make the calculation tractable, we use the Cauchy-Schwartz inequality [50] and write ( j l j ) 2 ≤ n j l 2 j = ν j l 2 j , where ν ∼ O(n) is a constant. Then, the perimeter part of H becomes νλ P j l 2 j = νλ P (1 − KP 0 ) j l 2 j , where K =K/ν. Thus, contractility and adhesion act as two competing effects (Fig. 1a). Since it is unclear how to measure P 0 in experiments, we mainly restrict our discussions on λ P and T . Unless otherwise stated, we assume P 0 remains constant. However, we show in simulations that our mean-field theory captures the variation in P 0 (Fig. 2h). Cell division and apoptosis rates are usually low in epithelial monolayers [8,12]; for example, they are of the order of 10 −2 per hour and per day, respectively, for an MDCK monolayer [51]. Nevertheless, we show in the SM (Fig. S4), the general conclusions of the theory remain unchanged even when their rates are significant.
We describe the cell perimeter by the vector x = {x 1 1 , x 2 1 , x 1 2 , x 2 2 , . . . x 1 n , x 2 n }, representing a set of n points on the perimeter (Fig. 1a). From the preceding discussion, we have H = νλ P x(K ⊗ I 2 )x , where x is the column vector, the transpose of x, K, the Kirchhoff's matrix [52,53] with K ii = 2 and K (i−1)i = K i(i−1) = −1, and I 2 , the two-dimensional identity tensor. Thus, we have where γ = νλ P (1−KP 0 )/k B T , k B is the Boltzmann constant, and T , the temperature that parameterizes all possible activities, including the equilibrium temperature.
To calculate r, we first obtain the moment of inertia tensor, a 2 × 2 tensor in spatial dimension two, in a coordinate system whose origin coincides with the center of mass (CoM) of the cell. Diagonalization of this tensor gives the two eigenvalues, Λ 1 and Λ 2 , that are the squared-radii of gyrations around the respective princi- 3.5 In c r e a s in g α  Theoretical results for cell shape variability. (a) Schematic illustration of the mean-field model. Cell perimeter is represented by a set of points (shown for a particular cell); lj is the distance between two neighboring points. Cortical contractility and adhesion impose competing forces. (b) PDF of aspect ratio, r, is governed by the parameter α. (c) PDF of the scaled aspect ratio, rs = (r − 1)/(r − 1) is nearly universal. (d) The PDF of rs can be universal ifr + 1/r is proportional to 1/α (Eq. 7), but there is a slight deviation showing that it is not strictly universal. Inset:r also slightly deviates from the 1/α form. Lines are fits with the 1/α form. (e) Standard deviation (sd) vsr follows a universal relation; the state points move towards the origin as α increases. (f) The PDF of a follows Gamma distribution with a single parameter, µ, Eq. (8).
pal axes. Therefore, r = Λ 1 /Λ 2 , assuming Λ 1 ≥ Λ 2 . However, a direct calculation of Λ 1 and Λ 2 is complex due to their anisotropic natures. It simplifies a bit for the radius of gyration, s, around the center of mass, and we have s 2 = Λ 1 + Λ 2 [54]. The distribution of s 2 is where Z is the partition function,ẋ = 2 α=1 n j=1 dx α j , the volume element [52,53]. Note that the definition of P (s 2 ), Eq. (3), assumes the Boltzmann distribution for perimeter at T and accounts for all possible configurations allowed by H. The applicability of an effective T at the cellular level might be questionable. But the degrees of freedom here are much more microscopic (Fig. 1a): within the CPM, it's a pixel, comparable to the resolution of the experimental image processing method. At this level of description, one can safely assume a Boltzmann distribution. Additionally, analyses of the experimental systems [10,33] suggest the applicability of an effective T description. The first δ-function in Eq. (3) ensures that the origin coincides with the CoM of the cell; the second δ-function selects specific values of s 2 , giving the distribution function. We next go to the normalcoordinate system that diagonalizes K. Integrating out the coordinates (see SM, Sec. S1), we obtain where λ j 's are the non-zero eigenvalues of K whose diagonal form, without the zero-eigenvalue, is Λ 0 . Taking contour integral of Eq. (4), we obtain where λ k are the distinct non-zero eigenvalues of K and Res(λ k ) gives the residue at the pole λ k . Since the cell perimeter must be closed-looped, K is a tridiagonal matrix with periodicity. Therefore, the number of zeroeigenvalue must be one, and the lowest degeneracy of the non-zero eigenvalues must be two [52,55]; as explained below, the value of k in P (x, k) is determined by this second property. The leading-order contribution (SM, Sec. S1) comes from the smallest eigenvalue, λ, and is P (s 2 ) = Cs 3 exp(−αs 2 ), where C is a normalization constant andα = γnλ. For cells, the two eigenvalues, Λ 1 and Λ 2 , are not independent since √ Λ 1 Λ 2 = A, the cell area. Using this relation, we obtain s 2 = A(r + 1/r) and from Eq. (6), we obtain where N is the normalization constant: N = Γ(5/2)/α 5/2 −W (5/2) 1 F 1 (5/2, 7/2, −2α), where W (x) = 2 x /x, and 1 F 1 (a, b, c) is the Kummer's confluent Hypergeometric function [56], α ∝ λ P (1−KP 0 )/T . As detailed in the SM (Sec. S2), Eq. (6), together with the constraint of confluency [57,58], give the distribution for the scaled area a = A/Ā, whereĀ is the average of area, A. It is a Gamma distribution, with a single parameter µ, Since µ is related to the constraint of confluency, it should be independent of λ P ; our simulations show that this is indeed true (Fig. 2f). Therefore, α and µ together can distinguish how the model parameters λ P and T are affected by changing conditions such as maturation. Equation (7) provides a remarkable description, where all the microscopic details of the system enter through a single parameter, α; it has profound implications leading to the universal behavior as we now illustrate.

Aspects of universality
We first show the PDF of the aspect ratio, P (r), at different values of α in Fig. 1(b), P (r) decays faster as α increases, as expected from Eq. (7). The plots look remarkably similar to the experimental results shown in Ref. [10]; we present detailed comparisons with experiments later. Reference [10] has demonstrated that the PDFs of the scaled aspect ratio, r s = (r − 1)/(r − 1), wherer is the ensemble-averaged r, across different systems follow a near-universal behavior. We now plot the PDFs of r s in Fig. 1(c). The PDFs almost overlap, but they are not identical. A closer look at Eq. (7) shows that ifr+1/r goes as 1/α, we can scale α out of the equation and obtain a universal scaled distribution. However, as shown in Fig. 1(d), there is a slight deviation inr+1/r with the functional form of 1/α. As shown in the inset of Fig. 1(d),r also slightly deviates from the 1/α form. This tiny deviation implies that the PDF of r s is not universal. If one ignores 1/r compared to r, Eq. (7) becomes a k-Gamma function for r s with k = 2.5. However, since r ∼ O(1), this cannot be a good approximation, and the observed spread of k around 2.5 is natural when fitted with this function [10,33,34,59]. On the other hand, since the deviation (Fig. 1d) is minute, the PDFs of r s for different systems look virtually universal. This result is a strong prediction of the theory and, as we show below, is corroborated by available experimental data on diverse epithelial systems.
Although the PDFs of r s are not strictly universal, there is another aspect, sd vsr, which is universal. We show the sd = r 2 −r 2 as a function ofr in Fig. 1

(e).
Since there is only one parameter, α, in Eq. (7), it determines both sd andr. The monotonic dependence of r on α (inset, Fig. 1(d)) implies a unique relationship between them. Therefore, we can express α in terms ofr and, in turn, sd as a function ofr. Since there is no other system-dependent parameter in this relation, it must be universal. Note that α ∝ λ P /T at a constant P 0 , thus, α increases as λ P increases or T decreases. Both sd and r become smaller as α increases, and the system on the sd vsr plot moves towards the origin (Fig. 1e). From the perspective of the dynamics, the relaxation time, τ , of the system grows as α increases [28]. Thus, smallr and large τ , that is, less elongated cells and slow dynamics, follow each other, and the energy function, Eq. (1), controls both behaviors. Finally, we show some representative PDFs, Eq. (8), at different values of k for the scaled area a = A/Ā. The PDF of a has been argued to be universal [36]; our theory shows that although they follow the same distribution, they need not be identical.

Comparison with simulations
We now compare our analytical theory with simulations of two distinct confluent models: the CPM and the VM. Figures 2(a) and (b) show representative plots for the comparison of the PDFs of r within the CPM, and the VM, respectively, where the lines represent fits with Eq. (7). Figure 2(a) shows data with varying λ P , and Fig. 2(b) shows data with changing T . As discussed above, our theory predicts almost universal behavior for the PDFs of r s (Fig. 1c). We plot the simulation data for the PDFs of r s for both the models at different parameters in Fig. 2(c); the PDFs almost overlap with each other, consistent with the theory (see SM, Fig. S2 for more results). Within our mean-field theory, we have ignored the area term in Eq. (1), arguing that the perimeter term contributes dominantly to the shape. We have obtained α at fixed λ P , P 0 , and T but varying λ A . As shown in the insets of Figs. 2(a) and (b), within both the CPM and the VM, α remains almost constant with varying λ A , justifying our assumption. An important prediction of the theory is that the parameter α, which governs the behavior of the cell shape variability, is linearly proportional to both λ P and 1/T , hence with λ P /T . This prediction also agrees with our simulations ( Fig. 2d and SM, Fig. S6). The slopes within the CPM and the VM are different; this possibly comes from the distinctive natures of the two models, but the qualitative behaviors are the same.
We now show the behavior of the PDF of scaled cell area, a, in Fig. 2(e) with varying λ A . The lines represent the fits with Eq. (8). Larger values of λ A ensure the area constraint is more effective and the distributions become sharply peaked around the average area. The inset of Fig. 2(e) shows that µ almost linearly increases with λ A . Since the area term is related to the cell height that remains nearly constant and the geometric constraint of confluency, we don't expect a substantial variation in λ A in a particular system. However, µ also varies with T (see SM, Fig. S7). Thus, in contrast to what has been proposed [36], as T changes, the PDF of a, though welldescribed by a single parameter µ via Eq. (8), can still be different.
Figure 2(f) shows α vs µ within the CPM when we vary one of the parameters, λ A , λ P , and T , keeping the other two fixed. First, when λ A increases, the value of µ     increases, but α remains almost constant (also see insets of Fig. 2a and b). Next, when λ P increases, although α linearly increases, µ remains nearly the same. Finally, both parameters linearly increase with 1/T ; since higher T implies more fluctuations, decreasing T helps both r and a to become sharply peaked (see SM, Figs. S6, S7 for their specific behaviors, and results within the VM). These results show when λ A remains constant, varying λ P and T have distinctive effects on µ and α. These results are significant from at least two aspects: First, µ comes from the constraint of confluency (see SM, Sec. S2), which should depend only on the area and be independent of the perimeter. Thus, the λ P -independence of µ validates the phenomenological implementation [57,58] of this constraint. Second, these results can provide crucial insights regarding the model parameters. Maturation of a monolayer can affect both λ P and T . Additional junctional proteins may be employed during maturation to increase λ P . On the other hand, different forms of activity may reduce decreasing T . Since α increases linearly with λ P /T , AR alone is not enough to determine the dominant mechanism during the maturation process. However, assuming that λ A remains constant in a particular system, simultaneous measurements of µ and α allow distinguishing effects of changing physical condi-tions, such as maturation, on the individual parameters. We next verify the universal result of the theory: sd vs r. Figure 2(g) shows sd vsr within the CPM; we plot the theoretical prediction by the dotted line for comparison. The theory predicts that the state points move towards the origin as α increases (Fig. 1e); this is consistent with our simulations. Since α ∝ λ P /T , higher α should correspond to slower dynamics. To test this hypothesis, we have simulated the CPM at different λ P , P 0 , and T to obtain the relaxation time, τ (see SM, Sec. S3 for details). From these control parameters, we have calculated α and thenr using our theory. We show τ as functions of α and r in the inset of Fig. 2(g); it is clear that indeed τ grows as α increases orr decreases. To show this behavior of τ on the same plot as sd vsr, we plot 2.5/ ln τ − 0.25 in the main-figure as a function ofr. A monolayer fluidizes under compressive or stretching experiments, where cell shape changes, but not cell area [8,10,60]. Such perturbations make the cells more elongated, increasingr; thus, our theory rationalizes the decrease in τ associated with fluidization. Finally, we show that our mean-field result that α decreases linearly with P 0 agrees with simulations (Fig. 2h). Further, to test our hypothesis that our main results remain unchanged in the presence of cell division and apoptosis, we have simulated the CPM, including these processes. As shown in the SM, Sec. S7,

y 8 q t Z s 8 j i I c w T G c g g d X U I M 7 q E M D K D z C M 7 z C m y O d F + f d + Z i 3 F p x 8 5 h D + w P n 8 A Y 4 d j y A = < / l a t e x i t >
In c r e a s in g α    Table I. Inset: PDF of the rs, the lines are theory, and the symbols are data. (b) Using the values of α, obtained for the three sets of data as quoted in Table I, we obtain sd vsr using our theory. Different symbols represent the types of the systems, and the colors blue, red, and black represent early, intermediate, and later time data, respectively. With maturation, the system moves towards lowerr and smaller sd. the simulation results justify our hypothesis.

Comparison with existing experiments
Having shown that our theory agrees well with the simulations, both within the CPM and the VM, we next confront it with the existing experimental data. We first compare the theory with data taken from Ref. [10] for three different confluent cell monolayers: the MDCK cells, the asthmatic HBEC, and the nonasthmatic HBEC. We chose the PDFs at three different times from Figs. 3 (a) and (c) in Ref. [10]. We fit Eq. (7) with the data to obtain α and present their values in Table I; the corresponding fits for the MDCK cells are shown in Fig. 3 (a) (see SM, Fig. S8 for the other fits). Table I shows that α increases with maturation. Thus, progressive maturation can be interpreted as an increase in either λ P or 1/T or both. The PDFs for r s corresponding to the MDCK cells are shown in the inset of Fig. 3(a) together with one set of experimental data [10]; note that this is not a fit, yet the theory agrees remarkably well with the data. We next calculate sd as a function ofr using the values of α, noted in Table I for the three systems. They are shown in Fig. 3(b) along with the theory prediction. With maturation, the state points move towards lower r, represented by the arrow in Fig. 3(b). As shown in Fig. 2(g), larger α corresponds to a system with higher τ . Thus, with maturation, as the PDFs become sharply peaked, as the cells look more roundish andr becomes smaller, the system becomes more sluggish. This effect of maturation is the same in all the systems (Fig. 3b) and agrees with the interpretation presented in Ref. [10]. We have also examined that the theoretical prediction of sd vsr agrees well with the experimentally measured values shown in Ref. [10].
Our theory predicts that the PDF for r s , though not strictly universal, should be almost the same for different systems (Fig. 1c). This prediction is a consequence of a crucial aspect of the theory: all the system-specific details enter via a single parameter, α in Eq. (7). As shown in Fig. 1(d),r + 1/r deviates slightly from the behavior 1/α. This slight deviation implies that the PDF for r s can not be strictly universal and manifests as a variation in k when the PDF is fitted with the k-Gamma function in different experiments and simulations [10,33,34,59,62]. Nevertheless, since the deviation in Fig. 1(d) is very weak, the values of k are very close to each other. Therefore, the PDFs for r s for diverse epithelial systems-in experiments, simulations, and theoryshould be virtually universal. To test this prediction, we have collected existing experimental and simulation data on different systems and show the PDFs of r s on the same plot in Fig. 3(c). The variety in our chosen set is spectacular: it consists of different cancer cell lines [33], both asthmatic and non-asthmatic HBEC cells, MDCK cells [10], Drosophila wing disk [62], simulations data on both active [59] and equilibrium versions of the VM, the active Voronoi model [10], and the CPM. Yet, the PDFs, as shown in Fig. 3(c), look virtually universal and in agreement with our analytical theory. Additionally, our theory predicts a strictly universal sd vsr. Since this relation does not have any systemspecific details, data across diverse confluent monolayers must follow a universal relationship. We have collected existing experimental data for several systems: cancerous cell lines [33], human breast cancer cells [35], and a jammed epithelial monolayer of MDCK cells [61]. Figure  3(d) shows the experimental data together with our theoretical prediction; the agreement with our theory, along with the aspect of universality, is truly remarkable. As α increases, dynamics slows down, and the points on this plot move towards lowerr. This result is consistent with the finding that cell shapes are more elongated and variable as the dynamics become faster in different epithelial systems [8,10].
We have argued that simultaneous measurements of the PDFs of cell area and AR distinguish the effects of maturation on the two key parameters: λ P and T . The argument relies on the negligible effect of the perimeter constraint on µ (Eq. 8). We now show a comparison of our theoretical result for the PDF of a with existing experiments. Figure 3(e) shows experimental data for four different systems [36,51] and the corresponding fits of Eq. (8). Unlike what has been proposed, that epithelial monolayers have a universal area distribution [36], we find, in agreement with experiments, the distribution can vary, although the functional form remains the same.
What are the implications of these universal aspects of cell shape variability and our theory? Cell shape controls several crucial biological functions such as the mitoticorientation [3,15,16] and cell fate [17,18,20]. Our theory shows that the microscopic system properties are encoded via a single parameter, α. Consequently, knowledge of one of the observables, such asr, contains the information of the entire statistical properties in a monolayer. We now illustrate this predictive aspect of the theory. Experimental measurement of an average property is usually less complex and more reliable. We have collected the data forr from the supplementary material of Ref. [33] for three different systems: 10Ca1A, 10AT, and 10A.ErbB2, shown by the hexagons in Fig. 3(d). From these average values, we obtain α, which we use to theoretically calculate the PDFs for r, as shown in Fig. 3(f). The inset of Fig. 3(f) shows our theoretical PDFs for r s , together with the corresponding experimental data for comparison. The excellent agreement demonstrates that cell shape variability results from the geometric constraint imposed by the energy function, Eq. (1), and is not a choice but inevitable for such systems. This result, we believe, will foster analysis of diverse epithelial systems to understand the interrelation between geometric properties and biological functions within a unified framework.

Discussion
We have obtained a mean-field theory for cell shape variability through the energy function H, Eq. (1) [13,[26][27][28]37]. The geometric restriction of confluency is a strong constraint on cell area. Considering that it is satisfied and that the cell cortex, described by the perimeter term, is crucial in determining the cell shape, allowed us to ignore the area term and obtain the distribution for AR. Recent experiments and simulations have shown that cell shape variability is virtually universal in a confluent epithelial monolayer [10,[33][34][35]; our work provides the theoretical basis for such behavior. Contrary to common beliefs [10,33], the jamming transition is not related to the cell shape distribution. We have analytically obtained the functional form for cell shape variability and find that the microscopic system properties enter the distribution via a single parameter, α: this leads to the universal behavior for sd vsr and a virtually universal distribution for the scaled aspect ratio r s . Our theoretical results are verified in simulations within the CPM, on the square and hexagonal lattices, and the VM. They also agree remarkably well with the existing experiments. Thus far, the PDF of r s has been fitted with a k-Gamma function with k being around 2.5. We show that the slight variation in k comes from the fact that the PDF is not strictly but almost universal. On the other hand, k 2.5 is a direct consequence of a mathematical property: the lowest degeneracy of the eigenvalues of the connectivity matrix being two for a closed-looped object, here the perimeter.
A better understanding of the connection between the theoretical parameters and different system properties is crucial to exploit the universal aspects for deeper insights. Since all of the parameters combine into α, the effects of changing physical conditions on the individual parameters are difficult to determine from the measurements of AR alone. Within our theory, λ P describes the cortical properties, and T parameterizes different biological activities, including temperature. These are effective parameters, and their direct measurement in biological systems is impractical. Our theory provides an indirect way to estimate these parameters. The cell area in a confluent system is geometrically constrained. We have used the phenomenological implementation of the constraint of confluency and obtained the PDF for a [57]. It is a Gamma function, described by a single parameter µ. Reference [36] has proposed that the PDF of a is universal in various epithelial monolayers. However, we show that though the area distribution follows the same function, there is a variation in µ. Our work connects µ to the microscopic model parameters of Eq. 1. In particular, µ should be independent of λ P , whereas α varies linearly with both λ P and 1/T . This distinction, assuming λ A remains constant, allows inferring the effects of maturation on the individual model parameters. We have neglected cell division, growth, and apoptosis in our theory. However, the consequences of these processes, and the effects of chemical gradients [63], should also enter the distribution via α, as the algebraic part comes from the topological property that remains the same. We have included them in our simulations, and the results support this expectation (SM, Sec. S7); this implies that the main predictions of the theory remain valid even in the presence of these processes.
Our work demonstrates that a single parameter describes both the cell shape statistics and the dynamics. We have shown in our simulations that the relaxation time, τ , grows as α increases orr decreases. Experiments on confluent cellular monolayer have also reported similar results [8,10,33]. Thus, as cells become more compact and their shape variability reduces, the monolayer becomes more sluggish. This result may have farreaching consequences. Most experiments usually measurer and analyze biological functions viar [3,15]. Our theory implies that such knowledge contains a wealth of information. One can obtain α fromr, and all other properties, such as the distribution, the standard deviation, and the dynamics, can be analytically calculated.
Whether this correlation between the statics and dynamics persists in all regimes, including the glassy-regime, remains an open question. Additionally, in biology, it is well-recognized that different levels of organizations are mechanistically related. One fundamental open question is how molecular-level events are related to cellular machines that control the cell shape [64]. The striking predictability, demonstrated by our theory, wherer determines the PDF, shows that the statistical distribution of cell shape is unavoidable. How do different cells respond to this inevitable distribution? Is cellular response similar across diverse systems? How is it related to organ-level morphogenesis? Having a single parameter that describes the static and dynamic aspects at the cellular level should help to compare and analyze different systems and answer these questions.
In conclusion, we have developed a mean-field theory, considering cortical contractility and adhesion are crucial in determining the shape. We have analytically derived the cell shape variability, characterized via the AR. The PDF for AR is described by a single parameter, α. As a result, sd vsr becomes universal, and the PDF for the scaled aspect ratio, r s , is virtually universal. The analytical form for the PDF of r s can be roughly approximated as a k-Gamma distribution [30] that has been fitted with the existing experimental data [10,33]; however, the distribution is unrelated to the jamming transition, in contrast to the general belief. The near-universal value of k ∼ 2.5 is the consequence of a mathematical property, and the variation results from the fact that the PDFs are not strictly universal. α can also provide information on dynamics. In addition, simultaneous measurements of the PDFs of a and r can relate the model parameters with the changing physical properties. Such a relationship should further strengthen the link between the energy function, Eq. (1), and an epithelial monolayer. Having a single parameter for the statistical and dynamical aspects of an epithelial monolayer should foster a detailed comparison of diverse systems, help to analyze the relation of biological functions to shapes, and illuminate the mechanisms of cellular response to the inevitable shape variability. In this supplementary material, we first provide the details of the analytical derivation of the probability distribution functions (PDFs) of the aspect ratio (AR) in Sec. S1, and the scaled area, a, in Sec. S2. We next provide the details of the simulations in Sec. S3, and explain the procedure to calculate the AR in our simulations in Sec. S4. We then present data of the CPM on the hexagonal lattice in Sec. S5, and show additional data in Sec. S6, supporting the claim that AR does not depend on λ A . Section S7 shows simulation results when we include cell division and apoptosis. We present the PDF of a within the vertex model in Sec. S8, show the dependence of the parameter α on λ P and T in Sec. S9. The T -dependence of µ is shown in Sec. S10 and, finally, the fits with the experimental data of HBEC cells are shown in Sec. S11.

S1. Details of the derivation
As shown in the main text, defining x = {x 1 1 , x 2 1 , x 1 2 , x 2 2 , . . . x 1 n , x 2 n } as a particular configuration of the perimeter of a specific cell, we can write the energy in units of k B T for this cell as where k B T is Boltzmann constant times temperature, γ = νλ P (1 − KP 0 )/k B T , K is the ndimensional Kirchoff's matrix with K ii = 2 and K (i−1)i = K i(i−1) = −1. I 2 is the two-dimensional identity tensor and ⊗ denotes the tensor product. x is a column vector, the transpose of x. Then, the distribution of the radius of gyration [1,2] can be written as where the volume elementẋ is defined asẋ = 2 α=1 n j=1 dx α j . The first δ-function in Eq. (S2) coincides center of mass with the origin of the coordinate system. The squared radius of gyration is s 2 = n −1 xx and the second δ-function in Eq. (S2) ensures that the correct values of s are chosen for the distribution. Z is the partition function of the system. Since the radius of gyration does arXiv:2108.03884v1 [cond-mat.soft] 9 Aug 2021 not depend on the coordinate system, we are allowed to chose one that diagonalizes K. Say the diagonal matrix is Λ, and q represents the normal coordinates in this system.
The radius of gyration can be defined as the root-mean-square distance of different parts of a system either from its center of mass or around a given axis. We have designated the former as s, defined as where N is the total volume element in the system with coordinates x i , and x CM is the center of mass (CoM) of the system. The other two radii of gyration can be defined around the two principal axes (since we are in spatial dimension two) passing through the CoM. We calculate these two radii of gyration by writing the inertia tensor in a coordinate system whose origin coincides with the CoM and diagonalizing the tensor. The eigenvalues, Λ 1 and Λ 2 , give the square of the radii of gyration. Thus, the aspect ratio, r, is obtained as Due to the anisotropic nature of Λ 1 and Λ 2 , a direct calculation for their distributions is more complex than that of s. We first calculate the distribution for s and then, using this result, obtain the distribution of r.
Equation (S2) can be written in the normal coordinate system as where we have used q α n ∝ x α j that corresponds to the zero-eigenvalue mode of the matrix. Integrating over q α n , we get rid of this zero-eigenvalue that gives translation. Thus, where we have defined q 0 as the 2(n − 1) dimensional vector excluding the coordinates corresponding to the zero-eigenvalue. The normalization factor, Z, can be calculated exactly through the integration as Note that the integration in the calculation of P (s 2 ) is around the boundary of the cell; to separate out the radial part, we now write the volume element in polar coordinate u: q 0 = n 1/2 su. Theṅ q 0 = n (n−1) s 2(n−1)−1 dsu and q 0 q 0 /ns 2 = uu = 1. Thus, we obtain from Eq. (S5) with A = γ π (n−1) |Λ 0 | 1 2π n (n−1) s 2(n−1)−1 . I n−1 is the identity matrix of rank n − 1. Carrying out the integration over u, we obtain Using the value of A, we obtain where λ j 's are the eigenvalues of K. The integral in Eq. (S9) can be performed via the contour integral and the resultant solution can be written as where λ k are the distinct eigenvalues of K and Res(λ k ) gives the residue at the pole λ k . As we show below, the residues will have a term exp[−nγs 2 λ k ] and in the limit s 2 → ∞, only the smallest λ k will contribute.
Since the cell perimeter must be closed-looped, K is a tridiagonal matrix with periodicity.
Therefore, the number of zero-eigenvalue must be one, and the lowest degeneracy of the non-zero eigenvalues must be two [1][2][3][4]. We have already integrated out the coordinate corresponding to the zero-eigenvalue. Let us designate the lowest non-zero eigenvalue as λ. The pole corresponding to λ is located at β = −iγns 2 λ, and of order 2. Thus, we obtain the residue as (S11) Let's first take the derivative, with respect to β, of the numerator and write part of the residue as term1 = −i e −γns 2 λ (γns 2 ) n−3 n−3 j=1 (λ j − λ) . (S12) Next, differentiating the denominator, we obtain the other part of the residue as A comparison of term1 and term2, given by Eqs. (S12) and (S13), respectively, shows that there is an extra factor of s 2 in the denominator of term2. Thus, term2 can be ignored compared to term1. Therefore, we obtain the distribution function for s 2 as where C is the normalization constant that we will fix later, andα = γnλ. Note that the lowest eigenvalue for the n-dimensional Kirchoff's matrix is proportional to 1/n, thus nλ ∼ O(1).
Now, s 2 = Λ 1 + Λ 2 and the aspect ratio where A is the average area. Since the distribution of cell area is sharply peaked (Fig. 2e in the main text), as the cell division and apoptosis are slow processes, A can be taken as a constant.
Therefore, using the last two relations in the first, we obtain s 2 = A(r + 1/r). Thus, we obtain from Eq. (S14), the distribution of r as where the condition that total probability must be unity sets the normalization constant N as where, W (x) = 2 x /x and 1 F 1 (a, b, c) is the Kummer's confluent Hypergeometric function [5]. Using the value of γ, we obtain α ∝ λ P (1 − KP 0 )/T . For the analysis of the theory, a symbolic software, such as "Mathematica" [6] is helpful. We note here that the relation between sd andr is quite complex, though for most practical purposes it can be taken as a straight line: sd 0.71r − 0.75.

S2. Distribution for area
Equation (S14) gives the distribution of the radius of gyration of different cells in a monolayer.
Using this equation, we now derive the distribution of area, A. Since, AR and A can vary independently, we can use s 2 ∝ A, as we are interested in the functional form. Therefore, Eq. (S14) gives where β is a constant related toα. Note that we ignored the area term in the energy function, H, in the derivation of Eq. (S14). In a confluent cellular monolayer, the individual cell area must obey the strong geometric constraint of confluency; thus, it becomes crucial for the area distribution. However, mathematically imposing this constraint is a challenging geometric problem for random patterns, and no exact result exists yet. Weaire et al [7] proposed a phenomenological implementation of this constraint as a polynomial in the area; this remains one of the simplest possible ways to date to deal with this constraint [8]. Keeping only one term of this polynomial for simplicity, we can write this constraint as f (A) ∼ A ν [7,8]. Using this in Eq. (S17), we obtain where we have defined µ = ν + 5/2. From Eq. (S18), we obtain the average area,Ā = µ/β. Thus, we obtain the normalized distribution for the scaled area, a = A/Ā, as this is the well-known k-Gamma function, defined in Ref. [9], and usually denoted with the variable k. This same function has been used in fitting the scaled AR, r s , data in different existing experiments and simulations. Therefore, to avoid confusion with k, which is obtained fitting the r s data, we have used µ to define the distribution function for a. Since µ comes from the constraint of confluency, it should be independent of λ P . Our simulation results within both the CPM and the VM (Fig. 2(f), in the main text) support this hypothesis.

S3. Simulation details
We have verified our theory via simulations of two distinct models: the CPM and the VM. In both the simulations, we have used the energy function H, given in Eq. (1) in the main text. The CPM is a lattice-based model. The underlying lattice has some effects on the quantitative aspects of the model; for example, on the square lattice, the object with the minimum possible perimeter for a given area is a square. However, the qualitative behaviors are independent of the lattice. To ascertain that the simulation results are independent of the lattice, we have simulated the CPM on two different lattices: the square lattice and the hexagonal lattice.
For the glassy dynamics, the geometric restriction leads to two different regimes, the low-P 0 , and the large-P 0 regime. However, the large-P 0 regime characterizes quite large adhesion compared to the cortical contractility; this regime, we feel, is not relevant for the experiments. Therefore, we present most results when P 0 is not too large. Here, we briefly discuss the different models. With square lattice : We have implemented CPM with square lattice in Fortran 90 and followed a Connectivity Algorithm developed in [10] to prevent fragmentation of cells [11]. We have chosen a simulation box of size 120×120 with 360 total cells in the system. The average area of cells in the system is 40, and the minimum possible perimeter on a square lattice with this area is 26. Unless otherwise specified, we always start with an initial configuration where each cell is rectangular with a size 5 × 8. We first equilibrate the system for 8 × 10 5 MC time steps before collecting the data of aspect ratio. The variables that characterize the system are λ A , λ P , P 0 , and T . We have ignored cell division and apoptosis, as discussed in the main text. However, to test the effects of these processes, we have included them within this model and present a specific set of results in Sec. S7.

Cellular
To obtain the relaxation time, τ , we have calculated the overlap function, Q(t), defined as where X σ cm (t) is the center of mass at time t of a cell with index σ, W (x) is a Heaviside step function . . . t 0 denotes averaging over initial times and the overline implies ensemble average. The parameter a is related to the vibrational motion of the particles (here cells) inside the cage formed by its neighbors. We have set a = 1.12 [11]. We have taken 50 t 0 averaging and 20 configurations for ensemble averaging.
With Hexagonal lattice : To simulate CPM on the hexagonal lattice, we have used an opensource application CompuCell3D [12,13]. CPM is the core of CompuCell3D. We have simulated the 2D confluent cell monolayer with periodic boundary conditions using the same energy function,  [14,15]. Within the Monte-Carlo simulation [16], dynamics proceeds by stochastically updating each vertex position of all the cells by a small amount δr. We accept the move with a probability P (C → C ) = e − ∆H T where we have set Boltzmann constant k B to unity, ∆H is the change in energy going from C → C .
Initially, we start will 1024 cells having equal area (= 1.0) and perimeter (= 3.72). We have equilibrated the system for 2 × 10 6 MC steps before the start of collecting data.

A Typical Cell Configuration
FIG. S1. The process to calculate the AR of a Cell. We first obtain the moment of inertia, I, in a frame of reference whose origin coincides with the center of mass of the cell, then diagonalize I, and calculate the AR, r, as the square root of the ratio of the two eigenvalues.

S4. Procedure to calculate AR in our simulation
As stated in the main text, we characterize the cell shape via AR. To calculate the AR, we follow Ref. [17] and briefly discuss the method here. Consider one particular cell, represented by the set of points {x 1 i , x 2 i }, as schematically shown in Fig. S1. We first calculate the moment of inertia in a frame of reference whose origin coincides with the center of mass (x 1 c , x 2 c ) of the cell. The moment of inertia, I, is a 2 × 2 tensor in spatial dimension two. We next diagonalize I and obtain the two eigenvalues, Λ 1 and Λ 2 . Without loss of generality, we assume that Λ 1 ≥ Λ 2 , and obtain the AR, r = Λ 1 /Λ 2 . This process can be viewed as approximating the cell with an ellipse with the major and minor axes given by √ Λ 1 and √ Λ 2 , respectively. For the CPM on the square lattice

S5. Distribution of AR is independent of lattice type
In the main text, we have presented the results for the CPM on a square lattice and the VM, the latter being a continuum model. The agreements between the two models show that the results are independent of the lattice. As further evidence of this lattice independence, we present the results within the CPM on the square and the hexagonal lattices in Fig. S2. We show that our analytical theory agrees with simulations on both lattices. Figures S2(a) and (b) show representative plots comparing the PDFs of r within the CPM with the square and the hexagonal lattices, respectively. As discussed in the main text, our theory predicts almost universal behavior for the PDFs of r s (Fig. S2 c). This result shows that qualitative behaviors within the CPM are lattice-independent and agree with our analytical theory.

S6. Distribution of AR does not depend on λ A
One of the arguments within our mean-field theory is the following: since AR can be independent of A and the shape should have the dominant contribution from the cortical properties, we can assume that individual cells satisfy the area constraint and ignore the area term in H, Eq. (1) in the main text. We have shown that the area distribution is sharply peaked around the average area ( Fig. 2(e) in the main text, and Fig. S5). We have also shown in the insets of Figs

S7. Distribution of AR including cell division and apoptosis in the CPM simulation
We have ignored cell division and apoptosis in our theory since the rates of these processes are extremely low. However, even when these processes are significant, it is easy to see that their effects can only enter via α. The reason behind this is that the other, algebraic part, comes from the topology of the cell perimeter, that it is a closed-looped object; this property must remain the same even in the presence of these processes. Therefore, all the conclusions of our theory should To test this result, we have included cell division and apoptosis in our simulations. We keep the rate of division, k −1 d , and apoptosis, k −1 a , the same such that the average number of cells remains the same. Every k d time step, we randomly select a cell and divide it into two, with a randomly chosen division plane. To decide the division plane, we first calculate the center of mass (CoM) and then chose a point on the perimeter in a random direction. The line connecting the CoM and this point gives the division plane. The area of these two daughter cells then grows till it becomes of the order of A 0 (Eq. (1) in the main text). To avoid diving a cell that has just undergone division, we impose a cut-off on area (≥ 32) for selecting a cell for division. For apoptosis, every k a time step, we randomly choose a cell and assign it a target area A 0 = 0. We also relax the constraint that does not allow fragmentation in our simulation for the cells undergoing apoptosis.
We show the PDF of r at three representative values of k a in Fig. S4(a), and the corresponding PDFs for r s are shown in Fig. S4(b). k a and k d are presented in units of the inverse of the Monte-Carlo time. We have fitted the PDFs of r with Eq. (6) of the main text, and the values of α at different k a are shown in Fig. S4(c). We emphasize two observations: first, the PDFs of r indeed fit well with Eq. (6) of the main text, and second, that α attains a constant value quite fast with increasing k a , that is, decreasing the rate of division or apoptosis.  S9. Dependence of α on λ P and T within the CPM and the VM In the main text, we have shown the linear dependence of α on λ P /T : within both the CPM and the VM (Fig. 2d in the main text). Here, we show how α varies within both models when we change λ P and T , individually. Figure S6(a) shows the behavior of α as a function of 1/T and Fig.   S6(b) as a function of λ P : α increases almost linearly as 1/T or λ P increases, in agreement with our theory. S10. Dependence of µ on T Figure 2(f) in the main text shows that µ remains almost constant with varying λ P . This behavior is expected since µ comes from the constraint of confluency and should be related to the area, whereas λ P is related to the perimeter constraint. On the other hand, from the same argument, we also expect µ to depend on λ A ; as shown in Fig. 2(e) in the main text, it increases linearly with λ A . In addition, since T controls the fluctuation in the system, we also expect µ to change with T ; this dependence leads to different distributions of a in epithelial systems. We show  Table I. (b) PDF of the scaled AR, r s for the asthmatic HBEC cells. Lines are the theoretical predictions using the values of α obtained from the fits in (a), symbols represent one set of experimental data [17]. (c) and (d) The same as in (a) and (b), but for non-asthmatic HBEC cells.
in Fig. S7 that µ almost linearly decreases as T increases.

S11. Fits with the HBEC data of AR
We have compared our theory for the AR, r, and the scaled AR, r s , with the experimental data for three different systems: MDCK cells, asthmatic HBEC, and non-asthmatic HBEC cells.
These data are taken from Ref. [17], where PDFs of r and r s were presented as a function of maturation. As discussed in the main text, we have selected the data at three different times, and the comparison for the MDCK cells are shown in Fig. 3(a) in the main text. Here, in Fig. S8, we present the comparisons for the asthmatic and non-asthmatic HBEC cells. We have collected only one set of experimental data for the PDF of r s in both the asthmatic and non-asthmatic systems, as all the data nearly overlap. The experimental data from different papers are collected using the