Automated model discovery for human brain using Constitutive Artificial Neural Networks

The brain is our softest and most vulnerable organ, and understanding its physics is a challenging but significant task. Massive efforts have been dedicated to testing the human brain, and various competing models have emerged to characterize its response to mechanical loading. However, selecting the best constitutive model remains a heuristic process that strongly depends on user experience and personal preference. Here we challenge the conventional wisdom to first select a constitutive model and then fit its parameters to experimental data. Instead, we propose a new strategy that simultaneously discovers both model and parameters that best describe the data. Towards this goal, we integrate more than a century of knowledge in thermodynamics and state-of-the-art machine learning to build a family of Constitutive Artificial Neural Networks that enable automated model discovery for human brain tissue. Our overall design paradigm is to reverse engineer a Constitutive Artificial Neural Network from a set of functional building blocks that are, by design, a generalization of widely used and commonly accepted constitutive models, including the neo Hooke, Blatz Ko, Mooney Rivlin, Demiray, Gent, and Holzapfel models. By constraining the input, output, activation functions, and architecture, our network a priori satisfies thermodynamic consistency, material objectivity, material symmetry, physical constraints, and polyconvexity. We demonstrate that our network autonomously discovers both model and parameters that best characterize the behavior of human gray and white matter under tension, compression, and shear. Importantly, our network weights translate naturally into physically meaningful material parameters, e.g., shear moduli of 1.82 kPa, 0.88 kPa, 0.94 kPa, and 0.54 kPa for the cortex, basal ganglia, corona radiata, and corpus callosum.
Our results suggest that Constitutive Artificial Neural Networks have the potential to induce a paradigm shift in soft tissue modeling, from user-defined model selection to automated model discovery. Our source code, data, and examples are available at https://github.com/LivingMatterLab/CANN.


Motivation
Traumatic brain injury is a major cause of death and disability worldwide [21], with a global annual incidence of 69 million [18]. In the United States alone, 176 people die each day from traumatic brain injury, and every nine seconds, someone sustains a new injury to the brain. Fortunately, not all concussions are life-threatening; yet, more than 5 million Americans are living with brain-injury-related disabilities and need long-term assistance in their everyday life [14]. Without a doubt, understanding the mechanics of the brain during traumatic brain injury is a challenging but significant task [25]. Throughout the past decade, scientists across the world have made significant strides in testing, modeling, and simulating the human brain [5, 10-12, 31, 49, 51, 61, 62]. However, because of its ultrasoft nature, the results vary greatly, both qualitatively and quantitatively [13]. This has resulted in a wide selection of competing constitutive models for gray and white matter tissue, without any real guidance on which model to choose [11,41,42,45]. Throughout this manuscript, we ask how we can select the best constitutive model for the human brain, whether the current existing models are really the best, and if not, how we can systematically search for and find a better model.
In machine learning, the process of finding relationships in complex data is known as automated model discovery [4,20,39]. The qualifier automated implies that model discovery can be performed entirely without human interaction [9,54]. Neural networks have emerged as a powerful strategy to discover constitutive models from large data, even in the complete absence of knowledge about the underlying physics [30]. However, classical neural networks ignore more than a century of research in constitutive modeling [1]: They violate thermodynamic constraints [40], neglect generally accepted physical principles [3], and fail to predict the behavior outside the training regime [39]. In essence, neural networks perform excellently at fitting a complex function to big data, but they are not interpretable; they teach us nothing about the underlying physics [50]. So, really, what we are looking for is a strategy to autonomously discover a physically motivated model.
Two successful but fundamentally different strategies have emerged to integrate physics into neural network models: Physics-Informed Neural Networks that add physics-based equations as additional terms to the loss function [33] and Constitutive Artificial Neural Networks that explicitly modify the network input, output, and architecture to hardwire physics-based constraints into the network design [36]. The first type of networks is more general and works well for ordinary [37] or partial [50] differential equations, whereas the second type is specifically tailored towards constitutive equations [38]. Constitutive Artificial Neural Networks, with strain invariants as input and free energy functions as output, were first proposed for rubber-like materials almost two decades ago [56], and have recently regained attention in the constitutive modeling community [3,24,34,63]. They are now also increasingly recognized in the soft tissue biomechanics community with applications to skin [57], blood clots [32], arteries [29,38], and myocardial tissue [32]. A common feature of all these neural networks is to use multiple hidden layers, generic activation functions, and several hundreds, if not thousands, of unknowns. To no surprise, they perform well at interpolating non-linear stress-stretch relations from tension, compression, or shear experiments. However, one critical limitation remains: the lack of an intuitive interpretation of the model and its parameters [34].
Here, instead of using a generic neural network architecture, we reverse-engineer a new family of Constitutive Artificial Neural Networks from constitutive building blocks that are, by design, a generalization of widely used and commonly accepted constitutive models, including the neo Hooke [60], Blatz Ko [8], Mooney Rivlin [44,52], Demiray [17], Gent [22], and Holzapfel [27] models. As such, their network weights naturally translate into material parameters with standard physical units and a clear physical interpretation [39]. We train our network with tension, compression, and shear tests from the human cortex, basal ganglia, corona radiata, and corpus callosum [11-13] and demonstrate that it can simultaneously discover both model and parameters that best describe the data. Beyond automated model discovery, we show that we can also use our network for the parameter identification of existing constitutive models. By systematically comparing the goodness of fit of the different models, trained with the different experiments, we not only discover the model that best describes the experiments, but we also discover the experiment that best informs the models. Designing informative experiments is particularly significant for human brain tissue, for which fresh samples are rare, challenging to preserve, and difficult to mount and test [13].

Kinematics
To characterize the deformation of the sample, we introduce the deformation map ϕ that maps material particles X from the undeformed configuration to particles, x = ϕ(X), in the deformed configuration [2]. We describe relative deformations within the sample using the deformation gradient F, the gradient of the deformation map ϕ with respect to the undeformed coordinates X, and its Jacobian J,

F = ∇_X ϕ   and   J = det F > 0.

Multiplying F with its transpose F^t introduces the symmetric right Cauchy Green deformation tensor C,

C = F^t · F.

In the undeformed state, both tensors are identical to the unit tensor, F = I and C = I, and the Jacobian is one, J = 1. A Jacobian smaller than one, 0 < J < 1, denotes compression, and a Jacobian larger than one, 1 < J, denotes extension.
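These kinematic quantities are straightforward to compute; the following sketch (not from the paper's code) evaluates F, C, and J for an incompressible uniaxial stretch with a hypothetical value λ = 1.1:

```python
import numpy as np

lam = 1.1  # 10% uniaxial stretch, hypothetical value
# incompressible uniaxial deformation gradient, F = diag(lam, lam^-1/2, lam^-1/2)
F = np.diag([lam, lam**-0.5, lam**-0.5])
C = F.T @ F                # right Cauchy Green tensor, C = F^t . F
J = np.linalg.det(F)       # Jacobian; J = 1 for an isochoric deformation

print(np.isclose(J, 1.0))                                  # volume is preserved
print(np.allclose(C, np.diag([lam**2, 1/lam, 1/lam])))     # C is diagonal here
```

For this diagonal F, the determinant is λ · λ^-1/2 · λ^-1/2 = 1, which is exactly the incompressibility statement J = 1.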

Isotropy.
To characterize an isotropic material, we introduce the three principal invariants I1, I2, I3 and their derivatives with respect to the deformation gradient F,

I1 = tr(C)                  with   ∂I1/∂F = 2 F,
I2 = 1/2 [I1² − tr(C²)]     with   ∂I2/∂F = 2 [I1 F − F · C],
I3 = det C = J²             with   ∂I3/∂F = 2 I3 F^-t.

In the undeformed state, F = I, the three invariants are equal to three and one, I1 = 3, I2 = 3, and I3 = 1.
Perfect incompressibility. For isotropic, perfectly incompressible materials, the third invariant always remains identical to one, I 3 = J 2 = 1. This reduces the set of invariants to two, I 1 and I 2 .
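In code, the invariants follow directly from these definitions; a minimal sketch, again assuming NumPy:

```python
import numpy as np

def invariants(F):
    """Principal invariants I1, I2, I3 of C = F^t . F."""
    C = F.T @ F
    I1 = np.trace(C)
    I2 = 0.5 * (I1**2 - np.trace(C @ C))   # I2 = 1/2 [I1^2 - tr(C^2)]
    I3 = np.linalg.det(C)                  # I3 = det C = J^2
    return I1, I2, I3

# undeformed state: I1 = 3, I2 = 3, I3 = 1
I1, I2, I3 = invariants(np.eye(3))
print(I1, I2, I3)  # 3.0 3.0 1.0
```

For any isochoric deformation, e.g. F = diag(2, 1, 1/2), the function returns I3 = 1, consistent with the perfect-incompressibility constraint.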

Constitutive equations
In solid mechanics, constitutive equations are tensor-valued tensor functions that define the relation between a stress, for example the Piola or nominal stress, P = lim dA→0 (df/dA), the force df per undeformed area dA, and a deformation measure, for example the deformation gradient F [28,58], P = P(F). At this point, we could use an arbitrary neural network to learn the functional relation between P and F, and many neural networks in the literature do exactly that [23,40,55]. However, the functions P(F) that we learn through this approach generally violate widely-accepted thermodynamical constraints and their parameters have no physical meaning [24]. For moderate amounts of data, standard neural networks are also associated with a high risk of overfitting [34]. Our objective is therefore to build a Constitutive Artificial Neural Network that a priori satisfies thermodynamic constraints and introduces parameters with a clear physical interpretation, while, at the same time, limiting the space of admissible functions to prevent overfitting when available data are sparse.
Thermodynamic consistency. First, we ensure thermodynamic consistency and guarantee that the Piola stress P inherently satisfies the second law of thermodynamics, the dissipation inequality [48], D = P : Ḟ − ψ̇(F) ≥ 0, where D is the dissipation and ψ is the Helmholtz free energy with ψ̇ = ∂ψ(F)/∂F : Ḟ. For hyperelastic or Green-elastic materials with D ≐ 0, the entropy inequality defines the Piola stress [59],

P = ∂ψ(F)/∂F.

This implies that, rather than approximating the nine stress components P(F) as nine generic functions of the nine components of the deformation gradient F, the network approximates the free energy function ψ(F) from which we derive the stress P in a post-processing step. Satisfying thermodynamic consistency according to equation (5) directly affects the output of the neural network.
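To illustrate this stress definition, the following sketch recovers the Piola stress by numerically differentiating a free energy function; it assumes a compressible neo-Hookean energy with hypothetical moduli, not the network model itself, and uses central finite differences instead of automatic differentiation:

```python
import numpy as np

mu, lam = 1.0, 1.0  # hypothetical moduli

def psi(F):
    """Compressible neo-Hookean free energy (assumed for illustration)."""
    J = np.linalg.det(F)
    I1 = np.trace(F.T @ F)
    return 0.5 * mu * (I1 - 3) - mu * np.log(J) + 0.5 * lam * np.log(J)**2

def piola_numerical(F, h=1e-6):
    """P = d(psi)/dF, component-wise central finite differences."""
    P = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            Fp, Fm = F.copy(), F.copy()
            Fp[i, j] += h
            Fm[i, j] -= h
            P[i, j] = (psi(Fp) - psi(Fm)) / (2 * h)
    return P

F = np.diag([1.2, 0.9, 1.0])
J = np.linalg.det(F)
# analytical stress for this energy: P = mu (F - F^-t) + lam ln(J) F^-t
P_exact = mu * (F - np.linalg.inv(F).T) + lam * np.log(J) * np.linalg.inv(F).T
print(np.allclose(piola_numerical(F), P_exact, atol=1e-5))  # True
```

In the actual network, this derivative is evaluated by the deep-learning framework's automatic differentiation rather than by finite differences; the sketch only verifies that P = ∂ψ/∂F is well defined once ψ is.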
Material objectivity and frame indifference. Second, we constrain the choice of the free energy function ψ to satisfy material objectivity or frame indifference and ensure that the constitutive laws do not depend on the external frame of reference [46]. This implies that the arguments of the free energy function are independent of rigid body rotations and must be functions of the right Cauchy Green deformation tensor C [58], ψ(F) = ψ(C). Rather than using the nine independent components of the deformation gradient F as input, we therefore constrain the input to the six independent components of the symmetric right Cauchy Green deformation tensor, C = F^t · F. Satisfying material objectivity according to equation (6) directly affects the input of the neural network.
Material symmetry and isotropy. Third, we can further constrain the choice of the free energy function ψ to include material symmetry and assume that the material response remains unchanged under transformations of the reference configuration. Here we consider the special case of isotropy, for which the free energy function is a function of the strain invariants, ψ(I1, I2, I3), and the Piola stress takes the following explicit representation,

P = 2 [∂ψ/∂I1 + I1 ∂ψ/∂I2] F − 2 ∂ψ/∂I2 F · C + 2 I3 ∂ψ/∂I3 F^-t.

This implies that, rather than using the six independent components of the symmetric right Cauchy Green deformation tensor C as input, we constrain the input to our set of three invariants I1, I2, I3. Considering materials with known symmetry classes according to equation (7) directly affects, and ideally reduces, the input of the neural network.
Perfect incompressibility. Fourth, we can further constrain the choice of the free energy function ψ for the special case of perfect incompressibility, for which the Jacobian remains constant and equal to one, I3 = J² = 1. The condition of perfect incompressibility implies that equation (7) simplifies to an expression in terms of only the first two invariants I1 and I2, supplemented by a term of the hydrostatic pressure, p = −1/3 P : F, that we determine from the boundary conditions,

P = 2 [∂ψ/∂I1 + I1 ∂ψ/∂I2] F − 2 ∂ψ/∂I2 F · C − p F^-t.

This implies that, rather than using the set of three invariants, I1, I2, I3, as input, we reduce the input to a set of only two invariants, I1 and I2. Considering materials with perfect incompressibility according to equation (8) reduces the input of the neural network.
Physically reasonable constitutive restrictions. Fifth, we can further constrain the functional form of the free energy ψ by including additional constitutive restrictions that are both physically reasonable and mathematically convenient [2]: (i) the free energy ψ is non-negative for all deformation states F; (ii) the free energy ψ and the stress P are zero in the reference configuration, F = I; and (iii) the free energy ψ is infinite for infinite compression, J → 0, and infinite expansion, J → ∞. To facilitate a stress-free reference configuration according to equation (10), instead of using the invariants I1, I2, I3 themselves as input, we use their deviations from the energy- and stress-free reference state, [I1 − 3], [I2 − 3], and [I3 − 1], as input. In addition, from all possible activation functions, we select activation functions that a priori comply with conditions (i), (ii), and (iii). Satisfying physical considerations according to equations (9), (10), and (11) directly affects the activation functions of the neural network.
Polyconvexity. Sixth, to guide the selection of the functional forms for the free energy function ψ, and ultimately the selection of appropriate activation functions, we consider polyconvexity requirements [6]. From the general representation theorem we know that, in its most generic form, the free energy of an isotropic material can be expressed as an infinite series of products of powers of the invariants [53],

ψ(I1, I2, I3) = Σ_{j,k,l} a_jkl [I1 − 3]^j [I2 − 3]^k [I3 − 1]^l,

where a_jkl are material constants. Importantly, mixed products of convex functions are generally not convex, while it is easier to show that a sum of specific convex subfunctions usually is [26]. This motivates a special subclass of free energy functions in which the free energy is the sum of three individual polyconvex subfunctions ψ1, ψ2, ψ3, such that

ψ(F) = ψ1(I1) + ψ2(I2) + ψ3(I3)

is polyconvex by design, and the stresses take the following form,

P = ∂ψ1/∂I1 ∂I1/∂F + ∂ψ2/∂I2 ∂I2/∂F + ∂ψ3/∂I3 ∂I3/∂F.

This implies that we can either select polyconvex activation functions from a set of algorithmically predefined activation functions [39], or custom-design our own activation functions from known polyconvex subfunctions ψ1, ψ2, ψ3 [3]. Here we select first and second powers of the invariants for the first hidden layer and linear, exponential, and logarithmic functions of these powers for the second hidden layer, all with non-negative coefficients. In addition, we abandon the fully-connected network architecture, in which mixed products of the invariants I1, I2, I3 emerge naturally. Instead, we decouple the inputs I1, I2, I3 and only combine them additively in the free energy function, ψ = ψ1 + ψ2 + ψ3. Satisfying polyconvexity, for example according to equation (12), can imply enforcing non-negative network weights [3], and directly affects the architecture of the neural network [34].
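As a quick sanity check on these design choices, the following sketch (not from the paper's code) verifies numerically that the activation building blocks named above vanish in the reference state and are convex at a few sample points:

```python
import math

# activation building blocks: identity, square, linear exponential,
# linear logarithm (the logarithm enters with a minus sign so it stays
# non-negative for non-negative arguments x < 1)
blocks = [
    lambda x: x,
    lambda x: x**2,
    lambda x: math.exp(x) - 1.0,
    lambda x: -math.log(1.0 - x),
]

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# (ii) energy- and stress-free reference state: every block is zero at x = 0
ok = all(f(0.0) == 0.0 for f in blocks)
# convexity spot-check: f'' >= 0 (up to finite-difference noise)
convex = all(second_derivative(f, x) >= -1e-6
             for f in blocks for x in [0.0, 0.2, 0.5])
print(ok, convex)  # True True
```

This is only a pointwise numerical check; the analytical argument is that each block is convex and monotone in its argument, so sums of such terms with non-negative weights inherit convexity.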

Constitutive Artificial Neural Networks
Motivated by these considerations, we build a family of Constitutive Artificial Neural Networks that satisfy the conditions of thermodynamic consistency, material objectivity, material symmetry, incompressibility, constitutive restrictions, and polyconvexity by design. This guides our selection of network input, output, architecture, and activation functions to a priori satisfy the fundamental laws of physics. Special members of this family represent well-known constitutive models, including the neo Hooke [60], Blatz Ko [8], Mooney Rivlin [44,52], Demiray [17], Gent [22], and Holzapfel [27] models, for which the network weights gain a clear physical interpretation.

Constitutive Artificial Neural Network input and output.
To ensure thermodynamical consistency, rather than directly approximating the stress P as a function of the deformation gradient F, the Constitutive Artificial Neural Network approximates the scalar-valued free energy function ψ as a function of the scalar-valued invariants I1, I2, I3. The Piola stress P then follows naturally from the second law of thermodynamics as the derivative of the free energy ψ with respect to the deformation gradient F according to equation (7). Figure 1 illustrates a Constitutive Artificial Neural Network with the invariants I1, I2, I3 as input and the free energy ψ as output.
Constitutive Artificial Neural Network architecture. To model a hyperelastic, history-independent material, we select a feed forward architecture in which information only moves in one direction, from the input nodes, without any cycles or loops, to the output nodes. To ensure polyconvexity, we choose a selectively connected architecture according to equation (12), such that the free energy function does not contain mixed terms in the invariants. Figure 1 illustrates one possible network architecture that a priori decouples the individual invariants. Its free energy function splits additively into contributions from the individual invariants, ψ = ψ1(I1) + ψ2(I2) + ψ3(I3).

Activation functions.
To ensure that our network satisfies basic physically reasonable constitutive restrictions, rather than selecting from a set of pre-defined activation functions such as the binary step, soft step, hyperbolic tangent, inverse tangent, or soft plus functions, we custom-design our own activation functions to reverse-engineer a free energy function that captures popular forms of constitutive terms. Specifically, we select linear and quadratic powers of the first and second invariants for the first layer of the network, and linear, exponential, or logarithmic functions for the second layer.
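Assuming this layout of four first-layer powers, [I1 − 3], [I1 − 3]², [I2 − 3], [I2 − 3]², each passed through three second-layer activations (linear, exponential, logarithmic), the resulting twelve-term free energy can be sketched as follows; the exact term ordering is our assumption, chosen so that it matches the weight indices w1,1..w1,12 and w2,1..w2,12 discussed later:

```python
import math

def psi_cann(I1, I2, w1, w2):
    """Twelve-term free energy of the two-layer network described above.
    w1, w2: lists of 12 non-negative inner- and outer-layer weights.
    Term ordering (assumed): for each power x in [I1-3, (I1-3)^2, I2-3,
    (I2-3)^2], the three activations linear, exponential, logarithmic."""
    powers = [I1 - 3, (I1 - 3)**2, I2 - 3, (I2 - 3)**2]
    acts = [lambda z: z,                     # linear
            lambda z: math.exp(z) - 1.0,     # linear exponential
            lambda z: -math.log(1.0 - z)]    # linear logarithm
    psi, k = 0.0, 0
    for x in powers:
        for act in acts:
            psi += w2[k] * act(w1[k] * x)
            k += 1
    return psi

# undeformed state, I1 = I2 = 3: energy-free reference for any weights
print(psi_cann(3.0, 3.0, [0.1] * 12, [0.1] * 12))  # 0.0
```

With this ordering, the eighth and ninth terms are the linear exponential and linear logarithm of [I2 − 3], which is consistent with the terms the network activates in the results below.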
Using the second law of thermodynamics, we can derive an explicit expression for the Piola stress from equation (5), P = ∂ψ/∂F, or, more specifically, for the case of perfect incompressibility from equation (8), corrected by the pressure term, p = −1/3 P : F. The stress definition (15) suggests that our model is a generalization of many popular constitutive models for incompressible hyperelastic materials. It seems natural to ask whether and how its network parameters w1..2,1..12 relate to the parameters of these models.

Special types of constitutive equations.
To demonstrate that the family of Constitutive Artificial Neural Networks in Figure 1 and its specific example in Figure 3 are generalizations of popular constitutive models, we consider six widely used models and systematically compare their material parameters to our network weights w1..2,1..12. The neo Hooke model [60], the simplest of all models, has a free energy function that is a constant function of only the first invariant, [I1 − 3], scaled by the shear modulus µ,

ψ = 1/2 µ [I1 − 3].

The Blatz Ko model [8] has a free energy function that depends only on the second and third invariants, ψ = 1/2 µ [I2/I3 + 2 √I3 − 5]. For perfectly incompressible materials, I3 = 1, it simplifies to the following form,

ψ = 1/2 µ [I2 − 3].

The Mooney Rivlin model [44,52] is a combination of both and accounts for the first and second invariants, [I1 − 3] and [I2 − 3], scaled by the moduli µ1 and µ2 that sum up to the overall shear modulus, µ = µ1 + µ2,

ψ = 1/2 µ1 [I1 − 3] + 1/2 µ2 [I2 − 3].

The Demiray model [17] or Delfino model [16] uses linear exponentials of the first invariant, [I1 − 3], in terms of two parameters a and b,

ψ = a/(2b) [exp(b [I1 − 3]) − 1].

The Gent model [22] uses linear logarithms of the first invariant, [I1 − 3], in terms of two parameters α and β,

ψ = −α/(2β) ln(1 − β [I1 − 3]).

The Holzapfel model [27] uses quadratic exponentials, typically of the fourth invariant, [I4 − 1], which we adapt here for the first invariant, [I1 − 3], in terms of two parameters a and b,

ψ = a/(2b) [exp(b [I1 − 3]²) − 1].

These simple examples demonstrate that we can recover popular constitutive functions for which the network weights gain a well-defined physical meaning.
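Under the term ordering assumed earlier (term 1: linear in [I1 − 3], term 2: exponential in [I1 − 3]), the weight-to-parameter conversion is a one-liner; the weight values below are hypothetical:

```python
# Hedged sketch: recovering classical material parameters from network
# weights; all numerical weight values here are hypothetical.
w1 = {1: 0.5, 2: 0.5}   # inner-layer weights (dimensionless / nonlinearity)
w2 = {1: 1.0, 2: 2.0}   # outer-layer weights, stiffness-like [kPa]

# neo Hooke: psi = mu/2 [I1-3]  matches  w2_1 * (w1_1 [I1-3])
mu = 2.0 * w1[1] * w2[1]

# Demiray: psi = a/(2b) [exp(b [I1-3]) - 1]  matches  w2_2 [exp(w1_2 [I1-3]) - 1]
b = w1[2]               # coefficient of nonlinearity
a = 2.0 * w1[2] * w2[2]  # stiffness-like parameter [kPa]

print(mu, b, a)  # 1.0 0.5 2.0
```

The same pattern applies to the logarithmic (Gent) and quadratic exponential (Holzapfel) terms: the inner weight plays the role of the nonlinearity parameter, and the product of inner and outer weights yields the stiffness-like modulus.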
Loss function. The objective of our Constitutive Artificial Neural Network is to learn the network parameters θ = {wij}, the network weights, by minimizing a loss function L that penalizes the error between model and data. Similar to classical neural networks, we characterize this error as the mean squared error, the L2-norm of the difference between model P(Fi) and data P̂i, divided by the number of training points n_trn,

L(θ; F) = 1/n_trn Σ_{i=1}^{n_trn} || P(Fi) − P̂i ||².

To reduce potential overfitting, we also study the effects of Lasso or L1 regularization and L2 regularization by enhancing the loss function with the weighted L1 or L2 norm of the network weights,

L → L + α1 ||w||1   or   L → L + α2 ||w||2²,

where α1 and α2 are the weighting coefficients. We train the network by minimizing the loss function (22) or (23) and learn the network parameters θ = {wij} using the ADAM optimizer, a robust adaptive algorithm for gradient-based first-order optimization, and constrain the weights to always remain non-negative, wij ≥ 0.
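The training idea can be sketched as follows. This is a hedged, minimal illustration: it uses plain projected gradient descent instead of the ADAM optimizer, a single neo-Hookean term instead of the full twelve-term network, and synthetic data generated with a hypothetical cortex-like modulus:

```python
import numpy as np

lams = np.linspace(0.9, 1.1, 17)           # stretch range as in the brain data
mu_true = 1.82                             # kPa, hypothetical "true" modulus
P_data = mu_true * (lams - lams**-2)       # synthetic uniaxial stress data

alpha2, lr, mu = 0.001, 0.1, 0.0           # L2 penalty, step size, initial weight
for _ in range(2000):
    x = lams - lams**-2                    # neo-Hookean stress mode, P11 = mu * x
    residual = mu * x - P_data             # model minus data
    grad = 2 * np.mean(residual * x) + 2 * alpha2 * mu   # MSE + L2 gradient
    mu = max(mu - lr * grad, 0.0)          # projection keeps the weight non-negative

print(abs(mu - mu_true) < 0.1)             # True: modulus recovered up to L2 bias
```

Note the small systematic bias: the L2 penalty shrinks the recovered modulus slightly below the true value, which is exactly the trade-off the regularization weight α2 controls.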

Data
To illustrate the automated model discovery with our Constitutive Artificial Neural Network, we perform a systematic study using widely-used benchmark data for human brain tissue [11-13]. Specifically, we train our two-layer Constitutive Artificial Neural Network for isotropic, perfectly incompressible materials from Figure 3, discover a material model and its parameters, and compare the model and parameters against six traditional constitutive models for soft biological tissues [8,17,22,27,44,52,60]. We consider two training scenarios, single-mode training and multi-mode training, for the homogeneous deformation modes of uniaxial tension, uniaxial compression, and simple shear.

Tension and compression.
For the case of uniaxial tension and compression, we stretch the specimen in one direction, F11 = λ1 = λ. For an isotropic, perfectly incompressible material with I3 = λ1² λ2² λ3² = 1, the stretches orthogonal to the loading direction are identical and equal to the inverse square root of the stretch, F22 = λ2 = λ^-1/2 and F33 = λ3 = λ^-1/2. From the resulting deformation gradient, F = diag{λ, λ^-1/2, λ^-1/2}, we calculate the first and second invariants and their derivatives,

I1 = λ² + 2/λ    with   ∂I1/∂λ = 2λ − 2/λ²,
I2 = 2λ + 1/λ²   with   ∂I2/∂λ = 2 − 2/λ³,

to evaluate the nominal uniaxial stress P11 using the general stress-stretch relationship for perfectly incompressible materials. Here, p denotes the hydrostatic pressure that we determine from the zero stress condition in the transverse directions, P22 = 0 and P33 = 0, as p = [2/λ] ∂ψ/∂I1 + [2λ + 2/λ²] ∂ψ/∂I2. This results in the following explicit uniaxial stress-stretch relation for perfectly incompressible, isotropic materials,

P11 = 2 [∂ψ/∂I1 + 1/λ ∂ψ/∂I2] [λ − 1/λ²].

Shear. For the case of simple shear, we shear the specimen in one direction, F12 = γ. For an isotropic, perfectly incompressible material with F11 = F22 = F33 = 1, we calculate the first and second invariants and their derivatives,

I1 = I2 = 3 + γ²   with   ∂I1/∂γ = ∂I2/∂γ = 2γ,

to evaluate the nominal shear stress P12 using the general stress-stretch relationship for perfectly incompressible materials. This results in the following explicit shear stress-strain relation for perfectly incompressible, isotropic materials,

P12 = 2 [∂ψ/∂I1 + ∂ψ/∂I2] γ.

Testing and training data. Tables 1 and 2 summarize our benchmark data of gray matter tissue from the cortex and basal ganglia and white matter tissue from the corona radiata and corpus callosum tested in tension, compression, and shear [11-13]. We report each data set as 17 pairs of stretches and nominal uniaxial stresses, {λ, P11}, or shear strains and nominal shear stresses, {γ, P12}, where the stretches and shear strains range from 0.9 ≤ λ ≤ 1.1 and 0.0 ≤ γ ≤ 0.2, and the stresses are the means of the loading and unloading curves of n samples.
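A minimal sketch of these two stress relations, assuming a hypothetical Mooney Rivlin energy whose derivatives are constant, ∂ψ/∂I1 = µ1/2 and ∂ψ/∂I2 = µ2/2:

```python
mu1, mu2 = 0.5, 1.0                    # hypothetical moduli [kPa]
dpsi_dI1, dpsi_dI2 = mu1 / 2, mu2 / 2  # constant for a Mooney Rivlin energy

def P11(lam):
    """Uniaxial Piola stress for a perfectly incompressible material."""
    return 2 * (dpsi_dI1 + dpsi_dI2 / lam) * (lam - lam**-2)

def P12(gamma):
    """Simple-shear Piola stress for a perfectly incompressible material."""
    return 2 * (dpsi_dI1 + dpsi_dI2) * gamma

print(abs(P11(1.0)) < 1e-12)           # True: stress-free reference state
print(abs(P12(0.2) - 0.3) < 1e-9)      # True: 2 (mu1 + mu2)/2 * 0.2 = 0.3
```

For the full network, ∂ψ/∂I1 and ∂ψ/∂I2 are not constants but functions of I1 and I2, which is what produces the nonlinear stress-stretch curves in the figures below.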
Throughout the remainder of this study, we perform single-mode training with one mode used as training data and the remaining two modes as test data, and multi-mode training with all three modes used as training data.

Results

Table 3 and Figure 4 summarize and illustrate the discovered models for the human cortex for the tension, compression, and shear data from Table 1 using the isotropic, perfectly incompressible Constitutive Artificial Neural Network with two hidden layers from Figure 3. Table 3 summarizes the 24 weights w1:2,1:12 and the goodness of fit R² for single-mode training with the three individual modes and for multi-mode training with all three modes combined. First, for single-mode training, the network succeeds in interpolating or fitting the individual training data. Second, for single-mode training, we observe that the network performs moderately at extrapolating or predicting data outside the training regime: The network parameters trained individually for each mode do not predict the other modes well, with R² test values ranging from 0.00 for the tension prediction with compression training to 0.93 for the shear prediction with tension training. Third, for multi-mode training with all data sets combined, we find that the goodness of fit R² train of the individual tests decreases to 0.36, 0.90, and 0.99, while the sum of the three R² values, the collective fit of all three tests, increases. Fourth, for single-mode training, all twelve terms of the model are activated, as indicated through the full color spectrum in the first three columns. At the same time, for multi-mode training, the Constitutive Artificial Neural Network discovers a model with only four terms, while the weights of the other terms train to zero. Strikingly, these four terms are all functions of the second invariant, [I2 − 3], as indicated through the cold blue-type colors. Table 4 and Figure 5 summarize and illustrate the discovered models for the human corona radiata for the tension, compression, and shear data from Table 2.
The white matter results from the corona radiata confirm the trends of the gray matter results for the cortex in Table 3 and Figure 4. First, for single-mode training, our neural network succeeds in interpolating or fitting the individual training data: The learned network parameters define stress functions that fit the individual tension, compression, and shear data excellently with R² train values of 0.99, 1.00, and 1.00. Second, the network performs moderately at extrapolating or predicting data outside the training regime: The network parameters trained for each individual mode fail to predict the other modes equally well, with R² test values ranging from 0.00 for the tension predictions with both compression and shear training to 0.83 for the shear prediction with tension training. Third, we find that, for all tests combined, the goodness of fit R² train of the tensile test remains 0.00 and decreases to 0.79 and 0.93 for compression and shear, but the collective fit increases. Fourth, similar to the gray matter results in Figure 4, the white matter model trained with the individual tests in Figure 5 activates seven, eight, and eleven terms, as indicated through the broad color spectrum in the first three columns. Interestingly, for all three tests combined, the Constitutive Artificial Neural Network discovers a model with only four terms, which are again all functions of the second invariant, [I2 − 3], as indicated through the cold blue-type colors. Table 5 and Figure 6 summarize and illustrate the discovered models for the human cortex, basal ganglia, corona radiata, and corpus callosum for multi-mode training with the tension, compression, and shear data from Tables 1 and 2. First and foremost, for multi-mode training, the fit of the shear data with R² train values of 0.99, 0.97, 0.93, and 0.92 is uniformly the best across all four brain regions.
Second, the model universally underestimates the compressive stresses, with R² values ranging from 0.78 to 0.90, and overestimates the tensile stresses, with R² values from 0.00 to 0.36, indicating a poor fit of the tensile data. Third, and most importantly, the side-by-side comparison of all four brain regions confirms the trends of the cortex and the corona radiata: Our Constitutive Artificial Neural Network uniquely discovers a family of models that is parameterized in terms of the second invariant only, while the weights of the first invariant terms consistently train to zero. The blue color spectrum in Figure 6 underscores this observation.

[Figure 4 caption: Dots illustrate the tension, compression, and shear data of the human cortex [11] from Table 1; color-coded areas highlight the twelve contributions to the discovered stress function according to Figure 3 from Table 3.]

[Figure 5 caption: Dots illustrate the tension, compression, and shear data of the human corona radiata [11] from Table 1; color-coded areas highlight the twelve contributions to the discovered stress function according to Figure 3 from Table 4.]

[Figure 6 caption: Dots illustrate the tension, compression, and shear data of all four brain regions [11] from Table 1; color-coded areas highlight the six contributions to the discovered stress function according to Figure 3 from Table 5.]

Table 6 and Figure 7 highlight the effects of the L2 regularization according to equation (23). As expected, the regularization reduces the number of non-zero terms, in our case from six in Table 5 and Figure 6, to two for the cortex, the corona radiata, and the corpus callosum, and only one for the basal ganglia. The associated non-zero weights, w1,8, w2,8, w1,9, w2,9, activate the linear exponential term, exp([I2 − 3]) − 1, and the linear logarithmic term, ln(1 − [I2 − 3]), which are highlighted in turquoise and light blue in Figure 7. The general trends are the same for the discovered six-term model without regularization and the two-term model with L2 regularization: Both models depend on the second invariant only, and their fits are best for the shear data, with R² values well above 0.90, and worst for the tension data, with R² values ranging from 0.00 to 0.48.

[Figure 7 caption: Discovered models from Figure 3 with additional L2 regularization for the weights. Dots illustrate the tension, compression, and shear data of all four brain regions [11] from Table 1; color-coded areas highlight the two contributions to the discovered stress function according to Figure 3 from Table 6.]

Table 7 and Figure 8 demonstrate an application of our Constitutive Artificial Neural Network beyond model discovery, the parameter identification and comparison of special cases of the generalized network according to equations (16) to (21). The first set of models, the neo Hooke, Blatz Ko, and Mooney Rivlin models, are all linear in terms of the first invariant, second invariant, or both; the second set, the Demiray, Gent, and Holzapfel models, contain linear exponential, linear logarithmic, or quadratic exponential terms. Table 7 shows that each model, except for the Mooney Rivlin model, activates only one term of our network, either the first, second, third, fifth, or seventh. For all six models, we can convert the weights into a stiffness-like parameter with units [kPa]; the linear Mooney Rivlin model has an additional stiffness-like parameter, and the three nonlinear models have an additional coefficient of nonlinearity. Figure 8 shows the behavior of the neo Hooke, Blatz Ko, Demiray, and Holzapfel models when simultaneously trained for the tension, compression, and shear data of the human cortex.
Notably, for the small stretch and shear strain ranges of 0.9 ≤ λ ≤ 1.1 and 0.0 ≤ γ ≤ 0.2, only the Holzapfel model displays a marked strain stiffening, while the neo Hooke, Blatz Ko, and Demiray models remain in their predominantly linear regimes. This allows the Holzapfel model to perform best not only in shear, with an R² value of 0.96, but also in tension, with a value of 0.48, where most other models fail.

[Figure 8 caption fragment: ... Table 1; color-coded areas highlight the terms of the stress function according to Figure 3 from Table 7.]

Figure 9 summarizes and compares the performance of all models, the Constitutive Artificial Neural Network without and with L2 regularization and its special cases, the neo Hooke, Blatz Ko, Demiray, Gent, and Holzapfel models. The graphs in the first three columns result from single-mode training with the individual tension, compression, and shear data [11] from Tables 1 and 2; the last column results from multi-mode training with all three data sets combined. The three rows highlight the coefficients of determination for tension R²t, compression R²c, and shear R²s. The color-coded blocks and error bars represent the means and standard deviations of the R² values from 50 training runs with varying random weight initializations. First, in the three graphs on the diagonal that reflect the training of the models, the R² train values of all seven models are close to one, with only three models training poorly, the neo Hooke and Holzapfel models in tension and the Blatz Ko model in compression. Notably, our non-regularized Constitutive Artificial Neural Network outperforms all other models and has the largest R² train values when trained individually for tension, compression, and shear. Second, from the six off-diagonal graphs that reflect the testing of the models, we conclude that the model and parameters trained for tension are generally incapable of predicting the compression behavior and vice versa.
However, the tension parameters are reasonably well suited to characterize the shear behavior, with our Constitutive Artificial Neural Network and the Holzapfel model performing best; vice versa, the shear parameters are moderately suited to characterize the tensile behavior, with our Constitutive Artificial Neural Network and the Blatz Ko model performing best. Finally, from the right column that reflects training with all three data sets combined, we conclude that our Constitutive Artificial Neural Network performs best for all three modes, followed by its L2 regularized counterpart and the Blatz Ko model. Interestingly, we observe large R² values across the entire bottom row, indicating that, of all three tests, the shear tests are generally the easiest to fit and predict for all seven models. Taken together, our non-regularized Constitutive Artificial Neural Network performs best in eight of all twelve cases, second best in two, and fifth in one, suggesting that our proposed neural network successfully discovers both model and parameters that best describe the data.

[Figure 9 caption: Coefficients of determination R² of our Constitutive Artificial Neural Network, without and with L2 regularization, and its special cases, the neo Hooke, Blatz Ko, Demiray, Gent, and Holzapfel models, trained with the tension, compression, and shear data [11] from Tables 1 and 2; color-coded blocks and error bars highlight the means and standard deviations of the goodness of fit R² from 50 training runs with varying random weight initializations for each model, with colors and terms according to Figure 3.]
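The goodness-of-fit metric used throughout is the standard coefficient of determination. As a minimal sketch, not the authors' implementation, it can be evaluated as follows, where the function name and interface are illustrative:

```python
import numpy as np

def r_squared(stress_true, stress_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    stress_true = np.asarray(stress_true, dtype=float)
    stress_pred = np.asarray(stress_pred, dtype=float)
    ss_res = np.sum((stress_true - stress_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((stress_true - stress_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# a perfect fit gives R^2 = 1; predicting only the mean gives R^2 = 0
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # -> 1.0
```

Averaging this quantity over 50 training runs with random weight initializations yields the means and standard deviations reported above.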

Discussion
Characterizing human brain tissue is a challenging but important task. Throughout the past decade, driven by the need to improve diagnostic and predictive clinical tools, neuroscience has seen an enormous, growing interest in accurately characterizing and modeling the human brain [25]. Numerous research groups have proposed competing constitutive models to best characterize the behavior of gray and white matter tissue in response to mechanical loading and to calibrate the model parameters [13]. Amongst the wide variety of possible models, the neo Hooke [60], Blatz Ko [8], Mooney Rivlin [44,52], Demiray [17], Gent [22], and Holzapfel [27] models have emerged as the most successful candidates to approximate the stress-stretch relations in the human brain. The gold standard strategy of all these approaches is to first select a constitutive model, either from the above list or beyond, and then tune its parameters by fitting the model to data. Often, these data are collected for a single loading mode, tension [43], compression [51], or shear [19], and the parameters that fit one type of loading fail to predict the behavior for the other modes [11,45]. This simplification can have fatal consequences; for example, it could overestimate the stiffness of the brain in injury simulations. To address these limitations, our group has recently performed a comprehensive set of human brain tissue experiments in tension, compression, and shear and calibrated the neo Hooke, Mooney Rivlin, Demiray, and Gent models for four different brain regions, the cortex, basal ganglia, corona radiata, and corpus callosum [11-13]. While this approach is valuable to generate the best sets of parameters for existing models, some natural follow-up questions to ask are: How good are these models in the first place? Which one of them performs best? Are there other models that perform equally well, or even better? And, if so, how can we find them?
Constitutive Artificial Neural Networks are a family of neural networks that a priori satisfy thermodynamic constraints. When searching for generic models that could outperform traditional constitutive models, neural networks are a natural first choice [30]. Neural networks have advanced as a powerful strategy to approximate data by cleverly combining nested and weighted activation functions with several thousand unknowns [35]. They have become the go-to strategy to interpolate data within a well-defined domain when the underlying physics are completely unknown [1]. At the same time, classical neural networks typically fail to predict the behavior outside the training domain, they violate common physical constraints, and their parameters have no real physical interpretation [37]. This has sparked the recent trend to integrate physical information into classical neural networks [36]. In the spirit of this idea, we propose a new family of neural networks that a priori satisfy common kinematic, thermodynamic, and physical constraints. Towards this goal we consult the non-linear field theories of mechanics [2,58,59] and constrain the network output to enforce thermodynamic consistency; the network input to enforce material objectivity, and, if desired, material symmetry and incompressibility; the activation functions to implement physically reasonable constitutive restrictions; and the network architecture to ensure polyconvexity. These ideas are not entirely new. Several recent network models are designed around enforcing thermodynamic constraints [3,24], for example through additional terms in their loss function [15]. However, the problem of overfitting sparse data with a large set of physically meaningless parameters remains [34]. 
This raises the question: How do we harness decades of knowledge in constitutive modeling to create a neural network, from easy-to-understand modular building blocks, with well-defined physical parameters, that we can constrain with our domain knowledge?
Constitutive Artificial Neural Networks can be made up of building blocks that feature prominent constitutive models. At a closer look, most popular constitutive models for human brain tissue have a similar functional structure. Here we propose to hardwire this structure into our neural network architecture. The underlying design paradigm is to reverse-engineer a Constitutive Artificial Neural Network that is, by design, a generalization of widely used and commonly accepted constitutive models including the neo Hooke, Blatz Ko, Mooney Rivlin, Demiray, Gent, and Holzapfel models. In Figure 3, we prototype this idea for an isotropic perfectly incompressible feed forward network with two hidden layers and four and twelve nodes. This network takes the scalar-valued first and second invariants of the deformation gradient, [ I 1 − 3 ] and [ I 2 − 3 ], as input and approximates the scalar-valued free energy function, ψ(I 1 , I 2 ), as output. The first layer generates the first and second powers, ( • ) 1 and ( • ) 2 , of the input, and the second layer applies the identity ( • ), the exponential, (exp((•)) − 1), and the natural logarithm (−ln(1 − (•))) to these powers. This results in twelve building blocks that additively feed into the final free energy function ψ from which we derive the Piola stress, P = ∂ψ/∂F, following standard arguments of thermodynamics. It is easy to show that our network is a generalization of popular constitutive models with the neo Hooke [60], Blatz Ko [8], Mooney Rivlin [44,52], Demiray [17], Gent [22], and Holzapfel [27] models as special cases. More importantly, through a direct comparison with these models in equations (16) to (21), the weights of our network gain a clear physical interpretation. 
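The twelve building blocks described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the per-term placement of the inner weights w1 inside the nonlinearities is an assumption based on the description of Figure 3, and the Piola stress P = ∂ψ/∂F would follow by automatic differentiation of this energy:

```python
import numpy as np

def psi(I1, I2, w1, w2):
    """Twelve-term free energy of the sketched network.
    w1: inner weights inside the nonlinearities; w2: outer, stiffness-like
    weights; both arrays of length 12, indexed k = 0..11."""
    z = []
    for x in (I1 - 3.0, I2 - 3.0):   # network inputs: [I1 - 3] and [I2 - 3]
        for p in (1, 2):             # first hidden layer: first and second powers
            z += [x ** p] * 3        # each power feeds three activation functions
    acts = (lambda v: v,                  # identity
            lambda v: np.exp(v) - 1.0,    # exponential
            lambda v: -np.log(1.0 - v))   # natural logarithm
    return sum(w2[k] * acts[k % 3](w1[k] * z[k]) for k in range(12))
```

Keeping only the first term, with w2[0] = µ/2 and all other outer weights zero, reduces this sketch to the neo-Hookean energy ψ = µ/2 [I1 − 3]; the other special cases follow analogously by activating single terms.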
Table 7 and Figure 8 show, for example, that we recover the classical neo Hooke model with shear moduli of µ = 1.82 kPa, 0.88 kPa, 0.94 kPa, and 0.54 kPa for simultaneous training with the tension, compression, and shear data of the cortex, basal ganglia, corona radiata, and corpus callosum, which agree well with the reported values of µ = 2.07 kPa, 0.99 kPa, 1.15 kPa, and 0.65 kPa [11]. Interestingly, both our network and the parameter fit in the literature find that one of the two shear moduli of the Mooney Rivlin model is consistently zero in all four regions, while the other is µ2 = 1.88 kPa, 0.90 kPa, 0.97 kPa, and 0.56 kPa for our approach, compared to µ1 = 2.08 kPa, 1.00 kPa, 1.16 kPa, and 0.65 kPa in the literature [11]. This agrees well with other studies in which one of the Mooney Rivlin shear moduli was also significantly smaller than the other across all brain regions [45]. Figure 8 reveals several additional universal trends for human brain tissue: First, tension is not only the most challenging test to perform [43], but also the most difficult test to fit, with R² values ranging from 0.14 to 0.48, followed by compression with 0.53 to 0.76, and shear with 0.94 to 0.96. Second, when trained simultaneously for tension, compression, and shear, all models consistently overestimate the tensile stiffness and underestimate the compressive stiffness, highlighting the tension-compression asymmetry of all four types of human brain tissue [45]. Third, of all existing models, only the Holzapfel model captures the nonlinear stress response [27], suggesting that the classical invariant-based models struggle to reproduce the nonlinear behavior of human brain tissue for small deformations with 0.9 ≤ λ ≤ 1.1 and 0.0 ≤ γ ≤ 0.2. This raises the question: Can we design a Constitutive Artificial Neural Network that not only learns the best set of parameters for a given constitutive model, but also learns the model itself?
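As a quick sanity check on the discovered moduli, the nominal stresses they imply can be evaluated with the standard incompressible neo-Hookean uniaxial relation P = µ(λ − λ⁻²); this is a textbook result, and the function below is an illustrative sketch, not code from the paper:

```python
def neo_hooke_uniaxial_stress(stretch, mu):
    """Nominal (Piola) stress of an incompressible neo-Hookean solid under
    uniaxial loading: P = mu * (lambda - 1 / lambda^2)."""
    return mu * (stretch - stretch ** -2)

# discovered shear moduli [kPa] for the four brain regions
for region, mu in [("cortex", 1.82), ("basal ganglia", 0.88),
                   ("corona radiata", 0.94), ("corpus callosum", 0.54)]:
    print(f"{region}: P(1.1) = {neo_hooke_uniaxial_stress(1.1, mu):.3f} kPa")
```

At the experimental stretch limit λ = 1.1, all four regions remain in the sub-kilopascal stress range, consistent with the ultrasoft character of brain tissue.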
Constitutive Artificial Neural Networks simultaneously discover both model and parameters. In essence, we propose a radically different approach towards soft tissue modeling and abandon the common strategy to first select a constitutive model and then tune its parameters by fitting the model to data [39]. Instead, we propose a family of Constitutive Artificial Neural Networks, with the general architecture in Figure 1, specified for soft tissues in Figure 3, to simultaneously discover both model and parameters that best describe the data. Probing our network with the tension, compression, and shear experiments from the gray matter cortex in Table 3 and Figure 4 and from the white matter corona radiata in Table 4 and Figure 5 reveals several interesting trends: When trained with the three experiments individually, the network activates all of its twelve terms and fails to discover a single best model. Nonetheless, with these twelve terms, it succeeds in interpolating or fitting the training data from one experiment; however, it only performs moderately at extrapolating or predicting the test data from the other two experiments. This suggests that the data from a single loading mode are not sufficient to characterize the entire breadth of the mechanical response of the human brain, which agrees well with observations in the literature [11,45]. Notably, when trained with all three experiments simultaneously, the Constitutive Artificial Neural Network robustly discovers a single model that best approximates the data: For the cortex, in the last columns of Table 3 and Figure 4, the network discovers four relevant terms, while the weights of the other eight terms train to zero. The non-zero weights translate into physically meaningful cortex parameters with well-defined physical units: the four stiffness-like parameters, µ2 = 7.60 kPa, a2 = 6.23 kPa, α1 = 1.25 kPa, and α2 = 4.67 kPa, and the three nonlinearity parameters, b2 = 1.65, β1 = 0.99, and β2 = 1.40.
For the corona radiata, in the last columns of Table 4 and Figure 5, the network discovers four relevant terms, while the weights of the other eight terms train to zero. The non-zero weights translate into physically meaningful parameters with well-defined physical units: the four stiffness-like parameters, µ1 = 0.44 kPa, a1 = 0.24 kPa, a2 = 6.37 kPa, and α2 = 4.51 kPa, and the three nonlinearity parameters, b1 = 0.24, b2 = 1.89, and β2 = 1.18. Notably, of all seven models in Figure 9, the Constitutive Artificial Neural Network performs best in eight of all twelve cases, second best in two, and fifth in one, suggesting that it successfully discovers the model and parameters that best describe the data. Since the network autonomously self-selects both model and parameters, the human user no longer needs to decide which model to choose. This could have enormous implications, for example in finite element simulations: Instead of selecting a specific material model from a library of available models, finite element solvers could be designed around a single generalized model, the Constitutive Artificial Neural Network, which would autonomously discover the model from experimental data, populate the model parameters, and activate the relevant terms. This brings up the final and probably most interesting question: Can we learn anything from the discovery process itself?
For human brain tissue, the Constitutive Artificial Neural Network robustly discovers I2-based models. Our Constitutive Artificial Neural Network combines the advantages of both our knowledge of constitutive modeling [2,6,28,46-48,58] and the efficiency of neural network algorithms [33,35,50]. For insufficient training data that only probe individual modes, in the three left columns of Figures 4 and 5, our network approximates the overall function ψ(I1, I2) robustly, with R² values well above 0.99, but, similar to classical neural networks, the contributions of the individual activation functions are non-unique. Enriching the training data by multi-mode training for tension, compression, and shear in Table 5 and Figure 6 eliminates this limitation. For sufficiently rich data that probe all three modes combined, in the right columns of Figures 4 and 5, our network successfully captures the behavior of both gray and white matter and consistently identifies the same unique subset of activation functions, without overfitting the data. The reduced color spectra in Figure 6 confirm that the network self-selects only a subset of activation functions, while most of its weights train to zero. For classical neural networks, a common approach to prevent overfitting is to enrich the loss function by L1 or L2 regularization, as we suggest in equation (23). For L1 regularization, the discovered model and parameters are virtually identical to those of the plain model in Table 5 and Figure 6. For L2 regularization, the network robustly discovers a reduced model with only two terms, a subset of the non-regularized models in equations (28) and (29), while the weights of the other terms train to zero. Figure 7 summarizes the model and parameters for the regularized network with two stiffness-like parameters, a and α, and two nonlinearity parameters, b and β.
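The regularized loss described above can be sketched as follows; this is an illustrative reconstruction of the idea behind equation (23), not the paper's implementation, and the penalty factor alpha and its default value are assumptions:

```python
import numpy as np

def regularized_loss(stress_pred, stress_true, weights, alpha=1e-3, norm="L2"):
    """Mean squared stress error plus an L1 or L2 penalty on the network
    weights; alpha controls the strength of the regularization."""
    err = np.asarray(stress_pred, dtype=float) - np.asarray(stress_true, dtype=float)
    mse = np.mean(err ** 2)
    w = np.asarray(weights, dtype=float)
    if norm == "L1":                       # promotes sparsity: weights train to zero
        penalty = alpha * np.sum(np.abs(w))
    else:                                  # L2: shrinks weights toward zero
        penalty = alpha * np.sum(w ** 2)
    return mse + penalty
```

Driving redundant weights toward zero is what lets the regularized network discover the reduced two-term model while the remaining terms vanish.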
Strikingly, in multi-mode training, both the standard and the L2 regularized Constitutive Artificial Neural Network consistently discover models in terms of the second invariant only, while all terms of the first invariant train to zero. We can easily see this selective activation in the color-coded stress terms in Figures 6 and 7, which show only cold, blue-type colors associated with the second invariant [I2 − 3]. The dominance of the second invariant is consistent with observations in the literature [45]. We hypothesize that the second invariant term [I2 − 3], which ranges from 0.0264 in tension to 0.0346 in compression, is better suited to model the characteristic tension-compression asymmetry of human brain tissue than the first invariant term [I1 − 3], which ranges only from 0.0282 to 0.0322 for our experimental range. In addition to discovering the best model and parameters, the goodness of fit in Figure 9 also teaches us something about the best experiment. If we had to select a single experiment, tension, compression, or shear, Figure 9 suggests that the tension experiment, with the largest R² values overall, would provide the richest data and the best insight into the behavior of human brain tissue.

Current limitations and future applications.
In the present work, we demonstrate the use of Constitutive Artificial Neural Networks for the human brain under the assumption of perfect incompressibility and isotropy. The general concept extends naturally to compressibility or near incompressibility and to materials with other symmetry classes, transverse isotropy or orthotropy, by expanding the network input to other sets of strain invariants. A more intricate extension would be to incorporate viscous effects [12], or history-dependence or inelasticity in general, for example by replacing the feed forward architecture with a long short-term memory network with feedback connections [7], while still keeping the same overall network input, output, activation functions, and selectively connected architecture. Another limitation, which involves more complex changes, is the additive architecture of our network, which facilitates incorporating polyconvexity. Especially for human brain tissues that display a pronounced Poynting effect, with shear softening in tension and shear stiffening in compression [5], it could be beneficial to introduce a multiplicative coupling between the individual invariants. Expressing the free energy as a truncated infinite series of products of powers of the invariants, instead of a sum of individual invariant terms, would result in a fully connected feed forward network architecture for which polyconvexity is cumbersome to include a priori [26]. Another technical limitation we foresee for these more complex networks is that the majority of weights might no longer train to zero and that a more involved L1 or L2 regularization could become necessary. This could artificially bias the training towards a subset of physical parameters. One interesting future direction along these lines, especially in view of human brain tissue, would be to compare invariant-based and principal-stretch-based Constitutive Artificial Neural Networks [47].
Several recent studies suggest that principal-stretch-based models outperform invariant-based models, especially in the context of combined loading and strain-stiffening [11,41,45]. Finally, an important extension would be to embed the network in a Bayesian inference framework to supplement the analysis with uncertainty quantification [37]. Instead of simple point estimates for the network parameters, a Bayesian Constitutive Artificial Neural Network would learn parameter distributions with means and credible intervals. In contrast to classical Bayesian Neural Networks, these distributions would here have a clear physical interpretation, since our network weights have a well-defined physical meaning.

Conclusion
The human brain is an ultrasoft material that is difficult to test and challenging to model. Numerous competing constitutive models for human brain tissue exist in the literature, but selecting the most appropriate model remains a matter of user experience and personal preference. The underlying idea of this manuscript is to automate the process of model selection. Towards this goal, we formulate the problem of autonomous model discovery as a neural network and harness the power of gradient-based adaptive optimizers for deep learning to train the network on human brain data. However, rather than using conventional fully-connected feed-forward networks, we reverse engineer a family of Constitutive Artificial Neural Networks with a sparsely-connected architecture from a set of modular building blocks. We rationalize these building blocks from commonly accepted and widely used constitutive models for soft biological tissues, including the neo Hooke, Blatz Ko, Mooney Rivlin, Demiray, Gent, and Holzapfel models. This strategy guarantees thermodynamic consistency, material objectivity, material symmetry, physical restrictions, and polyconvexity by design. Probably even more importantly, the weights of our Constitutive Artificial Neural Networks gain a clear physical interpretation and translate naturally into common mechanical parameters. When trained with tension, compression, and shear experiments of gray and white matter tissue, the network simultaneously discovers both model and parameters, and describes the data better than any of the commonly used invariant-based models. When constrained to its individual building blocks, the network learns weights that translate into shear moduli of 1.82 kPa, 0.88 kPa, 0.94 kPa, and 0.54 kPa for the human cortex, basal ganglia, corona radiata, and corpus callosum, which agree well with the reported shear moduli in these four brain regions.
Taken together, Constitutive Artificial Neural Networks have the potential to enable automated model discovery and could induce a paradigm shift in soft tissue modeling, from user-defined to automated model selection and parameterization.

Data availability
Our source code, data, and examples are available at https://github.com/LivingMatterLab/CANN.