A Data-driven Surrogate Model for Work Computation of a Periodically Forced Half-Sarcomere

Muscle force generation follows from molecular scale interactions that drive macroscopic behaviors and macroscopic processes that influence those at the molecular scale. A particuarly challenging issue is that models at the molecular level of organization are often quite difficult to apply to larger spatial scales. This is particularly true of moleuclar models driven by Monte-Carlo simulations. This challenge of multiscale dynamics requires methods to extract reduced order behaviors from detailed high-dimensional simulations. In this work we present a novel deterministic simulation method yielding accurate predictions of force-length behaviors of contracting muscle sarcomeres undergoing periodic length changes (work loops). The model maintains interpretability by tracking macroscopic state variables throughout the simulation while using data-driven representations of dynamics. Parameters of the data-driven dynamics are learned from trajectories from Monte-Carlo simulations of a half-sarcomere. Our method significantly reduces computational cost by tracking the state of the sarcomere in a course grained set of variables while maintaining accurate prediction of macroscopic level observables and time series for course grained variables. This allows for rapid sampling of the model’s output and builds towards the ability to scale to multiple-sarcomere simulations. Author Summary We develop a data-driven surrogate model for the dynamics of the half-sarcomere. This model achieves the same behavior with respect to force traces as more sophisticated Monte Carlo models at a substantially lower computational cost. The model is built by finding a course grained description of the full state space of the Monte Carlo simulation and learning dynamical models on the course grained space. Data-driven representations of the dynamics in the course grained space are trained using data from the full model. Data-driven models for forcing are also learned, and the result fed back into the dynamics. In doing so, the model seeks to replicate the effects of filament compliance on macro scale dynamics without explicitly tracking micro scale features. We withhold some input parameter regimes and demonstrate accurate reconstruction of course grained state and force traces using the data-driven model and given only knowledge of the initial condition and input. This work allows for faster computation of the forcing behavior of the half-sarcomere, as well as consistent representations of the course grained state variables. It is therefore promising as a step towards multi-sarcomere or even tissue scale models of skeletal muscle.

Since it was first proposed in 1954 by A.F. Huxley et al. [1,2], the sliding filament 2 model for muscle contraction has inspired myriad experimental and theoretical 3 approaches for understanding the molecular and cellular basis of force generation. The 4 focus of the sliding filament model is based on a sarcomere, the fundamental structural 5 unit of muscle cells. Sarcomeres are subcellular structures arranged in chains along the 6 length of a muscle cell. Each sarcomere, in turn, is composed of overlapping protein 7 filaments running between sequential z-disks. Actin containing thin filaments attach to 8 the z-disks. The thin filaments overlap with thicker myosin containing filaments in the 9 center of each sarcomere. Myosin filaments are, in turn, equipped with crossbridges that 10 act together as ensembles of molecular motors to pull the actin filaments towards the 11 center of the sarcomere, thus shortening its length and driving a sliding motion between 12 the thick and thin filaments (see Figure 1 ). 13 Mathematical models for muscle contraction based on sliding filament theory were 14 first proposed in 1957 [3] and drew largely on ordinary differential equation models that 15 assumed cross-bridges detach and attach to binding sites on the thin filaments according 16 to a first-order kinetic scheme. These and many other early models sought to explain 17 the transduction of chemical energy into mechanical work. With the initial assumptions 18 that actin and myosin filaments were inextensible, the force generated by the sarcomere 19 was computed by simply summing the forces generated by each crossbridge as a 20 function of its local distortion and effective spring constant. Assuming a continuum 21 limit the crossbridge's density along the length of the myosin filaments and binding sites 22 along actin filaments, one can derive differential equation models for the sarcomere [3]. 23 Following early modeling work that assumed filaments were inextensible, more recent 24 experimental evidence suggested that filaments do in fact exhibit compliance [4,5]. This 25 discovery had several implications for any mathematical model of the half sarcomere. 26 First, the assumption that each crossbridge behaves independently would not apply, 27 since the distortion of the filament lattice in response to local force generation by one 28 crossbridge would influence the binding probability of other crossbridges.. Additionally, 29 the net axial force would no longer be a simple sum of all forces created by crossbridges 30 since lattice distortion influences the total force. Thus, the local deformation of 31 filaments may lead to coupling between crossbridges via realignment of binding sites 32 due to filament compliance. Indeed, experimental evidence suggests a highly nonlinear 33 relationship between filament overlap, which corresponds to the number of bound 34 crossbridges, and force. [6].

35
Accounting for filament compliance with partial differential equation (PDE) models 36 is nontrivial, though the original Huxley model has been adapted to include extensible 37 filaments [7]. The adapted PDE based theory, however, does not consider mechanical 38 coupling between crossbridges. Monte Carlo techniques have subsequently been 39 proposed that explicitly track the location of each crossbridge and binding site, as well 40 as crossbridge states [8,9]. These models started with single filament pair 41 interactions [8] and were extended to consider radial geometry of the sarcomere [9]. In 42 contrast to PDE based models, Monte Carlo simulations have revealed that compliance 43 based mechanical coupling does play a role in crossbridge dynamics. Additionally, 44 similar Monte Carlo models for force generation by cardiac muscle cells have shown how 45 coupling between active sites on thin filamentscan drives emergent dynamics [9,10] 46 Macroscopic and tissue scale models of muscle will require large scale coupling of 47 2/15 individual sarcomere models. Coupling of small numbers of sarcomeres has been 48 explored in the differential equations model case [10,11]. Coupled models explicitly 49 including effects from filament compliance require scaling already expensive 50 computational models and are thus infeasible. Recent work has looked at coupled 51 ordinary and partial differential equation models that mimic results of Monte Carlo 52 simulations [12] but has not been extended to the multiple sarcomere case. The  This work constructs a data-driven surrogate model for the half-sarcomere meant to 57 replicate the behavior of the spatially explicit Monte Carlo techniques discussed 58 in [9,13,14]. Each simulation of the model is parameterized by an externally imposed 59 length. Binding sites on each actin filament have permissiveness modulated by calcium, 60 which binds to troponin molecules on the filaments inducing a conformal change and 61 allowing binding. In contrast to past methods which represent dynamics as differential 62 equations, we use a discrete model to track the probability of crossbridges being in one 63 of three bound states throughout the simulations. Transition probabilities between 64 states are modeled in a data-driven manner from simulation data using the Monte Carlo 65 technique outlined in [9,13,14]. We are specifically interested in the case of periodically 66 lengthened and activated half-sarcomeres. Here we seek to develop accurate reduced 67 order models of force generation by the half sarcomere in response to periodic length 68 changes and a variety of input parameter regimes. These reduced order models, in turn, 69 can be used to bridge dynamics at microscopic scales to mechanical behaviors at 70 macroscopic scales.

72
In this section we more formally discuss the problem of building a surrogate model for  we motivate the problem of a reduced cost surrogate model using an abstract notation 76 of the full scale mechanistic model presented in [8] and also discussed in [13,14]. In the 77 subsection Data-driven half-sarcomere model we provide a detailed description of the 78 computational techniques used in this work as well as considerations for model 79 validation. In all that follows we will be discussing a dynamical system evolving in time 80 with discretized increments t j . The notation • (j) will always refer to the quantity bound state, and locations of each binding site as x(t) ∈ X . Input parameters, including 88 actin permissiveness and externally induced length changes are denoted by u(t) ∈ U. A 89 single step of the mechanistic simulation described in [8] is given by, where ω (j) ∈ Ω is a random variable capturing the simulation's stochasticity. In practice, 91 ω is a number of identical independently distributed uniform random variables used to 92  [13,14]. Muscle cells (A) consist of axially repeating sarcomeres (B) which of which contain thin filaments comprised of double helical arrangements of actin monomers (blue) anchored to z-disks (green). Thin filaments interdigitate with thick filaments (red) that contain myosin motor molecules anchored to the backbone of thick filaments. The model developed in prior studies [8,9,13,14] is based on a three state system with transitions shown in D.

3/15 Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z
compute bound states of each cross-bridge. In this work we restrict our attention to discretely sampled time series of τ periodic signals and u = {u (j) } m j=1 . The full length 95 mechanistic simulation in [8] is then a function Φ : X × P × Ω → X m given by, for l > 1. Equation (2) allows one to sample trajectories of x given a prescribed input 98 parameterization u. In this research we are interested in the work generated by the half 99 sarcomere. Letting f : X → R denote the state-dependent force generated by the 100 4/15 half-sarcomere, work over one loop of the periodic forcing is, where z(t) is sarcomere length, m τ = τ /dt, and the negative sign has been added since 102 force is computed in the negative z direction. Evaluation for the full model Φ requires 103 sampling from Ω as well as a complex numerical optimization problem for each step. In 104 some cases, for example when evaluating the work generated by a new parameter 105 regime, it is useful to use a lower fidelity model with decreased computational cost.

106
Critically, any such model must also maintain sufficient resolution to accurately 107 compute the force exerted by the half-sarcomere at each timestep in the simulation. In 108 this work, we find a reduced set of variabes and both hold with small residuals for all x, u in a large training set. That is, we want a 112 dynamical modelφ of the course grained space that is consistent with the full model φ 113 and a consistent meansf for computing force from the course grained space.

114
The fact that u is in the domain off and not f implicitly allows for unresolved state 115 variables to affect the force computed from the course grained space, so long as they 116 react quickly to u. A prototypcial example is crossbridge length, which is modulated by 117 changes in sarcomere length encoded in u.

118
State space of the mechanistic simulation 119 We provide a brief description of the state space X of the mechanistic simulation Φ used 120 as a high fidelity model and refer the reader to [9,13,14] for more detail. The simulation 121 exploits the symmetry of a sarcomere to focus modeling on the span between a single 122 z-disk and m-line, the midpoint of the sarcomere. A grid of actin and myosin filaments 123 is arranged according to the multifilament geometry discussed in [9]. Spatial location of 124 each crossbridge and binding site are explicitly tracked throughout the simulation. We 125 enumerate the crossbridges by which myosin filament they lie on and their location on 126 that filament, and likewise for binding sites on actin filaments. The state space of the 127 mechanistic simulation is uniquely defined by 128 where n M and n A are the number of myosin and actin filaments, n c is the number of where G attach to a given binding site and a crossbridge may only bind to one site, at most one 137 element of any particular row or column of G may be nonzero. Many entries will 138 necessarily be zero due to the simulation's geometry. However, it will be useful for 139 describing changes in the sarcomere's state that we aim to capture in the data-driven 140 model.

141
All simulations in this work are parameterized by three input time series; the length 142 of the half sarcomere, lattice spacing, and actin permissiveness. The last variable 143 models binding site affinity modulated by calcium. These are abbreviated as z (j) , l (j) , 144 and p (j) , respectively and are described by time series, We assume constant volume so that z (j) l (j) 2 is constant throughout any simulation.

146
Three output variables are recorded throughout the simulation: axial force, radial force, 147 and radial tension. We denote these by where it is implicitly understood that each is dependent on x (j) . Radial force exists in 149 the plan perpendicular to the sarcomere and is generally very close to zero. We are 150 primarily concerned with accurate computations of axial force throughout a simulation. 151 The training data is composed of a number of runs of the mechanistic model using 152 many parameter regions. We let P train ⊂ P denote the set of parameter regimes used in 153 the training data. Since we do not vary the initial condition x (1) , each run is 154 characterized by its input parameters u and random variable ω ∈ Ω m . We denote the 155 collection of random variables used for each parameter regime u as Ω u ⊂ Ω m .

156
Data-driven half-sarcomere model 157 We first consider the problem of finding a good set of course grained coordinates Y on 158 which to construct a data-driven surrogate model. In many fields of computational 159 physics such as fluid dynamics, there are established means of finding reduced order 160 bases on which to construct reduced models [15]. The problem here is less straight 161 forward, since many variables tracked in the mechanistic simulation are categorical and 162 we cannot generally assume that dynamics are constrained to a low dimensional space. 163 We therefore rely on domain knowledge of the half-sarcomere to determine Y.
Radial force is omitted from the data-driven model since it is expected to fluctuate 173 in a small region about zero and is not relevant for computing work. Radial tension was 174 found to be important for computing the dynamics of the cross-bridge states.
where the constraint i A(f y (y (j) ), u (j) ) ij = 1 enforces that the total number of 181 crossbridges in maintained. Since crossbridges do not communicate with each other, it 182 seems intuitive that the dynamics should be linear in y. Indeed, the mechanistic model 183 f computes transition probabilities for individual crossbridges based on distance to 184 binding site and binding site permissiveness. However, u alone seems to be insufficient 185 to accurately compute transition probabilities and we include radial tension. Individual 186 transition probabilities are modeled using polynomial regression on features including 187 radial tension and input. Specifically, where P q indicates the set of polynomial features of order less than or equal to q in the 189 features and ξ ik is a learned weight vector. The term A(f y (y (j) ), u (j) ) ik is the 190 probability of a crossbridge transitioning from state i to state k during the timespan 191 (t j , t j+1 ]. 192 We impose several constraints on the matrix A. Transitions from unbound (state 1) 193 to tightly bound (state 3) are prohibited in the mechanistic model so A 13 is defined to 194 be zero. We also need to ensure each column sums to one and that entries are This relates nicely to the course grained variables through the relation, Empirical transition probabilities for each step of the simulation are given by, i . However, 220 this method does not yield predictive results. The denominator of (16) is highly 221 correlated with the actin permissiveness p (j) and other input parameters. Indeed, the 222 mechanistic simulation generally has perioidic intervals in which y 2 = y 3 = 0. Instead, 223 we threshold observations where y (j) i is below a fixed tolerance y min = 5 and weight the 224 remaining observations equally.

225
Since the cubic polynomial kernel may result in a more ill-conditioned linear system 226 we impose an L 1 penalty to encourage parsimonious weight vectors, as is common in 227 many physical applications [16][17][18]. To maintain a tractable size for the linear system 228 even with many trials of each parameter region, we average over the empirical transition 229 rates and radial tension within each parameter regime. The resulting cost function is 230 given by, and δ is a Kronecker delta function indicating when there is sufficient sample size to 233 trust the observation. Each monomial term in the range of P 3 is normalized across the 234 training dataset so that regularization is independent of measurement units.

235
Optimization for (17) is performed in the Scikit Learn python library [19] using 10-fold 236 cross validation to find regularization term λ. 237

8/15
Forcing model in course grained space 238 Force in the mechanistic simulation is computed by summing over forces generated by 239 individual crossbridges. However, there are properties of the crossbridges not encoded in 240 the averaged bound states. Each crossbridge acts as a spring which is stretched 241 according to the length between it's base on a myosin filament and binding site. This 242 distance changes according to input parameters. Therefore, the data-driven model for 243 forcing is constructed as a function of the averaged cross-bridge states as well as the 244 input parameters u.

245
Specifically we constructf as an ensemble of regression trees using gradient 246 boosting [20] as implemented in Scikit Learn [19]. The gradient boosting algorithm 247 sequentially constructs an emsemble of regression trees [21]. The resultingf is of the 248 form, where each h m (y, u) is a regression tree and M = 100. The term (z − z 0 ) + is the The optimization for each tree is given by, where 258 Each regression tree h m is a piecewise constant function that splits the domain along 259 variables that result in minimal variance of the target function in each half of the 260 resulting split domain. This is performed recursively until a maximum depth of 5 is 261 achieved or variance reduction in the splitting is below a threshold.

262
There are several advantages to this approach. The learned functionf is composed 263 of a large ensemble of weak learners which allows the model to be accurate without

9/15
where y (1) = y x (1) . The work done by the half-sarcomere over one period of 273 oscillation is computed using the data driven approximation of force as, where β = 3 is an integer number of periods to average over and α = m − βm τ is a burn 275 in time to allow for decay of any initial transient onto a periodic orbit.

276
Training Data Fast Simulation Data-driven Modeling  [12, 15, 18, 21, 25 ms]. In this section we compare results of the Monte Carlo simulation 283 to those from the data-driven simulation on the parameter regimes reserved for testing. 284 All results from the data-driven model are computed only from the initial condition of 285 the mechanistic simulation using the same inputs u. Therefore, accuracy is shown not 286 only for point wise computations of transition probabilities, but of the full length time 287 series.    Work loops for each of the 15 testing regimes are shown in Fig. 6. Red trajectories 304 are generated using the data-driven methods and blue trajectories are given by the 305 average force from the Monte Carlo simulation,f . A cursory inspection reveals that the 306  Computed work in mechanistic model on x-axis and data driven model on y-axis. Blue data points correspond to input in training data while red is testing.
Error bars indicate one standard deviation around mean value observed over 20 trials of the mechanistic model.

316
We have presented a novel data-driven technique for modeling the half-sarcomere under 317 periodic length changes and activation. The data-driven model is trained on data from 318 and mimics the behavior of the Monte Carlo techniques developed in [8,9] at a 319 significantly decreased computational expense. In particular, force traces and work 320 loops are largely consistent when the two models are compared.

321
The current work offers several paths for future research. While methods presented 322 here replicate dynamics on the course grained variables and force traces given new input 323 parameters, several regimes in the testing set indicate that some features are missed.

324
Increases in axial force during the contracting phase of the sarcomere are 325 underestimated in some cases, even though crossbridge state densities are accurate.

326
This indicates that the data-driven model for force could be improved through different 327 parameterizations off or by including finer variables in Y. However, such work could 328 come at the expense of increased model complexity and could significantly increase the 329 challenge in finding dynamicsφ in the course grained space.

330
One goal of the current work to build towards a scalable model that may be used to 331 run simulations of many sarcomeres coupled end to end. In this case, periodic length 332 changes imposed on the string of sarcomeres may not imply periodicity on each 333 sarcomere, so the model would need to be tested on a larger space on inputs.

334
Furthermore, physiologically accurate models at a more macroscopic scale would require 335 considering this force as acting on a set of sarcomeres with masses moving in a viscous 336 medium. Tuning for drag coefficients may be a highly nontrivial problem. 337 We have constructed the framework in this work to demonstrate that it is feasible to 338 replicate the behavior of the Monte Carlo simulation with a fast deterministic system, 339 13/15 but do not claim that the method presented is optimal. It is common for reduced order 340 models of Markovian dynamical systems in physics to lose the Markov property and 341 thus require a memory term for accurate representation of dynamics in the reduced 342 space [22]. There is reason to believe that the dynamics in course grained variables 343 developed in this work would benefit from incorporating non-Markovian terms, though 344 data-driven simulation would then require initialization with a short time seriesrather 345 than only requiring the initial condition.

346
To facilitate future work and in the interest of reproducible research, all data and 347 code used for this paper has been made publicly available under an MIT license on 348 GitHub at https://github.com/snagcliffs/data driven sarc.