## Abstract

Quantitative experiments are essential for investigating, uncovering and confirming our understanding of complex systems, necessitating the use of effective and robust experimental designs. Despite generally outperforming other approaches, the broader adoption of model-based design of experiments (MBDoE) has been hindered by oversimplified assumptions and computational overhead. To address this, we present PARameter SEnsitivity Clustering (PARSEC), an MBDoE framework that identifies informative measurable combinations through parameter sensitivity (PS) clustering. We combined PARSEC with a new variant of Approximate Bayesian Computation for rapid, automated assessment and ranking of designs. By inherent design, PARSEC can take into account experimental restrictions and parameter variability. We show that PARSEC improves parameter estimation for two different types of biological models. Importantly, PARSEC can determine the optimal sample size for information gain, which we show correlates well with the optimal number of PS clusters. This supports our rationale for PARSEC and demonstrates the potential to harness both model structure and system behaviour to efficiently navigate the experiment design space.

## Introduction

The scientific approach relies on careful experiments to understand complex systems. Experiments must not only provide quantitative information such as model parameter estimates but also be economically and practically viable. Accurate parameter estimation is essential for refining model representations and enhancing model predictions. Compared to traditional model-free techniques, Model-Based Design of Experiments (MBDoE) utilises system-specific models to optimise experiment design for improved parameter estimation [1, 2, 3, 4, 5, 6, 7]. MBDoE allows for a more effective evaluation of the experimental design space and offers a more comprehensive understanding of the variables and the interactions between the components. It can facilitate resource optimisation, decrease the number of trials, and improve the accuracy of the estimation of the model parameters [3, 4, 5] and model selection [6, 7]. Over the years, MBDoE has become widely applied in a variety of fields, including engineering and biomedicine, due to the increasing use of quantitative models to understand complex systems. Despite the advantages of MBDoE, its application in parameter estimation is hindered by the increasing intricacies of modern experiments and their inherent constraints.

Fisher’s Information Matrix (FIM) is a popular framework for designing experiments aimed at estimating model parameter values [3, 4, 5, 6, 7, 8]. It is based on the concept of the expected information gain from an experiment, which is calculated by taking the expected value of the second derivative of the log-likelihood function with respect to the model parameters [9, 10]. If changes in the values of the parameters have a major effect on the output of the model, the output is informative about those parameters. Nevertheless, FIM-MBDoEs usually assume a linear statistical model that links measurements to parameter values, requiring the linearisation of nonlinear models at expected ground-truth parameter values. The efficiency of such designs is therefore markedly diminished when these guesses are not precise or accurate [9]. This limits the ability of FIM methods to incorporate parameter uncertainty in the experimental design. The complexity of matrix inversion, the need for a large experimental sample size to achieve the desired accuracy, and the assumption of a Gaussian noise distribution present additional difficulties that restrict the use of FIM-MBDoE [11]. Recently, new approaches have been proposed to address some of these issues [8, 12, 11, 10, 13, 14, 9]. For example, design space discretization can be employed to handle non-linear models, and the average information content and risk can be calculated over a large uncertainty to identify robust and reliable designs [14]. Additionally, the pseudo-inverse [15, 16, 17] can be used to tackle ill-conditioned matrices [8].

Alternatively, Bayesian approaches to MBDoE operate independently of assumptions related to linearity or direct likelihood estimation [18, 19, 20, 21, 22, 23, 24, 25]. These techniques can incorporate any prior information or beliefs about the parameters and offer more flexibility, sidestepping the issues of single-point estimation. However, they suffer from high computational demands, particularly when dealing with complex systems or large design spaces, and often produce biased and potentially suboptimal designs when tailored to prespecified sample sizes [18, 22, 21, 20]. To address these limitations, certain algorithms employ a sequential greedy approach, which builds on existing designs to manage complexity and improve design efficiency [26, 8, 19]. However, such design search approaches are influenced by the initial design choice and may be susceptible to local optima artefacts.

Here, we introduce the PARSEC DoE algorithm, which uses the model architecture of the system, through parameter sensitivity analysis, to direct the search for informative experiment designs. PARSEC identifies combinations of measurables with a low overlap in the vectors of their parameter sensitivity indices (PSI). These indices indicate how the value of a measurable changes when a parameter value changes. We demonstrate that the precision of parameter estimates increases when the overlap in the PSI vectors is reduced, enabling PARSEC to generate an ‘optimal’ DoE effectively. We show that fuzzy c-means clustering of PSI can substantially reduce the number of evaluations required to identify ‘optimal’ designs. PARSEC combines the PSI evaluated at different parameter values to identify generalist and robust designs informed by the parameter sensitivities while accommodating the uncertainty in prior knowledge. In this way, PARSEC combines concepts from FIM-based and Bayesian MBDoEs to determine the best experimental designs. We find that the optimal number of PSI clusters agrees with the designs that yield the highest information gain, thus introducing a new way to choose the sampling frequency.

In order to implement PARSEC, a reliable high-throughput parameter estimation framework is needed to assess the predicted designs. Existing methods for parameter estimation are restricted by their inherent assumptions or are not suitable for high-throughput analysis. For instance, likelihood-based techniques are limited to systems where the likelihood function can be specified and a Gaussian distribution for the parameters and measurement noise can be assumed [27, 28, 29, 30]. Approximate Bayesian Computation (ABC) based methods [30, 31, 32, 33] typically rely on data-dependent stipulations, such as error thresholds or correlated sampling. This can make the estimation vulnerable to local minima and initial biases. To alleviate this, we develop here the Approximate Bayesian Computation - Fixed Acceptance Rate (ABC-FAR) method [34] for parameter estimation for PARSEC designs. Our ABC-based algorithm uses a global and relative parameter-sampling rejection criterion (FAR). ABC-FAR thus avoids data-dependent stipulations, making it suitable for automated analysis and fair comparison of estimates derived from different data sets. We show that parameter estimation via ABC-FAR is accurate, free of computational artefacts, and less susceptible to noise in the data and initial guess bias.

## Results

### Sensitivity driven design of experiments

The PARSEC (PARameter SEnsitivity driven Clustering based DoE) framework represents a MBDoE rooted in four key ideas (Figure 1). Firstly, it employs the parameter sensitivity of a variable as an indicator of its informativeness towards the estimation of a parameter value [35, 10]. The information content profile of a measurement is, thus, approximated using a vector of its parameter sensitivity indices (PSI) (Step 1, Figure 1). Secondly, PARSEC computes the PSI vectors at various parameter values that sample the distribution linked to parameter uncertainty. Concatenating the PSI vectors for a measurement candidate yields the composite PARSEC-PSI vector. This approach accounts for the uncertainty and helps inform robust DoE. The intricacies of creating the PARSEC-PSI vector are detailed in the Supplementary text (SI S2, Supplementary Figure S1). The third idea revolves around the selection of measurements in a manner that minimizes the overlap amongst the corresponding PARSEC-PSI vectors. This maximizes the overall information yield from their combination. The process involves clustering measurements based on their PARSEC-PSI vectors and subsequently selecting a representative measurement from each cluster (Step 2, Figure 1). Finally, we expect the performance of a PSI-clustering-based design to depend on the quality of clustering, which in turn hinges on the number of clusters. As the number of clusters dictates the size of the experimental sample, we posit that optimal clustering can provide a well-informed guess for the sample size, which can be tuned to identify efficient designs that are both economical and informative (Step 3, Figure 1). The choice of clustering depends on the design specifications. For predetermined sample sizes, k-means and c-means clustering algorithms are suitable, especially for PARSEC-PSI vectors which would typically be high-dimensional, continuous data. 
In this study, we focus on two alternate implementations of PARSEC, namely PARSEC(k) and PARSEC(c), which use the k-means and c-means clustering algorithms, respectively (see Methods).
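The clustering-and-selection core of Steps 1-3 can be sketched as follows. This is a minimal, self-contained illustration with a toy PSI matrix and a bare-bones k-means routine; the names (`parsec_design`, `psi`) are ours, not part of the published implementation, which additionally handles design constraints, composite PARSEC-PSI vectors, and c-means clustering.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Minimal k-means with farthest-point initialization; returns labels."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def parsec_design(psi, n_measurements, seed=0):
    """Cluster candidate PSI vectors, then draw one representative
    measurement candidate from each cluster (Steps 2-3)."""
    rng = np.random.default_rng(seed)
    labels = kmeans(psi, n_measurements)
    return sorted(int(rng.choice(np.flatnonzero(labels == j)))
                  for j in range(n_measurements))

# Toy PSI matrix: 12 candidate measurements x 4 sensitivity indices,
# built around three distinct sensitivity profiles.
rng = np.random.default_rng(1)
psi = np.vstack([rng.normal(loc=c, scale=0.1, size=(4, 4)) for c in (0, 1, 2)])
design = parsec_design(psi, n_measurements=3, seed=1)
print(design)  # one candidate index drawn from each PSI cluster
```

Because the representative of each cluster is drawn at random, repeated runs yield different designs that can then be ranked, as described next.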

Optimizing the design parameters requires evaluation of the predicted designs. Moreover, given the stochastic nature of PSI clustering and the subsequent measurement selection, PARSEC predicts multiple designs. Design comparison is therefore vital for selecting the most favorable design from the array predicted by the PARSEC framework (Step 4, Figure 1). In this context, we introduce the Approximate Bayesian Computation - Fixed Acceptance Rate (ABC-FAR) technique, designed to automate parameter estimation and facilitate impartial comparison of the generated experimental designs.

For a likelihood-free approach and wider applicability, we propose an ABC-based algorithm for parameter estimation. Like the other ABC methods, ABC-FAR iteratively refines the parameter value distribution using χ^{2} statistics. In each iteration (Figure 2a), it samples the current distribution estimate (step 2, Figure 2a), and among them chooses some based on the corresponding χ^{2} values to update the distribution estimate (steps 3 & 4, Figure 2a). However, ABC-FAR differs from other ABC methods in that it selects a fixed fraction (FAR) of parameter values with the lowest χ^{2} values to update the marginals. The rejection criterion employs relative comparison and eliminates the need for specifying absolute χ^{2} thresholds, thereby avoiding data-specific customization. Unlike other relative rejection criterion-based approaches that involve local comparisons with neighboring parameter combinations, ABC-FAR performs a global comparison, considering all combinations sampled from the marginal together. Latin Hypercube Sampling (LHS, [36, 37]) is employed to sample the marginal, eliminating the need for transition kernels. Additionally, a small noise is introduced in the sampling distribution, resembling Simulated Annealing [38], to enhance robustness against local minima and initial guess biases (step 1, Figure 2a).
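The fixed-acceptance-rate rejection step can be sketched as below. This is a minimal illustration using a toy one-parameter linear model in place of the ODE systems considered here, and plain uniform resampling in place of LHS and the annealing noise; all names are ours.

```python
import numpy as np

def chi2(pred, data, sigma):
    return (((pred - data) / sigma) ** 2).sum(-1)

def abc_far_iteration(samples, data, sigma, model, far=0.25):
    """One ABC-FAR step: keep the fixed fraction `far` of sampled parameter
    combinations with the lowest chi^2 values (a global, relative rejection
    criterion -- no absolute chi^2 threshold is needed)."""
    preds = np.array([model(s) for s in samples])
    scores = chi2(preds, data, sigma)
    n_keep = max(1, int(far * len(samples)))
    keep = np.argsort(scores)[:n_keep]
    return samples[keep], scores[keep]

# Toy model: y = a * t; recover a from noiseless synthetic data.
t = np.linspace(0, 1, 20)
model = lambda theta: theta[0] * t
data = model([2.0])
rng = np.random.default_rng(0)
theta = rng.uniform(0, 5, size=(1000, 1))        # flat prior guess
for _ in range(4):                               # iterative refinement
    theta, scores = abc_far_iteration(theta, data, 1.0, model, far=0.25)
    lo, hi = theta.min(), theta.max()
    theta = rng.uniform(lo, hi, size=(1000, 1))  # resample the refined range
print(round(theta.mean(), 2))  # close to the ground truth 2.0
```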

### Accurate and robust parameter estimation using ABC-FAR

We demonstrate the performance of ABC-FAR using the popular Lotka-Volterra model. We simulate the model for a known combination of parameter values (ground truth) and initial conditions to generate synthetic data (Figure 2b). Assuming the same initial conditions, we use ABC-FAR to estimate parameter values from this data. Estimation is done on the logarithmic scale (base 10) of parameter values to explore a large dynamic range. We use a uniform initial guess for all parameters to reflect a lack of prior knowledge about their values. We see that the deviation between model predictions and data (χ^{2} statistics) decreases with every iteration of ABC-FAR, and the model predictions recapitulate the data well (Figure 2b, c). The algorithm iteratively transforms the prior guess for the model parameters into a sharp distribution around the corresponding ground truth values (Figure 2c, Supplementary Figure S2). However, it does not alter the marginal of a dummy parameter (Figure 3c), a parameter that does not affect the model predictions (and hence χ^{2}) but undergoes the same treatment as the model parameters. This suggests that the algorithm does not introduce unwanted computational artifacts.

Additionally, we identify higher-order relations among model parameter values, conditioned on the data. For example, the selected values of the ‘birth rate of prey’ (a) and ‘death rate of prey due to predation’ (b) show a strong positive correlation (correlation = 0.74, Figure 3b), indicative of significant practical nonidentifiability (PnI). The slope of the linear regression (0.93) suggests that changes in χ^{2} due to a 1% increase in the log_{10}(a) value can be compensated by a 0.93% increase in the log_{10}(b) value. Although local, the findings are statistically significant, as established by comparing them to a similar analysis of the dummy parameter (Figure 3d, Supplementary Figures S3 and S4). We further verify that the estimation is robust to measurement noise (Supplementary Figure S6) and see that ABC-FAR can easily be adapted to exploit the model structure and system-specific knowledge to speed up and improve convergence (Supplementary text SI S7, Supplementary Figure S7).
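The kind of higher-order analysis described above can be illustrated on hypothetical ‘accepted’ samples; the coupling strength below is chosen only to mirror the reported slope and is not the paper’s data.

```python
import numpy as np

# Hypothetical "accepted" log10 parameter values in which an increase in
# log10(a) is compensated by a roughly proportional increase in log10(b).
rng = np.random.default_rng(0)
log_a = rng.normal(0.0, 0.2, 5000)
log_b = 0.93 * log_a + rng.normal(0.0, 0.05, 5000)

r = np.corrcoef(log_a, log_b)[0, 1]      # strength of the coupling
slope = np.polyfit(log_a, log_b, 1)[0]   # degree of compensation
print(round(r, 2), round(slope, 2))
```

A strong correlation among accepted values signals practical nonidentifiability; the regression slope quantifies how a change in one parameter can be offset by the other while leaving χ^{2} nearly unchanged.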

The efficiency of ABC-based methods typically depends on the implementation of their rejection criterion. In the previous analysis, we used a FAR value of 0.25. Here we vary it to see the effect on the algorithm’s efficiency. For this, we monitor two performance statistics at every iteration for the different implementations - the accuracy of estimation (inversely related to the χ^{2} statistics) and the cumulative computational cost (∝ EAR^{−1}, described in Methods, see Supplementary Figures S8, S9 and S10). The χ^{2} statistics monotonically decrease with iteration irrespective of the FAR value used; this is expected as we use the ‘history-dependent update strategy’ (described in Methods). However, the estimation converges at higher accuracy (lower χ^{2} statistics) as the FAR value decreases. Among those reaching a particular level of accuracy, the schemes with higher FAR values (weaker rejection criteria) seem to be more efficient. The efficiency of our schemes is comparable to that of the popular ABC-SMC algorithm [32] (Supplementary Figure S10, comparison detailed in Methods).

### Identifying informative designs using PARSEC with ABC-FAR

We use PARSEC(k) (PARSEC employing k-means clustering, see Methods) and ABC-FAR to design experiments characterizing the parameters of a three-gene repressilator model (Elowitz and Leibler, 2000; details in Methods and SI). First, we establish the methodology with predetermined initial conditions and sample size, assuming accurate prior knowledge of the parameters. We account for practical considerations, such as (a) limiting the experiment duration, (b) preferring simultaneous measurements when multiple variables are observed, and (c) constraining the measurables based on feasibility (see Methods).

To validate PARSEC(k)’s performance, we compare PARSEC(k) designs (PD_{BC}) against random designs (RD_{BC}) for parameter estimation accuracy. The suffix indicates the variables being measured. We also evaluate the importance of clustering using anti-PARSEC(k) designs (WD_{BC}), where all measurements are selected from the fewest clusters (those with the most candidates). The parameter estimation errors for the random designs (RD_{BC} and RD_{AB}) are approximately uniformly distributed (Figure 4a). Interestingly, the spread of errors becomes strongly skewed when we consider designs based on PSI vector clustering (PD_{BC} and WD_{BC}, Figure 4a). The error for designs where all the measurements belong to a single cluster (WD_{BC}) tends to be higher, whereas designs (PD_{BC}) generated using measurements from different clusters are highly informative, with lower error (barring a few outliers). The top ten percent of designs monitoring the levels of proteins B and C (RD_{BC}, WD_{BC} and PD_{BC}) are predominantly PARSEC-predicted ones (PD_{BC}). Despite no significant bias in the selection of measurements in PD_{BC} compared to that of RD_{BC} (Figure 4c), the average estimation error for PD_{BC} is about three times lower than that for RD_{BC} (Figure 4a). Furthermore, the clustering-based algorithm samples from a small and informative sub-space of the experiment design space, indicated by the low variance in estimation error (Figure 4a) and the performance of sub-samplings of PD_{BC} compared to that of RD_{BC} (Figure 4b).

We also observe that the performance of designs generated by PARSEC(k) depends on how closely certain design specifications are implemented. For instance, PD_{BC} optimized the measurements of proteins B and C based on their PSI. But if we quantify proteins A and B using the measurement time points optimized for the quantification of proteins B and C (PD_{AB}), we observe a decrease in estimation accuracy compared to PD_{BC}. However, PD_{AB} still outperforms RD_{AB} on average, with a 1.5 times lower error, which can be attributed to two factors: (a) PD_{BC} accounts for the sensitivity profiles of protein B, which is one of the variables measured in PD_{AB}, and (b) a potential correlation between the sensitivity profiles of proteins A and C due to the coupled dynamics.

### PARSEC(k) tolerates parameter uncertainty

In the previous analysis, we optimized PD_{BC} using PSI evaluated at a guess of the parameter values (GT_{G}). But the guess might differ from the actual ground truth (GT_{T}). PARSEC(k) tolerates a significant disparity between the guessed and actual ground-truth values but suffers when the guess is very poor, as indicated by the accuracy ratio (Figure 5a), defined as the ratio of the mean estimation error for random designs to that for PARSEC(k) designs.

Rather than informing designs using a single-point guess, PARSEC can do so for multiple guesses to accommodate uncertainty associated with parameter knowledge (Figure 5b). To do so, PARSEC samples the associated distribution for representative guesses called training samples (Θ^{k}). PSI evaluated at the training samples are concatenated together to form the PARSEC-PSI vectors, used for clustering and subsequent design selection. Estimation errors for the PARSEC(k) and random designs are evaluated using data generated at the training samples (Θ^{k}) and validated at another set of parameter samples (test samples, denoted as Ω^{k}).

PARSEC(k) designs are on average twice as informative as random designs given the parameter uncertainty (Figure 5c). Interestingly, the top 5% of the PARSEC(k) designs in the training analysis are also the best performers in the test analysis. Such designs, robust to parameter uncertainty, can be good candidates for generalist designs. Moreover, comparable accuracy ratios for the training and test samples (2.00 ± 0.85 and 2.05 ± 0.53, respectively) further emphasize the robustness of PARSEC(k) designs.

### Optimizing the sample size

In the previous analyses, we constrained the design search to those with pre-defined experiment sample size; here we optimize it. Random and PARSEC(k) designs lead to more accurate parameter estimation when their sample size increases (Figure 6a). However, the marginal gain in estimation accuracy decreases steadily (Supplementary Figure S12). Notably, PARSEC(k) designs of sample sizes five and higher exhibit similar performance on average (Figure 6a), suggesting that a five-measurement design would be both informative and economical.

As the number of clusters in PARSEC defines the experiment sample size, we posit that the clustering efficiency affects the information gain. PARSEC allows us to evaluate the relationship between the accuracy ratio and the clustering efficiency for a given sample size. In our example of designing experiments for the repressilator network, we find that the intra-cluster distance for the PARSEC-PSI vectors reaches an elbow point between five and six clusters, indicating optimal partitioning (Figure 6b). Interestingly, five clusters also correspond to the repressilator PARSEC(k) designs with the highest accuracy ratio.

This suggests that clustering efficiency can serve as a valuable indicator for determining the sample size for PARSEC, leading to high accuracy ratios. This arises for two reasons. On one hand, using too few clusters limits the representation of PARSEC-PSI vectors in the PARSEC designs; increasing the number of clusters then significantly enhances the representation and information content of the designs, thereby improving the accuracy ratio. On the other hand, excessive clustering leads to an over-representation of similar PARSEC-PSI vectors, yielding only a modest increase in information content compared to random designs. Hence, the advantage of using a PARSEC approach is likely to decrease with over-clustering. The observed correlation between clustering efficiency and accuracy ratio further validates the effectiveness of the PSI-clustering-based approach in experiment design.
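The elbow behaviour described above can be reproduced on synthetic data: when candidate PSI-like vectors are built around a known number of sensitivity profiles, the total intra-cluster distance drops steeply up to that number of clusters and then flattens. This sketch uses SciPy's k-means and invented data, not the repressilator model.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Synthetic PARSEC-PSI-like vectors built around five distinct
# sensitivity profiles (20 candidates per profile, 8 indices each).
rng = np.random.default_rng(0)
profiles = rng.normal(0, 1, size=(5, 8))
psi = np.vstack([p + rng.normal(0, 0.05, size=(20, 8)) for p in profiles])

# Mean intra-cluster distance (distortion) as a function of cluster count.
np.random.seed(0)  # scipy's kmeans uses the legacy global RNG by default
distortion = [kmeans(psi, k)[1] for k in range(1, 9)]
print([round(d, 2) for d in distortion])  # steep drop until k = 5, then flat
```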

While our implementation of PARSEC(k) provides a good estimate for the optimal sample size, it still demands a substantial computational cost due to the evaluation of numerous designs. This cost escalates when considering parameter uncertainty across multiple parameter values. The multitude of predictions in PARSEC(k) arises from the discrete partitioning of continuous PARSEC-PSI vectors by the k-means clustering algorithm and the subsequent candidate selection. An alternative approach is to employ fuzzy c-means clustering (Supplementary Figure S13), as in PARSEC(c). This method results in fewer unique optima, drastically reducing the computational burden (Figure 6b). PARSEC(c) exhibits similar clustering convergence (Figure 6a: inset) and comparable, if not superior, estimation error and accuracy ratio profiles (Figures 6a and b) when varying sample sizes, all while requiring nearly twenty times fewer computations (Figure 6b). This enhanced efficiency renders sample size optimization for generalist design search manageable. We identified informative, generalist designs for various sample sizes with a total of nine design evaluations (Figure 6c, Supplementary Figure S14), compared to the 100 designs evaluated to identify a generalist design of sample size six via PARSEC(k) (Figure 5c).

## Methods

### Implementation of PARSEC

First, we identify the feasible measurement candidates according to the design specifications and sample the parameter uncertainty to select the ‘training samples’. PARSEC then evaluates the parameter sensitivity indices (PSI) for each variable within a measurement candidate across these training samples (step 1, Figure 1). These PSIs are concatenated into a PARSEC-PSI vector for each candidate (Supplementary Figure S1). The candidates are grouped by similarity in their PARSEC-PSI vectors (step 2, Figure 1), guiding PARSEC’s final selection of measurement candidates (step 3, Figure 1). The candidates are selected such that each cluster is well-represented in the design.

We use Latin Hypercube Sampling (LHS [36]), a Monte-Carlo algorithm to efficiently sample multi-dimensional parameter spaces [37]. To generate N samples, LHS divides the distribution for each parameter into N equal probability intervals. Then it randomly picks an interval (without repetition) for each parameter and based on the distribution, chooses a random value from it. For sensitivity analysis, we use extended Fourier Amplitude Sensitivity Test (eFAST, [39, 40]). It is a variance-based method that gauges how parameter fluctuations contribute to uncertainty in the variable of interest. eFAST provides a global measure of sensitivity and captures non-linear relationships among parameters and outputs.
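The LHS scheme just described can be sketched for the unit hypercube; samples for actual parameter distributions would be obtained by applying each parameter's inverse CDF to these values. The helper name is ours.

```python
import numpy as np

def latin_hypercube(n_samples, n_params, rng=None):
    """LHS on the unit hypercube: each parameter's range is split into
    n_samples equal-probability bins, every bin is used exactly once per
    parameter, and a random value is drawn within each chosen bin."""
    rng = np.random.default_rng(rng)
    # bin indices 0..n-1, shuffled independently for each parameter
    bins = rng.permuted(np.tile(np.arange(n_samples), (n_params, 1)), axis=1).T
    # one uniform draw inside each selected bin
    return (bins + rng.uniform(size=(n_samples, n_params))) / n_samples

samples = latin_hypercube(10, 3, rng=0)
print(np.sort((samples * 10).astype(int), axis=0)[:, 0])  # bins 0..9, each once
```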

In our implementation, we apply either the k-means or the fuzzy c-means clustering algorithm. Suitable for partitioning continuous data into a predetermined number of clusters, these unsupervised learning methods aim to minimize the total intra-cluster distance. The k-means algorithm assigns each data point to a single cluster, whereas the c-means algorithm assigns each data point a degree of membership in every cluster. Upon c-means clustering, we guide the selection of measurement candidates using this membership measure: we iteratively select the measurement candidate with the highest degree of membership to any of the clusters (randomly breaking ties), and once selected, the candidate and its corresponding cluster are excluded from the subsequent selection process. We refer to this flavor of PARSEC as PARSEC(c). Alternatively, in PARSEC(k), we use the k-means clustering algorithm and construct the design by randomly selecting one measurement candidate from each cluster. Properties such as silhouette scores could be used to bias the selection of candidates from a cluster, as an alternative to the random selection analyzed in this study.
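The membership-guided selection used in PARSEC(c) can be sketched as follows, assuming a membership matrix has already been obtained from fuzzy c-means; the toy matrix and function name are illustrative.

```python
import numpy as np

def select_by_membership(membership, rng=None):
    """Greedy PARSEC(c)-style selection: repeatedly take the candidate with
    the highest degree of membership to any remaining cluster, then retire
    both that candidate and that cluster."""
    rng = np.random.default_rng(rng)
    m = membership.astype(float).copy()
    design = []
    for _ in range(m.shape[1]):
        best = np.max(m)
        # random tie-break among (candidate, cluster) pairs at the maximum
        cand, clust = rng.permuted(np.argwhere(m == best), axis=0)[0]
        design.append(int(cand))
        m[cand, :] = -1.0   # retire the candidate
        m[:, clust] = -1.0  # retire the cluster
    return design

# Toy membership matrix: 5 candidates x 3 fuzzy clusters (rows sum to 1).
membership = np.array([[0.8, 0.1, 0.1],
                       [0.2, 0.7, 0.1],
                       [0.1, 0.2, 0.7],
                       [0.5, 0.3, 0.2],
                       [0.3, 0.3, 0.4]])
design = select_by_membership(membership, rng=0)
print(design)  # candidate 0 first, then one candidate per remaining cluster
```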

It’s important to note that the clustering and subsequent measurement selection steps in PARSEC can be stochastic. Thus we execute PARSEC to predict multiple designs, eventually selecting the most informative one. Designs are evaluated at one or more parameter values (step 4, Figure 1). At each parameter combination, we simulate the model and generate the data set according to the design being evaluated. We employ ABC-FAR to estimate the parameter values using the data set and calculate the estimation error.

### Implementation of ABC-FAR

ABC-FAR iteratively samples parameter combinations and rejects/accepts them based on the associated χ^{2} values, χ^{2} = Σ_{D} (R_{D} − R̂_{D})^{2} / σ_{D}^{2}, which quantify the deviation between the data (R_{D}) and the corresponding model predictions (R̂_{D}); the summation runs over all data points, and σ_{D} denotes the standard deviation in the data R_{D}.

ABC-FAR samples a noisy version of the current estimate of the distribution. However, the amplitude of noise added decreases with iteration (mimicking simulated annealing [38]). This improves exploration in early iterations without drastically slowing convergence in the later iterations. For sampling, we again rely on Latin Hypercube Sampling (LHS [36]). Typically each parameter is sampled independently, although correlations can be imposed using modified LHS algorithms [41, 42]. Independent sampling ignores the joint distribution of model parameters which can affect convergence. However, in an iteration, the algorithm allows us to include the combinations selected in the previous iteration during the parameter selection step (History-dependent update strategy, HDUS). This maintains the joint distribution to a certain extent, particularly when the current sampling of parameters is not as good as the previous selection. It also ensures that good combinations once sampled are not lost, guaranteeing a monotonic decrease in χ^{2} statistics with iteration. We can choose to not recall the previously selected parameter combinations (History independent update strategy, HIUS). HIUS saves on memory but does not guarantee convergence when weak rejection conditions (high FAR values) are used (Figures S6 and S7).
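The history-dependent update strategy can be sketched as a pooling step, here with invented toy numbers:

```python
import numpy as np

def hdus_select(new_thetas, new_chi2, prev_thetas, prev_chi2, far):
    """History-dependent update: pool freshly sampled parameter combinations
    with those selected in the previous iteration, then keep the best `far`
    fraction. The best chi^2 can therefore never increase across iterations."""
    thetas = np.concatenate([new_thetas, prev_thetas])
    chi2 = np.concatenate([new_chi2, prev_chi2])
    n_keep = max(1, int(far * len(new_thetas)))
    keep = np.argsort(chi2)[:n_keep]
    return thetas[keep], chi2[keep]

# A poor fresh sample cannot undo a good earlier selection:
prev_t, prev_c = np.array([[1.0]]), np.array([0.5])
new_t = np.random.default_rng(0).uniform(0, 5, (8, 1))
new_c = np.full(8, 3.0)
sel_t, sel_c = hdus_select(new_t, new_c, prev_t, prev_c, far=0.25)
print(sel_c.min())  # 0.5 -- the previously selected combination survives
```

Dropping the pooling step (HIUS) saves the memory for `prev_thetas`/`prev_chi2` but forfeits this monotonicity guarantee.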

We also examine relationships among the components of the selected parameter combinations, such as correlation and linear regression. These help infer data-dependent properties, namely practical non-identifiability and the associated degree of compensation between the parameters, respectively. Such measures capture linear and local relationships conditioned on the values of the other model parameters. Since parameter estimation and higher-order relationships are derived from a sampling-based approach, we infer their significance using a corresponding analysis of the dummy parameter. Although the dummy parameter does not affect the model prediction and hence χ^{2}, we let ABC-FAR modify its distribution. Ideally, the distribution of the selected dummy parameter values should mimic that of the sampled values; a deviation can help us identify computational artifacts and sampling biases. Higher-order relationships between a model parameter and the dummy parameter can help infer the statistical significance of relationships between model parameters.
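The dummy-parameter control can be mimicked numerically: if selection with a fixed acceptance rate is driven purely by χ^{2} values that are independent of the dummy, the selected dummy distribution should match the sampled one. A toy check with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
dummy = rng.uniform(0, 1, 10000)           # dummy parameter, sampled like the rest
chi2 = rng.normal(5, 1, 10000)             # chi^2 independent of the dummy
selected = dummy[np.argsort(chi2)[:2500]]  # FAR = 0.25 global rejection

# An artifact-free selection leaves the dummy's distribution untouched:
print(round(selected.mean(), 2), round(dummy.mean(), 2))
```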

### DoE for a three-gene repressilator system using PARSEC

A three-gene repressilator system (modeled by Elowitz and Leibler in 2000 [43]) involves the cyclic repression of gene expression, where one gene represses the expression of another. We use a simple ODE model that monitors the levels of the proteins expressed by these genes, with parameters representing the repression kinetic rates and thresholds. The protein levels show oscillatory dynamics.

In the analysis corresponding to Figure 4, we aim to identify a set of six synchronized measurements of protein B and C levels within 72 units of time (e.g. hours) for accurate parameter estimation. For this, PARSEC calculates the parameter sensitivity indices (PSI) at a specific time point for the levels of both proteins B and C. We concatenate the indices for proteins B and C to create a PARSEC-PSI vector for the measurement candidate at that time point. Based on the similarity of the PARSEC-PSI vectors, the measurement candidates are clustered via k-means clustering. We create six clusters, adhering to the constraint of having six measurements. From each cluster, we randomly select one representative candidate; together these form the final design (PD_{BC}). We predict 100 designs and compare them using ABC-FAR to identify the most informative one. Design comparison is based on the error in parameter estimation derived from the data emulating the design. To test the performance of PARSEC, we generate random designs (RD_{BC} and RD_{AB}), non-specific designs (PD_{AB}), and anti-PARSEC designs (WD_{BC}). Each design constitutes six simultaneous measurements of the two variables (the protein levels). We generate a total of 100 designs for each mechanism, except for WD_{BC}, for which we generate only 40 designs.

Next, we study the robustness of PARSEC designs to uncertainty, characterized by a uniform distribution in log scale over a four-fold range of values for two of the six model parameters (Figures 5a and b). The PARSEC-PSI vector is constructed using PSI evaluated at five training values (Θ^{k}) sampling the uncertainty via Latin Hypercube Sampling. Design evaluations are done at these five training samples and at another set of four test values (sampled via LHS). Since we consider a large dynamic range of uncertainty, we allow for a longer time frame of 120 units. Finally, while tuning the sample size (Figures 5c and d), we ignore uncertainty considerations, evaluating PARSEC-PSI vectors at a single parameter guess, and limit the duration of experiments to 72 units of time. Since the sample size is altered, we accordingly change the number of clusters used to partition the PARSEC-PSI vectors.

### Computational evaluation of a design

We evaluate design evaluation at specific values of model parameters. At the parameter combination, we simulate the model behavior based on the design specifications and sample this behavior according to the designs of interest to generate the associated data set. ABC-FAR is used to recover parameter values from the data. ABC-FAR selected parameter values are compared to the parameter combination used to generate the data, to calculate the estimation error associated with the design.
E = (1/(rM)) Σ_{m=1}^{M} Σ_{j=1}^{r} (log_{10}(θ^{(m)}_{j}) − log_{10}(θ^{GT}_{j}))^{2}

Here θ^{GT}_{j} denotes the value of the j^{th} parameter of the combination used to generate the data, and θ^{(m)}_{j} is the corresponding value in the m^{th} parameter combination selected by ABC-FAR; r and M indicate the number of free parameters to be estimated and the number of parameter combinations selected by ABC-FAR, respectively. In our implementation, ABC-FAR refines the posterior five times, starting from a prior that is uniform in log scale and spans two orders of magnitude. In each iteration, we sample 60,000 parameter combinations using a noisy version of the current estimate of the marginals. We employ the history-dependent update strategy (HDUS) and a FAR value of 0.01. Furthermore, we add a uniform sampling noise according to the following expression,
P̃^{(k)}_{j}(θ_{j}) = (1 − η_{k}) P^{(k)}_{j}(θ_{j}) + η_{k} U_{j}(θ_{j}),

where P^{(k)}_{j} is the current estimate of the marginal used to generate the sampling distribution P̃^{(k)}_{j}, for the j^{th} free parameter (θ_{j}) in the k^{th} iteration; U_{j} is the uniform distribution over the prior range of θ_{j}, and η_{k} is a noise amplitude that decreases with iteration. Note that P^{(1)}_{j} would be the prior or initial guess.
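
The design-scoring error described above might be computed as follows (the log-scale mean-squared form is our assumption; the paper's exact metric may differ):

```python
import numpy as np

def estimation_error(theta_gt, theta_selected):
    """Deviation of ABC-FAR-selected combinations from the data-generating one.

    theta_gt:       (r,)   ground-truth parameter values
    theta_selected: (M, r) combinations selected by ABC-FAR
    Mean squared log10 deviation over all M combinations and r parameters
    (one plausible metric; the paper's exact formula may differ).
    """
    d = np.log10(theta_selected) - np.log10(theta_gt)
    return float(np.mean(d ** 2))

gt = np.array([1.0, 0.5, 2.0])
sel = np.array([[1.0, 0.5, 2.0],
                [10.0, 0.5, 2.0]])
print(estimation_error(gt, sel))  # 1/6: one log10 deviation of 1 over M*r = 6 terms
```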

## Discussion

The PARameter SEnsitivity Clustering-based algorithm (PARSEC) exploits the model structure and its parameter sensitivity to optimise the experimental design. First, even though the distribution of measurement candidates selected in PARSEC(k) is similar to that in the random designs (Figure 4c), the distribution of PARSEC-derived estimation errors is comparatively narrower and lower (Figure 4a). This suggests that the degree of overlap between PARSEC-PSI vectors matters more to DoE than the individual characteristics of the measurements. Although we pick candidates randomly from each cluster, a systematic selection could further improve the efficiency of the design search. Second, the parameter estimation error derived from designs decreases with a reduction in the overlap of the parameter sensitivity (PSI) vectors of measurements (comparison of PD_{BC} and WD_{BC}, Figure 4a). Third, the efficiency of PSI clustering correlates with the accuracy ratio of the designs as we vary the sample size (Figure 5d). This highlights the advantage of clustering the parameter sensitivity profiles in design generation and suggests a good guess for the optimal sample size of the experiment.

The critical ingredients of the algorithm are the PARSEC-PSI vectors. Our approach of vectorially conjoining the PSIs to create the PARSEC-PSI vectors allows us to identify generalist designs that mitigate parameter uncertainty. Here, we concatenate the PSIs evaluated at various parameter combinations (sampling the uncertainty) to construct the high-dimensional PARSEC-PSI vectors used for subsequent clustering and design selection. The robustness of designs to parameter uncertainty is a desirable property distinguishing PARSEC from locally optimal MBDoE approaches like Fisher Information Matrix-based methods. Depending on the extent of uncertainty, PARSEC can plan either a ‘robust generalist’ design or a ‘locally optimized’ design. Nevertheless, robustness and accuracy are negatively affected as uncertainty increases (Supplementary Figure S11).

The vector-conjunction approach also allows us to easily implement various design constraints. For example, in our current analysis, we enforced simultaneous measurements of the different variables. Alternatively, suppose that measurements of different variables have to be acquired at a fixed offset; then the PSIs of the variables would be concatenated in an appropriately staggered manner to construct the PARSEC-PSI vectors. When such restrictions are absent, one can treat each of the individual variables as a different measurement candidate during clustering and design selection (see Supplementary text SI S1 and Supplementary Figure S1). Furthermore, the type of measurement also influences design specifications: certain variables may be more convenient or cheaper to measure than others. PARSEC can be implemented to identify variable combinations for economical and informative designs. Other specifications, like the choice of initial conditions, can also be tuned to improve designs for parameter estimation.

Optimizing the design parameters and design selection requires design evaluation. PARSEC employs the Approximate Bayesian Computation-Fixed Acceptance Rate (ABC-FAR) algorithm for automated and fair comparison of designs. This is possible because ABC-FAR considers a global and relative sampling-rejection criterion (FAR), bypassing data-dependent stipulations. Additionally, a global parameter sampling method incorporating simulated annealing is used to improve the robustness of estimation and avoid local minima and biases. We have validated the algorithm for accuracy, reliability, efficiency, and robustness to initial bias and measurement noise. The efficiency can be further improved by adapting the FAR value in each iteration based on the improvement in convergence seen in the previous step. The algorithm has wide applicability and is amenable to the incorporation of available system-specific knowledge to improve convergence accuracy and speed (Supplementary Figure S10). Furthermore, we derive data-specific correlations and dependencies among the estimated parameter values, which can be used to infer properties like practical identifiability. Like some other ABC methods, ABC-FAR can also be adapted for model selection by treating the identity of the model as a free parameter.

Parameter estimation-based design comparison, the final step in PARSEC, is computationally intensive, much like other Bayesian Model-Based Design of Experiments (MBDoE) approaches. However, the tight distribution of low estimation errors in PARSEC(k) indicates that its predicted designs are generally informative. This suggests that a limited number of design-space samplings through PARSEC(k) should suffice, thereby reducing computational costs.

An alternative approach to mitigating computational burdens involves the use of fuzzy clustering. When considering the two variations of PARSEC presented here, it is worth noting that PARSEC(c) designs (fuzzy clustering) represent a subset of PARSEC(k) designs. PARSEC(c) is highly efficient at identifying good designs but may overlook the most informative ones. In contrast, PARSEC(k) conducts a more exhaustive search, aiming to identify the most informative designs.

In summary, PARSEC leverages mathematical models and simulations to guide experimental design, maximizing the information obtained about model parameters within constraints. It can be applied to characterize complex systems or investigate the effects of perturbations. For instance, PARSEC can identify standardized, informative, and cost-effective drug screening experiments, facilitating the quantification and comparison of the impacts of different drugs on model parameters.

## Author Contributions

Conceptualization, result interpretation and original draft preparation: HC and RR. Software and formal analysis: HC.

## Funding

This work was supported by the Indian Institute of Science Bangalore (RR), Wellcome Trust—DBT India Alliance intermediate fellowship (RR) and the Prime Minister Research Fellowship (HC).

## Competing interests

The authors declare that no competing interests exist.

## Supplementary Information

### SI S1 Constructing the PARSEC-PSI vector

Suppose *p*_{j} is the *j*^{th} of the ‘r’ model parameters being estimated and *V*_{1}, *V*_{2}, … *V*_{q} are the different variables whose levels can be measured. Parameter sensitivity is calculated only for the model parameters whose values are being estimated. To account for uncertainty, we evaluate the PSI at multiple training samples (Θ^{1}, Θ^{2}, …, Θ^{n}) representing the associated distribution. Now suppose we have a constraint where the variable V_{i} must be measured at a time point that is offset by Δ_{i} time units after the measurement of variable V_{1}; by definition, Δ_{1} = 0. The case of simultaneous measurement of variables can then be implemented by equating the offsets to zero, that is, Δ_{2} = Δ_{3} = … = Δ_{q} = 0.

Here we demonstrate how to construct the PARSEC-PSI vector for the general offset situation (top panel, SFigure S1). A measurement candidate is defined as the sequence of measurements of the q variables at their proper offsets. Let the measurement time point be labeled using t, the time point at which V_{1} is measured. We have a PARSEC-PSI vector for each of the measurement time points. Parsing the top panel of SFigure S1 from the right: SI_{p}(V_{k}(t + Δ_{k})) |_{Θm} denotes the parameter sensitivity index of V_{k} at time t + Δ_{k} towards the p^{th} model parameter, evaluated at Θ^{m}. These indices form the components of PSI(V_{k}(t + Δ_{k})) |_{Θm}, representing the parameter sensitivity of the variable V_{k} at time t + Δ_{k} evaluated at parameter value Θ^{m}. PSI(V_{k}(t)) |_{Θm} is an r-dimensional vector. These vectors for the different variables, accounting for the appropriate offsets, are chained together to give PSI(t, Θ^{m}), the parameter sensitivity profile of the entire measurement candidate evaluated at Θ^{m}. Finally, the PSI(t, Θ^{m}) for all the training samples (m = 1, 2, …, n) are conjoined to form PARSEC-PSI(t), which is used for clustering. The PARSEC-PSI vector is an (r·q·n)-dimensional vector, accounting for the ‘r’ free parameters, the ‘q’ variables constituting a measurement, and the ‘n’ training samples that accommodate the uncertainty.

Alternatively, if the constraint of offset and collective measurement of the variables is relaxed, the construction of the PARSEC-PSI vectors is simpler (SFigure S1). In this flexible scheme, whether and when a variable is measured is not constrained by the measurement of the others. Thus, a measurement candidate is characterized by which variable is being considered (variable label = k) and the time point of measurement (time label = t). The corresponding PARSEC-PSI vector, PARSEC-PSI(t, k), is a concatenation of ‘n’ PSI vectors - PSI(t, k, Θ^{1}), PSI(t, k, Θ^{2}), …, PSI(t, k, Θ^{n}) - where PSI(t, k, Θ^{m}) is the parameter sensitivity vector for the variable V_{k} at time t evaluated at training sample Θ^{m} (for m = 1, 2, …, n). PSI(t, k, Θ^{m}) is an r-dimensional vector, whose p^{th} component, SI_{p}(V_{k}) |_{Θm}, denotes the parameter sensitivity index of V_{k} at time t towards the p^{th} model parameter, evaluated at Θ^{m}. So each PARSEC-PSI vector, PARSEC-PSI(t, k), is an (r·n)-dimensional vector, accounting for the ‘r’ free parameters and the ‘n’ training samples that accommodate the uncertainty.
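
The bookkeeping of the flexible scheme amounts to reshaping a sensitivity array; a minimal sketch with synthetic dimensions (not the authors' code):

```python
import numpy as np

def parsec_psi_flexible(psi):
    """Build PARSEC-PSI vectors for the flexible (unconstrained) scheme.

    psi[t, k, m, p] = sensitivity of variable V_k at time t towards the
    p-th free parameter, evaluated at training sample Θ^{m}.
    Returns one (r*n)-dimensional vector per (time point, variable) candidate.
    """
    T, q, n, r = psi.shape
    # concatenate PSI(t, k, Θ^1) ... PSI(t, k, Θ^n) for every candidate
    return psi.reshape(T * q, n * r)

# toy dimensions: 10 time points, q=3 variables, n=2 training samples, r=4 parameters
psi = np.arange(10 * 3 * 2 * 4, dtype=float).reshape(10, 3, 2, 4)
vectors = parsec_psi_flexible(psi)
print(vectors.shape)  # (30, 8): 30 measurement candidates, each with r*n components
```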

### SI S2 ABC-FAR algorithm for model parameter estimation

We propose a likelihood-free, Approximate Bayesian Computation (ABC)-based algorithm for fitting models and estimating the corresponding parameter values (Figure 2a, Main text). We use a Monte Carlo sampling technique to avoid explicitly calculating the (typically intractable) likelihood as a function of parameter values. Instead, we calculate the deviation between model prediction and data (denoted as χ²) by simulating the model for sampled parameter combinations. We select a fixed fraction of the sampled parameter combinations with the lowest χ² values to update the current distribution. We denote the value of this fixed fraction as the Fixed Acceptance Rate, or FAR. The process of sampling from the current distribution, selecting parameter combinations, and updating the current distribution is repeated to narrow down the plausible parameter value range. This concentrates computations in the later iterations on the narrower ranges where the parameter values are more likely to lie, making the algorithm efficient. In our analysis, we include an additional dummy parameter which does not affect model predictions but undergoes the same algorithmic treatment as the model parameters being estimated. It is used to (a) detect artefacts, if any, introduced by the algorithm, and (b) inform on the significance of properties of the estimated model parameters.

Here we describe the parameter sampling and acceptance modules of the algorithm. Suppose the model being fit has k parameters. In each iteration, N parameter combinations are sampled using the current estimate of the marginals of the k model parameters. In the first iteration, the current estimate of the marginal is the prior distribution (initial guess), whereas in subsequent iterations, the current estimate is the posterior estimated in the previous iteration. The algorithm employs Latin Hypercube Sampling (LHS, [1]) to effectively scan the (potentially high-dimensional) parameter space. Noise, whose magnitude decreases with iteration, is added to the distribution each time before sampling. This helps explore the parameter space better in the earlier iterations without significantly hindering convergence in the later ones, as in simulated annealing [2]. Adding sampling noise helps avoid local minima in the earlier iterations and improves the robustness of the estimation.

The model is simulated for each of the N parameter combinations sampled. The appropriate model predictions are compared to the data to estimate the χ² statistic as a measure of the deviation. χ² is the sum of squared differences between prediction and data, weighted by the variance in measurement. Desirable model fits should closely match the data. We select the M (= N × FAR) parameter combinations with the lowest χ² (best agreement between prediction and data) to update the estimate of the distribution of the parameter values.

From the second iteration onwards, we can choose to include previously selected parameter combinations while sorting the N sampled combinations and updating the posterior in an iteration. This ‘history-dependent update strategy’ (HDUS) ensures that newly sampled parameter combinations are only accepted if they are better than those selected earlier. In contrast, the ‘history-independent update strategy’ (HIUS) selects parameter combinations solely from the current sampling to update the posterior. Irrespective of the update strategy, in each iteration N model parameter combinations are sampled and only M parameter combinations are selected. This maintains an acceptance rate of FAR = M/N in each iteration. However, the total number of samplings, and thus computations, increases with every iteration. Thus the computational efficiency falls inversely with iteration and is characterized by the effective acceptance rate (EAR). In this article, we discuss how the choice of update strategy and FAR value affects the efficiency and convergence of the algorithm, after we demonstrate the working of the algorithm.
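
The sample-select-update loop with HDUS can be illustrated on a toy problem. This sketch is a simplification of the algorithm described above: the marginals are represented merely by the range spanned by the surviving combinations, and plain uniform sampling stands in for LHS.

```python
import numpy as np

def abc_far(simulate, data, bounds, n_sample=2000, far=0.05, n_iter=5, seed=0):
    """Toy ABC-FAR loop (illustrative, not the authors' code).

    simulate(theta) -> model prediction; chi^2 = sum of squared deviations.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, float).T  # prior box, one (lo, hi) per parameter
    m = max(1, int(n_sample * far))
    kept_theta = np.empty((0, len(lo)))
    kept_chi2 = np.empty(0)
    for it in range(n_iter):
        theta = rng.uniform(lo, hi, size=(n_sample, len(lo)))
        chi2 = np.array([np.sum((simulate(t) - data) ** 2) for t in theta])
        # HDUS: pool freshly sampled combinations with the previous survivors
        pool_t = np.vstack([theta, kept_theta])
        pool_c = np.concatenate([chi2, kept_chi2])
        order = np.argsort(pool_c)[:m]
        kept_theta, kept_chi2 = pool_t[order], pool_c[order]
        # shrink the sampling box to the survivors, plus decaying uniform noise
        span = kept_theta.max(0) - kept_theta.min(0)
        noise = 0.5 * span / (it + 1)
        lo = kept_theta.min(0) - noise
        hi = kept_theta.max(0) + noise
    return kept_theta, kept_chi2

# recover the mean of noisy observations; the model prediction is theta itself
rng = np.random.default_rng(1)
data = 3.0 + 0.1 * rng.normal(size=20)
theta_sel, chi2_sel = abc_far(lambda t: t[0], data, bounds=[(0.0, 10.0)])
print(theta_sel.mean())  # close to 3
```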

Keeping the initial condition fixed, we employ the ABC-FAR algorithm to fit the model to the data. We consider N = 10^{4} samplings and a FAR (fixed acceptance rate) of 2.5% to update the posterior, iterated eight times. We use the HDUS. Thus, from the second iteration onwards, the algorithm selects the 250 (M = N × FAR = 250) best parameter combinations from a collection of ‘N (sampled in the current iteration) + M (selected in the previous iteration)’ combinations.

### SI S3 Description of the systems considered

#### Prey-predator system (Lotka-Volterra model)

We look at the popular Lotka-Volterra model, which is typically used to benchmark parameter estimation algorithms (as in [3]). The model describes the population dynamics arising from the interaction between prey and predator species. In the model, the population of prey (N) increases exponentially in the absence of predators (associated parameter: a). If predators are present, they interact with and consume the prey. This decreases the prey population (associated parameter: b) but promotes the proliferation of predators (associated parameter: c). However, in the absence of prey, the predator population (P) decays exponentially (associated parameter: δ), due to lack of food/resources.

**Lotka-Volterra model**

dN/dt = aN − bNP

dP/dt = cNP − δP

#### Three-gene repressilator system

The system consists of three genes regulating each other’s expression. Expression of each gene produces a protein that represses the expression of the next gene in the cycle.

**GRN model**

#### Coherent feed forward (type-1) loop - FFL-1

We look at a coherent feed-forward (type-1) network for gene regulation. Molecules of species S (population size - S) activate the production of protein A (population size - A). The presence of A induces the degradation of S. Protein A activates the production of protein B (population size - B). The presence of both proteins, A and B, promotes the production of another protein C (population size - C). Additionally, we consider natural decay/dilution kinetics for the proteins A, B and C (rate parameter - δ).

**FFL-1 model**

### SI S4 Data generation for fitting

Data used for fitting were computationally generated by simulating the model (in MATLAB). We assume values for the model parameters (ground truth values, GT) and initial conditions, and simulate the model. The relevant model predictions sampled at the time points of interest constitute the data.

To emulate measurement error, we draw data points from a distribution centered on the model prediction. For example, suppose the model prediction for variable V at time t is V(t) = V_{1}. A corresponding noisy data point would be V_{N}(t) = V_{2}, where V_{2} is drawn from Normal[V_{1}, σ] and Normal[a, b] is a normal distribution with mean a and variance b^{2}.

We consider two dynamical (biological) systems to demonstrate various aspects of ABC-FAR, the parameter estimation algorithm developed. We consider (a) a two-species prey-predator system, and (b) a coherent feed forward (type-1) loop, described in the previous section.
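
As an illustration of this data-generation pipeline (in Python rather than MATLAB, with made-up parameter values), one can simulate the prey-predator model and sample noisy observations from the trajectory:

```python
import numpy as np

def simulate_lv(a, b, c, delta, n0, p0, t_end, dt=1e-3):
    """Forward-Euler simulation of the Lotka-Volterra model (illustrative)."""
    steps = int(t_end / dt)
    traj = np.empty((steps + 1, 2))
    traj[0] = n0, p0
    for i in range(steps):
        n, p = traj[i]
        traj[i + 1] = (n + dt * (a * n - b * n * p),
                       p + dt * (c * n * p - delta * p))
    return traj

def noisy_samples(traj, dt, times, sigma, seed=0):
    """Sample the trajectory at `times` and add Normal[prediction, sigma] noise."""
    rng = np.random.default_rng(seed)
    idx = (np.asarray(times) / dt).round().astype(int)
    clean = traj[idx]
    return clean + sigma * rng.normal(size=clean.shape)

traj = simulate_lv(a=1.0, b=0.1, c=0.05, delta=0.5, n0=10, p0=5, t_end=15)
data = noisy_samples(traj, dt=1e-3, times=[0, 3, 6, 9, 12, 15], sigma=0.2)
print(data.shape)  # (6, 2): prey and predator levels at six time points
```

A fixed-step Euler scheme is used purely for self-containment; any ODE integrator would do.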

For a description of the model parameters, refer to the discussion on the corresponding models. GT^{#} refers to the ground truth values which were used to generate the synthetic data.

### SI S5 Correlation-based statistics corresponding to selected parameter values

We posit that PnI(p, q) captures practical non-identifiability between the p^{th} and the q^{th} model parameters. The presence of correlation would indicate that changes in the predictions of interest (in the context of the data) due to perturbations in the p^{th} parameter can be compensated by appropriate changes in the q^{th} parameter, and vice versa. For example, if PnI(p, q) is positive, then a perturbation to the predictions of interest due to a decrease in the value of the p^{th} parameter would be buffered by a decrease in the value of the q^{th} parameter. Since this is a Bayesian measure, we call it Bayesian Practical Identifiability. Note that the correlations would primarily capture local and linear effects.

Additionally, we can also look at the correlation between selected parameter values and the corresponding χ² values. Let ConvCorr(p) denote the correlation between the value of the p^{th} parameter in the selected combinations and the corresponding χ² value. A positive (negative) correlation suggests that, among the selected values of the parameter, the lower (higher) values correspond to predictions with lower deviation from the data. Thus the presence of correlation indicates that further refinement in (the centrality measure of) the posterior is needed. Note that the correlation between χ² and a model parameter is a local property, conditional on the other model parameter values being fixed. We can look at a more global measure - TCC. TCC denotes the total magnitude of the correlation between selected parameter values and χ², summed over all model parameters. If TCC decays to zero, then each term in the sum goes to zero, suggesting convergence to a minimum for each of the parameters. Simultaneous local convergence in turn implies global convergence in (the centrality measure of) the posterior estimate. Thus we posit that the decay of TCC to zero indicates convergence of the model fit. Since TCC-based convergence is based on Bayesian computation, we refer to it as a Bayesian correlation-convergence criterion.

We borrow the same notation as used in the main text. {Θ^{(1)}, Θ^{(2)}, …, Θ^{(M)}} are the M parameter combinations selected based on their χ² values. Let θ^{(i)}_{p} denote the value of the p^{th} model parameter in the i^{th} combination, and let χ²_{i} denote the χ² statistic corresponding to the i^{th} parameter combination, Θ^{(i)}. The correlative measures are evaluated as:

PnI(p, q) = Corr({θ^{(i)}_{p}}, {θ^{(i)}_{q}}),

ConvCorr(p) = Corr({θ^{(i)}_{p}}, {χ²_{i}}),

TCC = Σ_{p} |ConvCorr(p)|,

where Corr denotes the Pearson correlation coefficient over the M selected combinations.
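
These correlative measures reduce to a few NumPy calls; a sketch on a synthetic posterior sample (parameter names and coupling are made up for the demonstration):

```python
import numpy as np

def correlation_stats(theta_sel, chi2_sel):
    """PnI, ConvCorr and TCC from the selected combinations (a sketch).

    theta_sel: (M, r) parameter combinations selected by the fit
    chi2_sel:  (M,)   corresponding chi-squared values
    """
    m, r = theta_sel.shape
    pni = np.corrcoef(theta_sel, rowvar=False)          # PnI(p, q), r x r matrix
    conv = np.array([np.corrcoef(theta_sel[:, p], chi2_sel)[0, 1]
                     for p in range(r)])                # ConvCorr(p)
    tcc = float(np.abs(conv).sum())                     # total chi^2 correlation
    return pni, conv, tcc

# toy posterior sample: parameters 0 and 1 compensate each other,
# while parameter 2 is still correlated with the goodness of fit
rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 3))
theta[:, 1] = theta[:, 0] + 0.01 * rng.normal(size=200)
chi2 = 5.0 + theta[:, 2] + 0.01 * rng.normal(size=200)
pni, conv, tcc = correlation_stats(theta, chi2)
print(round(float(pni[0, 1]), 2))  # near 1: a practically non-identifiable pair
```

A nonzero ConvCorr for parameter 2 signals that the posterior centrality for that parameter has not yet converged, mirroring the criterion described above.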

### SI S6 Efficiency of parameter estimation algorithm

We evaluate (a) minimum/maximum χ² statistics, and (b) the Effective Acceptance Rate (EAR), to fairly compare the performance of different schema of ABC-FAR with each other and with the reported performance of ABC-SMC. The minimum and maximum χ² statistics look at the minimum and maximum of the χ² values corresponding to the parameter combinations selected at every iteration of the estimation procedure. Central tendencies, like the mean or median, of the (selected) χ² values can also be used. These χ² statistics indicate the goodness of fit, or how well the model predictions mimic the data used for fitting. On the other hand, the Effective Acceptance Rate (EAR) looks at the (inverse of the) cumulative computational complexity. EAR is inversely proportional to the total number of parameter combinations evaluated, that is, the total number of times the model was simulated. After k iterations,

EAR_{k} = M / (k·N) = FAR / k,

where N and M are the numbers of parameter combinations sampled and selected in an iteration, respectively.

An algorithm or scheme leads to a good/accurate fit if it can identify parameter combinations corresponding to low χ² values. Comparing schemes that reach a particular threshold of χ² statistics, the one with the higher EAR (that is, lower computational cost) is considered more efficient. Comparing the different schemes of our algorithm was straightforward, as we use the same χ² function and an identical value of M (the number of parameter combinations selected in an iteration). However, we have to normalize for the rejection criterion and differences in the χ² function while comparing different algorithms. The ABC-SMC study reports their performance for a single-iteration implementation along with the performance of an optimized realization of their algorithmic parameters. We consider the following normalization,

S̃_{opt} = S_{opt} × (S_{FAR} / S_{SMC}),

where S̃_{opt} is the normalization of S_{opt} (the χ² statistic of the optimized realization reported in the ABC-SMC paper) based on S_{FAR} and S_{SMC}, which are the χ² statistics corresponding to single-iteration implementations of the ABC-FAR and ABC-SMC algorithms, respectively. To normalize for the different rejection criteria, we calculate S_{FAR} imposing a FAR value of 7 × 10^{−4}, so that the EARs corresponding to S_{FAR} and S_{SMC} are the same.

### SI S7 Compatibility of ABC-FAR to utilize system-structure to improve convergence

In the previous section, we saw how algorithm-based strategies affect the efficiency of parameter estimation. In some situations, we can also utilize system-specific knowledge to speed up and/or improve convergence; for example, when we have partial knowledge about the parameter value distribution, or when the mathematical structure of the model allows us to define the fitting problem cleverly. Here we demonstrate the compatibility of our program to exploit such information, using the FFL-1 system described in the previous section.

According to the model, the dynamics of the levels of S and A are influenced only by each other. Thus we can focus on the sub-model and data associated with only their dynamics to estimate the associated parameters - log_{10}(k_{A}), log_{10}(S_{A}), log_{10}(A_{S}) and log_{10}(δ). We label this sub-problem as ‘H1’, where we ignore the parts of the model associated with the dynamics of B and C. Next, we observe that the dynamics of B is affected by the levels of S and A, but not by the level of C. Thus we define sub-problem ‘H2’, where we augment the sub-problem H1 with the dynamics of B and the additional parameters associated with it - log_{10}(k_{B}) and log_{10}(A_{B}). The posterior estimated in H1 for the first four parameters is used as the prior in H2. After H2, we finally consider the entire model in sub-problem ‘H3’. For all previously estimated parameters, we use the posterior estimated in H2 as the prior for the respective parameters in H3. We denote this strategy of defining sub-problems as hierarchical calibration (hCal). We compare its efficiency against the one-shot calibration (oCal) strategy, where we fit the entire model in a single go.

In this analysis, parameter values are estimated in log scale to explore a large dynamical range. Also, whenever a parameter is estimated for the first time in either of the calibration strategies, we use a uniform prior. So for hCal, we use a uniform prior for (a) log_{10}(k_{A}), log_{10}(S_{A}), log_{10}(A_{S}) and log_{10}(δ) in H1, (b) log_{10}(k_{B}) and log_{10}(A_{B}) in H2, and (c) log_{10}(k_{C}), log_{10}(A_{C}) and log_{10}(B_{C}) in H3; we use uniform priors for all the parameters in the oCal strategy. We use two iterations each for H1 and H2, four for H3, and eight for oCal; thus a fair comparison can only be made after the fifth iteration, when the entire model is considered and the χ² functions used in both strategies are identical. Other algorithm options are kept identical: FAR = 10%, HDUS, parallel execution. To compare efficiency, we look at the cumulative execution time (cTime) and the χ² values for the selected parameter combinations.
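
A toy version of hCal, with made-up linear sub-models standing in for the FFL-1 sub-problems (posteriors are crudely summarized as ranges; this is a sketch, not the authors' implementation):

```python
import numpy as np

def fit_stage(loss, priors, n=4000, far=0.05, seed=0):
    """One calibration stage: uniform sampling from the prior box,
    followed by FAR-based selection of the lowest-loss combinations."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(priors, float).T
    theta = rng.uniform(lo, hi, size=(n, len(lo)))
    chi2 = np.array([loss(t) for t in theta])
    keep = theta[np.argsort(chi2)[: int(n * far)]]
    # summarize the posterior as the range spanned by the survivors,
    # so it can serve as the prior box of the next sub-problem
    return keep, [(float(a), float(b)) for a, b in zip(keep.min(0), keep.max(0))]

# toy model with two observables: y1 depends only on p1, y2 on both p1 and p2,
# mimicking how the S/A dynamics in H1 are independent of B and C
x = np.linspace(0.0, 1.0, 20)
y1 = 2.0 * x          # "H1 data" (true p1 = 2.0)
y2 = 2.0 * x + 0.5    # "H2 data" (true p2 = 0.5)

# H1: fit p1 on the sub-problem, starting from a wide uniform prior
keep1, post1 = fit_stage(lambda t: np.sum((t[0] * x - y1) ** 2),
                         priors=[(0.0, 10.0)])
# H2: augment with p2; the H1 posterior range becomes the prior for p1
keep2, post2 = fit_stage(lambda t: np.sum((t[0] * x - y1) ** 2)
                                   + np.sum((t[0] * x + t[1] - y2) ** 2),
                         priors=[post1[0], (0.0, 10.0)])
print(post2)  # tight range around p1 = 2.0, with p2 near 0.5
```

The second stage searches a much smaller box for p1, which is the source of hCal's efficiency gain described below.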

We see that both calibration strategies fit the model to the data (SFigure S7), but hCal is more efficient and accurate than oCal. For example, from the fifth iteration onwards, (a) the χ² values corresponding to the selected parameters and (b) the cumulative execution time are lower for hCal than for oCal. In fact, hCal outperforms oCal by almost a full iteration in both measures: the cTime corresponding to the seventh iteration of hCal compares with that of the sixth iteration of oCal, whereas the χ² distribution corresponding to the seventh iteration of hCal is as good as that of the eighth iteration of oCal (SFigure S7). The better performance of hCal can be attributed to (a) fewer calculations required for simulation and χ² evaluation in H1 and H2, (b) the lower dimensionality of the parameter space in H1 and H2, and (c) the propagation of information about parameters from one sub-problem to the next.

Our algorithm was conveniently adapted and easily set up to execute the hCal strategy. We did not have to define different sets of parameter rejection thresholds for the three sub-problems associated with hCal, a convenient advantage over typically used ABC-based parameter estimation algorithms.

## Supplementary figures

## Acknowledgement

We thank Dr. Mohit Kumar Jolly for sharing computational resources for carrying out the various analyses. We also appreciate the valuable feedback provided by Thiruvickraman Jothiprakasam, Ramya Boddepalli, Paras Jain, Sumanta Mukerjee and Suraj Jagtap on the manuscript.

## References

- [1].
- [2].
- [3].