Methods and considerations for estimating parameters in biophysically detailed neural models with simulation based inference

Biophysically detailed neural models are a powerful technique to study neural dynamics in health and disease, with a growing number of established and openly available models. A major challenge in the use of such models is that parameter inference is an inherently difficult and unsolved problem. Identifying unique parameter distributions that can account for observed neural dynamics, and differences across experimental conditions, is essential to their meaningful use. Recently, simulation based inference (SBI) has been proposed as an approach to perform Bayesian inference to estimate parameters in detailed neural models. SBI overcomes the challenge of not having access to a likelihood function, which has severely limited inference methods in such models, by leveraging advances in deep learning to perform density estimation. While the substantial methodological advancements offered by SBI are promising, their use in large scale biophysically detailed models is challenging and methods for doing so have not been established, particularly when inferring parameters that can account for time series waveforms. We provide guidelines and considerations on how SBI can be applied to estimate parameters underlying time series waveforms in biophysically detailed neural models, starting with a simplified example and extending to specific applications to common MEG/EEG waveforms using the large scale neural modeling framework of the Human Neocortical Neurosolver. Specifically, we describe how to estimate and compare results from example oscillatory and event related potential simulations. We also describe how diagnostics can be used to assess the quality and uniqueness of the posterior estimates. The methods described provide a principled foundation to guide future applications of SBI in a wide variety of applications that use detailed models to study neural dynamics.

We provide guidelines and considerations for applying SBI to estimate parameters in biophysically detailed neural models that simulate time series data. Our example uses a multi-scale model designed to connect human MEG/EEG recordings to the underlying cell and circuit level generators. Our approach allows for crucial insight into how cell-level properties interact to produce measured neural activity, and provides guidelines for diagnosing the quality of the estimate and uniqueness of predictions for different MEG/EEG biomarkers.

Graphical Abstract. Summary of the SBI workflow used to infer model parameters that can account for recorded neural dynamics. 1) A prior distribution of assumed relevant model parameters and ranges is constructed. 2) A dataset of simulated neural activity patterns is generated with parameters sampled from the prior distribution. 3) User defined summary statistics are chosen to describe waveform features of interest. 4) A specialized deep learning architecture is trained to learn the mapping from neural activity, constrained by summary statistics, to underlying model parameters. 5) Specific neural activity patterns of interest are fed into the trained neural network, which subsequently outputs a distribution over the potential underlying model parameters. 6) Parameter estimates for different waveforms can be compared through diagnostics like parameter recovery (if the ground truth is known) or posterior predictive checks.

Introduction
Biophysically detailed neural modeling is a fundamental and established framework to study fast time scale neural dynamics [1,2]. While challenging to construct, advances in computational resources have enabled the proliferation of detailed models, from principled models of single neurons [3] to large scale biophysically detailed neural networks [4] that enable multi-scale interpretation from cell spiking to local field potentials to macroscale magneto- and electroencephalographic (MEG/EEG) signals [4-7]. Numerous detailed models are now openly distributed to encourage their use and expansion [4,5,7-13]. A common goal of detailed neural modeling is to infer biophysical parameters in individual cells and/or network connections that can account for observed changes in neural activity over time. Given the large-scale nature of any detailed model, parameter inference is an inherently challenging and unsolved problem. The difficulty of parameter inference is closely tied to the level of biophysical detail in the model, as the number of parameters increases with more realistic models. In practice, parameters cannot all be estimated simultaneously; rather, model elements are estimated in a serial fashion (e.g. cell dynamics followed by network connectivity) and fixed. Then, a limited set of parameters is chosen as the target for estimation. Even with this limited set, the parameter estimation process is complex. The problem is confounded by the fact that there may be many parameter configurations that produce an equally good representation of the data [14]. Identifying unique biophysically constrained parameter sets that can account for observed neural dynamics, and differences across experimental conditions, is essential to the meaningful use of large scale biophysically detailed models for neuroscience research. For example, to use a biophysically detailed model to infer circuit level mechanisms generating an EEG waveform that is a biomarker of a healthy compared to a neuropathological condition, one needs a way not only to estimate the parameter distributions that can generate the waveforms but also to assess whether the distributions are distinguishable.

A powerful approach to estimate parameters in neural models is Bayesian inference. There is an extensive history of research applying Bayesian inference, and specifically the algorithm of variational inference, to estimate parameters in reduced models of neural activity, for example in the Dynamic Causal Modeling (DCM) [15] framework that relies on reduced "neural mass models". However, while compatible with reduced models that are mathematically tractable, the algorithm of variational inference is not compatible with detailed biophysical models due to their computational complexity, and specifically the lack of access to a known likelihood function (see also Discussion). Additionally, mean field variational inference assumes that model parameters are independent, preventing the ability to capture biologically meaningful parameter interactions. In recent years, simulation based inference (SBI) has been proposed as an alternative Bayesian inference framework to estimate parameters in detailed neural models. SBI overcomes the challenge of not having access to a likelihood function by leveraging advances in deep learning to perform density estimation [16-18].
An advantage of SBI is that it relies only on a dataset of simulations from the model being investigated, rather than an explicit likelihood of the data. Fig 1 outlines the overall SBI workflow to estimate parameters and possible parameter combinations that can reproduce recorded neural activity.

While the substantial methodological advancements offered by SBI are promising, and have been applied to estimate parameters in small biophysically detailed neuron models [18,21] and in models with reduced representations of neural dynamics [22,23], there is currently little guidance on how these methods should be used with large-scale biophysical models. Guidance is particularly lacking in the context of performing inference on neural time series data, and in comparing estimates for data from different experimental conditions. In this paper, we provide guidelines on how SBI can be applied to estimate parameters underlying time series waveforms generated by biophysically detailed neural models. We emphasize the importance of the first steps of (i) identifying the parameters and ranges over which the inference process will be performed (i.e. the prior distribution), which necessarily depends on user-defined hypotheses, and (ii) selecting informed summary statistics of the waveform activity. We also describe how diagnostics can be used to assess the uniqueness and quality of the posterior estimates, and to assess the overlap of distributions from estimation applied to two different waveforms. These evaluation steps are particularly important to resolve the uniqueness of distributions for two or more waveforms.

We begin with a simplified example of a non-linear resistor-capacitor (RC) circuit, and then extend the results to an example large-scale biophysically detailed modeling framework that was developed by our group to study the multi-scale neural origin of human MEG/EEG signals, namely the Human Neocortical Neurosolver (HNN) [5]. The foundation of HNN is a biophysically detailed model of a neocortical column, with layer specific synaptic activation representing thalamocortical and cortico-cortical drive (Fig 2). HNN has been applied to study the cell and circuit origin of commonly measured MEG/EEG signals, including low frequency oscillations (e.g. Beta Events [24]) and event related potentials (ERPs) [25,26], along with differences across experimental conditions [25-29]. We demonstrate applications of SBI to estimate parameter distributions that can account for variation in example hypothetical Beta Event and ERP waveforms, with selected parameter priors (see Step 1 in Fig 1) based on our previous studies [24-26]. We show that due to the model complexity some parameters can be inferred uniquely while others are indeterminate. The methods described provide a principled foundation to guide future applications of SBI in a wide variety of applications that use detailed models to study neural dynamics.

Materials and methods
Below we provide a summary of the primary techniques used in this work. Specific aspects are emphasized to provide better context on the significance/motivation of the analyses performed. We invite readers to refer to the citations for a more thorough treatment of each subject, in particular the software publication detailing HNN [5] and a study demonstrating the use of SBI on classical neural models [18]. We also detail an RC circuit example, which is a building block of HNN-type models, and which offers an SBI setup with time series inputs and the presence of indeterminacies.

Biophysical modeling of cortical activity underlying MEG/EEG signals was performed using the Human Neocortical Neurosolver (HNN) [5]. The standard HNN model used in this study and described in [26] is composed of 100 pyramidal neurons and 33 inhibitory neurons in both layers 2/3 and 5, for a total of 266 neurons, and represents a localized small patch of neocortex (Fig 2). To accurately reproduce macroscale electrical signals, HNN utilizes multi-compartment pyramidal neuron models [30], and synchronous intracellular current flow of aligned layer 2 and 5 pyramidal neuron dendrites is assumed to be the generator of the primary electrical current sources underlying recorded extracranial MEG/EEG signals, due to their length and parallel alignment [31].

Inhibitory basket cells are modeled as single compartment point neurons given their negligible impact in producing the recorded electrical currents, but are nonetheless crucial to the local network dynamics (see [5] for further background). Pyramidal cells in the model connect to other cells with both AMPA and NMDA synapses, while basket cells produce GABAA and GABAB synaptic connections [5].
In addition to the local circuitry, HNN models extrinsic inputs to the column via layer specific synaptic excitatory drives generated by predefined patterns of action potentials presumed to come from exogenous brain areas (i.e., thalamus and other cortical regions). In general these are referred to as proximal and distal drives, reflecting the effective location of the synapses on the pyramidal neuron proximal and distal dendrites. The proximal drive reflects so-called feedforward inputs from lemniscal thalamus, and the distal drive reflects inputs from either non-lemniscal thalamus [32,33] or "feedback" connections from other cortical regions. These proximal and distal drives induce excitatory post-synaptic currents that drive current flow up and down the aligned pyramidal cell dendrites (see red and green arrows in Fig 2).

Posterior diagnostics

When working with a posterior approximation Φ, it is useful to characterize its behavior in different regions of the parameter space. For a given parameter configuration θ₀ and simulated output x₀ ∼ p(x | θ₀), we quantify how concentrated Φ(θ | x₀) is around θ₀.
We refer to this quantity as the parameter recovery error (PRE) and define it as the Wasserstein distance between the k-th marginal of Φ(θ | x₀) and a Dirac delta centered at θ₀[k]. This can be empirically estimated as

PRE_k = (1/N) Σᵢ |θᵢ[k] − θ₀[k]|,

where we generate N samples {θ₁, . . . , θ_N} ∼ Φ(θ | x₀) from our posterior approximation conditioned on x₀ ∼ p(x | θ₀), which is an observation from the simulator at ground truth θ₀. Note that θ₀ contains one value for each of the k individual parameters; therefore there are k distinct PRE values. Additionally, each parameter θ₀[k] is mapped from its range defined in the prior distribution to the range (0, 1).

Therefore, the maximum PRE is 1.0, indicating the worst possible recovery, whereas a PRE of 0.0 indicates perfect recovery.

The previous diagnostic quantified how well Φ represents the relationship between x₀ and θ₀ in terms of the parameter space. Alternatively, we can assess the relationship in the observation space. Specifically, given samples from our approximate posterior θᵢ ∼ Φ(θ | x₀), we generate simulations xᵢ ∼ p(x | θᵢ), and assess how close xᵢ is to the conditioning observation x₀. This is known as a posterior predictive check (PPC) [34,35] and is quantified as the root mean squared error between x₀ and the xᵢ:

PPC = (1/N) Σᵢ √[ (1/T) Σₜ (x₀(t) − xᵢ(t))² ].

Lastly, in many applications it is useful to characterize how well two distributions can be distinguished from one another. To this end we introduce the distribution overlap coefficient (OVL) [36]. Given our approximate posterior Φ, we define OVL for the k-th parameter as

OVL_k = ∫ min( Φ_k(θ | x₀), Φ_k(θ | x₁) ) dθ,

where x₀ and x₁ are two different observations whose posterior distributions we seek to compare on the marginal distribution for the k-th parameter. To calculate OVL_k numerically we used an evenly spaced grid of 50^d points for a prior distribution over d ∈ ℕ⁺ parameters.
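For illustration, the sketch below shows one way the PRE, PPC, and OVL diagnostics defined above could be computed from posterior samples and simulated outputs. This is our own minimal NumPy implementation for didactic purposes, not the code used in this study; the function names, histogram-based OVL estimate, and normalization details are assumptions.

```python
import numpy as np

def parameter_recovery_error(theta_samples, theta_0, prior_low, prior_high):
    """PRE: Wasserstein-1 distance between each 1D marginal of the posterior
    samples and a Dirac delta at the ground truth theta_0, after mapping
    every parameter to the unit interval (0, 1) using the prior range."""
    scaled = (theta_samples - prior_low) / (prior_high - prior_low)
    scaled_0 = (theta_0 - prior_low) / (prior_high - prior_low)
    # The W1 distance to a point mass reduces to the mean absolute deviation
    return np.mean(np.abs(scaled - scaled_0), axis=0)  # one value per parameter

def posterior_predictive_check(x_0, x_sims):
    """PPC: root mean squared error between the conditioning observation x_0
    (length-T array) and simulations x_i generated from posterior samples
    (N x T array), averaged over the N posterior samples."""
    rmse = np.sqrt(np.mean((x_sims - x_0[None, :]) ** 2, axis=1))
    return rmse.mean()

def overlap_coefficient(samples_a, samples_b, low, high, n_grid=50):
    """OVL: overlap between two 1D marginal posteriors, estimated from
    normalized histograms evaluated on a common evenly spaced grid."""
    edges = np.linspace(low, high, n_grid + 1)
    p_a, _ = np.histogram(samples_a, bins=edges, density=True)
    p_b, _ = np.histogram(samples_b, bins=edges, density=True)
    width = edges[1] - edges[0]
    return np.sum(np.minimum(p_a, p_b)) * width
```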

Approach to applying SBI in biophysically detailed models that simulate time series data

Recently SBI has been established as an approach to estimate parameters in detailed biophysical models [18] that simulate time series data. This approach overcomes the challenges of applying Bayesian inference in highly detailed non-linear models, namely estimation of complex posterior distributions that exhibit parameter interactions, by leveraging recent advances in likelihood-free inference and deep learning [17]. We begin by reviewing the SBI process and providing the mathematical description of each of the steps outlined in Fig 1.

The primary goal of SBI in the context of our manuscript is to estimate parameters and possible parameter distributions that can account for an observed neural dynamic (e.g. time series waveform). In mathematical terms, this goal is stated as follows. Given an observation x and a model with parameters θ, SBI seeks to create an approximation of the posterior distribution p(θ | x). Bayes' rule specifies a closed form for the desired posterior distribution p(θ | x) as being proportional to the likelihood p(x | θ) multiplied with the prior p(θ), i.e., p(θ | x) ∝ p(x | θ) p(θ) [37].

Unfortunately, in detailed biophysical models the likelihood function p(x | θ), which encodes the relationship between model parameters θ and outputs x, is often analytically intractable but can be approximated from a large number of simulations.

The novelty of SBI is that it circumvents likelihood evaluations altogether, and instead approximates the posterior distribution directly from a simulated dataset of model outputs xᵢ ∼ p(x | θᵢ) with parameter values sampled from a user defined prior distribution θᵢ ∼ p(θ). There are numerous approaches in the literature to achieve this goal, each with their own unique considerations, benefits, and challenges [38]. In this study, we use a deep learning architecture known as a conditional neural density estimator (Φ), which is a function that takes an observation x as input and returns a probability density function defined over the parameter space. More specifically, the conditional neural density estimator utilizes normalizing flows, following standard practices described in [17,18]. The detailed steps in applying SBI (Fig 1) are as follows.

Steps 1-2: Define prior and generate training data. SBI begins with the user choosing the parameters of interest to be estimated, and the range and statistical distribution of plausible values for those parameters (i.e. the prior distribution).

Ideally, the chosen summary statistics are sufficient, in the sense that they retain all of the information in the full observation that is relevant to the parameters [23,38]. In practice, truly sufficient summary statistics are rare, but they can still provide a close approximation to inferences achieved when using the full observations x. A major advantage of hand-crafted features is that they can be readily interpretable, and come associated with hypotheses on their physiological relevance.

One of the most challenging aspects of likelihood free inference is that the models studied typically do not permit access to a ground truth posterior distribution over which we can validate our inferences. To highlight this challenge and better understand how decisions in the SBI pipeline impact the resulting approximate posterior, we will first apply the SBI pipeline to a model where the ground truth posterior is known; namely, an RC circuit model.
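Before turning to the RC circuit, the sketch below illustrates Steps 1-5 of this workflow in code using the open-source sbi Python package, with a toy two-parameter simulator standing in for the mechanistic model. It is a minimal example for orientation only: the simulator, parameter ranges, and training settings are placeholders, the raw time series is used directly rather than summary statistics (Step 3), and exact class and argument names may differ across sbi versions.

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

def simulator(theta):
    """Toy stand-in for the mechanistic model: a short 'time series' whose
    shape depends on two parameters (purely illustrative)."""
    t = torch.linspace(0.0, 1.0, 100)
    return theta[0] * torch.sin(8.0 * t) + theta[1] * t + 0.01 * torch.randn(100)

# Step 1: prior over the parameters of interest (ranges are placeholders)
prior = BoxUniform(low=torch.zeros(2), high=torch.ones(2))

# Step 2: simulate a training dataset with parameters drawn from the prior
theta = prior.sample((5_000,))
x = torch.stack([simulator(th) for th in theta])

# Steps 4-5: train a normalizing-flow ("maf") conditional density estimator
inference = SNPE(prior=prior, density_estimator="maf")
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)

# Condition on an "observed" waveform and draw samples from the posterior
x_obs = simulator(torch.tensor([0.4, 0.7]))
samples = posterior.sample((2_000,), x=x_obs)
```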

The equation for the RC circuit simulations is as follows:

C dV/dt = (E − V(t))/R + I_e(t),

where V(t) is the voltage response of an RC circuit to a current injection I_e(t). In our example we use a capacitance C = 6 F, resistance R = 1 Ω, and constant voltage source E = 0 V [1].
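As a concrete illustration, the sketch below integrates a first-order RC equation of this form with forward Euler for two square current pulses of amplitude I₊ and I₋ separated by a latency ∆t, using the parameter values quoted in the text. The pulse width, pulse signs, integration step, and total duration are our own assumptions, not values taken from the study.

```python
import numpy as np

# RC circuit parameters quoted in the text: C = 6 F, R = 1 Ohm, E = 0 V
C, R, E = 6.0, 1.0, 0.0
dt, t_stop = 0.5, 300.0              # ms; integration step is an assumption
t = np.arange(0.0, t_stop, dt)

def square_pulse(t, onset, width, amplitude):
    """Square current pulse with a given onset, width (ms) and amplitude."""
    return amplitude * ((t >= onset) & (t < onset + width))

def simulate_rc(i_plus, i_minus, delta_t, pulse_width=25.0):
    """Forward-Euler integration of C dV/dt = (E - V)/R + I_e(t), with a
    depolarizing pulse (i_plus) at 80 ms and a hyperpolarizing pulse
    (i_minus) delayed by delta_t; the pulse width is an assumed value."""
    i_e = square_pulse(t, 80.0, pulse_width, i_plus) \
        - square_pulse(t, 80.0 + delta_t, pulse_width, i_minus)
    v = np.zeros_like(t)
    for k in range(1, len(t)):
        dv = ((E - v[k - 1]) / R + i_e[k - 1]) / C
        v[k] = v[k - 1] + dt * dv
    return v

# Example values from the text: I+ = 0.3, I- = 0.5, delta_t = 37.5 ms
v_offset = simulate_rc(0.3, 0.5, 37.5)   # two distinct peaks
v_overlap = simulate_rc(0.3, 0.5, 0.0)   # overlapping pulses -> indeterminacy
```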

267
The example RC circuit simulations described here were parameterized in a way that will enable comparison to similar simulations in the biophysically detailed model described below. Briefly, two square current pulses with amplitudes I₊ (depolarizing) and I₋ (hyperpolarizing) are injected with a latency ∆t between them. Fig 3A (top, blue) shows an example with ∆t = 0: the two pulses overlap, so only their net effect is visible in the voltage response, and many combinations of I₊ and I₋ produce the same waveform when attempting to infer these parameters from the voltage response. In contrast, Fig 3A (top, orange) shows an example with a non-zero latency ∆t ≠ 0. Specifically, the first current injection with I₊ = 0.3 mA is delivered at 80 ms, and the second current injection with I₋ = 0.5 mA at 117.5 ms, therefore ∆t = 37.5 ms. The voltage response exhibits two unique peaks due to the offset between the square pulses. Since there is only one combination of I₊ and I₋ that can produce this waveform, their values can be inferred exactly from the waveform. In other words, the amplitude parameters underlying the voltage response can be inferred exactly only when ∆t ≠ 0. Note that since we are approximating the posterior distribution, even values close to ∆t ≈ 0 will still produce an indeterminacy.

To visualize and interpret posterior distributions produced by SBI, we must first draw a sufficient number of random samples from the distribution. Since posterior distributions are often multidimensional, it is useful to plot the samples using a "pair-plot" (Fig 3B). The diagonal of the pair-plot is used to visualize the univariate (also known as marginal) distribution of each parameter, here using a kernel density estimate, while the off-diagonal panels visualize the pairwise joint distributions. Fig 3A (bottom) plots the inverse transformed PCA30 waveform (i.e., the waveform reconstructed from its first 30 principal components) to highlight that the summary statistic retains almost identical information.

The results in Fig 3B show that the expected posterior distribution described above can be recovered with SBI. As shown in Fig 3B(a-e), when conditioned on the voltage response with ∆t = 0, any distribution involving I₊ or I₋ (blue) will exhibit an indeterminacy (i.e., multiple recovered values along one dimension). The high correlation between I₊ and I₋ of 0.998 (p ≈ 0) demonstrates that the indeterminacy is characterized by a strong linear interaction between these parameters: the joint distribution in Fig 3B forms a line along which increases in I₊ can be compensated by increases in I₋ to produce the same waveform.

In the previous example, we utilized PCA30 as a summary statistic to learn a low dimensional representation of the voltage time series. PCA is a common choice for dimensionality reduction, and has been used in historical MEG/EEG inference work with only the first 3-4 principal components [15,42]. However, it is not guaranteed that PCA, which aims only to capture variance, will retain the features that best allow SBI to map waveforms to simulation parameters. An alternative approach is to leverage domain expertise to construct hand-crafted summary statistics specific to the model and inference problem. Unfortunately, it cannot be known a priori which summary statistic will allow SBI to perform best, necessitating quantitative diagnostics that allow a systematic comparison. Here, we introduce two simple hand-crafted summary statistics.

The first hand-crafted summary statistic we defined is a four dimensional vector including the amplitude and timing of the maximum and minimum peaks (Peak) of the simulated voltage response (Fig 4A(iii)). It can be readily seen that these features reflect the underlying simulation parameters. Upon visual inspection of the voltage response with ∆t ≠ 0 (orange), the heights of the maximum and minimum peaks directly correspond to the parameters I₊ and I₋, and the distance between these peaks corresponds to the latency parameter ∆t. We'll show below that the Peak features permit inference that is close to that achieved with PCA30 (Fig 4A(i)) and also PCA4 (Fig 4A(ii)). The second hand-crafted summary statistic we defined was a four dimensional vector including the band power (BandPower) of common frequency ranges used to study neural oscillations (Fig 4A(iv)). Specifically, we considered the beta (13-30 Hz), low-gamma (30-50 Hz), and high-gamma (50-80 Hz) ranges, as well as the aggregate band power of the alpha and lower frequency ranges (0-13 Hz). As we'll show below, this feature was intentionally selected as a cautionary example of a summary statistic that is ill-suited for the inference problem, but has a basis in previous neuroscience and Bayesian inference literature [43].

Next, we describe two diagnostics that allow comparison of desirable properties of the posterior distribution for different summary statistics, namely PRE and PPC (see Fig 1 Step 6 and Materials and methods). In the results below, we calculated PRE values over a grid, with 10 points for every parameter dimension, spanning the range of the prior distribution (Fig 4C).
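To make the four summary statistics concrete, the sketch below shows one possible implementation using scikit-learn and SciPy. The frequency bands follow the text; the sampling rate, Welch settings, and function names are our own assumptions rather than details reported in the study.

```python
import numpy as np
from scipy.signal import welch
from sklearn.decomposition import PCA

FS = 1000.0  # assumed sampling rate in Hz

def fit_pca(x_train, n_components):
    """Fit PCA on a matrix of simulated waveforms (one waveform per row);
    n_components=30 gives 'PCA30', n_components=4 gives 'PCA4'."""
    return PCA(n_components=n_components).fit(x_train)

def pca_features(pca, x):
    """Project a single waveform onto the fitted principal components."""
    return pca.transform(x[None, :])[0]

def peak_features(x, dt=1.0 / FS):
    """'Peak': amplitude and timing of the maximum and minimum peaks."""
    i_max, i_min = int(np.argmax(x)), int(np.argmin(x))
    return np.array([x[i_max], i_max * dt, x[i_min], i_min * dt])

def bandpower_features(x, fs=FS):
    """'BandPower': power in the 0-13, 13-30, 30-50 and 50-80 Hz bands."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 256))
    bands = [(0.0, 13.0), (13.0, 30.0), (30.0, 50.0), (50.0, 80.0)]
    power = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)
        power.append(np.trapz(psd[mask], freqs[mask]))
    return np.array(power)
```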

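The grid-based local PRE evaluation described above could then be carried out roughly as follows, reusing the hypothetical simulator, posterior, and parameter_recovery_error helpers from the earlier sketches (here with two parameters and a unit-interval prior range purely for illustration).

```python
import numpy as np
import torch
from itertools import product

grid_1d = np.linspace(0.0, 1.0, 10)          # 10 points per parameter dimension
pre_grid = {}
for theta_0 in product(grid_1d, repeat=2):   # all ground-truth combinations
    theta_0 = np.asarray(theta_0)
    # Simulate an observation at the ground truth and condition the posterior
    x_0 = simulator(torch.as_tensor(theta_0, dtype=torch.float32))
    samples = posterior.sample((1_000,), x=x_0).numpy()
    # One PRE value per parameter for this grid point
    pre_grid[tuple(theta_0)] = parameter_recovery_error(
        samples, theta_0, prior_low=0.0, prior_high=1.0)
```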
Taking inspiration from [44], we visualize the PRE as heatmaps over the parameter grid (Fig 4C-D). One of the advantages of inspecting local posterior diagnostics ("local" as in specific to the pair (x₀, θ₀)) is the ability to identify patterns. Fig 4C plots the PRE over the parameter grid; for the amplitude parameters, PRE values are elevated near ∆t ≈ 0 compared to the rest of the parameter grid (Fig 4C(i-iii)). This is due to the indeterminacy in I₊ as seen in the blue posterior samples of Fig 4B(i-iii). It is apparent, however, that PCA30 produces the lowest PRE values, even near a ∆t of zero. In contrast, the BandPower summary statistic produces a posterior distribution with complex indeterminacies for both observations. This results in high PRE values across the entire parameter grid (Fig 4C(iv)), indicating that this summary statistic is not effective at recovering the ground truth parameters.

These diagnostics demonstrate that PCA30 performs the best for this inference problem. Additionally, the local PRE analysis revealed differences between summary statistics that were not apparent with the global diagnostics. It is important to note that neither of these diagnostics quantifies the closeness of the approximation Φ to the true posterior. For instance, if there is an interaction between parameters of the model causing an indeterminacy, even the exact posterior will not concentrate around the ground truth, and the PRE will remain high despite a faithful approximation.

Building from the RC circuit, we now describe a nearly identical inference problem in our large-scale biophysically detailed model constructed to study the neural mechanisms of human MEG/EEG, HNN (Fig 5). As described further in Materials and methods,

HNN is a neocortical column model with multiple layers and biophysically detailed local cell types. The local network receives exogenous excitatory synaptic input through layer specific pathways that effectively synapse on the proximal and distal dendrites of the pyramidal neurons, representing "feedforward" and "feedback" input from thalamus and higher order cortical areas. These inputs are simulated with external "spikes" that activate layer specific excitatory synapses (see Fig 2B-C, and reduced schematic in Fig 5C). This synaptic activity induces current flow within the pyramidal neuron dendrites, which is summed across the population to simulate a net current dipole that is directly comparable to that measured with MEG/EEG. Several previous studies have shown that patterns of activation of the local network through these pathways can generate commonly measured MEG/EEG current dipole signals such as event related potentials and low frequency brain rhythms, e.g., see [5].

To set up an inference problem that is comparable to our RC circuit example, we begin by considering patterns of drive to the network that create subthreshold current flow in the pyramidal neurons, effectively "disconnecting" the network, because local synaptic interactions depend on local cell firing. Simulations with spiking dynamics will be demonstrated in the following section. For simplicity, we first describe the subthreshold current flow in the L5 pyramidal cells only (Fig 5C). Specifically, we ran HNN simulations with single exogenous spikes that activate excitatory synapses on the proximal and distal dendrites of L5 pyramidal cells. Synaptic excitation of distal synapses generates current flow down the dendrites (e.g. see green arrow in Fig 5C), and excitation of proximal dendrites generates current flow up the dendrites (e.g. see red arrow in Fig 5C). A delay between the times of the two driving spikes can create a net current dipole signal that is analogous to that observed in the RC circuit for a non-zero time delay between the applied currents (see Fig 5A, orange). Further, when the delay between the spikes is zero (see Fig 5A, blue) an indeterminacy in parameter estimation can occur, as described below.

With this set up, we applied SBI to infer parameters that mimic those of the RC circuit example, using PCA30 as the chosen summary statistic to constrain the inference problem. Namely, we inferred the strength of the proximal and distal excitatory inputs, referred to as P and D and parameterized as the maximal conductance at their respective synapses, and the latency ∆t between the two inputs. We kept the proximal input time fixed, and let the distal input time vary with ∆t. The prior distribution over P and D was set to ensure that all simulations remained subthreshold (see Table 1 for prior distribution ranges).
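For orientation, the sketch below shows one way a subthreshold proximal-distal drive simulation of this kind could be set up with the open-source hnn-core package. The drive names, conductance values, simulation duration, and keyword arguments are illustrative assumptions and are not the exact settings used in this study; the hnn-core API should be checked against the installed version.

```python
from hnn_core import jones_2009_model, simulate_dipole

def simulate_prox_dist(p_weight, d_weight, delta_t, prox_time=100.0):
    """Drive the network with one proximal and one distal evoked input and
    return the aggregate current dipole. Weights are AMPA maximal
    conductances onto L5 pyramidal cells; values are illustrative only."""
    net = jones_2009_model()
    net.add_evoked_drive(
        'prox', mu=prox_time, sigma=0.0, numspikes=1, location='proximal',
        weights_ampa={'L5_pyramidal': p_weight})
    net.add_evoked_drive(
        'dist', mu=prox_time + delta_t, sigma=0.0, numspikes=1,
        location='distal', weights_ampa={'L5_pyramidal': d_weight})
    dpl = simulate_dipole(net, tstop=250.0, n_trials=1)[0]
    return dpl.data['agg']   # net current dipole time series

# Example: weak (subthreshold) proximal and distal drives separated by 30 ms
dipole = simulate_prox_dist(p_weight=0.002, d_weight=0.002, delta_t=30.0)
```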

441
Similar to the RC circuit example, simulations with ∆t = 0 produce a current dipole with a reduced amplitude (Fig 5A, blue), whereas ∆t ≠ 0 produces a clear positive and negative peak (orange). As shown in Fig 5B(a-e, blue), when conditioned with ∆t = 0, any posterior distribution involving P and D will exhibit an indeterminacy. Intuitively, this indicates that the proximal and distal inputs can compensate within a small range to produce similar current dipole waveforms. Unlike the RC circuit, this interaction does not span the full range of input strengths, and instead is more tightly concentrated around the ground truth (Fig 5B(a,c,f)).

Once again, we show that the choice of summary statistics impacts the learned posterior distribution approximation, and that diagnostics can be used to evaluate the quality of the parameter estimation (Fig 6A-B). When ∆t ≠ 0, both PCA30 and PCA4 produced a posterior that is localized around the ground truth (Fig 6B(i-ii), orange). When ∆t = 0, the posterior was still concentrated but with a slight indeterminacy (Fig 6B(i-ii, blue)) that was less prominent than the analogous simulation in the RC circuit (Fig 4B(i-ii, blue)). The Peak summary statistic produced a posterior that is well clustered around the ground truth for ∆t ≠ 0 (Fig 6B(iii, orange)), but for ∆t = 0 exhibited a much more striking indeterminacy (Fig 6B(iii, blue)). In contrast, BandPower was insufficient for ground truth recovery for both ∆t ≠ 0 and ∆t = 0, as was the case for the RC circuit simulations (Fig 6B(iv)). Interestingly, the indeterminacy for the ∆t = 0 waveform with BandPower features (Fig 6B(iv)) is distinct from the RC circuit (Fig 4B(iv)) and exhibits a clear linear interaction.

We quantified desirable properties of the posterior estimates with the local PRE and PPC analysis described above. At ∆t ≈ 0, PRE values were large when using BandPower features (Fig 6C(iv)), relatively smaller for PCA4 and Peak (Fig 6C(ii-iii)), and almost completely disappeared for PCA30 (Fig 6C(i)). The PPC values for BandPower show that inference using this summary statistic produced results that were highly dissimilar to the conditioning observations (Fig 6D(iv)), while PCA30, PCA4, and Peak exhibit much lower PPC values (Fig 6D(i-iii)). The local PRE and PPC analysis largely agrees with the summary statistic performance captured by the RC circuit PPC heatmaps (Fig 4C-D).

We next applied the HNN-SBI workflow to a more realistic example: transient beta frequency events, known as Beta Events [24,28,45]. Many studies have shown that Beta Events occur throughout the brain (e.g. see [46]) and that they often have a stereotypical waveform shape that resembles an inverted Ricker wavelet lasting ∼150 ms [24,45,47], see Fig 7A. Changes in transient beta activity have been associated with sensory processing and motor action [45,48,49], and the Beta Event waveform shape has been specifically associated with motor pathology in Parkinson's disease and with the aging process [47,50]. HNN provides potential mechanistic explanations for how changes in waveform shape may emerge. The novel HNN-derived Beta Event mechanism put forth by Sherman et al.
[24] showed that Beta Events can arise from the dendritic integration of coincident bursts of subthreshold proximal and distal dendritic excitatory synaptic inputs to cortical pyramidal neurons [24,28], such that the distal drive is effectively stronger and lasts one beta period (∼50 ms); a prediction that was supported by invasive laminar recordings in mice and monkeys [24] and high-resolution MEG in humans [45]. More specifically, the effectively stronger distal drive pushes current flow down the pyramidal neuron dendrites, creating the prominent central trough of the Beta Event waveform (see Fig 7A).

Previous work also showed that the amplitude of the prominent middle trough depended on the variance of the distal drive, such that a parametric lowering of the variance pushed more current flow down the dendrites, generating an increased amplitude and sharper peak [24]. This prior study did not perform automated parameter inference; rather, the results were based on hand tuning and parameter sweeps. Extending these prior results, the SBI methods can be applied to HNN to infer distributions of proximal and distal drive variance that can account for different waveform shapes, and the same diagnostics used above can be used to assess the quality of the estimates. Note that the waveforms analyzed below are simulated Beta Events using parameters from previous HNN studies, and not real recorded data.

We defined a prior distribution over proximal and distal input variance (Pσ², Dσ²) using the same parameters for Beta Event generation as in [24], see Table 1. All other parameters, including the number of spikes, mean input time, and synaptic conductances, were held fixed as in [24]. This choice emphasizes the need for an a priori hypothesis based on domain expertise to constrain the prior distribution to a tractable subset of all the possible model parameters. We used PCA30 as the summary statistic, motivated by the results above. We ran the HNN-SBI workflow to obtain a posterior distribution approximation that allows us to infer Pσ² and Dσ² for a given waveform (Fig 7C). Example waveforms consisting of large (blue) and small (orange) trough amplitudes were used as conditioning observations. The marginal posterior distributions for Pσ² conditioned on these two waveforms overlap substantially, whereas when comparing the Dσ² distributions they exhibit almost no overlap, with an OVL of 2e-10.

We should note that unlike the simulations from the previous sections, where all parameters were deterministic, the exogenous input times considered here are stochastic. As a result, there is not a 1:1 mapping from parameters to simulation output. To illustrate this, we performed posterior predictive checks, where simulations from each posterior are close to the conditioning observation, but not a perfect match. In this setting, the stochasticity in the simulations makes it so that measures such as PPCs are not guaranteed to be zero even with a perfect approximation of the ground truth posterior distribution. We additionally observe a pattern in the local PRE heatmap of Dσ², indicating that this parameter is accurately recovered from simulations generated with a small Dσ² (Fig 7D, dark colors).

In summary, the Beta Event example demonstrates how SBI can be used to estimate parameter distributions for given time series waveforms, and to compare potential mechanisms underlying different waveforms. For the hypothetical example comparison shown, the variance of the distal drive could be uniquely inferred while the variance of the proximal drive could not. Further, parameter estimates are more accurately recovered for simulations with small distal variance.

In the Beta Event example above, the effective strength of the proximal and distal input was maintained in a range where the activity of the cells remained subthreshold, which naturally limits the dynamic range of the simulation.
Next, we consider a more complex example, in which the cells are driven to a suprathreshold spiking regime, and show that this additional complexity can lead to parameter estimation indeterminacy that indicates a compensatory interaction between parameters. Importantly, such parameter interactions would not be revealed with other estimation methods such as mean field variational inference, given their assumption of Gaussian and independent parameter distributions (i.e. the Laplace approximation).

The example considered describes simulations of a sensory evoked response or event related potential (ERP). HNN has been applied to study source localized ERPs in primary somatosensory cortex from tactile stimuli (e.g. [25,26]) and in primary auditory cortex from auditory stimuli (e.g. [27]). In both cases, ERPs were simulated with a sequence of exogenous input that represented an evoked volley of drive to the local circuit that occurs after a sensory stimulus. The drive sequence consisted of a proximal drive representing the initial feedforward input from the thalamus, followed by a distal drive representing feedback input from higher order cortex, followed by a second proximal drive representing a loop of re-emergent thalamocortical drive (see schematic red and green arrows in Fig 8A). These drives were strong enough to generate spiking interactions in the local network and induced current flow up and down the pyramidal neuron dendrites to generate a current dipole ERP waveform analogous to those experimentally recorded (Fig 8A). Note, here we are not examining recorded data, but only example simulations. The specific timing of this exogenous drive sequence for example simulations is shown with arrows in Fig 8B. The parameters regulating the timing and the strength of these drives were fixed to the same values for the different conditions considered.

In [25], HNN was applied to infer the circuit mechanisms underlying differences in ERP waveforms emerging from perceptual threshold tactile stimulation (namely, brief finger taps that were detected 50% of the time) that were reported as detected (felt) or not-detected (not felt). In that study, earlier and larger amplitude ERP peaks on detected trials could be reproduced with decreases in the timing and increases in the strength of the exogenous inputs. Here, we are not trying to reproduce any empirical findings but rather apply SBI to examine the influence of changes in local network connectivity on the ERP waveform as a proof of concept example that examines a small subset of parameters distinct from our prior investigation. Our results lay the foundation for future applications using empirical data.

We start by simulating example ERPs with different peak amplitudes, as shown in Fig 8B, as follows. ERPs were generated using a sequence of exogenous proximal-distal-proximal inputs (Fig 8A and as described above). The parameters representing the timing and strength of the sequence of exogenous proximal and distal input to the local circuit were chosen to be those distributed with the HNN software representing an example tactile evoked response simulation, and were fixed to those values (see Table 1). We then defined a prior distribution over parameters that define a small subset of the local network excitatory and inhibitory connectivity (Fig 2A); namely, the maximal conductances (ḡ) of local excitatory and inhibitory connections onto the proximal and distal dendrites of L5 pyramidal neurons (Fig 8A).

Fig 8. A: Schematic of ERP simulations in HNN. Evoked activity is driven by a fixed sequence of proximal-distal-proximal exogenous inputs. SBI is used to infer the maximal conductance strength (ḡ) of local excitatory/inhibitory connections to the proximal/distal dendrites of L5 pyramidal neurons for example waveforms.
B: Exemplar simulated ERPs (blue and orange solid lines) with differing local connectivity strengths chosen from a defined prior distribution (described in the text) are shown, along with the fixed timing of the sequence of exogenous inputs for each simulation (red and green arrows). C: Spike raster plots of cell specific firing for the two ERP simulation conditions from panel B. D: Posterior distributions over local connectivity parameters alongside ground truth parameters (stars on diagonal) for conditioning observations. A strong interaction between excitatory/inhibitory distal inputs (E_L2 and I_L2) is visible in the lower square. Overlap coefficients (OVL) quantifying the separability of the marginal posterior distributions conditioned on each waveform are shown on the diagonal for the corresponding parameters. E: Local parameter recovery error (PRE) for distal inhibition I_L2 indicates errors are higher for observations generated with strong excitatory (E_L2) and weak inhibitory (I_L2) distal connections.

The two example parameter configurations produce an apparent difference in L5 pyramidal neuron spiking, as well as the sustained negativity (blue) and positivity (orange) observed in the current dipole, due to the complex dynamics that each parameter configuration creates.
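For readers who want to set up this type of local connectivity manipulation themselves, the sketch below shows one possible way to scale the maximal conductance of selected local connections by editing hnn-core's connectivity list. The dictionary keys follow hnn-core's documented connectivity structure, but the specific cell-type/receptor/location combinations and conductance values are illustrative assumptions, not the settings used in this study.

```python
from hnn_core import jones_2009_model

def set_local_conductance(net, src_type, target_type, receptor, loc, gbar):
    """Set the maximal conductance (A_weight) of every local connection that
    matches the given source/target cell types, receptor, and dendritic
    location; connections that do not match are left untouched."""
    for conn in net.connectivity:
        if (conn['src_type'] == src_type and conn['target_type'] == target_type
                and conn['receptor'] == receptor and conn['loc'] == loc):
            conn['nc_dict']['A_weight'] = gbar

net = jones_2009_model()
# Illustrative values: excitatory (AMPA) and inhibitory (GABAA) connections
# onto the distal dendrites of L5 pyramidal neurons
set_local_conductance(net, 'L2_pyramidal', 'L5_pyramidal', 'ampa', 'distal', 5e-4)
set_local_conductance(net, 'L2_basket', 'L5_pyramidal', 'gabaa', 'distal', 1e-3)
```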

To quantify the separability of these distributions, we calculated the OVL coefficient for the marginal distributions of all parameters (Fig 8D, diagonal). Estimated parameter distributions of the synapses on the layer 5 distal dendrites, E_L2 and I_L2, were clearly separable between the two waveforms; the remaining local connectivity parameters were also separable, albeit with slightly more overlap, and a marked interaction between these two parameters.

Discussion

We have shown how distributions of parameters that can account for a given time series waveform can be inferred using an integrated HNN-SBI workflow applied to two common MEG/EEG signal motifs (Beta Events and ERPs), and how to assess the overlap of the distributions from two different waveforms. This work does not aim to be an exhaustive guide to inference in HNN, nor to focus on specific neuroscientific questions, but instead to provide useful examples and methods to highlight critical decisions in the inference pipeline. There are several major takeaways from our study. First, highly non-linear biophysically detailed neural models like HNN are not suitable for Bayesian estimation methods that require access to a likelihood function (e.g. variational inference) or that approximate posterior distributions with independent Gaussians (a.k.a. the Laplace approximation). Rather, they necessitate a method that can estimate complex posterior distributions and parameter interactions from a simulated dataset (e.g. SBI with masked autoregressive flows).

Second, an important initial step in the SBI process is to identify a limited set of parameters and a range of values for those parameters that are assumed to be able to generate the waveform of interest and variation around it (i.e. the prior distribution).

Due to the large-scale nature of biophysically detailed network models, it is not possible to perform SBI on all parameters at once. The choice of the prior distribution represents a hypothesis about parameters that are assumed to contribute to variation in the waveform. This hypothesis can be informed by domain knowledge of the question of interest. In the HNN examples shown, the hypothesized parameters of interest for estimation were the strength of the proximal and distal excitatory synaptic drive for the Beta Event simulation, and local excitatory and inhibitory connectivity for the ERP simulation; these parameters were chosen only for didactic purposes. All other parameters were fixed based on previous studies. Third, posterior diagnostics like PRE and PPC are valuable tools to guide decisions in the inference pipeline, e.g. optimal summary statistic selection, and OVL can be used to assess the uniqueness of estimated distributions for two different waveforms. Fourth, when estimating parameters that can account for time series waveforms, data-driven summary statistics such as PCA are the most effective at retaining essential information for mapping recorded signals to underlying parameters. While hand-crafted summary statistics, such as peak latency or amplitude, can permit an accurate mapping for certain waveform features, their selection may be insufficient to identify unique parameter distributions.

Comparison with inference in MEG/EEG neural modeling frameworks that rely on dynamic causal modeling

To our knowledge, while there are several modeling frameworks for simulating MEG/EEG signals, the other frameworks that use likelihood-based Bayesian inference to estimate parameters fall in the category of Dynamic Causal Modeling (DCM) [15,51]. It is important to emphasize that while the HNN-SBI framework conceptually overlaps with DCM, they are two fundamentally distinct techniques which address different questions. At its base, DCM combines variational inference, a computationally efficient Bayesian inference algorithm, with neural mass models, as well as an observation model which translates simulated activity to experimental measures (e.g., MEG/EEG). Neural mass models refer to a specific class of neural models where a single variable represents the aggregate activity of large neural populations (e.g. population spike rates). The inferred parameters in the DCM framework most often represent the coupling strength between distinct neural masses (e.g. population nodes). By making simplifying anatomical and physiologic assumptions, neural mass models in the DCM framework can be employed in a large variety of inference problems due to their computational efficiency. However, their ability to make precise biophysical predictions on cellular and local circuit level processes is limited, as the parameters are an abstraction representing population level activity [51]. For example, DCM employs the mean-field assumption, meaning that biologically important compensatory interactions such as excitatory/inhibitory (E/I) balance cannot be directly characterized in terms of synaptic conductance. Further, this means that DCM is not capable of representing the parameter indeterminacies with interactions that we showed can occur in the HNN-SBI framework. There are, however, advantages of DCM over the HNN-SBI framework that access to a known likelihood function and other simplifying assumptions allow.
For example, one critical question that the HNN-SBI framework is currently not suited to address is inference with multiple spatially separated cortical sources. While theoretically possible, the high computational demands of HNN-SBI severely limit the ability to explore multi-area interactions, and highlight the importance of using neural mass models and DCM in the analysis of whole-brain neuroimaging data. Recent work has shown that neural mass models can also be integrated with the SBI framework for whole-brain studies [22], highlighting the adaptability of the SBI methodology to a wide variety of neural models.

The simulation process outlined here extends prior work using SBI to estimate parameters in detailed neural models. Prior work applying SBI to neural models has included a single compartment Hodgkin-Huxley model and the stomatogastric ganglion model [18], both of which include an extensive parameter set, but contain significantly less detail and are smaller scale models than HNN. Additionally, non-amortized inference was performed in these models using sequential neural posterior estimation, allowing a much larger parameter set to be inferred, but only for a specific single observation. In contrast, we omit the use of sequential methods to perform inference on multiple observations using the same trained neural density estimator, but at the expense of the number of parameters that can be inferred simultaneously. The SBI workflow applied here used the neural density estimation technique known as masked autoregressive flows. There are currently a large number of neural density estimation techniques beyond this choice, each offering distinct advantages such as sample efficiency, expressivity, and likelihood evaluation [16]. While this field is rapidly evolving, recent concerns have been raised about the limits of such tools in the domain of Bayesian inference for scientific discovery [39,52]. Unfortunately, there currently exist very few techniques for the validation of posterior distributions learned through neural density estimation beyond the PPC and PRE diagnostics shown here. One promising approach is simulation-based calibration [22,53], which plays a similar role to PPCs by checking the statistical consistency of the learned posterior against the prior and the simulator.

It is also worth noting the advantages of SBI over other estimation techniques, for example COBYLA estimation, which has also been applied in HNN [5]. The main distinguishing factor is that SBI makes use of every simulation to build an accurate approximation of the posterior distribution for many waveforms. In contrast, COBYLA uses simulations to iteratively search for an optimized parameter set for a single waveform. Once trained, the neural density estimator in the SBI framework can be applied again on new time series waveforms (that fall within the prior distribution) without retraining. As shown in the results, the posterior distribution is an object with several utilities. We emphasize the mapping between observations and ground truth parameters in this paper, but there are alternative uses such as parameter optimization via non-amortized inference [17], as well as building a more basic understanding of the model itself. Further, significant research efforts currently underway have the potential to improve the computational cost of parameter estimation in biophysical neuron models, enhancing their accessibility.

Other important future directions

In this study, we showed that PCA is a more appropriate choice than simple hand-crafted summary statistics when performing SBI on time series waveforms.

However, PCA is constrained to preserve high-variance features, when in fact low-variance features may also be critical for identifying certain parameters. An important line of future work is the improvement of methods to learn summary statistics from neurophysiological signals that can help identify features of the signal that are essential for accurate parameter estimation. A promising development in this domain is the use of embedding networks that are trained to estimate summary statistics simultaneously with the neural density estimator used to approximate the posterior distribution that can account for those summary statistics [17,23,40,41]. Currently, it is unclear if existing methods to train these embedding networks coupled to neural density estimators are sufficient, and further analysis is required. Our HNN-SBI examples focused on making inferences by constraining only to one output of the model, namely simulated current dipole waveforms. However, due to the multi-scale nature of the HNN model there are many other model outputs that could help constrain the inference problem, such as cell spiking profiles and/or local field potential signals. A major advantage of the Bayesian framework is the ability to flexibly integrate multiple features into the parameter estimation. If additional multiscale data is known, properties of these data can provide further summary statistics over which the inference problem can be constrained.
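As a rough sketch of the embedding-network approach mentioned above, the example below couples a small 1D convolutional network to the normalizing-flow density estimator in the sbi package, so that a learned summary vector is trained jointly with the posterior approximation. The architecture, layer sizes, assumed waveform length of 100 samples, and exact import paths are assumptions and may need adjustment for a given sbi version.

```python
import torch
import torch.nn as nn
from sbi.inference import SNPE
from sbi.utils import BoxUniform, posterior_nn

# Small 1D CNN mapping a raw length-100 waveform to a 16-dimensional
# learned summary vector (sizes are illustrative choices only).
embedding_net = nn.Sequential(
    nn.Unflatten(-1, (1, 100)),                 # (batch, 100) -> (batch, 1, 100)
    nn.Conv1d(1, 8, kernel_size=7, stride=2), nn.ReLU(),
    nn.Conv1d(8, 8, kernel_size=7, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 21, 16),                      # 21 = signal length after both convolutions
)

prior = BoxUniform(low=torch.zeros(2), high=torch.ones(2))

# Flow-based density estimator conditioned on the embedded waveform; the
# embedding is optimized together with the flow during training.
density_estimator_fn = posterior_nn(model="maf", embedding_net=embedding_net)
inference = SNPE(prior=prior, density_estimator=density_estimator_fn)
# Training then proceeds exactly as in the earlier sketch:
# posterior = inference.append_simulations(theta, x).train()
```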

Conclusion
Using detailed neural models in a Bayesian framework is the product of significant developments in machine learning, biophysical modeling, and high-performance computing that have evolved largely independently. Our results demonstrate that large-scale biophysically detailed models, like HNN, are now amenable to Bayesian methods via the SBI framework, an approach that has not been feasible in the past. However, this novel combination produces new conceptual and technical challenges that must be addressed to effectively use these techniques. While in this work we provide guidelines for addressing such challenges, more research in the domains of neural modeling and likelihood-free inference is needed. It is apparent that the combination of HNN with SBI is a step forward for making mechanistic inferences underlying MEG/EEG biomarkers, with the potential to provide novel circuit-level predictions on disease and neural function. Our results lay the foundation for similar integration of SBI into the growing number of biophysically detailed neural modeling frameworks to advance neuroscience discovery.