## Abstract

Mechanistic modeling in neuroscience aims to explain neural or behavioral phenomena in terms of underlying causes. A central challenge in building mechanistic models is to identify which models and parameters can achieve an agreement between the model and experimental data. The complexity of models and data characterizing neural systems makes it infeasible to solve model equations analytically or tune parameters manually. To overcome this limitation, we present a machine learning tool that uses density estimators based on deep neural networks—trained using model simulations—to infer data-compatible parameters for a wide range of mechanistic models. Our tool identifies all parameters consistent with data, is scalable both in the number of parameters and data features, and does not require writing new code when the underlying model is changed. It can be used to analyze new data rapidly after training, and can be applied to either raw data or selected data features. We demonstrate our approach for parameter inference on ion channels, receptive fields, and Hodgkin–Huxley models. Finally, we use it to explore the space of parameters which give rise to the same rhythmic activity in a network model of the crustacean stomatogastric ganglion and to search for potential compensation mechanisms. The approach presented here will help close the gap between data-driven and theory-driven models of neural dynamics.

## Introduction

New experimental technologies allow us to observe neurons, networks, brain regions and entire systems at unprecedented scale and resolution, but using these data to understand how behavior arises from neural processes remains a challenge. To test our understanding of a phenomenon, we often attempt to rebuild it in the form of a computational model that incorporates the mechanisms we believe to be at play, based on scientific knowledge, intuition, and hypotheses about the components of a system and the laws governing their relationships. The goal of such mechanistic models is to investigate whether a proposed mechanism can explain experimental data, uncover details that may have been missed, inspire new experiments, and eventually provide insights into the inner workings of an observed neural or behavioral phenomenon [1–4]. Examples of such a symbiotic relationship between model and experiments range from the now classical work of Hodgkin and Huxley [5], to population models investigating rules of connectivity, plasticity and network dynamics [6–10], network models of inter-area interactions [11, 12], and models of decision making [13, 14].

A crucial step in building a model is adjusting its free parameters to be consistent with experimental observations. This is essential both for investigating whether the model agrees with reality and for gaining insight into processes which cannot be measured experimentally. For some models in neuroscience, it is possible to identify the relevant parameter regimes from careful mathematical analysis of the model equations. But as the complexity of both neural data and neural models increases, it becomes very difficult to find well-fitted parameters by inspection, and *automated* identification of data-consistent parameters is required.

Furthermore, to understand how a model quantitatively explains data, it is necessary to find not only the *best*, but *all* parameter settings consistent with experimental observations. This is especially important when modeling neural data, where highly variable observations can lead to broad ranges of data-consistent parameters. Elucidating these ranges can reveal which combinations of parameters are well-constrained by data, and helps us design further experiments to be maximally informative [15]. Moreover, many models in biology are inherently robust to some perturbations of parameters, but highly sensitive to others [3, 16], e.g. because of processes such as homeostatic regulation. For these systems, identifying the full range of data-consistent parameters can reveal how multiple distinct parameter settings give rise to the same model behavior [7, 17, 18]. Yet despite the clear benefits of mechanistic models in providing scientific insight, identifying their parameters given data remains a challenging open problem that demands new algorithmic strategies.

The gold standard for automated parameter identification is *statistical inference*, which uses the likelihood *p*(**x**|**θ**) to quantify the match between parameters **θ** and data **x**. Likelihoods can be derived for purely statistical models commonly used in neuroscience [19–25], but are unavailable for most mechanistic models. Mechanistic models are designed to reflect knowledge about biological mechanisms, and not necessarily to be amenable to efficient inference: Many mechanistic models are defined implicitly through stochastic computer simulations (e.g. a simulation of a network of spiking neurons), and likelihood calculation would require the ability to integrate over all potential paths through the simulator code. Similarly, a common goal of mechanistic modeling is to capture selected summary features of the data (e.g. a certain firing rate, bursting behavior, etc.), *not* the full dataset in all its details. The same feature (such as a particular average firing rate) can be produced by infinitely many realizations of the simulated process (such as a time-series of membrane potential). This makes it impractical to compute likelihoods, as one would have to average over all possible realizations which produce the same output.

Since the toolkit of statistical inference is inaccessible for mechanistic models, parameters are typically tuned ad-hoc (often through laborious, and subjective, trial-and-error), or by computationally expensive parameter search: A large set of models is generated, and grid search [26–28], genetic algorithms [29–32], or Approximate Bayesian Computation (ABC) [33–35] are used to filter out models whose simulations do not match the data. Parameter search methods require the user to define a heuristic rejection criterion to decide which simulations to keep, and typically end up discarding most simulations. They struggle when models have many parameters or data features, cannot cope with large datasets or high-throughput applications, and (except for ABC) yield only a single *best-fitting model*, rather than the full range of data-compatible models. Thus, computational neuroscientists face a dilemma: Either create carefully designed, highly interpretable mechanistic models (but rely on ad-hoc parameter tuning), or resort to purely statistical models offering sophisticated parameter inference but limited mechanistic insight.
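To make the inefficiency of rejection-based parameter search concrete, here is a minimal rejection-ABC sketch on a toy linear-Gaussian problem of our own construction (all names, thresholds and numbers are illustrative, not taken from the study): most simulations are discarded, and the acceptance rate collapses further as the tolerance shrinks or the data dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy simulator: one parameter theta, data x = theta + noise.
def simulate(theta):
    return theta + rng.normal(0.0, 1.0, size=theta.shape)

x_o = 1.0          # "observed" data
eps = 0.1          # ad-hoc rejection threshold
n_sims = 100_000

theta = rng.normal(0.0, 1.0, n_sims)   # prior: N(0, 1)
x = simulate(theta)

# Rejection ABC: keep only simulations close to the observation.
accepted = theta[np.abs(x - x_o) < eps]
acceptance_rate = accepted.size / n_sims

# For this linear-Gaussian toy the exact posterior is N(x_o / 2, 1/2),
# so the few accepted samples should cluster around 0.5.
print(acceptance_rate, accepted.mean())
```

Even in this one-dimensional toy, only a few percent of simulations survive the rejection step; SNPE instead trains on all of them.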

Here we propose a new approach using machine learning to combine the advantages of mechanistic and statistical modeling. We present SNPE (Sequential Neural Posterior Estimation), a tool that rapidly identifies all mechanistic model parameters consistent with observed experimental data (or summary features). SNPE builds on recent advances in simulation-based Bayesian inference [36–39]: Given observed experimental data (or summary features) **x**_{o}, and a mechanistic model with parameters **θ**, it expresses both prior knowledge and the range of data-compatible parameters through probability distributions. It returns a posterior distribution *p*(**θ**|**x**_{o}) which is high for parameters **θ** consistent with both the data **x**_{o} and prior knowledge, but approaches zero for **θ** inconsistent with either (Fig. 1).

Similar to parameter search methods, SNPE uses simulations instead of likelihood calculations, but instead of filtering out simulations, it uses *all* simulations to train a multi-layer artificial neural network to identify admissible parameters (Fig. 1). By incorporating modern deep neural networks for conditional density estimation [40, 41], it can capture the full *distribution* of parameters consistent with the data, even when this distribution has multiple peaks or lies on curved manifolds. Critically, SNPE decouples the design of the model and the design of the inference approach, giving the investigator maximal flexibility to design and modify mechanistic models. Our method makes minimal assumptions about the model or its implementation, and can, e.g., also be applied to non-differentiable models, such as networks of spiking neurons. Its only requirement is that one can run model simulations for different parameters, and collect the resulting synthetic data or summary features of interest.
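As a deliberately simplified caricature of this idea (a toy of our own, not the SNPE algorithm itself), one can learn an "inference model" from simulated parameter-data pairs. Below, the deep conditional density estimator is replaced by a plain least-squares fit of E[**θ**|**x**], which for a linear-Gaussian toy problem recovers the analytic posterior mean; the amortization benefit is the same, in that inference for new data reduces to one cheap evaluation of the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem with a known answer: prior theta ~ N(0, 1),
# simulator x = theta + N(0, 1).  The exact posterior is
# N(x_o / 2, 1/2), so E[theta | x] = 0.5 * x.
n_sims = 50_000
theta = rng.normal(0.0, 1.0, n_sims)
x = theta + rng.normal(0.0, 1.0, n_sims)

# "Train" the simplest possible inference model on ALL simulations:
# a least-squares fit of theta as a function of x.  SNPE replaces this
# linear fit with a deep conditional density estimator.
A = np.column_stack([x, np.ones_like(x)])
coef, _, _, _ = np.linalg.lstsq(A, theta, rcond=None)
slope, intercept = coef

# Amortization: once fitted, inference for new data is a cheap evaluation.
x_o = 1.3
posterior_mean_estimate = slope * x_o + intercept
print(slope, posterior_mean_estimate)   # slope should approach 0.5
```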

We test SNPE using mechanistic models expressing key neuroscientific concepts. Beginning with a simple neural encoding problem with a known solution, we progress to more complex data types, large datasets and many-parameter models inaccessible to previous methods. We estimate visual receptive fields using many data features, demonstrate rapid inference of ion channel properties from high-throughput voltage-clamp protocols, and show how Hodgkin–Huxley models are more tightly constrained by increasing numbers of data features. Finally, we explore how multiple network models can explain the activity in the stomatogastric ganglion [7], and provide hypotheses for which compensation mechanisms might be at play.

Concurrently with our work, Bittner and colleagues [42] developed an alternative approach to parameter identification for mechanistic models, and showed how it can be used to characterize neural population models which exhibit specific emergent computational properties. Both studies differ in their methodology and domain of applicability (see descriptions of underlying algorithms in our [37, 38] and their [43] prior work), as well as in the focus of their neuroscientific contributions, but they share the overall goal of using deep probabilistic inference tools to build more interpretable models of neural data. These complementary and concurrent advances will expedite the cycle of building, adjusting and selecting mechanistic models in neuroscience.

## Results

### Estimating stimulus-selectivity in linear-nonlinear encoding models

We first illustrate SNPE on linear-nonlinear (LN) encoding models, a special case of generalized linear models (GLMs). These are simple, commonly used phenomenological models for which likelihood-based parameter estimation is feasible [44–49], and which can be used to validate the accuracy of our approach. We will show that SNPE returns the correct posterior distribution over parameters, that it can cope with high-dimensional observation data, and that it can recover multiple solutions to parameter inference problems.

An LN model describes how a neuron’s firing rate is modulated by a sensory stimulus through a linear filter **θ**, often referred to as the *receptive field* [50, 51]. We first considered a model of a retinal ganglion cell (RGC) driven by full-field flicker (Fig. 2A). A statistic that is often used to characterize such a neuron is the *spike-triggered average* (STA) (Fig. 2A, right). We therefore used the STA, as well as the firing rate of the neuron, as input **x**_{o} to SNPE. (Note that, in the limit of infinite data, and for white noise stimuli, the STA will converge to the receptive field [45]; for finite and non-white data, the two will in general be different.) By randomly drawing receptive fields **θ**, we generated synthetic spike trains and calculated STAs from them (Fig. 2B), and subsequently trained a neural conditional density estimator to recover the underlying receptive-field model (Fig. 2C). This allowed us to estimate the posterior distribution over receptive fields, i.e. to estimate which receptive fields are consistent with the data (and prior) (Fig. 2C-D). For LN models, likelihood-based inference is possible, allowing us to validate the SNPE posterior by comparing it to a reference posterior obtained via Markov Chain Monte Carlo (MCMC) sampling [48, 49] (Supplementary Fig. 1 and Supplementary Fig. 2).
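The simulation step of this pipeline can be sketched in a few lines. The toy linear-nonlinear-Poisson simulator and STA computation below use a hypothetical Gaussian-bump filter and illustrative constants, not the receptive fields or stimuli from the study; for white-noise stimuli, the recovered STA should align with the filter.

```python
import numpy as np

rng = np.random.default_rng(0)

# White-noise stimulus and a made-up 'ground-truth' receptive field.
T, D = 100_000, 10
stimulus = rng.normal(0.0, 1.0, (T, D))          # one stimulus frame per bin
theta = np.exp(-0.5 * (np.arange(D) - 4.0) ** 2) # hypothetical filter shape
theta /= np.linalg.norm(theta)

# Linear-nonlinear-Poisson cascade: filter, exponential nonlinearity, spikes.
drive = stimulus @ theta
rate = 0.1 * np.exp(drive)                       # firing rate per bin
spikes = rng.poisson(rate)

# Spike-triggered average: spike-count-weighted mean stimulus.
sta = (spikes @ stimulus) / spikes.sum()

# For white-noise stimuli the STA is proportional to the filter.
similarity = np.dot(sta, theta) / (np.linalg.norm(sta) * np.linalg.norm(theta))
print(similarity)
```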

As a more challenging inference problem, we inferred the receptive field of a neuron in primary visual cortex (V1) [52, 53]. Using a model composed of a bias (related to the spontaneous firing rate) and a Gabor function with 8 parameters [54] to describe location, shape and strength of the receptive field, we simulated responses to 5-minute random noise movies of 41 × 41 pixels. In this case, the STA has 1681 dimensions (Fig. 2E), causing classical ABC methods to fail (Supplementary Fig. 3). This problem admits multiple solutions (e.g. rotating the receptive field by 180°). As a result, the posterior distribution has multiple peaks (‘modes’). Starting from a simulation result **x**_{o} with known parameters, we used SNPE to estimate the respective posterior distribution. To deal with the high-dimensional data **x**_{o} in this problem, we used a convolutional neural network (CNN), as this architecture excels at learning relevant features from image data [55, 56]. To deal with the multiple peaks in the posterior, we fed the CNN’s output into a mixture density network (MDN) [57], which can learn to assign probability distributions with multiple peaks as a function of its inputs (full details in Methods). Using this strategy, SNPE was able to infer a posterior distribution that tightly enclosed the ground truth simulation parameters which generated the original simulated data **x**_{o}, and closely matched a reference MCMC posterior (Fig. 2F, full posterior in Supplementary Fig. 4). We also applied this approach to electrophysiological data from a V1 cell [53], identifying a sine-shaped Gabor receptive field consistent with the original spike-triggered average (Fig. 2H; posterior distribution in Supplementary Fig. 5).

### Functional diversity of ion channels: efficient high-throughput inference

We next show how SNPE can be efficiently applied to estimation problems in which we want to identify a large number of models for different observations in a database. We considered a flexible model of ion channels [59], which we here refer to as the *Omnimodel*. This model uses 8 parameters to describe how the dynamics of currents through non-inactivating potassium channels depend on membrane voltage (Fig. 3A). For various choices of its parameters **θ**, it can capture 350 specific models in publications describing this channel type, cataloged in the IonChannelGenealogy (ICG) database [58]. We aimed to identify these ion channel parameters **θ** for each ICG model, based on 11 features of the model’s response to a sequence of 5 voltage-clamp protocols, resulting in a total of 55 characteristic features per model (Fig. 3B, see Methods for details).

Because this model’s output is a typical format for functional characterization of ion channels, both in simulations [58] and in high-throughput electrophysiological experiments [60–63], the ability to rapidly infer different parameters for many separate experiments is advantageous. Existing approaches for model fitting based on numerical optimization [59, 63] must repeat all computations anew for each new experiment or data point (Fig. 3C). For SNPE, however, the only computationally heavy tasks are carrying out simulations to generate training data, and training the neural network. We therefore reasoned that by training a network once using a large number of simulations, we could subsequently carry out rapid “amortized” parameter inference on new data using a single pass through the network (Fig. 3D) [64, 65]. To test this idea, we used SNPE to train a neural network to infer the posterior from any data **x**. To generate training data, we carried out 1 million Omnimodel simulations, with parameters randomly chosen across ranges large enough to capture the models in the ICG database [58]. In this case, SNPE was run using a single round, i.e. it learned to perform inference for all data from the prior (rather than for a specific observed datum). Generating these simulations took around 1000 CPU-hours and training the network 150 CPU-hours, but afterwards a full posterior distribution could be inferred for new data in less than 10 ms.

As a first test, SNPE was run on simulation data, generated by a previous characterization of a non-inactivating potassium channel (Fig. 3B). Simulations of the Omnimodel using parameter sets sampled from the obtained posterior distribution (Fig. 3E) closely resembled the input data on which the SNPE-based inference had been carried out, while simulations using “outlier” parameter sets with low probability under the posterior generated current responses that were markedly different from the data **x**_{o} (Fig. 3F). Taking advantage of SNPE’s capability for rapid amortized inference, we further evaluated its performance on all 350 non-inactivating potassium channel models in ICG. In each case, we carried out a simulation to generate initial data from the original ICG model, used SNPE to calculate the posterior given the Omnimodel, and then generated a new simulation **x** using parameters sampled from the posterior (Fig. 3F). This resulted in high correlation between the original ICG model response and the Omnimodel response (>0.98 for more than 90% of models, see Supplementary Fig. 6). However, this approach was not able to capture all traces perfectly; e.g. it failed to capture the shape of the onset of the bottom right model in Fig. 3G. Additional analysis revealed that this discrepancy reflects a limitation of the Omnimodel rather than a failure of SNPE; thus, SNPE can be used to reveal limitations of candidate models and aid the development of more verisimilar mechanistic models.

Calculating the posterior for all 350 ICG models only took a few seconds, and was fully automated, i.e. did not require user interactions. These results show how SNPE allows fast and accurate identification of biophysical model parameters on new data, and how these approaches could scale to high-throughput or online applications which require rapid and automated inference.

### Hodgkin–Huxley model: stronger constraints from additional data features

The Hodgkin–Huxley (HH) model [5] of action potential generation through ion channel dynamics is a highly influential mechanistic model in neuroscience. A number of algorithms have been proposed for fitting HH models to electrophysiological data [26, 31, 32, 66–68], but (with the exception of [69]) these approaches do not attempt to estimate the full posterior. Given the central importance of the HH model in neuroscience, we sought to test how SNPE would cope with this challenging non-linear model. As previous approaches for HH models concentrated on reproducing specified features (e.g. the number of spikes) [66], we also sought to determine how various features provide different constraints. We considered the problem of inferring 8 biophysical parameters in a HH single-compartment model, describing voltage-dependent sodium and potassium conductances and other intrinsic membrane properties (Fig. 4A, left). We simulated the neuron’s voltage response to the injection of a square wave of depolarizing current, and defined the model output **x** used for inference as the number of evoked action potentials along with 6 additional features of the voltage response (Fig. 4A, right, details in Methods). We first applied SNPE to observed data **x**_{o} created by simulation from the model, calculating the posterior distribution using all 7 features in the observed data (Fig. 4B). The posterior contained the ground truth parameters in a high probability-region, as in previous applications, indicating the consistency of parameter identification. The variance of the posterior was narrower for some parameters than for others, indicating that the 7 data features strongly constrain some parameters (such as the potassium conductance), but others only weakly (such as the adaptation time constant).
Additional simulations with parameters sampled from the posterior closely resembled the observed data **x**_{o}, in terms of both the raw membrane voltage over time and the 7 data features (Fig. 4C, purple and green). Parameters with low posterior probability (outliers) generated simulations that markedly differed from **x**_{o} (Fig. 4C, magenta).
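A minimal sketch of this kind of feature extraction is given below, applied to a synthetic trace with hand-placed spike-like events; the threshold and the particular features are illustrative stand-ins, not the exact 7 features used in the study.

```python
import numpy as np

# Synthetic voltage trace (mV): resting potential with three brief
# spike-like excursions, standing in for a simulated HH response.
dt = 0.1                                  # ms per sample
t = np.arange(0, 200, dt)
v = -70.0 + np.random.default_rng(0).normal(0.0, 0.5, t.size)
for t_spike in (50.0, 90.0, 130.0):
    idx = int(t_spike / dt)
    v[idx:idx + 20] += 100.0 * np.exp(-np.arange(20) * dt / 0.5)

def summary_features(v, threshold=-20.0):
    """A few illustrative features of the kind used to constrain HH fits."""
    above = v > threshold
    # Count upward threshold crossings as spikes.
    n_spikes = int(np.sum(~above[:-1] & above[1:]))
    resting = v[v < -50.0]                # crude subthreshold segment
    return {
        "num_spikes": n_spikes,
        "resting_mean": float(resting.mean()),
        "resting_std": float(resting.std()),
        "spike_peak": float(v.max()),
    }

features = summary_features(v)
print(features)
```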

To investigate how individual data features constrain parameters, we compared SNPE-estimated posteriors based on 1) the spike count alone, 2) the spike count and 3 voltage features, or 3) all 7 features of **x**_{o}. This analysis revealed that as more features are taken into account, the posterior became narrower and centered more closely on the ground truth parameters (Fig. 4D, Supplementary Fig. 7). Posterior simulations matched the observed data only in those features that had been used for inference, e.g. applying SNPE to spike counts alone identified parameters that generated the correct number of spikes, but for which spike timing and subthreshold voltage time course were off, unless these additional data features were also provided to SNPE (Fig. 4E). For some parameters, such as the potassium conductance, providing more data features brought the peak of the posterior (the *posterior mode*) closer to the ground truth and also decreased uncertainty. For other parameters, such as *V _{T}*, a parameter adjusting the spike threshold [66], the peak of the posterior was already close to the correct value with spike counts alone, but adding additional features reduced uncertainty. While SNPE can be used to study the effect of additional data features in reducing parameter uncertainty, this would not be the case for methods that only return a single best-guess estimate of parameters. These results show that SNPE can reveal how information from multiple data features imposes collective constraints on channel and membrane properties in the HH model.
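The qualitative effect of posterior narrowing as features accumulate can be illustrated with a conjugate Gaussian toy in which each feature is treated, purely for illustration, as an independent noisy view of a single parameter (the feature values and noise scales below are hypothetical):

```python
import numpy as np

# Conjugate Gaussian toy: prior theta ~ N(0, sigma0^2); each data feature
# is an independent noisy view x_i = theta + N(0, sigma^2).  The posterior
# variance 1 / (1/sigma0^2 + n/sigma^2) shrinks as features are added.
sigma0, sigma = 1.0, 0.5
features = np.array([0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 0.95])  # hypothetical x_o

def posterior(features_used):
    n = features_used.size
    var = 1.0 / (1.0 / sigma0**2 + n / sigma**2)
    mean = var * features_used.sum() / sigma**2
    return mean, var

mean1, var1 = posterior(features[:1])   # one feature only
mean4, var4 = posterior(features[:4])   # four features
mean7, var7 = posterior(features)       # all seven features
print(var1, var4, var7)                 # monotonically shrinking variance
```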

We also inferred HH parameters for 8 *in vitro* recordings from the Allen Cell Types database using the same current-clamp stimulation protocol as in our model [60, 70] (Fig. 4F, Supplementary Fig. 8). In each case, simulations based on the SNPE-inferred posterior closely resembled the original data (Fig. 4F). We note that while inferred parameters differed across recordings, some parameters (the spike threshold, the density of sodium channels, the membrane reversal potential and the density of potassium channels) were consistently more strongly constrained than others (the intrinsic neural noise, the adaptation time constant, the density of slow voltage-dependent channels and the leak conductance) (Supplementary Fig. 8). Overall, these results suggest that the electrophysiological responses measured by this current-clamp protocol can be approximated by a single-compartment HH model, and that SNPE can identify the admissible parameters.

### Crustacean stomatogastric ganglion: sensitivity to perturbations

For some biological systems, multiple parameter sets give rise to the same system behavior [7, 18, 72–75]. In particular, neural systems can be robust to specific perturbations of parameters [75–77], yet highly sensitive to others, properties referred to as *sloppiness* and *stiffness* [3, 16, 78]. To demonstrate how SNPE can identify which parameter perturbations affect model outputs, we applied it to a model [7] and data [71] of the pyloric rhythm in the crustacean stomatogastric ganglion (STG). This model describes a triphasic motor pattern generated by a fully characterized circuit (Fig. 5A). The circuit consists of two electrically coupled pacemaker neurons (anterior burster and pyloric dilator, AB/PD), modeled as a single neuron, as well as two types of follower neurons (lateral pyloric (LP) and pyloric (PY)), all connected through inhibitory synapses (details in Methods). Eight membrane conductances are included for each modeled neuron, along with 7 synaptic conductances, for a total of 31 parameters. This model has been used to demonstrate that virtually indistinguishable activity can arise from vastly different membrane and synaptic conductances in the STG [7, 18].

We applied SNPE to an extracellular recording from the STG of the crab *Cancer borealis* [71] which exhibited pyloric activity (Fig. 5B), and inferred the posterior distribution over all 31 parameters based on 18 salient features of the voltage traces, including cycle period, burst durations, burst delays, and phase gaps (Fig. 5C, full posterior in Supplementary Fig. 9, details in Methods). Consistent with previous reports, the posterior distribution has high probability over extended value ranges for many membrane and synaptic conductances. To verify that parameter settings across these extended ranges are indeed capable of generating the experimentally observed network activity, we sampled two sets of membrane and synaptic conductances from the posterior distribution. These two samples have widely disparate parameters from each other (Fig. 5C, purple dots, details in Methods), but both exhibit activity highly similar to the experimental observation (Fig. 5D, top left and top right).

We then investigated the geometry of the parameter space producing these rhythms [17, 18]. First, we wanted to identify directions of sloppiness, and we were interested in whether parameter settings producing pyloric rhythms form a single connected region, as has been shown for single neurons [79], or whether they lie on separate ‘islands.’ Starting from the two above parameter settings showing similar activity, we examined whether they were connected by a path through parameter space along which pyloric activity was maintained. To do this, we algorithmically identified a path lying only in regions of high posterior probability (Fig. 5C, white, details in Methods). Along the path, network output was tightly preserved, despite a substantial variation of the parameters (voltage trace 1 in Fig. 5D, Supplementary Fig. 10A,C). Second, we inspected directions of stiffness by perturbing parameters off the path. We applied perturbations that yield maximal drops in posterior probability (see Methods for details), and found that the network quickly produced non-pyloric activity (voltage trace 2, Fig. 5D). In identifying these paths and perturbations, we exploited the fact that SNPE provides a differentiable estimate of the posterior, as opposed to parameter search methods which provide only discrete samples.

Overall, these results show that the pyloric network can be robust to specific perturbations in parameter space, but sensitive to others, and that one can interpolate between disparate solutions while preserving network activity. This analysis demonstrates the flexibility of SNPE in capturing complex posterior distributions, and how the differentiable posterior can be used to study directions of sloppiness and stiffness.
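The use of a differentiable posterior to probe sloppy and stiff directions can be sketched with a two-parameter Gaussian stand-in for the log-posterior (an illustrative surrogate of our own, not the STG posterior itself): steps of equal size cost little probability along the ridge but a great deal across it.

```python
import numpy as np

# Quadratic (Gaussian) stand-in for a differentiable log-posterior over
# two hypothetical conductances, with one well-constrained (stiff) and
# one poorly constrained (sloppy) direction.
cov = np.array([[1.0, 0.98],
                [0.98, 1.0]])            # strong correlation -> thin ridge
prec = np.linalg.inv(cov)
mode = np.zeros(2)

def log_post(theta):
    d = theta - mode
    return -0.5 * d @ prec @ d           # up to an additive constant

# Eigenvectors of the covariance give the sloppy (large-eigenvalue) and
# stiff (small-eigenvalue) directions; eigh sorts eigenvalues ascending.
eigvals, eigvecs = np.linalg.eigh(cov)
stiff = eigvecs[:, 0]                    # smallest eigenvalue
sloppy = eigvecs[:, 1]                   # largest eigenvalue

step = 0.5
drop_sloppy = log_post(mode) - log_post(mode + step * sloppy)
drop_stiff = log_post(mode) - log_post(mode + step * stiff)
print(drop_sloppy, drop_stiff)           # stiff perturbation costs far more
```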

### Predicting compensation mechanisms from posterior distributions

Experimental and computational studies have shown that stable neural activity can be maintained despite variable circuit parameters [7, 82, 83]. This behavior can emerge from two sources [82]: either the variation of a certain parameter barely influences network activity at all, or the variations of several parameters influence network activity but their effects compensate for one another. Here, we investigated these possibilities by using the posterior distribution over membrane and synaptic conductances of the STG.

We begin by drawing samples from the posterior and inspecting their pairwise histograms (Fig. 6A, full posterior over all parameters in Supplementary Fig. 9). Consistent with previously reported results [84], we found that most pairs of parameters are only weakly correlated (Fig. 6B). However, in these histograms over two parameters, all other parameters are fully unconstrained and can take on diverse values, which could blur out compensation mechanisms. Therefore, we held all but 2 parameters constant at a given consistent circuit configuration (sampled from the posterior), and observed the network activity across different values of the remaining pair of parameters. We can do so by calculating the conditional posterior distribution (details in Methods), and do not have to generate additional simulations (as would be required by parameter search methods). Doing so has a simple interpretation: when all but 2 parameters are fixed, what values of the remaining 2 parameters can then lead to pyloric activity? We found that pyloric activity can emerge only from narrowly tuned and often highly correlated combinations of the remaining 2 parameters, showing how these parameters can compensate for one another (Fig. 6C). When repeating this analysis across multiple network configurations, i.e. when holding the parameters fixed at different values, we found that these ‘conditional correlations’ are often preserved (Fig. 6C, left and right). We calculated conditional correlations for each parameter pair using 500 different circuit configurations sampled from the posterior (Fig. 6D). Compared to correlations based on the pairwise histograms (Fig. 6B), these conditional correlations were substantially stronger. They were particularly strong across membrane conductances of the same neuron, but primarily weak across different neurons (black boxes in Fig. 6D).
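The conditional-posterior logic can be illustrated with a three-parameter Gaussian toy of our own, in which two parameters are marginally uncorrelated but strongly correlated once the third is held fixed; Gaussian conditioning (Sigma_AA - Sigma_AB Sigma_BB^-1 Sigma_BA) stands in here for conditioning the SNPE posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three toy 'conductances': theta0 and theta1 share a latent component u
# (also read out as theta2) plus opposite contributions of z.  Marginally
# theta0 and theta1 are uncorrelated, but once theta2 is held fixed they
# must compensate for each other exactly.
n = 100_000
z = rng.normal(size=n)
u = rng.normal(size=n)
samples = np.column_stack([z + u, -z + u, u])  # columns: theta0, theta1, theta2

marginal_corr = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]

# Gaussian conditioning for A = {theta0, theta1} given B = {theta2}.
Sigma = np.cov(samples, rowvar=False)
S_aa, S_ab, S_bb = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, 2:]
S_cond = S_aa - S_ab @ np.linalg.inv(S_bb) @ S_ab.T
conditional_corr = S_cond[0, 1] / np.sqrt(S_cond[0, 0] * S_cond[1, 1])

print(marginal_corr, conditional_corr)   # weak marginal, strong conditional
```

As in the pairwise histograms above, the marginal view hides the compensation mechanism that the conditional view exposes.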

Finally, we tested whether the conditional correlations were in line with experimental observations. For the PD and the LP neuron, it has been reported that overexpression of the fast transient potassium current leads to a compensating increase of the hyperpolarization current, suggesting a positive correlation between these two currents [80, 85]. Also, using current injections into the LP neuron, a positive correlation has been reported between the strength of the synaptic input and the maximal conductance of the hyperpolarization current [81]. These results are qualitatively consistent with the conditional correlations (Fig. 6E), which were positive both between the fast transient potassium and hyperpolarization currents for all three model neurons, as well as for 6 out of 7 correlations between synaptic input strength and hyperpolarization current.

Overall, we showed how SNPE can be used to study parameter dependencies in circuits with parameter degeneracy, and how the posterior distribution can be used to efficiently explore potential compensation mechanisms. We found that our method can predict compensation mechanisms which are qualitatively consistent with experimental studies.

## Discussion

How can we build models which give insights into the causal mechanisms underlying neural or behavioral dynamics? The cycle of building mechanistic models, generating predictions, comparing them to empirical data, and rejecting, or refining models has been of crucial importance in the empirical sciences. However, a key challenge has been the difficulty of identifying mechanistic models which can quantitatively capture observed phenomena. We predict that a generally applicable tool to constrain mechanistic models by data is going to expedite progress in neuroscience. While many considerations should go into designing a model that is appropriate for a given question and level of description [2, 3, 86, 87], the question of whether and how one can perform statistical inference on the model should not compromise model-design. In our tool, SNPE, the process of model building and parameter inference are entirely decoupled. We illustrated the power of our approach on a diverse set of applications, highlighting the potential of SNPE to rapidly identify data-compatible mechanistic models, to investigate which data-features constrain which parameters, to reveal shortcomings of candidate-models, and to explore the parameter-landscape of a neural oscillator and provide hypotheses for compensation mechanisms.

### Related work

SNPE builds on recent advances in machine learning, and in particular in density-estimation approaches to likelihood-free inference [36–39, 88, 89]. The idea of learning inference networks on simulated data can be traced back to *regression-adjustment* methods in ABC [33, 90]. Papamakarios and Murray (2016) [36] first proposed to use expressive conditional density estimators in the form of deep neural networks [41, 57], and to optimize them sequentially over multiple rounds with cost-functions derived from Bayesian inference principles. Compared to commonly used rejection-based ABC methods [91, 92], such as MCMC-ABC [34], SMC-ABC [35, 93], Bayesian-Optimization ABC [94], or ensemble methods [95, 96], SNPE approaches do not require one to define a distance function in data space. In addition, by leveraging the ability of neural networks to learn informative features, they enable scaling to high-dimensional estimation problems, as are common in neuroscience and other fields in biology. Alternative likelihood-free approaches include *synthetic likelihood* methods [97–102], moment-based approximations of the posterior [103, 104], inference compilation [105, 106], and density-ratio estimation [107]. For some mechanistic models in neuroscience (e.g. for integrate-and-fire neurons), likelihoods can be computed via stochastic numerical approximations [67, 108, 109] or model-specific analytical approaches [68, 110–113].

Finally, a complementary approach to mechanistic modeling is to pursue purely phenomenological models, which are designed to have favorable statistical and computational properties: these data-driven models can be efficiently fit to neural data [19–25, 44, 46] or to implement desired computations [114]. Although tremendously useful for a quantitative characterization of neural dynamics, these models typically have a large number of parameters, which rarely correspond to physically measurable or mechanistically interpretable quantities, and thus it can be challenging to derive mechanistic insights or causal hypotheses from them (but see e.g. [115–117]).

### Use of summary statistics

When fitting mechanistic models to data, it is common to target summary statistics to isolate specific behaviors, rather than the full data. For example, the spike shape is known to constrain sodium and potassium conductances [29, 30, 66]. When modeling population dynamics, it is often desirable to achieve realistic firing rates, rate-correlations and response nonlinearities [42, 118], or specified oscillations [7]. In models of decision making, one is often interested in reproducing psychometric functions or reaction-time distributions [119]. Choice of summary statistics might also be guided by known limitations of either the model or the measurement approach, or necessitated by the fact that published data are only available in summarized form. Several methods have been proposed to automatically construct informative summary statistics [120–122]. SNPE can be applied to, and might benefit from the use of summary statistics, but it also makes use of the ability of neural networks to automatically learn informative features in high-dimensional data. Thus, SNPE can also be applied directly to raw data (e.g. using recurrent neural networks [37]), or to high-dimensional summary statistics which are challenging for ABC approaches (Fig. 2). In all cases, care is needed when interpreting models fit to summary features, as choice of features can influence the results [120–122].

### Applicability and limitations

A key advantage of SNPE is its general applicability: it can be applied whenever one has a simulator that allows one to stochastically generate model outputs from specific parameters. Furthermore, it can be applied in a fully ‘black-box manner’, i.e. it does not require access to the internal workings of the simulator, likelihoods or gradients. It does not impose any other limitations on the model or the summary features, and in particular does not require them to be differentiable. However, it also has limitations: First, current implementations of SNPE scale well to high-dimensional observations (~100s of dimensions, also see [38]), but scaling to high-dimensional parameter spaces (>30) is challenging. Second, while it is a long-term goal for these approaches to be made fully automatic, our current implementation still requires choices by the user: as described in Methods, one needs to provide the architecture of the density estimation network, and specify settings related to network optimization, as well as the number of simulations and inference rounds. These settings depend on the complexity of the relation between summary statistics and model parameters, and on the number of simulations that can be afforded. In the documentation accompanying our code-package, we provide examples and guidance. For small-scale problems, we have found SNPE to be robust to these settings. However, for challenging, high-dimensional applications, SNPE might currently require substantial user interaction. Third, the power of SNPE crucially rests on the ability of deep neural networks to perform density estimation. While deep nets have had ample empirical success, we still have an incomplete understanding of their limitations, in particular in cases where the mapping between data and parameters might not be smooth (e.g. near phase transitions).
Fourth, when applying SNPE (or any other model-identification approach), validation of the results is of crucial importance, both to assess the accuracy of the inference procedure, as well as to identify possible limitations of the mechanistic model itself. In the example applications, we used several procedures for assessing the quality of the inferred posteriors. One common ingredient of these approaches is to sample from the inferred model, and search for systematic differences between observed and simulated data, e.g. to perform *posterior predictive checks* [37, 38, 93, 123, 124] (Fig. 2G, Fig. 3F,G, Fig. 4C, and Fig. 5D). There are challenges and opportunities ahead in further scaling and automating simulation-based inference approaches. However, in its current form, SNPE will be a powerful tool for quantitatively evaluating mechanistic hypotheses on neural data, and for designing better models of neural dynamics.

## Methods

### Code availability

Code implementing SNPE is available at http://www.mackelab.org/delfi/.

### Simulation-based inference

To perform Bayesian parameter identification with SNPE, three types of input need to be specified:

1. A mechanistic model. The model only needs to be specified through a simulator, i.e. one must be able to generate a simulation result **x** for any parameters *θ*. We do not assume access to the likelihood *p*(**x**|*θ*) or to the equations or internals of the code defining the model, nor do we require the model to be differentiable. This is in contrast to many alternative approaches (including [42]), which require the model to be differentiable and to be implemented in a software code that is amenable to automatic differentiation packages. Finally, SNPE can deal both with inputs **x** which resemble ‘raw’ outputs of the model, and with summary features/statistics calculated from data.

2. Observed data **x**_{o} of the same form as the results **x** produced by model simulations.

3. A prior distribution *p*(*θ*) describing the range of possible parameters. *p*(*θ*) could consist of upper and lower bounds for each parameter, or a more complex distribution incorporating mechanistic first principles or knowledge gained from previous inference procedures on other data.

For each problem, our goal was to estimate the posterior distribution *p*(*θ*|**x**_{o}). To do this we used SNPE [36–38]. Setting up the inference procedure required three design choices:

1. A network architecture, including number of layers, units per layer, layer type (feedforward or convolutional), activation function and skip connections.

2. A parametric family of probability densities *q*_{ψ}(*θ*) to represent inferred posteriors, to be used as conditional density estimator. We used either a mixture of Gaussians (MoG) or a masked autoregressive flow (MAF) [41]. In the former case, the number of components *K* must be specified; in the latter, the number of MADEs (Masked Autoencoder for Distribution Estimation), *n*_{MADEs}. Both choices are able to represent richly structured, multi-modal posterior distributions.

3. A simulation budget, i.e. the number of rounds *R* and simulations per round *N*_{r}.

We emphasize that SNPE is highly modular, i.e. the inputs (data, the prior over parameters, the mechanistic model) and the algorithmic components (network architecture, probability density, optimization approach) can all be modified and chosen independently. This allows neuroscientists to work with models which are designed with mechanistic principles—and not convenience of inference—in mind. Furthermore, it allows SNPE to benefit from advances in more flexible density estimators, more powerful network architectures, or optimization strategies.

With the problem and inference settings specified, SNPE adjusts the network weights *ϕ* based on simulation results, so that *p*(*θ*|**x**) ≈ *q*_{F(x, ϕ)}(*θ*) for any **x**. In the first round of SNPE, simulation parameters are drawn from the prior *p*(*θ*). If a single round of inference is not sufficient, SNPE can be run in multiple rounds, in which samples are drawn from the version of *q*_{F(xo, ϕ)}(*θ*) at the beginning of the round. After the last round, *q*_{F(xo, ϕ)} is returned as the inferred posterior on parameters *θ* given observed data **x**_{o}. If SNPE is only run for a single round, then the generated samples only depend on the prior, but not on **x**_{o}: in this case, the inference network is applicable to any data (covered by the prior ranges), and can be used for rapid amortized inference.

SNPE learns the correct network weights *ϕ* by minimizing the objective function 𝓛(*ϕ*) = −Σ_{j} log *q*_{F(xj, ϕ)}(*θ*_{j}), where the simulation with parameters *θ*_{j} produced result **x**_{j}. This loss is used directly for the first round of SNPE, while in subsequent rounds a different loss function accounts for the fact that simulation parameters were not sampled from the prior. Different choices of the loss function for later rounds result in the SNPE-A [36], SNPE-B [37] or SNPE-C [38] algorithms. To optimize the networks, we used ADAM with default settings [125].
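To make this objective concrete, the following minimal sketch fits a conditional density estimator to simulated parameter-data pairs for a toy simulator. The linear-Gaussian density family is an illustrative stand-in chosen so the negative log-likelihood can be minimized in closed form; the actual SNPE implementation uses deep conditional density estimators trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta):
    # toy mechanistic model standing in for a real simulator:
    # a noisy observation of the parameters
    return theta + 0.1 * rng.normal(size=theta.shape)

# draw parameters from the prior and simulate (a single round)
N = 2000
theta = rng.uniform(-1.0, 1.0, size=(N, 2))   # uniform prior over 2 parameters
x = simulator(theta)

# conditional density estimator q(theta | x): a linear-Gaussian family
# N(W^T [x, 1], Sigma); minimizing -sum_j log q(theta_j | x_j) over W
# reduces to least squares for this family
X = np.hstack([x, np.ones((N, 1))])           # inputs with a bias column
W = np.linalg.lstsq(X, theta, rcond=None)[0]
resid = theta - X @ W
Sigma = resid.T @ resid / N                   # maximum-likelihood covariance

# amortized inference: the posterior estimate for new observed data x_o
x_o = np.array([0.3, -0.5])
posterior_mean = np.append(x_o, 1.0) @ W
posterior_std = np.sqrt(np.diag(Sigma))
```

Once trained, the same estimator can be evaluated for any new observation, which is the amortization property described above.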

Details of the algorithm are given in the publications describing the underlying methods [36–38].

### Linear-nonlinear encoding models

We used a Linear-Nonlinear (LN) encoding model (a special case of a generalized linear model, GLM, [19, 21, 44–47]) to simulate the activity of a neuron in response to a univariate time-varying stimulus. Neural activity *z*_{i} was subdivided into *T* = 100 bins and, within each bin *i*, spikes were generated according to a Bernoulli observation model, *z*_{i} ~ Bernoulli(*η*(**v**_{i} · **f** + *β*)), where **v**_{i} is a vector of white noise inputs between time bins *i* − 8 and *i*, **f** is a length-9 linear filter, *β* is the bias, and *η*(·) = exp(·)/(1 + exp(·)) is the canonical inverse link function for a Bernoulli GLM. As summary statistics, we used the total number of spikes *N* and the spike-triggered average (STA) **Vz**/*N*, where **V** = [**v**_{1}, **v**_{2},…, **v**_{T}] is the so-called design matrix of size 9 × *T*. We note that the spike-triggered sum **Vz** constitutes sufficient statistics for this GLM, i.e. selecting the STA and *N* together as summary statistics does not lead to loss of model-relevant information over the full input-output dataset {**V**, **z**}. We used a Gaussian prior with zero mean and a covariance matrix defined through a matrix **F** that encourages smoothness by penalizing the second-order differences in the vector of parameters [126].

For inference, we used a single round of 10000 simulations, and the posterior was approximated with a Gaussian distribution. We used a feedforward neural network with two hidden layers of 50 units each. We used a Pólya-Gamma Markov Chain Monte Carlo sampling scheme [48] to estimate a reference posterior.
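A minimal sketch of this Bernoulli-GLM simulator and its summary statistics follows; the filter and bias values below are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ln(f, beta, T=100):
    """Sketch of the Bernoulli LN simulator described above.
    f: length-9 temporal filter, beta: bias."""
    stim = rng.normal(size=T + 8)                    # white-noise stimulus
    # design matrix V: column i holds the inputs from bins i-8 to i
    V = np.stack([stim[i:i + 9] for i in range(T)], axis=1)
    rate = 1.0 / (1.0 + np.exp(-(f @ V + beta)))     # logistic nonlinearity
    z = rng.binomial(1, rate)                        # Bernoulli spikes per bin
    return V, z

f_true = np.exp(-0.5 * np.arange(9))                 # hypothetical filter
V, z = simulate_ln(f_true, beta=-2.0)
N = int(z.sum())                                     # summary statistic: spike count
sta = V @ z / max(N, 1)                              # spike-triggered average
```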

For the spatial receptive field model of a cell in primary visual cortex, we simulated the activity of a neuron depending on an image-valued stimulus. Neural activity was subdivided into bins of length Δ*t* = 0.025 s and, within each bin *i*, spikes were generated according to a Poisson observation model, *z*_{i} ~ Poisson(*η*(**v**_{i} · **h** + *β*)Δ*t*), where **v**_{i} is the vectorized white noise stimulus at time bin *i*, **h** a 41 × 41 linear filter, *β* is the bias, and *η*(·) = exp(·) is the canonical inverse link function for a Poisson GLM. The receptive field **h** is constrained to be a Gabor filter evaluated on a regular grid (*g*_{x}, *g*_{y}) of 41 × 41 positions spanning the 2D image-valued stimulus. The parameters of the Gabor are gain *g*, spatial frequency *f*, aspect-ratio *r*, width *w*, phase *ϕ* (between 0 and *π*), angle *ψ* (between 0 and 2*π*) and location *x*, *y* (assumed within the stimulated area, scaled to be between −1 and 1). Bounded parameters were transformed with a log- or logit-transform to yield unconstrained parameters. After applying SNPE, we back-transformed both the parameters and the estimated posteriors in closed form, as shown in Fig. 2. We did not transform the parameters bias *β* and gain *g*.
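The log/logit transforms and their closed-form inverses can be sketched as follows (the phase parameter is used as an example):

```python
import numpy as np

def to_unconstrained(p, lo, hi):
    """Logit-transform a parameter bounded in (lo, hi) onto the real line."""
    u = (p - lo) / (hi - lo)
    return np.log(u / (1.0 - u))

def to_constrained(y, lo, hi):
    """Inverse transform; the same closed form can be applied to map
    posterior samples back to the original bounded space."""
    return lo + (hi - lo) / (1.0 + np.exp(-y))

y = to_unconstrained(np.pi / 4.0, 0.0, np.pi)   # phase phi in (0, pi)
back = to_constrained(y, 0.0, np.pi)            # recovers pi/4
```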

We used a factorizing Gaussian prior for the vector of transformed Gabor parameters, where the transforms ensured the assumed ranges for the Gabor parameters *ϕ*, *ψ*, *x*, *y*. Our Gaussian prior had zero mean and standard deviations [2, 0.5, 0.5, 0.5, 1.9, 1.78, 1.78, 1.78]. We note that a Gaussian prior on a logit-transformed random variable logit(*X*) with zero mean and standard deviation around 1.78 is close to a uniform prior over the original variable *X*. For the bias *β*, we used a Gaussian prior with mean −0.57 and variance 1.63, which approximately corresponds to an exponential prior exp(*β*) ~ Exp(*λ*) with rate *λ* = 1 on the baseline firing rate exp(*β*) in the absence of any stimulus.

The ground-truth parameters for the demonstration in Fig. 2 were chosen to give an asymptotic firing rate of 1 Hz for 5 minutes stimulation, resulting in 307 spikes, and a signal-to-noise ratio of −12 dB.

As summary statistics, we used the total number of spikes *N* and the spike-triggered average **Vz**/*N*, where **V** = [**v**_{1}, **v**_{2},…, **v**_{T}] is the stimulation video of length *T* = 300/Δ*t* = 12000. As for the GLM with a temporal filter, the spike-triggered sum **Vz** constitutes sufficient statistics for this GLM.

For inference, we applied SNPE-A with 2 rounds in total: an initial round served to roughly identify the relevant region of parameter space; here we used a Gaussian distribution to approximate the posterior from 10000 simulations. A second round then used a mixture of 8 Gaussian components to estimate the exact shape of the posterior from another 100000 simulations. We used a convolutional network with 5 convolutional layers with 16 to 32 convolutional filters, followed by three fully connected layers with 50 units each. The total number of spikes *N* within a simulated experiment was passed as an additional input directly to the fully-connected layers of the network. As for the previous GLM, this model has a tractable likelihood, so we used MCMC to obtain a reference posterior.

We applied this approach to extracellular recordings from primary visual cortex of alert mice, obtained using silicon microelectrodes in response to colored-noise visual stimulation. Experimental methods are described in Dyballa et al. (2018) [53].

#### Comparison with Sequential Monte Carlo (SMC) ABC

In order to illustrate the competitive performance of SNPE, we obtained a posterior estimate with a classical ABC method, Sequential Monte Carlo (SMC) ABC [35, 127]. Likelihood-free inference methods from the ABC family require a distance function *d*(**x**_{o}, **x**) between observed data **x**_{o} and possible simulation outputs **x** to characterize dissimilarity between simulations and data. A common choice is the (scaled) Euclidean distance *d*(**x**_{o}, **x**) = ||**x** − **x**_{o}||_{2}. The Euclidean distance here was computed over 1681 summary statistics given by the spike-triggered average (one per pixel) and a single summary statistic given by the ‘spike count’. To ensure that the distance measure was sensitive to differences in both STA and spike count, we scaled the summary statistic ‘spike count’ to account for about 20% of the average total distance (other values did not yield better results). The other 80% were computed from the remaining 1681 summary statistics given by the spike-triggered average. To showcase why this situation is challenging for ABC approaches, we generated 10000 input-output pairs (*θ*_{i}, **x**_{i}) ~ *p*(**x**|*θ*)*p*(*θ*) with the prior and simulator used above, and illustrate the 10 STAs and spike counts with the smallest *d*(**x**_{o}, **x**_{i}) in Supplementary Fig. 3A. Spike counts were comparable to the observed data (307 spikes), but STAs were noise-dominated, and the 10 ‘closest’ underlying receptive fields (yellow contours) showed substantial variability in the location and shape of the receptive field. If even the ‘closest’ samples do not show any visible receptive field, then there is little hope that even an appropriately chosen acceptance threshold will yield a good approximation to the posterior. These findings were also reflected in the results from SMC-ABC with a total simulation budget of 10^{6} simulations (Supplementary Fig. 3B). The estimated posterior marginals for the ‘bias’ and ‘gain’ parameters show that the parameters related to the firing rate were constrained by the data **x**_{o}, but the marginals of parameters related to the shape and location of the receptive field did not differ from the prior, highlighting that SMC-ABC was here not able to identify the posterior distribution. Further comparisons of neural density estimation approaches with ABC methods can be found in the publications describing the underlying machine-learning methodologies [36, 38, 101].
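One possible implementation of this distance calibration is sketched below, with random toy data standing in for the actual STAs and spike counts:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy pilot set standing in for simulated STAs (1681 pixels) and spike counts
stas = rng.normal(size=(1000, 1681))
counts = rng.poisson(300, size=1000).astype(float)
sta_o = np.zeros(1681)                  # observed STA (toy)
count_o = 307.0                         # observed spike count

sta_dist = np.linalg.norm(stas - sta_o, axis=1)
count_dist = np.abs(counts - count_o)
# scale the spike-count term so that it accounts for ~20% of the average
# total distance (an assumed implementation of the calibration above)
scale = (0.2 / 0.8) * sta_dist.mean() / count_dist.mean()
total_dist = sta_dist + scale * count_dist
closest = np.argsort(total_dist)[:10]   # indices of the 10 'closest' simulations
```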

### Ion channel models

We simulated non-inactivating potassium channel currents subject to voltage-clamp protocols as *I*_{K} = *ḡ*_{K} *m* (*V* − *E*_{K}), where *V* is the membrane potential, *ḡ*_{K} is the density of potassium channels, *E*_{K} is the reversal potential of potassium, and *m* is the gating variable for potassium channel activation. *m* is modeled according to the first-order kinetic equation d*m*/d*t* = (*m*_{∞}(*V*) − *m*)/*τ*_{m}(*V*), where *m*_{∞}(*V*) is the steady-state activation and *τ*_{m}(*V*) the respective time constant. We used a general formulation of *m*_{∞}(*V*) and *τ*_{m}(*V*) [59], in which the steady-state activation curve has 2 parameters (slope and offset) and the time constant curve has 6 parameters, amounting to a total of 8 parameters (*θ*_{1} to *θ*_{8}).

Since this model can be used to describe the dynamics of a wide variety of channel models, we refer to it as *Omnimodel*.
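As an illustration, the first-order kinetic equation can be integrated along a voltage-clamp trace with an exponential-Euler scheme. The sigmoidal and bell-shaped curves below are toy stand-ins for the 8-parameter Omnimodel formulas, which are not reproduced here.

```python
import numpy as np

def simulate_gate(V_trace, dt, m_inf, tau_m, m0=0.0):
    """Exponential-Euler integration of dm/dt = (m_inf(V) - m) / tau_m(V)
    along a voltage-clamp trace (dt in ms)."""
    m = np.empty(len(V_trace))
    m[0] = m0
    for i in range(1, len(V_trace)):
        minf, tau = m_inf(V_trace[i]), tau_m(V_trace[i])
        m[i] = minf + (m[i - 1] - minf) * np.exp(-dt / tau)
    return m

# toy steady-state and time-constant curves (illustrative assumptions)
m_inf = lambda V: 1.0 / (1.0 + np.exp(-(V + 30.0) / 10.0))
tau_m = lambda V: 1.0 + 4.0 / np.cosh((V + 30.0) / 20.0)

V_step = np.full(2000, 0.0)                  # clamp at 0 mV for 20 ms
m = simulate_gate(V_step, dt=0.01, m_inf=m_inf, tau_m=tau_m)
I_K = 1.0 * m * (0.0 - (-90.0))              # I_K = g_K * m * (V - E_K)
```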

We modeled responses of the Omnimodel to a set of five voltage-clamp protocols described in [58]. Current responses were reduced to 55 summary statistics (11 per protocol). Summary statistics were coefficients to basis functions derived via Principal Components Analysis (PCA) (10 per protocol) plus a linear offset (1 per protocol) found via least-squares fitting. PCA basis functions were found by simulating responses of the non-inactivating potassium channel models to the five voltage-clamp protocols and reducing responses to each protocol to 10 dimensions (explaining 99.9% of the variance).
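A sketch of how such an 11-dimensional reduction can be computed for one protocol follows; the trace library and sizes are toy stand-ins, and the exact least-squares design may differ from the one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy stand-in for a library of simulated current responses to one
# voltage-clamp protocol (sizes are illustrative)
train_traces = rng.normal(size=(500, 1000)).cumsum(axis=1)

# PCA basis functions derived from the training simulations
mean_trace = train_traces.mean(axis=0)
_, _, Vt = np.linalg.svd(train_traces - mean_trace, full_matrices=False)
basis = Vt[:10]                               # top-10 principal components

def summarize(trace):
    """11 summary statistics for one protocol: 10 PCA coefficients plus
    a linear offset found via least-squares fitting."""
    t = np.arange(trace.size, dtype=float)
    A = np.stack([t, np.ones_like(t)], axis=1)
    slope_offset = np.linalg.lstsq(A, trace, rcond=None)[0]
    pcs = basis @ (trace - mean_trace)
    return np.concatenate([pcs, slope_offset[1:]])  # 10 PCs + 1 offset

stats = summarize(train_traces[0])
```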

To amortize inference on the model, we specified a wide uniform prior over the parameters.

For inference, we trained a shared inference network in a single round of 10^{6} simulations generated by sampling from the prior (*θ* ∈ ℝ^{8}, **x** ∈ ℝ^{55}). The density estimator is a masked autoregressive flow (MAF) [41] with five MADEs with [250, 250] hidden units each.

We evaluated performance on 350 non-inactivating potassium ion channels selected from IonChannelGenealogy (ICG) by calculating the correlation coefficient between traces generated by the original model and traces from the Omnimodel using the posterior mode.

### Single-compartment Hodgkin–Huxley neurons

We simulated a single-compartment Hodgkin–Huxley type neuron with channel kinetics as in [66],

*C*_{m} d*V*/d*t* = *g*_{l}(*E*_{l} − *V*) + *ḡ*_{Na}*m*^{3}*h*(*E*_{Na} − *V*) + *ḡ*_{K}*n*^{4}(*E*_{K} − *V*) + *ḡ*_{M}*p*(*E*_{K} − *V*) + *I*_{inj} + *ση*(*t*),

where *V* is the membrane potential, *C*_{m} is the membrane capacitance, *g*_{l} is the leak conductance, *E*_{l} is the membrane reversal potential, *ḡ*_{c} is the density of channels of type *c* (Na^{+}, K^{+}, M), *E*_{c} is the reversal potential of *c*, (*m*, *h*, *n*, *p*) are the respective channel gating kinetic variables, and *ση*(*t*) is the intrinsic neural noise. The right-hand side of the voltage dynamics is composed of a leak current, a voltage-dependent Na^{+} current, a delayed-rectifier K^{+} current, a slow voltage-dependent K^{+} current responsible for spike-frequency adaptation, and an injected current *I*_{inj}. Channel gating variables *q* have dynamics fully characterized by the neuron membrane potential *V*, given the respective steady-state *q*_{∞}(*V*) and time constant *τ*_{q}(*V*) (details in [66]). Two additional parameters are implicit in the functions *q*_{∞}(*V*) and *τ*_{q}(*V*): *V*_{T} adjusts the spike threshold through *m*_{∞}, *h*_{∞}, *n*_{∞}, *τ*_{m}, *τ*_{h} and *τ*_{n}; *τ*_{max} scales the time constant of adaptation through *τ*_{p}(*V*) (details in [66]). We set *E*_{Na} = 53 mV and *E*_{K} = −107 mV, similar to the values used for simulations in the Allen Cell Types Database (http://help.brain-map.org/download/attachments/8323525/BiophysModelPeri.pdf).

We applied SNPE to infer the posterior over 8 parameters (*ḡ*_{Na}, *ḡ*_{K}, *ḡ*_{l}, *ḡ*_{M}, *τ*_{max}, *V*_{T}, *σ*, *E*_{l}), given 7 voltage features (number of spikes, mean resting potential, standard deviation of the resting potential, and the first 4 voltage moments: mean, standard deviation, skewness and kurtosis).
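A sketch of how these 7 features can be computed from a voltage trace follows; the spike threshold, the stimulation window handling, and the toy trace are illustrative assumptions.

```python
import numpy as np

def voltage_features(v, t, stim_on, stim_off, thresh=-10.0):
    """The 7 summary features described above, for a voltage trace v (mV)
    sampled at times t (ms); the spike threshold is a guess."""
    during = (t >= stim_on) & (t <= stim_off)
    vd, vr = v[during], v[~during]
    # number of spikes: upward threshold crossings during stimulation
    n_spikes = np.sum((vd[:-1] < thresh) & (vd[1:] >= thresh))
    # resting-potential statistics outside the stimulation window
    rest_mean, rest_std = vr.mean(), vr.std()
    # first four moments of the voltage during stimulation
    mu, sd = vd.mean(), vd.std()
    zc = (vd - mu) / sd
    return np.array([n_spikes, rest_mean, rest_std, mu, sd,
                     (zc**3).mean(), (zc**4).mean()])

t = np.arange(0.0, 200.0, 0.1)             # ms
v = -70.0 + 5.0 * np.sin(t / 5.0)          # toy trace without spikes
v[(t > 60.0) & (t < 140.0)] += 40.0        # crude depolarization step
feats = voltage_features(v, t, stim_on=60.0, stim_off=140.0)
```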

The prior distribution over the parameters was uniform, *θ* ~ 𝒰(*p*_{low}, *p*_{high}),
where *p*_{low} = [0.5, 10^{−4}, 10^{−4}, 10^{−4}, 50, 40, 10^{−4}, 35] and *p*_{high} = [80, 15, 0.6, 0.6, 3000, 90, 0.15, 100]. These ranges are similar to the ones obtained in [66].

For inference in simulated data, we used a single round of 100000 simulations (*θ* ∈ ℝ^{8}, **x** ∈ ℝ^{11}). The density estimator was a masked autoregressive flow (MAF) [41] with five MADEs with [50, 50] hidden units each.

For the inference on in vitro recordings from mouse cortex (Allen Cell Types Database, https://celltypes.brain-map.org/data), we selected 8 recordings corresponding to spiny neurons with at least 10 spikes during the current-clamp stimulation. The respective cell identities and sweeps are: (518290966,57), (509881736,39), (566517779,46), (567399060,38), (569469018,44), (532571720,42), (555060623,34), (534524026,29). For each recording, SNPE-B was run for 2 rounds with 125000 Hodgkin–Huxley simulations each, and the posterior was approximated by a mixture of two Gaussians. In this case, the density estimator was composed of two fully connected layers of 100 units each.

### Circuit model of the crustacean stomatogastric ganglion

We used extracellular recordings from the crab *Cancer borealis* [71]. The preparations from the stomatogastric ganglion were decentralized, i.e. the input from descending modulatory inputs was removed. The data was recorded at a temperature of 11 °C. See Haddad & Marder (2018) [71] for full experimental details.

We simulated the circuit model of the crustacean stomatogastric ganglion by adapting a model described in [7]. The model is composed of three single-compartment neurons, AB/PD, LP, and PY, where the electrically coupled AB and PD neurons are modeled as a single neuron. Each of the model neurons contains 8 currents, a Na^{+} current *I*_{Na}, a fast and a slow transient Ca^{2+} current *I*_{CaT} and *I*_{CaS}, a transient K^{+} current *I*_{A}, a Ca^{2+}-dependent K^{+} current *I*_{KCa}, a delayed rectifier K^{+} current *I*_{Kd}, a hyperpolarization-activated inward current *I*_{H}, and a leak current *I*_{leak}. In addition, the model contains 7 synapses. As in [7], these synapses were simulated using a standard model of synaptic dynamics [128]. The synaptic input current into the neurons is given by *I*_{s} = *g*_{s}*s*(*V*_{post} − *E*_{s}), where *g*_{s} is the maximal synapse conductance, *V*_{post} the membrane potential of the postsynaptic neuron, and *E*_{s} the reversal potential of the synapse. The evolution of the activation variable *s* is given by d*s*/d*t* = (*s̄*(*V*_{pre}) − *s*)/*τ*_{s}, with *s̄*(*V*_{pre}) = 1/(1 + exp((*V*_{th} − *V*_{pre})/*δ*)) and *τ*_{s} = (1 − *s̄*(*V*_{pre}))/*k*_{−}.

Here, *V*_{pre} is the membrane potential of the presynaptic neuron, *V*_{th} is the half-activation voltage of the synapse, *δ* sets the slope of the activation curve, and *k*_{−} is the rate constant for transmitter-receptor dissociation.
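A sketch of this synapse model, integrated with an exponential-Euler step for stability (the drive and conductance values are toy choices):

```python
import numpy as np

def synapse_current(V_pre, V_post, dt, g_s, E_s, k_minus,
                    V_th=-35.0, delta=5.0, s0=0.0):
    """Standard synaptic dynamics: s relaxes toward s_bar(V_pre) with
    time constant tau_s = (1 - s_bar)/k_minus, and the synaptic current
    is I_s = g_s * s * (V_post - E_s)."""
    s = s0
    I = np.empty(len(V_pre))
    for i in range(len(V_pre)):
        s_bar = 1.0 / (1.0 + np.exp((V_th - V_pre[i]) / delta))
        tau = (1.0 - s_bar) / k_minus
        s = s_bar + (s - s_bar) * np.exp(-dt / tau)   # exponential Euler
        I[i] = g_s * s * (V_post[i] - E_s)
    return I

# glutamatergic synapse (E_s = -70 mV, k_minus = 1/40 ms^-1) driven by a
# constant depolarized presynaptic potential (toy input, 5 ms at dt = 0.025 ms)
n = 200
I = synapse_current(np.full(n, -10.0), np.full(n, -50.0),
                    dt=0.025, g_s=10.0, E_s=-70.0, k_minus=1.0 / 40.0)
```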

As in [7], two types of synapses were modeled, since AB, LP, and PY are glutamatergic neurons whereas PD is cholinergic. We set *E*_{s} = −70 mV and *k*_{−} = 1/40 ms for all glutamatergic synapses, and *E*_{s} = −80 mV and *k*_{−} = 1/100 ms for all cholinergic synapses. For both synapse types, we set *V*_{th} = −35 mV and *δ* = 5 mV.

For each set of membrane and synaptic conductances, we numerically simulated the rhythm for 10 seconds with a step size of 0.025 ms. To make the model stochastic, at each time step, we added Gaussian noise with a standard deviation of 0.001 mV to the input of each neuron.

We applied SNPE to infer the posterior over 24 membrane parameters and 7 synaptic parameters, i.e. 31 parameters in total. The 7 synaptic parameters were the maximal conductances *g*_{s} of all synapses in the circuit, each of which was varied uniformly in logarithmic domain from 0.01 nS to 1000 nS, with the exception of the synapse from AB to LP, which was varied uniformly in logarithmic domain from 0.01 nS to 10000 nS. The membrane parameters were the maximal membrane conductances for each of the neurons. The membrane conductances were varied over an extended range of previously reported values [7], which led us to the uniform prior bounds *p*_{low} = [0, 0, 0, 0, 0, 25, 0, 0] mS cm^{−2} and *p*_{high} = [500, 7.5, 8, 60, 15, 150, 0.2, 0.01] mS cm^{−2} for the maximal membrane conductances of the AB neuron, *p*_{low} = [0, 0, 2, 10, 0, 0, 0, 0.01] mS cm^{−2} and *p*_{high} = [200, 2.5, 12, 60, 10, 125, 0.06, 0.04] mS cm^{−2} for the maximal membrane conductances of the LP neuron, and *p*_{low} = [0, 0, 0, 30, 0, 50, 0, 0] mS cm^{−2} and *p*_{high} = [600, 12.5, 4, 60, 5, 150, 0.06, 0.04] mS cm^{−2} for the maximal membrane conductances of the PY neuron. The order of the membrane currents was: [Na, CaT, CaS, A, KCa, Kd, H, leak].

We used the 15 summary statistics proposed by [7], and extended them by 3 additional features. The features proposed by [7] are 15 salient features of the pyloric rhythm, namely: cycle period *T* (s), AB/PD burst duration, LP burst duration, PY burst duration, gap AB/PD end to LP start, gap LP end to PY start, delay AB/PD start to LP start, delay LP start to PY start, AB/PD duty cycle *d*_{AB}, LP duty cycle *d*_{LP}, PY duty cycle *d*_{PY}, phase gap AB/PD end to LP start Δ*ϕ*_{AB-LP}, phase gap LP end to PY start Δ*ϕ*_{LP-PY}, LP start phase *ϕ*_{LP}, and PY start phase *ϕ*_{PY}. Note that several of these values are only defined if each neuron produces rhythmic bursting behavior. In addition, for each of the three neurons, we used one feature that describes the maximal duration of its voltage being above −30 mV. We did this as we observed plateaus at around −10 mV during the onset of bursts, and wanted to distinguish such activity traces from others. If the maximal duration was below 5 ms, we set this feature to 5 ms. To extract the summary statistics from the observed experimental data, we first found spikes by searching for local maxima above a hand-picked voltage threshold, and then extracted the 15 features described above. We set the additional 3 features to 5 ms.
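The additional plateau feature can be sketched as follows (the run-detection logic and toy trace are illustrative; the paper's exact implementation may differ):

```python
import numpy as np

def plateau_feature(v, t, v_thresh=-30.0, floor=5.0):
    """Maximal duration (ms) for which the voltage stays above -30 mV,
    clipped below at 5 ms, as described above."""
    above = v > v_thresh
    edges = np.diff(above.astype(int))
    starts = np.where(edges == 1)[0] + 1     # first index of each run
    ends = np.where(edges == -1)[0] + 1      # first index after each run
    if above[0]:
        starts = np.r_[0, starts]
    if above[-1]:
        ends = np.r_[ends, len(v)]
    if len(starts) == 0:
        return floor
    durations = t[ends - 1] - t[starts]
    return max(durations.max(), floor)

t = np.arange(0.0, 100.0, 0.025)             # ms, 0.025 ms steps as above
v = np.full_like(t, -65.0)
v[(t >= 10.0) & (t < 30.0)] = -10.0          # a 20 ms plateau
```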

We used SNPE to infer the posterior distribution given the 18 summary statistics extracted from the experimental data. For inference, we used a single round with 18.5 million samples, out of which 174,000 samples contained bursts in all neurons. We therefore used these 174,000 samples with well-defined summary statistics for training the inference network. The density estimator was a masked autoregressive flow (MAF) [41] with five MADEs with [200, 400] hidden units each. The synaptic conductances were transformed into logarithmic space before training and for the entire analysis.

#### Finding paths in the posterior

In order to find directions of robust network output, we searched for a path of high posterior probability. First, as in [7], we aimed to find 2 similar model outputs with disparate parameters. To do so, we sampled from the posterior and searched for 2 parameter sets whose summary statistics were within 0.1 standard deviations (computed over all 174,000 samples) of the observed experimental data, but whose parameters were strongly disparate from each other. In the following, we denote the obtained parameter sets by *θ*_{s} and *θ*_{g}. Second, in order to identify whether network output can be maintained along a continuous path between these 2 samples, we searched for a connection in parameter space lying in regions of high posterior probability. To do so, we considered the connection between the samples as a path *γ*(*s*) and minimized the path integral of the negative log-posterior along it (equation 1).

To minimize this term, we parameterized the path *γ*(*s*) using sinusoidal basis functions with coefficients *α*_{n,k}. These basis functions are defined such that, for any coefficients *α*_{n,k}, the starting and end points of the path are exactly the two parameter sets defined above: *γ*(0) = *θ*_{s} and *γ*(1) = *θ*_{g}. With this formulation, we have framed the problem of finding the path as an unconstrained optimization problem over the parameters *α*_{n,k}. We can therefore minimize the path integral *L* using gradient descent over *α*_{n,k}. For numerical simulations, we approximated the integral in equation 1 as a sum over 80 points along the path and used 2 basis functions for each of the 31 dimensions, i.e. *K* = 2.
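One standard construction with these endpoint properties is a straight line plus sine terms that vanish at both ends; the sketch below uses this form, though the exact basis used in the paper is not reproduced here.

```python
import numpy as np

def path(s, theta_s, theta_g, alpha):
    """Path gamma(s) from theta_s (s=0) to theta_g (s=1): a straight line
    plus sinusoidal terms that vanish at both endpoints, so any
    coefficients alpha (shape D x K) yield a valid connecting path."""
    s = np.atleast_1d(np.asarray(s, dtype=float))[:, None]  # (S, 1)
    D, K = alpha.shape
    k = np.arange(1, K + 1)[None, :]                        # (1, K)
    line = (1.0 - s) * theta_s + s * theta_g                # (S, D)
    wiggles = np.sin(np.pi * s * k) @ alpha.T               # (S, D)
    return line + wiggles

theta_s, theta_g = np.zeros(3), np.ones(3)
alpha = np.random.default_rng(1).normal(size=(3, 2))        # K = 2 per dimension
gamma = path(np.linspace(0.0, 1.0, 80), theta_s, theta_g, alpha)
```

The 80 points of `gamma` correspond to the discretization of the path integral described above.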

In order to demonstrate the sensitivity of the pyloric network, we aimed to find a path along which the circuit output quickly breaks down. For this, we picked a starting point along the high-probability path and then minimized the posterior probability. In addition, we enforced that this orthogonal path lies within the disk orthogonal to the high-probability path, leading to a constrained optimization problem: minimizing log *p*(*θ*|**x**_{o}) subject to the path remaining orthogonal to *n*, where *n* is the tangent vector along the path of high probability. This optimization problem can be solved using the gradient projection method [129], with projection matrix *P* = 𝕀 − *nn*^{T} and 𝕀 indicating the identity matrix. Each projected gradient update is a step along the orthogonal path. We let the optimization run until the distance along the path was 1/27 of the distance along the high-probability path.

#### Identifying conditional correlations

In order to investigate compensation mechanisms in the STG, we compared marginal and conditional correlations. For the marginal correlation matrix in Fig. 6B, we calculated the Pearson correlation coefficient based on 1.26 million samples from the posterior distribution *p*(*θ*|**x**). To find the 2-dimensional conditional distribution for any pair of parameters, we fixed all other parameters to values taken from an arbitrary posterior sample, and varied the remaining 2 on an evenly spaced grid with 50 points along each dimension for Fig. 6C and with 20 points along each dimension for Fig. 6D, covering the entire prior space. We evaluated the posterior distribution at every value on this grid. We then calculated the conditional correlation as the Pearson correlation coefficient over this distribution. For the 1-dimensional conditional distribution, we varied only 1 parameter and kept all others fixed. Lastly, in Fig. 6D, we sampled 500 parameter sets from the posterior, computed the respective conditional posteriors and conditional correlation matrices, and took the average over the conditional correlation matrices.
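The grid-based conditional correlation can be sketched as follows; `logp_fn` is a hypothetical interface standing in for the trained posterior network evaluated with all other parameters fixed, and the sanity check uses a correlated Gaussian with known correlation.

```python
import numpy as np

def conditional_correlation(logp_fn, grid1, grid2):
    """Pearson correlation of a 2D distribution evaluated on a grid.
    logp_fn(t1, t2) returns the (unnormalized) log posterior for a pair
    of parameters, with all other parameters held fixed."""
    T1, T2 = np.meshgrid(grid1, grid2, indexing="ij")
    p = np.exp(logp_fn(T1, T2))
    p /= p.sum()                                   # normalize over the grid
    m1, m2 = (p * T1).sum(), (p * T2).sum()        # means
    v1 = (p * (T1 - m1) ** 2).sum()                # variances
    v2 = (p * (T2 - m2) ** 2).sum()
    cov = (p * (T1 - m1) * (T2 - m2)).sum()        # covariance
    return cov / np.sqrt(v1 * v2)

# sanity check on a bivariate Gaussian with correlation rho = 0.8
rho = 0.8
logp = lambda a, b: -(a**2 - 2 * rho * a * b + b**2) / (2 * (1 - rho**2))
g = np.linspace(-5.0, 5.0, 200)
r = conditional_correlation(logp, g, g)
```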

## Acknowledgments

We thank Mahmood S. Hoseini and Michael Stryker for sharing their data for Fig. 2, and Philipp Berens, Sean Bittner, Jan Boelts, John Cunningham, Richard Gao, Scott Linderman, Eve Marder, Iain Murray, George Papamakarios, Astrid Prinz, Auguste Schulz and Srinivas Turaga for discussions and/or comments on the manuscript. This work was supported by the German Research Foundation (DFG) through SFB 1233 ‘Robust Vision’, (276693517), SFB 1089 ‘Synaptic Microcircuits’ and SPP 2041 ‘Computational Connectomics’, the German Federal Ministry of Education and Research (BMBF, project ‘ADIMEM’, FKZ 01IS18052 A-D), and UK Research and Innovation, Biotechnology and Biological Sciences Research Council (UKRI-BBSRC) through BB/N019512/1.

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].
- [71].
- [72].
- [73].
- [74].
- [75].
- [76].
- [77].
- [78].
- [79].
- [80].
- [81].
- [82].
- [83].
- [84].
- [85].
- [86].
- [87].
- [88].
- [89].
- [90].
- [91].
- [92].
- [93].
- [94].
- [95].
- [96].
- [97].
- [98].
- [99].
- [100].
- [101].
- [102].
- [103].
- [104].
- [105].
- [106].
- [107].
- [108].
- [109].
- [110].
- [111].
- [112].
- [113].
- [114].
- [115].
- [116].
- [117].
- [118].
- [119].
- [120].
- [121].
- [122].
- [123].
- [124].
- [125].
- [126].
- [127].
- [128].
- [129].