Model-based inference of synaptic plasticity rules

Inferring the synaptic plasticity rules that govern learning in the brain is a key challenge in neuroscience. We present a novel computational method to infer these rules from experimental data, applicable to both neural and behavioral data. Our approach approximates plasticity rules using a parameterized function, employing either truncated Taylor series for theoretical interpretability or multilayer perceptrons. These plasticity parameters are optimized via gradient descent over entire trajectories to align closely with observed neural activity or behavioral learning dynamics. This method can uncover complex rules that induce long, nonlinear temporal dependencies, particularly those involving factors like postsynaptic activity and current synaptic weights. We validate our approach through simulations, successfully recovering established rules such as Oja's, as well as more intricate plasticity rules with reward-modulated terms. We assess the robustness of our technique to noise and apply it to behavioral data from Drosophila in a probabilistic reward-learning experiment. Notably, our findings reveal an active forgetting component in reward learning in flies, improving predictive accuracy over previous models. This modeling framework offers a promising new avenue for elucidating the computational principles of synaptic plasticity and learning in the brain.


Introduction
Synaptic plasticity, the ability of synapses to change their strength, is a key neural mechanism underlying learning and memory in the brain. These synaptic updates are driven by neuronal activity, and they in turn modify the dynamics of neural circuits. Advances in neuroscience have enabled the recording of neuronal activity on an unprecedented scale (Steinmetz et al., 2018; Vanwalleghem et al., 2018; Zhang et al., 2023), and connectome data for various organisms are becoming increasingly available (Winding et al., 2023; Takemura et al., 2023; Hildebrand et al., 2017; Scheffer et al., 2020). However, the inaccessibility of direct large-scale recordings of synaptic dynamics leaves the identification of biological learning rules an open challenge. Existing neuroscience literature (Abbott & Nelson, 2000; Morrison et al., 2008) suggests that synaptic changes are functions of local variables such as presynaptic activity, postsynaptic activity, and current synaptic weight, as well as a global reward signal. Uncovering the specific form of this function in different brain circuits promises profound biological insights and holds practical significance for developing more biologically plausible learning algorithms for AI, particularly with neuromorphic implementations (Zenke & Neftci, 2021).
In this paper, we introduce a gradient-based method for inferring synaptic plasticity rules. Our method optimizes parameterized plasticity rules to align with both neural and behavioral data, thereby elucidating the mechanisms governing synaptic changes in biological systems. We utilize interpretable models of plasticity, allowing direct comparisons with existing biological theories and addressing specific questions, such as the role of weight decay in synaptic plasticity or of postsynaptic dependence. We validate our approach for recovering plasticity rules using synthetic neural activity or behavior. Finally, applying our model to behavioral data from fruit flies, we uncover an active forgetting mechanism in mushroom body circuits related to decision making, learning, and memory.
This modeling framework offers new opportunities for exploring the core mechanisms behind learning and memory processes. It enables the swift evaluation of a wide range of biological theories and can be readily adapted to various experimental paradigms. Additionally, gaining insights into synaptic plasticity can help in the creation of artificial intelligence models that more accurately emulate the complexities of biological systems, without requiring the massive energy consumption needed for training current large language models (de Vries, 2023).

Method overview
Our goal is to infer the synaptic plasticity function by examining neural activity or behavioral trajectories in a learning organism. Specifically, we aim to find a function that prescribes changes in synaptic weights based on relevant biological variables. For simplicity, we consider a model with plasticity localized to a single layer of a neural network:

$$\mathbf{y}^{(t)} = \sigma\big(W^{(t)} \mathbf{x}^{(t)}\big), \qquad (1)$$

where the vector $\mathbf{x}^{(t)}$ represents the input to the plastic layer (Figure 1, "stimulus"), $\mathbf{y}^{(t)}$ is the resulting postsynaptic neuron activity at time $t$, and $\sigma$ is the neuronal non-linearity. The synaptic weight matrix $W^{(t)}$ is updated at each time step based on a parameterized, biologically plausible plasticity function $g_\theta$. The change in synaptic weight between neurons $i$ and $j$ is given by

$$\Delta w_{ij}^{(t)} = g_\theta\big(x_j^{(t)}, y_i^{(t)}, w_{ij}^{(t)}, r^{(t)}\big), \qquad (2)$$

where $\theta$ are the (trainable) parameters of the function, $x_j$ is the presynaptic neural activity, $y_i$ the postsynaptic activity, $w_{ij}^{(t)}$ the current synaptic weight between neurons $i$ and $j$, and $r^{(t)}$ is a global reward signal that influences all synaptic connections. However, it may be the case that we do not have direct access to the neuronal firing rates $\mathbf{y}^{(t)}$. We therefore further define a (fixed) readout function $f$ that determines the observable variables $\mathbf{m}^{(t)}$ of the network, given by

$$\mathbf{m}^{(t)} = f\big(\mathbf{y}^{(t)}\big). \qquad (3)$$

In the context of neural activity fitting, the readout is a subset of $\mathbf{y}^{(t)}$, whereas for behavioral models the readout aggregates $\mathbf{y}^{(t)}$ to yield the probability of a specific action. We introduce our specific choices for readout functions in the following sections.
We use stochastic gradient descent with the Adam optimizer (Kingma & Ba, 2014) to optimize the parameters $\theta$ of the plasticity rule $g_\theta$. At each iteration, we use the model (Equations 1-3) to generate a length-$T$ trajectory $\mathbf{m}^{(1)}, \ldots, \mathbf{m}^{(T)}$ (Figure 1, blue traces), driven by the input stimuli $\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(T)}$ (Figure 1, white box). We then use backpropagation through time to compute the gradient of the loss (Figure 1, purple) between the model trajectory and the corresponding experimental observations $\mathbf{o}^{(1)}, \ldots, \mathbf{o}^{(T)}$ (Figure 1, orange) generated using the same input stimuli:

$$\mathcal{L}(\theta) = \sum_{t=1}^{T} \ell\big(\mathbf{m}^{(t)}, \mathbf{o}^{(t)}\big), \qquad (4)$$

where the choice of $\ell$ depends on the particular modeling scenario, specified in the following sections. In practice, Equation 4 may also be summed over multiple trajectories to generate a minibatch.
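To make the fitting loop concrete, here is a minimal JAX sketch of Equations 1-4 for a Taylor-parameterized rule without reward. The network shapes, the sigmoid non-linearity, the degree-2 truncation, and all names are our own illustrative choices, not the authors' code.

```python
import jax
import jax.numpy as jnp

def taylor_rule(theta, x, y, w):
    # Truncated Taylor-series rule: Dw_ij = sum_{a,b,c} theta[a,b,c] x_j^a y_i^b w_ij^c,
    # with each exponent truncated at degree 2. A reward-modulated rule adds a fourth index.
    p = jnp.arange(3)
    xp = x[None, :] ** p[:, None]            # (3, n_in):  x_j^a
    yp = y[None, :] ** p[:, None]            # (3, n_out): y_i^b
    wp = w[None, :, :] ** p[:, None, None]   # (3, n_out, n_in): w_ij^c
    return jnp.einsum('abc,aj,bi,cij->ij', theta, xp, yp, wp)

def simulate(theta, w0, xs):
    # Roll the plastic layer forward (Eqs. 1-2) and return the readout m^(t) = y^(t).
    def step(w, x):
        y = jax.nn.sigmoid(w @ x)                     # Eq. 1
        return w + taylor_rule(theta, x, y, w), y     # Eq. 2
    _, ms = jax.lax.scan(step, w0, xs)
    return ms

def loss(theta, w0, xs, obs):
    # Eq. 4 with a squared-error l, i.e. the MSE loss of Eq. 7.
    return jnp.sum((simulate(theta, w0, xs) - obs) ** 2)

# Gradients flow through the whole trajectory: backpropagation through time via scan.
n_in, n_out, T = 10, 5, 50
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
w0 = 0.1 * jax.random.normal(k1, (n_out, n_in))
xs = jax.random.normal(k2, (T, n_in))
obs = jax.random.normal(k3, (T, n_out))   # stand-in for recorded activity o^(t)
theta = jnp.zeros((3, 3, 3))
grads = jax.grad(loss)(theta, w0, xs, obs)
```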

Inferring a plasticity rule from neural activity
To validate our approach on neural activity, we generate synthetic neural trajectories of observed outputs $\mathbf{o}^{(t)}$ from a single-layer feedforward network that undergoes synaptic plasticity according to the well-known Oja's rule (Oja, 1982). At each timestep, the weight updates depend on pre- and post-synaptic neuronal activity, as well as the strength of the synapse itself (Figure 2A, top):

$$\Delta w_{ij} = x_j y_i - y_i^2 w_{ij}, \qquad (5)$$

where we omit the time index $t$ for brevity (see subsection A.2 for details). To infer the plasticity rule, we use a model network with an architecture identical to the ground-truth network (Figure 2A, bottom): this is a reasonable assumption, as the model architecture can be designed based on the target circuit's connectome, which is becoming increasingly available in biology (Bentley et al., 2016; Hildebrand et al., 2017; Scheffer et al., 2020). Following previous work (Confavreux et al., 2020), we parameterize the model's plasticity function with a truncated Taylor series,

$$g_\theta\big(x_j, y_i, w_{ij}\big) = \sum_{\alpha,\beta,\gamma} \theta_{\alpha\beta\gamma}\, x_j^{\alpha}\, y_i^{\beta}\, w_{ij}^{\gamma}, \qquad (6)$$

where the coefficients $\theta_{\alpha\beta\gamma}$ are learned. Note that Oja's rule can be represented within this family of plasticity rules by setting $\theta_{110} = 1$, $\theta_{021} = -1$, and all others to zero. Finally, we compute the loss as the mean squared error (MSE) between the neural trajectories produced by the ground-truth network and the model:

$$\mathcal{L}(\theta) = \sum_{t=1}^{T} \big\| \mathbf{m}^{(t)} - \mathbf{o}^{(t)} \big\|_2^2, \qquad (7)$$

where we let $\mathbf{m}^{(t)} = \mathbf{y}^{(t)}$, assuming all neurons in the circuit are recorded (but see the next section for an analysis of sparse recordings).
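Continuing the JAX sketch from the method overview, we can check that this setting of the coefficients reproduces Oja's update exactly:

```python
# Oja's rule as a point in the Taylor family (Eq. 6): theta_110 = 1, theta_021 = -1.
theta_oja = jnp.zeros((3, 3, 3)).at[1, 1, 0].set(1.0).at[0, 2, 1].set(-1.0)

x = jax.random.normal(jax.random.PRNGKey(1), (n_in,))
y = jax.nn.sigmoid(w0 @ x)
dw = taylor_rule(theta_oja, x, y, w0)
# Closed form of Eq. 5: Dw_ij = x_j y_i - y_i^2 w_ij
assert jnp.allclose(dw, jnp.outer(y, x) - (y ** 2)[:, None] * w0)
```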

Recovering Oja's rule
Despite the fact that the model is optimized using neuronal trajectories, the error of the synaptic weight trajectories decreases over the course of training (Figure 2B), indicating that the model is successfully learning to approximate the ground-truth plasticity rule. More explicitly, examining the coefficients $\theta_{\alpha\beta\gamma}$ over the course of training illustrates the recovery of Oja's rule, as $\theta_{110}$ and $\theta_{021}$ approach $1$ and $-1$, respectively, and all others go to zero (Figure 2C).
To evaluate the robustness of our method, we assess how both noise and sparsity affect the model's performance (Figure 2D). We first consider the case where all neurons in the circuit are recorded, and vary the degree of additive Gaussian noise in the recorded neurons. Figure 2E summarizes the trained model's performance ($R^2$ score between the ground-truth and model synaptic weight trajectories) with increasing noise variance. To simulate varying sparsity levels, we take the readout $\mathbf{m}^{(t)} = f(\mathbf{y}^{(t)}) = (y_{k_1}, \ldots, y_{k_n})$ to be the activity of a random subset of $n$ of all $N$ postsynaptic neurons, and use the corresponding subset $(o_{k_1}, \ldots, o_{k_n})$ of recorded neurons from the ground-truth network in Equation 7 to optimize the plasticity rule. Our model maintains a high level of accuracy even when data is available from only 50% of the neurons (Figure 2F), which is particularly beneficial given that sparse neural recordings are common in experimental settings. However, the model struggles to learn a sparse set of parameters for the plasticity rule when faced with both high recording sparsity and noise; the evolution of the plasticity parameters during training in this case is illustrated in Figure 2G.
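In the earlier JAX sketch, these conditions amount to small changes in the loss. A sketch under our own assumptions (noise applied to the recorded traces; a fixed index set `keep` of observed neurons):

```python
def noisy_sparse_loss(theta, w0, xs, obs, key, sigma, keep):
    # Only the subset `keep` of postsynaptic neurons is observed, and the recordings
    # carry additive Gaussian noise of standard deviation `sigma` (cf. Figure 2D).
    noisy_obs = obs[:, keep] + sigma * jax.random.normal(key, obs[:, keep].shape)
    return jnp.sum((simulate(theta, w0, xs)[:, keep] - noisy_obs) ** 2)
```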

Inferring plasticity rules from behavior
Our approach can also be applied to behavioral data. This is particularly important because behavioral experiments are more widely available and easier to conduct than those that directly measure neural activity. We first validate the method on simulated behavior, mimicking decision-making experiments in which animals are presented with a series of stimuli that they choose to accept or reject. The animals' choices result in rewards and subsequent synaptic changes at behaviorally relevant synapses.
For this proof of principle, our ground-truth network architecture (Figure 3A, top) is inspired by recent studies that have successfully mapped observed behaviors to plasticity rules in the mushroom body (MB) of the fruit fly Drosophila melanogaster (Aso & Rubin, 2016; Rajagopalan et al., 2023; Li et al., 2020; Modi et al., 2020; Davis, 2023). In particular, this work indicates that the difference between received and expected reward information is instrumental in mediating synaptic plasticity (Rajagopalan et al., 2023), and that learning and forgetting happen on comparable timescales (Aso & Rubin, 2016).
The neural network is structured with three layers: an input layer comprising 100 neurons, an output layer of 1000 neurons with a sigmoid non-linearity, and a single readout neuron that calculates the average activity of the output layer to generate the probability of accepting a given stimulus, which is then used to sample a binary decision: "accept" or "reject" the presented stimulus. A probabilistic binary reward $R \in \{0, 1\}$ is provided based on the choice. The reward signal is common to all synapses and could be interpreted as a global neuromodulatory signal like dopamine. This reward leads to changes in the plastic weights of the network, determined by the underlying synaptic plasticity rule.
Plasticity occurs exclusively between the input and output layers. We simulate a covariance-based learning rule (Loewenstein & Seung, 2006) known from previous experiments (Rajagopalan et al., 2023). The change in synaptic weight $\Delta w_{ij}$ is determined by the presynaptic input $x_j$ and a global reward signal $r$. This reward signal is the deviation of the actual reward $R$ from its expected value $\mathbb{E}[R]$, which is calculated as a moving average over the last 10 trials. We neglect hypothetical dependencies on $y_i$ because they are known to not impact reward learning in the fly mushroom body (although see Table 1 and Appendix Table 2 for experiments with alternative plasticity rules):

$$\Delta w_{ij} = x_j\, r, \qquad r = R - \mathbb{E}[R]. \qquad (8)$$

We model a plastic layer of neural connections that gives rise to learned behavior. The synaptic weights of the model are initialized randomly, reflecting the fact that in real-world biological systems the initial synaptic configurations are usually unknown a priori. We consider a plasticity function parameterized through either a truncated Taylor series or a multilayer perceptron (MLP):

$$g_\theta^{\text{Taylor}}\big(x_j, y_i, w_{ij}, r\big) = \sum_{\alpha,\beta,\gamma,\delta} \theta_{\alpha\beta\gamma\delta}\, x_j^{\alpha}\, y_i^{\beta}\, w_{ij}^{\gamma}\, r^{\delta}, \qquad (9)$$

$$g_\theta^{\text{MLP}}\big(x_j, y_i, w_{ij}, r\big) = \text{MLP}_\theta\big(x_j, y_i, w_{ij}, r\big). \qquad (10)$$
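The following is a self-contained JAX sketch of one simulated block of trials under the ground-truth rule of Equation 8. The uniform odor representation, the reward probability, and the choice to apply the update on every trial are illustrative assumptions, not specifications from the text.

```python
import jax
import jax.numpy as jnp

n_in, n_out, p_reward = 100, 1000, 0.8   # p_reward: illustrative baiting probability

def trial(carry, key):
    # One trial under the ground-truth covariance rule (Eq. 8).
    w, r_hist = carry                                   # r_hist: last 10 rewards
    k_x, k_c, k_r = jax.random.split(key, 3)
    x = jax.random.uniform(k_x, (n_in,))                # stand-in odor representation
    y = jax.nn.sigmoid(w @ x)                           # output layer
    p_accept = y.mean()                                 # readout: acceptance probability
    accept = jax.random.bernoulli(k_c, p_accept).astype(jnp.float32)
    R = accept * jax.random.bernoulli(k_r, p_reward).astype(jnp.float32)
    r = R - r_hist.mean()                               # deviation from expected reward
    dw = r * jnp.broadcast_to(x, (n_out, n_in))         # Dw_ij = x_j r
    r_hist = jnp.roll(r_hist, -1).at[-1].set(R)         # update the moving-average window
    return (w + dw, r_hist), (accept, p_accept)

key = jax.random.PRNGKey(0)
w0 = 0.01 * jax.random.normal(key, (n_out, n_in))       # random initial weights
_, (choices, probs) = jax.lax.scan(trial, (w0, jnp.zeros(10)), jax.random.split(key, 80))
```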

Recovering reward-based plasticity from behavior
Figure 3B presents the weight dynamics of three networks: the ground-truth synaptic update mechanism, as well as those fitted with an MLP or a Taylor series. In the case of the MLP, we tested various architectures and highlight results for a 4-10-1 neuron topology. Our evaluation metrics, high $R^2$ values for both synaptic weights and neural activity, affirm the robustness of our models in capturing the observed data (Figure 3C). The method accurately discerns the plasticity coefficient of the ground-truth rule (Figure 3D,E), albeit with a reduced magnitude. The model also explains the observed behavior well (Figure 3F), where we use the percent deviance explained (see section 4) as the performance metric.
Finally, we consider alternative plasticity rules in the ground-truth network. Table 1 summarizes the recoverability of various reward-based plasticity rules for both the MLP and Taylor series frameworks, with results averaged over 3 random seeds. Note that solely reward-based rules (without $\mathbb{E}[R]$ or $w$) are strictly potentiating, as they lack the capacity for bidirectional plasticity. This unidirectional potentiation ultimately results in saturation of the sigmoidal non-linearity. It is therefore possible to simultaneously observe high $R^2$ values for neural activities and low $R^2$ values for weight trajectories.
Application: inferring plasticity in the fruit fly

In extending our model to biological data, we explore its applicability to the decision-making behavior in Drosophila melanogaster that inspired our simulated behavior results. Recent research (Rajagopalan et al., 2023) employed logistic regression to infer learning rules governing synaptic plasticity in the mushroom body, the fly's neural center for learning and memory. However, logistic regression cannot be used to infer plasticity rules that incorporate recurrent temporal dependencies, such as those that depend on current synaptic weights. Our method offers a more general approach. Specifically, we apply our model to behavioral data obtained from flies engaged in a two-alternative choice task, as outlined in Figure 4A. This allows us to investigate two key questions concerning the influence of synaptic weight on the plasticity rules governing the mushroom body.

Experimental setup and details
In the experimental setup, individual flies are placed in a symmetrical Y-arena where they are presented with a choice between two odor cues (Rajagopalan et al., 2023). Each trial starts with the fly in an arm filled with clean air (Fig. 4A, left). The remaining two arms are randomly filled with two different odors, and the fly is free to navigate between the three arms. When the fly enters the 'reward zone' at the end of an odorized arm, a choice is considered to have been made (Fig. 4A, right). Rewards are then dispensed probabilistically, based on the odor chosen. For model fitting, we use data from 18 flies, each subjected to a protocol that mirrors the trial and block structures of the simulated experiments presented previously. Over time, flies consistently showed a preference for the odor associated with a higher probability of reward, and this preference adapted to changes in the relative value of the options (Fig. 4B; example fly).

Plasticity in the fruit fly includes a synaptic weight decay
Existing behavioral studies in fruit flies have shown that these insects can forget learned associations between stimuli and rewards over time (Shuai et al., 2015; Aso & Rubin, 2016; Berry et al., 2018; Gkanias et al., 2022). One prevailing hypothesis attributes this forgetting to homeostatic adjustments in synaptic strength within the mushroom body (Davis & Zhong, 2017; Davis, 2023). However, earlier statistical approaches aimed at estimating the underlying synaptic plasticity rule in the mushroom body were unable to account for recurrent dependencies such as synapse strength (Rajagopalan et al., 2023). Here we explore two types of plasticity rules: one based solely on reward and presynaptic activity, and another that incorporates a term dependent on the current synaptic weight. Both rule types allocate significant positive weights to a term representing the product of presynaptic activity and reward (Fig. 4C, gray). Our results indicate that the model with a weight-dependent term offers a better fit to the observed fly behavior (Wilcoxon signed-rank test: $p = 5 \times 10^{-5}$; Fig. 4D), whereas the model without it matched the performance reported in Rajagopalan et al. (2023). Intriguingly, our analysis additionally reveals that the inferred learning rule assigns a negative value to the weight-dependent term (Fig. 4C). This negative sign aligns with the hypothesis that a weight-dependent decay mechanism operates at these synapses. The relative magnitude of this decay term compared to the positive learning-related terms suggests that forgetting happens over a slightly longer timescale than learning, in agreement with observed timescales of forgetting in behavioral experiments (Shuai et al., 2015; Davis & Zhong, 2017).
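Schematically, the better-fitting rule therefore has the form (the coefficient names follow the $\theta$ notation of Figure 4C; signs as inferred):

$$\Delta w_{ij} \;=\; \theta\, x_j\, r \;+\; \theta_{001}\, w_{ij}, \qquad \theta > 0, \quad \theta_{001} < 0,$$

so the weight-dependent term acts as a decay whose magnitude, relative to the learning term, sets the timescale of forgetting.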

Incorporating reward expectation provides better fit than reward alone
The previous study used reward expectations to generate bidirectional synaptic plasticity (Rajagopalan et al., 2023). Our discovery of a negative weight-dependent component in the plasticity rule provides an alternative mechanism for bidirectional plasticity, raising the question of whether the neural circuit really needs to calculate reward expectation. Could a plasticity rule incorporating the product of presynaptic activity and absolute reward combine with a weight-dependent homeostatic term to approximate a plasticity rule that involves reward expectation? To answer this, we contrast two models: one using only the absolute reward and another using reward adjusted by its expectation, both complemented by weight-dependent terms. Our analyses show that adding a weight-dependent term enhances the predictive power of both models (Fig. 4E,F). However, the model that also factors in reward expectations provides a superior fit for the majority of flies in the data set (Wilcoxon signed-rank test: $p = 0.067$; Fig. 4F). These preliminary findings reaffirm the utility of reward expectations for fly learning, and larger behavioral datasets could increase the statistical significance of the trend. Overall, our model-based inference approach, when applied to fly choice behavior, suggests that synaptic plasticity rules in the mushroom body of fruit flies are more intricate than previously understood. These insights could potentially inspire further experimental work to confirm the roles of weight-dependent homeostatic plasticity and reward expectation in shaping learning rules.
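Written schematically, the comparison is between the two rule families

$$\Delta w_{ij} = \theta_1\, x_j\, R + \theta_2\, w_{ij} \qquad \text{and} \qquad \Delta w_{ij} = \theta_1\, x_j\,\big(R - \mathbb{E}[R]\big) + \theta_2\, w_{ij},$$

where $\theta_1, \theta_2$ are fitted coefficients and only the second family retains the reward-expectation computation.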

Related work
Recent work has begun to address the question of understanding the computational principles governing synaptic plasticity by developing data-driven frameworks that infer the underlying plasticity rules from neural recordings. Lim et al. (2015) infer plasticity rules, as well as the neuronal transfer function, from firing-rate distributions before and after repeated viewing of stimuli in a familiarity task. The authors make assumptions about the distribution of firing rates, as well as a first-order functional form of the learning rule. Chen et al. (2023) elaborate on this approach, fitting a plasticity rule by either a Gaussian process or a Taylor expansion, either directly to the synaptic weight updates or indirectly through neural activity over the course of learning. Both approaches consider only the difference in synaptic weights before and after learning. In contrast, our approach fits neural firing-rate trajectories over the course of learning and can be adapted to fit any parameterized plasticity rule.
Other work infers learning rules from behavior instead. Ashwood et al. (2020) use a Bayesian framework to fit the parameters of learning rules in a rodent decision-making task. The authors explicitly optimize the weight trajectory in addition to the parameters of the learning rules, requiring an approximation to the posterior over the weights. Our approach directly optimizes the match between the model and either neural or behavioral data, as defined by a pre-determined loss function. Interestingly, despite this indirect optimization, we observe matching in the weight trajectories as well. Rajagopalan et al. (2023) fit plasticity rules in the same fly decision-making task we consider here. They assumed that the learning rule depended only on presynaptic activity and reward, which recasts the problem as logistic regression and permits easy optimization. Our approach allows us to account for arbitrary dependencies, such as on postsynaptic activities and synaptic weight values, and we thereby identify a weight decay term that leads to active forgetting.
Ramesh et al. (2023) also consider optimization of plasticity rules based on neural trajectories. Unlike our approach, which uses an explicit loss function, the authors use a generative adversarial network (GAN) approach to construct a generator network, endowed with a plasticity rule, that produces neural trajectories indistinguishable by a discriminator network from ground-truth trajectories.
Although, in principle, this approach can account for arbitrary and unknown noise distributions, it comes at the cost of high compute requirements, a need for large amounts of data, and potential training instability, all well-known limitations of GANs. In practice, it is common to make an assumption about the noise model through an appropriately defined loss function (e.g., Gaussian noise for the MSE loss we use here). Importantly, the authors note a degeneracy of plasticity rules: different rules can lead to similar neural dynamics. We see similar results, although we interpret this as "sloppiness" (Gutenkunst et al., 2007), i.e., overparameterized models being underconstrained by the data (for example, the functions $x$ and $x^2$ are indistinguishable if the only sampled values of $x$ are 0 and 1), and hypothesize that in the infinite-data limit the fitted plasticity rules would, in fact, be unique.
Previous work has also considered inferring plasticity rules directly from spiking data (Stevenson & Koerding, 2011; Robinson et al., 2016; Linderman et al., 2014; Wei & Stevenson, 2021) or selecting families of plausible rules in spiking neural network models (Confavreux et al., 2024). Due to the gradient-based nature of our optimization technique, our proposed approach can account for such data by converting spike trains to a rate-based representation through smoothing. Alternatively, black-box optimization techniques such as evolutionary algorithms can be used to circumvent the need for computing gradients, allowing non-differentiable plasticity rules like spike-timing-dependent plasticity to be used as model candidates.
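As an illustration, one simple smoothing choice is a causal exponential filter; the kernel and time constant here are our own assumptions, not prescribed by the text:

```python
import jax
import jax.numpy as jnp

def spikes_to_rates(spikes, tau=20.0, dt=1.0):
    # Smooth binary spike trains of shape (T, n_neurons) into firing rates with a
    # causal exponential filter of time constant tau (in the same units as dt).
    decay = jnp.exp(-dt / tau)
    def step(rate, s):
        rate = decay * rate + (1.0 - decay) * s / dt
        return rate, rate
    _, rates = jax.lax.scan(step, jnp.zeros(spikes.shape[1]), spikes)
    return rates
```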
Alternatively, meta-learning techniques (Thrun & Pratt, 2012) can be used to discover synaptic plasticity rules optimized for specific computational tasks (Tyulmankov et al., 2022; Najarro & Risi, 2020; Confavreux et al., 2020; Bengio et al., 1990). The plasticity rules are represented as parameterized functions of pre- and post-synaptic activity and optimized through gradient descent or evolutionary algorithms to produce a desired network output. However, the task may not be well-defined in biological scenarios, and the network's computation may not be known a priori. Our method obviates the need for specifying the task, directly inferring plasticity rules from recorded neural activity or behavioral trajectories.
Finally, Nayebi et al. (2020) do not fit the parameters of a learning rule at all, but use a classifier to distinguish among four classes of learning rules based on various statistics (e.g., mean, variance) of network observables (e.g., activities, weights). Similarly, Portes et al. (2022) propose a metric for distinguishing between supervised and reinforcement learning algorithms based on changes in neural activity flow fields in a recurrent neural network.

Limitations and future work
Despite its strengths, our model has several limitations that offer avenues for future research. One such limitation is the lack of temporal dependencies in the synaptic plasticity rule, which neglects biological phenomena like metaplasticity (Abraham, 2008). Extending our model to account for such temporal dynamics would increase its biological fidelity. Another issue is the model's "sloppiness" in the solution space: it fails to identify a unique, sparse solution even with extensive data. As neural recording technologies like Neuropixels (Steinmetz et al., 2018; 2021) and whole-brain imaging (Vanwalleghem et al., 2018) become more advanced, and connectome data for various organisms become increasingly available (Bentley et al., 2016; Scheffer et al., 2020; Hildebrand et al., 2017), there are exciting opportunities for validating and refining our approach. Incorporating these high-resolution, large-scale datasets into our model is a crucial next step. In particular, future work could focus on scaling our approach to large-scale neural recordings and connectomics, offering insights into the spatial organization of plasticity mechanisms. Additional considerations for future research include the challenges posed by unknown initial synaptic weights, the potential necessity for exact connectome information, and the adequacy of available behavioral data for model fitting.

A.4 Additional experimental parameters
In these experiments, the ground-truth plasticity rule is $\Delta w_{ij} = x_j\, r$. We use a network with a 2-10-1 architecture and a sigmoid non-linearity. The plasticity MLP has a 4-10-1 architecture. The default L1 regularization is set to 1e-2, the moving-average window is 10, and the input firing mean is 0.75. In each subsection, these values remain fixed except for the parameter being varied.
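Collected as an illustrative configuration (the field names are ours; the values are those listed above):

```python
config = {
    "ground_truth_rule": "x_j * r",      # Dw_ij = x_j r
    "network_arch": (2, 10, 1),          # ground-truth network, sigmoid non-linearity
    "plasticity_mlp_arch": (4, 10, 1),
    "l1_regularization": 1e-2,           # applied to Taylor coefficients only
    "moving_avg_window": 10,             # trials used to estimate E[R]
    "input_firing_mean": 0.75,
}
```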

A.4.1 L1 regularization
We experiment with various values of the L1 regularization penalty applied to the Taylor coefficients. This encourages sparse plasticity solutions and prevents the coefficients from exploding into NaNs, which can otherwise occur when learned positive values exponentially increase the synaptic weights as the number of time points grows (a minimal sketch of this penalized objective follows below). We do not apply L1 regularization to the MLP parameters. The moving-average window refers to the window size used for calculating the expected reward. For example, a moving-average window (MAW) of 10 takes the average of the rewards received over the last 10 trials.
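The penalized objective, continuing the JAX sketch from the method overview (`loss` and `theta` as defined there):

```python
def regularized_loss(theta, w0, xs, obs, l1=1e-2):
    # Trajectory loss plus an L1 penalty that encourages a sparse Taylor rule
    # and discourages runaway positive coefficients.
    return loss(theta, w0, xs, obs) + l1 * jnp.sum(jnp.abs(theta))
```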

A.5 Additional plasticity rules
In these additional experiments, we maintain the reward term as the difference from the expected reward, which facilitates bidirectional plasticity. Additionally, we incorporate a weight decay term, experimenting with several coefficients and ultimately choosing a value of 0.05, as it seems reasonable for our experimental configuration. Table 2 presents a comparison between the MLP and the Taylor series, reporting the $R^2$ over weights, the $R^2$ over activity, and the percent deviance explained. At a high level, both methods perform similarly. To gain deeper insight into why certain rules are more "recoverable" than others, an examination of the weight dynamics of each method is necessary. The Taylor series model has 81 trainable parameters, while the MLP has 61. Fitting the rules with a relevant subset of the Taylor series, selected through biological priors as done for the Drosophila experimental data, is expected to result in better performance.
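For example, as we read the description above, one such ground-truth variant combines the expectation-adjusted reward term with the chosen decay coefficient:

$$\Delta w_{ij} = x_j\,\big(R - \mathbb{E}[R]\big) - 0.05\, w_{ij};$$

the full set of rules is listed in Table 2.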

Figure 1 :
Figure 1: Schematic overview of the proposed method. Animal-derived time-series data (yellow) and a plasticity-regulated in silico model $g_\theta$ (blue) generate trajectories $\mathbf{o}^{(t)}$ and $\mathbf{m}^{(t)}$. A loss function quantifies trajectory mismatch to produce a gradient, enabling the inference of synaptic plasticity rules.

Figure 2 :
Figure 2: Recovery of Oja's plasticity rule from simulated neural activity. (A) Schematic of the models used to simulate neural activity and infer plasticity. (B) Mean-squared difference between ground-truth and model synaptic weight trajectories over time (horizontal axis) over the course of training epochs (vertical axis). (C) The evolution of $\theta$ during training. Coefficients $\theta_{110}$ and $\theta_{021}$, corresponding to Oja's rule values $(1, -1)$, are highlighted in orange. (D) $R^2$ scores over weights under varying noise and sparsity conditions in the neural data. (E, F) Boxplots of the distributions, across 50 seeds, corresponding to the first column (E) and row (F) of (D). (G) The evolution of learning-rule coefficients over the course of training, showing inaccurate $\theta$ recovery under high noise and sparsity conditions.

Figure 3 :
Figure 3: Recovery of a reward-based plasticity rule from simulated behavior. (A) Schematic of the models used to simulate behavior and infer plasticity rules. (B) The evolution of the weight of a single synapse, trained with $g_\theta^{\text{Taylor}}$

Figure 4 :
Figure 4: Inferring principles of plasticity in the fruit fly. (A) Schematic of the experimental setup used to study two-alternative choice behavior in flies. Left: design of the arena, showing the odor entry ports and the locations of the reward zones. Right: description of the trial structure, showing two example trials. (B) The behavior of an example fly in the task. Top: schematics indicate the reward baiting probabilities for each odor in the three blocks. Bottom: individual odor choices are denoted by rasters (tall rasters, rewarded choices; short rasters, unrewarded choices). Curves show the 10-trial averaged choice (red) and reward (black) ratios, and horizontal lines the corresponding averages over the 80-trial blocks. (C) Final inferred $\theta$ value distributions across 18 flies, comparing models with and without a $w_{ij}$ ($\theta_{001}$) term, and the method from Rajagopalan et al. (2023). (D) Left: goodness of fit between fly behavior and model predictions, plotted as the percent deviance explained (n = 18 flies). Right: change in the percent deviance explained, calculated by subtracting the percent deviance explained of the model without a $w_{ij}$ ($\theta_{001}$) term from that of the model with a $w_{ij}$ ($\theta_{001}$) term. (E, F) Same as (C, D), except comparing models that do or do not incorporate reward expectation. Since these models include weight dependence, they cannot be fit using the method of Rajagopalan et al. (2023).

Figure 5 :
Figure 5: Effect of L1 regularization on the $R^2$ of weights for the Taylor plasticity rule.

Figure 6 :
Figure 6: Effect of the moving-average window (used to calculate the expected reward) on the performance of the learned plasticity rule.

Figure 7 :
Figure 7: Effect of the input firing mean (used for the odor representation) on the performance of the learned plasticity rule.

Table 1 :
Assessment of various reward-based plasticity rules: $R^2$ scores for weight and individual neural activity trajectories, and the percentage of deviance explained for behavior. Refer to Appendix Table 2 for a comprehensive list of plasticity rules.

The ground-truth reward $r$ is used as the reward value in the weight update function. We use binary cross-entropy (BCE) as the loss function, which is proportional to the model's negative log-likelihood, to quantify the difference between the observed decisions and the model's predicted probabilities of accepting a stimulus. Crucially, the data available for training the model consists only of these binary decisions (accept or reject), without direct access to the underlying synaptic weights or neural activity:

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \Big[ o^{(t)} \log m^{(t)} + \big(1 - o^{(t)}\big) \log\big(1 - m^{(t)}\big) \Big],$$

where $o^{(t)} \in \{0, 1\}$ is the observed decision and $m^{(t)}$ is the model's predicted acceptance probability.

Table 2 :
Evaluation of various reward-based plasticity rules: $R^2$ scores for weight and individual neural activity trajectories, and the percentage of deviance explained for behavior. For each ground-truth rule, the table reports $R^2$ over weights, $R^2$ over activity, and the percent deviance explained, for both the Taylor series and MLP fits.