## Abstract

The search for biologically faithful synaptic plasticity rules has resulted in a large body of models. They are usually inspired by – and fitted to – experimental data, but they rarely produce neural dynamics that serve complex functions. These failures suggest that current plasticity models are still under-constrained by existing data. Here, we present an alternative approach that uses meta-learning to discover plausible synaptic plasticity rules. Instead of experimental data, the rules are constrained by the functions they implement and the structure they are meant to produce. Briefly, we parameterize synaptic plasticity rules by a Volterra expansion and then use supervised learning methods (gradient descent or evolutionary strategies) to minimize a problem-dependent loss function that quantifies how effectively a candidate plasticity rule transforms an initially random network into one with the desired function. We first validate our approach by re-discovering previously described plasticity rules, starting at the single-neuron level with “Oja’s rule”, a simple Hebbian plasticity rule that captures the direction of greatest variability of inputs to a neuron (i.e., the first principal component). We expand the problem to the network level and ask the framework to find Oja’s rule together with an anti-Hebbian rule such that an initially random two-layer firing-rate network will recover several principal components of the input space after learning. Next, we move to networks of integrate-and-fire neurons with plastic inhibitory afferents. We train for rules that achieve a target firing rate by countering tuned excitation. Our algorithm discovers a specific subset of the manifold of rules that can solve this task. Our work is a proof of principle of an automated and unbiased approach to unveil synaptic plasticity rules that obey biological constraints and can implement complex functions.

## 1 Introduction

Synaptic plasticity is widely agreed to be essential for high-level functions such as learning and memory. Its mechanisms are usually modelled with plasticity rules, i.e., functions that describe the evolution of the strength of a synapse. Current experimental techniques do not allow the tracking of relevant synaptic quantities over time at the population level, especially over the duration of learning. Therefore, most plasticity rules in the literature were derived from a few experiments on single synapses *ex vivo*, e.g., spike-timing-dependent plasticity [1–5]. Such rules do not usually impart a specific function or architecture on a network model on their own [6], unless they are carefully combined and orchestrated [7–9]. The link between the function of a network and the low-level mechanisms that lead to its structure thus remains elusive.

Here we aim to bridge this gap by deducing plasticity rules from indirect but accessible quantities in the brain: the function of a network (e.g., elicited behaviour, population activity, etc.) or its architecture. Major technical breakthroughs in behavioural neuroscience and connectomics have vastly increased the amount of data available at different levels of neuroscience [10–13], and we wondered if we could use these newly available results to deduce how a nervous system is constructed from scratch. Here, we present a meta-learning framework that aims to infer plasticity rules based on their ability to ascribe a desired function or architecture to an initially random neural network model. We present three example cases of rate and spiking neural network models for which such a numerical deduction of plasticity rules can be successfully performed. We point out their current limitations and discuss possible ways forward.

## 2 Related work

The idea of using supervised learning to learn unsupervised (local) learning rules dates back to the early 90s [14, 15] and resurfaced recently with the development of robust numerical optimization methods [16, 17], growing computational resources, and the advent of meta-learning [18, 19], which provides a convenient framework to tackle such questions. In some approaches the focus was to learn unsupervised or semi-supervised rules for representation learning or improved generalisation capabilities [20–24]. Others aim to learn optimizers that can then be used for supervised learning [25]. More neuroscience-oriented approaches attempt to find which learning rules could implement a biologically plausible version of backpropagation [26, 27]. In contrast to most of the work described above, which relies on numerical optimization to find learning rules, other studies analytically derive learning rules that can elicit certain biologically inspired functions [7, 8, 28–30].

Overall, our work complements the approaches above. Specifically, we provide a scalable framework to automatically discover biologically plausible rules that carve specific computational functions into otherwise analytically intractable spiking networks, while relying on interpretable parametrizations suitable for experimental verification.

## 3 Results

As a proof of principle that biologically plausible rules can be deduced by numerical optimization, we show that a meta-learning framework was able to rediscover known plasticity rules in rate-based and spiking neuron models.

### 3.1 Rediscovering Oja’s rule at the single neuron level

As a first challenge, we aimed to rediscover Oja’s rule, a Hebbian learning rule known to cause the weights of a two-layer linear network to converge to the first principal vector of any input data set with zero mean [31]. Towards this end, we built a single rate neuron following dynamics such that

$$y_i = \sum_j w_{ij}\, x_j,$$

where *y*_{i} is the activity of the postsynaptic neuron *i* (*i* = 1 in the single neuron case), *x*_{j} is the activity of the presynaptic neuron *j*, and *w*_{ij} is the weight of the connection between neurons *j* and *i*. Oja’s rule can be written as

$$\Delta w_{ij} = \eta \left( x_j\, y_i - y_i^2\, w_{ij} \right),$$

where *η* is a learning rate. To rediscover Oja’s rule from a nondescript, general starting point, the search space of possible plasticity rules was constrained to polynomials of up to second order over the parameters of the presynaptic activity, postsynaptic activity and connection strength. This resulted in widely flexible learning rules with 27 parameters *A*_{αβδ}, where each index indicates the power of either presynaptic activity, *x*_{j}, postsynaptic activity, *y*_{i}, or weight, *w*_{ij},

$$\Delta w_{ij} = \eta \sum_{\alpha,\beta,\delta \,\in\, \{0,1,2\}} A_{\alpha\beta\delta}\; x_j^{\alpha}\, y_i^{\beta}\, w_{ij}^{\delta}.$$

In this formulation, Oja’s rule can be expressed as *A*_{110} = 1, *A*_{021} = −1, and all other *A*_{αβδ} = 0.

Note that higher-order terms could be included but are not strictly necessary for a first proof of principle. We went on to investigate if we could rediscover this rule from a random initialisation. We simulated a single rate neuron with *N* input neurons and an initially random candidate rule in which parameters were drawn from a Gaussian distribution (Fig. 1A). A loss function was designed by comparing the final connectivity of a network trained with the candidate rule to the first principal vector of the data, **PC**_{1}, i.e., we aimed for a rule able to produce the first principal vector, but were agnostic with respect to the exact parametrisation,

$$\mathcal{L} = \left\| \mathbf{w} - \mathbf{PC}_1 \right\|.$$
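This inner loop and loss can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation; the learning rate, epoch count, initialization scale and the sign-invariant comparison to **PC**_{1} are our own assumptions:

```python
import numpy as np

def simulate_rule(A, X, eta=0.005, epochs=30, seed=0):
    """Inner loop: train a single linear rate neuron on data X (samples x inputs)
    with the polynomial rule parameterized by A[alpha, beta, delta], the powers
    of presynaptic activity x_j, postsynaptic activity y, and weight w_j."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    w = rng.normal(0.0, 0.1, n)            # initially random connectivity
    for _ in range(epochs):
        for x in X:
            y = w @ x                      # linear rate neuron
            dw = np.zeros(n)
            # dw_j = sum_{a,b,d} A[a,b,d] * x_j^a * y^b * w_j^d
            for a in range(3):
                for b in range(3):
                    for d in range(3):
                        if A[a, b, d] != 0.0:
                            dw += A[a, b, d] * x**a * y**b * w**d
            w += eta * dw
    return w

def loss(A, X):
    """Distance between trained weights and the first principal vector,
    taken sign-invariantly since PC1 is defined up to sign."""
    w = simulate_rule(A, X)
    pc1 = np.linalg.svd(X - X.mean(0), full_matrices=False)[2][0]
    return min(np.linalg.norm(w - pc1), np.linalg.norm(w + pc1))
```

Setting `A[1, 1, 0] = 1` and `A[0, 2, 1] = -1` recovers Oja's rule, and the trained weight vector aligns with the first principal component of the data.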

The norm used throughout this study is the *L*^{2} norm. Updates of the plasticity rule were implemented by minimizing the loss using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) method [17]. We chose CMA-ES over a gradient-based strategy such as ADAM [16] due to its better scalability with network size (Supplementary Fig. S1). Clipping strategies were used to deal with unstable plasticity rules (rules that trigger a numerical error in the inner loop and thus an undefined loss). To prevent overfitting (rules that would score low losses only on specific datasets), each plasticity rule was tested on many input datasets from an *N*-dimensional space (typically *N*_{dataset} = 20). Overall, this approach successfully recovered Oja’s rule for all input dataset dimensions that we could test in reasonable time (up to 100 input neurons, Fig. 1B and Fig. 1C).
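CMA-ES adapts a full covariance matrix over rule parameters; as a self-contained stand-in, the outer loop with loss clipping can be sketched with a simple isotropic evolution strategy. All hyperparameters here (population size, annealing factor, clipping value) are illustrative assumptions, not values from the paper:

```python
import numpy as np

def evolution_strategy(loss_fn, dim, iters=100, pop=20, sigma=0.3, seed=0):
    """Minimal (mu/lambda) evolution strategy, a stand-in for CMA-ES [17]:
    sample candidate rules around the current mean, replace undefined or
    unstable losses with a large penalty (clipping), and move the mean
    toward the best-scoring half of the population."""
    rng = np.random.default_rng(seed)
    mean = rng.normal(0.0, 0.1, dim)
    for _ in range(iters):
        candidates = mean + sigma * rng.normal(size=(pop, dim))
        losses = []
        for cand in candidates:
            try:
                l = float(loss_fn(cand))
            except (FloatingPointError, OverflowError, ValueError):
                l = np.inf                 # numerical error in the inner loop
            if not np.isfinite(l):
                l = 1e6                    # clip unstable rules to a large loss
            losses.append(l)
        elite = candidates[np.argsort(losses)[: pop // 2]]
        mean = elite.mean(axis=0)          # recombine the elite half
        sigma *= 0.97                      # slowly anneal the search width
    return mean
```

In the actual framework, `loss_fn` would run the inner-loop network simulation for one candidate parameter vector and return its loss.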

### 3.2 Rediscovering two co-active plasticity rules in a rate network

Next, we extended our framework to the network level, using a two-layer rate network in which *M* ≤ *N* interconnected output neurons extracted further principal components from an *N*-dimensional input dataset [32] (Fig. 2A). To extract additional principal components, this network should use Oja’s rule to modify the input (feedforward) synapses, while the lateral connections between neurons of the output layer should be adjusted by an anti-Hebbian learning rule [32]. In our model, in addition to the all-to-all connections between input and output layers, the output neurons were thus interconnected with plastic lateral connections, described by

$$y_i = \sum_j w_{ij}\, x_j + \sum_{j < i} u_{ij}\, y_j,$$

where *u*_{ij} is the lateral connection between output neurons *j* and *i*. The desired network function (which determines our loss function) was to express the first *M* principal vectors of the input dataset in the activity of the output neurons.

In our framework, both the feedforward and the lateral plasticity rules were parameterised as in the single-neuron case above. Specifically, the additional *lateral* plasticity rule acting on synapses within the output layer followed

$$\Delta u_{ij} = \eta \sum_{\alpha,\beta,\delta \,\in\, \{0,1,2\}} B_{\alpha\beta\delta}\; y_j^{\alpha}\, y_i^{\beta}\, u_{ij}^{\delta}.$$

In this formulation, the correct anti-Hebbian rule could be parameterised as *B*_{110} = −1, with all other *B*_{αβδ} = 0.

It should be noted here that the structure of the lateral connectivity within the output layer was fixed and hierarchical (Fig. 2A), such that some neurons only sent, or only received, synapses. Only the weights of the existing connections were changed according to the candidate plasticity rules described above. Both candidate plasticity rules were co-optimized using the same optimizer as previously described [17]. A loss function was designed to quantify how much the incoming weights to each output neuron differed from the *i*^{th} principal component,

$$\mathcal{L} = \sum_{i=1}^{M} \left\| \left[ w_{i1}\; w_{i2} \cdots w_{iN} \right] - \mathbf{PC}_i \right\|,$$

where [*w*_{i1} *w*_{i2} ⋯ *w*_{iN}] are the incoming connections to the *i*^{th} output neuron after training and **PC**_{i} is the *i*^{th} principal vector of the input dataset.

Our algorithms were able to recover both target plasticity rules – Oja’s rule and the anti-Hebbian rule – up to a scalar factor (which could be attributed to the *L*^{1} regularization used on the coefficients of the plasticity rule) in networks of up to 50 input and 5 output neurons (Fig. 2B and Fig. 2C). Increasing the size of the networks in the inner loop exponentially increases the computing time until convergence of the meta-optimization. We observed that, for larger network sizes, more meta-iterations were needed, while each iteration in the inner loop took longer to compute, and more iterations were required for the networks in the inner loop to converge.
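The target configuration of this inner loop – Oja's rule on the feedforward weights plus an anti-Hebbian rule on hierarchical lateral weights – can be sketched directly. This is an illustrative numpy implementation with assumed learning rates and epoch counts, not the authors' code:

```python
import numpy as np

def train_pca_network(X, M=2, eta=0.01, epochs=100, seed=0):
    """Two-layer rate network (Fig. 2A): plastic feedforward weights W follow
    Oja's rule, while strictly lower-triangular lateral weights U (neuron j
    projects to neuron i only for j < i) follow the anti-Hebbian rule
    du_ij = -eta * y_i * y_j, which decorrelates the outputs so that
    successive neurons extract successive principal components."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    W = rng.normal(0.0, 0.1, (M, N))   # feedforward weights, initially random
    U = np.zeros((M, M))               # lateral weights, only j < i entries used
    for _ in range(epochs):
        for x in X:
            y = np.zeros(M)
            for i in range(M):         # hierarchical output computation
                y[i] = W[i] @ x + U[i, :i] @ y[:i]
            # Oja's rule on all feedforward synapses
            W += eta * (np.outer(y, x) - (y ** 2)[:, None] * W)
            # anti-Hebbian rule on the existing lateral synapses
            for i in range(M):
                U[i, :i] -= eta * y[i] * y[:i]
    return W, U
```

After training on zero-mean data, the rows of `W` align (up to sign) with the leading principal vectors, and the lateral weights shrink toward zero as the outputs decorrelate.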

### 3.3 Learning inhibitory plasticity in a spiking neuron

In a first attempt to produce candidate rules that could inspire and guide future experimental studies of natural, multi-cell-type plasticity, we introduced a more biologically realistic neuron model, i.e., a model that produces spikes. Following previous work [29], we constructed a model with a single conductance-based leaky integrate-and-fire neuron receiving 800 excitatory and 200 inhibitory afferents, separated into 8 input groups (Fig. 3A). Excitatory afferents were hand-tuned, with a set of strengthened, preferred signal afferents. Inhibitory afferents were plastic and initially random.

The membrane potential dynamics of the postsynaptic neuron followed

$$\tau_m \frac{dV}{dt} = \left( V_{rest} - V \right) + \frac{g_E(t)}{g_{leak}} \left( E_E - V \right) + \frac{g_I(t)}{g_{leak}} \left( E_I - V \right),$$

with the conductances *g*_{E}(*t*) and *g*_{I}(*t*) for excitatory and inhibitory synapses, respectively. A postsynaptic spike is emitted whenever the membrane potential *V*(*t*) crosses a threshold *V*_{th} from below, with an instantaneous reset to *V*_{reset}. The membrane potential is clamped at *V*_{reset} for the duration of the refractory period, *τ*_{ref}, after the spike. Conductances changed according to

$$\frac{dg_E}{dt} = -\frac{g_E}{\tau_E} + \sum_{k \in \mathrm{exc}} w_k\, S_k(t), \qquad \frac{dg_I}{dt} = -\frac{g_I}{\tau_I} + \sum_{k \in \mathrm{inh}} w_k\, S_k(t),$$

where *τ*_{m} = 20 ms, *V*_{rest} = −60 mV, *E*_{E} = 0 mV, *E*_{I} = −80 mV, *g*_{leak} = 10 nS, *τ*_{E} = 5 ms, and *τ*_{I} = 10 ms were taken from previous work [29], and *S*_{k}(*t*) = Σ_{f} δ(*t* − *t*_{k}^{f}) is the spike train of presynaptic neuron *k*, with *t*_{k}^{f} being the spike times of neuron *k*. The variables *x*_{k}(*t*) and *x*_{post}(*t*) account for the trace of spike trains of pre- and postsynaptic spikes,

$$\frac{dx_k}{dt} = -\frac{x_k}{\tau_{pre}} + S_k(t), \qquad \frac{dx_{post}}{dt} = -\frac{x_{post}}{\tau_{post}} + S_{post}(t),$$

where *τ*_{pre} and *τ*_{post} are the time constants of the traces associated with the pre- and postsynaptic neurons, respectively.
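The dynamics above can be sketched as a forward-Euler simulation. The values of *V*_{th}, *V*_{reset} and *τ*_{ref} below are our own assumptions (the text takes these parameters from ref. [29] without listing them), and homogeneous Poisson inputs at fixed rates stand in for the structured inputs of Fig. 3A:

```python
import numpy as np

def simulate_lif(w_exc, w_inh, rates_exc, rates_inh,
                 duration=1.0, dt=1e-4, seed=0):
    """Conductance-based LIF neuron with the parameters of Sec. 3.3.
    Presynaptic Poisson spikes increment the conductances by the
    corresponding synaptic weight; returns postsynaptic spike times."""
    rng = np.random.default_rng(seed)
    tau_m, V_rest = 20e-3, -60e-3
    E_E, E_I, g_leak = 0.0, -80e-3, 10e-9
    tau_E, tau_I = 5e-3, 10e-3
    V_th, V_reset, tau_ref = -50e-3, -60e-3, 5e-3   # assumed values
    V, gE, gI, refr = V_rest, 0.0, 0.0, 0.0
    spikes = []
    for step in range(int(duration / dt)):
        # Poisson presynaptic spikes add their weights to the conductances
        gE += w_exc @ (rng.random(w_exc.size) < rates_exc * dt)
        gI += w_inh @ (rng.random(w_inh.size) < rates_inh * dt)
        gE -= dt / tau_E * gE                       # conductance decay
        gI -= dt / tau_I * gI
        if refr > 0.0:                              # refractory clamp
            refr -= dt
            V = V_reset
            continue
        dV = (V_rest - V) + gE / g_leak * (E_E - V) + gI / g_leak * (E_I - V)
        V += dt / tau_m * dV
        if V >= V_th:                               # threshold crossing
            spikes.append(step * dt)
            V, refr = V_reset, tau_ref
    return np.array(spikes)
```

In the full framework, the inhibitory weights would additionally be updated online by the candidate plasticity rule; here they are held fixed to illustrate the neuron model alone.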

The input spike trains were generated similarly to previous work [29]. We defined 8 input groups, each having a time varying firing rate added to a baseline of 5 Hz. The varying firing rate consisted of a random walk with *τ*_{input} = 50 ms, followed by a sparsification of the number of activity bumps above baseline, in which every second bump of activity was omitted (see ref. 29 for details).
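A sketch of these input statistics follows; the random-walk noise amplitude `sigma` and the exact bump-detection scheme are our own illustrative assumptions (see ref. 29 for the original procedure):

```python
import numpy as np

def group_rates(duration=10.0, dt=1e-3, n_groups=8, tau=0.05,
                baseline=5.0, sigma=20.0, seed=0):
    """Per-group firing rates: a rectified random walk (time constant
    tau = 50 ms) on top of a 5 Hz baseline, followed by sparsification
    in which every second bump of above-baseline activity is omitted."""
    rng = np.random.default_rng(seed)
    steps = int(duration / dt)
    r = np.zeros((steps, n_groups))
    r[0] = baseline
    x = np.zeros(n_groups)
    for t in range(1, steps):
        # Ornstein-Uhlenbeck-style random walk around zero
        x += -dt / tau * x + sigma * np.sqrt(dt) * rng.normal(size=n_groups)
        r[t] = baseline + np.maximum(x, 0.0)        # rectified excursions
    # sparsification: drop every second contiguous bump above baseline
    for g in range(n_groups):
        above = r[:, g] > baseline
        starts = np.flatnonzero(np.diff(above.astype(int)) == 1) + 1
        for k, s in enumerate(starts):
            if k % 2 == 1:                          # omit every second bump
                e = s
                while e < steps and above[e]:
                    e += 1
                r[s:e, g] = baseline
    return r
```

The resulting rate traces can then drive inhomogeneous Poisson spike generators for the afferents of each group.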

Only the inhibitory synaptic weights, *w*_{k} for inhibitory afferents *k*, were plastic. Excitatory weights were fixed and defined according to their input group, peaking at a preferred group, where *G*_{k} is the input group of afferent *k*, *P* = 5 is the preferred input group, and *ϵ*_{k} is a noise term drawn from a uniform distribution between 0 and 0.1. The candidate plasticity rule was parametrised using pre- and postsynaptic spike times and their traces, such that each presynaptic spike changes the weight by *α* + *κ* *x*_{post}(*t*) and each postsynaptic spike by *β* + *γ* *x*_{k}(*t*), scaled by a learning rate *η*. The average change of the rule can be described as a Volterra expansion of synaptic changes based on the activity of pre- and postsynaptic neurons, *ν*_{pre} and *ν*_{post}, respectively,

$$\left\langle \frac{dw_k}{dt} \right\rangle \approx \eta \left[ \alpha\, \nu_{pre} + \beta\, \nu_{post} + \left( \gamma \tau_{pre} + \kappa \tau_{post} \right) \nu_{pre}\, \nu_{post} \right].$$
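One plausible event-driven reading of this parametrization (with the learning rate set to 1, and the pairing of *κ* with the postsynaptic trace and *γ* with the presynaptic trace assumed so as to match the Hebbian terms *γτ*_{pre} and *κτ*_{post}) can be simulated exactly between spikes and checked against the Volterra-expansion prediction. Independent Poisson pre- and post-trains are an assumption made purely for this check:

```python
import numpy as np

def run_rule(alpha, beta, gamma, kappa, tau_pre, tau_post,
             nu_pre, nu_post, T=200.0, seed=0):
    """Event-driven simulation of the candidate rule: a presynaptic spike
    changes the weight by alpha + kappa * x_post, a postsynaptic spike by
    beta + gamma * x_pre; traces decay exponentially between events.
    Returns the empirical drift <dw/dt> for independent Poisson trains."""
    rng = np.random.default_rng(seed)
    pre = np.sort(rng.random(rng.poisson(nu_pre * T)) * T)
    post = np.sort(rng.random(rng.poisson(nu_post * T)) * T)
    events = sorted([(t, 0) for t in pre] + [(t, 1) for t in post])
    x_pre = x_post = w = 0.0
    last = 0.0
    for t, kind in events:
        # exact exponential decay of both traces since the last event
        x_pre *= np.exp(-(t - last) / tau_pre)
        x_post *= np.exp(-(t - last) / tau_post)
        if kind == 0:                     # presynaptic spike
            w += alpha + kappa * x_post
            x_pre += 1.0
        else:                             # postsynaptic spike
            w += beta + gamma * x_pre
            x_post += 1.0
        last = t
    return w / T

def predicted_drift(alpha, beta, gamma, kappa, tau_pre, tau_post,
                    nu_pre, nu_post):
    """Volterra-expansion prediction for the averaged weight change."""
    return (alpha * nu_pre + beta * nu_post
            + (gamma * tau_pre + kappa * tau_post) * nu_pre * nu_post)
```

For uncorrelated trains, the simulated drift matches the expansion because the mean trace value sampled at independent spikes is *τν* for each trace.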

Following previous work [29], we aimed for a plasticity rule to establish a stable postsynaptic firing rate *r*_{tg} = 5 Hz (Fig. 3A). As such, we expressed the loss function as

$$\mathcal{L} = \frac{\left( \hat{\nu}_{post} - r_{tg} \right)^2}{\hat{\nu}_{post}},$$

where *ν̂*_{post} is the postsynaptic firing rate measured during a 10-second window, chosen randomly inside a 30-second scoring phase after a 1-minute training phase. The denominator penalized very low firing rates and thus aided the optimization.

For the learning rule to enforce a stable (and low) firing rate, we expected the framework to produce a learning rule that balances excitation and inhibition, similar to previously proposed, hand-tuned spike-timing-based inhibitory plasticity rules [29, 33], in which EI balance was shown to be a by-product of a learning rule imposing stable (constant) firing rates. The family of rules found using ADAM [16], with gradients computed using finite differences on the search space defined above, differs from the rules reported previously [29] (Fig. 3B). The previously published inhibitory plasticity rule [29] relies solely on a presynaptic decay term and a Hebbian term to establish the target firing rate (and EI balance). The rules found here use a mixture of Hebbian terms (*γ*, *τ*_{pre}, *κ*, *τ*_{post}) and purely postsynaptic terms (*β*). This can be explained by a steady-state analysis of the model, showing that

$$\nu_{post} = \frac{-\alpha\, \nu_{pre}}{\beta + \left( \gamma \tau_{pre} + \kappa \tau_{post} \right) \nu_{pre}}.$$

This means that the six parameters of the plasticity rule are bound by a single equation and can compensate for each other, so there are many more ways to achieve a target firing rate than relying on Hebbian terms only [29, 33] or on postsynaptic terms only. The parameters *α* and *β* are constrained to be negative and positive, respectively, to ensure that the fixed point for the firing rate is stable.
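Under our reading of the rule (presynaptic spikes contribute *α* + *κx*_{post}, postsynaptic spikes *β* + *γx*_{k}), the steady-state relation follows from setting the averaged weight change to zero:

```latex
\left\langle \frac{dw}{dt} \right\rangle
  = \eta\left[\alpha\,\nu_{\mathrm{pre}} + \beta\,\nu_{\mathrm{post}}
  + \left(\gamma\tau_{\mathrm{pre}} + \kappa\tau_{\mathrm{post}}\right)
    \nu_{\mathrm{pre}}\,\nu_{\mathrm{post}}\right] = 0
\quad\Longrightarrow\quad
\nu_{\mathrm{post}}
  = \frac{-\alpha\,\nu_{\mathrm{pre}}}
         {\beta + \left(\gamma\tau_{\mathrm{pre}} + \kappa\tau_{\mathrm{post}}\right)\nu_{\mathrm{pre}}}.
```

Since the right-hand side must equal the single target rate, one equation constrains all six parameters at once.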

Our algorithm always started the meta-learning process by balancing *α* and *β*, because that is effectively the quickest way to decrease the loss. Only in a second step did the algorithm optimise other parameters that give additional, albeit smaller, benefits. Mathematically, we can understand this behaviour from the fact that the combination of the Hebbian learning rates, *γ* and *κ*, multiplied by the time constants of the traces, *τ*_{pre} and *τ*_{post}, automatically sets *β* to dominate the denominator of the equation above (*β* ≫ *γτ*_{pre} + *κτ*_{post}). This creates a bound on the possible learning rules,

$$\beta \approx -\frac{\alpha\, \nu_{pre}}{r_{tg}}.$$

Our intuition is confirmed when plotting several optimization trajectories in the (*α*, *β*) plane (Fig. 3D). We noticed that the family of rules found by our framework all stayed close to this boundary, reflecting the small subset of solutions selected by the algorithm: rules that are both relatively “easy” to find in parameter space and quick to establish a steady state.

The manifold of compliant plasticity rules is larger than the bound above implies (i.e., that only *α* and *β* matter). This can be seen, e.g., in rules that make use of the Hebbian terms at their disposal. However, due to long compute times, we only allowed one minute of simulated time for a postsynaptic neuron to reach the desired firing rate; steady-state solutions that establish themselves too slowly therefore score worse losses in our framework. It follows that a non-zero *β* is an efficient way to quickly establish the desired firing rate in our set-up. Notably, non-zero Hebbian terms allow the establishment of a *detailed* balance [34] (Fig. 3C), in a similar fashion to the previously studied inhibitory plasticity rule [29] (Fig. 3C, see Supplementary Fig. S2 for more details). They allow more regular firing and thus make the loss more reliable. Such components of the rules could become more important in recurrent network simulations.

## 4 Discussion

We propose a meta-learning approach that searches for and finds (locally) optimal plasticity rules with respect to a desired network function or architecture. Our framework requires the ability to quantify the desired network function through the design of a loss, as well as a sensible yet reasonably flexible parametrisation of the candidate plasticity rules and of the quantities and variables they may rely on. Our framework is able to recover known rules in both rate and spiking models.

A recent approach with similar aims predicts testable plasticity rules in spiking neurons, but uses a different optimization strategy (Cartesian genetic programming) and consequently a different parametrisation [35]; it is thus complementary to our work.

However, several challenges remain: our approach is computationally heavy, and it remains to be studied how it fares in large non-linear systems such as spiking networks. Even though we are interested in the learning rules themselves (which should be relatively scale-invariant) and not in large networks per se, problems that cannot be downscaled efficiently could remain out of reach. Moreover, the parametrisations used in this study, while flexible enough for a proof of principle, might need to be extended to describe real-life plasticity rules.

## 5 Conclusion and future work

In summary, we present a proof of principle that plasticity rules can be derived with a meta-learning framework that iteratively refines new rules through minimisation of a loss function reflecting a desired network output. There are multiple challenges ahead, ranging from technical issues of computing gradients (or not) to the choice of a parametrisation, but our results promise a new perspective on plasticity rules that may explain both form and function of cortical circuitry.

### Broader Impact

There may be up to 140 different synaptic plasticity rules at play in everyday behaviours such as making a simple memory. We have only begun to understand five or fewer of these rules, and for the foreseeable future experimental neuroscience will not be able to deliver the necessary data to disentangle this difficult puzzle. Machine learning and modern computing, on the other hand, have made huge advances in being able to simulate and analyse highly complex tasks. Utilising this power to infer plasticity rules and thus create experimental hypotheses is entirely possible, timely and urgent. We thus propose a first step in the development of a set of computational tools that allows us to discover the synaptic plasticity mechanisms responsible for developing and maintaining complex structures through neuronal activity. Machine learning techniques give us the benefit of targeted, gradient-directed searches combined with fast and computationally powerful exploration. We aim to eventually run our meta-learning algorithms to achieve the connectivity and function of healthy and aberrant neural phenomena. Soon, we may be able to directly inform translational approaches that aim to utilise plasticity protocols for therapeutic purposes. Finding the families of plasticity rules that create functional neuronal networks in the brain will be a crucial and long-lasting contribution to basic and applied science. Finally, our findings may also inspire the development of new ML tools, both for the analysis and training of artificial neural networks, which still have to live up to their potential in terms of generalisation and semantic knowledge representation. Biologically inspired rules may just prove to be the solution to many a problem at hand.

## Acknowledgments and Disclosure of Funding

We would like to thank Chaitanya Chintaluri, Georgia Christodoulou, Bill Podlaski and Merima Šabanović for useful discussions and comments. This work was supported by a Wellcome Trust Senior Research Fellowship (214316/Z/18/Z), a BBSRC grant (BB/N019512/1), an ERC consolidator Grant (SYNAPSEEK), a Leverhulme Trust Project Grant (RPG-2016-446), and funding from École Polytechnique, Paris.

## Footnotes

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

https://github.com/basile6/MetaLearnBiologicallyPlausibleRules