A Neuronal Least-Action Principle for Real-Time Learning in Cortical Circuits

Walter Senn, Dominik Dold, Akos F. Kungl, Benjamin Ellenberger, Jakob Jordan, Yoshua Bengio, João Sacramento, Mihai A. Petrovici
doi: https://doi.org/10.1101/2023.03.25.534198
Author affiliations:
Walter Senn: a (correspondence: senn@pyl.unibe.ch)
Dominik Dold: a, b, c
Akos F. Kungl: a, b
Benjamin Ellenberger: a, f
Jakob Jordan: a
Yoshua Bengio: d
João Sacramento: e
Mihai A. Petrovici: a, b

a Department of Physiology, University of Bern
b Kirchhoff-Institute for Physics, Heidelberg University
c European Space Research and Technology Centre, European Space Agency
d MILA, University of Montreal
e Department of Computer Science, ETH Zurich
f Insel Data Science Center, University Hospital Bern

Abstract

One of the most fundamental laws of physics is the principle of least action. Motivated by its predictive power, we introduce a neural least-action principle that we apply to motor control. The central notion is the somato-dendritic mismatch error within individual neurons. The principle postulates that the somato-dendritic mismatch errors across all neurons in a cortical network are minimized by the voltage dynamics. Ongoing synaptic plasticity reduces the somato-dendritic mismatch error within each neuron and performs gradient descent on the output cost in real time. The neuronal activity is prospective, ensuring that dendritic errors deep in the network are prospectively corrected to eventually reduce motor errors. The neuron-specific errors are represented in the apical dendrites of pyramidal neurons, and are extracted by a cortical microcircuit that ‘explains away’ the feedback from the periphery. The principle offers a general theoretical framework to functionally describe real-time neuronal and synaptic processing.

Introduction

Wigner’s remark about the ‘unreasonable effectiveness’ of mathematics in allowing us to understand physical phenomena (Wigner, 1959) is famously contrasted by Gelfand’s quip about its ‘unreasonable ineffectiveness’ in doing the same for biology (Borovik, 2021). Considering the component of randomness that is inherent to evolution, this may not be all that surprising. However, while this argument holds just as well for the brain at the cellular level, ultimately brains are computing devices. At the level of computation, machine learning and neuroscience have revealed near-optimal strategies for information processing and storage, and evolution is likely to have found similar principles through trial and error (Hassabis et al., 2017). Thus, we have reason to hope for the existence of fundamental principles of cortical computation that are similar to those we have found in the physical sciences. Such approaches must eventually relate these principles back to brain phenomenology and connect function to structure and dynamics.

In physics, a fundamental measure of ‘effort’ is the action of a system, which nature seeks to ‘minimize’. Given an appropriate description of the interactions between the system's constituents, the least-action principle can be used to derive the equations of motion of any physical system (Feynman et al., 2011; Coopersmith, 2017). Here, we suggest that in biological information processing a similar principle holds for prediction errors, which are of obvious relevance for cognition and behavior. Based on such errors, we formulate a neuronal least-action (NLA) principle which can be used to derive neuronal dynamics and map them to observed dendritic morphologies and cortical microcircuits. Within this framework, local synaptic plasticity at basal and apical dendrites can be derived by stochastic gradient descent on errors. The errors that are minimized refer to the errors in output neurons, which are typically thought to represent motor trajectories planned and encoded in cortical motor areas and ultimately in the spinal cord and muscles.

The theory considers the minimization of a mismatch between the effective output trajectory and an internally or externally imposed target trajectory. The strength by which the output trajectory is pushed towards the target trajectory is referred to as ‘nudging strength’. For weak nudging, the feedback slightly adapts the internal network states so that the output activities follow their targets more closely. For a target that is fixed in time, the dynamics reduces to Equilibrium Propagation (Scellier & Bengio, 2017). For strong nudging, the output neurons are clamped to tightly follow the target trajectory (Song et al., 2022; Meulemans, Zucchet, et al., 2022). Each network neuron performs gradient descent on its own dendritic prediction error (Urbanczik & Senn, 2014), and with this eventually also contributes to a reduction of the output errors. For both versions, weak and strong nudging, synaptic plasticity in an interneuron feedback circuitry supports the extraction of neuron-specific dendritic errors (Sacramento et al., 2018), providing a cortical implementation of the suggested ‘real-time dendritic error propagation (rt-DeEP)’.

The NLA principle can be placed within the lineage of energy-based models used to understand neuronal processing and learning in recurrent neural networks (Hopfield, 1982), artificial intelligence (LeCun et al., 2006), and biological versions of deep learning (Scellier & Bengio, 2017; Richards et al., 2019; Song et al., 2022). As a continuous-time theory of cortical processing that instantaneously corrects neuronal activities throughout the network, it makes a link to optimal feedback control (Todorov & Jordan, 2002), which has recently been considered in terms of energy-based models at equilibria (Meulemans, Zucchet, et al., 2022). It can further be seen in the tradition of predictive coding (Rao & Ballard, 1999; Bastos et al., 2012), where cortical feedback connections try to explain away lower-level activities. Yet, our prospective coding extrapolates from current quantities to predict future activity, rather than activity at the current point in time as in classical predictive coding. The NLA principle combines energy-based models with prospective coding in which neuronal integration delays are compensated on the fly (as also done in Haider et al., 2021). The prospective coding within each neuron may overcome putative delays that extend across the whole cortex and would allow for a real-time processing of stimuli while instantaneously correcting for ongoing errors. In this sense, the NLA represents an alternative to the recently suggested forward-forward algorithm, which circumvents the delay problem by avoiding error propagation altogether (Hinton, 2022).

The paper is organized as follows: we first define the prospective somato-dendritic mismatch error, construct out of this the mismatch energy of a network, and ‘minimize’ this energy to obtain the error-corrected, prospective voltage dynamics of the network neurons. We then show that the prospective error coding leads to an instantaneous and joint processing of low-pass filtered input signals and backpropagated errors. Applied to motor control, the instantaneous processing is interpreted as a moving equilibrium hypothesis according to which sensory inputs, network state, motor commands and muscle feedback are in a self-consistent equilibrium at any point of the movement. We then derive a local learning rule that globally minimizes the somato-dendritic mismatch errors across the network, and show how this learning can be implemented through error-extracting cortical microcircuits and dendritic predictive plasticity.

Results

Somato-dendritic mismatch errors and the Lagrangian of cortical circuits

We consider a network of neurons – identified as pyramidal cells – with firing rates ri(t) in continuous time t. The somatic voltage ui of pyramidal neuron i is driven by the close-by basal input current, ∑j Wij rj, with presynaptic rates rj and synaptic weights Wij, and an additional distal apical input ei that will be learned to represent a prospective prediction error at any moment in time (Fig. 1a). While in classical rate-based neuron models the firing rate ri of a neuron is a function of the somatic voltage, we postulate that the firing rate is prospective and extrapolates from ρ(ui) into the future with the temporal derivative, $r_i = \rho(u_i) + \tau \dot{\rho}(u_i)$, with $\dot{\rho}(u_i)$ representing the temporal derivative of ρ(ui(t)). There is experimental evidence for such prospective coding in cortical pyramidal neurons to which we return later (Fig. 2a).
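
A minimal numerical sketch of such a prospective rate is given below, assuming a specific transfer function ρ and a backward finite difference for the temporal derivative; the function names and parameter values are illustrative and not taken from the authors' code.

```python
import numpy as np

def rho(u):
    """Illustrative transfer function (softplus); the theory leaves rho generic."""
    return np.log1p(np.exp(u))

def prospective_rate(u_now, u_prev, tau, dt):
    """Prospective firing rate r = rho(u) + tau * d/dt rho(u),
    with the temporal derivative estimated by a backward difference."""
    drho_dt = (rho(u_now) - rho(u_prev)) / dt
    return rho(u_now) + tau * drho_dt

# Example: a sinusoidally modulated voltage yields a phase-advanced rate (cf. Fig. 2a).
tau, dt = 10e-3, 1e-4
t = np.arange(0.0, 0.1, dt)
u = np.sin(2 * np.pi * 10 * t)
r = [prospective_rate(u[i], u[i - 1], tau, dt) for i in range(1, len(t))]
```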

Figure 1: Somato-dendritic mismatch energies and the neuronal least-action (NLA) principle.

(a1) Sketch of a cross-cortical network of pyramidal neurons described by NLA. (a2) Correspondence between elements of NLA and biological observables such as membrane voltages and synaptic weights. (b1) The NLA principle postulates that small variations δũ (dashed) of the trajectories ũ (solid) leave the action invariant, δA = 0. It is formulated in the look-ahead coordinates ũ (symbolized by the spyglass) in which ‘hills’ of the Lagrangian (shaded grey zones) are foreseen by the prospective voltage so that the trajectory can turn early enough to surround them. (b2) In the absence of output nudging (β = 0), the trajectory u(t) is solely driven by the sensory input, and prediction errors and energies vanish (L = 0, outer blue trajectory at bottom). When nudging the output neurons towards a target voltage (β > 0), somato-dendritic prediction errors appear, the energy increases (red dashed arrows symbolising the growing ‘volcano’) and the trajectory u(t) moves out of the L = 0 hyperplane, riding on top of the ‘volcano’ (red trajectory). Synaptic plasticity W reduces the somato-dendritic mismatch along the trajectory by optimally ‘shoveling down the volcano’ (blue dashed arrows) while the trajectory settles in a new place on the L = 0 hyperplane (inner blue trajectory at bottom).

Figure 2: Prospective coding in cortical pyramidal neurons enables instantaneous voltage-to-voltage transfer.

(a1) The instantaneous spike rate of cortical pyramidal neurons (top) in response to sinusoidally modulated noisy input current (bottom) is phase-advanced with respect to the input (adapted from Köndgen et al., 2008). (a2) Similarly, in NLA, the instantaneous firing rate of a model neuron (r, black) is phase-advanced with respect to the underlying voltage (u, red; postulating that the low-pass filtered rate is a function of the voltage, $\bar{r} = \rho(u)$). (b) Dendritic input in the apical tree (here called ē) is instantaneously causing a somatic voltage modulation (u; modeling data from Ulrich, 2002). The low-pass filtering with τ along the dendritic shaft is compensated by a lookahead mechanism in the dendrite. In Ulrich (2002) a phase advance is observed even with respect to the dendritic input current, not only the dendritic voltage, although only for slow modulations (as here). (c) While the voltage of the first neuron (u1) integrates the input rates rin from the past (bottom black upward arrows), the output rate r1 of that first neuron looks ahead in time, $r_1 = \rho(u_1) + \tau \dot{\rho}(u_1)$ (red dashed arrows pointing into the future). The voltage of the second neuron (u2) integrates the prospective rates r1 (top black upward arrows). By doing so, it inverts the lookahead operation, resulting in an instantaneous transfer from u1(t) to u2(t) (blue arrow and circles).

The second central notion of the theory is the prospective somato-dendritic mismatch error in the individual network neurons, ei(t). It is defined as a mismatch between the prospective voltage, $\breve{u}_i = u_i + \tau \dot{u}_i$, and the weighted prospective input rates, $\sum_j W_{ij} r_j$. If the prospective error is low-pass filtered with time constant τ, it takes the form $\bar{e}_i = u_i - \sum_j W_{ij} \bar{r}_j$, where $\bar{r}_j$ is the corresponding low-pass filtered firing rate of the presynaptic neuron j (Methods). We refer to ēi as the somato-dendritic mismatch error of neuron i.
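
To make the low-pass filtering and the resulting mismatch error concrete, the following sketch uses a simple Euler discretization of the filter; the weight matrix, rates, and network sizes are arbitrary illustrations, not quantities from the paper.

```python
import numpy as np

def low_pass_step(x_bar, x, tau, dt):
    """One Euler step of the low-pass filter tau * d(x_bar)/dt = -x_bar + x."""
    return x_bar + dt / tau * (x - x_bar)

def mismatch_error(u, W, r_bar):
    """Low-pass filtered somato-dendritic mismatch error
    e_bar_i = u_i - sum_j W_ij * r_bar_j for every neuron i."""
    return u - W @ r_bar

# Toy example: filter constant presynaptic rates and compute the error.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 5))
u = rng.normal(size=3)
tau, dt = 10e-3, 1e-4
r = rng.random(5)                 # presynaptic rates (held constant here)
r_bar = np.zeros(5)
for _ in range(1000):             # relax the filter towards r over ~10 tau
    r_bar = low_pass_step(r_bar, r, tau, dt)
e_bar = mismatch_error(u, W, r_bar)
```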

We next interpret the mismatch error ēi in terms of the morphology of pyramidal neurons with basal and apical dendrites. While the error is formed in the apical dendrite, this error is added to the somatic voltage and, from the perspective of the basal inputs, it becomes a somato-dendritic mismatch error. The mismatch error expresses the difference between ‘what a neuron does’, which is based on the somatic voltage ui, and ‘what the basal inputs think it should do’, which is based on its own input $\sum_j W_{ij} \bar{r}_j$ (Fig. 1a2). The two quantities may deviate because the neuron gets the apical input ēi that integrates in the soma, $u_i = \sum_j W_{ij} \bar{r}_j + \bar{e}_i$, but not in the basal dendrite. What cannot be predicted from the somatic voltage ui by the basal input remains as ‘somato-basal’ mismatch error, $\bar{e}_i = u_i - \sum_j W_{ij} \bar{r}_j$, and this is just the apical error ēi.

Associated with this mismatch error is the somato-dendritic mismatch energy, defined for each network neuron i as the squared mismatch error, $E_i^{\mathrm{M}} = \frac{1}{2} \bar{e}_i^{\,2} = \frac{1}{2} \big( u_i - \textstyle\sum_j W_{ij} \bar{r}_j \big)^2$.

On a subset of output neurons of the whole network, 𝒪, a cost is defined as a function of the somatic voltage and some instructive reference signal such as a target or a reward. When a target trajectory $u_o^{\mathrm{trg}}(t)$ is available, the cost is defined at each time point as a squared target error, $C = \frac{1}{2} \sum_{o \in \mathcal{O}} \big( u_o^{\mathrm{trg}} - u_o \big)^2$.

Much more general cost functions and mismatch energies are conceivable, encompassing e.g. conductance-based neurons or including further dynamic variables (see SI). The cost represents a performance measure for the entire network that produces the output voltages uo(t) in response to some input rates rin(t). The cost directly relates to behavioral or cognitive measures such as the ability of an animal or human to perform a particular task in real time. The target could be provided by explicit external supervision, for example target movements in time encoded by $u_o^{\mathrm{trg}}(t)$, it could represent an expected reward signal, or it could arise via self-supervision from other internal prediction errors.

We define the Lagrangian (or total ‘energy’) of the network as the sum across all mismatch energies and costs, weighted by the nudging strength β of the output neurons, $L = E^{\mathrm{M}} + \beta C = \sum_i E_i^{\mathrm{M}} + \beta C$.

The low-pass filtered presynaptic rates, r̄j, also encompass the external input neurons. Due to the prospective coding, the Lagrangian can be minimal at any moment in time while the network dynamics evolves. This is different from classical predictive coding (Rao & Ballard, 1999) and energy-based approaches (Scellier & Bengio, 2017; Song et al., 2022), where a stimulus needs to be fixed in time while the network relaxes to a steady state, and only there the prediction error is minimized. Since we postulated that the firing rates are prospective and linearly extrapolate into the future, $r = \rho(u) + \tau \dot{\rho}(u)$, and because any quantity can be reconstructed from its low-pass filtering at any point in time, $x = \bar{x} + \tau \dot{\bar{x}}$, we conclude that the low-pass filtered rate is a function of the instantaneous somatic voltage, $\bar{r} = \rho(u)$, see SI. As a consequence, the Lagrangian also becomes a function of the instantaneous voltage u(t) and does not depend on past voltage values, although it depends on past firing rates.
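
As an illustration, the Lagrangian of a small network can be evaluated at a single time point as in the sketch below; the quadratic forms follow the definitions above, while the network sizes, weights, and the nudging strength are arbitrary stand-ins.

```python
import numpy as np

def lagrangian(u, W, r_bar, u_trg, out_idx, beta):
    """L = sum_i 1/2 * e_bar_i**2 + beta * 1/2 * sum_o (u_trg_o - u_o)**2,
    with e_bar = u - W @ r_bar the somato-dendritic mismatch errors and
    r_bar the low-pass filtered presynaptic rates (input and network neurons)."""
    e_bar = u - W @ r_bar
    mismatch_energy = 0.5 * np.sum(e_bar ** 2)
    cost = 0.5 * np.sum((u_trg - u[out_idx]) ** 2)
    return mismatch_energy + beta * cost

# Toy evaluation at a single time point.
rng = np.random.default_rng(1)
n, n_pre, out_idx = 4, 6, np.array([2, 3])
W = rng.normal(scale=0.1, size=(n, n_pre))
u, r_bar = rng.normal(size=n), rng.random(n_pre)
L = lagrangian(u, W, r_bar, u_trg=rng.normal(size=2), out_idx=out_idx, beta=0.1)
```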

This mathematical operation of extrapolating from the voltage to future rates, and returning to the current voltage via low-pass filtering, is the key ingredient of the seemingly magical instantaneous processing, see Fig. 2c.

To achieve this instantaneity, it is not actually necessary to precisely look into the future. It is only necessary to undo the temporal operation of the postsynaptic neuron, and this is achieved by encoding the presynaptic voltage in the prospective rate. We come back to this instantaneity, including its practical limitations, in the section after next.

The least-action principle expressed for prospective voltages

Motivated by the prospective firing in pyramidal neurons, we postulate that cortical networks strive to look into the future to prevent instantaneous errors. Each neuron tries to move along a trajectory that minimizes its own mismatch error ēi across time (Fig. 1b1). The ‘neuronal currency’ with which each neuron ‘trades’ with others to choose its own error-minimizing trajectory is the future discounted membrane potential, $\tilde{u}_i(t) = \frac{1}{\tau} \int_t^{\infty} u_i(t')\, e^{-\frac{t'-t}{\tau}}\, dt'$.

These prospective voltages ũi are the ‘canonical coordinates’ entering the NLA principle, and in these prospective coordinates the overall network searches for a ‘least-action trajectory’. Since from ũ we can recover the instantaneous voltage via $u = \tilde{u} - \tau \dot{\tilde{u}}$ (see SI), we can replace u in the Lagrangian and obtain L as a function of our new canonical coordinates $\tilde{\boldsymbol{u}}$ and the ‘velocities’ $\dot{\tilde{\boldsymbol{u}}}$, i.e. $L = L(\tilde{\boldsymbol{u}}, \dot{\tilde{\boldsymbol{u}}})$, where bold fonts represent vectors. Inspired by the least-action principle from physics, we can now define the neuronal action A as a time integral of the Lagrangian, $A = \int L(\tilde{\boldsymbol{u}}, \dot{\tilde{\boldsymbol{u}}})\, dt$.

The NLA principle postulates that the trajectory $\tilde{\boldsymbol{u}}(t)$ keeps the action A stationary with respect to small variations $\delta\tilde{\boldsymbol{u}}$ (Fig. 1b1). In other words, nature chooses a trajectory such that, when deviating a little bit from it, say by $\delta\tilde{\boldsymbol{u}}$, the value of A will not change (or at most up to second order in the variation), formally δA = 0. The equations of motion that keep the action stationary are known to satisfy the Euler-Lagrange equations $\frac{d}{dt} \frac{\partial L}{\partial \dot{\tilde{\boldsymbol{u}}}} = \frac{\partial L}{\partial \tilde{\boldsymbol{u}}}$.

Applying these equations to our Lagrangian yields a prospective version of the classical leaky integrator voltage dynamics, with rates r and errors e that are looking into the future (Methods),
$\tau \dot{\boldsymbol{u}} = -\boldsymbol{u} + \boldsymbol{W} \boldsymbol{r} + \boldsymbol{e}$   (7a)
$\bar{\boldsymbol{e}} = \bar{\boldsymbol{r}}' \circ \boldsymbol{W}_{\mathrm{net}}^{\mathsf{T}} \bar{\boldsymbol{e}} + \beta\, \bar{\boldsymbol{e}}^{\mathrm{trg}}$   (7b)

The ‘∘’ denotes the component-wise product, and the weight matrix splits into weights from input neurons and weights from network neurons, W = (Win, Wnet). While for output neurons a target error can be defined, $\bar{e}_o^{\mathrm{trg}} = u_o^{\mathrm{trg}} - u_o$, for non-output neurons i no target exists and we hence set $\bar{e}_i^{\mathrm{trg}} = 0$. In a control theoretic framework, the neuronal dynamics (Eq. 7a) represents the state trajectory, and the adjoint error dynamics (Eq. 7b) represents the integrated costate trajectory (Todorov, 2006).

Formally, the voltage dynamics in Eq. 7a specifies an implicit differential equation since the derivative u̇ also appears on the right-hand side. This is because the prospective rates, $\boldsymbol{r} = \rho(\boldsymbol{u}) + \tau \dot{\rho}(\boldsymbol{u})$, imply u̇ through $\dot{\rho}(\boldsymbol{u}) = \rho'(\boldsymbol{u}) \circ \dot{\boldsymbol{u}}$. Likewise, the prospective errors, $\boldsymbol{e} = \bar{\boldsymbol{e}} + \tau \dot{\bar{\boldsymbol{e}}}$, with ē given in Eq. 7b and plugged into Eq. 7a, imply u̇ through $\dot{\bar{\boldsymbol{e}}}$. Nevertheless, the voltage dynamics can be run by replacing u̇ on the right-hand side of Eq. 7a by the temporal derivative from the previous time step (technically, the Hessian of the Lagrangian is required to be strictly positive definite, see Methods and SI). This ensures that the voltage dynamics of Eq. 7 can be implemented in cortical neurons with a prospective firing and a prospective dendritic error (see Fig. 2).
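
The sketch below illustrates how such a discretized update could look for a functionally feedforward network; the layer structure, fixed feedback matrices B, transfer function, and the use of the instantaneous (rather than prospective) error are simplifying assumptions made for illustration, not the authors' reference implementation.

```python
import numpy as np

def rho(u):
    """Illustrative transfer function."""
    return np.tanh(u)

def rho_p(u):
    """Derivative of the transfer function."""
    return 1.0 - np.tanh(u) ** 2

def nla_step(u, du_prev, r_in, u_trg, W, B, beta, tau, dt):
    """One Euler step of a simplified, feedforward variant of Eq. 7.
    The implicit du/dt on the right-hand side is replaced by the derivative
    from the previous step; the error used here is the instantaneous one,
    which keeps the sketch short."""
    L = len(u)
    # prospective rates r_l = rho(u_l) + tau * rho'(u_l) * du_l/dt
    r = [rho(u[l]) + tau * rho_p(u[l]) * du_prev[l] for l in range(L)]
    # errors: beta-scaled target error at the output, propagated downwards
    # through fixed feedback weights B and modulated by the rate derivative
    e = [np.zeros_like(u[l]) for l in range(L)]
    e[-1] = beta * (u_trg - u[-1])
    for l in range(L - 2, -1, -1):
        e[l] = rho_p(u[l]) * (B[l] @ e[l + 1])
    # leaky-integrator voltage dynamics tau * du/dt = -u + W r + e
    du = []
    for l in range(L):
        pre = r_in if l == 0 else r[l - 1]
        du.append((-u[l] + W[l] @ pre + e[l]) / tau)
    for l in range(L):
        u[l] = u[l] + dt * du[l]
    return u, du

# Minimal usage with a 6-5-3 architecture and random weights.
rng = np.random.default_rng(3)
sizes, tau, dt, beta = [6, 5, 3], 10e-3, 1e-4, 0.1
W = [rng.normal(scale=0.3, size=(sizes[l + 1], sizes[l])) for l in range(2)]
B = [rng.normal(scale=0.3, size=(sizes[1], sizes[2]))]
u = [np.zeros(s) for s in sizes[1:]]
du = [np.zeros(s) for s in sizes[1:]]
r_in, u_trg = rng.random(sizes[0]), rng.normal(size=sizes[2])
for _ in range(1000):
    u, du = nla_step(u, du, r_in, u_trg, W, B, beta, tau, dt)
```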

The error expression in Eq. 7b is reminiscent of error backpropagation (Rumelhart et al., 1986) and can in fact be mathematically linked to it (Methods). Formally, the errors are backpropagated via the transposed network matrix, $\boldsymbol{W}_{\mathrm{net}}^{\mathsf{T}}$, modulated by $\bar{\boldsymbol{r}}' = \rho'(\boldsymbol{u})$, the derivative of r̄ with respect to the underlying voltage. While the transpose can be constructed with various local methods (see Akrout et al., 2019; Max et al., 2022), in our simulations we mainly adhere to the phenomenon of feedback alignment (Lillicrap, Cownden, et al., 2016) and consider fixed and randomized feedback weights B (unless stated otherwise). Recent control theoretical work exploits the same prospective coding technique as expressed in Eq. 7 to tackle general time-varying optimization problems (see Simonetto et al., 2020 for a review and the SI for the detailed connection).

Prospective coding in neurons and instantaneous propagation

The prospective rates and errors entering via r and e in the NLA (Eq. 7) are consistent with the prospective coding observed in cortical pyramidal neurons in vitro (Köndgen et al., 2008). Upon sinusoidal current injection into the soma, the somatic firing rate is advanced with respect to its voltage (Fig. 2a), effectively compensating for the delay caused by the current integration. Likewise, sinusoidal current injection in the apical tree causes a lag-less voltage response in the soma (Fig. 2b, Ulrich, 2002). While the rates and errors in general can be reconstructed from their low-pass filterings via $\boldsymbol{r} = \bar{\boldsymbol{r}} + \tau \dot{\bar{\boldsymbol{r}}}$ and $\boldsymbol{e} = \bar{\boldsymbol{e}} + \tau \dot{\bar{\boldsymbol{e}}}$, they become prospective in time because r̄ and ē are themselves instantaneous functions of the voltage u, and hence r and e depend on u̇. The derivative of the membrane potential implicitly also appears in the firing mechanism of Hodgkin-Huxley-type conductances, with a quick depolarization leading to a stronger sodium influx due to the dynamics of the gating variables (Hodgkin & Huxley, 1952). This advances the action potential as compared to a firing that would only depend on u, not u̇, giving an intuition of how such a prospective coding may arise. A similar prospective coding has been observed for retinal ganglion cells (Palmer et al., 2015) and cerebellar Purkinje cells (Ostojic et al., 2015), making a link from the visual input to the motor control.

To understand the instantaneous propagation through the network, we low-pass filter the dynamic equation $\boldsymbol{u} + \tau \dot{\boldsymbol{u}} = \boldsymbol{W} \boldsymbol{r} + \boldsymbol{e}$ (obtained by rearranging Eq. 7a), with the error given by Eq. 7b, to obtain the somatic voltage $\boldsymbol{u}(t) = \boldsymbol{W} \bar{\boldsymbol{r}}(t) + \bar{\boldsymbol{e}}(t)$. At any point in time, the voltage is in a moving equilibrium between forward and backpropagating inputs. Independently of the network architecture, whether recurrent or not, the output is an instantaneous function of the low-pass filtered input and a putative correction towards the target, $\boldsymbol{u}_o(t) = F_{\boldsymbol{W}}\big(\bar{\boldsymbol{r}}_{\mathrm{in}}(t), \bar{\boldsymbol{e}}(t)\big)$, see Fig. 2c and Methods. The mapping again expresses an instantaneous propagation of voltages throughout the network in response to both the low-pass filtered input r̄in and the feedback error ē. This instantaneity is independent of the network size, and in a feed-forward network it is independent of the network's depth (see also Haider et al., 2021, where the instantaneity is on the rates, not the voltages). In the absence of the look-ahead activity, each additional layer would slow down the network relaxation time.

Notice that an algorithmic implementation of the time-continuous dynamics of an N-layer feedforward network would still need N calculation steps until information from layer 1 reaches layer N. However, this does not imply that an analog implementation of the prospective dynamics will encounter delays. To see why, consider a finite step-change Δu1 in the voltage of layer 1. In the absence of the look-ahead, Δu1 would be mapped within the infinitesimal time interval dt to an infinitesimal change du2 in the voltages of layer 2. But with a prospective firing rate, $r_1 = \rho(u_1) + \tau \dot{\rho}(u_1)$, a step-change Δu1 translates to a delta-function in r1, this in turn to a step-change in the low-pass filtered rates r̄1, and therefore within dt to a step-change Δu2 in the voltages u2 of the postsynaptic neurons (Fig. 2c). Iterating this argument, a step-change Δu1 propagates ‘instantaneously’ through N layers within the ‘infinitesimal’ time interval N dt to a step-change ΔuN in the last layer. When run in a biophysical device in continuous time that exactly implements the dynamical equations (Eq. 7), the implementation becomes an instantaneous computation. Yet, in a biophysical device, information has to be moved across space. This typically introduces further propagation delays that may not be captured in our formalism where low-pass filtering and prospective coding cancel each other exactly. Nevertheless, analog computation in continuous time, as formalized here, offers an idea to ‘instantaneously’ realize an otherwise time-consuming numerical recipe run on time-discrete computing systems that operate with a finite clock cycle.
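
A two-neuron sketch of this argument (linear transfer function for clarity; all parameter values are arbitrary) illustrates how the prospective rate lets a downstream voltage follow a presynaptic step within a single integration step:

```python
import numpy as np

# A step in u1, encoded by a prospective rate r1 = u1 + tau * du1/dt (linear
# transfer for clarity), lets the downstream voltage u2, which obeys
# tau * du2/dt = -u2 + r1, jump within a single time step instead of relaxing
# over the membrane time constant. All parameter values are arbitrary.
tau, dt, steps = 10e-3, 1e-4, 2000
u1 = np.zeros(steps)
u1[steps // 2:] = 1.0                 # step change in the presynaptic voltage
du1 = np.diff(u1, prepend=u1[0]) / dt
r1 = u1 + tau * du1                   # prospective rate (delta on top of the step)

u2_prosp, u2_plain = np.zeros(steps), np.zeros(steps)
for t in range(1, steps):
    u2_prosp[t] = u2_prosp[t-1] + dt / tau * (-u2_prosp[t-1] + r1[t])   # with lookahead
    u2_plain[t] = u2_plain[t-1] + dt / tau * (-u2_plain[t-1] + u1[t])   # without lookahead

# u2_prosp reaches ~1 within one step after the jump; u2_plain relaxes over ~tau.
```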

Prospective control and the moving equilibrium hypothesis

Crucially, at the level of the voltage dynamics (Eq. 7a) the correction is based on the prospective error e. This links our framework to optimal control theory and motor control where delays are also taken into account, so that a movement can be corrected early enough (Wolpert & Ghahramani, 2000; Todorov & Jordan, 2002; Todorov, 2004). The link between energy-based models and optimal control was recently drawn for strong nudging (β → ∞) to learn individual equilibrium states (Meulemans, Zucchet, et al., 2022). Our prospective error e(t) appears as a ‘controller’ that, when looking at the output neurons, pushes the voltage trajectories towards the target trajectories. Depending on the nudging strength β, the control is tighter or weaker. For infinitely large β, the voltages of the output neurons are clamped to the time-dependent target voltages, Embedded Image (implying Embedded Image), while their errors, Embedded Image, instantaneously correct all network neurons. For small β, the output voltages are only weakly controlled, and they are dominated by the forward input, Embedded Image.

In the context of motor control, our network mapping $\boldsymbol{u}_o(t) = F_{\boldsymbol{W}}\big(\bar{\boldsymbol{r}}_{\mathrm{in}}(t), \bar{\boldsymbol{e}}(t)\big)$ can be seen as a forward internal model that quickly calculates an estimate of the future muscle length uo based on some motor plans, sensory inputs, and the current proprioceptive feedback (Fig. 3a). Forward models help to overcome delays in the execution of the motor plan by predicting the outcome, so that the intended motor plans and commands can be corrected on the fly (Kawato, 1999; Wolpert & Ghahramani, 2000).

Figure 3: Moving equilibrium hypothesis for motor control and real-time learning of cortical activity.

(a) A voluntary movement trajectory can be specified by the target length of the muscles in time, $u_o^{\mathrm{trg}}(t)$, encoded through the γ-innervation of muscle spindles, and the deviation of the effective muscle lengths from the target, $u_o^{\mathrm{trg}} - u_o$. The Ia-afferents emerging from the spindles prospectively encode the error, so that their low-pass filtering is roughly proportional to the length deviation, truncated at zero (red). The moving equilibrium hypothesis states that low-pass filtered inputs encoding the movement plan, Embedded Image, the sensory inputs, Embedded Image, and the condition of the plant, Embedded Image, instantaneously generate the muscle lengths, Embedded Image, and are at any point in time in an instantaneous equilibrium with the error feedback from the spindles (such that Eq. 7 is satisfied). (b1) Intracortical iEEG activity recorded from 56 deep electrodes and projected to the brain surface. Red nodes symbolize the 56 iEEG recording sites modeled alternately as input or output neurons, and blue nodes symbolize the 40 ‘hidden’ neurons for which no data is available, but used to reproduce the iEEG activity. (b2) Corresponding NLA network. During training, the voltages of the output neurons were nudged by the iEEG targets (black input arrows, but for all red output neurons). During testing, nudging was removed for 14 out of these 56 neurons (here, represented by neurons 1, 2, 3). (c1) Voltage traces for the 3 example neurons in b2, before (blue) and after (red) training, overlaid with their iEEG target traces (grey). (c2) Total cost, integrated over a window of 8 s of the 56 output nodes during training with sequences of the same duration. The cost for the test sequences was evaluated on a 8 s window not used during training.

The observation that muscle spindles prospectively encode the muscle length and velocity (Dimitriou & Edin, 2010) suggests that the prospective coding in the internal forward model mirrors the prospective coding in the effective forward pathway. This forward pathway leads from the motor plan to the spindle feedback, integrating also cerebellar and brainstem feedback (Kawato, 1999). Based on the motor plans, the intended spindle lengths and the effective muscle innervation are communicated via descending pathways to activate the γ- and α-motoneurons, respectively (Li et al., 2015). The transform from the intended arm trajectory to the intended spindle lengths via γ-innervation is mainly determined by the joint geometry. The transform from the intended arm trajectory to the force-generating α-innervation, however, also needs to take account of the internal and external forces, and this is what engages our network W.

When we prepare an arm movement, spindles in antagonistic muscle pairs that measure the muscle length are tightened or relaxed before the movement starts (Papaioannou & Dimitriou, 2021). According to the classical equilibrium-point hypothesis (Feldman & Levin, 2009; Latash, 2018), top-down input adjusts the activation threshold of the spindles through (γ-)innervation from the spinal cord so that slight deviations from the equilibrium position can be signaled (Fig. 3a). We postulate that this γ-innervation acts also during the movement, setting an instantaneous target $u_o^{\mathrm{trg}}(t)$ for the spindle lengths. The effective lengths of the muscle spindles are uo, and the spindles prospectively signal back the deviation from the target through the Ia-afferents (Dimitriou & Edin, 2010; Dimitriou, 2022). The low-pass filtered Ia-afferents may be approximated by a threshold nonlinearity, $\bar{e}_o = \beta \lfloor u_o^{\mathrm{trg}} - u_o \rfloor_+$, with β interpreted as the spindle gain (Latash, 2018). Combining the feedback from agonistic and antagonistic muscle pairs allows for extracting the scaled target error $\beta \big( u_o^{\mathrm{trg}} - u_o \big)$. Taking account of the prospective feedback, we postulate the moving equilibrium hypothesis according to which the instructional inputs, r̄in, the spindle feedback, ēo, and the muscle lengths, uo, are at any point of the movement in a dynamic equilibrium. The moving equilibrium hypothesis extends the classical equilibrium-point hypothesis from the spatial to the temporal domain.
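
A small sketch of this spindle-error readout, under the stated assumption of a β-scaled, zero-truncated length deviation and an agonist/antagonist pair with opposite sign conventions (names are illustrative):

```python
import numpy as np

def ia_rate(u_trg, u, beta):
    """Low-pass filtered Ia-afferent activity of one spindle: spindle gain beta
    times the deviation of the muscle length from its target, truncated at zero."""
    return beta * np.maximum(u_trg - u, 0.0)

def scaled_target_error(u_trg, u, beta):
    """Combining agonist and antagonist spindles (opposite sign conventions)
    recovers the signed, scaled target error beta * (u_trg - u)."""
    return ia_rate(u_trg, u, beta) - ia_rate(-u_trg, -u, beta)
```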

With regard to the interpretation of the prospective feedback error ēo as spindle activity, it is worth noting that in humans the spindle activity is not only ahead of the muscle activation (Dimitriou & Edin, 2010), but also shares the property of a motor error (Dimitriou, 2016). The experiments show that during the learning of a gated hand movement, spindle activity is initially stronger when making movement errors, and it returns back to baseline with the success of learning. We next address how the synaptic strengths W specifying the forward model can be optimally adapted to capture this learning.

Local plasticity at basal synapses minimizes the global cost in real time

The general learning paradigm starts with input time series rin,i(t) and target time series $u_o^{\mathrm{trg}}(t)$, while assuming that the target series are an instantaneous function of the low-pass filtered input series, $\boldsymbol{u}^{\mathrm{trg}}(t) = F^{*}\big(\bar{\boldsymbol{r}}_{\mathrm{in}}(t)\big)$. The low-pass filtering of the individual inputs could be with respect to any time constant (that may also be learned, see SI), but for the network neurons we assume the same time constants for the voltage integration and the prospective rate. The goal of learning is to adapt the synaptic strengths W in the student network so that its mapping approximates the target mapping, FW ≈ F*. This will also reduce the cost C defined on the output neurons in terms of the deviation of the voltage from the target, $u_o^{\mathrm{trg}}$ (Eq. 2).

The problem of changing synaptic weights to correct the behavior of downstream neurons, potentially multiple synapses away, is typically referred to as the credit assignment problem and is notoriously challenging in physical or biological substrates operating in continuous time. A core aspect of the NLA principle is how it relates the cost C to the Lagrangian L and eventually to the somato-dendritic prediction errors ēi that can be reduced through synaptic plasticity Ẇij. We define this synaptic plasticity as gradient descent on the Lagrangian with respect to the weights, $\dot{W}_{ij} = -\eta\, \frac{\partial L}{\partial W_{ij}}$. Since the somato-dendritic mismatch error is $\bar{e}_i = u_i - \sum_j W_{ij} \bar{r}_j$, this leads to the local learning rule of the form ‘postsynaptic error times low-pass filtered presynaptic rate’, $\dot{W}_{ij} = \eta\, \bar{e}_i\, \bar{r}_j$.

This plasticity rule runs simultaneously with the neuronal dynamics in the presence of a given nudging strength β that determines how strongly the voltage of an output neuron is pushed towards the target, $u_o^{\mathrm{trg}}$. The learning rule is local in space since $\sum_j W_{ij} \bar{r}_j$ is represented as the voltage of the basal dendrite, and the somatic voltage u may be read out at the synaptic site on the basal dendrite from the backpropagating action potentials that sample u at a given time (Urbanczik & Senn, 2014). The basal voltage $\sum_j W_{ij} \bar{r}_j$ becomes the dendritic prediction of the somatic activity u, interpreting Eq. 8 as ‘dendritic predictive plasticity’.
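
A sketch of this dendritic predictive plasticity as a discrete-time update (the learning rate, step size, and array shapes are illustrative assumptions); in a full simulation, u and r̄ would be supplied by the voltage dynamics of Eq. 7 running in parallel:

```python
import numpy as np

def dendritic_predictive_plasticity(u, W, r_bar, eta, dt):
    """One Euler step of the weight update dW_ij/dt = eta * e_bar_i * r_bar_j,
    with e_bar_i = u_i - (W @ r_bar)_i the somato-dendritic mismatch error;
    the basal voltage W @ r_bar acts as the dendritic prediction of u."""
    e_bar = u - W @ r_bar
    return W + dt * eta * np.outer(e_bar, r_bar)
```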

We have derived the neuronal dynamics as a path that keeps the action stationary. Without an external teaching signal, the voltage trajectory wriggles on the bottom of the energy landscape (L = 0, Fig. 1b1). If the external nudging is turned on, β > 0, errors emerge and hills grow out of the landscape. The trajectory still tries to locally minimize the action, but it is lifted upwards on the hills (L > 0, Fig. 1b2). Synaptic plasticity reshapes the landscape so that, while keeping β fixed, the errors are reduced and the landscape flattens again. The transformed trajectory settles anew in another place (inside the ‘volcano’ in Fig. 1b2). Formally, the local plasticity rule (Eq. 8) is shown to perform gradient descent on the Lagrangian. In the energy landscape picture, plasticity ‘shovels off’ earth along the voltage path so that it is lowered most efficiently. The error that is back-propagated through the network tells, at any point on the voltage trajectory, how much to ‘dig’ in each direction, i.e. how to adapt the basal input in each neuron in order to optimally lower the local error.

The following theorem states that the synaptic plasticity $\dot{W}_{ij} = \eta\, \bar{e}_i\, \bar{r}_j$ pushes the network mapping FW towards the target mapping F* at any moment in time. The convergence of the mapping is a consequence of the fact that the plasticity reduces the Lagrangian L = EM + βC along its gradient.

Theorem 1

(real-time Dendritic Error Propagation, rt-DeEP) Consider an arbitrary network W with voltage and error dynamics following Eq. 7. Then the local plasticity rule $\dot{W}_{ij} = \eta\, \bar{e}_i\, \bar{r}_j$ (Eq. 8), acting at each moment along the voltage trajectories, is gradient descent

  1. on the Lagrangian L for any nudging strength β ≥ 0, i.e. $\dot{\boldsymbol{W}} \propto -\frac{\partial L}{\partial \boldsymbol{W}}$, with $\frac{\partial L}{\partial \boldsymbol{W}} = -\bar{\boldsymbol{e}}\, \bar{\boldsymbol{r}}^{\mathsf{T}}$.

  2. on the cost C for small nudging, β → 0, while up-scaling the error to $\bar{\boldsymbol{e}}/\beta$, i.e. $\dot{\boldsymbol{W}} \propto -\frac{\partial C}{\partial \boldsymbol{W}}$.

The gradient statements hold at any point in time (long enough after initialization), even if the input trajectories rin(t) contain delta-functions and the target trajectories Embedded Image contain step-functions.

Loosely speaking, the NLA enables the network to localize an otherwise global problem: what is good for a single neuron becomes good for the entire network. In the limit of strong nudging (β → ∞), the learning rule performs gradient descent on the mismatch energies $E_i^{\mathrm{M}}$ of the individual neurons in the network. If the network architecture is powerful enough so that after learning all the mismatch energies vanish, $E_i^{\mathrm{M}} = 0$, then the cost will also vanish, C = 0. This is because for the output neurons the mismatch error includes the target error. In the limit of weak nudging (β → 0), the learning rule performs gradient descent on C, and with this also finds a local minimum of the mismatch energies.

In the case of weak nudging and a single equilibrium state, the NLA algorithm reduces to the Equilibrium Propagation algorithm (Scellier & Bengio, 2017) that minimizes the cost C for a constant input and a constant target. In the case of strong nudging and a single equilibrium state, the NLA principle reduces to the Least Control Principle (Meulemans, Zucchet, et al., 2022) that minimizes the mismatch energy EM for a constant input and a constant target. While in the Least Control Principle the inputs and outputs are clamped to fixed values, the output errors are backpropagated and the network equilibrates in a steady state where the corrected network activities reproduce the clamped output activities. This state is called the ‘prospective configuration’ in Song et al. (2022) because neurons deep in the network are informed about the distal target and are correspondingly adapted to be consistent with this distal target. In the NLA principle, after an initial transient, this relaxation process occurs on the fly while inputs and targets dynamically change, and hence the network moves along a continuous sequence of prospective configurations.

In the motor control example, the theorem states that a given target motor trajectory $u_o^{\mathrm{trg}}(t)$ is learned to be reproduced by the forward model FW through the dendritic predictive plasticity (Eq. 8), applied in the various layers of the forward model. Below, we give an example of such learning that involves different sensory modalities to train the forward model.

Reproducing intracortical EEG recordings and recognizing handwritten digits

To illustrate the framework, we consider a recurrently connected network that learns to represent intracortical electroencephalogram (iEEG) data from epileptic patients (Fig. 3b). For each electrode, we assign a neuron within this network to represent the activity of the cell cluster recorded in the corresponding iEEG signal via its membrane potential. During learning, a randomly selected subset of electrode neurons is nudged towards the target activity from recorded data, which the other neurons learn to reproduce. After learning, we can present previously unseen recordings to only a subset of electrode neurons and observe how the activity of the other neurons closely matches the recordings of their respective electrodes (Fig. 3c). The network derived from NLA is thus able to learn complex correlations between signals evolving in real time by encoding them into its connectivity structure.

As an example of visual processing in the NLA framework, we next consider a well-studied image recognition task, here reformulated in a challenging time-continuous setting and interpreted as a motor task in which 1 out of 10 fingers has to be bent upon seeing a corresponding visual stimulus (see Fig. 3a). In the context of our moving equilibrium hypothesis, we postulate that during the learning phase, but not the testing phase, an auditory signal identifying the correct finger sets the target spindle lengths of the 10 finger flexors, $u_o^{\mathrm{trg}}(t)$, i.e. a desired contraction for the correct finger in response to the visual input rin(t), and a desired relaxation for the 9 incorrect fingers.

We train a hierarchical three-layer network on images of handwritten digits (MNIST, LeCun, 1998), with image presentation times between 0.5τ (= 5 ms, with τ the membrane time constant) and 20τ (= 200 ms). Fig. 4a-c1 depict the most challenging scenario with the shortest presentation time. Synaptic plasticity is continuously active, despite the network never reaching a steady state (Fig. 4b1). Due to the lookahead firing rates in NLA, the mismatch errors ēi propagate without lag throughout the network, as explained above. As a consequence, our mismatch errors are equal to the errors obtained from classical error backpropagation applied at each time step to the purely forward network, without error-correcting the voltage, and instead considering at each layer l only the forward voltage updates $u_l = W_l\, \rho(u_{l-1})$, see Fig. 4b2. After learning, the network implements the mapping FW ≈ F* with a performance comparable to classical error backpropagation, despite the short presentation time of only 5 ms (Fig. 4c). The limiting presentation time of 5 ms is due to the low-pass filtering of the input rates rin entering in r̄in, for which we choose a time constant of 10 ms. We also chose the membrane time constant of the other network neurons to be 10 ms, but these could have been much longer without violating the instantaneous voltage propagation (as long as the look-ahead rates again compensate for these longer time constants).
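
For the comparison mentioned above, the per-time-step backpropagation errors of the purely forward network could be computed as in the sketch below; the transfer function, the β-scaling of the output error, and the array shapes are illustrative assumptions rather than the authors' simulation code.

```python
import numpy as np

def rho(u):
    """Illustrative transfer function."""
    return np.tanh(u)

def rho_p(u):
    """Derivative of the transfer function."""
    return 1.0 - np.tanh(u) ** 2

def forward_voltages(r_in_bar, Ws):
    """Purely forward voltage updates u_l = W_l @ rho(u_{l-1}), with u_0 the input."""
    us = [r_in_bar]
    for W in Ws:
        us.append(W @ rho(us[-1]))
    return us

def backprop_errors(us, Ws, u_trg, beta):
    """Per-time-step backpropagation errors for the forward network:
    beta-scaled target error at the output, propagated through the transposed
    weights and modulated by the rate derivative."""
    es = [None] * len(us)
    es[-1] = beta * (u_trg - us[-1])
    for l in range(len(Ws) - 1, 0, -1):
        es[l] = rho_p(us[l]) * (Ws[l].T @ es[l + 1])
    return es[1:]

# Toy 784-20-10 network evaluated on one random 'image'.
rng = np.random.default_rng(4)
Ws = [rng.normal(scale=0.05, size=(20, 784)), rng.normal(scale=0.3, size=(10, 20))]
us = forward_voltages(rng.random(784), Ws)
errors = backprop_errors(us, Ws, u_trg=np.eye(10)[3], beta=0.1)
```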

Figure 4: On-the-fly learning of finger responses to visual input with real-time Dendritic Error Propagation (rt-DeEP).

(a) Functionally feedforward network with handwritten digits as visual input (rin in Fig. 3a; here from the MNIST data set, 5 ms presentation time per image), backprojections enabling credit assignment, and activity of the 10 output neurons interpreted as commands for the 10 fingers (forward architecture: 784 × 500 × 10 neurons). (b) Example voltage trace (b1) and local error (b2) of a hidden neuron in NLA (red) compared to an equivalent network without lookahead rates (orange). Note that neither network achieves a steady state due to the extremely short input presentation times. Errors calculated via exact backpropagation, i.e. by using the error backpropagation algorithm on a purely feedforward NLA network at every simulation time step (with output errors scaled by β), are shown for comparison (blue). (c) Comparison of network models during and after learning. Color scheme as in (b). (c1) The test error under NLA evolves during learning on par with classical error backpropagation. In contrast, networks without lookahead rates are incapable of learning such rapidly changing stimuli. (c2) With increasing presentation time, the performance under NLA further improves, while networks without lookahead rates stagnate at high error rates. This is caused by transient, but long-lasting misrepresentation of errors following stimulus switches: when plasticity is turned off during transients and is only active in the steady state, comparably good performance can be achieved (dashed orange). (d) Receptive fields of 6 hidden-layer neurons after training, demonstrating that even for very brief image presentation times (5 ms), the combined neuronal and synaptic dynamics are capable of learning useful feature extractors such as edge filters.

The instantaneous voltage propagation relaxes an essential constraint of previous models of bio-plausible error backpropagation (e.g., Scellier & Bengio (2017), Whittington & Bogacz (2017), and Sacramento et al. (2018), with reviews Richards et al. (2019), Whittington & Bogacz (2019), and Lillicrap, Santoro, et al. (2020)): without lookahead firing rates, networks need much longer to correctly propagate errors across layers, with each layer roughly adding another membrane time constant of 10 ms, and thus cannot cope with realistic input presentation times. In fact, in networks without lookahead output, learning is only successful if plasticity is switched off while the network dynamics is not stationary (Fig. 4c2). As a comparison, neuronal response latencies in the primary visual cortex (V1) of rats to flashing stimuli are on the order of 50 ms if the cortex is in a synchronized state, shorten to roughly 40 ms if in a desynchronized state (Wang et al., 2014), and potentially shorten further if the area is preactivated through expectations (Blom et al., 2020).

Implementation in cortical microcircuits

So far, we did not specify how the errors e appearing in the differential equation for the voltage (Eq. 7a) are transmitted across the network in a biologically plausible manner. Building on Sacramento et al. (2018), we propose a cortical microcircuit to enable this error transport, with all neuron dynamics evolving according to the NLA principle. Although the idea applies to arbitrarily connected networks, we use the simpler case of functionally feedforward networks to illustrate the flow of information in these microcircuits (Fig. 5a).

Figure 5: Hierarchical plastic microcircuits implement real-time Dendritic Error Learning (rt-DeEL).

(a) Microcircuit with ‘top-down’ input (originating from peripheral motor activity, blue line) that is explained away by the lateral input via interneurons (dark red), with the remaining activity representing the error Embedded Image. Plastic connections are denoted with a small red arrow and nudging with a dashed line. (b1) Simulated network with 784-300-10 pyramidal neurons and a population of 40 interneurons in the hidden layer, used for the MNIST learning task in which the handwritten digits have to be associated with the 10 fingers. (b2) Test errors for rt-DeEL with joint tabula rasa learning of the forward and lateral weights of the microcircuit. A similar performance is reached as with classical error backpropagation. For comparability, we also show the performance of a shallow network (dashed line). (b3) Angle derived from the Frobenius norm between the lateral pathway Embedded Image and the feedback pathway Bl Wl+1. During training, both pathways align to allow correct credit assignment throughout the network. Indices are dropped in the axis label for readability.

For such an architecture, pyramidal neurons in area l (that is, a ‘layer’ of the feedforward network) are accompanied by a pool of interneurons in the same layer (area). The dendrites of the interneurons integrate in time (with time constant τ) lateral input from pyramidal neurons of the same layer (rl) through plastic weights Embedded Image. Additionally, interneurons receive ‘top-down nudging’ from pyramidal neurons in the next layer through randomly initialized and fixed backprojecting synapses Embedded Image targeting the somatic region, with interneuron nudging strength βI. The notion of ‘top-down’ originates from the functionally feedforward architecture leading from sensory to ‘higher cortical areas’. In the context of motor control, the highest ‘area’ is the last stage controlling the muscle lengths, being at the same time the first stage for the proprioceptive input (Fig. 3a).

According to the biophysics of the interneuron, the somatic membrane potential becomes a convex combination of the two types of afferent input (Urbanczik & Senn, 2014), Embedded Image

In the biological implementation, the feedback input is mediated by the low-pass filtered firing rates r̄l+1, not by ul+1 as expressed in the above equation. Yet, we argue that for a threshold-linear ρ the ‘top-down nudging’ by the rate r̄l+1 is effectively reduced to a nudging by the voltage ul+1. This is because errors are only backpropagated when the slope of the transfer function is positive, ρ′ > 0, and hence when the upper-layer voltage is in the linear regime. For more general transfer functions, we argue that short-term synaptic depression may invert the low-pass filtered presynaptic rate back to the presynaptic membrane potential, $p(\bar{r}_{l+1})\, \bar{r}_{l+1} \propto u_{l+1}$, provided that the recovery time constant τ matches the membrane time constant (see end of the section and Methods).

Apical dendrites of pyramidal neurons in each layer receive top-down input from the pyramidal population in the upper layer through synaptic weights Bl. These top-down weights could be learned to predict the lower-layer activity (Rao & Ballard, 1999) or to become the transpose of the forward weight matrix (Max et al., 2022), but for simplicity we randomly initialize them and keep them fixed (Lillicrap, Santoro, et al., 2020). Besides the top-down projections, the apical dendrites also receive lateral input via an interneuron population in the same layer through synaptic weights – Embedded Image that are plastic and will be learned to obtain suitable dendritic errors. The ‘–’ sign indicates that these interneurons act to subtract away the top-down input entering through Bl (while the weights themselves can still be positive or negative). Assuming again a conversion of rates to voltages, also for the inhibitory neurons that may operate in a linear regime, the overall apical voltage becomes Embedded Image
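
The ‘explaining away’ expressed by this apical voltage can be sketched as follows; the variable names (B_l, W_PI_l, u_I) are illustrative stand-ins for the top-down, lateral, and interneuron quantities introduced above, not the authors' notation.

```python
import numpy as np

def apical_voltage(B_l, u_next, W_PI_l, u_I):
    """Apical compartment of layer-l pyramidal neurons: top-down input from the
    upper layer minus the lateral input from the interneurons (assuming the
    rate-to-voltage conversion described in the text). Whatever the interneurons
    cannot explain away remains as the apical error."""
    return B_l @ u_next - W_PI_l @ u_I

# Toy dimensions: 5 lower-layer pyramidal cells, 3 upper-layer cells, 4 interneurons.
rng = np.random.default_rng(2)
B_l = rng.normal(scale=0.1, size=(5, 3))       # fixed top-down weights
W_PI_l = rng.normal(scale=0.1, size=(5, 4))    # plastic interneuron-to-pyramidal weights
v_apical = apical_voltage(B_l, rng.normal(size=3), W_PI_l, rng.normal(size=4))
```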

What cannot be explained away from the top-down input Embedded Image by the lateral feedback, – Embedded Image, remains as dendritic prediction error Embedded Image in the apical tree (Fig. 5a). If the top-down and lateral feedback weights are learned as outlined next, these apical prediction errors take the role of the backpropagated errors in the classical backprop algorithm.

To adjust the interneuron circuit in each layer (‘area’), the synaptic strengths from pyramidal-to-interneurons, Embedded Image, are learned to minimize the interneuron mismatch energy, Embedded Image. The interneurons, while being driven by the lateral inputs Embedded Image, learn to reproduce the upper-layer activity that also nudges the interneuron voltage. Learning is accomplished if the upper-layer activity, in the absence of an additional error on the upper layer, is fully reproduced in the interneurons by the lateral input.

Once the interneurons have learned to represent the ‘error-free’ upper-layer activity, they can be used to explain away the top-down activities that also project to the apical trees. The synaptic strengths from the inter-to-pyramidal neurons, Embedded Image, are learned to minimize the apical mismatch energy, Embedded Image. While in the absence of an upper-layer error the top-down activity Bl ul+1 can be fully cancelled by the interneuron activity Embedded Image, a neuron-specific error will remain in the apical dendrites of the lower-level pyramidal neurons if an error was present in the upper-layer neurons. Gradient descent learning on these two energies results in the learning rules for the P-to-I and I-to-P synapses, Embedded Image
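
As a sketch, the two gradient-descent updates could take the following form, with the interneuron mismatch treated as a dendritic prediction of the (nudged) interneuron soma and the apical mismatch driven towards zero; all names and the exact factors are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def microcircuit_plasticity(u_I, r_bar_l, r_bar_I, W_IP, W_PI, B_l, u_next, eta, dt):
    """Illustrative Euler step of the two microcircuit updates.
    P->I synapses (W_IP): the interneuron soma u_I, nudged from the upper layer,
    is predicted by the lateral dendritic input W_IP @ r_bar_l.
    I->P synapses (W_PI): the apical voltage B_l @ u_next - W_PI @ r_bar_I is
    driven towards zero, so the interneurons 'explain away' the top-down input."""
    dW_IP = eta * np.outer(u_I - W_IP @ r_bar_l, r_bar_l)
    apical = B_l @ u_next - W_PI @ r_bar_I
    dW_PI = eta * np.outer(apical, r_bar_I)
    return W_IP + dt * dW_IP, W_PI + dt * dW_PI
```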

The following theorem on dendritic error learning states that the plasticity in the lateral feedback loop leads to an appropriate error representation in the apical dendrites of pyramidal neurons.

Theorem 2

(real-time Dendritic Error Learning, rt-DeEL) Consider a cortical microcircuit composed of pyramidal neurons and interneurons, as illustrated in Fig. 5a (with dimensionality constraints specified in Methods, and fixed or dynamic weights Embedded Image to the interneurons). Then the inter-to-pyramidal synapses evolve at each layer l (‘cortical area’) such that the lateral feedback circuit aligns with the top-down feedback circuit, Embedded Image

In the presence of an output nudging, the apical voltages of the layer-l pyramidal neurons (Eq. 10) then represent the ‘B-backpropagated’ error Embedded Image. When modulated by the postsynaptic rate derivative, Embedded Image, this apical error ensures the correct representation of errors Embedded Image for the real-time dendritic error propagation (rt-DeEP, Theorem 1), Embedded Image

Simultaneously learning apical errors and basal signals

Microcircuits following these neuronal and synaptic dynamics are able to learn the classification of handwritten digits from the MNIST dataset while learning the apical signal representation (Fig. 5b1-2). In this case, feedforward weights Wl and lateral weights Embedded Image and Embedded Image are all adapted simultaneously. Including the top-down nudging of the interneurons (by turning on the interneuron nudging from the upper layer, βI > 0 in Eq. 9) greatly speeds up the learning.

With and without this top-down nudging of the interneurons, the lateral feedback via interneurons (with effective weight Embedded Image) learns to align with the forward-backward feedback via upper-layer pyramidal neurons (with effective weight Bl Wl+1, Fig. 5b3). The microcircuit extracts the gradient-based errors (Eq. 13), while the forward weights use these errors to first minimize the neuron-specific mismatch errors, and eventually the output cost.

Since the apical voltage Embedded Image appears as a postsynaptic factor in the plasticity rule for the inter-to-pyramidal synapses, Embedded Image, this I-to-P plasticity can be interpreted as Hebbian plasticity of inhibitory neurons, consistent with earlier suggestions (Vogels et al., 2012; Bannon et al., 2020). The plasticity Embedded Image of the P-to-I synapses, in the same way as the plasticity for the forward synapses Embedded Image, can be interpreted as learning from the dendritic prediction of somatic activity (Urbanczik & Senn, 2014). Crucially, when choosing a large enough interneuron population, the simultaneous learning of the lateral microcircuit and the forward network can be accomplished without fine-tuning of parameters. As a case in point, all weights shared the same learning rate. Such stability bolsters the biophysical plausibility of our NLA framework and improves over the previous, more heuristic approach (Sacramento et al., 2018; Mesnard et al., 2019). The stability may be related to the nested gradient descent learning according to which somatic and apical mismatch errors in pyramidal neurons, as well as somatic mismatch errors in inhibitory neurons, are minimized.

Finally, since errors are defined at the level of membrane voltages (Eq. 11), synapses need a mechanism by which they can recover the presynaptic voltage errors from their afferent firing rates. While for threshold-linear transfer functions this is not too involved (Methods), more general neuronal nonlinearities must be matched by corresponding synaptic nonlinearities. Pfister et al. (2010) have previously illustrated how spiking neurons can leverage short-term synaptic depression to estimate the membrane potential of their presynaptic partners. Here, we assume a similar mechanism in the context of our rate-based neurons. The monotonically increasing neuronal activation function, Embedded Image, can be approximately compensated by a vesicle release probability that monotonically decreases with the low-pass filtered presynaptic rate Embedded Image (see SI and Fig. 6 therein). If properly matched, this leads to a linear relationship between the presynaptic membrane potential ul+1 and the postsynaptic voltage contribution.
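
A sketch of this rate-to-potential inversion, under the illustrative assumption of an exponential activation function (as in the near-threshold regime); the constant c and the specific form of the release probability are placeholders, not fitted to the data in Fig. 6.

```python
import numpy as np

def rho(u):
    """Illustrative supralinear activation (exponential, as near spike threshold)."""
    return np.exp(u)

def release_probability(r_bar, c=0.05):
    """Release probability of a depressing synapse, chosen such that the
    effective drive p(r_bar) * r_bar = c * log(r_bar) is proportional to the
    presynaptic potential u = rho^{-1}(r_bar). For this activation, p decreases
    with the low-pass filtered presynaptic rate over the range shown below,
    mimicking short-term depression."""
    return np.clip(c * np.log(r_bar) / r_bar, 0.0, 1.0)

# Effective postsynaptic drive is then linear in the presynaptic voltage u:
u_pre = np.linspace(1.5, 3.0, 6)
r_bar = rho(u_pre)
drive = release_probability(r_bar) * r_bar   # equals c * u_pre (where unclipped)
```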

Figure 6: Recovering presynaptic potentials through short term depression.

(a1) Relative voltage response of a depressing cortical synapse (recreated from Abbott et al., 1997), identified as synaptic release probability p. (a2) The product of the low-pass filtered presynaptic firing rate Embedded Image times the synaptic release probability Embedded Image is proportional to the presynaptic membrane potential, Embedded Image. (a3) Average in vivo firing rate of a neuron in the visual cortex as a function of the somatic membrane potential (recreated from Anderson et al., 2000), which can be qualitatively identified as the stationary rate Embedded Image derived in Eq. 43.

Discussion

We introduced a least-action principle for neuronal networks from which we derived the membrane potential dynamics of the involved neurons, together with gradient descent plasticity of the synapses in the network. The central notion of the theory is the somato-dendritic mismatch error in each individual neuron. Given the various findings on the dendritic integration of top-down signals (Larkum, 2013; Takahashi et al., 2020), we suggest that the error is formed in the apical dendrite of pyramidal neurons. Active dendritic and somatic processes compensate for delays caused by the dendritic error propagation and the somatic integration. This leads to the prospective firing of pyramidal neurons that are jointly driven by the forward input on the basal dendrites and the error on the apical dendrite. We derived a local plasticity rule for synapses on the basal tree that, by extracting the apical feedback error, reduces the prospective somato-dendritic mismatch error in an individual neuron. This plasticity is shown to also globally reduce the instantaneous cost at output neurons (defined by deviations from target voltages) at any moment in time while stimuli and targets may continuously change. Motor control offers an intuitive context in which such self-correcting prospective networks come into play. We put forward the moving equilibrium hypothesis, according to which sensory input, motor commands and muscle spindle feedback are in a recurrent equilibrium at any moment during the movement.

We further showed how a local microcircuit can serve to calculate a neuron-specific error in the apical tree of each individual pyramidal neuron. The apical tree of pyramidal neurons receives feedback from higher-level neurons in the network, while local inhibitory neurons try on the fly to ‘explain away’ the top-down expectations. What cannot be explained away remains as an apical prediction error. This prediction error eventually induces error-correcting plasticity at the sensory-driven input on the basal tree. To find the correct error representation in the apical tree, plasticity of the interneuron-to-pyramidal synapses seeks to homeostatically drive the de- or hyperpolarized apical voltage back to rest. While the network synapses targeting the basal tree perform gradient descent on the global cost, the microcircuit synapses involved in the lateral feedback perform gradient descent on local error functions, both at any moment in time.

Our work builds on three general lines of research on formalizing the learning capabilities of cortical networks. The first line refers to the use of an energy function to jointly infer the neuronal dynamics and synaptic plasticity, originally formulated for discrete-time networks (Hopfield, 1982; Ackley et al., 1985), and recently extended to continuous-time networks (Scellier & Bengio, 2017). The second line refers to understanding error-backpropagation in the brain (Rumelhart et al., 1986; Xie & Seung, 2003; Whittington & Bogacz, 2017; Whittington & Bogacz, 2019; Lillicrap, Santoro, et al., 2020). The third line refers to the use of dendritic compartmentalization in various kinds of computation (Schiess et al., 2016; Poirazi & Papoutsi, 2020), recently linked to deep learning (Guerguiev et al., 2017; Sacramento et al., 2018; Haider et al., 2021).

With regard to energy functions, the NLA principle adds a variational approach to characterize continuous-time neuronal trajectories and plasticity. Variational approaches are studied in the context of optimal control theory, where a cost integral is minimized across time, constrained to some network dynamics (Todorov & Jordan, 2002; Meulemans, Farinha, et al., 2021). The NLA represents a unifying notion that allows us to infer both the network dynamics and its optimal control from a single Lagrangian. The errors we derive represent prospective control variables that are applied to the voltages of each network neuron so that they push the output neurons towards their target trajectory. The full expressive power of this control-theoretic framework has yet to be proven when it is extended to genuine temporal processing that includes longer time constants, for instance inherent in a slow threshold adaptation (Bellec et al., 2020). The NLA principle can also treat the case of strong feedback, studied so far in relaxation networks only (Meulemans, Zucchet, et al., 2022; Song et al., 2022). Our rt-DeEP Theorem makes a statement about real-time gradient descent learning while the network is in a moving equilibrium, linking to motor learning in the presence of perturbing force fields (Herzfeld et al., 2014) or perturbing visual inputs (Dimitriou, 2016).

With regard to error-backpropagation, the NLA principle relies on feedback alignment (Lillicrap, Cownden, et al., 2016) to correctly interpret the apical errors. Other works have explored learning of feedback weights (Akrout et al., 2019; Kunin et al., 2020; Max et al., 2022). Since these are complementary to the principles suggested here, a promising direction would be to explore how feedback weights can be learned in NLA microcircuits to improve credit assignment in deeper networks.

With regard to dendritic and neuronal processing, we emphasize the prospective nature of apical voltages and somatic firing (Fig. 2). The prospective coding of errors and signals overcomes the various delays inherent in our previous approach (Sacramento et al., 2018; Mesnard et al., 2019). Each neuron of the network instantaneously corrects its voltage so that the output neurons are pushed towards the target trajectories. Ongoing synaptic plasticity adjusts the synaptic strengths so that the errors in each neuron are reduced at any moment in time, without needing to wait for network relaxations (Scellier & Bengio, 2017; Song et al., 2022). Yet, because in the present framework the input rates are still low-pass filtered, step inputs cannot be instantaneously propagated, only their low-pass filtered versions. Haider et al., 2021 offers a framework that is also suited to instantaneously processing step inputs.

Motivated by the predictive power of the least-action principle in physics, we ask about experimental confirmation and predictions of the NLA in biology. Given its axiomatic approach, it appears astonishing to find various preliminary matches at the dendritic, somatic, interneuron, synaptic and even behavioural level. Some of these are: (1) the prospective coding of pyramidal neuron firing (Köndgen et al., 2008); (2) the prospective processing of apical signals while propagating to the soma (Ulrich, 2002); (3) the basal synaptic plasticity on pyramidal neurons and synaptic plasticity on interneurons, driven by the postsynaptic activity that is ‘unexplained’ by the distal dendritic voltage (Urbanczik & Senn, 2014); (4) the Hebbian homeostatic plasticity of interneurons targeting the apical dendritic tree of pyramidal neurons (Bannon et al., 2020); (5) the short-term synaptic depression at top-down synapses targeting inhibitory neurons and apical dendrites (akin to Abbott et al., 1997, but with a faster recovery time constant) that inverts the presynaptic activation function (see also Pfister et al., 2010); (6) the modulation of the apical contribution to the somatic voltage by the slope of the somatic activation function (for instance by downregulating apical NMDA receptors with increasing rate of backpropagating action potentials, Theis et al., 2018); and (7) the role of muscle spindles in the prospective encoding of motor errors during motor learning (Dimitriou, 2016; Papaioannou & Dimitriou, 2021). More theoretical and experimental work is required to explore these various links.

Overall, our approach introduces a method from theoretical physics to computational neuroscience and couples it with a normative perspective on the dynamical processing of neurons and synapses within global cortical networks and local microcircuits. Given its physical underpinnings, the approach may inspire the rebuilding of computational principles of cortical neurons and circuits in neuromorphic hardware (Bartolozzi et al., 2022). A step in this direction, building on the instantaneous computational capabilities of slowly integrating neurons, has been taken (Haider et al., 2021), and a next challenge is to generalize the NLA principle to spiking neurons (Zenke & Ganguli, 2018; Göltz et al., 2021) and longer temporal processing.

Methods

Euler-Lagrange equations as inverse low-pass filters

The theory is based on the lookahead of neuronal quantities. In general, the lookahead of a trajectory x(t) is defined as Embedded Image, with lookahead operator Embedded Image

The lookahead operator is the inverse of the low-pass filter operator denoted by a bar, Embedded Image

This low-pass filtering can also be characterized by the differential equation Embedded Image, see SI. Hence, applying the low-pass filtering to x and then the lookahead operator Embedded Image to Embedded Image, and using the Leibniz rule for differentiating an integral, we calculate Embedded Image. In turn, applying first the lookahead, and then the low-pass filtering, also yields the original trace back, Embedded Image (see SI).
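To make the mutual inversion of the two operators concrete, the following minimal numerical sketch (our own illustration, not part of the original derivation; all parameter values are placeholders) low-pass filters an arbitrary trace with a forward-Euler scheme and then applies the lookahead operator to the filtered trace; away from the initialization transient, the original trace is recovered.

```python
import numpy as np

# Minimal numerical sketch: low-pass filtering a trace and then applying the
# lookahead operator (x_bar + tau * d x_bar/dt) recovers the original trace,
# up to discretization error and the initialization transient.
tau = 0.01                  # filter time constant [s]
dt = 1e-4                   # Euler step [s]
t = np.arange(0.0, 0.5, dt)
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

# low-pass filter: tau * d x_bar/dt = -x_bar + x (forward Euler)
x_bar = np.zeros_like(x)
for i in range(1, len(t)):
    x_bar[i] = x_bar[i - 1] + dt / tau * (x[i - 1] - x_bar[i - 1])

# lookahead operator applied to the filtered trace
x_rec = x_bar + tau * np.gradient(x_bar, dt)

# the mismatch after the initial transient is small compared to the signal
print(np.max(np.abs(x_rec[len(t) // 2:] - x[len(t) // 2:])))
```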

We consider an arbitrary network architecture with network neurons that are recurrently connected and that receive external input through an overall weight matrix W = (Win, Wnet), aggregated column-wise. The instantaneous presynaptic firing rates are r = (rin, rnet), interpreted as a single column vector. A subset of network neurons are output neurons, Embedded Image, for which target voltages u* may be imposed. Rates and voltages may change in time t. Network neurons are assigned a voltage u, generating the low-pass filtered rate Embedded Image, and a low-pass filtered error Embedded Image. We further define output errors Embedded Image for Embedded Image, and Embedded Image for non-output neurons Embedded Image. With this, the Lagrangian from Eq. 3 takes the form Embedded Image

We next use that Embedded Image, with the Embedded Image operator defined in Eq. 4, to write out the Lagrangian L in the canonical coordinates Embedded Image as (see also Eq. 3) Embedded Image

The neuronal dynamics is derived from requiring a stationary action (see Eq. 5), which is generally solved by the Euler-Lagrange equations Embedded Image (see Eq. 6). Because Embedded Image only arises in L in the compound Embedded Image, the derivative of L with respect to Embedded Image is identical to the derivative with respect to Embedded Image, Embedded Image

Using the lookahead operator Eq. 14, the Euler-Lagrange equations can then be rewritten as Embedded Image

Since Embedded Image and Embedded Image, the derivative of L with respect to Embedded Image is the same as the derivative of L with respect to Embedded Image. Plugging this into Eq. 19, the Euler-Lagrange equations become a function of u and Embedded Image, Embedded Image

The solution of this differential equation is Embedded Image, and hence any trajectory Embedded Image which satisfies the Euler-Lagrange equations will cause Embedded Image to converge to zero with a characteristic time scale of τ. Since we require that the initialisation is at t0 = -∞, we conclude that Embedded Image, as required in the rt-DeEP Theorem.

Deriving the network dynamics from the Euler-Lagrange equations

We now derive the equations of motion from the Euler-Lagrange equations. Noticing that u enters in Embedded Image twice, directly and through Embedded Image, and once in the output error Embedded Image, we calculate from Eq. 16, using Embedded Image and W = (Win, Wnet), Embedded Image

Next, we apply the lookahead operator to this expression, as required by the Euler-Lagrange equations Eq. 19. In general Embedded Image, and we set for Embedded Image the expression on the right-hand side of Eq. 21, Embedded Image, which at the same time is Embedded Image. Hence, the Euler-Lagrange equations in the form of Eq. 20, Embedded Image, translate into Embedded Image

To move from the middle to the last equality we replaced e by e = Embedded Image. In the last equality we interpret e as the sum of the two errors, e = e + βe*, again using the middle equality. This proves Eq. 7.

Notice that the differential equation Embedded Image in Eq. 22 represents an implicit ordinary differential equation as on the right-hand side not only u, but also Embedded Image appears (in r and e). The uniqueness of the solution u(t) for a given initial condition is only guaranteed if it can be converted into an explicit ordinary differential equation (see Sect. C).

In taking the temporal derivative we assumed small learning rates such that terms including Embedded Image can be neglected. The derived dynamics for the membrane potential of a neuron Embedded Image in Eq. 22 show the usual leaky behavior of biological neurons. However, both presynaptic rates Embedded Image and prediction errors Embedded Image enter the equation of motion with lookaheads, i.e., they are advanced (Embedded Image and Embedded Image), cancelling the low-pass filtering. Since Embedded Image, the rate and error, ri and ei, can also be seen as nonlinear extrapolations from the voltage and its derivative into the future.

The instantaneous transmission of information throughout the network at the level of the voltages can now be seen by low-pass filtering Eq. 22 with initialization far back in the past, Embedded Image with Embedded Image interpreted as column vector, and Embedded Image. Hence, solving the voltage dynamics for u (Eq. 7a), with apical voltage Embedded Image derived from Eq. 7b, yields the somatic voltage u satisfying the self-consistency equation (Eq. 23) at any time. In other words, u and Embedded Image ‘propagate instantaneously’.

Deriving the error backpropagation formula

For clarity, we derive the error backpropagation algorithm for layered networks here. These can be seen as a special case of a general network with membrane potentials u and all-to-all weight matrix W (as introduced in appendix H), where the membrane potentials decompose into layerwise membrane potential vectors ul and the weight matrix into according block diagonal matrices Wl (with Wl being the weights that project into layer l).

Assuming a network with N layers, by low-pass filtering the equations of motion we get Embedded Image ∀l ∈ [1, N], with the output error Embedded Image. The error e = ϵ + βe* that we obtain from the general dynamics with Embedded Image, see Eq. 21 and Eq. 22, translates into an iterative formula for the error at the current layer l given the error at the downstream layer l+1, inherited from the drive Embedded Image of that downstream layer via Wl+1, Embedded Image and Embedded Image for the output layer. The learning rule that reduces Embedded Image by gradient descent is, as stated by Theorem 1, proportional to this error and the presynaptic rate, Embedded Image for l = 1…N. Eq. 25 and Eq. 26 together take the form of the error backpropagation algorithm, where an output error is iteratively propagated through the network and used to adjust the weights in order to reduce the output cost C. From this, it is easy to see that without output nudging (i.e., β = 0), the output error vanishes and consequently all other prediction errors vanish as well, Embedded Image for all l ≤ N. This also means that in the absence of nudging, no weight updates are performed by the plasticity rule.
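As an illustration of this layered error propagation and plasticity, the following sketch propagates an output error down through a small network and updates the forward weights with the product of error and presynaptic rate. The fixed random feedback weights B and the simple prefactor β(u* − u) for the output error are our own simplifying assumptions, in line with the feedback-alignment setting discussed earlier; setting B[l] to the transpose of the corresponding forward weight matrix would recover classical error backpropagation.

```python
import numpy as np

# Sketch of the layered error propagation and plasticity described above
# (simplified notation and prefactors): the output error is nudged in at the
# top layer and propagated down through (assumed fixed) feedback weights B[l],
# gated by the derivative of the activation function.

rng = np.random.default_rng(0)

def rho(u):        return 1.0 / (1.0 + np.exp(-u))      # logistic rate function
def rho_prime(u):  return rho(u) * (1.0 - rho(u))

dims = [10, 8, 6, 4]                       # layer sizes, layer 0 = input
W = [rng.normal(0, 0.1, (dims[l + 1], dims[l])) for l in range(3)]
B = [rng.normal(0, 0.1, (dims[l + 1], dims[l + 2])) for l in range(2)]  # feedback

beta, eta = 0.1, 1e-3
r_in = rng.random(dims[0])                 # low-pass filtered input rates
u_star = rng.random(dims[-1])              # target voltages at the output

# forward pass (instantaneous voltages, beta = 0 limit for the hidden layers)
u, r = [], [r_in]
for l in range(3):
    u.append(W[l] @ r[-1])
    r.append(rho(u[-1]))

# error propagation: nudged output error, then layer-wise errors e_l
e = [None, None, beta * (u_star - u[-1])]
for l in (1, 0):
    e[l] = rho_prime(u[l]) * (B[l] @ e[l + 1])

# plasticity: Delta W_l proportional to the error times the presynaptic rate
for l in range(3):
    W[l] += eta * np.outer(e[l], r[l])
```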

The learning rule for arbitrary connectivities is obtained in the same way by dropping the layer-wise notation. In this case, low-pass filtering the equations of motion yields Embedded Image, as calculated in Eq. 23, and the low-pass filtered error Embedded Image, as inferred from Eqs 21 and 22. Hence, the plasticity rule in general reads Embedded Image

Proving Theorem 1 (rt-DeEP)

The implicit assumption in Theorem 1 is that Embedded Image exists in the distributional sense for t > –∞, which is the case for delta-functions in rin and step-functions in u*. Both parts (i) and (ii) of the Theorem are based on the requirement of stationary action δA = 0, and hence on u satisfying the Euler-Lagrange equations in the form of Eq. 22, Embedded Image. From the solution Embedded Image we conclude that for initialization at t0 = -∞ we have Embedded Image for all t. It is the latter, stronger condition that we require in the proof. With this, the main ingredient of the proof is the mathematical argument of Scellier & Bengio, 2017, according to which the total and partial derivatives of L with respect to W are identical, and in our case this holds at any time t, Embedded Image

For convenience we considered Embedded Image to be a column vector, deviating from the standard notation (see the tutorial at the end of the SI). Analogously to Eq. 28, we infer Embedded Image. Reading Eq. 28 from right to left, we conclude that the learning rule Embedded Image for all β > 0 is gradient descent on L, i.e. Embedded Image. This total derivative of L can be analyzed for large and small β.

  1. We show that in the limit of large β, Embedded Image becomes gradient descent on the mismatch energy Embedded Image. For this we first show that there is a solution of the self-consistency equation Embedded Image that is uniformly bounded for all t and β. We assume that the transfer function ρ(u) is non-negative, monotonically increasing and bounded, that its derivative ρ′(u) is bounded too, and that the input rates rin and the target potentials Embedded Image are also uniformly bounded. To show that under these conditions we always find a uniformly bounded solution u(t), we first consider the case where the output voltages are clamped to the target, Embedded Image such that Embedded Image. For simplicity we assume that Embedded Image for |u| ≥ c0. For voltages u with ui ≤ c0 the recurrent input current Embedded Image is bounded, say Embedded Image for some c1 > c0. When including the error term Embedded Image, the total current still remains uniformly bounded, say |F(u)j| ≤ c2 for all u with ui ≤ c0. Because for larger voltages ui > c0 the error term vanishes due to a vanishing derivative ρ′(ui) = 0, the mapping F(u) maps the c2-box (the set of u for which |ui| ≤ c2) onto itself. Brouwer’s fixed point theorem then tells us that there is a fixed point u = F(u) within the c2-box. The theorem requires the continuity of F, and this is assured if the neuronal transfer function Embedded Image is continuous.

    We next relax the voltages of the output neurons from their clamped state, Embedded Image. Remember that these voltages satisfy Embedded Image at any time t. We determine the correction term Embedded Image such that in the limit β → ∞ we get Embedded Image. The correction remains finite, and in the limit must be equal to Embedded Image. For arbitrarily large nudging strength β, the output voltage uo deviates arbitrarily little from the target voltage, Embedded Image, with the target error Embedded Image shrinking like c2/β. Likewise, also for non-output neurons i, the self-consistency solution ui = F(u)i deviates arbitrarily little from the solution of the clamped state. To ensure the smooth drift of the fixed point while 1/β deviates from 0, we require that the Jacobian of F at the fixed point is invertible.

    Because the output Embedded Image shrinks with 1/β, the cost shrinks quadratically with increasing nudging strength, Embedded Image, and hence the cost term Embedded Image that enters in Embedded Image vanishes in the limit β → ∞. In this large β limit, where Embedded Image and hence the outputs are clamped, Embedded Image, the Lagrangian reduces to the mismatch energy, L = EM. Along the least-action trajectories we therefore get Embedded Image. The first equality uses Eq. 28, and the second uses L = EM just derived for β = ∞. This is statement (i) of Theorem 1. In the case of successful learning, EM = 0, we also conclude that the cost vanishes, C = 0. This is the case because EM = 0 implies Embedded Image for all output neurons o. Since Embedded Image, we conclude that Embedded Image, and if the output neurons do not feed back to the network (which we can assume without loss of generality), we conclude that Embedded Image.

  2. To consider the case of small β, we use that the cost C can be expressed as Embedded Image. This is a direct consequence of how C enters in Embedded Image, see Eq. 16 and Scellier & Bengio, 2017. We now put this together with Eq. 28 and the finding that Embedded Image. Since L is Lipschitz continuous in u, W and β (it is even smooth in these arguments), the total derivatives interchange (a consequence of the Moore-Osgood theorem applied to the limits of the difference quotients), and we then get at any t, Embedded Image

The last expression is calculated from the specific form of the Lagrangian (Eq. 17), using that by definition Embedded Image. Finally, in the absence of output nudging, β = 0, we can assume vanishing errors, Embedded Image, as they solve the self-consistency equation, Embedded Image for all t, see Eq. 27. For these solutions we have Embedded Image. Writing out the total derivative of the function Embedded Image with respect to β at β = 0 as the limit of the difference quotient, Embedded Image, using that g(0) = 0, we calculate at any t, Embedded Image

Here we assume that Embedded Image is evaluated at β > 0 (that itself approaches 0), while Embedded Image is evaluated at β = 0. Combining Eq. 29 and 30 yields the cost gradient at any t, Embedded Image

This justifies the gradient learning rule Embedded Image in Eq. 27. Learning is stochastic gradient descent on the expected cost, where stochasticity enters in the randomization of the stimulus and target sequences rin(t) and u*(t). For the regularity statement, see ‘From implicit to explicit differential equations’ in the SI. Notice that this proof works for a very general form of the Lagrangian L, up to the specific expression for Embedded Image. For a proof in terms of partial derivatives only, see the SI, where a primer on partial and total derivatives can also be found.

Instantaneous gradient descent on Embedded Image

The cost Embedded Image at each time t is a function of the voltage uo of the output neurons and the corresponding targets. In a feedforward network, due to the instantaneity of the voltage propagation (Eq. 23), uo is in the absence of output nudging (β = 0) an instantaneous function of the voltage at the first layer, Embedded Image. For initialisation at t0 = –∞, the second term vanishes for all t and hence Embedded Image. The output voltage uo(t) therefore becomes a function FW of the low-pass filtered input rate Embedded Image that captures the instantaneous network mapping, Embedded Image, and with this the cost also becomes an instantaneous function of Embedded Image and Embedded Image, namely Embedded Image.

For a general network, again assuming t0 = –∞, the voltage is determined by the vanishing gradient Embedded Image with Embedded Image, see Eq. 21. For an inclusive treatment of the initial transient see SI, Sects C and D. Remember that Embedded Image and Embedded Image. For a given Embedded Image and Embedded Image at time t, the equation f (u, t) = 0 can be locally solved for u if the Hessian Embedded Image is invertible, Embedded Image. This mapping can be restricted to the output voltages uo on the left-hand side, while replacing Embedded Image in the argument on the right-hand side (even if this again introduces uo there). With this we obtain the instantaneous mapping Embedded Image from the low-pass filtered input and the output error to the output itself. Notice that for a functional feedforward network, the network weight matrix Wnet is lower triangular, and for small enough β the Hessian H is therefore always positive definite.

Proving Theorem 2 (rt-DeEL)

Here we restrict ourselves to layered network architectures. To prove Theorem 2, first assume that interneurons receive no nudging (βI = 0) and only the lateral interneuron-to-pyramidal weights Embedded Image are plastic. This is already sufficient to prove the rt-DeEL theorem. Yet, simulations showed that shaping the lateral pyramidal-to-interneuron weights Embedded Image so that they mimic the upper-layer activity helps tremendously in learning a correct error representation. We consider this case of learning Embedded Image later.

If the microcircuit is to correctly implement error backpropagation, all local prediction errors Embedded Image must vanish in the absence of output nudging (β = 0), as there is no target error. Consequently, any remaining errors in the network are caused by a misalignment of the lateral microcircuit, and we show how learning the interneuron-to-pyramidal weights Embedded Image corrects for such misalignments.

To define the gradient descent plasticity of the weights Embedded Image from the interneurons to the pyramidal neurons, we consider the apical error formed by the difference of top-down input and interneuron input, Embedded Image, and define the apical mismatch energy as Embedded Image. Gradient descent along this energy with respect to Embedded Image yields Embedded Image evaluated online while presenting input patterns from the data distribution to the network. We assume that the apical contribution to the somatic voltage is further modulated by the somatic spike rate, Embedded Image. After successful learning, the top-down input Blμl+1 is fully subtracted away by the lateral input in the apical compartment, and we have Embedded Image

Once this condition is reached, the network achieves a state where, over the activity space spanned by the data, top-down prediction errors throughout the network vanish, Embedded Image

We show that this top-down prediction error, after successful learning of the microcircuit, shares the properties of error-backpropagation for suitable backprojection weights B.

Due to the vanishing prediction errors, pyramidal cells only receive bottom-up input Embedded Image. Using this expression as well as the expression for interneuron membrane potentials without top-down nudging (βI = 0 in Eq. 9), Embedded Image, and plugging both into Eq. 33, we get Embedded Image

Assuming that Embedded Image has full rank and that the low-pass filtered rates Embedded Image span the full nl dimensions of layer l when sampled across the data set, we conclude that Embedded Image

In other words, the loop via the upper layer and back is learned to be matched by a lateral loop through the interneurons. Eq. 36 imposes a restriction on the minimal number of interneurons Embedded Image at layer l. In fact, the matrix product Bl Wl+1 maps an nl-dimensional space onto itself via an nl+1-dimensional space. The maximal rank of this matrix product is limited by the smallest dimension, i.e. rank(Bl Wl+1) ≤ min(nl, nl+1). Analogously, Embedded Image. But since the two ranks are the same according to Eq. 36, we conclude that in general Embedded Image must hold, i.e. there should be at least as many interneurons at layer l as the smaller number of pyramidal neurons in either layer l or l+1. Note that by choosing Embedded Image as in (Sacramento et al., 2018) (or Embedded Image as in this work), the condition is fulfilled.

With Embedded Image and Eq. 36, the top-down prediction error from Eq. 34, in the presence of output nudging (β > 0), can be written in the backpropagation form Embedded Image Embedded Image Embedded Image

Finally, the simulations showed that learning the lateral weights in the microcircuit greatly benefits from also adapting the pyramidal-to-interneuron weights WIP by gradient descent on Embedded Image, using top-down nudging of the inhibitory neurons (βI > 0), Embedded Image

After learning we have Embedded Image, and plugging in Embedded Image (Eq. 9), we obtain Embedded Image. Since Embedded Image, we conclude as before, Embedded Image

The top-down weights Embedded Image that nudge the lower-layer interneurons have randomized entries and may be considered to be of full rank. If there are fewer pyramidal neurons in the upper layer than interneurons in the lower layer, Embedded Image selects a subspace in the interneuron space of dimension Embedded Image. This seems to simplify the learning of the interneuron-to-pyramidal cell connections WPI. In fact, this learning now only has to match the nl+1-dimensional interneuron subspace embedded in Embedded Image dimensions to an equal (nl+1-)dimensional pyramidal cell subspace embedded in nl dimensions.

Learning of the interneuron-to-pyramidal cell connections works with the interneuron nudging as before, and combining Eqs 36 with 39 yields the ‘loop consistency’ Embedded Image

The learning of the microcircuit was described in the absence of output nudging. Conceptually, this is not a problem as one could introduce a pre-learning phase where the lateral connections are first correctly aligned before learning of the feedforward weights begins. In simulations we find that both the lateral connections as well as the forward connections can be trained simultaneously, without the need for such a pre-learning phase. We conjecture that this is due to the fact that our plasticity rules are gradient descent on the energy functions L, EPI and EIP respectively.
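The following toy sketch illustrates the two lateral plasticity rules discussed in this section for a single microcircuit layer. The signs, the full interneuron nudging (βI → 1) and the exact prefactors are simplifying assumptions of ours and not the paper's equations: the interneuron-to-pyramidal weights descend an apical mismatch energy, the pyramidal-to-interneuron weights learn from the dendritic prediction of the (nudged) interneuron soma, and after learning the lateral loop matches the top-down loop (the 'loop consistency' of Eq. 40).

```python
import numpy as np

# Illustration only (signs and prefactors are our assumptions): one microcircuit
# layer with n_l pyramidal neurons and n_I interneurons. The apical compartment
# receives top-down input B @ rho(u_up) plus lateral interneuron input
# W_PI @ rho(u_I); W_PI is trained to cancel the top-down drive (apical mismatch
# -> 0), and W_IP is trained so that the interneurons mimic the upper layer.

rng = np.random.default_rng(1)
def rho(u): return np.tanh(u)

n_l, n_up = 8, 5
n_I = n_up                                  # one interneuron per upper-layer neuron
B    = rng.normal(0, 0.2, (n_l, n_up))      # fixed top-down (backprojection) weights
W_up = rng.normal(0, 0.2, (n_up, n_l))      # forward weights to the upper layer
W_IP = rng.normal(0, 0.2, (n_I, n_l))       # pyramidal -> interneuron (plastic)
W_PI = rng.normal(0, 0.2, (n_l, n_I))       # interneuron -> pyramidal (plastic)
eta = 0.2

for _ in range(20000):
    r_l  = rho(rng.normal(0, 1, n_l))       # lower-layer (low-pass filtered) rates
    u_up = W_up @ r_l                       # upper-layer voltages (no output nudging)
    u_I  = u_up                             # interneurons fully nudged to the upper
                                            # layer (beta_I -> 1 limit, simplification)
    v_api = B @ rho(u_up) + W_PI @ rho(u_I) # apical mismatch, should vanish
    W_PI -= eta * np.outer(v_api, rho(u_I)) # gradient descent on the apical mismatch
    W_IP += eta * np.outer(u_I - W_IP @ r_l, r_l)  # dendritic prediction of the soma

# 'loop consistency': B @ W_up is matched by -W_PI @ W_IP, so this is small
print(np.max(np.abs(B @ W_up + W_PI @ W_IP)))
```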

Simulation details

The voltage dynamics is solved by a forward-Euler scheme Embedded Image. The derivative Embedded Image is calculated either through (i) the implicit differential equation (Eq. 7) yielding Embedded Image, or (ii) by isolating Embedded Image and solving for the explicit differential equation Embedded Image, as explained after Eq. 49 in the SI.

  1. The implicit differential equation, Embedded Image, see Eq. 22, is iteratively solved by assigning Embedded Image and calculating the error Embedded Image with Embedded Image and Embedded Image. This iteration exponentially converges to a fixed point Embedded Image on a time scale Embedded Image, where 1 – k > 0 is the smallest eigenvalue of the Hessian Embedded Image, see SI, Sect. C.

  2. The explicit differential equation is obtained by eliminating Embedded Image from the right-hand side of the implicit differential equation. Since Embedded Image enters linearly we get Embedded Image with Embedded Image. The explicit form is obtained by matrix inversion, Embedded Image, as the Hessian is invertible if it is strictly positive definite. The external input and the target enter through Embedded Image, where the derivative of the target voltage is only added for the output neurons o. This explicit differential equation is shown to be contractive in the sense that, for each input trajectory rin(t) and target trajectory u*(t), the voltage trajectory u(t) locally attracts neighbouring trajectories. This locally attracting trajectory is the vanishing-gradient trajectory f(u, t) = 0, and the gradient remains 0 even if the input contains delta-functions, see SI Sect. D. A schematic integration step in this explicit form is sketched below.
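The following minimal sketch of a forward-Euler integration in the explicit form is our own illustration (β = 0, no plasticity, constant input rates so that the input lookahead is trivial; all parameter values are placeholders). Each step builds the Hessian 1 − Wnet diag(ρ′(u)), solves it for the voltage derivative and advances the voltage; with constant input, the trajectory settles into the self-consistent state u = Win rin + Wnet ρ(u).

```python
import numpy as np

# Schematic forward-Euler integration of the explicit form (assumptions: no
# output nudging, no plasticity, constant input rates; placeholder parameters).
# Each step inverts the Hessian H = 1 - W_net * diag(rho'(u)).

rng = np.random.default_rng(2)
def rho(u):       return np.tanh(u)
def rho_prime(u): return 1.0 - np.tanh(u) ** 2

n, n_in = 6, 3
tau, dt = 10e-3, 1e-4                              # membrane time constant, Euler step
W_in  = rng.normal(0, 0.3, (n, n_in))
W_net = np.tril(rng.normal(0, 0.3, (n, n)), k=-1)  # feedforward: strictly lower triangular
r_in  = np.array([1.0, 0.5, 0.2])                  # constant input rates

u = np.zeros(n)
for step in range(5000):                           # 500 ms of simulated time
    H = np.eye(n) - W_net * rho_prime(u)           # Hessian; column j scaled by rho'(u_j)
    b = (W_in @ r_in + W_net @ rho(u) - u) / tau   # drive minus leak
    u_dot = np.linalg.solve(H, b)                  # explicit form: solve H * u_dot = b
    u += dt * u_dot

# at equilibrium the self-consistency u = W_in r_in + W_net rho(u) holds
print(np.max(np.abs(u - (W_in @ r_in + W_net @ rho(u)))))   # ~0
```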

Solving the explicit differential equation seems to be more robust when the learning rate for Embedded Image gets larger. The explicit form is also less sensitive to large Euler steps dt, see SI Sect. C. For this reason, the ordinary differential equations (ODEs) were solved in the explicit form when plasticity Embedded Image was included. The algorithms are summarized as follows, once without interneurons (Algo 1), and once with interneurons (Algo 2):

Details for Fig. 3b

Color-coded snapshot of cortical local field potentials (LFPs) in a human brain from 56 deep iEEG electrodes at various locations, converted with the sigmoidal voltage-to-rate function Embedded Image and plotted onto a standard Talairach Brain (Talairach & Tournoux, 1988). The iEEG data is from a patient with pharmacoresistant epilepsy and electrodes implanted during presurgical evaluation, extracted from the data release of Burrello et al., 2019. The locations of the electrodes are chosen in accordance with plausibility, as the original positions of the electrodes were omitted due to ethical standards to prevent patient identification.

Details for Fig. 3c

Simulations of the voltage dynamics (Eq. 7a) and weight dynamics (Eq. 8), with learning rate η = 10-3, step size dt = 1 ms for the forward Euler integration, membrane time constant τ = 10 ms and logistic activation function. Weights were initialized randomly from a normal distribution Embedded Image with a cut-off at ±0.3. The number of neurons in the network Embedded Image was n = 96, among them 56 output neurons Embedded Image that were simultaneously nudged, and 40 hidden neurons. During training, all output neurons were nudged simultaneously (with β = 0.1), whereas during testing, only 42 out of the 56 neurons were nudged, with the remaining 14 left free to reproduce the traces. Data points of the iEEG signal were sampled at a frequency of 512 Hz. For simplicity, we therefore assumed that successive data points are separated by 2 ms, and up-sampled the signal via simple interpolation to 1 ms resolution as required by our integration scheme. Furthermore, the raw values were normalized by dividing them by a factor of 200 to ensure that they are approximately in a range of ±1–2. Training and testing were done on two separate 8 s traces of the iEEG recording. Same data as in Fig. 3b1.

Algorithm 1: with projection neurons only, for Figs 3 & 4 (using the explicit ODE, i.e. Step 12 instead of 11).

Algorithm 2: including plastic interneurons, for Fig. 5 (using the explicit ODE, i.e. Step 13 instead of 12).

Details for Fig. 4

Simulation of the neuronal and synaptic dynamics as given by Eq. 7a, Eq. 7b and Eq. 8. For 5 ms, 10 ms and 50 ms presentation time, we used an integration step size of dt = 0.05 ms, dt = 0.1 ms and dt = 0.5 ms, respectively (and dt = 1 ms otherwise). As an activation function, we used the step-linear function (hard sigmoid) with Embedded Image for u ≤ 0, Embedded Image for u ≥ 1 and Embedded Image in between. The learning rate was initially set to η = 10-3 and then reduced to η = 10-4 after 22000 s. The nudging strength was β = 0.1 and the membrane time constant τ = 10 ms. In these simulations (and only for these) we assumed that at each presynaptic layer l = 0, 1, …, n – 1 there is a first neuron indexed by 0 that fires with constant rate Embedded Image, effectively allowing the postsynaptic neurons Embedded Image to learn a bias through the first column of the weight matrix Wl+1. Weights were initialized randomly from a normal distribution Embedded Image with a cut-off at ±0.03. For an algorithmic conversion see the scheme below.

Details for Fig. 5

Simulation of neuronal and synaptic dynamics with plastic microcircuit, i.e., the pyramidal-to-interneuron and lateral weights of the microcircuit learned during training.

For the results shown in Fig. 5c2, the following parameters were used. As an activation function, we used a hard sigmoid function and the membrane time constant was set to τ = 10 ms. The image presentation time was 100 ms. Forward, pyramidal-to-interneuron and interneuron-to-pyramidal weights were initialized randomly from a normal distribution Embedded Image with a cut-off at ±0.03. All learning rates were chosen equal, η = 10-3, and were subsequently reduced to η = 10-4 after 22000 s of training time. The nudging parameters were set to β = 0.1 and Embedded Image. The feedback connections Bl and the nudging matrices Embedded Image were initialized randomly from a normal distribution Embedded Image with a cut-off at ±0.15. The integration step size used was dt = 0.25 ms. All weights were trained simultaneously. For an algorithmic conversion see the scheme below. The interneuron membrane potential was calculated by Eq. 9 with a linear transfer function.

Author contributions

WS, MAP and DD designed the conceptual approach. WS, DD, MAP, AFK, JJ, JS and YB contributed to different aspects of the framework and model. WS, DD, MAP and AFK derived different components of the theory. DD, BE and AFK performed the simulations. MAP, DD, JJ and BE wrote the first draft of the manuscript. WS wrote the final draft.

Supplementary information

A Extracting the presynaptic voltage error

Threshold-linear transfer functions

There is an important special case where the presynaptic voltage error can be directly extracted from presynaptic firing rates, without the need to invert the transfer function via synaptic depression as shown below. This is the case when voltage errors in the upper layers are small and the voltage-to-rate transfer function has derivatives ρ′ = 0 or 1, so that Embedded Image. The condition is satisfied, for instance, for a doubly threshold-linear function (a ‘doubly rectified linear unit’, ReLU) defined by Embedded Image for Embedded Image and Embedded Image for Embedded Image, while Embedded Image for larger voltages. In this case we calculate Embedded Image Embedded Image

The approximation uses the Taylor expansion in Embedded Image and assumes that Embedded Image is small. The crucial point of Eq. 41 is that the mismatch error defined on the voltage, Embedded Image, can be factorized into a product of the postsynaptic rate derivative, Embedded Image, and the apical error, and hence it can be expressed as an error defined on the rate. Restricted to the segment Embedded Image where the transfer function is linear and errors do not vanish, the same microcircuit delivers the feedback Embedded Image to the apical tree through the top-down projections, and Embedded Image through the lateral connections from the interneurons. While the plasticity rules for Embedded Image and Embedded Image stay the same, the top-down nudging of the interneurons, see Eq. 9, can then be formulated based on the rate instead of the upper layer voltage, Embedded Image, with transfer function of the interneuron again the (doubly) threshold-linear Embedded Image. Since voltages and rates are identical in the segment Embedded Image for each component, Eqs 36 with 39 can still be inferred.
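A small sketch of this special case (the clipping value and the perturbations are arbitrary choices for illustration): inside the linear segment of the doubly rectified transfer function, rate equals voltage, so a small voltage error can be read off directly as a rate error.

```python
import numpy as np

# Doubly rectified linear transfer function (clipped ReLU): in its linear
# segment the rate equals the voltage, so a small voltage error equals the
# corresponding rate error and can be read off directly from firing rates.
# The clipping value u_max = 1 is an assumption for illustration.

u_max = 1.0

def rho(u):
    return np.clip(u, 0.0, u_max)

def rho_prime(u):
    return ((u > 0.0) & (u < u_max)).astype(float)

u = np.array([0.3, 0.7])            # voltages inside the linear segment
du = np.array([0.02, -0.01])        # small voltage errors
rate_error = rho(u + du) - rho(u)   # equals du inside the linear segment
print(np.allclose(rate_error, du))  # True
```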

Rate-to-voltage inversion by short-term synaptic depression

We wish to read out the voltage error also for nonlinear transfer functions other than clipped ReLUs. To do so, we take inspiration from the classical short-term synaptic depression model (Tsodyks & Markram, 1997; Abbott et al., 1997; Varela et al., 1997), see also Fig. 6a1. We consider a dynamic vesicle release probability that is proportional to the pool size of available vesicles, Embedded Image, and this pool size is postulated to depend on past presynaptic rates, Embedded Image, where Embedded Image is the low-pass filtered presynaptic rate and a and d are constants. The proportionality factor is Embedded Image, making a probability out of the vesicle pool size. The effective synaptic strength of a ‘backprojecting top-down’ connection is the product of the absolute synaptic strength Bo and the vesicle pool size v, i.e. Embedded Image. The contribution to the postsynaptic current of the synapse is Wr, and the contribution to the postsynaptic voltage is Embedded Image.

We search for an activation function Embedded Image such that the postsynaptic voltage contribution is the scaled presynaptic potential, Embedded Image. Plugging in the above expression for B yields Embedded Image, and dividing Bo out, Embedded Image, see Fig. 6a2. With the expression for Embedded Image in Eq. 42 we obtain a quadratic equation in Embedded Image that is solved by the non-negative and monotonically increasing function Embedded Image with ‘smooth’ threshold θ = (1 + a)/d and asymptote Embedded Image. This gives us a transfer function Embedded Image that qualitatively matches those observed for pyramidal neurons recorded in the steady state (Anderson et al., 2000; Rauch et al., 2003), see Fig. 6a3.

The approach generalizes to other pairs of strictly monotonic neuronal activation and depression functions Embedded Image, as long as u is not driven below some minimal value, here urest = 0, corresponding to Embedded Image. The last requirement can, for instance, be accomplished by offsetting the activation function into a regime that guarantees that u stays positive.

In our simulations for Fig. 5, we did not explicitly implement a dynamic vesicle pool, i.e. the right-hand side of Embedded Image, but instead directly used the recovered membrane potentials u.
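The following toy sketch illustrates the matching principle of this subsection with assumed, simplified functional forms (an expansive quadratic activation and a release probability falling as one over the square root of the rate; the forms derived above in Eqs 42 and 43 differ): the product of the depressed release probability and the low-pass filtered rate recovers the presynaptic membrane potential, as in Fig. 6a2.

```python
import numpy as np

# Minimal sketch (not the paper's exact model): we assume an expansive
# activation rho(u) = u^2 for u > 0, so the presynaptic rate grows faster than
# the voltage. A release probability that decays with the low-pass filtered
# rate as p(r) = 1/sqrt(r) then inverts the nonlinearity:
# p(r) * r = sqrt(r) = u, i.e. the product of release probability and rate
# recovers the presynaptic membrane potential.

def rho(u):
    """Hypothetical expansive voltage-to-rate transfer function."""
    return np.clip(u, 0.0, None) ** 2

def release_prob(r_bar, eps=1e-9):
    """Hypothetical depression: release probability falls with presynaptic rate."""
    return 1.0 / np.sqrt(r_bar + eps)

u_pre = np.linspace(0.1, 2.0, 20)          # presynaptic membrane potentials
r_bar = rho(u_pre)                          # (low-pass filtered) presynaptic rates
recovered = release_prob(r_bar) * r_bar     # postsynaptic voltage contribution

print(np.allclose(recovered, u_pre, atol=1e-6))   # True: voltage is recovered
```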

B Looking back and forward in time with derivatives

Since dealing with extrapolations into the future is a crucial notion of the paper we present here some of the calculations. The discounted future voltage was introduced in Eq. 4 as Embedded Image

To show that Embedded Image satisfies Embedded Image, we need to apply the Leibniz integral rule in calculating the derivative Embedded Image. This leads to Embedded Image

Multiplying this equation by τ and using the definition of Embedded Image yields Embedded Image, or Embedded Image.

By applying the Leibniz integral rule one also shows that Embedded Image, defined in Eq. 15, Embedded Image solves Embedded Image. This differential equation can be written as Embedded Image, with lookahead operator Embedded Image defined in Eq. 14. To show that Embedded Image, one applies partial integration to Embedded Image. Note that the equality Embedded Image only holds if we integrate from -∞, and hence if the initialization of the trajectory is far back in the past as compared to the time constant τ.

Uniqueness of the rate function

In the main text we concluded from the postulate Embedded Image and the general relation Embedded Image that Embedded Image. This conclusion is a consequence of the uniqueness of the solution of an ordinary differential equation for a given initial condition (which may include delta-functions on the right-hand side, see e.g. Nedeljkov & Oberguggenberger, 2012). In fact, we may consider both variables ρ and Embedded Image as solutions of the differential equation Embedded Image. Because the solution is unique, we conclude that Embedded Image.

Learning the input time constants

In our applications we assumed that the input rates in the original mapping Embedded Image are low-pass filtered with a common time constant Embedded Image that is also shared as the membrane time constant of the neurons. The general setting of learning to map time series is that input time series rin(t) are low-pass filtered with given time constants Embedded Image, and the target output time series Embedded Image are a function of these low-pass filtered inputs Embedded Image.

To learn to reproduce the input time constants Embedded Image in the student network by τin, we assume that the inputs converge to neurons u1 that only receive external inputs. Because Embedded Image, the gradient rule for the input time constant is Embedded Image

This local learning rule also globally reduces the Lagrangian, and in the limit β → ∞ it is gradient descent on the mismatch energy, while in the limit β → 0 it is gradient descent on the cost. The proof works as in Theorem 1. To learn a more complex mapping of time series that includes temporal processing beyond a function of merely Embedded Image, additional variables need to be introduced that form memories (see ‘Generalizations:...’ below).

C From implicit to explicit differential equations

The original Euler-Lagrange equation is Embedded Image, Eq. 20, with the Jacobian Embedded Image. We have shown that Embedded Image is equivalent to the voltage dynamics given in Eq. 22.

Implicit differential equation

The implicit differential equation can be written as Embedded Image

The partial derivative of f(u, t) with respect to time, Embedded Image, captures the external drive from the inputs and output targets that do not depend on u, but may directly depend on time. In fact, instead of the argument t of f, we could consider the two arguments Embedded Image and Embedded Image, and we can then write Embedded Image, where δio is the Kronecker delta, equal to 1 if i is an output neuron and 0 otherwise. The partial derivative of f with respect to u represents the (symmetric) Hessian of the Lagrangian, Embedded Image, with δij being the Kronecker delta, Embedded Image defined in Eq. 21 and Embedded Image defined above Eq. 16. Remember that W = (Win, Wnet) and Embedded Image. In vectorial notation the Hessian of the Lagrangian is Embedded Image, where 1 is the identity matrix.

Fixing the arguments Embedded Image of K in Eq. 45, we need to find a fixed point of the mapping Embedded Image. In the argument Embedded Image, the mapping is affine, Embedded Image, with matrix Embedded Image. The Banach fixed point theorem asserts that if Embedded Image is strictly contracting with constant k, i.e. if Embedded Image for all pairs of inputs and 0 ≤ k < 1, then the iteration (here with iteration time step dt) locally converges to a fixed point Embedded Image. Because Embedded Image, the mapping is k-contractive if the eigenvalues of L have absolute value smaller than k. This is the case if the Hessian Embedded Image, with Embedded Image, is strictly positive definite. Crucially, because the mapping is affine in Embedded Image, the convergence is global.

For strict positive definiteness of the Hessian, the Banach fixed point theorem asserts that during (global) convergence the distance to the fixed point is bounded by a constant times Embedded Image, with iteration index i and ‘virtual’ Euler step dt. In an analogue physical device that implements exactly this feedback circuit in continuous time, the dt becomes truly infinitesimal and in this sense the convergence is instantaneous. If dt remains finite, Embedded Image converges to a moving target because the mapping Embedded Image changes with time. The target should not change faster than the time scale Embedded Image of the Embedded Image convergence. Given a time course of the input rate rin(t) and target uo*(t) that has bounded variation, the dt can be chosen so that convergence becomes arbitrarily quick, and in the limit instantaneous.

If rin(t) contains well-separated delta-functions, while otherwise still having bounded variation, the reasoning still applies since at any point in time, except at the time of the singularity, f(u, t) = 0. This is shown in Appendix D below.

There is a caveat regarding the strict positive definiteness of the Hessian when the learning rate becomes too large and Embedded Image starts to change the neuronal dynamics. In this case, the Hessian becomes Embedded Image, and the eigenvalues may become negative due to the Embedded Image term. Simulations can in fact become unstable with a large learning rate, and this is more pronounced if the Euler step dt is also large. The explicit differential equation avoids the fast iteration towards a moving target and hence allows for larger dt. This pays off in particular in the presence of a high learning rate (although the Cholesky decomposition also requires positive definiteness). For this reason, the large-scale simulations involving plasticity are performed with the explicit form described next.
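As a toy illustration of the fixed-point argument (our own example with placeholder numbers): solving H x = b by the affine iteration x ← x + (b − H x) converges whenever the eigenvalues of 1 − H are smaller than one in magnitude, which is the regime the implicit scheme above relies on.

```python
import numpy as np

# Toy illustration of the fixed-point argument above: solving H x = b by the
# affine iteration x <- x + (b - H x) converges when the eigenvalues of (1 - H)
# have magnitude smaller than 1, as for a Hessian of the form
# 1 - W_net * diag(rho') with moderate weights.

rng = np.random.default_rng(3)
n = 5
W = np.tril(rng.normal(0, 0.3, (n, n)), k=-1)   # strictly lower triangular coupling
H = np.eye(n) - W                                # Hessian-like matrix, unit diagonal
b = rng.normal(0, 1, n)

x = np.zeros(n)
for _ in range(200):                             # contraction towards H^{-1} b
    x = x + (b - H @ x)

print(np.allclose(x, np.linalg.solve(H, b)))     # True
```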

Explicit differential equation

To isolate Embedded Image in the implicit differential equation, we rewrite Embedded Image again as Embedded Image

Plugging the Hessian Embedded Image from Eq. 47 into Eq. 48, we obtain the voltage dynamics from Eq. 22 in the equivalent form Embedded Image

In our applications, the Hessian H appears to be invertible (although this may not be the case for arbitrary networks), and Eq. 49 can be solved for the unique Embedded Image using a Cholesky decomposition. In fact, if H is invertible, the system of implicit ordinary differential equations from Eq. 49 can be converted into a system of explicit ordinary differential equations (with Embedded Image and H given in Eq. 47), Embedded Image, and as such it has a unique solution for any given initial condition. Because Embedded Image, see Eq. 46, the regularity requirement for τ Embedded Image to be integrable is satisfied even if Embedded Image and Embedded Image contain step functions (and rin(t) delta-functions), see Sect. D for details.

Even in the presence of such step functions, the Euler-Lagrange equations Embedded Image lead to an f that is a decaying exponential, Embedded Image. For initialization at t0 = -∞ we have Embedded Image at any time. In fact, Eq. 50 is equivalent to Embedded Image, and hence any solution of Eq. 50, even if Embedded Image contains a delta-function, is also a solution of Embedded Image. Possible jumps in Embedded Image or Embedded Image are compensated by the jumps they induce in u (see below for the full mathematical description with a simple example). To give an intuition, assume that a recurrent network of our prospective neurons has separate fixed points for the two constant input currents Embedded Image and Embedded Image. This network still shows an overall relaxation time of τin (but not longer!) when the input rates instantaneously switch from Embedded Image to Embedded Image. Nevertheless, at any moment during this relaxation process, gradient learning of the mapping Embedded Image towards Embedded Image is still guaranteed (Theorem 1).

In the case of a functional feedforward network, the network weight matrix Wnet is lower triangular, and the Hessian is then itself a lower triangular matrix, but with unit diagonal, and as such it is invertible. For feedforward networks, H remains invertible also for small β > 0, since then the emerging upper-diagonal entries remain small compared to the unit diagonal entries. It also remains invertible for growing nudging strengths, β → ∞, provided that the weighted target errors Embedded Image remain small, see (i) in the proof of Theorem 1.

Link to time-varying optimal control

The explicit differential equation, Eq. 50, is a special case of the one in Simonetto, Dall’Anese, et al., 2020 (Eq. 20), where the function to be minimized (their f) can take a general (Lipschitz continuous) form (hence their f is our Lagrangian, f = L). To avoid inverting the Hessian, an iteration algorithm similar to our implicit form can be applied, although it is more involved in order to deal with the more general form of L (Simonetto & Dall’Anese, 2017). The idea of tracking the solution of a time-varying optimization problem with a linear look-ahead in time was introduced in Zhao & Swamy, 1998.

D Contraction analysis and delta-function inputs

Stability

We show that the voltage dynamics obtained from Embedded Image is always stable, provided that the Hessian H is invertible. For this we rewrite Eq. 49 in the form Embedded Image, where the explicit time dependence of E and f is a short-cut to express the dependence on Embedded Image. Stated in this generality, the stability analysis also applies to the voltage dynamics derived in the Latent Equilibrium (Haider et al., 2021).

According to the implicit function theorem, at any point in time when Embedded Image is invertible, we can locally write Embedded Image. When absorbing the dependence of Embedded Image on Embedded Image into a time dependence, we can rewrite this differential equation as Embedded Image, see Eq. 50. This differential equation is contractive, and thus stable, if the Jacobian of g with respect to u is uniformly negative definite (Lohmiller & Slotine, 1998). The contraction analysis tells us that locally, where Embedded Image is invertible, we can express Embedded Image as a function Embedded Image that has derivative Embedded Image. For the u-component we get the Jacobian Embedded Image

According to Eq. 49 we have Embedded Image and calculate Embedded Image, with Embedded Image specified above. Since according to Eq. 46 the partial derivative with respect to t does not depend on u, we also calculate Embedded Image. Hence, Embedded Image. The term Embedded Image may cause a violation of the positive definiteness. However, this appears only transiently after initialization, since the gradient f converges exponentially quickly to 0. If t0 = -∞, the term Embedded Image vanishes, and with it also Embedded Image. This asserts the local contraction property of the fixed-point trajectory, as then Embedded Image. Hence, around a point uo on the trajectory, the linear approximation of the dynamics is Embedded Image, showing an exponential local contraction to uo.

Global convergence

The above stability analysis yields the strict contraction property only after the convergence of the gradient to f = 0. We have shown that the iteration of Embedded Image globally converges to a fixed point Embedded Image, see Eq. 45. Let us therefore assume that Embedded Image satisfies this fixed point equation. This defines a trajectory u(t) that globally converges to the fixed points Embedded Image. In fact, these fixed points are characterized by the vanishing gradients, f = 0, and these vanishing gradients are globally reached. This is because f(u(t), t), as a function of time, exponentially decays to 0 in each component, as noticed above. To restate this in a compound version, we consider an arbitrary initialization u(t0) for which in general f(u(t0), t0) ≠ 0. Because at any time we have Embedded Image, the length of f(u(t), t) exponentially converges to 0, as expressed by its temporal derivative, Embedded Image

Hence, starting at any initial point, the voltage dynamics finds a self-consistency solution of f (u(t),t) = 0, or equivalently of Embedded Image, with Embedded Image.

Delta-function inputs keep Embedded Image

We next explain in more detail why, for delta-functions in the input rates rin(t), the NLA stationarity (‘equilibrium’) condition Embedded Image is always satisfied (a delta-function in rin for the NLA corresponds to a step-function in rin for the Latent Equilibrium). We reconsider the explicit differential equation Embedded Image given in Eq. 50, with Embedded Image and H given in Eq. 47.

To simplify matters, we consider a single delta-function at t = 0 as input in the absence of output nudging. In this case we get Embedded Image, where the input matrix Win is typically sparse (not all network neurons receive external input), and δin(t) is a vector of delta-functions restricted to the input neurons. Following Nedeljkov & Oberguggenberger, 2012, Proposition 2.1, we can then write the explicit differential equation, Eq. 49, in the form Embedded Image, where Embedded Image is globally Lipschitz continuous. Due to the Lipschitz continuity, the change in u evoked by Embedded Image during a small time interval [–ε, ε] around t = 0 vanishes when this interval shrinks, ε → 0. To quantify the change in u during these intervals it is therefore enough to consider Embedded Image, or equivalently Embedded Image. To estimate the jump induced by the delta-functions, we consider some mollifier φε(t) = ε-1φ(t/ε), where φ(t) is a smooth function on the interval [–1,1] with integral 1. By φin,ε(t) we denote the vector of mollifiers centered at the delta-functions of the input neurons. We now consider the two differential equations, with the second approximating the first on the interval [–ε, ε] but without the regular term Embedded Image, Embedded Image Embedded Image

We assume that for all t ∈ [–ε, ε] the matrices H(uε(t)) and Embedded Image are invertible, so that the two Eqs 54a,b can be turned into explicit differential equations. Analogously to the 1-dimensional case (Nedeljkov & Oberguggenberger, 2012), we conclude that the solutions of Eqs 54a,b on the interval [–ε, ε] converge to each other, Embedded Image for ε → 0. As a consequence, the jump of Embedded Image at t = 0 converges to the corresponding jump of uε in the various dimensions.

To calculate the jump of Embedded Image we have to integrate Embedded Image across the time interval [–ε, ε]. Instead of integrating Embedded Image, we first integrate Embedded Image given in Eq. 54b. Moving from right to left yields Embedded Image, where Iin is the index vector of the input neurons having a delta-function, i.e. Iin,j = 1 if there is a delta-input, and 0 otherwise. Note that Embedded Image. Because H is itself a derivative, Embedded Image, we can explicitly calculate the latter integral (also for β > 0, but for clarity here only done for β = 0). The last integral in Eq. 55 is defined as a vector with i-th component Embedded Image, where in the second-to-last equality we used that Hij(u) = δij – Wij ρ′(uj) depends only on the component uj, see Eq. 47. In the last equality we introduced the ‘network voltage error’ Embedded Image. Following the 1-dimensional case treated in Nedeljkov & Oberguggenberger, 2012, Proposition 1.2, we introduce the ‘jump function’ (called G(y) in the cited work) Embedded Image, with Embedded Image thought to represent Embedded Image at some earlier time to, before the delta-kick sets in. With this setting, Eq. 55 turns into Embedded Image

In the limit ε → 0 we get a relation between u(t) immediately before and after the jump, u(–0) and u(+0), using that in this limit the boundary points of the trajectories also converge, Embedded Image, Embedded Image

We now assume that the function J(u) = τ(v – vo) is invertible around the jump. This is the case if the Jacobian Embedded Image is invertible, and because v = u – Wnet ρ(u), we require invertibility of Embedded Image, with the Hessian defined in Eq. 47.

In the case of invertibility we get the voltage after the jump as Embedded Image

We next calculate the jump in v. This is easy since v = u – Wnet ρ(u) enters linearly in the function J(u) = τ(v – vo). Plugging the explicit expression for J into Eq. 59 we get τ(v(+0) – vo) = τ(v(–0) – vo) + WinIin, or Embedded Image

Knowing the jump in v helps to show that the equilibrium condition Embedded Image is always satisfied, even immediately after the delta-input, provided the initialization is at t0 = –∞. To show this, remember that in the absence of nudging we have Embedded Image. The jump size of Embedded Image for a delta-function at t = 0, rin(t) = δin(t), is Embedded Image. This is because Embedded Image satisfies the differential equation Embedded Image, provided that t0 = –∞. Hence, Embedded Image

With Eqs 61 and 62 we conclude that Embedded Image throughout.

E Example of a single recurrently connected neuron

To get an intuition for the instantaneity in the recurrent case we consider the example of a single, recurrently connected neuron. We also put this into the context of the Latent Equilibrium (Haider et al., 2021). Consider the weight vector W = (Win, Wnet) with an input rate rin(t) driving the postsynaptic voltage u. The postsynaptic rate is Embedded Image, and its low-pass filter with respect to τ is Embedded Image. As always, the low-pass filtering reaches back to an initialization at t0 = –∞, see Eq. 15. The Lagrangian has the form Embedded Image

The Euler-Lagrange equations Embedded Image are derived from Embedded Image. Applying the look-ahead operator Embedded Image (Eq. 14), and abbreviating Embedded Image, the Euler-Lagrange equations deliver the voltage dynamics, Embedded Image

To simplify matters, we consider the nudging-free case, β = 0. This implies that Embedded Image. With Embedded Image, we obtain the differential equation Embedded Image

Abbreviating v = u – Wnet ρ(u) as ‘network voltage error’, the above differential equation reads as Embedded Image

Integrating the effective voltage dynamics (Eq. 66), assuming an initialization infinitely far in the past, yields Embedded Image. This equation is equivalent to the Euler-Lagrange equation Embedded Image being integrated, and because the solution of the Euler-Lagrange equation is Embedded Image, we have (using t0 = –∞) Embedded Image
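To make the effective dynamics concrete, the following minimal Python sketch (our own illustration, not part of the original simulations; the explicit form τ(1 – Wnet ρ′(u)) u̇ = Win rin – (u – Wnet ρ(u)) is our reading of Eqs 65/66, and all parameter values are arbitrary) integrates the single recurrently connected neuron for a step input and checks that the network voltage error v = u – Wnet ρ(u) tracks the low-pass filtered input Win r̄in with time constant τ:

```python
import numpy as np

# Single recurrently connected neuron, nudging-free case.
# Assumed explicit form of Eq. 65: tau*(1 - Wnet*rho'(u))*du/dt = Win*r_in - (u - Wnet*rho(u)).
tau, Win, Wnet = 10e-3, 1.0, 0.5
rho  = np.tanh
drho = lambda u: 1.0 - np.tanh(u)**2

dt, T = 1e-4, 0.2
t = np.arange(0.0, T, dt)
r_in = (t > 0.05).astype(float)          # step input switched on at 50 ms

u, rbar = 0.0, 0.0                       # rbar: low-pass filtered input (Eq. 15)
v_trace, vbar_trace = [], []
for ri in r_in:
    v = u - Wnet * rho(u)                # 'network voltage error'
    du = (Win * ri - v) / (tau * (1.0 - Wnet * drho(u)))
    u += dt * du
    rbar += dt / tau * (ri - rbar)       # tau * d(rbar)/dt = r_in - rbar
    v_trace.append(u - Wnet * rho(u))
    vbar_trace.append(Win * rbar)

# v follows Win * rbar up to the Euler discretization error (cf. Eq. 67)
print(np.max(np.abs(np.array(v_trace) - np.array(vbar_trace))))
```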

Voltage dynamics for a delta-function input

We next apply a delta-function in the input rate, say rin(t) = δ(t) and consider the dynamics at the level of the voltage, Eq. 65. As in Nedeljkov & Oberguggenberger, 2012, Proposition 1.2 we introduce the ‘jump function’ (called G(y) in the cited work) Embedded Image

As in Nedeljkov & Oberguggenberger, 2012, Proposition 1.2, we show that the voltage u makes a unique jump at the moment of the delta-function, provided that J(u) is invertible around the jump.

We set vo = uo – Wnet ρ(uo). Here, uo is some voltage before the jump, say uo = u(–1) evaluated at time t = –1, when the jump is at t = 0. When Embedded Image is the voltage immediately before the jump, the voltage immediately after the jump is Embedded Image specified by Embedded Image

The reason is that the Win-scaled delta-function triggers a step of size Win when integrating over it as done in Eq. 68. The new value Embedded Image is unique if J is invertible, and looking at the defining integral in Eq. 68, this is the case if Embedded Image.

The jump in u translates into a jump in v = u – Wnet ρ(u) from Embedded Image to Embedded Image. This endpoint can also be expressed as Embedded Image

To check this, we assume without loss of generality that vo = uo – Wnet ρ(uo) = 0. Then Embedded Image and Embedded Image according to Eq. 69. Since also Embedded Image, we conclude that Embedded Image, as claimed above.
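The jump condition can also be checked numerically. The sketch below (our own illustration; the explicit form J(u) = τ(u – Wnet ρ(u) – vo) is our reading of the jump function in Eq. 68, and the parameter values are arbitrary) solves J(u(+0)) = J(u(–0)) + Win for the post-jump voltage and verifies that v jumps by Win/τ:

```python
import numpy as np
from scipy.optimize import brentq

# Post-jump voltage for a delta input.
# Assumed jump function (our reading of Eq. 68): J(u) = tau*(u - Wnet*rho(u) - vo).
tau, Win, Wnet = 10e-3, 0.2, 0.5
rho = np.tanh

u_minus = 0.3                            # voltage immediately before the jump
vo      = 0.0                            # reference value of v at some time t0 before the kick
J = lambda u: tau * (u - Wnet * rho(u) - vo)

# Solve J(u_plus) = J(u_minus) + Win; the solution is unique if 1 - Wnet*rho'(u) > 0 everywhere.
target = J(u_minus) + Win
u_plus = brentq(lambda u: J(u) - target, -100.0, 100.0)

v_minus = u_minus - Wnet * rho(u_minus)
v_plus  = u_plus  - Wnet * rho(u_plus)
print(u_plus, v_plus - v_minus, Win / tau)   # v jumps by Win/tau (cf. Eq. 70)
```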

We finally show that even far away from the initialization, the stationarity condition Embedded Image holds before and immediately after the jump. In fact, for t0 = -∞, the evolution of the ‘network voltage error’ becomes Embedded Image

Here we used that Embedded Image and, according to Eq. 70, v jumps to Embedded Image. Remember that for initialization far in the past, Embedded Image is equivalent to Embedded Image, see Eq. 67. We therefore have to calculate the jump in Embedded Image induced by the delta-input. Since Embedded Image is the solution of Embedded Image for t0 = -∞, we find that Embedded Image

Combining Eqs 71 and 72 proves that Embedded Image holds true at any moment in time, provided the initialization is far in the past.

One may ask why the delta-kick is different from resetting v to a new initialization away from 0. The reason is that at t = 0 there is a cause for the jump in r(0), while at t0 there is no cause in r(t0). In fact, there is no jump initially, just the start of v at some initial condition. Differently from the initialization at t0, where v(t0) > 0 implies Embedded Image for finite t – t0 > 0, the jump of v(0) at t = 0 to a positive value leaves Embedded Image for all t > 0, provided t0 = -∞.

Linear transfer function

We first consider the case of a linear transfer function ρ(u) = u (or a threshold-linear one operating in its linear regime). Then the differential equation becomes Embedded Image

With initialization at t = -∞ and low-pass filtering Embedded Image defined in Eq. 15 the solution is Embedded Image

The point is that the time constant is τ and not Embedded Image, as would be the case without the prospective firing rate. In fact, for the ‘classical’ differential equation, Embedded Image and ρ the identity, we obtain the differential equation Embedded Image with Embedded Image and solution Embedded Image, which is now the low-pass filtering with respect to the effective time constant.
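A minimal numerical comparison of the two time constants (our own sketch with arbitrary parameters; the explicit forms of the two differential equations, written in the comments, are our reading of the equations above):

```python
import numpy as np

# Linear transfer function rho(u) = u: the NLA dynamics relax with tau,
# the classical dynamics with tau_eff = tau/(1 - Wnet).
tau, Win, Wnet = 10e-3, 1.0, 0.8
r_in = 1.0                                   # constant step input switched on at t = 0
u_fix = Win * r_in / (1.0 - Wnet)            # common fixed point of both dynamics

dt = 1e-5
u_nla, u_cl = 0.0, 0.0
for _ in range(int(0.1 / dt)):               # integrate 100 ms = 10*tau = 2*tau_eff
    # NLA (our reading of the linear case): tau * du/dt = Win*r_in/(1 - Wnet) - u
    u_nla += dt / tau * (u_fix - u_nla)
    # classical leaky integrator: tau * du/dt = -u + Win*r_in + Wnet*u
    u_cl  += dt / tau * (-u_cl + Win * r_in + Wnet * u_cl)

print(u_nla / u_fix, u_cl / u_fix)           # ~1.0 (NLA) vs ~0.86 (classical) after 100 ms
```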

Sigmoidal transfer function

For a sigmoidal transfer function Embedded Image, a positive feedback weight Wnet > 0, and a constant external input rin, say, the solutions u(t) of Eq. 64 either converge to a fixed point or diverge. When converging, the voltage satisfies the fixed point condition Embedded Image

This fixed point equation can be solved numerically by a time-discrete iteration. But it can also be solved by a time-continuous process that underlies a neural or neuromorphic implementation. The prospective firing rate introduced in the NLA can be seen as a method to quickly find the fixed point in continuous time. When directly solving the implicit differential equation (as opposed to converting it into an explicit differential equation using e.g. the Cholesky decomposition), the fixed point is potentially found in fewer steps.

To estimate the speed of convergence, we look at the initial speed when taking off at an initial condition u(0) between the unstable and the stable fixed point. The initial speeds for the classical differential equation, Eq. 75, and the NLA version, Eq. 64, are, respectively, Embedded Image Embedded Image where we set Embedded Image. As Embedded Image, the initial convergence speed of the NLA solution is larger. The scheme has some resemblance to Newton’s algorithm for finding the zeros of a function by using its derivative.
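The following sketch illustrates both the time-discrete fixed-point iteration and the initial-speed comparison (our own illustration with arbitrary parameters; the explicit forms of the classical (Eq. 75) and NLA (Eq. 64) dynamics written in the comments are our reading of the text):

```python
import numpy as np

# Self-excitation with a sigmoidal transfer function and a constant external input.
sigma  = lambda u: 1.0 / (1.0 + np.exp(-u))
dsigma = lambda u: sigma(u) * (1.0 - sigma(u))
tau, Win, Wnet, r_in = 10e-3, 1.0, 5.0, -2.5   # bistable regime: stable fixed points near +-1.78,
                                               # unstable fixed point at u = 0

# Fixed point condition u* = Win*r_in + Wnet*sigma(u*), solved by a time-discrete iteration.
u_star = 1.0
for _ in range(200):
    u_star = Win * r_in + Wnet * sigma(u_star)

# Initial speeds when released at u0 between the unstable and the upper stable fixed point,
# in the regime 0 < Wnet*sigma'(u0) < 1 where the comparison applies.
u0 = 1.2
drive = Win * r_in + Wnet * sigma(u0) - u0
du_classical = drive / tau                                 # assumed form of Eq. 75 (classical)
du_nla       = drive / (tau * (1.0 - Wnet * dsigma(u0)))   # assumed form of Eq. 64 (NLA)
print(u_star, du_classical, du_nla)                        # the NLA take-off is considerably faster
```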

F Generalizations: NLA for conductance-based neurons and more dynamic variables

The mismatch energies and costs can be generalized in different ways. Here we focus on a biophysical version of the mismatch energy that includes conductance-based neurons. This also relates to the least-action principle in physics. But the NLA can also be generalized to include other dynamic variables such as adaptive thresholds or synaptic short-term plasticity.

Equivalent somato-dendritic circuit

For conductance-based synapses, the excitatory and inhibitory conductances, gE and gI, are driven by the presynaptic firing rates and have the form gE(t) = WE r(t), and analogously gI(t) = WI r(t). The dynamics of a somatic voltage u and a dendritic voltage v reads as Embedded Image Embedded Image where c and cd are the somatic and dendritic capacitances, EL/E/I the reversal potentials for the leak, the excitatory and the inhibitory currents, gsd the transfer conductance from the dendrite to the soma, and gds the one in the other direction.

We consider the case when the dendritic capacitance cd is small compared to the sum of conductances gd on the right-hand side of Eq. 79, yielding a fast dendritic time constant. In this case we can solve this equation in the steady state for v, plug this into Eq. 78, and get Embedded Image with effective reversal potential Embedded Image, total conductance Embedded Image, feedforward dendritic voltage vff = (gL EL + gE EE + gI EI)/gff and feedforward dendritic conductance gff = gL + gE + gI. Because Embedded Image, the conductance depends on the presynaptic voltage and its derivative. Equation 80 describes the effective circuit that has the identical voltage time course as Eqs 78 and 79 with Embedded Image, but with a single time-dependent ‘battery voltage’ V and Ohmic resistance R = 1/g.
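As a rough numerical illustration of this reduction, the following sketch (our own toy implementation in arbitrary units; the specific forms of the somatic and dendritic equations, including an assumed somatic leak gLs, are only one plausible reading of Eqs 78–79, and the prospective correction of the reversal potential through the conductance derivatives is omitted) compares the full two-compartment dynamics with the reduced circuit c u̇ = g(V – u) obtained from the dendritic quasi-steady state:

```python
import numpy as np

# Toy two-compartment neuron (assumed reading of Eqs 78-79) vs. reduced circuit (Eq. 80).
c, cd    = 1.0, 0.01                     # small dendritic capacitance -> fast dendrite
gLs, gL  = 0.1, 0.1                      # assumed somatic leak and dendritic leak
gsd, gds = 2.0, 2.0                      # dendro-somatic and somato-dendritic couplings
EL, EE, EI = -70.0, 0.0, -80.0
WE, WI = 0.6, 0.3

dt, T = 1e-4, 1.0
u_full = v = u_red = EL
err = 0.0
for t in np.arange(0.0, T, dt):
    r  = 1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)     # presynaptic rate
    gE, gI = WE * r, WI * r
    gff = gL + gE + gI
    vff = (gL * EL + gE * EE + gI * EI) / gff      # feedforward dendritic voltage

    # full two-compartment dynamics
    u_full += dt / c  * (gLs * (EL - u_full) + gsd * (v - u_full))
    v      += dt / cd * (gL * (EL - v) + gE * (EE - v) + gI * (EI - v) + gds * (u_full - v))

    # reduced circuit: dendrite replaced by its quasi-steady state
    g_eff = gLs + gsd * gff / (gff + gds)
    V_eff = (gLs * EL + gsd * gff / (gff + gds) * vff) / g_eff
    u_red += dt / c * g_eff * (V_eff - u_red)

    err = max(err, abs(u_full - u_red))
print(err)    # small whenever the dendritic time constant cd/gd is fast relative to the input
```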

Somato-dendritic mismatch power and action

The synaptic inputs gE(t) and gI(t) are continuously driving V(t), and the best one can hope for the dynamics of u is that it traces V with some integration delay determined by the time constant τ = c/g. In fact, if u follows the dynamics of Eq. 80, then u becomes the low-pass filtered target potential, Embedded Image, where V(t) is filtered with the dynamic time constant τ(t). The defining equation for the low-pass filtering is Embedded Image and this self-consistency equation is equivalent to the explicit form Embedded Image

To capture the voltage dynamics with our NLA principle we recall that the somatic voltage u can be nudged by an ‘apical voltage’ Embedded Image that causes a somatic voltage error Embedded Image. The voltage error drives a current Embedded Image through the conductance g. The electrical power of this current I driven by the voltage Embedded Image is Embedded Image. This motivates the definition of the mismatch power in a network of N neurons by Embedded Image

Embedded Image is a virtual power that, nevertheless, is related to some physical flow of ions. Assume we could measure all the ions flowing in the original circuit of Eqs 78 and 79 (in the limit of a small ratio cd/gd). From this flow, delete the ion flow that cancels at the level of electrical charge exchange due to the counter-directed flow. The remaining effective ion flow defines an effective current flowing through the conductance g with driving force (V – u), Eq. 80. If it were only this effective current in the reduced circuit, the voltage u(t), starting at u(0), would converge with time constant τ to the low-pass filtering Embedded Image. Without additional ‘hidden’ currents, the voltage u would then instantaneously follow Embedded Image that is itself given by the forward dendritic input conductances. The deviation of u from Embedded Image, caused by some initial condition in u or by a feedback current from the network affecting u, builds the mismatch power Embedded Image. The feedback may originate from a target imposed downstream, and the neuron is ‘free’ in how to dynamically match u and Embedded Image. It is therefore tempting to see Embedded Image as a ‘free power’, and the NLA principle as minimizing the corresponding ‘free energy’. In fact, the free-energy principle says that any self-organizing system that is in a dynamic equilibrium with its environment (here Embedded Image in the absence of output nudging) must minimize its free energy (that here builds up by imposing a target) (Friston, 2010).

The NLA principle states that the time-integral of Embedded Image is minimized with respect to the look-ahead voltage Embedded Image. We therefore define the physical mismatch energy as Embedded Image that has the units of energy. Embedded Image takes the role of our neural action (A in the main text) for conductance-based neurons.

Euler-Lagrange equations for conductance-based neurons

The NLA for conductance-based neurons seeks to minimize Embedded Image with respect to variations of u, such that Embedded Image. In the simplest example of the main text we considered prospective rates, Embedded Image, so that the low-pass filtered rates become a function of the instantaneous voltage, Embedded Image. These low-pass filtered presynaptic rates, Embedded Image, determine the postsynaptic voltage u. Analogously, the low-pass filtered reversal potential, Embedded Image, determines the postsynaptic voltage u, and we again postulate that Embedded Image is an instantaneous function of the presynaptic voltage, Embedded Image. Here we argue that active dendritic mechanisms advance the postsynaptic reversal potential V, so that the delayed Embedded Image again becomes instantaneous, similarly to the advancement of the apical dendritic potential observed in cortical pyramidal neurons (Ulrich, 2002), see also Fig. 2b. With this instantaneity, the stationarity of the action with respect to generalized (compact and non-compact) variations, Embedded Image, translates to the condition Embedded Image.

Calculating Embedded Image with Embedded Image given in Eq. 83 and τ = c/g for the total conductance g(u) specified after Eq. 80 is a bit more demanding. For a probabilistic version, where Embedded Image is derived from the negative log-likelihood of a Gaussian density of the voltage, Embedded Image, the calculation has been carried out in Jordan et al. (2022). In the probabilistic version, there is an additional normalization term that enters here as log g. In the deterministic version considered here, this log term is not present and we calculate Embedded Image

Notice that the transpose Embedded Image selects the downstream network neuron to backpropagate from there the first- and second-order errors. From Embedded Image and τ = c/g we conclude that Embedded Image

We next apply the look-ahead operator to the expression in this Eq. 86. Assuming an initialization at t0 = –∞, the condition Embedded Image becomes equivalent to Embedded Image, and hence Eq. 86 becomes equivalent to Embedded Image

A learning rule of the form Embedded Image with an appropriate postsynaptic error Embedded Image can again be derived (see Jordan et al., 2022 for a single neuron in the probabilistic framework). But this, based on the above sketch, still needs to be worked out.

Generalizations: long memories, reinforcement learning

One could also extend the NLA principle by adding e.g. threshold adaptation that endows the dynamics with additional and longer time constants. For this, the rate function is parametrized by an additional threshold, Embedded Image, and the Lagrangian is augmented by an error term on the threshold. Such an error term can take the form Embedded Image, where Embedded Image now represents a low-pass filtering of the rate with a long threshold adaptation time constant τϑ. Neurons with such threshold adaptation will still be able to instantaneously transmit a voltage jump through a prospective firing rate, but this transmission will now also depend on the neuron’s history. Short-term plasticity may be included in the same way. Due to the history dependence, the stationarity of the action δA = 0 can no longer be reduced to the stationarity of the Lagrangian at any moment in time. As a consequence, the errors will look ahead into the future beyond what the local derivatives alone provide.

Further generalizations are possible for the cost function, for instance favoring voltage regions with high cost, corresponding to punishment, or with negative cost, corresponding to reward. These extensions will be considered in future work.

Voltage dynamics from the physical least-action principle

We have shown that through the look-ahead in Hamilton’s least-action principle, the notion of friction enters through the backdoor. In the least action formalism in physics, friction is directly introduced by extending the Hamiltonian principle to a generalized D’Alembert principle, where at the level of the Euler-Lagrange equations the generalized force is equated to the dissipation force (Flannery, 2005).

The electro-chemical properties of a membrane can be captured by an equivalent circuit consisting of a battery voltage V, a capacitance C and a resistance R, arranged in parallel. The voltage dynamics is derived from the Euler-Lagrange equations augmented by a dissipative force. Formally, in the absence of an inductance defining the kinetic energy, the Lagrangian Embedded Image becomes identical to the potential energy Embedded Image with Embedded Image Embedded Image

Here, Embedded Image is the dissipative Rayleigh energy related to friction and Q represents the charge across the membrane. According to D’Alembert’s principle, the dynamics is characterized by the Euler-Lagrange equation with dissipative force, Embedded Image, and this equation reduces to Embedded Image. Identifying the charge by means of the voltage across the capacitance, Q = Cu, this equation can also be written as Embedded Image, or Embedded Image with τ = RC, just as derived in Eq. 87. Loosely speaking, the minimization of the energy Embedded Image by looking ahead is equivalent to a minimization without looking ahead, but taking friction into account.
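For completeness, a short worked version of this reduction (our own sketch; the signs of the potential and Rayleigh terms are chosen such that the standard RC equation with τ = RC results, matching the statement above):

```latex
% Lagrangian without kinetic term, and Rayleigh dissipation function (sign conventions assumed):
\mathcal{L}(Q) = -U(Q) = -\Big(\tfrac{Q^{2}}{2C} - V\,Q\Big),
\qquad
\mathcal{F}(\dot Q) = \tfrac{R}{2}\,\dot Q^{2}.
% Generalized D'Alembert / Euler-Lagrange equation with dissipative force:
\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot Q}
  - \frac{\partial \mathcal{L}}{\partial Q}
  = -\frac{\partial \mathcal{F}}{\partial \dot Q}
\;\Longrightarrow\;
\frac{Q}{C} - V = -R\,\dot Q
\;\Longrightarrow\;
\tau\,\dot u = V - u
\quad\text{with } Q = C u,\ \tau = RC.
```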

G A tutorial on total and partial derivatives as used in the paper

The proof of Theorem 1 given in the Methods makes use of partial and total derivatives and follows the notation of Scellier & Bengio, 2017 and Meulemans et al., 2022. As there is some variability (and community-dependent sloppiness) in the notation of partial and total derivatives, we provide some explanations on how this notation is to be interpreted, and why, for instance, total derivatives commute with each other, and likewise partial derivatives with each other (although total and partial derivatives do not commute in general).

  1. In a differential geometric setting, the derivative of a real-valued function E(μ) at a point μ of a manifold (like flat Euclidean space) is considered as a mapping of tangent vectors at μ to the real numbers. When interpreting the derivative as a mapping, the Embedded Image lives in the dual space and is therefore a row vector if μ is a column vector. If a function f(μ) of μ is vector-valued, then its derivative is a matrix with entries Embedded Image, where i indexes the rows (i.e. running down) and j indexes the columns (i.e. running right). When r = ρ(μ) is a column vector with ρ applied to each component of μ, we consider the (partial) derivative r′ = ρ′(μ) for convenience as a column vector with components ρ′(ui). To strictly follow the formalism, it should be a diagonal matrix.

  2. Because we introduced the error vector Embedded Image as a column vector, it is easier to write Embedded Image, where now Embedded Image is also considered as a column vector. To be consistent with the above, we should have written Embedded Image. The transpose .T appeared to us as too heavy, so we dropped it wherever this did not have further mathematical consequences (hoping it does not cause confusion). Sticking to a column vector here also renders the backpropagation error a column vector in Eq. 21. This error then takes the classical form with the weight transpose, Embedded Image.

  3. We typically have real-valued functions of the form E(μθ, θ), with θ = (W, β) being a vector of parameters, and μ being a function of θ. To get the total derivative of E with respect to θ we consider the values E as a function of θ. This can be done by introducing a new function Embedded Image defined as Embedded Image, where the components θi are considered as independent variables. The total derivative of E with respect to θ is then defined as a vector-valued function Embedded Image for an n-dimensional θ. It can be helpful to think of the components of this total derivative as (total) directional derivatives in the unit directions. For the last (n-th) unit direction Δθ(n) = (0,.., 0,1), for instance, we have Embedded Image.

  4. The cost gradient, Embedded Image has the same dimension as W. Recall that by the cost gradient we mean Embedded Image, where Embedded Image is defined as Embedded Image, with the voltage uo of the output neurons being itself a function of W.

  5. To calculate the partial derivative Embedded Image with respect to θ, we fix the first argument uθ, even if for u we often plug in the components of the trajectory u = uθ(t), which does depend on θ. In contrast, the total derivative is Embedded Image. Here, Embedded Image is a row vector, as is Embedded Image, consistent with the convention (that we have broken in Eq. 28 to keep vectors as columns). When u is considered as a trajectory, Embedded Image does not vanish in general, but it does when u is simply considered as an independent variable.

  6. When replacing the argument u in E(u, θ) by Embedded Image we get the ‘Lagrangian’ Embedded Image. The partial derivative of L with respect to Embedded Image, for instance, is then Embedded Image. The partial derivative Embedded Image considers L as a function of independent arguments Embedded Image and θ.

  7. We also used that the total derivatives can be exchanged, in the current example Embedded Image. This is generally true for derivatives of Lipschitz continuous functions, for which derivatives exist almost everywhere. The total derivatives (where they exist) then commute because the difference quotients in W and β are uniformly bounded. The Moore-Osgood theorem tells us that two limits, of which at least one is uniform in the other, can be commuted. This also applies to the double difference quotients involved in the definition of Embedded Image. Remember that the total derivative, for instance with respect to the n-th parameter, can be written as Embedded Image. A small numerical illustration of points 3, 5 and 7 is given after this list.
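As announced above, here is a small numerical companion to points 3, 5 and 7 (our own toy example with an arbitrary function E and trajectory u(θ); finite differences stand in for the analytical derivatives):

```python
import numpy as np

# Toy example: u(theta) = tanh(theta1*theta2), E(u, theta) = 0.5*u**2 + theta1*u + theta2**2.
u = lambda th: np.tanh(th[0] * th[1])
E = lambda uu, th: 0.5 * uu**2 + th[0] * uu + th[1]**2
F = lambda th: E(u(th), th)                       # E considered as a function of theta alone

h  = 1e-4
th = np.array([0.7, -1.3])

def d(f, x, i):                                   # central difference in direction i
    e = np.zeros_like(x); e[i] = h
    return (f(x + e) - f(x - e)) / (2 * h)

# total derivative = partial derivative (u fixed) + chain term through u(theta)
total_1   = d(F, th, 0)
partial_1 = (E(u(th), th + np.array([h, 0.0])) - E(u(th), th - np.array([h, 0.0]))) / (2 * h)
chain_1   = (u(th) + th[0]) * d(u, th, 0)         # dE/du = u + theta1 for this E
print(total_1, partial_1 + chain_1)               # agree up to finite-difference error

# mixed total derivatives commute: d/dtheta2 (dF/dtheta1) == d/dtheta1 (dF/dtheta2)
m12 = d(lambda x: d(F, x, 0), th, 1)
m21 = d(lambda x: d(F, x, 1), th, 0)
print(m12, m21)                                   # agree up to finite-difference error
```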

H Proof of Theorem 1, part (ii), using only partial derivatives

This section proves Eqs 29–31 using only partial derivatives, avoiding nested functions that would require dealing with total derivatives. The three equations are also at the core of the proof of Equilibrium Propagation (Scellier & Bengio, 2017), although there they are only applied in the steady state after the network has converged to a constant activity.

We assume a network of d neurons whose membrane potential is given by u ∈ ℝd and which are connected via weights W ∈ ℝd×d. By ∇u, we denote the gradient with respect to the membrane potentials, i.e., Embedded Image. Similarly, ∇W is a matrix containing the derivatives with respect to the weights, Embedded Image.

To prove the rt-DeEP theorem (theorem 1), we first have to make a few definitions and observations:

  1. For a given (W, β), the dynamics yield certain membrane potentials fu ∈ ℝd. Formally, we define this as Embedded Image

    The ith element of fu is denoted by fui and hence Embedded Image.

  2. We define the following functions:

    • the mismatch energy Embedded Image,

    • the cost function Embedded Image,

    • the Lagrangian Embedded Image.

    To make the dependency of the cost and energies on β and W explicit, we further introduce three auxiliary functions FM, FC and FL:

    • for the mismatch energy Embedded Image,

    • for the cost function Embedded Image,

    • for the Lagrangian Embedded Image

  3. The Euler-Lagrange equations can be written as Embedded Image. Hence, far enough away from initialization and for smooth enough input (and targets) we have ∇ fu L = 0 at all times, even when changing the network input continuously. Note that both the cost and the mismatch energies are defined on low-pass-filtered signals, and it is with respect to the low-pass filtered external input that the low-pass-filtered output error is minimized.

  4. Without output nudging (i.e., β = 0), the output error vanishes and consequently all other prediction errors vanish as well, Embedded Image. This can be easily shown for layered network architectures and holds true for arbitrary connections (e.g., recurrent networks) as long as fu uniquely exists, i.e., Embedded Image has a unique solution for μ. From the form of the mismatch energy, we then get Embedded Image

    Since we are assuming smooth functions, this also implies that Embedded Image

  5. From the assumption of well-behaved (smooth) functions, it also follows that partial derivatives commute Embedded Image.

Our goal is to find a plasticity rule that minimizes the cost C, which we do by calculating ∇W FC|β=0. Similar to Scellier & Bengio (2017), to achieve this, we first calculate the partial derivatives of FL with respect to the nudging strength β Embedded Image Embedded Image Embedded Image Embedded Image and the weights W Embedded Image Embedded Image Embedded Image Embedded Image Embedded Image or in vectorized form Embedded Image

With these identities in place, we can calculate the plasticity rule: Embedded Image Embedded Image Embedded Image Embedded Image Embedded Image Embedded Image Embedded Image where we used that limits can be exchanged for smooth functions. Using the definition of EM, we obtain a plasticity rule that minimizes the cost function Embedded Image

In practice, the prefactor Embedded Image is absorbed into the learning rate.
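To make the structure of the proof concrete, the following self-contained sketch (our own toy example with a two-neuron chain and arbitrary parameter values; the energies are chosen to match the definitions above, and the stationary states are obtained by direct numerical minimization rather than by simulating the dynamics) verifies that (1/β)·∇W FL|β approximates the cost gradient for small β, with ∇W FL|β=0 = 0:

```python
import numpy as np
from scipy.optimize import minimize

# Toy two-neuron chain: input rate r -> hidden u1 -> output u2, target y.
rho = np.tanh
r, y = 1.0, 0.8
W = np.array([0.5, -0.3])                        # (W1, W2), arbitrary values

def E_M(u, W):                                   # mismatch energy
    return 0.5 * (u[0] - W[0] * r)**2 + 0.5 * (u[1] - W[1] * rho(u[0]))**2

def C(u):                                        # output cost
    return 0.5 * (u[1] - y)**2

def state(W, beta):                              # u minimizing L = E_M + beta*C (stationary Lagrangian)
    res = minimize(lambda u: E_M(u, W) + beta * C(u), x0=np.zeros(2),
                   method="BFGS", options={"gtol": 1e-10})
    return res.x

def dEM_dW(u, W):                                # partial derivative of E_M w.r.t. W at fixed u
    e1 = u[0] - W[0] * r
    e2 = u[1] - W[1] * rho(u[0])
    return np.array([-e1 * r, -e2 * rho(u[0])])

beta = 1e-3
g_free   = dEM_dW(state(W, 0.0), W)              # vanishes: without nudging all errors vanish
g_nudged = dEM_dW(state(W, beta), W)
ep_grad  = (g_nudged - g_free) / beta            # plasticity-rule estimate of the cost gradient

# Reference: finite-difference gradient of the cost at the free state.
# At beta = 0 the free state is the exact forward pass u1 = W1*r, u2 = W2*rho(u1).
def cost_of_W(W):
    u1 = W[0] * r
    return C(np.array([u1, W[1] * rho(u1)]))

h = 1e-6
fd_grad = np.array([(cost_of_W(W + h * np.eye(2)[i]) - cost_of_W(W - h * np.eye(2)[i])) / (2 * h)
                    for i in range(2)])
print(ep_grad)
print(fd_grad)                                   # the two gradients agree for small beta
```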

Acknowledgments

We would like to thank Federico Benitez, Jonathan Binas, Paul Haider, Kevin Max, Alexander Mathis, Alexander Meulemans, and Jean-Pascal Pfister for helpful discussions, Kaspar Schindler for providing the human intracortical EEG data, and the late Karlheinz Meier for his dedication and support throughout the early stages of the project. WS particularly thanks Nicolas Zucchet for inspiring mathematical discussions and hints to literature on time-varying optimal control. This work has received funding from the European Union 7th Framework Programme under grant agreement 604102 (HBP), the Horizon 2020 Framework Programme under grant agreements 720270, 785907 and 945539 (HBP) and the Swiss National Science Foundation (SNSF, Sinergia grant CRSII5180316; João Sacramento is supported by SNSF Ambizione grant PZ00P3_186027). We further express our particular gratitude towards the Manfred Stärk Foundation for their continued support. We acknowledge the use of Fenix Infrastructure resources, which are partially funded from the European Union’s Horizon 2020 research and innovation programme through the ICEI project under the grant agreement No. 800858.

References

  1. Abbott, L. F., Varela, J. A., Sen, K. & Nelson, S. B. Synaptic depression and cortical gain control. Science 275, 220–224 (1997).
  2. Ackley, D. H., Hinton, G. E. & Sejnowski, T. J. A learning algorithm for Boltzmann machines. Cognitive Science 9, 147–169 (1985).
  3. Akrout, M., Wilson, C., Humphreys, P. C., Lillicrap, T. & Tweed, D. Deep learning without weight transport. arXiv (2019).
  4. Bannon, N. M., Chistiakova, M. & Volgushev, M. Synaptic Plasticity in Cortical Inhibitory Neurons: What Mechanisms May Help to Balance Synaptic Weight Changes? Frontiers in Cellular Neuroscience 14 (2020).
  5. Bartolozzi, C., Indiveri, G. & Donati, E. Embodied neuromorphic intelligence. Nature Communications 13, 1–14 (2022).
  6. Bastos, A. M. et al. Canonical Microcircuits for Predictive Coding. Neuron 76, 695–711. http://linkinghub.elsevier.com/retrieve/pii/S0896627312009592 (2012).
  7. Bellec, G. et al. A solution to the learning dilemma for recurrent networks of spiking neurons. Nature Communications 11, 1–15 (2020).
  8. Blom, T., Feuerriegel, D., Johnson, P., Bode, S. & Hogendoorn, H. Predictions drive neural representations of visual events ahead of incoming sensory information. Proceedings of the National Academy of Sciences of the United States of America 117, 7510–7515 (2020).
  9. Borovik, A. A mathematician’s view of the unreasonable ineffectiveness of mathematics in biology. Biosystems 205, 104410 (2021).
  10. Burrello, A., Cavigelli, L., Schindler, K., Benini, L. & Rahimi, A. Laelaps: An Energy-Efficient Seizure Detection Algorithm from Long-term Human iEEG Recordings without False Alarms in Proceedings of the 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2019), 752–757.
  11. Coopersmith, J. The Lazy Universe: An Introduction to the Principle of Least Action (Oxford University Press, New York, 2017).
  12. Dimitriou, M. Enhanced Muscle Afferent Signals during Motor Learning in Humans. Current Biology 26, 1062–1068. http://dx.doi.org/10.1016/j.cub.2016.02.030 (2016).
  13. Dimitriou, M. Human muscle spindles are wired to function as controllable signal-processing devices. eLife 11, 1–14 (2022).
  14. Dimitriou, M. & Edin, B. B. Human muscle spindles act as forward sensory models. Current Biology 20, 1763–1767. http://dx.doi.org/10.1016/j.cub.2010.08.049 (2010).
  15. Feldman, A. G. & Levin, M. F. in Progress in Motor Control (ed Sternad, D.) 699–726 (2009).
  16. Feynman, R., Leighton, R. & Sands, M. The Feynman Lectures on Physics, Vol. II: Mainly Electromagnetism and Matter, Chapt. 19. https://www.feynmanlectures.caltech.edu/II_19.html (Basic Books, 2011).
  17. Göltz, J. et al. Fast and energy-efficient neuromorphic deep learning with first-spike times. Nature Machine Intelligence 3, 823–835 (2021).
  18. Guerguiev, J., Lillicrap, T. P. & Richards, B. A. Towards deep learning with segregated dendrites. eLife 6, 1–37 (2017).
  19. Haider, P. et al. Latent Equilibrium: Arbitrarily fast computation with arbitrarily slow neurons. Advances in Neural Information Processing Systems 34 (2021).
  20. Hassabis, D., Kumaran, D., Summerfield, C. & Botvinick, M. Neuroscience-Inspired Artificial Intelligence. Neuron 95, 245–258. http://dx.doi.org/10.1016/j.neuron.2017.06.011 (2017).
  21. Herzfeld, D. J., Vaswani, P. A., Marko, M. K. & Shadmehr, R. A memory of errors in sensorimotor learning. Science 345, 1349–1353. http://www.ncbi.nlm.nih.gov/pubmed/25123484 (2014).
  22. Hinton, G. The Forward-Forward Algorithm: Some Preliminary Investigations in NeurIPS (2022).
  23. Hodgkin, A. L. & Huxley, A. F. A quantitative description of membrane current and its application to conduction and excitation in nerve. Bulletin of Mathematical Biology 117, 500–544 (1952).
  24. Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. USA 79, 2554–2558 (1982).
  25. Kawato, M. Internal models for motor control and trajectory planning. Current Opinion in Neurobiology 9, 718–727 (1999).
  26. Köndgen, H. et al. The dynamical response properties of neocortical neurons to temporally modulated noisy inputs in vitro. Cerebral Cortex 18, 2086–2097 (2008).
  27. Kunin, D. et al. Two routes to scalable credit assignment without weight symmetry in International Conference on Machine Learning (2020), 5511–5521.
  28. Larkum, M. A cellular mechanism for cortical associations: An organizing principle for the cerebral cortex. Trends in Neurosciences 36, 141–151 (2013).
  29. Latash, M. L. Muscle coactivation: Definitions, mechanisms, and functions. Journal of Neurophysiology 120, 88–104 (2018).
  30. LeCun, Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/ (1998).
  31. LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M. A. & Huang, F. J. in Predicting Structured Data (eds Bakir, G., Hofman, T., Schoelkopf, B., Smola, A. & Taskar, B.) 1–59 (MIT Press, 2006).
  32. Li, S. et al. Coordinated alpha and gamma control of muscles and spindles in movement and posture. Frontiers in Computational Neuroscience 9, 1–15 (2015).
  33. Lillicrap, T. P., Cownden, D., Tweed, D. B. & Akerman, C. J. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications 7, 1–10 (2016).
  34. Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J. & Hinton, G. Backpropagation and the brain. Nature Reviews Neuroscience 21, 335–346 (2020).
  35. Max, K. et al. Learning efficient backprojections across cortical hierarchies in real time. arXiv, 1–31. http://arxiv.org/abs/2212.10249 (2022).
  36. Mesnard, T., Vignoud, G., Sacramento, J., Senn, W. & Bengio, Y. Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks. arXiv (2019).
  37. Meulemans, A., Farinha, M. T., et al. Credit Assignment in Neural Networks through Deep Feedback Control. Advances in Neural Information Processing Systems 34 (2021).
  38. Meulemans, A., Zucchet, N., Kobayashi, S., von Oswald, J. & Sacramento, J. The least-control principle for local learning at equilibrium. Advances in Neural Information Processing Systems 35 (2022).
  39. Ostojic, S. et al. Neuronal morphology generates high-frequency firing resonance. Journal of Neuroscience 35, 7056–7068 (2015).
  40. Palmer, S. E., Marre, O., Berry, M. J. & Bialek, W. Predictive information in a sensory population. Proceedings of the National Academy of Sciences 112, 6908–6913 (2015).
  41. Papaioannou, S. & Dimitriou, M. Goal-dependent tuning of muscle spindle receptors during movement preparation. Science Advances 7, 1–14 (2021).
  42. Pfister, J.-P., Dayan, P. & Lengyel, M. Synapses with short-term plasticity are optimal estimators of presynaptic membrane potentials. Nature Neuroscience 13, 1271–1275 (2010).
  43. Poirazi, P. & Papoutsi, A. Illuminating dendritic function with computational models. Nature Reviews Neuroscience 21, 303–321 (2020).
  44. Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience 2, 79–87. http://www.nature.com/doifinder/10.1038/4580 (1999).
  45. Richards, B. A. et al. A deep learning framework for neuroscience. Nature Neuroscience 22, 1761–1770 (2019).
  46. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning Representations by Back-propagating Errors. Nature 323, 533–536 (1986).
  47. Sacramento, J., Ponte Costa, R., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the back-propagation algorithm. Advances in Neural Information Processing Systems 31, 8721–8732 (2018).
  48. Scellier, B. & Bengio, Y. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience 11, 24 (2017).
  49. Schiess, M., Urbanczik, R. & Senn, W. Somato-dendritic Synaptic Plasticity and Error-backpropagation in Active Dendrites. PLoS Computational Biology 12, 1–18 (2016).
  50. Simonetto, A., Dall’Anese, E., Paternain, S., Leus, G. & Giannakis, G. B. Time-Varying Convex Optimization: Time-Structured Algorithms and Applications. Proceedings of the IEEE 108, 2032–2048 (2020).
  51. Song, Y. et al. Inferring Neural Activity Before Plasticity: A Foundation for Learning Beyond Backpropagation. bioRxiv (2022).
  52. Takahashi, N. et al. Active dendritic currents gate descending cortical outputs in perception. Nature Neuroscience 23, 1277–1285. http://dx.doi.org/10.1038/s41593-020-0677-8 (2020).
  53. Talairach, J. & Tournoux, P. Co-planar stereotaxic atlas of the human brain: 3-Dimensional proportional system: An approach to cerebral imaging (Thieme Medical Publishers, Inc., New York, 1988).
  54. Theis, A. K., Rózsa, B., Katona, G., Schmitz, D. & Johenning, F. W. Voltage gated calcium channel activation by backpropagating action potentials downregulates NMDAR function. Frontiers in Cellular Neuroscience 12, 1–14 (2018).
  55. Todorov, E. in The Bayesian Brain (eds Doya, K., Ishii, S., Pouget, A. & Rao, R. P.) 12: 1–28 (MIT Press, 2006).
  56. Todorov, E. Optimality principles in sensorimotor control. Nature Neuroscience 7, 907–915 (2004).
  57. Todorov, E. & Jordan, M. I. Optimal feedback control as a theory of motor coordination. Nature Neuroscience 5, 1226–1235 (2002).
  58. Ulrich, D. Dendritic resonance in rat neocortical pyramidal cells. Journal of Neurophysiology 87, 2753–2759 (2002).
  59. Urbanczik, R. & Senn, W. Learning by the Dendritic Prediction of Somatic Spiking. Neuron 81, 521–528 (2014).
  60. Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C. & Gerstner, W. Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks. Science 334, 1569–1573 (2012).
  61. Wang, X. D., Chen, C., Zhang, D. & Yao, H. Cumulative latency advance underlies fast visual processing in desynchronized brain state. Proceedings of the National Academy of Sciences of the United States of America 111, 515–520 (2014).
  62. Whittington, J. C. & Bogacz, R. An approximation of the error backpropagation algorithm in a predictive coding network with local Hebbian synaptic plasticity. Neural Computation 29, 1229–1262 (2017).
  63. Whittington, J. C. & Bogacz, R. Theories of error back-propagation in the brain. Trends in Cognitive Sciences 23, 235–250 (2019).
  64. Wigner, E. The Unreasonable Effectiveness of Mathematics in the Natural Sciences. Richard Courant lecture in mathematical sciences. Communications in Pure and Applied Mathematics 13, 1–14 (1959).
  65. Wolpert, D. M. & Ghahramani, Z. Computational principles of motor neuroscience. Nature Neuroscience 3, 1212–1217 (2000).
  66. Xie, X. & Seung, H. S. Equivalence of backpropagation and contrastive Hebbian learning in a layered network. Neural Computation 15, 441–454 (2003).
  67. Zenke, F. & Ganguli, S. SuperSpike: Supervised Learning in Multilayer Spiking Neural Networks. Neural Computation 30, 1514–1541 (2018).

References

  1. Abbott, L. F., Varela, J. A., Sen, K. & Nelson, S. B. Synaptic depression and cortical gain control. Science 275, 220–224 (1997).
  2. Anderson, J. S., Lampl, I., Gillespie, D. C. & Ferster, D. The contribution of noise to contrast invariance of orientation tuning in cat visual cortex. Science 290, 1968–1972 (2000).
  3. Flannery, M. R. The enigma of nonholonomic constraints. American Journal of Physics 73, 265–272 (2005).
  4. Friston, K. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience 11, 127–138 (2010).
  5. Haider, P. et al. Latent Equilibrium: Arbitrarily fast computation with arbitrarily slow neurons. Advances in Neural Information Processing Systems 34 (2021).
  6. Jordan, J., Sacramento, J., Wybo, W. A. M., Petrovici, M. A. & Senn, W. Learning Bayes-optimal dendritic opinion pooling. arXiv. http://arxiv.org/abs/2104.13238 (2022).
  7. Lohmiller, W. & Slotine, J.-J. E. On Contraction Analysis for Non-linear Systems. Automatica 34, 683–696 (1998).
  8. Meulemans, A., Zucchet, N., Kobayashi, S., von Oswald, J. & Sacramento, J. The least-control principle for local learning at equilibrium. Advances in Neural Information Processing Systems 35 (2022).
  9. Nedeljkov, M. & Oberguggenberger, M. Ordinary differential equations with delta function terms. Publications de l’Institut Mathematique 91, 125–135 (2012).
  10. Rauch, A., La Camera, G., Lüscher, H.-R., Senn, W. & Fusi, S. Neocortical Pyramidal Cells Respond as Integrate-and-Fire Neurons to In Vivo-Like Input Currents. Journal of Neurophysiology 90, 1598–1612 (2003).
  11. Scellier, B. & Bengio, Y. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience 11, 24 (2017).
  12. Simonetto, A. & Dall’Anese, E. Prediction-Correction Algorithms for Time-Varying Constrained Optimization. IEEE Transactions on Signal Processing 65, 5481–5494 (2017).
  13. Simonetto, A., Dall’Anese, E., Paternain, S., Leus, G. & Giannakis, G. B. Time-Varying Convex Optimization: Time-Structured Algorithms and Applications. Proceedings of the IEEE 108, 2032–2048 (2020).
  14. Tsodyks, M. & Markram, H. The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability. Proc. Nat. Acad. Sci. USA 94, 719–723 (1997).
  15. Ulrich, D. Dendritic resonance in rat neocortical pyramidal cells. Journal of Neurophysiology 87, 2753–2759 (2002).
  16. Varela, J. A. et al. A quantitative description of short-term plasticity at excitatory synapses in layer 2/3 of rat primary visual cortex. Journal of Neuroscience 17, 7926–7940 (1997).
  17. Zhao, Y. & Swamy, M. N. Novel technique for tracking time-varying minimum and its applications. Canadian Conference on Electrical and Computer Engineering 2, 910–913 (1998).